First, we reshape the image into a sequence of flattened 2D patches. Here, image_size is the width and height of the image. The code is as follows.
import torch
import torch.nn as nn
import torchvision.transforms as T
from PIL import Image

image_size = 224
channel_size = 3
image = Image.open('sample.png').convert('RGB').resize((image_size, image_size))
X = T.PILToTensor()(image)  # Shape is [channel_size, image_size, image_size]
patch_size = 16
patches = (
    X.unfold(0, channel_size, channel_size)  # group all channels: [1, image_size, image_size, channel_size]
    .unfold(1, patch_size, patch_size)       # slice rows into patch_size windows
    .unfold(2, patch_size, patch_size)       # slice columns into patch_size windows
)  # Shape is [1, image_size/patch_size, image_size/patch_size, channel_size, patch_size, patch_size]
patches = (
    patches.contiguous()
    .view(patches.size(0), -1, channel_size * patch_size * patch_size)
    .float()
)  # Shape is [1, number of patches, channel_size*patch_size*patch_size]
Finally, we obtain a matrix in which each row holds a single flattened patch of channel_size*patch_size*patch_size values.
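As a quick sanity check (a minimal sketch using only the variables defined above), with image_size = 224 and patch_size = 16 we expect (224 / 16)^2 = 196 patches, each of length 3 * 16 * 16 = 768:
num_patches = (image_size // patch_size) ** 2  # (224 // 16) ** 2 = 196
print(patches.shape)  # torch.Size([1, 196, 768])
assert patches.shape == (1, num_patches, channel_size * patch_size * patch_size)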
Next, the Transformer uses a constant latent vector size D through all of its layers, so we map the patches to D dimensions with a trainable linear projection (Eq. 1).
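For reference, Eq. 1 of the ViT paper reads as follows; the class token and the position embeddings it includes are separate ingredients, and here we focus only on the projection E:
$$\mathbf{z}_0 = [\mathbf{x}_\text{class};\, \mathbf{x}_p^1\mathbf{E};\, \mathbf{x}_p^2\mathbf{E};\, \cdots;\, \mathbf{x}_p^N\mathbf{E}] + \mathbf{E}_{pos}, \quad \mathbf{E} \in \mathbb{R}^{(P^2 \cdot C)\times D},\ \mathbf{E}_{pos} \in \mathbb{R}^{(N+1)\times D}$$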
We initialize E as follows (embedding_dim plays the role of D).
# Trainable projection E of shape [channel_size*patch_size*patch_size, embedding_dim]
self.E = nn.Parameter(
    torch.randn(patch_size * patch_size * channel_size, embedding_dim)
)
We compute the patch embeddings as a matrix product.
patch_embeddings = torch.matmul(patches, self.E)  # Shape is [1, number of patches, embedding_dim]
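For what it's worth, the same projection is often written with nn.Linear; the sketch below is an equivalent alternative (not the code above), using bias=False so the two forms match exactly:
# Equivalent projection as a linear layer without bias
proj = nn.Linear(
    channel_size * patch_size * patch_size, embedding_dim, bias=False
)
patch_embeddings = proj(patches)  # Shape is [1, number of patches, embedding_dim]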