Main

type: 5 (blog/news article)
status: 21 (imported old-v2, waiting for another import)
review version: 0
cleanup version: 0
pending deletion: 0 (-)
created at: 2025-09-16 22:58:59
updated at: 2026-01-10 23:43:51

Address

url: https://nlp.seas.harvard.edu/2018/04/03/attention.html
url length: 54
url crc: 48805
url crc32: 3458383525
location type: 1 (url matches target location, page_location is empty)
canonical status: 30 (canonical url is different, page_canonical_page_id points to it)
canonical page id: 2747202696
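The recorded hash fields are at least self-consistent: 3458383525 & 0xFFFF equals 48805, which suggests "url crc" is simply the low 16 bits of "url crc32". A minimal sketch checking that, assuming the standard CRC-32 (zlib) polynomial — the indexer may use a different polynomial or normalize the URL before hashing, so the computed CRC-32 is printed rather than asserted against the record:

```python
import zlib

url = "https://nlp.seas.harvard.edu/2018/04/03/attention.html"

# "url length" is the plain character count of the URL string.
assert len(url) == 54

# Recorded values, self-consistent under a low-16-bit mask:
crc32_recorded = 3458383525
crc16_recorded = 48805
assert crc32_recorded & 0xFFFF == crc16_recorded

# Computed CRC-32 of the UTF-8 bytes (assumption: zlib polynomial,
# no URL normalization before hashing):
crc32_computed = zlib.crc32(url.encode("utf-8"))
print(crc32_computed, crc32_computed & 0xFFFF)
```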

Source

domain id: 4464635
domain tld: 2295
domain parts: 0
originating warc id: -
originating url: https://data.commoncrawl.org/crawl-data/CC-MAIN-2025-33/segments/1754151576670.96/warc/CC-MAIN-20250814162913-20250814192913-00994.warc.gz
source type: 11 (CommonCrawl)
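The originating URL follows Common Crawl's standard WARC path layout, which encodes the crawl label, segment id, fetch window, and file serial directly in the path. A sketch pulling those out (the variable names are mine, not fields of this record):

```python
import re

warc_url = ("https://data.commoncrawl.org/crawl-data/CC-MAIN-2025-33/"
            "segments/1754151576670.96/warc/"
            "CC-MAIN-20250814162913-20250814192913-00994.warc.gz")

# Path shape: crawl-data/<crawl>/segments/<segment>/warc/
#             CC-MAIN-<start>-<end>-<serial>.warc.gz
m = re.search(
    r"crawl-data/(CC-MAIN-\d{4}-\d{2})/segments/([\d.]+)/warc/"
    r"CC-MAIN-(\d{14})-(\d{14})-(\d{5})\.warc\.gz",
    warc_url,
)
crawl, segment, start, end, serial = m.groups()
print(crawl)    # CC-MAIN-2025-33 (crawl label: year and ISO week)
print(segment)  # 1754151576670.96
print(start)    # 20250814162913 (fetch window start, YYYYMMDDhhmmss)
print(serial)   # 00994
```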

Server response

server ip: 185.199.108.153
publication date: 2025-07-18 05:23:33
fetch attempts: 0
original html size: 193628
normalized and saved size: 190330

Content

title: The Annotated Transformer
excerpt: -
content: --- There is now a new version of this blog post updated for modern PyTorch. --- from IPython.display import Image Image(filename='images/aiayn.png') The Transformer from “Attention is All You Need” has been on a lot of people’s minds over the last year. Besides producing major improvements in translation quality, it provides a new architecture for many other NLP tasks. The paper itself is very clearly written, but the conventional wisdom has been that it is quite difficult to implement correctly. In this post I present an “annotated” version of the paper in the form of a line-by-line implementation. I have reordered and deleted some sections from the original paper and added comments throughout. This document itself is a working notebook, and should be a completely usable implementation. In total there are 400 lines of library code which can process 27,000 tokens per second on 4 GPUs. To follow along you will first need to install PyTorch. The complete notebook...
author: -
updated: 1768944998
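The content-level "updated" field looks like a Unix timestamp in seconds. Assuming UTC epoch seconds, it decodes as below; note it differs from the record's own "updated at" (2026-01-10 23:43:51), so the two fields evidently track different events:

```python
from datetime import datetime, timezone

# Content "updated" field; assumed to be UTC epoch seconds.
updated = 1768944998
dt = datetime.fromtimestamp(updated, tz=timezone.utc)
print(dt.strftime("%Y-%m-%d %H:%M:%S"))  # 2026-01-20 21:36:38
```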

Text analysis

block type: 0
extracted fields: 104
extracted bits: title, full content, content was extracted heuristically
detected location: 0
detected language: 1 (English)
category id: -
index version: 1
paywall score: 0
spam phrases: 0

Text statistics

text nonlatin: 0
text cyrillic: 0
text characters: 33565
text words: 7428
text unique words: 1339
text lines: 1
text sentences: 203
text paragraphs: 1
text words per sentence: 36
text matched phrases: 0
text matched dictionaries: 0
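Some of these statistics are derivable from one another. A sketch reproducing "text words per sentence" from the recorded counts (assuming the indexer uses floor division), plus a type-token ratio, which is not a stored field but a common derived quality signal:

```python
words = 7428
sentences = 203
unique_words = 1339

# "text words per sentence" = 36 is consistent with floor division:
assert words // sentences == 36  # 7428 / 203 ≈ 36.6

# Type-token ratio (unique words / total words), derived, not stored:
ttr = unique_words / words
print(round(ttr, 3))  # ≈ 0.18
```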