Main

type

0 (not classified)

status

21 (imported old-v2, waiting for another import)

review version

1

cleanup version

0

pending deletion

0 (-)

created at

2025-01-06 14:21:21

updated at

2026-01-22 07:36:14

Address

url

https://ai.meta.com/research/publications/byte-latent-transformer-patches-scale-better-than-tokens/

url length

99

url crc

14881

url crc32

3942332961

location type

1 (url matches target location, page_location is empty)

canonical status

2 (missing canonical tag in html)

canonical page id

-

Source

domain id

8558762

domain tld

2211

domain parts

3

originating warc id

-

originating url

https://ai.meta.com/

source type

21 (Discord)

Server response

server ip

157.240.229.17

Publication date

2025-07-15 09:35:16

Fetch attempts

0

Original html size

179540

Normalized and saved size

60777

Content

title

excerpt

content

NLP
Byte Latent Transformer: Patches Scale Better Than Tokens
December 12, 2024
Abstract
We introduce the Byte Latent Transformer (BLT), a new byte-level LLM architecture that, for the first time, matches tokenization-based LLM performance at scale with significant improvements in inference efficiency and robustness. BLT encodes bytes into dynamically sized patches, which serve as the primary units of computation. Patches are segmented dynamically based on the entropy of the next byte, allocating more compute and model capacity where increased data complexity demands it. We present the first FLOP-controlled scaling study of byte-level models up to 8B parameters with 4T training bytes. Our results demonstrate the feasibility of scaling models trained on raw bytes without a fixed vocabulary. Both training and inference efficiency improve due to dynamically selecting long patches when data is predictable, along with qualitative improvements on reasoning and long tail gener...
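The content above says patches are segmented dynamically based on the entropy of the next byte. A minimal sketch of that idea, assuming a simple global threshold rule (a new patch starts wherever next-byte entropy exceeds a fixed threshold). How the per-byte entropies are estimated is not covered in this excerpt, so here they are hypothetical hand-supplied values, and the function names are illustrative rather than taken from any BLT codebase.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a next-byte probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def patch_boundaries(entropies, threshold):
    """Return start indices of patches under a global-threshold rule.

    Position 0 always opens a patch; any later position whose next-byte
    entropy exceeds `threshold` opens a new one. Low-entropy (predictable)
    runs therefore stay inside one long patch.
    """
    if not entropies:
        return []
    starts = [0]
    for i in range(1, len(entropies)):
        if entropies[i] > threshold:
            starts.append(i)
    return starts


def split_into_patches(data: bytes, entropies, threshold):
    """Cut `data` into variable-length patches at the entropy boundaries."""
    starts = patch_boundaries(entropies, threshold)
    ends = starts[1:] + [len(data)]
    return [data[s:e] for s, e in zip(starts, ends)]


# Hypothetical entropies: high at the first byte of each word, low inside it.
data = b"the cat sat"
ents = [2.5, 0.4, 0.3, 0.9, 2.8, 0.5, 0.2, 0.8, 2.6, 0.6, 0.3]
print(split_into_patches(data, ents, threshold=1.5))
```

Raising the threshold yields fewer, longer patches (less compute per byte); lowering it yields more, shorter patches where the data is harder to predict.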

author

updated

1769884461

Text analysis

block type

0

extracted fields

96

extracted bits

full content
content was extracted heuristically

detected location

0

detected language

1 (English)

category id

AI applications (149)

index version

1

paywall score

0

spam phrases

0

Text statistics

text nonlatin

0

text cyrillic

0

text characters

2689

text words

376

text unique words

271

text lines

1

text sentences

7

text paragraphs

1

text words per sentence

53

text matched phrases

0

text matched dictionaries

0