id
type
0 (not classified)
status
21 (imported old-v2, waiting for another import)
review version
1
cleanup version
0
pending deletion
0 (-)
created at
2025-01-06 14:21:21
updated at
2026-01-22 07:36:14
url
https://ai.meta.com/research/publications/byte-latent-transformer-patches-scale-better-than-tokens/
url length
99
url crc
14881
url crc32
3942332961
location type
1 (url matches target location, page_location is empty)
canonical status
2 (missing canonical tag in html)
canonical page id
-
domain id
domain tld
2211
domain parts
3
originating warc id
-
originating url
https://ai.meta.com/
source type
21 (Discord)
server ip
Publication date
2025-07-15 09:35:16
Fetch attempts
0
Original html size
179540
Normalized and saved size
60777
title
excerpt
content
NLP. Byte Latent Transformer: Patches Scale Better Than Tokens. December 12, 2024. Abstract: We introduce the Byte Latent Transformer (BLT), a new byte-level LLM architecture that, for the first time, matches tokenization-based LLM performance at scale with significant improvements in inference efficiency and robustness. BLT encodes bytes into dynamically sized patches, which serve as the primary units of computation. Patches are segmented dynamically based on the entropy of the next byte, allocating more compute and model capacity where increased data complexity demands it. We present the first FLOP-controlled scaling study of byte-level models up to 8B parameters with 4T training bytes. Our results demonstrate the feasibility of scaling models trained on raw bytes without a fixed vocabulary. Both training and inference efficiency improve due to dynamically selecting long patches when data is predictable, along with qualitative improvements on reasoning and long tail gener...
author
updated
1769884461
block type
0
extracted fields
96
extracted bits
full content
content was extracted heuristically
detected location
0
detected language
1 (English)
category id
index version
1
paywall score
0
spam phrases
0
text nonlatin
0
text cyrillic
0
text characters
2689
text words
376
text unique words
271
text lines
1
text sentences
7
text paragraphs
1
text words per sentence
53
text matched phrases
0
text matched dictionaries
0
links self subdomains
2
links other subdomains
0
links other domains
6
links spam adult
0
links spam random
0
links spam expired
0
links ext activities
0
links ext ecommerce
0
links ext finance
0
links ext crypto
0
links ext booking
0
links ext news
0
links ext leaks
0
links ext ugc
55
links ext klim
0
links ext generic
0
image author
featured image
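
The abstract in the content field says patches are segmented based on the entropy of the next byte, with long patches where the data is predictable. The sketch below illustrates that idea only; it is not the BLT implementation. It assumes a hypothetical stand-in for the entropy model (bigram counts taken from the input itself, where the paper uses a learned model) and a hypothetical fixed `threshold` rule; the function names `bigram_entropies` and `segment_patches` are invented for this example.

```python
import math
from collections import Counter, defaultdict


def bigram_entropies(data: bytes) -> list[float]:
    """Entropy (in bits) of an estimated next-byte distribution per position.

    ASSUMPTION: a bigram count model built from `data` itself stands in for
    a learned entropy model. Position 0 has no context, so it falls back to
    the unigram distribution of the whole input.
    """
    unigram = Counter(data)
    following = defaultdict(Counter)
    for prev, nxt in zip(data, data[1:]):
        following[prev][nxt] += 1

    def entropy(counts: Counter) -> float:
        total = sum(counts.values())
        return -sum(c / total * math.log2(c / total) for c in counts.values())

    result = []
    for i in range(len(data)):
        counts = unigram if i == 0 else following[data[i - 1]]
        result.append(entropy(counts))
    return result


def segment_patches(data: bytes, entropies: list[float], threshold: float) -> list[bytes]:
    """Start a new patch at every position whose entropy exceeds `threshold`.

    ASSUMPTION: a single global threshold; other boundary rules are possible.
    """
    patches = []
    start = 0
    for i in range(1, len(data)):
        if entropies[i] > threshold:
            patches.append(data[start:i])
            start = i
    if data:
        patches.append(data[start:])
    return patches


if __name__ == "__main__":
    text = b"aaaaaaaa the quick brown fox"
    for patch in segment_patches(text, bigram_entropies(text), threshold=1.0):
        print(patch)
```

Under this rule a run of predictable bytes never crosses the threshold and stays in one patch, while harder-to-predict positions open new patches, which is the compute-allocation behaviour the abstract describes.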