id
type
0 (not classified)
status
30 (imported + raw text content deleted)
review version
0
cleanup version
0
pending deletion
0 (-)
created at
2025-10-07 05:37:37
updated at
2025-10-07 05:37:37
url
https://annual.wikimedia.org/2016/fact-1.html
url length
45
url crc
25123
url crc32
3275776547
location type
1 (url matches target location, page_location is empty)
canonical status
2 (missing canonical tag in html)
canonical page id
-
domain id
domain tld
2688
domain parts
0
originating warc id
-
originating url
https://data.commoncrawl.org/crawl-data/CC-MAIN-2025-33/segments/1754151567216.67/warc/CC-MAIN-20250813090531-20250813120531-00697.warc.gz
source type
11 (CommonCrawl)
server ip
Publication date
2025-08-13 10:57:24
Fetch attempts
0
Original html size
27335
Normalized and saved size
26268
title
Half of refugees are school-age
excerpt
content
Half of refugees are school-age. That means 10 million children are away from their homes, their communities, and their traditional education. Each refugee child’s experience is unique, but every single one loses time from their important learning years. Many of them face the added pressure of being surrounded by new languages and cultures. And these aren’t the only children lacking high-quality educational resources around the world.
“I believe education is crucial for a culture of freedom and success, and I think using Wikipedia is a great opportunity to create innovation in this area.” —Wikimedian Roxana Sordo
Wikimedia’s vision is that every person should have access to all knowledge. Wikipedia, Wikibooks, and the rest of the Wikimedia projects are built to provide access to information for as many people as possible, whenever they need it. The Wikipedia community includes many people dedicated to expanding education and sharing knowledge. This y...
author
updated
1763374004
block type
0
extracted fields
105
extracted bits
featured image
title
full content
content was extracted heuristically
detected location
0
detected language
1 (English)
category id
index version
2025110801
paywall score
0
spam phrases
0
text nonlatin
0
text cyrillic
0
text characters
1518
text words
285
text unique words
178
text lines
1
text sentences
14
text paragraphs
1
text words per sentence
20
text matched phrases
9
text matched dictionaries
1
links self subdomains
0
links other subdomains
0
links other domains
1
links spam adult
0
links spam random
0
links spam expired
0
links ext activities
1
links ext ecommerce
0
links ext finance
0
links ext crypto
0
links ext booking
0
links ext news
0
links ext leaks
0
links ext ugc
7
links ext klim
0
links ext generic
0
image author