Main

type

0 (not classified)

status

30 (imported + raw text content deleted)

review version

0

cleanup version

0

pending deletion

0 (-)

created at

2025-10-07 10:58:59

updated at

2025-10-07 10:59:00

Address

url

https://annual.wikimedia.org/2016/fact-7.html

url length

45

url crc

33086

url crc32

353993022

location type

1 (url matches target location, page_location is empty)

canonical status

2 (missing canonical tag in html)

canonical page id

-

Source

domain id

8314142

domain tld

2688

domain parts

0

originating warc id

-

originating url

https://data.commoncrawl.org/crawl-data/CC-MAIN-2025-33/segments/1754151567216.67/warc/CC-MAIN-20250813090531-20250813120531-00603.warc.gz

source type

11 (CommonCrawl)

Server response

server ip

208.80.154.224

Publication date

2025-08-13 11:01:51

Fetch attempts

0

Original html size

26674

Normalized and saved size

25607

Content

title

Most Wikipedia articles are in languages other than English

excerpt

content

Wikipedia’s first edit was in January 2001. In its early days, Wikipedia only existed in English—but that didn’t last for long. By March, there was Japanese, Catalan, and German. By May, eleven more, including Hebrew, Arabic, and Hungarian. At the end of 2001, Wikipedia had 20,000 articles and 18 language versions. Today, most Wikipedia articles are in languages other than English—87% to be exact. From Abkhaz to Zulu, along with 281 more. We strongly believe that language should not be a barrier to good information, and that knowledge should be available to all people in the language of their choice. One effective way people contribute important information to Wikipedia is by translating content between languages. Using tools developed by the Wikimedia Language team, volunteers can quickly build articles by using existing pages in other languages as their guide. By July 2016, volunteer translators worked their way through 100,000 articles. The Medical Transl...

author

updated

1762357999

Text analysis

block type

0

extracted fields

105

extracted bits

featured image
title
full content
content was extracted heuristically

detected location

0

detected language

1 (English)

category id

Other (16)

index version

2025110801

paywall score

0

spam phrases

0

Text statistics

text nonlatin

0

text cyrillic

0

text characters

1122

text words

212

text unique words

134

text lines

1

text sentences

14

text paragraphs

1

text words per sentence

15

text matched phrases

0

text matched dictionaries

0