Main

type

0 (not classified)

status

21 (imported old-v2, waiting for another import)

review version

0

cleanup version

0

pending deletion

0 (-)

created at

2025-10-08 12:31:22

updated at

2025-10-08 12:31:22

Address

url

https://annual.wikimedia.org/2016/fact-4.html

url length

45
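
The url length value can be reproduced from the url field. A minimal sketch, assuming the length is counted in characters (for this ASCII-only URL the byte count is the same):

```python
url = "https://annual.wikimedia.org/2016/fact-4.html"

# Assumption: "url length" is the character count of the stored URL string.
url_length = len(url)
print(url_length)
```

For this record the result matches the stored value of 45.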

url crc

62352

url crc32

2475553680

location type

1 (url matches target location, page_location is empty)

canonical status

2 (missing canonical tag in html)

canonical page id

-

Source

domain id

8314142

domain tld

2688

domain parts

0

originating warc id

-

originating url

https://data.commoncrawl.org/crawl-data/CC-MAIN-2025-33/segments/1754151567216.67/warc/CC-MAIN-20250813090531-20250813120531-00150.warc.gz

source type

11 (CommonCrawl)

Server response

server ip

208.80.154.224

Publication date

2025-08-13 10:59:31

Fetch attempts

0

Original html size

26710

Normalized and saved size

25643
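
The two size fields together show how much the normalization step removed. A small sketch of that arithmetic, assuming both values are byte counts (the record does not state the unit):

```python
original_size = 26710    # "Original html size"
normalized_size = 25643  # "Normalized and saved size"

# Assumption: both sizes are in bytes, so the difference is bytes removed.
saved = original_size - normalized_size
saved_pct = 100 * saved / original_size
print(saved, round(saved_pct, 1))
```

Under that assumption, normalization removed 1067 bytes, roughly 4% of the original document.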

Content

title

There are more than 1,650 languages spoken in India

excerpt

content

Of the ten most-spoken languages in the world, three are Indic. India also has the second-highest number of English speakers worldwide—an estimated two times as many as the United Kingdom. In 2001, the national census of India documented 122 major languages spoken across the nation with a further 1,599 other language groups. Out of respect for this diversity, the Constitution of India lists no national language, and many Indians feel that the act of translating among their languages is part of the nation’s cultural heritage. Languages are not just words on pages. Languages hold and form meaning. Wikipedia exists in nearly 300 languages today, with the possibility of infinitely more in the future. Wikimedians in India currently work across 23 languages. From Hindi to Odia, Punjabi to Bengali, Nepalese to Tamil, these Indic-language speakers organize to create and curate free knowledge resources that may not exist anywhere else on the internet—or in the physi...

author

updated

1762260119
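
The updated value looks like a Unix timestamp rather than a formatted date like the other time fields. A minimal sketch for converting it, assuming it is seconds since the epoch and should be read as UTC:

```python
from datetime import datetime, timezone

updated = 1762260119  # value of the "updated" field

# Assumption: seconds since 1970-01-01 00:00:00 UTC.
updated_utc = datetime.fromtimestamp(updated, tz=timezone.utc)
updated_text = updated_utc.strftime("%Y-%m-%d %H:%M:%S")
print(updated_text)
```

Read this way, the value corresponds to 2025-11-04 12:41:59 UTC, which is later than the created at and updated at fields in the Main block.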

Text analysis

block type

0

extracted fields

105

extracted bits

featured image
title
full content
content was extracted heuristically

detected location

0

detected language

1 (English)

category id

Other (16)

index version

2025103102

paywall score

0

spam phrases

0

Text statistics

text nonlatin

0

text cyrillic

0

text characters

1291

text words

247

text unique words

150

text lines

1

text sentences

16

text paragraphs

1

text words per sentence

15
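
The words per sentence figure is consistent with dividing text words by text sentences. A short sketch, assuming the result is truncated to an integer (rounding to nearest gives the same value for this record, so the record alone cannot tell the two apart):

```python
text_words = 247
text_sentences = 16

# Assumption: integer division; the exact ratio is 15.4375.
words_per_sentence = text_words // text_sentences
print(words_per_sentence)
```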

text matched phrases

0

text matched dictionaries

0