Page #2817584539 - NetAtlas

Main

id

2817584539

type

0 (not classified)

status

21 (imported old-v2, waiting for another import)

review version

0

cleanup version

0

pending deletion

0 (-)

created at

2025-10-25 06:12:32

updated at

2025-10-25 06:12:33

Address

url

https://opir.columbia.edu/understanding-columbias-common-data-set

url length

65

url crc

15531

url crc32

2579119275

location type

1 (url matches target location, page_location is empty)

canonical status

10 (verified canonical url)

canonical page id

2817584539

Source

domain id

3528414

domain tld

2295

domain parts

0

originating warc id

-

originating url

https://data.commoncrawl.org/crawl-data/CC-MAIN-2025-33/segments/1754151280504.46/warc/CC-MAIN-20250811162052-20250811192052-00515.warc.gz

source type

11 (CommonCrawl)

Server response

server ip

162.159.138.64

Publication date

2025-08-11 17:14:18

Fetch attempts

0

Original html size

110467

Normalized and saved size

92501

Content

title

Understanding Columbia's Common Data Set | Columbia OPIR

excerpt

content

author

updated

1763389761

Text analysis

block type

0

extracted fields

8

extracted bits

title

detected location

0

detected language

1 (English)

category id

Education (47)

index version

2025110801

paywall score

0

spam phrases

0

Text statistics

text nonlatin

0

text cyrillic

0

text characters

32034

text words

5590

text unique words

1220

text lines

247

text sentences

224

text paragraphs

77

text words per sentence

24

text matched phrases

175

text matched dictionaries

6

Link statistics

links self subdomains

0

links other subdomains

0

links other domains

5

links spam adult

0

links spam random

0

links spam expired

0

links ext activities

48

links ext ecommerce

0

links ext finance

0

links ext crypto

0

links ext booking

0

links ext news

0

links ext leaks

0

links ext ugc

0

links ext klim

0

links ext generic

3

Featured image

image author

featured image
