id
type
0 (not classified)
status
30 (imported + raw text content deleted)
review version
0
cleanup version
0
pending deletion
0 (-)
created at
2025-11-13 04:27:04
updated at
2025-11-13 04:27:05
url
http://nypolicedispatch.com/cuba-ny-police-officers-recognized-for-life-saving-actions/
url length
87
url crc
12179
url crc32
991702931
location type
1 (url matches target location, page_location is empty)
canonical status
10 (verified canonical url)
canonical page id
domain id
domain tld
2211
domain parts
0
originating warc id
-
originating url
https://data.commoncrawl.org/crawl-data/CC-MAIN-2025-33/segments/1754151280022.18/warc/CC-MAIN-20250808100457-20250808130457-00760.warc.gz
source type
11 (CommonCrawl)
server ip
Publication date
2025-08-08 11:12:55
Fetch attempts
0
Original html size
579987
Normalized and saved size
565216
title
excerpt
content
author
updated
1765054244
block type
0
extracted fields
0
extracted bits
–
detected location
0
detected language
1 (English)
category id
224
index version
2025123101
paywall score
0
spam phrases
29
text nonlatin
738
text cyrillic
0
text characters
65535
text words
24824
text unique words
1406
text lines
4135
text sentences
224
text paragraphs
255
text words per sentence
110
text matched phrases
38
text matched dictionaries
18
links self subdomains
0
links other subdomains
148
links other domains
255
links spam adult
17
links spam random
32
links spam expired
0
links ext activities
10
links ext ecommerce
4
links ext finance
1
links ext crypto
0
links ext booking
0
links ext news
0
links ext leaks
1
links ext ugc
33
links ext klim
0
links ext generic
8
image author
featured image