id
type
0 (not classified)
status
21 (imported old-v2, waiting for another import)
review version
0
cleanup version
0
pending deletion
0 (-)
created at
2025-10-17 01:45:29
updated at
2025-10-17 01:45:29
pol page id
pol status
0
pol hosts ticketing
pol hosts ecommerce
pol hosts finance
pol hosts crypto
pol hosts leak
pol hosts devel
pol hosts ugc
linkedin.com
pol hosts klim
pol hosts builders
pol hosts self subdomains
pol hosts other subdomains
pol hosts other domains
pol updated
1763205718
url
https://alfa-solutions.it/media/publicazione/2024/publicazione-article-page.html
url length
80
url crc
29193
url crc32
2274652681
location type
1 (url matches target location, page_location is empty)
canonical status
10 (verified canonical url)
canonical page id
domain id
domain tld
380
domain parts
0
originating warc id
-
originating url
https://data.commoncrawl.org/crawl-data/CC-MAIN-2025-33/segments/1754151280951.94/warc/CC-MAIN-20250812141533-20250812171533-00888.warc.gz
source type
11 (CommonCrawl)
server ip
Publication date
2025-08-12 15:41:15
Fetch attempts
0
Original html size
98875
Normalized and saved size
94857
title
Publicazione Article Page
excerpt
content
Lorem ipsum dolor sit amet consectetur. Cursus mauris elit tortor tellus cursus nulla tincidunt. Odio urna eget ullamcorper accumsan eget. Suspendisse scelerisque sed in amet. Sit tortor aenean at et gravida. Egestas sed in pellentesque proin risus. Pretium malesuada aliquam vitae et aliquam proin sapien porttitor ullamcorper. Mauris tellus elit elementum lectus neque. Massa sit porttitor enim dignissim pellentesque nunc vitae diam cursus. Eu pulvinar vel risus rutrum.

Diam eu proin consequat erat volutpat semper consectetur velit. Fringilla nunc rhoncus egestas massa nulla risus nulla. Sed imperdiet cursus amet purus. Quis proin convallis vel proin vestibulum ac sit montes. Quam lobortis vitae id leo facilisis pharetra et arcu nulla. Etiam vel ut ornare ac integer amet senectus praesent.
...
author
updated
1763205718
block type
0
extracted fields
104
extracted bits
title
full content
content was extracted heuristically
detected location
0
detected language
4 (Italian)
category id
Lorem ipsum (237)
index version
2025110801
paywall score
0
spam phrases
0
text nonlatin
0
text cyrillic
0
text characters
2835
text words
524
text unique words
233
text lines
1
text sentences
40
text paragraphs
1
text words per sentence
13
text matched phrases
22
text matched dictionaries
1
links self subdomains
0
links other subdomains
0
links other domains
0
links spam adult
0
links spam random
0
links spam expired
0
links ext activities
0
links ext ecommerce
0
links ext finance
0
links ext crypto
0
links ext booking
0
links ext news
0
links ext leaks
0
links ext ugc
1
links ext klim
0
links ext generic
0