id
type
0 (not classified)
status
30 (imported + raw text content deleted)
review version
0
cleanup version
0
pending deletion
0 (-)
created at
2025-10-13 11:55:52
updated at
2025-10-13 11:55:52
url
https://www.bci.cl/viewer/?url=https%3A%2F%2Fbci-cdn.azureedge.net%2Fuploads%2Fe17abed3-a178-4fc9-bc42-02c174e75452%2Foriginal%2FCertificadoSI-0047-2020_IN_2022-10-26.pdf
url length
170
url crc
11569
url crc32
2841062705
location type
1 (url matches target location, page_location is empty)
canonical status
2 (missing canonical tag in html)
canonical page id
-
domain id
domain tld
152
domain parts
0
originating warc id
-
originating url
https://data.commoncrawl.org/crawl-data/CC-MAIN-2025-33/segments/1754151281020.56/warc/CC-MAIN-20250813024931-20250813054931-00505.warc.gz
source type
11 (CommonCrawl)
server ip
Publication date
2025-08-13 04:14:17
Fetch attempts
0
Original html size
59275
Normalized and saved size
14552
title
PDF.js viewer
excerpt
content
Find: Previous Next Highlight all Match case Presentation Mode Open Print Download Go to First Page Go to Last Page Rotate Clockwise Rotate Counterclockwise Enable hand tool Document Properties… Find Previous Next Page: Tools Zoom Out Zoom In Automatic ...
author
updated
1762889778
block type
0
extracted fields
104
extracted bits
title
full content
content was extracted heuristically
detected location
0
detected language
1 (English)
category id
Other (16)
index version
2025110801
paywall score
0
spam phrases
0
text nonlatin
0
text cyrillic
0
text characters
445
text words
87
text unique words
65
text lines
1
text sentences
1
text paragraphs
1
text words per sentence
87
text matched phrases
0
text matched dictionaries
0
image author
featured image