id
type
0 (not classified)
status
21 (imported old-v2, waiting for another import)
review version
0
cleanup version
0
pending deletion
0 (-)
created at
2025-11-26 16:03:18
updated at
2025-11-26 16:03:18
url
https://secure.telegraph.co.uk/secure/newsletter/arsenal/?icid=registration_newsletter_inarticlelink_generic_Sam_Dean_PremLeague_NL18
url length
133
url crc
33391
url crc32
1672708719
location type
1 (url matches target location, page_location is empty)
canonical status
2 (missing canonical tag in html)
canonical page id
-
domain id
domain tld
826
domain parts
0
originating warc id
-
originating url
https://data.commoncrawl.org/crawl-data/CC-MAIN-2025-33/segments/1754151279936.52/warc/CC-MAIN-20250807023708-20250807053708-00726.warc.gz
source type
11 (CommonCrawl)
server ip
Publication date
2025-08-07 03:21:29
Fetch attempts
0
Original html size
6897
Normalized and saved size
747
title
Access Restricted
excerpt
content
TELEGRAPH MEDIA GROUP HOLDINGS Ltd Access Restricted Thank you for your interest. Unauthorised access is prohibited. To access this content, you must have prior permission and a valid contract. Please contact our team at syndication@telegraph.co.uk to discuss licensing options.
author
updated
2025-12-30 23:41:13
block type
0
extracted fields
104
extracted bits
title
full content
content was extracted heuristically
detected location
0
detected language
1 (English)
category id
Other [en] (231)
index version
2025123101
paywall score
0
spam phrases
0
text nonlatin
0
text cyrillic
0
text characters
232
text words
42
text unique words
37
text lines
1
text sentences
4
text paragraphs
1
text words per sentence
10
text matched phrases
0
text matched dictionaries
0
image author
featured image