id
type
5 (blog/news article)
status
20 (imported old-v1, waiting for another import)
review version
0
cleanup version
0
pending deletion
0 (-)
created at
2025-08-23 22:20:40
updated at
2025-08-23 22:20:41
url
https://aleagues.com.au/news/adelaide-ignore-brisbanes-alm-form/
url length
64
url crc
41254
url crc32
3182666022
location type
1 (url matches target location, page_location is empty)
canonical status
10 (verified canonical url)
canonical page id
domain id
domain tld
36
domain parts
0
originating warc id
-
originating url
https://data.commoncrawl.org/crawl-data/CC-MAIN-2025-33/segments/1754151579063.98/warc/CC-MAIN-20250815204238-20250815234238-00980.warc.gz
source type
11 (CommonCrawl)
server ip
Publication date
2025-08-15 21:38:39
Fetch attempts
0
Original html size
248089
Normalized and saved size
103693
block type
0
extracted fields
13
extracted bits
-
detected location
0
detected language
1 (English)
category id
-
index version
1
paywall score
0
spam phrases
0
text nonlatin
1
text cyrillic
0
text characters
4851
text words
958
text unique words
377
text lines
227
text sentences
28
text paragraphs
11
text words per sentence
34
text matched phrases
0
text matched dictionaries
0
links self subdomains
0
links other subdomains
9 - cdn.parsely.com, cdn.optimizely.com, players.brightcove.net, allaccess.keepup.com.au, secure.widget.cloud.opta.net
links other domains
40 - adelaideunited.com.au, aucklandfc.co.nz, brisbaneroar.com.au, canberraunited.com.au, ccmariners.com.au, macarthurfc.com.au, melbournecityfc.com.au, melbournevictory.com.au, newcastlejetsfc.com.au, perthglory.com.au, sydneyfc.com, wellingtonphoenix.com, wswanderersfc.com.au, wufc.com.au, isuzuute.com.au, ninjakitchen.com.au, sharkclean.com.au, bit.ly, etoro.com, mcdonalds.com.au, boost.com.au, mitre-sports.com.au, ihgplc.com, 10play.com.au, paramountplus.com, sky.co.nz
links spam adult
0
links spam random
0
links spam expired
0
links ext activities
0
links ext ecommerce
links ext finance
0
links ext crypto
0
links ext booking
0
links ext news
0
links ext leaks
0
links ext ugc
5 - facebook.com, instagram.com, twitter.com, tiktok.com, youtube.com
links ext klim
0
links ext generic
1
status
0
updated
2025-08-25 14:46:41