id
type
0 (not classified)
status
21 (imported old-v2, waiting for another import)
review version
0
cleanup version
0
pending deletion
0 (-)
created at
2025-11-20 03:18:09
updated at
2025-11-20 03:18:10
url
https://ftp.ludd.ltu.se/mirrors/archlinux/?sort=1
url length
49
url crc
32808
url crc32
741441576
location type
1 (url matches target location, page_location is empty)
canonical status
2 (missing canonical tag in html)
canonical page id
-
domain id
domain tld
752
domain parts
0
originating warc id
-
originating url
https://data.commoncrawl.org/crawl-data/CC-MAIN-2025-33/segments/1754151279938.13/warc/CC-MAIN-20250807054540-20250807084540-00898.warc.gz
source type
11 (CommonCrawl)
server ip
Publication date
2025-08-07 06:13:05
Fetch attempts
0
Original html size
5191
Normalized and saved size
4197
title
Index of /mirrors/archlinux/
excerpt
content
Name                     Last modified         Size
../
wsl/                     2025-08-01 12:18:32   -
sources/                 2023-07-02 12:10:06   -
pool/                    2023-05-23 10:50:22   -
other/                   2024-12-01 18:40:26   -
multilib-testing-debug/  2021-12-08 21:50:01   -
multilib-testing/        2010-09-11 18:28:50   -
multilib-staging-debug/  2021-12-08 21:50:00   -
multilib-staging/        2012-01-14 20:09:19   -
multilib-debug/          2021-12-08 21:50:00   -
multilib/                2010-08-24 20:14:16   -
latest/                  2025-04-12 13:29:09   -
lastupdate               2025-08-07 02:51:12   11 B
lastsync                 2025-08-07 06:01:02   11 B
kde-unstable-debug/      2021-12-08 21:49:58   -
kde-unstable/            2009-12-18 18:36:26   -
iso/                     2025-08-01 16:12:36   -
images/                  2025-08-01 15:00:12   -
gnome-unstable-debug/    2021-12-08 21:49:58   -
gnome-unstable/          2010-02-12 22:35:31   -
extra-testing-debug/     2023-05-19 14:38:18   -
extra-testing/           2023-05-19 14:41:54   -
extra-staging-debug/     2023-05-19 14:38:18   -
extra-staging/           2023-05-19 14:41:55   -
extra-debug/             2021-12-08 21:49:56   -
extra/                   2010-09-22 06:59:16   -
core-testing-debug/      2023-05-19 14:38:21   -
core-testing/            2023-05-19 14:48:38   -
core-staging-debug/      2023-05-19 14:38:21   -
core-staging/            2023-05-19 14:48:38   -
core-debu...
author
updated
1764965506
block type
0
extracted fields
104
extracted bits
title
full content
content was extracted heuristically
detected location
0
detected language
126 (language undetectable (empty document, too short, or engines disagree))
category id
Other (16)
index version
2025123101
paywall score
0
spam phrases
0
text nonlatin
0
text cyrillic
0
text characters
822
text words
131
text unique words
81
text lines
1
text sentences
1
text paragraphs
1
text words per sentence
131
text matched phrases
0
text matched dictionaries
0
image author
featured image