id
related bits
0
processing priority
4
site type
3 (personal blog or private political site, e.g. Blogspot, Substack, also small blogs on own domains)
review version
11
html import
20 (imported)
first seen date
2024-08-26 17:27:15
expired found date
-
created at
2024-08-26 17:27:15
updated at
2025-04-22 13:56:48
length
20
crc
7141
tld
2211
nm parts
0
nm random digits
0
nm rare letters
0
is subdomain of id
13642151 (wordpress.com)
previous id
0
replaced with id
0
related id
-
dns primary id
0
dns alternative id
0
lifecycle status
0 (unclassified, or currently active)
deleted subdomains
0
page imported products
0
page imported random
0
page imported parking
0
count skipped due to recent timeouts on the same server IP
0
count content received but rejected due to 11-799
0
count dns errors
0
count cert errors
0
count timeouts
0
count http 429
0
count http 404
0
count http 403
0
count http 5xx
0
next operation date
-
server bits
-
server ip
-
mp import status
20
mp rejected date
-
mp saved date
-
mp size orig
134253
mp size raw text
18436
mp inner links count
0
mp inner links status
1 (no links)
title
snail
description
techie librarian; meatier than a seahorse
site name
snail
author
updated
2026-03-02 00:54:52
raw text
snail | techie librarian; meatier than a seahorse messing with pdfs Posted on 17 July 2024 by snail A key outcome of attending VALA was that it gave me an environment to think about stuff outside my usual boxes. I haven’t ruminated on tech stuff well in a while. A key area I have long wanted to develop is finding better access models for content on harvested web sites. I, via work, started harvesting government websites in 2014, and a key issue then was alternative approaches to collecting digital content eg annual reports. This has remained an itch. I’ve been thinking and experimenting around this topic sporadically, at times very sporadically, for some years. Mostly badly I suspect but it’s been useful for me. For a long time, I thought the key approach was to ...
redirect type
30 (window.location)
block type
0 (no issues)
detected language
1 (English)
category id
index version
1
spam phrases
0
text nonlatin
0
text cyrillic
0
text characters
14174
text words
3233
text unique words
1032
text lines
241
text sentences
145
text paragraphs
41
text words per sentence
22
text matched phrases
0
text matched dictionaries
0
links self subdomains
0
links other subdomains
links other domains
120 - snail.ws, vala.org.au, terrypratchett.com
links spam adult
0
links spam random
0
links spam expired
0
links ext activities
0
links ext ecommerce
0
links ext finance
0
links ext crypto
0
links ext booking
0
links ext news
2
links ext leaks
0
links ext ugc
33 - s0.wp.com, wp.me, s1.wp.com, en.wikipedia.org, flickr.com, netpreserveblog.wordpress.com, wordpress.com
links ext klim
0
links ext generic
1
dol status
0
dol updated
2026-03-02 00:54:52
rss status
32 (unknown)
rss found date
2024-08-28 06:02:04
rss size orig
36115
rss items
10
rss spam phrases
0
rss detected language
1 (English)
inbefore feed id
-
inbefore status
0 (new)
sitemap path
sitemap status
30 (processing completed, results pushed to table crawler_sitemaps.ext_domain_sitemap_lists)
sitemap review version
1
sitemap urls count
662
sitemap urls adult
0
sitemap filtered products
0
sitemap filtered videos
0
sitemap found date
2024-08-28 04:27:43
sitemap process date
2024-08-28 04:27:44
sitemap first import date
-
sitemap last import date
-