digitalpebble.blogspot.com

Main

id

314005938

name

digitalpebble.blogspot.com · homepage snapshot

related bits

0

processing priority

4

site type

3 (personal blog or private political site, e.g. Blogspot, Substack, also small blogs on own domains)

review version

11

html import

20 (imported)

Events

first seen date

2024-10-17 13:13:01

expired found date

-

created at

2024-10-17 13:13:00

updated at

2025-12-25 08:53:17

Domain name statistics

length

26

crc

64119

tld

2211

nm parts

0

nm random digits

0

nm rare letters

0

Connections

is subdomain of id

69893241 (blogspot.com)

previous id

0

replaced with id

0

related id

-

dns primary id

0

dns alternative id

0

lifecycle status

0 (unclassified, or currently active)

Subdomains and pages

deleted subdomains

0

page imported products

0

page imported random

0

page imported parking

0

Error counters

count skipped due to recent timeouts on the same server IP

0

count content received but rejected due to 11-799

0

count dns errors

0

count cert errors

0

count timeouts

0

count http 429

0

count http 404

0

count http 403

0

count http 5xx

0

next operation date

-

Server

server bits

-

server ip

-

Mainpage statistics

mp import status

20

mp rejected date

-

mp saved date

-

mp size orig

161124

mp size raw text

24254

mp inner links count

57

mp inner links status

10 (links queued, awaiting import)

Open Graph

title

DigitalPebble's Blog

description

DigitalPebble Ltd is a consulting company specialised in linguistic engineering, document management, information retrieval and extraction. Our expertise is based on open source solutions, such as Luc

image

site name

author

updated

2026-03-10 05:17:10

raw text

DigitalPebble's Blog Thursday 23 November 2023 Meet the StormCrawler users: Q&A with the OpenWebSearch.eu project It has been a while since our first “Meet the StormCrawler users” blog and since StormCrawler is still going strong and used by a wide variety of users, we are delighted to put the spotlight on one of the most exciting projects that uses it. Our guests today are Michael Dinzinger and Saber Zerhoudi, both from the University of Passau in Germany. Can you please introduce yourselves and the project you are working on? Hello, we are Saber and Michael, both PhD students in Passau. Since September 2022, we have been working on OpenWebSearch.eu, a European research project, in which people from now more than 15 participating institutes collaborate on building an Open Web Index. Our task here at Uni Passau is the collaborative and resource-efficient crawling, which is the first technical step in building the Index (see figure below). The end res...

Text analysis

redirect type

0 (-)

block type

0 (no issues)

detected language

1 (English)

category id

Java (133)

index version

1

spam phrases

0

Text statistics

text nonlatin

0

text cyrillic

0

text characters

18781

text words

3566

text unique words

1122

text lines

418

text sentences

171

text paragraphs

57

text words per sentence

20

text matched phrases

0

text matched dictionaries

0

Link statistics

links self subdomains

0

links other subdomains

2 - nutch.apache.org, tika.apache.org

links other domains

19 - digitalpebble.com, openwebsearch.eu, uni-passau.de, stormcrawler.net, urlfrontier.net, opensearch.net, commoncrawl.org, suma-ev.social, jmir.org, nlnet.nl, ngi.eu

links spam adult

0

links spam random

0

links spam expired

0

links ext activities

0

links ext ecommerce

0

links ext finance

0

links ext crypto

0

links ext booking

0

links ext news

0

links ext leaks

0

links ext ugc

73 - blogger.com, linkedin.com, twitter.com, t.co, en.wikipedia.org

links ext klim

0

links ext generic

0

dol status

0

dol updated

2026-03-10 05:17:10

RSS

rss path

https://digitalpebble.blogspot.com/feeds/posts/default

rss status

32 (unknown)

rss found date

2024-10-17 13:13:03

rss size orig

522712

rss items

25

rss spam phrases

0

rss detected language

1 (English)

inbefore feed id

-

inbefore status

0 (new)

Sitemap

sitemap path

https://digitalpebble.blogspot.com/sitemap.xml

sitemap status

40 (completed successful import of reports.txt file to table in_pages)

sitemap review version

2

sitemap urls count

73

sitemap urls adult

0

sitemap filtered products

0

sitemap filtered videos

1

sitemap found date

2024-10-17 13:13:02

sitemap process date

2024-10-17 13:13:10

sitemap first import date

-

sitemap last import date

2025-12-25 08:53:17