Main

type

5 (blog/news article)

status

21 (imported old-v2, waiting for another import)

review version

0

cleanup version

0

pending deletion

0 (-)

created at

2025-11-10 20:41:47

updated at

2025-11-10 20:41:47

Address

url

https://www.botify.com/blog/google-is-blind

url length

43

url crc

39980

url crc32

883268652

location type

1 (url matches target location, page_location is empty)

canonical status

10 (verified canonical url)

canonical page id

2950601138

Source

domain id

40468202

domain tld

2211

domain parts

2

originating warc id

-

originating url

https://data.commoncrawl.org/crawl-data/CC-MAIN-2025-33/segments/1754151280040.84/warc/CC-MAIN-20250808192612-20250808222612-00969.warc.gz

source type

11 (CommonCrawl)

Server response

server ip

52.85.132.88

Publication date

2025-08-08 19:38:17

Fetch attempts

0

Original html size

67994

Normalized and saved size

45279

Content

title

Holy Crap, Google Is Blind! | Botify

excerpt

content

December 12, 2012. Annabelle Bouard, Director of Education & Training Services. Hey there Botify community! Today's post is a little provocative: is Google blind? Let us first reflect on our findings from the first post: we saw that the Botify crawler allowed us to understand how the structure of a site was built. The distribution of pages by category and by depth was the following: This first bit of information allows us to understand how the pages were distributed, but it does not let us deduce the number of pages actually crawled by Google. We had a feeling that Google certainly does not crawl all pages, and we questioned the efficacy of the Top Products pages. We analysed server logs recovered from the website studied. The objective was to identify the visits of the Google bot and to compare them to the pages found by the Botify crawler in the structure of the site (30 days of logs were used). Only 38% of pages present in the structure are crawled by Google in a 30-day time frame. The illustration...

author

updated

1764904054

Text analysis

block type

0

extracted fields

105

extracted bits

featured image
title
full content
content was extracted heuristically

detected location

0

detected language

1 (English)

category id

Education (47)

index version

2025110801

paywall score

0

spam phrases

0

Text statistics

text nonlatin

0

text cyrillic

0

text characters

2387

text words

544

text unique words

227

text lines

1

text sentences

14

text paragraphs

1

text words per sentence

38

text matched phrases

1

text matched dictionaries

1