Main

type

0 (not classified)

status

21 (imported old-v2, waiting for another import)

review version

0

cleanup version

0

pending deletion

0 (-)

created at

2025-10-08 21:14:24

updated at

2025-10-08 21:14:25

Address

url

https://blog.tasuki.org/on-urls/

url length

32

url crc

4441

url crc32

1364398425

location type

1 (url matches target location, page_location is empty)

canonical status

2 (missing canonical tag in html)

canonical page id

-

Source

domain id

43450448

domain tld

2688

domain parts

0

originating warc id

-

originating url

https://data.commoncrawl.org/crawl-data/CC-MAIN-2025-33/segments/1754151281028.48/warc/CC-MAIN-20250813055759-20250813085759-00997.warc.gz

source type

11 (CommonCrawl)

Server response

server ip

188.166.140.8

Publication date

2025-08-13 06:00:17

Fetch attempts

0

Original html size

2888

Normalized and saved size

2888

Content

title

On URLs — tasuki’s blog

excerpt

content

People don’t pay any attention to URLs. Recently, a highly intelligent friend of mine inadvertently posted the following link on Facebook: https://www.givingwhatwecan.org/get-involved/how-rich-am-i/?country=NLD&income=56000&adults=1&children=0 URLs are the building blocks of the world wide web. When a website changes its URLs, you get those nasty 404 errors. It would be good if website maintainers paid a little more attention to that. As a website user, when you’re sharing a URL, look at it briefly. Does it actually contain the information you want it to contain? Can other people view this URL? Pro tip: use incognito mode to find out. Does it include superfluous information you were not intending to share? E.g. when searching Google, you can easily end up with URLs such as https://www.google.com/search?q=previous+search#q=current+search Keep URLs simple & working. Look at them briefly when sharing. It’s not rocket science.
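
The advice about superfluous information can be illustrated with a small sketch. It is not part of the original post: the function name `strip_url` and its `keep_params` argument are hypothetical, and it assumes that dropping the fragment and all non-whitelisted query parameters is acceptable for the link being shared, which is not true for every site.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_url(url, keep_params=()):
    """Return url without its fragment and without any query
    parameter whose name is not listed in keep_params.

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(url)
    # Keep only the explicitly whitelisted query parameters.
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name in keep_params
    ]
    # An empty fragment and an empty query are omitted by urlunsplit.
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), "")
    )


shared = (
    "https://www.givingwhatwecan.org/get-involved/how-rich-am-i/"
    "?country=NLD&income=56000&adults=1&children=0"
)
print(strip_url(shared))
print(strip_url(shared, keep_params=("country",)))
```

A whitelist is used rather than a blacklist because the sharer usually knows which parameters are needed, but not every parameter that might leak something. A tool like this does not replace reading the URL: in the Google example above, the current search sits in the fragment, so stripping blindly would share the previous search instead.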

author

Jan Hermann

updated

2025-11-04 20:22:27

Text analysis

block type

0

extracted fields

108

extracted bits

article author
title
full content
content was extracted heuristically

detected location

0

detected language

1 (English)

category id

Other (16)

index version

2025103102

paywall score

0

spam phrases

0

Text statistics

text nonlatin

0

text cyrillic

0

text characters

742

text words

159

text unique words

113

text lines

1

text sentences

13

text paragraphs

1

text words per sentence

12

text matched phrases

0

text matched dictionaries

0