Main

type

0 (not classified)

status

21 (imported old-v2, waiting for another import)

review version

0

cleanup version

0

pending deletion

0 (-)

created at

2025-11-03 20:55:05

updated at

2025-11-03 20:55:06

Address

url

http://adamcsanders.com/projects/livejournal-blog-analysis/

url length

59

url crc

46084

url crc32

1587262468

location type

1 (url matches target location, page_location is empty)

canonical status

2 (missing canonical tag in html)

canonical page id

-

Source

domain id

92876155

domain tld

2211

domain parts

0

originating warc id

-

originating url

https://data.commoncrawl.org/crawl-data/CC-MAIN-2025-33/segments/1754151280119.22/warc/CC-MAIN-20250810024619-20250810054619-00210.warc.gz

source type

11 (CommonCrawl)

Server response

server ip

52.217.66.99

Publication date

2025-08-10 03:41:39

Fetch attempts

0

Original html size

4144

Normalized and saved size

3530

Content

title

Adam C. Sanders  »  Projects  »  Livejournal Blog Analysis

excerpt

content

Livejournal Blog Analysis. Mar 24, 2008. In the spring of 2008, I set out to take the pulse of the internet. My goal was to examine the ways in which the internet changed on a regular basis. The object of this research After only a small amount of effort, I had created a script capable of searching thousands of blogs at a time for an array of about 12 regular expressions, each corresponding to a particular type of sentence in which I was interested. The script would run autonomously 4 times each day for an indefinite period of time. I have visited over 150,000 blogs to date and have collected close to 400,000 distinct feelings (among many, many other things). It is difficult to determine exactly what to do with such information, but I do plan on doing an extensive visualization at some point. To date, the most I've done with this dataset is to plot the linkages between individual blogs in GraphViz. The image for this...
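
The regular-expression search described above might look roughly like the following sketch. The post does not list its roughly 12 expressions, so the two patterns here, the `PATTERNS` table, and the `extract_matches` helper are all hypothetical stand-ins, and the sentence splitting is deliberately naive.

```python
import re

# Hypothetical patterns: the original post does not show its own expressions.
# Each one captures the remainder of a sentence of a particular type.
PATTERNS = {
    "feeling": re.compile(r"\bI\s+(?:feel|am\s+feeling)\s+([^.!?]+)", re.IGNORECASE),
    "wish": re.compile(r"\bI\s+wish\s+([^.!?]+)", re.IGNORECASE),
}

# Naive sentence splitter: break on whitespace that follows ., ! or ?
SENTENCE_SPLIT = re.compile(r"(?<=[.!?])\s+")


def extract_matches(text):
    """Return (label, captured phrase) pairs for each sentence matching a pattern."""
    results = []
    for sentence in SENTENCE_SPLIT.split(text):
        for label, pattern in PATTERNS.items():
            match = pattern.search(sentence)
            if match:
                results.append((label, match.group(1).strip()))
    return results


if __name__ == "__main__":
    sample = "I feel tired today. The weather is fine. I wish it were summer!"
    for label, phrase in extract_matches(sample):
        print(label, "->", phrase)
```

Keeping the patterns in a labeled table means a new sentence type can be added without touching the matching loop; fetching the blogs and scheduling the four daily runs are left out of this sketch.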

author

updated

2025-12-02 17:18:10

Text analysis

block type

0

extracted fields

104

extracted bits

title
full content
content was extracted heuristically

detected location

0

detected language

1 (English)

category id

Other (16)

index version

2025110801

paywall score

0

spam phrases

0

Text statistics

text nonlatin

0

text cyrillic

0

text characters

858

text words

191

text unique words

121

text lines

1

text sentences

9

text paragraphs

1

text words per sentence

21

text matched phrases

0

text matched dictionaries

0