id
type
0 (not classified)
status
21 (imported old-v2, waiting for another import)
review version
0
cleanup version
0
pending deletion
0 (-)
created at
2025-11-03 20:55:05
updated at
2025-11-03 20:55:06
url
http://adamcsanders.com/projects/livejournal-blog-analysis/
url length
59
url crc
46084
url crc32
1587262468
location type
1 (url matches target location, page_location is empty)
canonical status
2 (missing canonical tag in html)
canonical page id
-
domain id
domain tld
2211
domain parts
0
originating warc id
-
originating url
https://data.commoncrawl.org/crawl-data/CC-MAIN-2025-33/segments/1754151280119.22/warc/CC-MAIN-20250810024619-20250810054619-00210.warc.gz
source type
11 (CommonCrawl)
server ip
Publication date
2025-08-10 03:41:39
Fetch attempts
0
Original html size
4144
Normalized and saved size
3530
title
Adam C. Sanders » Projects » Livejournal Blog Analysis
excerpt
content
Livejournal Blog Analysis. Mar 24, 2008. In the spring of 2008, I set out to take the pulse of the internet. My goal was to examine the ways in which the internet changed on a regular basis. The object of this research After only a small amount of effort, I had created a script capable of searching thousands of blogs at a time for an array of about 12 regular expressions, each corresponding to a particular type of sentence in which I was interested. The script would run autonomously 4 times each day for an indefinite period of time. I have visited over 150,000 blogs to date and have collected close to 400,000 distinct feelings (among many, many other things). It is difficult to determine what exactly to do with such information, but I do plan on doing an extensive visualization at some point. To date, the most I've done with this dataset is to plot the linkages between individual blogs in GraphViz. The image for this...
author
updated
2025-12-02 17:18:10
block type
0
extracted fields
104
extracted bits
title
full content
content was extracted heuristically
detected location
0
detected language
1 (English)
category id
Other (16)
index version
2025110801
paywall score
0
spam phrases
0
text nonlatin
0
text cyrillic
0
text characters
858
text words
191
text unique words
121
text lines
1
text sentences
9
text paragraphs
1
text words per sentence
21
text matched phrases
0
text matched dictionaries
0
image author
featured image
/media/upload_files/livejournal_labeled_jpg_350x350_q100.jpg