Main

type

5 (blog/news article)

status

21 (imported old-v2, waiting for another import)

review version

0

cleanup version

0

pending deletion

0 (-)

created at

2025-11-17 20:52:19

updated at

2025-11-17 20:52:21

Address

url

https://gregreda.com/2015/02/15/web-scraping-finding-the-api/

url length

61

url crc

7356

url crc32

1368530108

location type

1 (url matches target location, page_location is empty)

canonical status

2 (missing canonical tag in html)

canonical page id

-

Source

domain id

49335166

domain tld

2211

domain parts

0

originating warc id

-

originating url

https://data.commoncrawl.org/crawl-data/CC-MAIN-2025-33/segments/1754151279968.16/warc/CC-MAIN-20250807151203-20250807181203-00708.warc.gz

source type

11 (CommonCrawl)

Server response

server ip

3.162.112.127

Publication date

2025-08-07 15:59:37

Fetch attempts

0

Original html size

15715

Normalized and saved size

15715

Content

title

Web Scraping 201: finding the API

excerpt

content

This is part of a series of posts I have written about web scraping with Python. Web Scraping 101 with Python, which covers the basics of using Python for web scraping. Web Scraping 201: Finding the API, which covers when sites load data client-side with JavaScript. Asynchronous Scraping with Python, showing how to use multithreading to speed things up. Scraping Pages Behind Login Forms, which shows how to log into sites using Python. Update: Sorry folks, it looks like the NBA doesn't make shot log data accessible anymore. The same principles of this post still apply, but the particular example used is no longer functional. I do not intend to rewrite this post. Previously, I explained how to scrape a page where the data is rendered server-side. However, the increasing popularity of JavaScript frameworks such as AngularJS coupled with RESTful APIs means that fewer sites are generated server-side and are instead being rendered client-side. In this post, I’ll give a brief overv...

author

Greg Reda

updated

1764202987

Text analysis

block type

0

extracted fields

109

extracted bits

featured image
article author
title
full content
content was extracted heuristically

detected location

0

detected language

1 (English)

category id

230

index version

2025123101

paywall score

0

spam phrases

0

Text statistics

text nonlatin

0

text cyrillic

0

text characters

5658

text words

1248

text unique words

436

text lines

1

text sentences

86

text paragraphs

1

text words per sentence

14

text matched phrases

1

text matched dictionaries

3