id
type
5 (blog/news article)
status
21 (imported old-v2, waiting for another import)
review version
0
cleanup version
0
pending deletion
0 (-)
created at
2025-10-29 06:25:28
updated at
2025-10-29 06:25:29
url
https://2021.ai/news/covid-19-and-why-public-datasets-are-so-important
url length
70
url crc
49675
url crc32
310821387
location type
1 (url matches target location, page_location is empty)
canonical status
2 (missing canonical tag in html)
canonical page id
-
domain id
domain tld
660
domain parts
0
originating warc id
-
originating url
https://data.commoncrawl.org/crawl-data/CC-MAIN-2025-33/segments/1754151280216.62/warc/CC-MAIN-20250811003938-20250811033938-00954.warc.gz
source type
11 (CommonCrawl)
server ip
Publication date
2025-08-11 01:31:35
Fetch attempts
0
Original html size
38218
Normalized and saved size
35136
title
COVID-19 and why public datasets are so important
excerpt
content
In the wake of the global spread of coronavirus/COVID-19, it is more important than ever to have access to public data. While we are all trying to tackle the outbreak, data, and specifically public data, is one of the most important means to fight back with technology and ensure trust during these uncertain times. This post aims to highlight why public data is important for securing transparency, and how open research data and the possibility to collaborate can be crucial for discovering new insights to fight the virus. Accessibility of public data for data science projects. Public data is meant to be accessible to everyone. However, in many cases the data that should be available are merely summarized, or published in formats that are difficult for a machine to interpret. Public data is vital for obtaining a deeper understanding and uncovering its underlying secrets, which can lead to new machine learning models that can be used to make valuable predictions about the future or try to save hum...
author
updated
1762409354
block type
0
extracted fields
104
extracted bits
title
full content
content was extracted heuristically
detected location
0
detected language
1 (English)
category id
index version
2025110801
paywall score
0
spam phrases
0
text nonlatin
0
text cyrillic
0
text characters
6463
text words
1284
text unique words
515
text lines
1
text sentences
25
text paragraphs
1
text words per sentence
51
text matched phrases
22
text matched dictionaries
3
links self subdomains
0
links other subdomains
4
links other domains
9
links spam adult
0
links spam random
0
links spam expired
0
links ext activities
1
links ext ecommerce
0
links ext finance
0
links ext crypto
0
links ext booking
0
links ext news
3
links ext leaks
0
links ext ugc
2
links ext klim
0
links ext generic
0