#indexing_the_web

Stéphane Bortzmeyer @stephane CC BY-SA 11/03/2020

“We build and maintain an open repository of web crawl data that can be accessed and analyzed by anyone. Need years of free web page data to help change the world.”
[I didn’t test it myself.]
►https://commoncrawl.org
#OpenData #Web_crawling #indexing_the_Web

- BigGrizzly @biggrizzly CC BY-NC-SA 11/03/2020
  
  200 GB (compressed) just for the list of URLs in the February 2020 index... peanuts :-)))