Improved ways to operate a rude #crawler marginalia.nu
▻https://www.marginalia.nu/log/a_115_rude_crawler
Tech news is abuzz with rude AI crawlers that forge their user-agent and ignore robots.txt. In my opinion, if this is all the AI startups can muster, they’re losing their touch. wget can do this. You need to up your game, get that crawler really rolling coal. Flagrant disregard for externalities is an important signal to the investors that your AI startup is the one.
In that spirit, here are some advanced tips on how to be a much worse netizen.
Related to ▻https://seenthis.net/messages/1104052