Weekly Single Link #3

This week's Single Link is about AI.

Specifically, the web crawler bots that AI companies and startups use to siphon up content from the internet. I've been reading quite a few posts (on blogs and social media) from people who have watched those crawlers scrape entire websites while thoroughly ignoring robots.txt files, or even DDoS forums and self-hosted services.

This is what was happening with Xe Iaso's self-hosted Gitea instance. The solution they came up with is described in "Block AI scrapers with Anubis". This is the week's link!

Anubis is a middleware that forces any browser or bot to solve a puzzle with JavaScript before it gets through. It's pretty neat, and it has a cool illustration that you'll see if you visit the test page.
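To give a rough idea of what such a puzzle can look like, here is a minimal sketch of a hashcash-style proof of work in Python. This is my own illustration, not Anubis's actual code: the function names `solve` and `verify`, the challenge string, and the difficulty scheme (a number of leading zero hex digits in a SHA-256 hash) are all assumptions made for the example. The visitor has to brute-force a number, while the server only needs one hash to check the answer.

```python
import hashlib


def solve(challenge: str, difficulty: int) -> int:
    """Client side (hypothetical): brute-force the smallest nonce whose
    SHA-256 hash, combined with the challenge, starts with `difficulty`
    zero hex digits."""
    prefix = "0" * difficulty
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
        if digest.startswith(prefix):
            return nonce
        nonce += 1


def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Server side (hypothetical): checking an answer costs a single hash."""
    digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
    return digest.startswith("0" * difficulty)


if __name__ == "__main__":
    # The challenge string and difficulty are made-up values for the demo.
    challenge = "example-challenge"
    nonce = solve(challenge, 3)
    print(nonce, verify(challenge, nonce, 3))
```

The asymmetry is the point: one visitor barely notices the work, but a crawler hammering thousands of pages has to pay for it on every challenge.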

It is the first of its kind that I have come across. Other methods always seem to be poisoning attacks, akin to tarpits that serve bad content. Anubis is maybe what I would call a naïve captcha? A much less problematic version of Cloudflare's in terms of usability, but probably more costly for the host.

PS: I know this is not just one link, but I needed to set up the context. It's fine. The objective is to avoid flooding readers with links, and I'm mostly keeping to that rule.
