Specifically, the practice of automated bots scraping content for "learning" and subsequent profit, often without proper attribution or compensation to the original creators, feels fundamentally unfair.
Google's Gemini AI, with its paid version built upon a foundation of scraped data, is a prime example that highlights this issue. It's this very concern that has driven me to take action and implement changes to my website. I believe content creators should have control over how their work is used.
While I understand the need for data in AI development, the current system often feels exploitative. That's why I've decided to add specific exclusions to my robots.txt file, effectively blocking several AI bots from scraping my content.
My robots.txt file now includes the following directives:
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: Omgilibot
Disallow: /

User-agent: Omgili
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: PiplBot
Disallow: /

User-agent: Claude-Web
Disallow: /

User-agent: FacebookBot
Disallow: /

User-agent: Applebot-Extended
Disallow: /

User-agent: PetalBot
Disallow: /

User-agent: uptimerobot
Disallow: /

User-agent: viberbot
Disallow: /

User-agent: YaK
Disallow: /

User-agent: Yandex
Disallow: /

User-agent: Amazonbot
Disallow: /
This list targets a range of bots known to be used for data collection. By adding these directives, I'm explicitly stating that I do not authorise these bots to access and scrape my website's content.
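For anyone curious how these directives behave in practice, here is a minimal sketch using Python's standard-library `urllib.robotparser`, which implements the same matching logic a well-behaved crawler applies. The `robots_txt` snippet below is a trimmed-down stand-in for the full file above, and the URLs are placeholders.

```python
from urllib.robotparser import RobotFileParser

# A trimmed stand-in for the robots.txt above (placeholder content).
robots_txt = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# A compliant crawler checks before fetching a page:
print(parser.can_fetch("CCBot", "https://example.com/some-post"))      # False: blocked
print(parser.can_fetch("SomeOtherBot", "https://example.com/some-post"))  # True: no rule applies
```

Note the second result: because the file has no `User-agent: *` record, any bot not explicitly listed is still allowed, which is exactly why the list has to be maintained as new crawlers appear.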
I understand that this might not be a perfect solution. New bots will likely emerge, and some existing ones might find ways around these restrictions. However, I believe it's a step in the right direction.
It's about asserting my rights as a content creator and making a conscious decision about how my work is utilised.
This isn't about being anti-AI. I recognise the potential benefits of AI and am fascinated by its development. However, it's crucial that this development happens ethically and respects the rights of content creators.
I hope that by sharing my approach, I can encourage others to consider similar actions and contribute to a broader conversation about responsible AI practices.
We need to find a balance that allows for innovation while protecting the value and ownership of creative work. It's time for a more transparent and equitable approach to data collection in the age of AI.
[Image: Blocking The Bots]
Musings on life, local happenings, and the world as seen through my lens. I'm Sean, and this is my little corner of the Internet.