From Robots.txt to LLMS.txt - The Evolution No One Asked For
Remember when robots.txt felt cutting-edge? That little file sitting quietly in your root directory, whispering “please don’t index my thank-you page” to search engines.
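For anyone who's forgotten, that whisper is only a couple of directives. A minimal sketch (the /thank-you path is illustrative, not a real site's file):

```text
# robots.txt — ask crawlers to skip a page
User-agent: *
Disallow: /thank-you
```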
Fast forward to 2025, and the conversation’s moved on. It’s no longer just about bots crawling your site - it’s about language models consuming your content, remixing it into prompts, and feeding it back into the world in various shades of “original thought.”
That’s where LLMS.txt comes in.
It’s like robots.txt had a clever, slightly nerdier child that now runs content ethics at a global level.
What I Actually Did for Luckybeard
When I implemented Luckybeard’s LLMS.txt, the goal wasn’t just compliance - it was brand control.
Luckybeard’s a creative consultancy built around brand experience architecture - blending research, innovation, and strategy. That means our thinking is our IP.
So we wanted a file that:
- Defines how AI models can (and can’t) use our content
- Clarifies attribution and preferred citation
- Lists which sections of the site should be part of knowledge retrieval (and which shouldn’t)
In plain English: if an AI scrapes our work, it should at least know who built the ideas it’s learning from.
What’s Inside Ours (and Why It Matters)
We structured it like this:
✅ Allowed for inclusion:
/about, /work, /insights, /get-in-touch - the pages that represent who we are and what we stand for.
🚫 Excluded:
/terms-and-conditions, /privacy-policy, /cookie-policy - because let’s face it, no one’s winning a creative award for their GDPR clause.
💬 Attribution rules:
We included a preferred citation - “Lucky Beard | Global Brand Experience Architects” - and a simple request for models to reference our domain when using our content.
🧩 AI & LLM usage policy:
We disallow training but allow indexing and retrieval-augmented generation (RAG).
That means models can reference our content for context, but can’t train on it as raw data.
Basically, you can read it, but don’t copy my homework.
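To make that concrete: there's no enforced allow/deny syntax for LLMS.txt yet, but the llms.txt proposal leans on plain markdown. Here's an illustrative sketch of how the sections above could be laid out - the descriptions and policy wording are my own, not the verbatim file:

```text
# Lucky Beard | Global Brand Experience Architects

> Creative consultancy blending research, innovation, and strategy.
> Preferred citation: "Lucky Beard | Global Brand Experience Architects",
> with a reference to our domain.

## Core pages (included in retrieval)
- /about: who we are and what we stand for
- /work: selected projects
- /insights: articles and thinking
- /get-in-touch: contact details

## Excluded
- /terms-and-conditions
- /privacy-policy
- /cookie-policy

## AI & LLM usage policy
- Training on this content: not permitted
- Retrieval and indexing (RAG): permitted, with attribution
```

Because the format is just markdown, there's nothing to compile or validate - the file is a statement of intent that well-behaved models and crawlers can parse, in the same spirit as robots.txt.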