The last time a new technology captured this much attention was when the iPhone launched in 2007. People camped on sidewalks and entire industries formed around the Apple ecosystem.
AI is having its iPhone moment now.
Most teams already use LLMs in some form, and if you work in marketing or SEO, you’re probably feeling the pressure to use more AI in your workflow.
But behind the productivity boost sits a darker reality: hallucinated confidence, bots that ignore robots.txt, and a growing surveillance and accountability gap.
In this AMA, Jamie Indigo breaks down the dangers of AI no one is talking about, plus practical ways to protect your brand visibility.
Breaking down misconceptions
1. What is the biggest gap between how SEOs perceive AI and the way it works behind the scenes?
This is the right place to start because we need to step back and challenge our assumptions. There are three major gaps in how SEOs think about AI search.
We assume:
- It's a search engine
- That does exactly what we ask
- And uses traditional search mechanics
The truth is, none of these assumptions hold.
Search engines are information retrieval systems. LLMs, on the other hand, are trained models built on a corpus of data, sometimes layered with retrieval-augmented generation.
It’s a completely different foundation.
Think of a Furby from the 90s. It came preloaded with a small vocabulary and learned patterns based on repetition. If you kept repeating a phrase, it would echo back in strange ways.
Image source: The Toy Shop
Large language models work on similar principles. They rely on parameters and pattern recognition, not understanding.
We also tend to trust them too much. You might say, “Go to this page and complete this task,” and instead of performing the action, the model generates something that looks helpful.
But it hallucinates a lot and fills in gaps with probabilistic guesses because that’s what it’s designed to do.
Apple’s AI research paper, The Illusion of Thinking, outlines several key weaknesses:
- Accuracy collapses as task complexity increases
- Effort decreases as difficulty rises, since every generated token has a cost
- Instructions aren’t followed consistently
- Reasoning becomes unreliable
Dan Petrovic’s AI rankings volatility tracker shows just how unstable this environment is. In many cases, volatility hovers around 80% or more, meaning eight out of ten results shift from one day to the next.
These systems are built on search technologies, and while some of the underlying components overlap, the mechanics differ.
2. We’re entering an era of black box optimization. Why did you call it tech mad cow disease?
Mad cow disease was a prion disease in the 90s caused by cattle consuming feed made from other cattle. It was a system feeding on itself, and that’s what I see happening with AI.
Marketers are consuming AI output and turning it into strategy, often without understanding how the models work. They publish the content at scale, which the model crawls, ingests, and trains on.
Basically, we are feeding the machine its own byproducts, just like mad cow disease.
The scale makes this worse. The leap in published content in 2024 was roughly equivalent to the combined growth we saw from 2010 to 2018.
We are flooding the web with machine-generated content, and those same machines are training on it. That’s how you end up with systems falling for white text on white backgrounds and other tactics to game visibility.
3. What are the hidden risks of using LLMs to create content?
LLMs rearrange existing language patterns, which means they can’t create original content.
When your content comes from the same statistical pool as everyone else using the same model, it becomes interchangeable. And if your site is filled with interchangeable content, why would a search engine invest resources in crawling and indexing it?
Google’s Martin Splitt has warned about scaled AI content and site reputation abuse. In a past webinar, he explained that quality detection happens in multiple stages. If Google can determine that content is low quality early in the pipeline, it may skip rendering altogether, meaning your content won’t be indexed or ranked.
Once you fall into that sea of sameness, climbing out becomes harder. You’ll need users talking, engaging, and linking to you to get out of it.
Why take the chance when you can avoid it entirely?
Crawling, indexing, and technical blind spots
4. LLMs are ignoring robots.txt. What does it mean when crawlers no longer honor that covenant?
Robots.txt was never an official enforcement mechanism. It was a mutual agreement based on a simple understanding: you can crawl my site, but you must follow these rules. Here’s what you’re allowed to access, and here’s what you can’t.
Sadly, we’ve seen AI crawlers bypassing these restrictions. For example, Cloudflare documented cases in which Perplexity appeared to rotate user agents to circumvent blocks.
Source: Cloudflare
You also have to account for indirect ingestion. Many models rely heavily on Common Crawl, originally built for academic use. If you block AI-specific user agents while leaving Common Crawl open, your content may still end up in training datasets.
However, the real issue is consent. Robots.txt was built on good faith, but AI crawlers operate in a competitive environment where incentives don’t always align with publisher interests.
5. How can log files help identify LLM crawlers or suspicious access patterns?
Log files show you what’s actually happening, not what dashboards assume is happening. They are typically available at your server or CDN level. You’ll need a tool to read them properly. Screaming Frog offers a log file analyzer, and many enterprise CDNs, like Akamai, include built-in options.
Log files allow you to:
- Identify which bots are hitting your site
- See what resources they request
- Detect unusual spikes or patterns
- Adjust rules proactively
AI bots can be aggressive and resource-heavy. They will crawl staging environments, internal resources, and publicly exposed development areas. Security by obscurity does not work. If it’s accessible on the public internet, assume they’ll find it.
However, most AI crawlers use dedicated user agents, which makes them identifiable. They generally fall into two categories:
- Training crawlers collecting data to expand the model corpora
- User-initiated crawls triggered by real-time queries in tools using retrieval-augmented generation
The behaviors look different in logs, and you need to know which one you’re seeing.
Log analysis also reveals capability mismatches. For example, if a crawler like Perplexity does not render JavaScript, it has no reason to request JavaScript or CSS files. If it does, or if it starts probing restricted paths, treat that as a red flag.
You cannot rely on them to behave, so you need to curate what they’re allowed to access and enforce those boundaries yourself.
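As a sketch of what that triage can look like, here’s a minimal Python pass over a combined-format access log. The bot names are documented user agents, but the list is non-exhaustive and the log format is an assumption; adapt the regex to whatever your server or CDN actually emits.

```python
import re
from collections import Counter

# Known AI crawler tokens (non-exhaustive; verify against each vendor's docs)
AI_BOTS = ["GPTBot", "CCBot", "PerplexityBot", "ClaudeBot", "Google-Extended"]

# Combined Log Format: ip - - [time] "METHOD path HTTP/x" status size "referer" "user-agent"
LOG_RE = re.compile(
    r'"(?:GET|POST|HEAD) (?P<path>\S+) HTTP/[^"]*" \d+ \S+ "[^"]*" "(?P<ua>[^"]*)"'
)

def analyze(log_lines):
    """Tally hits per AI bot and flag asset requests worth a closer look."""
    hits = Counter()
    asset_flags = []  # a non-rendering bot requesting JS/CSS is a red flag
    for line in log_lines:
        m = LOG_RE.search(line)
        if not m:
            continue
        ua, path = m.group("ua"), m.group("path")
        for bot in AI_BOTS:
            if bot in ua:
                hits[bot] += 1
                if path.endswith((".js", ".css")):
                    asset_flags.append((bot, path))
    return hits, asset_flags
```

Run over a day of logs, this gives you a quick per-bot tally plus a list of suspicious asset requests to investigate before tightening rules.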
6. For SEOs who want to protect their websites from unwanted AI crawlers, what are three things they can do today?
Start with your log files:
Break requests down by hostname, file type, and URL structure. Identify what’s being accessed and where boundaries need to be enforced.
Use the tools you already have:
Robots.txt still controls crawling behavior, even if it doesn’t guarantee compliance. Use it to keep crawlers away from scripts, API endpoints, staging areas, and other sensitive files. Combine it with directives like noindex and indexifembedded, delivered via X-Robots-Tag headers for non-HTML assets.
The last thing you want is an AI tool citing your brand and linking users to a raw JSON API dump. It doesn’t serve your audience or your business.
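As an illustrative robots.txt fragment (the paths are placeholders, the user agents are documented bot names, but verify current strings and wildcard support in each vendor’s docs):

```txt
# Block common AI training crawlers site-wide
User-agent: GPTBot
User-agent: CCBot
User-agent: PerplexityBot
Disallow: /

# Keep all crawlers out of sensitive surfaces (placeholder paths)
User-agent: *
Disallow: /api/
Disallow: /staging/
```

Remember this is a request, not enforcement, which is why the next step adds friction.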
Add friction where necessary:
Require authentication for sensitive areas. Use firewall controls from providers like Cloudflare or Akamai. Tools like the open-source AI firewall Anubis can help filter bot interactions and limit automated abuse.
If you want to get creative, remember that many AI crawlers don’t render JavaScript. Content generated client-side may not be visible to them at all.
Technical optimization for LLMs
7. How can you “see your site the way an LLM sees it”? What tools or tactics help uncover blind spots?
Use Google Chrome:
If you want to emulate the “revolutionary” experience of an AI crawler, start in Chrome. Open Settings, go to Privacy and security, and click Site settings. Under Content, click JavaScript, select “Don’t allow sites to use JavaScript,” and reload the page.
Now you’re looking at your site without client-side rendering.
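You can approximate the same view programmatically. The sketch below is a simplification (it ignores CSS-hidden content and crawler-specific quirks): it drops `<script>` blocks before extracting text, roughly what a non-rendering crawler receives from the server.

```python
import re

def static_text(html):
    """Approximate what a non-rendering crawler sees: remove script
    blocks, then strip remaining tags to get the server-delivered text."""
    no_scripts = re.sub(r"<script\b[^>]*>.*?</script>", " ", html, flags=re.S | re.I)
    return re.sub(r"<[^>]+>", " ", no_scripts)

def visible_without_js(html, phrase):
    """Check whether a key phrase survives without client-side rendering."""
    return phrase.lower() in static_text(html).lower()
```

If your hero copy fails this check, it only exists after JavaScript runs, and many AI crawlers will never see it.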
Use Screaming Frog:
You can take this further with tools like Screaming Frog by running two separate crawls:
- One rendering JavaScript like Googlebot
- One without rendering, similar to many LLM crawlers
Then compare the rendered content and links to understand changes.
Ask LLMs:
Another useful tactic is to ask LLMs directly what they think your page is about. If you’ve lost contextual content due to rendering gaps, the model’s interpretation may narrow or skew.
There are also emerging tools, like Agentic Evals, that evaluate pages from the model’s perspective to measure conceptual completeness.
Source: Anthropic.com
Think of it like this: if two people are discussing Star Wars, you expect natural references to Wookiees, Sith, Ewoks, and Darth Vader. If someone suddenly brings up Spock, you know the conversation has drifted.
The same principle applies here. A well-structured page should naturally reflect the core entities and relationships within its topic. If those signals are missing, the model may misinterpret the page entirely.
8. Once you’ve identified a misinterpretation, what’s your process for fixing it?
At a high level, start with access. If the misinterpretation is due to key content generated via JavaScript that isn’t visible to certain crawlers, that’s an engineering problem, and you’ll need dev support to make the content accessible.
Your hero content should be available to both traditional search crawlers and AI crawlers. If the primary value proposition disappears without rendering, the model will build an incomplete understanding.
The solution might involve:
- Server-side rendering
- Dynamic rendering strategies
- Alternative delivery methods for critical content
However, the right approach depends on resources, infrastructure, and internal priorities.
9. What would a defensive SEO strategy look like for AI search? What should we be tracking, shielding, or rewriting?
A defensive strategy starts with understanding how your brand exists across different dimensions.
Myriam Jessier has done excellent work applying the Johari Window framework to brand control in AI search.
The model breaks visibility into four quadrants:
- Open areas known to your brand and customers
- Hidden areas you haven’t communicated to your audience
- Blind spots you’ve missed about how customers perceive your brand
- Unknown areas invisible to both you and your customers
Each requires a different response:
Open areas: strengthen entity confidence
This is your core brand identity, so you need to reinforce entity recognition. Gus Pelogia has a guide to building an Entity Tracker that measures how strongly your brand is associated with specific topics. If confidence drops below certain thresholds, you risk exclusion from knowledge graphs.
Use the same terminology repeatedly to improve consistency across the board and enforce semantic precision. LLMs are pattern learners: if you describe yourself five different ways, they will reflect that inconsistency.
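One lightweight way to audit that consistency is to count which phrasing variants actually appear across your pages. This is a toy sketch; the brand, snippets, and variant phrases are all hypothetical.

```python
from collections import Counter

def descriptor_consistency(snippets, variants):
    """Tally which brand-description variants appear across a set of
    page snippets. Fewer distinct variants in use means a more
    consistent signal for pattern-learning models."""
    counts = Counter()
    for text in snippets:
        for variant in variants:
            if variant.lower() in text.lower():
                counts[variant] += 1
    return counts
```

If the tally is spread thin across many variants, consolidate on one canonical description and use it everywhere.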
Hidden areas: protect internal assets
This includes staging environments, internal documentation, private tools, and sensitive resources.
Aggressively restrict access so AI training crawlers can’t reach these pages. Use authentication, firewall controls, and proper blocking mechanisms. Once leaked data is scraped, it becomes part of the training corpus.
Blind spots: monitor external narratives
This is where reviews, social media, forums, and third-party commentary live. LLMs train on these associations, and the adjectives used in reviews attach themselves to your brand. Hence, sentiment signals become part of the probabilistic profile.
Implement social listening, monitor your reputation signals, and track how your brand is described across platforms.
Unknown to both: Proactively control your brand narrative
This quadrant is the most uncertain because you can’t control what you don’t see. However, you can influence the ecosystem through data philanthropy, and here’s how:
- Publish original research
- Provide authoritative resources
- Contribute structured, high-quality information
If you want to control how the model talks about your brand, give it something worth citing. Remember, the safest defensive strategy is to become the trusted source.
10. Structured data and knowledge graphs are foundational to how LLMs understand content. How can SEOs strengthen authority at the entity level?
Using Gus Pelogia’s guide, start by checking the entity confidence score for the page. If it falls below 50–55%, the model is not confident in that entity and is unlikely to cite the page.
Here are a few things you can do to improve authority at the entity level:
Remove ambiguity:
These are pattern systems, not reasoning engines. They are essentially spicy autocomplete, so do not leave important signals open to interpretation.
Shaun Anderson’s work analyzing the Google data warehouse leak and image analysis demonstrates how many of these signals connect directly. Entity signals, structured references, and relationships all feed the same ecosystem.
Be explicit:
Use first-party sources to provide references. Supply the data yourself rather than relying on the model to infer it. Make sure foundational details are correct and consistent, including logos, brand information, and entity attributes.
Include structured data:
Structured data plays a role here, but it should be treated as part of a broader knowledge graph strategy. Clearly define relationships and entities so machines can interpret them without guessing.
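As a minimal illustration (every name and URL here is a placeholder), a schema.org Organization block with sameAs links ties your pages to corroborating profiles, so machines can resolve the entity instead of guessing:

```json
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "Acme Analytics",
  "url": "https://www.example.com/",
  "logo": "https://www.example.com/logo.png",
  "sameAs": [
    "https://www.linkedin.com/company/example",
    "https://github.com/example"
  ]
}
```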
11. What’s your biggest fear around using agentic AI for SEO?
I have two concerns, which I’ve outlined below:
Agentic misalignment:
The team at Anthropic, for all their faults, is also one of the more transparent groups publishing research about these systems.
In a simulated environment, Claude Opus 4 attempted to blackmail a supervisor to prevent being shut down, and the team released the full details of that experiment.
Source: Anthropic
They also stress-tested sixteen leading models from multiple developers in a hypothetical corporate environment to identify risky agent behaviors that could cause real harm.
In some cases, models:
- Resorted to malicious insider behavior to avoid being replaced or to achieve their goals
- Leaked sensitive information to competitors
Anthropic refers to this phenomenon (where models independently and intentionally choose harmful actions) as agentic misalignment.
Security risks:
There are also serious security concerns around agentic systems. Recent research published on arXiv (hosted by Cornell University) outlines a range of potential exploits in AI agent workflows, including:
- Prompt-to-SQL injection attacks
- Direct prompt injection
- Toxic agent flow attacks
- Jailbreak fuzzing
- Multimodal adversarial attacks
- Retrieval poisoning and other vulnerabilities
These systems also demonstrate extremely poor judgment when interacting with external links. They will follow phishing links and expose credentials because they cannot evaluate risk the way humans do.
Conclusion: Implement proactive measures to ensure LLMs don’t misrepresent your brand
LLMs are trained models that rely on pattern recognition to generate probabilistic answers, often without rendering important site content.
Protect your site by auditing log files and tightening crawler access. Strengthen your entity presence with consistent brand terminology and structured data so models stop guessing. Finally, create citable content to become the source of truth and improve your brand visibility.
The author's views are entirely their own (excluding the unlikely event of hypnosis) and may not always reflect the views of Moz.