AI crawlers are bots that automatically visit and read web pages. They are controlled via the robots.txt file, and reputable crawlers usually respect its directives.

Three use cases matter for control:

  • training bots fetch content to improve future models,
  • search and retrieval bots make content available for AI search and source retrieval,
  • user-triggered bots fetch a page when a user enters a specific URL or query.

Blocking all AI crawlers indiscriminately protects content from certain uses but can cost visibility in AI search. A differentiated strategy is usually better: decide deliberately whether to allow or block training bots, and keep live retrieval and AI search accessible where possible.
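
As a rough illustration of such a differentiated policy, the sketch below blocks one training bot while leaving a search bot and a user-triggered bot allowed, then checks the result with Python's standard-library robots.txt parser. The user-agent tokens (GPTBot, OAI-SearchBot, ChatGPT-User) and the example.com URL are illustrative assumptions; current token names should be taken from each vendor's documentation, and a real crawler's matching rules may differ from those of Python's parser.

```python
from urllib.robotparser import RobotFileParser

# Illustrative policy only: the bot tokens are assumptions and should be
# verified against each vendor's published crawler documentation.
ROBOTS_TXT = """\
# Training bot: blocked
User-agent: GPTBot
Disallow: /

# Search and retrieval bot: allowed
User-agent: OAI-SearchBot
Allow: /

# User-triggered bot: allowed
User-agent: ChatGPT-User
Allow: /

# Everyone else: allowed
User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if this robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/article"  # hypothetical URL
    for bot in ("GPTBot", "OAI-SearchBot", "ChatGPT-User", "SomeOtherBot"):
        print(bot, is_allowed(ROBOTS_TXT, bot, page))
```

A check like this only shows what the file asks for; whether a given crawler honors it is up to that crawler.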

Blocks arise not only via robots.txt but also via firewalls, a CDN or bot management. The robots.txt file is not a security mechanism; real protection requires a login, server rules or WAF rules.
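
To make the difference between a robots.txt request and an enforced server rule concrete, here is a minimal sketch of a server-side rule written as WSGI middleware using only the Python standard library. The names block_agents, demo_app and status_for, as well as the blocklist, are hypothetical. Because the User-Agent header can be spoofed, a rule like this is still weaker than a login or a dedicated WAF.

```python
from wsgiref.util import setup_testing_defaults

# Hypothetical blocklist; the token is an assumption for illustration.
BLOCKED_AGENT_TOKENS = ("GPTBot",)


def block_agents(app, blocked=BLOCKED_AGENT_TOKENS):
    """Wrap a WSGI app and answer 403 when the User-Agent contains a blocked token."""
    lowered = tuple(token.lower() for token in blocked)

    def middleware(environ, start_response):
        agent = environ.get("HTTP_USER_AGENT", "").lower()
        if any(token in agent for token in lowered):
            body = b"Forbidden\n"
            start_response("403 Forbidden", [
                ("Content-Type", "text/plain; charset=utf-8"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        return app(environ, start_response)

    return middleware


def demo_app(environ, start_response):
    """Stand-in for the real site: always answers 200."""
    body = b"Hello\n"
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def status_for(app, user_agent):
    """Call a WSGI app once with the given User-Agent and return the status line."""
    environ = {}
    setup_testing_defaults(environ)
    environ["HTTP_USER_AGENT"] = user_agent
    captured = []

    def start_response(status, headers, exc_info=None):
        captured.append(status)

    b"".join(app(environ, start_response))  # consume the response body
    return captured[0]


if __name__ == "__main__":
    protected = block_agents(demo_app)
    print(status_for(protected, "Mozilla/5.0 (compatible; GPTBot/1.0)"))
    print(status_for(protected, "Mozilla/5.0"))
```

Unlike a robots.txt directive, this refusal is enforced by the server itself, so it also applies to bots that ignore robots.txt but send an honest User-Agent.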