Crawling is the process in which an automated bot (such as Googlebot) requests a URL and downloads its HTML along with resources such as images, CSS, and JavaScript. Crawling is the prerequisite for content to be processed and indexed at all.
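After downloading the HTML, a crawler still has to discover the resources the page references. The Python sketch below covers only that discovery step, using the standard-library `html.parser` module. `ResourceCollector` and `collect_resources` are hypothetical names, and the HTML is passed in as a string rather than fetched over the network.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ResourceCollector(HTMLParser):
    """Collects absolute URLs of images, stylesheets and scripts in an HTML page."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.resources: list[str] = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # attribute values may be None for bare attributes
        if tag in ("img", "script") and attrs.get("src"):
            # Relative src values are resolved against the page URL.
            self.resources.append(urljoin(self.base_url, attrs["src"]))
        elif tag == "link" and attrs.get("href"):
            rel_values = (attrs.get("rel") or "").lower().split()
            if "stylesheet" in rel_values:
                self.resources.append(urljoin(self.base_url, attrs["href"]))


def collect_resources(html: str, base_url: str) -> list[str]:
    """Return the resource URLs a crawler would request after fetching the HTML."""
    parser = ResourceCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.resources


page = (
    '<html><head><link rel="stylesheet" href="/style.css">'
    '<script src="app.js"></script></head>'
    '<body><img src="logo.png"></body></html>'
)
print(collect_resources(page, "https://example.com/blog/post"))
# ['https://example.com/style.css', 'https://example.com/blog/app.js',
#  'https://example.com/blog/logo.png']
```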

Site owners can control which URLs a crawler may fetch through the robots.txt file in the site's root directory. Its main purpose is to manage crawler traffic and avoid overloading the server; it is not a mechanism for keeping pages out of the index.
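Python's standard library includes a robots.txt parser, `urllib.robotparser.RobotFileParser`, which a crawler can consult before requesting a URL. The sketch below assumes a made-up robots.txt and a hypothetical user agent `MyCrawler`; the helper `is_allowed` is illustrative and not part of any library.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, used here instead of downloading /robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/

User-agent: Googlebot
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(is_allowed(ROBOTS_TXT, "MyCrawler", "https://example.com/admin/settings"))  # False
print(is_allowed(ROBOTS_TXT, "MyCrawler", "https://example.com/blog/post"))       # True
print(is_allowed(ROBOTS_TXT, "Googlebot", "https://example.com/private/report"))  # False
print(is_allowed(ROBOTS_TXT, "Googlebot", "https://example.com/admin/settings"))  # True
```

A crawler follows only the most specific group that matches its user agent, so in this example Googlebot may fetch `/admin/` even though the `*` group disallows it.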

It is important to distinguish crawling from indexing: crawling only means fetching a page, not adding it to the search index. For search engines to read instructions such as a noindex directive or a canonical tag, the page must be crawlable. If it is blocked via robots.txt, the page is never fetched, so those signals cannot be detected.
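The following sketch illustrates why a blocked page's directives go unseen. The hypothetical helper `visible_directives` checks robots.txt first and parses the `<meta name="robots">` tag only if the URL may be fetched. The `html` argument stands in for the downloaded response body, and the robots.txt content and user agent are made up for illustration. Real crawlers also read the `X-Robots-Tag` HTTP header, which this sketch ignores.

```python
from html.parser import HTMLParser
from typing import Optional
from urllib.robotparser import RobotFileParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            for token in (attrs.get("content") or "").split(","):
                if token.strip():
                    self.directives.add(token.strip().lower())


def visible_directives(robots_txt: str, user_agent: str, url: str,
                       html: str) -> Optional[set[str]]:
    """Return the robots meta directives a crawler can see, or None if blocked."""
    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())
    if not robots.can_fetch(user_agent, url):
        return None  # never fetched, so a noindex in the page is never read
    parser = RobotsMetaParser()
    parser.feed(html)  # html simulates the body the crawler would download
    parser.close()
    return parser.directives


ROBOTS_TXT = "User-agent: *\nDisallow: /drafts/\n"
PAGE = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
print(visible_directives(ROBOTS_TXT, "MyCrawler", "https://example.com/drafts/a", PAGE))
print(visible_directives(ROBOTS_TXT, "MyCrawler", "https://example.com/blog/a", PAGE))
```

The first call returns `None` even though the page contains `noindex`; only the crawlable URL yields the directives.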

  • Crawling = fetching a URL
  • Controlled via robots.txt
  • Prerequisite for indexing and for reading robots directives