Crawling

Crawling is the process in which an automated bot (a search engine crawler) requests a URL and downloads its content along with resources like images and JavaScript. Crawling is the prerequisite for content to be processed and indexed at all.

Which URLs a crawler may fetch can be controlled through the robots.txt file. Its main purpose is to manage crawler traffic and avoid overloading servers.
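
As a rough illustration of how a crawler might apply these rules, the sketch below uses Python's standard urllib.robotparser module. The robots.txt content, the user agent name "ExampleBot", and the example.com URLs are assumptions made up for the example, not rules from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content: every crawler is asked to stay out of /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text from memory, without any network request."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# "ExampleBot" is a placeholder user agent name.
allowed = parser.can_fetch("ExampleBot", "https://example.com/blog/post")
blocked = parser.can_fetch("ExampleBot", "https://example.com/private/report")

print(allowed, blocked)
```

A well-behaved crawler would run a check like this before every request and skip any URL for which the result is false.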

It is important to distinguish crawling from indexing: crawling only means fetching a page, not adding it to the search index. For search engines to read instructions such as noindex or other on-page signals, a page must be crawlable. If it is blocked via robots.txt, the page is never fetched, so those signals cannot be detected reliably.
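
The dependency between crawling and reading directives can be sketched in a few lines. This is a simplified model, not how any particular search engine works: the pages live in an in-memory dictionary instead of being fetched, and the names read_directives, PAGES, and "ExampleBot" are hypothetical.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class MetaRobotsParser(HTMLParser):
    """Collects the values of <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives += [d.strip().lower() for d in content.split(",") if d.strip()]


def read_directives(url, robots, pages, agent="ExampleBot"):
    """Return the page's robots directives, or None if the URL may not be crawled."""
    if not robots.can_fetch(agent, url):
        return None  # blocked: the HTML is never fetched, so noindex stays invisible
    parser = MetaRobotsParser()
    parser.feed(pages[url])  # stands in for an actual HTTP request
    return parser.directives


# Hypothetical rules and pages; both pages carry the same noindex tag.
robots = RobotFileParser()
robots.parse(["User-agent: *", "Disallow: /hidden/"])

HTML = '<html><head><meta name="robots" content="noindex"></head></html>'
PAGES = {
    "https://example.com/open/page": HTML,
    "https://example.com/hidden/page": HTML,
}

open_result = read_directives("https://example.com/open/page", robots, PAGES)
hidden_result = read_directives("https://example.com/hidden/page", robots, PAGES)
```

In this model the crawlable page yields its noindex directive, while the blocked page yields nothing at all, even though both contain the same tag.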

Crawling = fetching a URL
controlled via robots.txt
prerequisite for indexing and reading robots directives