is the step in which Google processes previously crawled content and decides whether a URL is added to the search index. Only indexed pages can appear in search results.

Crawling and indexing are two separate steps. A page may be crawled but, in some cases, should not appear in the index. This is exactly what the noindex directive controls.

To keep a page out of the index, it should remain crawlable and carry a noindex instruction, because Google can only see that instruction if it is allowed to fetch the page. Blocking it via robots.txt is not a reliable way to prevent indexing: a blocked URL can still appear in search results if other pages link to it.
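
The rule above can be sketched as a small offline check in Python. This is only an illustration under stated assumptions: it assumes noindex is delivered either as a robots meta tag or as an X-Robots-Tag response header, it works on robots.txt and HTML text that you pass in yourself, and the helper names (has_noindex, is_crawlable, noindex_is_effective) are hypothetical. It does not reproduce how Google actually evaluates a page.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> and <meta name="googlebot">."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {key: (value or "") for key, value in attrs}
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            for token in attr.get("content", "").split(","):
                self.directives.add(token.strip().lower())


def has_noindex(html, headers=None):
    """True if the page carries noindex in a meta tag or X-Robots-Tag header.

    Simplification: the header lookup is case-sensitive and ignores
    user-agent prefixes such as "googlebot: noindex".
    """
    parser = RobotsMetaParser()
    parser.feed(html)
    header_value = (headers or {}).get("X-Robots-Tag", "")
    header_tokens = {token.strip().lower() for token in header_value.split(",")}
    return "noindex" in parser.directives or "noindex" in header_tokens


def is_crawlable(robots_txt, url, user_agent="Googlebot"):
    """True if the given robots.txt text allows user_agent to fetch url."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return rules.can_fetch(user_agent, url)


def noindex_is_effective(robots_txt, url, html, headers=None):
    """noindex can only be seen if the page is not blocked from crawling."""
    return is_crawlable(robots_txt, url) and has_noindex(html, headers)
```

Called with a robots.txt that disallows the URL, noindex_is_effective returns False even when the page contains a noindex tag. That is the case described above: the crawler never fetches the page, so the instruction goes unread.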

  • Indexing = inclusion in the search index
  • Inclusion in the index is controlled via noindex
  • robots.txt prevents crawling, but does not reliably prevent indexing