Why a page can be blocked but still indexed

A practical explanation of the crawl-vs-indexing gap that confuses many small site owners.

Published Jun 10, 2026 | Updated Jun 18, 2026

A page can appear in search results even when robots.txt blocks crawling, because crawling and indexing are related but separate processes. Robots.txt controls whether a crawler may fetch a URL; it does not control whether that URL may be listed. If a search engine discovers the blocked URL through links, sitemaps, or other references, it may still keep the URL in its index with limited information, typically without a description drawn from the page. That is why robots.txt can seem to have failed when it was never designed as a de-indexing tool.

This usually happens when a blocked page is still linked from navigation, XML sitemaps, internal search results, or third-party sites. The URL keeps sending discovery signals, but the crawler cannot fetch the content, so it never sees page-level directives such as a noindex tag. In that situation the block can make cleanup harder, because the search system has fewer signals for working out your preferred outcome.
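One way to spot this conflict is to compare the URLs you advertise with the rules you publish. The sketch below uses Python's standard `urllib.robotparser` to list sitemap URLs that robots.txt disallows. The function name `blocked_but_advertised`, the sample rules, and the example.com URLs are illustrative assumptions, not output from any particular tool, and the standard-library parser may not match every search engine's matching rules exactly.

```python
from urllib.robotparser import RobotFileParser


def blocked_but_advertised(robots_txt, sitemap_urls, user_agent="*"):
    """Return the sitemap URLs that robots.txt disallows for user_agent.

    These URLs send a discovery signal (the sitemap entry) while the
    crawler is told not to fetch them, which is the mixed state
    described above.
    """
    parser = RobotFileParser()
    # parse() accepts the file's lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return [url for url in sitemap_urls if not parser.can_fetch(user_agent, url)]


# Hypothetical robots.txt and sitemap contents, for illustration only.
ROBOTS = """\
User-agent: *
Disallow: /internal-search/
"""

URLS = [
    "https://example.com/",
    "https://example.com/internal-search/?q=shoes",
    "https://example.com/blog/post",
]

conflicts = blocked_but_advertised(ROBOTS, URLS)
print(conflicts)
```

Any URL this returns is a candidate for cleanup: either remove it from the sitemap or reconsider whether it should be blocked at all.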

The safest fix depends on what you really want. If the page should remain accessible but stay out of search, noindex is usually the clearer route, provided the page is not also blocked in robots.txt: the crawler has to fetch the page to see the directive. If the page should not exist anymore, a redirect or removal may be more appropriate. If the goal is simply to reduce crawl access to utility sections, robots.txt is still useful, but it should not carry jobs it was not designed to do.
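The interaction between the two controls can be sketched as a small diagnostic. The example below assumes noindex is expressed either as a robots meta tag in the HTML or as an `X-Robots-Tag` response header, and it takes the robots.txt text, HTML, and header value as plain strings so nothing is fetched. The `diagnose` function and its return messages are hypothetical names chosen for illustration.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaFinder(HTMLParser):
    """Records whether a <meta name="robots"> tag contains noindex."""

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        is_robots_meta = attr.get("name", "").lower() == "robots"
        if is_robots_meta and "noindex" in attr.get("content", "").lower():
            self.noindex = True


def diagnose(robots_txt, url, html, x_robots_tag="", user_agent="*"):
    """Classify how the crawl rule and the noindex directive interact."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    crawlable = parser.can_fetch(user_agent, url)

    finder = RobotsMetaFinder()
    finder.feed(html)
    has_noindex = finder.noindex or "noindex" in x_robots_tag.lower()

    if has_noindex and not crawlable:
        return "conflict: noindex cannot be seen while robots.txt blocks the URL"
    if has_noindex:
        return "ok: crawlable and noindex is visible"
    if not crawlable:
        return "blocked only: the URL may still be indexed from links"
    return "indexable: crawlable with no noindex"


# Hypothetical inputs, for illustration only.
RULES = "User-agent: *\nDisallow: /private/\n"
PAGE = '<html><head><meta name="robots" content="noindex"></head></html>'

print(diagnose(RULES, "https://example.com/private/report", PAGE))
print(diagnose(RULES, "https://example.com/thanks", PAGE))
```

The first call reports the conflict case, where the directive exists but is hidden behind the crawl block; the second reports the combination this guide recommends for pages that should stay reachable but out of search.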

Why this guide matters

Use this guide when you want a little more context before publishing, need a quick refresher on best practices, or want to avoid the mistakes that commonly lead to crawl or indexing issues later.

Use this with the matching tool

Robots.txt Generator

If you want to apply this advice immediately, use the related tool and compare the output against the points covered in this guide.
