Crawl control

Build a robots.txt file that is easier to trust before you publish.

Start with a template that matches your site, adjust blocked and allowed paths, and test important URLs before the file goes live.

Next steps

What to do after the file looks right

A robots.txt file becomes risky when it goes live faster than it gets reviewed. Use this short path before you publish.

1. Copy

Take the generated file

Copy the current robots.txt output only after the rule list matches your real public and private paths.

2. Validate

Test important URLs

Use the built-in URL tester for admin, app, search, and key public pages so you do not publish accidental blocks.

3. Publish

Replace the live file carefully

Publish the final file at the site root, then recheck that the live version still matches the rules you intended.
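The recheck in step 3 can be sketched as a small comparison script. This is a minimal sketch, not part of the tool: `normalize`, `matches_intended`, and `fetch_live` are hypothetical helper names, and the comparison assumes that comments, blank lines, and surrounding whitespace do not matter.

```python
import urllib.request


def normalize(robots_text):
    """Drop comments, blank lines, and surrounding whitespace before comparing."""
    lines = []
    for raw in robots_text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if line:
            lines.append(line)
    return lines


def matches_intended(live_text, intended_text):
    """True when the live file carries the same directives, in the same order."""
    return normalize(live_text) == normalize(intended_text)


def fetch_live(site_root, timeout=10.0):
    """Download /robots.txt from the site root (hypothetical helper, needs network)."""
    url = site_root.rstrip("/") + "/robots.txt"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

After publishing, something like `matches_intended(fetch_live("https://yourdomain.com"), intended_text)` should return `True`, where `intended_text` is the file you copied in step 1. A `False` result usually means a cache, a CMS, or a deploy step is still serving an older file.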

Rule builder

Site URL

Sitemap URL

Which crawler are these rules for?

Include `Allow: /` for clarity

What should be blocked from crawlers?

Custom blocked paths

What should stay clearly crawlable?

Custom allowed paths

Add a `Host` directive if your setup needs one

Other advanced directives

Generated robots.txt

Live preview based on your current rules

User-agent: *
Disallow: /admin/
Disallow: /private/
Disallow: /tmp/
Allow: /

Sitemap: https://yourdomain.com/sitemap.xml
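For readers who want to reproduce the preview above outside the tool, here is a minimal sketch of how such a file could be assembled from the builder's inputs. The function name `build_robots_txt` and its parameters are assumptions for illustration, not the tool's actual implementation.

```python
def build_robots_txt(user_agent, disallow, allow, sitemap_url=None):
    """Assemble one user-agent group, followed by an optional Sitemap line."""
    lines = [f"User-agent: {user_agent}"]
    lines += [f"Disallow: {path}" for path in disallow]
    lines += [f"Allow: {path}" for path in allow]
    if sitemap_url:
        # A blank line separates the group from the Sitemap line, as in the preview.
        lines += ["", f"Sitemap: {sitemap_url}"]
    return "\n".join(lines) + "\n"


print(build_robots_txt(
    "*",
    ["/admin/", "/private/", "/tmp/"],
    ["/"],
    "https://yourdomain.com/sitemap.xml",
))
```

Keeping the blocked and allowed paths as plain lists makes it easy to review them against your real public and private paths before anything is written to disk.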

URL tester

Active template

Starter site

Disallow rules

Allow rules

Test URL or path

Test user-agent

Result

Blocked

The most specific matching rule is a Disallow directive.

Matched rule: Disallow: /admin/

Target path: /admin/
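The "most specific matching rule" result above can be illustrated with a longest-prefix check. This is a simplified sketch under stated assumptions: `check_path` is a hypothetical name, rules are treated as plain path prefixes, wildcards (`*`, `$`) are not handled, and an equal-length tie goes to Allow. The tool's own matcher may differ.

```python
def check_path(path, disallow, allow):
    """Return (verdict, matched_rule) using longest-prefix precedence."""
    candidates = [("Allow", rule) for rule in allow if rule and path.startswith(rule)]
    candidates += [("Disallow", rule) for rule in disallow if rule and path.startswith(rule)]
    if not candidates:
        # No rule matches this path, so nothing blocks the crawler.
        return ("Allowed by default", None)
    # The longest rule wins; on equal length, Allow beats Disallow.
    kind, rule = max(candidates, key=lambda c: (len(c[1]), c[0] == "Allow"))
    verdict = "Allowed" if kind == "Allow" else "Blocked"
    return (verdict, f"{kind}: {rule}")


starter_disallow = ["/admin/", "/private/", "/tmp/"]
starter_allow = ["/"]
print(check_path("/admin/", starter_disallow, starter_allow))
print(check_path("/blog/first-post", starter_disallow, starter_allow))
```

With the Starter site rules, `/admin/` matches both `Allow: /` and `Disallow: /admin/`; the Disallow rule is longer, so it decides the result.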

Looks good

Your current setup looks structurally clean.

Use this when

You need to limit crawler access to admin, app, search, or private utility paths.
You want a starting robots.txt file without writing directives by hand.
You need to test whether a rule blocks or allows a specific URL pattern.

Good fit for

Marketing sites with a few private sections.
Content sites that need cleaner crawl rules after launch.
Small SaaS or ecommerce sites with public pages and gated areas.

Before you publish

Make sure you are not blocking CSS, JavaScript, or image paths needed for rendering.
Check that important public pages are still crawlable by the intended user-agent.
If your real goal is de-indexing, confirm whether noindex or redirects are the better tool.

Examples

Real-world robots.txt patterns

Use these examples as starting points, then keep the final file aligned with your actual public and private paths.

Blog

Block internal search, preview, and admin paths.

User-agent: *
Disallow: /search
Disallow: /preview/
Disallow: /wp-admin/
Allow: /

Sitemap: https://yourblog.com/sitemap.xml

Keep public posts crawlable and avoid blocking assets needed for rendering.

SaaS

Protect logged-in and API sections while leaving marketing pages open.

User-agent: *
Disallow: /app/
Disallow: /api/
Disallow: /login
Allow: /pricing
Allow: /docs/

Sitemap: https://yourapp.com/sitemap.xml

Do not block the CSS or JS files the public site needs to load correctly.

Ecommerce

Guide crawlers away from cart and checkout URLs.

User-agent: *
Disallow: /cart
Disallow: /checkout
Disallow: /account/
Allow: /products/
Allow: /collections/

Sitemap: https://yourstore.com/sitemap.xml

Use noindex or redirects for pages you want removed from search, not robots.txt alone.

Frequently asked

Should I block all duplicate content with robots.txt?

Usually no. If crawlers still need to read the page so that its signals can be consolidated, a canonical tag or noindex is often better than blocking the crawl outright, because a crawler never fetches those hints from a blocked page.

Do I need to list my sitemap in robots.txt?

It is a good default because it helps crawlers discover the sitemap, but you should also submit the sitemap in Search Console.

Can I use robots.txt to de-index pages?

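Not reliably. robots.txt is mainly for crawl control: a blocked URL can still appear in search results if other pages link to it. If your real goal is to remove pages from search results, noindex or redirects are usually safer, and the page must stay crawlable for a crawler to see either one.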

Why does the tester sometimes say a page is allowed by default?

That means no matching allow or disallow rule was found for the user-agent, so the crawler would not be blocked by the current rule set.

Common mistakes

Blocking CSS or JavaScript files the site needs to render correctly.
Using robots.txt when you really mean "do not index this page."
Leaving old staging or preview rules in production after launch.
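Some of these mistakes can be caught with a quick scan of the file before it goes live. The sketch below is illustrative only: `lint_robots` is a hypothetical helper, and the asset prefixes and staging markers it looks for are assumptions that should be replaced with your site's real paths.

```python
def lint_robots(robots_text):
    """Return warnings for Disallow rules that deserve a second look."""
    asset_hints = ("/css", "/js", "/assets", "/static", "/images")  # assumed asset prefixes
    stale_hints = ("staging", "preview")  # assumed markers of leftover rules
    warnings = []
    for raw in robots_text.splitlines():
        line = raw.split("#", 1)[0].strip()
        field, sep, value = line.partition(":")
        if not sep or field.strip().lower() != "disallow":
            continue  # only Disallow lines are inspected
        path = value.strip()
        if path == "/":
            warnings.append("Disallow: / blocks the whole site for this user-agent")
        elif path.lower().startswith(asset_hints):
            warnings.append(f"Disallow: {path} may block files needed for rendering")
        elif any(hint in path.lower() for hint in stale_hints):
            warnings.append(f"Disallow: {path} looks like a staging or preview rule")
    return warnings
```

A warning is a prompt to review, not proof of a mistake. A blog that intentionally blocks `/preview/` would still be flagged and can simply keep the rule.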

How to use it

Start with the closest template, then remove the rules you do not need before adding new ones.
Keep the disallow list short and explainable so future edits stay safer.
Test important URLs after every change, especially admin, app, and search paths.
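Retesting after every change is easier when the important URLs live in a small expectation table. The sketch below uses Python's standard-library `urllib.robotparser`; `check_expectations` and the example paths are assumptions for illustration. Note that this parser applies the first matching rule in file order rather than the most specific one, so results can differ from the URL tester when Allow and Disallow rules overlap.

```python
from urllib.robotparser import RobotFileParser


def check_expectations(robots_text, expectations, user_agent="*"):
    """Return the paths whose crawlability differs from what was expected."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    failures = []
    for path, should_be_allowed in expectations.items():
        if parser.can_fetch(user_agent, path) != should_be_allowed:
            failures.append(path)
    return failures


robots_text = """User-agent: *
Disallow: /admin/
Disallow: /private/
Disallow: /tmp/
Allow: /
"""

# True means the path should stay crawlable, False means it should be blocked.
expectations = {
    "/admin/": False,
    "/private/report": False,
    "/": True,
    "/blog/first-post": True,
}
print(check_expectations(robots_text, expectations))
```

An empty list means every listed path behaves as intended. Any path that comes back is worth checking in the URL tester before the file is published.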