Northstar SEO Toolkit
Technical SEO tools for lean websites
Crawl control

Build a robots.txt file that is easier to trust before you publish.

Start with a template that matches your site, adjust blocked and allowed paths, and test important URLs before the file goes live.

Next steps

What to do after the file looks right

A robots.txt file becomes risky when it goes live faster than it gets reviewed. Follow these three steps before you publish.

1. Copy
Take the generated file

Copy the current robots.txt output only after the rule list matches your real public and private paths.

2. Validate
Test important URLs

Use the built-in URL tester for admin, app, search, and key public pages so you do not publish accidental blocks.

3. Publish
Replace the live file carefully

Publish the final file at the site root, then recheck that the live version still matches the rules you intended.
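The recheck in step 3 can be sketched as a small script. This is a minimal sketch, not part of the toolkit: the function names `normalize`, `diff_robots`, and `fetch_live` are hypothetical, and it assumes that comments and blank lines do not matter for the comparison.

```python
import difflib
import urllib.request


def normalize(text):
    """Drop comments, blank lines, and surrounding whitespace."""
    cleaned = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if line:
            cleaned.append(line)
    return cleaned


def diff_robots(intended, live):
    """Return unified-diff lines; an empty list means the rules match."""
    return list(
        difflib.unified_diff(
            normalize(intended), normalize(live), "intended", "live", lineterm=""
        )
    )


def fetch_live(site_root):
    """Download /robots.txt from the site root (hypothetical helper)."""
    url = site_root.rstrip("/") + "/robots.txt"
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


# In-memory demo: the live copy is missing one Disallow line.
intended = "User-agent: *\nDisallow: /admin/\nDisallow: /tmp/\nAllow: /\n"
live = "User-agent: *\nDisallow: /admin/\nAllow: /\n"
for line in diff_robots(intended, live):
    print(line)
```

In practice you would pass `fetch_live("https://yourdomain.com")` as the second argument and treat any non-empty diff as a reason to stop and review.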

Rule builder
What should be blocked from crawlers?
What should stay clearly crawlable?
Generated robots.txt
Live preview based on your current rules
User-agent: *
Disallow: /admin/
Disallow: /private/
Disallow: /tmp/
Allow: /
Sitemap: https://yourdomain.com/sitemap.xml
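As an illustration of how a file like this can be assembled from a list of blocked and allowed paths, here is a minimal sketch. The function `build_robots_txt` and its parameters are hypothetical and do not describe how the rule builder itself is implemented.

```python
def build_robots_txt(disallow, allow, sitemap_url=None, user_agent="*"):
    """Render one user-agent group plus an optional Sitemap line."""
    lines = [f"User-agent: {user_agent}"]
    lines += [f"Disallow: {path}" for path in disallow]
    lines += [f"Allow: {path}" for path in allow]
    if sitemap_url:
        lines.append(f"Sitemap: {sitemap_url}")
    return "\n".join(lines) + "\n"


# Reproduce the starter output shown above.
print(
    build_robots_txt(
        disallow=["/admin/", "/private/", "/tmp/"],
        allow=["/"],
        sitemap_url="https://yourdomain.com/sitemap.xml",
    )
)
```

Keeping the paths in plain lists makes it easy to review the rule set before any text is written to the live file.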
URL tester
Active template: Starter site
Disallow rules: 3
Allow rules: 1
Result: Blocked

The most specific matching rule is a Disallow directive.

Matched rule: disallow: /admin/
Target path: /admin/
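The "most specific matching rule" idea can be sketched as a longest-match check, which is the precedence described in RFC 9309: the longest matching path wins, and Allow wins a tie. Whether the built-in tester works exactly this way is an assumption. The names `parse_rules` and `check_path` are hypothetical, and the sketch ignores wildcards (`*`, `$`) and treats all rules as one group.

```python
def parse_rules(robots_txt):
    """Collect (kind, path) pairs from Allow and Disallow lines.

    Simplification: every rule is treated as part of a single group,
    and wildcard characters are not interpreted.
    """
    rules = []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field in ("allow", "disallow") and value:
            rules.append((field, value))
    return rules


def check_path(rules, path):
    """Return (result, matched_rule) using longest-match precedence."""
    matches = [(kind, rule) for kind, rule in rules if path.startswith(rule)]
    if not matches:
        return ("Allowed by default", None)
    # Longest rule path wins; on equal length, Allow beats Disallow.
    kind, rule = max(matches, key=lambda m: (len(m[1]), m[0] == "allow"))
    result = "Allowed" if kind == "allow" else "Blocked"
    return (result, f"{kind}: {rule}")


starter = """User-agent: *
Disallow: /admin/
Disallow: /private/
Disallow: /tmp/
Allow: /
Sitemap: https://yourdomain.com/sitemap.xml
"""
print(check_path(parse_rules(starter), "/admin/"))
```

For `/admin/`, both `Allow: /` and `Disallow: /admin/` match, and the longer Disallow path decides the result.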
Looks good
  • Your current setup looks structurally clean.
Use this when
  • You need to limit crawler access to admin, app, search, or private utility paths.
  • You want a starting robots.txt file without writing directives by hand.
  • You need to test whether a rule blocks or allows a specific URL pattern.
Good fit for
  • Marketing sites with a few private sections.
  • Content sites that need cleaner crawl rules after launch.
  • Small SaaS or ecommerce sites with public pages and gated areas.
Before you publish
  • Make sure you are not blocking CSS, JavaScript, or image paths needed for rendering.
  • Check that important public pages are still crawlable by the intended user-agent.
  • If your real goal is de-indexing, confirm whether noindex or redirects are the better tool.
Examples

Real-world robots.txt patterns

Use these examples as starting points, then keep the final file aligned with your actual public and private paths.

Blog
Block internal search, preview, and admin paths.

User-agent: *
Disallow: /search
Disallow: /preview/
Disallow: /wp-admin/
Allow: /
Sitemap: https://yourblog.com/sitemap.xml

Keep public posts crawlable and avoid blocking assets needed for rendering.

SaaS
Protect logged-in and API sections while leaving marketing pages open.

User-agent: *
Disallow: /app/
Disallow: /api/
Disallow: /login
Allow: /pricing
Allow: /docs/
Sitemap: https://yourapp.com/sitemap.xml

Do not block the CSS or JS files the public site needs to load correctly.

Ecommerce
Guide crawlers away from cart and checkout URLs.

User-agent: *
Disallow: /cart
Disallow: /checkout
Disallow: /account/
Allow: /products/
Allow: /collections/
Sitemap: https://yourstore.com/sitemap.xml

Use noindex or redirects for pages you want removed from search, not robots.txt alone.

Frequently asked
Should I block all duplicate content with robots.txt?

Usually no. If crawlers still need to read the page to pick up its canonical or noindex signals, a canonical tag or noindex is often better than blocking the crawl outright.

Do I need to list my sitemap in robots.txt?

It is a good default because it helps discovery, but the sitemap should still be submitted in Search Console too.

Can I use robots.txt to de-index pages?

Not reliably. robots.txt is mainly for crawl control. If your real goal is to remove pages from search results, noindex or redirects are usually safer.
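To confirm that a page actually carries a noindex signal, you can look for it in the two common places: the `X-Robots-Tag` response header and a `<meta name="robots">` tag. The sketch below is a simplified check under stated assumptions: `has_noindex` is a hypothetical helper, it takes headers and HTML you have already fetched, and it ignores crawler-specific variants such as a `googlebot:` prefix in the header. Keep in mind that a crawler generally has to be able to fetch the page to see either signal.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            content = attr.get("content", "")
            self.directives += [d.strip().lower() for d in content.split(",")]


def has_noindex(headers, html):
    """True if the header or the meta robots tag includes noindex."""
    header_value = ""
    for name, value in headers.items():
        if name.lower() == "x-robots-tag":
            header_value = value
    header_directives = [d.strip().lower() for d in header_value.split(",")]
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in header_directives or "noindex" in parser.directives


print(has_noindex({"X-Robots-Tag": "noindex, nofollow"}, ""))
print(has_noindex({}, '<head><meta name="robots" content="noindex"></head>'))
```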

Why does the tester sometimes say a page is allowed by default?

That means no matching allow or disallow rule was found for the user-agent, so the crawler would not be blocked by the current rule set.

Common mistakes
  • Blocking CSS or JavaScript files the site needs to render correctly.
  • Using robots.txt when you really mean "do not index this page."
  • Leaving old staging or preview rules in production after launch.
How to use it
  • Start with the nearest template, then remove the rules you do not need before adding new ones.
  • Keep the disallow list short and explainable so future edits stay safer.
  • Test important URLs after every change, especially admin, app, and search paths.
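The habit of retesting important URLs after every change can be sketched as a small expectations table checked with Python's standard `urllib.robotparser`. This is a sketch under assumptions: `check_expectations` is a hypothetical helper, `/assets/site.css` is an invented asset path, and Python's parser applies rules in file order rather than by longest match, so results can differ from other crawlers when Allow and Disallow rules overlap.

```python
from urllib.robotparser import RobotFileParser


def check_expectations(robots_txt, expectations, user_agent="*"):
    """Return (path, expected, actual) for every path that surprises you."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    failures = []
    for path, should_allow in expectations.items():
        allowed = parser.can_fetch(user_agent, path)
        if allowed != should_allow:
            failures.append((path, should_allow, allowed))
    return failures


# Rules from the SaaS example, with the paths you care about most.
saas_rules = """User-agent: *
Disallow: /app/
Disallow: /api/
Disallow: /login
Allow: /pricing
Allow: /docs/
"""
expectations = {
    "/app/dashboard": False,    # gated area stays blocked
    "/login": False,
    "/pricing": True,           # marketing page stays crawlable
    "/assets/site.css": True,   # hypothetical asset needed for rendering
}
print(check_expectations(saas_rules, expectations))
```

An empty list means every path behaves as expected. Running the same table after each edit catches accidental blocks, such as a new `Disallow: /assets/` line, before they reach production.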