What to do after the file looks right
A robots.txt file becomes risky when it goes live faster than it gets reviewed. Use this short path before you publish.
Copy the current robots.txt output only after the rule list matches your real public and private paths.
Use the built-in URL tester for admin, app, search, and key public pages so you do not publish accidental blocks.
Publish the final file at the site root, then recheck that the live version still matches the rules you intended.
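The recheck in the last step can be partly automated. The sketch below is one possible approach, not part of the tool: it strips comments and blank lines from two robots.txt texts and reports any difference between the intended and the live directives. The function names `normalize` and `rule_drift` are hypothetical, and fetching the live file is left to the caller.

```python
import difflib


def normalize(robots_text: str) -> list[str]:
    """Keep only directive lines: drop comments, blank lines, and outer whitespace."""
    lines = []
    for raw in robots_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # everything after '#' is a comment
        if line:
            lines.append(line)
    return lines


def rule_drift(intended: str, live: str) -> list[str]:
    """Return unified-diff lines; an empty list means the live rules match."""
    return list(
        difflib.unified_diff(
            normalize(intended),
            normalize(live),
            fromfile="intended",
            tofile="live",
            lineterm="",
        )
    )
```

To use it, read the published file from the site root (for example with `urllib.request.urlopen`) and pass its decoded text as `live`. Any line in the result that starts with `-` or `+` is a rule that was lost or added on the way to production.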
- You need to limit crawler access to admin, app, search, or private utility paths.
- You want a starting robots.txt file without writing directives by hand.
- You need to test whether a rule blocks or allows a specific URL pattern.
- Marketing sites with a few private sections.
- Content sites that need cleaner crawl rules after launch.
- Small SaaS or ecommerce sites with public pages and gated areas.
- Make sure you are not blocking CSS, JavaScript, or image paths needed for rendering.
- Check that important public pages are still crawlable by the intended user-agent.
- If your real goal is de-indexing, confirm whether noindex or redirects are the better tool.
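The crawlability checks above can be sketched in code. This is a simplified illustration of the "most specific rule wins" matching that major crawlers document (longest matching path wins, Allow wins a tie, no match means allowed), not the tool's own tester. It handles one user-agent group only, and the `/app/static/` path in the usage note is a made-up example.

```python
import re


def _pattern_to_regex(path: str) -> re.Pattern:
    # '*' matches any run of characters; a trailing '$' anchors the end of the URL.
    anchored = path.endswith("$")
    if anchored:
        path = path[:-1]
    body = ".*".join(re.escape(part) for part in path.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(rules: list[tuple[str, str]], url_path: str) -> bool:
    """Longest matching rule wins; Allow wins ties; no match means allowed."""
    best_len, allowed = -1, True
    for directive, path in rules:
        if not path:  # an empty Disallow blocks nothing
            continue
        if _pattern_to_regex(path).match(url_path):
            is_allow = directive.lower() == "allow"
            if len(path) > best_len or (len(path) == best_len and is_allow):
                best_len, allowed = len(path), is_allow
    return allowed
```

With rules such as `[("Disallow", "/app/"), ("Allow", "/app/static/")]`, a stylesheet under `/app/static/` stays crawlable because the Allow path is longer, while `/app/dashboard` is blocked. Running the same function over your key public URLs is a quick way to catch an accidental block before publishing.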
Real-world robots.txt patterns
Use these examples as starting points, then keep the final file aligned with your actual public and private paths.
User-agent: *
Disallow: /search
Disallow: /preview/
Disallow: /wp-admin/
Allow: /
Sitemap: https://yourblog.com/sitemap.xml
Keep public posts crawlable and avoid blocking assets needed for rendering.
User-agent: *
Disallow: /app/
Disallow: /api/
Disallow: /login
Allow: /pricing
Allow: /docs/
Sitemap: https://yourapp.com/sitemap.xml
Do not block the CSS or JS files the public site needs to load correctly.
User-agent: *
Disallow: /cart
Disallow: /checkout
Disallow: /account/
Allow: /products/
Allow: /collections/
Sitemap: https://yourstore.com/sitemap.xml
Use noindex or redirects for pages you want removed from search, not robots.txt alone.
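As a quick sanity check, a pattern like the store example can be loaded into Python's standard-library parser before it goes live. This is only a sketch: the product and order URLs are invented for illustration, and `urllib.robotparser` applies rules in file order (first match) rather than by longest match, so its verdicts can differ from a search engine's on files with overlapping Allow and Disallow paths.

```python
from urllib.robotparser import RobotFileParser

STORE_ROBOTS = """\
User-agent: *
Disallow: /cart
Disallow: /checkout
Disallow: /account/
Allow: /products/
Allow: /collections/
Sitemap: https://yourstore.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(STORE_ROBOTS.splitlines())  # parse from text, no network request

# Hypothetical URLs: one public page and two gated ones.
for path in ["/products/blue-mug", "/cart", "/account/orders"]:
    url = "https://yourstore.com" + path
    verdict = "allowed" if parser.can_fetch("*", url) else "blocked"
    print(path, verdict)

# Sitemap lines found in the file (requires Python 3.8 or newer).
print(parser.site_maps())
```

The example rules do not overlap, so file-order matching and longest-match evaluation agree here.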
Usually no. If the page should still be crawled for signals, a canonical or noindex approach is often better than blocking the crawl outright.
Listing the sitemap in robots.txt is a good default because it helps discovery, but the sitemap should still be submitted in Search Console too.
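Not reliably. robots.txt is mainly for crawl control, and a blocked URL can still show up in search results if other pages link to it. If your real goal is to remove pages from search results, noindex or redirects are usually safer. A crawler has to fetch a page to see its noindex, so do not also block that page in robots.txt.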
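One way to apply noindex without editing page templates is the `X-Robots-Tag` response header. The sketch below is a minimal WSGI wrapper under stated assumptions: `noindex_middleware` is a hypothetical name, the `/preview/` prefix is only an example, and your framework or web server may offer a more direct way to set the header.

```python
def noindex_middleware(app, private_prefixes=("/preview/",)):
    """Wrap a WSGI app and add 'X-Robots-Tag: noindex' under the given path prefixes."""
    prefixes = tuple(private_prefixes)

    def wrapped(environ, start_response):
        path = environ.get("PATH_INFO", "")

        def patched_start(status, headers, exc_info=None):
            # Only responses under a listed prefix get the noindex header.
            if path.startswith(prefixes):
                headers = list(headers) + [("X-Robots-Tag", "noindex")]
            return start_response(status, headers, exc_info)

        return app(environ, patched_start)

    return wrapped
```

For the header to have any effect, the same paths must stay crawlable, so they should not appear under Disallow in robots.txt.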
That means no matching allow or disallow rule was found for the user-agent, so the crawler would not be blocked by the current rule set.
- Blocking CSS or JavaScript files the site needs to render correctly.
- Using robots.txt when you really mean "do not index this page."
- Leaving old staging or preview rules in production after launch.
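Some of these mistakes can be flagged mechanically before publishing. The following is a rough lint sketch, not a feature of the tool: the name `lint_robots` and the regular expressions are assumptions, and the hints are heuristics that will miss some cases and flag some legitimate rules, so treat each warning as a prompt for review.

```python
import re

# Heuristic hints only: paths that look like render assets or leftover environments.
ASSET_HINT = re.compile(r"\.(css|js)\b|/(css|js|assets|static)/", re.IGNORECASE)
STAGING_HINT = re.compile(r"staging|preview", re.IGNORECASE)


def lint_robots(robots_text: str) -> list[str]:
    """Return human-readable warnings for Disallow lines that deserve a second look."""
    warnings = []
    for number, raw in enumerate(robots_text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # ignore comments
        if not line.lower().startswith("disallow:"):
            continue
        path = line.split(":", 1)[1].strip()
        if path == "/":
            warnings.append(f"line {number}: blocks the whole site")
        if ASSET_HINT.search(path):
            warnings.append(f"line {number}: may block CSS or JavaScript ({path})")
        if STAGING_HINT.search(path):
            warnings.append(f"line {number}: looks like a staging or preview rule ({path})")
    return warnings
```

A bare `Disallow: /` is the classic staging leftover, which is why it gets its own warning. The lint cannot tell whether a rule was meant as "do not index this page", so that mistake still needs a human check.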
- Start with the nearest template, then remove rules before adding new ones.
- Keep the disallow list short and explainable so future edits stay safer.
- Test important URLs after every change, especially admin, app, and search paths.