Robots.txt Generator
Need to create a robots.txt file for your website? The free Robots.txt Generator by Amaze SEO Tools builds a properly formatted robots.txt file through an intuitive form interface — letting you set global crawl rules, configure individual search engine bot permissions, specify crawl delays, declare your sitemap, and block specific folders, all without writing a single line of robots.txt syntax manually. The generator walks you through every important directive with dropdown menus, input fields, and clear labels, producing the correct syntax automatically.
The robots.txt file is one of the most important files on any website. Located at the root of your domain (e.g., https://www.example.com/robots.txt), it tells search engine crawlers which parts of your site they are allowed to access and which parts they should avoid. A properly configured robots.txt helps search engines crawl your site efficiently, protects private or duplicate content from being indexed, reduces unnecessary server load from bot traffic, and ensures your sitemap is discoverable.
However, robots.txt syntax — while simple in concept — is easy to get wrong. A misplaced directive, incorrect path format, or missing User-agent line can accidentally block important pages from search engines or fail to protect private content. Our generator eliminates syntax errors by producing the file from form selections, ensuring every directive is correctly formatted.
Interface Overview
Default – All Robots Are
The first field sets the global default crawl rule that applies to all search engine bots unless overridden by a specific bot configuration below. The input field contains "Allow" by default, meaning all robots are permitted to crawl your site. You can change this to restrict access globally.
This setting generates the User-agent: * directive in your robots.txt — the universal rule that applies to every crawler that visits your site.
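For reference, the two possible global defaults map to the following robots.txt blocks (only one would appear in a generated file; this is an illustrative sketch, not the tool's literal output):

```
# Default "Allow": every bot may crawl the whole site
User-agent: *
Disallow:

# Default "Disallow": every bot is blocked from the whole site
User-agent: *
Disallow: /
```

An empty Disallow line grants full access, while Disallow: / blocks everything.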
Crawl-Delay
A dropdown menu labeled "Crawl-Delay" with the default value "Default - No Delay". The crawl-delay directive tells bots to wait a specified number of seconds between successive requests to your server. This is useful for preventing aggressive crawling from overloading your server resources.
Options typically range from no delay to several seconds. For most websites with adequate hosting, no delay is appropriate. Smaller or resource-constrained servers may benefit from a crawl delay of 5–10 seconds to prevent bot traffic from affecting site performance for human visitors.
Sitemap
A text input field with the placeholder "https://www.example.com/sitemap.xml". Enter the full URL of your XML sitemap here. Below the field, a note reads "Leave blank if you don't have."
The sitemap directive tells search engines where to find your XML sitemap — the file that lists all the important pages on your site. Including this in robots.txt ensures that every crawler that reads your robots.txt also discovers your sitemap, even if it has not been submitted through the search engine's webmaster tools.
Search Robots (15 Individual Bot Configurations)
A detailed section labeled "Search Robots:" provides individual dropdown menus for 15 different search engine crawlers. Each dropdown defaults to "Same as Default" — meaning the bot follows the global rule set above. You can override the default for any individual bot to allow or disallow crawling specifically for that bot.
The 15 search robots listed are:
- Google — Googlebot, Google's primary web crawler that indexes pages for Google Search results.
- Google Image — Googlebot-Image, the crawler that specifically indexes images for Google Image Search.
- Google Mobile — Googlebot-Mobile, the crawler that indexes pages for Google's mobile search results.
- MSN Search — MSNBot, the crawler for Microsoft's MSN Search (predecessor to Bing).
- Yahoo — Slurp, Yahoo's primary web crawler for indexing web pages.
- Yahoo MM — Yahoo-MMCrawler, Yahoo's multimedia content crawler for images and videos.
- Yahoo Blogs — Yahoo-Blogs, Yahoo's crawler specifically for blog content discovery and indexing.
- Ask/Teoma — Teoma, the web crawler for Ask.com (formerly Ask Jeeves) search engine.
- GigaBlast — Gigabot, the crawler for GigaBlast search engine.
- DMOZ Checker — Robozilla, the crawler associated with the DMOZ Open Directory Project for verifying listed sites.
- Nutch — Nutch, an open-source web crawler used by various search applications and research projects.
- Alexa/Wayback — ia_archiver, the crawler used by Alexa Internet and the Internet Archive's Wayback Machine for archiving web content.
- Baidu — Baiduspider, the primary web crawler for Baidu, China's largest search engine.
- Naver — Naverbot/Yeti, the web crawler for Naver, South Korea's dominant search engine.
- MSN PicSearch — Psbot, the crawler for MSN's image search functionality.
Each dropdown offers options to set that specific bot to Allow, Disallow, or Same as Default. This granular control lets you permit Google to crawl your entire site while blocking archival bots, or allow all bots except a specific aggressive crawler.
Disallow Folders
Below the search robots section, a "Disallow Folders" area lets you specify directories that should be blocked from all crawlers. An instruction reads: "The path is relative to the root and must contain a trailing slash '/'."
A text input field is pre-populated with /cgi-bin/ — a common directory to block. A green "+" button to the right of the input allows you to add additional folder paths. Click the "+" button for each additional directory you want to block. Each folder path is added as a separate Disallow directive in the generated robots.txt.
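For example, blocking /cgi-bin/ plus one additional folder would add a separate Disallow line per folder under the global User-agent block (the second folder here, /tmp/, is just an illustrative choice):

```
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
```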
reCAPTCHA (I'm not a robot)
A Google reCAPTCHA checkbox appears below the Disallow Folders section.
Action Button
Generate (Blue Button)
After configuring all settings, click "Generate" to create the robots.txt file. The tool compiles your selections into properly formatted robots.txt syntax, including User-agent directives, Allow/Disallow rules, Crawl-delay settings, Sitemap declarations, and individual bot configurations.
How to Use Robots.txt Generator – Step by Step
- Open the Robots.txt Generator on the Amaze SEO Tools website.
- Set the default rule — confirm "Allow" for all robots, or change to restrict global access.
- Set the crawl delay — keep "Default - No Delay" for most sites, or choose a delay for resource-constrained servers.
- Enter your sitemap URL — paste the full URL to your XML sitemap (e.g., https://www.example.com/sitemap.xml). Leave blank if you do not have one.
- Configure individual search robots — adjust any of the 15 bot dropdowns if you need different rules for specific crawlers. Leave as "Same as Default" to apply the global rule.
- Add disallowed folders — specify directories to block. Use the "+" button to add multiple folders.
- Complete the reCAPTCHA by ticking the "I'm not a robot" checkbox.
- Click "Generate" to create the robots.txt file.
- Copy the output and save it as a plain text file named robots.txt in the root directory of your website.
Understanding Robots.txt Directives
User-agent
Specifies which crawler the following rules apply to. User-agent: * targets all crawlers. User-agent: Googlebot targets only Google's main crawler. Each User-agent line starts a new rule block for that specific bot.
Allow
Permits the specified crawler to access a particular path. Allow: / permits access to the entire site. Allow: /blog/ permits access specifically to the /blog/ directory. Allow is used to create exceptions within broader Disallow rules.
Disallow
Blocks the specified crawler from accessing a particular path. Disallow: /admin/ blocks crawlers from the admin directory. Disallow: / blocks access to the entire site. An empty Disallow (Disallow:) allows full access.
Crawl-delay
Tells crawlers to wait a specified number of seconds between requests. Crawl-delay: 10 means the bot should wait 10 seconds between fetching pages. Note: Google does not honor the Crawl-delay directive (use Google Search Console's crawl rate settings instead), but Bing, Yahoo, and other crawlers respect it.
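Python's standard-library robots.txt parser can read the directive back, which is a handy way to sanity-check a generated file. The snippet below parses an inline example rather than fetching a live site:

```python
from urllib.robotparser import RobotFileParser

# Parse a minimal robots.txt with a global crawl delay
rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Crawl-delay: 10",
])

print(rp.crawl_delay("*"))        # 10
print(rp.crawl_delay("Bingbot"))  # 10 (no Bingbot block, so the * entry applies)
```

Because no bot-specific block matches "Bingbot", the parser falls back to the universal * entry, mirroring how real crawlers resolve rules.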
Sitemap
Declares the location of your XML sitemap. Sitemap: https://www.example.com/sitemap.xml tells all crawlers where to find your sitemap. This directive is placed outside any User-agent block because it applies universally.
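To confirm that a Sitemap line is picked up, Python 3.8+ exposes declared sitemaps via RobotFileParser.site_maps():

```python
from urllib.robotparser import RobotFileParser

# Parse a robots.txt that declares a sitemap (site_maps() needs Python 3.8+)
rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow:",
    "",
    "Sitemap: https://www.example.com/sitemap.xml",
])

print(rp.site_maps())  # ['https://www.example.com/sitemap.xml']
```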
Common Folders to Disallow
These are directories commonly blocked in robots.txt files:
- /cgi-bin/ — Server-side CGI scripts that should not be indexed.
- /admin/ or /wp-admin/ — Administrative backend areas.
- /private/ — Private content directories.
- /tmp/ — Temporary files.
- /cart/ or /checkout/ — E-commerce cart and checkout pages that create duplicate or session-specific URLs.
- /search/ or /?s= — Internal search results pages, which create infinite URL combinations.
- /tag/ or /archive/ — Tag and archive pages that may create thin or duplicate content.
- /wp-includes/ — WordPress core files that do not need to be crawled.
- /feed/ — RSS feed URLs that duplicate page content.
- /api/ — API endpoints not intended for indexing.
Common Use Cases
New Website Launch
Every new website needs a robots.txt file from day one. The generator creates a properly structured file that allows search engines to discover and index your content while blocking administrative areas and non-public directories — setting up healthy crawl behavior from the start.
WordPress and CMS Configuration
WordPress, Joomla, Drupal, and other CMS platforms create numerous directories and URL patterns that should not be indexed — admin panels, plugin directories, search result pages, and tag archives. The generator lets you block these directories systematically while keeping your content accessible to search engines.
E-Commerce SEO
Online stores generate many crawl-unfriendly URLs — filtered product pages, shopping cart sessions, checkout flows, user account pages, and faceted navigation URLs. A well-configured robots.txt prevents search engines from wasting crawl budget on these pages and focuses indexing on your product and category pages.
Blocking Specific Bots
Some website owners want to block specific crawlers — aggressive bots that consume excessive bandwidth, archival crawlers they do not want to preserve their content, or regional search engines irrelevant to their audience. The individual bot dropdowns let you allow most crawlers while blocking specific ones.
Managing Crawl Budget
Large websites with thousands or millions of pages need to ensure search engines spend their crawl budget on the most important pages. Blocking low-value directories (pagination, tag archives, internal search results) in robots.txt directs crawlers toward high-value content like product pages, blog posts, and landing pages.
Development and Staging Environment Protection
Development and staging environments that are accidentally accessible to the public should have a robots.txt that blocks all crawlers (Disallow: /) to prevent test content from being indexed and appearing in search results.
Example Robots.txt Output
A typical generated robots.txt file might look like this:
```
User-agent: *
Allow: /
Crawl-delay: 10
Disallow: /cgi-bin/
Disallow: /admin/
Disallow: /private/

User-agent: Googlebot
Allow: /

User-agent: ia_archiver
Disallow: /

Sitemap: https://www.example.com/sitemap.xml
```
This file allows all bots to crawl the site with a 10-second crawl delay, blocks three directories from all crawlers, explicitly allows Google full access, blocks the Internet Archive crawler entirely, and declares the sitemap location.
Tips for Best Results
- Start with Allow for most sites — Unless you have specific reasons to block crawling globally, keep the default "Allow" setting. You want search engines to find and index your content.
- Always include your sitemap URL — The Sitemap directive is one of the easiest SEO wins. It ensures every crawler that reads your robots.txt discovers your sitemap, improving crawl coverage.
- Block admin and private directories — There is no reason for search engines to crawl login pages, admin panels, or internal tools. Block these to keep crawl attention on public content.
- Do not use robots.txt to hide sensitive content — Robots.txt is a public file that anyone can read. Disallowed URLs are visible to anyone viewing your robots.txt. For truly sensitive content, use authentication, server-side access controls, or noindex meta tags instead.
- Test before deploying — After generating your robots.txt, use Google Search Console's robots.txt Tester to verify that important pages are accessible and blocked pages are correctly restricted.
- Upload to the root directory — The file must be named exactly robots.txt and placed in the root directory of your domain (e.g., https://www.example.com/robots.txt). It does not work in subdirectories.
- Use trailing slashes for directories — When specifying disallowed paths, include the trailing slash for directories: /admin/ not /admin. Without the slash, the rule matches any path beginning with /admin, including unrelated ones like /administrator or /admin-tools; with the slash, only the directory and its contents are blocked.
- Crawl-delay is not supported by all bots — Google ignores the Crawl-delay directive. Use Google Search Console to manage Google's crawl rate. Bing, Yahoo, and other crawlers do honor crawl-delay.
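The trailing-slash behavior is easy to verify with Python's standard-library parser; note how the slash-less rule matches more paths than intended:

```python
from urllib.robotparser import RobotFileParser

# Without a trailing slash the rule matches any path with that prefix
rp = RobotFileParser()
rp.parse(["User-agent: *", "Disallow: /admin"])
print(rp.can_fetch("*", "/administrator/index.html"))  # False: blocked too!

# With the trailing slash only the directory itself is blocked
rp2 = RobotFileParser()
rp2.parse(["User-agent: *", "Disallow: /admin/"])
print(rp2.can_fetch("*", "/administrator/index.html"))  # True: allowed
print(rp2.can_fetch("*", "/admin/index.html"))          # False: blocked
```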
Frequently Asked Questions
Q: Is the Robots.txt Generator free?
A: Yes. Completely free — no registration and no hidden fees.
Q: What is a robots.txt file?
A: A plain text file placed in the root directory of a website that provides instructions to search engine crawlers about which pages or sections they are allowed or not allowed to access. It is part of the Robots Exclusion Protocol — a standard followed by all major search engines.
Q: Where do I upload the robots.txt file?
A: Upload it to the root directory of your domain so it is accessible at https://www.yourdomain.com/robots.txt. Use your web hosting file manager, FTP client, or CMS settings to place the file.
Q: Does robots.txt prevent pages from appearing in search results?
A: Robots.txt prevents crawling, but it does not guarantee a page will not appear in search results. If other sites link to a blocked page, search engines may still list the URL (without a description) in results. To prevent indexing entirely, use a noindex meta tag or X-Robots-Tag HTTP header instead.
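As one illustration of the X-Robots-Tag approach, an Apache server with mod_headers enabled could send a noindex header for a private section (this is a hypothetical configuration sketch, with /private/ as an assumed path):

```
# Hypothetical Apache snippet: send noindex for everything under /private/
<LocationMatch "^/private/">
    Header set X-Robots-Tag "noindex, nofollow"
</LocationMatch>
```

A page-level alternative is the HTML meta tag <meta name="robots" content="noindex">. Either way, the page must remain crawlable for search engines to see the noindex signal — do not also block it in robots.txt.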
Q: Should I block CSS and JavaScript files?
A: No. Google specifically recommends allowing access to CSS and JavaScript files so that Googlebot can render your pages as users see them. Blocking these files can harm your search rankings because Google cannot assess the page properly.
Q: How often should I update my robots.txt?
A: Update it whenever your site structure changes significantly — new directories, removed sections, new sitemaps, or new crawl requirements. For stable sites, the robots.txt may not need frequent changes.
Q: Can robots.txt protect private content?
A: No. Robots.txt is publicly readable — anyone can view it to see which directories you are blocking. It is a voluntary protocol that well-behaved crawlers follow, not a security mechanism. Use server-side authentication for genuinely private content.
Q: What happens if I do not have a robots.txt file?
A: Without a robots.txt file, search engines assume they are allowed to crawl your entire site. This is fine for simple sites, but larger sites benefit from a robots.txt that guides crawlers toward important content and away from non-public areas.
Create a properly formatted robots.txt file for your website — use the free Robots.txt Generator by Amaze SEO Tools to configure crawler access, declare your sitemap, set crawl delays, and manage individual search engine bot permissions with an easy form interface!