Robots.txt Generator
Create a compliant robots.txt file with foolproof safeguards. Includes presets for 20+ bots and 10 CMS platforms to manage search engine crawling efficiently.
1. Global Settings (All Robots)
Note: Google ignores Crawl-delay, but Bing and Yandex respect it.
2. CMS Presets (Quick Setup)
3. Custom Rules
4. Sitemap Location
About
The robots.txt file acts as the primary gatekeeper for your website's interaction with search engine crawlers. It is a plain text file placed in the root directory of your site that tells bots (like Googlebot or Bingbot) which paths they may crawl and which they should skip. It is not a security mechanism, because compliance is voluntary and the file itself is publicly readable, but it is important for Search Engine Optimization (SEO) and server load management.
Using a generator is recommended because a mistake in this file can have serious consequences, such as accidentally blocking search engines from crawling your entire website. This tool provides a structured interface for defining User-agent groups, Disallow paths, and Allow exceptions without typing the syntax by hand. It includes safeguards and presets for popular Content Management Systems (CMS) so that standard administrative directories are kept out of crawlers' paths.
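As an illustration of the kind of safeguard described above, the following minimal sketch scans a robots.txt string for groups that contain a bare `Disallow: /`, which blocks the whole site for the listed crawlers. The function name `find_blanket_blocks` is hypothetical and is not taken from this tool; the check is deliberately simple and does not detect wildcard patterns such as `Disallow: /*`.

```python
def find_blanket_blocks(robots_txt: str) -> list[str]:
    """Return user-agent tokens whose group contains a bare 'Disallow: /'.

    Hypothetical helper for illustration; not part of any real library.
    """
    blocked: list[str] = []
    current_agents: list[str] = []
    in_rules = False  # True once the current group has seen a rule line

    for raw_line in robots_txt.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue  # blank or malformed line
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()

        if field == "user-agent":
            # A User-agent line after rule lines starts a new group.
            if in_rules:
                current_agents = []
                in_rules = False
            current_agents.append(value)
        elif field in ("allow", "disallow"):
            in_rules = True
            if field == "disallow" and value == "/":
                blocked.extend(a for a in current_agents if a not in blocked)
    return blocked


sample = """\
User-agent: *
Disallow: /

User-agent: Googlebot
Disallow: /admin/
"""
print(find_blanket_blocks(sample))
```

A result that includes `*` is best treated as a warning rather than an error, since blocking everything is sometimes intentional (for example, on a staging site).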
Formulas
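The robots.txt protocol follows a simple grouped structure. Rules are grouped by User-agent, and a crawler obeys the group that most specifically matches its name, falling back to the User-agent: * group if none does. Within a group, crawlers that follow RFC 9309 (including Googlebot) apply the most specific, that is the longest, matching path rule rather than the first one listed, although some older parsers evaluate rules in file order.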
- Step 1. User-agent Definition. Identifies the specific robot (e.g., `User-agent: Googlebot`) or applies to all robots using a wildcard (`User-agent: *`).
- Step 2. Blocking Access (Disallow). Specifies directories or files the bot must avoid. Example: `Disallow: /admin/` prevents access to the admin folder.
- Step 3. Granting Access (Allow). Overrides a parent Disallow rule for a specific sub-path. Example: `Disallow: /public/` followed by `Allow: /public/images/`.
- Step 4. Sitemap Declaration. An optional but recommended directive pointing crawlers to the XML sitemap. Format: `Sitemap: https://example.com/sitemap.xml`.
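The four steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `build_robots_txt` and `is_allowed` are hypothetical names, not this tool's implementation, and `is_allowed` handles plain path prefixes only (no `*` or `$` wildcards). It resolves conflicts the way RFC 9309 describes: the longest matching rule wins, and Allow wins a tie.

```python
def build_robots_txt(groups, sitemap_url=None):
    """Assemble a robots.txt string.

    groups: list of (user_agent, disallow_paths, allow_paths) tuples.
    Hypothetical helper for illustration only.
    """
    lines = []
    for user_agent, disallow_paths, allow_paths in groups:
        lines.append(f"User-agent: {user_agent}")                # Step 1
        lines.extend(f"Disallow: {p}" for p in disallow_paths)   # Step 2
        lines.extend(f"Allow: {p}" for p in allow_paths)         # Step 3
        lines.append("")  # blank line separates groups
    if sitemap_url:
        lines.append(f"Sitemap: {sitemap_url}")                  # Step 4
    return "\n".join(lines).rstrip("\n") + "\n"


def is_allowed(path, disallow_paths, allow_paths):
    """Longest-match precedence for plain path prefixes (no wildcards)."""
    best_allow = max(
        (len(p) for p in allow_paths if p and path.startswith(p)), default=-1
    )
    best_disallow = max(
        (len(p) for p in disallow_paths if p and path.startswith(p)), default=-1
    )
    # Allow wins when both rules match with equal length, or when nothing matches.
    return best_allow >= best_disallow


disallow = ["/admin/", "/public/"]
allow = ["/public/images/"]
print(build_robots_txt([("*", disallow, allow)], "https://example.com/sitemap.xml"))
print(is_allowed("/public/images/logo.png", disallow, allow))  # True
print(is_allowed("/public/docs/a.html", disallow, allow))      # False
```

One caveat when testing generated files: Python's standard `urllib.robotparser` applies the first matching rule in file order rather than the longest match, so it can disagree with the result above when a broad Disallow is listed before a narrower Allow.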
Reference Data
| User-Agent (Bot) | Owner | Primary Function |
|---|---|---|
| Googlebot | Google | Main crawler for Google Search index. |
| Bingbot | Microsoft | Crawler for Bing search engine. |
| Slurp | Yahoo | Crawler for Yahoo Search. |
| DuckDuckBot | DuckDuckGo | Privacy-focused search engine crawler. |
| Baiduspider | Baidu | Leading Chinese search engine crawler. |
| YandexBot | Yandex | Leading Russian search engine crawler. |
| FacebookExternalHit | Meta | Crawls pages to generate previews for shared links. |
| Applebot | Apple | Used for Siri and Spotlight suggestions. |
| AhrefsBot | Ahrefs | SEO analysis and backlink checking. |
| MJ12bot | Majestic | Link intelligence and SEO mapping. |