Robots.txt Generator: build your crawl rules visually.

Select user-agents, add Allow or Disallow directives for specific paths, and set your Sitemap URL. The robots.txt file updates in real time; copy it or download it directly. Everything runs 100% client-side.

SEO & crawler guide

What robots.txt does, how crawlers read it, and how to use it correctly.

The robots.txt file is one of the oldest and most universally supported conventions on the web. Proposed by Martijn Koster in 1994, followed by every major search engine, and standardized as RFC 9309 in 2022, it serves as a polite communication channel between website owners and the automated programs (crawlers, spiders, bots) that index the web.

How crawlers process robots.txt

Before crawling any page on a host, a compliant crawler fetches /robots.txt from the root of that host (a copy placed in a subdirectory is ignored). It caches the file for a period (Google generally caches it for up to 24 hours) and applies the cached rules to its subsequent requests to that host. The file is specific to the host, protocol, and port: a robots.txt at example.com does not apply to subdomain.example.com, which needs its own file.
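As a minimal sketch of the lookup described above, the following Python function derives the robots.txt location that governs a given page URL. The function name robots_url is an illustrative choice, not part of any crawler's API.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL that governs page_url (hypothetical helper)."""
    parts = urlsplit(page_url)
    # Keep the scheme and host (including any port); drop path, query, fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("https://example.com/blog/post?id=1"))
# https://example.com/robots.txt
print(robots_url("https://subdomain.example.com/a/b/"))
# https://subdomain.example.com/robots.txt
```

The two calls resolve to different files, which is why a subdomain needs its own robots.txt.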

The robots.txt format

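A robots.txt file consists of one or more rule groups. Each group starts with one or more User-agent: lines (identifying which crawlers the rules apply to) followed by Allow: and Disallow: directives. Groups are conventionally separated by blank lines for readability, although it is a User-agent: line following a rule that actually starts a new group. Comments start with #. A Sitemap: line stands on its own and does not belong to any group.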

User-agent: Googlebot
Disallow: /admin/
Allow: /

User-agent: *
Disallow: /private/

Sitemap: https://example.com/sitemap.xml
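
To make the group structure concrete, here is a small Python sketch that splits robots.txt text into groups. It is a simplified reading of the format, not a full parser: the function name parse_groups and the dictionary layout are assumptions made for illustration, and fields other than User-agent, Allow, and Disallow are skipped.

```python
SAMPLE = """\
User-agent: Googlebot
Disallow: /admin/
Allow: /

User-agent: *
Disallow: /private/

Sitemap: https://example.com/sitemap.xml
"""


def parse_groups(text: str) -> list:
    """Split robots.txt text into groups of user-agents and rules (sketch)."""
    groups = []
    current = None
    last_was_agent = False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue  # blank or malformed line
        field, value = line.split(":", 1)
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            # Consecutive User-agent lines share one group; a User-agent
            # line that follows a rule starts a new group.
            if not last_was_agent:
                current = {"agents": [], "rules": []}
                groups.append(current)
            current["agents"].append(value)
            last_was_agent = True
        elif field in ("allow", "disallow"):
            if current is not None:  # rules before any User-agent are ignored
                current["rules"].append((field, value))
            last_was_agent = False
        # Other fields (for example Sitemap) are not part of any group.
    return groups


for group in parse_groups(SAMPLE):
    print(group["agents"], group["rules"])
```

For the sample file this yields two groups: one for Googlebot with two rules, and one for * with a single Disallow rule.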

Rule matching and precedence

A crawler reading the file selects the most specific User-agent block that matches its identity. The wildcard * is a fallback; specific bot names take priority. Within a group, when both an Allow and Disallow directive match the same URL, the longer (more specific) path wins. For paths of equal length, Allow wins over Disallow. Path matching is case-sensitive and begins from the root.
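
The precedence rules above can be sketched in a few lines of Python. This is an illustration under simplifying assumptions: patterns are treated as plain path prefixes (the * and $ wildcards that some crawlers support are not handled), and the names select_rules and is_allowed are hypothetical.

```python
def select_rules(groups: dict, agent: str) -> list:
    """Pick the rules for a named bot, falling back to the '*' group."""
    lowered = {name.lower(): rules for name, rules in groups.items()}
    return lowered.get(agent.lower(), lowered.get("*", []))


def is_allowed(rules: list, path: str) -> bool:
    """Longest matching pattern wins; Allow wins a tie; no match means allowed."""
    best_len = -1
    best_allow = True
    for kind, pattern in rules:
        # An empty pattern matches nothing; matching is a case-sensitive prefix test.
        if pattern == "" or not path.startswith(pattern):
            continue
        allow = kind == "allow"
        if len(pattern) > best_len or (len(pattern) == best_len and allow):
            best_len, best_allow = len(pattern), allow
    return best_allow


GROUPS = {
    "Googlebot": [("disallow", "/admin/"), ("allow", "/")],
    "*": [("disallow", "/private/")],
}

print(is_allowed(select_rules(GROUPS, "googlebot"), "/admin/users"))  # False
print(is_allowed(select_rules(GROUPS, "googlebot"), "/private/x"))    # True
print(is_allowed(select_rules(GROUPS, "OtherBot"), "/private/x"))     # False
```

Note the second result: once a bot matches a specific group, the rules in the * group no longer apply to it.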

Critical misconception: robots.txt ≠ noindex

Disallowing a URL does not remove it from search results. If other pages link to a disallowed URL, Google can discover and index it from those links alone — it just will not visit the page to read its content or any noindex meta tag. This means disallowing a page you also want noindexed is self-defeating. To block indexing, place a <meta name="robots" content="noindex"> tag on the page, or return an X-Robots-Tag: noindex HTTP header — and make sure the page is crawlable so Google can read it.
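
The interaction between Disallow and noindex can be summarised as a small decision function. This is only a sketch of the reasoning in this section; the function name and the outcome strings are illustrative, and real search engine behaviour has more cases.

```python
def indexing_outcome(disallowed: bool, has_noindex: bool) -> str:
    """Describe what a compliant search crawler can do with a page (sketch)."""
    if disallowed:
        # The page is never fetched, so a noindex tag or header on it goes unread.
        return "not crawled; URL may still be indexed from links"
    if has_noindex:
        return "crawled; excluded from the index"
    return "crawled; eligible for indexing"


print(indexing_outcome(disallowed=True, has_noindex=True))
# not crawled; URL may still be indexed from links
print(indexing_outcome(disallowed=False, has_noindex=True))
# crawled; excluded from the index
```

The first call shows the self-defeating combination: the noindex has no effect because the crawler is told not to read the page.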

What to disallow and what to leave open

Common paths worth disallowing for most sites:

  • Admin panels: /admin/, /wp-admin/, /dashboard/. No SEO value, and no reason to spend crawl budget on them.
  • Staging/test paths: /staging/, /test/. Keeps crawlers away from duplicates of production content.
  • Internal search result pages: /search?. Faceted search can produce thousands of near-duplicate URLs that waste crawl budget.
  • Session/tracking parameters: if not handled by canonical tags, disallow parameterised variants.

What you should never disallow for an SEO-focused site: CSS, JavaScript, fonts, and images. Googlebot needs these to render pages correctly. Blocking them can cause your pages to be evaluated as if they had no styling or interactivity.
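
One way to catch an accidental asset block is to check known asset paths against the Disallow prefixes before deploying. The sketch below assumes plain prefix matching and ignores Allow overrides and wildcards; the function name blocked_assets and the sample paths are hypothetical.

```python
def blocked_assets(disallow_prefixes: list, asset_paths: list) -> list:
    """Return the asset paths that fall under any Disallow prefix (sketch)."""
    return [
        path
        for path in asset_paths
        if any(prefix and path.startswith(prefix) for prefix in disallow_prefixes)
    ]


disallows = ["/admin/", "/static/"]
assets = ["/static/app.css", "/img/logo.png", "/admin/admin.js"]

print(blocked_assets(disallows, assets))
# ['/static/app.css', '/admin/admin.js']
```

Any path in the result is a stylesheet, script, or image that a crawler following these rules could not fetch when rendering the page.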

Blocking AI training crawlers

Since 2023, many content creators have added blocks for AI training bots. Commonly listed tokens are GPTBot (OpenAI), CCBot (Common Crawl, a source for many training datasets), anthropic-ai (Anthropic, whose current crawler identifies as ClaudeBot), PerplexityBot, and Google-Extended (a control token for Google's AI products, separate from Googlebot). Adding a Disallow rule for these bots is legitimate and widely done, but understand that it only affects crawlers that choose to comply and only prevents future crawls; it cannot remove your content from any existing training dataset.
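
A block for these bots can be written as a single group, because several User-agent lines may share one set of rules. The Python sketch below generates such a group. The token list is copied from the paragraph above and may be out of date, so check each vendor's documentation; the function name ai_block is an illustrative choice.

```python
# Tokens as named in this guide; treat the list as an assumption to verify.
AI_TOKENS = ["GPTBot", "CCBot", "anthropic-ai", "PerplexityBot", "Google-Extended"]


def ai_block(tokens: list) -> str:
    """Build one robots.txt group that disallows everything for the given bots."""
    lines = [f"User-agent: {token}" for token in tokens]
    lines.append("Disallow: /")
    return "\n".join(lines) + "\n"


print(ai_block(AI_TOKENS))
```

The output is five User-agent lines followed by a single Disallow: / rule, which can be pasted into an existing robots.txt as its own group.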

Sitemap declaration

Adding a Sitemap: line to robots.txt is best practice regardless of whether you submit your sitemap separately via Search Console. It ensures that any crawler that reads your robots.txt also discovers your sitemap, without you needing to submit it to each search engine manually. You can include multiple Sitemap: lines pointing to separate sitemaps (pages, images, news, videos).
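
To illustrate how a crawler might discover sitemaps from the file, here is a short Python sketch that collects every Sitemap: line. The function name sitemap_urls and the sample URLs are assumptions for the example; the field name is matched case-insensitively.

```python
def sitemap_urls(text: str) -> list:
    """Collect the URLs from all Sitemap: lines in robots.txt text (sketch)."""
    urls = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "sitemap":
            urls.append(value.strip())
    return urls


sample = """\
User-agent: *
Disallow: /private/

Sitemap: https://example.com/sitemap-pages.xml
sitemap: https://example.com/sitemap-images.xml
"""

print(sitemap_urls(sample))
# ['https://example.com/sitemap-pages.xml', 'https://example.com/sitemap-images.xml']
```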

Testing your robots.txt

After deploying, test it with:

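  • Google Search Console: the robots.txt report (which replaced the older robots.txt Tester in 2023) shows the robots.txt files Google found and any parsing problems; the URL Inspection tool shows whether a specific URL is blocked for Googlebot.
  • Bing Webmaster Tools: equivalent tool for Bingbot.
  • curl: run curl -i https://example.com/robots.txt to verify the file is reachable and returns 200 OK. Google interprets a 4xx response (other than 429) as "no restrictions", and temporarily interprets a 5xx or 429 response as "crawl nothing".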
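
The status-code handling mentioned in the last item can be sketched as a simple mapping. This reflects Google's documented behaviour as summarised here, with 429 grouped with server errors; other crawlers may differ, and the function name and return strings are illustrative.

```python
def crawl_policy(status: int) -> str:
    """Map the HTTP status of a robots.txt fetch to a crawl policy (sketch)."""
    if 200 <= status < 300:
        return "apply the rules in the file"
    if status == 429 or 500 <= status < 600:
        # The server gave no definite answer, so the site is treated as blocked for now.
        return "crawl nothing"
    if 400 <= status < 500:
        # The file is treated as absent, so nothing is restricted.
        return "no restrictions"
    return "not covered by this sketch"


for code in (200, 404, 429, 503):
    print(code, crawl_policy(code))
# 200 apply the rules in the file
# 404 no restrictions
# 429 crawl nothing
# 503 crawl nothing
```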