Where should the robots.txt file be uploaded?

The robots.txt file must sit in your website's root directory, directly under the domain, so that it loads at yourdomain.com/robots.txt. On a typical WordPress host, that means the root of the public_html or www folder. Upload it there with FTP or the cPanel File Manager. The file name must be exactly 'robots.txt', all lowercase.
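Crawlers only ever look at the root of the host, so the robots.txt location can be derived from any page URL. Here is a minimal Python sketch using only the standard library. The function name robots_url and the example URLs are hypothetical. It also makes explicit that each subdomain is checked separately:

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the one location crawlers check for robots.txt on this page's host."""
    parts = urlsplit(page_url)
    # Keep scheme and host (including any port); drop path, query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("https://yourdomain.com/blog/post/?utm=x"))  # https://yourdomain.com/robots.txt
print(robots_url("https://shop.yourdomain.com/cart"))         # https://shop.yourdomain.com/robots.txt
```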

Can robots.txt protect a website from hackers?

Not at all. Robots.txt is only a suggestion for well-behaved crawlers; it is not a security tool. Because robots.txt is a publicly readable file, any human or malicious bot can go straight to a blocked URL. For sensitive areas, use real password protection, .htaccess authentication, or firewall rules.
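Here is the point in concrete form: every Disallow line is a public hint about where the interesting paths are. This hypothetical helper, advertised_paths, simply lists them, which anyone can do in a few lines. The sample paths are made up:

```python
def advertised_paths(robots_txt: str) -> list[str]:
    """List every non-empty Disallow value; anyone who fetches the file can read them."""
    paths = []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()          # drop comments and whitespace
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


sample = """User-agent: *
Disallow: /secret-admin/
Disallow: /backups/
"""
print(advertised_paths(sample))  # ['/secret-admin/', '/backups/']
```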

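Does Googlebot respect crawl-delay?

No. Googlebot officially ignores the crawl-delay directive. Google has its own system that watches your server's response times and adjusts crawl speed automatically. Google Search Console used to offer a crawl rate limiter, but Google retired it in January 2024; if Googlebot is overloading your server, Google's current advice is to temporarily return 500, 503, or 429 status codes. Crawl-delay remains useful for crawlers that honor it, such as Bingbot; other crawlers, including Yandex's, may not, so check each crawler's documentation.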

What happens if robots.txt contains no rules?

If the robots.txt file does not exist or is empty, crawlers crawl the entire site by default, with no restrictions. That is fine for a live public website, but it can be dangerous for staging or development sites. That is why staging sites should always use 'Disallow: /'. Because robots.txt is only a request, they should also sit behind a password.

Why do WordPress and Shopify need different robots.txt files?

The two platforms have different folder structures. WordPress has specific paths such as /wp-admin/ and /wp-includes/ that are the usual candidates for blocking. On Shopify, it is important to block /cart, /checkout, and filtered URLs under /collections/. A single generic robots.txt will not work for both, which is why this generator has separate presets that set platform-specific paths automatically. (On Shopify you do not upload the file over FTP; you customize it through the robots.txt.liquid theme template.)

How do I handle multiple sitemaps in robots.txt?

You can add multiple Sitemap directives to robots.txt, one sitemap URL per line. For example: 'Sitemap: https://example.com/sitemap.xml' followed on the next line by 'Sitemap: https://example.com/news-sitemap.xml'. In this generator's sitemap section you can add multiple URLs, and all of them are automatically included in the output.

🤖 Robots.txt Generator

Visually build a robots.txt — allow/disallow rules, crawl-delay, sitemaps. Safe presets included.


What a Robots.txt File Actually Does (Before You Generate Anything)

Before touching any generator, you need a clear mental model. A robots.txt file sits at the very root of your domain — https://yourdomain.com/robots.txt — and it acts as a set of instructions to web crawlers before they explore a single page. When Googlebot lands on your site, it checks this file first. What you write there tells it where it is welcome, where it should stay out, and how fast it can knock on your server's door.

The critical nuance most tutorials skip: robots.txt is not a security wall. It is a courtesy signal. A well-behaved bot like Googlebot obeys it. A scraper bot running in someone's basement probably ignores it entirely. So its real job in SEO is crawl budget management — steering search engines toward pages that actually matter, and away from thin, duplicate, or internal-only content that wastes their time.

Writing this file by hand is error-prone. A single typo — a missing slash, a space where there should not be one — can silently block your entire site from Google. That is exactly the problem an online Robots.txt Generator solves.
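A generator avoids these slips by construction, and the kind of checking it does is easy to sketch. The hypothetical lint_robots_txt below is deliberately incomplete. It flags only three things: unrecognized field names (typos such as Disalow), rule paths that do not start with /, and a Disallow: / aimed at every crawler.

```python
KNOWN_FIELDS = {"user-agent", "allow", "disallow", "crawl-delay", "sitemap"}


def lint_robots_txt(text: str) -> list[str]:
    """Flag a few common, easy-to-miss mistakes; not a full validator."""
    warnings = []
    agents = []           # user-agents of the group currently being read
    in_rules = False      # True once the current group has an Allow/Disallow line
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()          # ignore comments and blank lines
        if not line:
            continue
        field, sep, value = line.partition(":")
        field, value = field.strip().lower(), value.strip()
        if not sep or field not in KNOWN_FIELDS:
            warnings.append(f"line {number}: unrecognized line {raw.strip()!r}")
            continue
        if field == "user-agent":
            if in_rules:                              # a user-agent after rules starts a new group
                agents, in_rules = [], False
            agents.append(value)
        elif field in ("allow", "disallow"):
            in_rules = True
            if value and not value.startswith("/"):
                warnings.append(f"line {number}: path {value!r} should start with '/'")
            if field == "disallow" and value == "/" and "*" in agents:
                warnings.append(f"line {number}: 'Disallow: /' blocks the whole site for every crawler")
    return warnings


for warning in lint_robots_txt("User-agent: *\nDisalow: /private/\nDisallow: wp-admin/\n"):
    print(warning)
```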

Step 1 — Set Your Default Crawl Policy

Open your generator of choice. Most tools (SEOptimer, SmallSEOTools, SE Ranking, DNSChecker, and similar) share a near-identical workflow because robots.txt syntax itself is standardized (the Robots Exclusion Protocol, published as RFC 9309 in 2022). The first decision you make is the broadest: what happens to crawlers by default?

Allow all bots (default open): Every crawler can access everything you have not specifically blocked. This is the right starting point for the vast majority of websites.
Block all bots (default closed): Useful during staging or if you are launching a site that is not yet ready for indexing. The output is a short, chilling pair of lines: User-agent: * followed by Disallow: /

Choose "Allow all" unless you have a specific reason otherwise. You will add the restrictions in later steps.
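Python's standard urllib.robotparser module offers a quick way to confirm how these defaults behave. This check assumes that the module's simplified matching is adequate for files this small (it does not implement every Google extension, such as wildcards). The helper name allowed is hypothetical:

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Parse robots_txt from a string and ask whether agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


url = "https://yourdomain.com/blog/"
print(allowed("", url))                              # True: empty file, no restrictions
print(allowed("User-agent: *\nDisallow:\n", url))    # True: explicit "allow all"
print(allowed("User-agent: *\nDisallow: /\n", url))  # False: "block all"
```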

Step 2 — Target Specific User Agents

This is where most tutorials gloss over something genuinely useful. Rather than writing a single block of rules for every crawler on earth, you can address different bots differently. The generator presents a dropdown or a list of named agents:

Googlebot — Google's main web crawler
Googlebot-Image — responsible for pulling images into Google Images
Bingbot — Microsoft's crawler
GPTBot — OpenAI's training data crawler
ClaudeBot — Anthropic's crawler
Baiduspider — Baidu's crawler (relevant if you target Chinese audiences)

A practical example: suppose you want Googlebot to crawl your entire site normally, but you want to block AI training crawlers entirely. You would create two separate rule blocks. In the generator, add a rule group for GPTBot with Disallow: /, then another group for ClaudeBot with Disallow: /, while leaving the * (all bots) group open. The generator handles the formatting of these separate sections automatically so you do not accidentally merge conflicting rules.
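Under the standard, a crawler follows the group that names it and ignores the * group. GPTBot and ClaudeBot therefore never see the open rules meant for everyone else. Here is a short check with urllib.robotparser, using a hypothetical file that mirrors the example above:

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow:

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Googlebot and Bingbot fall through to the open * group; the AI crawlers hit their own groups.
for agent in ("Googlebot", "Bingbot", "GPTBot", "ClaudeBot"):
    print(agent, parser.can_fetch(agent, "https://yourdomain.com/blog/post/"))
```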

Step 3 — Add Your Disallow Paths

This is the most consequential step. You are telling crawlers which directories or URL patterns to skip. Here are the most common paths worth blocking and the reasoning behind each:

/wp-admin/ — If you are on WordPress, this admin panel has no indexable value and no reason to appear in Google. Block it.
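/wp-includes/ — Core WordPress files. Same reasoning, but think twice: this folder also serves JavaScript and CSS that many themes load on the front end, and Google needs those files to render your pages, so many sites leave it crawlable.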
/cart/ and /checkout/ — E-commerce checkout flows are dynamic, session-specific pages. Googlebot crawling a cart URL finds near-empty content, which is crawl budget wasted.
/search/ — Internal site search results pages are almost always thin, duplicate content. Blocking them keeps your crawl budget focused on real pages.
/private/ or /staging/ — Development or internal-use directories that should never appear in any index.

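Two syntax rules the generator enforces for you, but you should understand anyway: paths must start with a forward slash (/), and directory paths should end with a trailing slash (/wp-admin/ not /wp-admin). The trailing slash matters because rules match by prefix: Disallow: /wp-admin also blocks /wp-admin-guide/ or any other URL that merely starts with those characters. Path values are also case-sensitive (directive names such as Disallow are not). Blocking /Admin/ does absolutely nothing to protect /admin/; these are different paths.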
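Prefix matching and case sensitivity are easy to see with urllib.robotparser. The helper blocked and the example paths below are hypothetical. Each call builds a file containing a single Disallow rule for all agents:

```python
from urllib.robotparser import RobotFileParser


def blocked(rule: str, path: str) -> bool:
    """True if a lone 'Disallow: <rule>' for all agents blocks path."""
    parser = RobotFileParser()
    parser.parse(["User-agent: *", f"Disallow: {rule}"])
    return not parser.can_fetch("AnyBot", "https://yourdomain.com" + path)


print(blocked("/wp-admin", "/wp-admin/options.php"))  # True
print(blocked("/wp-admin", "/wp-admin-guide/"))       # True: the prefix match overreaches
print(blocked("/wp-admin/", "/wp-admin-guide/"))      # False: trailing slash limits it to the folder
print(blocked("/Admin/", "/admin/"))                  # False: paths are case-sensitive
```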

Some generators offer CMS-specific presets. If you select "WordPress," the tool pre-fills a sensible set of disallow rules covering the most common WordPress directories that do not need indexing. This preset is a good starting point, but review it — your setup may differ.
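A preset is essentially a stored list of paths that the generator drops into the * group. The sketch below assumes a hypothetical PRESETS table; real tools ship their own, more complete lists. It also assumes a hypothetical render_robots_txt helper that adds one fully blocked group per named crawler, followed by the Sitemap lines. With the WordPress preset, it reproduces the example output shown in Step 6:

```python
# Hypothetical presets for illustration only; review any real preset before using it.
PRESETS = {
    "wordpress": {
        "disallow": ["/wp-admin/", "/wp-includes/", "/search/"],  # /wp-includes/ also serves front-end JS/CSS
        "allow": ["/wp-admin/admin-ajax.php"],
    },
    "shopify": {
        "disallow": ["/cart", "/checkout"],
        "allow": [],
    },
}


def render_robots_txt(preset: str, blocked_agents=(), sitemaps=()) -> str:
    """Build the * group from a preset, add a 'Disallow: /' group per blocked agent, then sitemaps."""
    rules = PRESETS[preset]
    lines = ["User-agent: *"]
    lines += [f"Disallow: {path}" for path in rules["disallow"]]
    lines += [f"Allow: {path}" for path in rules["allow"]]
    for agent in blocked_agents:
        lines += ["", f"User-agent: {agent}", "Disallow: /"]
    if sitemaps:
        lines.append("")
        lines += [f"Sitemap: {url}" for url in sitemaps]
    return "\n".join(lines) + "\n"


print(render_robots_txt("wordpress", ["GPTBot"], ["https://yourdomain.com/sitemap.xml"]))
```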

Step 4 — Set a Crawl Delay (Optional, But Thoughtful)

The crawl-delay directive tells a bot to wait a set number of seconds between requests. Options typically range from none up to 120 seconds. Most production sites can skip this entirely — Googlebot is remarkably well-behaved about server load. But if your hosting is shared or your server strains under concurrent requests, setting a crawl delay of 5 or 10 seconds for heavy crawlers like archive.org's ia_archiver or lesser-known scrapers can meaningfully reduce server pressure.

Note: Googlebot does not officially honor the crawl-delay directive from robots.txt, and Google retired the crawl rate limiter in Google Search Console in January 2024. If Googlebot is straining your server, Google's guidance is to temporarily return 500, 503, or 429 status codes instead of 200. The crawl-delay in your robots.txt affects other bots that do respect it.
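Crawl-delay is simply a number inside a group, and only crawlers that support it read it. The sketch below assumes Bingbot is the crawler you want to slow down and uses 10 seconds purely as an example. It shows how urllib.robotparser (Python 3.6+) exposes the value. Googlebot would ignore the line in practice regardless.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: Bingbot
Crawl-delay: 10
Disallow: /search/

User-agent: *
Disallow: /search/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

print(parser.crawl_delay("Bingbot"))    # 10
print(parser.crawl_delay("Googlebot"))  # None: the * group sets no delay
```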

Step 5 — Add Your Sitemap URL

Every good generator includes a Sitemap field. Paste the full URL to your XML sitemap here — typically https://yourdomain.com/sitemap.xml. The generator adds the correct directive at the bottom of the file:

Sitemap: https://yourdomain.com/sitemap.xml

This is not strictly required (you can also submit your sitemap in Google Search Console directly), but including it in robots.txt is a clean, universally understood signal. If you have multiple sitemaps — a news sitemap, an image sitemap, a video sitemap — add each one on its own line. Most generators let you add multiple sitemap entries by clicking an "Add another sitemap" button.
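Several Sitemap lines can coexist anywhere in the file because they do not belong to any user-agent group. urllib.robotparser (Python 3.8+) collects them with site_maps(). The image sitemap URL below is a hypothetical example:

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/

Sitemap: https://yourdomain.com/sitemap.xml
Sitemap: https://yourdomain.com/news-sitemap.xml
Sitemap: https://yourdomain.com/image-sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())
print(parser.site_maps())  # all three URLs, in file order
```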

Step 6 — Generate, Review, and Download

Hit the Generate button. Before you copy or download anything, read the output. It will look something like this:

User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /search/
Allow: /wp-admin/admin-ajax.php

User-agent: GPTBot
Disallow: /

Sitemap: https://yourdomain.com/sitemap.xml

Notice the Allow: /wp-admin/admin-ajax.php line in that WordPress example. Some generators add this automatically. The admin-ajax.php file is something many WordPress front-end features rely on, including live search and dynamic content, so allowing it while blocking the rest of /wp-admin/ is intentional and correct. Google resolves the conflict by applying the most specific rule, the one with the longest matching path, so the Allow line wins for that one file. This kind of nuance is why using a generator beats writing the file from scratch.
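Why does Allow win even though it appears after the broader Disallow? Google and RFC 9309 pick the most specific rule, meaning the longest matching path, and prefer Allow on a tie, regardless of line order. The sketch below illustrates that precedence, including the * and $ wildcards. The function names are hypothetical. Python's own urllib.robotparser applies rules in file order instead, so it would report admin-ajax.php as blocked for this file.

```python
import re


def _to_regex(pattern: str):
    """Translate a robots.txt path pattern: '*' matches any run of characters, trailing '$' anchors the end."""
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = ".*".join(re.escape(part) for part in body.split("*"))
    return re.compile(regex + ("$" if anchored else ""))


def is_allowed(rules, path: str) -> bool:
    """rules is a list of (kind, pattern); the longest match wins and Allow wins ties."""
    best_len, allowed = -1, True        # no matching rule means the path is allowed
    for kind, pattern in rules:
        if not pattern:                 # an empty value matches nothing
            continue
        if _to_regex(pattern).match(path):   # match() anchors at the start: prefix semantics
            is_allow = kind.lower() == "allow"
            if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
                best_len, allowed = len(pattern), is_allow
    return allowed


rules = [("Disallow", "/wp-admin/"), ("Allow", "/wp-admin/admin-ajax.php")]
print(is_allowed(rules, "/wp-admin/admin-ajax.php"))  # True: the longer Allow rule wins
print(is_allowed(rules, "/wp-admin/options.php"))     # False
print(is_allowed(rules, "/blog/"))                    # True: no rule matches
```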

Download the file as robots.txt (it must be named exactly that, all lowercase). Upload it to the absolute root directory of your web server — not a subfolder, not a subdomain — so it resolves at yourdomain.com/robots.txt.

Step 7 — Test Before You Walk Away

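This step is non-negotiable. Open Google Search Console, go to Settings, and open the robots.txt report. It shows the robots.txt files Google has fetched for your site, when they were last crawled, and any fetch problems or parsing errors, and you can request a recrawl after uploading a new version. (This report replaced the older standalone robots.txt Tester, which Google retired in late 2023.) To check individual URLs, run them through the URL Inspection tool. Test a URL you expect to be blocked (like /wp-admin/) and a URL you expect to be crawlable (like /blog/) to confirm both directions work as intended.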

You can also verify manually by simply visiting yourdomain.com/robots.txt in a browser right after uploading. If the file loads and looks correct, it is deployed. If you see a 404, the file is in the wrong location.
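You can also script the "test both directions" habit. This sketch assumes you paste in (or download) the deployed file's text and list the URLs you expect to be blocked or crawlable. The helper check_rules is hypothetical. To fetch the live file instead, call RobotFileParser.set_url() and then read(). In CPython's implementation, a 404 response makes the parser allow everything, which matches how crawlers treat a missing file. Keep in mind that urllib.robotparser applies rules in file order rather than Google's longest-match precedence, so Search Console remains the authority for Google.

```python
from urllib.robotparser import RobotFileParser


def check_rules(robots_txt: str, expectations: dict, agent: str = "Googlebot") -> list:
    """Return the URLs whose crawlability under robots_txt differs from what you expected."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url, expected in expectations.items()
            if parser.can_fetch(agent, url) != expected]


live_file = "User-agent: *\nDisallow: /wp-admin/\n"   # text of the deployed file
mismatches = check_rules(live_file, {
    "https://yourdomain.com/wp-admin/": False,   # expect blocked
    "https://yourdomain.com/blog/": True,        # expect crawlable
})
print(mismatches)  # [] means both directions behave as intended
```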

The One Mistake That Breaks Everything

A disallow rule does not un-index a page that Google has already crawled. If a URL is already in Google's index and you add Disallow: /that-page/ to your robots.txt, Google may still show that page in results; it just stops visiting it to update the content. To actually remove an indexed page from search results, add a noindex meta tag to that page and leave the page crawlable (Google has to fetch it to see the tag), or use the Removals tool in Google Search Console, which hides a URL only temporarily. Robots.txt controls crawler access; it does not erase search history. Keep these two tools in separate mental boxes and you will avoid a very common and very frustrating SEO mistake.
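A common way to fall into this trap is to add noindex and a Disallow rule at the same time, which hides the noindex from Google. The hypothetical helper below flags that combination for a single page. It is a sketch that reads only <meta name="robots"> tags. It ignores the X-Robots-Tag HTTP header and crawler-specific meta tags such as googlebot.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaFinder(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attributes = {name: value or "" for name, value in attrs}
        if tag == "meta" and attributes.get("name", "").lower() == "robots":
            for token in attributes.get("content", "").split(","):
                self.directives.add(token.strip().lower())


def noindex_is_hidden(robots_txt: str, url: str, html: str) -> bool:
    """True if the page says noindex but robots.txt stops Googlebot from fetching it."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return "noindex" in finder.directives and not parser.can_fetch("Googlebot", url)


page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
robots = "User-agent: *\nDisallow: /that-page/\n"
print(noindex_is_hidden(robots, "https://yourdomain.com/that-page/", page))  # True
```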
