robots.txt Configuration
The robots.txt file tells crawlers which URLs they may and may not crawl. Keep in mind that it controls crawling, not indexing: a disallowed URL can still appear in search results if other pages link to it, so use a noindex meta tag when a page must stay out of the index. Proper configuration keeps crawlers focused on the pages you actually want discovered.
What is robots.txt?
robots.txt is a text file at your site's root that provides instructions to web crawlers:
https://example.com/robots.txt
It follows the Robots Exclusion Protocol, understood by all major search engines.
Basic robots.txt in Next.js
Create a dynamic robots.txt using the App Router:
// app/robots.ts
import type { MetadataRoute } from 'next'
export default function robots(): MetadataRoute.Robots {
  return {
    rules: {
      userAgent: '*',
      allow: '/',
      disallow: ['/admin/', '/api/', '/private/'],
    },
    sitemap: 'https://example.com/sitemap.xml',
  }
}
This generates:
User-agent: *
Allow: /
Disallow: /admin/
Disallow: /api/
Disallow: /private/
Sitemap: https://example.com/sitemap.xml
Multiple User-Agent Rules
Different crawlers can have different rules:
// app/robots.ts
import type { MetadataRoute } from 'next'

export default function robots(): MetadataRoute.Robots {
  return {
    rules: [
      {
        userAgent: 'Googlebot',
        allow: '/',
        disallow: '/admin/',
      },
      {
        userAgent: 'Bingbot',
        allow: '/',
        disallow: '/admin/',
      },
      {
        userAgent: '*',
        allow: '/',
        disallow: ['/admin/', '/api/'],
      },
    ],
    sitemap: 'https://example.com/sitemap.xml',
  }
}
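This produces one rule group per user agent, along these lines (exact spacing may vary by Next.js version):
User-agent: Googlebot
Allow: /
Disallow: /admin/

User-agent: Bingbot
Allow: /
Disallow: /admin/

User-agent: *
Allow: /
Disallow: /admin/
Disallow: /api/

Sitemap: https://example.com/sitemap.xml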
Static robots.txt
Alternatively, place a static file in the public directory:
# public/robots.txt
User-agent: *
Allow: /
Disallow: /admin/
Disallow: /api/
Sitemap: https://example.com/sitemap.xml
What to Block
Admin and Private Areas
Disallow: /admin/
Disallow: /dashboard/
Disallow: /private/
API Routes
Disallow: /api/
API routes are for data, not for indexing.
Search Results
Disallow: /search
Disallow: /search?*
Search results pages are thin content and can cause duplicate content issues.
User-Specific Pages
Disallow: /account/
Disallow: /profile/
Disallow: /settings/
Staging and Preview
Disallow: /preview/
Disallow: /draft/
Utility Pages
Disallow: /login
Disallow: /signup
Disallow: /logout
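Putting these categories together, an app/robots.ts for a typical site might look like the sketch below. The paths are illustrative; include only the ones that actually exist on your site.
// app/robots.ts - illustrative sketch; adjust the paths to match your own routes
import type { MetadataRoute } from 'next'

export default function robots(): MetadataRoute.Robots {
  return {
    rules: {
      userAgent: '*',
      allow: '/',
      disallow: [
        '/admin/',
        '/dashboard/',
        '/private/',
        '/api/',
        '/search',
        '/account/',
        '/profile/',
        '/settings/',
        '/preview/',
        '/draft/',
        '/login',
        '/signup',
        '/logout',
      ],
    },
    sitemap: 'https://example.com/sitemap.xml',
  }
}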
What NOT to Block
Static Assets
Don't block CSS, JavaScript, or images - crawlers need these to render pages:
# BAD - Don't do this!
Disallow: /_next/
Disallow: /static/
Disallow: *.css
Disallow: *.js
If you block these, Googlebot can't properly render your pages.
Important Content
Never block content you want indexed:
# BAD - Blocks your blog!
Disallow: /blog/
Environment-Based robots.txt
Block crawlers on staging/preview environments:
// app/robots.ts
import type { MetadataRoute } from 'next'
export default function robots(): MetadataRoute.Robots {
  const isProduction = process.env.NODE_ENV === 'production'
  const baseUrl = process.env.NEXT_PUBLIC_BASE_URL

  if (!isProduction) {
    // Block all crawlers on non-production builds
    return {
      rules: {
        userAgent: '*',
        disallow: '/',
      },
    }
  }

  return {
    rules: {
      userAgent: '*',
      allow: '/',
      disallow: ['/admin/', '/api/'],
    },
    sitemap: `${baseUrl}/sitemap.xml`,
  }
}
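One caveat: NODE_ENV is often 'production' on staging and preview builds as well (Vercel preview deployments are a common example), so the check above may not block crawlers there. A platform-specific flag is more reliable. Below is a minimal sketch assuming deployment on Vercel, which exposes VERCEL_ENV as 'production', 'preview', or 'development'; adapt the check to whatever your host provides.
// app/robots.ts - sketch assuming Vercel, where VERCEL_ENV distinguishes
// production deployments from previews; adapt the check for other hosts
import type { MetadataRoute } from 'next'

export default function robots(): MetadataRoute.Robots {
  // Only the real production deployment gets a permissive robots.txt
  const isProductionDeployment = process.env.VERCEL_ENV === 'production'
  const baseUrl = process.env.NEXT_PUBLIC_BASE_URL ?? 'https://example.com'

  if (!isProductionDeployment) {
    return {
      rules: { userAgent: '*', disallow: '/' },
    }
  }

  return {
    rules: { userAgent: '*', allow: '/', disallow: ['/admin/', '/api/'] },
    sitemap: `${baseUrl}/sitemap.xml`,
  }
}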
Crawl-Delay Directive
Some crawlers support crawl-delay (not Googlebot):
User-agent: *
Crawl-delay: 10
This requests 10 seconds between requests. Googlebot ignores the directive and determines its own crawl rate; the old Search Console crawl-rate setting has also been retired.
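If you do want to emit a Crawl-delay line for crawlers that honor it, recent versions of Next.js also accept a crawlDelay field on the rule object; a minimal sketch (check that the field exists in your Next.js version's MetadataRoute.Robots type):
// app/robots.ts - sketch; crawlDelay support depends on the Next.js version,
// and Googlebot ignores the resulting Crawl-delay line anyway
import type { MetadataRoute } from 'next'

export default function robots(): MetadataRoute.Robots {
  return {
    rules: {
      userAgent: '*',
      allow: '/',
      crawlDelay: 10, // emitted as a Crawl-delay directive for this rule group
    },
  }
}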
Testing robots.txt
Google Search Console
Check how Google sees your robots.txt:
- Go to Search Console
- Select your property
- Open the robots.txt report (under Settings); it replaced the legacy robots.txt Tester and shows the fetched file plus any parsing errors
- Use the URL Inspection tool to check whether a specific URL is blocked by robots.txt
Manual Testing
Visit your robots.txt directly:
https://your-site.com/robots.txt
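For a quick automated check, fetch the file and assert that it looks sane. A minimal sketch using the fetch built into Node 18+ (the URL is a placeholder; run it with a TypeScript runner such as tsx):
// check-robots.ts - minimal sanity check for a deployed robots.txt
const url = 'https://your-site.com/robots.txt' // replace with your own domain

async function main() {
  const res = await fetch(url)
  if (!res.ok) {
    throw new Error(`Expected 200, got ${res.status}`)
  }
  const body = await res.text()
  console.log(body)
  // The file should declare at least one user agent and point to a sitemap
  if (!/^user-agent:/im.test(body)) console.warn('No User-agent line found')
  if (!/^sitemap:/im.test(body)) console.warn('No Sitemap line found')
}

main().catch((err) => {
  console.error(err)
  process.exit(1)
})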
Common Mistakes
Blocking Everything
# Oops! Blocks the entire site
User-agent: *
Disallow: /
Case Sensitivity
Paths are case-sensitive:
Disallow: /Admin/ # Won't block /admin/
Trailing Slashes
Disallow rules are prefix matches, so the trailing slash changes what is blocked:
Disallow: /admin # Blocks /admin, /admin/, and anything else starting with /admin (even /administrator)
Disallow: /admin/ # Blocks only /admin/ and the paths beneath it, not /admin itself
Forgetting Wildcards
Because plain rules are prefix matches, you rarely need wildcards for paths, but you do need them for patterns that start mid-URL:
Disallow: /api # Prefix match: already blocks /api, /api/users, and anything else starting with /api
Disallow: /*?preview= # Wildcard: blocks URLs like /post/123?preview=true
Summary
In this lesson, you learned:
- What robots.txt does and how it works
- Creating robots.txt in Next.js (dynamic and static)
- User-agent specific rules
- What to block (admin, API, user pages)
- What NOT to block (assets, important content)
- Environment-based configuration
- Testing and common mistakes
In the next lesson, we'll cover canonical URLs and redirects.

