robots.txt Configuration
The robots.txt file tells crawlers which URLs they may and may not crawl. Keep in mind that it controls crawling, not indexing: a disallowed URL can still appear in search results if other pages link to it, so use a noindex meta tag when a page must stay out of the index. Proper configuration keeps crawlers focused on the pages you actually want discovered.
What is robots.txt?
robots.txt is a text file at your site's root that provides instructions to web crawlers:
https://example.com/robots.txt
It follows the Robots Exclusion Protocol, understood by all major search engines.
Basic robots.txt in Next.js
Create a dynamic robots.txt using the App Router:
// app/robots.ts
import type { MetadataRoute } from 'next'
export default function robots(): MetadataRoute.Robots {
  return {
    rules: {
      userAgent: '*',
      allow: '/',
      disallow: ['/admin/', '/api/', '/private/'],
    },
    sitemap: 'https://example.com/sitemap.xml',
  }
}
This generates:
User-agent: *
Allow: /
Disallow: /admin/
Disallow: /api/
Disallow: /private/
Sitemap: https://example.com/sitemap.xml
Multiple User-Agent Rules
Different crawlers can have different rules:
// app/robots.ts
import type { MetadataRoute } from 'next'

export default function robots(): MetadataRoute.Robots {
  return {
    rules: [
      {
        userAgent: 'Googlebot',
        allow: '/',
        disallow: '/admin/',
      },
      {
        userAgent: 'Bingbot',
        allow: '/',
        disallow: '/admin/',
      },
      {
        userAgent: '*',
        allow: '/',
        disallow: ['/admin/', '/api/'],
      },
    ],
    sitemap: 'https://example.com/sitemap.xml',
  }
}
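This produces one rule group per user agent, along these lines (exact spacing may vary by Next.js version):
User-agent: Googlebot
Allow: /
Disallow: /admin/

User-agent: Bingbot
Allow: /
Disallow: /admin/

User-agent: *
Allow: /
Disallow: /admin/
Disallow: /api/

Sitemap: https://example.com/sitemap.xml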
Static robots.txt
Alternatively, place a static file in the public directory:
# public/robots.txt
User-agent: *
Allow: /
Disallow: /admin/
Disallow: /api/
Sitemap: https://example.com/sitemap.xml
What to Block
Admin and Private Areas
Disallow: /admin/
Disallow: /dashboard/
Disallow: /private/
API Routes
Disallow: /api/
API routes are for data, not for indexing.
Search Results
Disallow: /search
Disallow: /search?*
Search results pages are thin content and can cause duplicate content issues.
User-Specific Pages
Disallow: /account/
Disallow: /profile/
Disallow: /settings/
Staging and Preview
Disallow: /preview/
Disallow: /draft/
Utility Pages
Disallow: /login
Disallow: /signup
Disallow: /logout
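Putting these categories together, an app/robots.ts for a typical site might look like the sketch below. The paths are illustrative; include only the ones that actually exist on your site.
// app/robots.ts - illustrative sketch; adjust the paths to match your own routes
import type { MetadataRoute } from 'next'

export default function robots(): MetadataRoute.Robots {
  return {
    rules: {
      userAgent: '*',
      allow: '/',
      disallow: [
        '/admin/',
        '/dashboard/',
        '/private/',
        '/api/',
        '/search',
        '/account/',
        '/profile/',
        '/settings/',
        '/preview/',
        '/draft/',
        '/login',
        '/signup',
        '/logout',
      ],
    },
    sitemap: 'https://example.com/sitemap.xml',
  }
}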
What NOT to Block
Static Assets
Don't block CSS, JavaScript, or images - crawlers need these to render pages:
# BAD - Don't do this!
Disallow: /_next/
Disallow: /static/
Disallow: *.css
Disallow: *.js
If you block these, Googlebot can't properly render your pages.
Important Content
Never block content you want indexed:
# BAD - Blocks your blog!
Disallow: /blog/
Environment-Based robots.txt
Block crawlers on staging/preview environments:
// app/robots.ts
import type { MetadataRoute } from 'next'
export default function robots(): MetadataRoute.Robots {
  const isProduction = process.env.NODE_ENV === 'production'
  const baseUrl = process.env.NEXT_PUBLIC_BASE_URL

  if (!isProduction) {
    // Block all crawlers on non-production builds
    return {
      rules: {
        userAgent: '*',
        disallow: '/',
      },
    }
  }

  return {
    rules: {
      userAgent: '*',
      allow: '/',
      disallow: ['/admin/', '/api/'],
    },
    sitemap: `${baseUrl}/sitemap.xml`,
  }
}
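One caveat: NODE_ENV is often 'production' on staging and preview builds as well (Vercel preview deployments are a common example), so the check above may not block crawlers there. A platform-specific flag is more reliable. Below is a minimal sketch assuming deployment on Vercel, which exposes VERCEL_ENV as 'production', 'preview', or 'development'; adapt the check to whatever your host provides.
// app/robots.ts - sketch assuming Vercel, where VERCEL_ENV distinguishes
// production deployments from previews; adapt the check for other hosts
import type { MetadataRoute } from 'next'

export default function robots(): MetadataRoute.Robots {
  // Only the real production deployment gets a permissive robots.txt
  const isProductionDeployment = process.env.VERCEL_ENV === 'production'
  const baseUrl = process.env.NEXT_PUBLIC_BASE_URL ?? 'https://example.com'

  if (!isProductionDeployment) {
    return {
      rules: { userAgent: '*', disallow: '/' },
    }
  }

  return {
    rules: { userAgent: '*', allow: '/', disallow: ['/admin/', '/api/'] },
    sitemap: `${baseUrl}/sitemap.xml`,
  }
}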
Crawl-Delay Directive
Some crawlers support crawl-delay (not Googlebot):
User-agent: *
Crawl-delay: 10
This requests 10 seconds between requests. Googlebot ignores the directive and determines its own crawl rate; the old Search Console crawl-rate setting has also been retired.
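If you do want to emit a Crawl-delay line for crawlers that honor it, recent versions of Next.js also accept a crawlDelay field on the rule object; a minimal sketch (check that the field exists in your Next.js version's MetadataRoute.Robots type):
// app/robots.ts - sketch; crawlDelay support depends on the Next.js version,
// and Googlebot ignores the resulting Crawl-delay line anyway
import type { MetadataRoute } from 'next'

export default function robots(): MetadataRoute.Robots {
  return {
    rules: {
      userAgent: '*',
      allow: '/',
      crawlDelay: 10, // emitted as a Crawl-delay directive for this rule group
    },
  }
}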
Testing robots.txt
Google Search Console
Check how Google sees your robots.txt:
- Go to Search Console
- Select your property
- Open the robots.txt report (under Settings); it replaced the legacy robots.txt Tester and shows the fetched file plus any parsing errors
- Use the URL Inspection tool to check whether a specific URL is blocked by robots.txt
Manual Testing
Visit your robots.txt directly:
https://your-site.com/robots.txt
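For a quick automated check, fetch the file and assert that it looks sane. A minimal sketch using the fetch built into Node 18+ (the URL is a placeholder; run it with a TypeScript runner such as tsx):
// check-robots.ts - minimal sanity check for a deployed robots.txt
const url = 'https://your-site.com/robots.txt' // replace with your own domain

async function main() {
  const res = await fetch(url)
  if (!res.ok) {
    throw new Error(`Expected 200, got ${res.status}`)
  }
  const body = await res.text()
  console.log(body)
  // The file should declare at least one user agent and point to a sitemap
  if (!/^user-agent:/im.test(body)) console.warn('No User-agent line found')
  if (!/^sitemap:/im.test(body)) console.warn('No Sitemap line found')
}

main().catch((err) => {
  console.error(err)
  process.exit(1)
})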
Common Mistakes
Blocking Everything
# Oops! Blocks the entire site
User-agent: *
Disallow: /
Case Sensitivity
Paths are case-sensitive:
Disallow: /Admin/ # Won't block /admin/
Trailing Slashes
Disallow rules are prefix matches, so the trailing slash changes what is blocked:
Disallow: /admin # Blocks /admin, /admin/, and anything else starting with /admin (even /administrator)
Disallow: /admin/ # Blocks only /admin/ and the paths beneath it, not /admin itself
Forgetting Wildcards
Because plain rules are prefix matches, you rarely need wildcards for paths, but you do need them for patterns that start mid-URL:
Disallow: /api # Prefix match: already blocks /api, /api/users, and anything else starting with /api
Disallow: /*?preview= # Wildcard: blocks URLs like /post/123?preview=true
Summary
In this lesson, you learned:
- What robots.txt does and how it works
- Creating robots.txt in Next.js (dynamic and static)
- User-agent specific rules
- What to block (admin, API, user pages)
- What NOT to block (assets, important content)
- Environment-based configuration
- Testing and common mistakes
In the next lesson, we'll cover canonical URLs and redirects.

