How to Create a robots.txt for your Magento 1 Shop

A robots.txt file is essential for instructing bots and crawlers how, and at what rate, your shop should be crawled and indexed. In this article we explain how to configure your Hypernode to serve a robots.txt for one or multiple storefronts.

Using robots.txt for Magento

Manage multiple robots.txt for multiple storefronts

If you want to serve a different robots.txt for each storefront, you need to create some extra configuration:

  • Create a directory structure and copy the current robots.txt into place for each storefront:

# Create a copy of the current robots.txt for every storefront,
# using the store codes reported by n98-magerun
for CODE in $(n98-magerun sys:store:list --format csv | sed 1d | cut -d "," -f 2)
do
    mkdir -p /data/web/public/robots/$CODE
    cp /data/web/public/robots.txt /data/web/public/robots/$CODE/robots.txt
    echo -e "\n\n## robots for storefront: $CODE" >> /data/web/public/robots/$CODE/robots.txt
done
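
If you want to confirm that a copy was created for every store code, you can list the files before continuing:

# List the per-storefront copies that were just created
find /data/web/public/robots -name robots.txt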
  • Now create /data/web/nginx/server.robots with the following content:

location /robots.txt {
    rewrite ^/robots\.txt$ /robots/$storecode/robots.txt;
}
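
The rewrite above assumes the nginx variable $storecode is already set by your multistore configuration, mapping the requested hostname to a Magento store code. If you do not have such a mapping yet, it could look roughly like the sketch below; the snippet name http.storefronts and the hostnames are examples only, so adjust them to your own setup:

# Example only: map the requested hostname to a Magento store code.
# Placed in an http-level snippet, e.g. /data/web/nginx/http.storefronts
map $http_host $storecode {
    default          default;
    www.example.com  default;
    www.example.de   german_store;
}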

It’s recommended to give the robots.txt of each storefront a unique identifier (for example the storefront code in a comment, as the loop above does), so it’s clear which robots.txt is served on which storefront.

  • Now test your robots.txt by requesting it and verifying that the correct version is served for each storefront:

curl -v https://www.example.com/robots.txt
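
To check all storefronts in one go, you can loop over your domains and grep for the marker comment that was appended earlier; the domains below are placeholders for your own:

# Show which storefront's robots.txt each domain serves (domains are examples)
for DOMAIN in www.example.com www.example.de
do
    echo "== $DOMAIN =="
    curl -s https://$DOMAIN/robots.txt | grep '## robots for storefront'
done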

Now start editing your robots.txt for each store!

Manage one robots.txt for all storefronts

For Magento multi-site setups, it is also possible to manage a single robots.txt for all domains.

To do this, create a snippet in /data/web/nginx called server.robots with the following content:

location /robots.txt { return 200 "### Autogenerated robots.txt\n

# Sitemap
Sitemap: https://$http_host/sitemap.xml

# Crawlers Setup
User-agent: *
Crawl-delay: 20

# Allowable Index
Allow: /*?p=
Allow: /index.php/blog/
Allow: /catalog/seo_sitemap/category/

# Directories
Disallow: /404/
Disallow: /app/
Disallow: /cgi-bin/
Disallow: /downloader/
Disallow: /includes/
Disallow: /lib/
Disallow: /magento/
Disallow: /pkginfo/
Disallow: /report/
Disallow: /stats/
Disallow: /var/

# Paths (clean URLs)
Disallow: /index.php/
Disallow: /catalog/product_compare/
Disallow: /catalog/category/view/
Disallow: /catalog/product/view/
Disallow: /catalogsearch/
Disallow: /checkout/
Disallow: /control/
Disallow: /contacts/
Disallow: /customer/
Disallow: /customize/
Disallow: /*/filter/
#Disallow: /*/l/
Disallow: /filter/
#Disallow: /l/
Disallow: /newsletter/
Disallow: /poll/
Disallow: /review/
Disallow: /review/s/
Disallow: /sendfriend/
Disallow: /tag/
Disallow: /wishlist/
Disallow: /productalert/

# Files
Disallow: /cron.php
Disallow: /cron.sh
Disallow: /mage
Disallow: /error_log
Disallow: /install.php

# CE
Disallow: /LICENSE.html
Disallow: /LICENSE.txt
Disallow: /LICENSE_AFL.txt

# Enterprise Edition
Disallow: /LICENSE_EE.html
Disallow: /LICENSE_EE.txt
Disallow: /RELEASE_NOTES.txt
Disallow: /STATUS.txt

# Paths (no clean URLs)
Disallow: /*.php
Disallow: /*?p=*
Disallow: /*?SID=

# Disallow user interactive pages
Disallow: /*review
Disallow: /product_info
Disallow: /popup_image

## Disallow osCommerce links
Disallow: /*cPath=*

";
}

This creates a location block for /robots.txt that returns the same content on all storefronts.
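
Because the Sitemap line uses the nginx variable $http_host, every domain advertises a sitemap URL on its own hostname. You can verify this per domain (www.example.com is a placeholder):

curl -s https://www.example.com/robots.txt | grep '^Sitemap:'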

Create a single robots.txt

If you serve just a single storefront (for example on a Hypernode Start plan), all you have to do is place a robots.txt file in /data/web/public and you’re done. :-)
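
For reference, a minimal robots.txt to start from could look like the sketch below; it reuses a few of the rules from the template above, assumes your sitemap is generated at /sitemap.xml, and should be adjusted to your own shop:

# Write a minimal robots.txt for a single storefront
cat > /data/web/public/robots.txt <<'EOF'
# Crawlers Setup
User-agent: *
Crawl-delay: 20

# Sitemap (replace with your own domain)
Sitemap: https://www.example.com/sitemap.xml

# Directories and paths that should not be indexed
Disallow: /app/
Disallow: /lib/
Disallow: /var/
Disallow: /checkout/
Disallow: /customer/
Disallow: /catalogsearch/
Disallow: /*?SID=
EOF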