How to Create a robots.txt for your Magento 1 Shop
Using a robots.txt is essential for instructing bots and crawlers how, and at which rate, your shop should be indexed. In this article we explain how to configure your Hypernode to serve a robots.txt for one or multiple storefronts.
Using robots.txt for Magento
Manage multiple robots.txt for multiple storefronts
If you want to use a different robots.txt for each storefront, you need to create some extra configuration:
Create a directory structure and copy the current robots.txt into place for each storefront:
# Loop over all store codes known to Magento (second column of the CSV output)
for CODE in $(n98-magerun sys:store:list --format csv | sed 1d | cut -d "," -f 2)
do
    # Create a directory per storefront and seed it with the current robots.txt
    mkdir -p /data/web/public/robots/$CODE
    cp /data/web/public/robots.txt /data/web/public/robots/$CODE/robots.txt
    # Append an identifier so you can tell which storefront this copy belongs to
    echo -e "\n\n## robots for storefront: $CODE" >> /data/web/public/robots/$CODE/robots.txt
done
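To quickly verify the result, list the files that were just created; every store code should now have its own copy:
# List all generated robots.txt files, one per storefront code
find /data/web/public/robots -name robots.txt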
Now create /data/web/nginx/server.robots with the following content:
location /robots.txt {
    rewrite ^/robots\.txt$ /robots/$storecode/robots.txt;
}
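This rewrite relies on the variable $storecode being set for every storefront. If your multi-storefront configuration does not define it yet, you can set it yourself with a map block in an http-level snippet. The file name http.storecodes, the hostnames and the store codes below are only examples, so adjust them to your own shops:
# /data/web/nginx/http.storecodes (example file name)
# Map each shop hostname to the Magento store code used for the directories above
map $http_host $storecode {
    default          store_en;
    www.example.com  store_en;
    www.example.de   store_de;
}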
It’s recommended to give the robots.txt of each storefront a unique identifier (for example the storefront name in a comment in the file), so it’s clear which robots.txt is served on which storefront. The loop above already appends such a comment for you.
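A quick way to check those identifiers is to print the last line of every copy:
# Show which storefront each generated robots.txt belongs to
tail -n 1 /data/web/public/robots/*/robots.txt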
Now test your robots.txt by requesting it and verifying that the right file (check the storefront comment at the bottom) is served:
curl -v https://www.example.com/robots.txt
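If DNS for a storefront does not point to your Hypernode yet, you can still test that storefront by resolving its hostname manually with curl; the hostname and IP address below are placeholders:
# Send the request for www.example.de straight to the node at 203.0.113.10
curl -v --resolve www.example.de:443:203.0.113.10 https://www.example.de/robots.txt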
Now start editing your robots.txt for each store!
Manage one robots.txt for all storefronts
For Magento multi-site setups, it is also possible to manage one single robots.txt for all domains.
To do this, create a snippet in /data/web/nginx called server.robots with the following content:
location /robots.txt {
    return 200 "### Autogenerated robots.txt\n
# Sitemap
Sitemap: https://$http_host/sitemap.xml
# Crawlers Setup
User-agent: *
Crawl-delay: 20
# Allowable Index
Allow: /*?p=
Allow: /index.php/blog/
Allow: /catalog/seo_sitemap/category/
# Directories
Disallow: /404/
Disallow: /app/
Disallow: /cgi-bin/
Disallow: /downloader/
Disallow: /includes/
Disallow: /lib/
Disallow: /magento/
Disallow: /pkginfo/
Disallow: /report/
Disallow: /stats/
Disallow: /var/
# Paths (clean URLs)
Disallow: /index.php/
Disallow: /catalog/product_compare/
Disallow: /catalog/category/view/
Disallow: /catalog/product/view/
Disallow: /catalogsearch/
Disallow: /checkout/
Disallow: /control/
Disallow: /contacts/
Disallow: /customer/
Disallow: /customize/
Disallow: /*/filter/
#Disallow: /*/l/
Disallow: /filter/
#Disallow: /l/
Disallow: /newsletter/
Disallow: /poll/
Disallow: /review/
Disallow: /review/s/
Disallow: /sendfriend/
Disallow: /tag/
Disallow: /wishlist/
Disallow: /productalert/
# Files
Disallow: /cron.php
Disallow: /cron.sh
Disallow: /mage
Disallow: /error_log
Disallow: /install.php
# CE
Disallow: /LICENSE.html
Disallow: /LICENSE.txt
Disallow: /LICENSE_AFL.txt
# Enterprise Edition
Disallow: /LICENSE_EE.html
Disallow: /LICENSE_EE.txt
Disallow: /RELEASE_NOTES.txt
Disallow: /STATUS.txt
# Paths (no clean URLs)
Disallow: /*.php
Disallow: /*?p=*
Disallow: /*?SID=
# Disallow user interactive pages
Disallow: /*review
Disallow: /product_info
Disallow: /popup_image
## Disallow osCommerce links
Disallow: *cPath=*
";
}
This creates a location /robots.txt that returns the same rules on all storefronts. Because the Sitemap line uses $http_host, it always points to the hostname the visitor (or crawler) requested.
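To confirm this, request the file on two of your storefront domains and compare the Sitemap lines; the domains below are placeholders:
# Both requests return the same rules, only the Sitemap hostname differs
curl -s https://www.example.com/robots.txt | grep '^Sitemap:'
curl -s https://www.example.de/robots.txt | grep '^Sitemap:'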
Create a single robots.txt
If you only serve one single storefront (for example on a Hypernode Start plan), all you have to do is place a robots.txt file in /data/web/public and you’re done. :-)
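A minimal sketch of such a file, created from the command line; the rules below are only an example (taken from the larger template above) and should be tailored to your shop:
# Write a very basic robots.txt into the docroot
cat > /data/web/public/robots.txt <<'EOF'
User-agent: *
Crawl-delay: 20
Disallow: /checkout/
Disallow: /customer/
Sitemap: https://www.example.com/sitemap.xml
EOF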