How to Create a robots.txt for your Magento 1 Shop
Using a robots.txt is essential for instructing bots and crawlers how, and at which rate, your shop should be indexed. In this article we explain how to configure your Hypernode to serve a robots.txt for one or multiple storefronts.
Using robots.txt for Magento
Manage multiple robots.txt for multiple storefronts
If you want to use a different robots.txt for each storefront, you need to create some extra configuration:
Create a directory structure and copy the current robots.txt into place for each storefront:
# Loop over all store codes known to Magento (second column of the CSV output)
for CODE in $(n98-magerun sys:store:list --format csv | sed 1d | cut -d "," -f 2)
do
    # Create a directory per storefront and seed it with the current robots.txt
    mkdir -p /data/web/public/robots/$CODE
    cp /data/web/public/robots.txt /data/web/public/robots/$CODE/robots.txt
    # Append an identifier so you can tell which storefront this copy belongs to
    echo -e "\n\n## robots for storefront: $CODE" >> /data/web/public/robots/$CODE/robots.txt
done
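To quickly verify the result, list the files that were just created; every store code should now have its own copy:
# List all generated robots.txt files, one per storefront code
find /data/web/public/robots -name robots.txt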
Now create /data/web/nginx/server.robots with the following content:
location /robots.txt {
    rewrite ^/robots\.txt$ /robots/$storecode/robots.txt;
}
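This rewrite relies on the variable $storecode being set for every storefront. If your multi-storefront configuration does not define it yet, you can set it yourself with a map block in an http-level snippet. The file name http.storecodes, the hostnames and the store codes below are only examples, so adjust them to your own shops:
# /data/web/nginx/http.storecodes (example file name)
# Map each shop hostname to the Magento store code used for the directories above
map $http_host $storecode {
    default          store_en;
    www.example.com  store_en;
    www.example.de   store_de;
}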
It’s recommended to give the robots.txt of each storefront a unique identifier (for example the storefront name in a comment in the file), so it’s clear which robots.txt is served on which storefront. The loop above already appends such a comment for you.
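A quick way to check those identifiers is to print the last line of every copy:
# Show which storefront each generated robots.txt belongs to
tail -n 1 /data/web/public/robots/*/robots.txt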
Now test your robots.txt by requesting it and verifying that the right file (check the storefront comment at the bottom) is served:
curl -v https://www.example.com/robots.txt
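If DNS for a storefront does not point to your Hypernode yet, you can still test that storefront by resolving its hostname manually with curl; the hostname and IP address below are placeholders:
# Send the request for www.example.de straight to the node at 203.0.113.10
curl -v --resolve www.example.de:443:203.0.113.10 https://www.example.de/robots.txt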
Now start editing your robots.txt for each store!
Manage one robots.txt for all storefronts
For Magento multi-site setups, it is also possible to manage one single robots.txt for all domains.
To do this, create a snippet in /data/web/nginx called server.robots with the following content:
location /robots.txt {
    return 200 "### Autogenerated robots.txt\n
# Sitemap
Sitemap: https://$http_host/sitemap.xml
# Crawlers Setup
User-agent: *
Crawl-delay: 20
# Allowable Index
Allow: /*?p=
Allow: /index.php/blog/
Allow: /catalog/seo_sitemap/category/
# Directories
Disallow: /404/
Disallow: /app/
Disallow: /cgi-bin/
Disallow: /downloader/
Disallow: /includes/
Disallow: /lib/
Disallow: /magento/
Disallow: /pkginfo/
Disallow: /report/
Disallow: /stats/
Disallow: /var/
# Paths (clean URLs)
Disallow: /index.php/
Disallow: /catalog/product_compare/
Disallow: /catalog/category/view/
Disallow: /catalog/product/view/
Disallow: /catalogsearch/
Disallow: /checkout/
Disallow: /control/
Disallow: /contacts/
Disallow: /customer/
Disallow: /customize/
Disallow: /*/filter/
#Disallow: /*/l/
Disallow: /filter/
#Disallow: /l/
Disallow: /newsletter/
Disallow: /poll/
Disallow: /review/
Disallow: /review/s/
Disallow: /sendfriend/
Disallow: /tag/
Disallow: /wishlist/
Disallow: /productalert/
# Files
Disallow: /cron.php
Disallow: /cron.sh
Disallow: /mage
Disallow: /error_log
Disallow: /install.php
# CE
Disallow: /LICENSE.html
Disallow: /LICENSE.txt
Disallow: /LICENSE_AFL.txt
# Enterprise Edition
Disallow: /LICENSE_EE.html
Disallow: /LICENSE_EE.txt
Disallow: /RELEASE_NOTES.txt
Disallow: /STATUS.txt
# Paths (no clean URLs)
Disallow: /*.php
Disallow: /*?p=*
Disallow: /*?SID=
# Disallow user interactive pages
Disallow: /*review
Disallow: /product_info
Disallow: /popup_image
## Disallow osCommerce links
Disallow: *cPath=*
";
}
This creates a location /robots.txt that returns the same rules on all storefronts. Because the Sitemap line uses $http_host, it always points to the hostname the visitor (or crawler) requested.
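To confirm this, request the file on two of your storefront domains and compare the Sitemap lines; the domains below are placeholders:
# Both requests return the same rules, only the Sitemap hostname differs
curl -s https://www.example.com/robots.txt | grep '^Sitemap:'
curl -s https://www.example.de/robots.txt | grep '^Sitemap:'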
Create a single robots.txt
If you only serve one single storefront (for example on a Hypernode Start plan), all you have to do is place a robots.txt file in /data/web/public and you’re done. :-)
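A minimal sketch of such a file, created from the command line; the rules below are only an example (taken from the larger template above) and should be tailored to your shop:
# Write a very basic robots.txt into the docroot
cat > /data/web/public/robots.txt <<'EOF'
User-agent: *
Crawl-delay: 20
Disallow: /checkout/
Disallow: /customer/
Sitemap: https://www.example.com/sitemap.xml
EOF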