29 December 2010

How to write a robots.txt file

What is robots.txt?

A robots.txt file is used by webmasters to give instructions to robots (web crawlers) on how to crawl their site. These instructions include which directories may be crawled and which may not, which robots are allowed and which are not, the location of the sitemap, and so on. The file must be placed at the root of the site, for example http://skills2earn.com/robots.txt.
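
If you want to check how a crawler will interpret your robots.txt, Python's standard library ships a parser for the robots exclusion protocol. Below is a minimal sketch, assuming the file is published at http://skills2earn.com/robots.txt; the page URL passed to can_fetch is just an illustration.

from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.set_url("http://skills2earn.com/robots.txt")  # the file lives at the site root
rp.read()  # download and parse the file

# Ask whether a given user agent may fetch a given URL
print(rp.can_fetch("*", "http://skills2earn.com/example.htm"))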

Basic Configuration of robots.txt

To allow all bots to crawl your website, the robots.txt file will look like this:

User-agent: *
Allow: /

To disallow all robots from crawling your site, you can use the following code:

User-agent: *
Disallow: /

To disallow all bots except Googlebot, you can use the following configuration. Crawlers obey the most specific User-agent group that matches them, so Googlebot follows its own group here and ignores the catch-all rules:

User-agent: *
Disallow: /
User-agent: Googlebot
Allow: /

To disallow bots from crawling particular directories and pages, you can use the following code (the sketch after this example verifies these rules):

User-agent: *
Disallow: /admin/
Disallow: /css/
Disallow: /example.htm
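
The rules above can be tested offline with Python's standard-library parser; this sketch feeds the example file to the parser as plain text (the page paths passed to can_fetch are illustrative):

from urllib.robotparser import RobotFileParser

rules = """User-agent: *
Disallow: /admin/
Disallow: /css/
Disallow: /example.htm
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())  # parse the rules without fetching anything

print(rp.can_fetch("*", "/admin/settings.html"))  # False: inside a blocked directory
print(rp.can_fetch("*", "/example.htm"))          # False: a blocked page
print(rp.can_fetch("*", "/index.html"))           # True: not covered by any rule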

You can also add the URL of your sitemap to the robots.txt file for auto-discovery, using the following syntax:

Sitemap: http://skills2earn.com/sitemap.xml
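
To confirm that the sitemap line is picked up, the same parser can report the declared sitemap URLs; a minimal sketch (the site_maps() method needs Python 3.8 or newer):

from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.parse(["Sitemap: http://skills2earn.com/sitemap.xml"])  # or set_url() and read()

print(rp.site_maps())  # ['http://skills2earn.com/sitemap.xml']
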
Reference

www.sitemaps.org
