A well-crafted robots.txt file plays a key role in search engine optimization. To create one, you need access to the root of your domain, as well as a working knowledge of robots.txt syntax.
User-agent and Disallow are the keywords most commonly used in robots.txt files. 'User-agent' names the search engine robot a rule applies to, while 'Disallow' directs that robot, telling it whether or not it may crawl a particular page or URL.
Some Commands And Their Meaning
Example One
User-agent: *
Disallow:
Meaning - An empty 'Disallow' value tells every robot it has access to crawl all pages.
Example Two
User-agent: *
Disallow: /
Meaning - The above command tells every robot it does not have access to crawl any page.
Example Three
To allow or block a particular robot, replace the '*' with that robot's name. For instance, to block Googlebot from crawling a specific directory, your robots.txt should look like this:
User-agent: Googlebot
Disallow: /no-google/
And if you're blocking it from a specific page, your robots.txt should be written this way:
User-agent: Googlebot
Disallow: /no-google/sample-page.html
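The reverse also works: you can single out one robot to allow while blocking all the rest. As a quick sketch combining Example One's empty 'Disallow' with Example Two's full block, the file below lets Googlebot crawl everything while keeping every other robot out:
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /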
You can look up the name of any web robot in the Web Robots Database.
For the sake of brevity, I have listed below some simplified URL blocking commands.
COMMAND | SYNTAX
Block the entire site | Disallow: /
Block a directory | Disallow: /sample_d/
Block a private file | Disallow: /p_file.html
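Put together in one file, the directory and file commands from the table (keeping its sample names 'sample_d' and 'p_file.html') would look like this:
User-agent: *
Disallow: /sample_d/
Disallow: /p_file.html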
THINGS TO NOTE
- Each 'Disallow' line can hold only one URL path. To block several URLs, give each one its own 'Disallow' line, as in the example above.
- Use lowercase for the file name. Your file must be named 'robots.txt', not 'Robots.txt', 'ROBOTS.TXT', or 'Robots.TXT'; only lowercase is recognized.
- Two special characters are accepted for pattern exclusion: the asterisk (*) and the dollar sign ($).
Asterisk (*) - Referred to as the wildcard. In the 'User-agent' line, (*) means the rules apply to all robots; within a URL path, it matches any sequence of characters. For instance, the sample code below blocks access to all subdirectories whose names begin with the word 'example'.
User-agent: *
Disallow: /example*/
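To make the pattern concrete, here is the same rule with comments (robots.txt treats everything after a '#' as a comment; the paths are made up for illustration):
User-agent: *
# Blocks /example1/ and /example-pages/photo.html
# Does not block /samples/example/ - the path must start with /example
Disallow: /example*/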
Dollar sign ($) - Used to match the end of a URL. You can block any URL that ends in a particular way using ($). For instance, the sample code below blocks any URL that ends with '.msn'.
User-agent: *
Disallow: /*.msn$
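Note that ($) also works without an asterisk. As a sketch using a made-up path, the rule below blocks exactly the URL /downloads and nothing else:
User-agent: *
# Blocks /downloads, but not /downloads/ or /downloads/file.zip
Disallow: /downloads$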
A PERFECT ROBOTS.TXT EXAMPLE
User-agent: *
Disallow: /cgi-bin/
Disallow: /wp-admin/
Disallow: /archives/
Sitemap: http://www.yourblog.com/sitemap.xml
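Line by line, here is the same file with explanatory comments (everything after a '#' is ignored by robots; the blocked directories are typical for a WordPress blog):
# Rules for all robots
User-agent: *
# Block the scripts, WordPress admin, and archive directories
Disallow: /cgi-bin/
Disallow: /wp-admin/
Disallow: /archives/
# Tell crawlers where to find the sitemap
Sitemap: http://www.yourblog.com/sitemap.xml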
Have any questions? Feel free to ask.