Optimize Your Blog With Robots.txt

SEO

A well-crafted robots.txt file plays a key role in search engine optimization. To create one, you need access to the root of your domain, since that is where the file must live, along with a working knowledge of robots.txt syntax.


User-agent and Disallow are the keywords most commonly used in robots.txt files. User-agent names the search engine robot a rule applies to, while the keyword 'Disallow' directs that robot: it tells the 'User-agent' whether or not to crawl a particular page or URL.
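Together, the two keywords form a simple record. As a general template (a sketch; the bracketed values are placeholders, not literal syntax):

User-agent: [name of the robot, or * for all robots]

Disallow: [path the robot must not crawl]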

Some Commands And Their Meaning

Example One

User-agent: *

Disallow:

Meaning- The above command tells every robot it has access to crawl all pages.

Example Two

User-agent: *

Disallow: /

Meaning- The above command tells every robot it doesn't have access to crawl any page.

Example Three

To allow or block a particular robot, replace the '*' with its name. For instance, to block Googlebot from accessing the /no-google/ directory, your robots.txt should look like this:

User-agent: Googlebot

Disallow: /no-google/

And if you're blocking it from a specific URL, then your robots.txt must be written this way:

User-agent: Googlebot

Disallow: /no-google/sample-page.html


You can find any web robot in the Web Robots Database.


For the sake of brevity, I have listed below some simplified URL blocking commands.

COMMAND | SYNTAX

Block the entire site | Disallow: /

Block a directory | Disallow: /sample_d/

Block a private file | Disallow: /p_file.html
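
For instance, blocking both the sample directory and the private file from the table above (the names sample_d and p_file.html are placeholders) takes one Disallow line each:

User-agent: *

Disallow: /sample_d/

Disallow: /p_file.html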


THINGS TO NOTE

- Each URL needs its own 'Disallow' line; only one URL is allowed per line.

- Use lowercase for the file name. The file must be named 'robots.txt', not 'Robots.txt', 'ROBOTS.TXT' or 'Robots.TXT'. Only lowercase is accepted.

- Two characters are accepted for pattern matching: the asterisk (*) and the dollar sign ($). Both are demonstrated below, followed by a short test sketch.

Asterisk (*) - The wildcard. In a User-agent line, '*' means the rules apply to all robots; in a path, it matches any sequence of characters. For instance, the sample code below blocks access to all subdirectories whose names begin with the word 'example'.


User-agent: *

Disallow: /example*/

Dollar sign ($) - Matches the end of a URL. You can block any URL that ends in a particular way using '$'. For instance, the sample code below blocks any URL that ends with .msn:


User-agent: *

Disallow: /*.msn$
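
If you want to test how these two characters behave before publishing a rule, here is a minimal Python sketch of the matching logic described above. The function name rule_matches is my own placeholder, and the sketch ignores edge cases such as a '$' anywhere other than the end of a pattern.

import re

def rule_matches(pattern, path):
    # Translate a Disallow pattern into a regular expression:
    # '*' becomes '.*' (any run of characters), '$' stays as the
    # end-of-string anchor, everything else is matched literally.
    regex = ""
    for ch in pattern:
        if ch == "*":
            regex += ".*"
        elif ch == "$":
            regex += "$"
        else:
            regex += re.escape(ch)
    # Disallow rules match from the start of the URL path.
    return re.match(regex, path) is not None

print(rule_matches("/example*/", "/example-dir/page.html"))  # True: blocked
print(rule_matches("/*.msn$", "/video.msn"))                 # True: blocked
print(rule_matches("/*.msn$", "/video.msn.html"))            # False: allowed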

A PERFECT ROBOTS.TXT EXAMPLE

User-agent: *

Disallow: /cgi-bin/

Disallow: /wp-admin/

Disallow: /archives/

Sitemap: http://www.yourblog.com/sitemap.xml
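
To verify that a file like this behaves as intended, you can parse it with Python's standard urllib.robotparser module. A quick sketch, using the placeholder blog address from the example:

from urllib.robotparser import RobotFileParser

# Feed the example rules in directly, so no network access is needed.
rules = [
    "User-agent: *",
    "Disallow: /cgi-bin/",
    "Disallow: /wp-admin/",
    "Disallow: /archives/",
    "Sitemap: http://www.yourblog.com/sitemap.xml",
]
rp = RobotFileParser()
rp.parse(rules)

print(rp.can_fetch("*", "http://www.yourblog.com/wp-admin/"))  # False: blocked
print(rp.can_fetch("*", "http://www.yourblog.com/my-post/"))   # True: allowed
print(rp.site_maps())  # ['http://www.yourblog.com/sitemap.xml'] (Python 3.8+)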




Have any questions? Feel free to ask.