Get Control of Your Site with robots.txt

This little, often-overlooked file can be a very powerful tool for website design and especially search engine optimization.

It tells search engine bots (spiders) which areas of a site they may visit. If you don’t care about bots, or don’t want to restrict them at all, then you don’t need a robots.txt file, not even an empty one.

A basic robots.txt file uses two main directives: User-agent and Disallow.

User-agent lets you address a specific bot and tell it what to do. For example, you could allow Google’s spider to see all of your site but disallow the Wayback Machine.
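A sketch of that idea might look like the following, assuming ia_archiver is the user-agent string the Wayback Machine’s crawler announces (an empty Disallow means nothing is off limits):

User-agent: Googlebot
Disallow:

User-agent: ia_archiver
Disallow: /

A bot follows the group whose User-agent line matches it, so here Googlebot gets free rein while ia_archiver is shut out of everything.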

A list of common bots is maintained in the Web Robots Database. There is also a wildcard, the asterisk (*), which applies a rule to all bots.

Disallow is the directive that tells the bot which files or folders it should not crawl.
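Its value is the start of a path, and the two extremes are worth keeping in mind. These are two separate example files, not one (lines starting with # are comments):

# Keep all bots out of the entire site
User-agent: *
Disallow: /

# Let all bots crawl everything, since nothing is disallowed
User-agent: *
Disallow: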

Here is an example of a robots.txt file:

User-agent: *
Disallow: /report/

What this does is tell all bots not to view the report folder.

Another common example is this:

User-agent: Googlebot-Image
Disallow: /images/

That tells Google’s image bot not to look in your images folder.
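You can also stack several of these groups in one file, separated by blank lines; each bot follows the group that matches it most specifically and falls back to the * group otherwise. A sketch combining the two examples above (the folder names are just the ones already used here):

User-agent: Googlebot-Image
Disallow: /images/

User-agent: *
Disallow: /report/

In this file, Googlebot-Image obeys only its own group, while every other bot obeys the * group.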

In conclusion, you should always use a robots.txt file, but remember that only good bots actually take heed of it. It is not a security measure. There is nothing to stop an unscrupulous bot from ignoring it and looking at any file it wants.