Controlling Search Engines with Robots.txt

Introduction

As mentioned in previous articles, search engines can be a great source of traffic to a standard business or personal website. What would happen, though, if you didn't want to appear in them?

This is the purpose of robots.txt files.

While they generally do not help you get listed, they can help ensure that you don't get listed if you wish not to be.

What is a robot?

A robot (also shortened to just "bot", or called a spider) is a computer program that goes around collecting information from websites.

Different bots do different things, depending on the owner's reasons for having them. In the case of search engines, the robot's purpose is to collect information about what your site contains, ready for it to be included in the search engine.

So where does the robots.txt file fit in?

Search engines generally like to respect the owners of websites. Most give site owners the option of keeping some or all of their pages out of the search engine. The robots.txt file is how you tell them.

Before the bot goes around your site looking at the various pages you have, it will take a look inside your robots.txt file first to see if it is allowed to.
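The check a well-behaved bot makes before fetching a page can be sketched with Python's standard urllib.robotparser module. This is only an illustration, not how any particular search engine is implemented; the helper may_crawl, the bot name examplebot and the example.com address are made up for the sketch.

```python
from urllib.robotparser import RobotFileParser


def may_crawl(robots_lines, user_agent, url):
    """Return True if the given robots.txt lines let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_lines)  # parse rules that have already been downloaded
    return parser.can_fetch(user_agent, url)


# A blank robots.txt contains no rules, so nothing is off limits.
blank_file = []
allowed = may_crawl(blank_file, "examplebot", "http://example.com/index.html")
print(allowed)
```

A real crawler would download the file from the root of the site first (RobotFileParser can do this with set_url and read) and then run the same check for every page it wants to visit.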

If the bot doesn't find a robots.txt file, or the file is blank, it will normally assume you don't want any robot blocked and feel free to roam around your site.

So how do I control where it can go?

Robots.txt files can either specify individual robots to restrict, or cover them all with the one command.

Commands for robots consist of two parts:

* User-agent: used for the name of the robot to control
* Disallow: where they are banned from accessing
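To make the two-part structure concrete, here is a rough sketch of how a program might group the lines of a robots.txt file into records. The function parse_records and the sample values (examplebot, /private/) are hypothetical, and real parsers handle more fields and edge cases than this.

```python
def parse_records(text):
    """Group robots.txt lines into (user_agents, disallowed_paths) records."""
    records = []
    agents, paths = [], []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        field, _, value = line.partition(":")
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            if paths:  # a User-agent line after rules starts a new record
                records.append((agents, paths))
                agents, paths = [], []
            agents.append(value)
        elif field == "disallow" and agents:
            paths.append(value)  # rules only count once a robot is named
    if agents:
        records.append((agents, paths))
    return records


sample = "User-agent: examplebot\nDisallow: /private/\n"
records = parse_records(sample)
print(records)
```

Each record pairs the robots it names with the paths they are banned from, which is all the file needs to express.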

In the example below, we block the robot called googlebot from accessing greentree.html. Googlebot is the name of Google's search engine robot, and by blocking it from this page we would expect the page to drop out of Google the next time they update their results. Note that the path begins with "/", because Disallow values are matched against the start of the page's path on your site.

User-agent: googlebot
Disallow: /greentree.html

While this works great for that individual page, what if we wanted to block more of the site? It would be highly inefficient to list every page on your site as blocked, but we can block a whole directory at once:

User-agent: googlebot
Disallow: /greentree.html
Disallow: /frogs/

The above code would block googlebot from accessing greentree.html and every page in the frogs directory.
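One way to convince yourself of this is to feed the rules to Python's standard urllib.robotparser module and ask it about a few paths. This is a sketch only: the rules are written with a leading "/" on each path, which this parser needs in order to match, and the pages pond.html and index.html are invented for the test.

```python
from urllib.robotparser import RobotFileParser

rules = [
    "User-agent: googlebot",
    "Disallow: /greentree.html",
    "Disallow: /frogs/",
]

parser = RobotFileParser()
parser.parse(rules)

# Ask whether googlebot may fetch each path; pond.html and index.html are made up.
checks = {
    path: parser.can_fetch("googlebot", path)
    for path in ["/greentree.html", "/frogs/pond.html", "/index.html"]
}
for path, ok in checks.items():
    print(path, "allowed" if ok else "blocked")
```

The single page and everything under the frogs directory are refused, while the rest of the site stays open to the bot.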

Still the whole site would not be blocked, but we have already reduced the areas that can be seen significantly. To block the whole site we disallow the "/" directory. This "/" directory is absolutely everything on the site.

For example:

User-agent: googlebot
Disallow: /

You now have the ability to block as many bots as you like by naming each one individually down the file. In the case below we have banned googlebot and slurp (the name of Yahoo's robot) from the site.

User-agent: googlebot
Disallow: /

User-agent: slurp
Disallow: /
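Naming bots individually only affects the bots you name. The sketch below checks this with Python's standard urllib.robotparser module; otherbot is a made-up name standing in for any robot not listed in the file.

```python
from urllib.robotparser import RobotFileParser

robots_txt = """\
User-agent: googlebot
Disallow: /

User-agent: slurp
Disallow: /
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# True means the bot is refused access to the site root.
blocked = {
    bot: not parser.can_fetch(bot, "/")
    for bot in ["googlebot", "slurp", "otherbot"]
}
print(blocked)
```

The two named bots are shut out, but a robot that is not mentioned anywhere in the file is still free to visit.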

If the same rules apply to all bots, we can specify them with the "*" character instead.

User-agent: *
Disallow: /
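If you maintain several sites, you might prefer to generate the file rather than type it by hand. The following is a minimal sketch under that assumption: build_robots_txt is a hypothetical helper, not part of any library, and the check at the end uses Python's standard urllib.robotparser module with otherbot as an invented bot name.

```python
from urllib.robotparser import RobotFileParser


def build_robots_txt(rules):
    """Render {user_agent: [disallowed paths]} as robots.txt text."""
    records = []
    for agent, paths in rules.items():
        lines = [f"User-agent: {agent}"]
        lines += [f"Disallow: {path}" for path in paths]
        records.append("\n".join(lines))
    return "\n\n".join(records) + "\n"  # blank line between records


text = build_robots_txt({"*": ["/"]})
print(text)

# Check the result: every bot name we try should be refused.
parser = RobotFileParser()
parser.parse(text.splitlines())
anyone_allowed = any(
    parser.can_fetch(bot, "/index.html")
    for bot in ["googlebot", "slurp", "otherbot"]
)
print(anyone_allowed)
```

With the "*" record in place, no bot name gets through, including ones the file never mentions.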

Finally, it is worth mentioning that while almost every bot likes to play nicely with the websites it visits, there are some that do not. If you have pages that really shouldn't be seen by any sort of robot, then you should password protect them instead.

About the author:

David Fitzgerald is a network administrator for the cheap web hosting and domain name registration services of Cheap Web Site Hosting (www.cheap-web-site-hosting.com.au).
