All About Robots.txt and SEO
Understanding Robots.txt
If you design different pages for different search engines, the solution is to stop specific search engine spiders from indexing some of your web pages. This is done with a robots.txt file, which resides on your web space.
Robots.txt Is A Vital Weapon In The Punishing Battle
A robots.txt file is a vital part of any webmaster's battle against getting banned or punished by the search engines when designing different pages for different search engines.
The robots.txt file is just a simple text file, as the file extension suggests. It is created with a plain text editor such as Notepad or WordPad. Word processors such as Microsoft Word can add formatting that corrupts the file unless you save it as plain text.
Getting The Robots To Work For You
The file works through simple directives that you type into it. They take this form:
User-Agent: (Spider Name)
Disallow: (File Name)
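The User-Agent is the name of the search engine's spider, and Disallow is the path of the file that you don't want that spider to index. The path is written relative to the root of your site and begins with a slash, for example /page.html.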
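To see how a spider reads such a pair of lines, here is a minimal sketch using Python's standard urllib.robotparser module. The file name /private.html is a made-up example, and the parser only approximates what a real engine does with the rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: keep Googlebot away from a single page.
rules = """\
User-agent: Googlebot
Disallow: /private.html
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# The named spider is refused the disallowed page only.
googlebot_blocked = not parser.can_fetch("Googlebot", "/private.html")
other_page_allowed = parser.can_fetch("Googlebot", "/index.html")

# A spider that is not named in the file is not affected.
scooter_allowed = parser.can_fetch("Scooter", "/private.html")

print(googlebot_blocked, other_page_allowed, scooter_allowed)
```

Only the spider named on the User-Agent line is affected, and only for the path given on the Disallow line.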
Handle Multiple Files
You have to start a new batch of code for each engine, but if you want to disallow multiple files, you can list them one under another. For example:
User-Agent: Slurp
Disallow: /xyz-gg.html
Disallow: /xyz-al.html
Disallow: /xxyyzz-gg.html
Disallow: /xxyyzz-al.html
The above code stops the Slurp spider from indexing two pages optimized for Google (gg) and two pages optimized for AltaVista (al).
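If you keep many engine-specific pages, typing these batches by hand gets error-prone. As a rough sketch, the hypothetical helper below (build_robots_txt is not part of any library) writes one batch per spider from a Python dictionary. It assumes every path already starts with a slash.

```python
def build_robots_txt(rules):
    """Build robots.txt text from a dict of spider name -> list of paths.

    Hypothetical helper for illustration: one User-agent batch per spider,
    with its Disallow lines listed one under another.
    """
    blocks = []
    for agent, paths in rules.items():
        lines = [f"User-agent: {agent}"]
        lines += [f"Disallow: {path}" for path in paths]
        blocks.append("\n".join(lines))
    # Batches are separated by a blank line.
    return "\n\n".join(blocks) + "\n"


robots_text = build_robots_txt({
    "Slurp": ["/xyz-gg.html", "/xyz-al.html"],
    "Scooter": ["/xyz-gg.html"],
})
print(robots_text)
```

Each spider gets its own batch, separated from the next by a blank line, which matches the one-batch-per-engine layout described above.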
Where Does My Robots.txt Go?
The robots.txt file resides on your web space, but where on your web space? The root directory! If you upload the file to a sub-directory, it will not work.
If you want to disallow all engines from indexing a file, simply use the * character where the engine's name would usually be. Beware, however, that the original robots.txt standard does not support the * character on the Disallow line. Some engines accept it there as an extension, but you should not rely on it.
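Both points can be sketched in a few lines of Python. The robots_url function is a hypothetical helper that shows where a spider will look for the file for any page on a site, and the second part uses the standard urllib.robotparser module to show * on the User-Agent line applying to every spider. The domain www.example.com and the file /drafts.html are placeholders.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.robotparser import RobotFileParser


def robots_url(page_url):
    """Return the root-level robots.txt URL for the site hosting page_url."""
    parts = urlsplit(page_url)
    # Keep scheme and host, replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


location = robots_url("http://www.example.com/articles/seo/page.html")
print(location)

# "*" on the User-agent line addresses every spider.
parser = RobotFileParser()
parser.parse(["User-agent: *", "Disallow: /drafts.html"])

blocked = [name for name in ("Googlebot", "Scooter", "Slurp")
           if not parser.can_fetch(name, "/drafts.html")]
print(blocked)
```

However deep the page sits in sub-directories, the file is always looked up at the top of the site.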
Here are the names of a few of the big engines:
Excite - ArchitextSpider
AltaVista - Scooter
Lycos - Lycos_Spider_(T-Rex)
Google - Googlebot
Alltheweb - FAST-WebCrawler
Be sure to check the file before uploading it. A simple mistake could mean that your pages are indexed by engines you wanted to keep out or, even worse, that none of your pages are indexed at all.
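Part of that check can be automated. The sketch below is a hypothetical, deliberately small checker (check_robots_txt is not a standard tool) that flags a few common slips: a missing colon, a Disallow line with no User-Agent above it, a path without a leading slash, and a line that blocks the whole site. It is no substitute for reading the file yourself.

```python
def check_robots_txt(text):
    """Return a list of human-readable problems found in robots.txt text."""
    problems = []
    seen_agent = False
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # ignore comments and blanks
        if not line:
            continue
        if ":" not in line:
            problems.append(f"line {number}: missing ':' separator")
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            seen_agent = True
            if not value:
                problems.append(f"line {number}: User-agent has no spider name")
        elif field == "disallow":
            if not seen_agent:
                problems.append(f"line {number}: Disallow before any User-agent")
            elif value == "/":
                problems.append(f"line {number}: 'Disallow: /' blocks the whole site")
            elif value and not value.startswith("/"):
                problems.append(f"line {number}: path should start with '/'")
    return problems


draft = "User-agent: Slurp\nDisallow: xyz-gg.html\nDisallow: /\n"
for problem in check_robots_txt(draft):
    print(problem)
```

An empty list means none of these particular slips were found. It does not prove that the rules do what you intended.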
Another advantage of using a robots.txt file is that well-behaved spiders request it before crawling your site, and each request is recorded in your web server's access log. By examining the log entries for robots.txt, you can see the host names and agent names of the spiders that have visited, including those of very small search engines. This tells you which search engines are likely to list your website.
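As a rough sketch of that kind of check, the snippet below scans web server access log lines for requests to /robots.txt and collects the host and agent name of each visitor. It assumes the server writes the widely used "combined" log format. The sample lines are invented for illustration, and robots_visitors is a hypothetical helper.

```python
import re

# Combined log format: host, identity, user, [time], "request", status,
# size, "referrer", "user agent".
LOG_LINE = re.compile(
    r'^(\S+) \S+ \S+ \[[^\]]*\] "(?:GET|HEAD) (\S+)[^"]*" '
    r'\d{3} \S+ "[^"]*" "([^"]*)"'
)


def robots_visitors(lines):
    """Return sorted (host, agent) pairs that requested /robots.txt."""
    visitors = set()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and match.group(2) == "/robots.txt":
            visitors.add((match.group(1), match.group(3)))
    return sorted(visitors)


# Invented sample lines, not real traffic.
sample_log = [
    '192.0.2.10 - - [01/Jan/2024:10:00:00 +0000] "GET /robots.txt HTTP/1.1" 200 64 "-" "Googlebot/2.1"',
    '192.0.2.10 - - [01/Jan/2024:10:00:01 +0000] "GET /index.html HTTP/1.1" 200 512 "-" "Googlebot/2.1"',
    '192.0.2.20 - - [01/Jan/2024:11:30:00 +0000] "GET /robots.txt HTTP/1.0" 200 64 "-" "Scooter/3.2"',
]
for host, agent in robots_visitors(sample_log):
    print(host, agent)
```

Ordinary browsers rarely ask for robots.txt, so the agents listed are mostly spiders.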
- Leon Edward
Learn more about SEO (search engine optimization) from Home Business IT at http://www.homebusinessit.com/searchengineoptimization.html
Leon Edward helps people start, build, market and promote internet home businesses and find ways to earn money online at home at http://www.HomeBusinessIT.com. To get your free legitimate internet business kit bonuses and training articles, visit http://www.homebusinessit.com/newsletter/