Some search engines look for a text file named robots.txt in your main directory. If a search engine finds robots.txt, it will analyse its contents. The robots.txt file may be used to tell search engines which directories they should not search.
To allow access to all directories:-
User-agent: *
Disallow:
To disallow access to all directories (note the forward slash):-
User-agent: *
Disallow: /
To exclude specific directories. In this example, three directories (cgi-bin, data and images) are excluded:-
User-agent: *
Disallow: /cgi-bin/
Disallow: /data/
Disallow: /images/
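The rule sets above can be checked locally before the file is uploaded. The following is a minimal sketch using Python's standard urllib.robotparser module. The helper name is_allowed, the agent name ExampleBot and the www.example.com addresses are placeholders chosen for illustration, and a real search engine may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# The three rule sets shown above, as they would appear in robots.txt.
ALLOW_ALL = """User-agent: *
Disallow:
"""

DISALLOW_ALL = """User-agent: *
Disallow: /
"""

EXCLUDE_DIRS = """User-agent: *
Disallow: /cgi-bin/
Disallow: /data/
Disallow: /images/
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if user_agent may fetch url under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # ExampleBot and www.example.com are hypothetical.
    page = "http://www.example.com/index.htm"
    image = "http://www.example.com/images/logo.gif"
    for name, rules in [("allow all", ALLOW_ALL),
                        ("disallow all", DISALLOW_ALL),
                        ("exclude dirs", EXCLUDE_DIRS)]:
        print(name, is_allowed(rules, "ExampleBot", page),
              is_allowed(rules, "ExampleBot", image))
```

Because the rules are passed in as text, nothing is fetched from the network, so a draft robots.txt can be tested before it goes live.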
Place robots.txt in your main directory.
Some search engines send out a robot with its own name, which you can use to stop that particular search engine from searching a web site.
Excluding a file from an individual search engine. For example, for Google:-
Suppose you have a file, keepoutfile.htm, that you do not wish to be indexed by Google. The spider that Google sends out is called 'Googlebot'. You would add these lines to your robots.txt file:
User-Agent: Googlebot
Disallow: /keepoutfile.htm
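To see how a rule aimed at a single robot behaves, the sketch below feeds a Googlebot-only rule to Python's standard urllib.robotparser module. The agent name OtherBot, the helper name can_agent_fetch and the www.example.com address are made up for illustration. This is only one parser's reading of the rules, not a statement of how Google itself evaluates them.

```python
from urllib.robotparser import RobotFileParser

# A rule that applies to Googlebot only; other robots are not mentioned.
GOOGLEBOT_RULES = """User-agent: Googlebot
Disallow: /keepoutfile.htm
"""


def can_agent_fetch(robots_txt, user_agent, url):
    """Return True if user_agent may fetch url under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # www.example.com and OtherBot are hypothetical.
    url = "http://www.example.com/keepoutfile.htm"
    print("Googlebot:", can_agent_fetch(GOOGLEBOT_RULES, "Googlebot", url))
    print("OtherBot:", can_agent_fetch(GOOGLEBOT_RULES, "OtherBot", url))
```

Rules are matched as path prefixes, so a single file is named without a trailing slash. A rule written as Disallow: /keepoutfile.htm/ would only cover paths beneath a directory of that name, not the file itself.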
View the robots.txt file for this site.
Advice: Using robots.txt is recommended. Some search engines may list sites quicker or be able to improve site ranking for sites which use it. Only ever use robots.txt rules that are relevant to your own web pages. They should never be abused or overused. Also see Meta Tag
Don't rely on robots.txt to stop users from viewing information, and don't treat it as a security tool: the file is publicly readable, and robots follow it voluntarily.
Remember to name the file exactly "robots.txt" (all lowercase) and not "robot.txt" in the singular. Place robots.txt in your main directory.
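Since robots look for the file under that exact name in the main directory, its address can be worked out from any page address on the same site. A minimal sketch, assuming Python's standard urllib.parse module; the function name robots_txt_url and the www.example.com addresses are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url):
    """Return the address where robots look for robots.txt on this site."""
    parts = urlsplit(page_url)
    # Keep the scheme and host, replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    # www.example.com is a placeholder host.
    print(robots_txt_url("http://www.example.com/shop/item.htm?id=3"))
```

A file saved as robot.txt, Robots.TXT, or inside a subdirectory such as /shop/robots.txt would not be at this address.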
robots.txt files are generally used to restrict search engine access in some way.
More information:- Robotstxt.org and Database of Web robots, Overview
Google states: Robots Exclusion Protocol: now with even more flexibility. Posted by Dan Crow, Product Manager. This is the third and last in my series of blog posts about the Robots Exclusion Protocol (REP). In the first post, I introduced robots.txt and the robots META tags, giving an overview of when to use them. In the second post, I shared some examples of what you can do with the REP. The REP META tags give you useful control over how each webpage on your site is indexed. But they only work for HTML pages. How can you control access to other types of documents, such as Adobe PDF files, video and audio files and other types? Well, now the same flexibility for specifying per-URL tags is available for all other file types. There are two new features that we have recently added to the protocol. The date and time is specified in the RFC 850 format. Information about the X-Robots-Tag
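As an illustration of per-URL tags for non-HTML files, the sketch below decides which files should be served with an X-Robots-Tag HTTP header. It is only an outline under stated assumptions: the function name extra_headers and the extension list are hypothetical, the noindex value is one commonly documented directive, and how the header is actually attached depends on your web server.

```python
import posixpath

# Hypothetical list of non-HTML file types to keep out of search indexes.
NON_HTML_EXTENSIONS = {".pdf", ".mp3", ".mp4"}


def extra_headers(url_path):
    """Return the extra HTTP response headers to send for url_path."""
    extension = posixpath.splitext(url_path)[1].lower()
    if extension in NON_HTML_EXTENSIONS:
        # Equivalent in spirit to a robots META tag, but sent as a header.
        return {"X-Robots-Tag": "noindex"}
    return {}


if __name__ == "__main__":
    for path in ["/docs/report.pdf", "/index.htm"]:
        print(path, extra_headers(path))
```

HTML pages are left alone here because they can carry the robots META tags directly.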
Google's robots.txt analysis tool, which recognizes sitemap declarations and relative URLs.
Download RobotStats robots.txt statistics
Microsoft Search Engine Optimization Toolkit. The IIS Search Engine Optimization (SEO) Toolkit enables Web developers, hosting providers, and Web server administrators to discover ways to make their site content better optimized for users and search engines, thus helping to improve the site's relevance in search results. The IIS SEO Toolkit includes the Site Analysis tool, the Robots Exclusion Protocol feature, and the Sitemaps and Site Indexes feature, which let you perform detailed analysis of web site content and manage robots.txt and sitemap files.
Download the New IIS SEO Toolkit Beta The IIS Search Engine Optimization (SEO) Toolkit helps Web developers, hosting providers, and Web server administrators to improve their Web site's relevance in search results by recommending how to make the site content more search engine-friendly. The IIS SEO Toolkit includes the Site Analysis module, the Robots Exclusion module, and the Sitemaps and Site Indexes module, which let you perform detailed analysis and offer recommendations and editing tools for managing your Robots and Sitemaps files. Also view Anti-POP-UP & Toolbars
The Web Robots Pages List of robots and protocols for setting up a robots.txt file.
Sim Spider Search Engine Robot Simulator Search Engine World has a spider that simulates what the Search Engine robots read from your website.
Robots tracking and statistics
Database of Web Robots, Overview
.htaccess Files and .htaccess Help. How to use .htaccess
Apache open-source software and Apache Servers. Mod Rewrite
Also view Follow, Nofollow. Index, Noindex, Meta Tags, Google Site Map, Site Maps and Content Doorway Pages