Control which of your pages are NOT indexed with a robots.txt file
You should add a robots.txt file to the root directory of every web site you create. It helps control indexing of your site by robots that ignore the META name="ROBOTS" content="NOINDEX, NOFOLLOW" convention. In this file you specifically list the folders and pages that you DO NOT want crawled and indexed. This is very easy to do. Use a robots.txt file to list folders that contain only images, folders that you have password protected, or folders that contain sensitive or personal information you do not want available to the public through search engines. Keep in mind that the robots.txt file is itself publicly readable and only well-behaved robots obey it, so it is a request to robots and not a form of access control.
Create a new plain-text file with Notepad and call it robots.txt.
The two statements used in a robots.txt file are User-agent: and Disallow:
User-agent: *
By using the * wild card you are addressing ALL robots. If you wish to address only individual robots, you need to list each one by its specific user-agent name (robots.txt has no way to address a robot by IP address). Using the * wild card to address them all is the safest way. To address individual robots you must use a separate group of User-agent: and Disallow: statements for each named spider, with the groups separated by a blank line.
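As a rough illustration of how named groups and the wild-card group interact, here is a minimal sketch using Python's standard-library urllib.robotparser. The robot names ExampleBot and SomeOtherBot and the folder names are hypothetical, chosen only for this example, and real crawlers may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one group for a named robot, one for everyone else.
# The two groups are separated by a blank line.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /popup/

User-agent: *
Disallow: /backup/
"""


def build_parser(text):
    """Parse robots.txt text held in a string (no network access needed)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# The named robot follows only its own group.
named_popup = parser.can_fetch("ExampleBot", "/popup/index.html")    # False
named_backup = parser.can_fetch("ExampleBot", "/backup/index.html")  # True

# Every other robot falls back to the * group.
other_popup = parser.can_fetch("SomeOtherBot", "/popup/index.html")    # True
other_backup = parser.can_fetch("SomeOtherBot", "/backup/index.html")  # False

print(named_popup, named_backup, other_popup, other_backup)
```

Note that in this sketch a robot with its own named group does not also inherit the rules from the * group, which is one more reason the wild card alone is the simplest choice.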
Disallow: /folder_name/ (or Disallow: /folder_name/page_name.html for a specific web page)
List any folder (or specific folder/pages) that you do not want to have indexed by robots. Use a separate Disallow: statement for each folder or folder/page.
WARNING: Disallow: / used without any folder name tells robots NOT to index any page of the web site. Even if you submit your site to search engines, a robots.txt file set to User-agent: * and Disallow: / means that search engines which honor robots.txt will not index your site.
All of the pages of this site are listed as DO NOT INDEX via a robots.txt file using User-agent: * and Disallow: / as well as listing each folder individually. Every page also has a NOINDEX, NOFOLLOW robots META tag. The only way you will find this site is from the NetObjects newsgroups or if I give you a link (unless you bookmark it). It will NOT come up on any search engine. If you use the User-agent: * and Disallow: / statements in this same manner, your site will never be found! Be very careful here.
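To see the difference between blocking the whole site and blocking a single folder, the following minimal sketch checks two small rule sets with Python's standard-library urllib.robotparser. The robot name AnyBot and the page paths are made up for illustration.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt, path, agent="AnyBot"):
    """Return True if the given robots.txt text lets `agent` fetch `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


# Disallow: / with no folder name blocks the entire site.
BLOCK_ALL = "User-agent: *\nDisallow: /\n"

# Disallow: /backup/ blocks only that folder.
BLOCK_ONE = "User-agent: *\nDisallow: /backup/\n"

block_all_home = allowed(BLOCK_ALL, "/")                 # False
block_all_page = allowed(BLOCK_ALL, "/html/index.html")  # False
block_one_home = allowed(BLOCK_ONE, "/")                 # True
block_one_backup = allowed(BLOCK_ONE, "/backup/old.html")  # False

print(block_all_home, block_all_page, block_one_home, block_one_backup)
```

Running a quick check like this before publishing is one way to catch an accidental Disallow: / line.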
Below are some examples of folders that would relate to this web site.
Disallow: /backup/
Disallow: /metatags/
Disallow: /popup/
Disallow: /popup/pop-blur-close/
Disallow: /popup/pop-auto-close/
Disallow: /popup/onclick/
Disallow: /popup/pop-top/
Disallow: /popup/pop-maxup/
Disallow: /404trap/
Disallow: /misc/
Disallow: /misc/404redirect
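Disallow values are matched as path prefixes, so a folder rule generally covers its subfolders too. The sketch below, which uses Python's standard-library urllib.robotparser and a hypothetical robot name AnyBot, checks a few paths against two of the rules listed above. The page names are invented for the example.

```python
from urllib.robotparser import RobotFileParser

# Two rules taken from the list above.
RULES = """\
User-agent: *
Disallow: /popup/
Disallow: /misc/404redirect
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

# Hypothetical page paths to test against the rules.
paths = [
    "/popup/index.html",
    "/popup/pop-top/index.html",    # subfolder of /popup/
    "/misc/404redirect/index.html",
    "/misc/other.html",
    "/index.html",
]

# Map each path to True (robot may fetch it) or False (blocked).
checks = {path: parser.can_fetch("AnyBot", path) for path in paths}

for path, ok in checks.items():
    print(path, "allowed" if ok else "blocked")
```

In this sketch the /popup/ rule already blocks /popup/pop-top/, so listing each subfolder separately is harmless but not strictly required.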
Comments can be placed in a robots.txt file by starting the line with #
Download this example robots.txt file.
###############################
#
# robots.txt file for this web site
#
# address all robots using wild card *
#
User-agent: *
#
#
# list folders robots are not allowed to index
#
#
Disallow: /images/
Disallow: /password/
Disallow: /personal_info/
Disallow: /backup/import-template/
Disallow: /backup/export-template/
Disallow: /misc/404redirect/
Disallow: /comment-form/
#
#
# List specific files to exclude from index
#
Disallow: /nutsNbolts/turtle_without_clothes.html
#
# End of robots.txt file
#
###############################
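If you maintain the list of excluded folders elsewhere, a file like the one above can also be generated by a small script. This is only a sketch: the helper name build_robots_txt is hypothetical, and the folder and file names are borrowed from the example file. The result is checked with Python's standard-library urllib.robotparser.

```python
from urllib.robotparser import RobotFileParser


def build_robots_txt(folders, files=()):
    """Build robots.txt text that blocks the given folders and files for all robots.

    Hypothetical helper for illustration; names are normalized so every
    folder rule has a leading and a trailing slash.
    """
    lines = ["# robots.txt generated by a helper script", "User-agent: *"]
    for folder in folders:
        lines.append("Disallow: /" + folder.strip("/") + "/")
    for page in files:
        lines.append("Disallow: /" + page.lstrip("/"))
    return "\n".join(lines) + "\n"


text = build_robots_txt(
    ["images", "password", "personal_info"],
    ["nutsNbolts/turtle_without_clothes.html"],
)
print(text)

# Sanity-check the generated rules before uploading the file.
parser = RobotFileParser()
parser.parse(text.splitlines())
image_ok = parser.can_fetch("AnyBot", "/images/logo.gif")
page_ok = parser.can_fetch("AnyBot", "/nutsNbolts/turtle_without_clothes.html")
index_ok = parser.can_fetch("AnyBot", "/nutsNbolts/index.html")
```

The generated text can then be saved as robots.txt and added to the site like any hand-written file.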
To get Fusion to upload your robots.txt file:
- In Assets view, from the menu select Assets > New File Asset
- Enter the name of the robots.txt file as the asset name, then browse to where you have this file stored and select it. Check the box Always Publish File.
Fusion will always keep a current version of your robots.txt file in your web site root folder for you.
This page was written by and is maintained by turtle