GotFusion.com Where your journey begins
 

 


 

Control which of your pages are NOT indexed
with a robots.txt file

You should add a robots.txt file to the root directory of every web site you create to help control the indexing of your site by robots that ignore the <META name="ROBOTS" content="NOINDEX, NOFOLLOW"> convention. In this file you specifically list any pages that you DO NOT want walked and indexed. This is very easy to do, and every site should have a robots.txt file in its root directory. Use a robots.txt file to list folders that contain only images, folders that you have password protected, or folders that contain sensitive or personal information you do not want available to the public through search engines. Keep in mind that robots.txt is only a request that well-behaved robots honor, and the file itself can be read by anyone, so it does not replace password protection for sensitive folders.

Create a new plain-text file with Notepad and name it robots.txt (all lowercase).

The two conventions used in a robots.txt file are: User-agent: and Disallow:

User-agent: *

By using the * wild card you are addressing ALL robots. If you wish to address only individual robots, you need to list each one by its specific name on its own User-agent: line (robots are matched by name, not by IP address). Addressing them all with the * wild card is the safest way. To address individual robots you must use a separate group of User-agent: and Disallow: statements for each named spider, with the groups separated by a blank line.
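As a rough illustration of how separate User-agent: groups are read, here is a small sketch using Python's standard-library urllib.robotparser. The robot name ExampleBot and the /private/ folder are made up for the example, and this parser is only one interpretation of the rules; real search engine robots may differ in the details.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one group for a robot named "ExampleBot"
# and one group for all other robots. A blank line separates the groups.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /private/

User-agent: *
Disallow: /backup/
"""


def load_rules(text):
    """Parse robots.txt text and return a ready-to-query parser."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = load_rules(ROBOTS_TXT)

# ExampleBot follows only its own group.
print(parser.can_fetch("ExampleBot", "/private/page.html"))  # False
print(parser.can_fetch("ExampleBot", "/backup/page.html"))   # True

# Any other robot falls back to the * group.
print(parser.can_fetch("OtherBot", "/backup/page.html"))     # False
print(parser.can_fetch("OtherBot", "/private/page.html"))    # True
```

Notice that a named robot obeys only its own group, so anything you want hidden from every robot has to be repeated in each group.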

Disallow: /folder_name/ (or Disallow: /folder_name/page_name.html for a specific web page)

List any folder (or specific folder/page) that you do not want indexed by robots. Use a separate Disallow: statement for each folder or folder/page.
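To see the difference between disallowing a whole folder and disallowing a single page, the following sketch checks a few paths with Python's standard-library urllib.robotparser. The folder and page names are placeholders, not real paths on any site.

```python
from urllib.robotparser import RobotFileParser

# Placeholder names: one whole folder and one specific page are disallowed.
ROBOTS_TXT = """\
User-agent: *
Disallow: /folder_name/
Disallow: /other_folder/page_name.html
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Everything inside the disallowed folder is off limits.
print(parser.can_fetch("AnyRobot", "/folder_name/anything.html"))       # False

# Only the one listed page in the other folder is off limits.
print(parser.can_fetch("AnyRobot", "/other_folder/page_name.html"))     # False
print(parser.can_fetch("AnyRobot", "/other_folder/another_page.html"))  # True

# Pages that are not listed at all stay open to robots.
print(parser.can_fetch("AnyRobot", "/index.html"))                      # True
```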

WARNING: Disallow: / used without any folder name tells the robot DO NOT index any page of this web site. Even if you submit your site to search engines, if your robots.txt file is set to User-agent: * followed by Disallow: / your site will NEVER be indexed by robots that obey the file.

All of the pages of this site are listed as DO NOT INDEX via a robots.txt file using User-agent: * and Disallow: /, as well as listing each folder individually. Every page also has a NOINDEX, NOFOLLOW robots META tag. The only way you will find this site is from the NetObjects newsgroups or if I give you a link (unless you bookmark it). It will NOT come up on ANY search engine. If you use the User-agent: * and Disallow: / statements in this same manner, your site will never be found. Be very careful here.
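The effect of that single slash can be checked with a short sketch using Python's standard-library urllib.robotparser. The page paths are only illustrative. For contrast, the sketch also parses a Disallow: line with nothing after it, which under the robots.txt convention blocks nothing at all.

```python
from urllib.robotparser import RobotFileParser


def load_rules(text):
    """Parse robots.txt text and return a ready-to-query parser."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


# "Disallow: /" matches every path on the site.
block_all = load_rules("User-agent: *\nDisallow: /\n")

# "Disallow:" with an empty value matches nothing (shown for contrast).
block_none = load_rules("User-agent: *\nDisallow:\n")

# Illustrative page paths.
pages = ["/", "/index.html", "/popup/pop-top/index.html"]

blocked = [block_all.can_fetch("AnyRobot", page) for page in pages]
open_to_all = [block_none.can_fetch("AnyRobot", page) for page in pages]

print(blocked)      # [False, False, False]
print(open_to_all)  # [True, True, True]
```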

Below are some examples of folders that would relate to this web site.

Disallow: /backup/
Disallow: /metatags/
Disallow: /popup/
Disallow: /popup/pop-blur-close/
Disallow: /popup/pop-auto-close/
Disallow: /popup/onclick/
Disallow: /popup/pop-top/
Disallow: /popup/pop-maxup/
Disallow: /404trap/
Disallow: /misc/
Disallow: /misc/404redirect
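Disallow: values are matched as path prefixes, so a line such as Disallow: /popup/ already covers every sub-folder beneath it. Listing the sub-folders as well, as above, does no harm. The sketch below checks this with Python's standard-library urllib.robotparser, using only the two parent folders; the page names inside the folders are invented for the example.

```python
from urllib.robotparser import RobotFileParser

# Only the parent folders are listed here, not their sub-folders.
SHORT_RULES = """\
User-agent: *
Disallow: /popup/
Disallow: /misc/
"""

parser = RobotFileParser()
parser.parse(SHORT_RULES.splitlines())

# Invented page names inside folders from the list above.
print(parser.can_fetch("AnyRobot", "/popup/pop-top/index.html"))     # False
print(parser.can_fetch("AnyRobot", "/misc/404redirect/index.html"))  # False

# /metatags/ is not in SHORT_RULES, so it is still open to robots.
print(parser.can_fetch("AnyRobot", "/metatags/index.html"))          # True
```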

Comments can be placed in a robots.txt file by starting the line with #.

Example robots.txt file:

###############################
#
#            robots.txt file for this web site
#
#          address all robots using wild card *
#
User-agent: *
#
#
#      list folders robots are not allowed to index
#
#
Disallow: /images/
Disallow: /password/
Disallow: /personal_info/
Disallow: /backup/import-template/
Disallow: /backup/export-template/
Disallow: /misc/404redirect/
Disallow: /comment-form/
#
#
#        List specific files to exclude from index
#
Disallow: /nutsNbolts/turtle_without_clothes.html
#
#                    End of robots.txt file
#
###############################
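Before publishing a file like the one above, it can be worth checking that it blocks what you expect. Here is a minimal sketch using Python's standard-library urllib.robotparser on a shortened copy of the example. The specific pages tested (logo.gif, index.html) are assumptions made up for the check, and real robots may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# Shortened copy of the example file, comment lines included.
EXAMPLE_FILE = """\
###############################
#      robots.txt file for this web site
#
User-agent: *
#
#      list folders robots are not allowed to index
#
Disallow: /images/
Disallow: /password/
Disallow: /personal_info/
Disallow: /backup/import-template/
Disallow: /backup/export-template/
#
#      List specific files to exclude from index
#
Disallow: /nutsNbolts/turtle_without_clothes.html
#
###############################
"""

parser = RobotFileParser()
parser.parse(EXAMPLE_FILE.splitlines())

# Made-up page names used only for this check.
checks = [
    "/images/logo.gif",                         # inside a disallowed folder
    "/nutsNbolts/turtle_without_clothes.html",  # specifically disallowed file
    "/nutsNbolts/index.html",                   # same folder, not listed
    "/backup/index.html",                       # only two sub-folders are listed
]

for path in checks:
    verdict = "allowed" if parser.can_fetch("AnyRobot", path) else "blocked"
    print(f"{path}: {verdict}")
```

The comment lines are skipped by the parser, so they can sit between the User-agent: line and the Disallow: lines. A completely empty line in that position would end the group in this parser, so it is safer to use # lines for spacing, as the example does.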

To get Fusion to upload your robots.txt file:

  1. In Assets view, from the menu select Assets > New File Asset



  2. Enter robots.txt as the asset name, then browse to where you have this file stored and select it. Check the Always Publish File box.


Fusion will always keep a current version of your robots.txt file in your web site root folder for you.



Additional resources with more information about the robots.txt files

http://www.robotstxt.org/wc/robots.html

http://spider-food.net/handling-robots-b.html

http://www.searchtools.com/robots/robots-txt.html

http://www.searchengineworld.com/robots/robots_tutorial.htm

http://www.searchengineguide.com/1stsearchranking/2001/robots.html



All content copyright © 2002, 2003 gotFusion LLC.  The name gotFusion and the gotFusion ® logo are registered trademarks of gotFusion LLC