Prevent search engines from indexing your websites
Overview
Web robots, also known as web wanderers, crawlers, or spiders, are programs that traverse the web automatically. Search engines such as Google or Yahoo use them to index the content of your site. However, they can also be misused: spammers, for example, use them to scan for email addresses. A robots.txt file tells robots that visit your sites how you wish them to behave. Compliance is voluntary, so well-behaved robots follow these rules while malicious ones may ignore them.
Instructions
First, create a robots.txt document using a plain text editor. Then upload it to the webroot of your site. For details on all the rules you can create, please visit: http://www.robotstxt.org/
The following is an example robots.txt file which you are free to use. You will need to upload this file to your webroot, such as /home/00000/domains/example.com/html/ or /var/www/vhosts/example.com/httpdocs/. Remember to remove the # sign from any command you wish the robots to follow, but be sure not to uncomment the command's description.
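If you would rather generate the file than type it by hand, the step of creating robots.txt can be sketched in Python as below. This is only a minimal sketch: the function name write_robots_txt and the single /private/ rule are assumptions chosen for illustration, and the script writes to a local folder that you would still upload to your webroot yourself.

```python
from pathlib import Path

# Hypothetical rule set for illustration: keep every robot out of /private/.
ROBOTS_TXT = (
    "User-agent: *\n"
    "Disallow: /private/\n"
)


def write_robots_txt(directory: str, content: str = ROBOTS_TXT) -> Path:
    """Write robots.txt into a local directory and return the file's path."""
    target = Path(directory) / "robots.txt"
    # newline="\n" keeps Unix line endings regardless of the local platform.
    with open(target, "w", encoding="utf-8", newline="\n") as handle:
        handle.write(content)
    return target


if __name__ == "__main__":
    # Writes ./robots.txt, ready to be uploaded to the webroot.
    print(write_robots_txt("."))
```

The file name must be exactly robots.txt, in lowercase, because robots request it by that name from the top level of your site.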
# Example robots.txt from (mt) Media Temple
# Learn more at http://wiki.mediatemple.net
# (mt) Forums - http://wiki.mediatemple.net/w/MT:Join_User_Forums
# (mt) System Status - http://status.mediatemple.net
# (mt) Statement of Support - http://mediatemple.net/support/statement/
# How do I check that my robots.txt file is working as expected?
# http://www.google.com/support/webmasters/bin/answer.py?answer=35237
# For a list of Robots please visit: http://www.robotstxt.org/db.html
# Instructions
# Remove the "#" to uncomment any line that you wish to use, but be sure not to uncomment the Description.
# Grant Robots Access
#######################################################################################
# This example allows all robots to visit all files because the wildcard "*" specifies all robots:
#User-agent: *
#Disallow:
# To allow a single robot and keep all others out, you would use the following:
#User-agent: Googlebot
#Disallow:
#User-agent: *
#Disallow: /
# Deny Robots Access
#######################################################################################
# This example keeps all robots out:
#User-agent: *
#Disallow: /
# The next is an example that tells all crawlers not to enter into four directories of a website:
#User-agent: *
#Disallow: /cgi-bin/
#Disallow: /images/
#Disallow: /tmp/
#Disallow: /private/
# Example that tells a specific crawler not to enter one specific directory:
#User-agent: BadBot
#Disallow: /private/
# Example that tells all crawlers not to fetch one specific file called foo.html in the webroot:
#User-agent: *
#Disallow: /foo.html
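Before uploading, you may want to confirm that your uncommented rules behave as intended. The sketch below uses Python's standard urllib.robotparser module to evaluate two of the examples above with the leading "#" removed. The helper parse_rules and the robot names AnyBot and OtherBot are assumptions for illustration, and the single-robot example assumes the robot is Google's crawler, whose user-agent token is Googlebot. Keep in mind that Python's parser is one interpretation of the rules, so individual crawlers may differ in edge cases.

```python
from urllib.robotparser import RobotFileParser


def parse_rules(text: str) -> RobotFileParser:
    """Build a parser from robots.txt text without any network access."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


# "Keep all crawlers out of four directories" example, uncommented.
DENY_DIRS = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /images/
Disallow: /tmp/
Disallow: /private/
"""

# "Allow a single robot" example, uncommented (assumes Googlebot).
SINGLE_ROBOT = """\
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
"""

dirs = parse_rules(DENY_DIRS)
print(dirs.can_fetch("AnyBot", "http://example.com/index.html"))      # True
print(dirs.can_fetch("AnyBot", "http://example.com/private/a.html"))  # False

single = parse_rules(SINGLE_ROBOT)
print(single.can_fetch("Googlebot", "http://example.com/index.html"))  # True
print(single.can_fetch("OtherBot", "http://example.com/index.html"))   # False
```

Note that the paths in Disallow lines are URL paths relative to the top of your site, not filesystem paths on the server.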