Prohibiting the Indexing of Pages / Directories Through robots.txt

When a search robot visits a website, the first thing it looks for is the robots.txt file. This text file is located in the site’s root directory, alongside the main index file. For the main site/domain, this folder is called public_html. The robots.txt file contains direct instructions for search robots.

These instructions can prohibit the indexing of a folder or a page and point the robot to the main website mirror. They can also recommend that the robot observe a specific time interval between requests while indexing the site, and much more.

If there is no robots.txt file in the website’s root directory, you can create it. To disable site indexing with robots.txt, two directives are used: User-agent and Disallow.

  • User-agent: SPECIFY_SEARCH_BOT
  • Disallow: / # prohibits indexing of the entire website
  • Disallow: /page/ # prohibits indexing of /page/ only
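To see how a crawler might interpret these two directives, the rules can be fed to the robots.txt parser in Python's standard library. This is only a minimal sketch: the helper name is_allowed and the example.com URLs are assumptions made for illustration, and real search engines may apply their own matching rules.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given rules let user_agent fetch url.

    is_allowed is a hypothetical helper, not part of any search engine's tooling.
    """
    parser = RobotFileParser()
    # parse() expects the file content as a list of lines.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# User-agent is spelled with a hyphen; both lines belong to one group.
RULES = """\
User-agent: MSNBot
Disallow: /page/
"""

# MSNBot is blocked from /page/ but not from the rest of the site.
print(is_allowed(RULES, "MSNBot", "https://example.com/page/about.html"))
print(is_allowed(RULES, "MSNBot", "https://example.com/index.html"))
# A bot that no group mentions is not restricted by these rules.
print(is_allowed(RULES, "Googlebot", "https://example.com/page/about.html"))
```

Checking rules this way before uploading the file can catch a typo in a directive name or path before any crawler reads it.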

For example: 

To prevent your website from being indexed by MSNbot


User-agent: MSNBot
Disallow: /


To prevent your website from being indexed by Yahoo Bot


User-agent: Slurp
Disallow: /


To prevent your website from being indexed by Yandex Bot


User-agent: Yandex
Disallow: /


To prevent your website from being indexed by Google Bot


User-agent: Googlebot
Disallow: /


To prevent your website from being indexed by all search engines


User-agent: *
Disallow: /


To disable indexing of the cgi-bin and image folders for every search engine


User-agent: *
Disallow: /cgi-bin/
Disallow: /images/
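Because every group above follows the same pattern (one User-agent line followed by one or more Disallow lines), the file can also be generated rather than typed by hand. The sketch below assumes a hypothetical helper, build_robots_txt, that takes a mapping from bot name to disallowed paths; it is an illustration, not a tool provided by any hosting panel.

```python
def build_robots_txt(groups: dict[str, list[str]]) -> str:
    """Render {user_agent: [disallowed paths]} as robots.txt text.

    build_robots_txt is a hypothetical helper written for this example.
    """
    blocks = []
    for user_agent, paths in groups.items():
        lines = [f"User-agent: {user_agent}"]
        # An empty path list becomes a bare "Disallow:", which allows everything.
        lines += [f"Disallow: {path}".rstrip() for path in (paths or [""])]
        blocks.append("\n".join(lines))
    # Groups are separated by one blank line; lines inside a group stay adjacent.
    return "\n\n".join(blocks) + "\n"


# Block the cgi-bin and images folders for every search engine.
print(build_robots_txt({"*": ["/cgi-bin/", "/images/"]}))
```

The returned string can then be saved as robots.txt in the site's root directory.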


To allow search engines to index every page of the website, use the instruction below. Note that an empty robots.txt file is equivalent to it.


User-agent: *
Disallow:
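The equivalence between an empty file and an empty Disallow value can be checked with Python's standard robots.txt parser. This is a rough sketch: the bot name AnyBot, the example.com URL, and the helper allows_everything are made up for the test, and the result reflects this parser's behavior rather than any particular search engine's.

```python
from urllib.robotparser import RobotFileParser


def allows_everything(robots_txt: str, url: str) -> bool:
    """Return True if a generic bot may fetch url under the given rules.

    allows_everything and the bot name "AnyBot" are hypothetical.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("AnyBot", url)


URL = "https://example.com/any/page.html"  # placeholder address

EMPTY_FILE = ""
ALLOW_ALL = "User-agent: *\nDisallow:\n"
BLOCK_ALL = "User-agent: *\nDisallow: /\n"

print(allows_everything(EMPTY_FILE, URL))  # empty file: nothing is prohibited
print(allows_everything(ALLOW_ALL, URL))   # empty Disallow: nothing is prohibited
print(allows_everything(BLOCK_ALL, URL))   # "Disallow: /": everything is prohibited
```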


For example: 


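Use the following lines to allow only the Yandex, Google, and Rambler bots to index the website, with a delay of 4 seconds between page requests. Note that Crawl-delay is a non-standard directive and not every crawler honors it; Googlebot, for instance, ignores it.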


User-agent: *
Disallow: /

User-agent: Yandex
Crawl-delay: 4
Disallow:

User-agent: Googlebot
Crawl-delay: 4
Disallow:

User-agent: StackRambler
Crawl-delay: 4
Disallow:
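A combined file like this one is easy to get wrong, so it may help to check it with Python's standard robots.txt parser, which also exposes the Crawl-delay value. The sketch below is written under a few assumptions: the example.com URL and the bot name SomeOtherBot are placeholders, and the parser reports only what the file says, not whether a given crawler actually honors the delay.

```python
from urllib.robotparser import RobotFileParser

# Same policy as above: block everyone, then allow three named bots.
RULES = """\
User-agent: *
Disallow: /

User-agent: Yandex
Crawl-delay: 4
Disallow:

User-agent: Googlebot
Crawl-delay: 4
Disallow:

User-agent: StackRambler
Crawl-delay: 4
Disallow:
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

URL = "https://example.com/catalog/item.html"  # placeholder address

# SomeOtherBot is a made-up name standing in for any bot not listed.
for bot in ("Yandex", "Googlebot", "StackRambler", "SomeOtherBot"):
    allowed = parser.can_fetch(bot, URL)
    delay = parser.crawl_delay(bot)  # None when the matching group sets no delay
    print(f"{bot}: allowed={allowed}, crawl_delay={delay}")
```

The three named bots should come back as allowed with a delay of 4, while any other bot falls through to the catch-all group and is refused.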


