Managing the Indexing of a Website’s Content with a Robots.txt File | Tutorial

The Robots Exclusion Standard, also known as the robots.txt protocol, advises web crawlers, spiders, and bots on which content of a website they may crawl. Website owners use the robots.txt file to tell web crawlers what to crawl and what to skip. Web crawlers are automated programs that fetch pages and keep the search engines’ databases up to date. These robots play an important role in indexing the websites that appear when you search through a search engine.

Are all robots the same?

No, not all robots are the same. The specific behavior of a web crawler differs from one search engine to another. In fact, a crawler’s algorithm is one of the secrets behind a search engine’s popularity, which is part of why many of us turn to Google first.

How to create a robots.txt file?

By using a robots.txt file, the website owner can allow or block specific content of the website, thereby optimizing the site for quality results in search engines.

Generally, a robots.txt file is written using two directives:
  1. User-agent
  2. Disallow
User-agent specifies which web crawlers or spiders (the robots) the rules that follow apply to.
Disallow specifies the content that the robots named in the User-agent line must not crawl.

You can use Allow to specify content that search engine bots are permitted to crawl and index. But remember that the ‘Allow’ directive is recognized only by some bots, not all. So it’s better to stick to ‘User-agent’ and ‘Disallow’ in your robots.txt file. Take a look at the following examples of how to create a robots.txt file.

Open Notepad and type the following, adapted to your needs:

The following code allows all robots to crawl all content on the website:
User-agent: *
Disallow:
The following code allows no robots to crawl and index any content:
User-agent: *
Disallow: /
The following code blocks some parts of the website for all robots:
User-agent: *
Disallow: /admin
Disallow: /temp/example.html
Disallow: /search
The following code blocks only a single bot:
User-agent: Bad-bot
Disallow: /
Note: ‘Bad-bot’ is not an actual bot. It is only an example name.

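The following code allows only a single bot (Google’s crawler identifies itself as Googlebot) and blocks all others:
User-agent: Googlebot
Disallow:
User-agent: *
Disallow: /
Save the file in the default .txt format, with the name robots.txt.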

In this manner, you can create a robots.txt file that allows or blocks specific content, so that your website shows up the way you want in search results.
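Before uploading the file, it can help to check that the rules do what you expect. The following is a minimal sketch in Python, using the standard library’s urllib.robotparser module. The rule text is modelled on the examples above, and the helper names (load_rules, is_allowed), the bot name AnyBot, and the site www.example.com are placeholders chosen for illustration. Keep in mind that this shows how Python’s parser reads the rules; an individual search engine’s crawler may interpret edge cases differently.

```python
from urllib.robotparser import RobotFileParser

# Rules modelled on the examples above; www.example.com is a placeholder site.
PARTIAL_BLOCK = """\
User-agent: *
Disallow: /admin
Disallow: /temp/example.html
Disallow: /search
"""

SINGLE_BOT_BLOCK = """\
User-agent: Bad-bot
Disallow: /
"""


def load_rules(robots_txt):
    """Parse robots.txt text (no network access) and return the parser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_allowed(robots_txt, user_agent, url):
    """Return True if user_agent may fetch url under the given rules."""
    return load_rules(robots_txt).can_fetch(user_agent, url)


if __name__ == "__main__":
    site = "http://www.example.com"
    # Disallow values are matched as path prefixes, so /search also
    # covers /search?max-results=5 and /admin covers /admin/login.
    for path in ("/index.html", "/admin/login", "/search?max-results=5"):
        print(path, is_allowed(PARTIAL_BLOCK, "AnyBot", site + path))
    # Only the named bot is blocked; every other bot stays allowed.
    for bot in ("Bad-bot", "AnyBot"):
        print(bot, is_allowed(SINGLE_BOT_BLOCK, bot, site + "/"))
```

Because Disallow values are matched as prefixes of the URL path, a short rule such as /admin also covers everything beneath it, which is worth remembering when you choose what to block.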

How to upload the robots.txt file to your website?

The robots.txt file has to be uploaded to the root directory of the website. That means that, after uploading, the file should be accessible at your website’s www.example.com/robots.txt URL.
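Since crawlers only look for the file at the root of the site, the robots.txt address can be worked out from any page URL. Here is a small Python sketch of that idea; the function name robots_txt_url is made up for this example, and it assumes you pass a full URL that includes http:// or https://.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url):
    """Return the root-level robots.txt URL for the site hosting page_url."""
    parts = urlsplit(page_url)
    if not parts.scheme or not parts.netloc:
        # Without a scheme, urlsplit cannot tell the host from the path.
        raise ValueError("expected a full URL such as http://www.example.com/page")
    # Keep the scheme and host; drop the page path, query, and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    print(robots_txt_url("http://www.example.com/blog/post.html?x=1"))
```

If opening the resulting address in your browser shows the rules you saved, the file is in the right place.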

How to upload the robots.txt file to your Blogger blog?

Unlike WordPress, Blogger doesn’t offer FTP access. But Blogger lets you write the robots.txt file through its settings. To add your robots.txt file to a Blogger blog:
  1. Log in to blogger.com and go to your blog’s dashboard.
  2. Go to Settings > Search preferences.
  3. Under the Crawling and Indexing settings, enable the robots.txt setting and copy-paste the code from the .txt file you created earlier.
  4. Then go to Custom Robot Header Tags, enable it, and select the options as shown in the screenshot below.
[Screenshot: robots.txt settings in Blogger]
You can even allow or block content by using patterns in the robots.txt file. I’ll cover advanced robots.txt coding in another article. Subscribe to get notified when I’m back with more info. Feel free to comment if you have any problem or need any advice in creating the robots.txt file.

2 Replies

  1. Thank you, this cleared up the concept of the robots file and what goes in it, but I still have a small problem.

    See this link:

    http://www.hackbook.biz/search?updated-max=2013-12-27T22:54:00-08:00&max-results=5

    As you click on older posts, the links keep expanding or changing. How do I block them from being indexed?

    And is there a need to block them?

    1. There is no need to block search pages. Google's Hummingbird algorithm takes better care of such things now. If you want to, you can add 'Disallow: /search' to your robots.txt file (Disallow takes a path on your site, not the full domain). To check whether a link is indexed, search Google for 'site:' followed by your URL. I found no search pages when I searched for your site in Google. Hope this helps you understand!
