#Robots.txt ~ Digital Marketing in Chandigarh


Monday 3 October 2016

Robots.txt


What is robots.txt?

The Robots Exclusion Protocol (REP) is a group of web standards that regulate web robot behavior and search engine indexing.


The robots.txt file is a plain text file placed at the root of a website. It tells crawlers which pages of the website they may access and which they may not. We can see a website's robots file by typing the website name followed by /robots.txt, for example http://www.example.com/robots.txt.
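As a small illustration of where the file lives, here is a minimal Python sketch that builds the robots.txt address for any page of a site. The helper name robots_url and the sample page URL are made up for this example.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL for the site that serves page_url."""
    parts = urlsplit(page_url)
    # robots.txt sits at the root of the host, whatever page we start from,
    # so the path is replaced and the query and fragment are dropped.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("http://www.example.com/blog/post.html?id=7"))
# prints http://www.example.com/robots.txt
```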




We can decide which content to allow or disallow for Google and other crawlers. The most common rules are written as follows:

User-agent: *
Disallow: /
This blocks all web crawlers from all content.
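One way to check a rule like this before publishing it is Python's standard urllib.robotparser module. The sketch below is only an illustration: it writes the field name with a hyphen (User-agent), and the crawler name "SomeOtherBot" and the page URLs are invented for the test.

```python
from urllib.robotparser import RobotFileParser

# The "block everything" rule, held in a string instead of a real file.
rules = """\
User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Every crawler is refused every URL on the site.
print(parser.can_fetch("Googlebot", "http://www.example.com/"))                   # False
print(parser.can_fetch("SomeOtherBot", "http://www.example.com/any/page.html"))  # False
```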

User-agent: Googlebot
Disallow: /no-google/
This blocks a specific web crawler (Googlebot) from a specific folder.


User-agent: Googlebot
Disallow: /no-google/blocked-page.html
This blocks a specific web crawler from a specific web page.
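The folder rule and the page rule can be compared side by side with the same standard-library parser. This is a rough sketch under a few assumptions: the folder is written with a hyphen (/no-google/) because URL paths normally contain no spaces, the helper allowed is a made-up name, and "Bingbot" simply stands in for any crawler other than Googlebot.

```python
from urllib.robotparser import RobotFileParser


def allowed(rules: str, agent: str, url: str) -> bool:
    """Parse robots.txt text and report whether agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, url)


# Blocks Googlebot from a whole folder.
FOLDER_RULE = """\
User-agent: Googlebot
Disallow: /no-google/
"""

# Blocks Googlebot from a single page only.
PAGE_RULE = """\
User-agent: Googlebot
Disallow: /no-google/blocked-page.html
"""

SITE = "http://www.example.com"

print(allowed(FOLDER_RULE, "Googlebot", SITE + "/no-google/any-page.html"))    # False
print(allowed(FOLDER_RULE, "Bingbot", SITE + "/no-google/any-page.html"))      # True
print(allowed(PAGE_RULE, "Googlebot", SITE + "/no-google/blocked-page.html"))  # False
print(allowed(PAGE_RULE, "Googlebot", SITE + "/no-google/other-page.html"))    # True
```

The second line prints True because the rule names only Googlebot, so other crawlers are not affected by it.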

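Sitemap Parameter
User-agent: *
Disallow:
Sitemap: http://www.example.com/none-standard-location/sitemap.xml
Here the empty Disallow line allows all crawlers to access everything, and the Sitemap line tells them where the XML sitemap is located.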
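To see how a crawler-side tool might read the sitemap entry, here is a short sketch using urllib.robotparser from the Python standard library (its site_maps() method needs Python 3.8 or newer). The page URL passed to can_fetch is invented for the example.

```python
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: *
Disallow:
Sitemap: http://www.example.com/none-standard-location/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# The Sitemap line is collected separately from the allow/disallow rules.
print(parser.site_maps())
# ['http://www.example.com/none-standard-location/sitemap.xml']

# An empty Disallow value blocks nothing, so any page may be fetched.
print(parser.can_fetch("Googlebot", "http://www.example.com/any/page.html"))  # True
```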

Important Rules:

  • In most cases, a meta robots tag with the parameters "noindex, follow" should be used when the goal is to keep a page out of the search index.
  • It is to be noted that malicious crawlers completely ignore robots.txt, so this protocol is not a good security mechanism.
  • Each "Disallow:" line may contain only one URL path. Use a separate "Disallow:" line for each path.
  • The filename "robots.txt" is case sensitive. We have to use "robots.txt" and not "Robots.TXT".
  • Spaces are not an accepted way to separate parameters. For example, two paths separated by a space on one line will not be honored.
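Some of the rules above can be checked automatically before a file is uploaded. The following is a minimal sketch, not a complete validator: lint_robots is a hypothetical helper written for this post, and it only checks the filename and looks for "Disallow:" lines that contain more than one path.

```python
def lint_robots(filename: str, text: str) -> list[str]:
    """Return a list of problems found in a robots.txt file (empty if none)."""
    problems = []
    # The filename is case sensitive: only lowercase "robots.txt" is valid.
    if filename != "robots.txt":
        problems.append(f"file must be named robots.txt, not {filename}")
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # ignore comments and blank lines
        if not line:
            continue
        field, sep, value = line.partition(":")
        if not sep:
            problems.append(f"line {number}: missing ':' after field name")
            continue
        # Spaces cannot separate several paths on one Disallow line.
        if field.strip().lower() == "disallow" and len(value.split()) > 1:
            problems.append(f"line {number}: use one path per Disallow line")
    return problems


bad = "User-agent: *\nDisallow: /folder-a/ /folder-b/\n"
good = "User-agent: *\nDisallow: /folder-a/\nDisallow: /folder-b/\n"

print(lint_robots("Robots.TXT", bad))   # two problems reported
print(lint_robots("robots.txt", good))  # []
```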


