What is robots.txt?
The Robots Exclusion Protocol (REP) is a group of web standards that regulates how web robots behave and how search engines index content.
This file is placed at the root of a website and tells crawlers which pages of the site they may access and which they may not. We can see a website's robots file by appending /robots.txt to the site's domain name. For example, see the following image:
We can decide which content to allow or disallow for Google and other crawlers. There are a few common rules:
By writing:
User-agent: *
Disallow: /
we block all web crawlers from all content.
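As a quick sanity check, rules like these can be tested with Python's standard-library robots.txt parser. The sketch below assumes the hypothetical domain www.example.com used throughout this post.

```python
# Sketch: verifying the "block everything" rules above with
# Python's standard-library robots.txt parser.
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: *
Disallow: /
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# Every crawler is disallowed from every path.
print(rp.can_fetch("Googlebot", "http://www.example.com/"))        # False
print(rp.can_fetch("AnyBot", "http://www.example.com/page.html"))  # False
```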
User-agent: Googlebot
Disallow: /no-google/
This blocks a specific web crawler (here, Googlebot) from a specific folder.
User-agent: Googlebot
Disallow: /no-google/blocked-page.html
This blocks a specific web crawler from a specific web page.
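The page-level rule above only affects the named crawler and the named path, which can be confirmed with the same standard-library parser (again using the hypothetical www.example.com domain and /no-google/ path from the example).

```python
# Sketch: only Googlebot is blocked, and only from the listed page.
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: Googlebot
Disallow: /no-google/blocked-page.html
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# Googlebot may not fetch the blocked page...
print(rp.can_fetch("Googlebot", "http://www.example.com/no-google/blocked-page.html"))  # False
# ...but other pages, and other crawlers, are unaffected.
print(rp.can_fetch("Googlebot", "http://www.example.com/other.html"))                   # True
print(rp.can_fetch("Bingbot", "http://www.example.com/no-google/blocked-page.html"))    # True
```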
Sitemap Parameter
User-agent: *
Disallow:
Sitemap: http://www.example.com/none-standard-location/sitemap.xml
This allows all crawlers everywhere and points them to a sitemap at a non-default location.
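An empty Disallow value means nothing is blocked, and the Sitemap line is picked up separately; both behaviors can be checked with the standard-library parser (Python 3.8+ for `site_maps()`).

```python
# Sketch: an empty "Disallow:" allows everything, and the sitemap
# URL is exposed by the parser.
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: *
Disallow:
Sitemap: http://www.example.com/none-standard-location/sitemap.xml
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

print(rp.can_fetch("Googlebot", "http://www.example.com/any-page.html"))  # True
print(rp.site_maps())
# ['http://www.example.com/none-standard-location/sitemap.xml']
```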
Important Rules:
- In most cases, a meta robots tag with the "noindex, follow" directives should be used to restrict indexing while still letting crawlers follow links, rather than robots.txt.
- It is to be noted that malicious crawlers completely ignore robots.txt, so this protocol is not a good security mechanism.
- Only one "Disallow:" line is allowed for each URL.
- The filename "robots.txt" is case-sensitive: use "robots.txt", not "Robots.TXT".
- Spaces are not accepted as separators between parameters.