Robots.txt File: What It Is and Why You Need It on Your Website

You've surely heard of it, even if you're not quite sure what it is or why it's considered so important. I'm talking about the robots.txt file.

What is special about this file?

What is it for?

How is it created?

If these questions are buzzing around in your head, you're in the right place! I'm going to tell you everything you need to know about this file, and by the time you finish reading the article you'll even be able to create your own.

Want to see for yourself? Then let's get down to it 😉

What is the robots.txt file?

Let's start at the beginning: you have to understand what the robots.txt file is and what it does.

Have you heard of Google's robots or spiders? Well, the robots.txt file is a file that tells search engine robots what they may crawl and index and what they may not.

As you know, spiders (also called "bots") browse websites looking for new content to include in search results. And although you might think that the more of your website gets indexed the better, that isn't true.

There are pages and elements of a website that we do not want Google to index, such as internal files, pages with little relevance, or pages we don't want the search engine to display in its results (for example, the cookie policy page).

And this brings us to the next question:

 

What is the robots.txt file for?

Basically, to tell the bots which pages and files we want them to go through and which we don't.

You can use it to:

  • Prevent the indexing of certain pages or directories (for example, duplicate content, test pages or a private area).
  • Block certain bots from accessing your website.
  • Deny search engines access to certain files.
  • Avoid the crawling of URLs that you have deleted and that now return a 404 error.
  • Indicate the location of your sitemap to make crawling and indexing your website easier.
  • Prevent your website from being indexed until it is completely finished. (You'll find a small example right after this list.)
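
For instance, a minimal sketch of a file covering a few of these uses could look like the one below. Don't worry about the syntax yet (it's explained further down), and note that the bot name "BadBot" and the paths are only hypothetical examples, not something to copy as-is:

User-agent: BadBot
Disallow: /

User-agent: *
Disallow: /test/
Disallow: /old-deleted-page.html
Sitemap: https://www.tuweb.com/sitemap.xml

The first group blocks one specific (imaginary) bot from the whole site, the second keeps all other bots out of a test directory and a deleted URL, and the last line points to the sitemap.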

So, as you can see, it is a file containing information that search engine spiders read before crawling your site, so they know which parts they are allowed to access. In other words, it works as a recommendation (some robots simply do whatever they want) about which pages to visit and index.

Always remember that the robots.txt file is public: anyone can see it by typing /robots.txt at the end of a domain. So don't use it to try to hide private information from search engines or anything like that.

How to create a robots.txt file for your website in WordPress

Now that you know what it is and what it is for, let’s see how to create a robots.txt file in WordPress.

And here I have two pieces of good news. First, it's easier than you think. Second, if you use the Yoast SEO plugin, it's even easier, because the plugin does it for you.

To do this, just go to "Tools" and then "File Editor", and create or modify your robots.txt file there.

What if you do not want to use the Yoast SEO plugin?

I'll answer that right away.

Commands and wildcards

To create a robots.txt file, the first thing to keep in mind is the set of commands used to define the restrictions.

Be careful about trying to get creative: you may end up blocking something you don't want to block. Writing the rules correctly is also very important, so respect spaces and upper and lower case, and only use the allowed commands.

These are the main parameters used in a robots.txt file:

  • User-agent: specifies which robots the commands below it are aimed at.
  • Disallow: blocks the User-agent (the bot) from accessing the directory or URL you specify.
  • Allow: allows access to the URL or directory you specify.
  • Sitemap: tells the bots where the sitemap of the site is located.
  • Crawl-delay: indicates a waiting time between each page that the bot crawls, so you avoid a high consumption of server resources. Keep in mind that not all bots obey this command.

In addition to all this, there are two extra characters that are used as wildcards (you'll see them in action in the small example after this list):

  • The asterisk (*): means "all". It is used above all in User-agent: *, to address all bots, or in /*/ to indicate all directories.
  • The dollar sign ($): used to match any file ending with a certain extension. For example, /*.gif$ matches all files ending in .gif.
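
Putting the commands and wildcards together, a small sketch might look like this (the directory and file names are only hypothetical examples):

User-agent: *
Crawl-delay: 10
Disallow: /private/
Allow: /private/public-file.html
Disallow: /*.gif$
Sitemap: https://www.tuweb.com/sitemap.xml

Here all bots are asked to wait 10 seconds between pages, to stay out of a /private/ directory except for one specific file, and to skip any URL ending in .gif, and the sitemap location is indicated at the end.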

And as for the restrictions, these are the most common (right after the list you'll see how they are combined into groups):

  • User-agent: * – addresses all robots
  • User-agent: Googlebot – addresses only the Google robot
  • User-agent: Bingbot – addresses only the Bing robot
  • Disallow: / – blocks the entire site
  • Disallow: /directory/ – blocks a directory
  • Disallow: /foo*/ – blocks directories that start with "foo"
  • Disallow: /page-web.htm – blocks a page
  • Disallow: /*.gif$ – blocks files with the .gif extension
  • Allow: /directory/subdirectory/ – allows a subdirectory
  • Sitemap: https://www.tuweb.com/sitemap.xml – indicates where the sitemap is
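
Remember that these lines are organized into groups: each group starts with one or more User-agent lines, followed by the rules that apply to those bots, and a bot follows the group that matches it most specifically. A quick sketch combining the rules above (the paths are just examples):

User-agent: Googlebot
Disallow: /foo*/

User-agent: Bingbot
Disallow: /

User-agent: *
Disallow: /directory/
Allow: /directory/subdirectory/

In this sketch, Googlebot only stays out of directories starting with "foo", Bingbot is blocked from the whole site, and every other bot follows the last group.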

Sample robots.txt file

Let’s see an example of a robots.txt file for WordPress.

User-agent: *
Disallow: /wp-admin/
Disallow: /wp-login.php
Sitemap: https://tuweb.com/sitemap_index.xml

And now, let’s interpret this information:

  • We indicate that the rules in the file apply to all bots.
  • We deny access to the most private parts of WordPress (second and third lines).
  • We indicate where the sitemap is.
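
If you want a slightly more complete starting point, a common variant (just a sketch; adapt it to your own site) keeps admin-ajax.php accessible, since some plugins and themes load content through it even for visitors:

User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Disallow: /wp-login.php
Sitemap: https://tuweb.com/sitemap_index.xml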

Upload the robots.txt file

Once we have created the robots.txt file, how do we upload it to our site so that Google robots can find it?

Following these steps:

  • We save the rules as a plain text file (a .txt document) named "robots".
  • We place it in the top-level (root) directory of the site (https://www.tuweb.com/robots.txt).
  • We confirm that everything is correct through Google Search Console.

Do you need help with your robots.txt file?

I hope that with this tutorial you feel able to create your own robots.txt file. However, remember that with Yoast SEO the whole process is made as easy as possible. And if you have any questions, you know I'm in the comments to help you out.
