The robots.txt is a plain text file located in the root directory of a website. It regulates how crawlers / bots may behave on the site.
Notice: The robots.txt is only a recommendation for crawlers. Its rules cannot protect directories from unwanted access; malicious crawlers can read the content without any problem despite the robots.txt.
A standard robots.txt file in WordPress looks like this:
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
User-agent:
- defines which crawlers are addressed. With a * all crawlers are addressed. For example, if you only want to address Google's crawler, you can do this with User-agent: Googlebot.
This addresses Google's bots that identify themselves with the Googlebot user agent token.
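For illustration, here is a minimal sketch of how a specific group and a catch-all group can be combined; the path /internal-search/ is only a placeholder and not part of the WordPress example above:

User-agent: Googlebot
# placeholder path, only blocked for Google's crawler
Disallow: /internal-search/

User-agent: *
# all other crawlers are blocked from the entire site
Disallow: /

A crawler follows the most specific group that matches its user agent token, so Googlebot would only skip /internal-search/, while all other crawlers would be blocked from the whole site.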
Disallow:
- defines which directories may not be called or crawled. This automatically includes all subdirectories and files below that path. If you want to release a subdirectory for the bot again, this can be done with Allow.
- In our experience, it can happen after a CMS change that Google still tries to crawl subdirectories of the old CMS. Since these no longer exist, the bot can simply be blocked from them.
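A minimal sketch for this case could look as follows; the directory name /old-cms/ is only an assumed example for paths of the previous system:

User-agent: *
# hypothetical directory of the old CMS that no longer exists
Disallow: /old-cms/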
Allow:
- In the example, admin-ajax.php in the "wp-admin" directory is released again, because the complete directory with all its files and subdirectories was blocked before.
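The same pattern can be applied to other directories; as an illustrative sketch (the paths are only examples, not a recommendation), a blocked /wp-content/ directory could be partly released again like this:

User-agent: *
Disallow: /wp-content/
# the longer, more specific Allow rule wins over the shorter Disallow rule
Allow: /wp-content/uploads/

Because the Allow rule is more specific than the Disallow rule, crawlers may access /wp-content/uploads/ even though the rest of /wp-content/ remains blocked.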