What are robots.txt files?
In the world of SEO (Search Engine Optimization), there's a hidden hero that plays a crucial role in helping websites achieve their ranking goals - the robots.txt file. It may not be as flashy as on-page optimization or link building, but it's a vital component of SEO strategy.
Robots.txt files are the practical form of the "robots exclusion protocol": simple text files that live at the root of a website (for example, https://www.example.com/robots.txt).
Their primary purpose is to tell search engine robots (also known as crawlers or spiders) how to interact with the content of a website.
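These files contain a set of directives that tell search engine bots which pages or sections of a site may be crawled and which ones should be skipped. Note that robots.txt controls crawling, not indexing: it tells bots where they may go, not what may appear in search results.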
When should you use a robots.txt file?
Robots.txt files are harmless to include in your site, and generally you'll want to have one even if it is a generic default file which allows all pages to be crawled.
However, in some common cases you will definitely want to customize your robots.txt file:
- You have an admin section or other internal pages which you do not want crawled by search engines 👉🏽 your robots.txt file should disallow crawling these pages (robots.txt is publicly readable, so it is not a substitute for a login)
- You have resources such as PDFs, videos, graphs, and images which are meant only for your users 👉🏽 these should also be disallowed
- You have a larger site (several thousand pages) and you want Google and other search engines to only concentrate on your most important pages 👉🏽 disallow the less important pages, like page 10 of your product search results
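To make the cases above concrete, here is a minimal sketch in Python using the standard library's urllib.robotparser. The paths (/admin/, /downloads/, /search), the domain example.com, and the bot name "ExampleBot" are all made up for illustration; your own site will have different URLs. The sketch parses a small robots.txt and asks, the way a well-behaved crawler would, whether a few pages may be fetched.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt covering the three cases above:
# an admin area, user-only downloads, and low-value search result pages.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /downloads/
Disallow: /search
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text without making any network request."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_allowed(parser: RobotFileParser, path: str, bot: str = "ExampleBot") -> bool:
    """Return True if `bot` may crawl `path` on the (hypothetical) example.com."""
    return parser.can_fetch(bot, "https://example.com" + path)


site_rules = build_parser(ROBOTS_TXT)

for path in ["/admin/login", "/downloads/price-list.pdf",
             "/search?q=sofa&page=10", "/products/blue-sofa"]:
    print(path, "->", "allowed" if is_allowed(site_rules, path) else "blocked")
```

Only the last path comes back as allowed here. Python's built-in parser does simple prefix matching, so this sketch sticks to plain path prefixes rather than wildcard patterns.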
4 Rules to Remember About Robots.txt files
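Robots.txt files are suggestions, not enforcement. Well-behaved crawlers follow them, but non-compliant bots can ignore your robots.txt file entirely. A disallowed page can also still appear on Google if other sites link to it, so to keep a particular page out of search results use a stronger method such as noindex (and leave that page crawlable, so the crawler can actually see the noindex).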
The robots.txt file is made up of groups of rules, and each group names the crawler it applies to. These crawlers (or bots) are identified by their "User-agent". For example, to prevent Google from crawling a particular page, your robots.txt file needs a group for the "Googlebot" user-agent with a rule that disallows that page's URL.
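As a rough illustration of user-agent groups, the sketch below (again using Python's standard urllib.robotparser) blocks Googlebot from one page while leaving it open to everyone else. The page /drafts/preview-page, the domain example.com, and the second bot name "OtherBot" are assumptions made up for this example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical file: one group aimed at Googlebot, one default group for all other bots.
GROUPED_RULES = """\
User-agent: Googlebot
Disallow: /drafts/preview-page

User-agent: *
Allow: /
"""

group_parser = RobotFileParser()
group_parser.parse(GROUPED_RULES.splitlines())

PAGE = "https://example.com/drafts/preview-page"

# Googlebot matches its own group, so the page is blocked for it.
googlebot_ok = group_parser.can_fetch("Googlebot", PAGE)
# Any other bot falls through to the "*" group, which allows everything.
otherbot_ok = group_parser.can_fetch("OtherBot", PAGE)

print("Googlebot allowed:", googlebot_ok)  # False
print("OtherBot allowed:", otherbot_ok)    # True
```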
The default assumption is that a crawler can access any page or directory which is not explicitly blocked by a disallow rule.
The paths in rules are case-sensitive. For example, a rule which disallows "/articles/summary" will not prevent a bot from crawling "/articles/SUMMARY".
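The last two rules are easy to check for yourself. The short sketch below uses Python's standard urllib.robotparser with a single disallow rule; the domain example.com, the bot name "ExampleBot", and the /pricing page are placeholders chosen for the example.

```python
from urllib.robotparser import RobotFileParser

# A single rule, written in lowercase.
CASE_RULES = """\
User-agent: *
Disallow: /articles/summary
"""

case_parser = RobotFileParser()
case_parser.parse(CASE_RULES.splitlines())


def can_crawl(path: str) -> bool:
    """Ask whether a hypothetical bot may crawl `path` on example.com."""
    return case_parser.can_fetch("ExampleBot", "https://example.com" + path)


print(can_crawl("/articles/summary"))  # False: matches the disallow rule exactly
print(can_crawl("/articles/SUMMARY"))  # True: different case, so the rule does not apply
print(can_crawl("/pricing"))           # True: no rule mentions it, so it is allowed by default
```

If your site serves the same page under several capitalizations, each variant needs its own rule.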