If you use meta robots tags on your pages, you can give search engines instructions on how you’d like them to crawl or index parts of your website. This page lists an overview of all the different values you can have in the meta robots tag, what they do, and which search engines support each value.

The different robots meta tag values

The following values (‘parameters’) can be placed on their own, or together (separated by commas), in the content attribute of the tag, to control how search engines interact with your page.

Scroll down for an overview of which search engines support which specific parameters.
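
For example, a tag combining two of the values explained below looks like this (a minimal illustration):

<meta name="robots" content="noindex, nofollow">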

index
Allow search engines to add the page to their index, so that it can be discovered by people searching.
Note: This is assumed by default on all pages – you generally don’t need to add this parameter.
noindex
Disallow search engines from adding this page to their index, and therefore disallow them from showing it in their results.
follow
Tells search engines that they may follow links on the page to discover other pages.
Note: This is assumed by default on all pages – you generally don’t need to add this parameter.
nofollow
Tells search engine robots not to follow any links on the page.
Note: It’s unclear whether this attribute prevents search engines from following links, or just prevents them from assigning any value to those links.
Note: It’s also unclear (and inconsistent between search engines) whether this applies to all links, or only internal links.
none
A shortcut for noindex, nofollow.
all
A shortcut for index, follow.
Note: This is assumed by default on all pages, and does nothing if specified.
noimageindex
Disallow search engines from indexing images on the page.
Note: If images are linked to directly from elsewhere, search engines can still index them, so using an X-Robots-Tag HTTP header is generally a better idea (see the example below this list).
noarchive
Prevents the search engines from showing a cached copy of this page in their search results listings.
nocache
Same as noarchive, but specific to Bing (formerly MSN/Live Search).
nosnippet
Prevents the search engines from showing a text or video snippet (e.g., the meta description) of this page in the search results, and prevents them from showing a cached copy of this page in their search results listings.
Note: Snippets may still show an image thumbnail, unless noimageindex is also used.
notranslate
Prevents search engines from showing translations of the page in their search results.
unavailable_after
Tells search engines a date/time after which they should not show the page in search results; a ‘timed’ version of noindex.
Note: Must be in RFC850 format (e.g., Monday, 15-Aug-05 15:52:01 UTC).
noyaca
Prevents the search results snippet from using the page description from the Yandex Directory.
Note: Only supported by Yandex.
noodp
Blocks search engines from using the description for this page from the DMOZ (Open Directory Project) directory as the snippet in the search results.
Note: Since DMOZ closed down, this value is deprecated (see the table notes below).
noydir
Blocks Yahoo from using the description for this page in the Yahoo Directory as the snippet for your page in the search results.
Note: Since Yahoo closed its directory, this tag is deprecated, but you might come across it once in a while.
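
As an aside on the noimageindex note above: an X-Robots-Tag HTTP header accepts these same values and can be applied to files that don’t have an HTML head, such as the images themselves. A minimal sketch of such a response header, sent along with an image file:

HTTP/1.1 200 OK
X-Robots-Tag: noindex

This tells supporting search engines not to index that file, even when it’s linked to directly.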

Which search engine supports which robots meta tag values?

This table shows which search engines support which values. Note that the documentation provided by some search engines is sparse, so there are many unknowns.

Robots value Google Yahoo Bing Ask Baidu Yandex
index Y* Y* Y* ? Y Y
noindex Y Y Y ? Y Y
follow Y* Y* Y* ? Y Y
nofollow Y Y Y ? Y Y
none Y ? ? ? N Y
all Y ? ? ? N Y
noimageindex Y N N ? N N
noarchive Y Y Y ? Y Y
nocache N N Y ? N N
nosnippet Y N Y ? N N
notranslate Y N N ? N N
unavailable_after Y N N ? N N
noodp N Y** Y** ? N N
noydir N Y** N ? N N
noyaca N N N N N Y

* Most search engines have no specific documentation for this, but we’re assuming that support for excluding parameters (e.g., nofollow) implies support for the positive equivalent (e.g., follow).
** Whilst the noodp and noydir attributes may still be ‘supported’, these directories no longer exist, and it’s likely that these values do nothing.

Rules for specific search engines

Sometimes, you might want to provide specific instructions to a specific search engine, but not to others. Or you may want to provide completely different instructions to different search engines.

In these cases, you can change the value of the name attribute to target a specific crawler (e.g., GOOGLEBOT or MSNBOT).
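
For example, this pair of tags would keep the page out of Google’s index while explicitly allowing Bing’s crawler (historically MSNBOT) to index it:

<meta name="googlebot" content="noindex">
<meta name="msnbot" content="index, follow">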

Note: Given that search engines will simply ignore instructions which they don’t support or understand, it’s very rare to need to use multiple meta robots tags to set instructions for specific crawlers.

Conflicting parameters, and robots.txt files

It’s important to remember that meta robots tags work differently to instructions in your robots.txt file, and that conflicting rules may cause unexpected behaviours. For example, search engines won’t be able to see your meta tags if the page is blocked via robots.txt.

You should also take care to avoid setting conflicting values in your meta robots tag (such as using both index and noindex parameters) – particularly if you’re setting different rules for different search engines. In cases of conflict, the most restrictive interpretation is usually chosen (i.e., “don’t show” usually beats “show”).
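
For example, if a page somehow ended up with both of the tags below, search engines would typically apply noindex, as it’s the most restrictive value:

<meta name="robots" content="index">
<meta name="robots" content="noindex">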


The resources from the search engines

The search engines themselves have documentation pages about this subject as well. And of course there’s always the official robots.txt pages and Danny Sullivan’s big robots meta write-up.

Read more: ‘robots.txt: the ultimate guide’ »

The post The ultimate guide to the meta robots tag appeared first on Yoast.

If you want to keep your page out of the search results, there are a number of things you can do. Most of ’em are not hard, and you can implement them without a ton of technical knowledge. If you can check a box, your content management system will probably have an option for that, or allow nifty plugins like our own Yoast SEO to help you prevent the page from showing up in search results. In this post, I won’t give you difficult options to go about this. I will simply tell you what steps to take and things to consider.


Why do you want to keep your page out of the search results?

It sounds like a simple question, but it’s not, really. Why do you want to keep your page out of the search results in the first place? If you don’t want that page indexed, perhaps you shouldn’t publish it? There are obvious candidates, like your internal search result pages, or a “Thank you” page after an order or newsletter subscription that is of no use to other visitors. But when it comes to your actual, informative pages, there really should be a good reason to block these. Feel free to drop yours in the comments below this post.

If you don’t have a good reason, simply don’t write that page.

Private pages

If your website contains a section that is targeted at, for instance, an internal audience or a so-called extranet, you should consider making that information password-protected. A section of your site that can only be reached after filling out login details won’t be indexed. Search engines simply have no way to log in and visit these pages.

If you are using WordPress and are planning a section like this on your site, please read Chris Lema’s article comparing membership plugins.

How to keep your page out of the search results

Noindex your page

Besides that aforementioned “Thank you” page, there might be more pages you want to block. And you might even have some pages left after critically reviewing whether they should be on your site at all. The right way to keep a page out of the search results is to add a robots meta tag. We have written a lengthy article about that robots meta tag before; be sure to read that.

Adding it to your page is simple: you need to add that tag to the <head> section of your page, in the source code. You’ll find examples from the major search engines linked in the robots meta article as well.
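
For example (a minimal sketch, with an illustrative title), the head of a “Thank you” page you want kept out of the results could look like this:

<head>
  <title>Thank you for your order!</title>
  <meta name="robots" content="noindex, follow">
</head>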

Are you using WordPress, TYPO3 or Magento? Things are even easier. Please read on.

Noindex your page with Yoast SEO

The above-mentioned content management systems have the option to install our Yoast SEO plugin/extension. In that plugin or extension, you have the option to noindex a page right from your editor.

In this example, I’ll use screenshots from the meta box in Yoast SEO for WordPress. You’ll find it in the post or page editor, below the copy you’ve written. In Magento and TYPO3 you can find it in similar locations.

How to keep your site out of the search results using Yoast SEO

Click the Advanced tab in our Yoast SEO meta box; it’s the cog symbol on the left.
Set the selector at “Allow search engines to show this post/page in search results?” to “No” and you are done.

The second option in the screenshot is about following the links on that page. It allows you to keep your page out of the search results while still having search engines follow the links on it, as these (internal) links matter for your other pages (again, read the robots meta article for more information). The third option: leave that as is; it reflects what you have set in the site-wide robots meta settings.

It’s really that simple: select the right value, and your page will tell search engines to either keep it in or out of the search results.

The last thing I want to mention here is: use this with care. This robots meta setting will truly prevent a page from being indexed, unlike a robots.txt suggestion to leave a page out of the search result pages, which Google might ignore, for instance when there are a lot of inbound links to the page.

If you want to read up on how to keep your site from being indexed, please read Preventing your site from being indexed, the right way. Good luck optimizing!

The post How to keep your page out of the search results appeared first on Yoast.

In our major Yoast SEO 7.0 update, there was a bug concerning attachment URLs. We quickly resolved the bug, but some people have suffered anyhow (because they updated before our patch). This post serves both as a warning and an apology. We want to ask all of you to check whether your settings for the redirect of the attachment URLs are correct. And, for those of you who suffered a decrease in rankings because of incorrect settings, we offer a solution that Google has OK’d as well.

Is “Redirect attachment URLs” set to “Yes”?

You need to check this manually: unless you have a very specific reason to allow attachment URLs to exist (more on that below), the setting should be set to “Yes”. If the setting says “Yes”, you’re all set. You can find this setting in Search Appearance, in the tab Media.

media attachment urls setting in Yoast SEO

Is “Redirect attachment URLs” set to “No”?

If your attachment URL redirect is set to “No”, there are two different scenarios which could apply to you. You could have set it to “No” intentionally, but the setting could also have been turned to “No” without your intent.

Intentionally set to “No”

If you intentionally set the attachment URL redirect to “No”, you’ll probably be aware of that fact. In that case, your attachment URLs are an important aspect of your site: you’re actively linking to these pages, and these pages have real content on them (more than just a photo). This could, for instance, apply to a photography site. If you want this setting to say “No”, you’ll probably have put a lot of thought into it. In this case, you can leave your setting at “No”. You’re all set!

Unintentionally set to “No”

It is also possible that you notice the setting is set to “No” and this was not intentional. You’ve suffered from our bug. We’re so very sorry. You should switch your toggle to “Yes” and save the changes. Perhaps you need to do a little bit more, though. There are (again) two scenarios:

Traffic and rankings are normal

Ask yourself the following question: have you noticed any dramatic differences in your rankings and traffic in the last three months (since our 7.0 update of March 6th)? If the answer to this question is no, then you should just turn the redirect setting of the attachment URLs to “Yes” and leave it at that. You did not suffer any harm in rankings, probably because you’re not using attachment URLs all that much anyway. This will be the case for most sites. After switching your toggle to “Yes” and saving the changes, you’re good to go!

Traffic and rankings have decreased

In the second scenario, you notice that the redirect attachment URL setting is set to “No” and you did indeed suffer a dramatic decrease in traffic and rankings. We’re so very sorry about that. Make sure to switch the setting of the attachment URL to “Yes” immediately. In order to help you solve your ranking problem, we have built a search index purge plugin. Download and install this plugin here. More on the workings of this separate plugin below.

What to do if you’re not sure

If you’re not sure whether you’ve been affected by this, and your Google Search Console is inconclusive: don’t do anything other than turning the setting to “Yes”. See “What did Google say” below for the rationale.

What do attachment URLs do anyway?

When you upload an image in WordPress, WordPress does not only store the image, it also creates a separate so-called attachment URL for every image. These attachment URLs are very “thin”: they have little to no content outside of the image. Because of that fact, they’re bad for SEO: they inflate the number of pages on your site while not increasing the amount of quality content. This is something that WordPress does, which our plugin takes care of (if the setting is correctly turned to “Yes”).

Historically, we had a setting (off by default) that would redirect the attachment URL for an image to the post the image was attached to. So if I uploaded an image to this post, the attachment URL for that image would redirect to this post. In the old way of dealing with this, it meant that images added for other reasons (like, say, a site icon, or a page header you’d add in the WordPress customizer) would not redirect. It also meant that if you used an image twice, you could not be certain where it would redirect.

In Yoast SEO 7.0 we introduced a new feature to deal with these pages. Now, we default to redirecting the attachment URL to the image itself. This basically means attachment URLs no longer exist on your site at all. This actually is a significant improvement.

What did the bug do (wrong)?

The bug was simple yet very painful: when you updated from an earlier version of Yoast SEO to Yoast SEO 7.0-7.0.2 (specifically those versions), we did not always correctly convert your old setting into the new one. We accidentally set the setting to “No”. Because we overwrote the old settings during the update, we could not revert this bug later on.

The impact of the bug

For some sites, our bug might have had a truly bad impact. In Twitter and Facebook discussions, I’ve been shown sites that had the number of indexed URLs on their site quintuple, without adding any content. Because with that setting at “No”, XML sitemaps were enabled for attachments. As a result, lots and lots of attachment URLs got into Google’s index. Some of those sites are now suffering from Panda-like problems. The problem will be especially big if you have a lot of pictures on your website and few high-quality content pages. In these cases, Google will think you’ve created a lot of ‘thin content’ pages all of a sudden.

The vast majority of the websites running Yoast SEO probably haven’t suffered at all. Still, we messed up. I myself am sorry. More so than normal, because I came up with and coded this change myself…

What did Google say?

We have good contacts at Google and talk to them regularly about issues like these. In this case, we discussed it with John Mueller and his first assessment was similar to mine: sites should normally not suffer from this. That’s why we don’t think drastic measures are needed for everyone. Let me quote him:

“Sites generally shouldn’t be negatively affected by something like this. We often index pages like that for normal sites, and they usually don’t show up in search. If they do show up for normal queries, usually that’s a sign that the site has other, bigger problems. Also, over the time you mentioned, there have been various reports on twitter & co about changes in rankings, so if sites are seeing changes, I’d imagine it’s more due to normal search changes than anything like this.”

We’ve also discussed potential solutions with him. The following solution has been OK’d by him as the best and fastest solution.

What does this search index purge plugin do?

The purpose of the search index purge plugin is to purge attachment URLs out of the search results as fast as possible. Just setting the Yoast SEO attachment URL redirect setting to “Yes” isn’t fast enough: once you do that, there are no longer XML sitemaps or anything else that would make Google crawl those pages, and thus it could take months for Google to remove those URLs. That’s why I needed to be creative.

Installing this plugin will do the following two things:

  • Every attachment URL will return a 410 status code.
  • A static XML sitemap containing all the attachment URLs on the site will be created. The post modified date for each of those URLs is the activation date and time of the plugin.

The XML sitemap with recent post modified date will make sure that Google spiders all those URLs again. The 410 status code will make sure Google takes them out of its search results in the fastest way possible.
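
To illustrate (the domain and timestamp are made up), each entry in that static XML sitemap looks along these lines, with the plugin’s activation time as the post modified date:

<url>
  <loc>https://example.com/my-image-attachment/</loc>
  <lastmod>2018-05-17T09:00:00+00:00</lastmod>
</url>

Requesting such an attachment URL itself then returns HTTP/1.1 410 Gone, the signal for Google to drop it.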

After six months the attachment URLs should be gone from the search results. You should then remove the search index purge plugin, and keep the redirect setting of the attachment URLs set to “Yes”.

Advice: keep informed!

We try to do the very best we can to help you get the best SEO out of your site. We regularly update our configuration wizard and there is no harm whatsoever in running through it again. Please regularly check if your site’s settings are still current for your site. We do make mistakes, and this release in particular has led us to a rigorous post mortem on all the stages of this release’s process.

We regularly write about things that change in Google, so stay up to date by subscribing to our newsletter below. If you want to understand more of the how and why of all this, please do also take our new, free, SEO for Beginners course, which you’ll get access to when you sign up.

The post Media / attachment URL: what to do with them? appeared first on Yoast.

Crawl errors occur when a search engine tries to reach a page on your website but fails. Let’s shed some more light on crawling first. Crawling is the process where a search engine tries to visit every page of your website via a bot. A search engine bot finds a link to your website and starts finding all your public pages from there. The bot crawls the pages, indexes all the contents for use in Google, and adds all the links on these pages to the pile of pages it still has to crawl. Your main goal as a website owner is to make sure the search engine bot can get to all pages on the site. Failing this process returns what we call crawl errors.


Your goal is to make sure that every link on your website leads to an actual page. That might be via a 301 redirect, but the page at the very end of that link should always return a 200 OK server response.
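
For example (URLs made up), a link pointing at https://example.com/old-page/ is fine if the response chain looks like this:

HTTP/1.1 301 Moved Permanently
Location: https://example.com/new-page/

followed by, for /new-page/ itself:

HTTP/1.1 200 OK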

Google divides crawl errors into two groups:

  1. Site errors. You don’t want these, as they mean your entire site can’t be crawled.
  2. URL errors. You don’t want these either, but since they only relate to one specific URL per error, they are easier to maintain and fix.

Let’s elaborate on that.

Site errors

Site errors are all the crawl errors that prevent the search engine bot from accessing your website. That can have many causes; these are the most common:

  • DNS errors. This means a search engine isn’t able to communicate with your server. It might be down, for instance, meaning your website can’t be visited. This is usually a temporary issue. Google will come back to your website later and crawl your site anyway. If you see notices of this in the crawl errors section of your Google Search Console, that probably means Google has tried a couple of times and still wasn’t able to reach your site.
  • Server errors. If your Search Console shows server errors, this means the bot wasn’t able to access your website. The request might have timed out: the search engine, for instance, tried to visit your site, but it took so long to load that the server served an error message. Server errors also occur when there are flaws in your code that prevent a page from loading. It can also mean that your site has so many visitors that the server just couldn’t handle all the requests. A lot of these errors are returned as 5xx status codes, like the 500 and 503 status codes described here.
  • Robots failure. Before crawling, a bot like Googlebot tries to fetch your robots.txt file as well, just to see if there are any areas on your website you’d rather not have indexed. If that bot can’t reach the robots.txt file, Google will postpone the crawl until it can reach it. So always make sure it’s available.

That explains a bit about crawl errors related to your entire site. Now let’s see what kind of crawl errors might occur for specific pages.

URL errors

As mentioned, URL errors refer to crawl errors that occur when a search engine bot tries to crawl a specific page of your website. When we discuss URL errors, we tend to discuss crawl errors like (soft) 404 Not Found errors first. You should frequently check for these types of errors (use Google Search Console or Bing Webmaster Tools) and fix ’em. If the page/subject of that page is indeed gone, never to return to your website, serve a 410 page. If you have similar content on another page, please use a 301 redirect instead. Make sure your sitemap and internal links are up to date as well, obviously.

We found that a lot of these URL errors are caused by internal links, by the way. So a lot of these errors are your fault. If you remove a page from your site at some point, adjust or remove any links pointing to it as well. These links have no use anymore. If such a link remains, a bot will find it and follow it, only to hit a dead end (a 404 Not Found error) on your website. You need to do some maintenance on your internal links now and then!

Among these common errors might be an occasional DNS error or server error for that specific URL. Re-check that URL later and see if the error has vanished. Be sure to use Fetch as Google and mark the error as fixed in Google Search Console if that is your main monitoring tool in this. Our plugin can help you with that.

Very specific URL errors

There are some URL errors that apply to certain sites only. That’s why I’d like to list these separately:

  • Mobile-specific URL errors. This refers to page-specific crawl errors that occur on a modern smartphone. If you have a responsive website, these are unlikely to surface. Perhaps just for that piece of Flash content you wanted to replace already. If you maintain a separate mobile subdomain like m.example.com, you might run into more errors. Think along the lines of faulty redirects from your desktop site to that mobile site. You might even have blocked parts of that mobile site with a line in your robots.txt.
  • Malware errors. If you encounter malware errors in your webmaster tools, this means Bing or Google has found malicious software on that URL. That might mean software is found that is used, for instance, “to gather guarded information, or to disrupt their operation in general” (Wikipedia). You need to investigate that page and remove the malware.
  • Google News errors. There are some specific Google News errors. There’s quite a list of these possible errors in Google’s documentation, so if your website is in Google News, you might get these crawl errors. They vary from the lack of a title to errors that tell you that your page doesn’t seem to contain a news article at all. Be sure to check for yourself if this applies to your site.

Fix your crawl errors

The bottom line in this article is definitely: if you encounter crawl errors, fix them. It should be part of your site’s maintenance schedule to check for crawl errors now and then. Besides that, if you have installed our Premium plugin, you’ll have a convenient way in WordPress and/or TYPO3 to prevent crawl errors when, for instance, deleting a page. Be sure to check out these features yourself!

Read more: ‘Google Search Console: Crawl’ »

The post Basic SEO: What are crawl errors? appeared first on Yoast.

Your site needs to be up and running if you want to be found in search engines. If you aren’t blocking anything — deliberately or accidentally — search engine spiders can crawl and index it. You probably know that Yoast SEO has lots of options to determine what does and doesn’t need to be indexed, but did you know it also has a check that monitors your site’s indexability? This is the indexability check, provided by our good friends at Ryte.


What does it do?

The indexability check regularly checks whether your site is indexable. You can find the Ryte indexability check on your site’s dashboard inside the Yoast SEO Posts Overview box. It is straightforward to use, as it is just a colored bullet showing the indexability status of your site:

  • Green: All is well, your site is indexable.
  • Grey: Yoast SEO hasn’t been able to determine the status of your site.
  • Red: Your homepage cannot be indexed, and you should look into this immediately.

Dashboard overview in Yoast SEO

Remember, something is up if you ever get a red bullet. If you do get one, and you are sure your site should be indexable, please check if your site is available by running Google’s Mobile-Friendly Test. Your site should appear if it is indexable. If it does, it might be that Ryte had the hiccups.

Should Google be unable to run the test, you could hit the ‘Analyze entire site’ button in your WordPress backend and follow the instructions given by Ryte. Sign up with them and give your site the once-over. The phenomenal Ryte suite gives you loads of advice on how to cope with indexability errors and more.

A grey bullet means that your server is unable to connect to the Ryte servers to get the indexability status of your site. There are several reasons why this could be the case. Please see the Indexability check fails post on our knowledge base for more information on how to evaluate and fix this.

What do I have to do to get it?

We add this check automatically when you install Yoast SEO. Find it in your WordPress dashboard. If it doesn’t show a green bullet, you can manually run a check by clicking the ‘Fetch the current status’ button inside the Yoast SEO Posts Overview box.

If you don’t need the Ryte indexability check, you can always turn it off. Go to General > Features in Yoast SEO and switch the Ryte integration button to off.

Yoast & Ryte


Ryte offers a free indexability check for Yoast SEO users. This way, you can quickly see that your site is still reachable for both search engines and visitors. If you need help fixing technical SEO issues or if you are in need of a great suite of SEO tools to help you fix or improve your rankings, you can always sign up for the free Ryte introductory plan. Just hit the purple ‘Analyze entire site’ button and follow the instructions!

Read more: ‘SEO basics: What is crawlability’ »

The post Yoast SEO & Ryte: Checking your site’s indexability appeared first on Yoast.

Paginated archives have long been a topic of discussion in the SEO community. Over time, best practices for optimization have evolved, and we now have pretty clear definitions. This post explains what these best practices are. It’s good to know that Yoast SEO applies all these rules to every archive with pagination.


Indicate that an archive has pagination

When a search engine crawls page one of an archive, it needs to know it’s a paginated archive. For the longest time, the only way for it to know that something was a paginated archive was when it found a “next” or “previous” link. This was solved by the introduction of the rel="next" and rel="prev" link elements, to be applied in the head of a page, a topic we’ve written about before.

For a while, there was a discussion in the SEO community about how to combine this with rel canonical. Should page 2 and further of an archive have a canonical link to page 1, or to itself? The idea was that you mostly want visitors to end up on page 1 of an archive. That page is usually the most relevant for the majority of users.

Google is very clear now: each page within a paginated series should canonicalize to itself, so /page/2/ has a canonical pointing to /page/2/.
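
For example, the head of /page/2/ of a category archive would, under these rules, contain something like this (URLs made up):

<link rel="canonical" href="https://example.com/category/page/2/">
<link rel="prev" href="https://example.com/category/">
<link rel="next" href="https://example.com/category/page/3/">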

Should page 2 etc. be in the search results?

For a while, SEOs thought it might be a good idea to add a noindex robots meta tag to page 2 and further of a paginated archive. This would prevent people from finding page 2 and further in the search results. The idea was that the search engine would still follow all these links, so all the linked pages would still be properly indexed.

The problem is that at the end of last year, Google said something that caught our attention: long-term noindex on a page will lead to them not following links on that page. This makes adding noindex to page 2 and further of paginated archives a bad idea, as it might lead to your articles no longer getting the internal links they need.

Because of what Google said about long-term noindex, in Yoast SEO 6.3 we removed the option to add noindex to subpages of archives.

Annoying SEO side effects

So you can no longer keep page 2 and further out of the search results. This has the annoying side effect that Google Search Console might start to give you warnings. Specifically, it might warn you about duplicate titles and duplicate meta descriptions. You can safely ignore these warnings, a fact I’ve confirmed with Google this week.

I guess, in time, Google will stop showing these warnings for paginated archives in Google Search Console.

Read on: ‘Why every website needs Yoast SEO’ »

The post Pagination & SEO: best practices appeared first on Yoast.

Some of the pages of your site serve a purpose, but that purpose isn’t ranking in search engines or even getting traffic to your site. These pages need to be there as glue for other pages, or simply because regulations require them to be accessible on your website. As a regular visitor to our website, you know what noindex or nofollow can do to these pages. If you are new to these terms, please read on and let me explain what they are and which pages they might apply to!


What is noindex or nofollow?

Both are settings you can add to your robots meta tag. We did quite an extensive ultimate guide on the robots meta tag that you should read.

In short:

  • It looks like this in most cases:
    <meta name="robots" content="[VALUE1,VALUE2]">
  • VALUE1 and VALUE2 are set to index, follow by default, meaning the page at hand can be indexed and all links on that page can be followed by search engine spiders to index the pages they link to.
  • VALUE1 and VALUE2 can be set to noindex, nofollow as well. noindex means that the page shouldn’t be indexed by search engines, but it doesn’t mean the search engines shouldn’t follow the links on the page. nofollow means that they shouldn’t follow the links either.

Pages that you might want to noindex

Author archives on a one-author blog

If you are the only one writing for your blog, your author pages are probably 90% the same as your blog homepage. That’s of no use to Google and can be considered duplicate content. To keep these out of the search results, you can noindex them.

Certain (custom) post types

Sometimes a plugin or a web developer adds a custom post type that you don’t want to be indexed. At Yoast, we use custom pages for our products, as we are not a regular online shop that sells, for instance, kitchen appliances. We don’t need product images, or filters like dimensions and technical specifications on a tab next to the description. Therefore, we noindex the regular product pages WooCommerce outputs and are using our own pages. Indeed, we noindex the product post type.

By the way, I have seen shop solutions that added things like dimensions and weight as a custom post type as well. These pages are considered low-quality content. You will understand that these pages are of no use to a visitor or Google, so they need to be kept out of the search result pages.

Thank you pages

That page serves no other purpose than to thank your customer/newsletter subscriber. It’s usually thin content, or upsell and social share options, but no added value content-wise.

Admin and login pages

Of course, your login pages don’t belong in Google. But a lot of them are in there. Keep them out of the index by adding that noindex. Exceptions are the login pages that serve a community, like Dropbox or similar services. Just ask yourself if you would google one of your login pages if you didn’t work at your company. If not, it’s probably safe to say that Google doesn’t need to index these pages.

Internal search results

Internal search results are about the last pages Google wants to point its visitors to; if you want to ruin a search experience, you link to other search pages. But the links on such a search result page are still very valuable for Google, so all links should be followed. The robots meta setting should be:
<meta name="robots" content="noindex, follow">

The same setting goes for all the examples mentioned above, there is no need to nofollow the links on these pages. Now, when should you add a nofollow to your robots meta tag?

Pages that you might want to nofollow

Google roughly indicates that there are three reasons to nofollow links:

  1. Untrusted content
  2. Paid links
  3. Crawl prioritization

For instance, we add a nofollow tag to links in comments. We don’t know what all of you are dropping in there, right? It could be anything from #1 and #2 above. With regard to number 3, this could, for instance, apply to login links that we sometimes find on WordPress websites. It’s no use having Googlebot go over these links, as they add no value for search engines. These are nofollowed.

All of the above is very much on a link level. But if you have, for instance, a page that shows SEO books, with a surplus of Amazon affiliate links, that page might still add value for your users, but I’d nofollow it entirely if there’s nothing else that matters on the page. You might have it indexed, though. Just make sure you cloak your links the right way.

To be honest, on a regular website, I don’t think there are a lot of pages I’d set to nofollow. Check for yourself if you have any content that mainly contains links like the ones Google indicated, and decide if Google should follow them or not.

Changing SEO insights

At Yoast, we always try to keep you on top of your SEO game, without bugging you about it per se. The “Noindex subpages of archives” checkbox, a setting we have had in Yoast SEO for years, is one of those. It made all the sense in the world to noindex, follow these, and have Google index just the main page: the first page of your category archive, for instance.

We were always aware that Google was getting better and better at understanding rel="next" and rel="prev" on these subpages of archives. Yoast SEO adds these tags as well. At this point, we know that rel="next" and rel="prev" cover the way archives should be indexed and that noindexing subpages isn’t necessary anymore, so we’ve removed that setting from our plugin altogether to make sure it’s done right on your site!

Read on: ‘Prevent your site from being indexed, the right way’ »

The post Which pages should I noindex or nofollow? appeared first on Yoast.

Whenever you make some big changes to your website, for instance to your brand name, you’re probably eager for these changes to show in the search results. Unfortunately, it can take a while for Google to crawl your site again, and until then, it will show the indexed version of your site in the results, without the changes. Of course, when people click through to your site they’ll see the changes, but you want them to be visible in the results pages too.

So, is there anything you can do to help or speed up this process? And is there anything else to bear in mind when making these changes? Let’s get into that in this Ask Yoast!


Jolene Moody emailed us her question on this topic:

“I recently changed the name of my business. We have changed it in the WordPress dashboard too. But I didn’t see the change yet in the search results. How long does it take Google to show this change?”

Watch the video or read the transcript further down the page for my answer!

Helping Google pick up changes on your site

“It depends on how often Google visits your site and you probably don’t know how often that is. Now, what you can do is go to Google Search Console and go to ‘Fetch & render’ and then fetch and render your homepage. Then, after it’s done, there’s an option to submit to index. At that point, Google will have already crawled your site and will use that data to show your site in the index, so when people search for your brand, at least your homepage will have the proper brand name.

But it’s very important, if you change the name of your business and people are still searching for the old name of your business, that you also have the old name of your business on your site somewhere. That way people can still find you for that, when they don’t know that you’ve renamed your business. Good luck!”

Ask Yoast

In the series Ask Yoast we answer SEO questions from our readers. Have an SEO-related question? Let us help you out! Send an email to ask@yoast.com.

(Note: please check our blog and knowledge base first, the answer to your question may already be out there! For urgent questions, for example about our plugin not working properly, we’d like to refer you to our support page.)

Read more: ‘SEO Basics: What is Googlebot?’ »

The post Ask Yoast: Changes to your site and the search results appeared first on Yoast.

We recently made some changes to how yoast.com is run as a shop and how it’s hosted. In that process, we accidentally removed our robots.txt file and caused a so-called spider trap to open. In this post, I’ll show you what a spider trap is, why it’s problematic and how you can find and fix them.


What is a spider trap?

A spider trap is when you create a system that creates unlimited URLs. Google can spider a page and find 20 new URLs on it. If it then spiders those 20 URLs, it finds 20 * 20 = 400 new URLs. If it then spiders those 400 URLs, it finds 400 * 20 = 8,000 new URLs. This escalates quickly, as you can see. If each and every one of these URLs were unique and wonderful, this would not be a problem, but usually, they’re not. So this causes a massive duplicate content problem.

A spider trap is bad for your SEO because every time Google crawls (or “spiders”) a page in your trap, it’s not crawling actual content on your site. Your new, high-quality, super-valuable content might get indexed later, or not at all, because Google is spending its precious time in your trap. And the content it is crawling is deemed duplicate, which lessens how Google sees your site overall. This is why solving spider traps is important for SEO, especially if you’re thinking about crawl budget optimization.

What do spider traps look like?

Our spider trap was one of a very particular type. We have a tool here on yoast.com called Yoast Suggest. It helps you mine Google Suggest for keyword ideas. When you enter a word into it, it returns the suggestions Google gives when you type that word into Google. The problem is: Google, when given a search box, will start throwing random words into it. And the results then have links for more results. And Google was thus trapping itself.

You might think that this is a nice story and spider traps never happen in real life. Unfortunately, they do. Faceted navigation on web shops often creates hundreds of thousands of combinations of URL parameters. Every new combination of facets (and thus URL parameters) is a new URL. So faceted navigation done poorly very often results in trapping the spider.

Another common cause of spider traps is when a site has date pages. If you can go back one day, to get a new date, and then back, and back, and back, you get a lot of pages. In my time as a consultant for the Guardian, we found Google spidering a date in the year 1670. It had gone back through our online archives, which went back almost 20 years at that point, to find nothing for more than 300 years…

How to recognize a spider trap

The easiest way to recognize a spider trap is by looking at your access logs. These logs contain a line for every visit to your site. Now as you can imagine, on larger sites your access logs get big very quickly. Here at Yoast, we use a so-called ELK-stack to monitor our website’s logs, but I’ve personally also used SEO log file analyzer by Screaming Frog to do this.

Logs from an ELK stack, showing a graph of indexing, URLs, timestamp, user agents and more

An example of logs in our ELK stack

What you want to do is look at only Googlebot’s visits, and then start looking for patterns. In most cases, they’ll jump straight out at you. It’s not uncommon for spider traps to take up 20-30% or even larger chunks of all the crawls. If you can’t find them immediately, start grouping crawls, looking for patterns within URLs. You can start from the beginning of the URL, provided you have clean URLs. If your URLs are slightly more cumbersome, you’ll have to create groups manually.

An ELK stack makes this very easy because you can search and segment quickly:


An example of filtering for the word “SEO” within our Googlebot hits

How do you solve a spider trap?

Solving a spider trap can be a tricky thing. In our case, we don’t want /suggest/ to be indexed at all, so we just blocked it entirely with robots.txt. In other cases, you cannot do that as easily. For faceted navigation, you have to think long and hard about which facets you’d like Google to crawl and index.

In general, there are three types of solutions:

  1. Block (a section of) the URLs with robots.txt.
  2. Add rel=nofollow and noindex,follow on specific subsets of links and pages and use rel=canonical wisely.
  3. Fix the trap by no longer generating endless amounts of URLs.

In the case of the Guardian, we could simply prevent linking to dates where we had no articles. In the case of Yoast.com’s suggest tool, we simply blocked the URL in robots.txt. If you’re working with faceted search, the solution is, usually and unfortunately, not that simple. The best first step to take is to use a form of faceted search that doesn’t create crawlable URLs all the time. Checkboxes are better than straight links, in that regard.
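
For our suggest tool, that robots.txt block boiled down to a rule like this (a minimal sketch):

User-agent: *
Disallow: /suggest/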

In all, finding and closing a spider trap is one of the more rewarding things an SEO can do for a website. It’s good fun, but can certainly also be hard. If you have fun examples of spider traps, please do share them in the comments!

Read more: ‘Robots.txt: the ultimate guide’ »

The post Closing a spider trap: fix crawl inefficiencies appeared first on Yoast.

How does a new website start ranking? Does it just magically appear in Google after you’ve launched it? What things do you have to do to start ranking in Google and get traffic from the search engines? Here, I explain the first steps you’ll need to take right after the launch of your new website. Learn how to start working on the SEO for a new website!


First: you’ll need to have an external link

One of my closest friends launched an online store for birthday party packages last week. It’s all in Dutch and it’s not WordPress (wrong choice of course, but I love her all the same :-)). After my friend launched her website, she celebrated and asked her friends, including me, what they thought of her new site. I love her site, but I couldn’t find it in Google, not even when I googled the exact domain name. My first question to my friend was: do you have another site linking to your site? And her answer was ‘no’. I linked to her site from my personal site, and after half a day, her website popped up in the search results. So the very first step when working on SEO for a new website: get at least one external link.

Why do you need an external link?

Google is a search engine that follows links. For Google to know about your site, it has to find it by following a link from another site. Google found my friend’s site because I put a link to that site on my personal site. When Google came around to crawl my site after I put the link there, it discovered the existence of my friend’s site. And indexed it. After indexing the site, it started to show the site in the search results.

Read more: ‘What does Google do?’ »

Next step: tweak your settings…

After that first link, your site will probably turn up in the search results. If it doesn’t, it could be that your site is set to noindex or is still blocked by robots.txt. If that’s the case, you’re telling Google not to index your site. Sometimes developers forget to turn either of these off after they’ve finished working on your site.
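
For example, a leftover robots.txt like this blocks your entire site from being crawled:

User-agent: *
Disallow: /

And a site-wide noindex looks like this in the source of your pages:

<meta name="robots" content="noindex, nofollow">

Make sure neither is left over from the development phase.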

Some pages are just not the best landing pages. You don’t want people landing on your check out page, for instance. And you don’t want this page to compete with other – useful – content or product pages to show up in the search results. Pages you don’t want to pop up in the search results ever (but there aren’t many of these) should have a noindex.

Yoast SEO can help you to set these pages to noindex. That means Google will not save this page in the index and it’ll not turn up in the search results.

Keep reading: ‘The ultimate guide to the robots meta tag’ »

Important third step: keyword research

My friend’s site now ranks for her domain name. That’s about it. She’s got some work to do to start ranking for other terms as well. When you want to improve the SEO for a new website, you have to carry out some proper keyword research. So go find out what your audience is searching for! What words do they use?

If you execute your keyword research properly, you’ll end up with a long list of search terms you want to be found for. Make sure to search for those terms in Google yourself. What results are there already? Who will be your online competitors for these search terms? What can you do to stand out from these results?

Read on: ‘Keyword research: the ultimate guide’ »


And then: write, write, write

Then you start writing. Write about all those topics that are important to your audience. Use the words you came up with in your keyword research. You need to have content about the topics you want to rank for to start ranking in the search results.

Read more: ‘How to write a high quality and SEO-friendly blog post’ »

But also: improve those snippets

Take a look at your results in the search engines once you start ranking (the so-called snippets). Are the meta descriptions and titles of those search results inviting? Are they tempting enough for your audience to click on them? Or should you write better ones?

Yoast SEO helps you to write great titles and meta descriptions. Use our snippet preview to create awesome snippets. That’ll really help in attracting traffic to your site.

Keep reading: ‘The snippet preview: what it means and how to use it?’ »

Think about site structure

Which pages and posts are most important? These should have other pages and posts linking to them. Make sure to link to your most important content. Google will follow your links, and the posts and pages that have the most internal links will be most likely to rank high in the search engines. Setting up such a structure is basically telling Google which articles are important and which aren’t. Our brand-new text link counter can be a great help to see if you’re linking often enough to your most important content.

Read on: ‘Internal linking for SEO: why and how’ »

Finally: do some link building

Google follows links. Links are important. So get the word out. Reach out to other site owners – preferably of topically related websites – and ask them to write about your new site. If Google follows multiple links to your website, it’ll crawl it more often. This is crucial when you do the SEO for a new website, and it will eventually help your rankings. Don’t go overboard in link building though; buying links is still a no-go.

Read more: ‘Link building: what not to do?’ »