What’s technical SEO? 8 technical aspects everyone should know

An SEO Basics post about technical SEO might seem like a contradiction in terms. Nevertheless, some basic knowledge about the more technical side of SEO can mean the difference between a high ranking site and a site that doesn’t rank at all. Technical SEO isn’t easy, but here we’ll explain – in layman’s language – which aspects you should (ask your developer to) pay attention to when working on the technical foundation of your website.

What is technical SEO?

Technical SEO refers to improving the technical aspects of a website in order to increase the ranking of its pages in the search engines. Making a website faster, easier to crawl and understandable for search engines are the pillars of technical optimization. Technical SEO is part of on-page SEO, which focuses on improving elements on your website to get higher rankings. It’s the opposite of off-page SEO, which is about generating exposure for a website through other channels.

Why should you optimize your site technically?

Google and other search engines want to present their users with the best possible results for their query. Therefore, Google’s robots crawl and evaluate web pages on a multitude of factors. Some factors are based on the user’s experience, like how fast a page loads. Other factors help search engine robots grasp what your pages are about. That’s what structured data, amongst other things, does. So, by improving technical aspects you help search engines crawl and understand your site. If you do this well, you might be rewarded with higher rankings or even rich results.

It also works the other way around: if you make serious technical mistakes on your site, they can cost you. You wouldn’t be the first to block search engines entirely from crawling your site by accidentally adding a trailing slash in the wrong place in your robots.txt file.

But it’s a misconception you should focus on technical details of a website just to please search engines. A website should work well – be fast, clear and easy to use – for your users in the first place. Fortunately, creating a strong technical foundation often coincides with a better experience for both users and search engines.

What are the characteristics of a technically optimized website?

A technically sound website is fast for users and easy to crawl for search engine robots. A proper technical setup helps search engines to understand what a site is about and it prevents confusion caused by, for instance, duplicate content. Moreover, it doesn’t send visitors or search engines down dead-end streets because of non-working links. Below, we’ll briefly go into some important characteristics of a technically optimized website.

1. It’s fast

Nowadays, web pages need to load fast. People are impatient and don’t want to wait for a page to open. As early as 2016, research showed that 53% of mobile website visitors leave if a webpage doesn’t open within three seconds. So if your website is slow, people get frustrated and move on to another website, and you’ll miss out on all that traffic.

Google knows slow web pages offer a less than optimal experience. Therefore they prefer web pages that load faster. So, a slow web page also ends up further down the search results than its faster equivalent, resulting in even less traffic.

Wondering if your website is fast enough? Read how to easily test your site speed. Most tests will also give you pointers on what to improve. We’ll guide you through common site speed optimization tips here.

2. It’s crawlable for search engines

Search engines use robots to crawl or spider your website. The robots follow links to discover content on your site. A great internal linking structure will make sure that they’ll understand what the most important content on your site is.

But there are more ways to guide robots. You can, for instance, block them from crawling certain content if you don’t want them to go there. You can also let them crawl a page, but tell them not to show this page in the search results or not to follow the links on that page.

Robots.txt file

You can give robots directions on your site by using the robots.txt file. It’s a powerful tool, which should be handled carefully. As we mentioned in the beginning, a small mistake might prevent robots from crawling (important parts of) your site. Sometimes, people unintentionally block their site’s CSS and JS files in the robots.txt file. These files contain code that tells browsers what your site should look like and how it works. If those files are blocked, search engines can’t find out if your site works properly.
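
To give you an idea of how small such a mistake can be, compare these two robots.txt examples (hypothetical folder name); the only difference is what follows Disallow:

# Keeps robots out of one folder only
User-agent: *
Disallow: /tmp/

# Keeps robots out of the ENTIRE site
User-agent: *
Disallow: /

Likewise, a rule such as Disallow: /wp-includes/ on a WordPress site would keep search engines away from the CSS and JS files stored in that folder.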

All in all, we recommend really diving into robots.txt if you want to learn how it works. Or, perhaps even better, let a developer handle it for you!

The meta robots tag

The robots meta tag is a piece of code that you won’t see on the page as a visitor. It’s in the source code, in the so-called head section of a page. Robots read this section when they find a page. In it, they’ll find information about what’s on the page and what they’re supposed to do with it.

If you want search engine robots to crawl a page, but to keep it out of the search results for some reason, you can tell them with the robots meta tag. With the robots meta tag, you can also instruct them to crawl a page, but not to follow the links on the page. With Yoast SEO it’s easy to noindex or nofollow a post or page. Learn for which pages you’d want to do that.
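
For example, a robots meta tag that keeps a page out of the search results but still lets robots follow the links on it looks like this in the head section of the page:

<meta name="robots" content="noindex, follow" />

Replace follow with nofollow if you don’t want the links on that page to be followed either.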

Read more: https://yoast.com/what-is-crawlability/

3. It doesn’t have (many) dead links

We’ve discussed that slow websites are frustrating. What might be even more annoying for visitors than a slow page, is landing on a page that doesn’t exist at all. If a link leads to a non-existing page on your site, people will encounter a 404 error page. There goes your carefully crafted user experience!

What’s more, search engines don’t like to find these error pages either. And, they tend to find even more dead links than visitors encounter because they follow every link they bump into, even if it’s hidden.

Unfortunately, most sites have (at least) some dead links, because a website is a continuous work in progress: people make things and break things. Fortunately, there are tools that can help you retrieve dead links on your site. Read about those tools and how to solve 404 errors.

To prevent unnecessary dead links, you should always redirect the URL of a page when you delete it or move it. Ideally, you’d redirect it to a page that replaces the old page. With Yoast SEO Premium, you can easily make redirects yourself. No need for a developer!
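
If you’re not using a plugin, a redirect is usually set up on the server instead. On an Apache server, for instance, a single line in your .htaccess file could do it (a sketch with hypothetical URLs):

Redirect 301 /old-page/ https://www.example.com/new-page/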

Read more: https://yoast.com/what-is-a-redirect/

4. It doesn’t confuse search engines with duplicate content

If you have the same content on multiple pages of your site – or even on other sites – search engines might get confused. Because, if these pages show the same content, which one should they rank highest? As a result, they might rank all pages with the same content lower.

Unfortunately, you might have a duplicate content issue without even knowing it. Because of technical reasons, different URLs can show the same content. For a visitor, this doesn’t make any difference, but for a search engine it does; it’ll see the same content on a different URL.

Luckily, there’s a technical solution to this issue. With the so-called canonical link element you can indicate what the original page – or the page you’d like to rank in the search engines – is. In Yoast SEO you can easily set a canonical URL for a page. And, to make it easy for you, Yoast SEO adds self-referencing canonical links to all your pages. This will help prevent duplicate content issues that you might not even be aware of.
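
Such a canonical link element is just one line in the head section of a page, pointing at the version you want search engines to rank (hypothetical URL):

<link rel="canonical" href="https://www.example.com/original-page/" />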

5. It’s secure

A technically optimized website is a secure website. Making your website safe for users to guarantee their privacy is a basic requirement nowadays. There are many things you can do to make your (WordPress) website secure, and one of the most crucial things is implementing HTTPS.

HTTPS makes sure that no-one can intercept the data that’s sent between the browser and the site. So, for instance, if people log in to your site, their credentials are safe. You’ll need a so-called SSL certificate to implement HTTPS on your site. Google acknowledges the importance of security and therefore made HTTPS a ranking signal: secure websites rank higher than unsafe equivalents.
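
Once the SSL certificate is installed, you’ll also want to send all visitors to the HTTPS version of your site. On an Apache server, that’s often done with a rewrite in the .htaccess file – a minimal sketch, assuming mod_rewrite is enabled:

RewriteEngine On
RewriteCond %{HTTPS} off
RewriteRule ^(.*)$ https://%{HTTP_HOST}%{REQUEST_URI} [L,R=301]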

You can easily check if your website uses HTTPS in most browsers. On the left-hand side of your browser’s address bar, you’ll see a lock if it’s safe. If you see the words “not secure”, you (or your developer) have some work to do!

Read more: SEO Basics: What is HTTPS?

6. Plus: it has structured data

Structured data helps search engines understand your website, content or even your business better. With structured data you can tell search engines what kind of product you sell or which recipes you have on your site. Plus, it will give you the opportunity to provide all kinds of details about those products or recipes.

Because there’s a fixed format (described on Schema.org) in which you should provide this information, search engines can easily find and understand it. It helps them to place your content in a bigger picture. Here, you can read a story about how it works and how Yoast SEO helps you with that.

Implementing structured data can bring you more than just a better understanding by search engines. It also makes your content eligible for rich results; those shiny results with stars or details that stand out in the search results.
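
As a small illustration, a product could be described with a JSON-LD snippet like this in the page’s source code (hypothetical values; the exact properties depend on what you’re marking up):

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Product",
  "name": "Example product",
  "description": "A short description of the product.",
  "offers": {
    "@type": "Offer",
    "price": "24.95",
    "priceCurrency": "EUR"
  }
}
</script>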

7. Plus: It has an XML sitemap

Simply put, an XML sitemap is a list of all the pages of your site. It serves as a roadmap of your site for search engines. With it, you’ll make sure search engines won’t miss any important content on your site. The XML sitemap is often categorized into posts, pages, tags or other custom post types, and includes the number of images and the last modified date for every page.

Ideally, a website doesn’t need an XML sitemap. If it has an internal linking structure which connects all content nicely, robots won’t need it. However, not all sites have a great structure, and having an XML sitemap won’t do any harm. So we’d always advise having an XML sitemap on your site.
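
A stripped-down XML sitemap with just one entry looks something like this (hypothetical URL and date):

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.com/important-page/</loc>
    <lastmod>2017-08-01</lastmod>
  </url>
</urlset>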

8. Plus: International websites use hreflang

If your site targets more than one country or countries where the same language is spoken, search engines need a little help to understand which countries or language you’re trying to reach. If you help them, they can show people the right website for their area in the search results.

Hreflang tags help you do just that. For each page, you can define which country and language it is meant for. This also solves a possible duplicate content problem: even if your US and UK site show the same content, Google will know it’s written for a different region.
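
For example, a page aimed at English speakers in both the US and the UK could carry hreflang tags like these in its head section (hypothetical URLs):

<link rel="alternate" hreflang="en-us" href="https://www.example.com/us/" />
<link rel="alternate" hreflang="en-gb" href="https://www.example.com/uk/" />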

Optimizing international websites is quite a specialism. If you’d like to learn how to make your international sites rank, we’d advise taking a look at our Multilingual SEO training.

Want to learn more about this?

So this is technical SEO in a nutshell. It’s quite a lot already, while we’ve only scratched the surface here. There’s so much more to tell about the technical side of SEO! If you want to take a deep-dive into technical SEO, we’d advise our Technical SEO training or Structured data training. With these courses, you’ll learn how to create a solid technical foundation for your own website.

PS: Are you the ambitious type? Get both training courses together and save $59!

Read more: https://yoast.com/wordpress-seo/

How to keep your page out of the search results

If you want to keep a page out of the search results, there are a number of things you can do. Most of ’em are not hard, and you can implement them without a ton of technical knowledge. If you can check a box, your content management system will probably have an option for that, or it allows nifty plugins like our own Yoast SEO to help you keep the page from showing up in search results. In this post, I won’t give you difficult ways to go about this. I’ll simply tell you what steps to take and things to consider.

Why do you want to keep your page out of the search results?

It sounds like a simple question, but it’s not, really. Why do you want to keep your page out of the search results in the first place? If you don’t want that page indexed, perhaps you shouldn’t publish it? There are obvious candidates, like your internal search result pages, or a “Thank you” page after an order or newsletter subscription that is of no use to other visitors. But when it comes to your actual, informative pages, there really should be a good reason to block these. Feel free to drop yours in the comments below this post.

If you don’t have a good reason, simply don’t write that page.

Private pages

If your website contains a section that is targeted at, for instance, an internal audience or a so-called extranet, you should consider making that information password-protected. A section of your site that can only be reached after filling out login details won’t be indexed. Search engines simply have no way to log in and visit these pages.

If you are using WordPress, and are planning a section like this on your site, please read Chris Lema’s article about the membership plugins he compared.

Noindex your page

Like that aforementioned “Thank you” page, there might be more pages you want to block. And you might even have some pages left after critically reviewing whether they should be on your site at all. The right way to keep a page out of the search results is to add a robots meta tag. We have written a lengthy article about that robots meta tag before; be sure to read that.

Adding it to your page is simple: you need to add that tag to the <head> section of your page, in the source code. You’ll find examples from the major search engines linked in the robots meta article as well.

Are you using WordPress, TYPO3 or Magento? Things are even easier. Please read on.

Noindex your page with Yoast SEO

The content management systems mentioned above all have the option to install our Yoast SEO plugin or extension. In that plugin or extension, you have the option to noindex a page right from your editor.

In this example, I’ll use screenshots from the meta box in Yoast SEO for WordPress. You’ll find it in the post or page editor, below the copy you’ve written. In Magento and TYPO3 you can find it in similar locations.

Screenshot: the Advanced tab of the Yoast SEO meta box

  • Click the Advanced tab in our Yoast SEO meta box. It’s the cog symbol on the left.
  • Use the selector at “Allow search engines to show this post/page in search results”, set that to “No”, and you are done.

The second option in the screenshot is about following the links on that page. That allows you to keep your page out of the search results, but still have search engines follow the links on that page, as these (internal) links matter for your other pages (again, read the robots meta article for more information). The third option: leave that as is; this is what you have set for the site-wide robots meta settings.

It’s really that simple: select the right value and your page will tell search engines to either keep the page in or out of the search results.

The last thing I want to mention here is: use with care. This robots meta setting will truly prevent a page from being indexed, unlike a robots.txt suggestion to leave a page out of the search result pages. Google might ignore the latter, for instance when a lot of inbound links point to the page.

If you want to read up on how to keep your site from being indexed, please read Preventing your site from being indexed, the right way. Good luck optimizing!

SEO for a new website: the very first things to do

How does a new website start ranking? Does it just magically appear in Google after you’ve launched it? What things do you have to do to start ranking in Google and get traffic from the search engines? Here, I explain the first steps you’ll need to take right after the launch of your new website. Learn how to start working on the SEO for a new website!

First: you’ll need to have an external link

One of my closest friends launched a birthday party packages online store last week. It’s all in Dutch and it’s not WordPress (wrong choice of course, but I love her all the same :-)). After my friend launched her website, she celebrated and asked her friends, including me, what they thought of her new site. I love her site, but couldn’t find her in Google, not even if I googled the exact domain name. My first question to my friend was: do you have another site linking to your site? And her answer was ‘no’. I linked to her site from my personal site and after half a day, her website popped up in the search results. The very first step when working on SEO for a new website: getting at least one external link.

Why do you need an external link?

Google is a search engine that follows links. For Google to know about your site, it has to find it by following a link from another site. Google found my friend’s site because I put a link to that site on my personal site. When Google came around to crawl my site after I put the link there, it discovered the existence of my friend’s site. And indexed it. After indexing the site, it started to show the site in the search results.

Read more: ‘What does Google do?’ »

Next step: tweak your settings…

After that first link, your site probably will turn up in the search results. If it doesn’t turn up, it could be that your site is still set to noindex or blocked by robots.txt. If that’s the case, you’re telling Google not to index your site. Sometimes developers forget to turn either of these off after they’ve finished working on your site.

Some pages are just not the best landing pages. You don’t want people landing on your checkout page, for instance. And you don’t want this page to compete with other, useful, content or product pages in the search results. Pages you don’t ever want to show up in the search results (but there aren’t many of these) should get a noindex.

Yoast SEO can help you to set these pages to noindex. That means Google will not save this page in the index and it’ll not turn up in the search results.

Keep reading: ‘The ultimate guide to the robots meta tag’ »

Important third step: keyword research

My friend’s site now ranks on her domain name. That’s about it. She’s got some work to do to start ranking on other terms as well. When you want to improve the SEO for a new website, you have to carry out some proper keyword research. So go find out what your audience is searching for! What words do they use?

If you execute your keyword research properly, you’ll end up with a long list of search terms you want to be found for. Make sure to search for those terms in Google yourself. What results are there already? Who will be your online competitors for these search terms? What can you do to stand out from these results?

Read on: ‘Keyword research: the ultimate guide’ »

And then: write, write, write

Then you start writing. Write about all those topics that are important to your audience. Use the words you came up with in your keyword research. You need to have content about the topics you want to rank for to start ranking in the search results.

Read more: ‘How to write a high quality and SEO-friendly blog post’ »

But also: improve those snippets

Take a look at your results in the search engines once you start ranking (the so-called snippets). Are the meta descriptions and the titles of those search results inviting? Are they tempting enough for your audience to click on them? Or should you write better ones?

Yoast SEO helps you to write great titles and meta descriptions. Use our snippet preview to create awesome snippets. That’ll really help in attracting traffic to your site.

Keep reading: ‘The snippet preview: what it means and how to use it?’ »

Think about site structure

Which pages and posts are most important? These should have other pages and posts linking to them. Make sure to link to your most important content. Google will follow your links; the posts and pages that have the most internal links will be most likely to rank high in the search engines. Setting up such a structure is basically telling Google which articles are important and which aren’t. Our brand new text link counter can be a great help to see if you’re linking often enough to your most important content.

Read on: ‘Internal linking for SEO: why and how’ »

Finally: do some link building

Google follows links. Links are important. So get the word out. Reach out to other site owners – preferably of topically related websites – and ask them to write about your new site. If Google follows multiple links to your website, it’ll crawl it more often. This is crucial when you do the SEO for a new website, and will eventually help in your rankings. Don’t go overboard in link building for SEO though, buying links is still a no-go:

Read more: ‘Link building: what not to do?’ »

Preventing your site from being indexed, the right way

We’ve said it in 2009, and we’ll say it again: it keeps amazing us that there are still people using just a robots.txt file to keep their site out of Google or Bing. As a result, their site shows up in the search engines anyway. You know why it keeps amazing us? Because robots.txt doesn’t actually keep your site out of the search results, even though it does prevent indexing of your site. Let me explain how this works in this post.

For more on robots.txt, please read robots.txt: the ultimate guide.

There is a difference between being indexed and being listed in Google

Before we explain things any further, we need to go over some terms here first:

  • Indexed / Indexing
    The process of downloading a site or a page’s content to the server of the search engine, thereby adding it to its “index”.
  • Ranking / Listing / Showing
    Showing a site in the search result pages (aka SERPs).

So, while the most common process goes from indexing to listing, a site doesn’t have to be indexed to be listed. If a link points to a page, domain or wherever, Google follows that link. If the robots.txt on that domain prevents the search engine from indexing that page, Google can still show the URL in the results if it can gather from other variables that it might be worth looking at. In the old days, that could have been DMOZ or the Yahoo! directory, but I can imagine Google using, for instance, your My Business details these days, or the old data from these projects. There are more sites that summarize your website, after all.

Now if the explanation above doesn’t make sense, have a look at this 2009 Matt Cutts video explanation:

If you have reasons to prevent indexing of your website, adding that request to the specific page you want to block, like Matt is talking about, is still the right way to go. But you’ll need to let Google see that meta robots tag. So, if you want to effectively hide pages from the search engines, you need them to index those pages, even though that might seem contradictory. There are two ways of doing that.

Prevent listing of your page by adding a meta robots tag

The first option to prevent listing of your page is by using robots meta tags. We’ve got an ultimate guide on robots meta tags that’s more extensive, but it basically comes down to adding this tag to your page:

<meta name="robots" content="noindex,nofollow">

The issue with a tag like that is that you have to add it to each and every page.

Or by adding a X-Robots-Tag HTTP header

To make the process of adding the meta robots tag to every single page of your site a bit easier, the search engines came up with the X-Robots-Tag HTTP header. This allows you to specify an HTTP header called X-Robots-Tag and set the value as you would the meta robots tag’s value. The cool thing about this is that you can do it for an entire site. If your site is running on Apache, and mod_headers is enabled (it usually is), you could add the following single line to your .htaccess file:

Header set X-Robots-Tag "noindex, nofollow"

This would have the effect that the entire site can be indexed, but will never be shown in the search results.

So, get rid of that robots.txt file with Disallow: / in it. Use the X-Robots-Tag or that meta robots tag instead!

Read more: ‘The ultimate guide to the meta robots tag’ »

Ask Yoast: Block your site’s search results pages?

Every website should have a decent internal search functionality that shows the visitors search results that fit their search query. However, those search results pages on your site don’t need to be shown in Google’s search results. In fact, Google advises against this too; it’s not a great user experience to click on a Google search result, just to end up on a search result page of your site. Learn what’s best practice to prevent this from happening!

User experience is not the only reason to prevent Google from including these pages in their search results. Spam domains can also abuse your search results pages, which is what happened to Krunoslav from Croatia. He therefore emailed Ask Yoast:

“Some spam domains were linking to the search results pages on my WordPress site. So what could I do to block Google from accessing my site search results? Is there any code that I could put in robots.txt?”

Check out the video or read the answer below!

Block your search results pages?

In the video, we explain what you could do to prevent Google from showing your site’s search results:

“Well, to be honest, I don’t think I would block them. What you could do, is try two different things:

1. One is do nothing and run our Yoast SEO plugin. We’ll automatically noindex all the search result pages on your site. But if that leads to weird rankings or to other stuff that is not really working for you, then you could do another thing:

2. The second way is to block them by putting Disallow: /?s=* in your robots.txt. This basically means that you’re blocking Google from crawling all your search result URLs. I don’t know whether that’s the best solution though.

I would try noindex first and see if that does anything. If it doesn’t, then use the method of blocking your search results in your robots.txt.

Good luck!”
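
For reference, on a default WordPress install, where the internal search results live on URLs with the ?s= parameter, such a robots.txt rule could look like this:

User-agent: *
Disallow: /?s=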

Ask Yoast

In the series Ask Yoast we answer SEO questions from followers. Need some advice about SEO? Let us help you out! Send your question to ask@yoast.com.

Read more: ‘Block your site’s search results pages’ »

Block your site’s search result pages

Why should you block your internal search result pages for Google? Well, how would you feel if you were in dire need of the answer to your search query and ended up on the internal search pages of a certain website? That’s one crappy experience. Google thinks so too, and prefers you not to have these internal search pages indexed.

Google considers these search results pages to be of lower quality than your actual informational pages. That doesn’t mean these internal search pages are useless, but it makes sense to block these internal search pages.

Back in 2007

Ten years ago, Google, or more specifically Matt Cutts, told us that we should block these pages in our robots.txt. The reason for that:

Typically, web search results don’t add value to users, and since our core goal is to provide the best search results possible, we generally exclude search results from our web search index. (Not all URLs that contains things like “/results” or “/search” are search results, of course.)
– Matt Cutts (2007)

Nothing changed, really. Even after 10 years of SEO changes, this remains the same. The Google Webmaster Guidelines still state that you should “Use the robots.txt file on your web server to manage your crawling budget by preventing crawling of infinite spaces such as search result pages.” Furthermore, the guidelines state that webmasters should avoid techniques like automatically generated content, in this case, “Stitching or combining content from different web pages without adding sufficient value”.

However, blocking internal search pages in your robots.txt doesn’t seem the right solution. In 2007, it even made more sense to simply redirect the user to the first result of these internal search pages. These days, I’d rather use a slightly different solution.

Blocking internal search pages in 2017

I believe nowadays, using a noindex, follow meta robots tag is the way to go instead. It seems Google ‘listens’ to that meta robots tag and sometimes ignores the robots.txt. That happens, for instance, when a surplus of backlinks to a blocked page tells Google it is of interest to the public anyway. We’ve already mentioned this in our Ultimate guide to robots.txt.

The 2007 reason is still the same in 2017, by the way: linking to search pages from search pages delivers a poor experience for a visitor. For Google, on a mission to deliver the best result for your query, it makes a lot more sense to link directly to an article or another informative page.

Yoast SEO will block internal search pages for you

If you’re on WordPress and using our plugin, you’re fine. We’ve got you covered:

Screenshot: the option to block internal search pages in Yoast SEO

That’s located at SEO › Titles & Metas › Archives. Most other content management systems allow for templates for your site’s search results as well, so adding a simple line of code to that template will suffice:
<meta name="robots" content="noindex,follow"/>

Meta robots AND robots.txt?

If you try to block internal search pages by adding that meta robots tag and disallowing them in your robots.txt, please think again. Just the meta robots tag will do. Otherwise, you’ll risk losing the link value of these pages (hence the follow in the meta tag): if Google listens to your robots.txt, it will never crawl the page and never see the meta robots tag at all. And that’s not what you want. So just use the meta robots tag!

Back to you

Did you block your internal search results? And how did you do that? Go check for yourself! Any further insights or experiences are appreciated; just drop us a line in the comments.

Read more: ‘Robots.txt: the ultimate guide’ »

SEO basics: What is crawlability?

Ranking in the search engines requires a website with flawless technical SEO. Luckily, the Yoast SEO plugin takes care of (almost) everything on your WordPress site. Still, if you really want to get the most out of your website and keep on outranking the competition, some basic knowledge of technical SEO is a must. In this post, I’ll explain one of the most important concepts of technical SEO: crawlability.

What is the crawler again?

A search engine like Google consists of a crawler, an index and an algorithm. The crawler follows the links. When Google’s crawler finds your website, it’ll read it and its content is saved in the index.

A crawler follows the links on the web. A crawler is also called a robot, a bot, or a spider. It goes around the internet 24/7. Once it comes to a website, it saves the HTML version of a page in a gigantic database, called the index. This index is updated every time the crawler comes around your website and finds a new or revised version of it. Depending on how important Google deems your site and the amount of changes you make on your website, the crawler comes around more or less often.

Read more: ‘SEO basics: what does Google do’ »

And what is crawlability?

Crawlability has to do with the possibilities Google has to crawl your website. There are a few ways to block a crawler from your site. If your website or a page on your website is blocked, you’re saying to Google’s crawler: “do not come here”. In most of these cases, your site or the respective page won’t turn up in the search results.
There are a few things that could prevent Google from crawling (or indexing) your website:

  • If your robots.txt file blocks the crawler, Google will not come to your website or specific web page.
  • Before crawling your website, the crawler will take a look at the HTTP header of your page. This HTTP header contains a status code (see the example response lines after this list). If this status code says that a page doesn’t exist, Google won’t crawl your website. In the module about HTTP headers of our (soon to be launched!) Technical SEO training we’ll tell you all about that.
  • If the robots meta tag on a specific page blocks the search engine from indexing that page, Google will crawl that page, but won’t add it to its index.
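
For instance, these are a few of the status codes a crawler might get back (simplified examples):

HTTP/1.1 200 OK                 (the page exists and can be crawled)
HTTP/1.1 301 Moved Permanently  (the page has moved; the crawler follows the redirect)
HTTP/1.1 404 Not Found          (the page doesn't exist)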

This flow chart might help you understand the process bots follow when attempting to index a page:

Want to learn all about crawlability?

Although crawlability is just the very basics of technical SEO (it has to do with all the things that enable Google to index your site), for most people it’s already pretty advanced stuff. Nevertheless, if you’re blocking – perhaps even without knowing! – crawlers from your site, you’ll never rank high in Google. So, if you’re serious about SEO, this should matter to you.

If you really want to understand all the technical aspects concerning crawlability, you should definitely check out our Technical SEO 1 training, which will be released this week. In this SEO course, we’ll teach you how to detect technical SEO issues and how to solve them (with our Yoast SEO plugin).

Keep reading: ‘How to get Google to crawl your site faster’ »

 

Ask Yoast: should I redirect my affiliate links?

There are several reasons for cloaking or redirecting affiliate links. For instance, it’s easier to work with affiliate links when you redirect them, plus you can make them look prettier. But do you know how to cloak affiliate links? We explained how the process works in one of our previous posts. This Ask Yoast is about the method of cloaking affiliate links we gave you in that post. Is it still a good idea to redirect affiliate links via the script we described?

Elias Nilson emailed us, saying that he read our article about cloaking affiliate links and he’s wondering if the solution is still up-to-date.

“Is it still a good idea to redirect affiliate links via the script you describe in your post?”

Check out the video or read the answer below!

Redirect affiliate links

Read this transcript to figure out if it is still a valid option to redirect affiliate links via the described script. Want to see the script directly? Read this post: ‘How to cloak affiliate links’:

Honestly, yes. Recently we updated the post about cloaking affiliate links, so the post, and therefore the script, is still up to date. Link cloaking, which sounds negative because we use the word ‘cloaking’, is basically hiding from Google that you’re an affiliate. And if you’re an affiliate, that’s still something you want to do, because Google usually ranks original content that is not by affiliates better than it ranks affiliate content.

So, yes, I’d still recommend that method. The link will be below this post, so you can see the original post that we are referring to. It’s a very simple method to cloak your affiliate links, and I think it works in probably the best way that I know.

So, keep going. Good luck.
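
The exact script is in the post linked above. In general, though, the technique boils down to sending affiliate links through a redirect directory that you keep away from search engines. A minimal sketch, assuming a hypothetical /out/ directory and made-up affiliate URLs (not the actual script from that post):

<?php
// out/index.php - maps a clean slug (e.g. /out/index.php?id=example-product)
// to the matching affiliate URL and redirects the visitor there.
$redirects = array(
	'example-product' => 'https://www.example-affiliate.com/product?ref=12345',
);
$id = isset( $_GET['id'] ) ? $_GET['id'] : '';
// Tell search engines not to index or follow these redirect URLs.
header( 'X-Robots-Tag: noindex, nofollow', true );
if ( isset( $redirects[ $id ] ) ) {
	header( 'Location: ' . $redirects[ $id ], true, 302 );
} else {
	header( 'Location: https://www.example.com/', true, 302 );
}
exit;

On top of that, you’d block the /out/ directory in your robots.txt with Disallow: /out/, so search engines don’t crawl those redirects at all.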

Ask Yoast

In the series Ask Yoast we answer SEO questions from followers. Need help with SEO? Let us help you out! Send your question to ask@yoast.com.

Read more: ‘How to cloak your affiliate links’ »

Ask Yoast: nofollow layered navigation links?

If you have a big eCommerce site with lots of products, layered navigation can help your users to narrow down their search results. Layered or faceted navigation is an advanced way of filtering by providing groups of filters for (many) product attributes. In this filtering process, you might create a lot of URLs though, because the user will be able to filter and thereby group items in many ways, and those groups will all be available on separate URLs. So what should you do with all these URLs? Do you want Google to crawl them all?

In this Ask Yoast, we’ll answer a question from Daniel Jacobsen:

“Should I nofollow layered navigation links? And if so, why? Are there any disadvantages of this?”

Check out the video or read the answer below!

Layered navigation links

Read this transcript to learn how to deal with layered or faceted navigation links:

“The question is: “Why would you want to do that?” If you have too many URLs, so if you have a layered or a faceted navigation that has far too many options – creating billions of different types of URLs for Google to crawl – then probably yes. At the same time you need to ask yourself: “Why does my navigation work that way?” And, “Can we make it any different?” But in a lot of eCommerce systems that’s very hard. So in those cases adding a nofollow to those links does actually help to prevent Google from indexing each and every one of the versions of your site.

I’ve worked on a couple of sites with faceted navigation that had over a billion variations in URLs, even though they only had like 10,000 products. If that’s the sort of problem you have, then yes, you need to nofollow them and maybe you even need to use your robots.txt file to exclude some of those variants. So specific stuff that you don’t want indexed, for instance, if you don’t want color indexed, you could do a robots.txt line that says: “Disallow for everything that has color in the URL”. At that point you strip down what Google crawls and what it thinks is important. The problem with that is, that if Google has links pointing at that version from somewhere else, those links don’t count for your site’s ranking either.

So it’s a bit of a quid pro quo, where you have to think about what is the best thing to do. It’s a tough decision. I really would suggest getting an experienced technical SEO to look at your site if it really is a problem, because it’s not a simple cut-and-paste solution that works the same for every site.

Good luck!”
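
To make that a bit more concrete: a nofollowed facet link could look like this (hypothetical URL, assuming the color filter uses a color parameter):

<a href="/shoes?color=red" rel="nofollow">Red</a>

And a robots.txt rule that keeps crawlers away from every URL containing that filter could look like this:

User-agent: *
Disallow: /*color=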

Ask Yoast

In the series Ask Yoast we answer SEO questions from followers! Need help with SEO? Let us help you out! Send your question to ask@yoast.com.

Read more: ‘Internal search for online shops: an essential asset’ »

Playing with the X-Robots-Tag HTTP header

Traditionally, you will use a robots.txt file on your server to manage what pages, folders, subdomains or other content search engines will be allowed to crawl. But did you know that there’s also such a thing as the X-Robots-Tag HTTP header? In this post we’ll discuss what the possibilities are and how this might be a better option for your blog.

Quick recap: robots.txt

Before we continue, let’s take a look at what a robots.txt file does. In a nutshell, what it does is tell search engines to not crawl a particular page, file or directory of your website.

Using this helps both you and search engines such as Google. By not providing access to certain unimportant areas of your website, you can save on your crawl budget and reduce the load on your server.

Please note that using the robots.txt file to hide your entire website from search engines is definitely not recommended.

Say hello to X-Robots-Tag

Back in 2007, Google announced that they added support for the X-Robots-Tag directive. What this meant was that you not only could restrict access to search engines via a robots.txt file, you could also programmatically set various robots-related directives in the headers of an HTTP response. Now, you might be thinking “But can’t I just use the robots meta tag instead?”. The answer is yes. And no. If you plan on programmatically blocking a particular page that is written in HTML, then using the meta tag should suffice. But if you plan on keeping, let’s say, an image out of the search results, then you could use the HTTP response approach to do this in code. Obviously, you can always use the latter method if you don’t feel like adding additional HTML to your website.

X-Robots-Tag directives

As Sebastian explained in 2008, there are two different kinds of directives: crawler directives and indexer directives. I’ll briefly explain the difference below.

Crawler directives

The robots.txt file only contains the so-called ‘crawler directives’, which tell search engines where they are or aren’t allowed to go. By using the Allow directive, you can specify where search engines are allowed to crawl; Disallow does the exact opposite. Additionally, you can use the Sitemap directive to help search engines out and crawl your website even faster.

Note that it’s also possible to fine-tune the directives for a specific search engine by using the User-agent directive in combination with the other directives.

As Sebastian points out and explains thoroughly in another post, pages can still show up in search results if there are enough links pointing to them, despite being explicitly excluded with the Disallow directive. This basically means that if you want to really hide something from the search engines, and thus from people using search, robots.txt won’t suffice.

Indexer directives

Indexer directives are directives that are set on a per page and/or per element basis. Up until July 2007, there were two directives: the microformat rel=”nofollow”, which means that that link should not pass authority / PageRank, and the Meta Robots tag.
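
The first of those is simply an attribute on a link (hypothetical URL):

<a href="https://www.example.com/some-page/" rel="nofollow">Example link</a>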

With the Meta Robots tag, you can really prevent search engines from showing pages you want to keep out of the search results. The same result can be achieved with the X-Robots-Tag HTTP header. As described earlier, the X-Robots-Tag gives you more flexibility by also allowing you to control how specific file(types) are indexed.

Example uses of the X-Robots-Tag

Theory is nice and all, but let’s see how you could use the X-Robots-Tag in the wild!

If you want to prevent search engines from showing files you’ve generated with PHP, you could add the following at the top of the header.php file, before any output is sent to the browser:

header("X-Robots-Tag: noindex", true);

This would not prevent search engines from following the links on those pages. If you want to do that, then alter the previous example as follows:

header("X-Robots-Tag: noindex, nofollow", true);

Now, although using this method in PHP has its benefits, you’ll most likely end up wanting to block specific filetypes altogether. The more practical approach would be to add the X-Robots-Tag to your Apache server configuration or a .htaccess file.

Imagine you run a website which also has some .doc files, but you don’t want search engines to index that filetype for a particular reason. On Apache servers, you should add the following line to the configuration / a .htaccess file:

<FilesMatch "\.doc$">
Header set X-Robots-Tag "noindex, noarchive, nosnippet"
</FilesMatch>

Or, if you’d want to do this for both .doc and .pdf files:

<FilesMatch "\.(doc|pdf)$">
Header set X-Robots-Tag "noindex, noarchive, nosnippet"
</FilesMatch>

If you’re running Nginx instead of Apache, you can get a similar result by adding the following to the server configuration:

location ~* \.(doc|pdf)$ {
	add_header X-Robots-Tag "noindex, noarchive, nosnippet";
}

There are cases in which the robots.txt file itself might show up in search results. By using an alteration of the previous method, you can prevent this from happening to your website:

<FilesMatch "robots\.txt">
Header set X-Robots-Tag "noindex"
</FilesMatch>

And in Nginx:

location = /robots.txt {
	add_header X-Robots-Tag "noindex";
}

Conclusion

As you can see based on the examples above, the X-Robots-Tag HTTP header is a very powerful tool. Use it wisely and with caution, as you won’t be the first to block your entire site by accident. Nevertheless, it’s a great addition to your toolset if you know how to use it.

Read more: ‘Meta robots tag: the ultimate guide’ »