Duplicate content: Causes and solutions

Search engines like Google have a problem – it’s called ‘duplicate content’. Duplicate content means that similar content appears at multiple locations (URLs) on the web, and as a result search engines don’t know which URL to show in the search results. This can hurt the ranking of a webpage, and the problem only gets worse when people start linking to the different versions of the same content. This article will help you to understand the various causes of duplicate content, and to find the solution to each of them.

What is duplicate content?

Duplicate content is content which is available on multiple URLs on the web. Because more than one URL shows the same content, search engines don’t know which URL to list higher in the search results. Therefore they might rank both URLs lower and give preference to other webpages.

In this article, we’ll mostly focus on the technical causes of duplicate content and their solutions. If you’d like to get a broader perspective on duplicate content and learn how it relates to copied or scraped content or even keyword cannibalization, we’d advise you to read this post: What is duplicate content.

Let’s illustrate this with an example

Duplicate content can be likened to being at a crossroads where road signs point in two different directions for the same destination: Which road should you take? To make matters worse, the final destination is different too, but only ever so slightly. As a reader, you don’t mind because you get the content you came for, but a search engine has to pick which page to show in the search results because, of course, it doesn’t want to show the same content twice.

Let’s say your article about ‘keyword x’ appears at http://www.example.com/keyword-x/ and the same content also appears at http://www.example.com/article-category/keyword-x/. This situation is not fictitious: it happens in lots of modern Content Management Systems. Then let’s say your article has been picked up by several bloggers and some of them link to the first URL, while others link to the second. This is when the search engine’s problem shows its true nature: it’s your problem. The duplicate content is your problem because those links both promote different URLs. If they were all linking to the same URL, your chances of ranking for ‘keyword x’ would be higher.

If you don’t know whether your rankings are suffering from duplicate content issues, these duplicate content discovery tools will help you find out!

Causes of duplicate content

There are dozens of reasons for duplicate content. Most of them are technical: it’s not very often that a human decides to put the same content in two different places without making clear which is the original – it feels unnatural to most of us. There are many technical reasons though and it mostly happens because developers don’t think like a browser or even a user, let alone a search engine spider – they think like a programmer. Take that article we mentioned earlier, that appears on http://www.example.com/keyword-x/ and http://www.example.com/article-category/keyword-x/. If you ask the developer, they will say it only exists once.

Misunderstanding the concept of a URL

No, that developer hasn’t gone mad, they are just speaking a different language. A CMS will probably power the website, and in that database there’s only one article, but the website’s software just allows for that same article in the database to be retrieved through several URLs. That’s because, in the eyes of the developer, the unique identifier for that article is the ID that article has in the database, not the URL. But for the search engine, the URL is the unique identifier for a piece of content. If you explain that to a developer, they will begin to get the problem. And after reading this article, you’ll even be able to provide them with a solution right away.

Session IDs

You often want to keep track of your visitors and allow them, for instance, to store items they want to buy in a shopping cart. In order to do that, you have to give them a ‘session.’ A session is a brief history of what the visitor did on your site and can contain things like the items in their shopping cart. To maintain that session as a visitor clicks from one page to another, the unique identifier for that session – called the Session ID – needs to be stored somewhere. The most common solution is to do that with cookies. However, search engines don’t usually store cookies.

At that point, some systems fall back to using Session IDs in the URL. This means that every internal link on the website gets that Session ID added to its URL, and because that Session ID is unique to that session, it creates a new URL, and therefore duplicate content.

URL parameters used for tracking and sorting

Another cause of duplicate content is using URL parameters that do not change the content of a page, for instance in tracking links. You see, to a search engine, http://www.example.com/keyword-x/ and http://www.example.com/keyword-x/?source=rss are not the same URL. The latter might allow you to track what source people came from, but it might also make it harder for you to rank well – very much an unwanted side effect!

This doesn’t just go for tracking parameters, of course. It goes for every parameter you can add to a URL that doesn’t change the vital piece of content, whether that parameter is for ‘changing the sorting on a set of products’ or for ‘showing another sidebar’: all of them cause duplicate content.
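
To make this concrete, here is a minimal sketch of how a site could reduce such URLs to a single version before using them in internal links or canonical tags. The parameter names in NON_CONTENT_PARAMS are assumptions for illustration only; which parameters are safe to drop depends entirely on your own site.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical list of parameters that do not change the main content.
# Adjust this to the parameters your own system actually uses.
NON_CONTENT_PARAMS = {"source", "utm_source", "utm_medium", "utm_campaign", "sort"}


def strip_non_content_params(url: str) -> str:
    """Return the URL without parameters that leave the content unchanged."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in NON_CONTENT_PARAMS
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )


print(strip_non_content_params("http://www.example.com/keyword-x/?source=rss"))
```

With the assumed list above, the tracking URL from the example collapses back to http://www.example.com/keyword-x/, while parameters that are not in the list are left alone.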

Scrapers and content syndication

Most of the reasons for duplicate content are either the ‘fault’ of you or your website. Sometimes, however, other websites use your content, with or without your consent. They don’t always link to your original article, and therefore the search engine doesn’t ‘get’ it and has to deal with yet another version of the same article. The more popular your site becomes, the more scrapers you’ll get, making this problem bigger and bigger.

Order of parameters

Another common cause is that a CMS doesn’t use nice clean URLs, but rather URLs like /?id=1&cat=2, where ID refers to the article and cat refers to the category. The URL /?cat=2&id=1 will render the same results in most website systems, but they’re completely different for a search engine.

Comment pagination

In my beloved WordPress, but also in some other systems, there is an option to paginate your comments. This leads to the content being duplicated across the article URL, and the article URL + /comment-page-1/, /comment-page-2/ etc.

Printer-friendly pages

If your content management system creates printer-friendly pages and you link to those from your article pages, Google will usually find them, unless you specifically block them. Now, ask yourself: Which version do you want Google to show? The one with your ads and peripheral content, or the one that only shows your article?

WWW vs. non-WWW

This is one of the oldest in the book, but sometimes search engines still get it wrong: WWW vs. non-WWW duplicate content, when both versions of your site are accessible. Another, less common situation but one I’ve seen as well is HTTP vs. HTTPS duplicate content, where the same content is served out over both.

Conceptual solution: a ‘canonical’ URL

As we’ve already seen, the fact that several URLs lead to the same content is a problem, but it can be solved. One person who works at a publication will normally be able to tell you quite easily what the ‘correct’ URL for a certain article should be, but sometimes when you ask three people within the same company, you’ll get three different answers…

That’s a problem that needs addressing because, in the end, there can be only one (URL). That ‘correct’ URL for a piece of content is referred to as the Canonical URL by the search engines.

Ironic side note

Canonical is a term stemming from the Roman Catholic tradition, where a list of sacred books was created and accepted as genuine. They were known as the canonical Gospels of the New Testament. The irony is that it took the Roman Catholic church about 300 years and numerous fights to come up with that canonical list, and they eventually chose four versions of the same story.

Identifying duplicate content issues

You might not know whether you have a duplicate content issue on your site or with your content. Using Google is one of the easiest ways to spot duplicate content.

There are several search operators that are very helpful in cases like these. If you’d want to find all the URLs on your site that contain your keyword X article, you’d type the following search phrase into Google:

site:example.com intitle:"Keyword X"

Google will then show you all pages on example.com that contain that keyword. The more specific you make that intitle part of the query, the easier it is to weed out duplicate content. You can use the same method to identify duplicate content across the web. Let’s say the full title of your article was ‘Keyword X – why it is awesome’, you’d search for:

intitle:"Keyword X - why it is awesome"

And Google would give you all sites that match that title. Sometimes it’s worth even searching for one or two complete sentences from your article, as some scrapers might change the title. In some cases, when you do a search like that, Google might show a notice on the last page of results saying that it has omitted some entries that are very similar to the ones already displayed, with a link to repeat the search with the omitted results included.

This is a sign that Google is already ‘de-duping’ the results. It’s still not good, so it’s worth clicking the link and looking at all the other results to see whether you can fix some of them.

Read more: DIY: duplicate content check »

Practical solutions for duplicate content

Once you’ve decided which URL is the canonical URL for your piece of content, you have to start a process of canonicalization (yeah I know, try saying that three times out loud fast). This means we have to tell search engines about the canonical version of a page and let them find it ASAP. There are four methods of solving the problem, in order of preference:

  1. Not creating duplicate content
  2. Redirecting duplicate content to the canonical URL
  3. Adding a canonical link element to the duplicate page
  4. Adding an HTML link from the duplicate page to the canonical page

Avoiding duplicate content

Some of the above causes for duplicate content have very simple fixes to them:

  • Are there Session IDs in your URLs?
    These can often just be disabled in your system’s settings.
  • Have you got duplicate printer friendly pages?
    These are completely unnecessary: you should just use a print style sheet.
  • Are you using comment pagination in WordPress?
    You should just disable this feature (under settings » discussion) on 99% of sites.
  • Are your parameters in a different order?
    Tell your programmer to build a script to always put parameters in the same order (this is often referred to as a URL factory).
  • Are there tracking links issues?
    In most cases, you can use hash-based campaign tracking (putting the campaign data after a # in the URL) instead of parameter-based campaign tracking.
  • Have you got WWW vs. non-WWW issues?
    Pick one and stick with it by redirecting the one to the other. You can also set a preference in Google Webmaster Tools, but you’ll have to claim both versions of the domain name.
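
As a rough illustration of the ‘URL factory’ idea from the list above, the sketch below always emits query parameters in the same (alphabetical) order. The function name build_url is made up for this example, and alphabetical sorting is just one possible convention; what matters is that every link on the site goes through the same function.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def build_url(url: str) -> str:
    """Return the URL with its query parameters in one fixed order."""
    parts = urlsplit(url)
    # Sort by parameter name, then value, so the output is deterministic.
    params = sorted(parse_qsl(parts.query, keep_blank_values=True))
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(params), parts.fragment)
    )


# Both orderings from the example end up as one and the same URL.
print(build_url("http://www.example.com/?id=1&cat=2"))
print(build_url("http://www.example.com/?cat=2&id=1"))
```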

If your problem isn’t that easily fixed, it might still be worth putting in the effort. The goal should be to prevent duplicate content from appearing altogether, because it’s by far the best solution to the problem.

301 Redirecting duplicate content

In some cases, it’s impossible to entirely prevent the system you’re using from creating wrong URLs for content, but sometimes it is possible to redirect them. If this isn’t logical to you (which I can understand), do keep it in mind while talking to your developers. If you do get rid of some of the duplicate content issues, make sure that you redirect all the old duplicate content URLs to the proper canonical URLs.
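
When talking to your developers, a small sketch like the one below may help to explain what you’re after. It is not tied to any particular CMS or web server: the canonical scheme and host, the PATH_REDIRECTS table and both function names are assumptions for illustration. The idea is simply that every request for a duplicate URL is answered with a 301 status and a Location header pointing at the canonical URL.

```python
from typing import List, Optional, Tuple
from urllib.parse import urlsplit, urlunsplit

# Assumed canonical scheme and host; pick whichever version you prefer.
CANONICAL_SCHEME = "http"
CANONICAL_HOST = "www.example.com"

# Hypothetical table of known duplicate paths and their canonical paths.
PATH_REDIRECTS = {"/article-category/keyword-x/": "/keyword-x/"}


def redirect_target(url: str) -> Optional[str]:
    """Return the canonical URL to redirect to, or None if url is canonical."""
    parts = urlsplit(url)
    path = PATH_REDIRECTS.get(parts.path, parts.path)
    canonical = urlunsplit(
        (CANONICAL_SCHEME, CANONICAL_HOST, path, parts.query, "")
    )
    return None if canonical == url else canonical


def respond(url: str) -> Tuple[str, List[Tuple[str, str]]]:
    """Return the status line and headers a server could send for url."""
    target = redirect_target(url)
    if target is None:
        return "200 OK", []
    return "301 Moved Permanently", [("Location", target)]


print(respond("http://example.com/article-category/keyword-x/"))
```

Because the host and scheme are part of the comparison, the same function also covers the WWW vs. non-WWW and HTTP vs. HTTPS cases mentioned earlier.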

Using canonical link elements

Sometimes you don’t want to or can’t get rid of a duplicate version of an article, even when you know that it’s the wrong URL. To solve this particular issue, the search engines have introduced the canonical link element. It’s placed in the <head> section of your site, and it looks like this:

<link rel="canonical" href="http://example.com/wordpress/seo-plugin/" />

In the href section of the canonical link, you place the correct canonical URL for your article. When a search engine that supports canonical finds this link element, it performs a soft 301 redirect, transferring most of the link value gathered by that page to your canonical page.
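
If you want to check which canonical URL a page actually declares, a small script can read it from the HTML. The sketch below uses only Python’s standard library; the class and function names are made up for this example, and it assumes you already have the page’s HTML as a string.

```python
from html.parser import HTMLParser
from typing import Optional


class CanonicalFinder(HTMLParser):
    """Remember the href of the first <link rel="canonical"> element."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attributes = dict(attrs)
        # rel can hold several space-separated values, so split before checking.
        rel_values = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_values and attributes.get("href"):
            self.canonical = attributes["href"]


def find_canonical(html: str) -> Optional[str]:
    """Return the canonical URL declared in the HTML, or None if absent."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


page = '<head><link rel="canonical" href="http://example.com/wordpress/seo-plugin/" /></head>'
print(find_canonical(page))
```

Running this over the duplicate URLs of an article should give the same canonical URL every time; if it doesn’t, that is worth looking into.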

This process is a bit slower than the 301 redirect though, so if you can just do a 301 redirect that would be preferable, as mentioned by Google’s John Mueller.

Keep reading: rel=canonical • What it is and how (not) to use it »

Linking back to the original content

If you can’t do any of the above, possibly because you don’t control the <head> section of the site your content appears on, adding a link back to the original article on top of or below the article is always a good idea. You might want to do this in your RSS feed by adding a link back to the article in it. Some scrapers will filter that link out, but others might leave it in. If Google encounters several links pointing to your original article, it will figure out soon enough that that’s the actual canonical version.
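
For a feed you generate yourself, adding such a link can be as simple as the sketch below. The function name and its arguments are hypothetical, and the wording of the footer is only an example; the point is that every feed entry ends with a plain HTML link to the original article.

```python
from html import escape


def append_source_link(
    entry_html: str, post_title: str, post_url: str, site_name: str, site_url: str
) -> str:
    """Return the feed entry with a link back to the original article appended."""
    footer = (
        f'<p>The post <a href="{escape(post_url, quote=True)}">{escape(post_title)}</a> '
        f'appeared first on <a href="{escape(site_url, quote=True)}">{escape(site_name)}</a>.</p>'
    )
    return entry_html + "\n" + footer


print(
    append_source_link(
        "<p>Article text</p>",
        "Keyword X",
        "http://www.example.com/keyword-x/",
        "Example",
        "http://www.example.com/",
    )
)
```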

Conclusion: duplicate content is fixable, and should be fixed

Duplicate content happens everywhere. I have yet to encounter a site of more than 1,000 pages that hasn’t got at least a tiny duplicate content problem. It’s something you need to constantly keep an eye on, but it is fixable, and the rewards can be plentiful. Your quality content could soar in the rankings, just by getting rid of duplicate content from your site!

Read on: Rel=canonical: The ultimate guide »

The post Duplicate content: Causes and solutions appeared first on Yoast.

Ask Yoast: Meta descriptions and excerpts

When you’re running a large and busy website, it’s practical and time-saving if you can reuse some of your material. Both meta descriptions and excerpts use a brief passage to summarize the content of a web page. So, it could be handy to use the same text for both. But how do you do that? In this video, Joost explains the easiest way to reuse your text for both meta descriptions and excerpts, and whether Google approves of this reuse.

Renee Lodens sent us an email with the following question:

“Is there a way to bulk copy the Yoast SEO meta descriptions to the excerpt field? Also, is this considered duplicate content?”

Watch the video or read the transcript further down the page! 

Meta descriptions and excerpts

So, what to do if you want to save time and use the same passages for meta descriptions and excerpts?

“Well, let’s start with the first thing. It’s probably easier to do it the other way around. Put the description that you want in the excerpt field. Then, in the back end, in the Yoast SEO Titles & Meta section, you can use the excerpt short code for meta descriptions, and we will automatically put your excerpt in your meta description. That’s easier. You can do it the other way around too, but then you’d have to code a bit.

Is this considered duplicate content? No, it’s not. Because they are different things used for different purposes. Your meta description will only show up in the metadata, which will not be shown on the page. And Google considers these two separate things.

So this might actually work well for you if you write really good short excerpts that fit well into your meta description.

Good luck!”

Ask Yoast

In the series Ask Yoast we answer SEO questions from followers. Need some advice about SEO? Let us help you out! Send your question to ask@yoast.com.

Read on: ‘How to create the right meta descriptions’ »

Metadata and SEO part 2: link rel metadata

In the first post of our metadata series, I discussed the meta tags in the <head> of your site. But there’s more metadata in the <head> that can influence the SEO of your site. In this second post, we’ll dive into link rel metadata. You can use link rel metadata to instruct browsers and Google, for example to point them to the AMP version of a page or to prevent duplicate content issues. The link rel tags come in a lot of flavors. I’d like to address the most important ones here.

Use rel=canonical to prevent duplicate content

Every website should use rel=canonical to prevent duplicate content and point Google to the original source of that content. rel=canonical is one of those metadata elements that has an immediate influence on your site’s SEO. If done wrong, it might ruin it. An example: we have seen sites that had the canonical of all pages pointed to the homepage. That is basically telling Google that for all the content on your website, you just want the homepage to rank.
If done right, you could give props to another website for writing an article that you republished.

If you want to read up on rel=canonical, please read this article: Rel=canonical: the ultimate guide.

Add rel=amphtml to point search engines to your AMP pages

In order to link a page to its AMP variant, use rel=amphtml. AMP is a variation of your desktop page, designed for faster loading and better user experience on a mobile device. It was introduced by Google, and to be honest, we like it. It seriously improves the mobile user experience.

So be sure to set up an AMP site and link the AMP pages in your head section. If you have a WordPress site, adding AMP pages is a piece of cake. You can simply install the AMP plugin by Automattic and you’ll have AMP pages and the rel=amphtml links right after that.

If you’d like to read up about AMP, be sure to check our AMP archive.

dns-prefetch for faster loading

By telling the browser in advance about a number of locations where it can find certain files it needs to render a page, you simply make it easier and faster for the browser to load your page, or (elements from) a page you link to. If implemented right, DNS prefetching will make sure a browser knows the IP address of the site linked and is ready to show the requested page.

An example:
<link rel="dns-prefetch" href="https://cdn.yoast.com/">

Please note that if the website you are prefetching has performance issues, the speed gains might be little, or none. This could even depend on the time of day. Monitor your prefetch URLs from time to time.
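
If you’d rather not maintain these tags by hand, they can be generated from the list of external files a page loads. The sketch below is one possible way to do that; the function name and the example URLs are assumptions, and it simply emits one dns-prefetch tag per external host.

```python
from html import escape
from typing import Iterable, List
from urllib.parse import urlsplit


def dns_prefetch_tags(resource_urls: Iterable[str], own_host: str) -> List[str]:
    """Return one dns-prefetch link tag per external host, in first-seen order."""
    origins: List[str] = []
    for url in resource_urls:
        parts = urlsplit(url)
        # Skip relative URLs and files served from the site's own host.
        if not parts.hostname or parts.hostname == own_host:
            continue
        prefix = parts.scheme + ":" if parts.scheme else ""
        origin = f"{prefix}//{parts.netloc}/"
        if origin not in origins:
            origins.append(origin)
    return [
        f'<link rel="dns-prefetch" href="{escape(origin, quote=True)}">'
        for origin in origins
    ]


resources = [
    "https://cdn.yoast.com/app.js",
    "https://cdn.yoast.com/style.css",
    "https://yoast.com/logo.png",
]
print(dns_prefetch_tags(resources, own_host="yoast.com"))
```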

What about rel=author?

Rel=author has no effect whatsoever at the moment. It hasn’t had any effect we know of for quite a while actually, as Joost already mentioned this in October of 2015. You never know what use Google might come up with for it, but for now, we’re not pushing it in our plugin. It was used to point to the author of the post, giving the article more or less authority depending on how well-known an author was. At the time, this was reflected in the search results pages as well (it’s not anymore). No need to include it in your template anymore.

Other rel elements include your stylesheets (make sure Google can use these) and the icons you can set for a variety of devices. The SEO impact of these is rather low or simply nonexistent.

Is there more?

So we discussed meta tags and link rel metadata in the <head>. Is there even more metadata that affects SEO? Yes there is! In our next metadata post, I’ll explore social metadata, like OpenGraph and Twitter Cards. In addition to that, we’ll go into hreflang, an essential asset for site owners that serve more than one country or language with their website. Stand by for more!

Read more: ‘Metadata and SEO part 1: the head section’ »

Ask Yoast: duplicate content issues on my shop?

If you own an eCommerce site, you might wonder how to optimize your category pages and your product pages. Could you have the same content on your category page and your product pages? If you have the same content on multiple pages of your website, would Google know what to rank first? Or would it cause duplicate content issues? This Ask Yoast is about the optimization of category and product pages of your online shop. Hear what I have to say about this!

Jeroen Custers from Maastricht, the Netherlands, has emailed us, asking:

“My product pages and category pages have 99% the same description, except for the color. Although the category page gets all the links, one product page ranks. Does Google see my pages as duplicate content?”

Check out the video or read the answer below!

Duplicate content on your shop?

The answer is simple: yes. So what you should do is optimize your category page for the product, and only optimize the sub pages (the product pages) for the individual product colors. Then make sure that the category page gets all the links for that product. So you should improve your internal linking structure so that when you mention the product, you link to the category page and not to the specific color page underneath that.

If you improve that category structure in the right way, then that should fix it. If it doesn’t, then noindex the product pages and “canonical” all of them back to the category, so that Google really knows that the category is the main thing. That’s what you want people to land on. Most people want to see that you have more than one option.

If they search for the specific product and you have not noindexed it (so if you chose the first option), then Google should send them to the right page. So try that first. If that doesn’t work, noindex the product pages and then “canonicalize” them back to the category.

Good luck!

Ask Yoast

In the series Ask Yoast we answer SEO questions from followers. Need help with SEO? Let us help you out! Send your question to ask@yoast.com.

Read more: ‘Crafting the perfect shop category page’ »

Ask Yoast: Duplicate content on LinkedIn Pulse

Social media is not only an important part of your marketing strategy, but it’s important for your SEO strategy as well. LinkedIn publishing platform Pulse is one of the many content publishing platforms out there. You can read stories and news from other publishers, and you can publish your own content. But could you publish the same blog post on Pulse, as the one you post on your own site? Or should you post an excerpt and link back to your site? Does Google consider content on Pulse as duplicate content? Joost will answer this question in this Ask Yoast.

Guy Andefors from Stockholm in Sweden emailed us the following question:

“Can we safely republish an entire blog post on Pulse or should we post an excerpt and link back to our site?”

Check out the video or read the answer below!

LinkedIn Pulse

Read the transcript of the video here:

“To be honest, if you post your own blog post first, make sure that it’s indexed in Google and then post it on Pulse with a link underneath the posting: “This post originally appeared on…” linking back to your blog post. If you do this, you should be okay.

It’s not rel=canonical, but Google is smart enough to understand most of that and work its way through, so you should be okay. It might still rank the LinkedIn one higher, if your own domain is not that strong, because it might think that it actually gets a better interaction on LinkedIn. If that’s the case you should think about maybe using excerpts. Just try it a bit, see how it works for you. It really depends on how strong your own domain is and on what you want to achieve. If it works on LinkedIn, maybe leave it on LinkedIn and then make people click from LinkedIn to your site. That’s just as good for you, if it works. 

Good luck!”

Ask Yoast

In the series Ask Yoast we do our best to answer your SEO question! Need some help with your site’s SEO? Send your question to ask@yoast.com. You might get a personal answer on video!

Read more: ‘DIY: Duplicate content check’ »

Ask Yoast: importance of using excerpts

Want to know how to create attractive archive pages? And how to increase click-through rates to your posts or pages? Make sure to write short and appealing excerpts for every post or page. The excerpt should be a teaser to get people to read your post. In this Ask Yoast, Joost explains the importance of using excerpts.

This Ask Yoast is all about the following question:

“Why is it important to use the excerpt? Doesn’t Google consider this to be duplicate content?”

Check out the video or read the answer below!

The importance of using an excerpt

“The excerpt is that bit of the post, that will be shown on archive pages. So, if you write a specific excerpt for a post, then that excerpt is what shows on archive pages.”

The excerpt input field in WordPress

“Sometimes it’s also shown on your front page, if the front page of your site features your blog posts. The excerpt can actually be a very good teaser to get people to read your article.”

Blog post excerpt as shown on our homepage

“The excerpt is not considered to be duplicate content. In fact having excerpts for every post prevents having duplicate content, when you have a long archive page which shows more bits of the post. So you should use the excerpt if you can. It’s a bit more work, because that means writing an excerpt for every post. But you should if you could. Good luck!”

Ask Yoast

In the series Ask Yoast we help you with your SEO question! Not sure what’s best for your site’s SEO? We’ll come to the rescue! Just send your question to ask@yoast.com.

Read more: ‘How to create the right metadescription’ »

DIY: Duplicate content check

Duplicate content is much-dreaded in the world of SEO. If your content lives on multiple pages on your site, or other websites, Google might get confused and won’t know what to rank first. You’ll want to prevent duplicate content as much as possible. So, what can you do, yourself? Here, I’ll explain how to perform a duplicate content check, which you should do from time to time to find copied content. Plus, some tips to avoid duplicate content in the first place. Let’s get started!

Adding a preventive snippet

In the ‘Search Appearance’ > ‘RSS’ section of our Yoast SEO plugin, we have predefined a snippet to add to your feed entry saying “This article first appeared on yourwebsite.com”. The link in this snippet makes sure that every scraper includes the link to the original article. Of course, this already helps to prevent duplicate content, as Google will find that backlink to your website.

Nevertheless, if you write awesome content, your content will be duplicated. And that copy won’t always include a link to your website. All the more reason to do a duplicate content check on a regular basis.

CopyScape duplicate content checker

There are a lot of tools to find duplicate content. One of the best known duplicate content checkers is probably CopyScape.com. This tool works pretty easily: insert a link in the box on the homepage, and CopyScape will return a number of results, presented a bit like Google’s search result pages.

The results page of a CopyScape scan

You can click the results for more details and to see which parts of your text are duplicate. Let’s look at an example from our popular post on 6 common SEO mistakes, which was first published on 3 October 2017. CopyScape found that 170 words, or 9% of this post, were copied:

CopyScape highlights passages that are duplicate

In this case, the first paragraph from our article, discussing low site speed as a common SEO mistake, was copied and turned into a short blog post. CopyScape clearly highlights the text it found to be duplicate, which gives an idea of how severe the copying is. If it’s just a small percentage of the page, I wouldn’t worry. If it’s over 40%, and makes up quite a large part of the other page, I would simply email the site owner and ask them to change the copied text.
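
To get a feel for what such a percentage means, you can estimate it yourself. The sketch below is not how CopyScape works (its method isn’t described here); it’s a simple stand-in technique that counts which words of your text fall inside five-word sequences that also appear in the other text. All function names and the sequence length of five are assumptions.

```python
import re
from typing import List, Set, Tuple


def words(text: str) -> List[str]:
    """Split text into lowercase words, ignoring punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def shingles(tokens: List[str], size: int) -> Set[Tuple[str, ...]]:
    """Return all runs of `size` consecutive words."""
    return {tuple(tokens[i:i + size]) for i in range(len(tokens) - size + 1)}


def copied_percentage(original: str, other: str, size: int = 5) -> float:
    """Percentage of words in `original` that sit inside a run shared with `other`."""
    tokens = words(original)
    if len(tokens) < size:
        return 0.0
    other_shingles = shingles(words(other), size)
    covered = [False] * len(tokens)
    for i in range(len(tokens) - size + 1):
        if tuple(tokens[i:i + size]) in other_shingles:
            for j in range(i, i + size):
                covered[j] = True
    return 100.0 * sum(covered) / len(tokens)


mine = "one two three four five six seven eight nine ten"
theirs = "zero one two three four five eleven"
print(copied_percentage(mine, theirs))
```

In this toy example, half of the words in the first text are part of a copied five-word run, so the function reports 50.0.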

Use the CopyScape duplicate content checker to find copied content from your website on other websites. Again, it’s one of many tools, but this one’s free and easy to use. Keep in mind, though, you won’t get unlimited scans for one website. If you want to dive a bit deeper into your duplicate content, CopyScape also offers a premium version for more insights.

Tip: Duplicate content on product pages

Using CopyScape, we frequently find manufacturer descriptions used in online shops to be duplicate. Usually, these are automatically imported into the shop’s content management system, and usually not just into yours: other shops import the same descriptions. Be aware of this. I understand it’s quite the hassle to write unique product descriptions for every product. But don’t your best-selling products, at the least, deserve as much? So start now and take it from there!

Siteliner internal duplicate content check

Siteliner is CopyScape’s brother that searches for internal duplicate content. So, this duplicate content checker will find duplicate content on your own site.

Internal duplicate content

Internal duplicate content, how does that happen, you ask? Well, a very common example of this is when a WordPress blog doesn’t use excerpts but shows the entire blog post on the blog’s homepage. That means that the blog post is available on at least two pages: the homepage and the post itself. And it’s probably on the category and tag overview pages as well. That’s four versions of the same article on your own website already.

Using excerpts (rather than showing the entire post) has the advantage that the excerpt always has a proper link to the post. This link will tell Google that the original content is not on that blog/category/tag page but in the post itself. We often recommend the use of excerpts.

Using Siteliner

The Siteliner duplicate content check will show you a lot of things, but the free version is limited to 250 pages and one scan every 30 days. Again, there is a premium version, but the free one will already give you a good impression. Just do a search and you’ll end up on the overview page. You’ll see the percentage of internal duplicate content at the top left. Don’t panic when you see high numbers, as this duplicate content check also considers excerpts duplicate content:

The Siteliner overview page

Simply click one of the links and check if it’s indeed the excerpt. The excerpt obviously links to the post, so if that’s the case, you’re covered.

Siteliner highlights the content it considers internal duplicate content and tells you where to find it

Sidenote on using duplicate content checkers

While Google understands what a sidebar is, CopyScape and Siteliner appear to include all text on a page in their percentage calculations. This means that the actual percentage of the duplicate content, when just looking at the main content of a page, might be higher. Please keep this in mind when you use one of these duplicate content checkers. Just a heads-up!

Manual duplicate content check

CopyScape and Siteliner are nice, easy-to-use duplicate content checkers. However, if you want to see what’s duplicate according to Google, you could also just use Google itself.

If you have a certain page that you’d like to check, simply go to that page. Copy a text snippet, preferably from a section that you think might be attractive for others to copy. Let’s take a passage from our common SEO mistakes article: “If your page title is too long (currently 400 to 600 pixels), it will get cut off in Google. You don’t want potential visitors to be unable to read the full title in the SERPs.” (Note that Google only takes the first 32 words into account.) Insert the exact snippet in Google between double quotation marks like this:

Duplicate content check in Google

This search query returns ‘about 208 results’ according to Google, which is well over the 10 results CopyScape returned.
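The manual check described above can be sketched in a few lines of Python. This is only an illustration, not an official tool: the function name `exact_phrase_query_url` is made up for this example, and it simply trims the snippet to the 32-word limit mentioned above and wraps it in double quotation marks.

```python
from urllib.parse import quote_plus

MAX_QUERY_WORDS = 32  # the word limit mentioned in the article


def exact_phrase_query_url(snippet: str, max_words: int = MAX_QUERY_WORDS) -> str:
    """Build a Google search URL for an exact-phrase search on a text snippet.

    Hypothetical helper for illustration only.
    """
    # Keep only the first `max_words` words; the rest would be ignored anyway.
    words = snippet.split()[:max_words]
    # Drop straight double quotes inside the snippet so they don't end the phrase early.
    phrase = " ".join(words).replace('"', "")
    return "https://www.google.com/search?q=" + quote_plus(f'"{phrase}"')


if __name__ == "__main__":
    text = (
        "If your page title is too long (currently 400 to 600 pixels), "
        "it will get cut off in Google."
    )
    print(exact_phrase_query_url(text))
```

Open the printed URL in your browser to see which pages contain that exact passage.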

Check your own duplicate content

Use a duplicate content checker like CopyScape to find what has been copied from your site, and use Google to see where else on the internet this content ended up. These are simple tools that serve a higher goal: to prevent duplicate content. If you want to read more on duplicate content, start with our Duplicate content: causes and solutions article.

Read more: rel=canonical: the ultimate guide »

The post DIY: Duplicate content check appeared first on Yoast.

Ask Yoast: www and duplicate content

If content on different URLs is the same, search engines don’t know which URL to show in the search results. We call this a duplicate content issue. And it can hurt your rankings! Unfortunately, it happens more often than you’d think. Did you, for instance, ever think about the consequences of having both a www and a non-www version of your site?

At Ask Yoast, we received a question about this from Steve Blundell of Avonsci:

“Do the www and non www versions of a page create duplicate content, and if so how can I deal with it?”

Watch the answer in the video below!

www or not?

“The answer is yes, it creates duplicate content. It’s not the worst kind of duplicate content, because Google knows that these things happen, but it’s better to fix it nonetheless. The best way of fixing it is to choose one, either the www or the non-www version, and to redirect the other to it. So on Yoast.com we redirect www.yoast.com to yoast.com. We did that because we think it’s cooler and www is a bit old-fashioned. But choose whatever suits you best, redirect the other and you’re done!”
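To make the idea concrete, here is a minimal Python sketch of the decision behind such a redirect. It is not how Yoast.com is actually configured (in practice you would usually set this up in your web server or hosting panel); the function name `redirect_target` and the constant `PREFERRED_HOST` are assumptions made for this example.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit

PREFERRED_HOST = "yoast.com"  # the non-www version, as in the answer above


def redirect_target(url: str, preferred_host: str = PREFERRED_HOST) -> Optional[str]:
    """Return the URL to redirect to, or None if no redirect is needed.

    Hypothetical helper: it only handles the www / non-www pair of one host.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""
    # Work out the "other" version of the preferred host.
    if preferred_host.startswith("www."):
        alternate = preferred_host[4:]
    else:
        alternate = "www." + preferred_host
    if host != alternate:
        return None  # already on the preferred host, or a different site
    netloc = preferred_host if parts.port is None else f"{preferred_host}:{parts.port}"
    # Keep path, query string and fragment so every page maps to its twin.
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


if __name__ == "__main__":
    print(redirect_target("https://www.yoast.com/seo-blog/"))
    print(redirect_target("https://yoast.com/seo-blog/"))
```

A server would answer requests for which this function returns a URL with a permanent (301) redirect to that URL.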

Do you have a question about duplicate content, link building or copywriting? Just ask! We’ll be glad to help you out if we can. Send your SEO question to ask@yoast.com!

Read more: ‘Duplicate content: causes and solutions’ »

Yoast SEO’s hidden features that secretly level up your SEO

If you use Yoast SEO on your website, you’re probably familiar with features like the SEO analysis or the snippet preview. You might even know that you can easily link to related posts or create redirects in the premium version of the plugin. But there’s (much) more. For instance, the Yoast SEO plugin has so-called hidden features. You won’t find them in your settings, but they do great work. Today, we’ll dive into these hidden features: which ones do we have and how do they lighten your load?

Why hidden features?

There are many choices to make when you optimize your site. When developing our Yoast SEO plugin, we don’t translate all these choices into settings. In fact, we try to make as few settings as possible. If we believe something is beneficial for every Yoast SEO user, it’s on. We call these features hidden features because as a user you’re not necessarily aware of their existence. You might even think we don’t have certain features because there’s no setting for them, while in fact we just take care of things for you.

The hidden features of Yoast SEO

To help you understand what Yoast SEO does for your website in the background, we’ve listed some of the hidden features for you below. Let’s go through them one by one!

1. A structured data graph

Yoast SEO outputs a fully-integrated structured data graph for your posts and pages. What’s that? And how does that help you optimize your site?

Some years ago, search engines came up with something called Schema.org to better understand the content they crawl. Schema is a bit like a glossary of terms for search engine robots. This structured data markup will help them understand whether something is a blog post, a local shop, a product, an organization or a book, just to name a few possibilities. Or, whether someone is an author, an actor, associated with a certain organization, alive or even a fictional character, for instance.

For all these items there’s a set of properties that specifically belongs to that item. If you provide information about these items in a structured way – with structured data – search engines can make sense of your site and the things you talk about. As a reward, they might even give you those eye-catching rich results.

Hence, adding structured data to your site’s content is a smart thing to do. But, as the number of structured data items grows, all these loose pieces of code can end up in a big pile of Schema markup on your site’s pages. Yoast SEO helps you prevent building this unorganized pile of code. For every page or post, it creates a neat structured data graph. In this graph, it connects the loose pieces of structured data with each other. As a result, a search engine can understand, for instance, that a post is written by author X, working for organization Y, selling brand Z.

A structured data graph: Yoast SEO connects blobs of Schema markup in one single graph, so search engines understand the bigger picture.

If you want to learn more about this, we’d advise reading Edwin’s story on how Yoast SEO helps search engine robots connect the dots.
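To give you a rough idea of what a connected graph looks like, here is a small Python sketch that builds one as JSON-LD. It is an illustration only and not the exact output of Yoast SEO: the site URL, the `@id` values and the names are made up. The point is that each piece gets an `@id`, and other pieces refer to it by that `@id` instead of repeating it.

```python
import json

SITE = "https://www.example.com"  # hypothetical site URL

# Hypothetical identifiers; any stable, unique URL-like string works as an @id.
ORGANIZATION_ID = f"{SITE}/#organization"
AUTHOR_ID = f"{SITE}/#/person/author-x"
ARTICLE_ID = f"{SITE}/keyword-x/#article"

graph = {
    "@context": "https://schema.org",
    "@graph": [
        {
            "@type": "Organization",
            "@id": ORGANIZATION_ID,
            "name": "Organization Y",
            "brand": {"@type": "Brand", "name": "Brand Z"},
        },
        {
            "@type": "Person",
            "@id": AUTHOR_ID,
            "name": "Author X",
            # A reference to the organization node instead of a second copy of it.
            "worksFor": {"@id": ORGANIZATION_ID},
        },
        {
            "@type": "Article",
            "@id": ARTICLE_ID,
            "headline": "Keyword X",
            "author": {"@id": AUTHOR_ID},
            "publisher": {"@id": ORGANIZATION_ID},
        },
    ],
}


def dangling_references(doc: dict) -> list:
    """Return every {"@id": ...} reference that points at no node in the graph."""
    known = {node["@id"] for node in doc["@graph"]}
    missing = []
    for node in doc["@graph"]:
        for value in node.values():
            is_reference = isinstance(value, dict) and set(value) == {"@id"}
            if is_reference and value["@id"] not in known:
                missing.append(value["@id"])
    return missing


if __name__ == "__main__":
    print(json.dumps(graph, indent=2))
    print("Dangling references:", dangling_references(graph))
```

The printed JSON is what would go inside a script tag of type application/ld+json on the page; the small check at the end confirms that all pieces are actually connected.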

2. Self-referencing canonicals

Canonicals were introduced quite some years ago as an answer to duplicate content. Duplicate content means that the same or very similar content is available on multiple URLs. This confuses search engines: If the same content is shown on various URLs, which URL should they show in the search results?

Duplicate content can exist without you being aware of it. In an online store, for instance, one product might belong to more than one category. If the category is included in the URL, the product page can be found on multiple URLs. Or perhaps you add campaign tags to your URLs if you share them on social media or in your newsletter? This means the same page is available on a URL with and without a campaign tag. And there are more technical causes of duplicate content like these.

The solution for this type of duplicate content issue is a self-referencing canonical. A canonical URL lets you say to search engines: “Of all the options available for this URL, this URL is the one you should show in the search results”. You can do so by adding a rel=canonical tag to a page, pointing to the page that you’d like to rank. In this case, you’d need the canonical tag to point to the URL of the original page.

So, should you go through all your posts now and add it? Not if you’re using Yoast SEO. The plugin does this for you, everywhere on your site: single posts and pages, homepages, category archives, tag archives, date archives, author archives, etc. If you’re not such a techy person, the canonical isn’t easy to wrap your head around. Or, perhaps, you just don’t have the time to focus on it. So let Yoast SEO take care of it and move on to more exciting stuff!
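If you’re curious what this looks like under the hood, here is a small Python sketch of the campaign tag case described above. It is not the code Yoast SEO uses: the function names are made up, and it assumes that campaign tags are query parameters starting with `utm_`, which is a common convention but not the only one.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

CAMPAIGN_PREFIX = "utm_"  # assumption: campaign tags look like utm_source, utm_medium, ...


def canonical_url(url: str) -> str:
    """Return the URL without campaign parameters and without a fragment."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.lower().startswith(CAMPAIGN_PREFIX)
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def canonical_tag(url: str) -> str:
    """Render the rel=canonical link element for the <head> of the page."""
    return f'<link rel="canonical" href="{escape(canonical_url(url), quote=True)}" />'


if __name__ == "__main__":
    shared = "http://www.example.com/keyword-x/?utm_source=newsletter&utm_medium=email"
    print(canonical_tag(shared))
```

Both the tagged and the untagged URL end up with the same canonical tag, so search engines know which version to show.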

Read more: rel=canonical: the ultimate guide »

3. Paginated archives with rel=next / rel=prev

Another hidden feature in Yoast SEO is rel=next / rel=prev. It’s a method of telling search engines that certain pages belong to an archive: a so-called paginated archive. A rel=next / prev tag in the head of a page lets search engines know what the previous and the next pages in that archive are. Only search engines and people looking at the source code of your site will see this piece of code.

Not so long ago, Google announced that it isn’t using rel=next/prev anymore. Does this mean we should do away with this feature? No, certainly not! Bing and other search engines still use it, so Yoast SEO will keep on adding rel=next / prev tags to paginated archives.
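As an illustration, this Python sketch generates the rel=prev and rel=next tags for one page of a paginated archive. It is a simplified example rather than the plugin’s own code: the function name is made up, and the `/page/N/` URL pattern is an assumption based on how WordPress archives are commonly structured.

```python
def pagination_links(base_url: str, page: int, total_pages: int) -> list:
    """Return the rel=prev / rel=next link tags for one page of an archive.

    Assumes page 1 lives at `base_url` and page N at `base_url` + "page/N/".
    """
    if not 1 <= page <= total_pages:
        raise ValueError("page must be between 1 and total_pages")

    def page_url(number: int) -> str:
        return base_url if number == 1 else f"{base_url}page/{number}/"

    tags = []
    if page > 1:  # the first page has no previous page
        tags.append(f'<link rel="prev" href="{page_url(page - 1)}" />')
    if page < total_pages:  # the last page has no next page
        tags.append(f'<link rel="next" href="{page_url(page + 1)}" />')
    return tags


if __name__ == "__main__":
    for tag in pagination_links("https://www.example.com/blog/", page=2, total_pages=3):
        print(tag)
```

The first page of an archive only gets a rel=next tag, the last page only a rel=prev tag, and an archive with a single page gets neither.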

Keep reading: Pagination and SEO: best practices »

4. Nofollow login & registration links

If you have a WordPress site, you most likely have a login link and a registration link somewhere on your site. But the login and registration pages of your WordPress site are places neither visitors nor search engines will ever have to be.

Therefore, Yoast SEO marks the links to your login and registration pages as nofollow, which tells search engines not to follow them. It’s a tiny tweak, but it saves a lot of unneeded Google action.
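In the HTML of your site, a nofollow link is simply a link with an extra rel attribute. The Python sketch below shows what such a link looks like; the helper name `nofollow_link` is made up for this example, and `/wp-login.php` is the default WordPress login address.

```python
from html import escape


def nofollow_link(href: str, text: str) -> str:
    """Render a link that asks search engines not to follow it."""
    return f'<a href="{escape(href, quote=True)}" rel="nofollow">{escape(text)}</a>'


if __name__ == "__main__":
    print(nofollow_link("/wp-login.php", "Log in"))
```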

5. Noindex your internal search results

This hidden feature is based on Google’s Webmaster Guidelines. Google wants to prevent users from going from a search result in Google to a search result page on a website. Google rightly considers that a bad user experience.

You can tell search engines not to include a certain page in their search results by adding a noindex tag to that page. Because of Google’s guidelines, Yoast SEO adds a noindex tag to your internal search results pages, so search engines won’t display them in their search results. The tag only tells them not to show these pages in the search results; the links on these pages can still be followed and counted, which is better for SEO.
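Here is a minimal Python sketch of that rule. It is an illustration and not the plugin’s implementation: the function names are made up, and it assumes that internal search results are recognizable by an `s` query parameter, which is the WordPress default.

```python
from urllib.parse import parse_qs, urlsplit


def is_internal_search(url: str) -> bool:
    """Guess whether a URL is an internal search results page (?s=...)."""
    query = parse_qs(urlsplit(url).query, keep_blank_values=True)
    return "s" in query


def robots_meta(url: str) -> str:
    """Render the robots meta tag for a page."""
    if is_internal_search(url):
        # Keep the page out of the search results, but let its links be followed.
        return '<meta name="robots" content="noindex, follow" />'
    return '<meta name="robots" content="index, follow" />'


if __name__ == "__main__":
    print(robots_meta("https://www.example.com/?s=duplicate+content"))
    print(robots_meta("https://www.example.com/keyword-x/"))
```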

Read on: Which pages should I noindex or nofollow on my site »

6. Removal of replytocom variables

This last hidden feature is quite a technical one. In short, it prevents your site from creating lots of URLs with no added value. WordPress has a replytocom feature that lets you reply to comments without activating JavaScript in your browser. But this means that for every comment, it creates a separate URL with ?replytocom variables.

The disadvantage of this is that if you get a lot of comments, search engines have to crawl all those URLs, which is a waste of your crawl budget. Therefore, we remove these variables by default.
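To show what removing these variables means in practice, here is a short Python sketch. It is not the plugin’s own code, and the function name `strip_replytocom` is made up; it simply drops the `replytocom` parameter and leaves the rest of the URL alone.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_replytocom(url: str) -> str:
    """Return the URL without its replytocom query parameter."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "replytocom"
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment))


if __name__ == "__main__":
    comment_urls = [
        "https://www.example.com/keyword-x/?replytocom=41#respond",
        "https://www.example.com/keyword-x/?replytocom=42#respond",
        "https://www.example.com/keyword-x/?replytocom=43#respond",
    ]
    # All reply URLs collapse into one and the same URL.
    print({strip_replytocom(url) for url in comment_urls})
```

However many comments a post has, only a single URL is left for search engines to crawl.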

Keep on reading: Why you should buy Yoast SEO Premium »

The post Yoast SEO’s hidden features that secretly level up your SEO appeared first on Yoast.

Ask Yoast: RSS importers and duplicate content

At Yoast we’re happy to help you with your SEO question! This time we received a question from Diana. She’s asking:

“Does Google consider it to be duplicate content when I import my articles from one blog to my main blog using an RSS Importer? Will I be penalized by Google in some way?”

Watch the answer in this video or find it in the transcript below!

Transcript

Well, you won’t necessarily be penalized, but Google will only show one of the two sites that have the blog articles. Your best bet is probably to choose one of them, and to make sure that the canonical on the one that you don’t want to rank is set to the one that you do want to rank. This might be slightly technical, but it’s in the Advanced tab of Yoast SEO. Underneath each post you can set the canonical to point to a URL other than the URL of the post. And by doing that you’ll tell Google which one you want to rank. This solves any duplicate content issues. And you should be able to get your RSS importer to do that for you. Good luck!

Read more: ‘RSS feeds in the age of Panda and Penguin’ »

In the series Ask Yoast we help you with your SEO query. Don’t hesitate and send your SEO question to ask@yoast.com!