Writing content for another site as guest author or blogger can have many benefits. It could help you get more exposure, especially if you’re writing for a site that’s a lot bigger than yours. Working with other sites also gives you the opportunity to build potentially worthwhile (business) relationships and broaden your network. You might even get paid for your guest articles. Another obvious advantage, of course, is gaining valuable backlinks to your site.

Now, this doesn’t mean you should start sending out loads of mediocre articles to every blog that’s even remotely relevant to your own. A better strategy: guest-write great content for the right website, perhaps a few guest posts, and engage with the audience: you’ll surely get noticed.

But, once you’ve invested a lot of time in writing a great article that you’re very proud of, odds are you also want to put that content on your own site. Preferably without creating duplicate content issues. What are your options, in that case?

Yossi sent us a question that shows this dilemma:

I sometimes write articles for a third party website. I’d also like to put them on my own site. But I noticed the other site set a rel="canonical" attribute pointing to their page. So, how can I put the articles I wrote on my site and benefit from them, without getting a penalty from Google?

Watch the video or read the transcript further down the page for my answer!

Reposting guest-authored content

“The problem is, Yossi, if you publish them on that other site first, and that other site is bigger than yours, then the chance of you ranking with that content is close to zero.

So, if you want to rank with content, you need to decide where to put it first, and otherwise, you have to put a canonical on that third-party website to yours. But if they’re paying you to write that, or if there’s another sort of deal, they’ll probably not be willing to do that. So, decide where you want to rank with content, publish it there first, otherwise put a canonical from the page that you want not to rank to the page that you want to rank.

But if you can’t do any of that, then I would not go through the trouble of publishing it again on your own site. Because it really doesn’t make all that much sense. And you’d be better off just publishing a short snippet on your own site, saying, “Hey, I wrote this on that other site.” Good luck.”

Ask Yoast

In the series Ask Yoast, we answer SEO questions from our readers. Do you have an SEO-related question? A pressing SEO dilemma you can’t find the answer to? Send an email to ask@yoast.com, and your question may be featured in one of our weekly Ask Yoast vlogs.

Note: you may want to check our blog and knowledge base first, the answer to your question could already be out there! For urgent questions, for example about the Yoast SEO plugin not working properly, please contact us through our support page.

Read more: The ultimate guide to content SEO »

The post Ask Yoast: Can I repost my guest-authored content? appeared first on Yoast.

It can happen to you: other people copy content from your site and republish it on their own site. You have gone the extra mile to write an awesome article for your website, when, all of a sudden, another website takes possession of it. It can be frustrating to see this happen, and it happens more often than you think. If your website reaches a certain number of visitors and stands out from the crowd, there will be people that try to benefit from your content for their own gain.

Simple example: after this post is published, it will appear in our RSS feed. And this will cause other websites to publish the article automatically on their own website. They fully automated that process. Not the nicest way to express appreciation, right? But it happens.

In this article, we’d like to show you a number of ways that people can copy your content. We’ll also show you what possible actions you can take, without directly asking your lawyer to take action.

People copy content via your RSS feed

Most content management systems publish an RSS (Really Simple Syndication) feed for your website. Being the fossil that I am, I still use these feeds in my RSS reader (I’m using Feedly) so I can read up on a number of websites at once.

However, some websites use RSS to include news from other websites on their website. That can be done by including a list of your latest article titles that link to your website. You’ll probably have no problem with it if someone does this. But if it’s done to republish your content on their website without that link to you, it’s a different story. This is one of the reasons our Yoast SEO plugin allows you to add an extra line to your feed items. That line could say “The article (article title) was first published on (your URL)”. There is a line like that included by default, by the way. This ensures that, if people copy content from your website via your RSS feed, there will always be a link back to your website. Google will find that link and understand you are the original source.

Make sure there is a link back to the original article in the RSS feed. That way, the website that copies your content won’t get all the credits for your article.
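As a rough illustration of that idea, here is a minimal Python sketch that appends an attribution line with a link back to each feed item. It is not how the Yoast SEO plugin implements the feature; the function names and the exact wording of the footer are assumptions made for this example.

```python
from html import escape


def attribution_line(post_title: str, post_url: str,
                     site_name: str, site_url: str) -> str:
    """Build an HTML footer that links back to the original post and site."""
    post_link = f'<a href="{escape(post_url, quote=True)}">{escape(post_title)}</a>'
    site_link = f'<a href="{escape(site_url, quote=True)}">{escape(site_name)}</a>'
    return f"<p>The post {post_link} appeared first on {site_link}.</p>"


def add_attribution(item_html: str, post_title: str, post_url: str,
                    site_name: str, site_url: str) -> str:
    """Append the attribution footer to the body of one feed item."""
    footer = attribution_line(post_title, post_url, site_name, site_url)
    return item_html + "\n" + footer
```

Because the footer travels with the feed item itself, a site that republishes the feed unchanged also republishes the link to the original.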

Manually copying your content

If someone manually copies your content or removes that line directing the reader and Google to your website from your RSS feed, chances are you won’t even notice they copied it. But if you do, first, try to get them to add that link back to your article in there. Just send an email and hope that the ‘thief’ is willing to add that link.

We have had people telling us that the only reason they copied content from our website was that they felt their readers should know about that specific issue or tip as well. There didn’t appear to be bad intentions and the link was added immediately after our email.

The best way: canonical link

The best way to make sure search engines understand that your content is the original source for the content is by adding a canonical link back to your website. If the other website is willing to do so and is running our Yoast SEO plugin, this is easy as pie. If the website at hand has no bad intentions, they will be willing to add that link.
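To check whether a copy really points back to you, you could inspect its HTML for the canonical link. The sketch below uses only Python's standard library; the class and function names are made up for this example, and it assumes you have already downloaded the HTML of the copied page.

```python
from html.parser import HTMLParser
from typing import Optional


class CanonicalFinder(HTMLParser):
    """Collects the href of the first <link rel="canonical"> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr_map = dict(attrs)
        # rel may hold several space-separated values, or be missing entirely.
        rel_values = (attr_map.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonical = attr_map.get("href")


def find_canonical(html_text: str) -> Optional[str]:
    """Return the canonical URL declared in the page, or None."""
    finder = CanonicalFinder()
    finder.feed(html_text)
    return finder.canonical


def points_to_original(copy_html: str, original_url: str) -> bool:
    """True if the copy declares original_url as its canonical URL."""
    canonical = find_canonical(copy_html)
    if canonical is None:
        return False
    # Ignore a trailing slash difference; everything else must match.
    return canonical.rstrip("/") == original_url.rstrip("/")
```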

Get rid of that copy altogether

It’s trickier when people have less good intentions for stealing your content. If they copy content from your website only for their own benefit, they might not even respond to your email. In that case, you may have to use your copyright as the original author to have that content removed. Google suggests contacting the host of the website and filing a request at Google as well (see the last paragraph of that article).

Translating your content

There is another way that websites can copy your content. If this article, for instance, gets translated into Italian, we might not even find out about it. But usually, articles like that do surface on Twitter. And you obviously have a saved search for your brand on social, right? Or one of the internal links that you added could remain in the translated article and show up in Google Analytics. You might find a Google Alert in your inbox showing that article. There are ways to find a translated article.

Do you want to be associated with that website?

Now I hear you think “If a canonical would help to link duplicate content cross-domain, I need hreflang here.” But you probably don’t. There is no use adding that hreflang tag if the other site isn’t linking back to you using the same method. And you have no control over the translation whatsoever, so you might not want to be linked to that domain anyway.

If the translation makes sense in your book, I would ask the other website to add a link in the article, stating that the original article appeared on your website, in English (or whatever language you originally wrote it in). If the translated article is for an audience that you’re not targeting, I wouldn’t even put too much effort into it.

Artwork

I’d like to wrap up this article with a small remark about artwork. We use artwork heavily in our publications (and branding). Every single illustration we have on our website is our own.

In case a website, YouTube video or social media publication uses that artwork, we have the option to have that publication taken down because of that. Usually, using this copyright angle is the easiest way to get rid of non-responsive thieves of your content. Simply send the website an email first, and ask the hosting company to take action if you get no response. Another reason to stay away from stock photos and use your own media to enhance your website!

Good luck!

Read more: DIY: Duplicate content check »

The post What if people copy content from your site? appeared first on Yoast.

You’ve probably come across the term duplicate content quite a lot, but what is it? Duplicate content is content that lives in several locations — i.e., URLs. Duplicate content can harm your rankings and many people say that copious amounts of it can even lead to a penalty by Google. That’s not true, though. There is no duplicate content penalty, but having loads of duplicate or copied content can get Google to influence your rankings negatively.

What does duplicate content mean?

Duplicate content is all content that is available on multiple locations on or off your site. It often lives on a different URL and sometimes even on a different domain. Most duplicate content happens accidentally or is the result of a sub-par technical implementation. For instance, your site could be available on both www and non-www or HTTP and HTTPS — or both at the same time, the horror! Or maybe your CMS uses excessive dynamic URL parameters that confuse search engines. Even your AMP pages could count as duplicate content if not linked properly. Duplicate content is everywhere.
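To make the www/non-www and HTTP/HTTPS case concrete, the following Python sketch maps every host and scheme variant of a URL to one preferred version. The preferred scheme and host names are assumptions chosen for this example, and the sketch ignores ports for simplicity.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumptions for this example: the site prefers https and the www host.
PREFERRED_SCHEME = "https"
PREFERRED_HOST = "www.example.com"
KNOWN_HOSTS = {"example.com", "www.example.com"}


def preferred_url(url: str) -> str:
    """Map any scheme/host variant of our site to the single preferred URL."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host not in KNOWN_HOSTS:
        return url  # Not our site: leave the URL untouched.
    path = parts.path or "/"
    return urlunsplit(
        (PREFERRED_SCHEME, PREFERRED_HOST, path, parts.query, parts.fragment)
    )
```

All four variants of an address collapse into one URL, which is the URL you would then redirect to or declare as canonical.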

Google’s definition of duplicate content is as follows:

“Duplicate content generally refers to substantive blocks of content within or across domains that either completely match other content or are appreciably similar. Mostly, this is not deceptive in origin.”

That last part is important. If you scrape, copy and spin existing content — Google calls this copied content — with the intention of deceiving the search engine to get a higher ranking you will be on dangerous ground.

Google says this type of malicious intent might trigger an action:

“Duplicate content on a site is not grounds for action on that site unless it appears that the intent of the duplicate content is to be deceptive and manipulate search engine results”

Michiel has some great tips for discovering duplicate content on your site: DIY Duplicate content check. Google’s documentation is also a goldmine for working with duplicate content.

Duplicate content vs. copied content vs. thin content

The topic of duplicate content confuses a lot of people. For Google, most duplicate content has a technical origin (“I have two URLs for the same article, which one should I choose?”), but it will also look at the content itself. Most regular people, on the other hand, will probably think of pieces of similar content that appear elsewhere on a site: “I have used this piece of text in several other places, is that bad?” This is all duplicate content, but for determining rankings, search engines make a distinction between duplicate content, copied content and thin content.

Your duplicate content might classify as copied content if you use an existing text and rehash it quickly to reuse it on your site. It doesn’t matter if you give it a little spin or put in a few keywords, this behavior is not acceptable.  Throw in a couple of thin content pages — pages that have little to no quality content — and you’re in dangerous territory. Site quality is an issue and these tactics can bring serious harm to your site. Remember Panda?

Don’t block duplicate content on your site

Google is pretty adept at discovering and handling duplicate content. The search engine is smart enough to figure out what to do with most of the duplicate content it finds. If it finds multiple versions of a page, it will fold these into the version it finds best; in most cases, this will be the original article/page. What it does need, though, is complete access to these URLs. If you block Googlebot in your robots.txt from crawling these URLs, it cannot figure these things out by itself and you will run the risk of Google treating these pages as separate instances. Here are a couple of things you should do:

  • Allow robots to crawl these URLs
  • Mark the content as duplicate by using rel=canonical (read more about this below)
  • Use Google’s URL Parameter Handling tool to determine how parameters should be handled
  • Use 301 redirects to send users and crawlers to the canonical URL
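To verify the first point in the list, you could test your robots.txt rules against the duplicate URLs you know about. This is a small sketch using Python's standard `urllib.robotparser`; the `/print/` path and the sample rules are hypothetical.

```python
from typing import Iterable, List
from urllib.robotparser import RobotFileParser


def blocked_urls(robots_txt: str, urls: Iterable[str],
                 user_agent: str = "Googlebot") -> List[str]:
    """Return the URLs that the given robots.txt rules forbid for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


# Hypothetical rules that hide a printer-friendly duplicate from crawlers.
sample_rules = "User-agent: *\nDisallow: /print/\n"
sample_urls = [
    "https://www.example.com/keyword-x/",
    "https://www.example.com/print/keyword-x/",
]
print(blocked_urls(sample_rules, sample_urls))
```

Any URL this reports is one where a crawler cannot see the page, and therefore cannot see a canonical link on it either.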

There’s more you can do to fight duplicate content on your site as Joost describes in his article on duplicate content: causes and solutions.

Use rel=canonical!

One of the essential tools in your duplicate content fighting toolkit is rel="canonical". You can use this piece of code to indicate what the original URL of a piece of content is, something we call the canonical URL. We have an excellent ultimate guide to rel="canonical" that shows you everything there is to know about it.
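For reference, the canonical link is a single `link` element in the `head` of a page. A minimal Python sketch that renders it could look like this; the helper name is an assumption for this example, and in practice your CMS or SEO plugin would normally output the tag for you.

```python
from html import escape


def canonical_tag(canonical_url: str) -> str:
    """Render the <link> element that declares a page's canonical URL."""
    return f'<link rel="canonical" href="{escape(canonical_url, quote=True)}" />'


print(canonical_tag("https://www.example.com/keyword-x/"))
```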

Focus on original, fresh and authoritative content

Another tool in your arsenal to fight duplicate, copied and unoriginal content is your writing skills. Google is focused on quality. It is always on the lookout for the best possible piece of content that fits the user’s intent best. Your goal should not be to make a quick buck but to leave a lasting impression. Watch out for thin content and make sure your content is original and of high quality.

The same goes for similar content on your site. We’ve talked about keyword cannibalization before and this is an extension of that. Folding several comparable posts into one can achieve much better results, both in terms of rankings as well as fighting duplicate content.

Here’s Google’s take on similar content:

“Minimize similar content: If you have many pages that are similar, consider expanding each page or consolidating the pages into one. For instance, if you have a travel site with separate pages for two cities, but the same information on both pages, you could either merge the pages into one page about both cities or you could expand each page to contain unique content about each city.”

Duplicate content is everywhere — know what to do about it

Ex-Googler Matt Cutts once famously said that 20% to 30% of the web consists of duplicate content. While I’m not sure these numbers are still accurate, duplicate content continues to pop up on every site. This doesn’t have to be bad news. Fix what you can and don’t try to turn duplicate content and its siblings, copied content and thin content, into a viable SEO strategy.

Read more: Content maintenance for SEO »

The post What is duplicate content? appeared first on Yoast.

As your site grows, you’ll have more and more posts. Some of these posts are going to be about a similar topic. Even if you’ve always categorized it well, your content might be competing with itself: you’re suffering from keyword cannibalization. At the same time, some of your articles might get out of date, and not be entirely correct anymore. To prevent all of this, you need to perform content maintenance.

In a lot of cases, content maintenance is going to mean deleting and merging content. I’m going to run you through some of that maintenance work as we did it at Yoast, to show you how to do this. In particular, I’m going to show you my thinking around a cluster of keywords around keyword research.

Step 1: Audit your content

The first step in my process was finding all the content we had around keyword research. Now, most of that was simple: we have a keyword research tag, and most of the content was nicely tagged. This was also slightly shocking: we had quite a few posts about the topic.

A site:search in Google gave me the missing articles that Google considered to be about keyword research. I simply searched for site:yoast.com "keyword research" and Google gave me all the posts and pages on the site that mentioned the topic.

I had found a total of 18 articles that were either entirely devoted to keyword research or had large sections that mentioned it. Another 20 or so mentioned it in passing and linked to some of the other articles.

The reason I started auditing the content for this particular group of keywords was simple: I wanted to improve our rankings around the cluster of keywords around keyword research. So I needed to analyze which of these pages were ranking, and which weren’t. This content maintenance turned out to be badly needed.

Step 2: Analyze the content performance

I went into Google Search Console (the new beta) and went to the Performance section. In that section I clicked the filter bar:

Search Console Performance section

I clicked Query and then typed “keyword research” into the box like this:

performance filter: keyword research queries

This makes Google Search Console match all queries that contain the words keyword and research. This gives you two very important pieces of data:

  1. A list of the keywords your site had been shown in the search results for and the clicks and click-through rate (CTR) for those keywords;
  2. A list of the pages that were receiving all that traffic and how much traffic each of those pages received.

I started by looking at the total number of clicks we had received for all those queries and then looked at the individual pages. Something was immediately clear: three pages were getting 99% of the traffic. But I knew we had 18 articles that covered this topic. Obviously, it was time to clean up. Of course, we didn’t want to throw away any posts that were getting traffic that was not included in this bucket of traffic. So I had to check each post individually.

I removed the Query filter and used another option that’s in there: the Page filter. This allows you to filter by a group of URLs or a specific URL. On larger sites you might be able to filter by groups of URLs. In this case, I looked at the data for each of those posts individually.
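If you export the page data instead of reading it in the interface, the same analysis can be sketched in a few lines of Python. The function names are assumptions for this example, and it expects a list of (page, clicks) pairs that you have prepared yourself; it does not call any Search Console API.

```python
from typing import List, Tuple


def click_share(rows: List[Tuple[str, int]]) -> List[Tuple[str, int, float]]:
    """Sort pages by clicks and add each page's share of the total clicks."""
    total = sum(clicks for _, clicks in rows)
    ranked = sorted(rows, key=lambda row: row[1], reverse=True)
    if total == 0:
        return [(page, clicks, 0.0) for page, clicks in ranked]
    return [(page, clicks, clicks / total) for page, clicks in ranked]


def pages_covering(rows: List[Tuple[str, int]],
                   threshold: float = 0.99) -> List[str]:
    """Return the top pages that together reach `threshold` of all clicks."""
    total = sum(clicks for _, clicks in rows)
    pages: List[str] = []
    covered = 0  # Count clicks as integers to avoid float rounding surprises.
    for page, clicks, _ in click_share(rows):
        if total == 0 or covered >= threshold * total:
            break
        pages.append(page)
        covered += clicks
    return pages
```

Pages that fall outside the returned list are the candidates to check individually before you decide to merge or delete them.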

Step 3: Decision time

As I went through each post in this content maintenance process, I decided what we were going to do: keep it, or delete it. If I decided we should delete it (which I did for the majority of the posts), I decided to which post we should redirect it. The more basic posts I decided to redirect to our SEO for Beginners post: what is keyword research?. The posts about keyword research tools were redirected to our article that helps you select (and understand the value of) a keyword research tool. Most of the other ones I decided to redirect to our ultimate guide to keyword research.

For each of those posts, I evaluated whether they had sections that we needed to merge into another article. Some of those posts had paragraphs or even entire sections that could just be merged into another post.

I found one post that, while it didn’t rank for keyword research, still needed to be kept: it talked about long tail keywords specifically. It had such a clear reach for those terms that deleting it would be a waste, so I decided to redirect the other articles about the topic to that specific article.

Step 4: Take action

Now it was time to take action! I had a list of action items: for each article we were keeping, the content to merge into it, after which the articles that content came from could be deleted. Using Yoast SEO Premium, it’s easy to 301 redirect a post or page when you delete it, so that process was fairly painless.
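When you merge and delete in several rounds, a redirect can end up pointing at a post that has itself been redirected. The Python sketch below, which is independent of any plugin and uses made-up paths, flattens such chains so every deleted URL points straight at a post that still exists.

```python
from typing import Dict


def resolve_redirects(redirects: Dict[str, str]) -> Dict[str, str]:
    """Flatten redirect chains so each source maps to its final target."""
    resolved: Dict[str, str] = {}
    for source in redirects:
        seen = {source}
        target = redirects[source]
        # Follow the chain until we reach a URL that is not redirected itself.
        while target in redirects:
            if target in seen:
                raise ValueError(f"redirect loop involving {target}")
            seen.add(target)
            target = redirects[target]
        resolved[source] = target
    return resolved


# Hypothetical paths: /old-a/ was first merged into /old-b/, which was
# later merged into the guide.
plan = {"/old-a/": "/old-b/", "/old-b/": "/guide/", "/old-c/": "/guide/"}
print(resolve_redirects(plan))
```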

With that, we’d taken care of the 18 specific articles about the topic, and retained only 4. We still had a list of ~20 articles that mentioned the topic and linked to one of the other articles. We went through all of them and made sure each linked to one or more of the 4 remaining articles in the appropriate section.

Content maintenance is hard work

If you’re thinking “that’s a lot of work”: yes, it is. And we don’t write about just keyword research, so this is a process we have to do for quite a few terms, multiple times a year. This is a very repeatable content maintenance strategy though:

  1. Audit, so you know which content you have;
  2. Analyze, so you know how the content performs;
  3. Decide which content to keep and what to throw away;
  4. Act.

Now “all” you have to do is go through that process at least once a year for every important cluster of keywords you want your site to rank for.

Read more: Keyword research: the ultimate guide »

The post Content maintenance for SEO: research, merge & redirect appeared first on Yoast.

Keyword cannibalization means that you have various blog posts or articles on your site that can rank for the same search query in Google. If you optimize posts or articles for similar search queries, they’re eating away each other’s chances to rank. Here, I’ll explain why keyword cannibalism is bad for your SEO, how you can recognize keyword cannibalization and how to solve it.

What is keyword cannibalization?

If you optimize your articles for similar terms, you might suffer from keyword cannibalization: you’ll be devouring your own chances to rank in Google. Google will only show 1 or 2 results from the same domain in the search results for any specific query. If you’re a high authority domain, you might get away with 3.

Why is keyword cannibalism bad for SEO?

If you cannibalize your own keywords, you’re competing with yourself for ranking in Google. Let’s say you have two posts on the same topic. In that case, Google isn’t able to distinguish which article should rank highest for a certain query. As a result, they’ll probably both rank lower. Therefore our SEO analysis will give a red bullet whenever you optimize a post for a focus keyword you’ve used before.

But keyword cannibalism can also occur if you optimize posts for focus keywords that are not exactly, but almost the same. For instance, I wrote two posts about whether or not readability is a ranking factor. The first post is optimized for ‘does readability rank’, while the second post is optimized for the focus keyword ‘readability ranking factor’. The posts have a slightly different angle but are still very similar. For Google, it is hard to figure out which of the two articles is most important. As a result, you could end up ranking low with both articles.
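One rough way to spot focus keywords that are “almost the same” is to compare the words they share. The Python sketch below is only a heuristic and not how the Yoast SEO analysis works: it strips a trailing “ing” from longer words (a deliberately crude stand-in for real stemming) and then computes the Jaccard overlap of the two word sets.

```python
from typing import Set


def keyword_tokens(keyword: str) -> Set[str]:
    """Lowercase the keyword and crudely reduce '...ing' words to their stem."""
    tokens = set()
    for word in keyword.lower().split():
        if word.endswith("ing") and len(word) > 5:
            word = word[:-3]  # e.g. "ranking" becomes "rank"
        tokens.add(word)
    return tokens


def keyword_overlap(first: str, second: str) -> float:
    """Jaccard similarity of the two keywords' word sets (0.0 to 1.0)."""
    a, b = keyword_tokens(first), keyword_tokens(second)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


print(keyword_overlap("does readability rank", "readability ranking factor"))
```

For the two focus keywords above, two of the four distinct words overlap, which is a hint that the posts deserve a closer look.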

How to recognize keyword cannibalization?

Checking whether or not your site suffers from keyword cannibalism is rather easy. You should search your site for any specific keyword you suspect might have multiple results. In my case, I’ll google site:yoast.com readability ranks. The first two results are the articles I suspected to suffer from cannibalization.

Googling site:domain.com "keyword" will give you an easy answer to the question whether you’re suffering from keyword cannibalism.
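If you want to run this check for a whole list of keywords, you could generate the search links instead of typing each query. This is a small Python sketch; the helper name is made up, and it only builds a URL to open in your browser rather than fetching any results.

```python
from urllib.parse import urlencode


def site_search_url(domain: str, phrase: str) -> str:
    """Build a Google search URL for: site:<domain> "<phrase>"."""
    query = f'site:{domain} "{phrase}"'
    return "https://www.google.com/search?" + urlencode({"q": query})


for keyword in ["keyword research", "readability ranks"]:
    print(site_search_url("yoast.com", keyword))
```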

Solve keyword cannibalization with internal linking

You can help Google to figure out which article is most important, by setting up a decent internal linking structure.  You should link from posts that are less important, to posts that are the most important to you. That way, Google can figure out (by following links) which ones you want to pop up highest in the search engines.

Your internal linking structure could solve a part of your keyword cannibalism problems. You should think about which article is most important to you and link from the less important long tail articles, to your most important article. Read more about how to do this in my article about ranking with cornerstone content.

Solve keyword cannibalism by combining articles

In many cases, the best way to solve the keyword cannibalization problem is by combining articles. Find the articles that focus on similar search queries. If two articles are both attracting the same audience and are basically telling the same story, you should combine them. Rewrite the two posts into one amazing, kickass article. That’ll really help with your ranking (Google loves lengthy and well-written content) and solve your keyword cannibalization problem. That’s exactly what I should do with my two posts about whether or not readability is a ranking factor. In the end, you’ll delete one of the two articles and adapt the other one. And don’t forget: don’t just press the delete button; always make sure to redirect the post you delete.

Keyword cannibalism will affect growing websites

As your site gets bigger, the chances of running into keyword cannibalism on your own website increase. You’ll be writing about your favorite subjects and without even knowing it, you’ll write articles that end up rather similar. That’s what happened to me too. Once in a while, you should check the keywords you want to rank for the most. Make sure to check whether you’re suffering from keyword cannibalism. You’ll probably need to make some changes in your site structure or to rewrite some articles every now and then.

Read more: Keyword research: the ultimate guide »

The post Keyword cannibalization appeared first on Yoast.

There are many occasions when you may want to put a PDF on your site. For example, when you’ve made an online magazine, when an article you wrote was featured in a book or magazine, or when you’ve written detailed instructions for a DIY project. So far, so good.

But things can get a bit more complicated when you also have the content from this PDF somewhere else on your site, or on another website. To avoid duplicate content, you need to set a canonical URL. But how do you do that for a PDF document? And what is the best way to do that? Let’s discuss in today’s Ask Yoast!

Karen Schousboe emailed us her question:

I plan to publish a PDF magazine under medieval.news. Some of the articles in each issue will also be freely available on a sister website. How should I handle that? Do I link canonical from the articles to the PDF magazine or from the magazine to the website?

Watch the video or read the transcript further down the page for my answer!

Canonicalization and PDFs

“Well, you can have a canonical HTTP header and what I would suggest doing is canonicalizing from the PDF magazine to the sister website, because HTML pages just rank a lot better than PDFs, usually.
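A PDF has no HTML head to put a link element in, so the canonical has to travel in the HTTP response as a `Link` header with `rel="canonical"`. The Python sketch below only builds that header; the helper name and the example URL are assumptions, and how you attach the header to the PDF response depends on your server or framework.

```python
from typing import Tuple


def canonical_link_header(canonical_url: str) -> Tuple[str, str]:
    """Return the (name, value) pair of a rel="canonical" HTTP Link header."""
    if "<" in canonical_url or ">" in canonical_url:
        raise ValueError("URL must not contain angle brackets")
    return ("Link", f'<{canonical_url}>; rel="canonical"')


# Hypothetical HTML article that the PDF version should point to.
name, value = canonical_link_header("https://www.example.com/articles/issue-1/")
print(f"{name}: {value}")
```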

In fact, I would suggest publishing everything in HTML and not necessarily in PDF because PDF is just not very easy to land on from search. You can’t do any tracking, you can’t do a whole lot of things that you can do with HTML. So I would seriously consider doing all of it in HTML pages and then canonicalizing between them. Good luck.”

Ask Yoast

In the series Ask Yoast, we answer SEO questions from our readers. Have an SEO-related question? Maybe we can help you out! Send an email to ask@yoast.com.

Note: please check our blog and knowledge base first, the answer to your question may already be out there! For urgent questions, for example about our plugin not working properly, we’d like to refer you to our support page.

Read more: ‘rel=canonical: the ultimate guide’ »

The post Ask Yoast: Canonical for PDF magazine appeared first on Yoast.

Search engines like Google have a problem. It’s called ‘duplicate content.’ Duplicate content means that similar content is being shown on multiple locations (URLs) on the web. As a result, search engines don’t know which URL to show in the search results. This can hurt the ranking of a webpage. Especially when people start linking to all the different versions of the content, the problem becomes bigger. This article will help you to understand the various causes of duplicate content, and to find the solution for each of them.

What is duplicate content?

You can compare duplicate content to being at a crossroads. Road signs are pointing in two different directions for the same final destination: which road should you take? And now, to make it ‘worse’, the final destination is different too, but only ever so slightly. As a reader, you don’t mind: you get the content you came for. A search engine has to pick which one to show in the search results. It, of course, doesn’t want to show the same content twice.

Let’s say your article about ‘keyword x’ appears on http://www.example.com/keyword-x/ and the same content also appears on http://www.example.com/article-category/keyword-x/. This situation is not fictitious: it happens in lots of modern Content Management Systems. Your article has been picked up by several bloggers. Some of them link to the first URL; others link to the second URL. This is when the search engine’s problem shows its real nature: it’s your problem. The duplicate content is your problem because those links are both promoting different URLs. If they were all linking to the same URL, your chance of ranking for ‘keyword x’ would be higher.

1 Causes for duplicate content

There are dozens of causes of duplicate content. Most of them are technical: it’s not very often that a human decides to put the same content in two different places without distinguishing the source; it feels unnatural to most of us. The technical reasons are plentiful though. It happens mostly because developers don’t think like a browser or a user, let alone a search engine spider; they think like a developer. That aforementioned article, that appears on http://www.example.com/keyword-x/ and http://www.example.com/article-category/keyword-x/? If you ask the developer, he’ll say it only exists once.

1.1 Misunderstanding the concept of a URL

Has that developer gone mad? No, he’s just speaking a different language. You see, a database system probably powers the whole website. In that database, there’s only one article; the website’s software just allows for that same article in the database to be retrieved through several URLs. That’s because, in the eyes of the developer, the unique identifier for that article is the ID that article has in the database, not the URL. For the search engine though, the URL is the unique identifier to a piece of content. If you explain that to a developer, he’ll start getting the problem. And after reading this article, you’ll even be able to provide him with a solution right away.

1.2 Session IDs

You often want to keep track of your visitors and make it possible, for instance, to store items they want to buy in a shopping cart. To do that, you need to give them a ‘session.’ A session is a brief history of what the visitor did on your site and can contain things like the items in their shopping cart. To maintain that session as a visitor clicks from one page to another, the unique identifier for that session, the so-called Session ID, needs to be stored somewhere. The most common solution is to do that with cookies. However, search engines usually don’t store cookies.

At that point, some systems fall back to using Session IDs in the URL. This means that every internal link on the website gets that Session ID appended to the URL, and because that Session ID is unique to that session, it creates a new URL, and thus duplicate content.

1.3 URL parameters used for tracking and sorting

Another cause for duplicate content is the use of URL parameters that do not change the content of a page, for instance in tracking links. You see, http://www.example.com/keyword-x/ and http://www.example.com/keyword-x/?source=rss are not the same URL for a search engine. The latter might allow you to track what source people came from, but it might also make it harder for you to rank well. A very unwanted side effect!

This doesn’t just go for tracking parameters, of course. It goes for every parameter you can add to a URL that doesn’t change the vital piece of content, whether that parameter is for ‘changing the sorting on a set of products’ or for ‘showing another sidebar’: they all cause duplicate content.
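To make this concrete, here’s a minimal sketch in Python of how you could reduce such URLs to the one that actually identifies the content. It is only an illustration: the function name and the list of ignored parameters are assumptions, and which parameters are safe to ignore depends entirely on your own site.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: these parameters never change the main content of a page.
# "source" is the tracking parameter from the example above; "sort" and
# "sidebar" are hypothetical names for the sorting and sidebar cases.
IGNORED_PARAMS = {"source", "sort", "sidebar"}


def content_url(url: str) -> str:
    """Return the URL without the parameters listed in IGNORED_PARAMS."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in IGNORED_PARAMS
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


print(content_url("http://www.example.com/keyword-x/?source=rss"))
# prints http://www.example.com/keyword-x/
```

Every URL that maps to the same result is, for a search engine, a potential duplicate of that page.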

1.4 Scrapers & content syndication

Most of the causes for duplicate content are all your own or at the very least your website’s ‘fault.’ Sometimes, however, other websites use your content, with or without your consent. They do not always link to your original article, and thus the search engine doesn’t ‘get’ it and has to deal with yet another version of the same article. The more popular your site becomes, the more scrapers you’ll often have, making this issue bigger and bigger.

1.5 Order of parameters

Another common cause is that a CMS doesn’t use nice and clean URLs, but rather URLs like /?id=1&cat=2, where ID refers to the article and cat refers to the category. The URL /?cat=2&id=1 will render the same results in most website systems, but they’re completely different for a search engine.
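One way to avoid this is to build every internal link through a single function that always puts the parameters in the same order. The sketch below shows the idea in Python; the function name is hypothetical and alphabetical order is just one possible convention.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def ordered_url(url: str) -> str:
    """Rebuild a URL with its query parameters in alphabetical order."""
    parts = urlsplit(url)
    params = sorted(parse_qsl(parts.query, keep_blank_values=True))
    return urlunsplit(parts._replace(query=urlencode(params)))


# Both spellings of the example URL end up as the same string.
print(ordered_url("/?id=1&cat=2"))  # prints /?cat=2&id=1
print(ordered_url("/?cat=2&id=1"))  # prints /?cat=2&id=1
```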

1.6 Comment pagination

In my beloved WordPress, but also in some other systems, there is an option to paginate your comments. This leads to the content being duplicated across the article URL, and the article URL + /comment-page-1/, /comment-page-2/ etc.

1.7 Printer-friendly pages

If your content management system creates printer-friendly pages and you link to those from your article pages, in most cases Google will find those, unless you specifically block them. Now, which version should Google show? The one laden with ads and peripheral content, or the one with just your article?

1.8 WWW vs. non-WWW

One of the oldest in the book, but sometimes search engines still get it wrong: WWW vs. non-WWW duplicate content, when both versions of your site are accessible. A less common situation but one I’ve seen as well: HTTP vs. HTTPS duplicate content, where the same content is served out over both.
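As a rough sketch of what ‘one version’ means here, the Python function below rewrites any URL on the site to a single preferred scheme and host. The choice of https and www.example.com is purely an assumption for the example, as are the names; a real site would do this with a redirect at the server level.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: the site has picked https://www.example.com as its only origin.
PREFERRED_SCHEME = "https"
PREFERRED_HOST = "www.example.com"
OWN_HOSTS = {"example.com", "www.example.com"}


def preferred_url(url: str) -> str:
    """Map WWW/non-WWW and HTTP/HTTPS variants of our own URLs to one form."""
    parts = urlsplit(url)
    if parts.hostname not in OWN_HOSTS:
        return url  # not our site, leave it alone
    # Note: this simple sketch drops any explicit port number.
    return urlunsplit(
        parts._replace(scheme=PREFERRED_SCHEME, netloc=PREFERRED_HOST)
    )


print(preferred_url("http://example.com/keyword-x/"))
# prints https://www.example.com/keyword-x/
```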

2 Conceptual solution: a ‘canonical’ URL

As determined above, the fact that several URLs lead to the same content is a problem, but it can be solved. A human working at a publication will normally be able to tell you quite easily what the ‘correct’ URL for a certain article should be. The funny thing is, though, sometimes when you ask three people in the same company, they’ll give three different answers…

That’s a problem that needs solving in those cases because, in the end, there can be only one (URL). That ‘correct’ URL for a piece of content has been dubbed the Canonical URL by the search engines.

Ironic side note

Canonical is a term stemming from the Roman Catholic tradition, where a list of sacred books was created and accepted as genuine. They were dubbed the canonical Gospels of the New Testament. The irony is: it took the Roman Catholic church about 300 years and numerous fights to come up with that canonical list, and they eventually chose four versions of the same story.

3 Identifying duplicate content issues

You might not know whether you have a duplicate content issue on your site or with your content. Let me give you some methods of finding out whether you do.

3.1 Google Search Console

Google Search Console is a great tool for identifying duplicate content. If you go into the Search Console for your site and check under Search Appearance » HTML Improvements, you’ll see a list of pages with duplicate titles or duplicate descriptions.

If pages have duplicate titles or duplicate descriptions, that’s almost never a good thing. Clicking on it will reveal the URLs that have duplicate titles or descriptions and will help you identify the problem. The issue is that if you have an article like the one about keyword X, and it shows up in two categories, the titles might be different. They might, for instance, be ‘Keyword X – Category X – Example Site’ and ‘Keyword X – Category Y – Example Site’. Google won’t pick those up as duplicate titles, but you can find them by searching.
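If you have a list of URLs and titles at hand, for instance from a crawl of your own site, you can also spot these near-duplicate titles with a few lines of code. The Python sketch below is only an illustration: it assumes your titles follow the ‘Article – Category – Site’ pattern from the example, and the function name and page list are made up.

```python
from collections import defaultdict


def group_by_article_title(pages, separator=" – "):
    """Group URLs by the first part of their title and keep the duplicates."""
    groups = defaultdict(list)
    for url, title in pages:
        article = title.split(separator)[0].strip()
        groups[article].append(url)
    # Only titles that appear on more than one URL are interesting.
    return {title: urls for title, urls in groups.items() if len(urls) > 1}


# Hypothetical crawl result for the example site.
pages = [
    ("http://www.example.com/category-x/keyword-x/", "Keyword X – Category X – Example Site"),
    ("http://www.example.com/category-y/keyword-x/", "Keyword X – Category Y – Example Site"),
    ("http://www.example.com/about/", "About us – Example Site"),
]
print(group_by_article_title(pages))
```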

3.2 Searching for titles or snippets

There are several search operators that are very helpful for cases like these. If you’d want to find all the URLs on your site that contain your keyword X article, you’d type the following search phrase into Google:

site:example.com intitle:"Keyword X"

Google will then show you all pages on example.com that have that keyword in their title. The more specific you make that intitle part, the easier it is to weed out duplicate content. You can use the same method to identify duplicate content across the web. If the full title of your article was ‘Keyword X – why it is awesome’, you’d search for:

intitle:"Keyword X - why it is awesome"

And Google would give you all sites that match that title. Sometimes it’s even worth searching for one or two complete sentences from your article, as some scrapers might change the title. In some cases, when you do a search like that, Google might show a notice on the last page of results saying that it has omitted some entries very similar to the ones already displayed, with a link to repeat the search with the omitted results included.

This is a sign that Google is already ‘de-duping’ the results. It’s still not good, so it’s worth clicking the link and looking at all the other results to see whether you can fix some of those.

4 Practical solutions for duplicate content

Once you’ve decided which URL is the canonical URL for your piece of content, you have to start a process of canonicalization (yeah I know, try to say that three times out loud fast). This means we have to let the search engine know about the canonical version of a page and let it find it ASAP. There are four methods of solving the problem, in order of preference:

  1. Not creating duplicate content
  2. Redirecting duplicate content to the canonical URL
  3. Adding a canonical link element to the duplicate page
  4. Adding an HTML link from the duplicate page to the canonical page

4.1 Avoiding duplicate content

Some of the above causes for duplicate content have very simple fixes to them:

  • Session IDs in your URLs?
    These can often just be disabled in your system’s settings.
  • Have duplicate printer-friendly pages?
    These are completely unnecessary: you should just use a print style sheet.
  • Using comment pagination in WordPress?
    You should just disable this feature (under Settings » Discussion) on 99% of sites.
  • Parameters in a different order?
    Tell your programmer to build a script that always puts parameters in the same order (this is often referred to as a URL factory).
  • Tracking links issues?
    In most cases, you can use hash-based campaign tracking instead of parameter-based campaign tracking.
  • WWW vs. non-WWW issues?
    Pick one and stick with it by redirecting the one to the other. You can also set a preference in Google Search Console, but you’ll have to claim both versions of the domain name.

If you can’t fix your problem that easily, it might still be worth it to put in the effort. The goal would be to prevent the duplicate content from appearing altogether. It’s by far the best solution to the problem.

4.2 301 Redirecting duplicate content

In some cases, it’s impossible to entirely prevent the system you’re using from creating wrong URLs for content, but sometimes it is possible to redirect them. If this isn’t logical to you (which I can understand), do keep it in mind while talking to your developers. If you do get rid of some of the duplicate content issues, make sure that you redirect all the old duplicate content URLs to the proper canonical URLs. 
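For your developers, the idea boils down to something like the following Python sketch: a tiny WSGI application that answers known duplicate URLs with a 301 to the canonical one. The redirect map and the placeholder response are assumptions for the example; in practice this usually lives in your web server or CMS configuration rather than in application code.

```python
# Hypothetical map from duplicate paths to their canonical paths.
REDIRECTS = {
    "/article-category/keyword-x/": "/keyword-x/",
}


def app(environ, start_response):
    """Minimal WSGI app: 301 for known duplicates, 200 for everything else."""
    path = environ.get("PATH_INFO", "/")
    target = REDIRECTS.get(path)
    if target is not None:
        # A permanent redirect tells search engines the content has one home.
        start_response("301 Moved Permanently", [("Location", target)])
        return [b""]
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"canonical page"]
```

You could try it locally with wsgiref.simple_server from the Python standard library.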

4.3 Using rel=”canonical” links

Sometimes you don’t want to or can’t get rid of a duplicate version of an article, even when you do know that it’s the wrong URL. For that particular issue, the search engines have introduced the canonical link element. It’s placed in the <head> section of your page, and it looks like this:

<link rel="canonical" href="http://example.com/wordpress/seo-plugin/">

In the href attribute of the canonical link, you place the correct canonical URL for your article. When a search engine that supports canonical finds this link element, it performs what is effectively a soft 301 redirect: it transfers most of the link value gathered by that page to your canonical page.

This process is a bit slower than the 301 redirect though, so if you can do a 301 redirect that would be preferable, as mentioned by Google’s John Mueller.
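If you want to check which canonical URL a page actually declares, a short script can do that for you. Here’s a minimal sketch using only Python’s standard library; the class and function names are made up for this example, and it simply returns the first canonical link it finds.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Remember the href of the first <link rel="canonical"> in a page."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel_values and self.canonical is None:
            self.canonical = attributes.get("href")


def find_canonical(html: str):
    """Return the canonical URL declared in the HTML, or None if there is none."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


page = (
    "<html><head>"
    '<link rel="canonical" href="http://example.com/wordpress/seo-plugin/">'
    "</head><body>...</body></html>"
)
print(find_canonical(page))
# prints http://example.com/wordpress/seo-plugin/
```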

Read more: ‘ rel=canonical • What it is and how (not) to use it ’ »

4.4 Linking back to the original content

If you can’t do any of the above, possibly because you don’t control the <head> section of the site your content appears on, adding a link back to the original article on top of or below the article is always a good idea. This might be something you want to do in your RSS feed: add a link back to the article in it. Some scrapers will filter that link out, but some others might leave it in. If Google encounters several links pointing to your article, it will figure out soon enough that that’s the actual canonical version of the article.
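As an illustration of the RSS idea, this Python sketch appends such a link to the HTML of a feed item. The function name and the exact wording of the line are assumptions; most CMSs and SEO plugins have a setting for this, so you’d rarely need to write it yourself.

```python
from html import escape


def add_link_back(item_html: str, title: str, url: str, site_name: str) -> str:
    """Append a line linking back to the original article to a feed item."""
    link = (
        f'<p>The post <a href="{escape(url)}">{escape(title)}</a> '
        f"appeared first on {escape(site_name)}.</p>"
    )
    return item_html + "\n" + link


print(
    add_link_back(
        "<p>Article text...</p>",
        "Keyword X",
        "http://www.example.com/keyword-x/",
        "Example Site",
    )
)
```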

5 Conclusion: duplicate content is fixable, and should be fixed

Duplicate content happens everywhere. I have yet to encounter a site of more than 1,000 pages that hasn’t got at least a tiny duplicate content problem. It’s something you need to keep an eye on at all times. It is fixable though, and the rewards can be plentiful. Your quality content might soar in the rankings by just getting rid of duplicate content on your site!

Keep reading: ‘ Ask Yoast: webshops and duplicate content’ »

The post Duplicate content: causes and solutions appeared first on Yoast.

At Yoast, we like to say ‘Content is king’. By this, we mean that you cannot rank for any keyword if you don’t write meaningful and original content about it. In this SEO basics post, I’ll explain why you absolutely need content to make your site attractive for your visitors. Also, I’ll clarify why Google dislikes low quality or thin content and what you can do about it.

Thin content

So what is thin content? Thin content is content that has little or no value to the user. Google considers doorway pages, low quality affiliate pages, or simply pages with very little or no content as thin content pages. But don’t fall into the trap of just producing loads of very similar content: non-original pages, pages with scraped and duplicate content, are considered thin content pages too. On top of that, Google doesn’t like pages that are stuffed with keywords either. Google has gotten smarter and has learned to distinguish between valuable and low quality content, especially since Google Panda.

What does Google want?

Google tries to provide the best results that match the search intent of the user. If you want to rank high, you have to convince Google that you’re giving the answer to the question of the user. This isn’t possible if you’re not willing to write extensively on the topic you’d like to rank for. Thin content rarely qualifies for Google as the best result. As a minimum, Google has to know what your page is about to know if it should display your result to the user. So try to write enjoyable, informative copy, to make Google, but first and foremost your users, happy.

Read more: ‘SEO basics: What does Google do?’ »

Be the best result

We recommend writing meaningful copy about the keywords you’d like to rank for. If you keep a blog about your favorite hobby, this shouldn’t be much of a problem, right? If you write about something you love and know everything about, then it’s easy to show Google that your pages contain the expert answer they are looking for!

We do understand that every situation is different and that it’s not always possible to write an elaborate text about everything. For instance, if you own an online shop that sells hundreds of different computer parts, it can be a challenge to write an extensive text about everything. But at least make sure that every page has some original introductory content, instead of just an image and a buy button next to the price. If you sell lots of products that are very alike, you could also choose to optimize the category page instead of the product page or to use canonicals to prevent duplicate content issues.

How do we help you?

The Yoast SEO plugin helps you write awesome copy. It does that by providing content analysis checks. One of these checks is whether you’ve written at least 300 words per page or post. We also check whether you’ve used the same keyword before, which helps prevent you from creating similar content over and over. Another check that’s useful for this is our keyword density check. If your score is too high, you’re probably stuffing your keyword into your copy, giving it an unnatural feel. So make sure at least these bullets are green.

On top of that, you can use our readability check to make sure the quality of your text is good and readers can easily understand the text you’ve written.
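To make the length and keyword density checks a bit more concrete, here’s a rough sketch of the idea in Python. This is not how the Yoast SEO plugin implements its checks: the function name, the simple ‘hits per 100 words’ formula and the way words are split are all assumptions made for illustration. Only the 300-word minimum comes from the text above.

```python
import re


def keyword_stats(text: str, keyword: str, min_words: int = 300) -> dict:
    """Count words and keyword occurrences in a piece of copy."""
    words = re.findall(r"\w+", text.lower())
    phrase = re.findall(r"\w+", keyword.lower())
    hits = 0
    if phrase:
        # Slide over the text and count exact matches of the keyword phrase.
        for start in range(len(words) - len(phrase) + 1):
            if words[start:start + len(phrase)] == phrase:
                hits += 1
    density = 100 * hits / len(words) if words else 0.0
    return {
        "word_count": len(words),
        "long_enough": len(words) >= min_words,
        "keyword_hits": hits,
        "density_percent": round(density, 2),
    }


print(keyword_stats("Keyword X is great and keyword X is fun", "Keyword X"))
```

A very high density on a short text, as in this toy example, is exactly the kind of pattern that reads as keyword stuffing.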

Really want to learn how to create content that ranks? Then our SEO copywriting training probably is what you need. It guides you through the entire process of keyword research and content creation, helping you to develop the skills to write awesome content for your website!

Keep reading: ‘Content SEO: the ultimate guide’ »

The post SEO basics: What is thin content? appeared first on Yoast.

If your online business is doing well in your country, you might consider expanding to international markets. To be successful in new markets requires some extra investments in SEO though. You’d better start thinking about multilingual SEO, if you want to be sure your website will be found and used well in other countries! Here, I’ll explain what multilingual SEO is, why it’s important and which elements it consists of.

What is multilingual SEO?

Multilingual SEO deals with offering optimized content for multiple languages or multiple locations. Let’s explain this with an example. Imagine you have an online shop: you sell WordPress plugins in many countries. To increase your sales in Germany, you’ve decided to translate your content into German and create a German site. Now, you have two variations of the same page: an English and German version. Pretty straight-forward, you’d say? Well, there’s more.

Especially if you want to target countries with similar languages or countries where multiple languages are used, this will pose some challenges. Let’s explore the situation displayed in the image below. This is a simplified example; there are obviously many more potential audiences than we’ve included, like British users.

Multilingual SEO scenario: targeting audiences with German and English content

Obviously, you want people who search in German to be directed to the German site. Maybe you even want to have a specific site for German speakers in Switzerland. It would be even better to have a French alternative for speakers of French in Switzerland as well, of course. Let’s assume for now that you don’t have the required resources for that, though. In that case, it’s probably best to send users from Switzerland who speak French to the English site. On top of that, you need to make sure that you send all other users to your English site, as they are more likely to speak English than German. In a scenario like this, you need to set up and implement a multilingual SEO strategy.
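The targeting decision in this scenario can be written down as a very small rule. The Python sketch below only illustrates that rule; the URLs, names and locale codes are assumptions, and it is not a recommendation to redirect visitors based on their language.

```python
# Hypothetical sites for the scenario: one German, one English.
SITES = {
    "de": "https://example.com/de/",
    "en": "https://example.com/",
}
FALLBACK_LANGUAGE = "en"


def site_for(locale: str) -> str:
    """Pick the site for a locale such as 'de-CH', 'fr-CH' or 'en-GB'."""
    language = locale.split("-")[0].lower()
    return SITES.get(language, SITES[FALLBACK_LANGUAGE])


print(site_for("de-CH"))  # German speakers in Switzerland get the German site
print(site_for("fr-CH"))  # French speakers in Switzerland get the English site
```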

Because it’s not easy to get the right website ranking in the right market, we decided to set up a Multilingual SEO training, which will be available soon! In this course, we’ll guide you step by step through all important multilingual SEO elements. Don’t miss the launch, subscribe to our newsletter now!

Why is multilingual SEO a thing?

You want your website to be found with Google. In a standard SEO strategy, you optimize your content for one language: the language your website is written in.  Sometimes, however, you want to target audiences in multiple countries and regions. These audiences are probably similar, but there are always differences. This presents you with an opportunity. By targeting your audiences specifically, it is easier to address their needs. One of these differences is the language they speak. When you make your site available in several languages and target specific regions, you achieve two things:

  • You expand your potential audience;
  • You improve your chances of ranking for a specific region and in several languages.

Let’s revisit the example we discussed before in light of this. By making a German variation of your original English site, you’ve made it possible for users searching in German to find your product. In the end, multilingual SEO is all about addressing the needs of your users.

It all sounds rather clear-cut, but multilingual SEO can be hard. A lot can go wrong, and a bad multilingual implementation can hurt your rankings. This means that you need to know what you’re doing.

One of the biggest risks of multilingual SEO is duplicate content. If you present very similar content on your website on multiple pages, Google won’t know which content to show in the search engines. Duplicate pages compete with each other, so the individual rankings of the pages will go down. You can avoid this particular issue with hreflang, an element of your multilingual SEO strategy. But there’s more to multilingual SEO. Let’s discuss the main aspects below.

Multilingual SEO: content, domains and hreflang

Content for international sites

Content is a very important aspect of your multilingual SEO strategy. If you want to write content in different languages, you’ll need to adapt existing content or create new content. Adapting your content while maintaining good SEO can be a challenge.

Your content strategy should always start with keyword research for the region and language you’re targeting. You can’t just translate your keywords using Google Translate. You’ll have to get inside the heads of your new audience. You need to know which words they are using. The same words can have different meanings in languages used in multiple countries, as my colleague Jesse explained before.

Translating content is a challenge as well. Take into account the cultural differences that exist between countries. Otherwise, your copy won’t be appealing to your new audience. If possible, you should have native speakers translate or at least check your translated content to prevent you from making awkward mistakes. If you want a complete list of what to consider when translating content, read Marieke’s post on how to create SEO-friendly copy in a foreign language.

Domain structure for international sites

To successfully target your audiences, you need to consider which pages you want them to land on. There are several options as to what domain structure you’re going to use. Do you need to get the ccTLD (country code Top Level Domain) like example.de for Germany? Or could you create subdirectories for countries like example.com/de? Or, will you use a subdomain like de.example.com? And what about countries where multiple languages are spoken? How do you set up a domain structure for those countries?

There’s a lot you have to consider to make these decisions. This is where domain authority, but also the size of your business and your marketing capacity in your target countries, come into play. If you want to really dive into this, you should check out our Multilingual SEO training, which we’ll launch February 7!

Hreflang

Hreflang is the technical implementation you’ll need to put in place if you’re offering your content in multiple languages. Simply put, you’ll tell Google which result to show to whom in the search engines. It’s not as easy as it might sound though and this is something that often goes wrong, even on the big sites. Joost wrote an extensive post on how to implement hreflang the right way.
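To give you an idea of what that looks like, here’s a small Python sketch that prints the hreflang link elements for the English and German example. The URLs and the function name are assumptions. The x-default entry marks the version meant for everyone who isn’t matched by one of the other languages, and the same set of elements has to appear on every language version of the page.

```python
from html import escape


def hreflang_links(alternates: dict) -> str:
    """Build one <link rel="alternate"> element per language version."""
    lines = [
        f'<link rel="alternate" hreflang="{escape(code)}" href="{escape(url)}">'
        for code, url in alternates.items()
    ]
    return "\n".join(lines)


# Hypothetical URLs: an English site that doubles as the default, plus German.
print(
    hreflang_links(
        {
            "en": "https://example.com/",
            "de": "https://example.com/de/",
            "x-default": "https://example.com/",
        }
    )
)
```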

International ambitions? Get your multilingual SEO right!

Multilingual SEO focuses on optimizing content for different languages for the search engines. With a proper multilingual SEO strategy, people in different countries will be able to find your website for their market, in their native language. Multilingual SEO can be hard though and you need to know what you’re doing. It touches on a lot of different aspects of website optimization. If you really want to get it right, take our Multilingual SEO training!

Read more: ‘How to create SEO friendly copy in a foreign language’ »

 

The post What is multilingual SEO? appeared first on Yoast.

Low-quality pages are pages that don’t contribute much to your SEO. In most cases, these pages add little value for your visitors as well. You can have different types of low-quality pages on your site, sometimes without even knowing it. Like thin content – pages holding little information – and duplicate content – pages showing the same information as on other pages. Especially the latter can work against you if you want to rank well. Read how to find and fix those pages here.

What are low-quality pages?

In general, thin content pages are useful to neither your visitors nor the search engines. That could be because these pages hold little information, or contain just an image, like most attachment pages in WordPress. These pages only serve as a placeholder for an actual image. They are often what you land on when clicking, for instance, an image on a WordPress blog.

The second type of low-quality content is duplicate content. The same goes for these duplicate pages: they add little value. Their content is already in Google’s index, on your site or another site. These low-quality pages can have a strong influence on your site’s rankings. Google might even penalize you for having them. 

In addition to these indicators of low-quality content, there’s a third issue that you can fix yourself: poorly written content. Google gave us a kind of checklist in 2011 already. I think most of what’s in there is still relevant seven years later.

Panda

We’ve written quite a bit about Google’s Panda update. We’ve seen our share of websites whose rankings dived after being hit by that algorithm update. The Panda update handles quality control, so to say. If your website has a lot of low-quality pages, you can bet that Google will someday find these. All of a sudden, your website drops a few or even a lot of places in Google’s rankings. You’re not sure why, and then you remember this post. It might be your low-quality content.

As Google has integrated this Panda update (which used to run on a specific day) into its algorithm, it’s sometimes hard to find the exact reason for the drop. But be sure to analyze if you have any low-quality pages first. It makes all the sense in the world to me that if Google considers the majority of your pages thin content, it will lower your rankings.

How to identify low-quality pages

It’s pretty hard to give you one trick, or one application to identify the pages you want to address because we’re talking about all the pages that don’t help Google and your visitors.

If we’re talking about duplicate content, please read our article on it: Duplicate content: causes and solutions. You might have duplicate content without even knowing it! Tools like Copyscape are your first help, but please investigate a bit more, as described in the article.

If you want to rule out attachment pages in WordPress, you can simply query your site in Google. Use the following query (replace example.com with your domain) and it will return all attachment pages that are indexed for your website, or none, which is good:

site:example.com inurl:attachment_id

Screaming Frog

One of the main tools I use myself to identify low-quality content is Screaming Frog SEO Spider. After clicking through a website for some time, you will learn what the default page structure is, perhaps remember the main pages’ URLs and their structure.

When you run a query for your website in the SEO spider, you will get a list of all the URLs on your site. Now scroll through that list and visit every URL that makes no sense to you. The thing is that low-quality pages often occur in groups, not as a single page. Think along the lines of old .html pages where you now end your URLs with a trailing slash. Think of attachment pages, or anything with too many numbers in it. These should all make you suspicious. Visit the page and see if it shows low-quality content that shouldn’t be on Google. Test if these pages are indexed and see if there are more pages like them. Go about it like that and, if they’re present, you’ll find these low-quality pages in no time.
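If the URL list is long, a few lines of code can do a first pass for you. The Python sketch below flags the three patterns mentioned above in an exported list of URLs. The function name and the digit threshold are arbitrary assumptions, and a flag is only a reason to go and look at the page, not a verdict.

```python
from typing import Optional
from urllib.parse import urlsplit


def why_suspicious(url: str, max_digits: int = 6) -> Optional[str]:
    """Return a reason to inspect this URL by hand, or None if it looks fine."""
    parts = urlsplit(url)
    if parts.path.endswith(".html"):
        return "old .html page"
    if "attachment_id" in parts.query:
        return "attachment page"
    # Assumption: more than max_digits digits counts as "too many numbers".
    if sum(ch.isdigit() for ch in parts.path + parts.query) > max_digits:
        return "many numbers in the URL"
    return None


# Hypothetical URLs from a crawl export.
for url in [
    "http://www.example.com/keyword-x/",
    "http://www.example.com/old-page.html",
    "http://www.example.com/?attachment_id=42",
]:
    print(url, "->", why_suspicious(url))
```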

Moz describes an even more in-depth analysis of low-quality pages in one of their Whiteboard Fridays you might want to check as well, by the way.

How to fix low-quality pages

Here’s where logic comes in and you’ll need to trust your instincts in some way. You’ll need to determine if you still need these pages and what you want to do with them.

Remove pages (periodically)

Step one is to find out if you need these low-quality pages. This isn’t one-time maintenance; I’d recommend doing this, for instance, once a year, depending on how much content you write per year, obviously. If you are using a content management system, it pays to check your first posts, from way back. If you find any posts that have no use anymore because they no longer relate to your current business, it is probably safe to remove those.

What to do with the URLs? If they still receive a decent amount of traffic, redirect them. To a similar page or post if possible, otherwise to a related category or tag page, and if all of that doesn’t fit, to the homepage. If there is little to no traffic, simply remove them and let Google find the 404 or 410 error message. Your page will vanish from the search results and Google will be able to focus on relevant pages on your site instead.
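Written out as a rule, that decision could look like the Python sketch below. The traffic threshold, the names and the choice of 410 over 404 are assumptions for the example; what counts as ‘a decent amount of traffic’ is up to you.

```python
from typing import Optional, Tuple


def plan_removal(
    visits: int,
    similar_url: Optional[str] = None,
    category_url: Optional[str] = None,
    home_url: str = "/",
    min_visits: int = 50,
) -> Tuple[int, Optional[str]]:
    """Return (status code, redirect target) for a page you want to remove."""
    if visits < min_visits:
        # Little to no traffic: just remove the page.
        return (410, None)
    # Decent traffic: redirect to the closest match that exists.
    target = similar_url or category_url or home_url
    return (301, target)


print(plan_removal(500, category_url="/category/seo/"))  # (301, '/category/seo/')
print(plan_removal(3))  # (410, None)
```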

Noindex

If the page itself still holds relevant links to other parts of your website and has some traffic due to, for instance, links from other websites, why not use noindex, follow in your robots meta tag? This way, Google can find the page and follow the relevant links, but it will keep the page itself out of the search results. Note that this is a different approach from merely deleting the page.
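For reference, this is the tag we’re talking about, generated here by a small Python helper. The helper itself is hypothetical; in WordPress you’d normally set this per page in your SEO plugin rather than write the tag by hand.

```python
def robots_meta(index: bool, follow: bool) -> str:
    """Build the robots meta tag for the <head> of a page."""
    directives = [
        "index" if index else "noindex",
        "follow" if follow else "nofollow",
    ]
    content = ", ".join(directives)
    return f'<meta name="robots" content="{content}">'


print(robots_meta(index=False, follow=True))
# prints <meta name="robots" content="noindex, follow">
```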

Write better content

Oh, the obvious. Write better content, write unique content. Try to become the source for people instead of copying that source. If you write unique, insightful, useful content, people will be much more inclined to share that content on social media and link to it. Google will see that content as an addition to their index. There’s a lot you’ll have to do yourself, but our Yoast SEO plugin guides you with the readability analysis as well, and we offer courses like SEO Copywriting that will give you plenty of insights on how to write more engaging, better content as well.

All of this will give Google a website that truly helps their visitors, and in the end, simply answers their question. As soon as you have cleaned up all that low-quality content and all high-quality pages surface in Google, you know you’ve made yet another sustainable step towards better rankings. Have fun!

Read more: ‘Content SEO: the ultimate guide’ »

The post Low-quality pages and how to fix them appeared first on Yoast.