Beyond the site features and configuration issues that can generate duplicate content listings, external factors can cause similar issues.
Review Embedding
What it is
Reviews of products and services can be a highly effective quality signal to potential customers. Embedding reviews from sources such as Google or Amazon makes good sense from a conversion standpoint.
Why it matters
However, embeddable reviews can significantly increase the amount of duplicate content in circulation. In theory, Google and the other search engines should be able to identify the original source of such content, but only if the original is indexed before it is duplicated. In practice, this doesn’t always happen.
What to do
- Ensure that all embed tools contain rel=canonical links back to the original source of the text.
- Note that Google also prohibits sites from using widget embeds as a link building tool.
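One way to audit the first point is to check that the HTML document served by an embed declares a canonical link to the original review page. The following is a minimal sketch using only Python's standard library; the names `CanonicalFinder` and `points_to_source` and the example.com URLs are illustrative assumptions, not part of any particular embed tool.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collects the href of every <link rel="canonical"> tag in a document."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr_map = dict(attrs)
        # rel may hold several space-separated tokens; a missing rel becomes "".
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        if "canonical" in rel_tokens and attr_map.get("href"):
            self.canonicals.append(attr_map["href"])


def points_to_source(embed_html, source_url):
    """Return True if the embed's HTML declares source_url as canonical."""
    finder = CanonicalFinder()
    finder.feed(embed_html)
    return source_url in finder.canonicals


# Hypothetical embed document and source URL, for illustration only.
sample = (
    '<html><head><link rel="canonical" '
    'href="https://example.com/reviews/1"></head>'
    "<body>Great product.</body></html>"
)
is_ok = points_to_source(sample, "https://example.com/reviews/1")
```

The comparison is an exact string match, so a real audit would probably want to normalize trailing slashes and URL schemes before comparing.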
CDN Indexing
What it is
A Content Delivery Network (CDN) can be used to improve site performance by providing a network of geographically distributed servers for caching and delivering your site’s content.
Why it matters
Because the files are cached on the CDN's servers, the cached copies will have different addresses from the ones on your host server. If these URLs are indexed, search engines will see duplicate content.
What to do
- Ensure that the CDN subdomain is not indexable. Do this only after other indexation measures have been applied, as a listing on the wrong subdomain is better than no listing at all.
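One common way to keep a subdomain out of the index is to send an `X-Robots-Tag: noindex` response header on everything it serves. The sketch below shows the idea as a small helper that picks extra headers based on the request host. It assumes your CDN or origin lets you set custom response headers; the hostname `cdn.example.com` and the function name `extra_headers` are hypothetical.

```python
# Hypothetical CDN hostname(s); replace with the subdomain your CDN uses.
CDN_HOSTS = {"cdn.example.com"}


def extra_headers(host):
    """Return additional response headers for the given request Host value."""
    # Drop any port suffix and normalize case before comparing.
    hostname = host.split(":")[0].lower()
    if hostname in CDN_HOSTS:
        # Ask search engines not to index responses served from the CDN host.
        return {"X-Robots-Tag": "noindex"}
    return {}


cdn_headers = extra_headers("CDN.example.com:443")
main_headers = extra_headers("www.example.com")
```

A crawler can only see this header if it is allowed to fetch the URL, so the CDN subdomain should not also be blocked in robots.txt.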