What a canonical URL is
A canonical URL is the URL that a search engine should treat as the master copy of a piece of content. You declare it with a rel="canonical" link element in the page head, or with a Link HTTP response header, which is the option for non-HTML files such as PDFs. Listing the URL in the XML sitemap also counts, but as a weaker signal. The declaration tells Google that even though the content may be reachable at several different URLs, only one of them should be indexed and ranked.
Canonicals are a hint, not a guarantee. Google considers the canonical tag along with other signals like internal linking, redirects, and the sitemap. Most of the time it honors the canonical, but a strong contradicting signal can override it.
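As a rough illustration of the link element and HTTP header forms, the sketch below builds both for a given URL. The function names are hypothetical helpers, not part of any framework, and the URL is a placeholder.

```python
from html import escape


def canonical_link_tag(url: str) -> str:
    """Return the <link> element to place inside the page <head>."""
    # Escape the URL so characters such as & are valid inside the attribute.
    return f'<link rel="canonical" href="{escape(url, quote=True)}">'


def canonical_link_header(url: str) -> tuple[str, str]:
    """Return the (name, value) pair for an HTTP Link response header."""
    return ("Link", f'<{url}>; rel="canonical"')


print(canonical_link_tag("https://example.com/product"))
print(canonical_link_header("https://example.com/product"))
```

The header form is the one to reach for when the response has no HTML head to put a tag in.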
Why canonical URLs matter
Search engines do not want to rank ten near-identical versions of the same page. When duplicates appear, Google picks one to index and either drops the rest or treats them as alternates. If you do not control which version it picks, it may pick the wrong one. Your authority and link equity then concentrate on that URL, which may carry stale content, no analytics, or a worse layout.
The canonical tag puts the decision back in your hands. You declare which URL should accumulate ranking signals, and everything else points to it.
Common duplicate content scenarios
Tracking parameters: example.com/product and example.com/product?utm_source=email are the same page to a human but different URLs to a crawler. The page should declare example.com/product as its canonical, so that every UTM variant, which serves the same HTML, points back to the clean URL instead of becoming a separate index entry.
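One way to derive the clean URL for the canonical tag is to strip tracking parameters from whatever URL was requested. This is a minimal sketch using only the standard library. Which parameters count as tracking is an assumption here (the utm_ prefix plus gclid and fbclid) and should be adjusted per site.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: which parameters count as "tracking" is site-specific.
TRACKING_PREFIXES = ("utm_",)
TRACKING_NAMES = {"gclid", "fbclid"}


def strip_tracking_params(url: str) -> str:
    """Return url without tracking parameters or fragment."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if not name.lower().startswith(TRACKING_PREFIXES)
        and name.lower() not in TRACKING_NAMES
    ]
    # The fragment is dropped too, since it never identifies a separate page.
    return urlunsplit(parts._replace(query=urlencode(kept), fragment=""))


print(strip_tracking_params("https://example.com/product?utm_source=email"))
```

Parameters that change the content, such as a product variant, are kept, so the function only collapses URLs that really are the same page.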
WWW versus non-WWW and HTTP versus HTTPS: exactly one combination of protocol and host should be canonical, and every other variant should 301-redirect to it.
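The redirect rule can be sketched as a small function that decides where a request should be sent. The preferred scheme and host below are assumptions for illustration. In practice this logic usually lives in the web server or CDN configuration rather than in application code.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit

# Assumptions for illustration: the site prefers HTTPS and the bare (non-www) host.
PREFERRED_SCHEME = "https"
PREFERRED_HOST = "example.com"
HOST_VARIANTS = {"example.com", "www.example.com"}


def redirect_target(url: str) -> Optional[str]:
    """Return the URL to 301 to, or None if no redirect is needed."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host not in HOST_VARIANTS:
        return None  # not one of this site's hosts; leave it alone
    if parts.scheme == PREFERRED_SCHEME and host == PREFERRED_HOST:
        return None  # already on the preferred variant
    # Replacing netloc also drops any explicit port or userinfo.
    return urlunsplit(parts._replace(scheme=PREFERRED_SCHEME, netloc=PREFERRED_HOST))


print(redirect_target("http://www.example.com/product?x=1"))
```

Path and query string are preserved, so each variant redirects to its exact counterpart in a single hop.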
Faceted navigation on e-commerce: filters like sort, color, and size often generate hundreds of URL variants for the same product list. Canonicals or noindex directives keep the index clean.
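Syndicated content: if you republish a blog post on a partner site, the partner's copy should canonicalize back to your original so that ranking signals consolidate on your URL. Because the surrounding page often differs substantially, Google may ignore a cross-domain canonical, so asking the partner to noindex its copy is the more reliable option.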
How to set canonicals correctly
Every page should have exactly one rel="canonical" tag. On unique pages the canonical should point to the page itself, using the exact absolute URL the page is served from, including the protocol and the trailing slash convention. Self-referencing canonicals are not redundant: they are a defense against accidental duplication via tracking parameters.
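A quick audit of these rules might look like the sketch below, which counts the canonical tags in a page and compares the declared URL with the URL the page was served from. The class and function names are hypothetical. For brevity it scans the whole document rather than only the head, and it compares URLs as exact strings.

```python
from html.parser import HTMLParser


class CanonicalCollector(HTMLParser):
    """Collect the href of every <link rel="canonical"> in a document."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        # rel may hold several space-separated tokens, in any letter case.
        rel_tokens = (attr.get("rel") or "").lower().split()
        if "canonical" in rel_tokens and attr.get("href"):
            self.hrefs.append(attr["href"])


def check_canonical(html: str, served_url: str) -> str:
    """Classify a page as missing, multiple, self, or points elsewhere."""
    collector = CanonicalCollector()
    collector.feed(html)
    if len(collector.hrefs) == 0:
        return "missing"
    if len(collector.hrefs) > 1:
        return "multiple"
    return "self" if collector.hrefs[0] == served_url else "points elsewhere"


page = '<html><head><link rel="canonical" href="https://example.com/product/"></head></html>'
print(check_canonical(page, "https://example.com/product/"))
```

The exact string comparison is deliberate: a canonical that differs from the served URL only by a trailing slash is reported as pointing elsewhere, which is the mismatch the paragraph above warns about.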
Avoid chains. A canonical pointing to a page that itself canonicalizes elsewhere confuses crawlers and dilutes the signal. Point every duplicate directly at the final canonical URL.
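Chains can be spotted by following canonical declarations until they stop. The sketch below assumes a hypothetical canonical_of mapping from each crawled URL to the canonical it declares, as a site crawl might produce. The example URLs are placeholders.

```python
def resolve_canonical(url: str, canonical_of: dict[str, str]) -> tuple[str, int]:
    """Follow canonical declarations from url; return (final URL, number of hops)."""
    seen = {url}
    hops = 0
    current = url
    while True:
        # A URL with no declaration is treated as canonical to itself.
        target = canonical_of.get(current, current)
        if target == current:
            return current, hops
        if target in seen:
            raise ValueError(f"canonical loop involving {target}")
        seen.add(target)
        current = target
        hops += 1


canonical_of = {
    "https://example.com/p?sort=price": "https://example.com/p-old",
    "https://example.com/p-old": "https://example.com/p",
    "https://example.com/p": "https://example.com/p",
}
final_url, hops = resolve_canonical("https://example.com/p?sort=price", canonical_of)
print(final_url, hops)
```

A hop count of 0 means the page is self-canonical, 1 means it points directly at the final URL, and 2 or more indicates a chain, which is fixed by changing the first page's canonical to the final URL.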