Free tool · Editorial content

Duplicate Content Detector

Paste 2 URLs: the tool fetches both pages, extracts the editorial text, and compares the two texts using 4-word shingles. Use it to check whether content was copied.

Frequently asked questions

How does it work technically?

The tool tokenizes the text of both pages (page chrome stripped, lowercased, accents removed), then generates shingles: sequences of 4 consecutive words. The Jaccard score is the number of shingles the two pages share divided by the number of distinct shingles found across both pages. ≥ 85% = identical (pure copy-paste). 50-85% = near-duplicate (recycled paragraphs). 20-50% = derived (paraphrase, shared topic). < 20% = original.
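
The pipeline described above can be sketched in a few lines of Python. This is a minimal illustration, not the tool's actual code: the function names (`tokenize`, `shingles`, `jaccard`, `classify`) are hypothetical, the tokenizer assumes Latin-script text, page fetching and chrome stripping are left out, and treating each band's lower bound as inclusive is an assumption.

```python
import re
import unicodedata


def tokenize(text: str) -> list[str]:
    """Lowercase, strip accents, and split into word tokens."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Assumption: only ASCII letters and digits count as word characters.
    return re.findall(r"[a-z0-9]+", no_accents)


def shingles(tokens: list[str], k: int = 4) -> set[tuple[str, ...]]:
    """Return the set of distinct k-word shingles."""
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Shared shingles divided by distinct shingles across both sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def classify(score: float) -> str:
    """Map a score in [0, 1] to the bands listed above (lower bounds inclusive)."""
    if score >= 0.85:
        return "identical"
    if score >= 0.50:
        return "near-duplicate"
    if score >= 0.20:
        return "derived"
    return "original"


if __name__ == "__main__":
    text_a = "The quick brown fox jumps over the lazy dog"
    text_b = "The quick brown fox leaps over the lazy dog"
    score = jaccard(shingles(tokenize(text_a)), shingles(tokenize(text_b)))
    print(f"{score:.0%}", classify(score))
```

In the usage example, changing a single word leaves 2 shared shingles out of 10 distinct ones, so the score is 20%. On a nine-word sentence one edit touches most shingles; on a full article the same edit would barely move the score.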

Why 4-grams?

The 4-word shingle is a common choice in the duplicate-detection literature that follows Broder's 1997 paper on shingling. Too short (2-3 words) and you catch noise from stock phrases ("of the", "and the"). Too long (8-10 words) and a light paraphrase breaks every shingle. A run of consecutive matching 4-grams covers a full or near-full sentence, which is exactly what copy-paste produces.
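
To make the trade-off concrete, here is a small sketch that scores the same lightly paraphrased sentence with 2-, 4-, and 8-word shingles. The helper names and the two sample sentences are made up for illustration, and tokenization is reduced to lowercasing and whitespace splitting.

```python
def shingle_set(text: str, k: int) -> set[tuple[str, ...]]:
    """Distinct k-word shingles of a text (lowercased, whitespace tokens)."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Shared shingles divided by distinct shingles across both sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical example: the paraphrase adds "then" and swaps "with" for "using".
original = "the tool fetches both pages and compares the extracted text with word shingles"
paraphrase = "the tool fetches both pages and then compares the extracted text using word shingles"

scores = {
    k: jaccard(shingle_set(original, k), shingle_set(paraphrase, k))
    for k in (2, 4, 8)
}

for k, score in scores.items():
    print(f"k={k}: {score:.2f}")
```

On this pair the score drops from about 0.56 with 2-word shingles to about 0.24 with 4-word shingles, and to 0 with 8-word shingles, because every 8-word window of the original spans the inserted word.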

What if the page is translated from another language?

The tool only detects word-for-word (or near word-for-word) duplication. A FR→EN translation won't be flagged. This is by design: otherwise the tool would raise a false positive on every multilingual site. Detecting a copied translation requires a multilingual embedding approach, which is not feasible in pure JS without a backend.

My content and a competitor's are at 35%. Is that bad?

Often no. On a niche topic, two articles will cite the same facts and sources, so 30% shingle similarity is to be expected. The red flag is above 60%: it means the sentence structures themselves are recycled, not just the facts. If you see a score > 70% on an article you know is original, check the shared shingles: the cause is often boilerplate (signature, footer, CTA) polluting the score.
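
Checking the shared shingles can be as simple as intersecting the two shingle sets and reading the result. The sketch below assumes you already have the extracted text of both pages; `shared_shingles` and the sample strings are hypothetical, and tokenization is reduced to lowercasing and whitespace splitting.

```python
def shingle_set(text: str, k: int = 4) -> set[str]:
    """Distinct k-word shingles, joined into readable strings."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def shared_shingles(text_a: str, text_b: str, k: int = 4) -> list[str]:
    """Shingles that appear in both texts, sorted alphabetically."""
    return sorted(shingle_set(text_a, k) & shingle_set(text_b, k))


if __name__ == "__main__":
    # Hypothetical pages: different articles, same call to action in the footer.
    page_a = "Our guide to espresso grinders. Subscribe to our newsletter today"
    page_b = "A review of pour over kettles. Subscribe to our newsletter today"
    for shingle in shared_shingles(page_a, page_b):
        print(shingle)
```

If the list is dominated by footer, signature, or CTA phrases, as in this example, the overlap comes from page furniture rather than from the article itself.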

Unique content deserves unique placement

When we write an original article on the Stringer network, we write it once and publish it exclusively on one of the network's media outlets. No syndication, no duplication.