Duplicate Content Detector
Paste two URLs: the tool fetches both pages, extracts the editorial text, and compares them using 4-word shingles. Use it to check whether content was copied.
Also worth exploring
Measure your English text readability: Flesch Reading Ease, Gunning Fog, long sentences, lexical diversity, complex words. Calibrate tone to your audience.
Extract title, meta description, H1 and H2 from up to 50 pages at once. Ideal for auditing the editorial consistency of a category or a competitor.
Measure whether the gap between two variants (titles, layouts, snippets) is statistically significant. Two-proportion Z-test with p-value and confidence interval.
E-A-T score based on objective signals: identified author, published and modified dates, citations to authority sources, Article+Person Schema.org markup.
Frequently asked questions
How does it work technically?
The tool tokenizes the text of both pages (page chrome stripped, lowercased, accents removed), then generates shingles: sequences of 4 consecutive words. The Jaccard score is the number of shingles the two pages share divided by the number of distinct shingles across both pages. ≥ 85% = identical (pure copy-paste). 50-85% = near-duplicate (recycled paragraphs). 20-50% = derived (paraphrase, shared topic). < 20% = original.
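The steps above can be sketched in a few lines. This is a minimal illustration in Python, not the tool's own code (which runs in the browser): the function names, the ASCII-only token pattern, and the handling of scores that fall exactly on a threshold are all assumptions, and the page-chrome stripping step is left out.

```python
import re
import unicodedata


def tokenize(text: str) -> list[str]:
    """Lowercase, remove accents, and split into word tokens."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Assumption: only ASCII letters and digits count as word characters.
    return re.findall(r"[a-z0-9]+", no_accents)


def shingles(tokens: list[str], size: int = 4) -> set[tuple[str, ...]]:
    """Return the set of all runs of `size` consecutive tokens."""
    return {tuple(tokens[i:i + size]) for i in range(len(tokens) - size + 1)}


def jaccard(a: set, b: set) -> float:
    """Shared shingles divided by distinct shingles across both sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def classify(score: float) -> str:
    """Map a score to the page's four bands (boundary handling is assumed)."""
    if score >= 0.85:
        return "identical"
    if score >= 0.50:
        return "near-duplicate"
    if score >= 0.20:
        return "derived"
    return "original"


page_a = "The quick brown fox jumps over the lazy dog"
page_b = "The quick brown fox jumps over the sleepy dog"
score = jaccard(shingles(tokenize(page_a)), shingles(tokenize(page_b)))
print(f"{score:.2f} -> {classify(score)}")  # 0.50 -> near-duplicate
```

In this example each sentence yields 6 shingles, 4 of which are shared, so the score is 4 / 8. Changing a single word removes every shingle that contains it, which is why short texts swing sharply.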
Why 4-grams?
The 4-word shingle is a common choice in the shingling approach to duplicate detection (Broder 1997 paper). Too short (2-3 words) and you catch noise ("of the", "and the"). Too long (8-10 words) and a light paraphrase breaks all shingles. 4-grams are long enough to reflect phrasing rather than vocabulary, which is exactly what copy-paste preserves.
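The trade-off can be seen on two toy pairs. The sketch below is illustrative only: the sentences, the helper names, and the whitespace tokenizer are assumptions made for the example, not the tool's implementation.

```python
def word_shingles(text: str, size: int) -> set[tuple[str, ...]]:
    """Set of runs of `size` consecutive lowercase words."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + size]) for i in range(len(tokens) - size + 1)}


def similarity(a: str, b: str, size: int) -> float:
    """Jaccard similarity of the two texts' shingle sets."""
    sa, sb = word_shingles(a, size), word_shingles(b, size)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


# Two unrelated sentences that only share function-word pairs.
unrelated_a = "the history of the roman empire and the fall of rome"
unrelated_b = "the price of the new phone and the cost of repairs"

# A sentence and a copy with a single word swapped.
source = "our tool compares two pages and reports how much text they share"
paraphrase = "our tool compares two pages and shows how much text they share"

for size in (2, 4, 8):
    noise = similarity(unrelated_a, unrelated_b, size)
    copy = similarity(source, paraphrase, size)
    print(f"size={size}  unrelated={noise:.2f}  one-word-swap={copy:.2f}")
```

With 2-word shingles the unrelated pair still scores above zero because of "of the" and "and the". With 8-word shingles the one-word swap drops to zero, since every shingle of this 12-word sentence contains the changed word. At size 4 the unrelated pair scores zero and the near-copy is still detected.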
What if the page is translated from another language?
The tool only detects word-for-word (or near word-for-word) duplication. A FR→EN translation won't be flagged. This is by design: otherwise every multilingual site would trigger false positives. Detecting translated copies requires a multilingual embedding approach, which is not feasible in pure JS without a backend.
My content and a competitor's are at 35% — is that bad?
Often no. On a niche topic, two articles will cite the same facts and sources, so a shingle similarity around 30% is to be expected. The red flag is above 60%: it means the sentence structures themselves are recycled, not just the facts. If you see a score above 70% on an article you know is original, check the shared shingles: the cause is often boilerplate (signature, footer, CTA) inflating the score.
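Checking the shared shingles can be done by intersecting the two shingle sets and reading the result. The following is a rough sketch under stated assumptions: the sample pages, the helper names, and the idea of removing known boilerplate phrases before rescoring are illustrative and are not features confirmed by the tool.

```python
def shingle_set(text: str, size: int = 4) -> set[str]:
    """Set of runs of `size` consecutive lowercase words, joined as strings."""
    tokens = text.lower().split()
    return {" ".join(tokens[i:i + size]) for i in range(len(tokens) - size + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Shared shingles divided by distinct shingles across both sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def shared_shingles(text_a: str, text_b: str, size: int = 4) -> list[str]:
    """Shingles present in both texts, sorted for easy reading."""
    return sorted(shingle_set(text_a, size) & shingle_set(text_b, size))


def strip_phrases(text: str, phrases: list[str]) -> str:
    """Remove known boilerplate phrases (hypothetical cleanup step)."""
    cleaned = text.lower()
    for phrase in phrases:
        cleaned = cleaned.replace(phrase.lower(), " ")
    return cleaned


cta = "Subscribe to our newsletter for weekly updates"
page_a = "Solar panels convert sunlight into electricity. " + cta
page_b = "Heat pumps move warmth from outside air. " + cta

# Every shared shingle comes from the call to action, not the article body.
for shingle in shared_shingles(page_a, page_b):
    print(shingle)

before = jaccard(shingle_set(page_a), shingle_set(page_b))
after = jaccard(
    shingle_set(strip_phrases(page_a, [cta])),
    shingle_set(strip_phrases(page_b, [cta])),
)
print(f"before={before:.2f}  after={after:.2f}")
```

If the shared shingles all come from a footer or CTA, as here, the score says more about the site template than about the article.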
Unique content deserves unique placement
When we write an original article on the Stringer network, we write it once and publish it exclusively on a single media outlet in the network. No syndication, no duplication.