Copyleaks uses a complex AI algorithm to detect patterns in writing. These patterns are present in both human and AI-generated text, but differ in ways that allow for detection.
Some of the patterns that our AI detector identifies are:
- Frequency ratios: Our detector compares text against massive datasets, some composed entirely of human-written content and some composed entirely of AI-written content. It then searches for phrases that are used more often by AI than by humans.
- Parts of Speech: Our algorithm analyzes the grammar and syntax of written content.
- Syllable Dispersion: Our detector examines the average number of syllables and how they are dispersed throughout the text.
- Hyphen Use: Our algorithm looks for patterns in how hyphens are used.
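To make the frequency-ratio idea above concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the Copyleaks implementation: the function names, the three-word phrase length, and the additive smoothing are all hypothetical choices, and the corpora are plain lists of strings.

```python
import re
from collections import Counter

def ngrams(text, n=3):
    """Lowercase the text, split it into word tokens, and return every n-word phrase."""
    words = re.findall(r"[a-z']+", text.lower())
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

def phrase_counts(corpus, n=3):
    """Count every n-word phrase across a list of documents.

    Returns (Counter of phrase -> occurrences, total number of phrases seen).
    """
    counts = Counter()
    for doc in corpus:
        counts.update(ngrams(doc, n))
    return counts, sum(counts.values())

def frequency_ratio(phrase, ai_counts, ai_total, human_counts, human_total, smoothing=1.0):
    """Relative frequency of a phrase in AI text divided by its relative frequency in human text.

    Assumption: a simple additive smoothing term keeps the ratio finite when a
    phrase never appears in one of the corpora. A real system may do this differently.
    """
    ai_rate = (ai_counts[phrase] + smoothing) / (ai_total + smoothing)
    human_rate = (human_counts[phrase] + smoothing) / (human_total + smoothing)
    return ai_rate / human_rate
```

A ratio above 1 means the phrase is relatively more common in the AI corpus, and a ratio below 1 means it is more common in the human corpus. With realistic corpus sizes the counts would come from far larger datasets than a toy example can show.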
The Copyleaks algorithm analyzes a large number of different patterns in addition to those above. It quantifies these patterns and uses complex models to determine if a text is likely AI-generated. It is also sensitive to language and writing level, and makes adjustments based on these factors. Because we’ve trained our AI on billions of human-written documents collected over the past 10 years (both before and after the proliferation of generative AI in late 2022), our algorithm is able to identify differences between a human dataset and the more recent AI writing dataset.
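As a rough illustration of what "quantifying" patterns such as syllable dispersion and hyphen use could look like, the sketch below turns a text into a few numbers. Everything here is an assumption made for the example: the vowel-run syllable heuristic, the choice of mean and standard deviation as the dispersion measure, and the feature names are hypothetical and are not taken from the Copyleaks detector.

```python
import re
import statistics

def count_syllables(word):
    """Rough heuristic (assumption): one syllable per run of consecutive vowels, minimum 1."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def surface_features(text):
    """Return simple numeric features for a text.

    - mean_syllables: average syllables per word
    - syllable_stdev: population standard deviation of syllables per word
    - hyphenated_share: fraction of words that contain a hyphen
    """
    # Treat hyphenated compounds such as "well-known" as a single word.
    words = re.findall(r"[A-Za-z]+(?:-[A-Za-z]+)*", text)
    if not words:
        return {"mean_syllables": 0.0, "syllable_stdev": 0.0, "hyphenated_share": 0.0}
    syllables = [count_syllables(w) for w in words]
    hyphenated = sum(1 for w in words if "-" in w)
    return {
        "mean_syllables": statistics.mean(syllables),
        "syllable_stdev": statistics.pstdev(syllables),
        "hyphenated_share": hyphenated / len(words),
    }
```

Numbers like these could then be passed to a model alongside many other features. The sketch only shows the quantification step, not the model itself.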
AI Insights
AI Insights is a feature of our AI detector that was released in October 2024. It is a first-of-its-kind, patent-pending feature that provides more transparency into why something has been deemed likely AI. Instead of simply highlighting sections of text as AI or not AI, we can now display a heat map of phrases that are more commonly found in AI-generated content than in human-written content.
Key aspects of AI Insights and the AI portion of Copyleaks reports include:
- Overall Score: The overall score represents the total percentage of the document that is identified as probable AI, based on a 99% accuracy rate. Note that this is not a confidence score. If, for example, ChatGPT was used to write 50% of the document and the other half was written by a human, the overall score would be 50%.
- Heat Map: The heat map highlights phrases that appear more frequently in AI-generated text compared to human-written text. The darker the color, the higher the likelihood of the phrase being AI-produced.
- Phrase Frequency: The highlighted phrases from the document are displayed on the right side of the report in order of frequency.
- Frequency Ratio: Clicking on a phrase displays raw data showing the frequency ratio between its use in AI-generated and human-written content. For example, a phrase may be used 1,000 times more often in AI-generated content than in human-written content.
- Resources Tab: The report includes a resources tab that provides links to the Copyleaks help center, blog posts, and AI detector FAQs. This tab also includes a disclaimer that text produced with AI-powered writing tools such as Grammarly may be flagged as AI because these tools use the same models as ChatGPT.
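The difference between the Overall Score and a confidence score can be shown with a short sketch. It assumes the document has already been split into segments, each labeled as probable AI or not, and it measures coverage by word count. The segment format, the word-count unit, and the function names are hypothetical, chosen only for this illustration. The second function orders phrases by their frequency ratio, highest first, which is one plausible way to rank highlighted phrases.

```python
def overall_score(segments):
    """Percentage of the document's words that fall in segments flagged as probable AI.

    segments: list of (text, flagged_as_ai) pairs that together cover the document.
    Assumption: coverage is measured in whitespace-separated words.
    """
    total = sum(len(text.split()) for text, _ in segments)
    if total == 0:
        return 0.0
    flagged = sum(len(text.split()) for text, is_ai in segments if is_ai)
    return 100.0 * flagged / total

def rank_phrases(phrase_ratios):
    """Return phrases sorted from the highest AI-to-human frequency ratio to the lowest."""
    return sorted(phrase_ratios, key=phrase_ratios.get, reverse=True)
```

For a document where half the words sit in flagged segments, `overall_score` returns 50.0. That figure describes how much of the text was flagged, not how sure the detector is about any one segment.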