AI Writing Detectors Are Not Reliable and Often Generate Discriminatory False Positives
Teachers and schools are being tricked into wasting time and money on these tools that can be better invested in training faculty.
Related: How to defeat writing detectors with a few simple steps, plus how to add spelling and grammar errors to your output to trick your teacher.
Last updated: 3/29/24
I substantially updated this post after the American Federation of Teachers' decision to promote an AI text detector. This update re-organizes the extensive criticism of such approaches. The evidence cited here is based on studies, and the studies are linked throughout the text. You can find plenty of expert testimony that challenges the value of these detectors here.
__
Relying on AI-text detectors to stop students from using AI to write their essays and papers is a lost cause. The detectors generate false positives and false negatives even when nothing is done to perturb the generated text, because they only look for statistical patterns in the output, and no pattern is produced exclusively by AI. Any such "pattern" is also becoming less and less common as the tools advance to mimic the writing of an individual author (see below). Since there is no definitive pattern, false positives (human writing flagged as AI-generated) and false negatives (AI-generated text that goes undetected) are common.
AI Text Detectors Are Discriminatory
In my mind, the biggest problem with these detectors is that they are discriminatory. The patterns that trigger false positives are more likely to appear in the writing of non-native speakers, who often write in exactly the style the detectors are trained to flag, resulting in discriminatory false positives and false accusations (Liang et al., July 2023).
They are biased against non-native speakers. "Our findings reveal that these detectors consistently misclassify non-native English writing samples as AI-generated, whereas native writing samples are accurately identified. Furthermore, we demonstrate that simple prompting strategies can not only mitigate this bias but also effectively bypass GPT detectors, suggesting that GPT detectors may unintentionally penalize writers with constrained linguistic expressions." Liang et al.
So you can't think "no harm, no foul." This is from the Liang article:
A more recent study (October 23, 2023) concurs:
[Note: 149 is Liang et al. above; 72 is Wang, Yuxia, Jonibek Mansurov, Petar Ivanov, Jinyan Su, Artem Shelmanov, Akim Tsvigun, Chenxi Whitehouse et al. "M4: Multi-generator, Multi-domain, and Multi-lingual Black-Box Machine-Generated Text Detection." arXiv preprint arXiv:2305.14902 (2023); 148 is Chaka, Chaka. "Detecting AI content in responses generated by ChatGPT, YouChat, and Chatsonic: The case of five AI content detection tools." Journal of Applied Learning and Teaching 6, no. 2 (2023).]
Unsurprisingly, the detectors are also causing disproportionate disciplinary referrals for special education students. Off Task: EdTech Threats to Student Privacy and Equity in the Age of AI (September 2023)
Major Universities Pull the Detectors
The false-positives issue is well known and well established, so a school administrator who may face a lawsuit for wrongly penalizing a student based on an AI writing detector cannot claim ignorance of the problem.
This is why schools such as Vanderbilt University, the University of Pittsburgh, Michigan State, and the University of Texas, as well as Australian universities, have woken up to these problems and are finally moving away from the detectors.
OpenAI recently (7/23/23) pulled its own detector because it just didn't work and was discriminatory.
Teachers and schools are being tricked into wasting time and money on these tools, resources that could be better invested in training faculty to teach with AI and in developing new approaches that will not expose schools to well-deserved lawsuits.
Detectors Prevent a Shift to How We Need to Teach Students
AI writing tools are already integrated into industry and accessible to billions of people (Google and Microsoft are building these writing tools into their products, and each company has over a billion users). People who write successfully in the future will do so with the tools, not without them. If we attempt to teach students to write without the tools, we are teaching them to write in a way they will not write in the future. So, yes, we are pointing unreliable and discriminatory detectors at students, generating false accusations, while simultaneously undermining their job skills.
The Detectors Are Not Accurate, Even with No Human Disturbance of the Output
The next set of reasons deals with problems related to their accuracy that are unrelated to bias and don’t rely on students making small disturbances in the output to defeat the detectors.
There are two reasons the detectors are not always accurate, even when confronted with text that is entirely either human or machine-generated (though misclassifications in these instances are less common).
First, detectors often fail when confronted with what are called "out-of-distribution challenges" (Wu et al., October 22). The term "out-of-distribution" (OOD) comes from machine learning and refers to data that differs significantly from the distribution of the training data. In simpler terms, when a trained machine learning model encounters input it has never seen before, or input that differs substantially from its training data, it is facing an out-of-distribution challenge.
The problem is that machine learning models, especially deep neural networks, tend to be overconfident in their predictions even when faced with OOD data (see Guo et al., 2017). This means a model may produce a high-confidence prediction for an input it has never seen before and doesn't know how to handle correctly (in this context, a false positive).
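To make the overconfidence point concrete, here is a minimal toy sketch (my own illustration, not drawn from the cited studies) in which a simple classifier reports near-certain probabilities for an input that looks nothing like its training data:

```python
# Toy illustration (not from the cited studies): a classifier trained on two
# tight clusters still reports near-certain probabilities for an input far
# outside anything it has seen.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Training data: class 0 ("human") around (0, 0), class 1 ("AI") around (4, 4).
X = np.vstack([rng.normal(0, 1, size=(200, 2)),
               rng.normal(4, 1, size=(200, 2))])
y = np.array([0] * 200 + [1] * 200)

clf = LogisticRegression().fit(X, y)

# An out-of-distribution point, far from both training clusters.
ood_point = np.array([[100.0, 100.0]])
print(clf.predict_proba(ood_point))
# Prints probabilities essentially [0, 1]: maximum confidence that this
# never-seen input is "AI," the same failure mode that turns an unusual but
# human-written essay into a confident false positive.
```

Real detectors are far more complex than this two-feature toy, but the overconfident behavior on unfamiliar inputs is the same.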
Second, because so much training data is now a mix of human-written and machine-written text, it is becoming harder to train detectors on two cleanly separated sets of data, so conclusions can no longer be drawn with much confidence. This problem is compounded by the fact that data that was originally misclassified can then be used to further train detectors (Alemohammad).
More studies on the accuracy problem: Kumarage (2023); W. Antoun et al. (2023); Yi et al. (2023); Sadasivan (2023), which points out that they fail so badly they could cause reputational damage to their developers; Liang (2023); Ren (2023); He (2023); Shi (2023); Koike (2023). There are quotes from some of these studies below.
Grrr… On the one the American Federation of Teachers is promoting: "We also express concerns over 52 false positives (of 114 human-written submissions) generated by GPTZero. Finally, we note that all LLM-generated text detectors are less accurate with code, other languages (aside from English), and after the use of paraphrasing tools (like QuillBot)." Orenstrakh 2023
[Krishan (2023) claimed it had a defense against using paraphrasing to beat the detector, but Sadasivan (2023) pointed out it can be defeated by re-paraphrasing (something a paid-for tool will know how to do).]
So, even when an AI-writing detector confronts text that is entirely human-generated, it may still falsely classify it, though, as noted, this is less common.
Students Can Easily Defeat the Detectors
[Note: This is obviously more likely to be done by high-SES students who can afford services such as undetectable.ai and more academically advanced students who know how these tools work.]
The next set of problems assumes students make at least some effort (not much is needed) to disguise the writing, which most students are already doing.
If you read no further, just read this July 2023 study on the subject:
A recent (March 28, 2024) study showed the following:
Studies such as the one above are a dime a dozen and are cited throughout this post. Some of the studies above that discuss general failure rates also address adversarial action to perturb the content.
I didn't need such a study; I defeated them with two words by slightly perturbing the text. I asked the model to increase the "perplexity" and "burstiness" of its output, which fools the detectors because they look for low scores in those areas when flagging something as "AI written." And adding the simple phrase "elevate the provided text by employing literary language" (Liang et al.) will trip up the detectors.
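For readers who want to see what "perplexity" and "burstiness" actually measure, here is a rough sketch of how a detector along those lines might score them. This is my own illustration, assuming the Hugging Face transformers library and the public GPT-2 model; real detectors are proprietary and more elaborate, but the underlying recipe is similar:

```python
# Minimal sketch of perplexity- and burstiness-style scoring, assuming the
# Hugging Face transformers library and the public GPT-2 model. Real detectors
# are more elaborate; the point is only that low, uniform perplexity is what
# tends to get text flagged as "AI written."
import math
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    """Per-token perplexity of `text` under GPT-2 (lower = more predictable)."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss  # mean cross-entropy over tokens
    return math.exp(loss.item())

def burstiness(sentences: list[str]) -> float:
    """Spread of sentence-level perplexities; human writing tends to vary more."""
    scores = [perplexity(s) for s in sentences if s.strip()]
    mean = sum(scores) / len(scores)
    return (sum((s - mean) ** 2 for s in scores) / len(scores)) ** 0.5

essay = ("The causes of the war were complicated. Honestly, nobody agrees. "
         "Historians still argue about tariffs, alliances, and plain bad luck.")
print(perplexity(essay), burstiness(essay.split(". ")))
# A detector built on this recipe flags text whose perplexity and burstiness
# both fall below tuned thresholds.
```

Once you see the recipe, the countermeasure is obvious: any prompt that raises either score, such as asking for more varied sentence lengths or more literary word choices, pushes the text back toward the "human" side of the thresholds.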
It's also impossible for these detectors to work reliably because GPT-4 and other tools can be prompted to write in a person's own voice. For example, you can say, "Write in the style of [SAMPLE]." This will work better and better as technology advances and students start working with their own learning bots. Claude 2 from Anthropic makes it super easy.
Note: "Write in your own voice and style"; "Write like a human."
Turnitin Asia's director has acknowledged as much. And Sam Altman said OpenAI's own detection tool is nothing more than a stopgap measure.
Apple’s new developments also permit personalization: “Apple's new transformer model in iOS 17 allows sentence-level autocorrections that can finish either a word or an entire sentence when you press the space bar. It learns from your writing style as well, which guides its suggestions.” Edwards. The same article claims it can write personal stories based on the photos in an iPhone.
Students can significantly reduce the probability of detection by randomly integrating some of their own writing and running it in and out of text translators. They’ve been doing that with copied and pasted text for years.
“In this paper, we systematically test the reliability of the existing detectors, by designing two types of attack strategies to fool the detectors: 1) replacing words with their synonyms based on the context; 2) altering the writing style of the generated text. These tactics involve giving LLMs instructions to produce synonym substitutions or write directives that alter the style without human intervention, and detectors can also protect the LLMs used in the attack. Our research reveals that our attacks effectively compromise the performance of all tested detectors, thereby underscoring the urgent need for the development of a more robust machine-generated text detection system.” Shi et al
“We propose a novel Substitution-based In-Context example Optimization method (SICO) to automatically generate such prompts. On three real-world tasks where LLMs can be misused, SICO successfully enables ChatGPT to evade six existing detectors, causing a significant 0.54 AUC drop on average. Surprisingly, in most cases these detectors perform even worse than random classifiers.” Liu
“In this paper, both empirically and theoretically, we show that these detectors are not reliable in practical scenarios. We show that paraphrasing attacks, which use a light paraphraser on top of a generative text model, can break a wide range of detectors, including those that use watermarking schemes, neural network-based detectors, and zero-shot classifiers.” Sadasivan
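To make the first attack type in the Shi et al. quote concrete, here is a deliberately crude, context-free sketch of synonym substitution using NLTK's WordNet. It is only a stand-in; the paper itself uses LLM-driven, context-aware substitution:

```python
# Crude, context-free synonym substitution. This is only a stand-in for the
# LLM-driven, context-aware substitution described by Shi et al.
import nltk
from nltk.corpus import wordnet

nltk.download("wordnet", quiet=True)   # one-time corpus download
nltk.download("omw-1.4", quiet=True)   # required by some NLTK versions

def substitute_synonyms(text: str) -> str:
    out = []
    for word in text.split():
        lemmas = {l.name().replace("_", " ")
                  for syn in wordnet.synsets(word)
                  for l in syn.lemmas()}
        alternatives = sorted(w for w in lemmas if w.lower() != word.lower())
        out.append(alternatives[0] if alternatives else word)
    return " ".join(out)

print(substitute_synonyms("The detector flags predictable word choices"))
# Even this naive swap changes the surface statistics detectors rely on,
# while an LLM doing the same job also keeps the text fluent.
```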
Strong prompting can also break the detectors:
“The study reveals that although the detection tool identified 91% of the experimental submissions as containing some AI-generated content, the total detected content was only 54.8%. This suggests that the use of adversarial techniques regarding prompt engineering is an effective method in evading AI detection tools and highlights that improvements to AI detection software are needed.” Perkins
New apps such as Conch.ai and Undetectable.ai are being marketed to students to help them write substantial chunks of their papers, and they even guarantee the output will pass the detectors. These tools automate the techniques described in the studies above, such as rephrasing.
Any hope for these detectors is eliminated by how copilot-style writing tools actually work. They "co-write" text with you and enable a person to write in their own voice (Google/Bard Copilot, Microsoft Copilot). This reinforces the second point below: what percentage of AI text do you allow?
When someone "co-writes" with Google Docs ("Help me Write") or (soon) Microsoft Word Copilot, are you going to accuse them of cheating because scattered words and sentences came from an AI writing tool?
The new QuillBot writing-assistant features substantially reduce the viability of detection because you can easily and dramatically alter the text just by choosing different styles.
__
Don't Use Detector Results to Punish Students
There are a number of different issues related to taking action against students, even if it’s determined that AI text may have been used.
First, no detector reports more than a probability that text was AI-generated. What's your probability threshold for giving a student a failing grade or launching a plagiarism investigation? Do you use only one detector or many? They often give inconsistent results.
Second, there is no way to determine what percentage of AI text should be allowed. How much of the text must be AI-written before it counts as cheating? 100%? 80%? 50%? 10%? Does it matter if the AI sentences are interspersed with human-written sentences (or sentence fragments)? Is your answer that you allow zero? No AI text detector can establish that no text in an essay was written with AI; I've literally never seen one return a score of zero.
Third, there is no way to prove it. Large language models will not reliably reproduce the same exact output twice; these aren't databases! And even if they did, how would you prove a human didn't also write it?
Fourth, the detectors produce a lot of false positives. Almost everyone who has experimented with one of these detectors has tried putting in some of their own work and had it return a high probability that it was written with AI. Do you want to falsely accuse students?
Fifth, the detectors produce a lot of false negatives. Given how much AI-written text slips through undetected, are they really a solution to the problem?
Sixth, it's too time-consuming. High school teachers I speak with often tell me that a majority of their students are using ChatGPT or similar AIs to complete at least parts of their assignments. Do you have time to investigate them all, including the false positives? I spoke with one teacher who ran 130 papers through four text detectors for a single assignment!
Seventh, it creates an adversarial classroom. If faculty are constantly scanning and checking individual papers for cheating, an adversarial relationship develops between teachers and students, and it is magnified by a large number of false positives. Tension between teachers and students will undermine learning.
Eighth, they end up punishing your already weaker students, because the more advanced students know more about gaming the detectors, and the wealthier ones can simply buy the new apps, such as Conch.ai.
Garbage In, Garbage Out
Some people have no idea what they are doing and don't even use detectors properly. Multiple teachers and professors are putting papers into ChatGPT itself and asking whether it wrote them.
A few problems with that:
(1) ChatGPT isn't the only AI writing tool. A few other big ones are Claude, Pi, and Google's Bard. I've actually written a paper with Claude, asked ChatGPT if it wrote it, and it said yes. LOL.
(2) ChatGPT and other LLMs are not databases. They base output on statistical relationships among words. There is nothing for them to check against.
(3) It can never definitively answer "yes" and mean it, because its answer is itself just output generated from statistical relationships among words.
Learning to write is still very important, but we need new approaches, not AI-writing detectors and punishments for suspected use.
Related
Else, H. Abstracts written by ChatGPT fool scientists. Nature. 2023;613:423.
Hype alert: new AI writing detector claims 99% accuracy
AI Detectors: Why I Won’t Use Them
The Use of AI-Detection tools in the Assessment of Student Work
We tested a new ChatGPT-detector for teachers. It flagged an innocent student.
What to do when you’re accused of AI cheating
Against the Use of ChatGPTZero and Other LLM Detection Tools. In Chat(GPT): Navigating the Impact of Generative AI Technologies on Educational Theory and Practice: Educators Discuss ChatGPT and other Artificial Intelligence Tools.