Updated: 8-20-23
Stefan Bauschard, LinkedIn, AIBoot Camp (Adults), Educating4ai.com (grades 6-12)
Two of the main “signals” AI-writing detectors rely on are “perplexity” and “burstiness.”
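Perplexity measures how surprising a piece of text is to a language model, that is, how far each word departs from the model’s “next word” prediction. As we know, LLMs “predict the next word.” If a writing sample uses a lot of common “next words,” its perplexity is low, and a detector will treat it as probably produced by an AI-writing tool.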
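To make the idea concrete, here is a minimal sketch of a perplexity calculation. It is not how ZeroGPT or any real detector works internally: real detectors score text with a large language model, while this toy uses a tiny bigram (“previous word, next word”) model with add-one smoothing. The function names and the sample corpus are made up for illustration.

```python
import math
from collections import Counter

def train_bigram(tokens):
    """Count single words and adjacent word pairs in a token list (toy model)."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    return unigrams, bigrams, len(unigrams)

def perplexity(tokens, model):
    """Perplexity = exp of the average negative log-probability per predicted word."""
    unigrams, bigrams, vocab_size = model
    pairs = list(zip(tokens, tokens[1:]))
    if not pairs:
        raise ValueError("need at least two tokens")
    log_prob = 0.0
    for prev, word in pairs:
        # Add-one smoothing so unseen word pairs still get a small probability.
        p = (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)
        log_prob += math.log(p)
    return math.exp(-log_prob / len(pairs))

# Hypothetical training text standing in for "what the model has seen."
corpus = "the cat sat on the mat the cat sat on the rug".split()
model = train_bigram(corpus)

predictable = "the cat sat on the mat".split()   # follows common "next words"
surprising = "mat on cat the sat rug".split()    # same kind of words, unusual order

print(perplexity(predictable, model))
print(perplexity(surprising, model))
```

The predictable sentence scores lower than the scrambled one, which is the pattern a detector reads as “probably AI.”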
Burstiness is how much the structure of the sentences varies across a piece of writing. LLMs generally have low burstiness, using similar sentence structures across an essay.
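Detectors do not publish exactly how they compute burstiness, so the following is only a rough sketch under one assumption: that variation in sentence length is a usable stand-in for variation in sentence structure. The `burstiness` function and the sample texts are hypothetical.

```python
import re
import statistics

def sentence_lengths(text):
    """Split on ., !, or ? and return the word count of each sentence."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return [len(s.split()) for s in sentences]

def burstiness(text):
    """Standard deviation of sentence length divided by the mean length.

    This is an assumed proxy, not a detector's actual formula.
    0.0 means every sentence is the same length; larger means more variation.
    """
    lengths = sentence_lengths(text)
    if len(lengths) < 2:
        return 0.0
    return statistics.pstdev(lengths) / statistics.mean(lengths)

uniform = "The cat sat down. The dog ran off. The bird flew away."
varied = "Stop. The storm rolled in over the hills before anyone noticed the sky. We ran."

print(burstiness(uniform))
print(burstiness(varied))
```

Text with evenly sized sentences scores at or near zero, while text that mixes very short and very long sentences scores higher.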
Acting on a tip, I tried an experiment where I first asked ChatGPT4 to write a common essay.
I removed the “Paragraph 1, 2, etc.” notes so as not to make it obvious to a detector.
ZeroGPT, a common detector, caught it.
I then asked it to rewrite the essay with a high degree of “burstiness” and “perplexity.”
ZeroGPT Detector Score:
Now, this is very high in perplexity and very “bursty,” so as a student I’d probably try to tone down the “flowery” language a bit before I turned it in.
And if “perplexity” and “burstiness” are too much for you, you can ask it to “elevate the provided text by employing literary language.” From: GPT detectors are biased against non-native English writers
P.S. If you think about it, this is why these systems are more likely to tag work written by English language learners (and anyone with weaker language skills) as being written by AI: these writers are more likely to use common “next words” and to vary their sentences less.
P.S.2. Do you know of any other common detection measures? Try to write a simple prompt to circumvent them. ChatGPT can also be instructed to write with grammar errors :).