AI Detection Tools in Schools: What Parents Need to Know
Schools use Turnitin, GPTZero, and other AI detectors to flag student work. Here's what the research says about their reliability, and what that means for your family.
A 2024 Center for Democracy and Technology survey found that 68% of teachers use AI detection tools regularly to evaluate student work. Turnitin, GPTZero, Originality.ai, and similar products have become a standard part of how schools are responding to widespread AI use among students. Most families have no idea how these tools work, how reliable they are, or what recourse exists when they flag work incorrectly.
Understanding AI detection tools is now a practical necessity for any family whose child uses AI in any form for school, including for research, brainstorming, or editing, not just for generating complete drafts.
How AI detection tools work
AI detection tools generally work by analyzing statistical patterns in text. AI-generated text tends to be highly predictable -- each next word is usually the one a probability model would rank as most likely. Human writing is less predictable; it includes unexpected word choices, tonal shifts, idiosyncratic constructions, and the kinds of small stylistic decisions that reflect individual voice.
Detection tools compare a piece of text against these statistical patterns and produce a probability estimate: this text looks like AI-generated output, or it doesn't. The output is often a percentage score or a confidence level.
The fundamental limitation is that this is a probabilistic judgment, not a deterministic one. Some human writing is very regular and will score high. Some AI-generated writing is prompted toward irregularity and will score lower. The score is evidence of a statistical pattern, not proof of AI use.
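To make "predictability" concrete, here is a minimal sketch of a perplexity-style score. It is a toy illustration only: the tiny word-count model, the sample sentences, and the function names are all assumptions made for this example, and commercial detectors rely on far larger language models and additional signals that they do not fully disclose.

```python
import math
from collections import Counter

def train_unigram(corpus: str) -> tuple[Counter, int]:
    """Count word frequencies in a reference corpus (toy stand-in for a language model)."""
    words = corpus.lower().split()
    return Counter(words), len(words)

def perplexity(text: str, counts: Counter, total: int) -> float:
    """Return a perplexity score: lower means the text is more predictable to the model."""
    words = text.lower().split()
    if not words:
        raise ValueError("text must contain at least one word")
    vocab_size = len(counts) + 1  # one extra bucket for words the model has never seen
    log_prob = 0.0
    for word in words:
        # Add-one smoothing gives unseen words a small, nonzero probability
        probability = (counts[word] + 1) / (total + vocab_size)
        log_prob += math.log(probability)
    return math.exp(-log_prob / len(words))

# Hypothetical reference text and samples, chosen only to show the contrast
reference = "the cat sat on the mat the dog sat on the rug the cat saw the dog"
counts, total = train_unigram(reference)

score_predictable = perplexity("the cat sat on the mat", counts, total)
score_surprising = perplexity("a marmalade feline lounged atop velvet", counts, total)

print(f"predictable sentence: {score_predictable:.1f}")
print(f"surprising sentence:  {score_surprising:.1f}")
```

The sentence built from familiar words gets the lower score. A human writer who happens to use common words in common orders would get a low score too, which is why a score like this is evidence of a pattern and not proof of who wrote the text.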
What the research shows about accuracy
A 2023 Stanford study by Liang and colleagues tested seven widely used AI detection tools on a set of TOEFL essays written by non-native English speakers. The results were stark: on average, the tools falsely flagged 61.22% of those essays as AI-generated. The authors attribute this to predictability: writing in a second language tends to draw on a narrower range of vocabulary and sentence structures, and those regular patterns overlap with what detection tools flag as AI characteristics.
A separate 2024 survey by Common Sense Media found that 20% of Black teen respondents reported having work incorrectly flagged as AI-generated, compared to 7% of White teen respondents. The survey does not establish a cause, but the disparity is consistent with the same underlying concern: the statistical patterns detection tools are trained on are not neutral. They can reflect and penalize writing styles that diverge from the dominant register those tools were calibrated on.
It's worth being accurate about the scope of these findings: the Stanford study used 91 essays from a single Chinese educational forum, and detection tools have been updated since that research was conducted. The tools' developers have claimed improvements. What hasn't appeared is independent peer-reviewed evidence establishing that those improvements have eliminated the reliability problems in real-world conditions. The self-reported improvements exist. The verified improvements don't yet.
GPTZero, one of the most widely used tools, has publicly stated that it aims to keep its false positive rate low. That's a reasonable goal. What matters for families is understanding that a low false positive rate is not zero: even when a tool meets its target, some students who wrote their own work will still be flagged, and for each of those students the flag can carry real academic consequences.
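A small false positive rate still adds up across many submissions. The sketch below works through the arithmetic. Every number in it is hypothetical, chosen only for illustration, and none is a published figure for GPTZero or any other tool.

```python
import math

def expected_false_flags(human_written: int, false_positive_rate: float) -> float:
    """Expected number of honest submissions that get flagged anyway."""
    return human_written * false_positive_rate

def share_of_flags_that_are_wrong(
    ai_share: float, true_positive_rate: float, false_positive_rate: float
) -> float:
    """Of all flagged submissions, the fraction that were actually written by the student."""
    flagged_ai = ai_share * true_positive_rate
    flagged_human = (1 - ai_share) * false_positive_rate
    return flagged_human / (flagged_ai + flagged_human)

# Hypothetical scenario: 1,000 student-written essays, 1% false positive rate
wrongly_flagged = expected_false_flags(human_written=1000, false_positive_rate=0.01)

# Hypothetical scenario: 10% of submissions are AI-generated,
# the tool catches 90% of those, and wrongly flags 1% of human work
wrong_share = share_of_flags_that_are_wrong(
    ai_share=0.10, true_positive_rate=0.90, false_positive_rate=0.01
)

print(f"Honest essays flagged: about {wrongly_flagged:.0f} of 1,000")
print(f"Share of all flags that are wrong: {wrong_share:.1%}")
```

Under these made-up assumptions, about 10 of every 1,000 honest essays are flagged, and roughly 1 in 11 flags lands on a student who did their own work. If fewer students use AI, or the tool's real false positive rate is higher for some groups of writers, the share of wrong flags goes up.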
What happens when work is flagged
The process varies enormously by school and district. In the best cases, a flag triggers a conversation with the student, an opportunity to explain their work process, and a judgment by the teacher that considers multiple factors. In worse cases, a flag leads directly to a disciplinary referral with the detection score as the primary evidence.
The Newby v. Adelphi University case illustrates what can happen in the latter scenario. A first-year college student had his work flagged as 100% AI-generated by Turnitin. He had worked extensively with disability support tutors and had done the work himself. His family spent $100,000 in legal fees before a court ruled in January 2026 that the accusations were without valid basis and ordered his record expunged.
The Harris v. Adams case at Hingham High School, which reached federal court in 2024, offers a more complicated picture: a student was accused of improper AI use, and both sides presented evidence that the court weighed. That case is a reminder that detection disputes are not always straightforward, and that the facts matter in each specific situation.
What families should do
The most practical response is not adversarial. It's preparatory. Teach your child to document their process on any AI-assisted work. Notes about what they searched for, what they wrote themselves, how they used AI as a tool versus as a producer -- these take minutes to create and provide the kind of specific evidence that a detection score cannot.
The second response is informational: know your school's policy before an incident. Ask directly whether the school uses detection tools and what the process is when work is flagged. Ask whether detection scores alone are sufficient to begin a disciplinary process, or whether additional evidence is required.
The third response is to prepare your child to explain their work. A student who can discuss what a paper argues, why it makes certain claims, and what sources support those claims is demonstrating engagement that a detection score cannot demonstrate. That ability to explain is both a learning goal and a protection.
Detection tools are going to be part of educational life for the foreseeable future. Families who understand their limitations are better positioned than those who don't.
Sources cited in this post:
- Dwyer, M. and Laird, E. (2024). "Up in the Air: Educators Juggling the Potential of Generative AI with Detection, Discipline, and Distrust." Center for Democracy and Technology. https://cdt.org/insights/report-up-in-the-air-educators-juggling-the-potential-of-generative-ai-with-detection-discipline-and-distrust/
- RAND Corporation (2025). "AI Use in Schools Is Quickly Increasing but Guidance Lags Behind: Findings from the RAND Survey Panels." RAND Corporation. https://doi.org/10.7249/RRA4180-1
- Liang, W., et al. (2023). "GPT Detectors Are Biased Against Non-Native English Writers." Patterns (Cell Press). https://doi.org/10.1016/j.patter.2023.100779
- Common Sense Media (2024). "The Dawn of the AI Era: Teens, Parents, and the Adoption of Generative AI at Home and School." Common Sense Media. https://www.commonsensemedia.org/research/the-dawn-of-the-ai-era-teens-parents-and-the-adoption-of-generative-ai-at-home-and-school
- Harris v. Adams, Case No. 1:24-cv-12437 (D. Mass. 2024).
- Newby v. Adelphi University (Nassau County Supreme Court, ruling of January 29, 2026). https://www.cbsnews.com/newyork/news/orion-newby-adelphi-university-ai-plagiarism-accusations/