We Tested the AI Chatbots Georgia Parents Are Already Using. Here’s What Went Wrong, and What Went Right

By Michael Waller

A young mother came to us recently after her elementary-school son was suspended for ten days for fighting. She was scared and didn’t know what to do. Before she called Georgia Appleseed, she asked ChatGPT for help.

The chatbot told her that her son’s school had probably violated federal law. It said the school should have tested her child for a learning disability because he showed signs of ADHD. It drafted her a letter — a threatening letter that accused the school of violating the Individuals with Disabilities Education Act and demanded immediate action or the family would pursue legal remedies.

Here’s the problem: none of that was right. The child has some behaviors associated with ADHD, but they are subtle. His mother had never asked the school to evaluate him, and nothing in the facts she described to the chatbot suggested an obvious disability that the school should have noticed or that triggered a legal responsibility to act. The chatbot assumed far too much and produced a letter so aggressive it could have permanently damaged this mother’s relationship with the teachers and administrators her son sees every day.

If she had sent that letter, she would have walked into her next parent-teacher conference as the mom who threatened to sue. And her son would still need those adults on his side.

She didn’t send it. She called us instead. Her son went back to school the following week.

Why We Tested ChatGPT, Claude, and Gemini

That mother’s experience isn’t unusual. Over the past few months, many of the parents and caregivers who contact us for help have already consulted ChatGPT, Claude, or Gemini. Some of the advice they’ve received has been reasonable. Some has been wrong in ways that worry us.

So I decided to test it myself. I gave three AI chatbots (ChatGPT, Claude, and Gemini) four detailed questions based on real conversations that Georgia parents have had with our own AI tool, Seedmore. I designed the questions to reflect the situations we see most often: a child facing a long suspension, a foster child struggling with enrollment, a parent who suspects her child needs special education services, and a family navigating a disciplinary tribunal.

Three raters — a staff attorney, an advocate, and an AI model — used blinded scoring to independently evaluate each of the twelve responses across six dimensions: legal accuracy, actionability, referrals to counsel, plain language, appropriate caveats, and risk of harm. (See our methodology and results.)

What the Chatbots Got Right

All three models understood federal education law reasonably well. They could identify relevant statutes — IDEA, Section 504, the Every Student Succeeds Act. On a 40-point scale, overall scores ranged from 26.81 (ChatGPT) to 31.27 (Claude), with Gemini at 28.79. Plain language was a consistent strength: all three chatbots scored 4.42 out of 5 on writing accessibly for stressed, non-lawyer parents. If you need a general overview of federal special education law, these chatbots can provide one.

One model even cited Georgia-specific statutes in one scenario and got them right, including SB 431, a foster care enrollment bill Georgia Appleseed helped pass through the Georgia General Assembly this session. That was impressive. An AI model had learned about the bill within weeks of its passage out of the legislature, even though, as of this writing, it is still awaiting the governor’s signature.

What They Got Wrong

But the errors we found were not small. Individual raters flagged five of twelve responses as potentially harmful — meaning a parent who followed the advice could miss a deadline, waive a right, or damage her position. ChatGPT drew three of those flags; Claude and Gemini each drew one.

The chatbot played doctor and lawyer at the same time. In the scenario drawn from the mother’s story above, one chatbot diagnosed a child with a disability based on limited behavioral descriptions, then built an entire legal strategy on that diagnosis. A competent lawyer would never do this. A lawyer would ask questions first: Has the child been evaluated? Has the parent requested an evaluation? What has the school observed? The chatbot skipped all of that, jumped to a conclusion, and drafted a letter that could have ruined the family’s relationship with the school.

Georgia law was mostly invisible. Federal law gives parents certain rights. But what a parent in Georgia actually needs to know is how the local disciplinary tribunal works, what the timelines are under Georgia’s specific hearing procedures, and whom to contact. The chatbots rarely cited Georgia statutes, local legal aid organizations, or advocacy groups. One model missed a Georgia law that would have told a foster parent exactly how to enroll his child in a new school — information our advocates provide routinely.

No model consistently said “call a lawyer.” When our attorney reviewed the responses, the most common note was: the chatbot should have told the parent to seek legal help. In every scenario, the correct answer includes a referral — to Georgia Appleseed, to a legal aid organization, to a local attorney. The chatbots answered as if they were the last stop, when they should have been the first signpost.

Why This Matters

According to the Georgia Office of Student Achievement, more than 130,000 Georgia students are suspended or expelled every year — and for many of their families, the process is confusing and the consequences are severe. For those parents, an AI chatbot may be the only source of legal guidance they can access before a disciplinary hearing. Georgia Appleseed provides about 300 families a year with legal services and trains around 1,000 advocates. We cannot be everywhere. The chatbots already are.

That’s why we built Seedmore, our own AI tool, trained specifically on Georgia education law and designed to refer families to us when the question is too complex for a chatbot to handle safely. But Seedmore isn’t what most parents will find first. ChatGPT, Claude, and Gemini are.

What Comes Next

This was a pilot: four scenarios, three models, twelve responses. It showed us enough to know the problem is real and specific enough to measure. I am now designing a larger study with more scenarios and more raters to produce findings that AI companies and policymakers will act on. That study will also include scenarios in Spanish to support the more than 155,000 children enrolled in English-learner programs in Georgia public schools.

Few legal services organizations in the country, much less in Georgia, have the capacity to build AI tools; fewer still are testing them. But this work needs to happen in Georgia. These are our families. If we don’t test whether AI chatbots are giving Georgia parents accurate advice about Georgia law, we can’t expect companies in Silicon Valley to do it for us.

No parent should have to choose between bad advice and no advice at all. If you are a funder, a researcher, or an AI company that takes this seriously, I’d like to hear from you.

Contact Michael Waller at mwaller@gaappleseed.org.