A French teacher recently took on an unusual challenge: grading a high school philosophy final written entirely by ChatGPT. The results were surprising and sparked a lively debate about the role and reliability of artificial intelligence in education. While the teacher gave the essay a modest score, several AI tools rated the same work much higher. What can this tell us about AI in testing and learning?
Grading a ChatGPT philosophy essay: teacher vs. AI tools
On June 16, during the French high school final exams, a regional France 3 news outlet decided to test ChatGPT's abilities. They asked the AI to write a philosophy essay responding to the question: "Is the truth always convincing?" The prompt instructed ChatGPT to produce a student-level essay with a clear introduction, development, and conclusion, including philosophical references and examples.
Once the essay was completed, a professional philosophy teacher graded it just like any other student paper. The teacher knew from the start that the essay was written by an AI but tried to offer an objective evaluation. The result was a score of 8 out of 20 points, a clear indication that the teacher found significant flaws.
Meanwhile, various AI-based grading tools scored the same essay much higher, ranging from 15 to nearly 20 points. These tools praised the essay's structure, clear argumentation, and coherence. None of the systems mentioned a major mistake that the teacher immediately flagged in the introduction, where ChatGPT slightly shifted the essay's core question, which matters a lot in philosophy.
Understanding the teacher's critique of ChatGPT's essay
From the teacher's perspective, the essay suffered from a few key issues. The biggest was that ChatGPT altered the original essay question from "Is the truth always convincing?" to "Is the truth enough to convince?" In philosophy, even such subtle changes can completely shift the meaning and weaken the argument. This led the teacher to mark down the essay for misunderstanding the prompt.
Other concerns related to the essay's logical flow. While ChatGPT's writing followed the classic three-part essay form, the teacher found the transitions awkward and the arguments too formulaic. For example, phrases like "In reality, things are more complicated" felt out of context. The conclusion, while circling back to the topic, seemed to lack genuine reflection on why truth alone might not convince everyone.
The teacher summarized the essay as too superficial to meet the standards of a rigorous philosophy exam, judging it less an insightful exploration and more a combination of rehearsed talking points. The final 8-point score reflected these reservations.
Why AI tools rated the essay much higher
The stark contrast between the teacher's grade and the AI tools' scores raises questions about AI evaluation methods. Several AI graders gave the essay scores between 15 and 19.5 out of 20. They applauded the clear tripartite structure, logical progression of ideas, and polished language. None flagged the critical error: the shift in the central question.
What explains this discrepancy? AI grading tools seem to prioritize formal elements such as organization, grammar, and coherence over deeper philosophical accuracy. Since they operate on algorithms trained to recognize well-formed essays, the tools rated ChatGPT's polished, well-structured writing as high quality.
It's important to remember that AI grading can vary depending on the exact prompt wording, the tool's training data, and the model version. Even the same AI might produce different assessments of the same essay at different times.
Reflecting on AI's role in education and exams
This experiment highlights some key thoughts for students, teachers, and anyone curious about AI in education. For one, it reminds us of the importance of context. A skilled human teacher can catch subtle errors and inconsistencies that technology might miss. At the same time, AI can offer helpful initial feedback, especially for formatting and clarity.
Personally, I'm reminded of a time when I relied on automated spell checkers for a big school paper. While they caught surface mistakes, several confusing sentences went unnoticed until a patient teacher pointed them out. Human judgment still holds nuances that machines struggle to grasp.
The teacher's clear bias, knowing the essay came from AI, cannot be ignored either. Would an unaware teacher have graded more leniently? Probably. But this bias could also push educators to sharpen their criteria and adapt teaching strategies as AI tools become more prevalent.
What do you think? Could AI write your best essay? Should AI tools be part of the grading process, or reserved for initial drafts and suggestions? Are human teachers irreplaceable when it comes to understanding deeper meaning? Share your experiences and thoughts below, and let's start a conversation about the future of education in the AI era.
The quality of human judgment on a philosophical question cannot be substituted by a machine. A teacher has a heart that reinforces the brain in deciding what is right. AI remains a powerful tool to assist the human brain. Integrity is a character of a person.
On the assumption that this article was not written by AI (and it probably wasn't, because if AI had written it, presumably it would have spoken more highly of AI's ability to grade), the article is enlightening, particularly in the depth to which it explains where AI is strong and weak.
The teacher "tried", but they were still biased. The conclusions are useless until there is a proper double-blind, placebo-controlled experiment.
We did a similar experiment on the final calculus exam of business students at our university. The students' average scores in 2022, 23, 24, and 25 were 37, 36, 38, and 36%; ChatGPT scored 32, 85, 100, and 100.
“AI” is artificial. Not intelligent. It doesn’t understand what it is writing, hence it can achieve “work” which superficially seems to be of high quality but lacks depth or nuance. It’s like a smart kid who masks a lack of knowledge or understanding behind eloquent words.
Mythbusters(tm) demonstrated that you can polish poop.
AI has mastered the techniques to glaze garbage, it is eloquence without substance.
The teacher should not have known it was AI-generated, to make the assessment fair. Things get interesting when AI can address her issues and rewrite that "perfect" essay, even specifically catered to her asks, in a fraction of a second.
Yes, agreed, it should have been assessed without the teacher's knowledge; then the argument would have carried even more weight. That being said, the writer makes valid points which should be taken into account when using AI. Another reason I would say AI marks higher is that AI is currently like a young, inexperienced teacher. It does not have previous experience to compare to, except book knowledge or programmed behaviour, thus it marks relatively well structurally but not 'emotionally' or 'humanly'. I have noticed when I ask AI questions and it 're-interprets' my question, on correction it does change its answer to a more accurate interpretation. However, if we used AI for mass marking, we would not have the time to check each AI assessment's correctness. For now I side with the author's suggestion of using AI during the drafting process and for suggestions; finals, however, should be teacher-marked until AI is more stable, accurate, and 'experienced'. That being said, there are parts of assessments that can be marked by AI. Interesting experiment, thank you for bringing this to the table.