Improvements in Generative AI Raise Speculation
Last December, OpenAI released the most advanced version of its large language model to date: GPT-4.5. GPT-4.5, alongside other AI models designed for specific tasks, makes up OpenAI’s ChatGPT Pro plan—available to users for $200 per month. “What sets the model apart is its ability to engage in warm, intuitive, naturally flowing conversations, and we think it has a stronger understanding of what users mean when they ask for something,” said OpenAI’s Vice President of Research, Mia Glaese, in an interview with The New York Times.
Despite these promises, generative AI programs continue to draw the ire of students and teachers alike—albeit for different reasons. Students lament that lower-tier models produce little of substance, a limitation they suspect is meant to upsell users to more expensive plans. Teachers, on the other hand, argue that a student’s choice to turn to AI destroys the teacher-student relationship, in which personal work is produced in good faith to be graded holistically, with the writing process taken into account. That is to say nothing of the plagiaristic consequences of submitting ChatGPT’s writing as one’s own. Setting aside these ethical questions, however, a more pragmatic one remains: Can ChatGPT write an essay that will not only pass as human-written but also match the quality of one written by a qualified student?
To test this question, The Review Review fed one prompt each from English I and History I into the premium GPT-4.5. The plan was to bring the essays ChatGPT generated, along with an essay an adept freshman had written from the same prompt, to the Writing Center. In the interest of producing valid results, neither AI-generated essay was to be supplemented with human revisions, if possible.
This proved more challenging in practice. After two hours and a combined 74 prompts and revision requests, the essays GPT-4.5 generated were approaching a quality capable of stumping a Riverdale teacher. Still, they retained hallmarks of AI generation, such as the program’s tendency to preface its paraphrasing with unnecessary phrases like “scholars argue” and its affinity for long, confusing lists of adjectives. Having hit a wall with the large language model’s ability to refine specific language, The Review Review cut several phrases from the essays, though none were added or rearranged. Despite the essays’ seemingly natural phrasing to the naked eye, an online AI detector still flagged them as having a 98% probability of having been generated by ChatGPT. It seemed unlikely that the essays would pass a teacher’s evaluation.
The two essays written from the English I prompt, one by ChatGPT and one by the freshman, were given to Upper School English Teacher Dr. Carin McLain at the Writing Center. So as not to bias her into suspecting that something was off about one of the essays, or that something sneaky was in the works, Dr. McLain was asked only to skim the essays and give her general thoughts on each. She read the human-written essay first, then went over the AI-generated essay more quickly. Dr. McLain said that she thought ChatGPT’s essay was “sharper” and that it lacked some of the repetition of its human-written counterpart.
When told that one of the essays was AI-generated, Dr. McLain went back through it more carefully and pointed out several passages that she deemed “suspicious” from an AI perspective. She added that Turnitin—anti-plagiarism software that Riverdale’s humanities teachers use—would have easily flagged the essay, given the AI-generated writing’s poor performance against the online detector. Dr. McLain did acknowledge that, although the essay given to her might not have gone undetected if submitted for a grade, ChatGPT’s analytical skills and facility with phrasing have improved rapidly in the few years since its release. “In a few years, we may not be able to detect [AI writing] at all,” Dr. McLain said. The amount of human thought put into the specific prompts and omissions from the essays may also have contributed to their competence, she added.
While ChatGPT and other generative AI programs may not yet be on par with real human thought, they have come a long way since their release. As these programs become better able to do the work that Riverdale’s humanities teachers assign, quickly and relatively inexpensively, the way the school addresses AI use must develop alongside them. Riverdale’s handbook and the History and English departments’ documentation on generative AI both cite AI’s present unreliability when tasked with answering complex questions. Pointing to these limitations, however, is only a short-term answer to the ethical questions of using such programs, questions that will only grow more challenging as large language models become capable of competently doing more of our work.
Some teachers have already begun experimenting with ways AI could be used to supplement learning. Upper School History Teacher Mr. Omar Qureshi said that this is the first year he “experiment[ed] with AI-generated term paper questions.” While none of the options that ChatGPT proposed met his standards for such an important assignment, other teachers have begun distributing AI-generated review materials and worksheets, Mr. Qureshi said.
Much as Riverdale’s humanities departments have grappled before with students’ use of CliffsNotes or infamous New York City tutors, advances in AI present another challenge, not an existential threat, to their academic fields. As the English Department’s AI use policy reads, “we place particular value on the most beautiful, deep, and inspiring creations of the human mind,” something even the most capable of AIs will not be able to match.