OpenAI's GPT-4.5 Passes Turing Test with 73% Success Rate

OpenAI's GPT-4.5 has reached a milestone long considered out of reach: it convinced a majority of Turing Test participants that it was human.

In a classic three-party, text-based Turing Test, GPT-4.5 was judged to be the human in 73% of conversations, according to a study conducted by researchers at the University of California San Diego.

The study found that GPT-4.5 outperformed the other systems tested, including GPT-4o, ELIZA, and LLaMa-3.1-405B.

According to Cameron Jones, a postdoctoral research fellow at UC San Diego, GPT-4.5, which OpenAI released in February, was able to pick up on subtle linguistic cues.

“If you ask them what it's like to be human, the models tend to answer well and can convincingly pretend to have emotional and sensory experiences,” Jones told Decrypt. However, he added, the models struggle with things such as current events or real-time information.

The Turing Test, proposed by British mathematician Alan Turing in 1950, evaluates whether a computer can fool a human by convincingly mimicking human conversation. A machine passes the test if the judge cannot reliably tell the computer and the person apart.

To evaluate the AI models' performance, researchers tested two prompt types: a baseline prompt with minimal instruction and a more detailed prompt that directed the model to adopt the voice of an introverted, internet-savvy young person who uses slang.

The researchers said they selected the AI witnesses based on an exploratory study in which they tested five different LLMs and seven different prompts, finding that GPT-4.5 and LLaMa-3.1-405B performed best.

The study also examined the broader social and economic consequences of large language models passing the Turing Test, as well as their potential for misuse.

Jones said the risks include misinformation and astroturfing, in which bots pose as people to inflate apparent interest in a particular cause. “Others involve fraud or social engineering: if a model emails someone over time and seems real, it might persuade them to share sensitive information or access bank accounts,” he said.

Meanwhile, OpenAI has announced GPT-4.1, the latest iteration of its flagship GPT model. The new model is more capable and can process large inputs such as long documents, codebases, or novels. OpenAI said it will sunset GPT-4.5 this summer and replace it with GPT-4.1.

Although Turing did not live to see today's AI landscape, Jones said the test he proposed in 1950 still holds up.

He said the Turing Test remains valid in the sense Turing intended. In his paper, Jones noted, Turing suggested that the best way to build a machine capable of passing the test would be to create a “child” computer that learns from large amounts of information. “That's basically how modern machine-learning models operate,” he said.

Asked about criticisms of the study and its findings, Jones acknowledged their value but clarified what the Turing Test does not measure.

“The main thing I'd say is the Turing Test isn't a perfect test of intelligence, or even of human-likeness,” he said. “It is useful for the purpose it serves: to determine whether or not a computer can fool a human into believing it is human. It's worth measuring because it has a real impact.”

Edited by Sebastian Sinclair