AI models don't know what they're answering: Study

'Potemkin understanding' is a condition in which an AI model succeeds on tests without understanding the relevant concepts, researchers say.

Researchers from MIT, Harvard and the University of Chicago have identified a new type of failure in large language models. They have named it 'Potemkin understanding'.

This type of failure refers to situations where a model performs well on conceptual benchmark tests but cannot apply the same concepts in practice.

The term comes from the story of the 'Potemkin village', in which the Russian military leader Grigory Potemkin is said to have built fake villages to impress Empress Catherine II.

The researchers distinguish 'Potemkins' from 'hallucinations': hallucinations describe factual errors and incorrect statements, while Potemkins describe failures of conceptual understanding.

Computer scientists Marina Mancoridis, Bec Weeks, Keyon Vafa and Sendhil Mullainathan describe the condition in which an AI model passes a benchmark test without understanding the relevant concepts.

In their preprint, "Potemkin Understanding in Large Language Models," the authors explain that Potemkins create an illusion of conceptual coherence, just as hallucinations create an illusion of factual accuracy.

“The term Potemkin understanding was a deliberate choice, to avoid suggesting that AI models answer the way humans do,” Keyon Vafa, a postdoctoral fellow at Harvard University and one of the paper’s co-authors, told the British technology site The Register.

The paper gives an example of this failure: when asked about the ABAB rhyme scheme, OpenAI’s ChatGPT gave a simple and correct answer that the first and third lines rhyme, as do the second and fourth.

However, when ChatGPT was asked to write a poem using the ABAB scheme, it produced a poem that did not follow it. Simply put, the model could explain the ABAB rhyme scheme when asked, but could not apply it when asked to use it.
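To make the gap concrete, here is a minimal sketch, purely for illustration and not taken from the study, of the kind of check the model’s own poem would need to pass. The rhyme detection is a crude word-ending heuristic, and the example stanza is invented.

```python
# Illustrative sketch only (not code from the study): a crude check of whether
# a four-line stanza follows an ABAB rhyme scheme -- the task ChatGPT described
# correctly but failed to carry out.

def rhyme_key(line: str, n: int = 3) -> str:
    """Use the last few letters of the final word as a rough rhyme key."""
    word = line.strip().split()[-1].lower().strip(".,;:!?")
    return word[-n:]

def follows_abab(stanza: list[str]) -> bool:
    """True if lines 1 and 3 share a rhyme key, and lines 2 and 4 do too."""
    if len(stanza) != 4:
        return False
    return (rhyme_key(stanza[0]) == rhyme_key(stanza[2])
            and rhyme_key(stanza[1]) == rhyme_key(stanza[3]))

# A hand-written stanza that does follow ABAB:
stanza = [
    "The sun goes down behind the hill,",
    "A quiet wind begins to blow,",
    "The evening air is calm and still,",
    "And shadows gather down below.",
]
print(follows_abab(stanza))  # True
```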

“If large language models can give the right answers without genuinely understanding them, then passing the benchmark becomes misleading,” says Sarah Gooding of the security firm Socket.

The researchers developed their own criteria for measuring the Potemkin effect, since traditional benchmarks can let AI companies present results that flatter their models. Their study found this illusory understanding in almost every model tested.

The study made use of the DeepSeek-V3, DeepSeek-R1, Qwen2-VL, Gemini 2.0, Claude 3.5, Llama 3.3, and GPT-4o models.

One test in the study focused on literary techniques, game theory, and psychological biases. The models correctly identified the relevant concepts about 94.2% of the time. However, they failed on average 55% of the time when asked to classify examples of a concept, 40% of the time when creating new examples, and 40% of the time when editing an example.
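As a rough sketch of the evaluation idea described above (first ask a model to state a concept, then test whether it can classify, generate, and edit examples of it), the following Python outline is an assumption-laden illustration: ask_model, the prompts, and the grading callables are hypothetical placeholders, not the paper’s actual benchmark code.

```python
# Hypothetical outline of a 'define first, then apply' probe in the spirit of
# the study. Every name here (ConceptProbe, ask_model, the grading callables)
# is a placeholder for illustration, not the researchers' actual code.

from dataclasses import dataclass
from typing import Callable, Iterable

@dataclass
class ConceptProbe:
    name: str                                  # e.g. "ABAB rhyme scheme"
    define_prompt: str                         # "Explain what X is."
    apply_prompts: list[str]                   # classify / generate / edit tasks
    grade_definition: Callable[[str], bool]    # was the stated definition right?
    grade_application: Callable[[str], bool]   # was the concept actually used right?

def potemkin_rate(probes: Iterable[ConceptProbe],
                  ask_model: Callable[[str], str]) -> float:
    """Among probes the model defines correctly, return the fraction where it
    still fails at least one application task -- the pattern the paper calls
    a 'Potemkin'."""
    defined_ok, potemkins = 0, 0
    for probe in probes:
        if not probe.grade_definition(ask_model(probe.define_prompt)):
            continue  # only definitions the model got right are of interest
        defined_ok += 1
        answers = [ask_model(p) for p in probe.apply_prompts]
        if not all(probe.grade_application(a) for a in answers):
            potemkins += 1
    return potemkins / defined_ok if defined_ok else 0.0
```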

“Potemkin understanding means that the kind of behaviour that would count as evidence of real understanding in humans does not indicate understanding in a large language model,” says researcher Vafa.

“We may need to develop new assessment methods that do not rely on questions designed for humans, or find a way to eliminate this false behaviour from large language models.”
