On May 31, OpenAI announced its efforts to enhance ChatGPT’s mathematical problem-solving capabilities, aiming to reduce instances of artificial intelligence (AI) hallucinations. OpenAI emphasized mitigating hallucinations as a crucial step toward developing aligned AI.
In March, the release of GPT-4, the latest model behind ChatGPT, further propelled AI into the mainstream. However, generative AI chatbots have long grappled with factual accuracy, occasionally generating false information, commonly referred to as “hallucinations.” The efforts to reduce these AI hallucinations were announced through a post on OpenAI’s website.
AI hallucinations refer to instances where artificial intelligence systems generate outputs that are factually incorrect, misleading or unsupported by real-world data. These hallucinations can take various forms, such as generating false information, making up nonexistent events or people, or providing inaccurate details about certain topics.
OpenAI conducted research to examine the effectiveness of two types of feedback: “outcome supervision” and “process supervision.” Outcome supervision involves feedback based on the final result, while process supervision provides feedback for each step in a chain of thought. OpenAI evaluated reward models trained with each type of feedback on math problems, generating multiple solutions per problem and selecting the solution ranked highest by each reward model.
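To make the difference between the two kinds of feedback concrete, the selection step described above can be sketched in a few lines of Python. This is an illustrative sketch, not OpenAI's implementation: the `Solution` class, the function names and every score below are hypothetical, and multiplying the per-step scores into one solution-level score is an assumed aggregation rule chosen for simplicity.

```python
from dataclasses import dataclass
from math import prod
from typing import List


@dataclass
class Solution:
    steps: List[str]
    # Hypothetical per-step scores, as a process-supervised reward model might assign.
    step_scores: List[float]
    # Hypothetical single score for the final answer, as an outcome-supervised model might assign.
    outcome_score: float


def process_score(solution: Solution) -> float:
    """Assumed aggregation: a solution is only as good as all of its steps together."""
    return prod(solution.step_scores)


def best_by_outcome(candidates: List[Solution]) -> Solution:
    """Pick the candidate whose final result is rated highest."""
    return max(candidates, key=lambda s: s.outcome_score)


def best_by_process(candidates: List[Solution]) -> Solution:
    """Pick the candidate whose step-by-step reasoning is rated highest."""
    return max(candidates, key=process_score)


# Two made-up candidate solutions to "solve 2x + 6 = 10".
candidates = [
    # Reaches the right answer through a wrong intermediate step.
    Solution(["2x + 6 = 10", "2x = 16", "x = 2"], [0.95, 0.10, 0.30], 0.90),
    # Reaches the right answer through valid steps.
    Solution(["2x + 6 = 10", "2x = 4", "x = 2"], [0.95, 0.95, 0.95], 0.85),
]

print("Outcome pick:", best_by_outcome(candidates).steps)
print("Process pick:", best_by_process(candidates).steps)
```

With these made-up numbers, the outcome-based ranking prefers the first candidate because its final answer looks right, while the step-based ranking prefers the second because one flawed step drags down the first candidate's combined score.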
After thorough analysis, the research team found that process supervision yielded superior performance, as it encouraged the model to adhere to a human-approved process. In contrast, outcome supervision proved more challenging to scrutinize consistently.
OpenAI recognized that the implications of process supervision extend beyond mathematics, with further investigation necessary to understand its effects in different domains. It suggested that, if the observed results hold in broader contexts, process supervision could offer a favorable combination of performance and alignment compared with outcome supervision. To facilitate research, the company publicly released its complete process supervision data set, inviting exploration and study in this area.
Although OpenAI did not provide explicit instances that prompted its investigation into hallucinations, two recent occurrences exemplified the problem in real-life scenarios.
In a recent incident, lawyer Steven Schwartz acknowledged relying on ChatGPT as a research resource in the Mata v. Avianca case. However, the information provided by ChatGPT turned out to be entirely fabricated, illustrating how hallucinations can cause real-world harm.
OpenAI’s ChatGPT is not the only example of artificial intelligence systems encountering hallucinations. During a demonstration of its chatbot technology in February, Microsoft’s Bing AI chatbot examined earnings reports and generated inaccurate figures for companies like Gap and Lululemon.