
OpenAI Introduces CriticGPT, an AI Tool That Helps Coders Identify Bugs and Improve Code Quality

OpenAI introduced CriticGPT, a new AI model based on GPT-4 and designed to catch errors in code produced by ChatGPT. In OpenAI's testing, code reviews written with CriticGPT's help outperformed those written without it 60% of the time.

CriticGPT is expected to be integrated into OpenAI's Reinforcement Learning from Human Feedback (RLHF) labeling pipeline, with the goal of giving AI trainers better tools for evaluating complex AI outputs.

The GPT-4 models that power ChatGPT are aligned to be helpful and interactive through RLHF. In this process, AI trainers compare different responses and rate their quality. As ChatGPT's reasoning improves, its mistakes become more subtle and harder for trainers to spot. This points to a fundamental limitation of RLHF: models can gradually become more knowledgeable than the humans who are supposed to give them feedback.
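To make that comparison step concrete, here is a minimal Python sketch, under the assumption that each trainer judgment is recorded as a pairwise preference between two responses. The Comparison class and to_preference_pairs function are illustrative names only, not part of any OpenAI tooling.

```python
# Minimal sketch of the RLHF comparison step described above.
# All names here are hypothetical, not OpenAI's actual pipeline.
from dataclasses import dataclass


@dataclass
class Comparison:
    prompt: str
    response_a: str
    response_b: str
    preferred: str  # "a" or "b", chosen by a human trainer


def to_preference_pairs(comparisons):
    """Turn trainer comparisons into (prompt, chosen, rejected) triples
    of the kind used to train a reward model."""
    pairs = []
    for c in comparisons:
        chosen = c.response_a if c.preferred == "a" else c.response_b
        rejected = c.response_b if c.preferred == "a" else c.response_a
        pairs.append((c.prompt, chosen, rejected))
    return pairs


if __name__ == "__main__":
    data = [
        Comparison(
            prompt="Sort a list in Python",
            response_a="Use sorted(xs) to get a new sorted list.",
            response_b="Call xs.sort(), which sorts in place and returns None.",
            preferred="a",
        )
    ]
    print(to_preference_pairs(data))
```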

CriticGPT has been trained to write critiques that highlight inaccuracies in ChatGPT's answers. Its suggestions aren't always perfect, but they help trainers identify significantly more problems than they find without AI assistance. In experiments, Human+CriticGPT teams produced more comprehensive critiques than people working alone and included fewer hallucinated bugs than CriticGPT working alone. A second trainer preferred critiques from the Human+CriticGPT team over those from an unassisted reviewer more than 60% of the time.

CriticGPT was trained with a method similar to ChatGPT's, but focused on spotting errors. AI trainers inserted bugs into code written by ChatGPT and wrote example feedback describing them. The trainers then compared multiple critiques of the modified code so that CriticGPT's performance could be evaluated. CriticGPT's critiques were preferred over ChatGPT's in 63% of cases involving naturally occurring bugs, in part because CriticGPT produced fewer unhelpful nitpicks and hallucinated problems less often.
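As a rough illustration of this "insert a bug, then compare critiques" setup, here is a short Python sketch. The TamperedSample and CritiqueComparison classes, and the example bug, are assumptions made for illustration; OpenAI has not published this exact interface.

```python
# Illustrative sketch of the tamper-and-critique data setup described above.
# All names and fields are hypothetical.
from dataclasses import dataclass, field


@dataclass
class TamperedSample:
    original_code: str       # code originally produced by ChatGPT
    tampered_code: str       # same code after a trainer inserts a subtle bug
    reference_critique: str  # the trainer's example comment describing that bug


@dataclass
class CritiqueComparison:
    sample: TamperedSample
    candidate_critiques: list                      # critiques from different critic models
    rankings: list = field(default_factory=list)   # trainer ranking, best first


def build_sample() -> TamperedSample:
    original = "def mean(xs):\n    return sum(xs) / len(xs)\n"
    # The inserted bug: floor division silently truncates the result.
    tampered = "def mean(xs):\n    return sum(xs) // len(xs)\n"
    critique = "Line 2 uses floor division (//), so non-integer means are truncated."
    return TamperedSample(original, tampered, critique)


if __name__ == "__main__":
    sample = build_sample()
    comparison = CritiqueComparison(
        sample,
        candidate_critiques=["Critique written by model A", "Critique written by model B"],
    )
    comparison.rankings = [0, 1]  # the trainer prefers critique A
    print(comparison.sample.reference_critique)
```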

Despite its success, CriticGPT has limitations. It was trained on short ChatGPT responses and will need further development to handle longer, more complex tasks. Models also still hallucinate, and trainers sometimes make labeling mistakes after seeing those hallucinations. Finally, the current approach focuses on errors that can be pointed out in a single place, while real-world mistakes can be spread across many parts of a response.
