In trying to maintain up with (or forward of) the competitors, mannequin releases proceed at a gradual clip: GPT-5.2 represents OpenAI’s third main mannequin launch since August. GPT-5 launched that month with a brand new routing system that toggles between instant-response and simulated reasoning modes, although customers complained about responses that felt chilly and medical. November’s GPT-5.1 replace added eight preset “persona” choices and centered on making the system extra conversational.
Numbers go up
Oddly, despite the fact that the GPT-5.2 mannequin launch is ostensibly a response to Gemini 3’s efficiency, OpenAI selected to not checklist any benchmarks on its promotional web site evaluating the 2 fashions. As an alternative, the official weblog publish focuses on GPT-5.2’s enhancements over its predecessors and its efficiency on OpenAI’s new GDPval benchmark, which makes an attempt to measure skilled information work duties throughout 44 occupations.
In the course of the press briefing, OpenAI did share some competitors comparability benchmarks that included Gemini 3 Professional and Claude Opus 4.5 however pushed again on the narrative that GPT-5.2 was rushed to market in response to Google. “It is very important observe this has been within the works for a lot of, many months,” Simo advised reporters, though selecting when to launch it, we’ll observe, is a strategic determination.
Based on the shared numbers, GPT-5.2 Considering scored 55.6 % on SWE-Bench Professional, a software program engineering benchmark, in comparison with 43.3 % for Gemini 3 Professional and 52.0 % for Claude Opus 4.5. On GPQA Diamond, a graduate-level science benchmark, GPT-5.2 scored 92.4 % versus Gemini 3 Professional’s 91.9 %.
OpenAI says GPT-5.2 Considering beats or ties “human professionals” on 70.9 % of duties within the GDPval benchmark (in comparison with 53.3 % for Gemini 3 Professional). The corporate additionally claims the mannequin completes these duties at greater than 11 occasions the velocity and fewer than 1 % of the price of human specialists.
GPT-5.2 Considering additionally reportedly generates responses with 38 % fewer confabulations than GPT-5.1, in keeping with Max Schwarzer, OpenAI’s post-training lead, who advised VentureBeat that the mannequin “hallucinates considerably much less” than its predecessor.
Nonetheless, we at all times take benchmarks with a grain of salt as a result of it’s simple to current them in a approach that’s optimistic to an organization, particularly when the science of measuring AI efficiency objectively hasn’t fairly caught up with company gross sales pitches for humanlike AI capabilities.
Unbiased benchmark outcomes from researchers outdoors OpenAI will take time to reach. Within the meantime, if you happen to use ChatGPT for work duties, anticipate competent fashions with incremental enhancements and a few higher coding efficiency thrown in for good measure.
Keep forward of the curve with NextBusiness 24. Discover extra tales, subscribe to our e-newsletter, and be a part of our rising group at nextbusiness24.com

