OpenAI o3 Crushes AI Benchmarks – Still Not AGI
OpenAI has introduced o3, a large language model that significantly outperforms previous LLMs on reasoning tasks, achieving record-breaking scores on the ARC-AGI-1 and FrontierMath benchmarks. While not considered AGI, the model marks a major leap in AI versatility and problem-solving.
Maria Deutscher writes for SiliconANGLE
OpenAI today detailed o3, its new flagship large language model for reasoning tasks. The model’s introduction caps off a 12-day product announcement series that started with the launch of a new ChatGPT plan. Compared with earlier LLMs, o3 demonstrated significant improvements across benchmarks, including ARC-AGI-1, which tests how well a model handles tasks it was not specifically trained for.
Currently, o3 is available to select researchers as OpenAI refines its safety mechanisms before broader release.