A Merry o3 Holidays
OpenAI announced their “newest” model, o3, yesterday. It comes with a jaw-dropping result on ARC Prize's ARC-AGI benchmark: a high-compute score of 87.5%. This benchmark is intended to focus on the most challenging unsolved problems in AI.
However, OpenAI disclosed that they trained the o3 model on 75% of the Public Training set.1 This raises the question: is it overfitted?
Well, the whole world (aside from OpenAI) is unsure. What we DO know for a fact, though, is that Claude, Gemini and the other frontier models have not presented their benchmarks for ARC. So what’s there to compare it to right now?
Additionally, I don’t think this is AGI just yet. I believe in the human mind, and I don’t think the ability to freely retrieve information from the internet amounts to true sentient intelligence. Something or someone that can be characterized as intelligent must be able to think intuitively and craft an approach to any given task without external context. Thus, AGI must be able to solve problems at every level of difficulty. And the thing is, these models can still mess up on instructions that are verbose and well formatted in markdown, so it makes you wonder…