I think what has become clear over the past year is that the AGI label was never very useful.
It is both true that (1) 2025-era Reasoners would meet many prior definitions of AGI & (2) the idea of a single-dimensional "intelligence" factor does not help us understand AI impacts.
The Sparks paper was an innovative attempt to point at GPT-4 & say "there is something unexpected here that is hard to measure right now."
I think Early Science Acceleration feels similar: a blurry picture that will become clearer in the coming years.
Our lack of reliable measures of human error rates across intellectually demanding tasks and fields is a huge hindrance to understanding the hallucination and reliability thresholds that AI might cross incrementally, and whose crossing could lead to sudden leaps in usefulness & adoption.
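To make that threshold dynamic concrete, here is a minimal sketch of my own (an illustration, not a model from any of the papers above): if a task requires n steps and each step succeeds independently with probability p, end-to-end success is p^n, so a seemingly incremental gain in per-step reliability can flip long tasks from mostly-failing to mostly-working.

```python
# Illustrative sketch, not from the original: a crude compounding-error model
# of why small reliability gains can produce sudden leaps in usefulness.
# Assumes a task succeeds only if all n independent steps succeed.

def task_success_rate(per_step_reliability: float, n_steps: int) -> float:
    """Probability of completing an n-step task with no errors."""
    return per_step_reliability ** n_steps

for p in (0.90, 0.95, 0.99, 0.999):
    # For a 50-step task, moving per-step reliability from 99% to 99.9%
    # lifts end-to-end success from roughly 60% to roughly 95%.
    print(f"per-step {p:.3f} -> 50-step success {task_success_rate(p, 50):.1%}")
```

The independence assumption is obviously too simple for real work, but it shows why, without baseline human error rates, we cannot tell how close any given domain is to that kind of flip.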