7 Summers in the First Scaling Age
reflections on ai progress
There were five and a half years between “Attention Is All You Need” and the release of ChatGPT. Another three years have passed since.
I’ve spent most of today trying to string together words that somehow make sense of these years. When “Attention” dropped I was 19, working in an industrial ML lab on document parsing.
Language modeling was undoubtedly a cultural backwater then, especially in healthcare. IBM Watson had just incinerated a few billion dollars during the first great NLP fever dream, and the idea that even a few million dollars would reenter this field seemed outlandish.
The next summer the generative pre-training paper dropped. I remember it being forwarded to me by the then-boyfriend of a then-college hallmate who ran a document processing company in a Boston industrial lowrise.
If you squinted, the two papers together formed enough of a picture of a research agenda to get really, really good industrial NLP. You could imagine that if you curated datasets, and maybe even found enough money to string a few GPUs together, you might get something useful.
I quit my job that fall to work on models.
The spiritual center of the “AI is kinda real” community at this point was a set of Berkeley grouphouses that were convinced they had solved human psychology. It would still be a few more years before they collapsed in a demon summoning scandal.
And still another two years from that point until scaling laws became clear.
And still much, much longer until ChatGPT.
That gap is what haunts me. Five and a half years was an eternity, but we talk about it now like it was a straight line.
It wasn’t. It was a wandering path with huge amounts of capital incinerated, firms destroyed, and dozens of dead ends.
Now everyone assumes the deployment phase will be instant. That because we have the intelligence, the economy will just fluidly reshape itself around it.
But when I look at the systems we’re trying to inject this stuff into, human flesh-and-blood processes, it’s impossible not to feel that same sense of time dilation.
Getting the model to work was a technology problem. Getting the world to work with it is anything but.
The road to actual economic diffusion is going to be so much longer than capital markets will allow for. Except this time it’s not IBM burning a few billion. It’s everyone. Every mega-cap. Every startup. Trillions in market cap betting on deployment timelines that assume human organizations behave like technology products.
They don’t. They won’t.
Say a prayer.

