LLMs: The gift that keeps on giving
It did not have to be like this
It strikes me that many promising results in AI are results that did not need to happen. Large language models just start exhibiting “intelligence” beyond a certain scale (Emergent Abilities of Large Language Models), and that intelligence is unleashed through simple instruction tuning. It is not obvious why, but this was the key to the breakthrough OpenAI made in 2022 with InstructGPT - a breakthrough that will be talked about for centuries to come.
I’ve had a similar feeling about a few other results since ChatGPT. One is that LLMs are in-context learners: they can learn functions from just a few examples provided in context (Language Models are Few-Shot Learners). This has immense implications because it challenges the paradigm of machine learning. Machine learning used to work like this: you have a problem, there is something to predict, you collect enough data, set up training infrastructure, hire some ML folks, and get them to train and deploy a model. But LLMs can learn in context. If you can collect even a little bit of data and feed it into the LLM’s prompt, you’ve got yourself a model. Again, this did not need to be true. No one asked for this. It just is.
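To make that concrete, here is a minimal sketch of in-context learning. The `llm` function is a hypothetical stand-in for whatever completion API you use; the point is that the labeled examples live entirely in the prompt, and no weights are ever updated.

```python
# In-context learning: the "training set" is a handful of examples in the prompt.
# `llm` is a hypothetical stand-in for any text-completion API.

def llm(prompt: str) -> str:
    raise NotImplementedError("plug in your model API here")

def classify_sentiment(text: str) -> str:
    # Two labeled examples stand in for a trained classifier.
    prompt = (
        "Classify the sentiment of each review as positive or negative.\n\n"
        "Review: The battery died after two days.\n"
        "Sentiment: negative\n\n"
        "Review: Setup took thirty seconds and it just works.\n"
        "Sentiment: positive\n\n"
        f"Review: {text}\n"
        "Sentiment:"
    )
    return llm(prompt).strip()
```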
The other result I put in the same category is that LLMs can be used as optimizers for their own context (Large Language Models as Optimizers). They can introspect on what they did and come up with better approaches just by examining their context and the quality of their own previous outputs (as judged by a verifier).
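In rough Python, the loop looks something like the sketch below. This is my own simplification, not the paper’s method: `llm` is again a hypothetical completion function, and `verify` is whatever scorer the task affords (unit tests, exact-match checks, a reward model).

```python
def self_optimize(task: str, llm, verify, steps: int = 5) -> str:
    """Iteratively revise an answer using the verifier's feedback."""
    best = llm(f"Task: {task}\nAnswer:")
    best_score = verify(best)
    for _ in range(steps):
        # The model introspects on its own previous output and its score.
        critique = llm(
            f"Task: {task}\n"
            f"Previous answer: {best}\n"
            f"Verifier score: {best_score}\n"
            "What was wrong with this answer, and how can it be improved?"
        )
        revised = llm(
            f"Task: {task}\n"
            f"Previous answer: {best}\n"
            f"Critique: {critique}\n"
            "Improved answer:"
        )
        score = verify(revised)
        if score > best_score:
            best, best_score = revised, score
    return best
```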
All three of these results are remarkable. The first has largely sunk in, but the latter two - that LLMs can learn without retraining through in-context learning, and that they can optimize themselves by looking at their own work - are where I find a lot of interesting things happening. The manifestations are projects like GEPA (Genetic-Pareto), ACE (Agentic Context Engineering), and AlphaEvolve. In all of these, LLMs iteratively improve by generating solutions and evolving them through feedback on their work from a verifier.
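Stripped of their individual details, these systems share an evolutionary skeleton, sketched below. This is a simplification of the common pattern, not the actual GEPA, ACE, or AlphaEvolve code: keep a population of candidates, score them with a verifier, and let the LLM act as the mutation operator.

```python
import random

def evolve(seed: str, llm, verify, generations: int = 10, pop_size: int = 8) -> str:
    """Generate-verify-select loop with the LLM as mutation operator."""
    population = [seed]
    for _ in range(generations):
        # Rank candidates by verifier score; keep the top half as survivors.
        ranked = sorted(population, key=verify, reverse=True)
        survivors = ranked[: max(1, pop_size // 2)]
        # Ask the LLM to propose improved variants of strong parents.
        children = [
            llm(
                f"Here is a candidate solution:\n{parent}\n"
                "Propose an improved variant:"
            )
            for parent in random.choices(survivors, k=pop_size - len(survivors))
        ]
        population = survivors + children
    return max(population, key=verify)
```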
With gains from pretraining being squeezed out, I think test-time scaling - running many LLM calls with adaptive contexts and smart verifiers - is likely the next frontier.
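The simplest form of test-time scaling is best-of-N sampling: spend more compute at inference by drawing many candidates and letting a verifier pick the winner. A toy sketch, again with hypothetical `llm` and `verify` functions:

```python
def best_of_n(task: str, llm, verify, n: int = 16) -> str:
    """Trade inference-time compute for quality: sample N answers, keep the best."""
    candidates = [llm(f"Task: {task}\nAnswer:") for _ in range(n)]
    return max(candidates, key=verify)
```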


In-context learning is indeed quite fascinating. But even with in-context learning and self-optimization, we still seem to lack any theoretical understanding of error bounds for LLMs.
I'm interested in seeing how, whether, and where tools that are probably / maybe / likely / most-of-the-time correct get productized, and whether they add real value over deterministic but non-exhaustive techniques.