Part VII: Frontier Techniques & Future
Chapter 35

Open Problems & Future Directions

Interpretability, alignment, reasoning, and what's next
18 Exercises
35.1

We have reached the final chapter. Across thirty-four chapters you have built a complete, working understanding of large language models — from the linear algebra of Part I to the autonomous agents of Chapter 34. It would be natural to feel that the story is complete. It is not. This chapter is an honest reckoning with how much about these systems remains genuinely UNKNOWN and UNSOLVED — and why that is the most exciting part.

A Field Built Ahead of Its Understanding

Here is a humbling truth: large language models WORK far better than anyone can fully EXPLAIN. We can build them, scale them, and align them — you now know how — but we do not deeply understand WHY they work as well as they do, what they have actually learned, or how they do what they do internally. The engineering has outrun the science. We are in the unusual position of deploying a transformative technology we only partially comprehend.

⚠️
Open Problem: The Central Surprise
Perhaps the deepest open problem is also the most basic: WHY does next-token prediction on internet text, scaled up, produce systems that can reason, code, converse, and pursue goals? Nothing in the training objective obviously demands these capabilities — they EMERGED from scale in ways no theory predicted or fully explains. We discovered that they work before we understood why, and that gap between capability and comprehension underlies almost every open problem in this chapter.
This should inspire humility and excitement in equal measure. The field is young, its foundations are still being laid, and many of its most important questions are wide open. If you have absorbed this book, you are equipped not just to USE these systems but to help ANSWER these questions — which is what the rest of the chapter invites you to do.

A Map of the Open Frontier

This chapter surveys the major open problems, grouped loosely: understanding the systems (interpretability), controlling them (alignment, oversight), making them dependable (reliable reasoning, hallucination, continual learning), the deep questions (world models, understanding), the practical frontiers (sample efficiency, evaluation, efficiency), and the broader picture (societal impact, the road ahead). Each is an active research area where much remains to be discovered — and where the next generation of researchers and engineers, possibly including you, will make their mark.

Open problem | The core question
Interpretability | What is actually happening inside the model?
Alignment & oversight | How do we control systems that may exceed us?
Reliable reasoning | Why does reasoning still fail unpredictably?
Hallucination | Why do models confidently state falsehoods?
Continual learning | Why can't models keep learning after training?
World models | Do models truly understand, or just pattern-match?
Sample efficiency | Why do models need so much more data than humans?
Evaluation | How do we measure systems near or beyond our level?
35.2

We can build a model with hundreds of billions of parameters, but we cannot read it. We do not know, in any deep way, what those parameters have learned or how the model arrives at a given output. INTERPRETABILITY — the science of understanding the internals of neural networks — is one of the most important open problems, because so much else (safety, trust, debugging, control) depends on being able to see inside the box.

Why the Black Box Is a Problem

A model's knowledge and reasoning live in billions of inscrutable numbers. When a model makes a decision, hallucinates, refuses, or behaves unexpectedly, we usually cannot say WHY at a mechanistic level. This matters enormously: we cannot fully trust what we cannot understand, cannot reliably predict failures we cannot see coming, and cannot confidently align a system whose inner workings are opaque. The black box is at the root of many other open problems.

Interpretability
The study of understanding the internal mechanisms of neural networks — what their representations encode and how they compute their outputs — with the goal of making them transparent, predictable, and trustworthy.

Mechanistic Interpretability: Progress and Limits

A promising research program, MECHANISTIC INTERPRETABILITY, tries to reverse-engineer the actual algorithms a model implements: identifying 'circuits' of neurons that perform specific functions, and 'features' that represent specific concepts. There has been real progress. Researchers have found features that track recognizable concepts, identified circuits for simple tasks, and developed tools such as sparse autoencoders, which re-express a layer's activations as a larger set of sparsely active features that are easier to interpret. But scaling this understanding to a full frontier model's behaviour remains far off.
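To make the sparse-autoencoder idea concrete, here is a minimal sketch of its forward pass and training loss in plain Python. It is a toy under stated assumptions, not the method of any particular paper: the function names (`sae_forward`, `sae_loss`), the L1 coefficient, and the hand-picked weights are all hypothetical, and a real sparse autoencoder would learn its weights by gradient descent over many recorded activations.

```python
def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def sae_forward(x, W_enc, b_enc, W_dec, b_dec):
    """Encode an activation vector into features, then reconstruct it."""
    pre = [p + b for p, b in zip(matvec(W_enc, x), b_enc)]
    features = [max(0.0, p) for p in pre]  # ReLU keeps features non-negative
    recon = [r + b for r, b in zip(matvec(W_dec, features), b_dec)]
    return features, recon

def sae_loss(x, features, recon, l1_coeff=0.01):
    """Reconstruction error plus an L1 penalty that pushes features toward zero."""
    mse = sum((a - b) ** 2 for a, b in zip(x, recon)) / len(x)
    return mse + l1_coeff * sum(abs(f) for f in features)

# Hand-picked toy weights (hypothetical): a 2-d activation, 4 candidate features.
W_enc = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
b_enc = [0.0, 0.0, 0.0, 0.0]
W_dec = [[1.0, 0.0, -1.0, 0.0], [0.0, 1.0, 0.0, -1.0]]
b_dec = [0.0, 0.0]

x = [0.5, -2.0]
features, recon = sae_forward(x, W_enc, b_enc, W_dec, b_dec)
loss = sae_loss(x, features, recon)
print(features, recon, loss)
```

In this toy, only two of the four features are active for the input, and the reconstruction is exact, so the loss consists entirely of the sparsity penalty. The design tension in practice is the same one visible here: a larger penalty yields sparser, more readable features but a worse reconstruction.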

⚠️
Open Problem: Can We Ever Fully Understand a Frontier Model?
Mechanistic interpretability has made genuine strides — we can now identify some meaningful features and circuits inside models. But a frontier model has billions of parameters interacting in superposed, distributed ways, and we are nowhere near a complete account of how one produces its behaviour. It is an open question whether full mechanistic understanding of such systems is even ACHIEVABLE, or whether their complexity fundamentally outstrips our ability to comprehend them in detail.
This is among the most consequential open problems, because interpretability underpins safety and trust. If we could truly see inside models — detect when they are deceptive, understand why they fail, verify their reasoning — many other problems would soften. Progress here would ripple through the entire field, which is why it attracts intense effort and why it is such a valuable area to work in.
35.3

Part V taught how we align models today — SFT, RLHF, DPO, Constitutional AI. But these methods rest on a foundation that gets shakier as models get more capable: they rely on HUMANS being able to judge the model's outputs. What happens when models become so capable that humans can no longer reliably evaluate their work? This is the open problem of SCALABLE OVERSIGHT, and it sits at the heart of long-term alignment.

The Oversight Problem

Today, alignment works because humans can tell good outputs from bad ones — we can judge whether an answer is helpful, whether code is correct, whether a summary is faithful. But as models tackle problems beyond human expertise — proving theorems we can't follow, writing code too complex to fully review, reasoning about domains we don't understand — how do we provide the feedback that alignment requires? We cannot reward what we cannot evaluate. This is the scalable-oversight problem: aligning systems whose outputs we can no longer reliably judge.

Scalable oversight
The problem of providing reliable training signal and evaluation for AI systems whose capabilities meet or exceed humans' ability to judge their outputs — a central open challenge for aligning increasingly capable models.

Proposed Approaches (All Unproven)

Approach | Idea
Debate | Have models argue opposing sides; humans judge the debate, not the task
Recursive reward modeling | Use AI to help humans evaluate AI
Weak-to-strong generalization | Can weak supervisors elicit strong models' abilities?
Constitutional / AI feedback | Models critique via principles (Ch. 26), but who checks?
Interpretability-based | Verify reasoning by inspecting internals (§35.2)
⚠️
Open Problem: How Do We Align What We Can't Evaluate?
Scalable oversight is one of the deepest unsolved problems in AI safety. Every alignment method in Part V ultimately grounds out in human judgment somewhere; as models exceed human judgment, that grounding weakens. The proposed solutions — debate, recursive reward modeling, weak-to-strong generalization — are promising research directions, but none is proven to work for genuinely superhuman systems. We do not yet know how to reliably align a system smarter than its overseers.
This connects to the safety themes of Chapter 26: as capabilities grow, the stakes of getting alignment right grow with them, while the difficulty of providing oversight also grows. It is a problem the field must solve BEFORE, not after, building systems that exceed human judgment — which is why so much careful work is going into it now, and why it is among the most important problems a researcher could work on.
35.4

Chapter 25 showed models that reason impressively: solving competition math, complex coding, multi-step problems. Yet that reasoning remains UNRELIABLE in frustrating ways: a model that aces a hard problem may fail a similar easy one, make basic errors, or produce plausible-looking steps that end in a wrong answer. Making reasoning genuinely RELIABLE (trustworthy across the board, not just impressive on average) is a major open problem.

The Reliability Gap

Current reasoning has a peculiar character: it is impressive but brittle. Models can solve problems that stump most humans, then stumble on trivial variations. They are sensitive to phrasing, can be derailed by irrelevant details, and sometimes produce confident reasoning that is subtly or grossly wrong. The reasoning is real but not ROBUST — we cannot yet count on it the way we count on a calculator. Closing this reliability gap is essential for high-stakes uses.
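One way to put a number on this gap is to compare ordinary accuracy with a stricter score that only credits a problem when the model gets it right under every rephrasing. The sketch below assumes you already have, for each problem, a list of pass/fail outcomes across several paraphrases. The metric names and the `results` data are hypothetical and invented purely for illustration.

```python
def average_accuracy(results):
    """Fraction of all (problem, paraphrase) attempts answered correctly."""
    total = sum(len(attempts) for attempts in results)
    return sum(sum(attempts) for attempts in results) / total

def robust_accuracy(results):
    """Fraction of problems answered correctly under every paraphrase."""
    return sum(all(attempts) for attempts in results) / len(results)

# Hypothetical outcomes: 4 problems, 3 paraphrases each (True = correct).
results = [
    [True, True, True],
    [True, False, True],
    [True, True, False],
    [False, False, False],
]

avg = average_accuracy(results)
robust = robust_accuracy(results)
print(f"average accuracy: {avg:.3f}, robust accuracy: {robust:.3f}")
```

On this made-up data the model looks moderately capable on average, yet only one problem in four survives every rephrasing. The distance between the two numbers is one simple operationalization of "impressive but brittle".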

⚠️
Open Problem: Is It Reasoning or Sophisticated Pattern-Matching?
A live debate underlies the reliability problem: when a model 'reasons', is it performing genuine logical inference, or very sophisticated pattern-matching over reasoning-shaped text it saw in training? The evidence is mixed — models generalize impressively to novel problems (suggesting real reasoning) yet fail in ways that pattern-matching would predict (suggesting shallow imitation). The truth is likely somewhere between, and pinning it down matters: it determines how far we can trust and extend these abilities.
This connects to CoT faithfulness (Chapter 25): a model's stated reasoning doesn't always reflect its actual computation, so we can't fully trust the chain of thought as an explanation. Whether models can be made to reason reliably and faithfully — and whether the verifiable-reward approach of Chapter 25 extends beyond math and code to fuzzier domains — are open questions at the heart of where the field goes next.

Beyond Verifiable Domains

Reasoning has improved most where rewards are VERIFIABLE — math and code, where an answer can be checked (Chapter 25). The open challenge is extending reliable reasoning to domains WITHOUT clean verification: legal reasoning, medical judgment, strategic planning, ethical deliberation, scientific hypothesis. Without a clear correctness signal to train against, it is much harder to make reasoning reliable. How to get trustworthy reasoning in unverifiable domains is one of the most important open questions.

35.5

Models HALLUCINATE — they confidently generate plausible-sounding information that is simply false. Despite RAG (Chapter 29), better training, and much research, hallucination is not solved, and it is one of the biggest barriers to trusting models in high-stakes settings. Understanding why it happens — and why it is so hard to eliminate — reveals a deep open problem.

Why Models Hallucinate

Hallucination is rooted in how models work. A model is trained to produce PLAUSIBLE continuations, not TRUE ones — truth and plausibility usually coincide in training data, but not always. The model has no built-in distinction between what it knows and what it is confabulating; it generates fluent text either way. And it is poorly CALIBRATED — its confidence (fluency, assertiveness) doesn't reliably track its actual accuracy. So it states falsehoods with the same confidence as truths.
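Calibration can be measured. A common summary statistic is expected calibration error (ECE): group answers into bins by stated confidence, then compare each bin's average confidence with its actual accuracy. The sketch below assumes each answer comes with a confidence score in [0, 1], which is itself a nontrivial assumption for an LLM. The example data is hypothetical.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average gap between confidence and accuracy across bins."""
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the top bin
        bins[idx].append((conf, ok))

    n = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / n) * abs(avg_conf - accuracy)
    return ece

# Hypothetical overconfident model: claims 95% confidence, is right half the time.
overconfident = expected_calibration_error(
    [0.95, 0.95, 0.95, 0.95], [True, True, False, False]
)
# Hypothetical calibrated model: claims 75% confidence, is right 3 times in 4.
calibrated = expected_calibration_error(
    [0.75, 0.75, 0.75, 0.75], [True, True, True, False]
)
print(overconfident, calibrated)
```

An ECE near zero means stated confidence tracks accuracy. The overconfident example scores far from zero because it asserts falsehoods and truths with the same high confidence, which is exactly the failure described above.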

⚠️
Open Problem: Can Hallucination Be Solved, or Only Managed?
It is an open question whether hallucination can be ELIMINATED or only MANAGED. Some argue it is intrinsic to how generative models work — a model that can creatively generate language can always generate plausible falsehoods — so the goal should be management (grounding via RAG, calibration, abstention, verification) rather than elimination. Others pursue training and architectural changes aimed at models that reliably know and respect the boundary of their own knowledge.
Closely tied is CALIBRATION: getting a model's expressed confidence to match its actual accuracy, so it 'knows what it doesn't know' and can say so. A well-calibrated model that reliably abstains or hedges when uncertain would mitigate much of hallucination's harm. Achieving robust calibration and honest uncertainty in LLMs remains unsolved — and central to making them trustworthy.
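The "management" strategy of abstaining when uncertain can be sketched as a simple threshold rule: answer only when confidence clears a bar, and track both how often the model answers (coverage) and how often those answers are right. As before, the confidence scores, the threshold values, and the function name are assumptions for illustration, and the rule only helps to the extent that confidence is actually informative.

```python
def selective_accuracy(confidences, correct, threshold):
    """Return (coverage, accuracy on answered items) for a confidence threshold."""
    answered = [ok for conf, ok in zip(confidences, correct) if conf >= threshold]
    coverage = len(answered) / len(confidences)
    accuracy = sum(answered) / len(answered) if answered else None
    return coverage, accuracy

# Hypothetical scores where higher confidence does go with correctness.
confidences = [0.9, 0.8, 0.6, 0.3]
correct = [True, True, False, False]

always_answer = selective_accuracy(confidences, correct, threshold=0.0)
cautious = selective_accuracy(confidences, correct, threshold=0.7)
print(always_answer, cautious)
```

Raising the threshold trades coverage for accuracy. With a poorly calibrated model the trade is much worse, because the wrong answers are not concentrated at low confidence and abstention discards good answers along with bad ones.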
35.6

A trained model is FROZEN. Its knowledge stops at its training cutoff, and it cannot learn from experience the way humans do — it doesn't remember yesterday's conversation or improve from its mistakes unless explicitly retrained. CONTINUAL LEARNING — the ability to keep learning after deployment, incorporating new knowledge and experience without forgetting old — is a fundamental open problem.

The Frozen-Model Problem

Today's models are static snapshots. To update a model's knowledge or fix its mistakes, you must retrain or fine-tune it — expensive, slow, and risky (fine-tuning can degrade other abilities). The model cannot simply LEARN a new fact, remember a correction, or accumulate skill from use. We work around this with RAG (external knowledge), long context (working memory), and agent memory (Chapter 34), but these are patches on the underlying limitation: the model itself does not learn after training.

Catastrophic Forgetting

The core technical obstacle is CATASTROPHIC FORGETTING (Chapter 22): when you train a neural network on new information, it tends to OVERWRITE old knowledge — learning the new while forgetting the old. This makes naive continual learning destructive. Humans integrate new knowledge without erasing the old; neural networks, by default, do not. Solving continual learning means solving forgetting — letting a model accumulate knowledge gracefully over time.
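The overwriting effect can be seen even in a caricature. The sketch below trains a one-parameter linear model on a hypothetical task A, then on a conflicting task B, and measures the loss on task A before and after. Everything here (the tasks, learning rate, and epoch count) is invented for illustration. Real networks forget for the same underlying reason, shared parameters being pulled toward the new objective, but with far more parameters and less obviously conflicting tasks.

```python
def mse(w, data):
    """Mean squared error of the model y = w * x on a dataset."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

def train(w, data, lr=0.1, epochs=50):
    """Plain full-batch gradient descent on the squared error."""
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

xs = [-1.0, -0.5, 0.5, 1.0]
task_a = [(x, 2.0 * x) for x in xs]   # task A: y = 2x
task_b = [(x, -1.0 * x) for x in xs]  # task B: y = -x

w = train(0.0, task_a)
loss_a_before = mse(w, task_a)  # task A has been learned well

w = train(w, task_b)            # now train only on task B
loss_a_after = mse(w, task_a)   # task A performance collapses

print(f"task A loss before: {loss_a_before:.6f}, after: {loss_a_after:.3f}")
```

Nothing in the second training phase tells the model to preserve task A, so it does not. Continual-learning methods amount to different ways of adding that missing constraint.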

⚠️
Open Problem: How Can Models Learn Continually Like We Do?
Humans learn continuously throughout life, integrating new experiences with old knowledge seamlessly. Models cannot — they learn once, then freeze, and updating them risks catastrophic forgetting. How to give models the ability to keep learning after deployment — absorbing new facts, correcting errors, accumulating skills, all without forgetting — is a major unsolved problem. The current patches (RAG, memory) help with knowledge but don't give true ongoing learning of skills and understanding.
Solving continual learning would be transformative: models that improve from use, stay current without retraining, and personalize to users over time. It connects to memory (Chapters 33–34), to sample efficiency (§35.8), and to the basic question of how learning should work in these systems. It is a frontier where neuroscience, learning theory, and engineering meet — and where breakthroughs would change what AI systems can be.
35.7

Beneath the practical problems lies a deep, almost philosophical question that the field genuinely disagrees about: do LLMs UNDERSTAND the world, or are they sophisticated mimics of language patterns? Whether models build genuine WORLD MODELS — internal representations of how the world actually works — is both a scientific question and a key to predicting their future capabilities.

The Two Camps

On one side: models are 'just' predicting the next token — statistical pattern-matchers with no real understanding, producing fluent text that mimics comprehension without possessing it ('stochastic parrots'). On the other side: to predict text well enough, models must have LEARNED genuine structure about the world — implicit models of physics, causality, other minds, and logic — because you cannot reliably predict descriptions of a world without modeling that world. The evidence is genuinely mixed, and thoughtful researchers disagree.

“Just pattern-matching” | “Genuine world models”
Predicts text statistically | Must model the world to predict it
Fails in revealing, shallow ways | Generalizes to genuinely novel cases
No grounding in reality | Learns structure: physics, causality, minds
Mimics understanding | Has emergent understanding
'Stochastic parrot' | 'Implicit world model'
⚠️
Open Problem: What, If Anything, Do LLMs Understand?
This is perhaps the most fascinating open question, and it is partly EMPIRICAL and partly CONCEPTUAL. Empirically: do models build internal world models? Some interpretability work finds structured internal representations (e.g. of space, of game boards) suggesting more than surface mimicry — but it is far from settled. Conceptually: what would it even MEAN for a model to 'understand', and how would we know? We lack agreed definitions and tests for understanding.
The answer matters enormously for the future. If models genuinely build world models, scaling and improving them may yield ever-deeper understanding and capability. If they are fundamentally limited pattern-matchers, there may be a ceiling that scale alone cannot break. Where the truth lies — likely a nuanced middle — shapes how far the current paradigm can go, which is itself a central open question (§35.11).
35.8

A striking gap between models and humans: SAMPLE EFFICIENCY. A model must read a substantial fraction of the internet to become competent; a child learns language from a tiny fraction of that exposure. Models are vastly less data-efficient than human brains, and this connects to a looming practical limit — the 'data wall'.

The Efficiency Gap

Humans learn language, physics, and reasoning from orders of magnitude less data than LLMs require. A person encounters somewhere between tens of millions and roughly a hundred million words growing up; a large model trains on trillions of tokens. This enormous gap suggests current learning methods are deeply INEFFICIENT compared to whatever the brain does. Closing it, by building models that learn far more from far less, is both a scientific puzzle and a practical necessity.
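The size of the gap is easy to work out from round numbers. The figures below are order-of-magnitude assumptions rather than measurements: roughly 10^7 to 10^8 words for a person, and roughly 10^12 to 10^13 tokens for a large model. The calculation also treats words and tokens as comparable units, which is only approximately true.

```python
import math

def orders_of_magnitude_gap(model_tokens, human_words):
    """How many powers of ten separate model and human language exposure."""
    return math.log10(model_tokens / human_words)

# Assumed round figures; neither end of either range is a measured value.
smallest_gap = orders_of_magnitude_gap(model_tokens=1e12, human_words=1e8)
largest_gap = orders_of_magnitude_gap(model_tokens=1e13, human_words=1e7)

print(f"gap: about {smallest_gap:.0f} to {largest_gap:.0f} orders of magnitude")
```

Even at the conservative end, the model sees about ten thousand times more text than a person does.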

The Data Wall

The practical urgency comes from the DATA WALL (foreshadowed in Chapter 16): scaling laws say more data improves models, but the supply of high-quality human-generated text is FINITE, and the largest models have already consumed much of it. We may be approaching the point where simply 'train on more data' stops being possible — there isn't enough high-quality data left. This makes sample efficiency not just interesting but ESSENTIAL: future progress may depend on learning more from the data we have.

⚠️
Open Problem: How Do We Learn More From Less?
The data wall and the human-efficiency gap point to the same open problem: how to make learning far more sample-efficient. Possible directions include better learning algorithms, synthetic data (models generating their own training data — promising but risky, as it can amplify errors), learning from richer signals than raw text (interaction, multimodality, embodiment), and entirely new training paradigms. Whether the current approach can break the data wall, or whether a new paradigm is needed, is open.
This connects to scaling laws (Chapter 16): the field's recent progress leaned heavily on scaling data and compute, but data is finite and that lever is running out. The next era of progress may hinge on efficiency — getting more capability per token of data and per FLOP of compute — rather than brute scale. That shift would reward exactly the kind of deep understanding this book has tried to build.
35.9

How do we know if a model is good? Chapter 21 covered evaluation, but at the frontier, evaluation is in something of a CRISIS. As models grow more capable, our ability to MEASURE them meaningfully is breaking down — a problem that touches benchmarks, contamination, and the deep difficulty of judging systems approaching human capability.

Why Evaluation Is Breaking Down

Problem | What goes wrong
Benchmark saturation | Models max out benchmarks, which stop discriminating
Contamination | Test data leaks into training; scores are inflated
Gaming | Models optimized for benchmarks, not real capability
Hard to judge | Tasks beyond evaluators' ability to assess (§35.3)
Narrow metrics | Benchmarks miss what actually matters in real use
Construct validity | Unclear if a benchmark measures the intended ability

The symptoms compound. Models saturate benchmarks faster than we can build new ones, so a near-perfect score no longer distinguishes the best models. Test sets leak into training data (contamination), inflating scores. Optimizing for benchmarks (Goodhart's law again, from Chapter 23) produces models that ace tests but disappoint in practice. And the hardest tasks — the ones we most want to measure — are exactly the ones humans struggle to evaluate.
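Contamination, at least in its crudest form, can be checked mechanically by looking for long word sequences that a test item shares with the training corpus. The sketch below is a deliberately simple version under stated assumptions: whitespace tokenization, lowercasing as the only normalization, and a short n-gram length chosen to suit the toy strings. The function names and example texts are hypothetical, and exact-match checks like this miss paraphrased or translated leaks.

```python
def ngrams(text, n):
    """Set of all n-word sequences in a text (lowercased, whitespace-split)."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def contamination_rate(test_items, training_corpus, n=8):
    """Fraction of test items sharing at least one n-gram with the training data."""
    train_grams = set()
    for doc in training_corpus:
        train_grams |= ngrams(doc, n)
    flagged = [item for item in test_items if ngrams(item, n) & train_grams]
    return len(flagged) / len(test_items), flagged

# Hypothetical miniature corpus and benchmark.
training_corpus = ["the quick brown fox jumps over the lazy dog near the river bank"]
test_items = [
    "Q: the quick brown fox jumps over what?",
    "what is the capital of france",
]

rate, flagged = contamination_rate(test_items, training_corpus, n=4)
print(rate, flagged)
```

A flagged item is a reason to look more closely, not proof of leakage, and an unflagged item is not proof of cleanliness. That asymmetry is part of why contamination remains hard to rule out.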

⚠️
Open Problem: How Do We Measure Systems Approaching Our Own Level?
Evaluation is a quietly critical open problem: if we cannot reliably MEASURE capability, we cannot reliably track progress, compare models, catch regressions, or know when systems become dangerous. As models approach and exceed human performance on more tasks, traditional benchmarks fail, and we need fundamentally better ways to evaluate — perhaps dynamic, adversarial, or interpretability-based evaluation, or measuring real-world impact rather than benchmark scores.
This connects to scalable oversight (§35.3): both are about judging systems near or beyond our level. Good evaluation is foundational — nearly every other open problem is harder to make progress on if we can't measure it. Building trustworthy evaluation for frontier systems is unglamorous but essential work, and a place where careful researchers can have outsized impact.
35.10

Beyond the scientific puzzles lie practical frontiers that shape who can use and build AI. The most capable models are enormously expensive to train and run, concentrating them in the hands of a few well-resourced organizations. Efficiency — doing more with less compute, memory, and energy — is both an open research area and a question of ACCESS and equity.

The Efficiency Frontier

Much of Parts VI–VII was about efficiency (quantization, MoE, efficient attention, distillation), yet enormous headroom remains. Frontier models cost many millions of dollars to train and a great deal to serve, with significant energy and environmental footprints. Pushing the efficiency frontier (smaller models that match larger ones, cheaper training, lower-energy inference) would democratize access and reduce the resource concentration that currently defines the field.

⚠️
Open Problem: Can Capable AI Be Made Broadly Accessible?
The concentration of frontier AI in a few organizations with vast compute raises open questions that are technical AND societal. Technically: how small and cheap can we make models while preserving capability? Distillation, quantization, better architectures, and efficient training all push this frontier, and small models keep getting surprisingly capable. Societally: who gets to build and control these systems, and how do we ensure the benefits are broadly shared rather than concentrated?
This matters for the field's health and fairness. A world where only a handful of actors can build frontier AI is different from one where capable models are widely accessible. The efficiency research in this book is not just about saving money — it is part of what determines how open, competitive, and equitable the future of AI will be. It is a frontier where engineering progress has direct social consequences.
35.11

Stepping back from the specific problems, there is one overarching open question that the whole field is implicitly betting on: will the CURRENT PARADIGM — large Transformers, trained on vast data, scaled up, aligned, and extended with tools and reasoning — continue to improve all the way to whatever we are aiming for? Or will it hit fundamental limits that require a new approach?

The Bull and Bear Cases

The optimistic view: the paradigm has repeatedly surprised us, scaling has kept delivering, and each apparent limit (reasoning, long context, multimodality) has fallen to more scale and clever engineering — so it may continue, perhaps reaching transformative capability. The skeptical view: we see signs of diminishing returns, the data wall looms, reasoning remains brittle, and the deep problems (understanding, continual learning) may need genuinely new ideas, not just bigger Transformers. Both views are held by serious people.

The paradigm continues if... | A new paradigm is needed if...
Scaling keeps delivering gains | Returns to scale flatten out
Efficiency breaks the data wall | The data wall proves binding
Reasoning becomes reliable with scale | Reasoning stays fundamentally brittle
World models emerge from prediction | Pattern-matching hits a ceiling
Engineering solves the rest | Deep problems need new ideas
⚠️
Open Problem: How Far Can This Paradigm Go?
This is the question whose answer no one knows, and that uncertainty is part of what makes the field so consequential to work in. We are running a grand experiment: scaling and refining a single paradigm to see how far it reaches. It might take us to systems of extraordinary capability, or it might plateau and await the next conceptual breakthrough (as deep learning itself once awaited the ideas that unlocked it). History offers both patterns: paradigms that kept delivering, and paradigms that stalled until reconceived.
Whatever the answer, understanding the current paradigm deeply — as you now do — is the prerequisite for pushing it further OR for seeing past it to what comes next. The people who advance the field, in either direction, will be those who understand today's methods well enough to know their real limits. That understanding is exactly what this book set out to give you.
35.12

Having surveyed the open problems, let us end this chapter constructively: where might the next advances come from, and how can YOU contribute? The frontier is not a closed club — it is an open field with more important questions than people working on them, and the foundations you now have are exactly what it takes to engage.

Fertile Directions

Direction | Why it's promising
Interpretability | Understanding internals would unlock safety and trust
New architectures | Beyond Transformers: SSMs, hybrids, the unknown
Reasoning & verification | Extending reliable reasoning beyond math/code
Sample efficiency | Learning more from less, past the data wall
Alignment & oversight | Aligning systems we can't fully evaluate
Continual learning | Models that keep learning after deployment
Agents & tool use | Reliable autonomy in the real world
Evaluation science | Measuring frontier capability meaningfully

How You Can Contribute

The barrier to contributing is lower than it looks. Many breakthroughs came from careful experiments, open-source contributions, and fresh perspectives — not only from huge labs. You can run experiments on small models that reveal real phenomena, contribute to open tools and datasets, study interpretability on accessible models, build and evaluate agents, reproduce and probe published results, and bring ideas from other fields. The deep understanding this book provides is the foundation; curiosity and rigor are the rest.

⚠️
Open Problem: The Frontier Needs You
Every open problem in this chapter is a place where progress is needed and possible. The field is young, the questions are wide open, and there are far more important problems than there are people equipped to work on them. You are now equipped — you understand these systems from their mathematical foundations to their frontier behaviours. That understanding is precisely what it takes to push the field forward, whether in research, engineering, safety, or application.
Wherever you go from here — building products, doing research, ensuring safety, or simply understanding a transformative technology — you carry a complete mental model of how these systems work and where they fall short. That is a rare and valuable thing. The open problems above are not just challenges to admire; they are invitations to contribute. The frontier is genuinely open, and there is a place on it for you.
35.13

Open-Problems Quick-Reference

Open problem | The core unanswered question
Interpretability | What is actually happening inside the model?
Scalable oversight | How do we align what we can't evaluate?
Reliable reasoning | Why does reasoning fail unpredictably?
Hallucination | Can it be solved, or only managed?
Continual learning | How can models keep learning like we do?
World models | Do models truly understand?
Sample efficiency | How do we learn more from less (the data wall)?
Evaluation | How do we measure systems near our level?
Efficiency & access | Can capable AI be made broadly accessible?
The paradigm | How far can the current approach go?

Reflections

This final chapter has no coding exercises — instead, reflections to carry forward. These are open questions without settled answers; engaging with them thoughtfully is part of becoming a mature practitioner.

Exercise 1: Reflection
Why do LLMs work better than we can explain? What does this gap between capability and understanding imply for how we should deploy them?
Exercise 2: Reflection
Why is interpretability foundational to so many other problems (safety, trust, debugging)? What would change if we could fully see inside models?
Exercise 3: Reflection
Explain the scalable-oversight problem in your own words. Why must it be solved before, not after, building systems that exceed human judgment?
Exercise 4: Reflection
Is current model 'reasoning' genuine inference or sophisticated pattern-matching? Marshal the evidence on both sides and state your view.
Exercise 5: Reflection
Can hallucination be eliminated, or only managed? What would a well-calibrated model that 'knows what it doesn't know' look like?
Exercise 6: Reflection
Why can't current models learn continually? What would change about AI if continual learning were solved?
Exercise 7: Reflection
Do LLMs build genuine world models, or are they 'stochastic parrots'? What evidence would change your mind either way?
Exercise 8: Reflection
Explain the data wall and the human-efficiency gap. Why might sample efficiency, not scale, define the next era?
Exercise 9: Reflection
Why is frontier evaluation 'in crisis'? Propose an evaluation approach that might work for systems near human capability.
Exercise 10: Reflection
Make the strongest case that the current paradigm will reach transformative capability, then the strongest case it will plateau. Which do you find more convincing?
Exercise 11: Reflection
Which open problem in this chapter do you find most important, and why? Which most interesting to work on?
Exercise 12: Reflection
Pick an open problem and design a small experiment (runnable on modest compute) that could shed light on it.
Exercise 13: Reflection
How do the efficiency and access questions connect to who controls AI's future? What would broad access require?
Exercise 14: Reflection
Trace one idea across the whole book (e.g. attention, embeddings, or alignment) from its first appearance to the frontier. How did it evolve?
Exercise 15: Reflection
Which capability from this book do you trust most, and which least, for a high-stakes real-world use? Justify your calibration.
Exercise 16: Reflection
How would solving interpretability change the alignment, hallucination, and reasoning problems? Why is it such a high-leverage target?
Exercise 17: Reflection
Imagine the field five years on. Which open problems do you expect to be solved, which still open, and what new ones might appear?
Exercise 18: Reflection
You now understand LLMs from mathematics to the frontier. What will you build, study, or question next — and how will this understanding guide you?

Further reading: “Towards Monosemanticity” and “Scaling Monosemanticity” (Anthropic) on mechanistic interpretability. “AI Safety via Debate” (Irving et al., 2018) and “Weak-to-Strong Generalization” (Burns et al., 2023) on scalable oversight. “On the Dangers of Stochastic Parrots” (Bender et al., 2021) and work on emergent world models (e.g. Othello-GPT) on the understanding debate. “Will we run out of data?” (Villalobos et al., 2022) on the data wall. “Concrete Problems in AI Safety” (Amodei et al., 2016) for foundational safety questions. The literature on continual learning, calibration, and evaluation referenced in Chapters 21–26.

Part VII Complete: Frontier Techniques & Future Directions

Ch. 32 | Mixture of Experts | sparse MoE, top-k routing, load balancing, expert collapse: capacity decoupled from compute.
Ch. 33 | Long Context & Memory | the quadratic wall, RoPE scaling/YaRN, efficient attention, Mamba/SSMs, external memory: 1M+ token contexts.
Ch. 34 | Agents & Multi-Agent Systems | the agent loop, planning, reflection, memory, orchestration, multi-agent coordination: autonomous goal-pursuit.
Ch. 35 | Open Problems & Future Directions | interpretability, oversight, reliable reasoning, continual learning, world models: the unsolved frontier.