Five LLM techniques based on how our brain works

Introduction
1. Pausing to think in the middle of the task (not just at the beginning)
2. Thinking fast and thinking slow
3. Short-term and long-term memory, guided by surprise
4. Sleeping to consolidate (and forgetting what gets in the way)
5. Learning from easy to hard (and removing the training wheels in time)
In summary
References

Introduction

There is a well-known recipe for improving a language model: more data, more parameters, more compute. Alongside that path, studies keep appearing that show how borrowing certain ideas about the way our brain works can also improve the performance of LLMs.

The point is not to imitate biology to the letter, since an artificial neuron is not a biological neuron, but to borrow principles from cognitive psychology and neuroscience that have been studied for decades: that we think at two speeds, that sleep consolidates what we have learned, that forgetting is as important as remembering, and that we learn better by going from easy to hard.

Below, we briefly summarize five relatively recent techniques that apply these intuitions and measurably improve the performance of LLMs. For each one, the references section links to the original paper so you can access the full detail.

An honest disclaimer before we start: the analogies with the brain are mainly a source of inspiration for designing the technique, not proof that the model “works like a person.” What is interesting is that these analogies can deliver real results.

1. Pausing to think in the middle of the task (not just at the beginning)

How the brain does it: an expert programmer does not plan all the code in their head and then type it out in one go. They write as they go and, when they reach a tricky part, such as an edge case or a delicate algorithm, they stop, think, and continue. Mental effort is spent where it is needed. This is, in essence, metacognitive control: deciding on the fly how much to reason and when.

The technique: today's reasoning models do almost the opposite. They concentrate all of their “thinking” in a block at the start and then generate the answer. Think-Anywhere [1] proposes that the model be able to open a reasoning block at any position while it is writing the code, not only at the beginning. To achieve this, the authors combine two phases: first a cold start with examples that teach this pattern, and then reinforcement learning with verifiable rewards (the generated code is run and checked against the tests), so that the model itself discovers where it is worth pausing to think.

Why it matters: the paper's analysis shows that the model learns to invoke reasoning precisely at the positions of greatest uncertainty (high entropy), which is where a human would also hesitate. How is that uncertainty measured? At each step, the model does not pick a token directly; it assigns a probability to every possible continuation. If that probability concentrates on one clear option, uncertainty is low. If it is spread across many almost equally plausible alternatives, uncertainty is high, and entropy is the quantity that measures this spread. High entropy therefore flags the points where it is worth pausing to think. Across four code-generation benchmarks (LeetCode, LiveCodeBench, HumanEval and MBPP), Think-Anywhere reaches an average of 70.3%, about 9 points above the base model, beating both reasoning at the start (CoT) and other interleaved variants. The elegant detail is that it does not waste compute on trivial code and saves it for the hard parts.
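
To make the notion of entropy concrete, here is a minimal sketch that computes the Shannon entropy of a next-token distribution. The two distributions and the `should_pause` threshold rule are hypothetical and purely illustrative: according to the paper, the model learns where to pause through reinforcement learning, not through a hand-set threshold.

```python
import math

def token_entropy(probs):
    """Shannon entropy, in bits, of a next-token probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def should_pause(probs, threshold_bits=1.0):
    """Illustrative rule only (an assumption, not the paper's mechanism):
    flag a position when its entropy exceeds a fixed threshold."""
    return token_entropy(probs) > threshold_bits

# Hypothetical distributions over a 4-token vocabulary.
confident = [0.97, 0.01, 0.01, 0.01]  # one clear continuation
uncertain = [0.25, 0.25, 0.25, 0.25]  # four equally plausible continuations

print(token_entropy(confident), should_pause(confident))
print(token_entropy(uncertain), should_pause(uncertain))
```

The uniform distribution has the maximum entropy possible for four options (2 bits), while the peaked one stays well below 1 bit, so only the second position would be flagged.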

2. Thinking fast and thinking slow

How the brain does it: this is Daniel Kahneman's famous System 1 and System 2. One mode of thinking is fast, intuitive and automatic, and the other is slow, deliberate and costly. We use intuition for everyday matters and reserve careful reasoning for the problems that truly require it. And, crucially, we also verify: we check our first answer before accepting it.

The technique: the work Thinker: Learning to Think Fast and Slow [2] reorganizes the question-answering task into four explicit stages inspired by dual-process theory: fast thinking (answering with a strict token budget), verification (the model evaluates its own answer), slow thinking (it refines the answer with more deliberation) and summarization (it distills the result into precise steps). Intuition and deliberation are trained as distinct but complementary systems.
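
The four stages can be sketched as a simple control flow. This is a rough illustration under stated assumptions, not the authors' implementation: the `generate` callable, the prompt wording, the `slow_budget` value and the early exit after a positive verification are all hypothetical, and `toy_model` is a canned stand-in for a real LLM call so that the example runs.

```python
from typing import Callable, Dict, List

# Hypothetical model interface: (prompt, token budget) -> generated text.
Generate = Callable[[str, int], str]

def four_stage_answer(question: str, generate: Generate,
                      fast_budget: int = 1000, slow_budget: int = 8000) -> Dict:
    """Run fast thinking, verification, slow thinking and summarization."""
    stages: List[str] = ["fast"]
    # Stage 1: fast thinking under a strict token budget.
    fast = generate(f"Answer briefly: {question}", fast_budget)

    # Stage 2: verification, the model evaluates its own answer.
    stages.append("verify")
    verdict = generate(
        f"Check this answer, reply yes or no. Q: {question} A: {fast}",
        fast_budget)
    if verdict.strip().lower().startswith("yes"):
        # Assumption of this sketch: stop early when the check passes.
        return {"final": fast, "stages": stages}

    # Stage 3: slow thinking refines the answer with a larger budget.
    stages.append("slow")
    slow = generate(
        f"Think step by step and correct the answer. Q: {question} Draft: {fast}",
        slow_budget)

    # Stage 4: summarization distills the reasoning into concise steps.
    stages.append("summarize")
    summary = generate(f"Summarize the key steps concisely: {slow}", fast_budget)
    return {"final": slow, "summary": summary, "stages": stages}

def toy_model(prompt: str, max_tokens: int) -> str:
    """Canned stand-in for an LLM: wrong fast answer, then a correction."""
    if prompt.startswith("Answer briefly"):
        return "5"
    if prompt.startswith("Check"):
        return "no"
    if prompt.startswith("Think"):
        return "2 + 2 = 4"
    return "Add the two numbers to get 4."

result = four_stage_answer("What is 2 + 2?", toy_model)
print(result["stages"], result["final"])
```

With this toy model the fast answer fails verification, so all four stages run. A model whose verification returns "yes" would stop after two stages and spend only the fast budget.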

Why it matters: separating the two modes improves accuracy (for example, from 45.9% to 51.0% on a 1.5-billion-parameter model from the DeepSeek-R1 family) and, in addition, fast mode alone solves a good share of cases using fewer than 1,000 tokens, which saves compute on simple questions. It is worth noting that the experiments were run with small models, so this is a solid proof of concept rather than a large-scale result, but the direction is clear: not everything deserves the same effort.

3. Short-term and long-term memory, guided by surprise

How the brain does it: we do not remember everything equally. We have a limited, high-fidelity working memory for the immediate present, and a long-term memory that stores what matters in a more compressed form. And there is a well-known bias: what surprises us, meaning what breaks our expectations, sticks far better. Surprise (in technical terms, prediction error) is a signal that something is worth storing.

The technique: Titans: Learning to Memorize at Test Time [3], by Google, introduces a long-term neural memory module that learns to memorize while the model is running (at inference time) by adjusting its own weights. It decides what to keep using a gradient-based “surprise” metric, so that the more unexpected a token is, the more strongly it gets memorized, combined with an adaptive forgetting mechanism so the memory does not become saturated. In this architecture, the Transformer's classic attention acts as short-term memory (precise but limited in capacity) and the new module acts as long-term memory (more persistent and compressed).
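
The idea of memorizing in proportion to surprise can be illustrated with a toy linear memory that is updated by gradient steps at inference time. This is a simplified sketch under assumptions, not the Titans architecture: the class `SurpriseMemory`, its parameters (`lr`, `momentum`, `decay`) and the exact update form are hypothetical choices made for illustration, and the real module is a deep network with learned, data-dependent gates.

```python
def matvec(M, k):
    """Multiply matrix M (list of rows) by vector k."""
    return [sum(m * x for m, x in zip(row, k)) for row in M]

class SurpriseMemory:
    """Toy linear associative memory that learns key -> value at test time."""

    def __init__(self, dim, lr=0.5, momentum=0.0, decay=0.0):
        self.M = [[0.0] * dim for _ in range(dim)]  # memory weights
        self.S = [[0.0] * dim for _ in range(dim)]  # running surprise signal
        self.lr, self.momentum, self.decay = lr, momentum, decay

    def read(self, key):
        return matvec(self.M, key)

    def write(self, key, value):
        """Update the memory; return the surprise (norm of the prediction error)."""
        error = [p - v for p, v in zip(self.read(key), value)]
        surprise = sum(e * e for e in error) ** 0.5
        for i, e in enumerate(error):
            for j, k in enumerate(key):
                grad = 2.0 * e * k  # gradient of ||M k - v||^2 w.r.t. M[i][j]
                # Past surprise (momentum) plus current surprise (gradient step).
                self.S[i][j] = self.momentum * self.S[i][j] - self.lr * grad
                # Forgetting: decay old content before adding the update.
                self.M[i][j] = (1.0 - self.decay) * self.M[i][j] + self.S[i][j]
        return surprise

mem = SurpriseMemory(dim=2)
first = mem.write([1.0, 0.0], [0.0, 3.0])   # unexpected: large update
second = mem.write([1.0, 0.0], [0.0, 3.0])  # already stored: no update
print(first, second, mem.read([1.0, 0.0]))
```

The first write is surprising and changes the weights, while repeating it produces zero error and therefore no further change. Setting `decay` above zero makes every write shrink what is already stored, which is the toy counterpart of the forgetting mechanism.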

Why it matters: this division of labor lets Titans scale to contexts of more than 2 million tokens and outperform Transformers and modern linear recurrent models on language modeling, common-sense reasoning and, above all, on “needle in a haystack” tasks where you have to retrieve a specific fact from an enormous text. It is one of the most talked-about attempts of recent months to give models a memory that more closely resembles our own.

4. Sleeping to consolidate (and forgetting what gets in the way)

How the brain does it: sleep is not wasted time for memory, but rather when the brain does maintenance. During sleep, the day's memories are reactivated and reorganized (replay), the accumulated synaptic noise is reduced (synaptic downscaling) and what is irrelevant is selectively discarded. Forgetting in a targeted way is part of the job, not a failure.

The technique: LLMs suffer from a problem called proactive interference. When old, already outdated information stays in the context, it hampers the retrieval of the current value, and accuracy drops as obsolete associations pile up. SleepGate, introduced in the paper Learning to Forget [4], draws directly on sleep-dependent consolidation to solve it. It adds a “sleep cycle” that acts on the key-value cache (the model's internal memory during generation) with three pieces: a detector that identifies when a new entry supersedes an old one, a forgetting gate that removes or compresses what is obsolete, and a consolidation module that merges what survives into compact summaries. These “sleep micro-cycles” are triggered periodically during inference.
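
The three pieces can be pictured with a toy cache of labelled facts. This is only a conceptual sketch under strong assumptions: in SleepGate the components are learned modules acting on key-value tensors, whereas here the `Entry` type, the exact-key matching rule, hard deletion and the string summary are hypothetical simplifications.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Entry:
    key: str    # what the fact is about, e.g. "door_code"
    value: str  # the stored value
    step: int   # position at which it was written

def detect_superseded(cache: List[Entry]) -> List[int]:
    """Detector: indices of entries overwritten by a later entry with the same key.
    Assumes the cache is ordered from oldest to newest."""
    latest = {}
    for i, entry in enumerate(cache):
        latest[entry.key] = i
    return [i for i, entry in enumerate(cache) if latest[entry.key] != i]

def forget(cache: List[Entry], superseded: List[int]) -> List[Entry]:
    """Forgetting gate: drop obsolete entries (hard removal in this toy)."""
    drop = set(superseded)
    return [entry for i, entry in enumerate(cache) if i not in drop]

def consolidate(cache: List[Entry]) -> str:
    """Consolidation: merge the survivors into one compact summary."""
    return "; ".join(f"{entry.key}={entry.value}" for entry in cache)

def sleep_cycle(cache: List[Entry]) -> Tuple[List[Entry], str]:
    """One micro-cycle: detect, forget, then consolidate."""
    survivors = forget(cache, detect_superseded(cache))
    return survivors, consolidate(survivors)

cache = [
    Entry("door_code", "1234", step=0),
    Entry("wifi", "abc", step=1),
    Entry("door_code", "9876", step=2),  # supersedes the entry at step 0
]
survivors, summary = sleep_cycle(cache)
print(summary)  # wifi=abc; door_code=9876
```

After the cycle, a lookup for `door_code` can only find the current value, which is the effect the technique aims for when outdated associations would otherwise interfere.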

Why it matters: in its experiments, SleepGate keeps retrieval accuracy very high (between 97% and 99%) in scenarios where several alternative memory-management approaches collapse below 18%, and it theoretically reduces the interference horizon from linear to logarithmic. That said, we should be fair about its scope: it is a proof-of-concept work carried out with a tiny model (four layers, about 793,000 parameters) and a controlled benchmark. The idea is very promising, but it still needs to be validated at the scale of production models.

5. Learning from easy to hard (and removing the training wheels in time)

How the brain does it: nobody learns calculus before learning to add. Human education is organized as a curriculum: first the fundamentals, then the complexity, scaffolding knowledge in stages. And there is an important nuance that any teacher knows: there comes a moment when the easy exercises have to be removed so the student does not get too comfortable.

The technique: Curriculum Reinforcement Learning from Easy to Hard [5], known as E2H Reasoner, applies this to reinforcement-learning training of reasoning abilities. Instead of throwing the model straight at hard problems, something that reinforcement learning alone does not handle well, it breaks the difficulty into stages and presents them progressively. The most interesting finding, and a very human one, is that easy tasks are valuable at the beginning but should be phased out over time. Otherwise, the model overfits to the simple cases and stops making progress.
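
One way to picture "easy first, then phased out" is a sampling schedule over difficulty levels that shifts as training progresses. The sketch below is an assumption-laden illustration, not the schedule used in the paper: the Gaussian shape, the `width` parameter and the function names are hypothetical.

```python
import math
import random

def difficulty_weights(progress, num_levels, width=0.5):
    """Sampling probabilities over levels 0 (easiest) to num_levels - 1 (hardest).

    progress runs from 0.0 (start of training) to 1.0 (end). The peak of the
    distribution moves from the easiest level to the hardest one, so easy
    tasks dominate early and are gradually phased out.
    """
    center = progress * (num_levels - 1)
    raw = [math.exp(-((level - center) ** 2) / (2 * width ** 2))
           for level in range(num_levels)]
    total = sum(raw)
    return [r / total for r in raw]

def sample_level(progress, num_levels, rng):
    """Draw the difficulty level of the next training task."""
    weights = difficulty_weights(progress, num_levels)
    return rng.choices(range(num_levels), weights=weights, k=1)[0]

rng = random.Random(0)
for progress in (0.0, 0.5, 1.0):
    weights = difficulty_weights(progress, num_levels=3)
    print(progress, [round(w, 3) for w in weights], sample_level(progress, 3, rng))
```

At the start almost all probability mass sits on the easiest level, and by the end it sits on the hardest, with the easiest level sampled only rarely. A narrower `width` removes the easy tasks more abruptly, which is the kind of pacing decision the curriculum has to get right.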

Why it matters: the authors provide both empirical results (consistent improvements on hard problems and on tasks outside the training distribution, such as Blocksworld, Countdown and MATH) and theoretical guarantees: learning in stages requires fewer samples than attacking the hard problem head-on. It is one of those ideas that sounds like common sense precisely because it comes from common sense about how people learn. That said, the literature also warns that curriculum learning is not a universal silver bullet: it depends on how difficulty is measured and how the pace is scheduled.

In summary

The five techniques share the idea of borrowing cognitive mechanisms that make humans efficient: thinking only when and where it is needed (Think-Anywhere), alternating between intuition and deliberation (Thinker), distinguishing short-term from long-term memory guided by surprise (Titans), actively consolidating and forgetting as in sleep (SleepGate), and learning in increasing stages of difficulty (E2H).

It is worth keeping the goal in mind: the aim is not to make LLMs resemble our brain as closely as possible, but to improve their performance. And better performance does not always mean copying biology. What these works show is that, when an idea from the human brain turns out to be useful, borrowing it can be one of the most efficient levers for the next generation of models.

At Kaptor Security, we are experts in assessing the security of applications and architectures that make use of language models. If you are interested in understanding the risks these technologies introduce into your organization, do not hesitate to contact us.

References

[1] Think-Anywhere in Code Generation. Jiang et al. (2026). arXiv:2603.29957. Available at: https://arxiv.org/abs/2603.29957
[2] Thinker: Learning to Think Fast and Slow. (2025). arXiv:2505.21097. Available at: https://arxiv.org/abs/2505.21097
[3] Titans: Learning to Memorize at Test Time. Behrouz, Zhong and Mirrokni (Google, 2025). arXiv:2501.00663. Available at: https://arxiv.org/abs/2501.00663
[4] Learning to Forget: Sleep-Inspired Memory Consolidation for Resolving Proactive Interference in Large Language Models (SleepGate). Xie (2026). arXiv:2603.14517. Available at: https://arxiv.org/abs/2603.14517
[5] Curriculum Reinforcement Learning from Easy to Hard Tasks Improves LLM Reasoning (E2H Reasoner). Parashar et al. (2025). arXiv:2506.06632. Available at: https://arxiv.org/abs/2506.06632
