A gentle review of the Karan & Du (2025) article
How power sampling reveals AI's hidden reasoning potential: language models can think smarter simply by resampling their own outputs, with no complex retraining.
Every few months someone announces a shiny new way to make language models “reason”. Usually it involves a heroic amount of reinforcement learning, a verifier the size of a small moon, or a prompt so long it needs its own table of contents.
This time, however, the claim is surprisingly bold: no RL, no training, no verifiers, no prompt acrobatics — just better sampling.
Let’s unpack what the paper Reasoning with Sampling: Your Base Model is Smarter Than You Think actually shows, without the drama.
What the researchers genuinely discovered
Instead of retraining a model or bolting new machinery on top, the authors use something far simpler: a clever inference-time trick called power sampling (or iterative MCMC resampling).
The gist is this:
- You generate an output, then repeatedly resample parts of it.
- Each resampling leans into what the base model already thinks is most probable.
- The process nudges the model toward its own high-confidence reasoning paths.
Think of it as letting the model take a deep breath, reconsider a few sentences, and quietly choose the answer it believed all along.
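To make that loop concrete, here is a minimal sketch in Python. The model interface (`generate`, `sequence_logprob`), the hyperparameters, and the simplified acceptance rule are all illustrative assumptions, not the authors' implementation; proposal-asymmetry details are ignored. It only shows the shape of the idea: resample the tail of a generation and keep moves that a tempered version of the base model's own distribution prefers.

```python
import math
import random

# Illustrative sketch only; assumes a hypothetical `model` interface with:
#   model.generate(prefix)        -> text completing the given prefix
#   model.sequence_logprob(text)  -> total log-probability of `text` under the base model
def power_sample(model, prompt, alpha=4.0, n_rounds=10):
    """Approximately sample from p(x)^alpha by repeatedly resampling the
    tail of a generation and keeping moves the base model rates highly."""
    current = model.generate(prompt)
    current_lp = model.sequence_logprob(prompt + current)

    for _ in range(n_rounds):
        # Pick a random cut point and let the base model rewrite everything after it.
        cut = random.randint(0, len(current))
        proposal = current[:cut] + model.generate(prompt + current[:cut])
        proposal_lp = model.sequence_logprob(prompt + proposal)

        # Metropolis-style accept/reject for the tempered target p^alpha.
        # Because the proposal is itself drawn from p, one factor of p cancels,
        # leaving a ratio of p^(alpha - 1): with alpha > 1 the chain drifts
        # toward the model's own high-confidence reasoning paths.
        log_accept = (alpha - 1.0) * (proposal_lp - current_lp)
        if log_accept >= 0 or random.random() < math.exp(log_accept):
            current, current_lp = proposal, proposal_lp

    return current
```

In plain terms: the base model proposes, and the power distribution p^α decides which rewrites to keep, which is why no verifier or reward model is needed.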
And the results?
- Across tasks like MATH500, HumanEval, and GPQA, this sampling-only method reaches — and sometimes beats — the performance of RL-posttrained models such as GRPO.
- It manages this without collapsing diversity, a common issue in RL-tuned models that start producing the textual equivalent of identical twins.
- And yes, the headline is technically correct: no new training, no reward models, and no fancy prompting tricks. Just sampling done with unusual care.
The fine print (because there’s always fine print)
A few things deserve honest attention:
- It’s not free: no training, yes, but extra compute at inference time. The algorithm does multiple resampling passes, tunes parameters like α, and juggles several candidate generations. Not quite “vanilla sampling” (see the rough cost sketch after this list).
- It shines on verifiable reasoning: short-form maths, coding, and QA behave beautifully. Open-ended or long-horizon tasks? Still an open question. The paper dips its toes into AlpacaEval 2.0, but we’re not done exploring.
- It depends on base model competence: a strong base model blossoms under this approach. A weaker one… well, it tries its best. This technique elicits already-latent reasoning rather than inventing new abilities from scratch.
- It’s not a replacement for agentic reasoning: multi-turn planning, tool-use, or long-context agents still benefit from dedicated training. This method improves single-shot reasoning, not full agent behaviour.
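To give “extra compute” a rough shape, here is a back-of-the-envelope calculation. Every number below is an assumption chosen for illustration, not a figure from the paper.

```python
# Back-of-the-envelope inference cost; all numbers are illustrative assumptions.
answer_tokens = 1_000    # typical length of a reasoning trace
n_rounds = 10            # resampling rounds per answer
resampled_frac = 0.5     # average fraction of the trace regenerated each round

baseline = answer_tokens
with_power_sampling = answer_tokens + n_rounds * resampled_frac * answer_tokens

print(with_power_sampling / baseline)  # -> 6.0, i.e. several times the tokens of one pass
```

Under these made-up numbers you pay a single-digit multiple of ordinary decoding, which is cheap next to an RL post-training run but far from free at serving time.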
Why this matters (and what you might actually do with it)
If you’re working with modern base models — anything in the 7B to 70B+ range — this finding hints at a quiet shift in the field:
- You may already have more reasoning capability than you think.
- Getting better results might require smarter inference, not heavier training.
- Power sampling is particularly attractive for deployments where:
- some extra inference cost is acceptable,
- diversity matters,
- and you want better answers without touching the training pipeline.
However, if your workflows rely on planning, multi-turn dialogues, or agents navigating messy environments, keep your RL and fine-tuning tools handy. Sampling won’t solve everything — but it will help more than you expect.
Final thought
Perhaps the most interesting takeaway isn’t the algorithm but the implication:
These models have been sitting on untapped reasoning potential all along.
Sometimes they just need to be asked the right way — or, in this case, resampled the right way.
What would you try improving first: the model itself, or the way you listen to it?
Karan, A., & Du, Y. (2025, October 16). Reasoning with sampling: Your base model is smarter than you think [Preprint]. arXiv. https://doi.org/10.48550/arXiv.2510.14901
