Bob and Uncle Tau: The Two Layers (Part 2 of 2) — More Floors

By FelixD
Previously in Part 1: Bob had been reading Diaconis-Ethier 2022 — the paper where ICM disagrees with gambler’s-ruin elimination probabilities by a factor of 600 at the short-stack corner — and called Uncle Tau in distress. At the back of the taquería he got a lecture about why “which model is right” is a category error. ICM, DE, FGS, and Ganzfried-Sandholm are all chips-to-dollars equity maps, not utilities; the data can’t adjudicate between them because they live on the same attenuated Markov channel. The full decision objective wraps a concave utility — log is the Kelly-canonical choice, CRRA and fractional Kelly generalize — around bankroll plus the equity map. The composition is twice-concave. Jensen stacks. Optimal play is strictly tighter than the naked equity-map readout. Then Bob tried to wriggle out by Kelly-sizing at entry and promising to chip-EV-max in the tournament; Tau dismantled that in four specific ways.

Bob had just opened his mouth to respond.

*If you haven’t read Part 1, go read it first; the machinery below sits on top of it.*

Bob opened his mouth to respond and the door of the taquería slammed open so hard the velvet painting of the luchador shuddered on its nail. A man in a slightly-too-long overcoat stood in the doorway holding a stack of printed PDFs and a yellow legal pad with a pen behind each ear. His hair was doing something architecturally.

Man in Overcoat: V OF S.

The el combo waiter, who had seen worse, continued polishing a glass.

Man in Overcoat: YOU TWO. I could hear you from the street! You’re arguing about chips-to-dollars maps! That is the FIRST FLOOR. The building has many floors, sir, many many floors, and you have to backward-induct through all of them or you are doing nothing.

Uncle Tau did not look up from his horchata.

Uncle Tau: Hello, Mister Bellmann.

Bob: You know this guy?

Uncle Tau: Everybody who does this work knows this guy. Sit down, Bellmann.

Mister Bellmann: I will NOT —

Uncle Tau: Bellmann. The el combo. Sit.

Mister Bellmann sat. He set the PDFs on the table. The top one was by Diaconis. The second one had “V^π(s) = 𝔼_π[∑ₜ γᵗ rₜ | s₀ = s]” in the header.


Uncle Tau: Before we go anywhere. Bob, do you know how Bellmann’s grandfather named the field he founded.

Bob: No.

Mister Bellmann: Oh god, not this.

Uncle Tau: In 1953 Richard Bellman named the study of multistage optimization “dynamic programming.” Do you know why he chose those two words, Bob?

Bob: No.

Uncle Tau: He wrote about it explicitly. He picked “dynamic” because it has a precise physics meaning and is impossible to use pejoratively — try using the word “dynamic” as an insult, you can’t. He picked “programming” because it sounds like the person is doing something. He combined them into a name that, and I quote, “not even a Congressman could object to.” He named the field specifically so the Department of Defense would fund it and nobody would ask what it was.

Mister Bellmann: The grant went through.

Uncle Tau: The grant went through and the field has been un-findable ever since. You ask someone “what do you study,” they say “dynamic programming,” you have absolutely no idea whether they’re a mathematician, a C++ developer, or somebody’s grandson writing Excel macros. The canonical work on value functions is filed under a name that was picked to be un-searchable. Half the reason nobody in poker knows any of this math is that they cannot find it.

Mister Bellmann: This is an old wound.

Uncle Tau: It is a live wound. And you’re here in Bob’s booth yelling about V of S. V of S is a fine name. “Dynamic programming” is the single worst name in applied mathematics.

Mister Bellmann: He also coined “curse of dimensionality.”

Uncle Tau: The curse of dimensionality is a perfectly fine name for a curse that is in fact about dimensionality. One out of two. Your family’s naming batting average is fifty percent.


Bob held up a hand.

Bob: Can I — ok hold on. Mister Bellmann. Back up. You said we stopped at the first floor. We got through the inner gauge, we got through the Kelly layer, and you’re telling me there are more floors.

Mister Bellmann: Oh god yes. So many floors.

Uncle Tau: Let him talk, it’s his one thing.

Mister Bellmann: The actual object of the tournament is the value function V-of-s. V. Of S. “S” is any state you might be in — stack depth, field size, position, action history, the whole business. “V” is the expected return under optimal play from that state. V of S is the thing you are secretly trying to approximate every single time you make a decision. Every chip-EV calculation, every ICM readout, every solver output, every “what’s my equity here” question — all of them are scalar readings of V at some state. V is the forest. Your numbers are leaves.

Bob: OK.

Mister Bellmann: And V satisfies a recursion. You don’t compute V directly; you compute it by looking at each available action, evaluating the expected value of the resulting next state, and recursing. Bellman equation. My grandfather’s math.

Uncle Tau: His grandfather’s math. Everyone else’s notation.

Mister Bellmann: BUT. The recursion has a catch. The expected value of the next state depends on what the other players are doing. The transition kernel depends on opponent strategies. And you don’t know opponent strategies.

Uncle Tau: Here it comes.

Mister Bellmann: So you must maintain beliefs about opponent strategies. You have priors, and every observation gives you a posterior. Bayesian updating. Every hand you see updates your beliefs about everyone at the table, those beliefs feed into the Bellman recursion, the recursion produces V, V tells you what to do, your action generates more observations, and the beliefs update again. It is two estimation layers stacked inside each other — Bayes sitting inside Bellman. Or Bellman sitting inside Bayes, depending on where you enter the loop.

Bob: Priest beliefs.

Mister Bellmann: I — what?

Bob: Prior. Posterior. Priest.

Mister Bellmann: That’s not —

Uncle Tau: Let him have it, Bellmann. And for what it’s worth — “priest beliefs” isn’t a mistake. The man who wrote down the rule for updating these beliefs was a Presbyterian minister. Reverend Thomas Bayes. Never published in his lifetime; another clergyman — Richard Price — fished the manuscript out of Bayes’ desk after he died and presented it to the Royal Society in 1763. Priest beliefs is the origin story, Bob. You got the etymology right.

Bob: You’re kidding.

Uncle Tau: I am not. The whole of Bayesian inference comes from a dead priest’s drawer.

Mister Bellmann: That is — also accurate.

Uncle Tau: The probability rule comes from a priest. The value recursion comes from a bureaucrat who named the field to hide from a Congressman. Draw your own conclusions about which discipline was founded on better hygiene.

Mister Bellmann: Fine. Priest beliefs. Inside Bellman.


Bob: OK so the layers are.

Tau started listing on his fingers.

Uncle Tau: Inner gauge. Chips-to-dollars map. Pick a cartoon — ICM, DE, FGS, GS. Data can’t distinguish them. Gauge.

Outer forced. Kelly log on wealth. Not a choice. Theorem.

Mister Bellmann: And above both of those — the value function V-of-s itself, which is what you’re secretly trying to compute when you pick an action at a state. Which requires —

Uncle Tau: Priest beliefs about opponent strategies. Posteriors that update slowly because most hands tell you almost nothing about how anyone plays.

Mister Bellmann: And the Bellman recursion over those posteriors, which would tell you V at every state by backward induction from the terminal reward.

Uncle Tau: Which requires backward-inducting through a Markov chain.

Mister Bellmann: Which —

Uncle Tau: Which has the DPI wall through it. The same wall from last time. The chain is too long, Fisher information attenuates exponentially, no estimator gets to V even in principle.

Mister Bellmann: Yes.

Bob: So V is hidden for — how many independent reasons now?

Uncle Tau: Three. Maybe four. Priest beliefs update too slowly to pin down opponent strategies; the gauge choice at the inner level has its own unidentifiability; the DPI chain is exponentially attenuated; and the composition of all three has to go through a log of wealth you may or may not have calibrated correctly. Any one of those alone is fatal. Together they aren’t even a wall anymore. They’re a fortress.

Mister Bellmann: I know V is hidden. I am yelling about V because it’s hidden. Everyone is optimizing a number that doesn’t exist at a place the wall doesn’t let them reach. They write “EV%” on forum posts. There is no EV. There is V. V is hidden. They are all arguing about fictitious arithmetic on an imaginary object.

Uncle Tau: Correct. So what are they supposed to do about it.

Mister Bellmann:

Uncle Tau: That’s the part you always leave out. That’s why the grandfather’s math is famous and the grandfather’s pipeline is not. There is no grandfather’s pipeline.

Mister Bellmann: I admit the prescriptive side is less developed.


Uncle Tau: Before the laptop. One more thing. You keep forgetting who you’re talking to — and more importantly who you’re talking about.

Mister Bellmann: Meaning.

Uncle Tau: GTO Wizard. PioSolver. MonkerSolver. Simple Postflop. Every solver the poker world has ever touched. Tell me, technically, what’s happening inside them.

Mister Bellmann: Counterfactual regret minimization, mostly. Some variants with real-time search.

Uncle Tau: Computing a Nash equilibrium of a two-player zero-sum extensive-form game. By the minimax theorem, that equilibrium has a unique value — a well-defined V at every state. CFR, CFR+, and the depth-limited Libratus-and-Pluribus-style solvers that plug value networks into the leaves — all of them are, at bottom, iterative approximation of that V. Multiagent Bellman on a minimax objective. Your grandfather’s recursion wearing a poker hat.

Mister Bellmann: That is — accurate.

Uncle Tau: And step outside poker for a second. AlphaGo. AlphaZero. MuZero. DQN. TD-Gammon before them. Every deep reinforcement learning result of the last fifteen years is the same trick.

He ticked them off on his fingers.

Uncle Tau: Approximate V. Recurse. Repeat. Brilliant trick. Works every single time. Works in Go. Works in chess. Works in shogi. Works in StarCraft. Works in Atari. Works in cash-game poker. Works in the preflop chart a rec in Finland is memorizing right now without knowing he’s reading the output of your algorithm. And, keep up, Q-learning, deep Q-learning, policy gradients, actor-critic: all of them are your recursion with a function approximator bolted onto the V.

Bob: Wait. Hold on. Hold on. So he’s the guy. Like — for real.

Uncle Tau: He is the guy. Every solver output anyone has ever bought a subscription to read is a scalar evaluation of his family’s recursion. The reason the poker crowd doesn’t realize you’re their patron saint, Bellmann, is — one more time, to the back row — because your grandfather named the field “dynamic programming.”

Mister Bellmann: I thought we were done with this.

Uncle Tau: We will never be done with this.

Bob: Dynamic programming is the thing that makes GTO Wizard work.

Uncle Tau: Dynamic programming is the thing that makes everything since 2010 work. The solver industry. The deep RL industry. Every self-driving-car planning stack. DeepMind’s entire output. The single most productive algorithmic template of the last seventy years. Higher cash-out rate than any other idea in applied math. And it’s called “dynamic programming,” which sounds like a feature you’d find in a 1998 Microsoft Word manual.

Mister Bellmann: It has, on occasion, had marketing difficulties.

Uncle Tau: It has marketing difficulties in precisely the audience that needs it most. Poker is downstream of every single thing I just listed. The object the solver computes is your object. Nobody in poker ever says “Bellman recursion” — they say “the solver says.” And the solver is your recursion. You are in every GTO conversation that has ever happened. You are in every hand-history review. You are in every preflop chart anyone has ever screenshotted. You have been quietly in the water for seventy years and nobody has mentioned your name because the name you were given sounds like an Excel macro.

Mister Bellmann: I hadn’t thought of it that way.

Uncle Tau: You should. The only corner of decision theory your family hasn’t already conquered is the one we’re sitting in right now — composing that recursion with the Kelly layer, on a finite bankroll, for tournament play. Every other corner is yours already.


Uncle Tau: Which is why you’re going to go help us build the pipeline.

Mister Bellmann: I beg your —

Uncle Tau: The pipeline. The operational tool. The thing that takes all of this — the gauge choice, the Kelly layer, the priest beliefs, the backward recursion, the DPI wall — and turns it into an actual recommendation about what to enter, what to size, what to sell, and how conservatively to play in any given spot. Here is the thing everyone on the forums has missed: V is hidden, but the structure of the problem is not hidden. It’s right there. You just described it. Anyone can see the whole architecture of what you would compute if the numbers were in hand.

Mister Bellmann: And the pipeline operates on…

Uncle Tau: The pipeline operates on the structure with honest uncertainty about the hidden numbers. Wide priors on every unidentifiable parameter. Tight priors on identifiable ones. Forced composition for the theorem layer. An explicit named gauge choice at the inner. All of the uncertainty flows forward through the whole stack into a recommendation that automatically gets more conservative as more layers are hidden. That is the thing we can actually build.

Mister Bellmann: Integrate over the posterior.

Uncle Tau: Integrate over the posterior. If you can’t pin V, don’t pretend to. Hold a distribution over V. Let the distribution make the decision. Ignore the forum fight about which point estimate is correct. Build on the distribution.

Mister Bellmann: The truth stays hidden.

Uncle Tau: The truth stays hidden. For a while. Possibly forever. Certainly longer than any of our careers. But “the truth is hidden” and “we can act well under the hiding” are not the same statement, and people confuse them constantly. They say: we can’t know V, therefore any action is as good as any other, therefore send it. That is wrong. We can’t know V, therefore we need architecture that is honest about not knowing V, therefore we build a pipeline. Not knowing is an engineering problem, not a license.

Mister Bellmann: This is better than the way I was going to phrase it.

Uncle Tau: Bellmann. Your family has had seventy years. The phrasing portion of your turn is over. Get your laptop out. There’s an outlet next to the booth.

Mister Bellmann began unpacking a laptop with four stickers on the lid, one of which said HAMILTON and one of which said HJB and one of which said MAXENT and one of which said, inexplicably, TACOS.


Bob: OK so. Let me stack this up. Gauge at the bottom — ICM or DE, pick a cartoon, the data can’t tell us which is right. Kelly log on top — forced, universal, theorem. Above that, the value function — which depends on priest beliefs about opponents that update slowly, Bellman recursion that has the DPI wall through it, and the joint composition of all four on a finite bankroll. Hidden on multiple axes for multiple independent reasons. But the structure is visible — it’s just the composition — so we build a pipeline that integrates over everything we don’t know and outputs an action.

Uncle Tau: And the action is more conservative when more is hidden.

Bob: And that’s the whole machine.

Uncle Tau: That’s the whole machine.

Mister Bellmann: I have a whiteboard app open.

Uncle Tau: Of course you do.


Bob: Can I be a little polemic for a second.

Uncle Tau: You’ve been polemic for six months. Don’t make it a ceremony.

Bob: Fine. We’ve been doing geometry for forty minutes.

Uncle Tau: We’ve been doing geometry the entire time you’ve known me.

Bob: No, but really. Look at this napkin. The simplex. The “sign-varying field over the simplex.” Two concavities stacked — two curved surfaces composed. “The correction isn’t a scalar, it’s a field.” That’s a vector field. “The composition is twice concave.” That’s curvature. Like actual curvature. Like an actual shape with a second derivative.

Uncle Tau: All correct.

Bob: And Fisher information from last time —

Uncle Tau: A Riemannian metric on the space of probability distributions. Shun-ichi Amari, 1985. There’s a whole field called information geometry. Not a metaphor, not an analogy — a metric, in the differential-geometry sense, with lengths and angles and geodesics.

Bob: So when you said the channel was attenuated —

Uncle Tau: I meant Fisher distances get compressed under the transformation from parameters to tournament outcomes. Literally smaller. In a literal Riemannian sense. The data-processing inequality is a statement about geodesic length getting shorter. It’s not figurative.

Bob: The Cramér-Rao bound is —

Uncle Tau: An ellipsoid in parameter space. The inverse Fisher matrix is a covariance. Covariances are ellipsoids. The bound says “your estimator’s error distribution contains at least this ellipsoid.” It is a shape. It has axes. You could draw it.

Bob: The gauge thing. The voltage thing.

Uncle Tau: Fiber bundle. A section of a fiber bundle. Gauge theory in the differential-geometric sense — the same mathematical object as electromagnetism and general relativity, applied to decision-theoretic representations instead of electromagnetic fields. Same math. Different vocabulary.

Bob: And Shapley values live on —

Uncle Tau: A symmetric polytope. The core of a cooperative game is a convex polytope. The nucleolus minimizes a lexicographic distance inside the polytope. Cooperative game theory is polytope geometry with feelings.

Bob: KL divergence —

Uncle Tau: Bregman divergence. A generalized squared distance induced by a convex function. Still geometry.

Mister Bellmann: The value function is a section of a bundle over the state space, with the Bellman operator as a contraction in a suitably weighted sup-norm ball —

Uncle Tau: Bellmann. Not now.


Bob: OK here’s my polemic. I sat through six years of geometry in school. Circles, ellipses, triangles, conic sections, coordinate planes, Euclid’s postulates. Not once — and put the emphasis on not once — did anyone tell me this stuff was the machinery of advanced statistics. Nobody said “by the way, the squiggly integral in your stats textbook? That’s an ellipsoid.” Nobody said “the thing where two models can’t be told apart? That’s a fiber bundle, it’s the same math as a subway map.” They made us memorize π r² like it was arbitrary trivia.

Uncle Tau: It’s a common complaint.

Bob: If one teacher, just one, had drawn a Fisher information ellipsoid on the blackboard and said “notice how it’s narrow along this axis and wide along that one? That’s how many hands you’d need to learn this parameter versus that one,” I would have kept my circle. I would have shown up. Instead I spent ten years after high school assuming geometry was a finite subject that ended when Euclid finished his thirteenth book.

Uncle Tau: A high-school geometry teacher’s job is to survive the semester. Connecting Euclid to Amari is a graduate seminar problem and most graduate seminars don’t do it either. You found your way back. The circle is still there.

Bob: It’s twenty years late.

Uncle Tau: The circle doesn’t care.

Bob: And then — then — you realize half the reason poker players can’t read the relevant math is because the math is filed under “differential geometry” and “information geometry” and “convex optimization,” and nobody ever walked them across the bridge from the stuff they learned at sixteen to the stuff that would actually pay their rent. You learn the names of conic sections in tenth grade and nobody tells you that a likelihood contour near a maximum is a conic section. It’s the same shape. Same word. Different aisle of the bookstore.

Uncle Tau: Statistics is geometry in disguise. Optimization is geometry in disguise. Mechanism design is geometry in disguise. Economics sneaks it in through convex sets. Physics is upfront about it. Poker is economics through a sieve of noise, so poker ends up back in geometry too. If you use numbers to describe a world, you end up needing shapes.

Bob: Nobody told me.

Uncle Tau: Nobody was going to. That’s why we’re sitting in a taquería redrawing it on napkins at thirty-six.

Mister Bellmann: Also a manifold.

Uncle Tau: Everything’s a manifold, Bellmann. Let the man have his moment.


Bob: What’s the one-sentence version.

Uncle Tau: ICM is the lens. log(B + · ) is the eye. Nobody’s been putting the lens in front of the eye.

Bob: That’s pretty good.

Uncle Tau: It’s not original. Kelly wrote the eye in 1956. Harville wrote the lens in 1973. I’m just composing two functions.


Bob: One more thing.

Uncle Tau: Yeah.

Bob: If the outer log is universal and the inner gauge is unidentifiable, why do we spend so much time arguing about the inner gauge?

Uncle Tau: Because the inner gauge is where the visible numbers live. ICM gives you a dollar figure. DE gives you a different dollar figure. Humans see different dollar figures and conclude there must be a debate to have. The outer log doesn’t produce a different visible number — it produces a different action. It doesn’t change the dollar readout; it changes the size of the variance you should tolerate to chase that readout. And variance tolerance is invisible in a screenshot.

Bob: So it’s the same story as last time. The visible thing is the wrong thing.

Uncle Tau: The visible thing is always the wrong thing. The number is a UI. The thing behind it is a shape. Forums argue about the UI. The shape is where the money is.


Bob: El combo?

Uncle Tau: Not today, amigo.

The waiter nodded, set down two more horchatas, and disappeared.


The setup:

| Layer | What it is | Where it comes from | Is it a choice? |
| --- | --- | --- | --- |
| Inner: M(chips) | Chips-to-dollars equity map | ICM / DE-GR / FGS / GS / etc. | Yes, but tournament data can’t tell you which is right |
| Outer: concave in wealth (log is the canonical choice) | Utility over wealth | Kelly 1956; CRRA / fractional Kelly generalize | Concavity is forced; the specific shape is canonical, not unique |

The objective:

maximize 𝔼[ log(B + M(chips)) ]
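A minimal sketch of that objective on a toy spot, under assumptions that are not in the text above: three players, hypothetical prizes of 500/300/200, Malmuth-Harville ICM standing in as the gauge M, no blinds, and a call that either doubles or busts an equal-stack hero. It compares the break-even win probability for the call under chip EV, under M alone, and under log(B + M) for a few bankrolls B. All function and variable names are illustrative.

```python
from itertools import permutations
from math import log


def icm_equity(stacks, payouts, hero=0):
    """Malmuth-Harville ICM dollar equity of stacks[hero].

    Every finishing order gets probability prod(stack / chips still in play).
    Enumerating orders is only practical for a handful of players.
    """
    equity = 0.0
    for order in permutations(range(len(stacks))):
        prob, in_play = 1.0, sum(stacks)
        for player in order:
            prob *= stacks[player] / in_play
            in_play -= stacks[player]
        place = order.index(hero)
        if place < len(payouts):
            equity += prob * payouts[place]
    return equity


def break_even(value_fold, value_win, value_lose):
    """Win probability at which calling and folding are worth the same."""
    return (value_fold - value_lose) / (value_win - value_lose)


PAYOUTS = [500.0, 300.0, 200.0]                    # hypothetical prizes
m_fold = icm_equity([3000, 3000, 4000], PAYOUTS)   # hero folds, stacks unchanged
m_win = icm_equity([6000, 4000], PAYOUTS[:2])      # villain busts in third
m_lose = PAYOUTS[2]                                # hero busts in third

p_chip = break_even(3000, 6000, 0)                 # naked chip EV
p_icm = break_even(m_fold, m_win, m_lose)          # inner layer only


def p_log(bankroll):
    """Break-even probability under the composed objective log(B + M)."""
    def utility(dollars):
        return log(bankroll + dollars)
    return break_even(utility(m_fold), utility(m_win), utility(m_lose))


print(f"chip EV: {p_chip:.4f}   ICM: {p_icm:.4f}")
for bankroll in (100_000, 10_000, 1_000):
    print(f"log(B + ICM), B = {bankroll}: {p_log(bankroll):.4f}")
```

Under these assumptions the required win probability rises from chip EV to ICM to log(B + ICM), and it rises further as B shrinks. Swapping in a different gauge changes the dollar inputs but not the direction of the outer correction.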

Four things that are true at once:

  1. M is a gauge. Tournament data cannot distinguish ICM from DE from FGS. Pick one, live with it, don’t argue about it.
  2. The outer layer must be concave in wealth. Log is the Kelly-Breiman canonical choice; CRRA with any positive coefficient, fractional Kelly, and Bayesian-averaged risk preferences all give the same direction of correction. Concavity is forced by finite bankroll plus risk aversion. The specific shape is canonical, not unique — and every risk-averse human at the tournament sits inside this family.
  3. The composition is twice-concave. Optimal play is strictly tighter than the naked M readout. “ICM is too tight” has the sign backwards — and so does the same complaint aimed at any other inner model.
  4. Kelly-at-entry ≠ Kelly-in-tournament. Pre-funding your exposure answers “should I play.” The outer log answers “how should I play.” Sizing the buy-in correctly does not give you permission to chip-EV-max inside the tournament — variance is taxed by the composed utility independently of exposure, and the top of the payout distribution lives precisely where log(B + · ) stops being locally linear.
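Points 3 and 4 can be put in symbols. This is a sketch, assuming M is concave along the gamble in question (an assumption made here, not derived above), with X the random chip outcome, u(x) = log(B + x), and μ, σ² the mean and variance of M(X):

```latex
\begin{aligned}
\mathbb{E}\big[u(M(X))\big] &\le u\big(\mathbb{E}[M(X)]\big) \le u\big(M(\mathbb{E}[X])\big), \\
\mathbb{E}\big[\log(B + M(X))\big] &\approx \log(B + \mu) - \frac{\sigma^2}{2\,(B + \mu)^2}, \\
\text{certainty equivalent} &\approx \mu - \frac{\sigma^2}{2\,(B + \mu)}.
\end{aligned}
```

The first line is Jensen applied twice: once to u, then to M (using that u is increasing). The last line is the variance tax, a second-order approximation: it shrinks as B grows but does not reach zero for any finite bankroll, which is why a well-sized buy-in does not license chip-EV play once seated.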

Above the two-layer core — what Bellmann brought to the booth:

| Layer | What it is | Why it’s hidden |
| --- | --- | --- |
| Gauge | M(chips): pick ICM / DE / FGS / GS | Unidentifiable from tournament data; lives in the map, not the territory |
| Kelly | log (or any concave utility) over B + M | Not hidden: forced by any concave preference; this one you just do |
| Value function V(s) | Expected return under optimal play from state s | Depends on opponent strategies you don’t observe directly |
| Priest beliefs | Bayesian posteriors over opponent strategies | Posteriors update too slowly; most hands carry almost no strategic information |
| Bellman recursion | Backward induction for V across the whole game tree | Chain is too long; DPI attenuates Fisher information exponentially |

Any one of these alone is fatal to a point-estimate approach. Together they’re a fortress. The response is to build on the structure — composition and recursion, both visible — with honest uncertainty about everything hidden inside the structure. Integrate over the posterior. The truth stays hidden. The pipeline doesn’t.
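One way to read “integrate over the posterior” in code. This is a toy sketch under loud assumptions, none of which come from the text: a heads-up endgame where chips map linearly to dollars, hero picks an integer stake each hand and wins it with unknown probability p, a Beta(20, 20) prior centered on no edge, 6 wins observed in 10 comparable hands, and three hands left. For each posterior draw of p it runs backward induction from the terminal reward log(B + dollars), then averages the first-hand action values over the draws. Letting each draw play its later hands as if it knew its own p is an approximation that ignores learning along the way.

```python
import random
from math import log

# Every number below is hypothetical and chosen only for illustration.
BANKROLL, FLOOR_PRIZE, DOLLARS_PER_CHIP = 100.0, 100.0, 20.0
STACK, HANDS_LEFT = 10, 3
PRIOR_A, PRIOR_B = 20, 20        # assumed prior: opponent gives us no edge
WINS, HANDS_SEEN = 6, 10         # assumed observations


def wealth(stack):
    """Bankroll plus a linear chips-to-dollars map (the stand-in gauge)."""
    return BANKROLL + FLOOR_PRIZE + DOLLARS_PER_CHIP * stack


def first_hand_q(p, stack=STACK, hands=HANDS_LEFT):
    """Backward induction; returns Q(stack, a) for stakes a = 0..stack."""
    top = stack * 2 ** hands                  # largest reachable stack
    value = [log(wealth(s)) for s in range(top + 1)]   # terminal reward
    for _ in range(hands - 1):                # later hands, solved last to first
        value = [
            max(p * value[s + a] + (1 - p) * value[s - a]
                for a in range(min(s, top - s) + 1))
            for s in range(top + 1)
        ]
    return [p * value[stack + a] + (1 - p) * value[stack - a]
            for a in range(stack + 1)]


def best_stake(q_values):
    """Index (stake size) of the largest action value."""
    return max(range(len(q_values)), key=q_values.__getitem__)


random.seed(0)
draws = [random.betavariate(PRIOR_A + WINS, PRIOR_B + HANDS_SEEN - WINS)
         for _ in range(400)]
q_by_draw = [first_hand_q(p) for p in draws]

# Distribution over the decision: which stake each posterior draw prefers.
stake_votes = [best_stake(q) for q in q_by_draw]

# Posterior-averaged action values, then one decision from the average.
q_posterior = [sum(col) / len(col) for col in zip(*q_by_draw)]
stake_posterior = best_stake(q_posterior)

# Point-estimate alternative: plug in the raw win rate and trust it.
stake_plugin = best_stake(first_hand_q(WINS / HANDS_SEEN))

print("plug-in stake:", stake_plugin)
print("posterior-averaged stake:", stake_posterior)
print("share of draws that stake nothing:", stake_votes.count(0) / len(stake_votes))
```

With these numbers the plug-in estimate stakes more chips than the posterior-averaged decision, because ten hands barely move a prior centered on no edge. More observations pull the two together.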

What the sign-varying field looks like:

| Region of the simplex | DE vs ICM | Kelly layer | Net |
| --- | --- | --- | --- |
| Short-stack corner | Looser (big stack busts less than ICM says) | Tighter | Depends on B |
| Middle of the simplex | ≈ ICM | Tighter | Tighter |
| Big-stack corner | Press more (big stack worth more than ICM says) | Press less | Depends on B |

This is why “the correction is worth X% of EV” is a category error. EV is the wrong currency, and the correction isn’t a scalar — it’s a field that changes sign depending on where in the simplex you’re standing.


References for Part 2:

  • Richard Bellman (1957), Dynamic Programming, Princeton University Press.
  • Thomas Bayes, “An Essay towards solving a Problem in the Doctrine of Chances,” communicated posthumously by Richard Price to the Royal Society, Philosophical Transactions 1763.
  • Shun-ichi Amari (1985), Differential-Geometrical Methods in Statistics.
  • Lloyd Shapley (1953), “A Value for n-Person Games,” Contributions to the Theory of Games II.
  • Christopher Watkins (1989), “Learning from Delayed Rewards” (PhD thesis, Cambridge; the original Q-learning paper).
  • Noam Brown and Tuomas Sandholm (2017), “Superhuman AI for heads-up no-limit poker: Libratus beats top professionals,” Science 359:418–424.
  • Noam Brown and Tuomas Sandholm (2019), “Superhuman AI for multiplayer poker” (Pluribus).
  • Tom Cover and Joy Thomas (1991/2006), Elements of Information Theory.

Part 1 is here.

About FelixD

Joined bitB Staking as an intern, left as CFO. Now founder of Mota GmbH.