“Problem 14 Dynamic Programming Solutions”, Gwern, FeepingCreature, nshepperd, Khoth2022-10-02 (, , , ; backlinks)⁠:

Timothy Falcon’s quant-interview problem 14 asks for the optimal stopping strategy when playing a card-drawing game of  l cards where red = +$1 & black = −$1; the value approaches 0.5 × √l. I re-solve it with dynamic programming in R, and others in Neat, Haskell & C with increasing efficiency.

Problem 14 is a probability question about optimal stopping in a card game where one draws from a finite deck of good & bad cards; the goal is to stop when the random walk has fluctuated to as a high good value as likely; what is the expected payoff given optimal play? This question has some counterintuitive aspects and has been used as an interview question.

It can be treated as a multi-stage decision problem and solved by top-down & bottom-up dynamic programming, to estimate the value of optimal play (and thus provide the actual optimal strategy). The value of a game with a standard deck of 52 (26 good vs 26 bad) cards is ~$2.62; the value increases as the square root of cards.

Naive implementations which take hours or error out can be optimized down to milliseconds. We build on the original spreadsheet answer by providing a top-down DP (naive memoization) implementation in R (verified in simulation), Neat implementations (naive & optimized to use arrays), and an optimized C version; these have 𝒪(l2) time/space computational complexity, limiting their range. Then we provide the more efficient bottom-up solution in Haskell & C, which need 𝒪(l2) time but only 𝒪(l) space.

While it’s not easy to change the 𝒪(l2) complexity without problem-specific analysis or approximation, modern computers are so powerful that by improving the constant factors, we can still calculate shockingly high card counts. The efficient C bottom-up can be made faster by careful use of parallelism at the CPU-core level using vector instructions, and then by breaking the problem up into individual DP problems which can be solved independently in different CPU threads or processes.

This final version lets us calculate values of Problem 14 from the original 52 cards up to 133.7 million cards (value of $6,038.32), and fit approximations which confirm the conjecture that it increases as a square-root. (An approximation yields <0.002% relative error at 133.7m card count.)