There has been a lot of recent interest in the abc conjecture, since the release a few weeks ago of the last of a series of papers by Shinichi Mochizuki which, as one of its major applications, claims to establish this conjecture. It’s still far too early to judge whether this proof is likely to be correct or not (the argument encompasses several hundred pages, mostly in the area of anabelian geometry, in which very few mathematicians are expert, to the extent that we do not even have a full outline of the proof strategy yet), and I don’t have anything substantial to add to the existing discussion around that conjecture. (But, for those who are interested, the Polymath wiki page on the ABC conjecture has collected most of the links to that discussion, and to various background materials.)

In the meantime, though, I thought I might give the standard probabilistic heuristic argument that explains why we expect the ABC conjecture to be true. The underlying heuristic is a common one, used throughout number theory, and it can be summarised as follows:

Heuristic 1 (Probabilistic heuristic) Even though number theory is a deterministic subject (one does not need to roll any dice to factorise a number, or figure out if a number is prime), one expects to get a good asymptotic prediction for the answers to many number-theoretic questions by pretending that various number-theoretic assertions {E} (e.g. that a given number {n} is prime) are probabilistic events (with a probability {\mathop{\bf P}(E)} that can vary between {0} and {1}) rather than deterministic events (that are either always true or always false). Furthermore:

  • (Basic heuristic) If two or more of these heuristically probabilistic events have no obvious reason to be strongly correlated to each other, then we should expect them to behave as if they were (jointly) independent.
  • (Advanced heuristic) If two or more of these heuristically probabilistic events have some obvious correlation between them, but no further correlations are suspected, then we should expect them to behave as if they were conditionally independent, relative to whatever data is causing the correlation.

This is, of course, an extremely vague and completely non-rigorous heuristic, requiring (among other things) a subjective and ad hoc determination of what an “obvious reason” is, but in practice it tends to give remarkably plausible predictions, some fraction of which can in fact be backed up by rigorous argument (although in many cases, the actual argument has almost nothing in common with the probabilistic heuristic). A famous special case of this heuristic is the Cramér random model for the primes, but it is by no means the only instance of this heuristic.

To give the most precise predictions, one should use the advanced heuristic in Heuristic 1, but this can be somewhat complicated to execute, and so we shall focus instead on the predictions given by the basic heuristic (thus ignoring the presence of some number-theoretic correlations), which tends to give predictions that are quantitatively inaccurate but still reasonably good at the qualitative level.

Here is a basic “corollary” of Heuristic 1:

Heuristic 2 (Heuristic Borel-Cantelli) Suppose one has a sequence {E_1, E_2, \ldots} of number-theoretic statements, which we heuristically interpret as probabilistic events with probabilities {\mathop{\bf P}(E_1), \mathop{\bf P}(E_2), \ldots}. Suppose also that we know of no obvious reason for these events to have much of a correlation with each other. Then:

  • If {\sum_{i=1}^\infty \mathop{\bf P}(E_i) < \infty}, we expect only finitely many of the statements {E_n} to be true. (And if {\sum_{i=1}^\infty \mathop{\bf P}(E_i)} is much smaller than {1}, we in fact expect none of the {E_n} to be true.)
  • If {\sum_{i=1}^\infty \mathop{\bf P}(E_i) = \infty}, we expect infinitely many of the statements {E_n} to be true.

This heuristic is motivated both by the Borel-Cantelli lemma, and by the standard probabilistic computation that if one is given jointly independent, and genuinely probabilistic, events {E_1, E_2, \ldots} with {\sum_{i=1}^\infty \mathop{\bf P}(E_i) = \infty}, then one almost surely has an infinite number of the {E_i} occurring.
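
To see the heuristic operating in a genuinely random setting, here is a quick Python simulation (included purely for illustration; the probability sequences {1/i} and {1/i^2} are arbitrary choices) contrasting the divergent and convergent cases:

```python
import random

# Toy simulation of the Borel-Cantelli dichotomy.  The probability
# sequences 1/i (divergent sum) and 1/i^2 (convergent sum) are
# arbitrary illustrative choices; the events here are genuinely
# independent by construction.
random.seed(0)
N = 10 ** 6
for label, prob in (("1/i", lambda i: 1.0 / i),
                    ("1/i^2", lambda i: 1.0 / i ** 2)):
    hits = [i for i in range(1, N + 1) if random.random() < prob(i)]
    # Divergent case: occurrences keep appearing at all scales up to N.
    # Convergent case: only a handful of occurrences, all with small i.
    print(label, "count:", len(hits), "largest:", max(hits))
```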

Before we get to the ABC conjecture, let us give two simpler (and well known) demonstrations of these heuristics in action:

Example 1 (Twin prime conjecture) One can heuristically justify the twin prime conjecture as follows. Using the prime number theorem, one can heuristically assign a probability of {1/\log n} to the event that any given large integer {n} is prime. In particular, the probability that {n+2} is prime will then be {1/\log(n+2)}. Making the assumption that there are no strong correlations between these events, we are led to the prediction that the probability that {n} and {n+2} are simultaneously prime is {\frac{1}{(\log n)(\log(n+2))}}. Since {\sum_{n=2}^\infty \frac{1}{(\log n) (\log(n+2))} = \infty}, the Borel-Cantelli heuristic then predicts that there should be infinitely many twin primes.

Note that the above argument is a bit too naive, because there are some non-trivial correlations between the primality of {n} and the primality of {n+2}. Most obviously, if {n} is prime, this greatly increases the probability that {n} is odd, which implies that {n+2} is odd, which then elevates the probability that {n+2} is prime. A bit more subtly, if {n} is prime, then {n} is likely to avoid the residue class {0 \hbox{ mod } 3}, which means that {n+2} avoids the residue class {2 \hbox{ mod } 3}, which ends up decreasing the probability that {n+2} is prime. However, there is a standard way to correct for these local correlations; see for instance this previous blog post. As it turns out, these local correlations ultimately alter the prediction for the asymptotic density of twin primes by a constant factor (the twin prime constant), but do not affect the qualitative prediction of there being infinitely many twin primes.
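
As a quick numerical sanity check of this prediction (the following snippet, and the cutoff {N = 10^6}, are my own illustrative choices), one can sieve the twin primes directly and compare against the naive count:

```python
import math

# Compare the naive heuristic count sum_{3 <= n <= N} 1/(log n log(n+2))
# with the actual number of twin prime pairs up to N (via a simple
# sieve of Eratosthenes).  The cutoff N is arbitrary; the ratio of the
# true count to the naive prediction should hover near the constant
# factor of roughly 1.32 coming from the local corrections above.
N = 10 ** 6
is_prime = bytearray([1]) * (N + 3)
is_prime[0] = is_prime[1] = 0
for p in range(2, math.isqrt(N + 2) + 1):
    if is_prime[p]:
        for m in range(p * p, N + 3, p):
            is_prime[m] = 0

actual = sum(1 for n in range(3, N + 1) if is_prime[n] and is_prime[n + 2])
naive = sum(1 / (math.log(n) * math.log(n + 2)) for n in range(3, N + 1))
print(actual, naive, actual / naive)
```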

Example 2 (Fermat’s last theorem) Let us now heuristically count the number of solutions to {x^n+y^n=z^n} for various {n} and natural numbers {x,y,z} (which we can reduce to be coprime if desired). We recast this (in the spirit of the ABC conjecture) as {a+b=c}, where {a,b,c} are {n^{th}} powers. The number of {n^{th}} powers up to any given number {N} is about {N^{1/n}}, so heuristically any given natural number {a} has a probability about {a^{1/n - 1}} of being an {n^{th}} power. If we make the naive assumption that (in the coprime case at least) there is no strong correlation between the events that {a} is an {n^{th}} power, that {b} is an {n^{th}} power, and that {a+b} is an {n^{th}} power, then for typical {a,b}, the probability that {a,b,a+b} are all simultaneously {n^{th}} powers would then be {a^{1/n-1} b^{1/n-1} (a+b)^{1/n-1}}. For fixed {n}, the total number of solutions to the Fermat equation would then be predicted to be

\displaystyle  \sum_{a=1}^\infty \sum_{b=1}^\infty a^{1/n-1} b^{1/n-1} (a+b)^{1/n-1}.

(Strictly speaking, we need to restrict to the coprime case, but given that a positive density of pairs of integers are coprime, it should not affect the qualitative conclusion significantly if we now omit this restriction.) It might not be immediately obvious whether this sum converges or diverges, but (as is often the case with these sorts of unsigned sums) one can clarify the situation by dyadic decomposition. Suppose for instance that we consider the portion of the sum where {c=a+b} lies between {2^k} and {2^{k+1}}. Then this portion of the sum can be controlled by

\displaystyle  \sum_{a \leq 2^{k+1}} \sum_{b \leq 2^{k+1}} a^{1/n-1} b^{1/n-1} O( ( 2^k )^{1/n - 1} )

which, after evaluating the sums in {a} and {b} (each of which is {O( 2^{k/n} )}), simplifies to

\displaystyle  O( 2^{(3/n - 1)k} ).

Summing in {k}, one thus expects infinitely many solutions for {n=2}, only finitely many solutions for {n>3} (indeed, a refinement of this argument shows that one expects only finitely many solutions even if one considers all {n>3} at once), and a borderline prediction of there being a barely infinite number of solutions when {n=3}. Here, of course, is a place where a naive application of the probabilistic heuristic breaks down; there is enough arithmetic structure in the equation {x^3+y^3=z^3} that the naive probabilistic prediction ends up being an inaccurate model. Indeed, while this heuristic suggests that a typical homogeneous cubic should have a logarithmic number of integer solutions up to a given height {N}, it turns out that some homogeneous cubics (namely, those associated to elliptic curves of positive rank) end up with the bulk of these solutions, while other homogeneous cubics (including those associated to elliptic curves of rank zero, such as the Fermat curve {x^3+y^3=z^3}) only get finitely many solutions. The reasons for this are subtle, but certainly the high degree of arithmetic structure present in an elliptic curve (starting with the elliptic curve group law, which allows one to generate new solutions from old ones, and which can also be used to exclude solutions to {x^3+y^3=z^3} via the method of descent) is a major contributing factor.
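
One can also probe the dyadic analysis numerically; the following crude truncation (with arbitrary cutoffs {M}, and of course a truncated sum can only suggest, not prove, convergence or divergence) displays the three regimes:

```python
# Crude numerical truncation of the heuristic sum
#   sum_{a,b >= 1} (a * b * (a+b))^(1/n - 1)
# at a, b <= M, for a few arbitrary cutoffs M.  The partial sums grow
# steadily for n = 2 (divergence), creep upward for n = 3 (logarithmic
# divergence), and level off for n = 4 (convergence).
for n in (2, 3, 4):
    partial_sums = []
    for M in (100, 300, 900):
        s = sum((a * b * (a + b)) ** (1.0 / n - 1.0)
                for a in range(1, M + 1) for b in range(1, M + 1))
        partial_sums.append(round(s, 2))
    print(n, partial_sums)
```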

Below the fold, we apply similar heuristics to suggest the truth of the ABC conjecture.

— 1. The ABC conjecture —

The ABC conjecture asserts that for every {\epsilon > 0}, one has

\displaystyle  c \ll_\epsilon \hbox{rad}(abc)^{1+\epsilon}

whenever {a,b,c} are coprime natural numbers with {a+b=c}, and {\hbox{rad}(abc)} is the product of all the primes dividing {abc}. Since {a,b,c} are coprime, {\hbox{rad}(abc) = \hbox{rad}(a) \hbox{rad}(b) \hbox{rad}(c)}, and an equivalent formulation of the conjecture is as follows: if {\alpha,\beta,\gamma \geq 0} are real numbers with {\alpha+\beta+\gamma < 1}, then for sufficiently large {N}, there does not exist any solution to the equation {a+b=c} with {N \leq c \leq 2N} (say), {a,b,c} coprime, and

\displaystyle  \hbox{rad}(a) \leq N^\alpha; \hbox{rad}(b) \leq N^\beta; \hbox{rad}(c) \leq N^\gamma.

Indeed, one can deduce the former version of the ABC conjecture from the latter by using a finite mesh of triples {(\alpha,\beta,\gamma)} with spacing equal to a small multiple of {\epsilon}; we leave the details to the interested reader.
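
For concreteness, here is a short computation of the “quality” {\log c / \log \hbox{rad}(abc)} of a famous triple (the helper function below is a simple trial-division implementation written for this illustration); the ABC conjecture asserts precisely that qualities exceeding {1+\epsilon} occur only finitely often for each fixed {\epsilon > 0}:

```python
from math import log

def rad(n):
    """Product of the distinct primes dividing n, by trial division."""
    r, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            r *= p
            while n % p == 0:
                n //= p
        p += 1
    return r * n if n > 1 else r

# Reyssat's triple 2 + 3^10 * 109 = 23^5, whose quality of about
# 1.6299 is the highest currently known.
a, b = 2, 3 ** 10 * 109
c = a + b
assert c == 23 ** 5
print(log(c) / log(rad(a) * rad(b) * rad(c)))
```

(The radical here is {2 \cdot 3 \cdot 109 \cdot 23 = 15042}, far smaller than {c = 6436343}.)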

We can now try to randomly find counterexamples to the ABC conjecture as follows (a small brute-force variant of this search is sketched in code just after the list):

  1. Pick a large {N} (say, a power of two).
  2. Pick coprime squarefree numbers {x \leq N^\alpha}, {y \leq N^\beta}, {z \leq N^\gamma}.
  3. Pick numbers {a,b,c = O(N)} with {\hbox{rad}(a) = x}, {\hbox{rad}(b) = y}, and {\hbox{rad}(c) = z}, taking {c} comparable to {N}.
  4. Check if {a+b=c}.
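
Here is the brute-force variant promised above (a toy search with an arbitrary cutoff; finding a few triples of quality greater than {1} is of course entirely consistent with the ABC conjecture, which only forbids an infinite family of triples of quality at least {1+\epsilon}):

```python
from math import gcd, log

# Sieve rad(n) for all n <= C: for each prime p, multiply the stored
# radical of every multiple of p by p.  The cutoff C is arbitrary.
C = 2000
radv = [1] * (C + 1)
for p in range(2, C + 1):
    if radv[p] == 1:  # no smaller prime divides p, so p is prime
        for m in range(p, C + 1, p):
            radv[m] *= p

# List the coprime triples a + b = c <= C with rad(abc) < c, i.e. with
# quality log c / log rad(abc) exceeding 1, and print the best few.
triples = []
for c in range(3, C + 1):
    for a in range(1, c // 2 + 1):
        b = c - a
        r = radv[a] * radv[b] * radv[c]
        if gcd(a, b) == 1 and r < c:
            triples.append((log(c) / log(r), a, b, c))
for q, a, b, c in sorted(triples, reverse=True)[:5]:
    print(f"quality {q:.3f}: {a} + {b} = {c}")
```

With this cutoff, the search turns up classical examples such as {3 + 125 = 128}, whose quality is about {1.43}.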

Steps 1, 2, and 4 are easy to apply probabilistic heuristics to. For Step 3, we need the following lemma, reminiscent of the classical divisor bound:

Lemma 3 Let {x \leq N} be a square-free integer. Then there are at most {N^{o(1)}} integers {n} less than {N} with radical {x}.

Proof: Factorise {x = p_1 \ldots p_k}. We are trying to establish the bound {A = O(N^{o(1)})}, where {A} is the number of numbers {n} less than {N} which are divisible by {p_1,\ldots,p_k} but by no other primes, that is to say they lie in the set {S := \{ p_1^{a_1} \ldots p_k^{a_k}: a_1,\ldots,a_k \geq 1 \}}. To do this, it is convenient to work with Dirichlet series, and specifically with the sum

\displaystyle  \sum_{n \in S} \frac{1}{n^{\epsilon}}

for some parameter {\epsilon>0} to be optimised later. On the one hand, this expression is at least {N^{-\epsilon} A}. On the other hand, we have the factorisation

\displaystyle  \sum_{n \in S} \frac{1}{n^{\epsilon}} = \prod_{i=1}^k (\frac{1}{p_i^\epsilon} + \frac{1}{p_i^{2\epsilon}} + \ldots).

We crudely bound

\displaystyle  (\frac{1}{p_i^\epsilon} + \frac{1}{p_i^{2\epsilon}} + \ldots) \leq C_\epsilon

for some {C_\epsilon} depending only on {\epsilon}, using the trivial bound {p_i \geq 2} and the geometric series formula, leading to

\displaystyle  A \leq N^\epsilon C_\epsilon^k.

On the other hand, since

\displaystyle  \sum_{i=1}^k \log p_i = \log x \leq \log N,

we see from the prime number theorem (which gives a lower bound on the sum of the logarithms of the first {k} primes, which in turn lower bounds {\sum_{i=1}^k \log p_i}) that

\displaystyle  k \ll \frac{\log N}{\log \log N} = o(\log N),

and so

\displaystyle  A \leq N^{\epsilon + o(1)}

for any fixed {\epsilon}; as {\epsilon>0} can be arbitrarily small, the claim follows. \Box
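
One can also check the conclusion of the lemma empirically; the following sketch (the function, and the choices of radical {x = 2 \cdot 3 \cdot 5 = 30} and cutoffs {N}, are my own) enumerates the set {S} directly:

```python
# Count the integers n <= N with radical exactly x = p_1 ... p_k, i.e.
# the elements of S = {p_1^{a_1} ... p_k^{a_k} : a_i >= 1} up to N, by
# depth-first search over the exponents.  The radical 2*3*5 = 30 and
# the cutoffs N are arbitrary illustrative choices.
def count_with_radical(primes, N, cur=1, i=0):
    if i == len(primes):
        return 1  # cur now has exactly the prescribed radical
    total, m = 0, cur * primes[i]  # p_i must appear at least once
    while m <= N:
        total += count_with_radical(primes, N, m, i + 1)
        m *= primes[i]
    return total

for e in (4, 6, 8, 10):
    # The counts grow far more slowly than N itself, consistent with
    # the N^{o(1)} bound of Lemma 3.
    print(10 ** e, count_with_radical([2, 3, 5], 10 ** e))
```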

Now we can run the probabilistic heuristics. After selecting {N} as a large power of two in Step 1, we have {O(N^{\alpha+\beta+\gamma})} choices for {x,y,z} in Step 2. By the above lemma, for each {x,y,z}, there are {O(N^{o(1)})} choices for {a,b,c} that are of magnitude {O(N)}, and assuming (in the case when {a,b,c} are coprime) that there is no prior correlation between {c} and {a+b}, which we think of as being randomly distributed amongst the numbers of size {O(N)}, the probability that {a+b=c} should be of order {1/N}. This leads to a total probability of {\sum_N O( N^{\alpha+\beta+\gamma-1+o(1)})}; since {\alpha+\beta+\gamma<1}, this sum is convergent (for {N} ranging over powers of {2}), so we expect only finitely many counterexamples, thus supporting the ABC conjecture.

Remark 1 In the case of {x^3+y^3=z^3}, the naive heuristic prediction was incorrect, basically because of the algebraic structure in the Fermat curve {x^3+y^3=z^3} (which, among other things, is an elliptic curve and thus enjoys a group law). In the case of the general {a+b=c} equation, though, no analogous algebraic structure appears to be present. So there is no obvious correlation here. (Of course, this does not rule out the possibility of a much less obvious correlation; this is one of the reasons why the above arguments are only heuristic, and fall well short of a rigorous proof of anything.)