Survey of Discrete Log Algorithms

In 1999 Dan Boneh published a survey paper called “Twenty years of attacks on the RSA cryptosystem”. I like to tell people that if they read the entire paper they can solve just about every CTF challenge involving RSA.

This post is my attempt to do something similar with the discrete log problem, a primitive that underlies a bunch of cryptographic protocols including the Diffie-Hellman key exchange and the ElGamal encryption system. The algorithms all have similar-sounding names, and I wanted to just sit down and describe each of them to get them straight in my head.

The Discrete Log Problem

Suppose we have a group $G$ generated by an element $g$ (this means that every element can be written as $g^k$ for some integer $k$). The discrete log of an element $y = g^x$ is the number $x$. For cyclic groups of order $n$, discrete logs are unique modulo $n$.

A common choice for $G$ in cryptography is $(\mathbb{Z}_p)^\times$, or the multiplicative group modulo the prime $p$. Fermat’s little theorem tells us that $g^{p-1} \equiv 1 \pmod{p}$, so the order of $(\mathbb{Z}_p)^\times$ is $p - 1$.
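
As a quick sanity check in Python (the prime here is an arbitrary toy value):

```python
# Sanity check of Fermat's little theorem for a small, arbitrary prime.
p = 101
for g in range(1, p):
    assert pow(g, p - 1, p) == 1  # g^(p-1) = 1 (mod p) for every g in (Z_p)^x
```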

If the order of the cyclic group is not a smooth number, i.e. if $p - 1$ has at least one large prime factor, then it is believed that computing discrete logarithms of elements of the group is hard.

This hardness assumption forms the basis for a large portion of modern cryptography. As a result, it’s important to understand the best algorithms that exist to compute discrete logs.

Approach #1: Brute Force

Given an element $y$ in a group $G$ with generator $g$, we’re trying to find an integer $x$ such that $g^x = y$. A naive way to do this is to simply try all possible $x$. This approach has time complexity $O(n)$, where $n$ is the order of the group, and space complexity $O(1)$.
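
A minimal sketch in Python, assuming we’re working in $(\mathbb{Z}_p)^\times$ (the values in the usage comment are arbitrary toys):

```python
def brute_force_dlog(g, y, p):
    """Find x with g^x = y (mod p) by trying every exponent in turn."""
    acc = 1                      # acc holds g^x as x counts up
    for x in range(p - 1):       # the group (Z_p)^x has order n = p - 1
        if acc == y:
            return x
        acc = acc * g % p
    return None                  # y is not a power of g

# e.g. brute_force_dlog(3, 15, 17) == 6, since 3^6 = 729 = 15 (mod 17)
```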

This website is secured using an elliptic curve group with order $n = 2^{256} - 2^{224} + 2^{192} - 89188191075325690597107910205041859247$. Using this approach to find a discrete log in this group would take around $2^{256}$ trials, which is a little much.

Approach #2: Baby-step giant-step

There’s a relatively easy way to improve the running time of the brute-force approach using a common technique in cryptography. We’re trying to find $x$ such that $g^x = y$. Let’s rewrite $x$ as $im + j$ where $m = \lceil \sqrt{n} \rceil$, for integers $i$ and $j$ between $0$ and $m$. Now we can rewrite $y = g^x = g^{im+j}$ as $(g^{-m})^i y = g^j$.

Next, let’s compute and store $g^j$ for all possible values of $j$ in a set, and then run through every possible value of $i$ and check when $(g^{-m})^i y$ is in our set. Once we find a colliding $i$ and $j$, we can compute $x = im + j$.
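
Here’s a sketch in Python (toy parameters in the usage comment; `pow(g, -m, p)` for the modular inverse requires Python 3.8+):

```python
from math import isqrt

def bsgs(g, y, p):
    """Baby-step giant-step: find x with g^x = y (mod p)."""
    n = p - 1                        # order of the group
    m = isqrt(n) + 1                 # m = ceil(sqrt(n))
    # Baby steps: store g^j for j = 0..m-1.
    baby, acc = {}, 1
    for j in range(m):
        baby.setdefault(acc, j)
        acc = acc * g % p
    # Giant steps: repeatedly multiply y by g^(-m), looking for a hit.
    g_inv_m, gamma = pow(g, -m, p), y
    for i in range(m):
        if gamma in baby:
            return i * m + baby[gamma]
        gamma = gamma * g_inv_m % p
    return None

# e.g. bsgs(3, 15, 17) == 6
```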

This is an example of a space-time tradeoff; we’ve essentially turned an $O(n)$ time, $O(1)$ space algorithm into an $O(\sqrt{n})$ time, $O(\sqrt{n})$ space algorithm.

The name stems from the fact that every increment of $j$ represents a “baby-step” of size $g^1$ while every increment of $i$ represents a “giant-step” of size $g^{-m}$.

Approach #3: Pollard’s ρ algorithm

It turns out it’s possible to compute discrete logs with $O(\sqrt{n})$ time and $O(1)$ space. Here’s the idea:

First, partition the group into three pairwise-disjoint, approximately equally sized sets $S_0$, $S_1$, and $S_2$ such that $G = S_0 \cup S_1 \cup S_2$. This partition should be made at random.

Consider the following sequence $h_i = g^{a_i} y^{b_i}$, where $a_0 = b_0 = 1$ and $h_i = f(h_{i-1})$, where

$$f(h) = \begin{cases} gh, & h \in S_0 \\ h^2, & h \in S_1 \\ yh, & h \in S_2 \end{cases}$$

Basically, a third of the time we increase $a_i$ by one, a third of the time we increase $b_i$ by one, and a third of the time we double both $a_i$ and $b_i$. This is just a convenient way to make the sequence walk through the space of $g^{a_i} y^{b_i}$ pseudorandomly.
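
One way to realize this in code is a sketch like the following, which uses the residue of $h$ mod 3 as the partition (a common heuristic standing in for a truly random split) and carries the exponents $a$ and $b$ along with $h$:

```python
def step(h, a, b, g, y, p, n):
    """One step of the rho walk: return f(h) and the updated exponents a, b."""
    if h % 3 == 0:      # h in S_0: multiply by g, so a increases by 1
        return h * g % p, (a + 1) % n, b
    elif h % 3 == 1:    # h in S_1: square h, so both a and b double
        return h * h % p, 2 * a % n, 2 * b % n
    else:               # h in S_2: multiply by y, so b increases by 1
        return h * y % p, a, (b + 1) % n
```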

Why do we want to walk through this space? Well, consider what happens if we find a collision; that is, four integers $a, b, a', b'$ such that

$$g^a y^b = g^{a'} y^{b'}.$$

We can rewrite this as $y = g^{(a - a')/(b' - b)}$ and then compute the discrete log of $y$ as $(a - a')/(b' - b)$ (division just means “multiply by the inverse modulo $n$, the order of the group”).

So how do we find a collision in our sequence using $O(1)$ space? A collision means that our sequence contains a cycle, and there’s a common trick that frequently pops up in algorithms interviews for finding cycles in linked lists using $O(1)$ space.

The idea is to use two iterators. One iterator advances through the sequence one element at a time, while the other advances two elements at a time. If the linked list has a cycle, the two iterators will, at some point, have the same value.

For our sequence, we’ll use two variables, $s$ and $t$, both initialized to $h_0$. On each step of our algorithm, we’ll set $s$ to $f(s)$ and $t$ to $f(f(t))$ and check if the two are equal. If they are, we have a collision and can solve the DLP. If not, continue.
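
A sketch of the full algorithm, building on the `step` function above (toy parameters in the usage comment; $n$ is assumed prime so the inverse modulo $n$ exists):

```python
def pollard_rho_dlog(g, y, p, n):
    """Pollard's rho: find x with g^x = y (mod p); n is the group order."""
    s, sa, sb = g * y % p, 1, 1      # h_0 = g^1 * y^1
    t, ta, tb = s, sa, sb
    while True:
        s, sa, sb = step(s, sa, sb, g, y, p, n)          # tortoise: one step
        t, ta, tb = step(*step(t, ta, tb, g, y, p, n),   # hare: two steps
                         g, y, p, n)
        if s == t:
            break
    # g^sa * y^sb = g^ta * y^tb  =>  x = (sa - ta) / (tb - sb)  (mod n)
    denom = (tb - sb) % n
    if denom == 0:
        return None                  # degenerate collision; retry the walk
    return (sa - ta) * pow(denom, -1, n) % n

# e.g. pollard_rho_dlog(2, 5, 1019, 509) should return 10,
# since 2 has order 509 mod 1019 and 2^10 = 1024 = 5 (mod 1019)
```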

By the birthday paradox, it should take on the order of $\sqrt{n}$ steps before we find a collision, so the running time is $O(\sqrt{n})$.

The name “rho” stems from the fact that the sequence with a cycle, when considered as a directed graph, takes the shape of the Greek letter ρ. Pollard also has an analogous algorithm for factorization.

Approach #4: Pollard’s kangaroo algorithm

In the same paper where he introduced his rho algorithm, Pollard also discussed a different algorithm to solve the discrete log problem. This algorithm is most useful when you already know some information about $x$. Specifically, if you know that $a \le x \le b$, then this algorithm will take $O(\sqrt{b-a})$ time on expectation.

Here’s how it works: Once again we’ll define a pseudorandom sequence $x_i$ where

$$x_{i+1} = x_i + 2^{g^{x_i} \bmod k}$$

for some small $k$. (In practice I’ve found that setting $k = \log_2(b-a)/2$ works pretty well.) Also define $y_i = g^{x_i} \pmod{p}$ and note that

$$y_{i+1} = y_i \cdot g^{2^{y_i \bmod k}}.$$

The point is that on each step, this sequence jumps forward by a random-looking power of two between $2^0$ and $2^{k-1}$. In his paper, Pollard makes an analogy to a kangaroo leaping forward by a random distance on each jump.

The first step is to launch a “tame” kangaroo starting from $x_0 = b$ and tell it to jump $N$ times, for some fixed $N$. Mathematically this means computing $x_N$ and $y_N$ starting from $x_0 = b$ and $y_0 = g^b$. Note that we don’t need to store the values of the intermediate hops; we just make our $N$ jumps and then stop.

Now, let’s try to use our tame kangaroo to catch a wild kangaroo. Define a similar sequence starting from $y_0 = y$. Repeatedly compute the next value in the sequence and check if it is equal to $y_N$. We don’t know the corresponding values of $x_i$ (indeed, that’s the value we’re trying to find), but we can keep track of how much distance we’ve hopped from the initial $x$, because on each hop we travel $2^{y_i \bmod k}$.

The hope is that the wild kangaroo will eventually land on the same location as one of the tame kangaroo’s jumps. Once the two kangaroos are on the same point, every subsequent jump will be identical, since the sequence is generated deterministically; thus, if the wild kangaroo ever lands on a spot the tame kangaroo visited, it will eventually end up on the tame kangaroo’s final location $y_N$.

Picture the two paths as arrows on a number line: blue arrows for the tame kangaroo and red arrows for the wild kangaroo. If the wild kangaroo eventually hits $y_N$ after traveling a distance $d$, we can compute:

$$y_0 \cdot g^d = y_N = g^{x_N}$$

$$y = y_0 = g^{x_N - d}$$

so the discrete log of $y$ is $x = x_N - d$.

There are a couple of remaining problems. This algorithm is inherently probabilistic, and it’s possible that the wild kangaroo will never overlap with the tame kangaroo. We can detect when this has happened by stopping the wild kangaroo after it has traveled a distance $d > x_N - a$. Since we know our wild kangaroo starts somewhere between $a$ and $b$, if it has traveled more than a distance of $x_N - a$ it has definitely passed our tame kangaroo sitting at $x_N$.

If this happens, the algorithm has failed, and we’ll need to try again with a different tame kangaroo (start the tame kangaroo at $b + 1$ instead of $b$, for example).
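
Here’s a sketch of one run in Python, following the heuristics above (the values in the usage comment are toys; a `None` result just means “retry”):

```python
from math import log2

def kangaroo(g, y, p, a, b):
    """One run of Pollard's kangaroo: find x in [a, b] with g^x = y (mod p)."""
    k = max(int(log2(b - a) / 2), 1)

    def jump(h):
        return 1 << (h % k)              # hop lengths 2^0 .. 2^(k-1)

    # Tame kangaroo: N jumps starting from x_0 = b.
    N = (1 << (k + 2)) // k              # N = 2^(k+2) / k
    x_tame, y_tame = b, pow(g, b, p)
    for _ in range(N):
        hop = jump(y_tame)
        x_tame += hop
        y_tame = y_tame * pow(g, hop, p) % p

    # Wild kangaroo: starts at the unknown x, i.e. at the group element y.
    d, y_wild = 0, y
    while d <= x_tame - a:               # past this, we've overshot the trap
        if y_wild == y_tame:
            return x_tame - d            # x = x_N - d
        hop = jump(y_wild)
        d += hop
        y_wild = y_wild * pow(g, hop, p) % p
    return None                          # missed; retry with a tweaked start

# e.g. kangaroo(2, 5, 1019, 0, 20) should find 10 (or occasionally miss)
```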

Note that both Pollard’s ρ method and his kangaroo method are probabilistic, but in different ways. The ρ method takes an unbounded amount of time, but always succeeds when it terminates, while one iteration of the kangaroo method takes a bounded amount of time, but is not guaranteed to succeed. This is the difference between a Las Vegas algorithm and a Monte Carlo algorithm.

Finally, we need to choose an appropriate value of $N$ and then prove that the algorithm runs in time $O(\sqrt{b-a})$. We’d like to set $N$ such that the probability of succeeding on one iteration is constant (not dependent on $k$). Suppose the mean jump length is $\mu$. The probability of any one jump of the wild kangaroo landing on a jump of the tame kangaroo is approximately $1/\mu$. The tame kangaroo makes $N$ jumps, and we succeed as long as at least one of those jumps coincides with the wild kangaroo’s path. The probability of this is:

$$1 - \left(1 - \frac{1}{\mu}\right)^N.$$

If we set $N = 4\mu$, then this is approximately $1 - e^{-4} \approx 0.98$. The mean jump length is $2^k/k$, so a good choice is $N = 4 \cdot 2^k/k = 2^{k+2}/k$.

The total number of jumps taken by all the kangaroos on expectation is

$$N + \left(\frac{b-a}{2\mu} + N\right) = 8\left(\mu + \frac{b-a}{16\mu}\right).$$

The minimum value of this is $4\sqrt{b-a}$, which is achieved when $\mu = \sqrt{b-a}/4$ (by AM-GM, the sum $\mu + \frac{b-a}{16\mu}$ is minimized when the two terms are equal). (Incidentally, since $\mu \approx 2^k/k$, this justifies setting $k \approx \log_2(b-a)/2$.)

Thus, the algorithm takes $O(\sqrt{b-a})$ time when it succeeds, and will succeed with probability $\approx 0.98$.

Pollard has also referred to this algorithm as his λ algorithm because the shape of the converging kangaroo paths resembles the Greek letter lambda.

Approach #5: The Pohlig-Hellman algorithm

Remember how we said earlier that the order of the group must not be a smooth number? Here’s what goes wrong if $n$ has a bunch of small factors:

(Some group theory knowledge is helpful for comprehension.) Suppose that

$$n = \prod_{i=1}^{k} p_i^{e_i}$$

where the $p_i$’s are distinct primes. As usual we have $y = g^x$ and we want to find $x$.

For each $i$ we’ll compute $x \bmod p_i^{e_i}$ and then use the Chinese Remainder Theorem to find $x$ itself.

For each $i$, let $g_i = g^{n/p_i^{e_i}}$. This element has order $p_i^{e_i}$ (raise $g_i$ to this power to verify). Now consider

$$y_i = y^{n/p_i^{e_i}} = g^{x \cdot n/p_i^{e_i}} = g_i^x = g_i^{x \bmod p_i^{e_i}}$$

Since $p_i^{e_i}$ is small, we should be able to find the $r_i$ between $0$ and $p_i^{e_i} - 1$ such that $y_i \equiv g_i^{r_i} \pmod{p}$ using any of the previous DLP algorithms. Once we’ve done this for all $i$, we have a system of linear congruences of the form $x \equiv r_i \pmod{p_i^{e_i}}$. Using CRT, we can then derive $x$.
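
A sketch of the whole pipeline in Python, reusing baby-step giant-step for the small subproblems plus a small CRT helper (the usage comment works through a toy instance with $p = 31$ and $n = 30 = 2 \cdot 3 \cdot 5$):

```python
from math import isqrt

def dlog_small(g, y, p, order):
    """Baby-step giant-step when the order of g is known (and small)."""
    m = isqrt(order) + 1
    baby, acc = {}, 1
    for j in range(m):
        baby.setdefault(acc, j)
        acc = acc * g % p
    g_inv_m, gamma = pow(g, -m, p), y
    for i in range(m):
        if gamma in baby:
            return i * m + baby[gamma]
        gamma = gamma * g_inv_m % p
    return None

def crt(residues, moduli):
    """Combine x = r_i (mod m_i) for pairwise coprime moduli m_i."""
    x, m = 0, 1
    for r, mi in zip(residues, moduli):
        x += m * ((r - x) * pow(m, -1, mi) % mi)
        m *= mi
    return x

def pohlig_hellman(g, y, p, factors):
    """Solve g^x = y (mod p) given factors = {p_i: e_i} of n = p - 1."""
    n = p - 1
    residues, moduli = [], []
    for q, e in factors.items():
        qe = q ** e
        g_i = pow(g, n // qe, p)     # g_i has order q^e
        y_i = pow(y, n // qe, p)     # y_i = g_i^(x mod q^e)
        residues.append(dlog_small(g_i, y_i, p, qe))
        moduli.append(qe)
    return crt(residues, moduli)

# e.g. pohlig_hellman(3, 22, 31, {2: 1, 3: 1, 5: 1}) == 17,
# since 3^17 = 22 (mod 31)
```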

Note that $k$, the number of prime divisors of $n$, is bounded by $k < \log_2 n$, so it’s insignificant. The time complexity is then dominated by the size of the largest prime factor $p$ and is $O(\sqrt{p})$. The space complexity is $O(k) = O(\log n)$, but since we need this much space to store the number $n$ in the first place, we’ll call it $O(1)$.

Comparison table

| Algorithm | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute force | $O(n)$ | $O(1)$ | Don't use. |
| Baby-step giant-step | $O(\sqrt{n})$ | $O(\sqrt{n})$ | Easy to implement, but Pollard's ρ is better. |
| Pollard's ρ | $O(\sqrt{n})$ | $O(1)$ | Fastest known DLP algorithm for prime order. |
| Pollard's kangaroo | $O(\sqrt{b-a})$ | $O(1)$ | Use when you already know some information about $x$. |
| Pohlig-Hellman | $O(\sqrt{p})$ | $O(1)$ | Use when $n$ factors into many small primes. |