Survey of Discrete Log Algorithms
Dec 03, 2017In 1999 Dan Boneh published a survey paper called “Twenty years of attacks on the RSA cryptosystem”. I like to tell people that if they read the entire paper they can solve just about every CTF challenge involving RSA.
This post is my attempt to do something similar with the discrete log problem, a primitive that underlies a bunch of cryptographic protocols including the Diffie-Hellman key exchange and the ElGamal encryption system. The algorithms all have similar-sounding names, and I wanted to just sit down and describe each of them to get them straight in my head.
The Discrete Log Problem
Suppose we have a group generated by an element (this means that every element can be written as for some integer .) The discrete log of an element is the number . For cyclic groups of order , discrete logs are unique modulo .
A common choice for in cryptography is , or the multiplicative group modulo the prime . Fermat’s little theorem tells us that , so the order of is .
If the order of the cyclic group is not a smooth number, i.e. if has at least one large prime factor, then it is believed that it is hard to compute the discrete logarithm on elements of the group.
This hardness assumption forms the basis for a large portion of modern cryptography. As a result, it’s important to understand the best algorithms that exist to compute discrete logs.
Approach #1: Brute Force
Given an element in a group with generator , we’re trying to find an integer such that . A naive way to do this is to simply try all possible . This approach has time complexity where is the order of the group and space complexity .
This website is secured using an elliptic curve group with order Using this approach to find a discrete log in this group would take around trials, which is a little much.
Approach #2: Baby-step giant-step
There’s a relatively easy way to improve the running time of the brute-force approach using a common technique in cryptography. We’re trying to find such that . Let’s rewrite as where for integers and between and . Now we can rewrite as
Next, let’s compute and store for all possible values of in a set, and then run through every possible value of and check when is in our set. Once we find a colliding and , we can compute .
This is an example of a space-time tradeoff; we’ve essentially turned an time, space algorithm into an time, space algorithm.
The name stems from the fact that every increment of represents a “baby-step” of size while every increment of represents a “giant-step” of size .
Approach #3: Pollard’s algorithm
It turns out it’s possible to compute discrete logs with time and space. Here’s the idea:
First, partition the group into three pairwise-disjoint, approximately-equally sized sets and such that . This partition should be made at random.
Consider the following sequence where and where
Basically, a third of the time we increase by one, a third of the time we increase by one, and a third of the time we double both and . This is just a convenient way to make the sequence walk through the space of pseudorandomly.
Why do we want to walk through this space? Well, consider what happens if we find a collision; that is, four integers such that
We can rewrite this as and then compute the discrete log of as (division just means “multiply by the inverse modulo , the order of the group”).
So how do we find a collision in our sequence using space? A collision in our sequence means that our sequence contains a cycle, and there’s a common trick that frequently pops up in algorithms interviews that allows us to find cycles in linked lists using space.
The idea is to use two iterators. One iterator increments through the sequence by one at each step, while the other increments by two at each step. If the linked list has a cycle, the two iterators will, at some point, have the same value.
For our sequence, we’ll use two variables, and both initialized to . On each step of our algorithm, we’ll set to and to and check if the two are equal. If they are, we have a collision and can solve the DLP. If not, continue.
By the birthday paradox, it should take on the order of steps before we find a collision, so the running time is .
The name “rho” stems from the fact that the sequence with a cycle, when considered as a directed graph, takes the shape of the Greek letter . Pollard also has an analogous algorithm for factorization.
Approach #4: Pollard’s kangaroo algorithm
In the same paper where he introduced his rho-algorithm, Pollard also discussed a different algorithm to solve the discrete log problem. This algorithm is most useful when you already know some information about . Specifically, if you know that , then this algorithm will take time on expectation.
Here’s how it works: Once again we’ll define a pseudorandom sequence where
for some small . (In practice I’ve found that setting works pretty well.) Also define and note that
The point is that on each step, this sequence jumps forward by a random-looking power of two between and . In his paper, Pollard makes an analogy to a kangaroo leaping forward by a random distance on each jump.
The first step is to launch a “tame” kangaroo starting from and tell it to jump times, for some fixed . Mathematically this means computing and starting from and . Note that we don’t need to store the values of the intermediate hops; we just make our jumps and then stop.
Now, let’s try to use our tame kangaroo to catch a wild kangaroo. Define a similar sequence starting from . Repeatedly compute the next value in the sequence and check if it is equal to . We don’t know the corresponding values of (indeed, that’s the value we’re trying to find), but we can keep track of how much distance we’ve hopped from the initial because on each hop we travel .
The hope is that at some point the wild kangaroo will eventually land on the same location as a jump of the original tame kangaroo, and once the two kangaroos are on the same point, every subsequent jump will be identical since the sequence is generated deterministically. Thus, if the wild kangaroo ever lands on the same spot as the tame kangaroo, it will eventually end up on the tame kangaroo’s final location .
In the above diagram, the blue arrows represent the tame kangaroo and the red arrows represent the wild kangaroo. If the wild kangaroo eventually hits after traveling a distance , we can compute:
so the discrete log of is .
There are a couple of remaining problems. This algorithm is inherently probabilistic, and it’s possible that the wild kangaroo will never overlap with the tame kangaroo. We can detect when this has happened by stopping the wild kangaroo after it has traveled a distance of . Since we know our wild kangaroo starts somewhere between and , if it has traveled more than a distance of it has definitely passed our tame kangaroo sitting at .
If this happens, the algorithm has failed, and we’ll need to try again with a different tame kangaroo (start the tame kangaroo at instead of for example).
Note that both Pollard’s method and his kangaroo method are probabilistic, but in different ways. The method takes an unbounded amount of time, but always succeeds when it terminates, while one iteration of the kangaroo method takes a bounded amount of time, but is not guaranteed to succeed. This is the difference between a Las Vegas algorithm and a Monte Carlo algorithm.
Finally, we need to choose an appropriate value of and then prove that the algorithm runs in time . We’d like to set such that the probability of succeeding on one iteration is constant (not dependent on ). Suppose the mean jump length is . The probability of any one jump of the wild kangaroo landing on a jump of the tame kangaroo is approximately . The tame kangaroo makes jumps, and we succeed as long as at least one of those jumps coincides with the wild kangaroo’s path. The probability of this is:
If we set , then this is approximately The mean jump length is so a good choice is .
The total number of jumps taken by all the kangaroos on expectation is
The minimum value of this is which is achieved when . (Incidentally, since , this justifies setting ).
Thus, the algorithm takes time when it succeeds, and will succeed with probability 0.98.
Pollard has also referred to this algorithm as his algorithm because the shape of the converging kangaroo paths resembles the Greek letter lambda.
Approach #5: The Pohlig-Hellman algorithm
Remember how we said earlier that the order of the group needs to not be a smooth number? Here’s what goes wrong if has a bunch of small factors:
(Some group theory knowledge is helpful for comprehension). Suppose that
where the ’s are prime and unique. As usual we have a and we want to find .
For each we’ll compute and then use the Chinese Remainder Theorem to find itself.
For each , let . This element has order (raise to this power to verify). Now consider
Since is small, we should be able to find the between and such that using any of the previous DLP algorithms. Once we’ve done this for all we have a system of linear congruences of the form . Using CRT, we can then derive .
Note that , the number of prime divisors of is bounded by , so it’s insignificant. The time complexity is then bounded by the size of the largest prime and is . The space complexity is but since we need this much space to store the number in the first place, we’ll call it .
Comparison table
Algorithm | Time Complexity | Space Complexity | Notes |
---|---|---|---|
Brute Force | Don't use. | ||
Baby-step Giant-step | Easy to implement, but Pollard's is better. | ||
Pollard's | Fastest known DLP algorithm for prime order. | ||
Pollard's kangaroo | Use when you already know some information about . | ||
Polhig-Hellman | Use when is factorable into many small primes. |