all that jazz

james' blog about scala and all that jazz

Cracking Random Number Generators - Part 4

In Part 3 of this series, we investigated the Mersenne Twister, and saw how with 624 consecutive integers obtained from it, we can predict every subsequent integer it will produce. In this part, we will look at how to calculate previous integers that it has produced. We will also discuss how we might go about determining the internal state of the Mersenne Twister if we are unable to get 624 consecutive integers.

Breaking it down

Before we go on, let's look again at the algorithm for generating the next value for state[i]:

int y = (state[i] & 0x80000000) + (state[(i + 1) % 624] & 0x7fffffff);
int next = y >>> 1;
if ((y & 1L) == 1L) {
  next ^= 0x9908b0df;
}
next ^= state[(i + 397) % 624];

We can see that there are 3 numbers from the previous state that are involved here, the old state[i], state[(i + 1) % mod 624], and state[(i + 397) % mod 624]. So, taking these 3 numbers, let's have a look at what this looks like in binary:

1. 11100110110101000100101111000001 // state[i]
2. 10101110111101011001001001011111 // state[i + 1]
3. 11101010010001001010000001001001 // state[i + 397]

// y = state[i] & 0x80000000 | state[i + 1] & 0x7fffffff
4. 10101110111101011001001001011111 // y
5. 01010111011110101100100100101111 // next = y >>> 1
6. 11001110011100100111100111110000 // next ^= 0x9908b0df
7. 00100100001101101101100110111001 // next ^= state[i + 397]

If we work backwards from i = 623, then the pieces of information from the above equation is the end result (7), state[i + 1] and state[i + 397]. Starting from the result, the easiest step to unapply is the last one, undoing an xor is as simple as applying that same xor again. So we can get from 7 to 6 by xoring with state[i + 397].

To get from 6 to 5 depends on whether y was odd, if it wasn't, then no operation was applied. But we can also see from the bitshift to the right applied from 4 to 5, that the first bit at 5 will always be 0. Additionally, the first bit of the magic number xored at 6 is 1. So, if the first bit of the number at 6 is 1, then the magic number must have been applied, otherwise it hasn't. Hence, we can conditionally unapply step 5.

At this point, we have the first bit of the old state[i] calculated, in addition to the middle 30 bits of state[i + 1] calculated. We can also infer the last bit of state[i + 1], it is the same as the last bit of y, and if y was odd, then the magic number was applied at step 6, otherwise it wasn't. We've already worked out whether the magic number was applied at step 6, so if it was, the last bit of state[i + 1] was 1, or 0 otherwise.

However, as we work backwards through the state, we will already have calculated state[i + 1], so determining its last 31 bits is not useful to us. What we really want is to determine the last 31 bits of state[i]. To do this we can apply the same transformations listed above to state[i - 1]. This will give us the last 31 bits of state[i].

Putting it all together, our algorithm looks like this:

for (int i = 623; i >= 0; i--) {
  int result = 0;
  // first we calculate the first bit
  int tmp = state[i];
  tmp ^= state[(i + 397) % 624];
  // if the first bit is odd, unapply magic
  if ((tmp & 0x80000000) == 0x80000000) {
    tmp ^= 0x9908b0df;
  }
  // the second bit of tmp is the first bit of the result
  result = (tmp << 1) & 0x80000000;

  // work out the remaining 31 bits
  tmp = state[(i - 1 + 624) % 624];
  tmp ^= state[(i + 396) % 624];
  if ((tmp & 0x80000000) == 0x80000000) {
    tmp ^= 0x9908b0df;
    // since it was odd, the last bit must have been 1
    result |= 1;
  }
  // extract the final 30 bits
  result |= (tmp << 1) & 0x7fffffff;
  state[i] = result;
}

Dealing with non consecutive numbers

What happens, if when collecting 624 numbers from the application, that some other web request comes in at the same time and obtains a number. Our 624 numbers won't be consecutive. Can we detect that, and what can we do about it?. Detecting it is simple, having collected 624, we can predict the 625th, if it doesn't match the next number, then we know we've missed some.

Finding out how many numbers we've missed is the next task. Let's say we only missed the 624th number. This is fairly straight forward to detect, we would find that our state[623] is equal to what we were expecting to be the new state[0]. We would then know that we've missed one number, and by continuing to extract numbers from the application, and comparing that with the results we were expecting, we can narrow down which one.

A generalised algorithm for doing this is beyond the scope of these blog posts. But it should be clear from the reverse engineering of the steps that if most of the values are correct, but only a few are missing, determining what they were will be a fairly simple process.

We now know how to determine the internal state and how to go backwards as well as forwards in two of the most popular PRNG algorithms. In Part 5, we will look at what developers can do to ensure that their applications are safe against PRNG attacks.

Cracking Random Number Generators - Part 3

In Part 1 and Part 2 of this series we focussed on one of the simplest PRNG's, the linear congruential PRNG. We looked at detail into Java's implementation, and then wrote algorithms to crack the seed, and to calculate previous seeds from the current seed.

Not many other languages use a linear congruential PRNG. Originally, the C and C++ implementations of rand() were very simple linear congruential PRNGs, but these days they use more complex algorithms to generate higher quality random numbers. PHPs rand() function delegates to the platforms rand() function, so depending on the platform, it may be a linear congruential PRNG.

A much more popular PRNG algorithm is the Mersenne Twister. This is used by Ruby's rand() function, Pythons random module, and PHP's mt_rand() function.

The Mersenne Twister was invented in 1997. The most common implementation of it uses an internal state of 624 32 bit words (integers). These integers are handed out sequentially, applying a transform algorithm to each one before handing them out. Once all 624 integers have been handed out, an algorithm is applied to the state, to get the next 624 integers.

Generating the next state

The algorithm for generating the next state is simple, it heavily uses bitwise operations such as shifts, masks and xors. The algorithm is as follows:

int[] state;
// Iterate through the state
for (i = 0; i < 624; i++) {
  // y is the first bit of the current number,
  // and the last 31 bits of the next number
  int y = (state[i] & 0x80000000) + (state[(i + 1) % 624] & 0x7fffffff);
  // first bitshift y by 1 to the right
  int next = y >>> 1;
  // xor it with the 397th next number
  next ^= state[(i + 397) % 624];
  // if y is odd, xor with magic number
  if ((y & 1L) == 1L) {
    next ^= 0x9908b0df;
  }
  // now we have the result
  state[i] = next;
}

Obtaining the next number

Before handing the next integer out, a transform is applied to each integer. This transform has 4 steps, each step involves xoring the number with a bit shifted and bit masked version of itself. The algorithm is as follows:

currentIndex++;
int tmp = state[currentIndex];
tmp ^= (tmp >>> 11);
tmp ^= (tmp << 7) & 0x9d2c5680;
tmp ^= (tmp << 15) & 0xefc60000;
tmp ^= (tmp >>> 18);
return tmp;

Cracking the Mersenne Twister

Since the Mersenne Twister contains 624 integers of internal state that it hands out sequentially, obtaining 624 consecutive integers from the generator is the first step in cracking it. In a common webapp, this might be done by issuing 624 requests, and recording the token. If we can then undo the transform applied to these integers, we will then have the complete internal state. Having calculated that, we can predict every subsequent number.

In order to undo the transform, we will take it one step at a time, in reverse order. The step of the transform we need to understand first is therefore:

tmp ^= (tmp >>> 18);

Let's look at what this looks like in binary:

10110111010111100111111001110010                    tmp
00000000000000000010110111010111100111111001110010  tmp >>> 18
10110111010111100101001110100101                    tmp ^ (tmp >>> 18)

As is clear from the bold bits above, we can easily work out what the first 18 bits of the previous number was, they are the same as the result. Taking those 18 bits, we can bitshift them across 18 bits, and xor them with the value, this will give us the entire number. A generalised algorithm can be written to do this for any arbitrary number of bits, and we can reuse this to reverse the first step as well:

int unBitshiftRightXor(int value, int shift) {
  // we part of the value we are up to (with a width of shift bits)
  int i = 0;
  // we accumulate the result here
  int result = 0;
  // iterate until we've done the full 32 bits
  while (i * shift < 32) {
    // create a mask for this part
    int partMask = (-1 << (32 - shift)) >>> (shift * i);
    // obtain the part
    int part = value & partMask;
    // unapply the xor from the next part of the integer
    value ^= part >>> shift;
    // add the part to the result
    result |= part;
    i++;
  }
  return result;
}

This method can be used to reverse the two right bitshift operations. The left bitshift operations also include masking the bitshifted number with a magic number, so when we unapply part as above, we have to apply the mask to it, as follows:

int unBitshiftLeftXor(int value, int shift, int mask) {
  // we part of the value we are up to (with a width of shift bits)
  int i = 0;
  // we accumulate the result here
  int result = 0;
  // iterate until we've done the full 32 bits
  while (i * shift < 32) {
    // create a mask for this part
    int partMask = (-1 >>> (32 - shift)) << (shift * i);
    // obtain the part
    int part = value & partMask;
    // unapply the xor from the next part of the integer
    value ^= (part << shift) & mask;
    // add the part to the result
    result |= part;
    i++;
  }
  return result;
}

Using these two functions, we can unapply the transformation applied to the numbers coming out of the PRNG:

int value = output;
value = unBitshiftRightXor(value, 18);
value = unBitshiftLeftXor(value, 15, 0xefc60000);
value = unBitshiftLeftXor(value, 7, 0x9d2c5680);
value = unBitshiftRightXor(value, 11);

Having applied the above algorithm to 624 consecutive numbers from the PRNG, we now have the complete state of the Mersenne Twister, and can easily determine every subsequent value. In Part 4 of the series, we'll investigate how to find previous values generated by the Mersenne Twister, and also discuss how to deal with being unable to get consecutive numbers out of the PRNG.

Cracking Random Number Generators - Part 2

In Part 1 of this series, we saw how simple it is to predict future values generated by a linear congruential PRNG. In this part, we will look at how to calculate past values generated by a linear congruential PRNG.

Undoing three simple operations

As we saw before, going forward on a pseudo random number generator involves multiplying by the multiplier, adding the addend, and then trimming the result back down to 48 bits. Surely this can be solved by a simple subtraction and divide?

It's not that simple. Let's have a look at it in binary:

Seed 1:     110011111101000111101001010011010100010001001111
Multiplier:              10111011110111011001110011001101101 *
------------------------------------------------------------
             11111011111111000111100000110010000111110100011
Addend:                                                 1011 +
------------------------------------------------------------
Seed 2:      11111011111111000111100000110010000111110101110
Addend:                                                 1011 -
------------------------------------------------------------
             11111011111111000111100000110010000111110100011
Multiplier:              10111011110111011001110011001101101 \
------------------------------------------------------------
                                               1010101110110

As you can see, we've done the multiply, the add, then subtracted the addend, and divided by the multiplier, and the result is nothing like the original seed. Why not? The problem is the bitmask we applied. When we applied the multiplier originally, it overflowed beyond 48 bits. The most significant bits were discarded, meaning that a simple divide can not get us the result. So how can we find the result?

The solution comes in going back to primary school. In primary school we learnt how to do long multiplication. Let's look at what multiplying two small binary numbers looks like, using long multiplication:

     101101
     001011 *
-----------
     101101
    101101
   000000
  101101
 000000
000000
-----------
00111110111 
     111111 &
-----------
     110111

Let's say that the first number is our multiplier, it is the known, and the second number is the unknown that we want to determine. The final number is the seed that we have, trimmed down to the number of bits, it is known, but the excess bits are not.

It should be clear from the above that there is a pattern, for each 1 in the second number, there is a corresponding first number on the line below. We can also see that if the last digit of the original number was 0, then the last digit of the new number will be 0, if it was 1, then the new number will be 1. In the above example we can determine therefore that the last digit in the original number was 1. Because we know that, subtract the multiplier from our seed, and reverse the effects of the value of the last number in the original number. Here is the result:

Result: -----1

     101101
     00101- *
-----------
    101101
   000000
  101101
 000000
000000
-----------
00111000010 
     111111 &
-----------
     000010
We can then continue the process with the next digit, if the next digit from the end in the seed is 1, then the next digit in the result is 1, otherwise its 0, and if it is 1, we subtract the multiplier, this time bitshifted across by 1:
Result: ----11

     101101
     0010-- *
-----------
   000000
  101101
 000000
000000
-----------
00101101000 
     111111 &
-----------
     101000

Continuing this process, of scanning from right to left, taking the next digit in the seed, and subtracting the effect that that digit would have had on the seed, we can determine the end result.

This only works if our multiplier is odd, which it is. If it weren't, we would only have to bitshift the seed to the left by the number of 0's on the end of the multiplier, and every bit that we shifted across would be one less bit of the final result that we could determine. So our algorithm for determining the previous seed looks like this:

long seed = currentSeed;
// reverse the addend from the seed
seed -= addend; // reverse the addend
long result = 0;
// iterate through the seeds bits
for (int i = 0; i < 48; i++)
{
    long mask = 1L << i;
    // find the next bit
    long bit = seed & mask;
    // add it to the result
    result |= bit;
    if (bit == mask)
    {
        // if the bit was 1, subtract its effects from the seed
        seed -= multiplier << i;
    }
}
System.out.println("Previous seed: " + result);

And there we have it, we can now predict all previous values from java.util.Random given the current seed.

That covers everything you need to know about hacking a linear congruential PRNG. However, most common platforms don't use such a simple PRNG as their default random number generator. In Part 3 we will look at how to hack a very popular PRNG, the Mersenne Twister.

Cracking Random Number Generators - Part 1

This is the first in a series of posts I'm going to give on how to crack common random number generators. Since first posting this series, I have lost count of how many people have contacted me asking if I'll help them crack random number generators, particularly for gambling systems such as lotto and betting. The answer is no, I prefer to spend my time being useful to society. Please don't contact me, your idea is not original, and even if I wanted to make money by cheating gambling systems, why would I work with you? Why wouldn't I just do it myself and take all the winnings for myself? No, I will not help you.

Random number generators are a key part of web security. They are used all over the place, from session tokens to tokens to sent to an email address to verify its owner made a particular request, to CAPTCHA tokens and so on. In all these applications, if the token can be predicted, then the security mechanism can be broken, and a malicious user will be able to identify themselves as someone who they are not.

There are obvious concerns with publishing instructions explaining how to exploit security vulnerabilities. However, I have some good reasons for doing so:

  1. There is nothing new about the vulnerabilities associated with random number generation, nothing that I'm publishing here is new to hackers either.
  2. Hacking random number generators is actually very easy to do, and hackers know this.
  3. Many developers that I have come across are of the belief that hacking random number generators is a hard to exploit avenue of attack. Even when they know that there are dangers in random number generation, their understanding is often incomplete, leading them to make serious mistakes.
  4. None of the algorithms I supply here can be used as is, they require an attacker to know exactly what algorithm is being used on a system, how to extract numbers out of it, and having then cracked the number generator, how to exploit it. Knowing all this I believe is harder than working out for yourself how to crack the random number generators themselves, so if they couldn't work out what I am saying for themselves, they likely won't be able to use it to hack into a real system.

Hence I believe that there is nothing a hacker will learn from this series that they can't work out for themselves. Meanwhile, many developers live in ignorance, and would never bother to see if their understanding of random number generators is flawed. My hope therefore is to give developers an understanding of just how dangerous their ignorance can be.

What I won't talk about in this series is anything about the maths of random number generators, beyond explaining how the algorithms are implmented. Why a particular algorithm makes a good PRNG is beyond the scope of this series.

Linear Congruential PRNG

The first PRNG we will focus on is the linear congruential PRNG. Rather than talk theoretically, we'll look at a particularly common one, Java's default random number generator, java.util.Random.

The idea behind a linear congruential PRNG is that you store a single number as the internal state. This state is usually called the seed, because the two numbers are usually one in the same thing. In Java's case, this is not quite true, but to keep our explanation simple, we will assume that it is. The seed has a precision, in Java's case, the precision is 48 bits.

The seed changes each time a number is generated, by applying a simple formula. That formula is:

seed = (seed * multiplier + addend) mod (2 ^ precision)

The key to this being a good random number generator is the choice of multiplier and addend. In Java's case, the multiplier is 25214903917, and the addend is 11. As I said earlier, what makes these two numbers good is beyond the scope of this series. The mod operation is implemented using a bitmask, 48 1's.

java.util.Random never gives out its full 48 bits of state, it gives out at most 32 on each call to nextInt(). Other calls that return more bits, for example, nextLong(), generate multiple 32 bit numbers and combine them together. To convert the 48 bit seed to a 32 bit int, the seed is bitshifted to the right by 16 bits.

Determining the seed from a Linear Congruential PRNG's output

It is not possible, from one int generated by java.util.Random, to determine the seed, because 16 bits were discarded by the bitshift. However, if we obtain a second int from it, we can quite easily guess what those remaining 16 bits were, by brute forcing all possible values, and seeing if the next value from that matches the second value we obtained. 16 bits is only 65536 possible values, and calculating the next int is only a few instructions, so this can be computed in a fraction of a second.

The code in Java looks like this:

Random random = new Random();
long v1 = random.nextInt();
long v2 = random.nextInt();
for (int i = 0; i < 65536; i++) {
    long seed = v1 * 65536 + i;
    if (((seed * multiplier + addend) & mask) >>> 16) == v2) {
        System.out.println("Seed found: " + seed);
        break;
    }
}

As can be seen, given two integers from java.util.Random, we can predict all future generated integers. In Part 2 we'll get a little more involved in the maths, and learn how to calculate the previous seeds.

About

Hi! My name is James Roper, and I am a software developer with a particular interest in open source development and trying new things. I program in Scala, Java, Go, PHP, Python and Javascript, and I work for Lightbend as the architect of Kalix. I also have a full life outside the world of IT, enjoy playing a variety of musical instruments and sports, and currently I live in Canberra.