One Number Repeated Forever: RNG in NSMB
Hello! It’s been a while, hasn’t it. Sorry for the delay. These posts sometimes take a long time to put together, since this blog isn’t my highest priority and I try to add custom multimedia to each one. Nevertheless, I already have several future posts in the works, so you can look forward to more in the coming months.
Anyway! This is a sort of “part 0” for an upcoming 2-part series about the Green Toad House minigame in New Super Mario Bros. The minigame involves randomness, so in researching it, I investigated NSMB’s random number generator (RNG). My findings didn’t turn out to be very relevant to the minigame, but are interesting in their own right.
To keep this from getting far too long, I’m assuming you already know with what an RNG is, and about the concept of seeding one. If not, here are some good resources: pannenkoek2012 on YouTube (SM64), Retro Game Mechanics Explained on YouTube (SMW), Wikipedia.
Investigating the function
To start off, here’s a manual decompilation of NSMB’s random number generator, which I am by executive decision naming “rand_nsmb
”:
uint32_t rand_nsmb(uint32_t *state) {
uint64_t value = (uint64_t)(*state) * 1664525 + 1013904223;
return *state = value + (value >> 32);
}
Interactive rand_nsmb
Want to try outrand_nsmb
? You can do that right here:uint32_t state = ;
An RNG function where the next output is calculated as the previous output times some constant, plus another constant, is called a linear congruential generator. The function above is almost an LCG. The only part that makes it not quite qualify is the + (value >> 32)
at the end.
Whenever you find a piece of unknown code with weird large constants, you can often learn a lot by Googling them. If you search for 1664525 and 1013904223, you’ll find that they’re very commonly used together in LCG functions1, and were originally published in the book Numerical Recipes. That book names the LCG function based upon these two numbers “ranqd1
” (for “random quick-and-dirty generator #1”). I’ll be using that name for this LCG from here on.
For comparison, here’s an implementation of ranqd1
(not copied directly from the book) matching the formatting of rand_nsmb
above:
uint32_t ranqd1(uint32_t *state) {
uint64_t value = (uint64_t)(*state) * 1664525 + 1013904223;
return *state = value;
}
A note about seeding
As far as I can tell, NSMB's procedure for seeding the RNG upon game boot is sound. It gathers entropy from a variety of sources, including time spent loading the game so far, current scanline number, GPU status, any buttons being held, and more. Then it takes a SHA-1 hash of it all, and XORs that up into a 32-bit integer.Given this, I think it's safe to assume for this analysis that seed values are uniformly distributed.
Other than missing the + (value >> 32)
, it’s identical to rand_nsmb
.
ranqd1
is known to have the very nice property of cycling through every 32-bit integer before looping2 (a “full cycle”), and is generally (quoting Numerical Recipes) “about as good as any 32-bit linear congruential generator.”
So NSMB’s RNG function is just ranqd1
– a decent LCG – with a small change. Nintendo adds back the bits that would otherwise be truncated away during the implicit cast to uint32_t
when returning. On the surface, this feels like it should be a good thing – instead of losing that somewhat-random information, it’s now getting recycled!
But here’s the issue: random number generation is one of those things where you should seriously not mess with tried-and-true formulas unless you really know what you’re doing. And as we’ll see soon, this change ends up bringing more issues than benefits.
Cycles
I mentioned earlier that ranqd1
cycles through every 32-bit integer before looping. With NSMB’s modification, this property doesn’t necessarily still hold, so I decided to do a fairly thorough analysis of rand_nsmb
’s cycle(s).
I wrote a C program that iterated over every 32-bit integer, plugged each one into rand_nsmb
, and followed the resulting trail of RNG values until it hit a duplicate. This scenario lends itself nicely to dynamic programming, so I had the program fill information about each seed (which cycle it ended up in, and how many steps it took to reach it) into a fairly gigantic 20 GB data table file. Even with the file stored on my laptop’s internal SSD, the program took about two weeks to finish running. Once that was done, I was able to answer a lot of questions quickly by linearly reading through the output file with Python (which takes about half an hour3) and calculating statistics I was interested in.
Here’s what I found.
Let’s look at the average case first. Given a random starting seed, rand_nsmb
will repeat an output after 1,820,529 calls, on average. While that’s a far cry from the 4,294,967,296 calls needed to get ranqd1
to repeat a number, it’s still absolutely plenty for a video game like this.
What I find more interesting is to inspect the worst cases. If you broke a mirror underneath a ladder prior to launching NSMB, how small of a cycle could you end up in? Well, here’s a list of all 1653 of rand_nsmb
’s cycles, sorted by length, and a pie chart showing the amounts of states that feed into them:
- 1 cycle of length 1,708,724
- 1 cycle of length 354,835
- 1 cycle of length 155,834
- 1 cycle of length 146,318
- 1 cycle of length 127,646
- 1 cycle of length 81,673
- 1 cycle of length 48,534
- 1 cycle of length 26,128
- 1 cycle of length 1371
- 1630 cycles of length 8
- 12 cycles of length 4
- 1 cycle of length 2
- 1 cycle of length 1
About ¾ of all states drain into the largest cycle, which is 1.7 million states long. Following that are a bunch of progressively smaller cycles, with similarly shrinking slices of the pie chart. I don’t know why there are so many cycles exactly 8 elements long, but that’s not important.
Look at the bottom of the list. There are twelve cycles 4 states long (henceforth “4-cycles”), a 2-cycle, and… a 1-cycle.
This RNG function has a fixed point.
Here it is: 1144735523. You can check it for yourself with the interactive rand_nsmb
widget from earlier if you want – rand_nsmb(1144735523)
returns 1144735523.4
To put it mildly, a random number generator that might spit out the same number over and over is not ideal.
The fixed point
Impact on gameplay
What would it be like to play NSMB with its RNG function stuck on a single number?
While you’d only have to boot the game 1.9 billion times to have a 90% probability of hitting rand_nsmb
’s fixed point, it’s a tiny bit faster to achieve it by hacking. You can use this NSMBe code patch to do it.
I played through the whole game like this. Here are some highlights of what I found:
- Tiles are not randomized (leftmost animation). This affects most levels, but is especially obvious at the beginning of 1-2.
- Certain sound effects are not randomized. Mario will always use the same voice clips when double- or wall-jumping, instead of choosing randomly from a few.
- The solution to Green Toad Houses is always the same. And coincidentally5 happens to match the order in which Toadsworth places the cards into the blocks.
- Pipe bubbles are emitted from a single point (middle animation). Normally they’re emitted from various points along the pipe’s edge.
- Some enemies with unpredictable attack patterns become predictable. Specifically, Skeeters, Hammer Bros and Sledge Bros – the last of whom never groundpounds at all anymore.
- Big Piranha Plants, when killed with fire, emit coins in an unnatural way (rightmost animation). Instead of spewing out coins in a variety of directions like a fountain, they all go in the same direction.
…Despite having over 600 distinct RNG calls in its code, NSMB doesn’t use randomness very heavily in ways directly affecting gameplay, does it. All things considered, that’s probably for the best.
I’d like to remind you that while the above footage and info was obtained by hacking the game, this is a very real bug that, with nonzero probability, can occur naturally. Which makes me wonder… can we estimate the number of players who’ve actually experienced it?
Estimating the number of affected players
A total of 5 seed values6 lead to rand_nsmb
’s fixed point, meaning that your odds of encountering it on any given game bootup are (5 / 0x100000000) ≈ 0.0000001164%.
As of March 2020, New Super Mario Bros. has sold 30.80 million copies7. As a rough estimate, let’s say that, on average, each person who bought the game launched it 10 times. In total, then, NSMB has been launched about… let’s round it to 300 million times.
The odds that nobody ever experienced this fixed point are then (1 - 0.0000001164%)300,000,000 ≈ 70%, meaning there’s a 30% chance that at least one person has encountered it.
A 30% chance that somebody was unlucky enough to get the fixed point is not all that high. But if you also count the 10 seeds that lead to the 2-cycle (which you should – cycling between two values is almost as horrible), then there’s a 65% chance at least one person experienced it. If you further include the 240 seeds that lead to a 4-cycle… it’s effectively guaranteed (99.999998%) that someone, somewhere, has gotten an exceptionally bad RNG seed.
A more professional approach
This kind of manual analysis is valid and fun, and I even noticed some more irregularities related to divisibility that way, but professionals don’t typically analyze random number generators like this. Experts use statistical tests that check for lots of different things. To get some more rigorous results, I ran rand_nsmb
through the Dieharder automated RNG test suite.
Now, linear congruential generators are designed to be fast, not to pass every statistical test, so it wouldn’t be at all fair to just look at what percentage of the test suite rand_nsmb
passes without comparing it to something. For that, I’m using the original ranqd1
function it was based upon. This lets us directly measure the impact Nintendo’s code change had.
All of the tests are based on a single initial seed (0), so the small-cycle problems I talked about earlier aren’t reflected in the results. 0 feeds into the largest cycle rand_nsmb
offers, so this choice is as generous as possible.
Running the tests on my laptop took a little under a day. The full results are available here. The chart on the left summarizes the numbers of tests passed. You can hover over it to see numbers.
To rand_nsmb
’s credit, it outperforms ranqd1
on 35 of the 114 tests (31%), so the function does seem to be an improvement over the original in some ways. But still, looking at the results as a whole… it seems pretty clear that it generally performs worse.
Final thoughts
Nobody playing New Super Mario Bros. is going to notice or care that its random number generator fails some obscure statistical tests, nor should they. rand_nsmb
obviously doesn’t need to be cryptographically strong, and over 99% of the time, it’s perfectly acceptable for its purpose. It mainly bothers me that the developers inexplicably changed the function in a way that triggers horrible behavior in certain cases, which are perfectly possible (though unlikely) to encounter during normal play.
If it ain’t broke (and if it was invented by people a lot more experienced than you), please just don’t fix it.
-
“Linear congruential generator” (“Parameters in common use”) – Wikipedia. Retrieved April 18, 2020. ↩
-
“Random number information - The nature of randomness” – RandomNumberGenerator.com. Retrieved April 18, 2020. ↩
-
If you’re thinking based on this that I must have a slow SSD… no, I moved the file to a different drive once it was fully generated. ↩
-
Here are a few other interesting inputs you can try in the interactive widget, if you’d like: 2576391288 (2-cycle), 49939938 (first 4-cycle), 459503 (first 8-cycle). ↩
-
If the fixed point had been some other number, this wouldn’t necessarily still be true. ↩
-
285742064, 1144735523 itself, 2003728982, 2862722441, and 3721715900. ↩
-
“Top Selling Title Sales Units” – Nintendo. Retrieved May 7, 2020. ↩