Use Chaos to Create True Random Numbers
In ML, we usually see code like seed = 42, which sets the seed of the random number generator. But if it is random, why is there a fixed number in it?
There are different types of random number generators. The two main categories are:
Pseudo-Random Number Generators (PRNGs)
True Random Number Generators (TRNGs)
Let’s discuss each. But first, the basics:
Random numbers
A single random number is a value that we choose from the available options. All options have the same probability of being chosen, meaning the distribution is uniform.
A sequence of random numbers means that we repeat the process above and earlier choices do not affect later ones, so the draws are independent.
Randomness is important in many fields and is used for various purposes in Data Science:
Data sampling
Data splitting
Simulations
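To make the three uses above concrete, here is a minimal sketch with Python's built-in random module. The dataset, the 80/20 split, and the dice game are made up for illustration, and the seed of 0 is only there so the sketch gives the same result on every run.

```python
import random

rng = random.Random(0)  # hypothetical seed, only so the sketch is repeatable

data = list(range(100))  # stand-in dataset of 100 rows

# Data sampling: pick 10 distinct rows, each equally likely.
sample = rng.sample(data, k=10)

# Data splitting: shuffle a copy, then cut it 80/20 into train and test.
shuffled = data[:]
rng.shuffle(shuffled)
train, test = shuffled[:80], shuffled[80:]

# Simulation: estimate the probability that two dice sum to 7.
trials = 100_000
hits = sum(rng.randint(1, 6) + rng.randint(1, 6) == 7 for _ in range(trials))
estimate = hits / trials  # should land near 1/6

print(sample)
print(len(train), len(test))
print(estimate)
```

All three rely on the same two properties described earlier: every option is equally likely, and one draw does not influence the next.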
Randomness has great advantages, but it is hard to make a computer act randomly. Computers follow their instructions no matter what. And that’s good. The same input should produce the expected output, not a random mess. Imagine opening your browser, typing google.com, and landing on a random website.
We need some methods to make computers act more randomly.
True Random Number Generators
Fortunately, the real world is highly unpredictable, so we can use it as an input for computers. TRNGs usually use physical events to create the randomness. The events range from simple ones, like how you move your mouse, to more advanced ones, like radioactive decay, which is governed by quantum mechanics.
The most fun way to create randomness is to create chaos.
Cloudflare uses the coolest TRNGs. They have many lava lamps installed in the HQ office. One lava lamp alone is super random itself, as the liquid in it can take up infinite forms, but they have many of them. Moreover, since they are installed on one of the company’s main corridors, people passing the lamps can add to the chaos.
A camera watches the lobby 24/7 and takes photos. Since computers store photos as a series of numbers, every photo is a truly random number.
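Here is a minimal sketch of the "photo as a number" idea. It is not Cloudflare's actual pipeline: the fake 64x48 grayscale frame, the helper name frame_to_number, and the choice of SHA-256 are all my own assumptions for illustration.

```python
import hashlib

def frame_to_number(pixels: bytes) -> int:
    """Hash raw pixel bytes and read the 256-bit digest as an integer."""
    digest = hashlib.sha256(pixels).digest()
    return int.from_bytes(digest, "big")

# Stand-in for a camera frame: 64x48 grayscale pixels, one byte per pixel.
frame = bytearray((x * y) % 256 for y in range(48) for x in range(64))
first = frame_to_number(bytes(frame))

# Change the brightness of a single pixel, as a moving blob of wax would.
frame[1000] = (frame[1000] + 1) % 256
second = frame_to_number(bytes(frame))

print(first)
print(second)
```

Even a one-pixel difference gives a completely different number, so the tiniest movement in front of the camera changes the result.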
In the London office Cloudflare uses pendulums. The unpredictable swings coming from these things mixed with the incoming lights and shadows make the installation super random.
Here is a great illustration of how random a single pendulum can be.
Another Cloudflare office has hanging rainbow mobiles that react to changes in their environment, like doors opening and ambient light. As you can see, the bigger the chaos, the better for randomness.
And finally, here is a great illustration showing how these sources turn into a random number that is impossible to predict.
Pseudo-Random Number Generators
The number generators we work with in applications like Excel or Python are PRNGs. They cannot use chaos like the above. PRNGs use mathematical algorithms to create sequences that look random. But the numbers are not truly random. They follow a predetermined sequence.
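To show what "a mathematical algorithm that follows a predetermined sequence" can look like, here is a sketch of one of the simplest PRNGs, a linear congruential generator. This is a textbook example, not the algorithm Excel or Python uses (Python's random module is based on the Mersenne Twister). The constants are the well-known ones from Numerical Recipes, and the function name lcg is my own.

```python
def lcg(seed: int):
    """Yield an endless stream of 32-bit integers from a linear congruential generator."""
    a, c, m = 1664525, 1013904223, 2**32  # constants from Numerical Recipes
    state = seed % m
    while True:
        # The next number depends only on the previous one.
        state = (a * state + c) % m
        yield state

gen = lcg(42)
first_five = [next(gen) for _ in range(5)]

# Restarting from the same seed replays exactly the same numbers.
gen_again = lcg(42)
replay = [next(gen_again) for _ in range(5)]

print(first_five)
print(first_five == replay)
```

The whole sequence is fixed the moment the seed is chosen, which is exactly why these numbers are only pseudo-random.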
When I wrote “looks random” I meant if you see the numbers written out in a sequence. But when we visualize them, the story is different.
Bo Allen created the above visual to show how a bad combination of language (PHP), operating system (Windows), and function (rand()) can cause some obvious patterns. (With the same code, different operating systems get different results.)
We can see that random generators can be very bad, but the best are really good. The quality varies a lot.
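You can find a pattern like this yourself without plotting anything. The sketch below is not the PHP rand() case from the visual. It uses a simple linear congruential generator (a textbook PRNG, with the Numerical Recipes constants) and looks only at the lowest bit of each number. The function name and the seed are my own choices.

```python
def lcg(seed: int, n: int) -> list:
    """Return n numbers from a linear congruential generator with modulus 2**32."""
    a, c, m = 1664525, 1013904223, 2**32
    out, state = [], seed % m
    for _ in range(n):
        state = (a * state + c) % m
        out.append(state)
    return out

values = lcg(seed=42, n=12)

# The full numbers look random enough, but the lowest bit does not.
low_bits = [v & 1 for v in values]

print(values)
print(low_bits)  # strictly alternates between 1 and 0
```

Because both constants are odd and the modulus is a power of two, each step flips the lowest bit, so it alternates forever. Anyone who used this generator to flip a coin with value % 2 would get a perfectly predictable result.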
Generally, PRNGs are an efficient way to create randomness since these algorithms are fast. Also, the deterministic nature can be advantageous.
We see the seed parameter in a lot of code because the seed provides reproducibility. If you set it to the same value, the generated sequence will be the same. This means we can recreate our randomness for every run and test other parts of the code without changes in the data.
Without reproducibility, testing is more challenging. If each execution produced entirely different random values, it would be difficult to determine whether changes in the output were caused by modifications in the code or simply by a new set of random inputs.
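A quick sketch of this with Python's built-in random module. The function name noisy_measurements and the seed values are made up for illustration.

```python
import random

def noisy_measurements(seed: int, n: int = 5) -> list:
    """Draw n values from a standard normal distribution using a seeded generator."""
    rng = random.Random(seed)  # a private generator, so other code is not affected
    return [round(rng.gauss(0, 1), 4) for _ in range(n)]

run_1 = noisy_measurements(seed=42)
run_2 = noisy_measurements(seed=42)
run_3 = noisy_measurements(seed=7)

print(run_1 == run_2)  # same seed, same "random" data
print(run_1 == run_3)  # different seed, different data
```

With the seed fixed, any change in the output between two runs has to come from the code, not from the data.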
These algorithms are good for modeling and simulations, but not the best solution in cases where numbers must be truly unpredictable, like in gambling or data encryption.
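For those cases, Python ships a separate module called secrets. It draws from the most secure source of randomness the operating system provides instead of a seedable algorithm. A short sketch, where the variable names and the card suits are only examples:

```python
import secrets

# 16 unpredictable bytes, written as 32 hexadecimal characters.
token = secrets.token_hex(16)

# A fair die roll from 1 to 6 that cannot be replayed with a seed.
die_roll = secrets.randbelow(6) + 1

# A random pick from a list.
suits = ["hearts", "spades", "clubs", "diamonds"]
card = secrets.choice(suits)

print(token, die_roll, card)
```

There is no seed parameter here, and that is the point: nobody, including you, should be able to reproduce these values.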
PHP’s documentation also highlights this, and it explains why rand() is more restricted on Windows.
Conclusion
Going down this rabbit hole, I realised how important randomness can be and how fun and creative the solutions running behind super secure systems are. As I said, computers are not random at all without some random inputs. But what about humans?
Check this out:
Sources:
How do lava lamps help with Internet encryption? | Cloudflare
https://blog.cloudflare.com/harnessing-office-chaos/
https://www.random.org/randomness/