How Beer Helped Shape Data Science
Guinness is one of the most successful alcohol brands worldwide. Sales in 2011 amounted to 850,000,000 liters. That success might not have been possible had a statistician named William Sealy Gosset not developed a clever solution more than a century ago.
In October 1886, Guinness became a public company and investors poured money into it. Guinness used this money wisely. The company wanted to reform the way it made beer. It did not only invest in better equipment and ingredients; it also turned to science and hired statistician William Sealy Gosset as Head Experimental Brewer in 1899. The goal was to make better beer through quality control and improved processes.
Gosset and his colleagues faced two big issues:
1. Their sample sizes were very small.
2. The variance in their observations was large.
Brewing and testing a new batch of beer for every small tweak to the ingredients was not cost-effective. They needed a reliable way to tell which differences were significant and which were just noise. A significance test.
Gosset developed a test for making inferences from small samples. He published his findings in 1908 under the pseudonym "Student" because Guinness had a policy against employees publishing work under their own names. That's how Student's t-test was born.
Imagine you have ten fields side by side and you plant one of two varieties of barley (a key ingredient in beer) in alternating fields. After harvesting, you measure the yield from each field. Some fields will have better yields than others. But is that difference real - caused by the barley variety - or is it just natural variation?
Gosset used the following calculation to answer the question: the difference between the two mean yields, divided by the standard deviation adjusted for the sample size (what we now call the standard error). This gave him a single number, the t statistic, which he could then compare to a probability table to see how likely it was that a difference this large had happened purely by chance. After the comparison, he could conclude whether one barley variety was really better or not.
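To make the calculation concrete, here is a minimal Python sketch of the barley example. It uses the modern pooled-variance two-sample form of the t-test, which is an assumption on my part and not necessarily the exact procedure Gosset followed. The yields are made-up numbers for illustration, and the names two_sample_t and CRITICAL_T_DF8 are hypothetical. The critical value 2.306 is the two-sided 5% entry for 8 degrees of freedom from a standard t table.

```python
from math import sqrt
from statistics import mean, variance


def two_sample_t(a, b):
    """Return the pooled-variance two-sample t statistic and its degrees of freedom."""
    n_a, n_b = len(a), len(b)
    df = n_a + n_b - 2
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / df
    # Standard error of the difference in means: the "adjusted for sample size" part.
    std_err = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(a) - mean(b)) / std_err, df


# Made-up yields (illustration only): five fields per barley variety.
variety_a = [4.1, 3.9, 4.3, 4.0, 4.2]
variety_b = [3.6, 3.8, 3.5, 3.9, 3.7]

t_stat, df = two_sample_t(variety_a, variety_b)

# Two-sided 5% critical value for 8 degrees of freedom, from a standard t table.
CRITICAL_T_DF8 = 2.306
significant = abs(t_stat) > CRITICAL_T_DF8

print(f"t = {t_stat:.2f} with {df} degrees of freedom; significant at 5%: {significant}")
```

With these invented yields, the t statistic comes out at about 4.0, which is larger than the table value, so the difference would be unlikely to arise from natural variation alone. Because the fields alternate side by side, you could also pair neighboring fields and run a paired t-test instead. Which design fits best depends on how the experiment was laid out.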
The t-test started as a practical tool for a brewery, but it became one of the most widely used statistical methods in science.
I read this story in the book How Data Happened by Chris Wiggins and Matthew L. Jones. Awesome book! It is interesting to read about how the tools and methods we work with every day evolved over time. Years and years of research on beer gave us statistical cornerstones.
From this story, I also learned that we need to be open-minded. How can we transfer a practice that has proven successful in one industry to another? Gosset was searching for better beer, but he revolutionized statistics and, with it, every industry that uses the t-test. Data science is packed with similar stories.
Neural networks are inspired by neuroscience.
In the late 1940s, Stanisław Ulam invented the modern version of the Monte Carlo method while he was working on nuclear weapons projects. Today it is one of the most widely used methods in finance.
Reinforcement learning was inspired by behavioral psychology. Today, reward-based learning is used to train systems such as self-driving cars.
By looking beyond our industries, we can find ideas that reshape the way we solve problems.
The next big breakthrough might already exist. We just need to borrow it and apply it in a whole new way.