
How to not suck at probability and statistics.


A wooden desk with old fashioned medical tools, bottles, and vials.

A simple way of looking at our brain is by dividing it into the conscious, subconscious and unconscious minds. The conscious mind is all about what we are actively thinking about in the here and now. We might be navigating as we drive through the countryside. We might decide to take an exit from the main road because our conscious mind has worked out that the map we are looking at is showing us that’s what we need to do to get to where we want to go.

Then there is the subconscious mind. It is like a well that we have filled up with memories and experiences that create urges, instincts and gut feelings that our conscious mind hasn’t had a chance to come up with. These urges, instincts and gut feelings are immediate and important survival mechanisms. When a spider crawls onto our leg, our immediate reaction is to (sometimes hysterically) brush that spider away, perhaps screaming and shrieking as we go. We don’t take the time for our conscious mind to identify a creature that could be a spider, remember that most spiders inject venom into prey, deduce that although we might not be prey a spider bite can be painful, and finally conclude that the spider needs to get off our leg as quickly as possible. Just the thought of ‘hairy spider legs’ is enough for an uncontrollable physical reaction.

And in the unconscious mind … we’re not really aware of much at all. So let’s not worry about the unconscious mind too much right now.


The subconscious (not conscious) mind is great at probability and statistics

Think of a professional baseball player. He or she practices throwing the baseball every day. This means fielding the ball, looking at (for example) the first baseman or woman, and throwing the ball (ideally) into his or her glove.

But of course, there is a lot of stuff that goes into making this throw. The first is estimating how far away the glove of the first baseman or woman is. This is primarily based on what is seen through the baseball player’s eyes. Then there is an estimate of the speed and angle at which the ball needs to be thrown. And finally, there are hundreds of estimates that need to be applied to the magnitudes and timing of the electrical signals the brain sends to the muscles in the player’s body and arm to make that throw.

All of these estimates are ‘random.’ And by random we mean a quantity that has some level of uncertainty. Our conscious mind estimates distances in a very theoretical way. If we were to pause baseball practice when our player is about to throw the ball, their conscious mind might be able to help them write down that their best guess at how far away the first baseman or woman is, is 25 metres. They might also be able to add to that best guess by writing down that they’re about 90 per cent confident that the first baseman or woman is between 23.5 and 26.5 metres away. And even then, their conscious mind is asking their subconscious mind for a lot of help when it comes to judging distances.

But our conscious mind has nothing to do with throwing a baseball. Years of practice and repetition help our subconscious mind gather experience regarding where the ball ultimately goes every time we try to throw a baseball. If someone who has never thrown anything before watches a YouTube video about how to throw a baseball, the conscious mind is the only part of the brain that is involved. They cannot simply walk out into the field after watching this video and immediately throw a baseball as well as the person they watched in that YouTube video. And that is because of all those estimates that need to be made about how our muscles have to be controlled and coordinated as part of throwing something.

Our subconscious mind is the only ‘mind’ that is good at probability and statistics. It works out what those uncertain estimates are from thousands of practice throws. Not from watching a YouTube video once.

Which is why we have to practice throwing a baseball and not study throwing a baseball.

So how does this help?


The problem with most courses or books that focus on trying to teach probability and statistics is that they try to appeal to our conscious mind … which sucks at probability and statistics. For example, everyone who has attempted to learn about probability and statistics has seen the curve below.



This curve is also known as the ‘bell curve,’ ‘normal distribution,’ ‘Gaussian distribution’ or just the ‘Gaussian.’ Let’s for simplicity refer to this curve as the ‘bell curve’ moving forward. The next thing that is usually ‘taught’ is this equation …
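For reference, the equation usually shown at this point is the probability density function of the normal distribution. It is written here in standard notation, on the assumption that this is the equation the original figure displayed:

```latex
f(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{(x-\mu)^{2}}{2\sigma^{2}}}
```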



This equation defines the shape of that ‘bell curve’ above. But what are we ‘taught’ when we have this equation flashed before our eyes? Not much. We might also be ‘taught’ that the symbol ‘μ’ represents the ‘mean’ of the bell curve. We might be ‘taught’ that the symbol ‘σ’ represents this concept called the ‘standard deviation’ which represents how spread out the bell curve is.

In this first lesson, we are only using our conscious mind. Perhaps our conscious mind might be wondering why ‘π’ is in this equation. Or the number represented by ‘e,’ which is around 2.71828. Our conscious mind certainly won’t be deconstructing and looking for a story in the equation above. The only thing most people’s conscious minds will be concluding is that the equation above is complicated.

But what about our subconscious mind? We might ask our subconscious mind to help us memorize this equation. We might write this equation out over and over again. We might complete practice exams where we need to select the right equation. But all that means is that we are able to write those symbols and parameters in that equation onto a piece of paper. And because there is no meaning behind these memories, we very quickly forget them after we (hopefully) pass our final exam.

A doctor wearing a mask with an old fashioned book shelf behind him in an office. Bottles and books can be seen in the foreground spread out in front of him on a desk.


There is a problem with just memorizing an equation.

I have worked with plenty of engineers who are able to identify that the equation above describes a bell curve, but have precious little idea what any of it means. For example, reliability engineers who need to be able to understand the random nature of times to failure for products and systems are usually able to at least remember the bell curve. And these engineers might be asked to come up with a servicing or preventive maintenance interval where we replace or maintain the product or system whose time to failure is described by the bell curve above. The reason we want to do this is to prevent as many failures as we can before they occur.

Let’s also say that the mean (μ) of that bell curve is 10 000 hours, which is also known as the Mean Time Between Failures (MTBF). So what do many reliability engineers come up with for the servicing or preventive maintenance interval for their product or system? They often come up with 10 000 hours, or the MTBF. Which is really, really bad. And the illustration below shows us why.

A large factory room with bay doors, various machinery and tools can be seen around the edges of the large room and many windows.

The MTBF, ‘μ’ or mean, is actually the ‘balance point’ of the shape created by the bell curve. That is all the MTBF is. And this is a concept that is not usually taught in those probability and statistics lessons. This is a shame, because there is a story behind the MTBF. And stories are much easier for our subconscious to retain.
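The ‘balance point’ idea can be checked with a few lines of Python. This is only a sketch: the failure times below are made-up illustrative values, not real data. It shows that the distances to the left of the mean exactly cancel the distances to the right of it, which is what it means for the mean to be where the shape balances.

```python
from statistics import mean

# Hypothetical times to failure in hours (illustrative values, not real data).
failure_times = [8200, 9100, 9700, 10000, 10300, 10900, 11800]

mtbf = mean(failure_times)

# Signed distance of each failure time from the mean:
# negative values sit to the left of the mean, positive values to the right.
deviations = [t - mtbf for t in failure_times]

# If the mean is the balance point, the left and right distances cancel out.
balance = sum(deviations)

print(f"MTBF (mean): {mtbf} hours")
print(f"Sum of deviations from the mean: {balance}")
```

Moving any single failure time further out would shift the mean until the two sides cancel again, just like moving the pivot of a seesaw.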

This story is important, because you can see that half of the area under the bell curve is to the left of the balance point or MTBF. So if we service or conduct preventive maintenance at the MTBF, we would expect half of our products or systems to have already failed. This is WAY too much. Would you be happy if your car broke down and needed to be towed for repair after every second servicing at your local dealer?

As a rule, we are trying to service or conduct preventive maintenance before perhaps at most 1 per cent of our products or systems fail. Not 50 per cent.
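To put rough numbers on this, here is a minimal sketch using Python’s standard library. The mean of 10 000 hours comes from the example above, but the standard deviation of 1 000 hours is purely an assumption made for illustration, since a maintenance interval cannot be worked out from the mean alone.

```python
from statistics import NormalDist

MTBF_HOURS = 10_000   # mean time to failure, from the example in the text
SIGMA_HOURS = 1_000   # ASSUMPTION: standard deviation, not given in the text

# Bell curve describing the time to failure of the product or system.
time_to_failure = NormalDist(mu=MTBF_HOURS, sigma=SIGMA_HOURS)

# Fraction of units expected to have failed if we wait until the MTBF.
fraction_failed_at_mtbf = time_to_failure.cdf(MTBF_HOURS)

# Time by which only 1 per cent of units are expected to have failed.
interval_for_1_percent = time_to_failure.inv_cdf(0.01)

print(f"Failed by the MTBF: {fraction_failed_at_mtbf:.0%}")
print(f"Interval for 1 per cent failures: {interval_for_1_percent:.0f} hours")
```

Under that assumed spread, the interval that keeps failures to 1 per cent is roughly 7 700 hours, well short of the MTBF. A wider spread would push the interval earlier still.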

So why do reliability engineers get this so wrong, so many times?


Because they don’t engage their subconscious minds when it comes to probability and statistics.

Sure, they can use their subconscious minds to memorize where the ‘μ’ goes into the equation above, but they never engage their subconscious minds to understand what the mean actually represents. And this is usually down to the way they are taught. So when they are asked to come up with the servicing or preventive maintenance interval, there is nothing in their subconscious minds to help them out. And so the MTBF is often mistaken as a ‘failure free’ period.

This might sound incredible, but it happens all the time. If a product or system has an advertised MTBF of 10 000 hours, many people assume that we should expect the overwhelming majority of them to last 10 000 hours. Which has led to many disastrous decisions.


What about the bell curve itself?


After we are ‘taught’ about the bell curve, we are told that it models lots of different random processes (like product failure). And we do see the bell curve a lot. Not nearly as often as we might assume, but still a lot.

So why do we see it a lot? … but not all the time?

Let’s go back to our subconscious mind. And ice cream.

Let’s say that I eat ice cream every night (not a hypothetical scenario). And of course, the amount of ice cream I eat each night will vary. So this daily amount is a random variable. Now let’s say that I can purchase ice cream much more cheaply if I purchase one month’s supply at a time. Let’s say this equates to 30 days for simplicity.

So now I need to work out how much ice cream I need to buy every month. The amount of ice cream I eat each month (a random variable) is the sum of the amounts I eat each day (each amount will also be a random variable). A very smart Russian mathematician, Aleksandr Lyapunov, worked out that if you add lots of independent random variables together to get another, bigger random variable, that bigger random variable will be approximately described by the bell curve (as long as no single one of them dominates the total).

So we see the bell curve for any random process in nature that is itself the sum of lots of other random processes. For example, the bell curve does a great job of modelling the random nature of human heights. This is because height is based on lots of other random processes adding up (genetics, diet, injuries, exercise, mental health and so on).
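The ice cream example can be tried out as a small simulation. This is a sketch under stated assumptions: each night’s amount is taken to be uniformly distributed between 100 and 300 grams (a flat shape that looks nothing like a bell curve), and the nights are treated as independent. If the monthly totals follow the bell curve, about 68 per cent of them should fall within one standard deviation of the average and about 95 per cent within two.

```python
import random
from statistics import mean, stdev

random.seed(42)  # fixed seed so the run is repeatable

DAYS_PER_MONTH = 30
MONTHS_SIMULATED = 20_000


def monthly_total():
    """Add up one month of nightly amounts (in grams)."""
    # ASSUMPTION: each night is uniform between 100 and 300 grams.
    return sum(random.uniform(100, 300) for _ in range(DAYS_PER_MONTH))


totals = [monthly_total() for _ in range(MONTHS_SIMULATED)]
centre = mean(totals)
spread = stdev(totals)

# Share of simulated months within one and two standard deviations of the mean.
within_one_sd = sum(abs(t - centre) <= spread for t in totals) / MONTHS_SIMULATED
within_two_sd = sum(abs(t - centre) <= 2 * spread for t in totals) / MONTHS_SIMULATED

print(f"Average monthly total: {centre:.0f} g")
print(f"Within one standard deviation: {within_one_sd:.1%}")
print(f"Within two standard deviations: {within_two_sd:.1%}")
```

Swapping the uniform distribution for another nightly pattern is a quick way to build a feel for how robust the ‘adding up’ effect is.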



Let's use our subconscious!

Instead of using our subconscious to memorize the equation for the bell curve, we can use it to be on the lookout for random processes that are based on other things adding up. The amount of tread a tire loses when it is being driven on your car adds up, so we see the bell curve here. The dimensions of manufactured components are also based on lots of different parts of the manufacturing process adding up (tool sharpness, material properties, placement by operator and so on). So we see the bell curve in lots of manufacturing processes as well.

But this also means we shouldn’t use the bell curve for things like repair times, radioactive decay, infant mortality, and other random processes that if we take the time to examine them (even just a little bit) we will quickly realize they are not based on things adding up. This means we need to look for a different model that doesn’t use the bell curve.
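To see what goes wrong when the ‘adding up’ story does not apply, here is a sketch based on radioactive decay, where the time until an atom decays is commonly modelled with the exponential distribution. The mean lifetime of 10 000 hours is an arbitrary value chosen for illustration. Unlike the bell curve, this shape is lopsided, so the mean is no longer the halfway mark.

```python
import random
from statistics import mean, median

random.seed(7)  # fixed seed so the run is repeatable

MEAN_LIFETIME = 10_000  # ASSUMPTION: illustrative mean lifetime in hours
SAMPLES = 50_000

# Exponentially distributed lifetimes; expovariate takes the rate (1 / mean).
lifetimes = [random.expovariate(1 / MEAN_LIFETIME) for _ in range(SAMPLES)]

sample_mean = mean(lifetimes)
sample_median = median(lifetimes)

# Share of lifetimes that end before the mean is reached.
fraction_below_mean = sum(t < sample_mean for t in lifetimes) / SAMPLES

print(f"Mean lifetime:   {sample_mean:.0f} hours")
print(f"Median lifetime: {sample_median:.0f} hours")
print(f"Ended before the mean: {fraction_below_mean:.1%}")
```

For an exponential model about 63 per cent of lifetimes end before the mean, not the 50 per cent the bell curve would suggest, so borrowing bell curve intuition here would be even more misleading.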

And so when it comes to learning probability and statistics, avoid the temptation to simply memorize the equations to help you pass an exam. This will at best have a temporary effect on your skills as an engineer, banker, manufacturer, risk analyst or anyone else who needs to deal with uncertainty and randomness.

Instead, use your subconscious mind to understand where (for example) the bell curve comes from, and what the mean represents. Look for meanings and stories. Try and understand how these theoretical curves can help us in the real world through real world experiences.

If you don’t … then there is a good chance that you are going to be one of those people who can be replaced by a robot with artificial intelligence. Because they don’t have subconsciouses either.

 
