Getting to know probability distributions

Balfala
6 min read · Mar 6, 2021

Random variable

A random variable (R.V.) is a function that turns real-world outcomes into numbers. For example, if we’re interested in the roll of a six-sided die, we might define X to be the random variable that maps your gooey sensory experience of a real-world die roll to one of these numbers: {1,2,3,4,5,6}. Or maybe we’ll only record {0, 1} for odd/even. It all depends on how we choose to define our R.V.

(If that’s too technical, just think of a random variable as a way to indicate an outcome: if X is about die rolls, X=4 is a way to say that we rolled a 4. If it’s not technical enough, you’ll almost surely love taking a measure theory class.)
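If it helps to see the "function" idea in code, here's a minimal Python sketch. The observations and the names `X`, `Y`, and `FACE_TO_NUMBER` are made up for illustration, and coding odd as 1 and even as 0 is just one possible convention.

```python
# Hypothetical raw observations of die rolls (what we "saw" in the real world).
observed_faces = ["four", "one", "six", "three"]

FACE_TO_NUMBER = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5, "six": 6}


def X(face: str) -> int:
    """R.V. that records the number shown: values in {1, ..., 6}."""
    return FACE_TO_NUMBER[face]


def Y(face: str) -> int:
    """R.V. that records parity only: 1 for odd, 0 for even (an assumed coding)."""
    return FACE_TO_NUMBER[face] % 2


# Same reality, two different random variables, two different sets of numbers.
x_values = [X(face) for face in observed_faces]
y_values = [Y(face) for face in observed_faces]
print(x_values)  # [4, 1, 6, 3]
print(y_values)  # [0, 1, 0, 1]
```

The rolls are the same in both cases; only the rule for turning them into numbers changes.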

Random Variate

Many students confuse random variables with random variates. If you’re a casual reader, skip this, but enthusiasts take note: random variates are outcome values like {1, 2, 3, 4, 5, 6} while random variables are functions that map reality onto numbers. Little x versus big X in your textbook’s formulas.
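A rough way to picture the difference in Python: the function is the random variable, and the numbers it returns are the variates. This is only a sketch; the seeded `random.Random` generator stands in for "reality", and the seed value is arbitrary.

```python
import random


def X(rng: random.Random) -> int:
    """Random variable (big X): a rule that turns one simulated roll into a number."""
    return rng.randint(1, 6)


rng = random.Random(2021)  # arbitrary fixed seed so the run is repeatable

# Random variates (little x): the realized values we get when X is applied.
variates = [X(rng) for _ in range(5)]
print(variates)
```

`X` is one object; `variates` holds five outcome values, each drawn from {1, 2, 3, 4, 5, 6}.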

Probability

P(X=4) would be read in English as “The probability that my die lands with the 4 facing up.” If I’ve got a fair six-sided die, P(X=4)=1/6. But… but… but… what is probability and where does that 1/6 come from? Glad you asked! I’ve covered some probability basics for you here, with combinatorics thrown in as a bonus.
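One way to see where the 1/6 comes from is to count outcomes, and then sanity-check the answer with a simulation. The sketch below assumes a fair die (all six faces equally likely); the seed and the number of simulated rolls are arbitrary choices.

```python
from fractions import Fraction
import random

outcomes = [1, 2, 3, 4, 5, 6]

# Exact: favorable outcomes divided by all equally likely outcomes.
favorable = sum(1 for outcome in outcomes if outcome == 4)
p_exact = Fraction(favorable, len(outcomes))
print(p_exact)  # 1/6

# Approximate: long-run relative frequency of 4s among simulated rolls.
rng = random.Random(0)  # arbitrary seed for repeatability
n_rolls = 100_000
hits = sum(rng.choice(outcomes) == 4 for _ in range(n_rolls))
p_estimate = hits / n_rolls
print(round(p_estimate, 3))
```

The simulated estimate won't be exactly 1/6, but it should land close to it when the number of rolls is large.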

Distribution

A distribution is a way to express the probabilities of the entire set of values that X can take.

A distribution gives you popularity contest results in graphical form.

Probability Density Function (PDF)

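The best way to summon a distribution is to utter its true name: its probability density function. What does such a function signify? If we put X on the x-axis (yup), then the height on the y-axis shows how likely each outcome is: the probability itself for a discrete R.V., or the probability density for a continuous one, where probabilities are areas under the curve.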

A probability density function gives you popularity contest results for your whole population. It’s basically the population histogram. Horizontal axis: population data values. Vertical axis: relative popularity. To learn more about this graph and the details that I omitted, head over to here.

As I’ve explained in detail here, a distribution is essentially an imaginary idealized bar chart (for discrete R.V.s) or histogram (for continuous R.V.s).* In other words, the distribution is taller for more likely values of X. The distribution for a fair die has equal height for all outcomes (“discrete uniform”); not so for a weighted die.
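Here's a small Python sketch of those two dice as idealized bar charts. The weighted die is hypothetical (I've assumed the 6 is three times as likely as any other face), and `text_bar_chart` is just a helper invented for this example.

```python
from fractions import Fraction

faces = [1, 2, 3, 4, 5, 6]

# Fair die: discrete uniform, so every face gets the same height.
fair_die = {face: Fraction(1, 6) for face in faces}

# Hypothetical weighted die: the 6 is three times as likely as any other face.
weights = {1: 1, 2: 1, 3: 1, 4: 1, 5: 1, 6: 3}
total_weight = sum(weights.values())
weighted_die = {face: Fraction(w, total_weight) for face, w in weights.items()}


def text_bar_chart(pmf: dict) -> str:
    """Draw one row of '#' per face; longer rows mean more likely values."""
    rows = []
    for face, p in pmf.items():
        rows.append(f"{face} | {'#' * round(float(p) * 24)} {p}")
    return "\n".join(rows)


print(text_bar_chart(fair_die))
print()
print(text_bar_chart(weighted_die))
```

In both cases the heights add up to 1; the only thing that changes is how that total is shared among the faces.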

Like distributions, you can think of bar charts and histograms as popularity contests. Or tip jars. That works too.

Cumulative Distribution Function (CDF)

This is the integral** of the probability density function. In English? Instead of showing how likely each value of X is, the function shows the cumulative probability of landing at or below each value. If you’re thinking of percentiles, awesome. The percentile is what’s on the x-axis and the percentage is what’s on the y-axis.

Probability: Getting a 3 on a six-sided die? 1/6
Cumulative: Getting a 3 or lower? 3/6
The 50th percentile is a 3. The 3 goes on the x-axis, 50% goes on the y-axis.
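For a die, "integral" just means a running total. The sketch below, assuming a fair die, builds the cumulative probabilities and looks up a percentile; the `percentile` helper is a name made up for this example.

```python
from fractions import Fraction
from itertools import accumulate

faces = [1, 2, 3, 4, 5, 6]
pmf = [Fraction(1, 6)] * 6  # fair die: each face has probability 1/6

# Running total of the probabilities: P(X <= face) for each face.
cdf = dict(zip(faces, accumulate(pmf)))
print(cdf[3])  # 1/2


def percentile(q: Fraction) -> int:
    """Smallest face whose cumulative probability reaches q."""
    return next(face for face in faces if cdf[face] >= q)


print(percentile(Fraction(1, 2)))  # 3
```

Reading the result as a graph: 3 goes on the x-axis and 1/2 (that is, 50%) goes on the y-axis.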

Choosing Your Distribution

How do you know what distribution is right for your X? Statisticians have two favorite approaches. They either (1) estimate empirical distributions from their data — using, you guessed it, histograms! — or they (2) make theoretical assumptions about which member of a popular distribution catalog looks most similar to how they believe their data source behaves. (If you have data, it’s a great idea to check those distribution assumptions with a hypothesis test.)
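To make option (1) concrete, here's a minimal Python sketch of an empirical distribution. The `rolls` list is invented data, and comparing it with the fair-die assumption by eye (via the largest gap) is only a rough check, not a hypothesis test.

```python
from collections import Counter

# Hypothetical data: 20 recorded die rolls.
rolls = [3, 6, 1, 4, 4, 2, 6, 5, 3, 1, 2, 6, 5, 4, 3, 6, 1, 2, 5, 6]

# Approach (1): estimate the distribution from the data (a histogram in numbers).
counts = Counter(rolls)
n = len(rolls)
empirical = {face: counts[face] / n for face in range(1, 7)}

# Approach (2): assume a catalog member, here the discrete uniform for a fair die.
theoretical = {face: 1 / 6 for face in range(1, 7)}

for face in range(1, 7):
    print(face, f"empirical={empirical[face]:.2f}", f"assumed={theoretical[face]:.2f}")

# A crude summary of the mismatch between the two.
largest_gap = max(abs(empirical[f] - theoretical[f]) for f in range(1, 7))
print(f"largest gap: {largest_gap:.3f}")
```

With only 20 rolls, some mismatch is expected even for a fair die, which is exactly why a proper test is worth running before trusting the assumption.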

The standard approach to choosing a distribution involves plotting a histogram and comparing its shape with the shapes of theoretical distributions in a catalog, such as the list of distributions on Wikipedia, in your textbook, or on the sales page for the distribution plushies above. (And now you get to wonder just how much I’m kidding.)

When we look at our catalog, we notice that the various distributions have names like “Normal” or “Chi-squared” or “Cauchy”… which gives students the mistaken impression that these are the only options. They’re not. They’re just the famous ones. Just like people, distributions might be famous for all the wrong reasons.

On the plus side, named distributions come with neat PDFs and a bunch of calculations pre-done for you.

On the minus side, your application might not fit anything in a catalog. Thank goodness for the empirical option.

Parameters

Here’s the probability density function for a very popular distribution, the normal distribution (a.k.a. Gaussian or bell-shaped curve):

f(x) = [1 / (σ√(2π))] · exp(−(x − μ)² / (2σ²))

Let’s be honest — the insights aren’t exactly leaping off the page. That’s why we tend to prefer asking questions about specific parameters of interest to us. In statistics, parameters summarize populations or distributions. For example, if you’re asking whether the distribution peaks at zero, you’re asking about the location of its mode (a parameter). If you’re asking how fat the distribution is, you’re asking about its variance (another parameter). In a moment, I’ll take you on a tour of a few of my favorite parameters.
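As a quick taste of what a parameter looks like in practice, here's a minimal Python sketch that computes three of them for a simple discrete distribution. The weighted die is hypothetical (the 6 is assumed to be three times as likely as any other face), chosen so that the peak is unambiguous.

```python
from fractions import Fraction

# Hypothetical weighted die: the 6 is three times as likely as any other face.
pmf = {1: Fraction(1, 8), 2: Fraction(1, 8), 3: Fraction(1, 8),
       4: Fraction(1, 8), 5: Fraction(1, 8), 6: Fraction(3, 8)}

# Mode: where the distribution peaks.
mode = max(pmf, key=pmf.get)

# Mean: the probability-weighted average value.
mean = sum(x * p for x, p in pmf.items())

# Variance: the probability-weighted average squared distance from the mean.
variance = sum((x - mean) ** 2 * p for x, p in pmf.items())

print(mode)      # 6
print(mean)      # 33/8
print(variance)  # 215/64
```

Each number answers one specific question about the distribution, which is often easier to work with than the whole function at once.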

But before we do that, let me answer this question: instead of computing summary measures, why don’t we just plot this function and ogle it? We’re not ready yet.

If you look at the function above, you’ll notice that there are some Greek letters in there: μ and σ.*** These are special parameters for this distribution; until we replace them with numbers, we’re not ready to plot anything. Without those numbers, all we can do is get a vague sense of the abstract shape of the distribution.
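To see why the numbers matter, here's a minimal Python sketch of the normal density. The function name `normal_pdf` and the values μ = 0 and σ = 1 are illustrative choices only; the standard library's `statistics.NormalDist` is used as a cross-check.

```python
import math
from statistics import NormalDist


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Height of the normal density at x, for location mu and spread sigma > 0."""
    coefficient = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
    return coefficient * math.exp(exponent)


# Nothing can be evaluated (or plotted) until mu and sigma are actual numbers.
mu, sigma = 0.0, 1.0  # illustrative values, not taken from any data

xs = [mu + k * sigma for k in range(-3, 4)]
heights = [normal_pdf(x, mu, sigma) for x in xs]

# A crude text plot: longer rows mean higher density.
for x, h in zip(xs, heights):
    print(f"{x:+.1f}  {'#' * round(h * 100)}")

# Cross-check against the standard library's implementation.
assert math.isclose(normal_pdf(1.5, 2.0, 0.5), NormalDist(2.0, 0.5).pdf(1.5))
```

Changing `mu` slides the bell left or right, and changing `sigma` makes it fatter or thinner.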
