Joint Probability and Chain Rule

1. Joint Probability Function

Joint probability can be defined for both discrete and continuous random variables. In this post I deal only with discrete random variables and their joint probability function.

Let Y_1, Y_2 be discrete random variables. The joint probability distribution for Y_1, Y_2 is given by

p(y_1, y_2)=P(Y_1=y_1, Y_2=y_2), \,\,\, -\infty< y_1 < \infty, -\infty< y_2 < \infty


The function p(y_1, y_2) is referred to as the joint probability function. [Source: Mathematical Statistics with Applications]


The joint probability function satisfies the following two properties.

  • 1. p(y_1, y_2) \geq 0 \,\,\, for \,all y_1, y_2
  • 2. \sum_{y_1, y_2} p(y_1, y_2) = 1
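
The two properties above can be checked mechanically. Here is a minimal sketch, assuming the joint probability function is stored as a Python dict that maps outcomes (y_1, y_2) to probabilities. The helper name is_joint_pmf and the table values are made up for illustration and do not come from the referenced book.

```python
from fractions import Fraction

def is_joint_pmf(p):
    """Check the two properties of a joint probability function stored as a dict."""
    nonnegative = all(v >= 0 for v in p.values())  # property 1
    sums_to_one = sum(p.values()) == 1             # property 2
    return nonnegative and sums_to_one

# Hypothetical table over two binary variables (values chosen for illustration only).
p = {
    (0, 0): Fraction(1, 8),
    (0, 1): Fraction(3, 8),
    (1, 0): Fraction(1, 4),
    (1, 1): Fraction(1, 4),
}
print(is_joint_pmf(p))  # True
```

Exact fractions are used instead of floats so that the "sums to 1" check is not disturbed by rounding.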


Let's look at an example. Let the sample space be S = {third-grade middle school students in Seoul}, let Y_1 represent a student's favorite idol group, and let Y_2 represent the area where the student lives. Y_1 takes three values, {'BTS', 'EXO', 'OTHERS'}, and Y_2 takes two values, {'NORTH', 'SOUTH'}. We can then assign a joint probability to each outcome (Y_1, Y_2).

Given these joint probabilities, you can answer the following questions.

  • P(Y_1='BTS') =?
  • P(Y_2='SOUTH') =?
  • P(Y_1='BTS', Y_2='SOUTH') =?

The answer to the last question is 3/64. Please verify it yourself.
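
The first two questions are marginal probabilities, obtained by summing the joint probabilities over the other variable. Below is a rough sketch of that computation. Note that the table is hypothetical: only the ('BTS', 'SOUTH') entry of 3/64 is taken from the text, and every other value is an assumption chosen so that the table sums to 1.

```python
from fractions import Fraction

# Hypothetical joint table, in units of 1/64.
# Only ('BTS', 'SOUTH') = 3/64 comes from the text; the rest is made up.
counts = {
    ("BTS", "NORTH"): 20, ("BTS", "SOUTH"): 3,
    ("EXO", "NORTH"): 10, ("EXO", "SOUTH"): 15,
    ("OTHERS", "NORTH"): 6, ("OTHERS", "SOUTH"): 10,
}
joint = {outcome: Fraction(c, 64) for outcome, c in counts.items()}

def marginal(joint, axis):
    """Sum the joint probabilities over the variable that is not at position axis."""
    out = {}
    for outcome, prob in joint.items():
        key = outcome[axis]
        out[key] = out.get(key, Fraction(0)) + prob
    return out

p_y1 = marginal(joint, 0)  # distribution of Y_1 (favorite idol)
p_y2 = marginal(joint, 1)  # distribution of Y_2 (area)

print(p_y1["BTS"])              # 23/64 under the assumed table
print(p_y2["SOUTH"])            # 7/16 under the assumed table
print(joint[("BTS", "SOUTH")])  # 3/64
```

With a different table the first two answers change, but the procedure stays the same: fix the value of interest and add up every joint probability that contains it.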


2. Chain Rule

To derive the chain rule, we can think of a joint probability as the probability of consecutive trials.

P(x, y) = P(x \, and \, y)

We can think of this as x followed by y, or vice versa.

I will not introduce the concept of the joint probability density function here. Instead, I want to explain the chain rule by treating a joint probability as the probability of 'consecutive events'.

We can describe consecutive events as follows.

P(x \, and \, y) = P(x) \cdot P(y|x)

i.e. probability \, of \, x \rightarrow y \, : \, P(x) \cdot P(y|x)
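
As a quick numerical check of P(x \, and \, y) = P(x) \cdot P(y|x), here is a small sketch. The joint table and the helper names p_x and p_y_given_x are assumptions made for illustration. The conditional probability is computed from its definition P(y|x) = P(x, y) / P(x), so the check mainly shows how the three quantities fit together.

```python
from fractions import Fraction

# Hypothetical joint probabilities P(x, y); values chosen only for illustration.
joint = {
    ("a", 0): Fraction(1, 8), ("a", 1): Fraction(3, 8),
    ("b", 0): Fraction(1, 4), ("b", 1): Fraction(1, 4),
}

def p_x(x):
    """Marginal P(x): sum the joint probabilities over y."""
    return sum(p for (xi, _), p in joint.items() if xi == x)

def p_y_given_x(y, x):
    """Conditional P(y | x) = P(x, y) / P(x)."""
    return joint[(x, y)] / p_x(x)

# Verify P(x, y) = P(x) * P(y | x) for every outcome in the table.
chain_rule_holds = all(
    p_x(x) * p_y_given_x(y, x) == joint[(x, y)] for (x, y) in joint
)
print(chain_rule_holds)  # True
```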

How about in the case of  x \rightarrow y \rightarrow z ?

It can be decomposed into two steps.
1. x \rightarrow y, z
2. x, y \rightarrow z

At the first step,

probability \, of \, x \rightarrow y,z \, : \, P(x) \cdot P(y,z|x)

At the second step, we apply the same rule to P(y,z|x) while keeping x given, i.e. P(y,z|x) = P(y|x) \cdot P(z|x,y), so

probability \, of \, x,y \rightarrow z \, : \, P(x) \cdot P(z|x,y) \cdot P(y|x)

In the same way, we can derive the joint probability of n events:  x_1, \, x_2, \cdots , x_n

P(x_1,x_2, \cdots , x_n) = P(x_n|x_{n-1}, \cdots, x_1) \cdots P(x_2|x_1) \cdot P(x_1)
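
The general formula can also be checked numerically. The sketch below assumes a made-up joint distribution over three binary variables. The helper names prefix_prob and chain_rule_product are hypothetical, and each conditional factor is computed as P(x_1, ..., x_k) / P(x_1, ..., x_{k-1}).

```python
from fractions import Fraction
from itertools import product

# Hypothetical joint distribution over three binary variables (x1, x2, x3).
weights = [1, 2, 3, 4, 5, 6, 7, 8]  # arbitrary positive weights, total 36
outcomes = list(product([0, 1], repeat=3))
joint = {o: Fraction(w, sum(weights)) for o, w in zip(outcomes, weights)}

def prefix_prob(prefix):
    """P(x_1, ..., x_k): sum the joint over all outcomes that start with prefix."""
    k = len(prefix)
    return sum(p for o, p in joint.items() if o[:k] == prefix)

def chain_rule_product(outcome):
    """Multiply P(x_1) * P(x_2 | x_1) * ... * P(x_n | x_{n-1}, ..., x_1)."""
    result = Fraction(1)
    for k in range(1, len(outcome) + 1):
        # P(x_k | x_1, ..., x_{k-1}) = P(x_1, ..., x_k) / P(x_1, ..., x_{k-1})
        result *= prefix_prob(outcome[:k]) / prefix_prob(outcome[:k - 1])
    return result

print(all(chain_rule_product(o) == joint[o] for o in outcomes))  # True
```

All weights are positive here, so no conditional probability divides by zero. A table with zero-probability prefixes would need that case handled separately.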

