Joint Probability and Chain Rule

1. Joint Probability Function

Joint probability can be defined for discrete random variables and for continuous random variables. In this post I want to deal only with discrete random variables and their joint probability function.

Let $Y_1, Y_2$ be discrete random variables. The joint probability distribution for $Y_1, Y_2$ is given by

$p(y_1, y_2)=P(Y_1=y_1, Y_2=y_2), \quad -\infty< y_1 < \infty, \; -\infty< y_2 < \infty$

The function $p(y_1, y_2)$ will be referred to as the joint probability function. (Source: Mathematical Statistics with Applications)

It satisfies the following two properties.

1. $p(y_1, y_2) \geq 0$ for all $y_1, y_2$
2. $\sum_{y_1, y_2} p(y_1, y_2) = 1$
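The two properties are easy to check mechanically. The following is a minimal sketch, assuming the joint probability function is stored as a Python dict from `(y1, y2)` pairs to probabilities; the function name `is_valid_joint_pmf` and both tables are hypothetical, made up only for illustration.

```python
from fractions import Fraction


def is_valid_joint_pmf(p):
    """Check the two properties for a dict mapping (y1, y2) -> probability."""
    non_negative = all(prob >= 0 for prob in p.values())  # property 1
    sums_to_one = sum(p.values()) == 1                     # property 2
    return non_negative and sums_to_one


# Hypothetical table (not from the post): the four entries add up to 1.
p = {
    (0, 0): Fraction(1, 8), (0, 1): Fraction(1, 4),
    (1, 0): Fraction(3, 8), (1, 1): Fraction(1, 4),
}

# Same table with one entry changed, so the total becomes 5/4.
bad = dict(p)
bad[(1, 1)] = Fraction(1, 2)

print(is_valid_joint_pmf(p))    # True
print(is_valid_joint_pmf(bad))  # False
```

Exact fractions are used here so that the comparison with 1 does not suffer from floating-point rounding.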

Let's see an example. Let the sample space be S = {3rd-grade middle school students in Seoul}, let Y1 be a student's favorite idol, and let Y2 be the area where the student lives. Y1 takes three values, {'BTS', 'EXO', 'OTHERS'}, and Y2 takes two values, {'NORTH', 'SOUTH'}. Then we can assign a joint probability to each outcome (Y1, Y2).

Now you can answer the questions below.

$P(Y_1='BTS') =?$
$P(Y_2='SOUTH') =?$
$P(Y_1='BTS', Y_2='SOUTH') =?$

The answer to the last question is 3/64. Please verify it yourself.
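The three questions can also be answered in code. This is only a sketch under stated assumptions: the table `joint` below is hypothetical. Only the ('BTS', 'SOUTH') entry, 3/64, is taken from the answer above; the other five values are made up so that the table sums to 1. The helper names `marginal_y1` and `marginal_y2` are likewise assumptions.

```python
from fractions import Fraction

# Hypothetical joint table p(y1, y2). Only the ('BTS', 'SOUTH') entry
# comes from the post; every other value is invented for illustration.
joint = {
    ("BTS", "NORTH"): Fraction(13, 64),
    ("BTS", "SOUTH"): Fraction(3, 64),
    ("EXO", "NORTH"): Fraction(10, 64),
    ("EXO", "SOUTH"): Fraction(14, 64),
    ("OTHERS", "NORTH"): Fraction(9, 64),
    ("OTHERS", "SOUTH"): Fraction(15, 64),
}


def marginal_y1(joint, y1):
    """P(Y1 = y1): sum the joint probabilities over every value of Y2."""
    return sum(p for (a, _), p in joint.items() if a == y1)


def marginal_y2(joint, y2):
    """P(Y2 = y2): sum the joint probabilities over every value of Y1."""
    return sum(p for (_, b), p in joint.items() if b == y2)


print(marginal_y1(joint, "BTS"))    # 1/4
print(marginal_y2(joint, "SOUTH"))  # 1/2
print(joint[("BTS", "SOUTH")])      # 3/64
```

The first two questions ask about a single variable, so each answer is a sum over the other variable; the third is read directly from the table.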

2. Chain Rule

To derive the chain rule, the joint probability function can be thought of as describing consecutive trials.

$P(x, y) = P(x \, \text{and} \, y)$

We can think of this as x followed by y, or vice versa.

I will not introduce the joint probability density function here. Instead, I want to explain the chain rule by treating a joint probability as the probability of 'consecutive events'.

We can describe consecutive events as follows.

$P(x \, \text{and} \, y) = P(x) \cdot P(y|x)$

i.e. the probability of $x \rightarrow y$ is $P(x) \cdot P(y|x)$
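To see the 'consecutive events' reading in action, here is a small sketch using a hypothetical urn (not part of the original example): 3 red and 5 blue balls, drawn twice without replacement. Here x is "the first ball is red" and y is "the second ball is red". The product $P(x) \cdot P(y|x)$ is compared with the joint probability obtained by counting every ordered pair of draws.

```python
from fractions import Fraction
from itertools import permutations

# Hypothetical urn: 3 red balls and 5 blue balls.
balls = ["R"] * 3 + ["B"] * 5
n_red, n_total = balls.count("R"), len(balls)

# Chain rule: P(x) * P(y | x).
p_x = Fraction(n_red, n_total)                  # first draw is red
p_y_given_x = Fraction(n_red - 1, n_total - 1)  # second is red, one red already gone
chain = p_x * p_y_given_x

# Direct count: enumerate all ordered pairs of distinct balls.
draws = list(permutations(range(n_total), 2))
both_red = sum(1 for i, j in draws if balls[i] == "R" and balls[j] == "R")
direct = Fraction(both_red, len(draws))

print(chain, direct)  # 3/28 3/28
```

Both routes give the same number, which is what the product formula claims.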

What about the case $x \rightarrow y \rightarrow z$?

It can be decomposed into two steps.
1. $x \rightarrow y, z$
2. $x, y \rightarrow z$

At the first step,

the probability of $x \rightarrow y, z$ is $P(x) \cdot P(y,z|x)$

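At the second step, with x given, we apply the same rule to the pair (y, z), which gives $P(y,z|x) = P(y|x) \cdot P(z|x,y)$. Substituting this into the first step,

the probability of $x, y \rightarrow z$ is $P(x) \cdot P(z|x,y) \cdot P(y|x)$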
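The three-event decomposition can be checked the same way. The sketch below assumes a hypothetical urn with 3 red and 5 blue balls and three draws without replacement, where x, y, z mean "the first, second, third ball is red". It compares $P(x) \cdot P(y|x) \cdot P(z|x,y)$ with a direct count over all ordered triples.

```python
from fractions import Fraction
from itertools import permutations

# Hypothetical urn: 3 red balls and 5 blue balls, three draws.
balls = ["R"] * 3 + ["B"] * 5
n_red, n_total = balls.count("R"), len(balls)

# P(x), P(y | x), P(z | x, y): one red ball fewer at every step.
factors = [Fraction(n_red - k, n_total - k) for k in range(3)]
chain = factors[0] * factors[1] * factors[2]

# Direct count over all ordered triples of distinct balls.
draws = list(permutations(range(n_total), 3))
all_red = sum(1 for triple in draws if all(balls[i] == "R" for i in triple))
direct = Fraction(all_red, len(draws))

print(factors)        # [Fraction(3, 8), Fraction(2, 7), Fraction(1, 6)]
print(chain, direct)  # 1/56 1/56
```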

In the same way, we can derive the joint probability of n events $x_1, \, x_2, \cdots , x_n$:

$P(x_1,x_2, \cdots , x_n) = P(x_n|x_{n-1}, \cdots, x_1) \cdots P(x_2|x_1) \cdot P(x_1)$
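As a rough check of the general formula, the sketch below builds a hypothetical joint table over n = 4 binary variables (the weights are arbitrary, chosen only so that every outcome has positive probability) and multiplies the conditional factors one by one. The helper names `prefix_prob` and `chain_rule_product` are assumptions made for this illustration.

```python
from fractions import Fraction
from itertools import product


def prefix_prob(joint, prefix):
    """P(x_1, ..., x_k): sum over all outcomes that start with `prefix`."""
    k = len(prefix)
    return sum(p for outcome, p in joint.items() if outcome[:k] == prefix)


def chain_rule_product(joint, outcome):
    """Multiply P(x_1) * P(x_2 | x_1) * ... * P(x_n | x_1, ..., x_{n-1})."""
    result = Fraction(1)
    for k in range(1, len(outcome) + 1):
        # P(x_k | x_1..x_{k-1}) = P(x_1..x_k) / P(x_1..x_{k-1});
        # for k = 1 the empty prefix matches everything, so the divisor is 1.
        result *= prefix_prob(joint, outcome[:k]) / prefix_prob(joint, outcome[:k - 1])
    return result


# Hypothetical joint table: 16 outcomes with weights 1, 2, ..., 16.
n = 4
outcomes = list(product([0, 1], repeat=n))
total = sum(range(1, len(outcomes) + 1))
joint = {o: Fraction(i, total) for i, o in enumerate(outcomes, start=1)}

matches = all(chain_rule_product(joint, o) == joint[o] for o in outcomes)
print(matches)  # True
```

Each conditional factor is computed as a ratio of two prefix probabilities, so the product telescopes back to the full joint probability. This requires every prefix to have non-zero probability, which the made-up weights guarantee.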
