# Conditional probability distribution

## Definitions

Given two jointly distributed random variables X and Y, the conditional probability distribution of Y given X (written "Y | X") is the probability distribution of Y when X is known to be a particular value.

For discrete random variables, the conditional probability mass function can be written as P(Y = y | X = x). From the definition of conditional probability, this is

$P(Y=y \mid X=x) = \frac{P(X=x \ \mathrm{and}\ Y=y)}{P(X=x)} = \frac{P(X=x \mid Y=y)\, P(Y=y)}{P(X=x)}.$
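The discrete definition can be sketched directly from a joint probability table. The joint pmf below is a hypothetical example chosen for illustration; `Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction

# Hypothetical joint pmf P(X = x and Y = y) over a small discrete support.
joint = {
    (0, 0): Fraction(1, 8), (0, 1): Fraction(3, 8),
    (1, 0): Fraction(2, 8), (1, 1): Fraction(2, 8),
}

def marginal_x(x):
    """Marginal P(X = x): sum the joint pmf over all values of y."""
    return sum(p for (xi, _), p in joint.items() if xi == x)

def conditional(y, x):
    """Conditional pmf P(Y = y | X = x) = P(X = x and Y = y) / P(X = x)."""
    return joint.get((x, y), Fraction(0)) / marginal_x(x)

print(conditional(1, 0))  # P(Y = 1 | X = 0) = (3/8) / (4/8) = 3/4
```

Dividing the joint probability by the marginal rescales the row of the table for X = x so that it sums to 1.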

Similarly for continuous random variables, the conditional probability density function can be written as pY|X(y | x) and this is

$p_{Y|X}(y \mid x) = \frac{p_{X,Y}(x,y)}{p_X(x)} = \frac{p_{X|Y}(x \mid y)\, p_Y(y)}{p_X(x)}$

where pX,Y(x, y) gives the joint distribution of X and Y, while pX(x) gives the marginal distribution for X.

The concept of the conditional distribution of a continuous random variable is not as intuitive as it might seem: Borel's paradox shows that conditional probability density functions need not be invariant under coordinate transformations.

If for discrete random variables P(Y = y | X = x) = P(Y = y) for all x and y, or for continuous random variables pY|X(y | x) = pY(y) for all x and y, then Y is said to be independent of X (and this implies that X is also independent of Y).
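For discrete variables the independence condition is equivalent to the joint pmf factoring into the product of its marginals, which is easy to verify mechanically. The joint table below is a hypothetical example built from two independent marginals.

```python
from itertools import product

# Hypothetical joint pmf constructed as a product of marginals,
# so X and Y are independent by construction.
joint = {(x, y): px * py
         for (x, px), (y, py) in product([(0, 0.25), (1, 0.75)],
                                         [(0, 0.5), (1, 0.5)])}

def marginal(joint, axis):
    """Marginal pmf along one coordinate (0 for X, 1 for Y)."""
    m = {}
    for pair, p in joint.items():
        m[pair[axis]] = m.get(pair[axis], 0.0) + p
    return m

def is_independent(joint, tol=1e-12):
    """True when P(X = x and Y = y) == P(X = x) * P(Y = y) for all (x, y),
    which is equivalent to P(Y = y | X = x) == P(Y = y)."""
    mx, my = marginal(joint, 0), marginal(joint, 1)
    return all(abs(p - mx[x] * my[y]) <= tol for (x, y), p in joint.items())
```

A joint pmf that concentrates all mass on the diagonal, such as `{(0, 0): 0.5, (1, 1): 0.5}`, fails this test: knowing X pins down Y exactly.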

Seen as a function of y for given x, P(Y = y | X = x) is a probability and so the sum over all y (or integral if it is a density) is 1. Seen as a function of x for given y, it is a likelihood function, so that the sum over all x need not be 1.
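The asymmetry between the two readings can be checked on a small joint table (the same hypothetical pmf used above): summing the conditional over y for fixed x gives exactly 1, while summing it over x for fixed y generally does not.

```python
from fractions import Fraction

# Hypothetical joint pmf P(X = x and Y = y).
joint = {(0, 0): Fraction(1, 8), (0, 1): Fraction(3, 8),
         (1, 0): Fraction(2, 8), (1, 1): Fraction(2, 8)}
xs, ys = (0, 1), (0, 1)

def cond(y, x):
    """Conditional pmf P(Y = y | X = x)."""
    px = sum(joint[(x, yi)] for yi in ys)
    return joint[(x, y)] / px

# As a function of y for fixed x = 0: a probability distribution, sums to 1.
print(sum(cond(y, 0) for y in ys))   # 1

# As a function of x for fixed y = 1: a likelihood, need not sum to 1.
print(sum(cond(1, x) for x in xs))   # 3/4 + 1/2 = 5/4
```

Here P(Y = 1 | X = 0) = 3/4 and P(Y = 1 | X = 1) = 1/2, so the sum over x is 5/4 rather than 1.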