probability - pf, pdf, cdf?
Could someone explain the concepts of, and differences between, the probability function, probability density function and cumulative density function in depth to me?
Favorite Answer (Lv 5, 9 years ago)
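First point: probability functions (PF), also known as probability mass functions, are used with discrete variables, while probability density functions (PDF) are used with continuous variables. The CDF, properly called the cumulative distribution function (though often loosely called the cumulative density function), can be defined for either kind of variable, but it is described here for the continuous case.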
Plotted, a probability function shows a range of discrete values on the x axis against the probability of getting each specific value on the y axis. For a die (uniform distribution), the probability function shows 1 to 6 on the x axis with 6 points (bars), each of which goes up to 1/6 on the y axis. Effectively it is a histogram showing exact probabilities rather than observed or expected counts.
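As a small illustration of the die example, here is a minimal Python sketch. The names `die_pmf`, `total` and `p_at_most_4` are just made up for this example; it only shows that the six bars are each 1/6 and that probabilities of ranges come from adding bars.

```python
from fractions import Fraction

# Probability function of a fair six-sided die:
# each face (1 to 6) has probability exactly 1/6.
die_pmf = {face: Fraction(1, 6) for face in range(1, 7)}

# The bars over all possible outcomes add up to exactly 1.
total = sum(die_pmf.values())

# Probability of rolling a 4 or less: add up the bars for faces 1 to 4.
p_at_most_4 = sum(p for face, p in die_pmf.items() if face <= 4)

print(total)        # 1
print(p_at_most_4)  # 2/3
```

Exact fractions are used here instead of floats so the sums come out as exactly 1 and 2/3 with no rounding noise.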
For continuous variables, it doesn't make sense to talk about the probability of one specific value, because the probability of hitting any single exact value is zero (e.g. it is like asking what's the probability my height is 6 foot: not roughly six foot, but exactly six foot and not an atom above or below). As a result, with continuous variables we have a continuous curve rather than bars, and we are interested in the probability of obtaining a value within a range, most often a value equal to or below some given value. The traditional Normal distribution bell curve is a PDF. As with the probability function, the possible values are plotted on the x axis, but instead of bars you have a curve whose y value represents the probability density. Don't worry too much about what these density values are as numbers (they are not probabilities, and can even exceed 1 for a very narrow distribution); the important thing is the shape of the curve and the area beneath it.
To get the probability up to a certain value on the x axis, you need to accumulate (integrate) the area under the curve from the extreme left up to that value. Using the Normal bell curve as an example, the mean is the midpoint on the x axis; it has the highest y value, and the vertical line through it cuts the distribution into two mirror-image halves. The area under the left half, up to the mean, is the probability of getting the mean or below, and it is exactly 0.5.
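To make the "area under the curve" idea concrete, here is a rough Python sketch that adds up thin rectangles under a Normal PDF. The height figures (mean 70 inches, standard deviation 3 inches) are hypothetical numbers picked only for illustration, and `normal_pdf` and `area_under_pdf` are helper names invented for this example.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of a Normal(mu, sigma) distribution at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def area_under_pdf(a, b, mu, sigma, steps=100_000):
    """Approximate the area under the PDF between a and b (midpoint rule)."""
    width = (b - a) / steps
    heights = (normal_pdf(a + (i + 0.5) * width, mu, sigma) for i in range(steps))
    return sum(heights) * width

# Hypothetical height distribution in inches (assumed values, not real data).
mu, sigma = 70.0, 3.0

# Area from the "extreme left" (10 standard deviations below) up to the mean.
left_half = area_under_pdf(mu - 10 * sigma, mu, mu, sigma)

# Probability of a height within a thousandth of an inch of 6 foot (72 in).
# The narrower the interval, the closer this gets to zero.
near_six_foot = area_under_pdf(72 - 0.001, 72 + 0.001, mu, sigma, steps=1_000)

print(f"P(height <= mean)  ~ {left_half:.6f}")
print(f"P(height ~ 6 foot) ~ {near_six_foot:.6f}")
```

The first number should come out very close to 0.5, and the second should be tiny, which is the point about exact values above.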
A cumulative distribution function (often loosely called a cumulative density function) is a plot of the values on the x axis, like the PDF, but it shows the cumulative (i.e. integrated) probability, so it is a plot of the increasing area under the curve of the PDF (mathematically, the CDF is the integral of the PDF). As the PDF of the Normal distribution starts near zero, climbs to a peak at the mean and then drops back towards zero, the CDF adds all of this up as you go: it starts near 0, curves up ever more steeply to 0.5 at the mean (where it is at its steepest), and then continues climbing but flattens out, eventually levelling off at a probability of 1.0.
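The relationship between the two curves can be checked with Python's standard library, which provides `statistics.NormalDist` (Python 3.8 or later) with `pdf` and `cdf` methods. This is only a sketch: `cdf_by_integration` is a helper invented for this example, and the lower limit of -10 is an assumed stand-in for "the extreme left".

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)

# Tabulate the PDF (bell curve) and the CDF (S-shaped curve) side by side.
xs = [-4, -3, -2, -1, 0, 1, 2, 3, 4]
cdf_values = [standard_normal.cdf(x) for x in xs]
for x, c in zip(xs, cdf_values):
    print(f"x = {x:+d}   PDF = {standard_normal.pdf(x):.4f}   CDF = {c:.4f}")

def cdf_by_integration(x, dist, lower=-10.0, steps=100_000):
    """Approximate the CDF at x by adding up the area under the PDF
    from `lower` (assumed far enough left) to x, using the midpoint rule."""
    width = (x - lower) / steps
    return sum(dist.pdf(lower + (i + 0.5) * width) for i in range(steps)) * width

# The accumulated PDF area should agree closely with the library's CDF.
approx = cdf_by_integration(1.0, standard_normal)
exact = standard_normal.cdf(1.0)
print(f"integrated PDF up to 1: {approx:.6f}, library CDF: {exact:.6f}")
```

In the printed table the CDF column only ever increases, passes through 0.5 at the mean (x = 0) and flattens out towards 0 on the far left and 1 on the far right, while the PDF column rises to its peak at the mean and falls away again.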
The two charts on this page http://en.wikipedia.org/wiki/Cumulative_distributi... show colour coded CDF and matching PDFs of a range of Normal distributions.
Hope this helps.
Source(s): Ex stats lecturer
P.S. Ignore the answer starting "pf is an ivalid term, i've never heard of it", as I suspect this person is just copying out of something like a calculator manual, and it is clear that they have not studied this aspect of statistics.