Saturday, January 20, 2007

Chapter 9 - Sampling Distributions

How does the sample size affect our estimate and our decisions?

Parameters are the (usually unknown) measures of a population. Often they are represented by Greek letters like mu and sigma.

Statistics are the calculated measures generated from the samples. Statistics are estimators for parameters.

When the mean of a statistic over all possible samples equals the parameter itself, the statistic is called an unbiased estimator. X-bar is an unbiased estimator for mu, the population mean.

Sampling distributions are the distributions of a statistic--for us, usually the average (x-bar)--computed for all of the samples of size n taken from a population.

When the sample size n increases, the variability of the means of the samples decreases--the graph of the sampling distribution is taller and narrower. When the sample size n decreases, the variability of the means of the samples increases. In fact, the standard deviation of x-bar is sigma/SQRT(n).
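If you want to see that shrinkage for yourself, here is a quick simulation sketch in Python (my own illustration--the population numbers are made up):

import random
import statistics

# Draw many samples of size n and record each sample mean; the spread of
# those means shrinks as n grows, matching sigma/SQRT(n).
random.seed(1)
mu, sigma = 100, 15   # made-up population parameters

for n in (4, 16, 64):
    means = [statistics.mean(random.gauss(mu, sigma) for _ in range(n))
             for _ in range(5000)]
    print(f"n = {n:2d}: SD of sample means = {statistics.stdev(means):.2f}, "
          f"sigma/SQRT(n) = {sigma / n ** 0.5:.2f}")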

This holds for sample proportions. The mean of the sample proportions (p-hats) is the true proportion for the population, p. Under special conditions we can use a formula for the standard deviation of the p-hats: SQRT(p*(1-p)/n).

The condition that allows this is that the sample is less than 1/10th of the population (and, of course, we're talking about simple random samples!!).

Also, the really BIG twist is that we can use the normal approximation for the p-hats when the expected numbers of successes (n*p) and failures (n*(1-p)) are both 10 or more.
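Here is a tiny Python sketch of those checks (the function name and numbers are mine, purely for illustration):

# Check the 10% condition, the normal-approximation condition, and the
# standard deviation of p-hat for an SRS of size n from a population of size N.
def phat_summary(p, n, N):
    ten_percent_ok = n <= N / 10                     # sample at most 1/10 of population
    normal_ok = n * p >= 10 and n * (1 - p) >= 10    # expected successes/failures >= 10
    sd = (p * (1 - p) / n) ** 0.5
    return sd, ten_percent_ok, normal_ok

sd, ten_ok, norm_ok = phat_summary(p=0.3, n=100, N=5000)
print(f"SD of p-hat = {sd:.4f}; 10% condition: {ten_ok}; normal approx OK: {norm_ok}")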

So, about that CLT thing. . . What was the REALLY BIG idea with the Central Limit Theorem???
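One way to see the big idea in Python (my own demo, not from the chapter): start with a badly skewed population, and the sample means still pile up in a roughly normal shape around mu with spread sigma/SQRT(n).

import random
import statistics

random.seed(2)
n = 40
# Exponential(1) is strongly right-skewed, with mu = 1 and sigma = 1.
means = [statistics.mean(random.expovariate(1.0) for _ in range(n))
         for _ in range(10000)]
print(f"mean of x-bars = {statistics.mean(means):.3f} (CLT predicts 1)")
print(f"SD of x-bars   = {statistics.stdev(means):.3f} (CLT predicts {1 / n ** 0.5:.3f})")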

How do you express the distributions for a binomial X, a geometric X, a uniform X, a normal X, the sampling distribution (X-bar), and the sample proportions (p-hat)?

When can you assume that the sampling distribution is approximately normally distributed?

What do you have to write to support your calculations of mean and standard deviation? your calculations of probabilities?

Monday, January 08, 2007

Chapter 8 - Binomial and Geometric distributions

Part 1 - Binomials

Binomial distributions have the following defining characteristics:

(1) Only two mutually-exclusive and complementary events are possible on each trial--success or failure.

(2) The number of trials is fixed (n).

(3) The probability of a success on any trial is fixed at p. This DOES NOT mean that the probability of a success is always 50%.

(4) The trials are independent--knowing one outcome does not help you predict the next.

Always define what X represents, for instance, X = number of daughters (successes).

Shorthand identification for a binomial distribution: Binom(n, p).


The calculator will provide probabilities given n and p: binompdf(n,p[,x]) and binomcdf(n,p[,x]). Use pdf when you want probabilities for individual values of X and cdf when you want cumulative values, like the probability that the number of successes is less than or equal to 5. Insert the X value when you want the probability for one specific value of X; omit it when you want the whole list of probabilities. Caution! For binomials, the least value X can take is ZERO, not one, so make sure that you associate the right X values with the correct probabilities.
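If you don't have the calculator handy, here are hypothetical stand-ins in plain Python (the function names mimic the calculator's, but the code is mine and assumes Python 3.8+ for math.comb):

from math import comb

def binompdf(n, p, x):
    # P(X = x) for X ~ Binom(n, p)
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

def binomcdf(n, p, x):
    # P(X <= x) for X ~ Binom(n, p); remember X starts at ZERO
    return sum(binompdf(n, p, k) for k in range(x + 1))

print(binompdf(10, 0.5, 5))   # P(exactly 5 successes in 10 trials)
print(binomcdf(10, 0.5, 5))   # P(at most 5 successes)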


The formula for the probability of exactly k successes is P(X = k) = nCk * p^k * (1-p)^(n-k).

nCk is "n choose k" or n!/(k!*(n-k)!).

If you calculate these probabilities for each possible value of X from 0 to n and add them up, you will get a sum of 1.
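A quick sanity check in Python (n and p here are arbitrary; math.comb needs Python 3.8+):

from math import comb

n, p = 12, 0.3
total = sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1))
print(total)   # 1.0, up to floating-point rounding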

The expected value or mean of the number of successes in a binomial setting is "mu sub x" = n*p.

The variance of the number of successes in the binomial setting is sigma squared sub x = n * p * (1-p).

The standard deviation is (of course!) the square root of the variance: sigma sub X = SQRT(n*p*(1-p)).
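If you're skeptical, here is a hypothetical Python check that computes the mean and variance directly from the probability distribution and compares them with the shortcuts:

from math import comb

n, p = 12, 0.3
pmf = [comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]
mu = sum(k * pk for k, pk in enumerate(pmf))
var = sum((k - mu) ** 2 * pk for k, pk in enumerate(pmf))
print(f"mu  = {mu:.4f} vs n*p        = {n * p:.4f}")
print(f"var = {var:.4f} vs n*p*(1-p) = {n * p * (1 - p):.4f}")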


What were those directions for loading binomial values into the lists and graphing as histograms? Use seq(X,X,0,n) --> L1 to populate the Xs and binompdf(n,p) --> L2 to insert the corresponding probabilities. To graph, select the histogram tool, use L1 as the Xlist and L2 as the Freq. You can use ZoomStat (zoom 9) to generate a first stab at the graph. Then fix the graph using the window controls.
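If you'd rather draw the same picture off the calculator, here is a rough Python equivalent (assuming matplotlib is installed--it is not part of the TI workflow):

from math import comb
import matplotlib.pyplot as plt

n, p = 10, 0.5
xs = list(range(n + 1))                                         # like seq(X,X,0,n) --> L1
probs = [comb(n, k) * p ** k * (1 - p) ** (n - k) for k in xs]  # like binompdf(n,p) --> L2
plt.bar(xs, probs)               # xs as the Xlist, probs as the Freq
plt.xlabel("X = number of successes")
plt.ylabel("P(X = x)")
plt.show()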


Part 2--Geometric Distribution

This was different from the binomial: we count the number of trials UNTIL we achieve a success, then we stop. That means X is the number of trials it takes, and there is no "n" involved. Theoretically, it could take infinitely many tries before we get a successful result.

Defining characteristics: fixed p, two outcomes (success/failure), independent trials, count trials until the first success (no fixed n).

The expected value of X, the number of trials required, is 1/p, where p is the probability of a success on one try. The variance is (1-p)/p^2.

The probability distribution for X = 1, 2, 3, 4, etc. is p, (1-p)*p, (1-p)^2*p, (1-p)^3*p, etc.
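To convince yourself of those formulas, here is a simulation sketch in Python (my own, with a made-up p):

import random
import statistics

random.seed(3)
p = 0.25

def trials_until_success(p):
    # Count independent trials until the first success (probability p each time).
    count = 1
    while random.random() >= p:   # failure, so try again
        count += 1
    return count

xs = [trials_until_success(p) for _ in range(20000)]
print(f"mean     = {statistics.mean(xs):.3f} vs 1/p        = {1 / p:.3f}")
print(f"variance = {statistics.variance(xs):.3f} vs (1-p)/p^2 = {(1 - p) / p ** 2:.3f}")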

What is the probability that it takes more than k attempts before you get a success?

Tuesday, January 02, 2007

State of Fear

The rhetorical questions:

What does State of Fear refer to?

What is true? How do you know? Who do you trust? What role does the statistician play in your understanding of news? What role do the media play?


The question to answer:

How can someone lie with statistics?