Thursday, March 09, 2006

Chapter 12 Inferences about proportions

BINOMIAL CONNECTION:
The methods of this chapter are based on the binomial distribution.

Let x be the number of successes in n trials.
If the conditions of a binomial setting hold,
then mu sub x = n*p and
sigma sub x = sqrt(n*p*(1-p)).

Now, because p-hat, the estimator for the population proportion, equals x/n,
mu sub p-hat = (mu sub x)/n = np/n = p, which means that p-hat is an unbiased estimator of p

and

sigma sub p-hat = (sigma sub x) / n.

Well, if you take that last part, (sigma sub x) / n, and substitute for sigma sub x,
you get sigma sub p-hat = sqrt(np(1-p)) / n

which can be rewritten as sqrt(p(1-p)/n).
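If you want to convince yourself that the algebra above holds, here is a minimal Python sketch. The values n = 100 and p = .35 are made up for illustration, and the function name is just a label chosen for this example. It computes sigma sub p-hat both ways, as (sigma sub x)/n and as sqrt(p(1-p)/n), so you can see that they agree.

```python
import math

def p_hat_mean_and_sd(n, p):
    """Mean and standard deviation of p-hat = x/n when x is binomial(n, p)."""
    mu_x = n * p                          # mean of the count x
    sigma_x = math.sqrt(n * p * (1 - p))  # standard deviation of the count x
    mu_p_hat = mu_x / n                   # dividing by n gives back p
    sigma_p_hat = sigma_x / n             # same as sqrt(p(1-p)/n)
    return mu_p_hat, sigma_p_hat

# Hypothetical values, chosen only for illustration.
n, p = 100, 0.35
mu_p_hat, sigma_p_hat = p_hat_mean_and_sd(n, p)
shortcut = math.sqrt(p * (1 - p) / n)

print("mean of p-hat:", mu_p_hat)
print("sd of p-hat via (sigma sub x)/n:", sigma_p_hat)
print("sd of p-hat via sqrt(p(1-p)/n):", shortcut)
```

The two standard deviations match because dividing by n is the same as dividing by sqrt(n*n) inside the square root.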


WHICH P DO I USE?

If you have a hypothesized p, you use that. For instance, if your previous study or some expert indicated that p = .35, then you use .35 in your hypothesis, the standard deviation for your hypothesis test, and calculations to find the minimum sample size for a margin of error.

You also use this value when checking assumptions np>10 and n(1-p)>10.
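Here is a rough sketch of a one-proportion z test that uses the hypothesized p everywhere, as described above. The counts (84 successes in 200 trials) are invented for illustration, the two-sided alternative is an assumption of this example, and the function name is hypothetical.

```python
import math
from statistics import NormalDist

def one_proportion_z_test(x, n, p0):
    """Two-sided z test of H0: p = p0, using p0 for conditions and the sd."""
    # Conditions are checked with the hypothesized p, not with p-hat.
    if n * p0 <= 10 or n * (1 - p0) <= 10:
        raise ValueError("np and n(1-p) must both exceed 10")
    p_hat = x / n
    sd = math.sqrt(p0 * (1 - p0) / n)   # standard deviation under H0
    z = (p_hat - p0) / sd
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided (assumed)
    return z, p_value

# Hypothetical data: 84 successes in 200 trials, hypothesized p = .35.
z, p_value = one_proportion_z_test(x=84, n=200, p0=0.35)
print("z =", round(z, 3), " p-value =", round(p_value, 4))
```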

If you have only your sample proportion, then you use p-hat to estimate the standard deviation for confidence intervals and to check the conditions for a CI: n*p-hat > 10 and n*(1 - p-hat) > 10.
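A small sketch of the confidence interval case, where p-hat stands in for p in both the conditions and the standard deviation. The data (84 successes in 200 trials) and the 95% level are assumptions made for illustration, and the function name is hypothetical.

```python
import math
from statistics import NormalDist

def one_proportion_ci(x, n, confidence=0.95):
    """Confidence interval for p, using p-hat for conditions and the sd."""
    p_hat = x / n
    # With no hypothesized p, the conditions are checked with p-hat.
    if n * p_hat <= 10 or n * (1 - p_hat) <= 10:
        raise ValueError("n*p-hat and n*(1 - p-hat) must both exceed 10")
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = math.sqrt(p_hat * (1 - p_hat) / n)  # estimated standard deviation
    margin = z_star * se
    return p_hat - margin, p_hat + margin

# Hypothetical data: 84 successes in 200 trials.
low, high = one_proportion_ci(x=84, n=200)
print("95% CI for p:", round(low, 4), "to", round(high, 4))
```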

If you have neither, then you must be finding the minimum sample size, so use the most conservative estimate: .5.
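To see why .5 is the conservative choice, here is a minimal sketch that solves m.e. = z* sqrt(p(1-p)/n) for n and rounds up. The 3% margin of error and 95% confidence level are example inputs, not values from the chapter, and the function name is hypothetical.

```python
import math
from statistics import NormalDist

def min_sample_size(margin, confidence=0.95, p=0.5):
    """Smallest n with z* * sqrt(p(1-p)/n) <= margin."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Solve margin = z* sqrt(p(1-p)/n) for n, then round up.
    return math.ceil((z_star / margin) ** 2 * p * (1 - p))

# No prior information: use the conservative p = .5.
n_conservative = min_sample_size(margin=0.03)
# A prior study or expert suggests p = .35 (hypothetical).
n_with_guess = min_sample_size(margin=0.03, p=0.35)

print("n with p = .5: ", n_conservative)
print("n with p = .35:", n_with_guess)
```

Because p(1-p) is largest at p = .5, that choice never gives a smaller n than any other guess would.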


2 proportion methods:

It helps A LOT to make a table of values as they showed in the book.

For confidence intervals, the methods are just as you imagined. You are developing a confidence interval for the difference between two proportions,
so use p-hat1 - p-hat2.

For the standard deviation, look to the variances. Add the variances of the two sample proportions and take the sqrt: sqrt( p-hat1(1 - p-hat1)/n1 + p-hat2(1 - p-hat2)/n2 ). Among the conditions, each of the products n1*p-hat1, n1*(1 - p-hat1), n2*p-hat2, and n2*(1 - p-hat2) must exceed 5.
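The two-proportion interval described above can be sketched like this. The counts (60 of 100 versus 45 of 100) are invented, the 95% level is an assumption, and the function name is hypothetical. The condition check uses the "exceed 5" rule from this post.

```python
import math
from statistics import NormalDist

def two_proportion_ci(x1, n1, x2, n2, confidence=0.95):
    """Confidence interval for p1 - p2 using the unpooled standard deviation."""
    p1, p2 = x1 / n1, x2 / n2
    # Each of the four products must exceed 5.
    products = [n1 * p1, n1 * (1 - p1), n2 * p2, n2 * (1 - p2)]
    if min(products) <= 5:
        raise ValueError("every product must exceed 5")
    # Add the two variances, then take the square root.
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    diff = p1 - p2
    return diff - z_star * se, diff + z_star * se

# Hypothetical data: 60 of 100 in sample 1, 45 of 100 in sample 2.
low, high = two_proportion_ci(60, 100, 45, 100)
print("95% CI for p1 - p2:", round(low, 4), "to", round(high, 4))
```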

For hypothesis tests, there is a nifty twist. Your null hypothesis probably stated that the two proportions were the same. Therefore, the two samples should be combined (pooled). Take (x1+x2)/(n1+n2) to calculate a new, stronger pooled p-hat, which you use for standard deviation calculations and checking conditions.

The standard deviation would be sqrt( p-hat(1-p-hat)/n1 + p-hat(1-p-hat)/n2), but that requires that you enter p-hat too many times. Rewritten, that formula is sqrt( p-hat * (1-p-hat) * (1/n1+1/n2)). It looks nicer in the book. Go there to read all about it.
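Here is a minimal sketch of the pooled test, assuming a two-sided alternative and made-up counts (60 of 100 versus 45 of 100). The function name is hypothetical. It also computes the long and short forms of the standard deviation so you can confirm they are the same number.

```python
import math
from statistics import NormalDist

def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z test of H0: p1 = p2 using the pooled p-hat."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    # Conditions are checked with the pooled p-hat (threshold of 5, as in this post).
    products = [n1 * pooled, n1 * (1 - pooled), n2 * pooled, n2 * (1 - pooled)]
    if min(products) <= 5:
        raise ValueError("every product must exceed 5")
    sd = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / sd
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided (assumed)
    return pooled, z, p_value

# Hypothetical data: 60 of 100 in sample 1, 45 of 100 in sample 2.
pooled, z, p_value = two_proportion_z_test(60, 100, 45, 100)

# The long form and the rewritten form of the standard deviation agree.
long_form = math.sqrt(pooled * (1 - pooled) / 100 + pooled * (1 - pooled) / 100)
short_form = math.sqrt(pooled * (1 - pooled) * (1 / 100 + 1 / 100))

print("pooled p-hat =", pooled)
print("z =", round(z, 3), " p-value =", round(p_value, 4))
print("long form:", long_form, " short form:", short_form)
```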

There are super examples in this chapter.

5 comments:

Mrs.L said...

To investigate the relationship between the binomial distribution and the normal distribution, a key to understanding this chapter, google binomial distribution applet or go to sites like http://www.ruf.rice.edu/~lane/stat_sim/normal_approx/index.html

Mrs.L said...

To test the hypothesis you have to assume that it is true, then see if the evidence contradicts it. Thus, for a TWO-PROPORTION TEST OF DIFFERENCES OF PROPORTIONS you are assuming that the two proportions are the same. You use the pooled p-hat value in your standard deviation and in your checking of conditions.

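For all other tests and intervals in this chapter you do not pool: use the hypothesized p for a one-proportion test and the sample p-hat values for confidence intervals.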

Mrs.L said...

These questions are not asking for opinions or quick-and-dirty answers. They require actual statistical inference. You have to perform a hypothesis test or construct a confidence interval and interpret the results formally.

Mrs.L said...

The website for the GHSGT study guides is
http://public.doe.k12.ga.us/ci_testing.aspx?PageReq=CI_TESTING_GHSGT

Good luck!

Mrs.L said...

jb-sorry. That's why I gave you the printout in class. I hope that one of the resources helped.

n-Well, let's see. You are talking about finding the sample sizes for a CI for the difference between two proportions. Let's think out loud.

We can't use pooled data because we don't have any data yet (we would still be in the planning stages of our study). Besides, for a CONFIDENCE INTERVAL we would not have a hypothesis that the two proportions were equal. So I guess we don't pool the (nonexistent) results.

We can't use zero, because that would be silly (that's the hypothesized diff between two proportions, not one of the proportions) AND we would have m.e. equals z* sqrt of zero over n. Oops. That won't work.

Worst case scenario? We use .5, right? So I guess we could use sqrt (.5*.5*(1/n1+1/n2)), but then we are solving for two variables, n1 and n2, with just one equation. Theoretically you could make both samples the same size, so you would have

m.e. = z* sqrt(.5(.5)(2/n)), where n is the sample size for EACH sample.

Seems reasonable, but a lot more math than I would expect at this point. Thanks for asking. You tickled my brain. TWYMFS.
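For anyone who wants to see the numbers, here is a rough sketch of the equal-sample-size idea above. It solves m.e. = z* sqrt(.5(.5)(2/n)) for n and rounds up. The margins of error and the 95% level are example inputs only, the function name is hypothetical, and this is one reasonable approach under the equal-n assumption, not the book's method.

```python
import math
from statistics import NormalDist

def n_per_sample(margin, confidence=0.95, p=0.5):
    """Size of EACH sample so that z* sqrt(p(1-p)(2/n)) <= margin."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Solve margin = z* sqrt(p(1-p) * 2 / n) for n, then round up.
    return math.ceil(2 * p * (1 - p) * (z_star / margin) ** 2)

# Example margins of error, chosen only for illustration.
n_for_3_percent = n_per_sample(0.03)
n_for_5_percent = n_per_sample(0.05)

print("n per sample for m.e. = .03:", n_for_3_percent)
print("n per sample for m.e. = .05:", n_for_5_percent)
```

Note that each sample ends up about twice as large as the one-sample answer for the same margin of error, because two variances are being added.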

OK, here's a tickler for you--let's say we are trying to find a confidence interval for the difference between the proportion of free throws made and the proportion of free throws missed. Would the methods of this chapter be appropriate? Why or why not?