Monday, February 04, 2008

Chapter 9 Sampling Distributions

Welcome to the beginning of inferential statistics! Your test is Thursday February 7.

Elaborate response to a question about mu and x-bar is in the comments section below. Don't miss it.


Work all the problems on the latest problem worksheet for homework Tuesday.

Mean of sampling distribution = mean of population regardless of shape of distribution

Std dev of sampling distribution = std dev of population divided by sqrt(n), regardless of the shape of the distribution, as long as the sample does not exceed 1/10th of the population.
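If you want to check these two rules for yourself, here is a rough simulation sketch. Everything specific in it is an assumption made up for the illustration: the right-skewed population of 20,000 values, the sample size of 25, the number of repetitions, and the helper name sample_means.

```python
import math
import random
import statistics

def sample_means(population, n, reps, seed=1):
    """Draw `reps` simple random samples of size n (without replacement)
    and return the mean of each sample."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.sample(population, n)) for _ in range(reps)]

# Hypothetical right-skewed population (clearly not normal).
rng = random.Random(0)
population = [rng.expovariate(1.0) for _ in range(20_000)]

n = 25  # well under 1/10th of the population
means = sample_means(population, n, reps=2_000)

pop_mean = statistics.fmean(population)
pop_sd = statistics.pstdev(population)

print("mean of population:      ", round(pop_mean, 3))
print("mean of sample means:    ", round(statistics.fmean(means), 3))
print("sigma / sqrt(n):         ", round(pop_sd / math.sqrt(n), 3))
print("std dev of sample means: ", round(statistics.stdev(means), 3))
```

The first pair of numbers should land close together, and so should the second pair, even though the population is far from normal.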


Central Limit Theorem

Case 0: Underlying distribution is normal → sampling distribution is AUTOMATICALLY normal. The Central Limit Theorem DOES NOT APPLY. It isn't needed.

Case 1: If the sample is tiny (less than or equal to 10), then the population distribution must be nearly normal for the Central Limit Theorem to kick in.

Case 2: If the sample is moderate (up to about 35), then the population needs to be mound shaped without outliers for the Central Limit Theorem to kick in.

Case 3: If the sample is large, then the Central Limit Theorem kicks in.
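One way to watch the cases above play out is to start with a badly skewed population and measure how lopsided the sample means are at different sample sizes. The sketch below is only an illustration under assumptions of my own: an exponential population, sample sizes of 2, 10, and 40, and a simple skewness measure (the average cubed z-score, which is near 0 for a normal-looking shape).

```python
import random
import statistics

def skewness(values):
    """Average cubed z-score: about 0 for a symmetric, normal-looking shape."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean(((v - m) / s) ** 3 for v in values)

def skew_of_sample_means(n, reps=4_000, seed=2):
    """Skewness of `reps` sample means, each from n draws of a
    strongly right-skewed (exponential) population."""
    rng = random.Random(seed)
    means = [statistics.fmean(rng.expovariate(1.0) for _ in range(n))
             for _ in range(reps)]
    return skewness(means)

for n in (2, 10, 40):
    print(f"n = {n:2d}: skewness of sample means = {skew_of_sample_means(n):.2f}")
```

The skewness should drop toward 0 as n grows, which is the Central Limit Theorem kicking in: tiny samples from a skewed population still give skewed sample means, while large samples give sample means that look much closer to normal.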

****Write the implications of the Central Limit Theorem in your own words.

Thus, the answer to the questions at the end of problems .27 and .28 from Friday are that we DID NOT need the underlying distribution to be normal for us to use the formulas for the mean and the std dev of the sampling distribution (mu x = mu x-bar and std dev x-bar = std dev x / sqrt n, as shown in the text). Knowing that the distribution of x is Normal or knowing that the CLT applies is essential to working the sampling distribution problems using Normal methods.

HW: Problems 17 through 19 on the w/s from last week AND problems 8.34-8.36 on the w/s we handed out in class today.

Here's an applet that will help to show what happens to the distribution of the sample means as the sample size increases and as the distribution of the population is more or less normal. Set the radio buttons to show only the sample means and sample 100 at a time.
http://wise.cgu.edu/sdmmod/sdm_applet.asp

As you saw in the histograms today, the larger the sample size, the narrower the distribution of sample means. In fact, the standard deviation of the distribution of sample means shrinks in proportion to 1/(square root of the sample size) [the std dev of the sample means = sigma of x times 1/sqrt n]. You'll see this in your READING of section 9.3 and practice it in your HOMEWORK for the weekend: 9.26 through 9.29.
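A few lines of arithmetic make the 1/sqrt n shrinkage concrete. In this small sketch the population std dev of 12 and the sample sizes are made-up values chosen so the numbers come out clean.

```python
import math

def sd_of_xbar(sigma, n):
    """Std dev of the sample mean: sigma of x times 1/sqrt(n)."""
    return sigma / math.sqrt(n)

sigma = 12  # hypothetical population std dev
for n in (1, 4, 16, 64):
    print(f"n = {n:2d}: std dev of x-bar = {sd_of_xbar(sigma, n)}")
```

Notice that you have to quadruple the sample size to cut the std dev of x-bar in half.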

For those who need to refresh their understanding of histograms, do problems 1.4 and 1.41 by hand.

Don't forget about CiCi's on Sunday if you are available.

What is the probability. . . ? http://www.time.com/time/health/article/0,8599,1707541,00.html?cnn=yes

Due Friday: Work problems 12-16 on the handout. Be an expert on these proportion problems and normal approximation for the binomial before you come to class. Friday we start sampling distributions for sample means.

Due Thursday: Problems 8-11 on the handout.

Due Wednesday: Problems 9.17 and 9.18, worked and explained completely, OR problems 9.19 and 9.20, which are more routine but must be completed.

What happens when the sample size increases??? If your variable of interest is the sample proportion, the distribution of sample proportions (p-hats) will become "tighter"; that is, it will have less variability as the sample size increases.

Now, if you're talking about the values of X in a binomial distribution, as the number of trials increases, the variability of the number of successes also increases. Arrrrrggggh! Sometimes the standard deviation increases, sometimes it decreases.
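To see both behaviors side by side, here is a short sketch using the standard formulas sqrt(n p (1 - p)) for the count X and sqrt(p (1 - p) / n) for p-hat. The value p = 0.5 and the sample sizes are assumptions picked for the example.

```python
import math

def sd_count(n, p):
    """Std dev of X, the number of successes in n trials."""
    return math.sqrt(n * p * (1 - p))

def sd_phat(n, p):
    """Std dev of p-hat = X/n, the sample proportion."""
    return math.sqrt(p * (1 - p) / n)

p = 0.5  # hypothetical success probability
for n in (25, 100, 400):
    print(f"n = {n:3d}: sd of X = {sd_count(n, p):5.2f}, "
          f"sd of p-hat = {sd_phat(n, p):.4f}")
```

The count X spreads out as n grows, while the proportion p-hat tightens up, because dividing X by n divides its std dev by n as well.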

One of the important ideas that you were supposed to catch was that the population size does not affect the variability of the sampling distribution. You "see" this when you look at the formulas--there is no mention of the population size in the formula for the standard deviation.
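If the formula alone doesn't convince you, a simulation can. The sketch below assumes two made-up populations (2,000 and 200,000 values spread evenly between 0 and 100) and draws samples of 50 without replacement from each, so the sample is well under 1/10th of either population.

```python
import random
import statistics

def sd_of_sample_means(pop_size, n=50, reps=3_000, seed=3):
    """Std dev of sample means for SRSs of size n drawn (without
    replacement) from a made-up population of `pop_size` values."""
    rng = random.Random(seed)
    population = [rng.uniform(0, 100) for _ in range(pop_size)]
    means = [statistics.fmean(rng.sample(population, n)) for _ in range(reps)]
    return statistics.stdev(means)

small = sd_of_sample_means(2_000)
large = sd_of_sample_means(200_000)
print("population of   2,000:", round(small, 2))
print("population of 200,000:", round(large, 2))
```

The two std devs should come out nearly the same even though one population is 100 times the size of the other.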

So, what is the point? For those binomial distributions where the expected number of successes and the expected number of failures are both at least ten, the sampling distribution of the Xs or the p-hats may be modeled (approximately) by the normal curve. Thus, you can calculate z-values for the values of interest, X or p-hat, and use the standard normal table or normalcdf to calculate probabilities. Cool.
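Here is a hedged sketch of that idea: it checks the "at least ten" condition, then compares an exact binomial probability with the normal-curve answer. The numbers n = 100, p = 0.3, and the cutoff of 35 are assumptions chosen for illustration, and the function names are my own. The normal answer uses no continuity correction, so expect it to be close to the exact value rather than identical.

```python
import math
from statistics import NormalDist

def large_enough(n, p):
    """Rule of thumb from the post: expected successes and failures both >= 10."""
    return n * p >= 10 and n * (1 - p) >= 10

def exact_binomial_cdf(x, n, p):
    """Exact P(X <= x) for a binomial count X."""
    return sum(math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x + 1))

def normal_approx_cdf(x, n, p):
    """Approximate P(X <= x) using a normal curve with the binomial's
    mean and std dev (no continuity correction)."""
    mu = n * p
    sigma = math.sqrt(n * p * (1 - p))
    return NormalDist(mu, sigma).cdf(x)

n, p, x = 100, 0.3, 35  # hypothetical values
print("conditions met:", large_enough(n, p))
print("exact P(X <= 35): ", round(exact_binomial_cdf(x, n, p), 4))
print("normal approx:    ", round(normal_approx_cdf(x, n, p), 4))
```

NormalDist(mu, sigma).cdf(x) plays the same role as normalcdf on the calculator: it returns the area under the normal curve to the left of x.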

This is part of the foundation for the polling estimates that you see so often during these election years.

Here's a new section for the blog called Because You Asked. One of the gurus of AP Statistics wrote an article that explains the 10% rule. You can find it through College Board at the following link.
http://apcentral.collegeboard.com/apc/members/courses/teachers_corner/39161.html

Due Tuesday: Problems 9.14 and 9.15 PLUS you must read the pages between these problems. Get your journal ready for the first entry.

Due Monday: All of the work previously assigned and a complete, correct draft of the answers to the "depth of the refracting layer" problem we worked in class on Thursday and Friday.

HW due Friday: Read through page 467 CAREFULLY. Work problems 9.1-9.4, 9.6, 9.8, 9.9, and 9.12. Do problem 9.7 if you get the chance (otherwise, it will be due later).

Parameters are measures of populations.
Statistics are measures of samples.

Examples of parameters: the mean of a population (mu, AKA mu sub x), the population proportion (p), the population standard deviation (sigma).

Examples of statistics: the sample mean (x-bar), the sample proportion (p-hat), and the sample standard deviation (s sub x).

The mean of all the sample means of a distribution (the sampling distribution) is the same as the mean of the distribution. This means that mu sub x-bar equals mu sub x.

The variability of the sample means (the variance of the sampling distribution) decreases as the sample size increases.

As long as the population is REALLY large compared to the sample, the size of the population does not affect the variability of the sample means.

HW for Thursday: Complete the blue sheet AND do problems 9.1-9.4 and 9.8.

Questions to ponder: What does the histogram of your penny-ages look like?

Does the histogram from a small sample have the same shape as a large sample's histogram?

What do you think that the average age of the pennies is?

How far out are outliers?

What are the important characteristics that you need to include when describing a probability histogram (either frequency diagram or relative frequency diagram)?

13 comments:

Mrs.L said...

All the work that was already assigned, including a complete, legible draft of the classwork (the answers to the exam problem about depth of the refracting layer). Answers must be correct and complete.

Mrs.L said...

Pam C,
You made my day! Please come back and visit my classes sometime.

derek said...

the wording in problem number 11 is a bit confusing to me. i would greatly appreciate it if you could help point me in the direction of what it is asking for...

Mrs.L said...

Calling all stat studs! Can anyone else help with this problem? I don't have a copy of the problem at home, so the best I can do is answer questions before school.

MrFantasian said...

How would I know whether to use x bar or mu in certain problems? Is it within the context of the problem and if they are, what are they?

MrFantasian said...

And I still don't get why the Central Limit Theorem is significant. I know it helps prove that it is normally distributed. Does that help prove the accuracy of my answer? And in Case 1 and 2 on the CLT sheet, what does it mean by "must be" and "needs to be"?

Mrs.L said...

You will be using BOTH mu and x-bar in some problems, just like you use BOTH p and p-hat in some problems. Mu/x-bar problems concern measures, while p/p-hat problems concern proportions.

Do you recall how we drew boxes and ovals around the values of p and p-hat on that one worksheet? Well, identifying the mu and the x-bar is analogous. Mu is the value that describes the entire population. X-bar only describes the sample.

Sometimes x-bar is an actual measure, but most of our examples from this chapter just ask hypothetical questions like what would the probability be when. . . Whereas the mean, mu, is the center of our distribution, these values of x-bar become the lower or upper bounds of the region we evaluate for area (probability). This works just like the way p-hat represented the upper or lower bound of the shaded region in the graph! Convert the x-bar to a z-value as we did in class, and find the associated probability.

Are you a master of the probability statements??

P(statistic < number) =
P( (stat - param)/std dev of stat < (number - param)/std dev of stat )


For instance:
Let x be normally-distributed with a mean of 2 and a std dev of 21. What is the probability that a sample of size 49 would have an average greater than 5?

[Note that sigma of x-bar is 21/sqrt 49 = 21/7 = 3.]

P(x-bar > 5) = P((x-bar - mu)/sigma of x-bar > (5 - 2)/3)
= P(z > 1) = .1587
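
If you would like to check this arithmetic away from the calculator, the same steps can be sketched in a few lines. The variable names are my own; the numbers are the ones from the example.

```python
import math
from statistics import NormalDist

mu = 2        # population mean
sigma = 21    # population std dev
n = 49        # sample size

sigma_xbar = sigma / math.sqrt(n)   # 21 / 7 = 3
z = (5 - mu) / sigma_xbar           # (5 - 2) / 3 = 1

# P(x-bar > 5) is the area to the right of 5 under Normal(mu, sigma_xbar).
prob = 1 - NormalDist(mu, sigma_xbar).cdf(5)
print(round(z, 2), round(prob, 4))
```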

Readers: Please put parts of this message into your own words and respond to this blog so we can make sure that we understand. Thanks!

Mrs.L said...

Re: the CLT

How do you intend to find the probability that x-bar is greater than 5 if you can't use the normal distribution?

Most distributions are NOT normal. It's pretty special that the means (under some conditions) are normally distributed.

Mrs.L said...

Here's a site that discusses the CLT and calls it amazing. I have to agree.

[Gaussian means the same as Normal.]

http://cs.wellesley.edu/~cs199/lectures/23-std-err-CLT.html

Mrs.L said...

Further answers to unposted questions:

x-bar is the MEAN of the SAMPLE
mu is the MEAN of the POPULATION

When the sample size is small you CAN'T use normal procedures UNLESS (Case 0) the underlying distribution of x is normal (CLT not required :D) or (Case 1) the distribution is reasonably close to normal (CLT kicks in).

derek said...

very helpful site thanks =] my only comment is that it uses variance more than we have and that kind of got in the way but you just have to ignore it.

Mrs.L said...

Regarding MUST BE and NEEDS TO BE:

The Central Limit Theorem only applies under special conditions.

The closer your original distribution is to normal to begin with, the smaller the sample size needs to be for it to apply.

For instance, if the distribution of x is distinctly non-normal (envision the wildest distribution that you can), then the sample size must be really big for you to assume that the sample means are normally-distributed. On the other hand, if the x values are distributed almost normally, then even little samples will have averages that are approximately normal.

When I wrote the summaries up, I wrote them from the other perspective (small size NEEDS near-normality, for example) because you are given a sample size and a description of the x distribution, so you have to determine whether the sample is large enough to indicate that the sample means are normally-distributed given the characteristics of the underlying distribution.

derek said...

I am working 9.23 (dealing with probabilities) and I can't remember how to find what it is asking for. It is a great problem to work though and I would recommend it to anyone who is still having trouble with p and p-hat. The page before has formulas like the square root of p times 1-p over n and I think I am supposed to use that but I don't know exactly how.