Tuesday, March 30, 2010

Chi-square procedures

http://lassiterstatistics.wikispaces.com/

Send your summary documents (in pdf format if possible) to jhl2881 at
students dot kennesaw dot edu


Suggested problems from Chapter 14:
goodness of fit: 3, 4, 5 (this is an example of how biologists use 2x2 tables to do goodness of fit tests), 9 (simulation)
2-way tables: 13, 19, 20, 14, 16, 18, 12, 17, 22, 24, 32, 33, 34.



We're beginning our last new topic (since we already did linear regression inference once).

We will use two different types of chi-square procedures and three different names for the procedures.

First, suppose some higher power has already determined what proportion of the sample should fall into each value of the categorical variable, like what portion of your M&Ms should have been red, brown, green, etc. Then you will use a Goodness of Fit test to compare your experience (the sample) with what the higher powers suggested. This is also the test we use when the higher power might suggest that the distribution should have been "fair" or equal across all the values.
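
Here is a minimal sketch of the Goodness of Fit arithmetic in Python. The color counts are made up for illustration, and the "fair" claim (every color equally likely) is an assumption of the sketch, not the candy maker's real claim. It finds the expected counts, adds up (observed - expected)^2/expected, and compares the total to the chi-square table value for df = 5.

```python
# Hypothetical counts from one bag of candy (made-up data).
observed = {"red": 12, "brown": 15, "green": 8,
            "yellow": 10, "orange": 9, "blue": 6}

# The claim from the "higher power": here, a fair (equal) distribution.
claimed = {color: 1 / len(observed) for color in observed}

n = sum(observed.values())
expected = {color: n * p for color, p in claimed.items()}

# Chi-square statistic: sum of (observed - expected)^2 / expected.
chi_square = sum((observed[c] - expected[c]) ** 2 / expected[c]
                 for c in observed)
df = len(observed) - 1  # number of categories minus 1

# Table value for alpha = 0.05 with df = 5.
CRITICAL_VALUE = 11.070

print(f"chi-square = {chi_square:.2f}, df = {df}")
if chi_square > CRITICAL_VALUE:
    print("Reject Ho: the sample does not fit the claimed distribution.")
else:
    print("Fail to reject Ho: the sample is consistent with the claim.")
```

Remember to check that every expected count is large enough before you trust the result.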


If the sample itself is going to suggest a distribution, then we use the test of independence or the test of homogeneity. These two tests are performed the same way; we just have different inputs and hypotheses associated with the two forms.

When we have one sample from one population and want to know if characteristics are associated, like red hair and green eyes, we might use the test of independence.

If we have two populations, like smokers and non-smokers, and want to know if the two populations have the same propensity for speeding tickets, we could use a test of homogeneity with cleverly selected data.
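
The arithmetic is the same for independence and homogeneity, so one sketch covers both. The 2x2 counts below are made up, and the function name chi_square_two_way is only an illustration. Each expected count is (row total)(column total)/(grand total), and df = (rows - 1)(columns - 1).

```python
def chi_square_two_way(table):
    """Return (chi-square statistic, df) for a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            # Expected count if the row and column variables are unrelated.
            exp = row_totals[i] * col_totals[j] / grand_total
            stat += (obs - exp) ** 2 / exp

    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


# Hypothetical counts: rows = smokers, non-smokers;
# columns = had a speeding ticket, no ticket.
tickets = [[30, 70],
           [20, 80]]

stat, df = chi_square_two_way(tickets)
print(f"chi-square = {stat:.3f}, df = {df}")

# Table value for alpha = 0.05 with df = 1 is 3.841.
print("Reject Ho" if stat > 3.841 else "Fail to reject Ho")
```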

Methods will be discussed in class.

Have the printed draft of your assignment in class on Wednesday!

Friday, March 26, 2010

T-tests for means

http://www.nytimes.com/2010/02/28/weekinreview/28sussman.html?ref=weekinreview
Time magazine article on the complications of the race and ethnicity entries on the census: http://www.time.com/time/nation/article/0,8599,1975883,00.html?hpt=T2
What inference procedure do I perform? applet (http://www.ltcconline.net/greenl/java/Statistics/StatsMatch/StatsMatch.htm?)

Welcome to the new stuff. Same as the old stuff. . .almost.

EQ: Why do you use inference?
Under what conditions can we use inference?

As you've seen, t-tests for means are quite similar to the z-tests for proportions. We still follow the same pattern of Setup-Check Assumptions-calculations/arithmetic-decision in the context of the problem.

Now, to use t-methods we have to show that the distribution of x-bars is probably mound-shaped and symmetric enough to invoke the CLT. If given the data, sketch the histogram of the observations or the Normal probability plot. If either graph of the sample observations looks severely non-Normal (with gaps or outliers), then we cannot assume that the means of the samples drawn from that type of population would be close to Normal.

Our quiz on 3/2 will be a lot like the lab we did in class on Monday. You will have data to analyze (SCAD).

3/4/10 Now you've taken two quizzes and shown great improvement.

Things you can do to improve your communication:

Label the graph of the observations or the Normal Probability plot. The x axis of the histogram should be named with the thing you're measuring, like blood pressure. The scale should be added at about 5 places (no need to mark off every little bit). The y axis is the frequency. Help the reader out by labeling the tallest bar.

Label your Normal Probability plot with the definition of the x at the bottom. No scales are necessary if you label the NPP as a Normal Probability Plot.


Sketch the Normal curve before you start the calculations. Put the hypothesized mean in the middle. Then mark your x-bar to the right or the left (as appropriate). This will help you put the less than or greater than sign into your calculations correctly. It also reminds you that a probability cannot be less than 0 or more than 1!

Refine your decisions. Compare, conclude, contextualize, convince.

HW due Friday: Finish the AP exam problem begun in class that deals with a paired t-test. A paired t-test is performed exactly the same way as a regular t-test, but on the third set of data--the difference between the two sets of DEPENDENT data.

PAIRED T-TESTS
This is used when the two "samples" are not independent, but are two measures taken from each of the experimental units or participants. Examples: pre-tests and post-tests on the same students, the e.coli problem, the pharmacy problem, and the hand span problem.

Align the samples so you can subtract the "pre-test" value from the "post-test" value or perform a similar subtraction to generate one list of differences. Use this list as your input for the one-sample t-test.
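
A minimal sketch of those steps in Python, using made-up pre-test and post-test scores (the function name paired_t is only for illustration). It builds the list of differences and then does ordinary one-sample t arithmetic on that list. The tail area still comes from tcdf or a t table with df = n - 1.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(first, second, mu0=0.0):
    """One-sample t statistic on the differences (second - first)."""
    diffs = [b - a for a, b in zip(first, second)]
    n = len(diffs)
    std_error = stdev(diffs) / sqrt(n)   # s_d / sqrt(n)
    t = (mean(diffs) - mu0) / std_error
    return diffs, t, n - 1


# Hypothetical scores for the same five students (made-up data).
pre = [70, 65, 80, 75, 60]
post = [75, 70, 82, 80, 68]

diffs, t, df = paired_t(pre, post)
print("differences:", diffs)
print(f"t = {t:.3f} with df = {df}")
```

Check the conditions on the list of differences, not on the two original lists.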



T-TESTS FOR TWO INDEPENDENT SAMPLES
For this, Ho is mu1 - mu2 = some number.

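The test statistic is ((x-bar 1 - x-bar 2) - the hypothesized difference)/std error of the difference of the means. When Ho says mu1 - mu2 = 0, this is just (x-bar 1 - x-bar 2)/std error.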

Use tcdf to find the area in the tail. The procedure is much like the regular t-test.
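
Here is a rough sketch of the two-sample arithmetic in Python with made-up data. The function name two_sample_t is only an illustration, and it subtracts the number from Ho before dividing by the standard error. Python's standard library has no tcdf, so the sketch stops at the test statistic. The df shown is the conservative choice (the smaller n minus 1), which is an assumption here; your calculator uses a different df formula.

```python
from math import sqrt
from statistics import mean, variance


def two_sample_t(sample1, sample2, hypothesized_diff=0.0):
    """Return (t, std error, conservative df) for two independent samples."""
    n1, n2 = len(sample1), len(sample2)
    # Std error of the difference of the means: sqrt(s1^2/n1 + s2^2/n2).
    std_error = sqrt(variance(sample1) / n1 + variance(sample2) / n2)
    t = (mean(sample1) - mean(sample2) - hypothesized_diff) / std_error
    conservative_df = min(n1, n2) - 1
    return t, std_error, conservative_df


# Hypothetical independent samples (made-up data).
group1 = [10, 12, 14, 16, 18]
group2 = [9, 11, 13, 15]

t, se, df = two_sample_t(group1, group2)
print(f"t = {t:.3f}, std error = {se:.3f}, conservative df = {df}")
```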

CONFIDENCE INTERVALS: estimate +- t* (std error of the estimate).

Use the x-bar or the difference of the x-bars as the estimate.
Use the sx/sqrt n or sqrt((Sx1)^2/n1 + (Sx2)^2/n2) for the std error, where n1 and n2 are the two sample sizes.
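
A minimal sketch of the interval arithmetic in Python for the one-sample case, with made-up data. The function name t_interval is only for illustration, and t* is typed in from the t table rather than computed (2.776 is the table value for 95% confidence with df = 4).

```python
from math import sqrt
from statistics import mean, stdev


def t_interval(estimate, std_error, t_star):
    """Return (low, high) for estimate +- t* (std error of the estimate)."""
    margin = t_star * std_error
    return estimate - margin, estimate + margin


# Hypothetical observations (made-up data).
data = [5, 5, 2, 5, 8]
x_bar = mean(data)
std_error = stdev(data) / sqrt(len(data))   # sx / sqrt n

# t* from the table: 95% confidence, df = n - 1 = 4.
low, high = t_interval(x_bar, std_error, t_star=2.776)
print(f"95% CI for the mean: ({low:.2f}, {high:.2f})")
```

For two independent samples, pass in the difference of the x-bars as the estimate and the two-sample std error instead.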

Work problems from the chapters for homework.



TEST REWORK/TEST "RETAKE"

There will be a test "retake" available on Thursday, March 18 in class for anyone who turns in their completed reworked problems from the Two-proportion test.

The retake will include one-proportion and two-proportion tests and intervals, Type I and Type II error, and finding the sufficient sample size for a margin of error.



EXAM REVIEW
Have the draft of your first review page (assigned individually in class) with you in class on March 31. We will peer-review and edit the pages before they go into the class webpage. Go to this page to see examples of how the class of 2009 handled a similar assignment. Your product may be a webpage or a document.

HW for every night: Work problems from chapters 11-13. You should be able to do all these problems.

Essential questions that you need to write responses for:

Why is a t-test used instead of a z-test when we do not have the population standard deviation?

Why is it inappropriate to perform repeated tests instead of relying on one test?

The mean of x for sample 1 - the mean of x for sample 2 = the mean of the paired differences. So why does it matter whether we perform a two-sample test of independent means or a paired t-test? (This is incredibly important!!!)

WRITE AT LEAST A PAGE ABOUT THESE TOPICS. Hint: Consider the effect that your choice of test has on the power of the test.

TEST on inference for means: March 30, 2010