Friday, March 25, 2011

Chi-square tests

We're nearing the end of the new material. How sad!

HW for tonight: 14.3, 14.4, 14.5, and 14.8

Wednesday, February 09, 2011

Second semester 2011--Starting with chapters 9 & 10

Welcome back, students!


We begin by investigating the distributions of p-hat and x-bar. This is the concept of sampling distributions. We consider the distribution of ALL the sample means that we would observe if we took EVERY sample of size n from a population.

In class on Wednesday we collected data: we computed average penny ages from samples of sizes 5, 10, and 25. We have more collecting and computing to do before the distributions become evident from the graphs. Prepare to crank through more pennies on Thursday. (The program we used to report the findings is Fathom.)
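
If you want to see where the penny graphs are headed, here is a rough Python sketch of the same experiment. It is not what we ran in Fathom, and the population of penny ages is made up (a right-skewed population averaging about 12 years is my assumption, not our class data). It draws many samples of sizes 5, 10, and 25 and summarizes the sample means.

```python
import random
import statistics

random.seed(2011)  # fixed seed so the run is repeatable

# Hypothetical, right-skewed population of penny ages in years (NOT the class data).
population = [int(random.expovariate(1 / 12)) for _ in range(10_000)]

def sample_means(n, repetitions=2000):
    """Return the means of `repetitions` random samples of size n."""
    return [statistics.mean(random.sample(population, n)) for _ in range(repetitions)]

results = {}
for n in (5, 10, 25):
    means = sample_means(n)
    results[n] = (statistics.mean(means), statistics.stdev(means))
    print(f"n={n:2d}  mean of x-bars={results[n][0]:.2f}  sd of x-bars={results[n][1]:.2f}")
```

The center of the x-bars should sit near the population mean for every sample size, while their spread shrinks as n grows.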

Homework January 4th: Work one problem from the handout completely. Become a master of that problem.

Homework January 5th: Read pages 563-568 from the text and work problems 9.1, 9.2, and 9.5.

Be prepared for a quiz at any time.

Homework January 7th: Problem 9.7 using Excel if possible, problems 9.10 through 9.17.

Homework January 20: Problems 9.31, 9.32, 9.33 HAVE THIS DONE BY MONDAY. Remember, your book should be read by Wednesday.


February 9, 2011
You've been busy in class collecting data and constructing confidence intervals for the mean and for the proportion.
There are three cases to consider on tomorrow's test:

Confidence intervals for proportions
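Check that n phat and n(1-phat) are both large enough (at least 10 is the usual rule of thumb) and that n < 1/10 N (the sample is less than one tenth of the population)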
Use the sample proportion (phat) in your calculation of the standard error
Use a Z statistic for computing the margin of error
Don't forget the interpretation
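
A minimal Python sketch of the proportion interval described above. The counts (120 successes in 300 trials) are invented for illustration, and the threshold of 10 in the check is the common rule of thumb rather than anything special to this class.

```python
import math
from statistics import NormalDist

def proportion_interval(successes, n, confidence=0.95):
    """One-proportion z interval: phat +/- z* x sqrt(phat(1 - phat)/n)."""
    phat = successes / n
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # critical Z value
    std_error = math.sqrt(phat * (1 - phat) / n)             # uses phat, not p
    margin = z_star * std_error
    return phat - margin, phat + margin

# Hypothetical sample: 120 successes in 300 trials.
successes, n = 120, 300
# Condition check: n*phat and n(1 - phat) are just the success and failure counts.
assert successes >= 10 and n - successes >= 10

low, high = proportion_interval(successes, n)
print(f"95% CI for p: ({low:.3f}, {high:.3f})")
```

The numbers still need an interpretation sentence in context before the answer is complete.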

Confidence intervals for the mean (when we miraculously KNOW the population standard deviation)
Check that the observed values give no indication that the sample means would be non-Normal
Are the observations random and independent?
Use the CLT--standard error = pop std dev / sqrt sample size
Use Z (remember, this is the miraculous case)
Don't forget the interpretation

Confidence intervals for the mean (when we DON'T KNOW the population standard deviation, which is the usual case)
Check that the observed values give no indication that the sample means would be non-Normal
Are the observations random and independent?
Use the CLT--standard error = SAMPLE std dev / sqrt sample size
Use t-distribution with n-1 degrees of freedom
Don't forget the interpretation
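
Here is a small Python sketch of the two mean intervals side by side. The ten scores and the "known" sigma of 0.2 are hypothetical. Python's standard library has no t-distribution, so the critical value t* = 2.262 (95% confidence, 9 degrees of freedom) is typed in from a t table.

```python
import math
from statistics import NormalDist, mean, stdev

def z_interval(xbar, sigma, n, confidence=0.95):
    """Interval for the mean when the population std dev (sigma) is known."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * sigma / math.sqrt(n)
    return xbar - margin, xbar + margin

def t_interval(data, t_star):
    """Interval for the mean using the SAMPLE std dev; t_star comes from a t table (n - 1 df)."""
    n = len(data)
    margin = t_star * stdev(data) / math.sqrt(n)
    return mean(data) - margin, mean(data) + margin

scores = [3.1, 3.4, 3.6, 3.3, 3.5, 3.2, 3.7, 3.4, 3.3, 3.5]  # hypothetical data

z_low, z_high = z_interval(mean(scores), sigma=0.2, n=len(scores))  # sigma assumed known
t_low, t_high = t_interval(scores, t_star=2.262)                    # 9 df, 95% confidence

print(f"z interval: ({z_low:.3f}, {z_high:.3f})")
print(f"t interval: ({t_low:.3f}, {t_high:.3f})")
```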

Margin of error = (Z or t)* Std error
Greater confidence = wider margin of error
Larger sample size = smaller margin of error
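
To see both rules in action, here is a short Python sketch using the Z case. The standard deviation of 10 is a made-up value.

```python
import math
from statistics import NormalDist

def margin_of_error(std_dev, n, confidence):
    """Margin of error = Z* x (std dev / sqrt(n))."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z_star * std_dev / math.sqrt(n)

sigma = 10  # hypothetical population standard deviation

# Greater confidence gives a wider margin (n held at 25).
for confidence in (0.90, 0.95, 0.99):
    print(f"confidence={confidence:.2f}  margin={margin_of_error(sigma, 25, confidence):.2f}")

# Larger samples give a smaller margin (confidence held at 95%).
for n in (25, 100, 400):
    print(f"n={n:3d}  margin={margin_of_error(sigma, n, 0.95):.2f}")
```

Notice that quadrupling the sample size only cuts the margin of error in half, because n sits under a square root.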

Good interpretation of the confidence interval:
We are 95% confident that the true population mean test score falls between 3.2 and 3.6.

Good interpretation of the confidence level:
If this procedure were repeated many times, we would expect approximately 95% of the confidence intervals constructed from the sample mean test scores to contain the true population mean test score.
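
The "repeated many times" idea can be checked by simulation. This Python sketch assumes a Normal population with a true mean of 3.4 and a known standard deviation of 0.5 (both invented), builds 2000 intervals from samples of size 30, and counts how many capture the true mean.

```python
import math
import random
from statistics import NormalDist, mean

random.seed(63)  # fixed seed so the run is repeatable

TRUE_MEAN, SIGMA, N = 3.4, 0.5, 30   # hypothetical population and sample size
z_star = NormalDist().inv_cdf(0.975)  # critical value for 95% confidence
margin = z_star * SIGMA / math.sqrt(N)

trials = 2000
hits = 0
for _ in range(trials):
    sample = [random.gauss(TRUE_MEAN, SIGMA) for _ in range(N)]
    xbar = mean(sample)
    if xbar - margin <= TRUE_MEAN <= xbar + margin:
        hits += 1  # this interval captured the true mean

coverage = hits / trials
print(f"{coverage:.1%} of the intervals contain the true mean")
```

The true mean never moves. The intervals do, which is why the 95% describes the procedure and not any one interval.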

Bad interpretations:
Like you really expected me to post BAD examples? Anything that says there is a 95% chance. . . is really bad.

A confidence interval without an interpretation is relatively worthless. Almost as bad as using a point estimate instead of a confidence interval! Don't fall into the lazy trap of answering questions without including all the required parts.

Monday, December 13, 2010

Preparing for the 1st semester final

Topics on the AP Stat 1st Semester Final Exam

Types of graphs, their advantages and disadvantages, their interpretations
Measures of center and spread, their calculation, their different meanings, and their uses
Measures of position, converting back and forth among different measures (i.e. percentile, observed value, and z-value)
Probabilities associated with continuous random variables (area under the curve, normalcdf, empirical rule, Chebyshev’s theorem)
Probabilities associated with discrete random variables, multiplication property, addition property, independence, conditional probabilities
Special discrete random variables—binomial and geometric distributions
Relationships in two variables – linear regression, residuals, interpreting regression output, correlation coefficient, coefficient of determination
Means and standard deviations of combinations and transformations of random variables
Design of surveys, types of bias, types of sampling
Design of experiments, methods of randomizing, methods of control, matched pairs design, blocking, causation
Vocabulary: for instance, outliers, clusters, gaps, population, sample, variance, influential observations

Answer the following two questions on two separate sheets of paper. Your response to question one will be graded as a small test grade. Your response to question two will be the free response part (take-home portion) of your final exam. Your answers may be hand-written or typed, but must be legible and complete. Computed numbers that are unsupported by their calculations will be given no credit. You may NOT work together on this assignment.

Question One: Using an example from the second half of your selected book (Bringing Down the House, Freakonomics, etc.), explain a specific connection to one of the topics in the list of exam topics above. You may get creative with your product for this question. It may be in the form of a PowerPoint, a 9”x12” poster, or other appropriate written or mixed media form. Interpretive dance is not appropriate.

Question Two: Answer the problem handed out in class on a separate sheet of paper. You must work alone on this problem.

Monday, December 06, 2010

Distributions of Random Variables

We're combining parts of chapters 6-8 to build on students' prior understanding of probability.

First up: Geometric and binomial probabilities
geometric probabilities:
Know the 4 characteristics that define a geometric distribution
Know how to find the expected value of x
Know how to find probabilities for values of x (both individual probabilities and cumulative probabilities)

binomial probabilities:
Know the 4 characteristics that define a binomial distribution
Know how to find the expected value of x
Know how to find probabilities for values of x (both individual probabilities and cumulative probabilities)

Be able to identify a binomial or geometric distribution when you read a problem.
Define the random variable x.
Solve problems related to probabilities for these distributions.
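
For checking your calculator work, here is a minimal Python sketch of the binomial and geometric formulas. The die-rolling setup (success = rolling a six, so p = 1/6) is just an example I picked.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k): exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binomial_cdf(k, n, p):
    """P(X <= k): at most k successes in n trials."""
    return sum(binomial_pmf(i, n, p) for i in range(k + 1))

def geometric_pmf(k, p):
    """P(X = k): the first success happens on trial k."""
    return (1 - p)**(k - 1) * p

def geometric_cdf(k, p):
    """P(X <= k): the first success happens on or before trial k."""
    return 1 - (1 - p)**k

p = 1 / 6  # hypothetical: success = rolling a six
print(f"P(exactly 2 sixes in 10 rolls)  = {binomial_pmf(2, 10, p):.4f}")
print(f"P(at most 2 sixes in 10 rolls)  = {binomial_cdf(2, 10, p):.4f}")
print(f"P(first six on roll 3)          = {geometric_pmf(3, p):.4f}")
print(f"P(first six within 3 rolls)     = {geometric_cdf(3, p):.4f}")
print(f"Expected sixes in 10 rolls (np) = {10 * p:.3f}")
print(f"Expected rolls until a six (1/p) = {1 / p:.1f}")
```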

Next up: any other discrete distributions
Work with valid probability distributions (individual and cumulative)

Apply the concept of independent events to joint probability problems.

Apply the concepts of disjoint sets and complements to find probabilities.

Find the means and standard deviations of transformations of a random variable and combinations of independent random variables. You'll have to bookmark these pages and study them A LOT!
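
The rules for transformations and combinations fit in a few lines of Python. The numbers below (temperatures converted from Celsius to Fahrenheit, and two independent variables X and Y) are invented for illustration.

```python
import math

def linear_transform(mean_x, sd_x, a, b):
    """Mean and sd of a + bX: the mean shifts and scales, the sd only scales by |b|."""
    return a + b * mean_x, abs(b) * sd_x

def combine_independent(mean_x, sd_x, mean_y, sd_y, sign=+1):
    """Mean and sd of X + Y (sign=+1) or X - Y (sign=-1) for INDEPENDENT X and Y.
    Means add or subtract, but variances always add."""
    return mean_x + sign * mean_y, math.sqrt(sd_x**2 + sd_y**2)

# Hypothetical: temperature with mean 20 C and sd 5 C, converted to Fahrenheit (32 + 1.8C).
mean_f, sd_f = linear_transform(20, 5, a=32, b=1.8)
print(f"Fahrenheit: mean={mean_f:.1f}, sd={sd_f:.1f}")

# Hypothetical: X has mean 10 and sd 3, Y has mean 4 and sd 4, and they are independent.
mean_diff, sd_diff = combine_independent(10, 3, 4, 4, sign=-1)
print(f"X - Y: mean={mean_diff}, sd={sd_diff}")
```

The most common mistake is subtracting standard deviations for X - Y. Add the variances, then take the square root.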

Monday December 6, 2010
Testing on Thursday on distributions of random variables. We will be learning new material through Wednesday. Be here!

Thursday, November 04, 2010

Producing Data

So we've moved into survey design and experimental design. (Chapter 5 in the text.)

Important vocabulary:
nonresponse bias
response bias
convenience sample
stratified random sample
systematic random sample (like The Lottery by Shirley Jackson)
simple random sample (SRS)

We used the Table of Random Digits to (1) pick a sample, (2) simulate a random event, and (3) randomly allocate participants to experimental treatments.
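
Those same three jobs can be done with a random number generator in place of the table. This Python sketch uses a made-up roster of 30 students and a made-up event with a 30% chance of success.

```python
import random

random.seed(5)  # fixed seed so the run is repeatable

roster = [f"Student {i:02d}" for i in range(1, 31)]  # hypothetical class of 30

# (1) Pick a simple random sample of 6 students (no repeats).
srs = random.sample(roster, 6)

# (2) Simulate a random event: read 20 random digits and call 0, 1, or 2 a "success" (p = 0.3).
digits = [random.randint(0, 9) for _ in range(20)]
successes = sum(d <= 2 for d in digits)

# (3) Randomly allocate participants: shuffle, then split into two equal groups.
shuffled = roster[:]
random.shuffle(shuffled)
treatment, control = shuffled[:15], shuffled[15:]

print("SRS:", srs)
print("Digits:", digits, "->", successes, "successes")
print("Treatment group:", treatment)
print("Control group:", control)
```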

We have looked at a few experimental design/survey design questions from previous exams.

HW due Thursday 11/4: Problems 5.2, 5.3, 5.10, 5.11
HW due Friday 11/5: Written answers on your own paper to the 1998 and 2002 experimental design questions handed out in class.

Friday, October 22, 2010

Bivariate distributions

We've moved from investigations using single variables to the world of two variables. The first type of bivariate relationship we study is the relationship between two numerical variables.

We collected data on Monday and crunched numbers again on Tuesday to find the least squares regression line and the coefficient of determination, R^2.

On Wednesday we studied the correlation coefficient, the slope and intercept of the LSRL, and the patterns in residuals. We saw that the point (x-bar, y-bar) lies on the LSRL and that R^2 may not be an indicator of a good model.
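
Here is a minimal Python sketch of what the calculator does when it finds the LSRL, using five made-up data points (not our class data). It also confirms that (x-bar, y-bar) lands on the line.

```python
import math
from statistics import mean

def lsrl(xs, ys):
    """Return (intercept a, slope b, correlation r) for y-hat = a + bx."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx                  # slope
    a = y_bar - b * x_bar          # intercept, chosen so the line passes through (x-bar, y-bar)
    r = sxy / math.sqrt(sxx * syy)
    return a, b, r

xs = [1, 2, 3, 4, 5]                 # hypothetical data
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b, r = lsrl(xs, ys)
print(f"y-hat = {a:.2f} + {b:.2f}x,  r = {r:.4f},  R^2 = {r**2:.4f}")
print("Prediction at x-bar:", a + b * mean(xs), " y-bar:", mean(ys))
```

A high R^2 here does not settle whether the line is a good model. The residual plot still has to be checked.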

Based on your new knowledge of these concepts, please expand your 4 inch summary of section 3.1 to 5 inches of strong content.

Do the problems in section 3.1 that relate to the manatees and to the archaeopteryx.

Quiz answers:
scatterplot - points have a strong positive linear pattern with no outliers. Graph should have labels and scale.
LSRL y-hat = -10.64 + 4.117x
Residuals - y - y-hat graphed against x. To compute residuals, use L2 - Y1(L1). The graph shows that the residuals fluctuate above and below the axis with varying distances.
Interpretation - Because the residuals can be interpreted as randomly scattered about the residual = 0 line, the linear model is good.
Caveat - Because the residuals seem to be getting farther from the residual = 0 line as x gets larger, we might be concerned about our error increasing as length increases. Beware telescoping residuals.
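
The calculator step L2 - Y1(L1) looks like this in Python. The line is the quiz's LSRL, but the four data points in L1 and L2 are made up to show the fanning-out pattern, so they are not the quiz data.

```python
def y1(x):
    """The quiz's LSRL: y-hat = -10.64 + 4.117x."""
    return -10.64 + 4.117 * x

L1 = [10, 20, 30, 40]              # hypothetical x values
L2 = [30.0, 72.5, 111.0, 157.0]    # hypothetical observed y values

# Residual = observed y minus predicted y, the same as L2 - Y1(L1) on the calculator.
residuals = [y - y1(x) for x, y in zip(L1, L2)]

for x, res in zip(L1, residuals):
    print(f"x={x:2d}  residual={res:+.2f}")
```

The signs alternate above and below zero, but the distances from zero grow as x grows. That is the telescoping pattern to watch for.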

October 8, 2010

This week students should have completed problems 3.35, 3.36, 3.37, & 3.37 from the text. For HW due Monday, they need to complete problems 3.39, 3.40, & 3.48.

What have we done so far? Collected bivariate data. Looked at them. Computed the LSRL. Computed residuals. Interpreted the residuals and the slope of the LSRL. Used the LSRL to predict a value of y. Performed a linear regression t-test to determine the significance of the slope.

What do we have to do? Practice and interpret outputs.

ALSO, pick a book. Suggestions: A Civil Action, Freakonomics, Bringing Down the House, Moneyball, And the Band Played On, The Lady Tasting Tea. Get your parents' permission to read your book. You should have it finished by the end of Thanksgiving break.

Monday, October 11, 2010

We collected data that we expected to have no correlation. In 13 of the 15 cases, we got what we expected. We graphed the ordered pairs, computed the LSRL, checked the residuals, performed a linear regression t-test, and interpreted the results.
Small p-value>>> reject the null hypothesis--that there is no linear relationship between x and y. Instead, we have evidence indicating that there is a linear relationship.
Large p-value>>> fail to reject the null hypothesis. We do not have compelling evidence that there is a linear relationship.
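
For the curious, here is a Python sketch of the test statistic behind the linear regression t-test. The eight data points are invented to look like "no correlation" data. Python's standard library cannot compute a t-distribution p-value, so the sketch compares the statistic to a critical value copied from a t table (2.447 for 6 degrees of freedom, two-sided, 5% level).

```python
import math
from statistics import mean

def slope_t_statistic(xs, ys):
    """Return (t, df) for testing H0: true slope = 0, where t = b / SE(b) and df = n - 2."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # sum of squared residuals
    s = math.sqrt(sse / (n - 2))      # standard deviation of the residuals
    se_b = s / math.sqrt(sxx)         # standard error of the slope
    return b / se_b, n - 2

xs = [1, 2, 3, 4, 5, 6, 7, 8]   # hypothetical data
ys = [5, 3, 6, 4, 7, 4, 6, 5]

t, df = slope_t_statistic(xs, ys)
T_CRITICAL = 2.447  # from a t table: df = 6, two-sided, alpha = 0.05
print(f"t = {t:.3f} with {df} df")
print("reject H0" if abs(t) > T_CRITICAL else "fail to reject H0")
```

A t statistic inside the critical values goes with a large p-value, so this made-up data set gives no compelling evidence of a linear relationship.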

HW problems 3.6 and 3.61

Have your paperback on Friday. You will be given a reading day.
As we continue through bivariate distributions, please take care to clearly identify the transformation you have performed on your lists in the calculator. For instance, log L2 may make sense to you, but you may be better off renaming the list log life expectancy.

Typically, students have problems when they graph the curves through data. The linear regression graph only works with straightened (transformed) data. The curves go through the original data.
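
A Python sketch of that idea, with made-up data that grow exponentially (exactly y = 2 x 3^x, so the answer is easy to check). The line is fit to the transformed data, (x, log y), and then un-transformed to get the curve that goes through the original data.

```python
import math
from statistics import mean

def fit_line(xs, ys):
    """Least squares line y-hat = a + bx; returns (a, b)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    return y_bar - b * x_bar, b

xs = [0, 1, 2, 3, 4]
ys = [2.0, 6.0, 18.0, 54.0, 162.0]        # hypothetical data, exactly y = 2 * 3^x

# Step 1: straighten the data, and give the new list a clear name.
log_ys = [math.log10(y) for y in ys]

# Step 2: the LINE belongs to the transformed data (x, log y).
a, b = fit_line(xs, log_ys)
print(f"log(y-hat) = {a:.4f} + {b:.4f}x")

# Step 3: the CURVE belongs to the original data (x, y). Undo the log to get it.
def curve(x):
    return 10 ** (a + b * x)

print(f"y-hat = {10**a:.3f} * {10**b:.3f}^x,  prediction at x = 5: {curve(5):.1f}")
```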

Your test on bivariate data will be Thursday, Oct 28.

Monday, August 30, 2010

Measurements of position

Hmmm. Z values. Percentile ranks. Proportions between two x-values.
How are these connected for Normal distributions?

The percentile for a particular z-value is the value in the body of the Z table where the row and column titles "sum" to that z-value. For negative z-values, just append (attach) the hundredths place digit to the row title. For instance. . .
row 1.3, column 0.04 ==> 1.34 = z. The table value is 0.9099, so this is the 90.99th percentile.
row -2.3, column 0.04 ==> -2.34 = z. With a table value of 0.0096, this is just a hair under the 1st percentile.

The percentile is the proportion of data that lies to the left of the x value or is equal to it. If you took a test and scored at the 99th percentile, 99% of all other test scores should be equal to your score or below it.

Another way to find the percentile is to use the NormalCDF function on the calculator. Use NormalCDF(lower bound, upper bound) where the boundary values are z scores. To find the percentile for a Normally-distributed z value, we use the lower bound of negative infinity and the upper bound of the z under consideration.

We can use -999999 for negative infinity. NormalCDF(-999999,1) = the proportion of the population of Normally distributed z values that fall equal to or below 1.
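
If you have Python handy, its standard library can stand in for the Z table and for NormalCDF. The `normalcdf` function below is my own small wrapper written to mimic the calculator, not a built-in.

```python
from statistics import NormalDist

Z = NormalDist()  # the standard Normal distribution: mean 0, sd 1

def normalcdf(lower, upper):
    """Proportion of z-values between lower and upper, like the calculator's NormalCDF."""
    return Z.cdf(upper) - Z.cdf(lower)

# The two Z table lookups from above.
print(round(Z.cdf(1.34), 4))    # 0.9099
print(round(Z.cdf(-2.34), 4))   # 0.0096

# -999999 stands in for negative infinity, just as on the calculator.
print(round(normalcdf(-999999, 1), 4))   # 0.8413
```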

To find the Z value for a particular percentile, use the inverse of the NormalCDF function: InvNorm. To find the 95th percentile, enter InvNorm(.95). Approximately 95% of all z-values in a Normal distribution will fall below this value.

To find the X value that corresponds to the desired Z value, take the mean and add Z standard deviations.
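
The last two steps, sketched in Python. The test-score scale (mean 500, standard deviation 100) is a made-up example.

```python
from statistics import NormalDist

# Percentile -> z: the same job as InvNorm(.95) on the calculator.
z_95 = NormalDist().inv_cdf(0.95)

# z -> x: take the mean and add z standard deviations.
mean_score, sd_score = 500, 100        # hypothetical Normal distribution of scores
x_95 = mean_score + z_95 * sd_score

# x -> z: subtract the mean, then divide by the standard deviation.
z_back = (x_95 - mean_score) / sd_score

print(f"z for the 95th percentile: {z_95:.3f}")
print(f"score at the 95th percentile: {x_95:.1f}")
print(f"converted back to z: {z_back:.3f}")
```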

Practice converting X values into z values and percentiles into X values. Do the problems on page 147.