Wednesday, November 01, 2006

Chapter 5 Sampling, experiments, and simulation

Essential questions:

Can the data we collected be generalized to the population?

How can the survey or experiment be designed to accomplish our goals?

How can we confirm our suspicions using simulation?

-----------------------------

Running list of key concepts from class:
Survey
Census
Simple Random Sample (SRS)
Systematic Random Sample
Stratified Random Sample takes samples from all strata
Convenience Sampling
Table of Random Digits


Cluster Sampling takes a sample from a few clusters
Multi-stage sampling is a complex form of cluster sampling
Probability Sample like the computer lottery at LHS
Bias when the method systematically favors certain outcome(s)
Undercoverage when the method systematically omits part of the population from inclusion
Non-Response when selected individuals refuse to participate or cannot be reached
Sampling Frame is the list from which the sample is drawn
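
To see how the sampling methods in the list above differ, here is a rough Python sketch. The sampling frame (student IDs 1-120), the grade-level strata, the homeroom clusters, and all function names are made up for illustration. The cluster version takes everyone in the chosen clusters, which is only one way to do it.

```python
import random

def simple_random_sample(frame, n, rng):
    """SRS: every possible group of n individuals is equally likely."""
    return rng.sample(frame, n)

def systematic_sample(frame, k, rng):
    """Systematic: random start among the first k, then every k-th individual."""
    start = rng.randrange(k)
    return frame[start::k]

def stratified_sample(strata, n_each, rng):
    """Stratified: take an SRS from EVERY stratum."""
    return {name: rng.sample(members, n_each) for name, members in strata.items()}

def cluster_sample(clusters, n_clusters, rng):
    """Cluster: randomly choose a FEW clusters, then use the people in them."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: clusters[name] for name in chosen}

rng = random.Random(2006)          # fixed seed so the run can be repeated
frame = list(range(1, 121))        # hypothetical sampling frame: IDs 1-120

# Hypothetical strata (30 students per grade) and clusters (20 per homeroom).
grades = ["9th", "10th", "11th", "12th"]
strata = {g: frame[i * 30:(i + 1) * 30] for i, g in enumerate(grades)}
homerooms = {f"room {r}": frame[r * 20:(r + 1) * 20] for r in range(6)}

srs = simple_random_sample(frame, 10, rng)
systematic = systematic_sample(frame, 12, rng)
stratified = stratified_sample(strata, 3, rng)
clustered = cluster_sample(homerooms, 2, rng)

print("SRS:", sorted(srs))
print("Systematic:", systematic)
print("Stratified:", stratified)
print("Cluster:", sorted(clustered))
```

Notice that the stratified result has students from all four grades, while the cluster result has students from only two homerooms.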


Experiments:
observational studies
experiments
experimental units/subjects
treatment
factor
level


control
comparison of several treatments
placebo effect results in bias
reduces the effect of lurking variables (confounding and bias)
could include blocking (not required) *BLOCKING reduces the variability within the group, so effects of the treatments can be more easily recognized.
control group
matched pairs design is smallest block

randomization
matching of characteristics alone does not work; there are too many lurking variables to match on them all
requires real randomization, not just haphazard guesswork
makes the effect of any uncontrollable lurking variables affect all groups equally, thereby also reducing bias
When the problem asks for the experimental design, it requires that you describe how you will randomly allocate experimental units/subjects to treatment groups. Two key points to remember: you CAN'T randomly assign subjects to blocks, because the characteristic you are blocking on is not random, AND this is not an SRS.
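
Here is a rough Python sketch of random allocation inside blocks. The subjects, the block labels, the two treatments, and the function name are all hypothetical. The point it illustrates is that the block is read off each subject (it is not random), while the treatment within each block IS assigned at random.

```python
import random
from collections import Counter

def randomize_within_blocks(subjects, block_of, treatments, rng):
    """Group subjects by their (non-random) block, then shuffle each block
    and deal the treatments out in turn so the groups are balanced."""
    blocks = {}
    for s in subjects:
        blocks.setdefault(block_of[s], []).append(s)

    assignment = {}
    for members in blocks.values():
        shuffled = members[:]      # copy so the original order is untouched
        rng.shuffle(shuffled)      # the random step
        for i, s in enumerate(shuffled):
            assignment[s] = treatments[i % len(treatments)]
    return assignment

rng = random.Random(55)
subjects = [f"S{i:02d}" for i in range(1, 13)]      # 12 made-up subjects
# The block is a fixed characteristic of each subject, not something we assign.
block_of = {s: ("block A" if i < 6 else "block B") for i, s in enumerate(subjects)}
treatments = ["new treatment", "placebo"]

assignment = randomize_within_blocks(subjects, block_of, treatments, rng)
for s in subjects:
    print(s, block_of[s], assignment[s])
```

In a written answer you would still have to describe the same steps in words: how subjects are labeled, how the random numbers are used, and which treatment each selected subject receives.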


replication
allows you to generalize your data to your population
makes the experiment more sensitive to differences among treatments, instead of just random variation between the groups. The compiled or averaged results from a larger group of subjects should more precisely represent the actual, underlying truths of the relationship than results from smaller numbers of subjects. Of course, there is a cost trade-off.
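
A quick simulation sketch of why more subjects help. It assumes, purely for illustration, two groups drawn from the same normal population (mean 50, SD 10), so any difference between the group means is just random variation. The function name and all of the numbers are made up.

```python
import random
import statistics

def spread_of_difference(n_per_group, reps, rng):
    """Standard deviation of (mean of group A - mean of group B) over many
    repeated experiments in which the treatment has no real effect."""
    diffs = []
    for _ in range(reps):
        a = [rng.gauss(50, 10) for _ in range(n_per_group)]
        b = [rng.gauss(50, 10) for _ in range(n_per_group)]
        diffs.append(statistics.mean(a) - statistics.mean(b))
    return statistics.stdev(diffs)

rng = random.Random(60)
small = spread_of_difference(5, 2000, rng)
large = spread_of_difference(100, 2000, rng)
print("typical chance difference with 5 per group:  ", round(small, 2))
print("typical chance difference with 100 per group:", round(large, 2))
```

With larger groups the chance differences shrink, so a real treatment effect of a given size is easier to tell apart from random variation.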


simulation
use table of random digits or random number generator
CLEARLY identify what specific random outcomes represent, such as
The digits 0-4 represent a vote for Adams, 5 & 6 are a vote for Jefferson, 7-9 will be a vote for Roosevelt. Take one random digit at a time, comparing the result to our mapping above, until we have identified 100 votes and the corresponding candidates.
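
The voting mapping just described could be run on a computer like this. It is only a sketch: Python's random number generator stands in for the table of random digits, and the function names are made up.

```python
import random
from collections import Counter

def candidate_for(digit):
    """Map one random digit to a candidate: 0-4 Adams, 5-6 Jefferson, 7-9 Roosevelt."""
    if 0 <= digit <= 4:
        return "Adams"
    if digit <= 6:
        return "Jefferson"
    return "Roosevelt"

def simulate_votes(n_votes, rng):
    """Take one random digit at a time until n_votes votes are recorded."""
    tally = Counter()
    for _ in range(n_votes):
        digit = rng.randrange(10)   # one digit, 0 through 9, repeats allowed
        tally[candidate_for(digit)] += 1
    return tally

rng = random.Random(66)
votes = simulate_votes(100, rng)
print(votes)
```

Every digit 0-9 maps to some candidate, so nothing gets thrown out, and repeated digits are fine because different voters can vote the same way.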

or . . . in cases where you CAN'T reuse a number . . . "Assign each child a unique number 01-47. Take two digits at a time from the TORD (table of random digits), recording the name of each student whose number is selected and throwing out 00, any number greater than 47, and any number that has already been used."
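
A sketch of that no-repeats selection, with the "toss out" rules written explicitly. The random number generator stands in for the TORD, the function name is made up, and the choice to stop after 5 children is an assumption for the example (the real problem tells you how many to pick).

```python
import random

def select_children(n_children, n_to_pick, rng):
    """Read two digits at a time (00-99); toss out numbers with no child
    attached and numbers already used; stop once n_to_pick are chosen."""
    chosen = []
    while len(chosen) < n_to_pick:
        pair = rng.randrange(100)          # two digits from the "table"
        if pair < 1 or pair > n_children:  # 00 and 48-99 match no child
            continue
        if pair in chosen:                 # can't pick the same child twice
            continue
        chosen.append(pair)
    return chosen

rng = random.Random(68)
picked = select_children(47, 5, rng)       # picking 5 is a made-up stopping rule
print(picked)
```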

When a question asks you to describe or explain, there should be a description or explanation in your answer. Just providing a mapping is not sufficient.


When it asks you for the sampling or experimental DESIGN, an explanation of how you are going to select your random units must follow. You must describe how you will assign the digits to the outcomes, how you will take the digits from the TORD, what "toss out" rules you need for duplicates or numbers that have no correspondences, and when you will stop. You have to explain it all. You will need to write.

Some common calculator stuff: rand(100) selects 100 random numbers between 0 and 1 where repetition is HIGHLY unlikely.

randInt(5,29,31) selects 31 random integers from the range [5,29] and allows repeats.

SortA(L2,L1) sorts L2 in ascending order and rearranges L1 along with it, so paired entries stay together.
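
If you are working on a computer instead of the calculator, rough Python counterparts of these three commands look like this. This is a sketch, not an exact emulation of the calculator, and the small L1/L2 lists are made-up data.

```python
import random

rng = random.Random(79)

# Like rand(100): 100 random decimals from 0 up to (but not including) 1.
decimals = [rng.random() for _ in range(100)]

# Like randInt(5,29,31): 31 random integers from 5 to 29, repeats allowed.
# Only 25 different values are possible, so with 31 draws repeats are certain.
integers = [rng.randint(5, 29) for _ in range(31)]

# Like SortA(L2,L1): sort by L2 and carry L1 along with it.
L1 = ["Ann", "Bob", "Cy", "Di"]            # hypothetical names
L2 = [0.83, 0.12, 0.57, 0.30]              # hypothetical random numbers
pairs = sorted(zip(L2, L1), key=lambda pair: pair[0])
L2_sorted = [value for value, _ in pairs]
L1_sorted = [name for _, name in pairs]

print(L2_sorted)   # [0.12, 0.3, 0.57, 0.83]
print(L1_sorted)   # ['Bob', 'Di', 'Cy', 'Ann']
```

Pairing a list of names with random decimals and then sorting by the decimals is one way to put the names in random order.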



Watch this space for more key words.

2 comments:

Mrs.L said...

I didn't assign problem 5.62 because managing the data is waaaaaaaaaayyyyyyy too difficult and not within the scope of our subject. Thank you for taking on the extra challenge.

For problem 5.63, what is the probability that neither card is a heart? 39 C 2 / 52 C 2. Use that percentage to assign values.
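
To check the arithmetic on 39 C 2 / 52 C 2, here is a short Python sketch (the variable names are made up):

```python
from math import comb

ways_no_heart = comb(39, 2)   # choose 2 of the 39 non-hearts: 741
ways_any_two = comb(52, 2)    # choose any 2 of the 52 cards: 1326

p_no_heart = ways_no_heart / ways_any_two
print(ways_no_heart, ways_any_two, round(p_no_heart, 4))   # 741 1326 0.5588
```

So neither card is a heart about 56% of the time.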

Mrs.L said...

You can look at the solutions in the back of the text to see how to proceed. The text expects you to use simulations to investigate these problems.

For instance, for 5.61, the authors suggest that you randomly select 41 birthdays from 365 days, sort them, and see if there are any repeated numbers. Repeat at least ten times.
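
Here is a rough Python version of that birthday simulation. The function names are made up, and it runs 2000 repetitions instead of ten because a computer makes that cheap. Like the text's suggestion, it treats all 365 days as equally likely.

```python
import random

def has_shared_birthday(n_people, rng):
    """Pick n_people birthdays from days 1-365, sort them, and look for
    two equal neighbors (a repeated birthday)."""
    days = sorted(rng.randint(1, 365) for _ in range(n_people))
    return any(a == b for a, b in zip(days, days[1:]))

def estimate_shared(n_people, reps, rng):
    """Fraction of repetitions in which at least one birthday repeats."""
    hits = sum(has_shared_birthday(n_people, rng) for _ in range(reps))
    return hits / reps

rng = random.Random(561)
estimate = estimate_shared(41, 2000, rng)
print("estimated chance of a shared birthday among 41 people:", estimate)
```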

For problem 5.63, let 01-13 represent hearts and 14-52 the other cards. Take two pairs of two digits (skipping 00, 53-99, and a repeat of the first card) and see if you got a heart. Repeat many times.
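
A sketch of this card simulation in Python, with the random number generator standing in for the table of random digits. The function names and the number of repetitions are made up.

```python
import random

def draw_two_cards(rng):
    """Read two-digit numbers; keep only 01-52, and don't reuse a card."""
    cards = []
    while len(cards) < 2:
        pair = rng.randrange(100)              # two digits, 00-99
        if 1 <= pair <= 52 and pair not in cards:
            cards.append(pair)
    return cards

def neither_is_heart(cards):
    """01-13 are hearts, so a non-heart is any number above 13."""
    return all(card > 13 for card in cards)

def estimate_no_heart(reps, rng):
    """Fraction of simulated two-card hands that contain no heart."""
    hits = sum(neither_is_heart(draw_two_cards(rng)) for _ in range(reps))
    return hits / reps

rng = random.Random(563)
estimate = estimate_no_heart(20000, rng)
print("estimated P(neither card is a heart):", estimate)
```

The estimate should land close to the exact value 39 C 2 / 52 C 2, which is about 0.56.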

The point is to simulate the activity.