| Term | Definition |
| Resistant measure | A measure that resists the influence of extreme observations, e.g. the median. |
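
A quick way to see resistance is to compare how the mean and the median react when one extreme value is added. The sketch below uses Python's standard `statistics` module; the data values are made up for illustration.

```python
import statistics

# Hypothetical data: five ordinary values, then the same values plus one extreme observation.
typical = [10, 12, 13, 14, 15]
with_outlier = typical + [200]

mean_before = statistics.mean(typical)          # 12.8
mean_after = statistics.mean(with_outlier)      # 44: dragged toward the extreme value
median_before = statistics.median(typical)      # 13
median_after = statistics.median(with_outlier)  # 13.5: barely moves

print("mean:", mean_before, "->", mean_after)
print("median:", median_before, "->", median_after)
```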
        
| Term | Definition |
| Median | The midpoint of a distribution: the number such that half the observations are smaller and the other half are larger. In an ordered list of n observations, the median is at position (n+1)/2. |
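
The (n+1)/2 rule can be turned into a small function. This is a sketch rather than a library routine, and `median_by_position` is a hypothetical name. When n is even, (n+1)/2 falls halfway between two observations, so the function averages them.

```python
def median_by_position(values):
    """Median as the observation at 1-based position (n+1)/2 of the ordered data."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    mid = n // 2
    if n % 2 == 1:
        # (n+1)/2 is a whole number: 1-based position mid+1, i.e. index mid.
        return ordered[mid]
    # (n+1)/2 falls between positions n/2 and n/2 + 1: average those two observations.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median_by_position([7, 1, 3, 9, 5]))  # 5
print(median_by_position([4, 1, 3, 2]))     # 2.5
```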
        
| Term | Definition |
| Quartiles | The first quartile Q1 lies above 25% of the observations; the second quartile Q2 is the median; the third quartile Q3 lies above 75% of the observations. |
        
| Term | Definition |
| Quartiles (Freund/Perles) | In the ordered data, the lower quartile (Q1) is the ¼(n+3)th observation, the second quartile (median) is the ½(n+1)th observation, and the upper quartile (Q3) is the ¼(3n+1)th observation. |
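
The Freund/Perles positions can be computed directly. The sketch below assumes linear interpolation when a position is fractional (position 2.75, for example, lies three quarters of the way from the 2nd to the 3rd observation); the function names are hypothetical. These positions should agree with the `method="inclusive"` option of Python's `statistics.quantiles`.

```python
import math

def value_at_position(ordered, position):
    """Value at a 1-based, possibly fractional position, interpolating between neighbours."""
    lower = math.floor(position)
    fraction = position - lower
    if fraction == 0:
        return ordered[lower - 1]
    return ordered[lower - 1] + fraction * (ordered[lower] - ordered[lower - 1])


def freund_perles_quartiles(values):
    """Return (Q1, median, Q3) using positions (n+3)/4, (n+1)/2 and (3n+1)/4."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("quartiles of empty data are undefined")
    q1 = value_at_position(ordered, (n + 3) / 4)
    q2 = value_at_position(ordered, (n + 1) / 2)
    q3 = value_at_position(ordered, (3 * n + 1) / 4)
    return q1, q2, q3


print(freund_perles_quartiles([1, 2, 3, 4, 5, 6, 7, 8]))  # (2.75, 4.5, 6.25)
```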
        
| Term | Definition |
| Choosing a Summary (center/spread) | The five-number summary is usually better than the mean and standard deviation for describing a skewed distribution or one with strong outliers. |
        
| Term | Definition |
| Density curve | A curve that lies on or above the horizontal axis and has area exactly 1 underneath it. The area under the curve and above any range of values is the proportion of all observations that fall in that range. |
        
| Term | Definition |
| Mean of skewed distribution | The mean of a skewed distribution is pulled toward the long tail. |
        
| Term | Definition |
| Normal Curve/Distribution | Symmetric, single-peaked, and bell-shaped. |
        
| Term | Definition |
| 68-95-99.7 rule | In a normal distribution, about 68% of values fall within 1 standard deviation of the mean, about 95% within 2 standard deviations, and about 99.7% within 3 standard deviations. |
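
The percentages in the rule are rounded. Python's standard `statistics.NormalDist` can compute the exact proportions; `proportion_within` is a hypothetical helper used only in this sketch. Because every normal distribution is a rescaled standard normal, the result does not depend on the mean or standard deviation chosen.

```python
from statistics import NormalDist

def proportion_within(k, dist=NormalDist(mu=0, sigma=1)):
    """Exact proportion of a normal distribution lying within k standard deviations of its mean."""
    low = dist.mean - k * dist.stdev
    high = dist.mean + k * dist.stdev
    return dist.cdf(high) - dist.cdf(low)


for k in (1, 2, 3):
    print(k, round(proportion_within(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```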
        
| Term | Definition |
| Standardizing | Subtract the mean of the distribution from the value and divide by the standard deviation; the result is the z-score, z = (x - mean) / standard deviation. |
        
| Term | Definition |
| Standardized value (z-score) | Tells us how many standard deviations the original value falls from the mean, and in which direction (positive above the mean, negative below it). |
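
Standardizing is a one-line calculation. A minimal sketch, with `z_score` as a hypothetical name and made-up exam numbers:

```python
def z_score(x, mean, sd):
    """How many standard deviations x lies from the mean (positive = above, negative = below)."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / sd


# Hypothetical exam: mean 70, standard deviation 10.
print(z_score(85, 70, 10))  # 1.5  -> 1.5 standard deviations above the mean
print(z_score(55, 70, 10))  # -1.5 -> 1.5 standard deviations below the mean
```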
        
| Term | Definition |
| Standard Normal Distribution | The normal distribution with mean 0 and standard deviation 1. |
        
| Term | Definition |
| Behavior of Mean of Skewed Distribution | In a skewed distribution, the mean moves farther toward the long tail than the median does. |
        
| Term | Definition |
| Five-number summary | Minimum, Q1, Q2 (median), Q3, maximum. |
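
A five-number summary can be assembled from the standard `statistics` module. This sketch (the function name is hypothetical) passes `method="inclusive"` to `statistics.quantiles`, which places Q1 and Q3 at the Freund/Perles positions; textbooks that take the median of each half of the data can give slightly different quartiles.

```python
import statistics

def five_number_summary(values):
    """(minimum, Q1, median, Q3, maximum), with quartiles from the inclusive method."""
    ordered = sorted(values)
    if len(ordered) < 2:
        raise ValueError("need at least two observations")
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, median, q3, ordered[-1]


print(five_number_summary([8, 3, 5, 1, 7, 2, 6, 4]))  # (1, 2.75, 4.5, 6.25, 8)
```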
        
| Term | Definition |
|  | The standard deviation s is zero only when there is no spread (all observations are equal) and gets larger as the spread increases. |
        
        
| Term | Definition |
| Variance (s²) | The sum of the squared deviations from the mean, divided by the degrees of freedom, n - 1. |
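
The variance formula translates directly into code. A minimal sketch with a hypothetical function name and made-up data; taking the square root gives the standard deviation s, and constant data give a variance of zero.

```python
import math

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1 degrees of freedom."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


data = [2, 4, 4, 4, 5, 5, 7, 9]
variance = sample_variance(data)   # 32 / 7, about 4.571
s = math.sqrt(variance)            # the standard deviation is the square root of the variance
print(round(variance, 3), round(s, 3))
```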
        
| Term | Definition |
| Interquartile range (IQR) | IQR = Q3 - Q1. An observation is a suspected outlier if it falls more than 1.5 × IQR above Q3 or below Q1. |
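
The 1.5 × IQR rule can be sketched as follows. The quartile method (`method="inclusive"`, the Freund/Perles positions) and the function name are assumptions of this sketch, and the data are hypothetical.

```python
import statistics

def iqr_outliers(values):
    """Observations more than 1.5 * IQR below Q1 or above Q3."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [x for x in values if x < low_fence or x > high_fence]


# Hypothetical data: Q1 = 12.5, Q3 = 15.5, IQR = 3, so the fences are 8 and 20.
print(iqr_outliers([10, 12, 13, 14, 15, 16, 40]))  # [40]
```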
        
| Term | Definition |
| Response variable | Measures an outcome of a study. |
        
| Term | Definition |
| Explanatory variable | Explains or influences changes in a response variable. |
        
| Term | Definition |
| Scatterplot | Plot the explanatory variable on the x-axis and the response variable on the y-axis. |
        
| Term | Definition |
| Positive association | Above-average values of one variable tend to accompany above-average values of the other, and below-average values also tend to occur together. |
        
| Term | Definition |
| Negative association | Above-average values of one variable tend to accompany below-average values of the other, and vice versa. |
        
| Term | Definition |
| Linear relationship | The points in a scatterplot lie roughly in a straight-line pattern. |
        
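| Term | Definition |
| Correlation (r) | $r = \frac{1}{n-1}\sum_{i=1}^{n}\left(\frac{x_i-\bar{x}}{s_x}\right)\left(\frac{y_i-\bar{y}}{s_y}\right)$: the sum of the products of the standardized x values and standardized y values, divided by n - 1. |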
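
The formula for r can be coded exactly as written: standardize each x and y, multiply, sum, and divide by n - 1. A minimal sketch using the standard `statistics` module; the function name and data are hypothetical.

```python
import statistics

def correlation(xs, ys):
    """r = (1 / (n - 1)) * sum of products of standardized x and y values."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least two observations")
    x_bar, y_bar = statistics.mean(xs), statistics.mean(ys)
    s_x, s_y = statistics.stdev(xs), statistics.stdev(ys)  # sample standard deviations
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return total / (n - 1)


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(correlation(x, y), 4))  # 0.7746
```

Because the values are standardized, `correlation(y, x)` and `correlation([2.54 * v for v in x], y)` return the same r, up to floating-point rounding.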
        
| Term | Definition |
|  | Correlation makes no distinction between explanatory and response variables: swapping x and y does not change r. |
        
| Term | Definition |
|  | Because r uses the standardized values of the observations, r does not change when we change the units of measurement of x, y, or both. |
        
| Term | Definition |
|  | Positive r indicates a positive association and negative r indicates a negative association. |
        
| Term | Definition |
|  | r is always between -1 and 1. Strength increases as r moves away from 0 in either direction; r = ±1 only when the points lie exactly on a straight line. |
        
| Term | Definition |
|  | Correlation measures the strength of only a linear relationship; it does not describe curved relationships. |
        
| Term | Definition |
|  | Correlation is not resistant: r is strongly affected by outliers. |
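
To see that r is not resistant, the sketch below adds one point far from an otherwise perfect straight-line pattern. It computes r as Sxy / sqrt(Sxx · Syy), which is algebraically the same as the standardized formula because the factors 1/(n - 1), s_x and s_y cancel. The data and the name `pearson_r` are hypothetical.

```python
def pearson_r(xs, ys):
    """Correlation computed from sums of squares and cross-products."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical data: five points exactly on the line y = x ...
xs = [1, 2, 3, 4, 5]
ys = [1, 2, 3, 4, 5]
print(pearson_r(xs, ys))  # 1.0
# ... plus a single point far from that pattern: r becomes negative (about -0.25).
print(pearson_r(xs + [10], ys + [0]))
```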
        
| Term | Definition |
| Regression line | A straight line that describes how a response variable y changes as an explanatory variable x changes. |
        
| Term | Definition |
| Least-squares regression line | The line that makes the sum of the squares of the vertical distances of the data points from the line as small as possible. Its slope is $b = r\,\frac{s_y}{s_x}$ and its intercept is $a = \bar{y} - b\bar{x}$, giving the line $\hat{y} = a + bx$. |
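
The slope and intercept formulas can be computed directly from r, the two standard deviations and the two means. A minimal sketch; `least_squares_line` and the data are hypothetical.

```python
import statistics

def least_squares_line(xs, ys):
    """Return (a, b) for y-hat = a + b*x, using b = r * s_y / s_x and a = y_bar - b * x_bar."""
    n = len(xs)
    x_bar, y_bar = statistics.mean(xs), statistics.mean(ys)
    s_x, s_y = statistics.stdev(xs), statistics.stdev(ys)
    r = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / ((n - 1) * s_x * s_y)
    b = r * s_y / s_x      # slope
    a = y_bar - b * x_bar  # intercept: the line passes through (x_bar, y_bar)
    return a, b


a, b = least_squares_line([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(a, 4), round(b, 4))  # 2.2 0.6
```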
        
| Term | Definition |
|  | Along the regression line, a change of one standard deviation in x corresponds to a change of r standard deviations in y. In other words, as the correlation grows weaker, the prediction moves less in response to changes in x. |
        
| Term | Definition |
| r² (coefficient of determination) | The fraction of the variation in the values of y that is explained by the least-squares regression of y on x. |
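
One way to see what r² means is to compute 1 - SSE/SST, the share of the total variation in y that is not left over in the residuals. In this sketch (hypothetical name and data) the slope is computed as Sxy/Sxx, which is algebraically the same as r · s_y / s_x. For these data r is about 0.7746, so r² is 0.6, and the function returns the same value.

```python
def r_squared(xs, ys):
    """Fraction of the variation in y explained by the least-squares line: 1 - SSE / SST."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx                    # same slope as r * s_y / s_x
    a = y_bar - b * x_bar
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # unexplained variation
    sst = sum((y - y_bar) ** 2 for y in ys)                     # total variation in y
    return 1 - sse / sst


print(round(r_squared([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]), 4))  # 0.6
```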
        
| Term | Definition |
| Residual | The difference between an observed value of the response variable and the value predicted by the regression line: residual = observed y - predicted y. |
        
| Term | Definition |
| Mean of least-squares residuals | Always zero: the residuals from a least-squares line sum to 0. |
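
Residuals are observed y minus predicted y, and for a least-squares line they average to zero. The sketch below (hypothetical name and data) fits the line with the Sxy/Sxx slope and checks both facts.

```python
def residuals(xs, ys):
    """Observed y minus predicted y from the least-squares line."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum((x - x_bar) ** 2 for x in xs)
    a = y_bar - b * x_bar
    return [y - (a + b * x) for x, y in zip(xs, ys)]


res = residuals([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print([round(e, 2) for e in res])         # [-0.8, 0.6, 1.0, -0.6, -0.2]
print(abs(sum(res) / len(res)) < 1e-12)   # True: the mean residual is zero up to rounding
```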
        
| Term | Definition |
| Residual plot | A scatterplot of the regression residuals against the explanatory variable. |
        
| Term | Definition |
| Influential observation | A point that is extreme in the x direction and has a strong influence on the position of the regression line. |
        
| Term | Definition |
| Outlier | An observation that lies outside the overall pattern of the other observations. |
        
| Term | Definition |
| Extrapolation | The use of a regression line for prediction far outside the range of values of the explanatory variable. |
        
| Term | Definition |
|  | Correlations based on averages are usually too high when applied to individuals. |
        
| Term | Definition |
| Lurking variable | A variable that has an important effect on the relationship among the variables in a study but is not included among the variables studied. |
        
| Term | Definition |
|  | Changing one of the variables causes changes in the other. An observed association, however, is often explained by a lurking variable instead. |
        
| Term | Definition |
| Association vs. causation | An association between an explanatory variable x and a response variable y, even a strong one, is not by itself good evidence that changes in x cause changes in y. |
        
| Term | Definition |
| Criteria for causation | The association is strong; the association is consistent; higher doses are associated with stronger responses; the alleged cause precedes the effect in time; the alleged cause is plausible. |
        
| Term | Definition |
| Two-way table | A table that describes two categorical variables, one as the row variable and one as the column variable. |
        
| Term | Definition |
| Marginal distributions | The row and column totals that appear in the right and bottom margins of a two-way table. |
        
| Term | Definition |
| Simpson's paradox | An association or comparison that holds for all of several groups can reverse direction when the data are combined to form a single group. |
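
A made-up two-hospital example shows the reversal. All counts below are hypothetical. Hospital A has the higher survival rate within each patient group, yet the lower rate once the groups are pooled, because A treats mostly patients in poor condition. Pooling is done by adding counts, which is how the marginal totals of a two-way table are formed.

```python
# Hypothetical counts: (survived, total patients) by patient condition and hospital.
counts = {
    "good condition": {"A": (90, 100), "B": (800, 1000)},
    "poor condition": {"A": (300, 1000), "B": (20, 100)},
}

def rate(survived, total):
    return survived / total

# Within each group, hospital A has the higher survival rate.
for group, by_hospital in counts.items():
    for hospital, (survived, total) in by_hospital.items():
        print(f"{group}, {hospital}: {rate(survived, total):.1%}")

# Combine the groups into a single table by adding the counts for each hospital.
combined = {}
for by_hospital in counts.values():
    for hospital, (survived, total) in by_hospital.items():
        prev_survived, prev_total = combined.get(hospital, (0, 0))
        combined[hospital] = (prev_survived + survived, prev_total + total)

# After combining, hospital B has the higher survival rate: the comparison reverses.
for hospital, (survived, total) in combined.items():
    print(f"combined, {hospital}: {rate(survived, total):.1%}")
```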
        
| Term | Definition |
| Observational study | Observes individuals and measures variables of interest but does not attempt to influence the responses, e.g. a sample survey. |
        
| Term | Definition |
| Experiment | A study that deliberately imposes some treatment on individuals in order to observe their responses. |
        
| Term | Definition |
| Confounding | Two variables (explanatory variables or lurking variables) are confounded when their effects on a response variable cannot be distinguished from each other. |
        
| Term | Definition |
| Population | The entire group of individuals we want information about. |
        
| Term | Definition |
| Sample | The subset of the population that we actually examine in order to gather information. |
        
| Term | Definition |
| Sampling design | The method used to choose the sample from the population. |
        
| Term | Definition |
| Voluntary Response Sample | A sample of people who choose themselves by responding to a general appeal. It is biased because people with strong opinions, especially negative ones, are most likely to respond. |
        
| Term | Definition |
| Convenience sample | A sampling design that chooses the individuals easiest to reach. |
        
| Term | Definition |
| Bias | Systematic error: a sampling design is biased if it systematically favors certain outcomes. |
        
| Term | Definition |
| Simple random sample (SRS) | A sample of n individuals from the population chosen so that every set of n individuals has an equal chance to be the sample selected. |
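
Python's `random.Random.sample` draws without replacement with every subset of the requested size equally likely, which is exactly an SRS. A minimal sketch; the function name and the population of 30 students are hypothetical.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Choose n distinct individuals so that every set of n is equally likely."""
    rng = random.Random(seed)         # a seed makes the draw reproducible
    return rng.sample(population, n)  # sampling without replacement


students = [f"student_{i}" for i in range(1, 31)]  # hypothetical population of 30
chosen = simple_random_sample(students, 5, seed=1)
print(chosen)
```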
        
| Term | Definition |
| Probability sample | A sampling technique that gives each member of the population a known chance of being selected. |
        
| Term | Definition |
| Stratified random sample | Divide the population into groups of similar individuals, called strata; then choose a separate SRS from each stratum and combine these SRSs to form the full sample. |
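
A stratified random sample is just one SRS per stratum, combined. In this sketch the function name, the strata, and the per-stratum sample sizes are all hypothetical.

```python
import random

def stratified_sample(strata, sizes, seed=None):
    """Take a separate SRS of the requested size from each stratum and combine them."""
    rng = random.Random(seed)
    sample = []
    for name, members in strata.items():
        sample.extend(rng.sample(members, sizes[name]))  # SRS within this stratum
    return sample


# Hypothetical strata: undergraduates and graduate students at a small college.
strata = {
    "undergraduate": [f"u{i}" for i in range(1, 101)],
    "graduate": [f"g{i}" for i in range(1, 21)],
}
sample = stratified_sample(strata, {"undergraduate": 10, "graduate": 4}, seed=7)
print(sample)
```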
        
| Term | Definition |
| Strata | Groups of similar individuals within a population, used in stratified random sampling. |
        
| Term | Definition |
| Multistage sample | Stage 1: divide the population into groups and select a sample of the groups. Stage 2: divide each group selected in stage 1 into smaller areas called blocks and take a stratified sample of the blocks. Stage 3: sort the individuals in the selected blocks into clusters and take a random sample of the clusters. |
        
| Term | Definition |
| Undercoverage | Occurs when some groups in the population are left out of the process of choosing the sample, e.g. a phone survey misses the 6% without phones. |
        
| Term | Definition |
| Nonresponse | Occurs when an individual chosen for the sample can't be contacted or refuses to cooperate. |
        
| Term | Definition |
| Response bias | Bias caused by the behavior of the respondent or the interviewer, e.g. a respondent lying, or answers influenced by the race or sex of the interviewer. |
        
| Term | Definition |
| Telescoping | Bringing events in the past forward in memory to more recent time periods, e.g. someone who saw a dentist 8 months ago answers yes when asked about seeing a dentist in the last 6 months. |
        
| Term | Definition |
| Wording of questions | The wording of questions in sample surveys can introduce bias. |
        
| Term | Definition |
| Sampling frame | The list of individuals from which a sample is selected. |
        
| Term | Definition |
| Experimental units | The individuals on which an experiment is done. |
        
| Term | Definition |
| Subjects | Experimental units that are human beings. |
        
| Term | Definition |
| Treatment | A specific experimental condition applied to the units. |
        
| Term | Definition |
| Factors | The explanatory variables in an experiment. |
        
| Term | Definition |
| Levels | The specific values of the factors that make up an experimental treatment. |
        
| Term | Definition |
| Randomization | The use of chance to divide experimental units into groups in an experiment. |
        
| Term | Definition |
| Randomized Comparative Experiment | An experiment that uses both comparison and randomization. |
        
| Term | Definition |
| Completely randomized design | An experimental design in which all the experimental units are allocated at random among all the treatments. |
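
Random allocation can be sketched by shuffling all the units and then dealing them out to the treatments in turn, which keeps group sizes within one of each other. The function name, the 12 subjects and the two treatments are hypothetical; this is one simple way to randomize, not the only one.

```python
import random

def completely_randomized_design(units, treatments, seed=None):
    """Shuffle all units, then deal them out to the treatments in turn."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)  # chance alone decides the order
    groups = {treatment: [] for treatment in treatments}
    for i, unit in enumerate(shuffled):
        groups[treatments[i % len(treatments)]].append(unit)
    return groups


# Hypothetical experiment: 12 subjects, a new drug versus a placebo.
subjects = [f"subject_{i}" for i in range(1, 13)]
groups = completely_randomized_design(subjects, ["drug", "placebo"], seed=42)
for treatment, assigned in groups.items():
    print(treatment, assigned)
```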
        
| Term | Definition |
| Statistically Significant | An observed effect so large that it would rarely occur by chance. |