Term
| z-score |
|
Definition
| z-scores are used to standardize data. they measure how many standard deviations a value is away from the mean |
|
|
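note (standard formula, for reference): $z = (x - \mu)/\sigma$ for a population, or $z = (x - \bar{x})/s$ for a sample; e.g. a value two standard deviations above the mean has $z = 2$.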
Term
| when are normal models appropriate? |
|
Definition
| when data is unimodal and roughly symmetric |
|
|
Term
| what are the conditions for correlation? |
|
Definition
| 1. quantitative variables -- can't be used with categorical data 2. straight enough -- the scatterplot is linear 3. no outliers |
|
|
Term
| properties of correlation |
|
Definition
| 1. between -1 and 1 2. no units 3. unaffected by changes in center or scale 4. treats x and y symmetrically 5. measures linear association between two variables 6. sensitive to outliers |
|
|
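note (standard formula, for reference): $r = \frac{\sum z_x z_y}{n-1}$. because it's built entirely from unitless z-scores, the properties above (no units, symmetry in x and y, invariance to changes of center and scale) follow directly.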
Term
| lurking variable |
|
Definition
| a hidden variable that stands behind a relationship between two others and determines it by simultaneously affecting both of them |
|
|
Term
| correlation |
|
Definition
| measures the linear association between two quantitative variables |
|
|
Term
| linear model |
|
Definition
| gives an equation of a straight line through the data to help predict/understand the relationship between the variables |
|
|
Term
| residual |
|
Definition
| the difference between the observed value and its associated predicted value. tells us how far off the model's prediction is at that point |
|
|
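note (standard formula, for reference): residual $= y - \hat{y}$, observed minus predicted; a positive residual means the model underestimated that point.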
Term
| line of best fit/least squares line/regression line |
|
Definition
| line for which the sum of squared residuals is smallest |
|
|
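note (the criterion in symbols, for reference): the least squares line is the line that minimizes $\sum (y_i - \hat{y}_i)^2$ over all possible lines.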
Term
| slope |
|
Definition
| tells how rapidly y-hat changes with respect to x. remember, the slope for predicting x from y is not the reciprocal of the slope for predicting y from x, so you'd have to create a whole new model from the data if you wanted a model for x |
|
|
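note (standard formulas, for reference): slope $b_1 = r \frac{s_y}{s_x}$ and intercept $b_0 = \bar{y} - b_1 \bar{x}$, giving $\hat{y} = b_0 + b_1 x$. regressing x on y instead gives slope $r \frac{s_x}{s_y}$, which is not the reciprocal of $b_1$ unless $|r| = 1$; hence the need for a whole new model.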
Term
| conditions for regression models |
|
Definition
| 1. quantitative 2. straight enough 3. no outliers |
|
|
Term
| proper scatter plot of residuals |
|
Definition
| very boring, spread horizontally with even scatter, no interesting features like direction or shape. |
|
|
Term
| variance |
|
Definition
| the sum of squared deviations from the mean, divided by the count minus one. (almost the average squared deviation from the mean) |
|
|
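note (standard formula, for reference): $s^2 = \frac{\sum (x_i - \bar{x})^2}{n-1}$; the standard deviation is its square root, $s$.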
Term
| r-squared |
|
Definition
| gives the fraction of the data's variation accounted for by the model. |
|
|
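note (standard fact, for reference): in simple regression $R^2 = r^2$, the square of the correlation; e.g. $r = 0.8$ means the model accounts for 64% of the variation.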
Term
|
Definition
| 1. quantitative variables 2. straight enough 3. no outliers 4. the spread of the data around the generally straight relationship seems consistent |
|
|
Term
| regression to the mean |
|
Definition
| because the correlation is always less than 1.0 in magnitude, each predicted y-hat tends to be fewer standard deviations from its mean than its corresponding x was from its mean. |
|
|
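note (the card's reasoning in symbols, for reference): in standardized units the regression equation is $\hat{z}_y = r z_x$, and since $|r| < 1$, $|\hat{z}_y| < |z_x|$: predictions are pulled toward the mean.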
Term
| intercept |
|
Definition
| gives a starting value for y-hat; it's the value of y-hat when x is 0. |
|
|
Term
| high-leverage point |
|
Definition
| when a data point is unusual because its x-value is far from the mean of the x-values |
|
|
Term
| influential point |
|
Definition
| if omitting a point from the analysis changes the model enough to make a meaningful difference |
|
|
Term
| sampling distribution |
|
Definition
| what we'd get if we could see the proportions from all possible samples (in practice, often approximated by simulation) |
|
|
Term
| sampling distribution model |
|
Definition
| allows us to quantify the variation of a statistic from sample to sample and to make statements about where we think the corresponding population parameter is. |
|
|
Term
| sampling error/variability |
|
Definition
| sample to sample variation |
|
|
Term
| conditions for the normal model |
|
Definition
| 1. independence 2. randomization 3. 10% condition 4. success/failure |
|
|
Term
| 10% condition |
|
Definition
| once you sample more than about 10% of the population, the remaining individuals are no longer really independent of each other |
|
|
Term
| success/failure condition |
|
Definition
| you should have at least 10 successes and 10 failures in your data. |
|
|
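note (the condition in symbols, for reference): $n\hat{p} \ge 10$ and $n(1 - \hat{p}) \ge 10$.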
Term
| central limit theorem (CLT) |
|
Definition
| the sampling distribution of any mean becomes more nearly normal as the sample size grows; the shape of the population distribution doesn't matter. remember, the CLT doesn't talk about the distribution of the data from the sample. it talks about the distribution of sample means and sample proportions of many different random samples drawn from the same population |
|
|
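note (standard formulas the CLT justifies, for reference): sample means have mean $\mu$ and standard deviation $\sigma/\sqrt{n}$; sample proportions have mean $p$ and standard deviation $\sqrt{pq/n}$.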
Term
| conditions for central limit theorem |
|
Definition
| 1. independence 2. randomization 3. sample size |
|
|
Term
| standard error |
|
Definition
| an estimate of the standard deviation of a sampling distribution |
|
|
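note (two common cases, standard formulas for reference): $SE(\hat{p}) = \sqrt{\hat{p}\hat{q}/n}$ for a proportion, $SE(\bar{y}) = s/\sqrt{n}$ for a mean.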
Term
| confidence level |
|
Definition
| the percentage of samples of this size that will produce confidence intervals that capture the true proportion. "we are ___% confident that the true proportion lies in our interval." |
|
|
Term
| critical value |
|
Definition
| the number of standard errors to move away from the mean of the sampling distribution to correspond to the specified level of confidence. |
|
|
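note (how the critical value is used, for reference): a confidence interval is estimate $\pm z^* \cdot SE$; e.g. $z^* \approx 1.96$ for 95% confidence.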
Term
| null hypothesis |
|
Definition
| hypothesis to be tested. assumed status quo |
|
|
Term
| alternative hypothesis |
|
Definition
| contains the values of the parameter that we consider plausible if we reject the null |
|
|
Term
| p-value |
|
Definition
| the probability of seeing data like these or something even less likely given that the null hypothesis is true. how surprised we'd be to see the data we collected if the null hypothesis is true |
|
|
Term
| p-value of one vs. two sided alternative |
|
Definition
| the p-value of a one-sided alternative is half the p-value of the corresponding two-sided alternative. this means a one-sided alternative will reject the null more often for the same data set |
|
|
Term
| effect size |
|
Definition
| the difference between the null hypothesis value and the true value of the model's parameter |
|
|
Term
| degrees of freedom |
|
Definition
| the number of independent quantities that are left after we've estimated the parameters. |
|
|
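note (a common case, for reference): a one-sample t-test for a mean estimates one quantity, the mean, leaving $df = n - 1$; the statistic is $t = \frac{\bar{y} - \mu_0}{s/\sqrt{n}}$.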
Term
| conditions for t-models |
|
Definition
| 1. randomization 2. independence 3. nearly normal -- sample size greater than 15, since t-models have fatter tails and narrower centers |
|
|
Term
| alpha level |
|
Definition
| if the p-value falls below this point, we can reject the null hypothesis |
|
|
Term
| critical value |
|
Definition
| corresponds to our selected confidence level; it's the value in the sampling distribution model of the statistic whose p-value is equal to the alpha level. |
|
|
Term
| type I error |
|
Definition
| the null hypothesis is true, but we mistakenly reject it. a false positive |
|
|
Term
| type II error |
|
Definition
| the null hypothesis is false, but we fail to reject it. a false negative |
|
|
Term
| power |
|
Definition
| the probability that the test correctly rejects a false null hypothesis. = 1 - probability of a type II error |
|
|
Term
| how do you reduce both type one and two error? |
|
Definition
| by narrowing the sampling distribution: increasing the sample size lowers the standard deviation |
|
|
Term
| conditions for paired t-tests |
|
Definition
| 1. independence of the differences 2. randomization 3. paired data 4. nearly normal |
|
|
Term
| conditions for chi-square tests |
|
Definition
| 1. counted data -- the data must be counts for the categories of a categorical variable 2. independence -- the counts are independent of each other 3. expected cell frequency -- at least 5 expected individuals in each cell |
|
|
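note (the statistic all the chi-square tests below share, standard formula for reference): $\chi^2 = \sum \frac{(observed - expected)^2}{expected}$, summed over all cells.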
Term
| chi-square goodness-of-fit test |
|
Definition
| compares counts with a theoretical model |
|
|
Term
| chi-square test of homogeneity |
|
Definition
| asks whether the distributions are the same across different groups; expected values are found from the data |
|
|
Term
| chi-square test of independence |
|
Definition
| asks whether two variables measured on the same population are independent |
|
|