Term
z-score
|
Definition
z-scores are used to standardize data. they measure how many standard deviations a value is away from the mean: z = (x - mean)/SD |
|
|
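The z-score card above can be sketched in a few lines of python (stdlib only; the data values here are made up for illustration):

```python
from statistics import mean, stdev

def z_score(x, data):
    # how many sample standard deviations x lies from the mean
    return (x - mean(data)) / stdev(data)

data = [2, 4, 4, 4, 5, 5, 7, 9]   # made-up sample; mean is 5
print(round(z_score(9, data), 2))
```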
Term
when are normal models appropriate? |
|
Definition
when data is unimodal and roughly symmetric |
|
|
Term
what are the conditions for correlation? |
|
Definition
1. quantitative variables-- can't be used with categorical data 2. straight enough-- scatterplot is linear 3. no outliers
|
|
Term
properties of the correlation coefficient (r)
|
Definition
1. always between -1 and 1 2. no units 3. unaffected by changes in center or scale 4. treats x and y symmetrically 5. measures only the linear association between two variables 6. sensitive to outliers
|
|
Term
lurking variable
|
Definition
a hidden variable that stands behind a relationship and determines it by simultaneously affecting the other two variables |
|
|
Term
correlation (r)
|
Definition
measures the strength and direction of the linear association between two quantitative variables
|
|
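Correlation as defined above can be computed directly; a minimal python sketch (the data values are made up for illustration):

```python
from statistics import mean, stdev

def correlation(xs, ys):
    # pearson r: sum of products of deviations, scaled by (n - 1) and both SDs
    mx, my, sx, sy = mean(xs), mean(ys), stdev(xs), stdev(ys)
    n = len(xs)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / ((n - 1) * sx * sy)

xs, ys = [1, 2, 3, 4, 5], [2, 4, 5, 4, 5]   # made-up data
print(round(correlation(xs, ys), 3))
```

Note r is symmetric: swapping xs and ys gives the same value, matching property 4 on the card above.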
Term
linear model
|
Definition
gives an equation of a straight line through the data to help predict/understand the relationship between the variables |
|
|
Term
residual
|
Definition
the difference between the observed value and its associated predicted value. tells us how far off the model's prediction is at that point |
|
|
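The residual definition above is a one-liner in python (the example values are made up):

```python
def residual(observed, predicted):
    # observed minus predicted; positive means the model under-predicted here
    return observed - predicted

# e.g. the model predicted y-hat = 4.0 but we actually observed y = 5.0
print(residual(5.0, 4.0))  # 1.0
```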
Term
line of best fit/least squares line/regression line
|
Definition
line for which the sum of squared residuals is smallest |
|
|
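The least squares line above can be fit with the standard closed-form formulas; a python sketch (made-up data):

```python
from statistics import mean

def least_squares(xs, ys):
    # the line minimizing the sum of squared residuals:
    # slope b1 = sum((x - xbar)(y - ybar)) / sum((x - xbar)^2)
    # intercept b0 = ybar - b1 * xbar
    mx, my = mean(xs), mean(ys)
    b1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b1 * mx, b1

b0, b1 = least_squares([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])   # made-up data
print(round(b0, 1), round(b1, 1))
```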
Term
slope
|
Definition
tells how rapidly y-hat changes with respect to x. remember, the slope for predicting x from y is not the reciprocal of the slope for predicting y from x, so you'd have to fit a whole new model from the data if you wanted a model for x
|
|
Term
conditions for regression models |
|
Definition
1. quantitative 2. straight enough 3. no outliers |
|
|
Term
proper scatter plot of residuals |
|
Definition
very boring, spread horizontally with even scatter, no interesting features like direction or shape. |
|
|
Term
variance
|
Definition
the sum of squared deviations from the mean, divided by the count minus one. (almost the average squared deviation from the mean)
|
|
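Sample variance as defined above, sketched in python (made-up sample):

```python
def sample_variance(data):
    # sum of squared deviations from the mean, divided by n - 1
    m = sum(data) / len(data)
    return sum((x - m) ** 2 for x in data) / (len(data) - 1)

print(round(sample_variance([2, 4, 4, 4, 5, 5, 7, 9]), 2))   # made-up sample
```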
Term
R squared (R^2)
|
Definition
gives the fraction of the data's variation accounted for by the model. |
|
|
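R squared as defined above, computed from residuals in python (the data and fitted line are made up; the predictions come from the line y-hat = 2.2 + 0.6x):

```python
def r_squared(ys, preds):
    # fraction of the variation in y accounted for by the model:
    # 1 - (sum of squared residuals) / (total sum of squared deviations)
    my = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

ys = [2, 4, 5, 4, 5]                                # made-up data
preds = [2.2 + 0.6 * x for x in [1, 2, 3, 4, 5]]    # its least squares line
print(round(r_squared(ys, preds), 2))
```

For a least squares fit this equals the square of the correlation r.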
Term
conditions for inference about regression
|
Definition
1. quantitative variables 2. straight enough 3. no outliers 4. spread of the data around the generally straight relationship seems consistent
|
|
Term
regression to the mean
|
Definition
because the correlation is always less than 1.0 in magnitude, each predicted y-hat tends to be fewer standard deviations from its mean than its corresponding x was from its mean. |
|
|
Term
y-intercept
|
Definition
gives a starting value for the y-hat values. it's the predicted value when x = 0.
|
|
Term
high-leverage point
|
Definition
when a data point is unusual because its x-value is far from the mean of the x-values
|
|
Term
influential point
|
Definition
if omitting a point from the analysis changes the model enough to make a meaningful difference |
|
|
Term
sampling distribution
|
Definition
the distribution of the proportions from all possible samples; what we'd get if we could see every sample (often approximated by simulation)
|
|
Term
sampling distribution model |
|
Definition
allows us to quantify the variation of a statistic from sample to sample and to make statements about where we think the corresponding population parameter is. |
|
|
Term
sampling error/variability |
|
Definition
sample to sample variation |
|
|
Term
conditions for the normal model |
|
Definition
1. independence 2. randomization 3. 10% condition 4. success/failure condition
|
|
Term
10% condition
|
Definition
once you sample more than about 10% of the population, the remaining individuals are no longer really independent of each other
|
|
Term
success/failure condition |
|
Definition
you should have at least 10 successes and 10 failures in your data. |
|
|
Term
central limit theorem (CLT)
|
Definition
the sampling distribution of any mean becomes more normal as the sample size grows. shape of the population distribution doesn't matter. remember, CLT doesn't talk about the distribution of the data from the sample. It talks about the distribution of sample means and sample proportions of many different random samples drawn from the same population |
|
|
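The CLT card above can be illustrated with a quick simulation (a sketch with an arbitrary skewed population; the sample and replication counts are made up):

```python
import random
from statistics import mean

random.seed(1)
# the population here is strongly right-skewed (exponential, mean 1.0),
# yet the means of 2000 samples of size 50 pile up symmetrically near 1.0
sample_means = [mean(random.expovariate(1.0) for _ in range(50))
                for _ in range(2000)]
print(round(mean(sample_means), 1))
```

Note this is the distribution of sample means, not of any one sample's data, matching the card's warning.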
Term
conditions for central limit theorem |
|
Definition
1. independence 2. randomization 3. sample size |
|
|
Term
standard error
|
Definition
an estimate of the standard deviation of a sampling distribution
|
|
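For a sample proportion, the standard error above has a simple formula; a python sketch (p-hat and n are made up):

```python
import math

def se_proportion(p_hat, n):
    # estimated SD of the sampling distribution of a sample proportion:
    # sqrt(p-hat * (1 - p-hat) / n)
    return math.sqrt(p_hat * (1 - p_hat) / n)

print(round(se_proportion(0.5, 100), 3))
```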
Term
confidence level
|
Definition
the percentage of samples of this size that would produce confidence intervals capturing the true proportion. "we are ___% confident that the true proportion lies in our interval."
|
|
Term
critical value (z*)
|
Definition
the number of standard errors to move away from the mean of the sampling distribution to correspond to the specified level of confidence. |
|
|
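Putting the critical value together with the standard error gives a one-proportion z-interval; a python sketch (the sample values are made up, and z* = 1.96 is the usual ~95% critical value):

```python
import math

def ci_proportion(p_hat, n, z_star=1.96):
    # one-proportion z-interval: p-hat +/- z* x SE(p-hat)
    # z* = 1.96 is the critical value for roughly 95% confidence
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z_star * se, p_hat + z_star * se

lo, hi = ci_proportion(0.6, 400)   # made-up sample: 240 successes out of 400
print(round(lo, 3), round(hi, 3))
```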
Term
null hypothesis
|
Definition
hypothesis to be tested. assumed status quo |
|
|
Term
alternative hypothesis
|
Definition
contains the values of the parameter that we consider plausible if we reject the null |
|
|
Term
p-value
|
Definition
the probability of seeing data like these or something even less likely given that the null hypothesis is true. how surprised we'd be to see the data we collected if the null hypothesis is true |
|
|
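A p-value for a one-proportion z-test can be sketched with the stdlib's error function (the sample values are made up; note the SD is computed from the null value p0, not from p-hat):

```python
import math

def normal_cdf(z):
    # standard normal CDF via the error function
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def two_sided_p_value(p_hat, p0, n):
    # one-proportion z-test; SD is computed under the null hypothesis
    sd = math.sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / sd
    return 2 * (1 - normal_cdf(abs(z)))   # a one-sided p-value would be half this

print(round(two_sided_p_value(0.56, 0.5, 100), 3))
```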
Term
p-value of one vs. two sided alternative |
|
Definition
the p-value of a one-sided alternative is half the p-value of a two-sided alternative. this means a one-sided alternative will reject the null more often for the same data set
|
|
Term
effect size
|
Definition
the difference between the null hypothesis value and the true value of the model parameter
|
|
Term
degrees of freedom
|
Definition
the number of independent quantities that are left after we've estimated the parameters. |
|
|
Term
conditions for t-models
|
Definition
1. randomization 2. independence 3. nearly normal-- sample size greater than 15, because t-models have fatter tails and narrower centers
|
|
Term
alpha level (significance level)
|
Definition
if the p-value falls below this point, we can reject the null hypothesis |
|
|
Term
critical value (t* or z*)
|
Definition
corresponds to our selected confidence level; it's the value in the sampling distribution model of the statistic whose p-value equals the alpha level.
|
|
Term
type I error
|
Definition
the null hypothesis is true, but we mistakenly reject it. a false positive |
|
|
Term
type II error
|
Definition
the null hypothesis is false, but we fail to reject it. a false negative |
|
|
Term
power
|
Definition
the probability that the test correctly rejects a false null hypothesis. = 1 - probability of a type II error
|
|
Term
how do you reduce both type one and two error? |
|
Definition
by increasing the sample size, which lowers the standard deviation and narrows the sampling distribution
|
|
Term
conditions for paired t-tests
|
Definition
1. paired data 2. randomization 3. independence-- the pairs are independent of each other (the two measurements within a pair are not) 4. nearly normal-- check the differences
|
|
Term
conditions for chi-square tests
|
Definition
1. counted data-- the data must be counts for the categories of a categorical variable 2. independence-- the counts are independent of each other 3. expected cell frequency-- at least 5 expected individuals in each cell
|
|
Term
chi-square goodness-of-fit test
|
Definition
compares counts with a theoretical model |
|
|
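The goodness-of-fit comparison above boils down to one statistic; a python sketch (the die-roll counts are made up):

```python
def chi_square_stat(observed, expected):
    # sum over cells of (observed - expected)^2 / expected
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# made-up example: 60 rolls of a die; a fair die (the theoretical model)
# expects 10 rolls per face
print(round(chi_square_stat([8, 9, 12, 11, 10, 10], [10] * 6), 2))
```

The statistic would then be compared against a chi-square model with (number of cells - 1) degrees of freedom.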
Term
chi-square test of homogeneity
|
Definition
finding whether the distributions are the same across different groups- find expected values from the data |
|
|
Term
chi-square test of independence
|
Definition
asking whether 2 variables measured on the same population are independent |
|
|