Shared Flashcard Set

Details

Title: Stats final 1
Description: chapters 5-8
Total Cards: 48
Subject: Mathematics
Level: Undergraduate 1
Created: 05/09/2014

Cards

Term
z-score
Definition
z-scores are used to standardize data. they measure how many standard deviations a value is away from the mean
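a minimal sketch of the standardization above. the data are made up, `z_scores` is a hypothetical helper name, and the sample standard deviation (dividing by n - 1) is assumed:

```python
from statistics import mean, stdev

def z_scores(values):
    """Return (value - mean) / standard deviation for each value."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation (divides by n - 1)
    return [(v - m) / s for v in values]

scores = [2.0, 4.0, 6.0]  # hypothetical data: mean 4, standard deviation 2
z = z_scores(scores)      # each entry counts standard deviations from the mean
```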
Term
when are normal models appropriate?
Definition
when data is unimodal and roughly symmetric
Term
what are the conditions for correlation?
Definition
1. quantitative variables-- can't be used with categorical data
2. straight enough- scatter plot is linear
3. no outliers
Term
correlation properties
Definition
1. between 1 and -1
2. no units
3. unaffected by changes in center or scale
4. treats x and y symmetrically
5. measures linear association between two variables
6. sensitive to outliers
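a rough sketch that checks several of these properties on made-up data. `correlation` is a hypothetical helper written out by hand so the formula is visible:

```python
from math import sqrt

def correlation(xs, ys):
    """Pearson correlation r of two equal-length lists of numbers."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

x = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical data
y = [2.0, 1.0, 4.0, 3.0, 5.0]

r = correlation(x, y)
r_rescaled = correlation([10 * v + 3 for v in x], y)  # change center and scale of x
r_swapped = correlation(y, x)                         # swap the roles of x and y
r_with_outlier = correlation(x + [20.0], y + [0.0])   # add one far-off point
```

in this example the rescaled and swapped versions match `r`, while the single added outlier flips the sign of the correlation.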
Term
lurking variable
Definition
a hidden variable that stands behind a relationship and determines it by simultaneously affecting the other two variables
Term
correlation
Definition
measures the strength and direction of the linear association between two quantitative variables
Term
linear model
Definition
gives an equation of a straight line through the data to help predict/understand the relationship between the variables
Term
residual
Definition
the difference between the observed value and its associated predicted value. tells us how far off the model's prediction is at that point
Term
line of best fit/least squares line/regression lines
Definition
line for which the sum of squared residuals is smallest
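a minimal sketch of fitting the least squares line and computing residuals (observed minus predicted). the data and the names `least_squares` and `sse` are hypothetical:

```python
def least_squares(xs, ys):
    """Return (intercept, slope) of the line minimizing the sum of squared residuals."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    intercept = my - slope * mx  # the line passes through (mean of x, mean of y)
    return intercept, slope

def sse(xs, ys, intercept, slope):
    """Sum of squared residuals for the line y-hat = intercept + slope * x."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

x = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical data
y = [2.0, 1.0, 4.0, 3.0, 5.0]

b0, b1 = least_squares(x, y)
residuals = [obs - (b0 + b1 * v) for v, obs in zip(x, y)]  # observed - predicted
best = sse(x, y, b0, b1)
nudged = sse(x, y, b0, b1 + 0.1)  # any other line has a larger sum of squares
```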
Term
slope
Definition
tells how rapidly y-hat changes with respect to x. remember, the slope for predicting x from y is not the reciprocal of the slope for predicting y from x, so you'd have to fit a whole new model from the data if you wanted to predict x
Term
conditions for regression models
Definition
1. quantitative
2. straight enough
3. no outliers
Term
proper scatter plot of residuals
Definition
very boring, spread horizontally with even scatter, no interesting features like direction or shape.
Term
variance
Definition
the sum of squared deviations from the mean, divided by the count minus one. (almost the average squared deviation from the mean)
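the same recipe as a short sketch, checked against the standard library. the data and the name `sample_variance` are hypothetical:

```python
from statistics import variance

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by (count - 1)."""
    n = len(values)
    m = sum(values) / n
    return sum((v - m) ** 2 for v in values) / (n - 1)

data = [2.0, 4.0, 6.0]          # hypothetical data
by_hand = sample_variance(data)
by_library = variance(data)     # statistics.variance also divides by n - 1
```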
Term
R-squared
Definition
gives the fraction of the data's variation accounted for by the model.
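one way to see this, sketched on made-up data: R-squared is 1 minus the ratio of leftover (residual) variation to total variation in y. `r_squared` is a hypothetical helper name:

```python
def r_squared(xs, ys):
    """Fraction of the variation in y accounted for by the least squares line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    intercept = my - slope * mx
    leftover = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    total = sum((y - my) ** 2 for y in ys)
    return 1 - leftover / total

x = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical data with correlation 0.8
y = [2.0, 1.0, 4.0, 3.0, 5.0]
r2 = r_squared(x, y)           # equals the correlation squared
```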
Term
regression conditions
Definition
1. quantitative variable
2. straight enough
3. no outliers
4. spread of the data around the generally straight relationship seems consistent
Term
regression to the mean
Definition
because the correlation is at most 1.0 in magnitude (and almost always less), each predicted y-hat tends to be fewer standard deviations from its mean than its corresponding x was from its mean.
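in standardized units the regression line says predicted z for y equals r times z for x. a tiny sketch with made-up numbers (`predicted_z_y` is a hypothetical name):

```python
def predicted_z_y(r, z_x):
    """Predicted z-score of y for a point whose x is z_x standard deviations from its mean."""
    return r * z_x

# hypothetical case: correlation 0.8, x-value 2 standard deviations above its mean
z_hat = predicted_z_y(0.8, 2.0)  # prediction is closer to the mean than x was
```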
Term
y intercept
Definition
gives a starting value of y-hat values. it's the y-hat value when x is 0.
Term
leverage
Definition
when a data point is unusual because its x-value is far from the mean of the x-values
Term
influential
Definition
if omitting a point from the analysis changes the model enough to make a meaningful difference
Term
sampling distribution
Definition
the distribution (the histogram we'd get) if we could see all the proportions from all possible samples
Term
sampling distribution model
Definition
allows us to quantify the variation of a statistic from sample to sample and to make statements about where we think the corresponding population parameter is.
Term
sampling error/variability
Definition
sample to sample variation
Term
conditions for the normal model
Definition
1. independence assumption
2. randomization
3. 10% condition
4. success/failure
Term
10% condition
Definition
the sample should be no more than about 10% of the population. once you've sampled more than about 10% of the population, the remaining individuals are no longer really independent of each other
Term
success/failure condition
Definition
you should have at least 10 successes and 10 failures in your data.
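a small sketch of checking the 10% and success/failure conditions with numbers. the function names and the example counts are hypothetical:

```python
def ten_percent_ok(sample_size, population_size):
    """True when the sample is no more than 10% of the population."""
    return sample_size <= 0.10 * population_size

def success_failure_ok(successes, failures, minimum=10):
    """True when there are at least `minimum` successes and `minimum` failures."""
    return successes >= minimum and failures >= minimum

# hypothetical survey: 100 people from a town of 10,000, with 30 saying yes
big_enough = success_failure_ok(30, 70)
small_enough = ten_percent_ok(100, 10_000)

# hypothetical survey that is too small: 20 people, 6 saying yes
too_few = success_failure_ok(6, 14)
```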
Term
central limit theorem
Definition
the sampling distribution of any mean becomes more normal as the sample size grows. shape of the population distribution doesn't matter.
remember, CLT doesn't talk about the distribution of the data from the sample. It talks about the distribution of sample means and sample proportions of many different random samples drawn from the same population
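a simulation sketch of the idea, assuming a right-skewed (exponential) population with mean 1 and standard deviation 1. the sample size, number of samples, and seed are arbitrary choices:

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(0)  # fixed seed so the run is repeatable

def sample_mean(n):
    """Mean of one random sample of size n from the skewed population."""
    return mean([random.expovariate(1.0) for _ in range(n)])

n = 50
sample_means = [sample_mean(n) for _ in range(2000)]  # many different random samples

center = mean(sample_means)   # should land near the population mean, 1
spread = stdev(sample_means)  # should land near 1 / sqrt(n)
theory_spread = 1 / sqrt(n)
```

a histogram of `sample_means` would look roughly normal even though the individual data values are strongly skewed.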
Term
conditions for central limit theorem
Definition
1. independence
2. randomization
3. sample size
Term
standard error
Definition
an estimate of the standard deviation of a sampling distribution, computed from the sample data
Term
confidence intervals
Definition
the confidence level is the percentage of samples of this size that will produce confidence intervals that capture the true proportion.
"we are ___% confident that the true proportion lies in our interval."
Term
critical value
Definition
the number of standard errors to move away from the mean of the sampling distribution to correspond to the specified level of confidence.
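a sketch that ties standard error, critical value, and confidence interval together for one proportion, assuming the normal model applies. the counts and the name `proportion_interval` are hypothetical:

```python
from math import sqrt
from statistics import NormalDist

def proportion_interval(successes, n, confidence=0.95):
    """One-proportion z-interval: p-hat plus or minus (critical value * standard error)."""
    p_hat = successes / n
    se = sqrt(p_hat * (1 - p_hat) / n)                        # standard error of p-hat
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # critical value
    margin = z_star * se                                      # margin of error
    return p_hat - margin, p_hat + margin

# hypothetical poll: 520 of 1000 respondents say yes
low, high = proportion_interval(520, 1000)
```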
Term
null hypothesis
Definition
hypothesis to be tested. assumed status quo
Term
alternative hypothesis
Definition
contains the values of the parameter that we consider plausible if we reject the null
Term
p-value
Definition
the probability of seeing data like these or something even less likely given that the null hypothesis is true. how surprised we'd be to see the data we collected if the null hypothesis is true
Term
p-value of one vs. two sided alternative
Definition
when the sample statistic falls in the direction of the alternative, the p-value of a one sided alternative is half the p-value of the two sided alternative. this means a one sided alternative will reject the null more often for the same data set
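a sketch of both p-values for a one-proportion z-test, assuming the normal model applies. the counts, the null value 0.5, and the function name are hypothetical:

```python
from math import sqrt
from statistics import NormalDist

def proportion_p_values(successes, n, p0):
    """Return (one-sided upper-tail p-value, two-sided p-value) for H0: p = p0."""
    p_hat = successes / n
    sd = sqrt(p0 * (1 - p0) / n)  # spread of p-hat if the null is true
    z = (p_hat - p0) / sd
    upper_tail = 1 - NormalDist().cdf(z)
    two_sided = 2 * min(upper_tail, 1 - upper_tail)
    return upper_tail, two_sided

# hypothetical data: 520 successes in 1000 trials, null hypothesis p = 0.5
one_sided, two_sided = proportion_p_values(520, 1000, 0.5)
```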
Term
effect size
Definition
the difference between the null hypothesis value and the true value of a model parameter
Term
degrees of freedom
Definition
the number of independent quantities that are left after we've estimated the parameters.
Term
conditions for t tests
Definition
1. randomization
2. independence
3. nearly normal-- check a histogram of the data; the condition matters most for small samples (fewer than about 15). t models have fatter tails and narrower centers than the normal model
Term
alpha/significance level
Definition
if the p-value falls below this point, we can reject the null hypothesis
Term
critical value
Definition
corresponds to our selected confidence level. it's the value in the sampling distribution model of the statistic whose p-value is equal to the alpha level.
Term
Type 1 error
Definition
the null hypothesis is true, but we mistakenly reject it. a false positive
Term
Type 2 error
Definition
the null hypothesis is false, but we fail to reject it. a false negative
Term
power
Definition
the probability that the test correctly rejects a false null hypothesis.
= 1 - probability of type two error
Term
how do you reduce both type one and two error?
Definition
by increasing the sample size. a larger sample lowers the standard deviation of the sampling distribution, which narrows it
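a sketch of how power grows with sample size at a fixed alpha, using the normal approximation for a one-sided test of a proportion. the null value 0.5, the true value 0.55, and the name `power_one_sided` are all assumptions for illustration:

```python
from math import sqrt
from statistics import NormalDist

def power_one_sided(p0, p_true, n, alpha=0.05):
    """Approximate power of a z-test of H0: p = p0 against HA: p > p0."""
    z = NormalDist()
    # smallest sample proportion that leads to rejecting the null
    cutoff = p0 + z.inv_cdf(1 - alpha) * sqrt(p0 * (1 - p0) / n)
    # spread of the sample proportion when the true proportion is p_true
    sd_true = sqrt(p_true * (1 - p_true) / n)
    return 1 - z.cdf((cutoff - p_true) / sd_true)

power_small = power_one_sided(0.5, 0.55, n=100)
power_large = power_one_sided(0.5, 0.55, n=1000)
type_two_large = 1 - power_large  # probability of a type 2 error at n = 1000
```

alpha (the type 1 error rate) stays at 0.05 in both runs, while the type 2 error rate drops as n grows.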
Term
paired t test conditions
Definition
1. independence-- the pairs (and so their differences) are independent of each other
2. randomization
3. paired data
4. nearly normal
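a minimal sketch of the paired t statistic, which is a one-sample t on the pairwise differences. the before/after numbers and the name `paired_t` are hypothetical:

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(before, after):
    """Return (t statistic, degrees of freedom) for H0: mean difference = 0."""
    diffs = [a - b for a, b in zip(after, before)]  # one difference per pair
    n = len(diffs)
    se = stdev(diffs) / sqrt(n)                     # standard error of the mean difference
    return mean(diffs) / se, n - 1

before = [10.0, 12.0, 9.0, 11.0]   # hypothetical paired measurements
after = [12.0, 14.0, 10.0, 13.0]
t_stat, df = paired_t(before, after)
```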
Term
GoF (goodness of fit) conditions
Definition
1. counted data- data must be counts for the categories of a categorical variable
2. independence- counts are independent of each other
3. expected cell frequency- expect at least 5 individuals in each cell
Term
Goodness of fit
Definition
compares counts with a theoretical model
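a sketch of the chi-square goodness of fit statistic, including the expected cell frequency check. the die-roll counts and the name `chi_square_gof` are hypothetical:

```python
def chi_square_gof(observed, expected):
    """Return (chi-square statistic, degrees of freedom) for observed vs. expected counts."""
    if any(e < 5 for e in expected):
        raise ValueError("every expected count should be at least 5")
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return statistic, len(observed) - 1

# hypothetical data: 60 rolls of a die, fair-die model expects 10 per face
observed = [8, 12, 10, 9, 11, 10]
expected = [10.0] * 6
chi2, df = chi_square_gof(observed, expected)
```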
Term
homogeneity
Definition
finding whether the distributions are the same across different groups- find expected values from the data
Term
independence
Definition
asking whether 2 variables measured on the same population are independent
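for homogeneity and independence the expected counts come from the data: (row total * column total) / grand total. a sketch on a made-up two-way table, with hypothetical function names:

```python
def expected_counts(table):
    """Expected count for each cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    return [[r * c / grand for c in col_totals] for r in row_totals]

def chi_square_table(table):
    """Return (chi-square statistic, degrees of freedom) for a two-way table of counts."""
    expected = expected_counts(table)
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df

table = [[20, 30], [30, 20]]  # hypothetical counts: 2 groups by 2 categories
chi2_table, df_table = chi_square_table(table)
```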