Term
z-score
|
Definition
z-scores are used to standardize data. they measure how many standard deviations a value is away from the mean: z = (x - mean)/SD |
|
|
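The z-score card above can be sketched in a few lines of python (stdlib only; the data values here are made up for illustration):

```python
from statistics import mean, stdev

def z_score(x, data):
    # how many sample standard deviations x lies from the mean
    return (x - mean(data)) / stdev(data)

data = [2, 4, 4, 4, 5, 5, 7, 9]   # made-up sample; mean is 5
print(round(z_score(9, data), 2))
```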
Term
when are normal models appropriate? |
|
Definition
when data is unimodal and roughly symmetric |
|
|
Term
what are the conditions for correlation? |
|
Definition
1. quantitative variables-- can't be used with categorical data 2. straight enough-- scatterplot is linear 3. no outliers
|
|
Term
properties of the correlation coefficient (r)
|
Definition
1. always between -1 and 1 2. no units 3. unaffected by changes in center or scale 4. treats x and y symmetrically 5. measures only the linear association between two variables 6. sensitive to outliers
|
|
Term
lurking variable
|
Definition
a hidden variable that stands behind a relationship and determines it by simultaneously affecting the other two variables |
|
|
Term
correlation (r)
|
Definition
measures the strength and direction of the linear association between two quantitative variables
|
|
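Correlation as defined above can be computed directly; a minimal python sketch (the data values are made up for illustration):

```python
from statistics import mean, stdev

def correlation(xs, ys):
    # pearson r: sum of products of deviations, scaled by (n - 1) and both SDs
    mx, my, sx, sy = mean(xs), mean(ys), stdev(xs), stdev(ys)
    n = len(xs)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / ((n - 1) * sx * sy)

xs, ys = [1, 2, 3, 4, 5], [2, 4, 5, 4, 5]   # made-up data
print(round(correlation(xs, ys), 3))
```

Note r is symmetric: swapping xs and ys gives the same value, matching property 4 on the card above.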
Term
linear model
|
Definition
gives an equation of a straight line through the data to help predict/understand the relationship between the variables |
|
|
Term
residual
|
Definition
the difference between the observed value and its associated predicted value. tells us how far off the model's prediction is at that point |
|
|
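The residual definition above is a one-liner in python (the example values are made up):

```python
def residual(observed, predicted):
    # observed minus predicted; positive means the model under-predicted here
    return observed - predicted

# e.g. the model predicted y-hat = 4.0 but we actually observed y = 5.0
print(residual(5.0, 4.0))  # 1.0
```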
Term
line of best fit/least squares line/regression line
|
Definition
line for which the sum of squared residuals is smallest |
|
|
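The least squares line above can be fit with the standard closed-form formulas; a python sketch (made-up data):

```python
from statistics import mean

def least_squares(xs, ys):
    # the line minimizing the sum of squared residuals:
    # slope b1 = sum((x - xbar)(y - ybar)) / sum((x - xbar)^2)
    # intercept b0 = ybar - b1 * xbar
    mx, my = mean(xs), mean(ys)
    b1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b1 * mx, b1

b0, b1 = least_squares([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])   # made-up data
print(round(b0, 1), round(b1, 1))
```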
Term
slope
|
Definition
tells how rapidly y-hat changes with respect to x. remember, the slope for predicting x from y is not the reciprocal of the slope for predicting y from x, so you'd have to fit a whole new model from the data if you wanted a model for x
|
|
Term
conditions for regression models |
|
Definition
1. quantitative 2. straight enough 3. no outliers |
|
|
Term
proper scatter plot of residuals |
|
Definition
very boring, spread horizontally with even scatter, no interesting features like direction or shape. |
|
|
Term
variance
|
Definition
the sum of squared deviations from the mean, divided by the count minus one. (almost the average squared deviation from the mean)
|
|
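Sample variance as defined above, sketched in python (made-up sample):

```python
def sample_variance(data):
    # sum of squared deviations from the mean, divided by n - 1
    m = sum(data) / len(data)
    return sum((x - m) ** 2 for x in data) / (len(data) - 1)

print(round(sample_variance([2, 4, 4, 4, 5, 5, 7, 9]), 2))   # made-up sample
```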
Term
R squared (R^2)
|
Definition
gives the fraction of the data's variation accounted for by the model. |
|
|
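R squared as defined above, computed from residuals in python (the data and fitted line are made up; the predictions come from the line y-hat = 2.2 + 0.6x):

```python
def r_squared(ys, preds):
    # fraction of the variation in y accounted for by the model:
    # 1 - (sum of squared residuals) / (total sum of squared deviations)
    my = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

ys = [2, 4, 5, 4, 5]                                # made-up data
preds = [2.2 + 0.6 * x for x in [1, 2, 3, 4, 5]]    # its least squares line
print(round(r_squared(ys, preds), 2))
```

For a least squares fit this equals the square of the correlation r.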
Term
conditions for inference about regression
|
Definition
1. quantitative variables 2. straight enough 3. no outliers 4. spread of the data around the generally straight relationship seems consistent
|
|
Term
regression to the mean
|
Definition
because the correlation is always less than 1.0 in magnitude, each predicted y-hat tends to be fewer standard deviations from its mean than its corresponding x was from its mean. |
|
|
Term
y-intercept
|
Definition
gives a starting value for the y-hat values. it's the predicted value when x = 0.
|
|
Term
high-leverage point
|
Definition
when a data point is unusual because its x-value is far from the mean of the x-values
|
|
Term
influential point
|
Definition
if omitting a point from the analysis changes the model enough to make a meaningful difference |
|
|
Term
sampling distribution
|
Definition
the distribution of the proportions from all possible samples; what we'd get if we could see every sample (often approximated by simulation)
|
|
Term
sampling distribution model |
|
Definition
allows us to quantify the variation of a statistic from sample to sample and to make statements about where we think the corresponding population parameter is. |
|
|
Term
sampling error/variability |
|
Definition
sample to sample variation |
|
|
Term
conditions for the normal model |
|
Definition
1. independence 2. randomization 3. 10% condition 4. success/failure condition
|
|
Term
10% condition
|
Definition
once you sample more than about 10% of the population, the remaining individuals are no longer really independent of each other
|
|
Term
success/failure condition |
|
Definition
you should have at least 10 successes and 10 failures in your data. |
|
|
Term
central limit theorem (CLT)
|
Definition
the sampling distribution of any mean becomes more normal as the sample size grows. shape of the population distribution doesn't matter. remember, CLT doesn't talk about the distribution of the data from the sample. It talks about the distribution of sample means and sample proportions of many different random samples drawn from the same population |
|
|
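The CLT card above can be illustrated with a quick simulation (a sketch with an arbitrary skewed population; the sample and replication counts are made up):

```python
import random
from statistics import mean

random.seed(1)
# the population here is strongly right-skewed (exponential, mean 1.0),
# yet the means of 2000 samples of size 50 pile up symmetrically near 1.0
sample_means = [mean(random.expovariate(1.0) for _ in range(50))
                for _ in range(2000)]
print(round(mean(sample_means), 1))
```

Note this is the distribution of sample means, not of any one sample's data, matching the card's warning.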
Term
conditions for central limit theorem |
|
Definition
1. independence 2. randomization 3. sample size |
|
|
Term
standard error
|
Definition
an estimate of the standard deviation of a sampling distribution
|
|
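For a sample proportion, the standard error above has a simple formula; a python sketch (p-hat and n are made up):

```python
import math

def se_proportion(p_hat, n):
    # estimated SD of the sampling distribution of a sample proportion:
    # sqrt(p-hat * (1 - p-hat) / n)
    return math.sqrt(p_hat * (1 - p_hat) / n)

print(round(se_proportion(0.5, 100), 3))
```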
Term
confidence level
|
Definition
the percentage of samples of this size that would produce confidence intervals capturing the true proportion. "we are ___% confident that the true proportion lies in our interval."
|
|
Term
critical value (z*)
|
Definition
the number of standard errors to move away from the mean of the sampling distribution to correspond to the specified level of confidence. |
|
|
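Putting the critical value together with the standard error gives a one-proportion z-interval; a python sketch (the sample values are made up, and z* = 1.96 is the usual ~95% critical value):

```python
import math

def ci_proportion(p_hat, n, z_star=1.96):
    # one-proportion z-interval: p-hat +/- z* x SE(p-hat)
    # z* = 1.96 is the critical value for roughly 95% confidence
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z_star * se, p_hat + z_star * se

lo, hi = ci_proportion(0.6, 400)   # made-up sample: 240 successes out of 400
print(round(lo, 3), round(hi, 3))
```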
Term
null hypothesis
|
Definition
hypothesis to be tested. assumed status quo |
|
|
Term
alternative hypothesis
|
Definition
contains the values of the parameter that we consider plausible if we reject the null |
|
|
Term
p-value
|
Definition
the probability of seeing data like these or something even less likely given that the null hypothesis is true. how surprised we'd be to see the data we collected if the null hypothesis is true |
|
|
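A p-value for a one-proportion z-test can be sketched with the stdlib's error function (the sample values are made up; note the SD is computed from the null value p0, not from p-hat):

```python
import math

def normal_cdf(z):
    # standard normal CDF via the error function
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def two_sided_p_value(p_hat, p0, n):
    # one-proportion z-test; SD is computed under the null hypothesis
    sd = math.sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / sd
    return 2 * (1 - normal_cdf(abs(z)))   # a one-sided p-value would be half this

print(round(two_sided_p_value(0.56, 0.5, 100), 3))
```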
Term
p-value of one vs. two sided alternative |
|
Definition
the p-value of a one-sided alternative is half the p-value of a two-sided alternative. this means a one-sided alternative will reject the null more often for the same data set
|
|
Term
effect size
|
Definition
the difference between the null hypothesis value and the true value of the model parameter
|
|
Term
degrees of freedom
|
Definition
the number of independent quantities that are left after we've estimated the parameters. |
|
|
Term
conditions for t-models
|
Definition
1. randomization 2. independence 3. nearly normal-- sample size greater than 15, because t-models have fatter tails and narrower centers
|
|
Term
alpha level (significance level)
|
Definition
if the p-value falls below this point, we can reject the null hypothesis |
|
|
Term
critical value (t* or z*)
|
Definition
corresponds to our selected confidence level; it's the value in the sampling distribution model of the statistic whose p-value equals the alpha level.
|
|
Term
type I error
|
Definition
the null hypothesis is true, but we mistakenly reject it. a false positive |
|
|
Term
type II error
|
Definition
the null hypothesis is false, but we fail to reject it. a false negative |
|
|
Term
power
|
Definition
the probability that the test correctly rejects a false null hypothesis. = 1 - probability of a type II error
|
|
Term
how do you reduce both type one and two error? |
|
Definition
by increasing the sample size, which lowers the standard deviation and narrows the sampling distribution
|
|
Term
conditions for paired t-tests
|
Definition
1. paired data 2. randomization 3. independence-- the pairs are independent of each other (the two measurements within a pair are not) 4. nearly normal-- check the differences
|
|
Term
conditions for chi-square tests
|
Definition
1. counted data-- the data must be counts for the categories of a categorical variable 2. independence-- the counts are independent of each other 3. expected cell frequency-- at least 5 expected individuals in each cell
|
|
Term
chi-square goodness-of-fit test
|
Definition
compares counts with a theoretical model |
|
|
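The goodness-of-fit comparison above boils down to one statistic; a python sketch (the die-roll counts are made up):

```python
def chi_square_stat(observed, expected):
    # sum over cells of (observed - expected)^2 / expected
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# made-up example: 60 rolls of a die; a fair die (the theoretical model)
# expects 10 rolls per face
print(round(chi_square_stat([8, 9, 12, 11, 10, 10], [10] * 6), 2))
```

The statistic would then be compared against a chi-square model with (number of cells - 1) degrees of freedom.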
Term
chi-square test of homogeneity
|
Definition
finding whether the distributions are the same across different groups- find expected values from the data |
|
|
Term
chi-square test of independence
|
Definition
asking whether 2 variables measured on the same population are independent |
|
|