Shared Flashcard Set

Details

Statistics Final Flashcards
Final flashcards for STATS220
54
Mathematics
Undergraduate 1
03/18/2013

Cards

Term
Observational Unit
Definition
basic unit/individual that we are describing in the study
Term
Variable
Definition
data we are recording for each observational unit
Term
Qualitative
Definition
categorical (not-numbers)
Term
Quantitative
Definition
numbers or measurements
Term
Observational Study
Definition
the investigator simply records what is happening or has already happened, without intervening
Term
Experiment
Definition
the investigator imposes a treatment on the observational units
Term
Sample
Definition
The observational units on which we have data (***if we gave someone a questionnaire but they didn’t return it, they don’t count!***)
Term
Sampling Frame
Definition
All observational units who had a chance of being selected in the sample
Term
Population
Definition
The group of observational units we are ultimately trying to describe.

The population will depend on the question being asked

Sometimes the sample/sampling frame/population can be the same group. That is called a census.
Term
Parameter
Definition
truth about the population

***Will almost always be a % unless the question explicitly asks for a number***

We usually do not know the true number, but we can describe it in words (% of UW students who live on campus)
Term
Statistic
Definition
describes the sample

May not be given in % format, but should be converted to match format of the parameter

We will usually be able to calculate this from the data given
Term
Parameter VS Statistic
Definition
Parameter:
Describes the population
Fixed, will not change
True value may be unknown

Statistic:
Describes the sample
Will vary when different samples are taken
Can be computed from the information given
Term
Probability sample
Definition
Any type of design in which randomization is used to pick the observational units. Includes the simple random sample (SRS)
Term
Convenience sampling
Definition
the investigator selects which observational units will be in the sample

**almost always biased**
Term
Voluntary sample
Definition
the observational units choose whether they want to be in the sample or not

**almost always biased**
Term
Variability VS Bias
Definition
Think of variability as how spread out my estimates are

Think of bias as how far away my estimates are from the truth

They are not opposites: both can be high, both can be low, or one can be high while the other is low
Term
Sources of Variability (4)
Definition
Random Sampling Error (sampling variability). ***This is the only variability accounted for by the Margin of Error.*** Any additional bias or variability caused by poor survey design adds error beyond the margin of error

Shortcut method for a 95% Confidence Interval = p.hat +/- 1/sqrt(n), where n = sample size and 1/sqrt(n) is the margin of error

Confidence statement- We are 95% confident that the true parameter lies between (confidence interval)
***95% of the time that I follow this same procedure and construct a confidence interval, it will cover the true parameter***

When the sample size increases we can be more sure about our estimate, so we do not need as large of a margin of error
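
The shortcut interval and the effect of sample size can be checked with a short Python sketch. The function name and the survey numbers (60% "yes" at several sample sizes) are made up for illustration; only the formula p.hat +/- 1/sqrt(n) comes from the card.

```python
import math

def quick_confidence_interval(p_hat, n):
    """Shortcut 95% confidence interval: p.hat +/- 1/sqrt(n)."""
    margin_of_error = 1 / math.sqrt(n)
    return p_hat - margin_of_error, p_hat + margin_of_error

# Hypothetical survey result: 60% of respondents say yes.
for n in (100, 400, 1600):
    low, high = quick_confidence_interval(0.60, n)
    print(f"n={n}: margin of error={1 / math.sqrt(n):.3f}, "
          f"interval=({low:.3f}, {high:.3f})")
```

Quadrupling the sample size only halves the margin of error, because n sits under a square root.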
Term
Sources of Bias (2; 1 with 4 possible)
Definition
Undercoverage- when the sampling frame does not accurately reflect the population (ex. random digit dialing won’t include people without phones)

Non-sampling errors-
o Response error- people don’t answer truthfully (ex- how many times have you cheated on a test?)

o Non-response- when people don’t respond because they can’t be contacted or don’t cooperate

o Processing errors- typos when recording data

o Question wording- confusing questions or questions which can cause a certain response to be more likely (leading questions)
Term
Explanatory variable
Definition
a variable that may cause a change in the response variable; the cause, usually the X variable
Term
Response variable
Definition
measures the outcome of an experiment; the effect, usually the Y variable
Term
Treatment
Definition
specific condition that is applied in an experiment; often the explanatory variable or mix of explanatory variables
Term
Lurking variable
Definition
a variable that is not measured but may have an effect on the response variable
Term
Confounding variable
Definition
When two variables have effects on the response variable that cannot be distinguished from each other
Term
Statistically significant
Definition
***The result we found would rarely occur simply by chance***
Term
Placebo effect
Definition
The benefit derived from the psychological effect of receiving a treatment
Term
Double Blind
Definition
Both the clinicians and the subjects are “blind” to which subjects are in the control group and which are in the treatment group
Term
Random Assignment
Definition
using impersonal chance to assign subjects to either the treatment or control group
Term
Histograms: Shape
Definition
Is the distribution skewed or symmetric? Is there one mode or multiple modes?
Term
Histograms: Range/Body
Definition
Where do most of the observations lie? What are the highest and lowest values?
Term
Histograms: Center
Definition
What is the center point of the distribution? (mean, median or mode)
Term
Numerical Descriptions: Mean
Definition
Add up all observations and then divide the total by the number of observations. Highly affected by outliers. Changes when you add a constant to the data or multiply the data by a constant
Term
Numerical Descriptions: Median
Definition
Midpoint of the distribution. Sort all your observations and choose the middle observation, or average the middle two if there is an even number of observations. Not affected by outliers as much as the mean. Changes when you add a constant to the data or multiply the data by a constant
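
A quick way to see the difference in how outliers affect the mean and the median is to add one extreme value to a small data set. This is a sketch with made-up scores, using Python's standard `statistics` module.

```python
from statistics import mean, median

scores = [70, 75, 80, 85, 90]     # made-up observations
with_outlier = scores + [400]     # same data plus one extreme value

# Without the outlier the two measures of center agree.
print("mean:", mean(scores), "median:", median(scores))

# One outlier drags the mean far upward; the median barely moves.
# With six observations the median averages the middle two (80 and 85).
print("mean:", mean(with_outlier), "median:", median(with_outlier))
```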
Term
Numerical Descriptions: Mode
Definition
Most common observation
Term
Numerical Descriptions: Percentiles
Definition
The cth percentile of a distribution is defined so that (at least) c% of the observations are at or below it and (at least) (100-c)% of the observations are at or above it
Term
Numerical Descriptions: Five Number summary
Definition
Min, 1st quartile (25th percentile), Median, 3rd quartile (75th percentile), Max
Term
Numerical Descriptions: Standard Deviation
Definition
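a measure of how spread out the data are. For roughly bell-shaped (normal) data, about 68% of all observations lie within +/- 1 sd of the mean, 95% within 2 sd, 99.7% within 3 sd. Does not change when you add a constant to the data; changes when you multiply the data by a constant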

o First find xbar (mean)
o Then add up (x – xbar)squared for each observation
o Divide that total by n-1
o Take the square root of that ratio
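
The four steps above translate directly into code. This is a minimal sketch; the function name and the data are made up, and the result is compared against `statistics.stdev`, which uses the same n-1 divisor.

```python
import math
from statistics import stdev

def sample_sd(observations):
    n = len(observations)
    xbar = sum(observations) / n                        # step 1: mean
    total = sum((x - xbar) ** 2 for x in observations)  # step 2: sum of squared deviations
    ratio = total / (n - 1)                             # step 3: divide by n-1
    return math.sqrt(ratio)                             # step 4: square root

data = [2, 4, 4, 4, 5, 5, 7, 9]   # made-up observations
print(sample_sd(data))
print(stdev(data))                # library result for comparison
```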
Term
Measures of Center
Definition
Mean, median and mode
Term
Quartiles
Definition
• At least 25% of observations are ≤ 1st Quartile, and at least 75% of observations are ≥ 1st quartile

• At least 75% of observations are ≤ 3rd Quartile, and at least 25% of observations are ≥ 3rd quartile

• Interquartile range = 3rd quartile – 1st quartile

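• The quartiles change when you add a constant to the data or multiply the data by a constant; the interquartile range changes only when you multiply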
Term
Scatterplots
Definition
plots two variables on same graph. Each point is one individual observation
Term
Correlation
Definition
measures the “strength” of the linear relationship between two variables

• Always between -1 and 1
• Positive correlation means positive association (as one increases, so does the other). Negative value means negative association (as one increases, the other decreases)
• ***Correlation does not imply causation!!***
• The scatterplot must be roughly linear (football shaped), with no outliers, for correlation to be a valid measure of association
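
The card does not give a formula for the correlation. One standard way to compute it (an assumption here, not taken from the card) is the sum of products of deviations divided by the square root of the product of the sums of squared deviations. The sketch below uses made-up data to show the sign and the -1 to 1 range.

```python
import math

def correlation(xs, ys):
    """Pearson correlation r of two equal-length lists of numbers."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [1, 2, 3, 4, 5]                          # made-up data
print(correlation(xs, [2, 4, 6, 8, 10]))      # perfect positive association
print(correlation(xs, [10, 8, 6, 4, 2]))      # perfect negative association
print(correlation(xs, [2, 1, 4, 3, 5]))       # positive but weaker
```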
Term
Ecological Correlation
Definition
correlations based on averages or rates. Usually overstates the correlation
Term
Slope
Definition
b = r × (Sy / Sx)
Term
To find intercept
Definition
The regression line passes through the point of averages: y.bar = a + b(x.bar), so a = y.bar - b(x.bar)
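
Putting the slope and intercept cards together, the regression line can be built from five summary numbers. This is a sketch; the function name and the quiz summary values are hypothetical.

```python
def regression_line(r, x_bar, s_x, y_bar, s_y):
    """Return (a, b) for the line y = a + b*x."""
    b = r * s_y / s_x        # slope: r * Sy / Sx
    a = y_bar - b * x_bar    # intercept: from y.bar = a + b * x.bar
    return a, b

# Hypothetical summary: quiz 1 (x) has mean 70 and sd 10,
# quiz 2 (y) has mean 75 and sd 8, and the correlation is 0.5.
a, b = regression_line(r=0.5, x_bar=70, s_x=10, y_bar=75, s_y=8)
predicted = a + b * 90       # predicted quiz 2 score for a quiz 1 score of 90
print(a, b, predicted)
```

In this example a quiz 1 score of 90 is 2 sd above its mean, while the predicted quiz 2 score of about 83 is only 1 sd above its mean.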
Term
Regression SD
Definition
Regression sd is the “average size of error”: √(1 − r.squared) × Sy. ***Only use this when you are making a prediction involving prior information.*** (Think about the quiz example: when we picked a random student and guessed their quiz 2 score, we used the quiz 2 average and sd. But when we knew their quiz 1 score, we used the regression sd.)
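
A small sketch of the regression sd formula, with made-up values for r and Sy, shows how much knowing the X value shrinks the typical error.

```python
import math

def regression_sd(r, s_y):
    """Average size of the prediction error: sqrt(1 - r^2) * Sy."""
    return math.sqrt(1 - r ** 2) * s_y

s_y = 10                          # hypothetical sd of quiz 2 scores
for r in (0.0, 0.6, 1.0):
    print(f"r={r}: regression sd = {regression_sd(r, s_y):.2f}")
```

With r = 0 the X value gives no help and the error is the full Sy; with r = 1 the prediction is exact.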
Term
Prediction error
Definition
actual – predicted
Term
Regression Effect
Definition
observations that are extreme in the X-direction are not as extreme in the Y-direction
Term
Rules about probability
Definition
P(A) must be between 0 and 1
The probabilities of all possible outcomes must add up to 1
P(A not happening) = 1- P(A)
Term
Normal Curves
Definition
Symmetric, bell shaped

Only need to know mean and standard deviation to define the whole curve

68% of all observations lie within +/- 1 sd of the mean, 95% within 2 sd, 99.7% within 3 sd

The standard score is the number of standard deviations an observation is away from the mean
std. score = (obs - mean)/ SD

Once we have the standard score, we can look up P(X < standard score) in Table B of the book
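
When the table is not at hand, the same lookup can be done in code. This is a sketch that assumes the standard normal curve and uses `math.erf` from Python's standard library; the function names and the example numbers are made up.

```python
import math

def standard_score(obs, mean, sd):
    """Number of standard deviations an observation is from the mean."""
    return (obs - mean) / sd

def normal_cdf(z):
    """P(Z < z) for a standard normal curve, i.e. the table lookup."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

z = standard_score(obs=85, mean=70, sd=10)   # hypothetical exam score
print(z, normal_cdf(z))

# Check the 68-95-99.7 rule: area within k sd of the mean.
for k in (1, 2, 3):
    print(k, normal_cdf(k) - normal_cdf(-k))
```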
Term
Central Limit Theorem
Definition
As we take larger and larger samples, the distribution of the sum or average (not product or ratio) will begin to look like a normal curve
Term
Central Limit Theorem pt. 2
Definition
If we take a sample and compute the sample proportion many times, the distribution of those proportions will be approximately normal with mean = p and standard deviation √ [ p (1-p) / n ]
Term
Central Limit Theorem pt. 3
Definition
We would expect 95% of all p-hats to be within 2 sd of the mean, or p ± 2√ [ p (1-p) / n ]
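
A simulation can make this concrete. The sketch below draws many samples from a population with a known p, computes p-hat for each, and counts how many land within 2 sd of p. The values p = 0.5, n = 100, the number of trials, and the seed are arbitrary choices for illustration.

```python
import math
import random

def simulate_p_hats(p, n, trials, seed=0):
    """Draw `trials` samples of size n and return each sample proportion."""
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(n)) / n
            for _ in range(trials)]

p, n = 0.5, 100
sd = math.sqrt(p * (1 - p) / n)        # sd of p-hat from the formula
p_hats = simulate_p_hats(p, n, trials=2000)

average = sum(p_hats) / len(p_hats)
within = sum(abs(ph - p) <= 2 * sd for ph in p_hats) / len(p_hats)
print("average p-hat:", average)
print("fraction within 2 sd of p:", within)
```

The average of the p-hats should come out close to p, and the fraction within 2 sd should be roughly 95%.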
Term
Central Limit Theorem pt. 4
Definition
When we don’t know mean, but have an estimate for p,
Term
Test of Significance
Definition
The basic idea is that we reject the null hypothesis if our observation would be very unlikely to happen if the null hypothesis were true

Null Hypothesis- the status quo, or the no change option
Alternative Hypothesis- usually what we are trying to prove
Term
Calculating test of significance
Definition
Assume the null hypothesis is true, and calculate how likely our sample result would be
1. Determine the mean and standard deviation of our “Null distribution” (the distribution when the null is true): mean = p and sd = √ [ p (1-p) / n ]
2. Find the standard score of p.hat = (p.hat - p) /√ [ p (1-p) / n ]
3. Look up the standard score in the table. ***You may need to subtract the value from 1 depending on whether you want the area to the left or to the right of the standard score***

This is the p-value- the probability that something as extreme or more extreme than our current observation would occur when the null is true

If the p-value is less than .05, reject the null hypothesis
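
The whole procedure can be sketched in a few lines, with the table lookup replaced by a normal-curve function built on `math.erf`. The function names, the `alternative` argument, and the example numbers are made up for illustration.

```python
import math

def normal_cdf(z):
    """P(Z < z) for a standard normal curve, i.e. the table lookup."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def proportion_test(p_hat, p_null, n, alternative="greater"):
    """Return (standard score, one-sided p-value) for a sample proportion."""
    sd = math.sqrt(p_null * (1 - p_null) / n)   # sd of the null distribution
    z = (p_hat - p_null) / sd                   # standard score of p.hat
    if alternative == "greater":
        p_value = 1 - normal_cdf(z)             # area to the right
    else:
        p_value = normal_cdf(z)                 # area to the left
    return z, p_value

# Hypothetical example: null says p = 0.5; we observe p.hat = 0.6 with n = 100.
z, p_value = proportion_test(p_hat=0.6, p_null=0.5, n=100)
print(z, p_value)
print("reject the null" if p_value < 0.05 else "do not reject the null")
```

Here the standard score is about 2 and the area to the right is about 0.023, which is below .05, so the null hypothesis is rejected.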