Term
| The four parts of Statistics |
|
Definition
| 1) Define Problem 2) Collect Data 3) Analyze and Summarize Data 4) Draw Inference from Data |
|
|
Term
| Distribution |
|
Definition
| A list of possible values for a variable along with how often each value occurs. |
|
|
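As a minimal sketch of this definition, a distribution can be tabulated by counting how often each value occurs. The function name `distribution` and the data are made up for illustration.

```python
from collections import Counter

def distribution(values):
    """Return {value: (count, proportion)} for each distinct value."""
    counts = Counter(values)
    n = len(values)
    return {v: (c, c / n) for v, c in sorted(counts.items())}

# Hypothetical data: number of siblings reported by 8 students.
siblings = [0, 1, 1, 2, 1, 3, 0, 1]
for value, (count, proportion) in distribution(siblings).items():
    print(value, count, proportion)
```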
Term
|
Definition
|
|
Term
|
Definition
|
|
Term
| Histogram |
|
Definition
| Graph of the distribution of one quantitative variable. Bars touch. |
|
|
Term
| Bar graph |
|
Definition
| Graph of the distribution of one categorical variable. Bars don't touch. |
|
|
Term
| Stemplot |
|
Definition
| Stems on the left, leaves on the right. Shows the distribution like a histogram, except that the number of leaves on a stem, not the height of a bar, gives the count. |
|
|
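A stem-and-leaf layout like the one described above can be sketched in a few lines. This is only an illustration for non-negative whole numbers, using the tens digit as the stem and the ones digit as the leaf; the function name `stemplot` and the data are hypothetical.

```python
from collections import defaultdict

def stemplot(values):
    """Return the lines of a stemplot: stem = tens, leaf = ones digit."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    lines = []
    # Include empty stems so gaps in the data stay visible.
    for stem in range(min(stems), max(stems) + 1):
        leaves = "".join(str(leaf) for leaf in stems.get(stem, []))
        lines.append(f"{stem} | {leaves}")
    return lines

# Made-up data set.
for line in stemplot([12, 15, 21, 23, 23, 40]):
    print(line)
```

The number of leaves on each stem plays the role of the bar height in a histogram.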
Term
| Five-number summary |
|
Definition
| Minimum, Q1 (25th percentile), Median, Q3 (75th percentile), Maximum. This is the summary to use if there are outliers. |
|
|
Term
| Boxplot |
|
Definition
| Visual summary of the five-number summary. Boxplots show less detail than histograms or stemplots and are used for side-by-side comparison of more than one distribution. |
|
|
Term
| Quartiles (Q1 and Q3) |
|
Definition
| Q1 is the median of the values to the left of the actual median. Q3 is the median of the values to the right of the actual median. |
|
|
Term
| Symmetric distribution |
|
Definition
| One side of the distribution is the mirror image of the other side. This best describes the normal curve. |
|
|
Term
| Skewed distribution |
|
Definition
| Not symmetric: the long tail is on the left (left skewed) or on the right (right skewed). |
|
|
Term
| Outlier |
|
Definition
| A data point that is quite a bit removed from the rest of the data. Flagged when it is greater than Q3 + (1.5*IQR) or less than Q1 - (1.5*IQR). |
|
|
Term
| Inter Quartile Range (IQR) |
|
Definition
| Q3 - Q1, the spread of the middle 50% of the data. |
|
|
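The five-number summary, the median-of-halves quartile rule, and the 1.5*IQR outlier rule from the cards above can be sketched together. This is a minimal illustration, not a standard routine: the function names and data are made up, and statistical software often uses slightly different quartile conventions.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]            # values to the left of the median
    upper = data[half + n % 2:]    # values to the right of the median
    return data[0], median(lower), median(data), median(upper), data[-1]

def outliers(values):
    """Return the points beyond the 1.5 * IQR fences."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low_fence or v > high_fence]

data = [1, 3, 4, 5, 6, 7, 8, 9, 30]    # made-up data
print(five_number_summary(data))        # (1, 3.5, 6, 8.5, 30)
print(outliers(data))                   # [30]
```

Here IQR = 8.5 - 3.5 = 5, so the fences are -4 and 16, and only 30 falls outside them.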
Term
| Mean |
|
Definition
| Measures the center of the data; the balance point. Add up all the numbers and divide by n. Outliers affect the mean greatly. |
|
|
Term
| Median |
|
Definition
| Measures the center of the data. Cuts the ordered data in half. Order the data and find the middle observation. If n is even, the median is the average of the two middle observations. Outliers affect the median very little, if at all. |
|
|
Term
| Standard deviation |
|
Definition
| Measures the variability of the data around the mean. Roughly the average distance of the data from the mean. Outliers inflate the standard deviation. |
|
|
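The different sensitivity of the mean, median, and standard deviation to outliers can be checked with Python's `statistics` module. The helper name `summarize` and the data are made up; `stdev` is the sample standard deviation (it divides by n - 1), which is an assumption about the version these cards intend.

```python
from statistics import mean, median, stdev

def summarize(values):
    """Return (mean, median, sample standard deviation)."""
    return mean(values), median(values), stdev(values)

data = [2, 4, 4, 5, 5, 6, 9]      # made-up data
with_outlier = data + [60]        # same data plus one extreme value

print(summarize(data))
print(summarize(with_outlier))
```

Adding the single value 60 moves the mean from 5 to 11.875 and enlarges the standard deviation, while the median stays at 5.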
Term
| When can you use a Normal Distribution to model a data set |
|
Definition
| When the distribution of the data is approximately normal (a symmetric, bell-shaped curve). |
|
|
Term
| How do you obtain a proportion or probability from the Normal Curve |
|
Definition
| Convert the value to a z-score and look up the z-score in Table A (Standard Normal Table). It will give you the "less than" percentage. If you want the "greater than" percentage, subtract that probability from 1. |
|
|
Term
| What is a z-score. How to obtain a z-score. |
|
Definition
| A z-score tells us how many standard deviations (SD) a value is from the mean. z = (value - mean) / standard deviation. |
|
|
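The table lookup described above can be mimicked in code. This sketch uses `statistics.NormalDist` from the Python standard library in place of Table A; the function names and the example numbers (mean 100, SD 15) are hypothetical.

```python
from statistics import NormalDist

def z_score(value, mean, sd):
    """Number of standard deviations that value lies from the mean."""
    return (value - mean) / sd

def proportion_below(value, mean, sd):
    """The 'less than' proportion, as read from a standard normal table."""
    return NormalDist().cdf(z_score(value, mean, sd))

def proportion_above(value, mean, sd):
    """The 'greater than' proportion: 1 minus the 'less than' proportion."""
    return 1 - proportion_below(value, mean, sd)

# Hypothetical scores that follow a normal curve with mean 100 and SD 15.
print(z_score(130, 100, 15))                      # 2.0
print(round(proportion_below(130, 100, 15), 4))   # 0.9772
print(round(proportion_above(130, 100, 15), 4))   # 0.0228
```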
Term
| 68-95-99.7 rule |
|
Definition
| 68% of observations on a Normal distribution are within 1 SD of the mean. 95% are within 2 SD of the mean. 99.7% are within 3 SD of the mean. |
|
|
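These three percentages are rounded values that can be reproduced from the standard normal curve. A small check, with the helper name `within` chosen only for this sketch:

```python
from statistics import NormalDist

def within(k):
    """Proportion of a normal distribution within k SD of the mean."""
    z = NormalDist()              # standard normal: mean 0, SD 1
    return z.cdf(k) - z.cdf(-k)

for k in (1, 2, 3):
    print(k, round(within(k), 4))
```

The results are about 0.6827, 0.9545, and 0.9973, which the rule rounds to 68%, 95%, and 99.7%.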
Term
| Explanatory variable. In Regression? |
|
Definition
| The variable we are assessing or testing in an experiment. In regression, this is x, the variable that does the predicting. |
|
|
Term
| response variable. In Regression? |
|
Definition
| The measurement we take to assess the effect of the explanatory variable. In regression, this is y, the variable we want to predict. |
|
|
Term
| Correlation |
|
Definition
| A measure of the strength and direction of the linear relationship between x and y. |
|
|
Term
| r |
|
Definition
| Symbol for the correlation coefficient. The sign of r is the sign of the slope. Always between -1 and 1. Values close to 0 mean little or no linear relationship. Values close to -1 are strong negative. Values close to +1 are strong positive. No unit of measure. Measures linearity, not just any relationship. Correlation between x & y = correlation between y & x. |
|
|
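Several of the properties listed for r can be verified numerically. The sketch below computes r from its usual definition; the function name `correlation` and the data are invented for illustration.

```python
from math import sqrt

def correlation(x, y):
    """Correlation coefficient r between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]    # made-up data
y = [2, 4, 5, 4, 5]

print(round(correlation(x, y), 4))                     # 0.7746
print(round(correlation(y, x), 4))                     # same value: r is symmetric
print(round(correlation([10 * a for a in x], y), 4))   # same value: r has no units
```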
Term
| Least Squares Regression Line |
|
Definition
| The line obtained by MINIMIZING the SUM of the Squared RESIDUALS. |
|
|
Term
| Why do we use regression equations? |
|
Definition
| Used to model relationships between quantitative variables and also for prediction. |
|
|
Term
| How to make a prediction using the least squares regression line. |
|
Definition
| Plug the given value for x into the equation and solve for y. If you are not given an equation, build one from the regression output: use the 1st number in the 1st row as the y-intercept and the 1st number in the 2nd row as the slope. |
|
|
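Fitting the least squares line and using it for a prediction can be sketched as follows. The formulas are the standard closed-form solution for one explanatory variable; the function names `least_squares` and `predict` and the data are hypothetical.

```python
def least_squares(x, y):
    """Return (intercept, slope) of the line minimizing squared residuals."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x   # line passes through (mean_x, mean_y)
    return intercept, slope

def predict(intercept, slope, x_new):
    """Plug x_new into y-hat = intercept + slope * x."""
    return intercept + slope * x_new

x = [1, 2, 3, 4, 5]    # made-up data
y = [2, 4, 5, 4, 5]
intercept, slope = least_squares(x, y)
print(round(intercept, 2), round(slope, 2))      # 2.2 0.6
print(round(predict(intercept, slope, 3.5), 2))  # 4.3
```

The prediction uses x = 3.5, which lies inside the observed range of 1 to 5.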
Term
| Residual |
|
Definition
| Observed y - Predicted y. |
|
|
Term
| How do you interpret slope |
|
Definition
| Slope tells us the average increase (or decrease, if the slope is negative) in y for every one-unit increase in x. |
|
|
Term
| R-squared |
|
Definition
| Tells us the percentage of the total variation in the y's that can be explained by the x's (the regression equation). |
|
|
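Residuals and the proportion of variation explained can be computed directly from their definitions. In this sketch the line y-hat = 2.2 + 0.6x is assumed to be the least squares line already fitted to the made-up data, and the function names are hypothetical.

```python
def residuals(x, y, intercept, slope):
    """Observed y minus predicted y for every data point."""
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

def r_squared(y, resid):
    """Share of the total variation in y explained by the regression line."""
    mean_y = sum(y) / len(y)
    sst = sum((yi - mean_y) ** 2 for yi in y)   # total variation in y
    sse = sum(e ** 2 for e in resid)            # variation left unexplained
    return 1 - sse / sst

x = [1, 2, 3, 4, 5]    # made-up data
y = [2, 4, 5, 4, 5]
resid = residuals(x, y, intercept=2.2, slope=0.6)
print([round(e, 1) for e in resid])    # [-0.8, 0.6, 1.0, -0.6, -0.2]
print(round(r_squared(y, resid), 2))   # 0.6
```

An R-squared of 0.6 says the line explains 60% of the variation in y; for a least squares line it equals the square of the correlation r.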
Term
| How do you interpret residual plots? |
|
Definition
| Uniform scatter (shoe box) means everything is OK. Outliers mean Normality is violated. A megaphone shape means equal variance is violated. A smile or frown means the relationship is not linear; it's curved. |
|
|
Term
| What does standard deviation in regression output mean? |
|
Definition
| The "s" in regression output measures the standard deviation of the y's about the regression line. |
|
|
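For a line with one explanatory variable, the "s" reported in regression output is conventionally computed from the residuals with n - 2 in the denominator. A minimal sketch under that assumption, with a hypothetical function name and made-up residuals:

```python
from math import sqrt

def regression_s(resid):
    """s = sqrt(SSE / (n - 2)): typical size of a residual about the line."""
    n = len(resid)
    sse = sum(e ** 2 for e in resid)   # sum of squared residuals
    return sqrt(sse / (n - 2))

# Hypothetical residuals (observed y minus predicted y) from a fitted line.
resid = [-0.8, 0.6, 1.0, -0.6, -0.2]
print(round(regression_s(resid), 4))   # 0.8944
```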
Term
| Why is extrapolation bad? |
|
Definition
| Extrapolation is using an x value outside the range of the observed x's to predict y. Bad because the relationship outside the window of observation may be totally different from the relationship observed inside it. |
|
|
Term
| Lurking variable |
|
Definition
| A variable that affects the relationship between the response and the explanatory variable, but is not part of the study. Bad because lurking variables can suggest relationships that don't really exist. |
|
|
Term
| Why don't we say that a clear association between 2 variables establishes causation? |
|
Definition
| Because a lurking variable may be responsible for the association. |
|
|
Term
| How do we establish causation? |
|
Definition
| With experimentation so that lurking variables can be controlled by randomization. |
|
|
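Randomization means letting chance, not the researcher, decide which units get which treatment, so that lurking variables tend to balance out across groups. A minimal sketch of random assignment to two groups, with hypothetical unit labels and function name:

```python
import random

def randomize(units, rng):
    """Randomly split units into a treatment group and a control group."""
    shuffled = list(units)
    rng.shuffle(shuffled)          # chance decides the order
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

rng = random.Random(2024)          # seeded only so the example is repeatable
subjects = [f"subject_{i}" for i in range(1, 11)]   # made-up labels
treatment, control = randomize(subjects, rng)
print(treatment)
print(control)
```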
Term
| What is a marginal distribution for categorical data? |
|
Definition
| Take the row (or column) totals and divide them by the table total. (Large) |
|
|
Term
| What is a Conditional distribution for categorical data? |
|
Definition
| Obtained by dividing the cell counts in a row (or column) by that row (or column) total. (Small) |
|
|
Term
| What happens if all conditional distributions equal the corresponding marginal? |
|
Definition
| The row variable and the column variable are NOT related. |
|
|
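The marginal and conditional calculations, and the comparison between them, can be sketched for a two-way table stored as a list of rows. The function names and both tables of counts are made up for illustration.

```python
def marginal_columns(table):
    """Column totals divided by the table total."""
    total = sum(sum(row) for row in table)
    return [sum(col) / total for col in zip(*table)]

def marginal_rows(table):
    """Row totals divided by the table total."""
    total = sum(sum(row) for row in table)
    return [sum(row) / total for row in table]

def conditional_by_row(table):
    """Cell counts in each row divided by that row's total."""
    return [[cell / sum(row) for cell in row] for row in table]

related = [[20, 30], [40, 10]]      # hypothetical counts
unrelated = [[30, 20], [60, 40]]    # hypothetical counts

print(marginal_columns(related), conditional_by_row(related))
print(marginal_columns(unrelated), conditional_by_row(unrelated))
```

In the second table every row's conditional distribution (0.6, 0.4) matches the column marginal (0.6, 0.4), so the row and column variables are not related. In the first table the conditionals (0.4, 0.6) and (0.8, 0.2) differ from the marginal, which signals a relationship.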
Term
| What is a voluntary response sample? |
|
Definition
| A sample obtained by having responders contact you instead of you contacting the responders (Dear Abby, 900 numbers, Ross Perot & TV Guide). Not a probability sample. Many potential responders aren't motivated to respond. |
|
|
Term
| What is a convenience sample? |
|
Definition
| Taking a sample that is not random but is easily obtained (mall samples, classes on campus). Not a probability sample because many potential responders don't have a chance to be contacted. |
|
|
Term
| What is the population of interest? |
|
Definition
| The group of people the researcher wants info on. |
|
|
Term
| What is the response variable? |
|
Definition
| the observation recorded (measured) on each individual. |
|
|
Term
| Sample |
|
Definition
| The subgroup of individuals from the population about which the researcher actually obtains info. |
|
|
Term
| Parameter |
|
Definition
| Means and percentages for the population. |
|
|
Term
| Statistic |
|
Definition
| Means and percentages for the sample. |
|
|
Term
| What is bias? How is it eliminated? |
|
Definition
| The amount by which the sample systematically differs from what it should be. Eliminated by using probability samples, careful wording, etc. |
|
|
Term
| Simple random sample (SRS) |
|
Definition
| Random sampling from entire population |
|
|
Term
| What is a stratified sample? |
|
Definition
| Sampling from within groups of a population or sampling within different populations. |
|
|
Term
| What is a multistage sample? |
|
Definition
| First sampling groups, then sampling within those groups. |
|
|
Term
| What is a probability sample? |
|
Definition
| Multistage, Stratified, SRS. Every member of the population has a known, non-zero chance of being selected. |
|
|
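The three probability sampling designs named above can be contrasted in a short sketch built on `random.Random.sample`. The function names, group labels, and population are hypothetical, and real surveys usually choose stratum sample sizes more carefully than the equal sizes used here.

```python
import random

def srs(population, n, rng):
    """Simple random sample: n members drawn from the entire population."""
    return rng.sample(population, n)

def stratified(strata, n_per_stratum, rng):
    """Take an SRS within every group (stratum) of the population."""
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

def multistage(groups, n_groups, n_per_group, rng):
    """First sample groups, then sample within the chosen groups only."""
    chosen = rng.sample(sorted(groups), n_groups)
    return {name: rng.sample(groups[name], n_per_group) for name in chosen}

rng = random.Random(7)   # seeded only so the example is repeatable
# Made-up population: 3 dorms with 20 numbered students each.
dorms = {f"dorm_{d}": [f"{d}-{i}" for i in range(20)] for d in "ABC"}
everyone = [s for members in dorms.values() for s in members]

print(srs(everyone, 6, rng))
print(stratified(dorms, 2, rng))       # 2 students from every dorm
print(multistage(dorms, 2, 3, rng))    # 2 dorms, then 3 students in each
```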
Term
| What do you need to be aware of in sampling? |
|
Definition
| undercoverage, non-response bias, lying, wording of questions... |
|
|
Term
|
Definition
|
|
Term
| What is an observational study? |
|
Definition
| Studies where info is gathered on the population but no treatment is imposed on the subjects (power lines). |
|
|
Term
|
Definition
|
|
Term
| Are samples: observational studies or experiments? |
|
Definition
|
|
Term
| Can observational studies establish causation? |
|
Definition
| No. Lurking variables cannot be ruled out without a randomized experiment. |
|
|
Term
| What is the placebo effect? |
|
Definition
| A patient's response to any (even fake) treatment is the placebo effect. |
|
|
Term
| What is a control group? Why are they important? |
|
Definition
| The patients or group that receives the placebo or gets no treatment. It eliminates the effect of lurking variables. |
|
|
Term
| Replication |
|
Definition
| Having more than one experimental unit per treatment. Necessary to obtain an SD. The more units per treatment, the more precise our results. |
|
|
Term
| What do we need to be cautious of in experiments? |
|
Definition
| Hidden bias and lack of realism. We take care of hidden bias by treating every experimental unit identically and using a double-blind design. |
|
|
Term
| Double-blind experiment |
|
Definition
| Neither the subjects nor the doctor knows who is receiving the treatment and who is receiving the placebo. It removes bias! |
|
|
Term
| What is a matched pairs design? |
|
Definition
| Taking 2 measurements on each individual; the groups to be compared are related. If the order of treatments has to be randomized, it's a matched pairs design. |
|
|
Term
| Why do we like blocked designs of experiments? |
|
Definition
| More precise conclusions. Unwanted variation is removed from the standard error. Put another way, the variation associated with the blocking variable is removed from the error. |
|
|