Term
|
Definition
|
|
Term
|
Definition
Also known as the average: the sum of the values divided by the number of values. The most common descriptor of the center of the data. |
|
|
Term
|
Definition
Midpoint of the data set. Half the values lie above it and half lie below. |
|
|
Term
|
Definition
The most frequent value in a data set. |
|
|
Term
|
Definition
Difference between highest and lowest value. |
|
|
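The four descriptors above can be computed with Python's standard `statistics` module. This is a minimal sketch; the sample `data` is hypothetical and chosen only for illustration.

```python
import statistics

# Hypothetical sample, used only to illustrate the definitions.
data = [2, 3, 3, 5, 7, 10]

mean = statistics.mean(data)         # sum of the values / number of values
median = statistics.median(data)     # midpoint; averages the two middle values here
mode = statistics.mode(data)         # most frequent value
data_range = max(data) - min(data)   # highest value minus lowest value

print(mean, median, mode, data_range)
```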
Term
|
Definition
A marked split in the values. Each bundle of values on either side of the split is called a cluster. |
|
|
Term
|
Definition
Number of times a value appears in the data. |
|
|
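Counting how often each value appears can be sketched with `collections.Counter`. The `data` list is a hypothetical example, and the relative frequencies are an extra step added here for illustration.

```python
from collections import Counter

# Hypothetical categorical data.
data = ["red", "blue", "red", "green", "red", "blue"]

# Frequency: number of times each value appears in the data.
frequency = Counter(data)

# Relative frequency: each count as a share of all values.
relative = {value: count / len(data) for value, count in frequency.items()}

print(frequency.most_common())
print(relative)
```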
Term
|
Definition
Division of ordered data into four sets, each containing the same number of values (a quarter of the data). Used in boxplots. |
|
|
Term
|
Definition
The difference between the third and first quartiles (Q3 - Q1) of a data set. |
|
|
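A rough sketch of finding the quartile cut points and the interquartile range with `statistics.quantiles`. Textbooks and software use slightly different quartile conventions; the `method="inclusive"` choice and the sample `data` are assumptions made here, so results may differ a little from a hand calculation.

```python
import statistics

# Hypothetical sample, already in order.
data = [1, 3, 4, 6, 7, 9, 10, 12]

# Three cut points split the ordered data into four equal-frequency sets.
# "inclusive" is one of several quartile conventions (an assumption here).
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")

# Interquartile range: third quartile minus first quartile.
iqr = q3 - q1

print(q1, q2, q3, iqr)
```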
Term
|
Definition
How far the data are spread from the central value. Commonly measured by the standard deviation. |
|
|
Term
|
Definition
How strongly a value will impact the mean. For example, a very large or very small value will affect the mean more than a central value. |
|
|
Term
|
Definition
A measure of how distant values are from the center of the data. |
|
|
Term
Empirical Rule of Standard Deviation |
|
Definition
The rule states that in roughly bell-shaped data sets, about 68% of values fall within one SD of the mean, about 95% within two SDs, and about 99.7% within three. Also known as the "68-95-99.7 Rule". |
|
|
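The three percentages can be checked against an exact normal (bell-shaped) curve. This sketch assumes a perfectly normal distribution, which real data only approximate, and uses `statistics.NormalDist` from the standard library.

```python
from statistics import NormalDist

# Standard normal curve: mean 0, standard deviation 1.
standard_normal = NormalDist()

# Share of values within k standard deviations of the mean, for k = 1, 2, 3.
coverage = {
    k: standard_normal.cdf(k) - standard_normal.cdf(-k)
    for k in (1, 2, 3)
}

for k, share in coverage.items():
    print(f"within {k} SD of the mean: {share:.1%}")
```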
Term
|
Definition
Also known as the standard score. The number of standard deviations a value lies from the mean. |
|
|
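A small sketch of computing a standard score. It assumes the sample standard deviation (divisor n - 1, as `statistics.stdev` uses); the `data` values and the helper name `z_score` are hypothetical.

```python
import statistics

# Hypothetical sample.
data = [4, 8, 6, 5, 3, 7, 9]

mean = statistics.mean(data)
sd = statistics.stdev(data)  # sample standard deviation (divides by n - 1)

def z_score(value):
    """Number of standard deviations that value lies from the mean."""
    return (value - mean) / sd

print(z_score(9))   # positive: above the mean
print(z_score(3))   # negative: below the mean
```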
Term
|
Definition
The percent of values that are lower than an examined value. |
|
|
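Following the definition above (percent of values strictly lower than the examined value), a minimal sketch might look like this. The function name `percentile_rank` and the scores are hypothetical; some textbooks also count ties, which this sketch does not.

```python
def percentile_rank(data, value):
    """Percent of values in data that are lower than value."""
    below = sum(1 for x in data if x < value)
    return 100 * below / len(data)

# Hypothetical test scores.
scores = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]

rank = percentile_rank(scores, 85)
print(rank)
```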
Term
|
Definition
Misuse of statistics in which the height of a histogram is correctly represented but the area is not. |
|
|
Term
|
Definition
This occurs when a conclusion based on individual groups of data is contradicted when the groups are combined. |
|
|
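A made-up numerical illustration of this reversal: option A has the higher success rate inside each group, yet option B has the higher rate once the groups are combined. All counts below are hypothetical and chosen only to produce the effect.

```python
# (successes, trials) per group; all numbers are hypothetical.
groups = {
    "group 1": {"A": (9, 10), "B": (80, 100)},
    "group 2": {"A": (30, 100), "B": (2, 10)},
}

def rate(successes, trials):
    return successes / trials

# Within each group, A beats B.
for name, results in groups.items():
    print(name, "A:", rate(*results["A"]), "B:", rate(*results["B"]))

# Combine the groups by adding successes and trials.
combined = {}
for option in ("A", "B"):
    successes = sum(results[option][0] for results in groups.values())
    trials = sum(results[option][1] for results in groups.values())
    combined[option] = rate(successes, trials)

# Combined, B beats A: the conclusion reverses.
print("combined", combined)
```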
Term
|
Definition
|
|
Term
|
Definition
Fitting a mathematical expression to explain a paired data set. |
|
|
Term
|
Definition
If the explanatory and response variables increase and decrease together. |
|
|
Term
|
Definition
If the explanatory and response variables move inversely: as one increases, the other decreases. |
|
|
Term
Linear Correlation Coefficient |
|
Definition
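Also known as "r". A measure of the strength and direction of the linear relationship in paired data, ranging from -1 to 1. Values near -1 or 1 mean the data lie close to a straight line (a linear regression equation). |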
|
|
Term
|
Definition
1) Standardize both variables (value minus mean, divided by SD). 2) Multiply each standardized x value by its corresponding standardized y value. 3) Divide the sum of those products by the number of pairs minus one. |
|
|
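The three steps above can be sketched as follows. The sketch assumes the sample standard deviation (divisor n - 1), which matches the "minus one" in step 3; the function name `correlation` and the data are hypothetical.

```python
import statistics

def correlation(xs, ys):
    """Linear correlation coefficient r, computed from standardized values."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    sd_x, sd_y = statistics.stdev(xs), statistics.stdev(ys)

    # Step 1: standardize each variable (value minus mean, divided by SD).
    zx = [(x - mean_x) / sd_x for x in xs]
    zy = [(y - mean_y) / sd_y for y in ys]

    # Step 2: multiply each standardized x by its paired standardized y.
    products = [a * b for a, b in zip(zx, zy)]

    # Step 3: divide the sum of the products by the number of pairs minus one.
    return sum(products) / (len(xs) - 1)

# Hypothetical paired data: a perfect positive and a perfect negative line.
xs = [1, 2, 3, 4, 5]
r_up = correlation(xs, [2, 4, 6, 8, 10])
r_down = correlation(xs, [10, 8, 6, 4, 2])
print(r_up, r_down)
```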
Term
|
Definition
The difference between the observed value and the value predicted by the regression (observed minus predicted). |
|
|
Term
|
Definition
AKA "SSE". Sum of the squared residuals in a set. A measure to compare how well a regression fits the data. |
|
|
Term
Least Squares Regression Line |
|
Definition
The line with the lowest SSE. This means it is the best possible linear fit for the data set. |
|
|
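A minimal sketch of fitting this line by hand and comparing its SSE with another candidate line. The slope formula used here (sum of products of deviations divided by the sum of squared x deviations) is the standard closed form for simple linear regression; the data and the helper name `sse_for` are hypothetical.

```python
import statistics

# Hypothetical paired data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)

# Closed-form least squares slope and intercept.
slope = (
    sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    / sum((x - mean_x) ** 2 for x in xs)
)
intercept = mean_y - slope * mean_x

def sse_for(candidate_slope, candidate_intercept):
    """Sum of squared residuals (observed minus predicted) for a given line."""
    return sum(
        (y - (candidate_intercept + candidate_slope * x)) ** 2
        for x, y in zip(xs, ys)
    )

best_sse = sse_for(slope, intercept)
other_sse = sse_for(0.5, 2.5)  # an arbitrary competing line

print(slope, intercept, best_sse, other_sse)
```

Any other line tried in place of `(0.5, 2.5)` should give an SSE at least as large as `best_sse`.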
Term
|
Definition
If the explanatory variable is shown to affect the response variable. Be sure there is no lurking variable that better explains the correlation. |
|
|
Term
|
Definition
The difference between a value and the center (mean) of the data set. |
|
|
Term
|
Definition
The part of a value's total deviation that can be attributed to the explanatory variable: the difference between the value predicted by the regression and the mean of the response variable. |
|
|
Term
|
Definition
The difference between the total deviation and the explained deviation, i.e. the observed value minus the predicted value (the residual). |
|
|
Term
Residual Standard Deviation |
|
Definition
Standard deviation calculated from the residuals, i.e. the deviations between observed and expected values. |
|
|
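The deviation cards above fit together as total = explained + unexplained for every point. The sketch below checks that on hypothetical data and then computes a residual standard deviation. Dividing the SSE by n - 2 is the usual convention for a line with one explanatory variable; that divisor is an assumption here, since the cards do not state it.

```python
import math
import statistics

# Hypothetical paired data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
slope = (
    sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    / sum((x - mean_x) ** 2 for x in xs)
)
intercept = mean_y - slope * mean_x

rows = []
for x, y in zip(xs, ys):
    predicted = intercept + slope * x
    total = y - mean_y            # observed value minus the mean
    explained = predicted - mean_y  # predicted value minus the mean
    unexplained = y - predicted   # the residual
    rows.append((total, explained, unexplained))

# Residual standard deviation, assuming the n - 2 divisor.
sse = sum(unexplained ** 2 for _, _, unexplained in rows)
residual_sd = math.sqrt(sse / (len(xs) - 2))

print(rows)
print(residual_sd)
```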
Term
|
Definition
|
|
Term
Explanatory/Independent Variable |
|
Definition
Variable that explains a correlation, usually on the x-axis. |
|
|
Term
Response/Dependent Variable |
|
Definition
Variable that results from a correlation, usually on the y-axis. |
|
|
Term
Numerical/Quantitative Variable |
|
Definition
Variable that a number defines. (IQ, height, time, etc) |
|
|
Term
Categorical/Qualitative Variable |
|
Definition
Variable that a word or category describes. (eye color, major, name, etc) |
|
|
Term
|
Definition
Variable that can be anything within a range of values. (GPA, weight, etc) |
|
|
Term
|
Definition
Variable that is one of some number of set values. (siblings, shoe size, etc) |
|
|
Term
|
Definition
Categorical variable that has an inherent hierarchy of value. (Grades, business rating, military rank, etc) |
|
|
Term
|
Definition
Variable that is not immediately obvious that may lead to incorrect conclusions. |
|
|
Term
|
Definition
A value that is far removed from the rest of the data. Should only be removed if it is a mistake. |
|
|
Term
|
Definition
Written with a hat on top (for example, ŷ). Means the value expected based on a regression. |
|
|
Term
|
Definition
Extreme outlier that significantly changes the line of regression. |
|
|
Term
|
Definition
Frequency of values is greatest near the median and least at the extremes of the range. |
|
|
Term
|
Definition
The frequency of values is consistent across the entire range. |
|
|
Term
|
Definition
Frequency of values is least near the median and greatest at the extremes of the range. |
|
|
Term
|
Definition
Data is almost mirrored on each side of the central value. |
|
|
Term
Right Skewed Distribution |
|
Definition
Data is more frequent in lower values. (Long tail to the right) |
|
|
Term
|
Definition
Data is more frequent in higher values. (Long tail to the left) |
|
|
Term
|
Definition
|
|
Term
|
Definition
Graph that uses stacked dots to show frequency. (Most useful for small ranges with repeated values) |
|
|
Term
|
Definition
Table that groups data by the tens place (the stem) and lists each value's ones digit (the leaf) beside it. |
|
|
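A small sketch of building such a table for two-digit whole numbers, splitting each value into its tens place and ones digit. The function name `stem_and_leaf` and the values are hypothetical.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group whole numbers by tens place; the ones digits are listed in order."""
    table = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)  # tens place, ones digit
        table[stem].append(leaf)
    return dict(table)

# Hypothetical data.
table = stem_and_leaf([27, 12, 34, 15, 21, 27])

for stem, leaves in table.items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
```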
Term
|
Definition
Useful when one wants to call attention to the relative frequency of variables. |
|
|
Term
|
Definition
Common bar graph. Each bar represents a value and its height represents that value's frequency. All bars are the same width. |
|
|
Term
|
Definition
Special bar graph in which unranked categorical variables are listed from left to right in order of frequency. |
|
|
Term
|
Definition
A graph that uses bar widths, which may be uneven, to represent ranges of values and the area of each bar to represent those values' frequency. |
|
|
Term
Steps of Drawing a Histogram |
|
Definition
1) Calculate the percentage of values in each group. 2) Find the height of each bar by dividing its percentage by its width. 3) Draw the histogram. |
|
|
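The first two steps can be sketched numerically, assuming each bar's height is its percentage divided by its width so that the bar's area equals the percentage. The bins and counts below are hypothetical, and the drawing step is left out.

```python
# Each bin is (lower bound, upper bound, count); all numbers are hypothetical.
bins = [(0, 10, 5), (10, 20, 15), (20, 50, 30)]

total = sum(count for _, _, count in bins)

bars = []
for low, high, count in bins:
    width = high - low
    percent = 100 * count / total   # step 1: percentage of values in the group
    height = percent / width        # step 2: height, in percent per unit of width
    bars.append({"range": (low, high), "percent": percent, "height": height})

for bar in bars:
    print(bar)
```

Note that the widest bin holds the most values but is not the tallest bar, because its percentage is spread over a wider range.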
Term
|
Definition
A graph that represents the data with four ranges called quartiles, each containing an equal share of the values. |
|
|
Term
|
Definition
If a value lies more than three times the interquartile range (IQR) below the first quartile or above the third quartile, it's an outlier. If it lies between 1.5 and three times the IQR beyond those quartiles, it's a potential outlier. |
|
|
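The rule above can be sketched as a small classifier using the thresholds stated on the card (1.5 and three times the IQR). The function name `classify` and the quartile values are hypothetical.

```python
def classify(value, q1, q3):
    """Apply the 1.5 x IQR and 3 x IQR thresholds to a single value."""
    iqr = q3 - q1
    # Distance beyond the nearer quartile; zero if the value lies between them.
    distance = max(q1 - value, value - q3, 0)
    if distance > 3 * iqr:
        return "outlier"
    if distance > 1.5 * iqr:
        return "potential outlier"
    return "not an outlier"

# Hypothetical quartiles: Q1 = 10, Q3 = 20, so the IQR is 10.
for value in (55, 40, 25, -10):
    print(value, classify(value, q1=10, q3=20))
```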
Term
|
Definition
When the data of two dependent variables is represented on the same graph. |
|
|
Term
|
Definition
A graph of paired data represented by points. |
|
|
Term
|
Definition
A graph of the residuals against the explanatory variable, which flattens the regression line to a horizontal line (slope of zero). Helps determine if points are evenly distributed above and below the line. |
|
|