Term
Back-to-back stemplot |
|
Definition
A common set of stems is placed in the middle of the display, with leaves branching out to the left and to the right. |
|
|
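A minimal Python sketch of the layout, assuming non-negative integer data where the stem is the tens digit and the leaf is the units digit. The function name `back_to_back_stemplot` and the sample values are hypothetical.

```python
from collections import defaultdict


def back_to_back_stemplot(left, right):
    """Build rows of the form 'left leaves | stem | right leaves'.

    Assumes non-negative integers; stem = tens digit, leaf = units digit.
    """
    left_leaves = defaultdict(list)
    right_leaves = defaultdict(list)
    for x in left:
        left_leaves[x // 10].append(x % 10)
    for x in right:
        right_leaves[x // 10].append(x % 10)

    all_stems = list(left_leaves) + list(right_leaves)
    rows = []
    # One shared stem column, including any empty stems between min and max.
    for stem in range(min(all_stems), max(all_stems) + 1):
        # Left leaves grow outward from the stem, so they are written in reverse order.
        left_text = "".join(str(d) for d in sorted(left_leaves[stem], reverse=True))
        right_text = "".join(str(d) for d in sorted(right_leaves[stem]))
        rows.append(f"{left_text:>8} | {stem} | {right_text}")
    return rows


# Hypothetical data for two groups
for row in back_to_back_stemplot([12, 15, 21, 33], [13, 22, 28, 30]):
    print(row)
```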
Term
Histogram |
|
Definition
A graphical display similar to a dotplot or stemplot, but more practical for very large datasets (more flexible than a stemplot). |
|
|
Term
How to construct a histogram |
|
Definition
1) Divide the range of the data into bins of equal width. 2) Count the number of observational units in each bin. 3) Draw bars whose heights correspond to the frequency or relative frequency of each bin. |
|
|
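The three steps can be sketched in Python as below. This is an illustration, not a standard routine: the function names, the choice that bins are half-open except the last one (so the maximum is counted), and the sample scores are all assumptions.

```python
def histogram_counts(data, start, width, n_bins):
    """Step 2: count observations in n_bins equal-width bins starting at `start`.

    Bin k covers [start + k*width, start + (k+1)*width); the last bin also
    includes its right endpoint so the maximum value is not lost.
    Values outside the overall range are ignored.
    """
    counts = [0] * n_bins
    end = start + n_bins * width
    for x in data:
        if x == end:
            k = n_bins - 1
        else:
            k = int((x - start) // width)
        if 0 <= k < n_bins:
            counts[k] += 1
    return counts


def text_histogram(data, start, width, n_bins):
    """Step 3: draw one bar of '#' characters per bin; bar length = frequency."""
    counts = histogram_counts(data, start, width, n_bins)
    return [f"{start + k * width:>4}-{start + (k + 1) * width:<4} {'#' * c}"
            for k, c in enumerate(counts)]


scores = [52, 58, 61, 64, 67, 70, 71, 75, 78, 83, 88, 95]  # hypothetical data
for line in text_histogram(scores, start=50, width=10, n_bins=5):
    print(line)
```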
Term
Bins |
|
Definition
Sub-intervals of equal length in a histogram |
|
|
Term
Frequency (Histogram) |
|
Definition
The number of observational units in each subinterval |
|
|
Term
Relative Frequency (Histogram) |
|
Definition
Proportions of observational units in the subintervals. |
|
|
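Converting frequencies to relative frequencies is a single division by the total number of observations. A small Python sketch, with a hypothetical function name and hypothetical bin counts:

```python
def relative_frequencies(frequencies):
    """Convert bin frequencies into proportions of all observations."""
    total = sum(frequencies)
    return [f / total for f in frequencies]


freqs = [2, 3, 4, 2, 1]  # hypothetical bin counts for 12 observations
rel = relative_frequencies(freqs)
print([round(p, 3) for p in rel])  # bar heights for a relative-frequency histogram
```

The proportions always add up to 1 (up to rounding), which is why a relative-frequency histogram has the same shape as the frequency histogram.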
Term
What is the difference between histograms and bar graphs? |
|
Definition
A histogram (and stem/dot plots) displays the distribution of a quantitative variable, and a bar graph displays the distribution of a categorical variable. |
|
|
Term
What direction is a skew? |
|
Definition
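Toward the longer tail: a distribution is skewed right if its right tail is longer, and skewed left if its left tail is longer. |
|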
|
Term
What do you have to describe when describing a distribution? |
|
Definition
Shape, center, spread, and unusual features (outliers, clusters, gaps), plus conclusions |
|
|
Term
The two common ways to measure the center of a distribution are... |
|
Definition
Mean and median |
|
|
Term
Mean |
|
Definition
Ordinary arithmetic average, the balance point of the distribution. (Pulled in the direction of the longer tail) |
|
|
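A short Python check of both claims in this card, using hypothetical right-skewed data: the deviations from the mean sum to zero (the balance point), and the long right tail pulls the mean above the middle value.

```python
def mean(values):
    """Ordinary arithmetic average."""
    return sum(values) / len(values)


data = [2, 3, 3, 4, 18]  # hypothetical values with a long right tail
m = mean(data)
deviations = [x - m for x in data]

print(m)                # 6.0
print(sum(deviations))  # 0.0: deviations above and below the mean balance out
print(sorted(data)[2])  # 3: the middle value sits below the mean, which the tail pulls up
```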
Term
Median |
|
Definition
Middle observation once the values are in order. |
|
|
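A minimal Python sketch. The card describes the odd case; for an even number of observations this uses the usual convention of averaging the two middle values.

```python
def median(values):
    """Middle value of the sorted data; with an even count, average the two middle values."""
    s = sorted(values)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2


print(median([7, 1, 3]))     # 3
print(median([4, 1, 3, 2]))  # 2.5
```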
Term
Notation for the mean |
|
Definition
μ (mu) for a population mean and x̄ (x-bar) for a sample mean. |
|
|
Term
Mode |
|
Definition
The most frequent value; sometimes used as a third measure of center. Often not useful, however, because the values might not repeat, or the mode may not be near the center of the distribution at all. Can be used with both categorical and quantitative variables. |
|
|
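Python's standard `statistics.multimode` (Python 3.8+) returns every most-frequent value, which illustrates both points in this card. The data below are hypothetical.

```python
from statistics import multimode

# Hypothetical categorical data: the mode works here even though a mean would not.
colors = ["red", "blue", "red", "green", "blue", "red"]
print(multimode(colors))        # ['red']

# Quantitative data where no value repeats: every value ties, so the mode says little.
print(multimode([1, 2, 3, 4]))  # [1, 2, 3, 4]
```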
Term
Resistant measure |
|
Definition
A measure whose value is relatively unaffected by the presence of outliers in a distribution. (Ex: Median, NOT mean) |
|
|
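A quick Python illustration with hypothetical data: replacing one value with an outlier moves the mean a lot but leaves the median unchanged.

```python
from statistics import mean, median

data = [10, 11, 12, 13, 14]
with_outlier = [10, 11, 12, 13, 140]  # same data, but the largest value is now an outlier

print(mean(data), median(data))                  # mean and median are both 12
print(mean(with_outlier), median(with_outlier))  # mean jumps to 37.2; median stays 12
```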
Term
Range |
|
Definition
The maximum minus the minimum: a very simple, but not particularly useful, measure of variability. When reporting the range or the IQR, give a single number (the difference), not the two endpoints. |
|
|
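In Python the range is a one-liner (hypothetical function name and data). The second call shows why it is not very useful: a single extreme value controls it.

```python
def data_range(values):
    """Range = maximum - minimum, reported as a single number."""
    return max(values) - min(values)


print(data_range([3, 9, 4, 1]))      # 8 (report 8, not "1 to 9")
print(data_range([3, 9, 4, 1, 50]))  # 49: one extreme value changes the range a lot
```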
Term
Mean absolute deviation (MAD) |
|
Definition
An intuitively sensible measure of spread, but not widely used. Calculated as the average of the absolute differences between each data point and the mean. |
|
|
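The calculation above, sketched in Python with a hypothetical function name and sample:

```python
def mean_absolute_deviation(values):
    """Average distance of the data points from their mean."""
    m = sum(values) / len(values)
    return sum(abs(x - m) for x in values) / len(values)


# Mean is 5; absolute deviations are 3, 1, 1, 1, 0, 0, 2, 4 (total 12), so MAD = 12 / 8
print(mean_absolute_deviation([2, 4, 4, 4, 5, 5, 7, 9]))  # 1.5
```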
Term
Standard deviation |
|
Definition
The most widely used measure of variability for quantitative data: roughly, the typical distance of a data value from the mean of the sample. To compute a sample standard deviation, square each data point's difference from the mean, add up these squared differences, divide the sum by one less than the sample size (n - 1), and take the square root. |
|
|
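A Python sketch of the sample standard deviation, checked against the standard library's `statistics.stdev`, which also divides by n - 1. The function name and data are hypothetical. Note that the deviations are squared; using absolute values instead would not give the standard deviation.

```python
import math
import statistics


def sample_sd(values):
    """Sample standard deviation: square root of (sum of squared deviations) / (n - 1)."""
    n = len(values)
    m = sum(values) / n
    sum_sq = sum((x - m) ** 2 for x in values)  # squared, not absolute, deviations
    return math.sqrt(sum_sq / (n - 1))


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample
print(sample_sd(data))         # about 2.138
print(statistics.stdev(data))  # the standard library's sample SD agrees
```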
Term
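When there's an odd number of observations, do the quartiles include the median? |
|
Definition
No. Under the usual textbook method, the median is left out of both halves, and Q1 and Q3 are the medians of the lower and upper halves. (Calculators and software may use slightly different rules.) |
|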
|
Term
Common misconceptions about variability |
|
Definition
1) A bumpier histogram does not mean more variability. 2) Focus on the variability of the values along the horizontal axis, not on the frequencies (bar heights). 3) The number of distinct values does not necessarily indicate variability. |
|
|
Term
Empirical rule |
|
Definition
For MOUND-SHAPED (approximately normal) distributions, approximately 68% of observations fall within one standard deviation of the mean, approximately 95% fall within two standard deviations, and approximately 99.7% fall within three standard deviations. |
|
|
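One way to see the rule is to simulate a large mound-shaped dataset and count. This sketch assumes normally distributed data generated with `random.gauss`; the mean of 100, SD of 15, sample size, and seed are arbitrary choices.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable
# Simulated mound-shaped (normal) data; the mean 100 and SD 15 are arbitrary choices.
data = [random.gauss(100, 15) for _ in range(100_000)]
m = statistics.fmean(data)
s = statistics.stdev(data)


def share_within(k):
    """Proportion of observations within k standard deviations of the mean."""
    return sum(abs(x - m) <= k * s for x in data) / len(data)


for k in (1, 2, 3):
    print(k, round(share_within(k), 3))  # close to 0.68, 0.95, 0.997
```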
Term
z-score (standardization) |
|
Definition
Standardizing a value: subtract the mean from the value of interest, then divide by the standard deviation. The result indicates how many SDs above or below the mean that value falls. |
|
|
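The formula in Python, with a hypothetical exam (mean 70, SD 8) as the example:

```python
def z_score(x, mean, sd):
    """How many standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


# Hypothetical exam: mean 70, SD 8
print(z_score(86, 70, 8))  # 2.0 -> two SDs above the mean
print(z_score(62, 70, 8))  # -1.0 -> one SD below the mean
```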
Term
What are the measures of spread? |
|
Definition
Range, interquartile range, standard deviation |
|
|
Term
What calculations should you use for each type of shape? |
|
Definition
Skewed: median and IQR. Symmetric: mean and SD (median and IQR also work). |
|
|
Term
Five-number summary (FNS) |
|
Definition
The minimum, first quartile (Q1), median, third quartile (Q3), and maximum. Provides a quick and convenient description of where the four quarters of the data in a distribution fall. |
|
|
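A Python sketch of the five-number summary. It assumes the common textbook quartile method (Q1 and Q3 are the medians of the lower and upper halves, leaving out the overall median when n is odd); software packages may use other quartile rules and give slightly different values. The function names are hypothetical.

```python
def _median(sorted_vals):
    """Median of an already-sorted list."""
    n = len(sorted_vals)
    mid = n // 2
    if n % 2 == 1:
        return sorted_vals[mid]
    return (sorted_vals[mid - 1] + sorted_vals[mid]) / 2


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max); needs at least 2 values.

    Q1 and Q3 are the medians of the lower and upper halves; when n is odd,
    the overall median is left out of both halves.
    """
    s = sorted(values)
    n = len(s)
    lower = s[: n // 2]
    upper = s[(n + 1) // 2:]
    return s[0], _median(lower), _median(s), _median(upper), s[-1]


print(five_number_summary([1, 3, 5, 7, 9, 11, 13]))  # (1, 3, 7, 11, 13)
print(five_number_summary([1, 2, 3, 4, 5, 6, 7, 8]))  # (1, 2.5, 4.5, 6.5, 8)
```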
Term
Boxplot |
|
Definition
A display built from the five-number summary (FNS); useful for comparing distributions of a quantitative variable across two or more groups. Boxplots show only where percentages of the data fall, not the individual values or the sample size. |
|
|
Term
Modified boxplot |
|
Definition
Conveys additional information by treating outliers differently: mark each outlier with an asterisk, then extend each whisker only to the most extreme value that is NOT an outlier, rather than to the minimum or maximum. (Always make this kind of boxplot.) |
|
|
Term
How to check for outliers? |
|
Definition
Multiply the IQR by 1.5, then add this to the third quartile (Q3) and subtract it from the first quartile (Q1). Any value above Q3 + 1.5 × IQR or below Q1 - 1.5 × IQR is an outlier. |
|
|
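The 1.5 × IQR check, plus the whisker ends a modified boxplot would use, sketched in Python. It assumes the textbook quartile method (medians of the lower and upper halves, median excluded when n is odd); the function names and data are hypothetical.

```python
def _median(sorted_vals):
    """Median of an already-sorted list."""
    n = len(sorted_vals)
    mid = n // 2
    return sorted_vals[mid] if n % 2 else (sorted_vals[mid - 1] + sorted_vals[mid]) / 2


def outlier_check(values):
    """Apply the 1.5 x IQR rule; also return the whisker ends for a modified boxplot."""
    s = sorted(values)
    n = len(s)
    q1 = _median(s[: n // 2])          # median of the lower half
    q3 = _median(s[(n + 1) // 2:])     # median of the upper half
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    outliers = [x for x in s if x < low_fence or x > high_fence]
    non_outliers = [x for x in s if low_fence <= x <= high_fence]
    # Modified boxplot: whiskers stop at the most extreme values that are not outliers.
    whiskers = (non_outliers[0], non_outliers[-1])
    return low_fence, high_fence, outliers, whiskers


print(outlier_check([2, 4, 5, 5, 6, 7, 8, 30]))  # (0.0, 12.0, [30], (2, 8))
```

Here Q1 = 4.5 and Q3 = 7.5, so IQR = 3 and the fences are 0 and 12; only 30 is flagged, and the upper whisker stops at 8.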
Term
When comparing variables... |
|
Definition
Draw them on a common scale. |
|
|