Term
Four things used to describe a distribution |
|
Definition
(1) the center of the distribution
(2) the spread of the distribution
(3) the shape of the distribution
(4) any unusual features in the distribution |
|
|
Term
Qualitative (Categorical) Variables |
Definition
Variables whose measurements vary in name or kind only and cannot be ranked in any order of magnitude.
Commonly displayed with pie charts and bar graphs. |
|
|
Term
Stem-and-Leaf Plot |
Definition
Can be used to sort a large list of data.
More importantly, it can also be used to display the distribution of the data graphically so that the distribution can be described.
An ideal stem-and-leaf plot consists of 10-15 leaves. |
|
|
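A minimal sketch of how a stem-and-leaf plot both sorts the data and shows its distribution. The function name `stem_and_leaf` and the data are made up for illustration, and the sketch assumes non-negative whole numbers where the last digit is the leaf.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return the rows of a stem-and-leaf plot as text.

    Assumption: values are non-negative integers; the leaf is the last
    digit and the stem is everything in front of it.
    """
    groups = defaultdict(list)
    for v in sorted(values):          # sorting puts the leaves in order
        groups[v // 10].append(v % 10)
    rows = []
    # Keep empty stems so gaps in the distribution remain visible.
    for stem in range(min(groups), max(groups) + 1):
        leaves = "".join(str(d) for d in groups.get(stem, []))
        rows.append(f"{stem} | {leaves}")
    return rows

scores = [62, 75, 78, 81, 84, 84, 89, 93, 107]  # hypothetical data
plot = stem_and_leaf(scores)
print("\n".join(plot))
```

Because every leaf is an original digit, the raw data can be read back from the plot.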
Term
Extended Stem-and-Leaf Plot |
|
Definition
Split each stem into two parts. The first part uses a symbol to designate leaf digits 0 through 4, while the second part uses the symbol * to designate leaf digits 5 through 9. |
|
|
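The split described above can be sketched as follows. The `*` marker for leaves 5 through 9 comes from the card. The `.` marker for leaves 0 through 4 is only an assumption made for this example, since the card does not show that symbol. The function name and data are also hypothetical.

```python
def extended_stem_and_leaf(values, low_symbol=".", high_symbol="*"):
    """Return rows of an extended (split-stem) stem-and-leaf plot.

    Assumption: low_symbol "." is illustrative only; high_symbol "*"
    follows the definition. Values are non-negative integers.
    """
    rows = {}
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        symbol = low_symbol if leaf <= 4 else high_symbol
        # Sorted input means rows are created in display order.
        rows.setdefault((stem, symbol), []).append(leaf)
    return [
        f"{stem}{symbol} | {''.join(map(str, leaves))}"
        for (stem, symbol), leaves in rows.items()
    ]

data = [71, 73, 76, 79, 80, 82, 88]  # hypothetical data
extended = extended_stem_and_leaf(data)
print("\n".join(extended))
```

Splitting stems is mainly useful when an ordinary plot would put too many leaves on too few stems.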
Term
Back-to-Back Stem-and-Leaf Plots |
|
Definition
Use a common column of stems, with one distribution displayed to the right of the stems (as we have been doing) and one distribution displayed to the left of the stems. |
|
|
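As a rough illustration of the back-to-back layout, the sketch below prints two made-up data sets around a shared column of stems. The function name `back_to_back` is hypothetical, and the sketch assumes non-negative whole numbers.

```python
def back_to_back(left_values, right_values):
    """Return rows of a back-to-back stem-and-leaf plot (shared stems)."""
    def leaves_by_stem(values):
        out = {}
        for v in sorted(values):
            out.setdefault(v // 10, []).append(str(v % 10))
        return out

    left = leaves_by_stem(left_values)
    right = leaves_by_stem(right_values)
    stems = range(min(min(left), min(right)), max(max(left), max(right)) + 1)
    width = max(len(leaves) for leaves in left.values())
    rows = []
    for stem in stems:
        # Left leaves are reversed so the smallest leaf sits next to the stem.
        left_text = "".join(reversed(left.get(stem, []))).rjust(width)
        right_text = "".join(right.get(stem, []))
        rows.append(f"{left_text} | {stem} | {right_text}")
    return rows

group_a = [62, 68, 71, 75, 79]  # hypothetical data, shown on the left
group_b = [70, 74, 83, 85]      # hypothetical data, shown on the right
rows = back_to_back(group_a, group_b)
print("\n".join(rows))
```

Reading across a single row compares how many observations each group has on that stem.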
Term
Histograms |
Definition
are a second graphical technique for displaying quantitative data so that the distribution can be described. Unlike the stem-and-leaf plot, a histogram does not retain the original data.
We will only consider histograms with equal class widths. |
|
|
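A small sketch of counting observations into classes of equal width, which is the tabulation behind a histogram. The function name, class boundaries, and data are assumptions for illustration. Each class here includes its lower boundary and excludes its upper boundary.

```python
def histogram_counts(values, start, width, n_classes):
    """Count values into n_classes classes of equal width.

    Class i covers start + i*width up to (but not including)
    start + (i+1)*width. Values outside all classes are ignored.
    """
    counts = [0] * n_classes
    for v in values:
        index = int((v - start) // width)
        if 0 <= index < n_classes:
            counts[index] += 1
    return counts

data = [52, 57, 61, 64, 66, 68, 71, 73, 75, 88]  # hypothetical data
counts = histogram_counts(data, start=50, width=10, n_classes=4)
for i, count in enumerate(counts):
    lower = 50 + 10 * i
    # The label assumes whole-number data, e.g. "50-59".
    print(f"{lower}-{lower + 9}: {'#' * count}")
```

Only the class counts survive, which is why the original observations cannot be recovered from a histogram.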
Term
Symmetric Distribution |
Definition
has the right and left sides of the distribution being mirror images of each other. One type of symmetric distribution is a bell-shaped curve called a normal distribution |
|
|
Term
Skewed Left Distribution |
Definition
General bell shape, with a long tail to the left.
The mean (X̄) is less than the median (M). |
|
|
Term
Skewed Right Distribution |
|
Definition
General bell shape, with a long tail to the right.
The mean (X̄) is greater than the median (M). |
|
|
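The mean-versus-median comparison on the two skewed-distribution cards can be sketched as a quick check. This is only a rule of thumb, not a formal test of skewness, and the function name and data sets are made up.

```python
from statistics import mean, median

def skew_direction(values):
    """Suggest a skew direction by comparing the mean with the median."""
    x_bar, m = mean(values), median(values)
    if x_bar > m:
        return "right"   # long right tail pulls the mean above the median
    if x_bar < m:
        return "left"    # long left tail pulls the mean below the median
    return "none indicated"

long_right_tail = [1, 2, 2, 3, 3, 3, 4, 20]       # hypothetical data
long_left_tail = [1, 17, 18, 18, 19, 19, 19, 20]  # hypothetical data
print(skew_direction(long_right_tail))
print(skew_direction(long_left_tail))
```

A graph of the data should still be checked, since the comparison alone says nothing about peaks or gaps.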
Term
Bimodal Distribution |
Definition
A distribution with two significant peaks. |
|
|
Term
Trimodal Distribution |
Definition
A distribution with three significant peaks. |
|
|
Term
Unusual Features |
Definition
include features that make a distribution depart from a symmetric (normal) shape. These can include high concentrations of data, gaps in the distribution, and extreme values at the tails of the distribution (called outliers). |
|
|
Term
Outlier |
Definition
is an observation that stands out from the other observations (an extreme value) and that often creates skewed distributions. |
|
|
Term
Mean |
Definition
is the most often used measure of central location, and will be used in many of the inference procedures we will discuss later in the course.
The population mean is denoted by the Greek letter μ (read “mu”) and is the sum of all observations divided by the number of individuals in the population. This is (usually) an unknown parameter. |
|
|
Term
Sample Mean |
Definition
denoted by X̄ (read “X-bar”).
$\bar{X} = \frac{\sum x}{n} = \frac{x_1 + x_2 + x_3 + \dots + x_n}{n}$
The sample mean X̄ is a statistic.
The symbol Σ means to “sum” or “add” what follows.
The mean is highly influenced by outliers (extreme values). |
|
|
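A short sketch of the sample mean as "sum the observations, then divide by n", with one extreme value added to show how strongly it moves the result. The function name and data are hypothetical.

```python
def sample_mean(xs):
    """Sum of the observations divided by the number of observations."""
    return sum(xs) / len(xs)

data = [10, 12, 14, 16, 18]    # hypothetical sample
with_outlier = data + [100]    # same sample plus one extreme value

print(sample_mean(data))          # mean of the original sample
print(sample_mean(with_outlier))  # mean after adding the outlier
```

One added observation roughly doubles the mean here, which is what "highly influenced by outliers" looks like in practice.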
Term
Median |
Definition
is more resistant to outliers than the mean, and is the central value with half of the observations less than it and half of the observations greater than it.
The population median is usually denoted by the Greek letter η (read “eta”), and is estimated by the sample median, denoted by M. |
|
|
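A minimal sketch of finding the sample median M, using the common convention of averaging the two middle values when n is even. The function name and data are made up for illustration.

```python
def sample_median(xs):
    """Middle value of the sorted data (average of the two middle
    values when the number of observations is even)."""
    s = sorted(xs)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2

data = [10, 12, 14, 16, 18]    # hypothetical sample
with_outlier = data + [100]    # same sample plus one extreme value

print(sample_median(data))
print(sample_median(with_outlier))
```

Adding the extreme value shifts the median only slightly, which illustrates why it is called resistant.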
Term
Measures of Variability |
Definition
1. Range = maximum value - minimum value
The range is a measure of overall variation, not variation around a central value. The range will be heavily influenced by outliers
2. Standard Deviation |
|
|
Term
Standard Deviation |
Definition
is a measure of variability around the mean.
A deviation is the amount by which an observation differs from the mean: x − X̄.
The population standard deviation is denoted by σ (read “sigma”).
Since all subjects of the population are rarely known, the population standard deviation is usually unknown and must be estimated by the sample standard deviation, denoted s. |
|
|
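The card does not give a formula, so the sketch below assumes the usual textbook definition of the sample standard deviation: square each deviation from the mean, add them up, divide by n − 1, and take the square root. The function name and data are hypothetical.

```python
import math

def sample_std_dev(xs):
    """Sample standard deviation s, using the n - 1 divisor (assumed
    here as the standard textbook convention)."""
    n = len(xs)
    x_bar = sum(xs) / n
    squared_deviations = [(x - x_bar) ** 2 for x in xs]
    return math.sqrt(sum(squared_deviations) / (n - 1))

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample, mean is 5
s = sample_std_dev(data)
print(round(s, 3))
```

Squaring keeps positive and negative deviations from cancelling, and also gives large deviations extra weight, which is one reason s is sensitive to outliers.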
Term
Variance |
Definition
The population variance is denoted by σ² (read “sigma squared”). Since the entire population is usually unknown, the population variance is estimated using the sample variance s². |
|
|
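For reference, the standard textbook formulas that connect these symbols are written out below. The cards themselves do not state them, and N (the population size) is a symbol introduced here for illustration.

```latex
\sigma^2 = \frac{\sum (x - \mu)^2}{N},
\qquad
s^2 = \frac{\sum (x - \bar{X})^2}{n - 1},
\qquad
s = \sqrt{s^2}
```

The standard deviation is the square root of the variance, so it is expressed in the same units as the data.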
Term
Interquartile Range (IQR) |
|
Definition
Measures variability around the median. It is resistant to outliers and may be a better measure of spread than the standard deviation if the distribution is skewed.
Lower Quartile: observation with 25% of the data less than it and 75% of the data greater than it. Denoted as Q1.
Upper Quartile: observation with 75% of the data less than it and 25% of the data greater than it. Denoted as Q3.
IQR = Q3 - Q1. |
|
|
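A brief sketch of computing Q1, Q3, and the IQR with Python's `statistics.quantiles`. Textbooks and software use slightly different quartile conventions, so the values from a course's hand method may differ a little from the library's default ("exclusive") method used here. The function name `iqr` and the data are made up.

```python
from statistics import quantiles

def iqr(values):
    """Interquartile range, Q3 - Q1, using the library's default
    quartile method (other conventions can give different values)."""
    q1, _median, q3 = quantiles(values, n=4)
    return q3 - q1

data = [1, 3, 5, 7, 9, 11, 13]  # hypothetical sample
with_outlier = data + [100]     # same sample plus one extreme value

print(iqr(data), max(data) - min(data))
print(iqr(with_outlier), max(with_outlier) - min(with_outlier))
```

In this example the range jumps from 12 to 99 when the extreme value is added, while the IQR barely changes.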
Term
Measures of Spread |
Definition
Range, Standard Deviation, IQR |
|
|
Term
Shapes of Distributions |
Definition
Symmetric, Skewed Right, Skewed Left, Bimodal, Trimodal |
|
|