Term
Sample |
|
Definition
A countable subset of the population. A set of actual observations. When the population is uncountable, we draw a sample of observations from the population. |
|
|
Term
Population |
|
Definition
A complete set of events in which we are interested. Often uncountable, or infinite. |
|
|
Term
Statistics |
|
Definition
Numerical values summarizing sample data. |
|
|
Term
Parameters |
|
Definition
Numerical values summarizing population data. N.B. Statistics describe the SAMPLE data; parameters describe the POPULATION data. |
|
|
Term
Random Sample |
|
Definition
A sample in which every member of the population has an equal chance of inclusion in the sample. If a sample is truly random, then its statistics help us estimate the parameters of our population. |
|
|
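A minimal sketch of drawing a random sample, assuming a hypothetical population of the integers 1 to 100 and a fixed seed so the run is repeatable. `random.sample` draws without replacement, so every member has the same chance of inclusion.

```python
import random

# Hypothetical population: the integers 1..100 (an assumption for illustration).
population = list(range(1, 101))

rng = random.Random(42)               # fixed seed so the draw is repeatable
sample = rng.sample(population, 10)   # simple random sample, without replacement

# The statistic (sample mean) estimates the parameter (population mean).
sample_mean = sum(sample) / len(sample)
population_mean = sum(population) / len(population)

print("sample:", sorted(sample))
print("sample mean:", sample_mean, "population mean:", population_mean)
```

The sample mean will usually differ somewhat from the population mean; it is an estimate, not the parameter itself.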
Term
Measurement/Quantitative Data |
|
Definition
Data obtained by measuring objects or events. Uses some form of instrument for measuring the variable in question. |
|
|
Term
Categorical/Frequency/Count Data |
|
Definition
Statements that count the frequencies or totals for various categories. |
|
|
Term
Measurement |
|
Definition
Assigning numbers to objects |
|
|
Term
Scales of Measurement |
|
Definition
The characteristics of the relationship between objects and the numbers we assign them while measuring. There are four scales of measurement: (1) nominal scale, (2) ordinal scale, (3) interval scale, (4) ratio scale. |
|
|
Term
Nominal Scale |
|
Definition
Doesn't truly "scale" items in any dimension, but NAMES (labels) them. The numbers used have no real meaning other than differentiating between the items. ex: identifying football players by the numbers on their jerseys. |
|
|
Term
Ordinal Scale |
|
Definition
A scale where numbers are used to place the items in order along a continuum. ex: class ranking. N.B. The numbers strictly order the values, but do not indicate the size of the difference between them. We can know who came first and who came second, but not how far apart they actually were. |
|
|
Term
Interval Scale |
|
Definition
A scale in which differences between scale points represent legitimate values. Equal distances between scale points represent equal differences. Ex: the difference between 1 deg. and 11 deg. is the same as the difference between 12 deg. and 22 deg. But ratios between scale points are not meaningful, since the zero point is arbitrary: 40 degrees is not twice as hot as 20 degrees. |
|
|
Term
Ratio Scale |
|
Definition
Has a TRUE ZERO. Temperature in degrees Celsius or Fahrenheit is NOT a ratio scale, since 0 degrees is an arbitrary point about which numbers are assigned. The zero must represent some physical reality, ex: 0 km/hr. Ratios are meaningful! 40 km/hr IS twice as fast as 20 km/hr. |
|
|
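The interval/ratio distinction can be checked numerically. The sketch below assumes the degrees in the cards are Celsius and converts them to Fahrenheit: equal differences stay equal, but the 2:1 ratio does not survive the change of units. Speed, which has a true zero, keeps its ratio. The function name and the conversion to miles per hour are illustrative choices, not part of the cards.

```python
def celsius_to_fahrenheit(c):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return c * 9 / 5 + 32

# Interval property: equal differences remain equal after changing units.
diff_a = celsius_to_fahrenheit(11) - celsius_to_fahrenheit(1)
diff_b = celsius_to_fahrenheit(22) - celsius_to_fahrenheit(12)

# Ratios are NOT preserved, because the zero point is arbitrary.
ratio_celsius = 40 / 20
ratio_fahrenheit = celsius_to_fahrenheit(40) / celsius_to_fahrenheit(20)

# Speed has a true zero, so the ratio survives a change of units.
KM_PER_MILE = 1.609344
ratio_kmh = 40 / 20
ratio_mph = (40 / KM_PER_MILE) / (20 / KM_PER_MILE)

print(diff_a, diff_b)                   # the two differences agree
print(ratio_celsius, ratio_fahrenheit)  # the two ratios disagree
print(ratio_kmh, ratio_mph)             # the two ratios agree
```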
Term
Variables |
|
Definition
properties of objects or events that take on different values. |
|
|
Term
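Random Assignment |
|
Definition
Assigning participants to groups (conditions) at random, so that the groups are comparable before any treatment is applied. N.B. This is not the same as random sampling, which concerns how participants are drawn from the population. |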
|
|
Term
Sigma (Σ) Notation |
|
Definition
Indicates summation. Summation rules: Σ(X - Y) = ΣX - ΣY; Σ(CX) = CΣX, where C is a constant; Σ(X + Y) = ΣX + ΣY. |
|
|
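The three summation rules can be verified on any pair of equal-length lists. The values of X, Y, and C below are made up for illustration.

```python
# Made-up scores and constant, purely for illustration.
X = [2, 4, 6, 8]
Y = [1, 3, 5, 7]
C = 3

# Rule 1: sigma(X - Y) = sigma(X) - sigma(Y)
sum_of_differences = sum(x - y for x, y in zip(X, Y))
difference_of_sums = sum(X) - sum(Y)

# Rule 2: sigma(CX) = C * sigma(X)
sum_of_scaled = sum(C * x for x in X)
scaled_sum = C * sum(X)

# Rule 3: sigma(X + Y) = sigma(X) + sigma(Y)
sum_of_totals = sum(x + y for x, y in zip(X, Y))
total_of_sums = sum(X) + sum(Y)

print(sum_of_differences, difference_of_sums)
print(sum_of_scaled, scaled_sum)
print(sum_of_totals, total_of_sums)
```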
Term
Frequency Distribution |
|
Definition
A distribution in which values of the dependent variable are tabled or plotted against the frequency with which they occurred. |
|
|
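A frequency distribution can be tabled with `collections.Counter`. The scores below are invented for the example.

```python
from collections import Counter

# Invented scores on some dependent variable.
scores = [3, 5, 4, 3, 5, 5, 2, 4, 5, 3]

# Count how often each value occurred.
frequency = Counter(scores)

# Print the table in order of score value.
for value in sorted(frequency):
    print(value, frequency[value])
```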
Term
Stem-and-Leaf Display |
|
Definition
A graphical display of a frequency distribution in a histogram-like form. Preserves the actual values obtained, and visually represents the frequency with which they occurred. |
|
|
Term
EDA: Exploratory Data Analysis |
|
Definition
A set of techniques developed to present data in visually meaningful ways. ex. a histogram. |
|
|
Term
Leading/Most Significant digits |
|
Definition
leftmost digits of a number |
|
|
Term
Stem |
|
Definition
The vertical axis of a stem and leaf display - given by the leading digits. |
|
|
Term
Trailing/Less Significant digits |
|
Definition
digits to the right of the leading digits |
|
|
Term
Leaves |
|
Definition
horizontal axis of a stem and leaf display: contains the trailing or less significant digits for each leading digit on the stem. |
|
|
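A small sketch of how stems and leaves fit together, assuming two-digit whole-number scores so that the tens digit is the leading digit (stem) and the units digit is the trailing digit (leaf). The function name and data are hypothetical.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group whole numbers by tens digit (stem); units digits are the leaves."""
    display = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)   # leading digit(s), trailing digit
        display[stem].append(leaf)
    return dict(display)

scores = [12, 15, 21, 23, 23, 28, 34, 37, 41]
display = stem_and_leaf(scores)

# Stems run down the vertical axis; leaves run across each row.
for stem in sorted(display):
    print(stem, "|", "".join(str(leaf) for leaf in display[stem]))
```

Each row's length shows the frequency for that stem, while the individual leaves preserve the actual values.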
Term
Histogram |
|
Definition
A type of graph that accumulates adjacent values into intervals, and plots the intervals as rectangles whose heights reflect the frequency of observation. |
|
|
Term
Real Limits (Upper and Lower) |
|
Definition
Lowest and highest possible values which could be classified as belonging to a given interval. ex: Intervals: (1) 25-29, (2) 30-34, (3) 35-39. The real lower limit of interval (2) is 29.5 (halfway between 29 and 30), and 34.5 is the real upper limit. |
|
|
Term
Midpoint |
|
Definition
The center of the interval - average of the upper and lower limits. |
|
|
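Real limits and midpoints follow directly from the stated interval limits. The sketch below assumes scores are recorded to the nearest whole number, so the real limits lie half a unit beyond the stated limits; the function names are hypothetical.

```python
def real_limits(lower, upper, unit=1):
    """Real lower and upper limits, assuming scores are rounded to `unit`."""
    half = unit / 2
    return lower - half, upper + half

def midpoint(lower, upper):
    """Center of the interval: the average of its lower and upper limits."""
    return (lower + upper) / 2

# The interval 30-34 used in the Real Limits card.
real_lower, real_upper = real_limits(30, 34)
mid = midpoint(30, 34)

print(real_lower, real_upper, mid)
```

Averaging the real limits (29.5 and 34.5) gives the same midpoint as averaging the stated limits (30 and 34).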
Term
Outlier |
|
Definition
Extreme point that stands quite removed from the rest of the data. Often due to error. |
|
|
Term
Symmetric Distribution |
|
Definition
distribution having the same shape about both sides of the center |
|
|
Term
Bimodal Distribution |
|
Definition
A distribution having two distinct peaks |
|
|
Term
Unimodal Distribution |
|
Definition
A distribution having a single peak |
|
|
Term
Modality |
|
Definition
describes the number of meaningful peaks in a distribution |
|
|
Term
Skewness |
|
Definition
A measure of the degree to which a distribution is asymmetrical. Positively skewed: the distribution trails off to the right of the peak. Negatively skewed: the distribution trails off to the left of the peak. |
|
|
Term
|
Definition
CUMULATIVE frequency counting as you move across intervals from the outside in. (Cumulatively add frequencies, beginning at both ends, until the summations meet in the middle.) |
|
|
Term
Measures of Central Tendency |
|
Definition
various statistical measures used to describe where the middle of a data distribution lies |
|
|
Term
Mode |
|
Definition
The most commonly occurring score, or the most populous interval. If two adjacent values share the greatest frequency, we average the two. If two nonadjacent values share the greatest frequency, the data are said to be bimodal. |
|
|
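The mode rules above can be sketched as follows. The function name is hypothetical, and "adjacent" is taken here to mean neighbouring values among the scores actually observed, which is an assumption about the card's intent.

```python
from collections import Counter

def describe_mode(scores):
    """Return the mode, the average of two adjacent tied modes,
    or a list of the tied values when they are not adjacent."""
    counts = Counter(scores)
    top = max(counts.values())
    tied = sorted(v for v, c in counts.items() if c == top)
    if len(tied) == 1:
        return tied[0]
    distinct = sorted(counts)
    # Two tied values that are neighbours in the ordered list of observed values.
    if len(tied) == 2 and distinct.index(tied[1]) - distinct.index(tied[0]) == 1:
        return (tied[0] + tied[1]) / 2
    return tied

single = describe_mode([1, 2, 2, 3, 3, 3, 4])       # one clear mode
adjacent = describe_mode([4, 4, 5, 5, 6])           # adjacent tie: averaged
bimodal = describe_mode([1, 1, 1, 2, 3, 5, 5, 5])   # nonadjacent tie: bimodal

print(single, adjacent, bimodal)
```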
Term
Median |
|
Definition
The score corresponding to the point having 50% of the observations below it, and 50% above, when displayed in numeric order. |
|
|
Term
Median Location |
|
Definition
Describes where in an ordered series the median lies: median location = (N+1)/2. ex: if N=83, then the median location is 42, meaning the median is the 42nd number in the ordered series. |
|
|
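A short sketch of the median location formula. When N is even the location ends in .5; averaging the two middle scores in that case is the usual convention and is assumed here, since the card does not spell it out. Function names are hypothetical.

```python
def median_location(n):
    """Position of the median in an ordered series of n scores (1-based)."""
    return (n + 1) / 2

def median(scores):
    """Score at the median location; a .5 location averages its two neighbours."""
    ordered = sorted(scores)
    location = median_location(len(ordered))
    lower = ordered[int(location) - 1]         # convert 1-based to 0-based
    upper = ordered[int(location + 0.5) - 1]   # same element when location is whole
    return (lower + upper) / 2

print(median_location(83))   # the card's example, N = 83
print(median([5, 1, 3]))
print(median([1, 2, 3, 4]))
```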
Term
Mean |
|
Definition
Most common measure of central tendency. The sum of the scores divided by the number of scores. |
|
|
Term
Dispersion (Variability) |
|
Definition
the degree to which individual data points are distributed around the mean. |
|
|
Term
Range |
|
Definition
distance between the lowest and highest scores. may give a distorted image of the data due to outliers. |
|
|
Term
Interquartile Range |
|
Definition
The range of the middle 50% of the data: the distance between the 25th percentile and the 75th percentile. Avoids the range's problem of being dependent on outlying data. |
|
|
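To see why the interquartile range resists outliers, compare two invented data sets that differ only in their largest score. Quartile conventions vary between textbooks and software; this sketch uses `statistics.quantiles` with its default method, which is one convention among several.

```python
import statistics

def value_range(scores):
    """Distance between the lowest and highest scores."""
    return max(scores) - min(scores)

def interquartile_range(scores):
    """Distance between the 25th and 75th percentiles."""
    q1, _, q3 = statistics.quantiles(scores, n=4)
    return q3 - q1

# Invented data: identical except for the largest score.
without_outlier = [2, 4, 4, 5, 7, 8, 9, 10, 12, 13, 14]
with_outlier = [2, 4, 4, 5, 7, 8, 9, 10, 12, 13, 40]

print(value_range(without_outlier), value_range(with_outlier))
print(interquartile_range(without_outlier), interquartile_range(with_outlier))
```

The range jumps when the extreme score is added, while the interquartile range is unchanged.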
Term
Trimmed Samples |
|
Definition
samples that have had a certain amount of the data from each tail removed. statistics calculated from a trimmed sample are called trimmed statistics. |
|
|
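A minimal sketch of trimming, assuming the same proportion is removed from each tail. The function name, the 10% figure, and the data are illustrative.

```python
def trimmed_sample(scores, proportion):
    """Drop `proportion` of the scores from each tail of the ordered sample."""
    ordered = sorted(scores)
    k = int(len(ordered) * proportion)   # number of scores cut from each end
    return ordered[k:len(ordered) - k]

# Invented scores with one extreme value in each tail.
scores = [1, 12, 13, 14, 15, 16, 17, 18, 19, 95]

trimmed = trimmed_sample(scores, 0.10)
full_mean = sum(scores) / len(scores)
trimmed_mean = sum(trimmed) / len(trimmed)   # a trimmed statistic

print(trimmed)
print(full_mean, trimmed_mean)
```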
Term
Average Deviation |
|
Definition
Taking the deviations of each X from the mean and averaging them to get a measure of average variability. DOESN'T WORK!!! The result will ALWAYS equal zero unless you take the absolute value of these deviations, because the positive deviations cancel the negative deviations. |
|
|
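The cancellation is easy to demonstrate. With the invented scores below, the signed deviations sum to zero, while the absolute deviations give a usable average.

```python
# Invented scores, chosen so the arithmetic is exact.
scores = [2, 4, 6, 8, 10]
mean = sum(scores) / len(scores)

# Signed deviations: positives and negatives cancel.
deviations = [x - mean for x in scores]
sum_of_deviations = sum(deviations)

# Taking absolute values first avoids the cancellation.
mean_absolute_deviation = sum(abs(d) for d in deviations) / len(deviations)

print(deviations)
print(sum_of_deviations, mean_absolute_deviation)
```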
Term
Sample Variance (s²) |
|
Definition
Common measure of variability. Obtained by AVERAGING the SQUARED deviations about the mean, rather than the absolute values of the deviations (for a sample, the sum of squared deviations is divided by N - 1). |
|
|
Term
Population Variance (σ²) |
|
Definition
variance of the true population. usually estimated, rarely computed. |
|
|
Term
Standard Deviation |
|
Definition
Positive square root of the variance. The sample standard deviation is denoted "s" and the population standard deviation is denoted sigma (σ). |
|
|
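The variance and standard deviation can be computed by hand and compared against Python's `statistics` module. The scores are invented; treating them as a sample uses the N - 1 divisor, while treating them as a whole population uses N.

```python
import math
import statistics

# Invented scores.
scores = [2, 4, 6, 8, 10]
n = len(scores)
mean = sum(scores) / n

# Sum of squared deviations about the mean.
sum_of_squares = sum((x - mean) ** 2 for x in scores)

sample_variance = sum_of_squares / (n - 1)   # s squared
population_variance = sum_of_squares / n     # sigma squared, if these were the population

sample_sd = math.sqrt(sample_variance)           # s
population_sd = math.sqrt(population_variance)   # sigma

print(sample_variance, statistics.variance(scores))
print(population_variance, statistics.pvariance(scores))
print(sample_sd, population_sd)
```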
Term
Bias |
|
Definition
A property of a statistic whose long-range average is not equal to the parameter it estimates. |
|
|
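Bias can be shown exactly on a tiny made-up population by listing every possible sample and averaging a statistic over all of them. The sketch assumes samples of size 2 drawn with replacement from the population 1, 2, 3; dividing the sum of squared deviations by N gives a long-range average below the population variance, while dividing by N - 1 matches it.

```python
from fractions import Fraction
from itertools import product

def variance(scores, divisor):
    """Sum of squared deviations about the mean, divided by `divisor`."""
    mean = Fraction(sum(scores), len(scores))
    return sum((x - mean) ** 2 for x in scores) / divisor

# Tiny made-up population and its true variance (the parameter).
population = [1, 2, 3]
population_variance = variance(population, len(population))

# Every possible sample of size 2, drawn with replacement (an assumption).
samples = list(product(population, repeat=2))

# Long-range average of each estimator over all possible samples.
average_dividing_by_n = sum(variance(s, 2) for s in samples) / len(samples)
average_dividing_by_n_minus_1 = sum(variance(s, 1) for s in samples) / len(samples)

print(population_variance)             # the parameter
print(average_dividing_by_n)           # biased: too small
print(average_dividing_by_n_minus_1)   # unbiased: equals the parameter
```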
Term
Boxplot (Box-and-Whisker Plot) |
|
Definition
graphical method used to represent the dispersion of a sample. |
|
|
Term
Hinges (Quartiles) |
|
Definition
The points that cut off the bottom and top quarter of a distribution |
|
|
Term
H-spread |
|
Definition
Range between the two hinges. (Interquartile Range) |
|
|
Term
Whisker |
|
Definition
In a box plot, a line joining the hinge with the farthest data point whose distance from the hinge is NO MORE THAN 1.5 times the H-spread. |
|
|
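The hinge, H-spread, and whisker cards can be tied together in one sketch. It assumes the common rule hinge location = (floor of the median location + 1) / 2, counted in from each end of the ordered data; that rule is not stated in the cards, and the function names and data are hypothetical.

```python
import math

def value_at(ordered, location):
    """Value at a 1-based location; a .5 location averages its two neighbours."""
    low = ordered[math.floor(location) - 1]
    high = ordered[math.ceil(location) - 1]
    return (low + high) / 2

def box_plot_summary(scores):
    """Hinges, H-spread, whisker ends, and outliers for a box plot."""
    ordered = sorted(scores)
    median_location = (len(ordered) + 1) / 2
    hinge_location = (math.floor(median_location) + 1) / 2   # assumed rule
    lower_hinge = value_at(ordered, hinge_location)
    upper_hinge = value_at(ordered[::-1], hinge_location)    # count from the top
    h_spread = upper_hinge - lower_hinge
    reach = 1.5 * h_spread   # whiskers may extend no farther than this
    inside = [x for x in ordered
              if lower_hinge - reach <= x <= upper_hinge + reach]
    return {
        "lower_hinge": lower_hinge,
        "upper_hinge": upper_hinge,
        "h_spread": h_spread,
        "whisker_ends": (inside[0], inside[-1]),
        "outliers": [x for x in ordered if x not in inside],
    }

# Invented scores with one extreme value.
summary = box_plot_summary([2, 4, 4, 5, 7, 8, 9, 10, 12, 13, 40])
print(summary)
```

Points beyond the whisker ends are the ones a box plot would flag as outliers.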