Term
Absolute value of deviation
Definition
A measure of variability. The magnitude of a deviation, |xi - x bar|: the distance of an observation from the mean.
|
Term
Definition
P(A∪B) = P(A) + P(B) - P(A∩B)
|
Term
Definition
aka Bar chart
Illustrates the distribution of a categorical variable, with the categories usually placed on the x-axis and the percent relative frequency on the y-axis. Includes Pareto diagrams.
|
Term
Definition
P(A|B) = [P(A∩B)] / [P(B)], provided P(B) > 0
|
Term
Definition
A distribution with two peaks.
|
Term
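Definition
Related to the Bernoulli distribution, which is the special case of a single trial (n = 1).
The number of successes x in n independent trials, each with two possible outcomes (success and failure) and the same probability of success p.
You have to use the nCr function on your calculator, symbolized by C.
P(X = x) = (n C x) p^x (1 - p)^(n - x)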
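A minimal Python sketch of the binomial formula, assuming Python 3.8+ for math.comb (the standard-library counterpart of the calculator's nCr). The function name binomial_pmf is hypothetical, chosen for illustration.

```python
from math import comb


def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) = (n C x) p^x (1 - p)^(n - x) for x successes in n trials."""
    if not 0 <= x <= n:
        return 0.0  # impossible numbers of successes
    return comb(n, x) * p**x * (1 - p) ** (n - x)


# Exactly 3 heads in 5 tosses of a fair coin: 10 * 0.5**5
print(binomial_pmf(3, 5, 0.5))  # 0.3125
```

Summing binomial_pmf over x = 0, 1, ..., n gives 1, which is a quick sanity check on any hand calculation.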
|
Term
Definition
An illustration of the five-number summary. The top of the box is the third quartile and the bottom is the first quartile. A line through the box marks the median. Whiskers extend to the largest and smallest values that are not outliers. Outliers are plotted separately. Useful for comparing groups.
|
Term
Definition
aka Qualitative variable
A variable that falls into one of two or more distinct categories. May be displayed using a bar graph, Pareto diagram, or pie chart.
|
Term
Definition
aka Chebyshev's theorem
For any k > 1, the proportion of observations that lie within k standard deviations of the mean must be at least 1 - (1/k^2). Applies to any distribution, whatever its shape.
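A small Python sketch of the Chebyshev bound; the function name chebyshev_bound is hypothetical. It returns 0 for k ≤ 1, where the inequality guarantees nothing.

```python
def chebyshev_bound(k: float) -> float:
    """Lower bound 1 - 1/k^2 on the proportion within k standard deviations of the mean."""
    if k <= 1:
        return 0.0  # the inequality gives no useful information for k <= 1
    return 1 - 1 / k**2


for k in (1.5, 2, 3):
    print(f"within {k} SD: at least {chebyshev_bound(k):.1%}")
# prints 55.6%, 75.0% and 88.9%
```

Compared with the empirical rule (about 95% within 2 standard deviations for roughly bell-shaped data), Chebyshev's guarantee of 75% is weaker, but it holds for any shape.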
|
Term
Definition
A test that quantifies how strong the evidence is.
|
Term
Definition
aka Bins
Ranges of values of a quantitative variable into which data are sorted when making a frequency table. An appropriate number and width of classes must be selected, with the same width for each class.
|
Term
Definition
Random selection of groups (clusters) within a population, such as towns or households. Within each selected cluster, every individual is surveyed. Cuts down the cost of sampling.
|
Term
Definition
A selection of items where, unlike a permutation, the order of selection doesn't matter.
x is the number of items in the combination
n is the number of items the x items are selected from
nCx = [n!] / [x!(n - x)!]
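A minimal Python sketch that evaluates the factorial formula directly and checks it against math.comb (Python 3.8+). The helper name combinations is hypothetical and assumes 0 ≤ x ≤ n.

```python
from math import comb, factorial


def combinations(n: int, x: int) -> int:
    """nCx = n! / [x!(n - x)!]; assumes 0 <= x <= n."""
    return factorial(n) // (factorial(x) * factorial(n - x))


# Choosing 2 pizza toppings out of 5, where order doesn't matter
print(combinations(5, 2))  # 10
print(comb(5, 2))          # 10, the standard-library equivalent
```

Integer division (//) keeps the result an exact integer; the division always comes out even because nCx is a count.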
|
Term
Definition
A^C
The event that A does not occur.
P(A^C) = 1 - P(A)
|
Term
Conceptually infinite population
Definition
A population so large or so loosely defined that it is practically impossible to list every member. Example: the mosquitoes of Southern Ontario.
|
Term
Definition
The probability that an event occurs given that another event has already occurred.
P(A|B) is the probability of A, given that B has already occurred.
|
Term
Definition
Variables whose effects are impossible to separate. Neither one can be studied without the other acting as a lurking variable.
|
Term
Definition
Taking values on a sliding continuum. Between any two possible values there are infinitely many others. May be bounded to a certain range.
Example: time, weight, distance.
|
Term
Definition
A group in an experiment that is exposed to all the same environmental factors as the treatment group except one: the variable being studied.
|
Term
Definition
A measure of the strength and direction of the linear relationship between x and y.
|
Term
Definition
The number of data points in a class, plus all the data points in lower classes.
|
Term
Definition
The number of independent pieces of information used to estimate a quantity.
|
Term
Definition
Plots and numerical summaries used to describe a data set.
|
Term
Definition
A measure of variability. An observation minus the mean, xi - x bar. The sum of all deviations always equals zero.
|
Term
Definition
Having a countable number of possible values. May be infinite or bounded to a certain range. Example: money, which has no fixed upper limit but cannot be divided into units smaller than a cent.
|
Term
Definition
How often a variable takes on each of its possible values. Shapes include symmetric, skewed, unimodal, bimodal, and multimodal.
|
Term
Definition
A method of illustrating data in which every data point is plotted individually.
|
Term
Definition
About 68% of observations lie within 1 standard deviation of the mean, about 95% within 2 standard deviations, and almost all (about 99.7%) within 3 standard deviations. Applies to roughly bell-shaped distributions; does not apply to strongly skewed data.
|
Term
Definition
Represented by a capital letter. A group of outcomes in the sample space.
|
Term
Definition
The theoretical mean of a random variable. Not to be confused with the most likely value. The average that would be obtained if an experiment were repeated infinitely many times.
μ = E(X) = Σ x p(x)
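A minimal Python sketch of μ = Σ x p(x), assuming the distribution is stored as a dictionary mapping each value x to p(x) (a representation chosen for illustration; expected_value is a hypothetical name; Python 3.9+ for the dict[...] hint).

```python
def expected_value(dist: dict[float, float]) -> float:
    """mu = E(X) = sum of x * p(x) over every possible value x."""
    return sum(x * p for x, p in dist.items())


# A fair six-sided die: each face has probability 1/6
die = {face: 1 / 6 for face in range(1, 7)}
print(expected_value(die))  # about 3.5, which is not a possible roll
```

The die example shows why the expected value is not the most likely value: 3.5 can never actually be rolled.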
|
Term
Definition
Researchers impose conditions (levels of the explanatory variable) on the subjects, rather than using pre-existing groups. Well-designed, randomized experiments with a control group can show causal relationships if the differences are significant.
|
Term
Definition
The variable that we control, or use to explain changes in the response variable. In an experiment or observational study, individuals are categorized into groups according to it.
|
Term
Definition
Distribution skewed strongly to the right.
|
Term
Definition
A population which is small enough for every member to be listed.
Example: U of G students.
|
Term
Definition
aka 25th percentile
The bottom edge of the box in a boxplot. Included in the five-number summary.
|
Term
Definition
The minimum, the first quartile, the median, the third quartile, and the maximum. Illustrated with a boxplot.
|
Term
Definition
The number of observations occurring in a category.
|
Term
Definition
A table showing the frequency of each category in the data. Used for making bar graphs and histograms. For histograms, the data are first sorted into classes.
|
Term
Definition
The number of trials needed to get the first success. The trials must be independent, each with two outcomes (success and failure) and a constant probability of success. Modelled by the probability mass function.
|
Term
Definition
A measure of central tendency. The nth root of the product of the observations. Requires all observations to be positive.
(Πxi)^(1/n)
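A Python sketch of the nth-root-of-the-product formula, compared with statistics.geometric_mean (Python 3.8+). The growth factors are hypothetical data, and geometric_mean is a hypothetical helper name.

```python
import math
import statistics


def geometric_mean(values: list[float]) -> float:
    """(product of the x_i) ** (1/n); only defined here for positive values."""
    if any(v <= 0 for v in values):
        raise ValueError("geometric mean needs positive observations")
    return math.prod(values) ** (1 / len(values))


# Hypothetical yearly growth factors: +10%, -5%, +20%
growth = [1.10, 0.95, 1.20]
print(geometric_mean(growth))             # average growth factor per year
print(statistics.geometric_mean(growth))  # standard-library version
```

Multiplying many values can overflow or underflow; the library version avoids this by averaging logarithms instead, which gives the same result up to rounding.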
|
Term
Definition
A measure of central tendency. The reciprocal of the mean of the reciprocals of the observations.
n / [Σ(1 / xi)]
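A Python sketch of n / [Σ(1 / xi)], checked against statistics.harmonic_mean. The trip data are hypothetical; the sketch assumes no observation is zero.

```python
import statistics


def harmonic_mean(values: list[float]) -> float:
    """n / sum(1 / x_i): the reciprocal of the mean of the reciprocals."""
    return len(values) / sum(1 / v for v in values)


# Hypothetical trip: equal distances driven at 40 km/h and at 60 km/h
speeds = [40, 60]
print(harmonic_mean(speeds))             # about 48 km/h, the true average speed
print(statistics.harmonic_mean(speeds))  # standard-library version
```

The answer is below the arithmetic mean of 50 because more time is spent driving at the slower speed.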
|
Term
Definition
An illustration of the distribution of a quantitative variable. Made using a frequency table.
|
Term
Hypergeometric distribution
Definition
Like the binomial distribution, but the trials are not independent: items are sampled without replacement from a finite population, so the probability of success depends on the results of previous trials.
You need to use the nCr function on a calculator, symbolized by C.
x is the number of successes in the sample
a is the number of successes in the population
n is the sample size
N is the population size
P(X = x) = [(a C x) * ((N - a) C (n - x))] / [N C n]
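A Python sketch of the hypergeometric formula, using the same symbols x, n, a and N. The card-deck numbers are an illustrative example, and hypergeom_pmf is a hypothetical name.

```python
from math import comb


def hypergeom_pmf(x: int, n: int, a: int, N: int) -> float:
    """P(X = x) = [(a C x) * ((N - a) C (n - x))] / (N C n).

    x: successes in the sample      n: sample size
    a: successes in the population  N: population size
    """
    if not 0 <= x <= n:
        return 0.0
    # math.comb returns 0 when asked to choose more items than exist
    return comb(a, x) * comb(N - a, n - x) / comb(N, n)


# Deal 5 cards from a 52-card deck containing 13 hearts.
print(hypergeom_pmf(2, 5, 13, 52))  # P(exactly 2 hearts)
```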
|
Term
Definition
The occurrence of one event has no effect on the probability of the other event, and vice versa.
All three must be true or all three false:
1. P(A∩B) = P(A)*P(B)
2. P(A|B) = P(A)
3. P(B|A) = P(B)
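A Python sketch checking the three conditions on one roll of a fair die, with events chosen for illustration. It uses fractions.Fraction so the equality tests are exact rather than subject to floating-point round-off; is_independent is a hypothetical name.

```python
from fractions import Fraction


def is_independent(p_a: Fraction, p_b: Fraction, p_a_and_b: Fraction) -> bool:
    """Condition 1: P(A∩B) = P(A) * P(B)."""
    return p_a_and_b == p_a * p_b


# One roll of a fair die: A = "even", B = "at most 4"
p_a = Fraction(3, 6)        # {2, 4, 6}
p_b = Fraction(4, 6)        # {1, 2, 3, 4}
p_a_and_b = Fraction(2, 6)  # {2, 4}
print(is_independent(p_a, p_b, p_a_and_b))  # True
# Conditions 2 and 3, using P(A|B) = P(A∩B) / P(B) and P(B|A) = P(A∩B) / P(A)
print(p_a_and_b / p_b == p_a, p_a_and_b / p_a == p_b)  # True True
```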
|
Term
Definition
aka Unit
aka Case
Objects on which measurements are taken.
|
Term
Definition
Investigating the relationship between variables.
|
Term
Interquartile range (IQR)
Definition
A descriptive measure of variability. The difference between the third and first quartiles. Not sensitive to extreme values.
IQR = Q3 - Q1
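A Python sketch of Q1, the median, Q3 and the IQR, assuming the "median of each half" convention (the overall median is left out of both halves when n is odd). This is one common textbook method, not necessarily the one used in this course; R and Python's statistics.quantiles offer other conventions that can give slightly different quartiles. The function name and data are hypothetical, and the sketch needs at least 2 observations.

```python
from statistics import median


def quartiles(data: list[float]) -> tuple[float, float, float]:
    """Q1, median, Q3 using the median of the lower and upper halves."""
    xs = sorted(data)
    n = len(xs)
    lower = xs[: n // 2]        # values below the median position
    upper = xs[(n + 1) // 2 :]  # values above the median position
    return median(lower), median(xs), median(upper)


data = [2, 4, 4, 5, 7, 9, 10, 12, 15]  # hypothetical observations
q1, q2, q3 = quartiles(data)
print(q1, q2, q3, "IQR =", q3 - q1)  # 4.0 7 11.0 IQR = 7.0
```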
|
Term
Definition
∩
Both events occur together: the sample points that belong to both A and B.
|
Term
Definition
As the number of observations sampled becomes very large, the sample mean approaches the expected value, and the sample variance approaches the population variance.
|
Term
Definition
Conversions of the form y = a + bx, such as the conversion between Celsius and Fahrenheit (F = 32 + 1.8C).
|
Term
Definition
Variables that contribute to observed correlations but are not included in the study. Researchers may be completely unaware of them. More likely in observational studies than in experiments.
|
Term
Definition
The largest value in a dataset. In a boxplot, the end of the upper whisker, unless the maximum is an outlier (then it is plotted as an individual point).
|
Term
Definition
aka Average
The most popular measure of central tendency. Uses all of the data values, but is more sensitive to extreme values in the data. This sensitivity can make the mean misleading.
x bar = [Σxi] / n
|
Term
Mean absolute deviation (MAD)
Definition
The average absolute value of deviation. A reasonable measure of variability, but hard to work with mathematically.
MAD = [Σ|xi - x bar|] / n
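A Python sketch of x bar and MAD on a small hypothetical data set; it also prints the deviations to show that they sum to zero. The helper names mean and mean_absolute_deviation are hypothetical.

```python
def mean(values: list[float]) -> float:
    """x bar = sum(x_i) / n."""
    return sum(values) / len(values)


def mean_absolute_deviation(values: list[float]) -> float:
    """MAD = sum(|x_i - x bar|) / n."""
    x_bar = mean(values)
    return sum(abs(x - x_bar) for x in values) / len(values)


data = [2, 4, 6, 8]  # hypothetical observations, x bar = 5
print([x - mean(data) for x in data])  # deviations [-3.0, -1.0, 1.0, 3.0] sum to 0
print(mean_absolute_deviation(data))   # (3 + 1 + 1 + 3) / 4 = 2.0
```

Taking absolute values is what stops the positive and negative deviations from cancelling out.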
|
Term
Definition
aka Second quartile
aka 50th percentile
A measure of central tendency. The line dividing the box in a boxplot. The middle value when all data points are sorted in ascending order. If n is even, the median is the average of the two middle values. Not as sensitive to extreme values as the mean. Good for right-skewed data, such as property values or salaries.
|
Term
Definition
A measure of central tendency. The midpoint between the minimum and maximum values: (min + max) / 2.
|
Term
Definition
The smallest value in a dataset. In a boxplot, the end of the lower whisker, unless the minimum is an outlier (then it is plotted as an individual point).
|
Term
Definition
A measure of central tendency. The most frequently occurring observation.
|
Term
Definition
Distribution with multiple peaks.
|
Term
Definition
P(A∩B) = P(A)*P(B|A) = P(B)*P(A|B)
|
Term
Multivariate hypergeometric distribution
Definition
A hypergeometric distribution where there are more than two classifications of outcomes.
|
Term
Definition
Events that have no outcome in the sample space in common, so they cannot both occur.
P(A∩B) = 0
|
Term
Definition
Skewed to the left: the long tail is on the left and the peak is toward the right.
|
Term
Definition
Perfectly symmetrical distribution. Rare.
|
Term
Definition
Researchers observe and measure variables, but do not impose any conditions on the subjects. The groups defined by the explanatory variable are pre-existing.
Used when experiments are impossible or impractical (for reasons of time, money, or ethics). Doesn't provide strong evidence for causal relationships; there may be lurking variables.
|
Term
Definition
Extreme values that fall outside the overall pattern of the distribution. In a boxplot, they fall beyond the whiskers and are plotted individually.
|
Term
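Definition
A measure of the strength of evidence against the null hypothesis: the probability, assuming the null hypothesis is true, of getting a result at least as extreme as the one observed. By convention, a result with a p-value less than 0.05 is considered significant.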
|
Term
Definition
A numerical characteristic of a population.
|
Term
Definition
A bar graph where the categories are sorted by percent frequency from largest to smallest.
|
Term
Percent relative cumulative frequency
Definition
The cumulative frequency expressed as a percent of all data points. The last class should have a percent relative cumulative frequency of 100%.
|
Term
Percent relative frequency
Definition
The relative frequency expressed as a percent.
|
Term
Definition
The pth percentile is the value of the variable that has p% of the ordered data values at or below it.
|
Term
Definition
An ordering of a set of items, where the order of selection matters.
x is the number of things being ordered
n is the number of things the x items are selected from
nPx = [n!] / [(n - x)!]
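A Python sketch of nPx = n! / (n - x)!, checked against math.perm (Python 3.8+). The helper name permutations is hypothetical and assumes 0 ≤ x ≤ n.

```python
from math import factorial, perm


def permutations(n: int, x: int) -> int:
    """nPx = n! / (n - x)!: ordered arrangements of x items chosen from n."""
    return factorial(n) // factorial(n - x)


# Gold, silver and bronze among 8 runners: order matters
print(permutations(8, 3))  # 8 * 7 * 6 = 336
print(perm(8, 3))          # 336, the standard-library equivalent
```

For the same n and x, nPx = nCx * x!, because each unordered selection of x items can be arranged in x! orders.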
|
Term
Definition
Illustrates the percent relative frequencies of categorical variables as slice-shaped areas on a circle.
|
Term
Definition
Models the number of events in a fixed range (of time, space, etc.) when events occur independently at a constant average rate, so the probability of an event within any range of a given size does not change.
X is the number of events in a fixed range
x is a non-negative integer (0, 1, 2, ...)
λ is the theoretical mean number of events in a fixed range
P(X = x) = [λ^x e^(-λ)] / [x!]
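A Python sketch of the Poisson formula, with lam standing in for λ. The help-desk rate of 3 calls per hour is a hypothetical example, and poisson_pmf is a hypothetical name.

```python
from math import exp, factorial


def poisson_pmf(x: int, lam: float) -> float:
    """P(X = x) = lam**x * e**(-lam) / x!, for x = 0, 1, 2, ..."""
    if x < 0:
        return 0.0
    return lam**x * exp(-lam) / factorial(x)


# Hypothetical help desk averaging 3 calls per hour (lam = 3)
print(poisson_pmf(0, 3))                            # chance of no calls in an hour, e**-3
print(sum(poisson_pmf(k, 3) for k in range(100)))  # very close to 1
```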
|
Term
Definition
The set of individuals or objects of interest to an investigator.
|
Term
Definition
A parameter, usually written μ. The average of the values for all individuals in a population.
|
Term
Definition
Skewed to the right: the long tail is on the right and the peak is toward the left.
|
Term
Definition
The proportion of times that the outcome would occur in an infinite number of trials.
|
Term
Definition
We don't know what is going to happen in any one individual trial, but we can keep track of the long-run distribution of outcomes.
|
Term
Probability Mass Function (PMF)
Definition
Used to calculate the probability that the first success occurs on trial x (geometric distribution), where p is the probability of success on each trial.
P(X = x) = p(1 - p)^(x - 1)
P(X ≤ x) = 1 - (1 - p)^x
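A Python sketch of both geometric formulas. The die example (p = 1/6, waiting for the first six) is illustrative, and geometric_pmf and geometric_cdf are hypothetical names.

```python
def geometric_pmf(x: int, p: float) -> float:
    """P(X = x) = p(1 - p)^(x - 1): first success on trial x (x = 1, 2, ...)."""
    return p * (1 - p) ** (x - 1)


def geometric_cdf(x: int, p: float) -> float:
    """P(X <= x) = 1 - (1 - p)^x: first success within x trials."""
    return 1 - (1 - p) ** x


p = 1 / 6  # rolling a die until the first six
print(geometric_pmf(3, p))  # first six on exactly the third roll
print(geometric_cdf(3, p))  # a six somewhere in the first three rolls
```

The cumulative formula follows from the complement rule: the first success is later than trial x only if all x trials fail, which has probability (1 - p)^x.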
|
Term
Definition
A variable that falls onto a sliding continuous scale of values.
|
Term
Definition
Specific percentiles. Useful descriptive measures of the distribution of data. Used in the construction of boxplots. Includes the first, second, and third quartiles.
|
Term
Definition
A software program that is used for statistics.
|
Term
Definition
Ensures that we avoid systematic bias in the samples.
|
Term
Definition
A measure of variability. The maximum value minus the minimum value. Because it depends only on the two most extreme values, it does not provide much information.
|
Term
Definition
Frequency divided by n. The proportion of observations in a category.
|
Term
Definition
The variable of interest in an experiment: the one we measure to look for changes.
|
Term
Definition
A subset of individuals selected from a population.
|
Term
Definition
A statistic. The average of all observations in a sample.
|
Term
Definition
Individual outcomes of probability experiments. Mutually exclusive: no two sample points can occur on the same trial.
|
Term
Definition
A list of all possible outcomes of a probability experiment. Exhaustive; there are no possible outcomes not included in the sample space.
|
Term
Definition
A measure of variability. Roughly the average squared deviation (the sum is divided by n - 1 rather than n). Expressed in squared units.
s^2 = [Σ(xi - x bar)^2] / (n - 1)
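A Python sketch of s^2 with the n - 1 divisor, plus s as its square root, compared with statistics.variance and statistics.stdev (which also divide by n - 1). The data are hypothetical, and sample_variance is a hypothetical name.

```python
import math
import statistics


def sample_variance(values: list[float]) -> float:
    """s^2 = sum((x_i - x bar)^2) / (n - 1)."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations, x bar = 5
s2 = sample_variance(data)       # 32 / 7, in squared units
s = math.sqrt(s2)                # standard deviation, back in the original units
print(s2, s)
print(statistics.variance(data), statistics.stdev(data))
```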
|
Term
Definition
A bar chart in which the data for each category are represented by bars placed side by side.
|
Term
Simple Random Sampling (SRS)
Definition
One of the simplest and most important types of random sampling. Every possible sample of size n has the same chance of being selected, so each individual in the population has the same likelihood of being selected for the sample.
|
Term
Definition
When the distribution is stretched out to one side, with a longer tail on that side. Includes positive and negative skewness.
|
Term
Definition
A measure of variability. The square of a deviation, (xi - x bar)^2.
|
Term
Definition
A bar chart where categories are represented by stacking bars on top of each other.
|
Term
Definition
The square root of the variance. Cannot be negative. Has the same units as the data.
s = √(s^2)
|
Term
Definition
A numerical characteristic of a sample.
|
Term
Definition
Making statements about population parameters based on sample statistics.
|
Term
Definition
In a stemplot, groups of data based on all but the last digit of each data point (each data point is written to the same number of decimal places). The stems are listed in ascending order in a column, with the leaves extending to the right.
|
Term
Definition
aka Stem-and-leaf display
A way of illustrating quantitative data. Each data point is split into a stem (all but the last digit) and a leaf (the last digit). The leaves are listed as single digits to the right of their stem. Must include a legend for the stems. Includes split-stem and back-to-back stemplots.
|
Term
Definition
Groups from which samples are taken in stratified random sampling.
|
Term
Stratified random sampling
Definition
The population is divided into strata, and a random sample is taken from each stratum.
|
Term
Definition
Distribution that is roughly the same on either side of the median. Includes bell-shaped distributions such as the normal distribution.
|
Term
Definition
Determines if there is a significant difference between variables. If there is, there is a large likelihood that there is a correlation between variables.
|
Term
Definition
aka 75th percentile
The top edge of the box in a boxplot. Included in the five-number summary.
|
Term
Definition
A measure of central tendency. A certain percentage of the largest and smallest observations are omitted from calculations, resulting in a mean less sensitive to extreme values.
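A Python sketch of a trimmed mean, assuming the stated proportion is removed from each end and the count is rounded down (other texts round differently). The salary figures are hypothetical, and trimmed_mean is a hypothetical name.

```python
import math


def trimmed_mean(values: list[float], proportion: float) -> float:
    """Drop `proportion` of the observations from EACH end, then average the rest."""
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    xs = sorted(values)
    k = math.floor(proportion * len(xs))  # number trimmed per end (rounding down is an assumption)
    kept = xs[k : len(xs) - k]
    return sum(kept) / len(kept)


salaries = [30, 32, 35, 36, 38, 40, 41, 43, 45, 250]  # hypothetical, in thousands
print(trimmed_mean(salaries, 0))     # 59.0, the ordinary mean, pulled up by 250
print(trimmed_mean(salaries, 0.10))  # 38.75, with 10% trimmed from each end
```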
|
Term
Definition
Distribution that is constant over the entire range.
|
Term
Definition
Distribution with one peak.
|
Term
Definition
∪
At least one of the two events occurs: the sample points that belong to A, to B, or to both.
|
Term
Definition
A measure of the dispersion of a random variable about its mean.
Var(X) = E[(X - μ)^2] = Σ (x - μ)^2 p(x)
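A Python sketch computing μ and Var(X) = Σ (x - μ)^2 p(x) for a discrete random variable, assuming the distribution is stored as a dictionary from value to probability. The function name is hypothetical.

```python
def rv_mean_and_variance(dist: dict[float, float]) -> tuple[float, float]:
    """mu = sum x p(x); Var(X) = sum (x - mu)^2 p(x)."""
    mu = sum(x * p for x, p in dist.items())
    var = sum((x - mu) ** 2 * p for x, p in dist.items())
    return mu, var


# Number of heads in two fair coin tosses
dist = {0: 0.25, 1: 0.5, 2: 0.25}
print(rv_mean_and_variance(dist))  # (1.0, 0.5)
```

This matches the binomial results np = 1 and np(1 - p) = 0.5 for n = 2, p = 0.5.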
|
Term
Definition
When individuals volunteer themselves to be included in a sample. Results tend to be biased, because they describe only the kind of people who choose to volunteer.
|
Term
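Definition
Lines extending up and down from the box of a boxplot to the maximum and minimum values, provided these lie within 1.5 times the length of the box (1.5 × IQR) of the box; values outside this range are outliers, and the whisker then ends at the most extreme value that is not an outlier.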
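A Python sketch of the 1.5 × IQR rule that decides where the whiskers end. It uses statistics.quantiles(data, n=4) with its default "exclusive" method (Python 3.8+). Quartile conventions differ between textbooks and software, so fences computed by hand may differ slightly. The data and both function names are hypothetical.

```python
import statistics


def outlier_fences(data: list[float]) -> tuple[float, float]:
    """Lower and upper fences: Q1 - 1.5 * IQR and Q3 + 1.5 * IQR."""
    q1, _, q3 = statistics.quantiles(data, n=4)  # default 'exclusive' method
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def whisker_ends(data: list[float]) -> tuple[float, float, list[float]]:
    """Whisker ends are the most extreme values inside the fences; the rest are outliers."""
    low, high = outlier_fences(data)
    inside = [x for x in data if low <= x <= high]
    outliers = [x for x in data if x < low or x > high]
    return min(inside), max(inside), outliers


data = [3, 5, 6, 7, 8, 9, 10, 11, 30]  # hypothetical; 30 looks suspicious
print(outlier_fences(data))  # (-2.0, 18.0)
print(whisker_ends(data))    # (3, 11, [30])
```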
|
Term
Definition
Distribution with a peak near the left and skewed towards the right.
|
Term
Definition
A measure of central tendency. A mean where some observations are given more weight in calculations.
x bar = [Σ(wi xi)] / [Σwi], where wi is the weight given to observation xi
|
Term
Definition
A unitless measure of how many standard deviations a point is away from the mean. Positive means above the mean, negative means below.
zi = [xi - x bar] / s
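A Python sketch of zi = (xi - x bar) / s using the sample standard deviation from the statistics module. The exam scores are hypothetical, and z_scores is a hypothetical name.

```python
import statistics


def z_scores(values: list[float]) -> list[float]:
    """z_i = (x_i - x bar) / s, using the sample standard deviation s."""
    x_bar = statistics.mean(values)
    s = statistics.stdev(values)
    return [(x - x_bar) / s for x in values]


data = [60, 70, 80, 90, 100]  # hypothetical exam scores, x bar = 80
print(z_scores(data))  # negative below 80, zero at 80, positive above
```

Because z-scores are unitless, they allow observations from different scales to be compared. As with deviations, the z-scores of a data set sum to zero.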
|