Term
|
Definition
The process of making an educated guess about a probable value or the result of that guess. |
|
|
Term
|
Definition
A subset of statistics composed of methods that use descriptive statistics to make educated guesses (infer) about population values. |
|
|
Term
|
Definition
Data that has been observed or created via experimentation; real data, not theoretical. |
|
|
Term
|
Definition
The scientific discipline that investigates a wide variety of methods for extracting information from data. |
|
|
Term
|
Definition
The result of ascertaining the size or extent or boundaries of some item of interest using a scaled instrument in units that are commonly understood. |
|
|
Term
|
Definition
A value or definition that describes a characteristic of a sample. |
|
|
Term
|
Definition
A subset of statistics composed of methods that seek to summarize, describe and visualize raw data in order to extract information from it. |
|
|
Term
|
Definition
With reference to sorting observations into classes or categories, the property that each observation can fit into one and only one defined category or class. See collectively exhaustive. |
|
|
Term
|
Definition
A collection of facts often about different individuals and often about different characteristics of those individuals. |
|
|
Term
|
Definition
A method of sampling used on a population that is divided into mutually exclusive, collectively exhaustive subgroups by some characteristic of interest (strata). The full sample is composed of simple random samples from each subgroup. |
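A minimal Python sketch of stratified sampling, assuming equal allocation (the same number of items drawn from every stratum; proportional allocation is an equally common choice). The function name `stratified_sample`, the `stratum_of` key function and the employee data are hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(population, stratum_of, n_per_stratum, seed=None):
    """Hypothetical helper: a simple random sample of n_per_stratum items from each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for item in population:
        strata[stratum_of(item)].append(item)  # each item belongs to exactly one stratum
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, n_per_stratum))  # simple random sample within the stratum
    return sample

# Hypothetical population of (employee_id, department) pairs
employees = [(i, "sales" if i % 3 == 0 else "ops") for i in range(30)]
picked = stratified_sample(employees, lambda e: e[1], n_per_stratum=2, seed=1)
print(picked)
```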
|
|
Term
|
Definition
Measurement error due to respondents giving false information. |
|
|
Term
|
Definition
A subset of all the defined items of interest in existence in the universe of the question asked. |
|
|
Term
|
Definition
Observations which record the presence or absence of a particular characteristic in the individual from whom information is sought; for example, an individual's marital status, gender or national origin. |
|
|
Term
|
Definition
A list of observations that were measured and which could take on any possible value in an interval, even if the technology is not available to measure the infinitely small differences. |
|
|
Term
|
Definition
A method of selecting individuals for a sample that is as simple and inexpensive as possible, so that the sample consists of individuals who are easily found. |
|
|
Term
|
Definition
A sampling method where the likelihood of any member of the population being selected for a sample is unknown. |
|
|
Term
|
Definition
Something that changes; in statistics it usually refers to a particular characteristic of the individual or item studied that varies from individual to individual. |
|
|
Term
|
Definition
Non-probability sampling. |
|
|
Term
|
Definition
With reference to sorting observations into classes or categories, the property that each observation has a defined category or class into which it fits. See mutually exclusive. |
|
|
Term
|
Definition
Selecting a group of individuals for study based on the opinion of an "expert"; a non-probability sample. |
|
|
Term
|
Definition
Observations from ranking systems. The underlying concept of interest is qualitative; however, the inclusion of a ranking implies differences in the "amount" of the quality in the individual. For example, when rating satisfaction with one's contact with an account manager at AT&T, one might select from choices such as "Very Poor," "Poor," "Acceptable," "Good" or "Very Good." |
|
|
Term
|
Definition
Data that is the result of a count and is represented by non-negative integers. |
|
|
Term
|
Definition
Numbers whose values are determined by chance alone. |
|
|
Term
|
Definition
All defined items of interest in existence in the universe of the question asked. |
|
|
Term
|
Definition
A type of probability sampling where the likelihood of any member of the population being selected for a sample is known and equal. |
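A minimal sketch using Python's standard `random.sample`, which draws without replacement so that every member of a (hypothetical) population has the same known chance of selection. The function name `simple_random_sample` is illustrative only.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Each member has the same known probability, n / len(population), of being chosen."""
    return random.Random(seed).sample(population, n)  # sampling without replacement

population = list(range(1, 101))  # hypothetical population of 100 IDs
sample = simple_random_sample(population, 10, seed=42)
print(sorted(sample))
```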
|
|
Term
|
Definition
A specific, individual entity of interest; for example, a person, a firm or a DVD. |
|
|
Term
|
Definition
A recording of information about specific characteristics of an entire population. |
|
|
Term
|
Definition
A method of pseudo random sampling in which locations are randomly selected and, because the individuals of interest tend to gather or cluster at specific, known locations, all of the individuals at the selected locations are measured or sampled. |
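A minimal sketch of this method (commonly called cluster sampling), assuming the clusters are already known and stored as a dictionary from location to the individuals found there. The names `cluster_sample` and `stores` are hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly pick n_clusters locations, then keep every individual at each one."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)              # random selection of locations
    return [person for loc in chosen for person in clusters[loc]]  # measure everyone there

# Hypothetical locations and the customers found at each
stores = {"store_A": ["a1", "a2"], "store_B": ["b1", "b2", "b3"], "store_C": ["c1"]}
picked = cluster_sample(stores, 2, seed=0)
print(picked)
```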
|
|
Term
|
Definition
Observations from counts or measurements that are reported as numbers and on which mathematical operations can be performed. |
|
|
Term
|
Definition
A sampling method where the likelihood of any member of the population being selected for a sample is known. |
|
|
Term
|
Definition
Error introduced into a sample by the use of leading or ambiguous questions, the tone used by an interviewer which influences a respondent to make a particular choice, or respondent reporting false information. |
|
|
Term
|
Definition
A value or definition that describes a population; a fact about a population. |
|
|
Term
|
Definition
Information gathered by defining qualitative characteristics as having one of a number of specific subset qualities, each identified by a descriptive label or name. For example, a question about "Marital Status" would expect the respondent to select the most accurate category from "Never Married," "Married," "Divorced" or "Other." |
|
|
Term
|
Definition
A "datum" is a single fact; data is a collection of similar facts about different individuals. |
|
|
Term
|
Definition
A sampling method used on a population which can be listed completely and/or may have some inherent linear order. The first item is selected at random; subsequent items are then selected at a predetermined fixed interval counted from the first item. |
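A minimal sketch of this procedure (usually called systematic sampling), assuming the interval k is the population size divided by the sample size, rounded down, and that the random first item is drawn from the first k positions. The name `systematic_sample` is hypothetical.

```python
import random

def systematic_sample(ordered_list, n, seed=None):
    """Random start within the first interval, then every k-th item after it."""
    if not 0 < n <= len(ordered_list):
        raise ValueError("n must be between 1 and the population size")
    k = len(ordered_list) // n                  # fixed sampling interval
    start = random.Random(seed).randrange(k)    # random first item among the first k
    return ordered_list[start::k][:n]

population = list(range(100))  # hypothetical ordered list of 100 members
s = systematic_sample(population, 10, seed=3)
print(s)
```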
|
|
Term
|
Definition
A data set which has been sorted in some orderly manner, most often by value from smallest to largest value. |
|
|
Term
|
Definition
A "line" chart of a quantitative variable which is created by plotting class marks and the frequencies associated with each class. Frequencies are on the Y-axis and class marks are along the X-axis. By definition, such a chart is "closed," indicating that the line connecting the points begins at (W,0), where W is the class mark of the class which would precede the smallest class to have a non-zero frequency, and ends at (X,0), where X is the class mark of the class which would follow the largest class to have a non-zero frequency. |
|
|
Term
|
Definition
A simple count of the items of interest. |
|
|
Term
|
Definition
An extreme or unusual observation. By definition in this course, a value that is more than 3 standard deviations away from the mean of the data set. |
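A minimal sketch of the course's 3-standard-deviation rule. The definition does not say whether the sample or the population standard deviation is meant, so the sketch assumes the sample version (`statistics.stdev`); the data are made up.

```python
import statistics

def outliers(data, k=3.0):
    """Values more than k standard deviations from the mean (sample SD assumed)."""
    mean = statistics.mean(data)
    sd = statistics.stdev(data)
    return [x for x in data if abs(x - mean) > k * sd]

data = [10, 11] * 10 + [40]  # hypothetical data with one extreme value
print(outliers(data))  # [40]
```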
|
|
Term
|
Definition
The frequency of an event or value expressed as a percent of the whole or as a proportion; the absolute frequency divided by the total number of observations in the data set. |
|
|
Term
|
Definition
Refers to a visual representation which, when divided at the midpoint, creates one side which is the mirror image of the other. |
|
|
Term
|
Definition
A graphical display of a quantitative data set that orders the data sequentially then presents the data in a series of rows. Each row has a common "stem" value which is stated at the beginning of the row. A vertical line separates the common stem from the list of final digits from each observation with that stem (leaves), also ordered sequentially. |
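A minimal sketch for non-negative integers, taking the stem as every digit but the last (`x // 10`) and the leaf as the last digit (`x % 10`), as in the 589 example. Empty stems are printed so gaps in the data stay visible; that is a common but not universal choice, and the function name is hypothetical.

```python
from collections import defaultdict

def stem_and_leaf(data):
    """Rows of 'stem | leaves' for non-negative integers."""
    rows = defaultdict(list)
    for x in sorted(data):
        rows[x // 10].append(x % 10)   # leaves stay in ascending order because data is sorted
    lines = []
    for stem in range(min(rows), max(rows) + 1):   # include empty stems so gaps show
        leaves = "".join(str(leaf) for leaf in rows.get(stem, []))
        lines.append(f"{stem:>3} | {leaves}")
    return lines

for line in stem_and_leaf([589, 572, 575, 591, 598, 583]):
    print(line)
```

For the six values shown, the rows are `57 | 25`, `58 | 39` and `59 | 18`.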
|
|
Term
|
Definition
The idea of how much observations in a given data set are alike or different from one another, or how much they vary. Often referred to as "spread" (the idea of how much "territory" the data set covers), but can also be referred to as homogeneity, stability, and other ideas which represent similarity, difference or variation. |
|
|
Term
|
Definition
A table which summarizes a quantitative variable, composed of rows of classes accompanied by frequencies which represent the occurrence of values of the variable found in a particular class. |
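A minimal sketch that builds a frequency distribution with equal-width classes, assuming each class includes its lower limit and excludes its upper limit (so 60 falls in the 60-70 class, not 50-60). It also reports each class's relative frequency; the function name and the scores are hypothetical.

```python
def frequency_distribution(data, start, width, n_classes):
    """Rows of (lower, upper, frequency, relative_frequency) for classes [lower, upper)."""
    counts = [0] * n_classes
    for x in data:
        i = int((x - start) // width)   # index of the class containing x
        if not 0 <= i < n_classes:
            raise ValueError(f"{x} falls outside the classes")
        counts[i] += 1
    n = len(data)
    return [(start + i * width, start + (i + 1) * width, c, c / n)
            for i, c in enumerate(counts)]

scores = [52, 57, 61, 64, 68, 70, 73, 75, 79, 88]
rows = frequency_distribution(scores, start=50, width=10, n_classes=4)
for row in rows:
    print(row)
```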
|
|
Term
|
Definition
A term that refers to how symmetric a distribution is and which may include other information about distinctive visual characteristics of the distribution, for example, that the distribution is bimodal. |
|
|
Term
|
Definition
All digits of an observation placed into a stem-and-leaf plot except the last. If an observation was 589, the stem would be 58. |
|
|
Term
|
Definition
The inclination of a data set to either cluster around a value on the number line, to have an easily located "half-way point" or to find a visual balance point. |
|
|
Term
|
Definition
A "column" chart of a quantitative variable which has classes along the X-axis and frequencies along the Y-axis. The data are represented by columns that have the same width, which is the class interval, and the height corresponding to the frequency. |
|
|
Term
|
Definition
A data set that is not symmetric, specifically, one that has at least one unusually small observation which draws the mean down to a value lower than the median. Also referred to as "negatively skewed." Such data would have a negative coefficient of skewness. |
|
|
Term
|
Definition
The last digit of an observation placed into a stem-and-leaf plot. If an observation was 589, the leaf would be 9. |
|
|
Term
|
Definition
A "line" chart of a quantitative variable which is created by plotting upper class limits and the cumulative frequencies associated with each class. Cumulative frequencies are on the Y-axis and class limits are along the X-axis. By definition, such a chart begins at a cumulative frequency of zero at the lower limit of the first class. |
|
|
Term
Cumulative Relative Frequency |
|
Definition
As one moves through a table from top to bottom, a cumulative relative frequency is the relative frequency of the current class plus the sum of the relative frequencies of all previous classes. |
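A minimal sketch computing cumulative relative frequencies from a table's class frequencies. It accumulates the absolute frequencies first (the cumulative frequencies) and divides by the total once, which gives the same values as adding relative frequencies while keeping the last value exactly 1.0; the function name is hypothetical.

```python
from itertools import accumulate

def cumulative_relative_frequencies(frequencies):
    """Running totals from the top class down, expressed as proportions of n."""
    total = sum(frequencies)
    cumulative = list(accumulate(frequencies))   # cumulative (absolute) frequencies
    return [c / total for c in cumulative]

print(cumulative_relative_frequencies([2, 3, 4, 1]))  # [0.2, 0.5, 0.9, 1.0]
```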
|
|
Term
|
Definition
A count or similar value which represents how often a specific event or value occurs. |
|
|
Term
|
Definition
A chart of categorical data in which each category is represented by a wedge of a circle representing the entire data set. The size of the wedge is proportional to the relative frequency of the items in a particular category. |
|
|
Term
|
Definition
A specialized "column" chart of categorical data which has categories arranged on the X-axis from highest frequency on the left to lowest frequency on the right. |
|
|
Term
|
Definition
As one moves through a table from top to bottom, a cumulative frequency is the absolute frequency of the current class plus the sum of the absolute frequencies of all previous classes. |
|
|
Term
|
Definition
A numerical category created by specifying an interval along a number line, such as the interval from 0 to 1, or from 5 to 10. |
|
|
Term
|
Definition
The end points of a class which specify exactly which values fit into the class. |
|
|
Term
|
Definition
A graphical representation of paired data, with one variable on the X-axis and the other on the Y-axis. The specific xi and yi values for a particular individual form the Cartesian coordinates for one point on the graph. |
|
|
Term
|
Definition
A complete description of a variable, achieved by plotting a graph of its values, by stating a mathematical function which describes the variable, or by stating a measure of center, dispersion and shape of the variable. |
|
|
Term
|
Definition
The midpoint of the class, the mean of the class limits. |
|
|
Term
|
Definition
A graphical representation of categorical data, in which each column or bar represents a specific category and its height (column) or length (bar) represents the frequency of data which fall into that category. (Columns are "vertical bars.") Such frequencies can be absolute or relative. The columns or bars are the same width, but are not so wide as to touch the column or bar of the category which follows; thus, there is space along the axis between adjacent bars or columns. |
|
|
Term
|
Definition
The distance from the class's lower limit to its upper limit. |
|
|
Term
|
Definition
A data set that is not symmetric, specifically, one that has at least one unusually large observation which draws the mean up to a value higher than the median. Also referred to as "positively skewed." Such data would have a positive coefficient of skewness. |
|
|
Term
|
Definition
A distribution with two modal values. In general, it refers to a graphical representation of a data set which has two "local" peaks, which do not have to be of the same height but must be higher than the surrounding area. |
|
|
Term
|
Definition
The class in a frequency distribution which contains the median value, often determined as that class for which the cumulative relative frequency crosses 0.5. |
|
|
Term
|
Definition
Specifically, the difference between the value of an observation and the mean of the observation's data set, (xi - µ). "Deviation" generally refers to the difference between an observation and some central value. |
|
|
Term
|
Definition
Data which has been put into a frequency distribution but for which the actual observed values are not available. |
|
|
Term
|
Definition
The arithmetic average value of a population. |
|
|
Term
|
Definition
The physical midpoint of an ordered array. |
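A minimal sketch of finding the median from an ordered array. When the number of observations is even there are two middle values; the sketch follows the usual convention of averaging them, as Python's `statistics.median` also does.

```python
def median(data):
    """Middle value of the ordered array; mean of the two middle values when n is even."""
    ordered = sorted(data)          # the ordered array
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([7, 1, 3]))      # 3
print(median([4, 1, 3, 2]))   # 2.5
```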
|
|
Term
|
Definition
A statistic is said to be resistant if it is not influenced by extreme values. See sensitive. |
|
|
Term
|
Definition
A statistic is said to be sensitive if it is influenced by extreme values. See resistant. |
|
|
Term
|
Definition
The most frequently occurring value in a data set, of which there may be more than one. |
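A minimal sketch that returns every modal value, since a data set may have more than one. When every value occurs once it returns all of them; some texts say such a data set has no mode instead. Python 3.8+ also offers `statistics.multimode`, which returns the modes in the order they are first encountered.

```python
from collections import Counter

def modes(data):
    """All values tied for the highest frequency, in ascending order."""
    counts = Counter(data)
    top = max(counts.values())
    return sorted(value for value, count in counts.items() if count == top)

print(modes([1, 2, 2, 3, 3, 4]))  # [2, 3]
```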
|
|
Term
|
Definition
A distribution with more than one modal value. In general it refers to a graphical representation of a data set which has more than one "local" peak; the peaks do not have to be of the same height but must be higher than the surrounding area. |
|
|
Term
|
Definition
A method of calculating the mean of a data set, where each particular value in the set is "weighted" by how often it appears in the data set. This method is particularly useful for data sets that have many repeated values. The result is identical to the arithmetic average of the data set. |
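A minimal sketch of the frequency-weighted mean with made-up household-size data. Expanding the data back into individual observations and averaging them gives the same result (27 / 12 = 2.25), as the definition states.

```python
def weighted_mean(values, weights):
    """Mean of values where each value is weighted by its frequency."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Hypothetical data: household sizes 1-4 observed 3, 5, 2 and 2 times
sizes = [1, 2, 3, 4]
freq = [3, 5, 2, 2]
expanded = [v for v, w in zip(sizes, freq) for _ in range(w)]  # one entry per observation
print(weighted_mean(sizes, freq), sum(expanded) / len(expanded))  # 2.25 2.25
```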
|
|
Term
|
Definition
The arithmetic average of a sample. |
|
|
Term
|
Definition
The class in a frequency distribution which has the highest frequency. |
|
|
Term
|
Definition
The average squared distance between all observations in a data set and their mean; a measure of dispersion. For a sample, the sum of squared deviations is usually divided by n - 1 rather than n. |
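A minimal sketch of the variance and standard deviation, with a `sample` flag (a hypothetical parameter) choosing between dividing by n (population) and n - 1 (sample). For the made-up data below the mean is 5 and the squared deviations sum to 32, so the population variance is 4 and the population standard deviation is 2.

```python
import math

def variance(data, sample=True):
    """Sum of squared deviations from the mean, divided by n - 1 (sample) or n (population)."""
    n = len(data)
    mean = sum(data) / n
    ss = sum((x - mean) ** 2 for x in data)   # sum of squared deviations
    return ss / (n - 1) if sample else ss / n

def standard_deviation(data, sample=True):
    return math.sqrt(variance(data, sample))   # square root of the variance

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(variance(data, sample=False), standard_deviation(data, sample=False))  # 4.0 2.0
```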
|
|
Term
|
Definition
The value in a data set below which fall a specified percentage of the observations in a data set. |
|
|
Term
|
Definition
The sum of the absolute values of the deviations from the mean in a data set, divided by the number of deviations; a measure of dispersion. |
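A minimal sketch of the mean absolute deviation on made-up data: for [2, 4, 4, 4, 5, 5, 7, 9] the absolute deviations from the mean of 5 sum to 12, so the result is 12 / 8 = 1.5.

```python
def mean_absolute_deviation(data):
    """Average of |x - mean| over all observations."""
    mean = sum(data) / len(data)
    return sum(abs(x - mean) for x in data) / len(data)

print(mean_absolute_deviation([2, 4, 4, 4, 5, 5, 7, 9]))  # 1.5
```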
|
|
Term
|
Definition
The difference between the first and third quartiles of a data set, specifically, Q3 - Q1; a resistant measure of dispersion. |
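A minimal sketch using Python's `statistics.quantiles` (3.8+). Quartile conventions differ between textbooks and software, so hand-computed Q1 and Q3 may not match exactly; the default `method="exclusive"` is used here, and `method="inclusive"` is the other option. Replacing the largest value with an extreme one leaves the result unchanged, illustrating that the measure is resistant.

```python
import statistics

def interquartile_range(data):
    """Q3 - Q1, using statistics.quantiles with its default 'exclusive' method."""
    q1, _median, q3 = statistics.quantiles(data, n=4)
    return q3 - q1

print(interquartile_range([1, 2, 3, 4, 5, 6, 7, 8]))    # 4.5
print(interquartile_range([1, 2, 3, 4, 5, 6, 7, 800]))  # 4.5: the extreme value has no effect
```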
|
|
Term
|
Definition
The values below which fall 25, 50 and 75 percent of the observations in a data set. See percentiles. |
|
|
Term
|
Definition
The difference between the maximum and minimum values of a data set, a measure of dispersion. |
|
|
Term
|
Definition
Roughly, the typical distance between the observations in a data set and their mean; formally, the square root of the variance; a measure of dispersion. |
|
|
Term
|
Definition
A unit-free measure of dispersion which expresses the standard deviation of a data set as a percentage of the mean of the data set. |
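A minimal sketch, assuming the sample standard deviation and a nonzero mean. Multiplying every value by 10 changes the units but not the result, which is what "unit-free" means here.

```python
import statistics

def coefficient_of_variation(data):
    """Sample standard deviation as a percentage of the mean (mean must not be zero)."""
    return statistics.stdev(data) / statistics.mean(data) * 100

print(coefficient_of_variation([2, 4, 6]))     # 50.0
print(coefficient_of_variation([20, 40, 60]))  # 50.0: same data in different units
```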
|
|
Term
|
Definition
The same, possessing the same qualities. |
|
|
Term
|
Definition
A data set that is not symmetric, specifically, one that has at least one unusually small observation which draws the mean down to a value lower than the median. Also referred to as "left-skewed." Such data would have a negative coefficient of skewness. |
|
|
Term
|
Definition
A data set that is not symmetric, specifically, one that has at least one unusually large observation which draws the mean up to a value higher than the median. Also referred to as "right-skewed." Such data would have a positive coefficient of skewness. |
|
|
Term
|
Definition
The theorem provides a method for predicting the minimum percent of values that will fall within plus and minus a selected number of standard deviations from the mean. The number of standard deviations must be greater than 1. The theorem applies to any data set for which a mean and a standard deviation can be calculated. Sometimes spelled "Tchebycheff" or "Chebychev." |
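Chebyshev's theorem is usually stated as: at least 1 - 1/k² of the values lie within k standard deviations of the mean, for any k > 1 (so at least 75% for k = 2 and about 88.9% for k = 3). A minimal sketch of that bound, plus a hypothetical helper `share_within` that checks it against a made-up data set using the population standard deviation:

```python
def chebyshev_min_fraction(k):
    """Lower bound on the share of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k ** 2

def share_within(data, k):
    """Hypothetical helper: actual share of values within k population SDs of the mean."""
    n = len(data)
    mean = sum(data) / n
    sd = (sum((x - mean) ** 2 for x in data) / n) ** 0.5
    return sum(abs(x - mean) <= k * sd for x in data) / n

data = [10, 11] * 10 + [40]
for k in (1.5, 2, 3):
    print(k, chebyshev_min_fraction(k), share_within(data, k))
```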
|
|
Term
|
Definition
A method for approximating broad probabilities for bell-shaped, symmetric distributions. Same as Normal Rule. |
|
|
Term
|
Definition
A method for approximating broad probabilities for bell-shaped, symmetric distributions. Same as Empirical Rule. |
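The 68-95-99.7 figures usually quoted for this rule come from the normal curve. A minimal sketch using Python's `statistics.NormalDist` (3.8+) to compute the share of a normal distribution lying within 1, 2 and 3 standard deviations of the mean. These shares are larger than Chebyshev's guaranteed minimums (75% for k = 2, about 88.9% for k = 3) because a bell shape is assumed.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)
shares = {k: standard_normal.cdf(k) - standard_normal.cdf(-k) for k in (1, 2, 3)}
for k, share in shares.items():
    print(f"within {k} SD of the mean: {share:.4f}")
```

The printed shares are roughly 0.6827, 0.9545 and 0.9973.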
|
|
Term
|
Definition
Different, possessing different qualities. |
|
|
Term
|
Definition
The rate of change of a line, specifically the change in Y for a unit change in X. |
|
|
Term
|
Definition
The line which results from minimizing the sum of squared error when error is defined as the vertical distance between an observed yi value and the predicted y'i value at the same xi. |
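A minimal sketch of the least squares line y' = b0 + b1·x using the standard closed-form solution, slope = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² and intercept = ȳ - slope·x̄; the data are made up. For these points the slope is 0.6 and the intercept 2.2. On Python 3.10+, `statistics.linear_regression` returns the same slope and intercept.

```python
def least_squares_line(xs, ys):
    """Slope and intercept minimizing the sum of squared vertical errors."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx                   # change in y' per unit change in x
    intercept = y_bar - slope * x_bar   # the line passes through (x_bar, y_bar)
    return slope, intercept

def sse(xs, ys, slope, intercept):
    """Sum of squared errors for a candidate line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
slope, intercept = least_squares_line(xs, ys)
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
print(slope, intercept)
```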
|
|
Term
|
Definition
A unit-free measure of the strength of a possible linear relationship between a pair of variables. |
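A minimal sketch of the Pearson correlation coefficient r, computed as the sample covariance divided by the product of the two sample standard deviations; the data are made up. Rescaling y (here a hypothetical change from dollars to cents) multiplies the covariance by 100 but leaves r unchanged, which is why r is unit-free. Python 3.10+ also provides `statistics.correlation` and `statistics.covariance`.

```python
import math

def covariance(xs, ys):
    """Sample covariance: sum of cross-products of deviations divided by n - 1."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)

def correlation(xs, ys):
    """Pearson r: covariance scaled by both standard deviations, so the units cancel."""
    sx = math.sqrt(covariance(xs, xs))
    sy = math.sqrt(covariance(ys, ys))
    return covariance(xs, ys) / (sx * sy)

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
ys_in_cents = [100 * y for y in ys]   # same data, different units
print(covariance(xs, ys), covariance(xs, ys_in_cents))
print(correlation(xs, ys), correlation(xs, ys_in_cents))
```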
|
|
Term
|
Definition
A relationship between two variables where one variable (the causal, independent, regressor, predictor or explanatory variable) directly produces an effect on the other variable (the response, dependent, predicted or explained variable.) |
|
|
Term
|
Definition
A measure of the strength of a possible linear relationship between a pair of variables. |
|
|
Term
|
Definition
Left over; estimated error. |
|
|
Term
|
Definition
False, counterfeit or artificial. |
|
|
Term
|
Definition
The value on the Y-axis where a given line crosses the axis. |
|
|
Term
|
Definition
The variable in a causal relationship that is affected by the other variable or variables. |
|
|
Term
|
Definition
Describing or pertaining to a line; a function of variables that does not exceed the first degree. |
|
|
Term
|
Definition
The variable or variables in a causal relationship that affect the dependent variable. |
|
|
Term
|
Definition
A subset of outcomes of interest, for example, if rolling a die, getting an even number is a possible event, which consists of the subset 2, 4, 6. |
|
|
Term
|
Definition
A mathematical statement of the relative certainty or uncertainty that an event will occur. |
|
|
Term
|
Definition
The probability that at least one of two events occurs can be calculated by summing the probabilities of the individual events and subtracting the probability of their intersection (overlap): P(A or B) = P(A) + P(B) - P(A & B). |
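A minimal sketch checking the law on one roll of a fair die, assuming equally likely outcomes (the classical approach): for A = "even" = {2, 4, 6} and B = "greater than 3" = {4, 5, 6}, P(A) + P(B) - P(A & B) = 3/6 + 3/6 - 2/6 = 2/3, which matches counting the union {2, 4, 5, 6} directly. The helper `p` is hypothetical.

```python
from fractions import Fraction

SAMPLE_SPACE = frozenset(range(1, 7))   # one roll of a fair die

def p(event):
    """Classical probability: favorable outcomes / equally likely outcomes."""
    return Fraction(len(event & SAMPLE_SPACE), len(SAMPLE_SPACE))

even = {2, 4, 6}
greater_than_3 = {4, 5, 6}
by_law = p(even) + p(greater_than_3) - p(even & greater_than_3)   # subtract the overlap {4, 6}
by_counting = p(even | greater_than_3)                              # count {2, 4, 5, 6} directly
print(by_law, by_counting)  # 2/3 2/3
```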
|
|
Term
|
Definition
When outcomes are mutually exclusive, the probability that any one of them occurs is the sum of their individual probabilities: P(A or B) = P(A) + P(B). |
|
|
Term
Classical Probability Approach |
|
Definition
The approach to probability that derives probabilities of events using theory or mathematics. |
|
|
Term
General Law of Multiplication |
|
Definition
The joint probability of two events can be calculated by multiplying the probability of one given that the other has occurred by the probability of the other event: P(A & B) = P(A|B)*P(B). For example, if a restaurant notes that 70% of its customers use mustard (P(B)) and that 55% of those who use mustard also use ketchup (P(A|B)), the probability that a customer uses both mustard and ketchup (P(A & B)) is 55%*70% = 38.5%. |
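The restaurant example can be checked with exact fractions; this minimal Python sketch is only the arithmetic from the definition, with illustrative variable names.

```python
from fractions import Fraction

p_mustard = Fraction(70, 100)                  # P(B)
p_ketchup_given_mustard = Fraction(55, 100)    # P(A|B)
p_both = p_ketchup_given_mustard * p_mustard   # P(A & B) = P(A|B)*P(B)
print(p_both, float(p_both))  # 77/200 0.385
```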
|
|
Term
|
Definition
The likelihood of an event given that another event has happened or is true. |
|
|
Term
|
Definition
The likelihood of a single event occurring, a marginal probability. |
|
|
Term
|
Definition
Describes two events when the fact of the occurrence of one has no effect on the probability that the other will occur. |
|
|
Term
|
Definition
The probability that two or more events happen simultaneously or occur in the same subject. |
|
|
Term
Special Law of Multiplication (Independent Events) |
|
Definition
When two events are independent, the general law reduces to this special law of multiplication: the joint probability of two independent events is the product of their simple probabilities, P(A & B) = P(A)*P(B). |
|
|
Term
|
Definition
A cross-tabulation of two variables measured from the same set of individuals, rather like a merging of two frequency distributions. Categories or classes are designated horizontally for one variable and vertically for the other, creating "cells" which represent one horizontal category and one vertical category jointly. The observations are sorted and placed in appropriate cells based on their particular combination of categories. |
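A minimal sketch of a contingency table built by counting (row category, column category) pairs with `collections.Counter`; the survey records and the function name are hypothetical. Dividing a cell by the total gives a joint probability, for example P(North & yes) = 2/6.

```python
from collections import Counter

def contingency_table(pairs):
    """Cross-tabulate (row_category, column_category) pairs into a grid of cell counts."""
    cells = Counter(pairs)                     # count of each combination of categories
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    return rows, cols, [[cells[(r, c)] for c in cols] for r in rows]

# Hypothetical survey records: (region, answer)
survey = [("North", "yes"), ("South", "no"), ("North", "yes"),
          ("North", "no"), ("South", "yes"), ("South", "no")]
rows, cols, table = contingency_table(survey)
print("", *cols, sep="\t")
for label, counts in zip(rows, table):
    print(label, *counts, sep="\t")
```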
|
|
Term
|
Definition
A unique group of observations drawn from a larger set. The order of the observation values is irrelevant; thus, the same observations drawn in a different order would not be a new combination. |
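A minimal sketch contrasting combinations (order ignored) with permutations (order matters) using Python's `itertools` and `math.comb` (3.8+); choosing 2 of 4 items gives 6 combinations but 12 ordered arrangements.

```python
import math
from itertools import combinations, permutations

items = ["A", "B", "C", "D"]
combos = list(combinations(items, 2))   # ("A", "B") appears once; ("B", "A") is not a new combination
perms = list(permutations(items, 2))    # ordered: both ("A", "B") and ("B", "A") appear
print(len(combos), math.comb(4, 2), len(perms))  # 6 6 12
```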
|
|
Term
|
Definition
The list of all possible results of an experiment which depends on chance to determine outcomes. |
|
|
Term
|
Definition
To complete, or that which completes. In terms of probability the complement is the subset of all outcomes which are NOT part of an event which has been defined as being of interest. For example, if when rolling a die, the event of interest is getting an even number and consists of outcomes 2, 4 and 6, the complement is the subset of the odd numbers, 1, 3 and 5. |
|
|
Term
|
Definition
Outcomes which have no effect on another outcome occurring. |
|
|
Term
|
Definition
Describes two events when the fact of the occurrence of one affects the probability that the other will occur. |
|
|
Term
|
Definition
The likelihood of a single event occurring, a simple probability. |
|
|
Term
|
Definition
The subset of outcomes which belong to more than one event. |
|
|
Term
|
Definition
Classical or empirical probabilities, those which do not depend on the opinion of a person or persons. |
|
|
Term
|
Definition
A test or trial in which at least some of the conditions are under the control of the observer, conducted to gather evidence of behavior or specific results. |
|
|
Term
|
Definition
Probabilities that are derived at least partly based on the opinion of an expert or experts. |
|
|
Term
Empirical Probability Approach |
|
Definition
The approach to probability that derives probabilities of events from experiment or observations of real events. |
|
|