Term
|
Definition
Using mathematical models of uncertainty to answer questions (connecting probability concepts to our data) |
|
|
Term
Exploratory Data Analysis |
|
Definition
Analysis that relies heavily on plotting data and looking for patterns that suggest interesting conclusions or questions for further study; however, it can rarely provide convincing evidence for its conclusions on its own. |
|
|
Term
|
Definition
Data taken by simply drawing conclusions from our own experience, making no use of more broadly representative data (e.g., a magazine article claiming men need pilates more than women). It is not a sound basis for drawing conclusions. |
|
|
Term
|
Definition
Based on haphazardly selected individual cases, which often come to our attention because they are striking in some way. These cases need not be representative of any larger group of cases. |
|
|
Term
|
Definition
Data that were produced in the past for some other purpose but that may help answer a present question. |
|
|
Term
|
Definition
The usual tool for answering questions such as, "How have the attitudes of Americans on issues ranging from abortion to work changed over time?" |
|
|
Term
|
Definition
A SAMPLE group is used to study a part in order to gain information about the whole POPULATION. |
|
|
Term
|
Definition
An attempt to contact every individual in the entire population (as opposed to taking data from a sample, which is much more time-efficient). |
|
|
Term
|
Definition
A study in which we observe individuals and measure variables of interest but do not attempt to influence the responses. Even when based on a statistical sample, it is a poor way to determine what will happen if we change something. |
|
|
Term
|
Definition
A means of creating data by deliberately imposing some treatment on individuals and observing their responses. In principle, experiments can give good evidence for causation. |
|
|
Term
|
Definition
The best way to see the effects of a change--where we actually impose the change in an experiment. |
|
|
Term
|
Definition
Mixed up with. E.g., we say that the effect of child care on behavior is confounded with other characteristics of families who use child care. |
|
|
Term
|
Definition
Answers specific questions with a known degree of confidence. Based on statistical techniques for producing data. |
|
|
Term
Experimental Units/Subjects/Treatment |
|
Definition
The individuals on which the experiment is done are the EXPERIMENTAL UNITS. When the units are human beings, they are called SUBJECTS. A specific experimental condition applied to the units is called a treatment. |
|
|
Term
|
Definition
The explanatory variables in an experiment. Many experiments study the joint effects of several factors. In such an experiment, each treatment is formed by combining a specific value (often called a LEVEL) of each of the factors. |
|
|
Term
|
Definition
The response to a dummy treatment. |
|
|
Term
|
Definition
The group of subjects who receive a sham treatment. It enables us to control the effects of outside variables on the outcome. Comparison of several treatments in the same environment is the simplest form of control. |
|
|
Term
|
Definition
When a study systematically favors certain outcomes. |
|
|
Term
|
Definition
The use of chance to divide experimental units into groups. |
|
|
Term
Principles of Experimental Design |
|
Definition
1) COMPARE two or more treatments. This will control the effects of lurking variables on the response. 2) RANDOMIZE--use impersonal chance to assign experimental units to treatments. 3) REPEAT each treatment on many units to reduce chance variation in the results. |
|
|
Term
|
Definition
An observed effect so large that it would rarely occur by chance is called STATISTICALLY SIGNIFICANT. |
|
|
Term
|
Definition
A way to randomize without software. It is a list of the digits 0,1,2,3,4,5,6,7,8,9 in which: 1) The digit in any position on the list has the same chance of being any one of 0,1,2,3,4,5,6,7,8,9. 2) The digits in different positions are independent in the sense that the value of one has no influence on the value of any other. |
|
|
Term
Completely Randomized Design |
|
Definition
When all experimental units are allocated at random among all treatments. Completely randomized designs can compare any number of treatments. The treatments can be formed by levels of a single factor or by more than one factor. |
|
|
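A minimal sketch of a completely randomized allocation, assuming we simply shuffle the units and deal them out to the treatments in turn. The function name `completely_randomized`, the subject labels, and the treatment names are hypothetical, chosen only for illustration.

```python
import random

def completely_randomized(units, treatments, seed=None):
    """Allocate every unit at random among the treatments, as evenly as possible."""
    rng = random.Random(seed)      # a seed makes the allocation reproducible
    shuffled = list(units)
    rng.shuffle(shuffled)          # chance, not the experimenter, decides the order
    groups = {t: [] for t in treatments}
    for i, unit in enumerate(shuffled):
        # Deal the shuffled units out to the treatments in rotation.
        groups[treatments[i % len(treatments)]].append(unit)
    return groups

# Hypothetical example: 12 subjects, 2 treatments.
subjects = [f"subject_{i}" for i in range(1, 13)]
assignment = completely_randomized(subjects, ["drug", "placebo"], seed=1)
for treatment, members in assignment.items():
    print(treatment, members)
```

Because every ordering of the shuffled list is equally likely, no unit is more likely than another to land in a given treatment group.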
Term
|
Definition
When neither the subjects themselves nor the medical personnel who worked with them knew which treatment any subject had received. It avoids unconscious bias by, for example, a doctor who doesn't think that "just a placebo" can benefit a patient. |
|
|
Term
|
Definition
When the subjects or treatments or setting of an experiment may not realistically duplicate the conditions we really want to study (this is a serious potential weakness of experiments). |
|
|
Term
|
Definition
Compares just two treatments. The subjects are matched in pairs. For example, an experiment to compare two advertisements for the same product might use pairs of subjects with the same age, sex, and income. The idea is that matched subjects are more similar than unmatched, so comparing responses within a number of pairs is more efficient than comparing the responses of groups of randomly assigned subjects. |
|
|
Term
|
Definition
A BLOCK is a group of experimental units or subjects that are known before the experiment to be similar in some way that is expected to affect the response to the treatments. In a BLOCK DESIGN, the random assignment of units to treatments is carried out separately within each block. |
|
|
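As a rough illustration of a block design, the sketch below randomizes separately inside each block. The helper `block_randomize`, the block names, and the unit labels are assumptions made up for this example.

```python
import random

def block_randomize(blocks, treatments, seed=None):
    """Randomly assign units to treatments separately within each block."""
    rng = random.Random(seed)
    assignment = {}
    for block_name, units in blocks.items():
        shuffled = list(units)
        rng.shuffle(shuffled)      # randomization happens inside the block only
        for i, unit in enumerate(shuffled):
            assignment[unit] = (block_name, treatments[i % len(treatments)])
    return assignment

# Hypothetical blocks: units known in advance to be similar within each block.
blocks = {
    "women": ["w1", "w2", "w3", "w4"],
    "men": ["m1", "m2", "m3", "m4"],
}
result = block_randomize(blocks, ["A", "B"], seed=0)
for unit, (block, treatment) in sorted(result.items()):
    print(unit, block, treatment)
```

Each block ends up with the same number of units on each treatment, so differences between blocks cannot be mistaken for differences between treatments.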
Term
|
Definition
The proportion of the original sample who actually provide usable data. |
|
|
Term
Voluntary Response Sample |
|
Definition
Consists of people who choose themselves by responding to a general appeal. Voluntary response samples are biased because people with strong opinions, especially negative opinions, are most likely to respond. |
|
|
Term
Simple Random Sample (SRS) |
|
Definition
An SRS of size n consists of n individuals from the population chosen in such a way that every set of n individuals has an equal chance to be the sample actually selected. |
|
|
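A short sketch of drawing an SRS in software, assuming the population can be listed and labeled 1 to 100 (the labels and the seed are illustrative only). Python's `random.sample` draws without replacement, so every set of n labels is equally likely to be chosen.

```python
import random

# Hypothetical population: individuals labeled 1 through 100.
population = list(range(1, 101))

rng = random.Random(42)              # seeded only so the example is reproducible
srs = rng.sample(population, 10)     # SRS of size n = 10, without replacement
print(sorted(srs))
```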
Term
|
Definition
A sample chosen by chance. We must know what samples are possible and what chance, or probability, each possible sample has. |
|
|
Term
|
Definition
To select a STRATIFIED RANDOM SAMPLE, first divide the population into groups of similar individuals, called STRATA. Then choose a separate SRS in each stratum and combine these SRSs to form the full sample. |
|
|
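The two steps in this definition (an SRS inside each stratum, then combine) can be sketched as follows. The function `stratified_sample`, the strata, and the per-stratum sample sizes are hypothetical.

```python
import random

def stratified_sample(strata, sizes, seed=None):
    """Draw a separate SRS from each stratum and combine them into one sample."""
    rng = random.Random(seed)
    sample = []
    for name, members in strata.items():
        # SRS within this stratum only; sizes[name] is how many to draw from it.
        sample.extend(rng.sample(members, sizes[name]))
    return sample

# Hypothetical strata: students grouped by class year.
strata = {
    "freshmen": [f"fr_{i}" for i in range(1, 41)],
    "seniors": [f"sr_{i}" for i in range(1, 21)],
}
sizes = {"freshmen": 4, "seniors": 2}
full_sample = stratified_sample(strata, sizes, seed=7)
print(full_sample)
```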
Term
|
Definition
Select successively smaller groups within the population in stages, resulting in a sample consisting of clusters of individuals. Each stage may employ an SRS, a stratified sample, or another type of sample. |
|
|
Term
Undercoverage; Nonresponse |
|
Definition
Undercoverage occurs when some groups in the population are left out of the process of choosing the sample.
Nonresponse occurs when an individual chosen for the sample can't be contacted or does not cooperate. |
|
|
Term
|
Definition
The behavior of the respondent or the interviewer can cause bias in sample results. Respondents may lie, especially if asked about illegal or unpopular behavior. The race or sex of the interviewer can influence responses to questions about race relations or attitudes toward feminism. Answers to questions that ask respondents to recall past events are often inaccurate because of faulty memory. |
|
|
Term
|
Definition
The most important influence on the answers given to a sample survey. Confusing or leading questions can introduce strong bias. |
|
|
Term
|
Definition
Any characteristic that takes on different values for different individuals |
|
|
Term
|
Definition
place an individual into one of several groups
Generally shown with pie charts and bar plots
• Examples: gender, race, preferred candidates |
|
|
Term
|
Definition
take on numerical values that are usually considered as continuous
• Examples: height, weight, wages |
|
|
Term
|
Definition
describes what values a variable takes and how frequently these values occur.
• The distribution of a variable can be described graphically and numerically in terms of: Center, Spread, Shape, and Outliers. |
|
|
Term
|
Definition
Where are most of the values located? |
|
|
Term
|
Definition
How variable are the values? |
|
|
Term
|
Definition
Is the distribution symmetric or skewed? Are there multiple peaks or just one? |
|
|
Term
|
Definition
Are there certain values that seem surprisingly large or small? |
|
|
Term
Box Plots -Box -Median -Whiskers - Outliers |
|
Definition
• Box plots are an effective tool for conveying information about continuous variables
• Box contains the central 50% of the data, with a line indicating the median
• Median is the value with 50% of data on either side
• Whiskers contain most of the rest of the data, except for suspected outliers
• Outliers are suspiciously large or small values
• Useful for displaying center and spread of a distribution, as well as potential outliers
• However, a boxplot doesn’t really give us much of an idea of the shape of the distribution |
|
|
Term
|
Definition
• Histograms emphasize frequency of different values in the distribution
• X-axis: Values are divided into bins • Y-axis: Height of each bin is the frequency that values from that bin appear in dataset |
|
|
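To make the binning concrete, here is a small sketch that counts how many values fall in each bin, which is what the height of each histogram bar represents. The function `histogram_counts`, the bin width, and the height data are made up for illustration.

```python
from collections import Counter

def histogram_counts(values, bin_width, start):
    """Count how many values fall into each bin [left, left + bin_width)."""
    counts = Counter()
    for v in values:
        index = int((v - start) // bin_width)     # which bin the value falls in
        counts[start + index * bin_width] += 1    # key = left edge of that bin
    return dict(sorted(counts.items()))

# Hypothetical heights in inches.
heights = [61, 62, 64, 65, 65, 66, 67, 68, 70, 74]
counts = histogram_counts(heights, bin_width=5, start=60)

# Crude text histogram: one '#' per observation in the bin.
for left_edge, frequency in counts.items():
    print(f"{left_edge}-{left_edge + 5}: {'#' * frequency}")
```

Changing `bin_width` changes the apparent shape, so it is worth trying more than one width before reading much into peaks or gaps.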
Term
|
Definition
• Both graphs give a good idea of the spread • Boxplots may be a little clearer in terms of the center and outliers in a distribution
• Histograms are much more effective at displaying the shape of a distribution: skewness and multi-modality |
|
|
Term
|
Definition
presence of multiple high frequency values (shape of distribution) |
|
|
Term
|
Definition
departure from left-right symmetry (shape of distribution) |
|
|
Term
|
Definition
Mean/ Median
Mean can be affected by large outliers and asymmetry more than the median
Trimming of mean can make it more resistant to outliers. Generally trim by 5% on either side but can be more
Median is essentially trimming all but middle value. • Median is often described as a more robust or resistant measure of the center |
|
|
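A small sketch comparing the mean, a trimmed mean, and the median on data with one large outlier. The `trimmed_mean` helper is a hypothetical implementation that drops the stated proportion of observations from each end; the data are invented.

```python
import statistics

def trimmed_mean(values, proportion=0.05):
    """Mean after dropping the given proportion of values from each end."""
    ordered = sorted(values)
    k = int(len(ordered) * proportion)           # number to drop on each side
    kept = ordered[k:len(ordered) - k] if k > 0 else ordered
    return statistics.mean(kept)

# Hypothetical data with a single large outlier (100).
data = [2, 3, 3, 4, 4, 4, 5, 5, 6, 100]

plain_mean = statistics.mean(data)               # pulled up by the outlier
trimmed = trimmed_mean(data, proportion=0.10)    # drops 2 and 100
middle = statistics.median(data)                 # barely notices the outlier

print(plain_mean, trimmed, middle)
```

Here the single value 100 drags the mean well above every other observation, while the trimmed mean and the median stay near the bulk of the data.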
Term
|
Definition
• Symmetric Distributions: Mean ≈ Median (approx. equal)
• Skewed to the Left: Mean < Median; Mean pulled down by small values
• Skewed to the Right: Mean > Median; Mean pulled up by large values |
|
|
Term
|
Definition
The average of the squared deviations of each observation from the mean:
[image] |
|
|
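For reference, one common way to write this definition is the sample version below. The divisor n - 1 is an assumption here; some courses divide by n instead, so check which convention your text uses.

```latex
s^2 = \frac{1}{n-1} \sum_{i=1}^{n} \left( x_i - \bar{x} \right)^2,
\qquad
\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i
```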
Term
|
Definition
Measure of spread. We quantify how far each observation is from the center: [image]
• Standard Deviation is also an average (like the mean), so it is sensitive to outliers |
|
|
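A quick numerical sketch of the variance and standard deviation using Python's `statistics` module. Which divisor the course intends is an assumption: `variance`/`stdev` divide by n - 1 (sample versions), while `pvariance`/`pstdev` divide by n. The data are invented.

```python
import statistics

# Hypothetical data; the mean is 5 and the squared deviations sum to 32.
data = [2, 4, 4, 4, 5, 5, 7, 9]

sample_var = statistics.variance(data)    # 32 / (n - 1) = 32 / 7
sample_sd = statistics.stdev(data)        # square root of the sample variance
pop_var = statistics.pvariance(data)      # 32 / n = 4
pop_sd = statistics.pstdev(data)          # square root of 4

print(sample_var, sample_sd, pop_var, pop_sd)
```

In both versions the standard deviation is the square root of the variance, which puts it back in the same units as the data.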
Term
|
Definition
Measure of spread. What median does for mean, IQR does for SD. Trims away extreme values. Often used to detect outliers.
• First Quartile (Q1) is the median of the smaller half of the data (bottom 25% point) • Third Quartile (Q3) is the median of the larger half of the data (top 25% point) • Inter-Quartile Range is also a measure of spread: IQR = Q3 - Q1
• Like the median, the Inter-Quartile Range (IQR) is robust or resistant to outliers |
|
|
Term
|
Definition
The IQR is often used to flag outliers, but the rule is an arbitrary convention and doesn't always apply. • An observation X is an outlier if either:
1. X is less than Q1 - 1.5 x IQR 2. X is greater than Q3 + 1.5 x IQR |
|
|
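The quartiles, IQR, and 1.5 x IQR rule can be sketched as below. This assumes the "median of each half" definition of quartiles, leaving the middle value out of both halves when n is odd; other software may use slightly different quartile conventions. The function name `quartiles` and the data are hypothetical.

```python
import statistics

def quartiles(values):
    """Q1 and Q3 as the medians of the lower and upper halves (needs n >= 2)."""
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]       # smaller half (middle value excluded if n is odd)
    upper = ordered[-half:]      # larger half
    return statistics.median(lower), statistics.median(upper)

# Hypothetical data with one suspiciously large value.
data = [1, 3, 4, 5, 6, 7, 8, 9, 30]

q1, q3 = quartiles(data)
iqr = q3 - q1
low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < low_fence or x > high_fence]

print(q1, q3, iqr, low_fence, high_fence, outliers)
```

Values flagged this way are only suspected outliers; the rule is a screening convention, not proof that an observation is wrong.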