Shared Flashcard Set

Details

Statistics 111; Lectures 1-5
Terms pertaining to lectures before first homework; Shane T. Jensen, STAT-111 Fall 2010; Introduction to the Practice of Statistics, Moore, McCabe, Ch. 3 (3.1, 3.2) Ch. 1 (1.1, 1.2)
55
Mathematics
Undergraduate 1
09/26/2010

Cards

Term
Inference
Definition
Using mathematical models of uncertainty to answer questions (connecting probability concepts to our data)
Term
Exploratory Data Analysis
Definition
Analysis that relies heavily on plotting data and looking for patterns that suggest interesting conclusions or questions for further study; however, on its own it can rarely provide convincing evidence for its conclusions.
Term
Anecdotal Data
Definition
Data taken by simply drawing conclusions from our own experience, making no use of more broadly representative data (a magazine article claiming men need pilates more than women, e.g.). It is not a sound basis for drawing conclusions.
Term
Anecdotal Evidence
Definition
Based on haphazardly selected individual cases, which often come to our attention because they are striking in some way. These cases need not be representative of any larger group of cases.
Term
Available Data
Definition
Data that were produced in the past for some other purpose but that may help answer a present question.
Term
Sample Surveys
Definition
The usual tool for answering questions such as, "How have the attitudes of Americans, on issues ranging from abortion to work, changed over time?"
Term
Sample/Population
Definition
A SAMPLE is the part of the POPULATION that we actually study in order to gain information about the whole.
Term
Census
Definition
An attempt to contact every individual in the entire population (as opposed to taking data from a sample, which is much more time-efficient).
Term
Observational Study
Definition
A study in which we observe individuals and measure variables of interest but do not attempt to influence the responses. Even when based on a statistical sample, it is a poor way to determine what will happen if we change something.
Term
Experiment
Definition
A means of creating data by deliberately imposing some treatment on individuals and observing their responses. In principle, experiments can give good evidence for causation.
Term
Intervention
Definition
The best way to see the effects of a change--where we actually impose the change in an experiment.
Term
Confound
Definition
Mix up with. E.g., we say that the effect of child care on behavior is confounded with other characteristics of families who use child care.
Term
Statistical Inference
Definition
Answers specific questions with a known degree of confidence. Based on statistical techniques for producing data.
Term
Experimental Units/Subjects/Treatment
Definition
The individuals on which the experiment is done are the EXPERIMENTAL UNITS. When the units are human beings, they are called SUBJECTS. A specific experimental condition applied to the units is called a treatment.
Term
Factors
Definition
The explanatory variables in an experiment. Many experiments study the joint effects of several factors. In such an experiment, each treatment is formed by combining a specific value (often called a LEVEL) of each of the factors.
Term
Placebo Effect
Definition
The response to a dummy treatment.
Term
Control Group
Definition
The group of subjects who receive a sham treatment. It enables us to control the effects of outside variables on the outcome. Comparison of several treatments in the same environment is the simplest form of control.
Term
Bias
Definition
A study is biased if it systematically favors certain outcomes.
Term
Randomization
Definition
The use of chance to divide experimental units into groups.
Term
Principles of Experimental Design
Definition
1) COMPARE two or more treatments. This will control the effects of lurking variables on the response.
2) RANDOMIZE--use impersonal chance to assign experimental units to treatments.
3) REPEAT each treatment on many units to reduce chance variation in the results.
Term
Statistical Significance
Definition
An observed effect so large that it would rarely occur by chance is called STATISTICALLY SIGNIFICANT.
Term
A Table of Random Digits
Definition
A way to randomize without software. It is a list of the digits 0,1,2,3,4,5,6,7,8,9 in which:
1) The digit in any position on the list has the same chance of being any one of 0,1,2,3,4,5,6,7,8,9
2) The digits in different positions are independent in the sense that the value of one has no influence on the value of any other.
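As a rough illustration of the two properties above, this Python sketch builds one line of such a table. The function name random_digit_line and the grouping into blocks of five are assumptions made for readability; they are not part of the course material.

```python
import random

def random_digit_line(n_digits=40, seed=None):
    """Return n_digits random digits, printed in groups of five."""
    rng = random.Random(seed)
    # Each position is drawn independently, and each of 0-9 is equally likely.
    digits = [str(rng.randrange(10)) for _ in range(n_digits)]
    # The grouping into fives is only for readability, as in printed tables.
    groups = ["".join(digits[i:i + 5]) for i in range(0, n_digits, 5)]
    return " ".join(groups)

line = random_digit_line(40, seed=111)
print(line)
```

Fixing the seed makes the line reproducible, which is convenient when a randomization has to be documented.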
Term
Completely Randomized Design
Definition
A design in which all experimental units are allocated at random among all treatments. Such designs can compare any number of treatments. The treatments can be formed by levels of a single factor or by more than one factor.
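One possible way to carry out a completely randomized design in software is sketched below. The function name, the subject labels, and the two treatments ("drug" and "placebo") are hypothetical and chosen only for illustration.

```python
import random

def completely_randomized(units, treatments, seed=None):
    """Shuffle the units, then deal them out to the treatments in turn."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)  # chance alone decides the order
    groups = {t: [] for t in treatments}
    for i, unit in enumerate(shuffled):
        groups[treatments[i % len(treatments)]].append(unit)
    return groups

subjects = [f"subject_{i}" for i in range(1, 13)]
groups = completely_randomized(subjects, ["drug", "placebo"], seed=111)
for treatment, members in groups.items():
    print(treatment, members)
```

Dealing the shuffled units out in turn keeps the group sizes as equal as possible, which is a design choice of this sketch rather than a requirement of the definition.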
Term
Double-Blind
Definition
When neither the subjects themselves nor the medical personnel who worked with them knew which treatment any subject had received. It avoids unconscious bias by, for example, a doctor who doesn't think that "just a placebo" can benefit a patient.
Term
Lack of Realism
Definition
When the subjects or treatments or setting of an experiment may not realistically duplicate the conditions we really want to study (this is a serious potential weakness of experiments).
Term
Matched Pairs Design
Definition
Compares just two treatments. The subjects are matched in pairs. For example, an experiment to compare two advertisements for the same product might use pairs of subjects with the same age, sex, and income. The idea is that matched subjects are more similar than unmatched subjects, so comparing responses within a number of pairs is more efficient than comparing the responses of groups of randomly assigned subjects.
Term
Block Design
Definition
A BLOCK is a group of experimental units or subjects that are known before the experiment to be similar in some way that is expected to affect the response to the treatments. In a BLOCK DESIGN, the random assignment of units to treatments is carried out separately within each block.
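The idea of randomizing separately within each block can be sketched as follows. The block names ("men", "women"), the unit labels, and the treatments "A" and "B" are made up for this example.

```python
import random

def block_design(units_by_block, treatments, seed=None):
    """Randomly assign units to treatments, one block at a time."""
    rng = random.Random(seed)
    assignment = {}
    for block, units in units_by_block.items():
        shuffled = list(units)
        rng.shuffle(shuffled)  # the shuffle never mixes units across blocks
        for i, unit in enumerate(shuffled):
            assignment[unit] = (block, treatments[i % len(treatments)])
    return assignment

units_by_block = {
    "men": ["m1", "m2", "m3", "m4"],
    "women": ["w1", "w2", "w3", "w4"],
}
assignment = block_design(units_by_block, ["A", "B"], seed=111)
for unit, (block, treatment) in sorted(assignment.items()):
    print(unit, block, treatment)
```

Because each block is shuffled on its own, every treatment appears equally often within each block in this example.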
Term
Response Rate
Definition
The proportion of the original sample who actually provide usable data.
Term
Voluntary Response Sample
Definition
Consists of people who choose themselves by responding to a general appeal. Voluntary response samples are biased because people with strong opinions, especially negative opinions, are most likely to respond.
Term
Simple Random Sample (SRS)
Definition
An SRS of size n consists of n individuals from the population chosen in such a way that every set of n individuals has an equal chance to be the sample actually selected.
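A minimal sketch of drawing an SRS in Python, assuming the population is simply the labels 1 to 100 (a made-up example). The standard library's random.sample draws without replacement, so every set of n labels is equally likely to be chosen.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct individuals so that every set of n is equally likely."""
    rng = random.Random(seed)
    return rng.sample(list(population), n)

population = list(range(1, 101))  # hypothetical labels for 100 individuals
sample = simple_random_sample(population, 5, seed=111)
print(sample)
```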
Term
Probability Sample
Definition
A sample chosen by chance. We must know what samples are possible and what chance, or probability, each possible sample has.
Term
Stratified Random Sample
Definition
To select a STRATIFIED RANDOM SAMPLE, first divide the population into groups of similar individuals, called STRATA. Then choose a separate SRS in each stratum and combine these SRSs to form the full sample.
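The two steps (an SRS within each stratum, then combining) might look like this in Python. The strata names and the sample sizes per stratum are assumptions chosen only to make the example concrete.

```python
import random

def stratified_sample(strata, n_per_stratum, seed=None):
    """Take a separate SRS in each stratum and combine the results."""
    rng = random.Random(seed)
    combined = []
    for name, members in strata.items():
        combined.extend(rng.sample(list(members), n_per_stratum[name]))
    return combined

strata = {
    "freshmen": [f"F{i}" for i in range(1, 41)],
    "seniors": [f"S{i}" for i in range(1, 21)],
}
n_per_stratum = {"freshmen": 4, "seniors": 2}
strat_sample = stratified_sample(strata, n_per_stratum, seed=111)
print(strat_sample)
```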
Term
Multistage Sample
Definition
Select successively smaller groups within the population in stages, resulting in a sample consisting of clusters of individuals. Each stage may employ an SRS, a stratified sample, or another type of sample.
Term
Undercoverage; Nonresponse
Definition
Undercoverage occurs when some groups in the population are left out of the process of choosing the sample.

Nonresponse occurs when an individual chosen for the sample can't be contacted or does not cooperate.
Term
Response Bias
Definition
The behavior of the respondent or the interviewer can cause bias in sample results. Respondents may lie, especially if asked about illegal or unpopular behavior. The race or sex of the interviewer can influence responses to questions about race relations or attitudes toward feminism. Answers to questions that ask respondents to recall past events are often inaccurate because of faulty memory.
Term
Wording of Questions
Definition
The most important influence on the answers given to a sample survey. Confusing or leading questions can introduce strong bias.
Term
Variable
Definition
Any characteristic that takes on different values for different individuals
Term
Categorical variables
Definition
place an individual into one of several groups

Generally shown with pie charts and bar plots

•  Examples: gender, race, preferred candidates
Term
Quantitative variables
Definition
take on numerical values that are usually considered as continuous

•  Examples: height, weight, wages
Term
Distribution
Definition
describes what values a variable takes
and how frequently these values occur.

•  The distribution of a variable can be described
graphically and numerically in terms of:
Center, Spread, Shape, and Outliers.
Term
Center
Definition
Where are most of the values located?
Term
Spread
Definition
How variable are the values?
Term
Shape
Definition
Is the distribution symmetric or skewed? Are there multiple peaks or just one?
Term
Outliers
Definition
Are there certain values that seem surprisingly large or small?
Term
Box Plots
-Box
-Median
-Whiskers
- Outliers
Definition
•  Box plots are an effective tool for conveying
information about continuous variables
•  Box contains the central 50% of the data, with a line indicating the median
•  Median is the value with 50% of data on either side
•  Whiskers contain most of the rest of the data, except for suspected outliers
•  Outliers are suspiciously large or small values

•  Useful for displaying center and spread of a
distribution, as well as potential outliers
•  However, a box plot doesn’t really give us much
of an idea of the shape of the distribution
Term
Histograms
Definition
•  Histograms emphasize frequency of different
values in the distribution

• X-axis: Values are divided into bins
• Y-axis: Height of each bin is the frequency that values from that bin appear in dataset
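To make the binning concrete, here is a small sketch that counts how many values fall into each bin and prints a rough text histogram. The heights, the bin width of 5, and the starting point of 60 are invented for the example.

```python
def histogram_counts(values, bin_width, start):
    """Count values per bin; each bin is labeled by its left edge."""
    counts = {}
    for v in values:
        # Bin [left, left + bin_width) contains v.
        left = start + bin_width * int((v - start) // bin_width)
        counts[left] = counts.get(left, 0) + 1
    return dict(sorted(counts.items()))

heights = [61, 63, 64, 64, 66, 67, 67, 68, 70, 74]
counts = histogram_counts(heights, bin_width=5, start=60)
for left, freq in counts.items():
    print(f"{left}-{left + 5}: " + "*" * freq)
```

The choice of bin width changes how the shape looks, so it is worth trying more than one.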
Term
Histogram vs. Box Plot
Definition
•  Both graphs give a good idea of the spread
•  Boxplots may be a little clearer in terms of the center and outliers in a distribution

Histograms are much more effective at displaying the
shape of a distribution:
•  Skewness
•  Multi-modality
Term
Multi-modality
Definition
presence of multiple high frequency values (shape of distribution)
Term
Skewness
Definition
departure from left-right symmetry (shape of distribution)
Term
Measures of Center
Definition
Mean/ Median

Mean can be affected by large outliers and asymmetry more than the median

Trimming the mean makes it more resistant to outliers. Generally we trim 5% from each side, but it can be more.

•  Median is essentially trimming all but the middle value.
•  Median is often described as a more robust or
resistant measure of the center
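The following sketch compares the mean, the median, and a 5% trimmed mean on a made-up list of wages with one large outlier. The trimmed_mean helper is written for this example; it drops the same number of observations from each end.

```python
import statistics

def trimmed_mean(values, proportion=0.05):
    """Mean after dropping the given proportion of values from each end."""
    ordered = sorted(values)
    k = int(len(ordered) * proportion)  # number dropped from each end
    kept = ordered[k:len(ordered) - k] if k > 0 else ordered
    return statistics.mean(kept)

# Hypothetical wages: 19 ordinary values and one outlier (200).
wages = [10, 11, 12, 12, 13, 13, 14, 14, 15, 16,
         16, 17, 17, 18, 18, 19, 20, 21, 22, 200]

wage_mean = statistics.mean(wages)
wage_median = statistics.median(wages)
wage_trimmed = trimmed_mean(wages, 0.05)
print(wage_mean, wage_median, wage_trimmed)
```

With these numbers the single outlier pulls the mean well above the median, while the trimmed mean stays close to the median.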
Term
Effect of Asymmetry
Definition
•  Symmetric Distributions: Mean ≈ Median (approx. equal)

•  Skewed to the Left: Mean < Median; Mean pulled down by small values

•  Skewed to the Right: Mean > Median; Mean pulled up by large values
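A quick numerical check of these three cases, using small invented data sets (the left-skewed set is just the mirror image of the right-skewed one):

```python
import statistics

symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 2, 2, 3, 3, 3, 4, 20]        # long tail of large values
left_skewed = [21 - x for x in right_skewed]    # mirror image: tail of small values

for name, data in [("symmetric", symmetric),
                   ("right-skewed", right_skewed),
                   ("left-skewed", left_skewed)]:
    print(name, "mean =", statistics.mean(data),
          "median =", statistics.median(data))
```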
Term
Variance
Definition

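The average of the squared deviations of each observation from the mean (written here with the n - 1 divisor used for the sample variance in Moore and McCabe):

s^2 = [ (x_1 - x̄)^2 + (x_2 - x̄)^2 + ... + (x_n - x̄)^2 ] / (n - 1)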

Term
Standard Deviation
Definition

Measure of spread. We quantify how far each observation is from the center. The standard deviation s is the square root of the variance: s = sqrt(s^2)

 

•  Standard Deviation is also an average (like the mean) so it is sensitive to outliers
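A short sketch of computing the variance and standard deviation by hand. It assumes the sample convention of dividing by n - 1 (the convention in Moore and McCabe, and the one used by Python's statistics.variance); the data values are made up.

```python
import math
import statistics

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)

data = [2, 4, 4, 4, 5, 5, 7, 9]
var = sample_variance(data)
sd = math.sqrt(var)  # standard deviation is the square root of the variance
print(var, sd)
print(statistics.variance(data), statistics.stdev(data))  # library versions
```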
Term
Inter-Quartile Range
Definition
Measure of spread. What median does for mean, IQR does for SD. Trims away extreme values. Often used to detect outliers.

• First Quartile (Q1) is the median of the smaller half of the data (bottom 25% point)
• Third Quartile (Q3) is the median of the larger
half of the data (top 25% point)
• Inter-Quartile Range is also a measure of
spread:
IQR = Q3 - Q1

•  Like the median, the Inter-Quartile Range (IQR)
is robust or resistant to outliers
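The quartile definitions above can be sketched directly. This version assumes that, when the number of observations is odd, the overall median is left out of both halves (the textbook convention); software packages often use slightly different rules, so results can differ a little. The data are invented.

```python
import statistics

def quartiles(values):
    """Q1 and Q3 as the medians of the lower and upper halves of the data."""
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]
    # With an odd count, skip the middle value so it belongs to neither half.
    upper = ordered[half:] if len(ordered) % 2 == 0 else ordered[half + 1:]
    return statistics.median(lower), statistics.median(upper)

data = [1, 3, 4, 6, 7, 8, 10, 12, 15]
q1, q3 = quartiles(data)
iqr = q3 - q1
print(q1, q3, iqr)
```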
Term
Detecting Outliers
Definition
The IQR is often used to flag outliers, but the rule is an arbitrary definition and doesn't always apply.
•  An observation X is an outlier if either:

1. X is less than Q1 - 1.5 x IQR
2. X is greater than Q3 + 1.5 x IQR
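A sketch of the 1.5 x IQR rule in Python, assuming quartiles are computed as the medians of the lower and upper halves of the data. The helper names and the data are hypothetical.

```python
import statistics

def quartiles(values):
    """Q1 and Q3 as the medians of the lower and upper halves of the data."""
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]
    upper = ordered[half:] if len(ordered) % 2 == 0 else ordered[half + 1:]
    return statistics.median(lower), statistics.median(upper)

def iqr_outliers(values):
    """Values below Q1 - 1.5 x IQR or above Q3 + 1.5 x IQR."""
    q1, q3 = quartiles(values)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [x for x in values if x < low_fence or x > high_fence]

data = [1, 3, 4, 6, 7, 8, 10, 12, 15, 40]
outliers = iqr_outliers(data)
print(outliers)
```

Here Q1 = 4 and Q3 = 12, so the fences are -8 and 24, and only the value 40 is flagged.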