Shared Flashcard Set

Details

Title

Statistics 111; Lectures 1-5

Description

Terms pertaining to lectures before first homework; Shane T. Jensen, STAT-111 Fall 2010; Introduction to the Practice of Statistics, Moore, McCabe, Ch. 3 (3.1, 3.2) Ch. 1 (1.1, 1.2)

Total Cards

Subject

Mathematics

Level

Undergraduate 1

Created

09/26/2010

Term

Inference

Definition

Using mathematical models of uncertainty to answer questions (connecting probability concepts to our data)

Term

Exploratory Data Analysis

Definition

Analysis that relies heavily on plotting data and looking for patterns that suggest interesting conclusions or questions for further study. On its own, however, it can rarely provide convincing evidence for its conclusions.

Term

Anecdotal Data

Definition

Data drawn simply from our own experience, with no use of more broadly representative data (e.g., a magazine article claiming men need Pilates more than women). It is not a sound basis for drawing conclusions.

Term

Anecdotal Evidence

Definition

Based on haphazardly selected individual cases, which often come to our attention because they are striking in some way. These cases need not be representative of any larger group of cases.

Term

Available Data

Definition

Data that were produced in the past for some other purpose but that may help answer a present question.

Term

Sample Surveys

Definition

The usual tool for answering questions such as "How have the attitudes of Americans, on issues ranging from abortion to work, changed over time?"

Term

Sample/Population

Definition

A SAMPLE is the part of the POPULATION that we actually study in order to gain information about the whole.

Term

Census

Definition

An attempt to contact every individual in the entire population (as opposed to taking data from a sample, which is much more time-efficient).

Term

Observational Study

Definition

A study in which we observe individuals and measure variables of interest but do not attempt to influence the responses. Even when based on a statistical sample, it is a poor way to determine what will happen if we change something.

Term

Experiment

Definition

A means of creating data by deliberately imposing some treatment on individuals and observing their responses. In principle, experiments can give good evidence for causation.

Term

Intervention

Definition

The best way to see the effects of a change--where we actually impose the change in an experiment.

Term

Confound

Definition

Mix up with. E.g., we say that the effect of child care on behavior is confounded with other characteristics of families who use child care.

Term

Statistical Inference

Definition

Answers specific questions with a known degree of confidence. Based on statistical techniques for producing data.

Term

Experimental Units/Subjects/Treatment

Definition

The individuals on which the experiment is done are the EXPERIMENTAL UNITS. When the units are human beings, they are called SUBJECTS. A specific experimental condition applied to the units is called a TREATMENT.

Term

Factors

Definition

The explanatory variables in an experiment. Many experiments study the joint effects of several factors. In such an experiment, each treatment is formed by combining a specific value (often called a LEVEL) of each of the factors.

Term

Placebo Effect

Definition

The response to a dummy treatment.

Term

Control Group

Definition

The group of subjects who receive a sham treatment. It enables us to control the effects of outside variables on the outcome. Comparison of several treatments in the same environment is the simplest form of control.

Term

Bias

Definition

A study is BIASED if it systematically favors certain outcomes.

Term

Randomization

Definition

The use of chance to divide experimental units into groups.

Term

Principles of Experimental Design

Definition

1) COMPARE two or more treatments. This will control the effects of lurking variables on the response.
2) RANDOMIZE--use impersonal chance to assign experimental units to treatments.
3) REPEAT each treatment on many units to reduce chance variation in the results.

Term

Statistical Significance

Definition

An observed effect so large that it would rarely occur by chance is called STATISTICALLY SIGNIFICANT.

Term

A Table of Random Digits

Definition

A way to randomize without software. It is a list of the digits 0,1,2,3,4,5,6,7,8,9 in which:
1) The digit in any position on the list has the same chance of being any one of 0,1,2,3,4,5,6,7,8,9
2) The digits in different positions are independent in the sense that the value of one has no influence on the value of any other.
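
A table with these two properties can also be produced in software. The sketch below is one possible way, assuming Python's standard `random` module as the source of chance; the function name `random_digit_table`, the seed, and the grouping into fives are illustrative choices, not part of the course material.

```python
import random

def random_digit_table(n_digits, seed=None):
    """Return n_digits digits, each drawn independently and uniformly from 0-9."""
    rng = random.Random(seed)
    return [rng.randrange(10) for _ in range(n_digits)]

digits = random_digit_table(40, seed=111)  # seed chosen arbitrarily for repeatability

# Lay the digits out in groups of five to make them easier to read.
groups = ["".join(str(d) for d in digits[i:i + 5]) for i in range(0, len(digits), 5)]
print(" ".join(groups))
```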

Term

Completely Randomized Design

Definition

A design in which all experimental units are allocated at random among all treatments. Completely randomized designs can compare any number of treatments, and the treatments can be formed by levels of a single factor or by more than one factor.
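
As a rough illustration of random allocation, the following sketch shuffles the units and deals them out to the treatments in turn. The function name `completely_randomized`, the unit labels, and the treatment names are all hypothetical, and equal group sizes are an assumption rather than a requirement of the design.

```python
import random

def completely_randomized(units, treatments, seed=None):
    """Shuffle the units, then deal them to treatments in turn.

    Group sizes differ by at most one.
    """
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    groups = {t: [] for t in treatments}
    for i, unit in enumerate(shuffled):
        groups[treatments[i % len(treatments)]].append(unit)
    return groups

units = [f"unit{i}" for i in range(1, 13)]          # 12 made-up experimental units
assignment = completely_randomized(units, ["A", "B", "C"], seed=1)
for treatment, members in assignment.items():
    print(treatment, sorted(members))
```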

Term

Double-Blind

Definition

When neither the subjects themselves nor the medical personnel who worked with them knew which treatment any subject had received. It avoids unconscious bias by, for example, a doctor who doesn't think that "just a placebo" can benefit a patient.

Term

Lack of Realism

Definition

When the subjects or treatments or setting of an experiment may not realistically duplicate the conditions we really want to study (this is a serious potential weakness of experiments).

Term

Matched Pairs Design

Definition

Compares just two treatments. The subjects are matched in pairs. For example, an experiment to compare two advertisements for the same product might use pairs of subjects with the same age, sex, and income. The idea is that matched subjects are more similar than unmatched subjects, so comparing responses within a number of pairs is more efficient than comparing the responses of groups of randomly assigned subjects.

Term

Block Design

Definition

A BLOCK is a group of experimental units or subjects that are known before the experiment to be similar in some way that is expected to affect the response to the treatments. In a BLOCK DESIGN, the random assignment of units to treatments is carried out separately within each block.
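
The key step, randomizing separately within each block, might be sketched like this. The blocks ("men", "women"), the subject labels, and the function name `block_design` are invented for illustration.

```python
import random

def block_design(units_by_block, treatments, seed=None):
    """Randomly assign units to treatments separately inside each block."""
    rng = random.Random(seed)
    assignment = {}
    for block, units in units_by_block.items():
        shuffled = list(units)
        rng.shuffle(shuffled)                      # chance operates within the block only
        for i, unit in enumerate(shuffled):
            assignment[unit] = (block, treatments[i % len(treatments)])
    return assignment

blocks = {
    "men": ["m1", "m2", "m3", "m4"],
    "women": ["w1", "w2", "w3", "w4"],
}
plan = block_design(blocks, ["drug", "placebo"], seed=2)
for unit, (block, treatment) in sorted(plan.items()):
    print(unit, block, treatment)
```

Because the shuffle happens inside each block, every block contributes the same number of units to each treatment, which a single shuffle over all units would not guarantee.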

Term

Response Rate

Definition

The proportion of the original sample who actually provide usable data.

Term

Voluntary Response Sample

Definition

Consists of people who choose themselves by responding to a general appeal. Voluntary response samples are biased because people with strong opinions, especially negative opinions, are most likely to respond.

Term

Simple Random Sample (SRS)

Definition

An SRS of size n consists of n individuals from the population chosen in such a way that every set of n individuals has an equal chance to be the sample actually selected.
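
In software, an SRS can be drawn with a routine that samples without replacement. A minimal sketch, assuming Python's standard `random.sample`; the population labels and the seed are made up.

```python
import random

population = [f"person{i:02d}" for i in range(1, 31)]  # 30 hypothetical individuals
rng = random.Random(3)                                  # arbitrary seed for repeatability

# sample() draws without replacement, so no individual appears twice.
srs = rng.sample(population, 5)
print(srs)
```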

Term

Probability Sample

Definition

A sample chosen by chance. We must know what samples are possible and what chance, or probability, each possible sample has.

Term

Stratified Random Sample

Definition

To select a STRATIFIED RANDOM SAMPLE, first divide the population into groups of similar individuals, called STRATA. Then choose a separate SRS in each stratum and combine these SRSs to form the full sample.
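
The two steps, a separate SRS per stratum followed by combining them, could look like the sketch below. The strata, the per-stratum sample sizes, and the function name `stratified_sample` are assumptions made for the example.

```python
import random

def stratified_sample(strata, sizes, seed=None):
    """Draw a separate SRS from each stratum and combine them into one sample."""
    rng = random.Random(seed)
    sample = []
    for name, members in strata.items():
        sample.extend(rng.sample(members, sizes[name]))
    return sample

strata = {
    "freshmen": [f"fr{i}" for i in range(1, 41)],   # 40 made-up individuals
    "seniors": [f"sr{i}" for i in range(1, 21)],    # 20 made-up individuals
}
sizes = {"freshmen": 4, "seniors": 2}
sample = stratified_sample(strata, sizes, seed=4)
print(sample)
```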

Term

Multistage Sample

Definition

Select successively smaller groups within the population in stages, resulting in a sample consisting of clusters of individuals. Each stage may employ an SRS, a stratified sample, or another type of sample.

Term

Undercoverage; Nonresponse

Definition

Undercoverage occurs when some groups in the population are left out of the process of choosing the sample.

Nonresponse occurs when an individual chosen for the sample can't be contacted or does not cooperate.

Term

Response Bias

Definition

The behavior of the respondent or the interviewer can cause bias in sample results. Respondents may lie, especially if asked about illegal or unpopular behavior. The race or sex of the interviewer can influence responses to questions about race relations or attitudes toward feminism. Answers to questions that ask respondents to recall past events are often inaccurate because of faulty memory.

Term

Wording of Questions

Definition

The most important influence on the answers given to a sample survey. Confusing or leading questions can introduce strong bias.

Term

Variable

Definition

Any characteristic that takes on different values for different individuals

Term

Categorical variables

Definition

place an individual into one of several groups

Generally shown with pie charts and bar plots

• Examples: gender, race, preferred candidates

Term

Quantitative variables

Definition

take on numerical values that are usually considered as continuous

• Examples: height, weight, wages

Term

Distribution

Definition

describes what values a variable takes
and how frequently these values occur.

• The distribution of a variable can be described
graphically and numerically in terms of:
Center, Spread, Shape, and Outliers.

Term

Center

Definition

Where are most of the values located?

Term

Spread

Definition

How variable are the values?

Term

Shape

Definition

Is the distribution symmetric or skewed? Are there multiple peaks or just one?

Term

Outliers

Definition

Are there certain values that seem surprisingly large or small?

Term

Box Plots
-Box
-Median
-Whiskers
- Outliers

Definition

• Box plots are an effective tool for conveying
information about continuous variables
• Box contains the central 50% of the data, with a line indicating the median
• Median is the value with 50% of data on either side
• Whiskers contain most of the rest of the data, except for suspected outliers
• Outliers are suspiciously large or small values

Useful for displaying center and spread of a
distribution, as well as potential outliers
• However, boxplot doesn’t really give us much
of an idea of the shape of the distribution

Term

Histograms

Definition

• Histograms emphasize frequency of different
values in the distribution

• X-axis: Values are divided into bins
• Y-axis: Height of each bin is the frequency with which values from that bin appear in the dataset
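
To make the binning concrete, here is a small sketch that counts values per bin and prints a text histogram. The heights, the bin width of 5, and the function name `histogram_counts` are invented for illustration; each bin includes its left edge and excludes its right edge, which is one common convention among several.

```python
def histogram_counts(values, bin_width, start):
    """Count how many values fall in each bin [left, left + bin_width)."""
    counts = {}
    for v in values:
        k = int((v - start) // bin_width)      # index of the bin containing v
        left = start + k * bin_width
        counts[left] = counts.get(left, 0) + 1
    return dict(sorted(counts.items()))

heights = [61, 62, 64, 65, 65, 66, 67, 67, 68, 70, 71, 74]  # made-up data
counts = histogram_counts(heights, bin_width=5, start=60)
for left, count in counts.items():
    print(f"{left}-{left + 5}: {'#' * count}")
```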

Term

Histogram vs. Box Plot

Definition

• Both graphs give a good idea of the spread
• Boxplots may be a little clearer in terms of the center and outliers in a distribution

• Histograms are much more effective at displaying the
shape of a distribution
• Skewness
• Multi-modality

Term

Multi-modality

Definition

presence of multiple high frequency values (shape of distribution)

Term

Skewness

Definition

departure from left-right symmetry (shape of distribution)

Term

Measures of Center

Definition

Mean/ Median

Mean can be affected by large outliers and asymmetry more than the median

Trimming the mean can make it more resistant to outliers. The usual trim is 5% on either side, but it can be more.

Median is essentially trimming all but the middle value.

• Median is often described as a more robust or
resistant measure of the center
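
The three measures can be compared on a small data set with one large outlier. This is only a sketch: the wage figures are made up, and `trimmed_mean` is a hypothetical helper that drops the stated proportion of observations from each end.

```python
import statistics

def trimmed_mean(values, proportion=0.05):
    """Mean after dropping the given proportion of observations from each end."""
    ordered = sorted(values)
    k = int(len(ordered) * proportion)         # number of values dropped per side
    kept = ordered[k:len(ordered) - k]
    return statistics.mean(kept)

# 19 ordinary values plus one very large outlier (made-up data).
wages = [21, 22, 23, 24, 25, 26, 27, 28, 29, 30,
         31, 32, 33, 34, 35, 36, 37, 38, 39, 400]

print(statistics.mean(wages))     # 48.5, pulled up by the outlier
print(trimmed_mean(wages))        # 30.5, after dropping one value per side
print(statistics.median(wages))   # 30.5
```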

Term

Effect of Asymmetry

Definition

• Symmetric Distributions: Mean ≈ Median (approx. equal)

• Skewed to the Left: Mean < Median; Mean pulled down by small values

• Skewed to the Right: Mean > Median; Mean pulled up by large values
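
A quick numerical check of these three cases, using tiny made-up data sets that share the same median:

```python
import statistics

symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 2, 3, 4, 20]     # one large value in the right tail
left_skewed = [-14, 2, 3, 4, 5]     # one small value in the left tail

for name, data in [("symmetric", symmetric),
                   ("right-skewed", right_skewed),
                   ("left-skewed", left_skewed)]:
    print(name, "mean =", statistics.mean(data), "median =", statistics.median(data))
```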

Term

Variance

Definition

The average of the squared deviations of each observation from the mean:

s^2 = [(x_1 - x̄)^2 + (x_2 - x̄)^2 + ... + (x_n - x̄)^2] / (n - 1)

Term

Standard Deviation

Definition

Measure of spread. We quantify how far each observation is from the center:

s = sqrt( [(x_1 - x̄)^2 + (x_2 - x̄)^2 + ... + (x_n - x̄)^2] / (n - 1) )

• Standard Deviation is also an average (like the mean) so it is sensitive to outliers
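
A worked computation of the variance and standard deviation, as a sketch. It assumes the sample version with divisor n - 1 (the convention in Moore and McCabe's textbook); the data values and the helper name `sample_variance` are made up.

```python
import math
import statistics

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1 (assumed divisor)."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / (len(values) - 1)

data = [2, 4, 4, 4, 5, 5, 7, 9]    # made-up observations with mean 5
s2 = sample_variance(data)         # squared deviations sum to 32, so s2 = 32 / 7
s = math.sqrt(s2)                  # standard deviation is the square root of the variance

print(s2, s)
# The standard library uses the same n - 1 convention.
print(statistics.variance(data), statistics.stdev(data))
```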

Term

Inter-Quartile Range

Definition

Measure of spread. What median does for mean, IQR does for SD. Trims away extreme values. Often used to detect outliers.

• First Quartile (Q1) is the median of the smaller half of the data (bottom 25% point)
• Third Quartile (Q3) is the median of the larger
half of the data (top 25% point)
• Inter-Quartile Range is also a measure of
spread:
IQR = Q3 - Q1

• Like the median, the Inter-Quartile Range (IQR)
is robust or resistant to outliers
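
The quartile definitions above translate fairly directly into code. In this sketch the middle observation is left out of both halves when n is odd; that choice is an assumption, since the card does not say how to split an odd-sized data set. The function name `quartiles` and the data are illustrative.

```python
import statistics

def quartiles(values):
    """Return (Q1, Q3) as the medians of the smaller and larger halves.

    Assumes at least two values; the middle value is excluded when n is odd.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]       # smaller half
    upper = ordered[-half:]      # larger half
    return statistics.median(lower), statistics.median(upper)

data = [1, 3, 4, 6, 7, 8, 9, 12, 15]   # made-up data, n = 9
q1, q3 = quartiles(data)
iqr = q3 - q1
print(q1, q3, iqr)   # 3.5 10.5 7.0
```

Software packages use several slightly different quartile rules, so results from other tools may differ a little from this hand method.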

Term

Detecting Outliers

Definition

The IQR is often used to flag outliers, but the cutoff is an arbitrary convention and doesn't always apply.
• An observation X is an outlier if either:

1. X is less than Q1 - 1.5 x IQR
2. X is greater than Q3 + 1.5 x IQR
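
The 1.5 x IQR rule can be sketched as follows. Quartiles are computed as medians of the smaller and larger halves, with the middle value excluded when n is odd (an assumption); the function name `find_outliers` and the data are made up.

```python
import statistics

def find_outliers(values):
    """Return the values lying more than 1.5 x IQR outside the quartiles."""
    ordered = sorted(values)
    half = len(ordered) // 2
    q1 = statistics.median(ordered[:half])     # median of the smaller half
    q3 = statistics.median(ordered[-half:])    # median of the larger half
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in ordered if x < low or x > high]

data = [1, 3, 4, 6, 7, 8, 9, 12, 40]   # made-up data with one extreme value
print(find_outliers(data))             # [40]
```

Here Q1 = 3.5 and Q3 = 10.5, so the cutoffs are -7 and 21, and only 40 is flagged.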
