Shared Flashcard Set

Details

Title

Stats Test #2

Description

Vocab, Methods, and anything worth remembering

Total Cards

112

Subject

Mathematics

Level

Undergraduate 1

Created

04/17/2011

Term

Basic Context for Data (5-6 questions to ask)

Definition

Who, What, When, Where, Why and How?

Term

Categorical Variable

Definition

A variable whose values name categories and answer questions about how cases fall into them, rather than measuring a quantity that can be summed or otherwise manipulated. The categories may be coded with numbers.

Term

Quantitative Variable

Definition

A variable measured in units, whose values represent amounts of something or counts of some occurrence.

Term

Identifier Variables

Definition

A number assigned to each individual case for sorting purposes

Term

Frequency Table/ Relative Frequency Table

Definition

A table listing the categories with the total count in each (frequency table), or one which represents each count as a percent of the total (relative frequency table)

Term

Bar Chart

Definition

Displays distribution of a categorical variable. NOT a quantitative variable

Term

Contingency Table

Definition

A table which represents categories and breaks down the totals into their representative parts. The margins represent the totals

Term

Area Principle

Definition

When graphing data, make sure each category has an area which is proportional to its total in the group

Term

Simpson's Paradox

Definition

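When averages or proportions are combined unfairly across groups that differ in conditions or size, the overall comparison can point the opposite way from the comparison within each group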

Term

Histogram

Definition

Only for quantitative data. Looks like a bar chart (which is only for categorical data) except that there is no space between bars unless there is a gap in the data. Good for illustrating distribution

Term

Stem and Leaf Displays (and Dotplots)

Definition

Writing the first digit on one side of the table, then listing one following digit for each case in that range. Dotplots replace digits with dots

Term

Three things to mention when describing distribution

Definition

Shape: How many modes does the data set have? Is it symmetric or skewed? Are there outliers?
Center: Median/ Mean
Spread: Standard deviation/ interquartile range

Term

Unimodal /Bimodal/ Multimodal

Definition

With one hump/ 2 humps/ more than 2 humps

Term

Uniform (Shape)

Definition

Data which is fairly consistent, no modes or trend

Term

Skew

Definition

When there is a Tail (thinner ends of the distribution) one way or the other, the graph is said to be this

Term

Interquartile Range (IQR)

Definition

The upper quartile (75th percentile)- lower quartile (25th percentile)

Term

Variance

Definition

The sum of the squared differences between each y value and the mean, divided by (n-1)

It is the quantity you take the square root of to find the standard deviation

Term

Standard Deviation

Definition

Take the square root of:
(Sum of the squared differences between each y and the mean)/ (n-1)
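
As a rough illustration of the variance and standard deviation recipes on these two cards, here is a minimal Python sketch. The function names and the data values are made up for the example and are not part of the original set.

```python
import math

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by (n - 1)."""
    n = len(values)
    mean = sum(values) / n
    return sum((y - mean) ** 2 for y in values) / (n - 1)

def sample_sd(values):
    """Square root of the sample variance."""
    return math.sqrt(sample_variance(values))

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data, mean is 5
variance = sample_variance(data)  # squared deviations sum to 32, so 32 / 7
sd = sample_sd(data)
```

Dividing by (n-1) rather than n is what makes this the sample version used throughout these cards.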

Term

Drawing Boxplots

Definition

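Draw a box from the lower quartile to the upper quartile, with a line at the median (not the mean). Add whiskers out to the most extreme data values within 1.5 times the IQR of the box, and plot any outliers beyond that individually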

Term

Z-Score (Standardized Value)

Definition

(y-the mean of y)/ standard deviation. Written z(x) or z(y)

Term

How does standardizing data change data

Definition

Shape: Does not change
Center: Makes the mean 0
Spread: The standard deviation becomes 1
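
A small Python sketch can check the center and spread claims. The data values and the helper name `standardize` are hypothetical; the sketch assumes the sample standard deviation (dividing by n-1) is the one being used.

```python
import statistics

def standardize(values):
    """Convert each value to a z-score: (y - mean) / standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD, divides by n - 1
    return [(y - mean) / sd for y in values]

data = [12.0, 15.0, 9.0, 20.0, 14.0]  # hypothetical data
z = standardize(data)
z_mean = statistics.mean(z)   # should be 0 up to rounding error
z_sd = statistics.stdev(z)    # should be 1 up to rounding error
```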

Term

Nearly Normal Condition

Definition

If the shape of the data's distribution is unimodal and symmetric, you can apply the normal model. Make a picture (histogram or normal probability plot) to check

Term

The 68-95-99.7 Rule

Definition

Within 1 sd above and below the mean is about 68% of data, within 2 sd is about 95% of data, within 3 sd is about 99.7% (for a normal model)
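
The three percentages can be checked against a standard normal model. This sketch uses Python's standard-library `statistics.NormalDist`; the helper name `share_within` is made up for the example.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Proportion of a normal model lying within k SDs of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

# Roughly 0.68, 0.95 and 0.997 for k = 1, 2, 3
within = {k: share_within(k) for k in (1, 2, 3)}
```

The same `cdf` call plays the role of the z-table: it returns the proportion of the model below a given z-score.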

Term

Finding Normal Percentiles

Definition

Calculate Z-Score then look to left of table for 1st 2 digits and match with the top of the table to find the corresponding normal percentile

Term

Normal Probability Plot

Definition

The y axis is the variable shown on the x axis of the corresponding histogram (ex. mpg) and the x axis is each data point's Z-score. For nearly normal data, the points should fall along a diagonal line rising from left to right

Term

Things to look for in Scatterplots

Definition

Direction: Is it positive or negative
Form: Is it linear? Curved?
Strength: How much does it scatter?
Outliers: Anything that significantly skews the data

Term

Predictor/ Explanatory Variable

Definition

The x-axis which is believed to inform or predict the y value

Term

Response Variable

Definition

The y axis and variable of interest. This is the variable used in St. dev. etc...

Term

Correlation (r)

Definition

Measures the strength of the linear association between two quantitative variables.

r= The sum of z(x) times z(y) / (n-1)
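
A minimal Python sketch of this formula, multiplying each pair of z-scores and dividing the total by (n-1). The function name and the data are hypothetical.

```python
import statistics

def correlation(xs, ys):
    """r = sum of z(x) * z(y), divided by (n - 1)."""
    n = len(xs)
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    sd_x, sd_y = statistics.stdev(xs), statistics.stdev(ys)  # sample SDs
    total = sum(
        ((x - mean_x) / sd_x) * ((y - mean_y) / sd_y)
        for x, y in zip(xs, ys)
    )
    return total / (n - 1)

xs = [1, 2, 3, 4, 5]  # hypothetical data
ys = [2, 4, 5, 4, 5]
r = correlation(xs, ys)
r_perfect = correlation([1, 2, 3], [2, 4, 6])  # points exactly on a line
```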

Term

Correlation Conditions

Definition

Quantitative Variables Condition: Make sure data isn't categorical
Straight Enough Condition: It is subjective, but make sure the data isn't clearly non-linear
Outlier Condition: Make sure outliers are not present, as they can distort the correlation dramatically

Check these conditions with a scatter plot

Term

Lurking Variable

Definition

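A hidden variable, not among the two being plotted, that affects both of them and can account for their association. It is the reason a correlation can mislead and does not prove causation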

Term

Kendall's tau

Definition

Designed to assess how close the relationship between two variables is to being monotone. A monotone relationship is one that consistently increases or consistently decreases, though not necessarily linearly. A value of -1 means always decreasing, 1 means always increasing. It's a nonparametric measure

Term

Spearman's Rho

Definition

Is less sensitive to outliers. Replaces each x value and each y value with its rank (starting with 1, 2, 3, etc.) and finds the correlation of the ranks. Also between -1 and 1. It is a nonparametric measure.

Term

Residual

Definition

The difference between the observed y value of a data point and the y value predicted by the linear regression (also referred to as y(hat)): residual = y - y(hat).

Term

Line of Best Fit

Definition

Also known as the least squares line: the line that minimizes the sum of the squared residuals

Term

Linear Regression equation

Definition

y(hat)= b0+ b1(x)

Term

b1 (The slope of linear regression) equation

Definition

r (sy/sx)
or
the correlation times (standard deviation of y/ stand. dev. of x)

Term

b0 (y intercept)

Definition

y (avg)- b1*x(avg)
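
Putting the slope and intercept cards together, the regression line can be sketched in Python as below. The function name and the data are hypothetical; the sketch computes r, then b1 = r(sy/sx), then b0 = y(avg) - b1*x(avg).

```python
import statistics

def least_squares_line(xs, ys):
    """Return (b0, b1) for y(hat) = b0 + b1 * x."""
    n = len(xs)
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    sd_x, sd_y = statistics.stdev(xs), statistics.stdev(ys)
    # Correlation from the z-score definition
    r = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (
        (n - 1) * sd_x * sd_y
    )
    b1 = r * sd_y / sd_x          # slope
    b0 = mean_y - b1 * mean_x     # intercept
    return b0, b1

b0, b1 = least_squares_line([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
predicted_at_6 = b0 + b1 * 6  # y(hat) for a hypothetical x of 6
```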

Term

R^2 value

Definition

Gives a positive fraction of the data's variation accounted for by the model

Term

Does the Plot Thicken? Condition

Definition

When you plot the residuals against the predicted values, the spread should be about the same throughout, with no discernible pattern. If the scatter thickens or shows a pattern, your model isn't ideal

Term

Inverting the Regression

Definition

You can't simply rearrange a regression line equation to predict x from y unless the correlation is 1.0 or -1.0. You must do the b1 and b0 formulas again with the roles of x and y switched

Term

Leverage

Definition

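The extent to which a point can influence the analysis. Points whose x values are far from the mean of x have high leverage and can pull the regression line toward themselves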

Term

Subsets

Definition

Distinguishable traits of the data that can allow you to fit different regression lines to different segments of information (male/female etc...)

Term

Goals of Re-expression

Definition

1. Make the distribution of a variable more symmetric
2. Make the spread of several groups (as seen in side-by-side boxplots) more alike, even if their centers differ (often achieved with logs)
3. Make the form of a scatterplot more nearly linear
4. Make the scatter in a scatterplot spread out evenly rather than thickening at one end

Term

Ladder of Powers: 2

Definition

Try for unimodal, left skewed histograms

Term

Ladder of Powers: "0" aka Logs

Definition

This is the go-to. You can't take the log of negative numbers or 0, so add a small constant to all data values if needed. Try logging y, then logging x, and if all else fails log both.

Term

Ladder of Powers: -1/2

Definition

The negative reciprocal square root (-1/sq. rt.[y]); the negative sign preserves the direction of relationships. Your last bet

Term

Ladder of Powers:-1

Definition

Positive or negative, depending on which way you want the data to go. Ratios of 2 quantities benefit the most.

Term

Sample Strategies and Ideals to keep in mind:

Definition

1: Examine a Part of the Whole: Try to avoid bias by representing all parts of the population in proportion to their share of the whole
2: Randomize: When in doubt, randomize, so that nothing which could be associated with what you are measuring influences who ends up in your sample
3: It's the Sample Size: The fraction of the population sampled doesn't matter, just the actual sample size (2,000 is a good number).

Term

Census

Definition

A sample that consists of the entire population, often quite inefficient

Term

Parameter v. Statistics

Definition

Parameters are real information about the world that we are trying to get at, often in vain.
Statistics are anything we calculate from data

Term

Simple Random Sample (SRS)

Definition

A method in which every possible sample of the chosen size has an equal chance of being selected. The basis for comparison with all other sampling methods

Term

Sampling Frame

Definition

The list of individuals from which the sample is drawn

Term

Stratified Random Sampling

Definition

Dividing the population into distinct groups (strata) of similar individuals, and taking a simple random sample within each stratum.

Term

Cluster Sampling

Definition

Taking one or more clusters of the population, each of which should represent the population as a whole. If a cluster doesn't represent the population as a whole, the sample will be biased. Can also be one stage of a multistage sample

Term

Systematic Sample

Definition

When you select individuals by a fixed system rather than purely at random. For example, selecting every 20th person in a population.

Term

Pilot

Definition

A trial run of a survey before it is employed in a larger group at higher cost. Gives you a chance to recognize flaws in your design

Term

Sampling Technique Errors

Definition

Voluntary Response Sample: Because it is self-selected, it is inherently biased
Convenience Sampling: Usually does not produce unbiased information, because the individuals who are easy to reach may not represent the population

Term

Mistakes Which Can Arise

Definition

Nonrespondents: It's always a good investment to limit the number of nonrespondents, because leaving them out can shift the data
Response Bias: Anything in the survey which influences responses (the wording of a question, the environment it's taken in)

Term

Observational Studies

Definition

When people or subjects are viewed in their natural environments. Often retrospective studies

Term

Prospective v. Retrospective Studies

Definition

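Prospective studies pick individuals (ideally at random) and follow them for a given amount of time, recording data as events unfold; retrospective studies look back at data on events that have already happened. Prospective studies are generally favored over retrospective options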

Term

Experiment

Definition

When you attempt to isolate very simple variables through random assignment of treatments to subjects. Active manipulation by researchers.

Term

The 4 Principles of Experimental Design

Definition

1. Control: Control sources of variation other than what we are testing
2. Randomization: Equalizes the effects of unforeseen or uncontrollable sources of variation
3. Replicate: Results have to be replicated in slightly altered situations to show no bias
4. Block: Sometimes attributes affect outcomes of an experiment, so grouping different blocks together is more accurate

Term

Blinding

Definition

Limiting the influence that knowledge of the treatment can have on the experiment, by keeping key categorical variables (such as who gets which treatment) a secret from the subject and from the researcher. An experiment is "double blind" when even those who interpret the data are unaware of which treatment each subject received.

Term

Matching

Definition

Pairing subjects because they are similar in ways not under study

Term

Discrete v. Continuous Random Variables

Definition

Discrete random variables take values from a set of outcomes which can be listed, while continuous random variables can take any value in an interval, so their outcomes cannot be listed

Term

Expected Value of a discrete random variable

Definition

Multiply each possible outcome by its probability and add them all together

Term

Calculating variance

Definition

For each outcome, take the difference between the outcome and the expected value (mean), square it, and multiply by the likelihood of that outcome; then add the results for all outcomes

For an insurance policy, if the average cost is $20 per policy, with a payout of 10,000 and a likelihood of having to pay out of 1/1,000, then the term for that outcome is

(10,000-20)^2*(1/1,000)

and the terms for the remaining outcomes are added to it
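
The expected value and variance rules for a discrete random variable can be sketched in Python as follows. The probability model below is invented for illustration (it is not the insurance example from the card), and the function names are hypothetical.

```python
import math

def expected_value(distribution):
    """Sum of outcome * probability over all outcomes."""
    return sum(outcome * prob for outcome, prob in distribution.items())

def variance(distribution):
    """Sum of (outcome - mean)^2 * probability over all outcomes."""
    mean = expected_value(distribution)
    return sum((outcome - mean) ** 2 * prob for outcome, prob in distribution.items())

# Hypothetical model: outcome -> probability (probabilities add to 1)
payouts = {0: 0.5, 10: 0.3, 50: 0.2}
mean = expected_value(payouts)   # 0*0.5 + 10*0.3 + 50*0.2
var = variance(payouts)
sd = math.sqrt(var)              # SD is the square root of the variance
```

Note that every outcome contributes a term, including the outcomes with a payout of 0.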

Term

Calculate S.D. (given variance)

Definition

SD(x)= sq. rt. Var (x)

Term

Adding/subtracting rules for SD and variance

Definition

-The variance of the sum of two independent random variables is the sum of their individual variances NOT S.D.

-If random variables are independent, the variance of their sum or difference is always the sum of the variances.

Term

Adding/subtracting means

Definition

The mean of the sum/ difference of two random variables is the sum/difference of their means

Term

Calculating z-score

Definition

Difference between Expected value and observed (or theoretically observed) value over S.D., then use z-score technology/table

Term

Definition of and Calculating Covariance

Definition

Measures how X and Y vary together. When two things are positively correlated (i.e. X tends to be above its mean when Y is above its mean) they will have positive covariance.

Covariance (X,Y)= E((X-u)(Y-v)), where u is the mean of X and v is the mean of Y

In other words, the expected value of the product of X's difference from its mean and Y's difference from its mean.

Term

Geometric probability model for Bernoulli trials

Definition

p= probability of success
X=number of trials until first success

P(x)=p*(1-p)^(x-1)

In other words, the probability that the first success comes on trial x equals the probability of success times the probability of failure to the (x-1) power.

Term

Expected number of trials for geometric probability model for Bernoulli trials

Definition

E(X)=1/p

1/ the probability of an occurrence equals how many times you would expect to have to run the experiment to get the first success.

Term

Standard deviation geometric probability model for Bernoulli trials

Definition

S.D.= sq. rt.[(1-p)/(p^2)]
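
The three geometric model cards (probability, expected number of trials, standard deviation) can be sketched together in Python. The function names and the value p = 0.25 are hypothetical choices for the example.

```python
import math

def geometric_probability(x, p):
    """P(first success on trial x) = p * (1 - p)^(x - 1)."""
    return p * (1 - p) ** (x - 1)

def geometric_mean(p):
    """Expected number of trials until the first success: 1 / p."""
    return 1 / p

def geometric_sd(p):
    """Standard deviation: square root of (1 - p) / p^2."""
    return math.sqrt((1 - p) / p ** 2)

p = 0.25  # hypothetical probability of success on each trial
first_success_on_third = geometric_probability(3, p)  # fail, fail, succeed
expected_trials = geometric_mean(p)
sd_trials = geometric_sd(p)
```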

Term

Does X+X+X=3X

Definition

Random events labeled x are not algebraically manipulatable, X(1)+X(2)+X(3) cannot be simplified. Insuring 3 people for 10,000 each is not the same as insuring one for 30,000

Term

If shifting a data set by a constant, describe effect on s.d., variance and mean

Definition

The mean of the data shifts by the same constant that was added or subtracted.

Variance and Standard Deviation are completely unaffected by addition/subtraction of a constant

Term

Multiplying data by a constant

Definition

Multiplying data by a constant multiplies the mean by that same constant

The variance is multiplied by the square of the constant, and the standard deviation by the absolute value of the constant.

If we multiply X by a, then E(x*a)=a*E(x)
Var (x*a)= a^2Var(x)

Term

Probability of certain outcome given
-x successes and
-n trials

Definition

The number of possible outcomes giving x successes in n trials * p^x * (1-p)^(n-x)

The probability is the number of possible outcomes times the probability of individual success to the power of the number of successes, times the probability of failure to the power of the number of failures.

Term

Standard deviation of a binomial model

Definition

square root (n*p*(1-p))

or the square root of the number of trials times the probability of success times the probability of failure

Term

Estimate binomial probability for large sample size using the normal method

Definition

The difference between the observed (or necessary) number of successes and the mean, over the standard deviation, gives a z-score to look up in the normal model

z= (observed- n*p)/ sq. rt. (np(1-p))
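
Here is a minimal Python sketch of the normal approximation, assuming the z-score is taken as observed minus mean (matching the Z-Score card) and using the standard library's `NormalDist` in place of a z-table. The numbers n = 100, p = 0.5 and 60 successes are hypothetical.

```python
import math
from statistics import NormalDist

def binomial_z(observed, n, p):
    """z-score of an observed count under the binomial mean and SD."""
    mean = n * p
    sd = math.sqrt(n * p * (1 - p))
    return (observed - mean) / sd

# Hypothetical question: chance of at least 60 successes in 100 trials, p = 0.5
z = binomial_z(observed=60, n=100, p=0.5)   # (60 - 50) / 5
prob_at_least = 1 - NormalDist().cdf(z)     # area above z in the normal model
```

The approximation is only reasonable when the sample is large enough to expect at least 10 successes and 10 failures.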

Term

To estimate the probability you will get your first success on a certain trial, use...

Definition

Geometric trial

P(x)= (p)(1-p)^(x-1)

Term

To estimate the probability you'll get a certain number of successes in a specified number of independent trials, use...

Definition

the Binomial method

nCx= n!/[(x!)(n-x)!]
nCx ("n choose x")*(p^x)*(1-p)^(n-x)

Number of possibilities * probability of success to the number of successes * probability of failure to the number of needed failures
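
The binomial formula can be sketched in Python with `math.comb` (available in Python 3.8 and later) for "n choose x". The function name and the example of 2 successes in 5 trials with p = 0.5 are hypothetical.

```python
import math

def binomial_probability(x, n, p):
    """P(exactly x successes in n trials) = nCx * p^x * (1 - p)^(n - x)."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

# Hypothetical example: exactly 2 successes in 5 trials with p = 0.5
prob_two_of_five = binomial_probability(x=2, n=5, p=0.5)  # 10 * 0.5^5

# The probabilities for x = 0..n should add up to 1
total = sum(binomial_probability(x, 5, 0.5) for x in range(6))
```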

Term

To estimate probability involving (large) quantitative variables, use

Definition

The normal model

z= (observed- exp. mean)/ sd, where for a count of successes sd= sq. rt. [np(1-p)]

Term

Sampling distribution model

Definition

allows us to quantify variation between samples and talk about how likely it is that we'd observe a sample proportion in any particular interval

Term

SD for a proportion

Definition

sq. rt. [P(1-P)/n]

square root of (probability of success * probability of failure/ number of cases)

Term

Assumptions and Conditions for normal model usage in proportions

Definition

Independence assumption: the sampled values are independent of each other
Sample size assumption: n must be large enough
Randomization Condition: The data come from a random sample, or subjects were randomly assigned to treatments
10% condition: sample size must be no larger than 10% of population
Success/Failure condition: Sample size has to be big enough to expect at least 10 successes and 10 failures

Term

Central Limit Theorem

Definition

The mean of a random sample is a random variable whose sampling distribution can be approximated by a normal model. The larger the sample, the better the approximation will be.

Term

Sampling distribution model for a mean (CLT)

Definition

If you take a random sample from a population, the mean of that sample varies less than a single random observation does. The standard deviation of the sample mean is always smaller than the standard deviation of the individual data points (when n is more than 1). It is represented universally as
SD(y bar, the sample mean)=SD(population)/(sq. rt. [n]), where n is the sample size

Term

Z score calculation for sampling distribution for the mean

Definition

Book def: [y(bar)-mu]/SD(y bar)

[sample mean that we are testing (in question) - parameter (what we are given as true)]/ new standard deviation (SD of population/ sq. rt.[n])
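
A short Python sketch of this z-score, dividing the population SD by the square root of n before standardizing. The function name and the numbers (population mean 100, SD 15, a sample of 25 with mean 106) are hypothetical.

```python
import math
from statistics import NormalDist

def sample_mean_z(y_bar, mu, sigma, n):
    """z = (y_bar - mu) / SD(y_bar), where SD(y_bar) = sigma / sqrt(n)."""
    sd_y_bar = sigma / math.sqrt(n)
    return (y_bar - mu) / sd_y_bar

# Hypothetical example: SD(y_bar) = 15 / 5 = 3, so z = 6 / 3
z = sample_mean_z(y_bar=106, mu=100, sigma=15, n=25)
prob_above = 1 - NormalDist().cdf(z)  # chance of a sample mean this high or higher
```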

Term

Standard deviation of a sampling distribution (for a proportion)

Definition

sq. rt. [(p)(1-p)/n]

Term

Estimating the standard deviation of a sampling distribution if parameter is unknown

Definition

Is called Standard Error, found with the same formula substituting p(hat) for p.

Term

Given p and n, find margin of error w/ 95% confidence interval

Definition

Calculate SE = sq. rt. [(p*(1-p))/n]

Multiply standard error times z*(1.96) to get margin of error
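
The two steps can be sketched in Python as below. The function name and the example values p = 0.5 and n = 1,000 are hypothetical; z* defaults to 1.96 for 95% confidence as on the card.

```python
import math

def margin_of_error(p, n, z_star=1.96):
    """ME = z* times SE, where SE = sqrt(p * (1 - p) / n)."""
    se = math.sqrt(p * (1 - p) / n)
    return z_star * se

# Hypothetical example: p = 0.5 with 1,000 respondents gives roughly 0.031
me = margin_of_error(p=0.5, n=1000)
```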

Term

To find sample size to get the confidence interval for a proportion you want

Definition

Use p=.5 and the Margin of error you want (often 0.03) and work backwards until you solve for n
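
Working backwards means solving ME = z* times sq. rt.[p(1-p)/n] for n. A minimal Python sketch, assuming 95% confidence (z* = 1.96) and rounding up to a whole person; the function name is hypothetical.

```python
import math

def sample_size_for(margin, p=0.5, z_star=1.96):
    """Solve ME = z* * sqrt(p(1-p)/n) for n and round up."""
    n_exact = (z_star / margin) ** 2 * p * (1 - p)
    return math.ceil(n_exact)

# Margin of error of 0.03 with the conservative choice p = 0.5
n_needed = sample_size_for(margin=0.03)
```

Using p = 0.5 makes p(1-p) as large as it can be, so the resulting n is big enough whatever the true proportion turns out to be.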

Term

Null v. Alternative Hypotheses

Definition

We assume the null hypothesis is true; the alternative hypothesis is what we consider plausible should the null be rejected

Term

Conditions for Hypothesis testing (4)

Definition

Independence Assumption
Randomization Condition
10% Condition
10 Success/Failure Condition

Term

Calculating Margin of Error

Definition

ME= z* x SE(p)

Where SE= sq rt [(p)(1-p)/n]

Term

Errors in Hypothesis Testing (2)

Definition

Type 1: The null hypothesis is true, but we mistakenly reject it
Type 2: The null hypothesis is false, but we fail to reject it

Term

Power (of a test)

Definition

The probability that it correctly rejects a false null hypothesis. If B is the probability that a test fails to reject a false hypothesis (Type 2 error), 1-B is the power of the test.

Term

Effect size

Definition

The distance between the null hypothesis value and the truth is the effect size. This can be estimated with the observed mean

Term

Assumptions and Conditions for comparing proportions

Definition

Independence Assumption
Randomization Condition
10% condition if sampled w/o replacement
Success/failure condition
Independent Groups Assumption: The two groups being compared must be independent of each other

Term

Two-proportion z-interval

Definition

p(1)-p(2) +/- (z*)(SE {p(1)-p(2)})

SE {p(1)-p(2)}= sq rt[p(1)(1-p(1))/n(1) + p(2)(1-p(2))/n(2)], where p(1) and p(2) are the two sample proportions and n(1) and n(2) are the two sample sizes
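
A rough Python sketch of the two-proportion z-interval, using each group's own sample proportion and sample size in the standard error. The function name and the counts (60 of 100 versus 45 of 100) are hypothetical, and z* = 1.96 assumes 95% confidence.

```python
import math

def two_proportion_interval(successes1, n1, successes2, n2, z_star=1.96):
    """Return (low, high) for p1 - p2 using unpooled sample proportions."""
    p1 = successes1 / n1
    p2 = successes2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z_star * se, diff + z_star * se

# Hypothetical counts: 60 of 100 in group 1, 45 of 100 in group 2
low, high = two_proportion_interval(60, 100, 45, 100)
```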

Term

Pooling proportions (not means!)

Definition

Add the # of successes in both groups and divide by the sum of trials to get p(pooled). Then, when calculating SE,

SE {p(pooled)}= sq rt[(p)(1-p)/n(1) + (p)(1-p)/n(2)] where p is p(pooled), noting that although the p and q values are pooled, the n value is NOT, and remains distinct for each group.

Term

Calculating degrees of freedom

Definition

df= (n-1)

Term

One-sample t-interval for the mean

Definition

y(bar) +/- (t*)(SE[estimated from y])
SE= s/ sq rt[n]
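
The interval can be sketched in Python as follows. Python's standard library has no t-table, so this sketch assumes t* is looked up separately and passed in; the value 2.262 is the table value for 95% confidence with df = 9. The function name and the data are hypothetical.

```python
import math
import statistics

def t_interval(values, t_star):
    """Return (low, high) for y_bar +/- t* * SE, with SE = s / sqrt(n)."""
    n = len(values)
    y_bar = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(n)  # s is the sample SD
    return y_bar - t_star * se, y_bar + t_star * se

# Hypothetical data with n = 10, so df = n - 1 = 9
data = [5.1, 4.9, 5.6, 5.2, 4.8, 5.0, 5.3, 5.4, 4.7, 5.0]
low, high = t_interval(data, t_star=2.262)
```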

Term

Getting the standardized sample mean t

Definition

[y(bar)-mu]/ SE(y bar)

In other words, the mean from the data minus the parameter mean, divided by the estimated Standard Deviation (s/sq. rt. [n])

Term

Assumptions and conditions for t test

Definition

Independence assumption
Randomization condition
10% condition
Nearly normal condition- unimodal, symmetric distribution. You can check with a histogram or normal probability plot

Term

Bonus assumptions for Counts

Definition

Counted Data Condition- There must be counts in each cell, not %s or anything else
Expected Cell Frequency Condition- The expected count in each cell of the table must be at least 5

Term

Basic 4 Assumptions and Conditions
+ 3 test specific assumptions/ conditions

Definition

1. Independence
2. Randomization
3. 10% condition
4. Success/failure condition
AND
5. Independence of groups- The two groups we are comparing have to be independent of each other
6. Nearly normal condition
7. Paired data assumption
