Shared Flashcard Set

Details

Biostatistics
146
Other
Graduate
12/08/2009

Cards

Term
:A complete set of people, events, etc. that share a common characteristic. The scientific notation is (N).
Definition
Population
Term
:A subset or subgroup that should be representative of the entire population. The scientific notation is (n).
Definition
Sample
Term
:The number that summarizes or describes a characteristic of a population.
Definition
Parameter
Term
:the variable that is manipulated in an experiment to determine its effect on the dependent variable.
Definition
Independent
Term
:the variable that depends on the independent variable. This is often a measure of behavior or outcome.
Definition
Dependent
Term
What are bar charts good at showing?
Definition
Counts or measures displayed in their categories.
Term
What are line plots (line graphs) good at showing?
Definition
Trends and relationships.
Term
What are scatterplots good at showing?
Definition
Showing raw data points.
Term
What are pie charts good at showing?
Definition
Visualizing proportions.
Term
:the unit on which measurements are made.
Definition
Observation
Term
:the generic thing we measure
Definition
Variable
Term
:a realized measurement
Definition
Value
Term
:the set of units we are interested in learning about
Definition
Population
Term
:characteristic of an individual population unit that we are interested in learning about
Definition
Variable
Term
:subset of population observed that must be representative of the population.
Definition
Sample
Term
:generalization about a population based on sample data
Definition
Statistical Inference
Term
:statement about the uncertainty associated with a statistical inference
Definition
Measure of Reliability
Term
:a statistic that numerically describes the strength and direction of the relationship between two sets of data.
Definition
Correlation Coefficient
Term
:a statistical method of predicting one set of values from another set of measured values
Definition
Regression Analysis
Term
:data type that has an absolute zero (zero means none of the quantity is present)
Definition
Ratio data
Term
:data type with an arbitrary zero (zero does not mean an absence of the quantity), e.g. temperature in degrees Celsius or Fahrenheit
Definition
Interval Data
Term
What type of measurement classifies variables into different categories (grouping M&M colors together) and is used to ID only?
Definition
Nominal measurement
Term
What type of measurement is it when the amounts of a variable are placed in order of magnitude (putting M&Ms in order of color preference)?
Definition
Ordinal Measurement
Term
What type of measurement is a measurement where the differences between the scores are equal?
Definition
Interval Measurement
Term
What type of measurement is it when the measurement has a true zero point, where zero means an absence of that variable?
Definition
Ratio Measurement
Term
What type of data is measurable or countable?
Definition
Quantitative data
Term
What kind of data is used to grade or sort based on objective findings (grade breast cancer, sort sex, sort blood groups)?
Definition
Qualitative (categorical) data
Term
A statistical __________ is the set of all possible values for the variable.
Definition
population
Term
:a subset of the population.
Definition
Sample
Term
:a sample in which each member of the population has an equal probability of entering the sample. Everyone has an equal chance to be chosen for the sample
Definition
SRS - simple random sample
Term
Type of error involving exclusion of a subset of the population of interest prior to sampling?
Definition
Selection bias
Term
Type of error introduced when responses are not obtained from all sample members?
Definition
Nonresponse bias
Term
Type of error due to an inaccuracy in recorded data and can be due to survey design or interviewer impact?
Definition
Measurement error
Term
Type of error due to transcription error or data corruption?
Definition
Processing error
Term
What are the 3 main aspects of a distribution of data?
Definition
SCS - Shape, Center and Spread
Term
Class intervals are ranges of score values into which scores are grouped. What is the general rule of thumb for how many intervals there should be?
Definition
There should be no fewer than 6 intervals and no more than 15.
Term
:a distribution that when folded in half, produces two identical shapes.
Definition
Symmetry
Term
:a distribution in which scores are clustered at one end, and rarity of scores (the tail) occur on the other end
Definition
Skewness
Term
Skew direction if the tail (rare scores) occurs for the high scores to the right?
Definition
Positively skewed
Term
Skew direction if the tail (rare scores) occurs for the low scores to the left?
Definition
Negatively skewed
Term
:tendency of data to center (mid point) about certain numerical values
Definition
Central tendency
Term
3 commonly used measures of central tendency?
Definition
Mean, median and mode
Term
:the sum of scores divided by the number of scores (average)
Definition
Mean
Term
:the score with an equal number of scores above and below it (50th percentile or middle number)
Definition
Median
Term
:the score that occurs the most often (a distribution can be unimodal, bimodal, or multimodal)
Definition
Mode
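The three measures of central tendency above can be checked on a small data set. This is a minimal sketch using Python's standard `statistics` module; the scores are made up for illustration.

```python
import statistics

scores = [2, 3, 3, 5, 7, 10]  # hypothetical scores

mean = statistics.mean(scores)        # sum of scores divided by the number of scores
median = statistics.median(scores)    # middle score (average of the two middle scores here)
modes = statistics.multimode(scores)  # list of the most frequent score(s)

print(mean, median, modes)
```

With an even number of scores, the median is taken as the average of the two middle scores.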
Term
How is the sample mean denoted?
Definition
x̄ (a lowercase x with a bar over it)
Term
How is the median denoted?
Definition
a small m
Term
:refers to the variety exhibited by a set of observations (If all values in a data set are the same, there is none. If the values are not all the same, there is some.)
Definition
Dispersion (AKA variation, variability, spread or scatter)
Term
What other terms are used synonymously with dispersion?
Definition
Variation, variability, spread or scatter.
Term
:the spread of the data across possible values
Definition
Variability
Term
What are 3 commonly used measures of variability?
Definition
Range, variance and standard deviation
Term
:Largest measurement minus the smallest measurement
Definition
Range
Term
:Involves measuring dispersion relative to the scatter of values in a data set about their mean (better than range for describing the variety that exists among the values in a data set)
Definition
Variance
Term
:A measure of variation within a given set of data, perhaps the most useful for describing variability because it measures dispersion in the original units of the data. It is the square root of the variance.
Definition
Standard deviation
Term
How are variance and standard deviation denoted for the sample and the population?
Definition
Variance is s² for a sample and σ² (lowercase sigma, squared) for a population. Standard deviation is s for a sample and σ (lowercase sigma) for a population.
Term
:the square root of the sample variance.
Definition
Sample standard deviation
Term
:the square root of the population variance.
Definition
Population standard deviation
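A short sketch of the sample and population versions of variance and standard deviation, using Python's standard `statistics` module. The data are hypothetical. The usual convention, assumed here, is that the population formulas divide by N and the sample formulas divide by n - 1.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical measurements

# Treating the data as a whole population: divide by N.
pop_var = statistics.pvariance(data)  # sigma squared
pop_sd = statistics.pstdev(data)      # sigma

# Treating the data as a sample: divide by n - 1.
samp_var = statistics.variance(data)  # s squared
samp_sd = statistics.stdev(data)      # s

# In both cases the standard deviation is the square root of the variance.
print(pop_var, pop_sd, samp_var, samp_sd)
```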
Term
What is the empirical rule for 1, 2, and 3 standard deviations of the population or sample?
Definition
For approximately normal (bell-shaped) data, about 68% of the population/sample falls within 1 standard deviation of the mean, about 95% within 2 standard deviations, and about 99.7% within 3 standard deviations.
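The empirical-rule percentages can be reproduced from the normal curve itself. The sketch below uses `NormalDist` from Python's standard library; the helper name `share_within` is made up for illustration.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(share_within(k), 4))
```

The rule only holds when the data are roughly normal; for other shapes the proportions can differ.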
Term
:determines how many standard deviations an observation is above or below the mean and allows comparisons of observation from different normal distributions.
Definition
Z-score
Term
2 common measures that measure the relationship of a measurement to the rest of the data?
Definition
Percentile ranking/score and Z-score
Term
:the distance between a measurement x and the mean, expressed in standard units
Definition
Z-score
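As a rough illustration of the two Z-score cards, the sketch below computes z = (x - mean) / standard deviation and compares observations from two different distributions. The function name and the exam numbers are hypothetical.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

# Hypothetical scores from two exams with different means and spreads.
exam_a = z_score(85, mean=70, sd=10)
exam_b = z_score(60, mean=50, sd=5)

print(exam_a, exam_b)
```

Although 85 is the higher raw score, the second score sits further above its own mean in standard units.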
Term
What are the 2 best detection methods for outliers?
Definition
Box plots and Z-scores
Term
:diagram based on quartiles, the values (lower quartile, median or middle quartile, and upper quartile) that divide the dataset into 4 groups. The box spans the interquartile range.
Definition
Box plot (Box and whisker)
Term
What values are considered outliers for Z-scores?
Definition
Z-scores beyond plus/minus 3 (more than 3 standard deviations from the mean) are considered outliers
Term
What is the best method for graphing the relationship between two quantitative variables?
Definition
scatterplot
Term
Every time you see a _______, someone has tested a hypothesis.
Definition
p-value
Term
:by chance the groups’ results were different
Definition
Random variation
Term
:the two treatment arms really had different results
Definition
True difference
Term
:the two groups were chosen poorly and are different for reasons that have nothing to do with the treatment
Definition
Sampling error or bias
Term
:the probability, assuming the null hypothesis is true, of obtaining by random sampling error (or chance) alone a result as extreme as or more extreme than the one observed (could the observed differences between the groups be explained by sampling error or not).
Definition
p-value
Term
What is the null hypothesis?
Definition
An assumption that there is no difference between the two study groups.
Term
What is the alternate hypothesis?
Definition
What the study is trying to prove, that there is a difference between the two study groups. Can be one or two tailed (A could be preferred over B or vice versa)
Term
What is a two-tailed alternate hypothesis?
Definition
Alternate hypothesis that tries to disprove the null by saying that either A is better than B or that B is better than A but they are not the same(null).
Term
Is it more acceptable to use a one or two tailed alternative hypothesis?
Definition
Most authorities advocate a two-tailed approach unless there exists strong evidence that the difference could only go in one direction.
Term
If a result is unlikely to be due to chance it is said to be?
Definition
Statistically significant
Term
If a result could plausibly be due to chance it is said to be?
Definition
Not statistically significant
Term
Where is the significance threshold for the p-value most often set? What is this value known as? What does it mean if a p-value is higher than the set mark?
Definition
The threshold is most often set at 0.05 and is referred to as alpha. If a p-value is higher than 0.05, the results of the study are not statistically significant and the null hypothesis is not rejected.
Term
What type of error involves falsely rejecting the null hypothesis and accepting the alternative hypothesis?
Definition
Type 1 or alpha error
Term
What type of error involves falsely failing to reject the null hypothesis or falsely failing to accept the alternative hypothesis?
Definition
Type 2 or beta error
Term
What 3 factors does the choice of statistical test depend on?
Definition
Type of data, distribution of data and type of study design.
Term
:significance tests for data from interval or ratio scales.
Definition
Parametric tests
Term
:tests used to test hypotheses with nominal and ordinal data.
Definition
Nonparametric tests
Term
Are parametric or nonparametric tests preferred?
Definition
Parametric tests are more powerful and are preferred when their assumptions are met.
Term
What are the advantages of nonparametric tests?
Definition
Easier to use than parametric, they are appropriate for non-normal population distributions, and they can be used with nominal and ordinal data.
Term
What are the 3 questions that you should ask when selecting a test?
Definition
1. How many samples are involved?
2. Are the individual cases independent or related?
3. Is the measurement scale nominal, ordinal, interval or ratio?
Term
What are the indications for the parametric test (Z-test)?
Definition
large sample sizes (exceeding 30 for both independent samples) or with smaller samples when the data are normally distributed and population variances are known.
Term
What are the indications for the parametric test (t-score or t-test)?
Definition
t-test is appropriate when the population variances are not known but the population distribution is normal.
Term
What are the indications for the nonparametric test (Chi-square test)?
Definition
The chi-square test is appropriate when a test for differences between samples is required and the data are categorical (nominal or ordinal) counts.
Term
:measure of the differences between actual and expected frequencies between two categorical variables (nominal and ordinal measurements)
Definition
Chi-square test
Term
:estimates the standard deviation of the difference between the measured values and the true values (how close is the sample mean to the population mean).
Definition
standard error
Term
Which test is used if you have interval or ratio data and the standard deviation of the population is known? If it is not known?
Definition
If it is known then you use a Z-test. If it is not known then you use a t-test.
Term
To test a sample of normal continuous data, we need?
Definition
expected value, observed mean, standard error and degrees of freedom.
Term
When testing a sample of normal continuous data, what does the expected value indicate?
Definition
the population or true mean
Term
When testing a sample of normal continuous data, what does the observed mean indicate?
Definition
the average of your sample
Term
When testing a sample of normal continuous data, what does the standard error indicate?
Definition
a measure of the spread, estimation of the standard deviation of the difference between the measured and the true values.
Term
When testing a sample of normal continuous data, how do you calculate the degrees of freedom?
Definition
the number of things in the sample minus 1. (n-1)
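The four ingredients listed above (expected value, observed mean, standard error, degrees of freedom) come together in a one-sample t statistic. This is a minimal sketch with made-up measurements, assuming the standard error of the mean is estimated as s / sqrt(n).

```python
import math
import statistics

sample = [5.1, 4.9, 5.6, 5.8, 5.0, 5.4]  # hypothetical measurements
expected_value = 5.0                      # hypothesized population (true) mean

n = len(sample)
observed_mean = statistics.mean(sample)                   # average of the sample
standard_error = statistics.stdev(sample) / math.sqrt(n)  # s / sqrt(n)
degrees_of_freedom = n - 1

# Difference between observed and expected, in units of standard error.
t_statistic = (observed_mean - expected_value) / standard_error

print(round(observed_mean, 2), degrees_of_freedom, round(t_statistic, 2))
```

The t statistic would then be compared with a t distribution with n - 1 degrees of freedom (from a table or statistics package) to obtain the p-value.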
Term
Describe the Central Limit Theorem (CLT).
Definition
The theorem that the distribution of sample means taken from a population approaches a normal (Gaussian) bell curve as the sample size increases, whatever the shape of the population distribution. The larger the sample, the closer the distribution of sample means comes to the perfect curve.
Term
What is a sampling distribution of a mean?
Definition
When you take a sample of size “n” from a population and calculate its mean, then sample the same population again and calculate the mean, then sample it again and calculate the mean, and keep doing this many times...then graph all of the means on one curve.
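The repeated-sampling procedure described above can be simulated. The sketch below is only an illustration: the skewed population, the sample sizes, and the number of repeats are arbitrary choices, and `sample_means` is a hypothetical helper.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# A deliberately skewed (exponential) population with a mean of about 1.
population = [random.expovariate(1.0) for _ in range(100_000)]

def sample_means(n, repeats=2_000):
    """Draw `repeats` samples of size n and return the mean of each sample."""
    return [statistics.mean(random.sample(population, n)) for _ in range(repeats)]

for n in (2, 30):
    means = sample_means(n)
    # Center and spread of the sampling distribution of the mean.
    print(n, round(statistics.mean(means), 3), round(statistics.stdev(means), 3))
```

The sample means cluster around the population mean, and their spread shrinks as n grows; plotting a histogram of `means` for n = 30 should look much closer to a bell curve than the population does.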
Term
If a sampling distribution is Gaussian then it will follow what rule?
Definition
Empirical rule of standard deviation (68-95-99.7%)
Term
:a measure of the difference between your continuous data and what you expect to see, in units of standard error. Used when the standard deviation of the population is not known.
Definition
t-test
Term
:used to test the means of 3 or more groups of continuous, normally distributed data to see if they are all equal to one another
Definition
Analysis of variance (ANOVA)
Term
How are categorical data usually tested?
Definition
Chi square test
Term
What does a large chi-square statistic indicate?
Definition
That the observed frequencies greatly differ from the expected frequencies.
Term
What is the chi-square test a part of?
Definition
Contingency table analysis
Term
How are the degrees of freedom for a chi-square test calculated?
Definition
(number of rows - 1) x (number of columns - 1)
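A small worked example of the chi-square statistic and its degrees of freedom for a contingency table. The counts are hypothetical, and the expected frequencies are computed in the usual way as (row total x column total) / grand total.

```python
observed = [
    [20, 30],  # row 1: hypothetical counts
    [30, 20],  # row 2: hypothetical counts
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi_square = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        expected = row_totals[i] * col_totals[j] / grand_total
        chi_square += (obs - expected) ** 2 / expected

degrees_of_freedom = (len(observed) - 1) * (len(observed[0]) - 1)

print(chi_square, degrees_of_freedom)
```

The larger the gap between observed and expected counts, the larger the statistic becomes.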
Term
What is an important assumption of the ANOVA test?
Definition
All treatments have similar variance.
Term
What kind of data is used in ANOVA testing?
Definition
A categorical (qualitative) independent variable that defines the groups and a continuous (quantitative) dependent variable.
Term
What is the end result of an ANOVA test procedure?
Definition
an F-statistic that is used to calculate the p-value
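To show where the F-statistic comes from, here is a sketch of a one-way ANOVA computed by hand: the variability between group means divided by the variability within groups. The three groups are hypothetical.

```python
import statistics

groups = [
    [4, 5, 6],   # hypothetical treatment A
    [6, 7, 8],   # hypothetical treatment B
    [8, 9, 10],  # hypothetical treatment C
]

k = len(groups)
all_values = [x for g in groups for x in g]
n_total = len(all_values)
grand_mean = statistics.mean(all_values)

# Sum of squares between groups and within groups.
ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)

ms_between = ss_between / (k - 1)       # mean square between groups
ms_within = ss_within / (n_total - k)   # mean square within groups

f_statistic = ms_between / ms_within
print(f_statistic)
```

The p-value is then read from an F distribution with (k - 1) and (N - k) degrees of freedom, using a table or a statistics package.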
Term
If the null hypothesis is rejected using an ANOVA study, what information is gained? What must you do to gain more information?
Definition
Only that at least two groups were different. Run an ANOVA (or a t-test, because the two are equivalent with only 2 groups) between each pair of groups in the study to determine where the difference is.
Term
:when a relationship exists between two variables, the value of one variable can be predicted if the value of the other variable is known. What is this an example of?
Definition
Regression analysis
Term
What does the magnitude of the slope in linear regression analysis indicate?
Definition
It represents the amount that the dependent variable changes for each unit change in the independent variable.
Term
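How is the slope represented in a regression analysis?
Definition
By the regression coefficient (can be positive or negative). It always has the same sign as the correlation coefficient, but the correlation coefficient measures the strength of the association, not the size of the slope.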
Term
If the correlation coefficient (and therefore the slope) of a regression line is positive, what will the relationship between the predicted and explanatory variable be?
Definition
If the slope is positive, the predicted variable increases as the explanatory variable increases
Term
If the correlation coefficient (and therefore the slope) of a regression line is negative, what will the relationship between the predicted and explanatory variable be?
Definition
If the slope is negative, the predicted variable decreases as the explanatory variable increases
Term
What is the range for correlation coefficient?
Definition
-1 to +1
Term
What correlation coefficient values indicate little or no relationship between the variables, fair degree of association, moderate degree of association and a strong degree of association?
Definition
Little or no relationship = (0 to +/- 0.25).
Fair degree of association = (+/- 0.25 to +/- 0.50).
Moderate degree of association = (+/- 0.50 to +/- 0.75).
Strong degree of association = (+/- 0.75 to +/- 1.0).
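The regression and correlation cards can be tied together with a small least-squares example. The data are hypothetical, the `strength` helper is made up for illustration, and its handling of values that fall exactly on a cut-off (0.25, 0.50, 0.75) is an assumption.

```python
import math

x = [1, 2, 3, 4, 5]  # hypothetical explanatory (independent) variable
y = [2, 4, 5, 4, 5]  # hypothetical predicted (dependent) variable

mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)

sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
sxx = sum((a - mean_x) ** 2 for a in x)
syy = sum((b - mean_y) ** 2 for b in y)

slope = sxy / sxx                      # change in y per unit change in x
intercept = mean_y - slope * mean_x
r = sxy / math.sqrt(sxx * syy)         # correlation coefficient, between -1 and +1

def strength(r):
    """Label the size of r using the cut-offs from the card above."""
    size = abs(r)
    if size < 0.25:
        return "little or no relationship"
    if size < 0.50:
        return "fair"
    if size < 0.75:
        return "moderate"
    return "strong"

print(round(slope, 2), round(intercept, 2), round(r, 3), strength(r))
```

The slope and r always share the same sign, but the slope depends on the units of x and y while r does not.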
Term
:validity that refers to generalizability of the results obtained from the study
Definition
External validity
Term
:validity that refers to methodology utilized in the study
Definition
Internal validity
Term
What are the three general categories of threats to internal validity?
Definition
Chance, Bias and Confounding variables
Term
:“Any systematic error in design, conduct or analysis of a study that results in a mistaken estimate of an exposure’s effect on the risk of disease”.
Definition
Definition of Bias according to our book, Gordis.
Term
:kind of bias resulting from misclassifying exposure or outcome status (predicted variable)
Definition
Misclassification bias
Term
:misclassification that occurs in the same proportion in each group being studied
Definition
Random (nondifferential) misclassification
Term
:misclassification that occurs in different proportions in each group
Definition
Non-random (differential) misclassification
Term
:bias that occurs when one group is followed more closely than the other group.
Would this result in random or non-random misclassification?
Definition
Surveillance or detection bias. Would result in non-random misclassification.
Term
:bias that may result due to a potentially relevant exposure that may be remembered by a “case” and be forgotten by a “control”.
What type of misclassification is this an example of?
Definition
Recall bias. It is an example of non-random misclassification.
Term
:bias that may result as subjects may not be willing to report an exposure accurately
Definition
Reporting bias
Term
:bias that results if data collection methods differ between groups.
What type of misclassification is this?
Definition
Interviewer bias. This is non-random misclassification.
Term
:bias occurring when the way in which cases and controls, or exposed and non-exposed individuals, are selected produces an apparent association even though, in reality, the exposure and the disease are not associated
Definition
Selection bias
Term
:bias occurring when a third factor that is associated with the exposure and is an independent risk factor for the outcome is not taken into account in the study.
Definition
Confounding variable/bias
Term
What is the major threat to external validity? Does this invalidate the study results?
Definition
Interaction or Effect Modification. It does not invalidate the study results, it just needs to be identified.
Term
What are the differences between Confounding variables and Effect Modification/Interaction?
Definition
Confounding variables need to be eliminated and Effect Modification just needs to be described and reported.
Term
How is the correlation coefficient notated?
Definition
a lowercase r
Term
Describe the independent variable. What are some other names for it?
Definition
Independent variables are the variables that the study designer wants to examine; typically one group has the independent variable and the other does not. Other names include: factor variable, explanatory variable and exposure variable.
Term
Describe the dependent variable. What are some other names used for the dependent variable?
Definition
Dependent variable is what the study designer wishes to observe as an outcome, possibly due to the independent variable. Other names include: Predicted variable and outcome variable.
Term
What is quantitative data?
Definition
They are measures that occur on a natural scale and include ratio and interval data.
Term
What is qualitative data?
Definition
It is categorical data that is measured by classification only; examples are ordinal and nominal data.
Term
Which test would you do if you have one sample and the data is continuous?
Definition
1-sample t-test
Term
Which test would you do if you have two samples and the data is continuous?
Definition
2-sample t-test
Term
Which test would you do if you have paired data and it is continuous?
Definition
Paired t-test
Term
Which test would you do if you have three or more samples and the data is continuous?
Definition
ANOVA
Term
What type of error results from Random misclassification?
Definition
Type II or beta error, because it biases results toward the null, so you may fail to reject the null hypothesis even if it is false.
Term
What type of error results from non-random misclassification?
Definition
It can be either a Type I or a Type II error.
Term
After deriving the null and alternative hypotheses from your research question, describe the next steps in statistical analysis.
Definition
Set alpha, generate your test statistic, compare the statistic to a known distribution to find your p-value, and reject or fail to reject the null hypothesis.
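The steps above can be sketched for the simplest case, a two-tailed one-sample Z-test with a known population standard deviation. All of the numbers are hypothetical, and the choice of a Z-test is an assumption made only to keep the example short.

```python
import math
from statistics import NormalDist

alpha = 0.05               # step 1: set alpha
null_mean = 100            # hypothetical mean under the null hypothesis
population_sd = 15         # hypothetical, assumed known
sample_mean, n = 103, 100  # hypothetical sample results

# Step 2: generate the test statistic (here a Z statistic).
standard_error = population_sd / math.sqrt(n)
z = (sample_mean - null_mean) / standard_error

# Step 3: compare with the standard normal distribution for a two-tailed p-value.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

# Step 4: reject or fail to reject the null hypothesis.
reject_null = p_value < alpha

print(z, round(p_value, 4), reject_null)
```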
Term
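Describe Chebyshev's rule for 1, 2, and 3 standard deviations.
Definition
Within 1 standard deviation of the mean = no useful info.
Within 2 standard deviations = at least 75% of the data.
Within 3 standard deviations = at least 8/9 (about 89%) of the data.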
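The Chebyshev figures come from the bound 1 - 1/k², which gives the minimum share of any data set lying within k standard deviations of the mean. A minimal sketch, with a hypothetical function name:

```python
def chebyshev_lower_bound(k):
    """Minimum share of any data set within k standard deviations of the mean."""
    return 1 - 1 / k**2

for k in (1, 2, 3):
    print(k, chebyshev_lower_bound(k))
```

For k = 1 the bound is 0, which is why the rule gives no useful information at 1 standard deviation. Unlike the empirical rule, this bound does not require the data to be normal.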
Term
How do you calculate class relative frequency?
Definition
class frequency / (n)
Term
How do you calculate class percentage?
Definition
class relative frequency x 100
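Both calculations can be shown on a small categorical sample. The blood-group data below are made up for illustration.

```python
from collections import Counter

blood_groups = ["A", "O", "O", "B", "A", "O", "AB", "O"]  # hypothetical sample
n = len(blood_groups)

class_frequency = Counter(blood_groups)  # count per class
relative_frequency = {c: f / n for c, f in class_frequency.items()}
percentage = {c: rf * 100 for c, rf in relative_frequency.items()}

for c in class_frequency:
    print(c, class_frequency[c], relative_frequency[c], percentage[c])
```

The relative frequencies across all classes add up to 1, and the percentages add up to 100.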