Shared Flashcard Set

Details

Biostatistics
Biostatistics
79
Medical
Graduate
01/07/2010

Cards

Term
What is biostatistics?
Definition
Application of statistical reasoning and methods to biological, medical and public health problems.
Term
What is the role of biostatistics?
Definition
1. To design studies
2. To develop hypotheses.
3. Descriptive statistics: To describe data (exploratory data analysis: order, group, summarize and graph).
4. Hypothesis testing: To apply statistical methods to test competing hypotheses.
Term
What is bias?
Definition
It refers to the systematic difference between the true value and the observed or measured value.
Term
What is variance?
Definition
Variation that exists between measurements about their average: sigma squared for the population, s squared for the sample (sigma and s are the corresponding standard deviations).
Term
What types of studies exist?
Definition
1. Experimental: RCT and basic science.
2. Observational: longitudinal, cross-sectional, retrospective or prospective.
Term
What is the median?
Definition
It is a measure of central tendency. It is the value that represents the 50th percentile (p50), i.e. the second quartile (Q2): 50% of the observations fall below this value. It is more robust than the mean because it is insensitive to outliers.
Term
What is the relationship between mean and median in a positively skewed distribution (skewed to the right)?
Definition
mean>median
Term
What does the relationship between the mean and the median tell us?
Definition
This relationship can be used to assess the symmetry of the distribution.
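A minimal sketch of this check, assuming a small made-up sample with one large outlier (the data and the helper name `skew_direction` are illustrative, not from the cards):

```python
from statistics import mean, median

# Hypothetical right-skewed data: most values are small, one is very large.
lengths_of_stay = [2, 3, 3, 4, 4, 5, 6, 30]

sample_mean = mean(lengths_of_stay)      # pulled upward by the outlier (30)
sample_median = median(lengths_of_stay)  # middle of the ordered values

def skew_direction(m, med):
    """Crude symmetry check based only on mean vs. median."""
    if m > med:
        return "right (positive) skew"
    if m < med:
        return "left (negative) skew"
    return "roughly symmetric"

print(sample_mean, sample_median, skew_direction(sample_mean, sample_median))
```

The single outlier moves the mean well above the median, while the median barely reacts.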
Term
What is the use of a Logarithmic Scale?
Definition
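A log scale uses a constant multiplier between tick marks on the Y axis (e.g. 1, 10, 100), so equal distances represent equal ratios. It allows changes of very different magnitudes (e.g. biological measurements) to be plotted on the same graph. It is also used for data analysis and transformation.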
Term
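What is a p-value and what is assumed for its calculation?
Definition
It is a probability: Pr(sample statistic is as extreme as or more extreme than the observed statistic | Ho is true). For an upper-tailed test this is Pr(statistic >= observed | Ho is true).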
The assumptions to estimate p are:
1. Random sampling.
2. Assumptions made on statistical model are valid.
3. No other bias is present.
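As an illustration of the definition, here is a sketch of an exact one-sided p-value for a binomial outcome. The scenario (9 successes in 10 trials, Ho: p = 0.5) and the function name are assumptions made for the example:

```python
from math import comb

def binomial_upper_p_value(n, k_observed, p_null=0.5):
    """Pr(X >= k_observed | Ho: success probability = p_null), X ~ Binomial(n, p_null)."""
    return sum(
        comb(n, k) * p_null**k * (1 - p_null)**(n - k)
        for k in range(k_observed, n + 1)
    )

# Hypothetical study: 9 successes observed in 10 independent trials.
p_value = binomial_upper_p_value(n=10, k_observed=9)
print(round(p_value, 4))
```

The result is only meaningful if the assumptions listed above hold, in particular random sampling and a valid model (here, independent trials with a common success probability).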
Term
What is probability?
Definition
It is a measure of uncertainty associated with the occurrence of an event. It ranges from 0 to 1.
Term
When are 2 outcomes statistically independent?
Definition
Two outcomes are statistically independent if and only if Pr (A and B)= Pr (A)* Pr (B). This is a JOINT Pr.
Term
When are two outcomes mutually exclusive?
Definition
Two outcomes are mutually exclusive if and only if Pr (A and B)= 0. This is a joint probability.
Term
What is the addition rule of Probability?
Definition
Addition rule (Or): Pr(A or B) = Pr(A) + Pr(B) - Pr(A and B). Special case: when A and B are mutually exclusive, Pr(A and B) = 0, so Pr(A or B) = Pr(A) + Pr(B).
Term
What is the conditional rule of probability?
Definition
Pr(A|B) = Pr(A and B) / Pr(B). Pr(B) has to be different from 0.
Term
What is the multiplication rule of probability?
Definition
Pr(A and B) = Pr(B) * Pr(A|B).
SPECIAL CASE: When A and B are independent, Pr(A|B) = Pr(A), so Pr(A and B) = Pr(A) * Pr(B).
Term
What are the 3 rules of probability?
Definition
1. Addition (Or)
2. Conditional (Given)
3. Multiplication (and)
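The three rules can be checked on a small example. This sketch assumes one roll of a fair six-sided die, with events chosen purely for illustration:

```python
from fractions import Fraction

# One roll of a fair six-sided die (illustrative example, not from the cards).
outcomes = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}   # "even"
B = {4, 5, 6}   # "greater than 3"

def pr(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event & outcomes), len(outcomes))

pr_a_and_b = pr(A & B)                    # joint probability
pr_a_or_b = pr(A) + pr(B) - pr_a_and_b    # addition rule
pr_a_given_b = pr_a_and_b / pr(B)         # conditional rule, needs Pr(B) != 0
pr_joint_again = pr(B) * pr_a_given_b     # multiplication rule

independent = pr_a_and_b == pr(A) * pr(B)
mutually_exclusive = pr_a_and_b == 0
print(pr_a_or_b, pr_a_given_b, independent, mutually_exclusive)
```

Here A and B are neither independent nor mutually exclusive, so none of the shortcut forms of the rules applies.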
Term
What is Bayes rule?
Definition
It is a rule for reversing a conditional probability: Pr(A|B) = Pr(B|A) * Pr(A) / Pr(B). It is useful when Pr(A|B) is not directly available but Pr(B|A) and Pr(A) are.
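A common use is turning the properties of a diagnostic test into the probability of disease given a positive result. The sketch below assumes made-up values for prevalence, sensitivity and specificity:

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """Pr(disease | test positive) via Bayes' rule."""
    true_pos = sensitivity * prevalence                # Pr(+ and disease)
    false_pos = (1 - specificity) * (1 - prevalence)   # Pr(+ and no disease)
    return true_pos / (true_pos + false_pos)

# Hypothetical test: 1% prevalence, 90% sensitivity, 95% specificity.
ppv = positive_predictive_value(prevalence=0.01, sensitivity=0.90, specificity=0.95)
print(round(ppv, 3))
```

Even with a fairly accurate test, a rare disease gives a low probability of disease after a single positive result.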
Term
What is a probability distribution?
Definition
It is a list of the probabilities of all the possible values that a random variable can take. Some of the most common probability distributions are:
1. For Discrete Variables (Gaps):
Binomial (Dichotomous outcomes)
Poisson (rare events)
2. For continuous outcomes: Gaussian and exponential.
Term
What is a permutation?
Definition
It is a counting technique. In a permutation, the order of the events matters.
Term
What is a combination?
Definition
It is a counting technique. Order of events DOES NOT matter.
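A short sketch contrasting the two counting techniques, assuming we pick 3 patients out of 5 (numbers chosen only for illustration; `math.perm` and `math.comb` need Python 3.8 or later):

```python
from math import comb, perm

n_patients, n_chosen = 5, 3

# Order matters: (Ann, Bob, Cy) differs from (Cy, Bob, Ann).
ordered = perm(n_patients, n_chosen)     # 5 * 4 * 3

# Order does not matter: each set of 3 is counted once.
unordered = comb(n_patients, n_chosen)   # ordered / 3!

print(ordered, unordered)
```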
Term
What conditions need to be met to use Poisson as an approximation to the binomial?
Definition
N has to be large (n>20)
p has to be small
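To see how close the approximation can be, this sketch compares the two probability functions for an assumed case with n = 100 and p = 0.02 (values picked for illustration):

```python
from math import comb, exp, factorial

def binomial_pmf(k, n, p):
    """Pr(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, lam):
    """Pr(X = k) for X ~ Poisson(lam)."""
    return exp(-lam) * lam**k / factorial(k)

n, p = 100, 0.02      # large n, small p
lam = n * p           # the Poisson mean is matched to the binomial mean

# Largest absolute difference between the two pmfs over k = 0..10.
max_gap = max(abs(binomial_pmf(k, n, p) - poisson_pmf(k, lam)) for k in range(11))
print(max_gap)
```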
Term
What conditions need to be met to use the normal distribution as an approximation of the binomial?
Definition
np>5 and n(1-p)>5
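A sketch comparing an exact binomial probability with its normal approximation. The values n = 50, p = 0.4 and the continuity correction of 0.5 are choices made for this example:

```python
from math import comb, sqrt
from statistics import NormalDist

n, p, k = 50, 0.4, 15   # np = 20 and n(1 - p) = 30, both well above 5

# Exact Pr(X <= k) from the binomial distribution.
exact = sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

# Normal approximation with the same mean and SD, plus a continuity correction.
approx = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p))).cdf(k + 0.5)

print(round(exact, 4), round(approx, 4))
```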
Term
What is a Q-Q plot?
Definition
Visual aid that compares the quantiles (percentiles) of two distributions. It is used to compare distributions. When two distributions are equal, the points will fall on a straight line. Any departure from that line might indicate a difference in spread or shape. With a Q-Q plot, we can compare a sample distribution vs a theoretical one, sample vs sample, and theoretical vs theoretical.
Term
What is statistical inference?
Definition
It refers to the methods that are applied to information that is drawn from a sample to make inferences about a population.
Term
What is a parameter?
Definition
It is a numerical descriptor that refers to the population (usually written with Greek letters).
Term
What is a statistic?
Definition
Numerical descriptor of a sample.
Term
Name the different types of randomization that can be used in a clinical trial.
Definition
1. Unrestricted randomization
2. Restricted randomization: used in small trials. Will assure that groups are balanced.
3. Stratified randomization
4. Matched paired randomization: used to produce balance in the composition of the groups on which matching is made
Term
What is the use of randomization?
Definition
It is a method used in the design of a study to balance known and unknown confounders across groups. Only when the units under study (individual, family, community) are randomized can we be confident that the observed changes are due to the intervention and not to underlying differences between the units. Whenever possible, an intervention should be randomized and double blinded.
Term
Name some of the most important sampling distributions.
Definition
1. Sample mean.
2. Difference of 2 sample means.
3. Proportion.
4. Difference of 2 sample proportions.
The first two are used for continuous variables and the last two for binary outcomes.
Term
What assumptions are made regarding the Central Limit Theorem?
Definition
1. Mean of the sample means = pop mean
2. SD of the sample means (standard error) = pop sd / sqrt(n)
3. Distribution of values of sample mean will be approximately normal.
The CLT is used when sampling from a non-normal distribution with a large n. This theorem is important because many populations are not normally distributed, but we would still like to be able to make inferences about the population.
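A small simulation can make this concrete. The sketch assumes an exponential population with mean 1 and SD 1 (clearly non-normal), samples of size 50, and a fixed random seed; all of these are illustrative choices:

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(0)                    # fixed seed so the run is repeatable
pop_mean, pop_sd = 1.0, 1.0       # exponential distribution with rate 1
n, n_samples = 50, 2000

# Draw many samples of size n and keep each sample mean.
sample_means = [
    mean(random.expovariate(1.0) for _ in range(n))
    for _ in range(n_samples)
]

center = mean(sample_means)       # should be close to pop_mean
spread = stdev(sample_means)      # should be close to pop_sd / sqrt(n)
print(round(center, 3), round(spread, 3), round(pop_sd / sqrt(n), 3))
```

A histogram of `sample_means` would look roughly bell-shaped even though the individual observations are strongly right-skewed.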
Term
What is reflected by the 95% CI?
Definition
The interval represents the uncertainty associated with a point estimate.
Term
What is hypothesis testing?
Definition
Statistical aid that helps to decide between competing hypotheses by examining a sample from a population.
Term
What are the steps to perform a hypothesis test?
Definition
1. Select Pr model
2. Set up the Ho based on the problem being investigated. Set the Ha, deciding if the test will be 1 sided or 2 sided.
3. Select test statistic: Z or t.
4. Select critical region (alpha).
5. Compare the observed value to the hypothesized value.
6. Make statistical decision and conclusion.
Term
What happens to the CI when t is used instead of Z?
Definition
Z is used to calculate the CI whenever the population SD is known; if not, you need to use t with the sample SD. When using t, the CI will be wider given that less information is available. As n gets larger, t will approximate Z.
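The widening can be seen on a small sample. This sketch assumes ten made-up measurements; the t critical value 2.262 is the standard table value for 9 degrees of freedom and is hard-coded because the standard library has no t quantile function:

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

# Hypothetical sample of n = 10 measurements.
sample = [4.1, 5.0, 4.7, 5.3, 4.9, 5.6, 4.4, 5.1, 4.8, 5.2]
n = len(sample)
x_bar, s = mean(sample), stdev(sample)
se = s / sqrt(n)

z_crit = NormalDist().inv_cdf(0.975)   # about 1.96
t_crit = 2.262                         # t table value, 0.975 quantile, n - 1 = 9 df

ci_z = (x_bar - z_crit * se, x_bar + z_crit * se)
ci_t = (x_bar - t_crit * se, x_bar + t_crit * se)
print(ci_z, ci_t)
```

With only 9 degrees of freedom the t interval is about 15% wider than the Z interval; the gap shrinks as n grows.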
Term
When is the sampling distribution normal or approximately normal?
Definition
The sampling distribution is normal or approximately normal when:
1. Sample is taken from a population with a normal distribution.
2. N is large, so the Central Limit Theorem holds.
Term
What is the Variance Ratio Test?
Definition
It is an F test that is used to test the Ho of equal variances. Based on this result, we can decide whether or not to pool the variances of the two samples.
Term
What is the sample statistic for pre-post designs?
Definition
d-bar, the mean of the paired (post minus pre) differences. Remember, here the samples are not independent.
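A sketch of how the statistic is built from paired data, assuming six made-up pre and post measurements on the same subjects:

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical paired measurements on the same six subjects.
pre = [140, 152, 138, 147, 160, 155]
post = [135, 150, 136, 140, 152, 151]

# Work with one difference per subject, not with two separate samples.
differences = [after - before for before, after in zip(pre, post)]

d_bar = mean(differences)            # mean of the paired differences
s_d = stdev(differences)             # SD of the differences
t_stat = d_bar / (s_d / sqrt(len(differences)))   # paired t statistic, n - 1 df
print(differences, round(d_bar, 2), round(t_stat, 2))
```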
Term
What is ANOVA?
Definition
It is the Analysis of Variance. It is a statistical technique for comparing the means of multiple populations by partitioning the variability in the data by source: between groups and within groups.
The F test gives the ratio of between-group to within-group variability.
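The partition can be written out by hand. This sketch assumes three small made-up groups and computes the one-way ANOVA F statistic from the between and within sums of squares:

```python
from statistics import mean

# Hypothetical data: three groups of three observations each.
groups = {
    "A": [5, 6, 7],
    "B": [8, 9, 10],
    "C": [11, 12, 13],
}
all_values = [x for g in groups.values() for x in g]
grand_mean = mean(all_values)
k, n_total = len(groups), len(all_values)

# Between-group variability: group means around the grand mean.
ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values())
# Within-group variability: observations around their own group mean.
ss_within = sum((x - mean(g)) ** 2 for g in groups.values() for x in g)

ms_between = ss_between / (k - 1)
ms_within = ss_within / (n_total - k)
f_stat = ms_between / ms_within
print(ss_between, ss_within, f_stat)
```

A large F means the group means differ by much more than the scatter inside each group would suggest.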
Term
Which are the two criteria needed to determine the sample size?
Definition
1. Precision: how much variability can be tolerated around the estimate (the width of the CI).
2. Power: Pr(Reject Ho | Ha is true). Thus, a power of 80% means that if the true difference between 2 groups is what was expected, the study will give significant results 4 out of 5 times it is conducted.
Term
What is a regression analysis?
Definition
It is a statistical method that allows us to describe a response or outcome (Y) as a simple function of a predictor variable (X).
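For the simplest case, a straight line Y = b0 + b1*X, the least-squares estimates can be computed directly. The data below are made up for illustration:

```python
from statistics import mean

# Hypothetical predictor (x) and outcome (y) values.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

x_bar, y_bar = mean(x), mean(y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)

slope = sxy / sxx                    # b1: change in Y per unit change in X
intercept = y_bar - slope * x_bar    # b0: expected Y when X = 0
print(round(slope, 3), round(intercept, 3))
```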
Term
What is an Adjusted Variable Plot?
Definition
Visual Aid used to assess linearity, patterns and outliers in a model.
Term
What inference procedures can be used for all 4 models?
Definition
1. Estimate Bj and SE(Bj): 95% CI, hypothesis test, p value.
2. Estimate a linear combination (lincom), which allows combining multiple coefficients. Useful for splines and interaction terms: 95% CI, hypothesis test, p value.
3. Compare extended vs null model: test the hypothesis that multiple Bj equal 0.
Term
How are inferences for a Bj done in an MLR?
Definition
Using a partial t-test (only used for linear regression).
Term
How are inferences for a Bj done in a Cox, LR or LLR model?
Definition
NO T TEST HERE. We use a Wald test or Z test.
Term
What are the inferences for Bj+Bk in a Regression Model?
Definition
Use lincom for a hypothesis test about a specific linear combination of B's.
In an MLR, the test is set up as a t test.
In all the others we use a Z test.
Ho: Bj+Bk=0
Term
Which test is used to compare the null vs the extended model?
Definition
We would use an F test for MLR. For other models we would test this hypothesis with a LRT. The F test is not applicable with these models (LR, LLR, Cox).
Term
How can we decide if the variable is a confounder or a mediator?
Definition
This decision is not made with statistics; it is based on prior knowledge.
Term
What is effect modification?
Definition
It is when the coefficient for an X variable differs depending on the value of one or more other Xs. This concept applies to all 4 models.
Term
What type of variables will ANOVA allow us to explore?
Definition
Categorical variables
Term
What will ANCOVA allow us to explore?
Definition
Interaction
Term
How do we check for model fit with an MLR?
Definition
1. Residual plots
2. AVP (adjusted variable plots)
We look for nonlinear patterns, influential points and changing variance.
Term
How do we check for model fit with an LR?
Definition
1. Inspect observed vs predicted values.
2. Hosmer-Lemeshow goodness of fit.
Look for patterns, influential points and changing variance.
Check influence of influential points.
Term
How do we check for model fit with an LLR and Cox?
Definition
Use Complementary Log Log plots.
Term
How do we select a regression model?
Definition
Question of interest.
Purpose
Check for model fit
Criteria used: cross-validated measures, AIC for all 4 models. Do not use R squared.
Term
What is the binomial distribution?
Definition
It is a probability distribution for the number of events in a series of random trials, each of which can have only 2 outcomes.
It assumes that there are only 2 outcomes, that the probability of the event is the same in every trial, and that the trials are independent.
Term
What is the CLT?
Definition
Given a population with any non-normal distribution, the sampling distribution of the sample mean, computed from all possible samples of size n from this population, will be approximately normal when n is large.
Term
What is the Maximum Likelihood Estimate?
Definition
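It is the value of the parameter that makes the observed data most likely, i.e. the value that maximizes the likelihood function. In this sense it is the best estimate of the parameter based on the sample.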
Term
What is the interpretation of the 95% CI?
Definition
We are 95% confident that the interval covers the true population mean.
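One way to read "95% confident" is through repeated sampling: if the study were repeated many times, about 95% of the intervals built this way would cover the true mean. The simulation below is a sketch under assumed values (normal population with mean 10 and known SD 2, n = 30, 1000 repetitions, fixed seed):

```python
import random
from math import sqrt
from statistics import NormalDist, mean

random.seed(1)                           # fixed seed so the run is repeatable
mu, sigma, n, n_studies = 10.0, 2.0, 30, 1000
z = NormalDist().inv_cdf(0.975)
half_width = z * sigma / sqrt(n)         # sigma treated as known, so Z is used

covered = 0
for _ in range(n_studies):
    x_bar = mean(random.gauss(mu, sigma) for _ in range(n))
    if x_bar - half_width <= mu <= x_bar + half_width:
        covered += 1

coverage = covered / n_studies           # expected to be near 0.95
print(coverage)
```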
Term
What are the properties of the t distribution?
Definition
mean=median=mode
symmetrical about the mean
family of distributions determined by n-1 df
approaches the standard normal (Z) distribution as n-1 approaches infinity
Term
What is bootstrapping?
Definition
It is a statistical technique where the sample is treated as if it were the whole population. A random sample of the same size is taken from it with replacement and the statistic is computed. The process is repeated many times (e.g. 1000x) and a histogram of the results is made. It will approximate the sampling distribution of the statistic.
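A minimal sketch of the procedure for the median, assuming a made-up sample of ten values, 1000 resamples and a fixed seed. The percentile interval at the end is one simple way to summarize the result, not the only one:

```python
import random
from statistics import median

random.seed(42)                                   # fixed seed for repeatability
sample = [3, 5, 7, 8, 9, 12, 13, 15, 21, 40]      # hypothetical observed sample

boot_medians = []
for _ in range(1000):
    # Resample WITH replacement, same size as the original sample.
    resample = random.choices(sample, k=len(sample))
    boot_medians.append(median(resample))

# Rough 95% percentile interval from the sorted bootstrap medians.
boot_medians.sort()
ci_low, ci_high = boot_medians[24], boot_medians[974]
print(ci_low, ci_high)
```

A histogram of `boot_medians` is the bootstrap approximation to the sampling distribution of the median.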
Term
What is the F test in ANOVA?
Definition
It is a global test. Ho is for equality of all means.
Term
What are the assumptions for ANOVA?
Definition
1. Obs are independent
2. Constant variance
3. Distribution approx normal
Term
What are the steps for an ANOVA?
Definition
1. Bartlett's test for EQUAL VARIANCE (ANOVA ASSUMPTION).
2. F test for equal means.
3. Estimate differences by multiple comparisons, with Bonferroni adjustment for all possible pairwise comparisons.
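For step 3, the Bonferroni adjustment divides the overall alpha by the number of pairwise comparisons. A sketch, with a hypothetical helper name and 4 groups assumed for the example:

```python
from math import comb

def bonferroni_alpha(n_groups, alpha=0.05):
    """Return (number of pairwise comparisons, per-comparison alpha)."""
    n_comparisons = comb(n_groups, 2)    # all possible pairs of groups
    return n_comparisons, alpha / n_comparisons

n_pairs, alpha_each = bonferroni_alpha(n_groups=4)
print(n_pairs, round(alpha_each, 4))
```

Each pairwise test is then judged against the smaller per-comparison alpha rather than 0.05.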
Term
What is a correlation analysis (r)?
Definition
Analysis that shows the direction and strength of the linear association between X and Y. -1 is a perfect negative linear relation, 0 is no linear relation, 1 is a perfect positive linear relation.
Value of r is independent of units
r is substantially influenced by a small fraction of outliers
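Both properties can be checked numerically. This sketch assumes made-up height and weight data; `pearson_r` is written out by hand so that it does not depend on the Python version:

```python
from math import sqrt
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical data.
height_cm = [150, 160, 170, 180, 190]
weight_kg = [52, 58, 69, 77, 85]
height_m = [h / 100 for h in height_cm]

r_cm = pearson_r(height_cm, weight_kg)
r_m = pearson_r(height_m, weight_kg)      # same r after changing units

# Replace one point with an outlier and recompute.
weight_with_outlier = weight_kg[:-1] + [40]
r_outlier = pearson_r(height_cm, weight_with_outlier)
print(round(r_cm, 3), round(r_m, 3), round(r_outlier, 3))
```

Changing centimeters to meters leaves r unchanged, while a single outlying point is enough to flip its sign in this small sample.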
Term
What is residual analysis?
Definition
It is a check on the assumptions of a regression.
Check:
1. Residuals normally dist. on histogram
2. Random scatter on plot of residuals vs X
3. Random scatter on plot of residuals vs fitted
If assumptions fail:
Look for outliers
Transform
Term
What is the coefficient of determination (r squared)?
Definition
It is the proportion of the variation in Y explained by X. This is not a good measure for selecting a model.
Term
What is the principal objective of many intervention trials?
Definition
To estimate the size of the effect of the intervention on the outcomes. This estimate is subject to error, which derives from bias and sampling error. Sampling error usually decreases when n is increased; bias is not modified by this.
Term
Name the 2 criteria used to determine sample size
Definition
1. Precision: how precise your estimate needs to be. This is reflected in the width of the CI around the estimate. The narrower the CI, the smaller the sampling error.
2. Power: alternatively, use the power needed to detect an effect of a given magnitude. Power depends on: delta, alpha, n and whether the test is one or two sided.
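For the power-based criterion, a common textbook formula for comparing two means with a two-sided Z test is n per group = 2 * (sigma * (z_(1-alpha/2) + z_(power)) / delta)^2. The sketch below assumes that formula, equal group sizes and a known common SD (sigma), which the card does not mention explicitly:

```python
from math import ceil
from statistics import NormalDist

def n_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Sample size per group for a two-sided, two-sample Z test (normal approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # critical value for the two-sided test
    z_beta = z.inv_cdf(power)            # quantile matching the desired power
    return ceil(2 * (sigma * (z_alpha + z_beta) / delta) ** 2)

# Hypothetical design: detect a difference of 5 units when the SD is 10.
print(n_per_group(delta=5, sigma=10))
```

Halving delta roughly quadruples the required sample size, which is why the choice of a meaningful difference matters so much.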
Term
What is a power curve?
Definition
Power curves are visual aids that help researchers when trading off sample size against power. These are usually constructed for 1 or 2 key outcomes.
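The points of such a curve can be computed before plotting. This sketch assumes a two-sided, two-sample Z test with a known common SD, and made-up values for delta and sigma:

```python
from math import sqrt
from statistics import NormalDist

def power_two_sample(n, delta, sigma, alpha=0.05):
    """Power of a two-sided, two-sample Z test with n subjects per group."""
    z = NormalDist()
    se = sigma * sqrt(2 / n)                 # SE of the difference in means
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = delta / se
    # Probability of landing in either rejection region when the true difference is delta.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)

# Hypothetical scenario: delta = 5, sigma = 10, several candidate sample sizes.
sizes = [20, 40, 63, 80, 100]
curve = [(n, round(power_two_sample(n, delta=5, sigma=10), 3)) for n in sizes]
print(curve)
```

Plotting power against n for one or two key outcomes gives the curve described above.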
Term
Aim of sample size calculation based on hypothesis testing
Definition
Have large enough samples to detect a difference in population means (or in population proportions).
Term
What is the Aim of sample size calculation based on precision?
Definition
Have a large enough sample with which to estimate a population mean (or difference in means) or proportion (or difference in proportions) within a narrow interval with high reliability.
Term
What are the Ho for sample size calculation based on hypothesis testing (HT) for one and 2 samples?
Definition
Ho: Δ = 0, where
Δ = μa - μ0 or Δ = pa - p0 for one sample
Δ = μ1 - μ2 or Δ = p1 - p2 for two samples
Term
What is the goal of sample size calculation?
Definition
Perform a study with large enough sample size and sufficient power to detect (through hypothesis testing) a meaningful difference Δ.
Term
On what should sample size be based?
Definition
Sample size calculation should be informed by previous investigations.
Term
Other than the statistical basis, on what else should sample size be determined?
Definition
Choice of sample size depends on a balance of reasonable assumptions, time, effort, and expense
Term
What can be some of the potential effects of having a clinical trial with a small sample size?
Definition
No effect but wide CI
Large effect but no power to detect delta
Term
What are other factors affecting sample size?
Definition
Interim analysis
Equivalence trials (large sample size)
Loss to follow-up