Term
|
Definition
The probability of making a type I error which the experimenter is willing to accept. The rate is set by the experimenter. Often set at 5% (0.05); however, there is nothing "magical" about this value, it is simply a commonly used value, first recommended by R.A. Fisher in 1926. 5% of the time we may see differences that are due to chance and not the treatment, and we accept that we may reject the null hypothesis when it is actually true. |
|
|
Term
|
Definition
A component of validity. Not to be confused with precision. How unbiased and true the measure is. How close the measure is to the true value of the measure. |
|
|
Term
Alternate hypothesis (Ha) |
|
Definition
Part of a statistical hypothesis. There is no specific alternate hypothesis.
Thing 1 ≠ Thing 2
Thing 1 > Thing 2 |
|
|
Term
|
Definition
A test for normality. Focuses on deviations in tails.
A² = (average of differences)² |
|
|
Term
|
Definition
Analysis of variance
The data must have a Gaussian distribution to use the ANOVA test. |
|
|
Term
|
Definition
A non-Gaussian distribution. A distribution in which the mean, median, and mode are different values. May be left skewed (mode > median > mean) or right-skewed (mean > median > mode). |
|
|
Term
|
Definition
A non-Gaussian distribution. A probability density function. Examples include proportion of leaf area affected. |
|
|
Term
|
Definition
A non-Gaussian distribution. The number of successes out of total trials. Examples include the % of seeds that germinated. |
|
|
Term
|
Definition
Provides local control of the environment to reduce experimental error. The experimental units are grouped in such a way where the variation of experimental units within the blocks is less than the variation among all the units before blocking. Goes hand-in-hand with the selection of homogenous experimental units. Treatments are compared with one another within the blocks, in a more uniform environment. We can separate the variation due to blocks or groups or environment from the variation due to the treatment effects. Criteria for blocks includes proximity, physical characteristics, time, or management tasks in an experiment. |
|
|
Term
|
Definition
A goodness of fit test for categorical data. The most common type is the Pearson χ², which is sensitive; if you change one value by even a small amount, it can dramatically alter the end results. Corrections include the Yates continuity correction, the maximum likelihood chi-square, and Fisher's exact test. Evaluates whether the observed cell frequencies are different than the expected cell frequencies. The smaller the value, the closer the observed and expected frequencies are, and the better the fit. If the value is zero, the observed and expected frequencies are equal. The p-value associated with the statistic and the degrees of freedom may be found in a chart, or calculated in SAS. The null hypothesis is that the observed frequency and expected frequency are the same.
χ² = Σ((observed - expected)² / expected) |
|
|
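The Pearson χ² above can be computed directly from the formula. A minimal Python sketch, using hypothetical counts tested against a 3:1 ratio (all numbers are invented for illustration):

```python
# Hypothetical counts tested against a 3:1 expected ratio (e.g., a genetics cross)
observed = [74, 26]
total = sum(observed)
expected = [total * 3 / 4, total * 1 / 4]  # 75 and 25

# Pearson chi-square: sum of (observed - expected)^2 / expected over all cells
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
```

A small χ² (here about 0.053 on 1 degree of freedom) means the observed frequencies are close to the expected ones, so the null hypothesis of a good fit would not be rejected.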
Term
Completely randomized design (CRD) |
|
Definition
An experimental design. Treatments are randomly assigned to experimental units. Variation is partitioned into one source: the treatments. Uses a fixed effects model; the treatment effect is fixed. Can use the F-test ratio. A good design if experimental units are homogenous. No control over the experimental error variance. Where Yij is the observation on the jth experimental unit of the ith treatment, μ is the overall mean, ti is the ith treatment effect, and eij is the random error of the jth experimental unit on the ith treatment:
Yij = μ + ti + eij |
|
|
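The CRD card above says variation is partitioned into treatments and error and that an F-ratio can be used. A hedged sketch of that arithmetic in Python; the three treatments and all yield values are hypothetical:

```python
# Hypothetical yields for three treatments in a completely randomized design
groups = {
    "A": [10.0, 12.0, 11.0],
    "B": [14.0, 15.0, 13.0],
    "C": [10.0, 9.0, 11.0],
}
all_obs = [y for ys in groups.values() for y in ys]
grand_mean = sum(all_obs) / len(all_obs)

# Partition the variation: among treatments vs experimental error
ss_trt = sum(len(ys) * (sum(ys) / len(ys) - grand_mean) ** 2
             for ys in groups.values())
ss_err = sum((y - sum(ys) / len(ys)) ** 2
             for ys in groups.values() for y in ys)

df_trt = len(groups) - 1              # treatments - 1
df_err = len(all_obs) - len(groups)   # observations - treatments
F = (ss_trt / df_trt) / (ss_err / df_err)
```

A large F means the variation among treatment means is large relative to the experimental error, evidence against the null hypothesis of equal treatment means.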
Term
|
Definition
It is symmetric in log scale, and may not be in the data scale. If the confidence interval includes the value of 0 in log scale, or 1 in data scale, we can accept the null hypothesis. |
|
|
Term
|
Definition
A component of reliability. When repeated measures produce the same results. |
|
|
Term
|
Definition
Crosstab table
A table containing two or more categorical variables. We want to examine the variables at the same time. May be frequency-by-frequency. If there are a lot of variables the table may be difficult to display. |
|
|
Term
|
Definition
A comparison of odds from data in a contingency table. Includes the odds ratio. The null hypothesis is that there is no association between the two variables or two measures. The alternate hypothesis is that there is an association between the two variables or measures. Used for categorical data, such as ordinal or nominal data. |
|
|
Term
|
Definition
Interval data
Ratio data
Items that are measured with a continuous scale. You can calculate the average and it has meaning. You can use a test for normality to see if the sample is representative of the larger population. Examples include weight, height, age, temperature, etc. |
|
|
Term
|
Definition
A test for normality.
W² = (sum of differences)² |
|
|
Term
|
Definition
Includes quantitative and qualitative data, as well as continuous, discrete, textual, temporal, and spatial data. Different data types have different distributions and call for different types of statistical analysis. Each type of distribution has its own set of properties and appropriate statistical tests. One statistical test cannot fit all data types and/or distributions. |
|
|
Term
|
Definition
Used to find the correct critical value of a t-statistic in a table. For a two-sample t-test, the total number of observations minus the number of groups:
df = (n1 + n2) - 2 |
|
|
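A quick check of the formula above, assuming a two-sample t-test with hypothetical group sizes:

```python
# Degrees of freedom for a two-sample t-test: total observations
# minus the two group means estimated from the data
n1, n2 = 8, 10
df = (n1 + n2) - 2  # 16
```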
Term
|
Definition
Categorical data
Includes nominal and ordinal data. Each datapoint can belong to only one group or category. You may be able to calculate a mean, but it will have no meaning. Cannot be used in a test for normality. May be used in an odds ratio. Typically has a Poisson distribution. |
|
|
Term
|
Definition
Describes the size of the difference between Thing 1 and Thing 2, when the null hypothesis is rejected. Related to the power of a test. |
|
|
Term
Empirical distribution function (EDF) |
|
Definition
Used in the calculation of the Shapiro-Wilk and Kolmogorov-Smirnov tests. |
|
|
Term
|
Definition
Based on the research question. Drives what data to collect and analyse. The process of planning a study to meet specific objectives. The experiment should be designed to match a specified research question. Experimental designs include CRD, RCBD, split-plot, strip-plot, and repeated measures. Steps include:
1. Define experimental unit
2. Identify types of variables you are collecting
3. Define the treatment structure
4. Define the overall structure |
|
|
Term
|
Definition
A factor in the power of the test. Related to experimental design, data collection, and analysis. When experimental error decreases, the power of the test increases. Must be controlled or explained. A measure of the variation that exists among observations taken on experimental units that are treated alike. Sources come from natural variation among experimental units, variability of measurements taken, inability to reproduce treatment conditions exactly, interactions between treatments and experimental units, and any other extraneous factors that may influence the response. Fisher concentrated on experimental error. |
|
|
Term
|
Definition
The unit to which the treatment is applied. Ideally, they should be as homogenous as possible, treated the same, in the same environment. Measures should be taken the same way for all. |
|
|
Term
|
Definition
A non-Gaussian distribution. The time between events. Examples include time to flower. |
|
|
Term
|
Definition
A correction for the chi-square test. Used when there are only two groups, and they have small sample sizes, of 10 or less. |
|
|
Term
|
Definition
In a mixed model, the coefficients which provide the minimum least square deviations between observed and the predicted observations by the model. |
|
|
Term
|
Definition
Normal distribution.
A classic bell curve. Described by the mean of the population (μ) and sample (y bar), and variance of the population (σ2) and sample (s2). The mean, median, and mode are equal. A perfectly symmetrical distribution. When testing the difference between two treatments, you do not want the data to be normal; you want the treatment to affect the population. You should always check to see if residuals are normally distributed using a test for normality. Normality is subjective.
Mean ± 1 σ = 68.27% of population
Mean ± 1.96 σ = 95% of population
Mean ± 2 σ = 95.44% of population
Mean ± 3 σ = 99.73% of population |
|
|
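The coverage percentages above can be verified from the normal CDF. A small Python sketch using the error function (standard library only):

```python
import math

def coverage(z):
    # Fraction of a Gaussian population within +/- z standard deviations of the mean
    return math.erf(z / math.sqrt(2))

for z in (1, 1.96, 2, 3):
    print(z, round(coverage(z) * 100, 2))
```

This reproduces 68.27%, 95.00%, 95.45% (often quoted as 95.44%), and 99.73%.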
Term
|
Definition
Relates to the data types. Makes inferences about whether the sample is truly representative of the ratios seen in the population. Tells how well the data fits to a predetermined ratio, unknown ratio, or distribution of responses. Includes the Chi-square test. |
|
|
Term
|
Definition
When you repeat a study several times, this determines whether it is okay to pool the separate datasets. When data is pooled together, there is a larger sample, and the test is more powerful. However, if the data is heterogeneous, we can lose information, and we can't tell whether one replicate of the study reacted differently than the others. The null hypothesis is that the populations have a 1:1 ratio, and the alternate is that their ratio is other than 1:1. Includes the Satterthwaite test. |
|
|
Term
|
Definition
Includes science hypotheses and statistical hypotheses. |
|
|
Term
|
Definition
The number of decimal places in the measured values. The first non-zero value in the standard error. |
|
|
Term
|
Definition
A test for normality. Focuses on deviations near the centre.
D = largest difference between EDF and normal |
|
|
Term
Least significant difference (LSD) |
|
Definition
An approach used as a crude, inexact way to quickly determine whether reported differences are different or not. The null hypothesis should be accepted if the range of the data is less than the LSD. Developed by R.A. Fisher. Can only be used when there are two treatments.
LSD = (critical t-statistic)(sed)
If n is the same in both groups,
LSD = (critical t-statistic)(√2)(sem) ≅ 3 se |
|
|
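A minimal sketch of the equal-n LSD calculation above in Python. The error variance, replicate number, and critical t value are all hypothetical; in practice the t value comes from a table at the error degrees of freedom:

```python
import math

s2 = 4.0        # pooled error variance (hypothetical)
n = 10          # replicates per treatment (hypothetical)
t_crit = 2.101  # critical t for df = 18 at alpha = 0.05 (from a table)

sem = math.sqrt(s2 / n)            # standard error of a mean
lsd = t_crit * math.sqrt(2) * sem  # LSD = t * sqrt(2) * sem
rough = 3 * sem                    # Fisher's quick 3 * se approximation
```

Two treatment means further apart than `lsd` would be declared different; `rough` shows that Fisher's 3 × se shortcut gives a similar answer.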
Term
|
Definition
An asymmetric distribution in the negative direction. The long tail is to the left. The mean is less than the median. |
|
|
Term
|
Definition
Invented by Fisher. Accounts for the total variation: treatments, treatment interactions, design, covariance, and unknown. The unknown component is random error. Total variation is split into treatments, treatment interactions, experimental design, covariances, and unknown error. We can account for the variation of all components except unknown error. Solved using two steps:
1. Fixed effects are solved
2. Random effects are solved |
|
|
Term
Log odds ratio (ln(odds)) |
|
Definition
The ratio of the natural log of the odds from a contingency table. If the sample size is adequate, distribution of the log estimates will be normal. If the odds are the same, the log odds ratio will be zero. Used to determine the significance of differences and estimates of the confidence interval. Estimates are converted back into data scale for presentation. The confidence interval will be symmetrical. |
|
|
Term
|
Definition
A non-Gaussian distribution. A log transformed distribution. |
|
|
Term
Maximum likelihood chi-square |
|
Definition
A correction for the chi-square test. Converts the Yates continuity correction into log scale.
χ² = 2(Σ((observed)(ln(observed / expected)))) |
|
|
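The log-scale version can be compared against the Pearson statistic on the same hypothetical counts:

```python
import math

# Hypothetical counts tested against a 3:1 expected ratio
observed = [74, 26]
expected = [75.0, 25.0]

# Maximum likelihood chi-square: 2 * sum of observed * ln(observed / expected)
chi2_ml = 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected))
```

For counts this close to expectation, the result (about 0.053) is nearly identical to the Pearson χ² on the same data.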
Term
|
Definition
Includes planned and unplanned comparisons. |
|
|
Term
Multinomial distribution |
|
Definition
A non-Gaussian distribution. Data where there are more than two possible outcomes. May be ordinal or nominal in nature. Examples include infection type categories. |
|
|
Term
|
Definition
Discrete data in which groups or levels have no relationship between them. Examples include yes/no, male/female, sitting/standing, wet/dry, etc. |
|
|
Term
|
Definition
Part of a statistical hypothesis. The decision is always about the null hypothesis; it is either accepted or rejected, with no condition or adjective added to qualify the hypothesis or effect size. If you reject the null hypothesis, there is a difference, and the alternate hypothesis is accepted. If you accept the null hypothesis, there is no difference.
Thing 1 = Thing 2 |
|
|
Term
|
Definition
The number of "something happening" divided by "something not happening". |
|
|
Term
|
Definition
The ratio of odds from a contingency table. Does not tell the distribution of estimates, or significance of their difference. May be converted to a log odds ratio. If the odds are the same, the ratio will be 1: there is an equal likelihood in both treatments. Assumes that the distribution of observed values follows a Poisson distribution. The confidence interval will be asymmetric. |
|
|
Term
|
Definition
Discrete data in which groups or levels have a relationship among them, or an order to them. Examples include XS/S/M/L/XL, 1st/2nd/3rd/4th year, number of leaves on a plant, etc. |
|
|
Term
|
Definition
The calculated probability of making a type I error. If the p-value is less than the designated α value, the null hypothesis is rejected. If it is larger than the α value, it is accepted. If your p-value is very close to the set α value, it may be wise to change the α value. This is where personal judgment enters statistical analysis. |
|
|
Term
|
Definition
A non-Gaussian distribution. Used for count data. Examples include number of weeds in a plot. The typical distribution of categorical data. The probability of the number of independent events. The mean is equal to the variance. You cannot conduct statistical analysis; it must first be converted to a log scale. |
|
|
Term
Power of the test (1 - β) |
|
Definition
The probability of correctly rejecting the null hypothesis when it is false. Each comparison in an experiment has its own power. There is no single power for an analysis. Rate is based on the comparison. Related to sample size, effect size, variation of the outcome variable, and the p-value. To increase the power of the test, you can increase sample size (n) or reduce experimental error (σ). The effect size cannot be changed by the experimenter, however when the difference is greater, the power of the test increases. Usually we try to attain a power of 0.8, so that 80% of the time we are confident that the test will correctly reject the null hypothesis when it is false.
λ = ((μ1 - μ2)√n) / σ |
|
|
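The noncentrality formula above makes the two levers explicit: increasing n or reducing σ raises λ, and with it the power. A short sketch with hypothetical means and error standard deviation:

```python
import math

def noncentrality(mu1, mu2, n, sigma):
    # lambda = (mu1 - mu2) * sqrt(n) / sigma
    return ((mu1 - mu2) * math.sqrt(n)) / sigma

base = noncentrality(12.0, 10.0, 16, 4.0)       # hypothetical values
more_reps = noncentrality(12.0, 10.0, 64, 4.0)  # quadrupling n doubles lambda
```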
Term
|
Definition
A component of reliability. Not to be confused with accuracy. How close repeated measures are to each other. |
|
|
Term
|
Definition
"The use of inferential statistics to test for treatment effects with data from experiments where the treatments are not replicated (although samples may be) or replicates are not statistically independent" - Hurlbert (1984).
When there is only one experimental unit per treatment. |
|
|
Term
|
Definition
Data consisting of words. Measures of the quality of something. |
|
|
Term
|
Definition
Data consisting of numbers. |
|
|
Term
|
Definition
The unknown component of variance in a linear additive model. Random, independent of treatment and design, normally distributed with a mean of 0, and has common covariance (homogenous). Based on restricted maximum likelihood estimation (REML). SAS sets a default value of 0 for all random effect coefficients. The values may be negative. |
|
|
Term
Randomized complete block design (RCBD) |
|
Definition
Gives more control over the experimental error variance by grouping experimental units into homogenous groups. The simplest blocking design used to control and reduce experimental error. The experimental units are grouped into blocks of homogenous units. Each treatment is randomly assigned an equal number of experimental units in each block.
Yij = μ + Trmti + Blockj + eij |
|
|
Term
|
Definition
How close the estimate is to a "good" measure. Are the measures repeatable and consistent? Encompasses precision, sensitivity, resolution, and consistency. Clear instructions on how to make the measurement is crucial. |
|
|
Term
|
Definition
Show that the results are reproducible in a given environment. Ensures that the results are "true", and not due to some unforeseen circumstance or accident. Provides an estimate of the experimental error variance, which should be similar to other studies. Increases the precision of treatments. True replicates are replications of experimental units, and not sampling units. |
|
|
Term
|
Definition
Something that we are curious about. Should follow SMART. The hypothesis is developed from the research question. If it is too specific, the research may not be interesting, or unattainable. Without a research question, experimenting is like "fishing with no goal". |
|
|
Term
|
Definition
A component of reliability. The smallest change in the measurements. |
|
|
Term
Right-skewed distribution |
|
Definition
An asymmetric distribution in the positive direction. The long tail is to the right. The mean is greater than the median. |
|
|
Term
|
Definition
In 1926 he recommended using 5% as a p-value in a passing comment when he was discussing precision of field experiments in agriculture. Later when the value became widely used, he apologized for his comment. Also developed the quick rough estimation method of multiplying the standard error by 3 to calculate LSD. |
|
|
Term
|
Definition
Depends on the variance of the outcome variable, the effect size, significance level of the test, and power of the test. Sample size increases as variance, effect size, and power of the test increase, and as the significance level decreases. |
|
|
Term
|
Definition
A fraction of an experimental unit. |
|
|
Term
|
Definition
A software package crucial for conducting statistics in plant agriculture. Can be a "black box" that does functions without the experimenter understanding it. |
|
|
Term
|
Definition
A hypothesis which asks how things work. Provided this scenario, this will happen. You will never say that "nothing" happened. |
|
|
Term
|
Definition
A component of validity. When the measure reflects what is being tested. |
|
|
Term
Se of the difference (SED) |
|
Definition
A type of standard error. Used in calculating the t-statistic.
sed = √(s²((1 / n1) + (1 / n2)))
If n is the same in both groups,
sed = √(2s² / n) = (√2)(se) |
|
|
Term
|
Definition
A component of reliability. Different things need to be measured with different units to reflect their effect size. |
|
|
Term
|
Definition
A test for normality. Has the greatest power, and is a robust test for most applications, except for extremely large datasets. The value will be close to 1 if the distribution is normal, typically greater than 0.8. If the sample size is small or there are long tails, it may indicate a normal distribution when there isn't one; look at the graph to check. |
|
|
Term
|
Definition
Add one more decimal to the value of the measures (the implied limit), so that ties can be broken. Means should have no more than the implied limit. Standard error and LSD should have one more than the implied limit. Do not round the p-value: it always has 4 decimals. |
|
|
Term
|
Definition
S, specific
M, measurable
A, attainable and achievable
R, realistic, relevant, and reasonable
T, time-based, timely, and tangible |
|
|
Term
|
Definition
Data collected at different places in space. Examples include measurements at different depths in a soil core sample. |
|
|
Term
|
Definition
A component of validity. When the measure describes only one thing. |
|
|
Term
Standard deviation (sd, StdDev) |
|
Definition
Relates to a population and its distribution. Always larger than standard error. Calculated when you have a large dataset that represents the whole population.
sd = √(s²) |
|
|
Term
Standard error (se, StdError) |
|
Definition
Relates to a statistic of a sample, such as a mean, and its distribution. Includes the standard error of the mean (SEM). The measurement is one of many possible measurements from the population.
se = √(s² / n) |
|
|
Term
Standard operating procedure (SOP) |
|
Definition
The instructions for an experiment, when there is more than one person taking measurements. Needs to be clear to provide reliability. Blocking may be used to account for the measurements of each person. |
|
|
Term
|
Definition
A hypothesis associated with a specific statistical test. Includes a null hypothesis and an alternate hypothesis. |
|
|
Term
|
Definition
The practice or science of collecting and analysing numerical data in large quantities, especially for the purpose of inferring proportions in a whole from those in a representative sample. A branch of mathematics dealing with the collection, analysis, interpretation, and presentation of masses of numerical data. Telling a story, but telling the "true" story based on evidence. Bad statistics involves manipulating the data in order to tell the story that you want to tell, rather than the truth. Statistics is necessary for doing most types of research. You should consider the statistical analysis before you collect data. |
|
|
Term
|
Definition
Student's t-distribution
A test to compare means. Developed by W.S. Gosset in 1906 with funding from Guinness. It was developed to select consistent samples of barley for use in brewing beer. First published in Biometrika in March 1908. Used to test small samples. The ratio of "how big" to "how confident". The bigger the difference, and/or the greater the confidence (lower se), the larger the t-statistic will be. Assumes that variation is the same between both means; the error variance is the same. Historically, the value of the t-statistic was compared with its critical value from a table, based on the degrees of freedom. Today, software packages provide the calculated p-value. Not a good test for complex research models. Can only be used if there are two treatments. The null hypothesis is that the means are equal.
t-statistic = (difference of means)/(sed) |
|
|
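The t-statistic and the pooled-variance sed it relies on can be computed by hand. A hedged Python sketch with two small hypothetical samples (all values invented for illustration):

```python
import math

# Hypothetical small samples from two treatments
g1 = [12.1, 11.8, 12.4, 12.0]
g2 = [11.2, 11.5, 11.0, 11.3]
n1, n2 = len(g1), len(g2)
m1, m2 = sum(g1) / n1, sum(g2) / n2

# Pooled variance: assumes the error variance is the same in both groups
ss1 = sum((y - m1) ** 2 for y in g1)
ss2 = sum((y - m2) ** 2 for y in g2)
s2 = (ss1 + ss2) / (n1 + n2 - 2)

sed = math.sqrt(s2 * (1 / n1 + 1 / n2))   # standard error of the difference
t_stat = (m1 - m2) / sed                  # difference of means over sed
```

The calculated t would then be compared with the critical value at df = n1 + n2 - 2, or a software package would report the p-value directly.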
Term
|
Definition
Time-based information
Adds a dimension to any data collected, whether it be qualitative or quantitative in nature. Examples include weights taken every month, thoughts recorded every hour, etc. |
|
|
Term
|
Definition
Checks if the data follows a normal distribution. The null hypothesis is that the distribution of the sample is equal to a normal distribution. Tests for normality include the Shapiro-Wilk, Kolmogorov-Smirnov, Cramér-von Mises, and Anderson-Darling tests. In SAS, the PROC UNIVARIATE procedure provides tests for normality. Conducted on the residuals of the data, not the raw data itself. The most common test for normality is the Shapiro-Wilk test. |
|
|
Term
|
Definition
An unstructured stream of words. Purely qualitative data. Examples include open-ended survey questions, and descriptions of environments. |
|
|
Term
|
Definition
How you apply treatments across experimental units. Includes factorial, fixed effects only, random effects only, and fractional factorial. |
|
|
Term
|
Definition
Adjusts for multiple comparisons. Used when there are many variables. Adjusts the p-value to be more conservative. |
|
|
Term
|
Definition
False positive
When you reject the null hypothesis when it is true. In actuality, Thing 1 and Thing 2 are the same. Its probability is α. |
|
|
Term
|
Definition
False negative
When you accept the null hypothesis when it is false. In actuality Thing 1 and Thing 2 are different. Its probability is β, related to the power of the test. |
|
|
Term
|
Definition
How close the estimate is to the "right" measure. Does the measurement actually measure what you think you are measuring? Encompasses accuracy, specificity, and scientific validity. |
|
|
Term
|
Definition
The variation between experimental units. Will not give differences of the treatments. May be accounted for using a linear additive model. |
|
|
Term
|
Definition
In 1972, Anderson et al. conducted a double-blind study with 1,000 adults, 818 of whom completed the study. Each subject received 500 tablets. Half the subjects got a placebo of Na ascorbate and artificial orange flavouring, the other half received vitamin C tablets. They took 4 tablets a day from December to March. Incidence and duration of colds were monitored. Found that the odds of getting a cold on the placebo were 1.5 times higher than on the vitamin C treatment. |
|
|
Term
|
Definition
Developed the t-test in 1906 using funding from Guinness. Because Guinness didn't want the test to be used by other companies, they wouldn't allow him to publish the test. He had to publish it under the pseudonym Student. |
|
|
Term
Yates continuity correction |
|
Definition
A correction for the chi-square test that makes it less sensitive to small changes in data.
χ² = Σ((|observed - expected| - 0.5)² / expected) |
|
|
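The correction subtracts 0.5 from each absolute deviation before squaring, which shrinks the statistic for small samples. A sketch on the same hypothetical counts used for the uncorrected test:

```python
# Hypothetical counts tested against a 3:1 expected ratio
observed = [74, 26]
expected = [75.0, 25.0]

# Yates-corrected chi-square: subtract 0.5 from |observed - expected| per cell
chi2_yates = sum((abs(o - e) - 0.5) ** 2 / e for o, e in zip(observed, expected))
```

Compare roughly 0.013 here with roughly 0.053 from the uncorrected Pearson χ² on the same counts; the correction is deliberately conservative.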