Term
|
Definition
The __________ of a sample refers to the method used to choose the sample from the population. |
|
|
Term
|
Definition
Observational studies of the effect of one variable on another often fail because the explanatory variable is confounded with __________. Well-designed experiments take steps to defeat confounding. |
|
|
Term
|
Definition
A stratified sampling design can produce more exact information than a simple random sample of the same size by taking advantage of the fact that individuals in the same __________ are similar to one another. |
|
|
Term
|
Definition
The wording of a question is the most important influence on the answers given to a sample survey. Confusing or loaded questions can introduce strong __________. Never trust the results of a sample survey until you have read the actual questions posed! |
|
|
Term
|
Definition
A statistically significant association in data from a well-designed experiment does imply __________. |
|
|
Term
|
Definition
An observational study is a poor way to gauge the effect of an intervention. To see the response to a change, we must actually impose the change. When our goal is to understand cause and effect, a/an __________ is the only source of fully convincing data. |
|
|
Term
voluntary response sample |
|
Definition
A/an __________ is biased because people with strong opinions, especially negative opinions, are most likely to respond. |
|
|
Term
|
Definition
In a voluntary response sample, people choose whether to respond. In a convenience sample, the interviewer makes the choice. In both cases, personal choice produces bias. The statistician’s remedy is to choose the sample by __________. This ensures that neither favoritism by the sampler nor self-selection by respondents takes place in selecting the sample. |
|
|
Term
|
Definition
A simple random sample gives each individual an equal chance to be chosen. It also gives every possible __________ an equal chance to be chosen. |
|
|
Term
|
Definition
That some respondents lie, especially when asked about illegal or unpopular behavior, is an example of __________. The sample then underestimates the occurrence of such behavior in the population. |
|
|
Term
|
Definition
Properly designed samples avoid systematic bias, but their results are rarely exactly correct and they vary from sample to sample. However, the results of random sampling don’t change haphazardly from sample to sample. Because we deliberately use chance to select the sample, the results obey the __________ that govern chance behavior. We can say how large an error we are likely to make in drawing conclusions about the population from a sample. |
|
|
Term
|
Definition
When conducting an experiment, we have a treatment group and a control group. The group of patients who received a sham treatment is called a control group, because it enables us to control the effects of __________ on the outcome. |
|
|
Term
|
Definition
Larger random samples give more accurate results than smaller samples. In other words, the __________ determines how close to the population truth the sample result is likely to fall. |
|
|
Term
|
Definition
Because the purpose of an experiment is to reveal the response of one variable to changes in other variables, the distinction between explanatory and response variables is essential. The explanatory variables in an experiment are often called __________. |
|
|
Term
|
Definition
Voluntary response samples and convenience samples are sampling methods which display __________, or systematic error. That is, these sampling methods systematically favor some parts of the population over others. |
|
|
Term
|
Definition
How can we assign experimental units to treatments in a way that is fair to all the treatments? The answer is the same as in sampling: let impersonal chance make the assignment. The use of chance to divide experimental units into groups is called __________. |
|
|
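The chance assignment this card describes can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed procedure: the `randomize` helper, the subject labels, and the seed are all assumptions made for the example.

```python
import random


def randomize(units, n_groups=2, seed=None):
    """Shuffle the units by chance, then deal them into n_groups groups."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    # Deal the shuffled units round-robin so group sizes differ by at most one.
    return [shuffled[i::n_groups] for i in range(n_groups)]


# Hypothetical experimental units.
subjects = [f"subject_{i}" for i in range(1, 11)]
treatment, control = randomize(subjects, n_groups=2, seed=1)
print("treatment:", treatment)
print("control:  ", control)
```

Because the shuffle, not the experimenter, decides who lands in which group, neither favoritism nor self-selection can influence the assignment.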
Term
statistically significant differences |
|
Definition
If we observe __________ among the groups in a comparative randomized experiment, then we have good evidence for a cause-and-effect relationship between the explanatory and response variables. |
|
|
Term
|
Definition
The principle of replication means that we should use enough experimental units to reduce __________. |
|
|
Term
|
Definition
The __________ avoids unconscious bias by, for example, a physician who doesn’t think that “just a placebo” can benefit the patient. |
|
|
Term
|
Definition
Sometimes in a matched-pairs design, each subject serves as his or her own control. The __________ of the treatments can influence the subject’s response. So we toss a coin to decide which treatment the subject gets first. |
|
|
Term
exploratory data analysis |
|
Definition
Statistical tools and ideas help us examine data in order to describe their main features. This examination is called __________. We use graphs and numerical summaries to describe the variables in the data set and the relations among them. |
|
|
Term
|
Definition
The distribution of a categorical variable lists the categories and gives the __________ of individuals who fall in each category. |
|
|
Term
|
Definition
In any graph of data, we look for the __________ and for unusual features. |
|
|
Term
|
Definition
When comparing two histograms, we plot on the __________ the percents rather than the actual counts. The reason is that the two histograms may not be based on the same total number of observations. A histogram of percents rather than counts is also convenient when the counts are very large. |
|
|
Term
|
Definition
If we are interested in the change of a child’s height over time, we make a time plot. We plot each observation against the time at which it was measured. In this plot, we put time on the __________. |
|
|
Term
|
Definition
In conducting exploratory data analysis, we begin by examining each variable in the data set by itself. Then we move on to study the __________ among the variables. |
|
|
Term
|
Definition
When making a pie chart, you must include all the __________ that make up a whole. Bar graphs are more flexible. |
|
|
Term
|
Definition
We can describe the __________ of a distribution by giving the smallest and largest values. |
|
|
Term
|
Definition
A time plot of a variable plots each observation against the time at which it was measured. When we examine a time plot, we look for a/an __________. An example is a long-term upward or downward movement over time. |
|
|
Term
|
Definition
The __________ of a variable describes what values the variable takes and how often it takes these values. |
|
|
Term
|
Definition
In conducting exploratory data analysis, we look at a graph or graphs. Then we make __________ of specific aspects of the data for a more complete description. |
|
|
Term
|
Definition
The bars of a histogram should cover the entire __________ of values of a variable. Our eyes respond to the area of the bars in a histogram. |
|
|
Term
|
Definition
In any graph of data, we look for an overall pattern and for striking deviations from that pattern. A/an __________ is an individual value that falls outside the overall pattern. |
|
|
Term
|
Definition
When observations on a variable are taken over time, we make a time plot that graphs time horizontally and the values of the variable vertically. A time plot can reveal a/an __________ or other changes over time. |
|
|
Term
|
Definition
A/an __________ is symmetric if the right and left sides of the histogram are approximately mirror images of each other. |
|
|
Term
|
Definition
Shape, center and spread provide a good description of the __________ of any distribution for a quantitative variable. |
|
|
Term
|
Definition
An important fact about the __________ as a measure of center is that it is sensitive to the influence of a few extreme observations. |
|
|
Term
|
Definition
In a skewed distribution, the mean is farther out in the long tail than is the __________. |
|
|
Term
|
Definition
The __________ is larger than 25% of the observations. |
|
|
Term
|
Definition
The __________ of a set of observations is the average of the squares of the deviations of the observations from their mean. |
|
|
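As a worked illustration of the calculation on this card, the sketch below squares each deviation from the mean and averages the squares. It assumes the n - 1 divisor that another card in this set mentions. The data values and the `sample_variance` name are made up for the example; the result is checked against Python's standard `statistics` module.

```python
import math
import statistics


def sample_variance(xs):
    """Sum of squared deviations from the mean, divided by n - 1."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations, mean 5
s2 = sample_variance(data)
s = math.sqrt(s2)  # standard deviation: positive square root of the variance

print("variance:", s2)
print("standard deviation:", s)
print("library check:", statistics.variance(data), statistics.stdev(data))
```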
Term
|
Definition
The __________ is a measure of center that uses the actual value of each observation and is thus sensitive to extreme values. |
|
|
Term
|
Definition
The __________ is a measure of center that is resistant to outliers. It is also the second quartile. |
|
|
Term
|
Definition
The __________ of a data set consists of the smallest observation, the first quartile, the median, the third quartile and the largest observation, written in order from smallest to largest. |
|
|
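The five values listed on this card can be computed as sketched below. One assumption should be stated loudly: quartile conventions differ between textbooks and software. This sketch takes each quartile as the median of the observations below or above the position of the overall median, so other tools may report slightly different quartiles. The function name and data are illustrative.

```python
import statistics


def five_number_summary(xs):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    data = sorted(xs)
    n = len(data)
    half = n // 2
    lower = data[:half]          # observations below the median position
    upper = data[half + n % 2:]  # observations above the median position
    return (
        data[0],
        statistics.median(lower),
        statistics.median(data),
        statistics.median(upper),
        data[-1],
    )


print(five_number_summary([13, 1, 9, 5, 3, 11, 7]))
print(five_number_summary([1, 2, 3, 4, 5, 6, 7, 8]))
```

These five numbers are exactly what is needed to draw a boxplot.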
Term
|
Definition
A boxplot gives an indication of the symmetry or skewness of a distribution. In a/an __________, the first and third quartiles are equally distant from the median. |
|
|
Term
|
Definition
The __________ is the positive square root of the variance. |
|
|
Term
|
Definition
If the distribution is exactly symmetric, the mean and the __________ are exactly the same. |
|
|
Term
|
Definition
The __________ is a measure of spread that looks at how far the observations are from their mean. |
|
|
Term
|
Definition
In calculating the variance or the standard deviation, we use n - 1 in the formula. The number n - 1 is called the __________ of the variance or the standard deviation. |
|
|
Term
|
Definition
When the standard deviation is zero, all the observations have __________. |
|
|
Term
|
Definition
A skewed distribution that has no outliers will, in all likelihood, pull the __________ toward its long tail. |
|
|
Term
|
Definition
Because the mean is sensitive to the influence of extreme observations, we say that it is not a/an __________ of center. |
|
|
Term
|
Definition
The __________ is the midpoint of the distribution, the number such that half the observations are smaller and the other half are larger. |
|
|
Term
|
Definition
The __________ is larger than 75% of the observations. |
|
|
Term
|
Definition
The __________ of a distribution is used to construct a boxplot. |
|
|
Term
|
Definition
The __________ measures spread about the mean and should be used only when the mean is chosen as the measure of center. |
|
|
Term
|
Definition
Correlation and regression must be interpreted with caution. Because they describe only __________, we must not forget to first plot the data to see the form of the relationship and also to detect outliers and influential observations. |
|
|
Term
|
Definition
A strong association between X and Y does not necessarily mean that X causes Y. Indeed, the strong association may be explained by __________, and, in this scenario, the conclusion that X causes Y is either wrong or not proved. |
|
|
Term
|
Definition
__________ are straight lines that describe how a response variable Y changes as an explanatory variable X changes. |
|
|
Term
|
Definition
When working on a two-way table, to find the __________ of the row variable for one specific value of the column variable, look only at that one column in the table. Find each entry in the column as a percent of the column total. |
|
|
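The column-percent recipe on this card can be sketched as follows. The table counts, the category labels, and the `conditional_distribution` helper are hypothetical and serve only to show the arithmetic.

```python
# Hypothetical counts: rows are education levels, columns are age groups.
table = {
    "no degree": {"25-34": 20, "35-54": 50},
    "degree": {"25-34": 60, "35-54": 50},
}


def conditional_distribution(table, column):
    """Express each entry in one column as a percent of that column's total."""
    column_total = sum(row[column] for row in table.values())
    return {label: 100 * row[column] / column_total for label, row in table.items()}


young = conditional_distribution(table, "25-34")
older = conditional_distribution(table, "35-54")
print("25-34:", young)
print("35-54:", older)
```

Comparing the two sets of percents, rather than the raw counts, shows how the row variable differs between the columns.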
Term
|
Definition
A comparison between two variables that holds for each individual value of a third variable can be changed or even reversed when the data for all values of the third variable are combined. This is called __________. It is an example of the effect of lurking variables on an observed association. |
|
|
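A small numerical sketch can make the reversal on this card concrete. The counts below are illustrative and do not come from this card set. They compare two treatments, A and B, within two values of a third variable (case type) and then with the cases combined.

```python
# Illustrative (successes, total) counts for each treatment and case type.
counts = {
    "A": {"mild": (81, 87), "severe": (192, 263)},
    "B": {"mild": (234, 270), "severe": (55, 80)},
}


def rate(successes, total):
    return successes / total


def combined_rate(groups):
    """Success rate after pooling all values of the third variable."""
    successes = sum(s for s, _ in groups.values())
    total = sum(t for _, t in groups.values())
    return successes / total


for case in ("mild", "severe"):
    print(case, "A:", round(rate(*counts["A"][case]), 3),
          "B:", round(rate(*counts["B"][case]), 3))
print("combined", "A:", round(combined_rate(counts["A"]), 3),
      "B:", round(combined_rate(counts["B"]), 3))
```

Treatment A has the higher success rate within each case type, yet B looks better once the data are combined, because A was given mostly the severe cases.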
Term
|
Definition
The best way to get good evidence that X causes Y is to do an experiment in which we change X and keep __________ under control. |
|
|
Term
|
Definition
In a two-way table, sometimes the row and column totals do not match. The explanation is __________. |
|
|
Term
|
Definition
In Simpson’s paradox, the lurking variables are __________. The paradox is an extreme form of the fact that observed associations can be misleading when there are lurking variables. |
|
|
Term
|
Definition
__________ of counts organize data about two categorical variables. Values of the row variable label the rows that run across the table, and values of the column variable label the columns that run down the table. |
|
|
Term
|
Definition
The row totals and the column totals of a two-way table give the __________ of the two individual variables. It is clearer to present these distributions as percents of the table total. |
|
|
Term
|
Definition
__________ is the use of a regression line for prediction far outside the range of values of the explanatory variable X that you used to obtain the line. Such predictions are often not accurate. |
|
|
Term
|
Definition
In studying a relationship between X and Y, we sometimes find that the relationship is influenced by other variables we did not measure or even think about. A/an __________ can thus falsely suggest a strong relationship between X and Y or it can hide a relationship that is really there. |
|
|
Term
cause-and-effect relationship |
|
Definition
Be careful not to conclude that there is a/an __________ between two variables just because they are strongly associated. |
|
|
Term
|
Definition
To study the relationship between two categorical variables (say, education and age group), we construct a/an __________. In this case, education becomes a row variable and age group becomes a column variable. |
|
|
Term
|
Definition
A/an __________ is an individual point that substantially changes the regression line. It is often an outlier in the X direction, but it need not have a large residual. |
|
|
Term
|
Definition
__________ that you did not measure may explain the relations between the variables that you did measure. Correlation and regression can be misleading if you ignore these variables that you did not measure. |
|
|
Term
|
Definition
In studying the relationship between two categorical variables, say education and age group, we construct a two-way table. Education is the row variable and age group is the column variable. The distribution of education alone is called a marginal distribution because it appears at the __________ of the two-way table. |
|
|
Term
|
Definition
In a two-way table, relationships between the categorical variables are described by calculating __________ from the counts given. Counts are often hard to compare. |
|
|
Term
|
Definition
An association or comparison that holds for all of several groups can reverse direction when the data are combined to form a single group. This reversal is called __________. |
|
|
Term
|
Definition
You can examine the fit of a regression line by studying the __________, which are the differences between the observed and predicted values of Y. |
|
|
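The differences this card describes (observed Y minus predicted Y) can be computed as in the sketch below. It assumes the regression line is the least-squares line, which the card does not state explicitly. The data and function names are made up for illustration.

```python
def least_squares_line(xs, ys):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def residuals(xs, ys):
    """Observed y minus the y predicted by the fitted line, for each point."""
    intercept, slope = least_squares_line(xs, ys)
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


xs = [1, 2, 3, 4, 5]             # hypothetical explanatory values
ys = [2.1, 3.9, 6.2, 7.8, 10.1]  # hypothetical responses
res = residuals(xs, ys)
print([round(r, 3) for r in res])
```

For a least-squares line the residuals sum to zero up to rounding, so it is their pattern when plotted against X, not their total, that reveals a poor fit.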
Term
|
Definition
We can sometimes describe the overall pattern of a distribution by a density curve. A density curve has __________ underneath it. |
|
|
Term
|
Definition
In drawing a density curve, minor irregularities and outliers are ignored. Of course, no set of real data is exactly described by a density curve. The curve is a/an __________ that is easy to use and accurate enough for practical use. |
|
|
Term
|
Definition
The mean of a density curve can be located by eye. The mean µ is the __________ of the curve. |
|
|
Term
|
Definition
When we subtract the mean of the distribution from an observation x and then divide the difference by the standard deviation, we get what is called a __________. |
|
|
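The subtract-then-divide recipe on this card is short enough to write out directly. The numbers below are hypothetical, and `z_score` is simply a name chosen for this sketch.

```python
def z_score(x, mean, sd):
    """Subtract the mean from x, then divide by the standard deviation."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / sd


# Hypothetical distribution with mean 64 and standard deviation 4.
print(z_score(72, mean=64, sd=4))  # 72 lies 2 standard deviations above the mean
print(z_score(60, mean=64, sd=4))  # 60 lies 1 standard deviation below the mean
```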
Term
|
Definition
The normal distribution is completely determined when we know its mean and __________. |
|
|
Term
|
Definition
The density curve is a/an __________ for the distribution of a quantitative variable. It is an idealized description. It gives a compact picture of the overall pattern of the data but ignores minor irregularities as well as any outliers. |
|
|
Term
|
Definition
An area under a density curve gives the __________ of observations that fall in a range of values. |
|
|
Term
|
Definition
The median of a density curve can be located by eye. The median divides the __________ under the curve in half. |
|
|
Term
change-of-curvature points |
|
Definition
On normal density curves, the standard deviation is the distance from the mean to the __________ on either side. |
|
|
Term
|
Definition
The __________ of the standard normal distribution is 1. |
|
|
Term
|
Definition
A/an __________ is an idealized description of the overall pattern of a distribution that smooths out the irregularities in the actual data. |
|
|
Term
|
Definition
The z-score of an observation x says how many __________ x lies from the mean of the distribution. |
|
|
Term
|
Definition
The __________ of the standard normal distribution is zero. |
|
|
Term
|
Definition
You can roughly locate the median and the quartiles of any density curve by dividing the __________ under the curve into four equal parts. |
|
|
Term
|
Definition
On a normal curve, the point at which the __________ changes is located at a distance σ on either side of the mean µ. |
|
|
Term
|
Definition
When we have a large number of observations and graph the distribution of the quantitative variable, we sometimes get an overall pattern which is so regular that we can describe it by a smooth curve. We then draw a smooth curve through the tops of the __________. |
|
|
Term
|
Definition
A density curve is a curve that is always on or above the horizontal axis. It has __________ exactly 1 underneath it. |
|
|
Term
|
Definition
The __________ of a density curve can be located by eye. It is the point with half the observations on either side. |
|
|
Term
|
Definition
If X has the N(µ, σ) distribution, then Z = (X - µ)/σ has the __________ distribution. |
|
|
Term
|
Definition
The mean and median are equal for symmetric density curves. The mean of a skewed curve is located farther toward the __________ than is the median. |
|
|
Term
|
Definition
Statistical inference is most secure when we produce data by random sampling. The reason is that when we use chance to choose respondents, the __________ answer the question: “What would happen if we did this many times?” |
|
|
Term
|
Definition
A/an __________ has outcomes that we cannot predict but that nonetheless have a regular distribution in very many repetitions. |
|
|
Term
|
Definition
When we choose many simple random samples from the same population, the sampling distribution of the sample means is centered at the mean of the __________. |
|
|
Term
|
Definition
In estimating the population parameter µ, the sample mean is correct on the average in many samples. How close the sample mean falls to the parameter in most samples is determined by the __________ of the sampling distribution. |
|
|
Term
|
Definition
__________ are less variable than individual observations. In general, the results of large samples are less variable than the results of small samples. |
|
|
Term
|
Definition
The __________ of a statistic is the distribution of values taken by the statistic in all possible samples of the same size from the same population. |
|
|
Term
|
Definition
Draw observations at random from any population with finite mean µ. The __________ guarantees that as the number of observations drawn increases, the mean of the observed values gets closer and closer to the mean µ of the population. |
|
|
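A quick simulation can illustrate the long-run behavior this card describes. The sketch below assumes a fair six-sided die as the population (mean 3.5). The seed and function name are arbitrary choices for the example.

```python
import random


def running_mean_of_rolls(n_rolls, seed=0):
    """Roll a fair die n_rolls times; return the mean after each roll."""
    rng = random.Random(seed)
    total = 0
    means = []
    for i in range(1, n_rolls + 1):
        total += rng.randint(1, 6)
        means.append(total / i)
    return means


means = running_mean_of_rolls(100_000)
for n in (10, 100, 1_000, 10_000, 100_000):
    print(f"after {n:>7} rolls: mean = {means[n - 1]:.3f}")
```

The early means wander, but with more rolls they settle close to the population mean of 3.5.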
Term
|
Definition
The value of a statistic varies in repeated random sampling. But sampling variability is not fatal. __________ is unpredictable in the short run but has a regular and predictable pattern in the long run. |
|
|
Term
|
Definition
The __________ allows us to use normal probability calculations to answer questions about sample means from many observations even when the population distribution is not normal. |
|
|
Term
|
Definition
In repeated sampling, the sample mean will sometimes fall above the true value of µ and sometimes below, but there is no __________ to overestimate or underestimate the parameter. |
|
|
Term
|
Definition
The __________ of an event is the proportion of times the event occurs in many repeated trials of a random phenomenon. |
|
|
Term
|
Definition
When we choose many simple random samples from the same population, the sampling distribution of the sample means is less spread out than the distribution of the __________. |
|
|
Term
|
Definition
Draw a simple random sample of size n from a population with mean µ and standard deviation σ. According to the __________, when n is large, the sampling distribution of the sample mean is approximately normal. |
|
|
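The claim on this card can be explored by simulation. The sketch below assumes a strongly right-skewed population, an exponential distribution with mean 1 and standard deviation 1, and draws many samples of size 50. The population, sample size, number of samples, and seed are all assumptions chosen for illustration.

```python
import math
import random
import statistics


def sample_means(n, n_samples, seed=0):
    """Means of n_samples random samples of size n from an exponential(1) population."""
    rng = random.Random(seed)
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(n_samples)
    ]


n = 50
xbars = sample_means(n, n_samples=2_000)
center = statistics.fmean(xbars)
spread = statistics.stdev(xbars)

print("mean of sample means:", round(center, 3), "(population mean is 1)")
print("sd of sample means:  ", round(spread, 3),
      "(sigma / sqrt(n) =", round(1 / math.sqrt(n), 3), ")")
```

The sample means cluster around the population mean with spread near σ divided by the square root of n, and a histogram of `xbars` would look far more symmetric than the skewed population itself.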
Term
|
Definition
Because the sample means are centered at µ, we say that the sample mean is a/an __________ of the parameter µ. |
|
|
Term
|
Definition
The __________ of the sample mean describes how the sample mean varies in all possible samples of the same size from the same population. |
|
|
Term
|
Definition
A/an __________ is one of the two types of statistical inference. We use it when our goal is to estimate a population parameter. |
|
|
Term
|
Definition
The basic idea of significance tests is simple: an outcome that would rarely happen if a claim were true is __________ that the claim is not true. |
|
|
Term
|
Definition
The claim about the population that we are trying to find evidence for is the __________. |
|
|
Term
|
Definition
The __________ of a test is the probability, computed assuming that the null hypothesis is true, that the observed outcome would take a value as extreme as or more extreme than that actually observed. |
|
|
Term
|
Definition
Large p-values fail to give evidence against the __________. |
|
|
Term
|
Definition
A/an __________ is one of the two types of statistical inference. We use it when our goal is to assess the evidence provided by data about some claim concerning a population. |
|
|
Term
|
Definition
The __________ is the statement being tested in a statistical test. The test is designed to assess the strength of the evidence against it. |
|
|
Term
|
Definition
The smaller the p-value is, the stronger is the __________ against the null hypothesis provided by the data. |
|
|
Term
one-sided alternative hypothesis |
|
Definition
We have a/an __________ when we are interested only in deviations from the null hypothesis in one direction. An example is Ha: µ > 0. |
|
|
Term
|
Definition
If the __________ is as small as or smaller than α, we say that the data are statistically significant at level α. |
|
|
Term
|
Definition
A/an __________ assesses the evidence against the null hypothesis by giving a probability, the p-value. |
|
|
Term
|
Definition
Small p-values are evidence against the null hypothesis, because they say that the observed result is unlikely to occur just by __________. |
|
|
Term
|
Definition
Calculating p-values requires knowledge of the __________ of the test statistic when the null hypothesis is true. |
|
|
Term
|
Definition
The __________ is a claim that we will try to find evidence against. |
|
|
Term
|
Definition
A statistical test is based on a test statistic. The __________ is the probability, computed supposing that the null hypothesis is true, that the test statistic will take a value at least as extreme as that actually observed. |
|
|
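To make the probability on this card concrete, the sketch below computes it for one simple case. Several assumptions are made that the cards do not specify: the test is a one-sample z test with known population standard deviation σ, the alternative is one-sided (Ha: µ > µ0), and the numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def one_sided_p_value(x_bar, mu0, sigma, n):
    """z statistic and P(Z >= z) under H0: mu = mu0, against Ha: mu > mu0."""
    z = (x_bar - mu0) / (sigma / sqrt(n))
    # Probability, computed assuming H0 is true, of a value at least this extreme.
    p = 1 - NormalDist().cdf(z)
    return z, p


# Hypothetical study: n = 25 observations, sample mean 0.3, sigma = 1, H0: mu = 0.
z, p = one_sided_p_value(x_bar=0.3, mu0=0.0, sigma=1.0, n=25)
print(f"z = {z:.2f}, p-value = {p:.4f}")
```

A small value of `p` says that a sample mean this large would rarely occur by chance if the null hypothesis were true, which is why it counts as evidence against the null hypothesis.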