Term
Population |
|
Definition
– total set of all subjects of interest – the entire group of people, animals, or things about which we want information |
|
|
Term
Unit |
|
Definition
– any individual member of the population |
|
|
Term
Sample |
|
Definition
– subset of the population from which the study actually collects information – used to draw conclusions about the whole population |
|
|
Term
Variable |
|
Definition
– a characteristic of a unit that can vary among subjects in the population/sample – Examples: gender, nationality, age, income, hair color, height, disease status, grade in STA 291, state of residence |
|
|
Term
Parameter |
|
Definition
– numerical characteristic of the population – calculated using the whole population |
|
|
Term
Statistic |
|
Definition
– numerical characteristic of the sample – calculated using the sample |
|
|
Term
Descriptive Statistics |
|
Definition
– Summarizing the information in a collection of data |
|
|
Term
Inferential Statistics |
|
Definition
– Using information from a sample to make conclusions/predictions about the population |
|
|
Term
Nominal Variables (qualitative) |
|
Definition
-variables that have a scale of unordered categories -ex: gender, nationality, hair color |
|
|
Term
Ordinal Variables (qualitative) |
|
Definition
-variables that have a scale of ordered categories -often treated in a quantitative manner -ex: disease status, company rating, grade in class |
|
|
Term
Quantitative Variables |
|
Definition
-variables that are measured numerically -for each subject a number is observed -ex: age, income, height |
|
|
Term
Discrete Variables |
|
Definition
-variables that take on a FINITE (or countable) number of values -ex: gender, nationality, hair color, disease status, grade in class, number of children -all qualitative variables are discrete -quantitative variables can be either discrete or continuous |
|
|
Term
Continuous Variables |
|
Definition
-variables that can take an infinite continuum of possible real number values -ex: time spent on homework -any interval of values can be subdivided further -always quantitative variables |
|
|
Term
Simple Random Sampling (SRS) |
|
Definition
-Each possible sample of a given size n has the same probability of being selected. [no discrimination, no favoritism.] |
|
|
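A minimal sketch of simple random sampling, using Python's standard `random` module. The function name `simple_random_sample`, the population of ID numbers 1 to 100, and the seed are illustrative assumptions, not part of the course material.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n units without replacement.

    random.sample gives every subset of size n the same chance of being
    chosen, which is the defining property of an SRS.
    """
    rng = random.Random(seed)  # seed only makes the illustration repeatable
    return rng.sample(list(population), n)

# Hypothetical population: units labelled 1..100
population = list(range(1, 101))
sample = simple_random_sample(population, 10, seed=291)
print(sample)
```

Sampling is done without replacement, so no unit can appear twice in the sample.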
Term
Observational Study |
|
Definition
-a study that watches individuals and measures variables of interest but does not attempt to influence the responses. -purpose is to describe/compare groups or situations. |
|
|
Term
Experiment |
|
Definition
- a study that deliberately imposes some treatment on individuals in order to observe their responses -purpose is to study whether the treatment causes a change in response |
|
|
Term
|
Definition
1. convenience sampling 2. volunteer sampling -both tend to represent the population poorly and lead to biased, misleading conclusions |
|
|
Term
Stratified Sampling |
|
Definition
-the population is divided into separate, non-overlapping groups (strata) according to some criterion -select an SRS independently from each group |
|
|
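The two steps above (split into non-overlapping groups, then take an SRS within each group) can be sketched as follows. The helper `stratified_sample`, the `stratum_of` argument, and the student data are hypothetical; taking the same number of units from every group is an assumption made for simplicity.

```python
import random

def stratified_sample(units, stratum_of, n_per_stratum, seed=None):
    """Group units by stratum, then draw an independent SRS from each group."""
    rng = random.Random(seed)
    strata = {}
    for unit in units:
        strata.setdefault(stratum_of(unit), []).append(unit)
    sample = []
    for key in sorted(strata):  # sorted only to make the order repeatable
        sample.extend(rng.sample(strata[key], n_per_stratum))
    return sample

# Hypothetical population: 50 freshmen and 30 seniors
students = [("freshman", i) for i in range(50)] + [("senior", i) for i in range(30)]
picked = stratified_sample(students, lambda s: s[0], 5, seed=1)
print(picked)
```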
Term
Cluster Sampling |
|
Definition
-the population is divided into a set of non-overlapping subgroups -these subgroups are then selected at random, and ALL individuals in the selected groups are included in the sample |
|
|
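A short sketch of the idea: whole subgroups are chosen at random, and every individual in a chosen subgroup enters the sample. The function `cluster_sample` and the dorm data are made up for illustration.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Pick n_clusters subgroups at random and keep ALL of their members."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # SRS of cluster names
    return [unit for name in chosen for unit in clusters[name]]

# Hypothetical clusters: residents grouped by dorm
dorms = {
    "A": ["a1", "a2", "a3"],
    "B": ["b1", "b2"],
    "C": ["c1", "c2", "c3", "c4"],
    "D": ["d1"],
}
picked = cluster_sample(dorms, 2, seed=7)
print(picked)
```

The sample size is not fixed in advance here, since it depends on how large the selected subgroups happen to be.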
Term
Systematic Sampling |
|
Definition
- a value K is specified -then one of the first K individuals is selected at random, after which every Kth observation is included in the sample |
|
|
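The procedure above can be sketched in a few lines: pick a random starting point among the first K individuals, then take every Kth one after it. The function name `systematic_sample` and the list of 100 numbered units are assumptions for the example.

```python
import random

def systematic_sample(units, k, seed=None):
    """Select one of the first k units at random, then every k-th unit after it."""
    rng = random.Random(seed)
    start = rng.randrange(k)  # random index among the first k units
    return units[start::k]

# Hypothetical ordered list of 100 units
units = list(range(1, 101))
picked = systematic_sample(units, 10, seed=3)
print(picked)
```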
Term
Selection Bias |
|
Definition
-selection of the sample systematically excludes some part of the population of interest |
|
|
Term
Measurement/Response Bias |
|
Definition
-method of observation tends to produce values that systematically differ from the true value |
|
|
Term
Nonresponse Bias |
|
Definition
-occurs when responses are not actually obtained from all individuals selected for inclusion in the sample |
|
|
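Term
Sampling Error |
|
Definition
-the difference between a statistic computed from a sample and the population parameter it estimates or predicts; it arises because only part of the population is observed -(can be reduced, e.g. by taking a larger sample) |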
|
|
Term
Nonsampling Error |
|
Definition
-everything that could still go wrong even in a census, when the whole population is asked -examples: bias due to question wording, question order, nonresponse, wrong answers |
|
|
Term
Bar Graph |
|
Definition
-used for nominal or ordinal data -bars are separated to emphasize that the data is categorical, not numerical |
|
|
Term
Pie Chart |
|
Definition
-used for nominal or ordinal data -each slice is proportional to the frequency of its category |
|
|
Term
Histogram |
|
Definition
-used for continuous numerical data -divide the range of possible values into many contiguous, non-overlapping intervals, then count how many observations fall into each interval |
|
|
Term
Frequency Distribution |
|
Definition
-A listing of intervals of possible values for a variable -Together with a tabulation of the number of observations in each interval. |
|
|
Term
Relative Frequency |
|
Definition
-The proportion of sample observations that fall in that interval -sometimes percentages are preferred |
|
|
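A small sketch of how a frequency distribution and its relative frequencies could be tabulated. The function `frequency_table`, the interval edges, and the homework-hours data are invented for illustration. It assumes the data is non-empty, that the edges cover every observation, and that each interval includes its left endpoint but not its right one (except the last interval, which includes both).

```python
def frequency_table(data, edges):
    """Return (low, high, count, relative frequency) for each interval."""
    counts = [0] * (len(edges) - 1)
    for x in data:
        for i in range(len(counts)):
            is_last = i == len(counts) - 1
            # [low, high) for every interval; the last one is [low, high]
            if edges[i] <= x < edges[i + 1] or (is_last and x == edges[-1]):
                counts[i] += 1
                break
    n = len(data)  # relative frequency = count / sample size
    return [(edges[i], edges[i + 1], c, c / n) for i, c in enumerate(counts)]

# Hypothetical data: hours spent on homework by 10 students
hours = [0.5, 1.0, 1.5, 2.0, 2.5, 2.5, 3.0, 3.5, 4.0, 6.0]
table = frequency_table(hours, [0, 2, 4, 6])
for low, high, count, rel in table:
    print(f"{low}-{high}: count={count}, relative frequency={rel:.2f}")
```

Multiplying each relative frequency by 100 gives the percentage form.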
Term
Interquartile Range (IQR) |
|
Definition
-the difference between the upper and lower quartiles: IQR = Q3 - Q1 -measures the spread of the middle 50% of the data |
|
|
Term
Five-Number Summary |
|
Definition
-includes: 1. maximum 2. upper quartile 3. median 4. lower quartile 5. minimum |
|
|
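A hedged sketch of computing these five values, plus the IQR as Q3 minus Q1. Quartile conventions differ between textbooks and software. This version assumes the quartiles are the medians of the lower and upper halves of the sorted data, leaving out the middle value when the sample size is odd, which may not match the convention used in lecture. The function name and the scores are hypothetical.

```python
from statistics import median

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]                               # values below the median position
    upper = xs[half + 1:] if n % 2 else xs[half:]   # values above it
    return min(xs), median(lower), median(xs), median(upper), max(xs)

# Hypothetical exam scores
scores = [55, 62, 68, 70, 74, 77, 81, 85, 93]
lo, q1, med, q3, hi = five_number_summary(scores)
iqr = q3 - q1
print(lo, q1, med, q3, hi, iqr)
```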
Term
Box Plot |
|
Definition
-a graphic representation of the 5 number summary -the whiskers reach the max and the min, provided the max is within 1.5 IQR of Q3 and the min is within 1.5 IQR of Q1 |
|
|
Term
Outlier |
|
Definition
-an observation (such as the min or the max) that lies more than 1.5 IQR below Q1 or above Q3 -it is then plotted as a separate point on the box plot |
|
|
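The 1.5 IQR rule can be sketched as a simple check against two cutoffs. The function name `outliers` and the data are illustrative, and the quartiles are passed in as arguments so that the sketch does not depend on any particular quartile convention.

```python
def outliers(data, q1, q3):
    """Return observations more than 1.5 IQR below Q1 or above Q3."""
    iqr = q3 - q1
    low_cutoff = q1 - 1.5 * iqr
    high_cutoff = q3 + 1.5 * iqr
    return [x for x in data if x < low_cutoff or x > high_cutoff]

# Hypothetical scores with Q1 = 62 and Q3 = 85 (medians of the two halves)
scores = [12, 55, 62, 68, 70, 74, 77, 81, 85, 93, 140]
flagged = outliers(scores, q1=62, q3=85)
print(flagged)
```

Here the IQR is 23, so the cutoffs are 27.5 and 119.5, and only 12 and 140 fall outside them.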
Term
|
Definition
|
|
Term
Variance and Standard Deviation |
|
Definition
see slides from lecture 7 for the process |
|
|
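Since the lecture slides are not reproduced here, the following is only a sketch of the usual textbook computation, assuming the sample variance with an n - 1 divisor and the standard deviation as its square root. The function name and data are hypothetical, and the process shown in lecture 7 may differ in presentation.

```python
import math

def sample_variance(data):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / (n - 1)

# Hypothetical sample
data = [2, 4, 4, 4, 5, 5, 7, 9]
var = sample_variance(data)
sd = math.sqrt(var)  # standard deviation is the square root of the variance
print(var, sd)
```

For this sample the mean is 5 and the squared deviations sum to 32, so the variance is 32/7.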