Term
population: definition, and what is the numerical measure that describes a characteristic of a population called? |
|
Definition
the collection of all members of a group; the numerical measure is a parameter |
|
|
Term
sample: definition, and what is the numerical measure that describes a characteristic of a sample called? |
|
Definition
a portion of the population selected for analysis; the numerical measure is a statistic |
|
|
Term
inferential statistics |
|
Definition
drawing conclusions about a population based only on sample data. |
|
|
Term
descriptive statistics |
|
Definition
collecting, summarizing, and presenting data. |
|
|
Term
discrete vs continuous: 1. number of people in the room 2. commute time 3. height 4. touchdowns scored by the Pack 5. weight |
|
Definition
both are types of numerical (quantitative) data. 1. discrete 2. continuous 3. continuous 4. discrete 5. continuous |
|
|
Term
categorical (qualitative) vs numerical (quantitative): 1. marital status 2. defects per hour 3. voltage 4. eye color |
|
Definition
1. categorical 2. numerical & discrete 3. numerical & continuous 4. categorical |
|
|
Term
nominal vs ordinal vs interval vs ratio data: 1. 1st, 2nd places in a race 2. temperature (°F/°C) 3. money 4. height 5. age 6. type of car owned 7. student's letter grades 8. service quality rating 9. standardized exam score |
|
Definition
qualitative (nominal, ordinal) vs quantitative (interval, ratio). nominal: categories with no ordering or direction. ordinal: ordered categories (rankings, ratings, order or scaling). interval: differences between measurements are meaningful, but there is no true zero. ratio: differences between measurements are meaningful and a true zero exists. 1. ordinal 2. interval 3. ratio (you can have absolutely no money) 4. ratio 5. ratio 6. nominal 7. ordinal 8. ordinal 9. interval |
|
|
Term
how are nominal/ordinal/interval/ratio data graphed? qualitative, aka categorical (nominal/ordinal), vs quantitative, aka numerical (interval/ratio) |
|
Definition
categorical: summary table (tabulating data); bar chart, pie chart, Pareto chart (graphing data). numerical: ordered array, stem-and-leaf display; frequency distributions and cumulative distributions, graphed as a histogram, polygon, or ogive |
|
|
Term
I measure 2 students and use their resulting scores to make a statement comparing them. Identify the scale of measurement used: 1. I can only say that the two students are different. 2. I can say that one student scored 6 points higher than the other. 3. I can say that one student scored higher than the other, but I can't specify how much higher. 4. I can say that the score for one student is 2x the score of the other. |
|
Definition
1. nominal 2. interval 3. ordinal 4. ratio |
|
|
Term
which is an example of qualitative data? 1. social security number 2. score on a multiple-choice exam 3. height, in meters 4. number of square feet of carpet laid |
|
Definition
1. social security number is qualitative (the digits are a label, not a measured amount) |
|
|
Term
which of the following is an example of quantitative data? 1. number on a baseball uniform 2. serial number on a one-dollar bill 3. number of dependents you claim on your income tax form |
|
Definition
3. number of dependents you claim on your income tax form |
|
|
Term
which one is not an example of descriptive statistics? 1. histogram 2. estimate of the number of Alaska residents who have visited Canada 3. table summarizing data collected in a sample 4. proportion of mailed-out surveys completed and returned |
|
Definition
2. estimate of the number of Alaska residents who have visited Canada; this is inferential statistics: drawing conclusions about a population based on sample results |
|
|
Term
ordered array: is it useful for large or small data sets? Does it help identify outliers? |
|
Definition
a sequence of data ranked in order. It shows the range, provides some signals about variability, and may help identify outliers. If the data set is large, the ordered array is less useful. |
|
|
Term
stem-and-leaf display |
|
Definition
a simple way to see distribution details in a data set |
|
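A minimal Python sketch of a stem-and-leaf display, assuming non-negative whole-number data where the stem is the tens digit and the leaf is the ones digit; the function name `stem_and_leaf` and the scores are hypothetical. Because each leaf is the last digit of an actual value, the original data can be read back from the display.

```python
def stem_and_leaf(values):
    """Map each stem (tens digit) to its sorted leaves (ones digits).

    Assumes non-negative integers; empty stems inside the range are kept
    so gaps in the data stay visible.
    """
    ordered = sorted(values)
    low_stem, high_stem = ordered[0] // 10, ordered[-1] // 10
    display = {stem: [] for stem in range(low_stem, high_stem + 1)}
    for value in ordered:
        display[value // 10].append(value % 10)
    return display

scores = [52, 38, 69, 41, 56, 47, 63, 45, 71, 52]   # hypothetical exam scores
for stem, leaves in stem_and_leaf(scores).items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```

The loop prints one row per stem, for example `4 | 1 5 7` for the values 41, 45, and 47.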
|
Term
frequency distribution |
|
Definition
a tabulation of the number of occurrences of each score value or measurement. why use it: it summarizes numerical data, condenses the raw data into a more useful form, and allows a quick visual interpretation of the data |
|
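A small sketch of building a frequency distribution with relative and cumulative percentages. The class boundaries are an assumption chosen by hand (lower boundary 10, width 10, 4 classes, each class including its lower boundary but not its upper one); `frequency_distribution` and the data are hypothetical.

```python
def frequency_distribution(values, lower, width, num_classes):
    """Return (low, high, frequency, relative frequency, cumulative relative frequency)
    for classes [low, high) starting at `lower`, each `width` wide."""
    n = len(values)
    rows = []
    cumulative = 0
    for i in range(num_classes):
        low = lower + i * width
        high = low + width
        count = sum(low <= v < high for v in values)   # True counts as 1
        cumulative += count
        rows.append((low, high, count, count / n, cumulative / n))
    return rows

data = [12, 15, 21, 22, 24, 28, 31, 33, 37, 45]   # hypothetical measurements
for low, high, f, rel, cum in frequency_distribution(data, 10, 10, 4):
    print(f"{low} to under {high}: f={f}, relative={rel:.0%}, cumulative={cum:.0%}")
```

The frequencies here come out as 2, 4, 3, 1; the cumulative column is what an ogive plots.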
|
Term
histogram |
|
Definition
a graph of the data in a frequency distribution. The class boundaries are shown on the horizontal axis; the vertical axis is the frequency, relative frequency, or percentage. Bars of the appropriate heights represent the number of observations within each class, and the width of each bar represents the width of the class interval. |
|
|
Term
scatter plot |
|
Definition
used to examine possible relationships between two numerical variables |
|
|
Term
time-series plot |
|
Definition
used to study patterns in the values of a variable over time; time is usually measured on the horizontal axis |
|
|
Term
measures of central tendency: arithmetic mean |
|
Definition
the most common measure. advantage: uses the actual numerical values. disadvantage: affected by extreme values (outliers) |
|
|
Term
point estimate |
|
Definition
a one-number estimate (such as a sample mean) of the value of a population parameter |
|
|
Term
median |
|
Definition
the middle value in an ordered array. advantage: less sensitive to extreme values; can be used for ordinal data. disadvantage: based on less information than the mean. median position = (n+1)/2 in the ordered data; this is not the value of the median, only its position in the ranked data |
|
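A sketch of the (n+1)/2 position rule in Python, with the mean alongside to show which one an outlier moves. It assumes that a position ending in .5 means "average the two neighbouring ranked values"; the function names and the income figures are hypothetical.

```python
import math

def median_position(n):
    """1-based position of the median in an ordered array of n values."""
    return (n + 1) / 2

def median(values):
    ordered = sorted(values)
    pos = median_position(len(ordered))
    lower = ordered[math.floor(pos) - 1]   # ranked value at or just below the position
    upper = ordered[math.ceil(pos) - 1]    # same value when pos is whole, next one when it ends in .5
    return (lower + upper) / 2

def mean(values):
    return sum(values) / len(values)

incomes = [30, 32, 35, 38, 400]   # hypothetical incomes in $1,000s; 400 is an outlier
print(mean(incomes), median(incomes))   # 107.0 35.0
```

The single outlier drags the mean far above every typical value, while the median stays at the middle ranked value.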
|
Term
mode |
|
Definition
the value that occurs most often. advantage: not affected by extreme values; can be used for either numerical or categorical data. disadvantage: ignores much of the information in the data. There may be no mode, or there may be several modes. |
|
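A short sketch covering the three mode cases on this card: one mode, no mode, and several modes. Treating "every value occurs once" as "no mode" is the convention assumed here; `modes` is a hypothetical helper built on the standard-library `collections.Counter`.

```python
from collections import Counter

def modes(values):
    """Return every most-frequent value; return [] when no value repeats (no mode)."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        return []                      # every value occurs once: no mode
    return [value for value, count in counts.items() if count == top]

print(modes(["red", "blue", "red", "green"]))   # ['red']  (categorical data works)
print(modes([1, 2, 3, 4]))                      # []       (no mode)
print(modes([1, 1, 2, 2, 3]))                   # [1, 2]   (two modes)
```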
|
Term
which is the best measure of the location of the "center"? 1. if outliers exist 2. when using categorical data 3. if outliers don't exist |
|
Definition
1. median 2. mode 3. arithmetic mean |
|
|
Term
box-and-whisker plot: how do you find the positions of the 1st, 2nd, and 3rd quartiles in ranked data? |
|
Definition
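Q1 position = (n+1)/4, Q2 position = (n+1)/2 (the median), Q3 position = 3(n+1)/4. advantage: quartiles are not affected by extreme values, so they are useful when extreme values are present |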
|
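A sketch that turns the quartile positions into values. When a position is fractional, this version interpolates between the two neighbouring ranked values; some textbooks round the position to a nearby ranked value instead, so hand answers may differ slightly. Interpolation at (n+1)-based positions matches Python's `statistics.quantiles` with its default `method='exclusive'`, which the usage line compares against. `quartile` and the data are hypothetical.

```python
import statistics

def quartile(values, q):
    """Quartile q (1, 2 or 3) at ranked position q*(n+1)/4, interpolating between neighbours."""
    ordered = sorted(values)
    n = len(ordered)
    pos = q * (n + 1) / 4              # 1-based position in the ordered array
    pos = min(max(pos, 1), n)          # clamp to the ends for very small n
    below = int(pos)                   # whole-number part of the position
    frac = pos - below
    if frac == 0:
        return ordered[below - 1]
    return ordered[below - 1] + frac * (ordered[below] - ordered[below - 1])

sample_data = [7, 15, 36, 39, 40, 41]   # hypothetical, n = 6
print([quartile(sample_data, q) for q in (1, 2, 3)])   # [13.0, 37.5, 40.25]
print(statistics.quantiles(sample_data, n=4))          # [13.0, 37.5, 40.25]
```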
|
Term
geometric mean & geometric rate of return |
|
Definition
geometric mean: used to measure the rate of change of a variable over time, $\bar{X}_G = (X_1 \times X_2 \times \cdots \times X_n)^{1/n}$. geometric rate of return: measures the status of an investment over time, $\bar{R}_G = [(1+R_1) \times (1+R_2) \times \cdots \times (1+R_n)]^{1/n} - 1$ |
|
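Both formulas translate almost directly into Python. This sketch assumes returns are given as decimals (0.10 for 10%) and uses `math.prod` (Python 3.8+); the function names are hypothetical.

```python
import math

def geometric_mean(values):
    """(X1 * X2 * ... * Xn) ** (1/n), for positive values."""
    return math.prod(values) ** (1 / len(values))

def geometric_rate_of_return(returns):
    """[(1+R1)(1+R2)...(1+Rn)] ** (1/n) - 1, with each R as a decimal."""
    return math.prod(1 + r for r in returns) ** (1 / len(returns)) - 1

print(geometric_mean([2, 8]))                          # 4.0
print(geometric_rate_of_return([0.10, -0.05, 0.20]))   # average compound return per period
```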
|
Term
geometric vs arithmetic mean returns: which is better? |
|
Definition
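geometric; it accounts for compounding across periods, so it reflects the actual growth of an investment, while the arithmetic mean of the returns overstates it whenever the returns vary |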
|
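A worked comparison with a hypothetical investment: $100,000 falls to $50,000 (a -50% return) and then doubles back to $100,000 (a +100% return). The investor ends where they started, and only the geometric mean reports that.

```python
import math

returns = [-0.50, 1.00]   # hypothetical: $100,000 -> $50,000 -> $100,000

arithmetic = sum(returns) / len(returns)
geometric = math.prod(1 + r for r in returns) ** (1 / len(returns)) - 1

print(f"arithmetic mean return: {arithmetic:.0%}")   # 25%, although nothing was gained
print(f"geometric mean return:  {geometric:.0%}")    # 0%, matches the actual outcome
```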
|
Term
measure of variation: range. disadvantages? |
|
Definition
the simplest measure of variation: the difference between the largest and the smallest values in a data set. disadvantages: ignores the way in which the data are distributed, and is sensitive to outliers |
|
|
Term
measures of variation: interquartile range |
|
Definition
some outlier problems can be eliminated by using the interquartile range: the high- and low-valued observations are eliminated and the range is calculated from the remaining middle 50% of the values. IQR = Q3 - Q1 |
|
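A sketch contrasting the range and the interquartile range on the same hypothetical data with and without one extreme value. Quartiles come from `statistics.quantiles` with its default `method='exclusive'`, which uses the (n+1)-based positions; `spread` is a hypothetical helper.

```python
import statistics

def spread(values):
    """Return (range, interquartile range)."""
    q1, _, q3 = statistics.quantiles(values, n=4)   # default method='exclusive'
    return max(values) - min(values), q3 - q1

clean = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
with_outlier = clean[:-1] + [110]                   # replace 11 with an extreme value

print(spread(clean))          # (10, 6.0)
print(spread(with_outlier))   # (109, 6.0): the range jumps, the IQR does not move
```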
|
Term
variance |
|
Definition
the average of the squared deviations of values from the mean. population: $\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$. sample: $S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$ |
|
|
Term
standard deviation |
|
Definition
the square root of the variance; the most commonly used measure of variation. It shows variation about the mean and has the same units as the original data. population: $\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$. sample: $S = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$ |
|
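A sketch of the variance and standard deviation formulas from the two cards above, computed by hand and checked against the standard-library `statistics` functions. The `sample` flag switches between the n - 1 (sample) and N (population) divisors; the function names and data are hypothetical.

```python
import math
import statistics

def variance(values, sample=True):
    """Average squared deviation from the mean: divide by n - 1 for a sample, N for a population."""
    n = len(values)
    mean = sum(values) / n
    ss = sum((x - mean) ** 2 for x in values)   # sum of squared deviations
    return ss / (n - 1) if sample else ss / n

def std_dev(values, sample=True):
    """Square root of the variance, in the same units as the data."""
    return math.sqrt(variance(values, sample))

data = [10, 12, 14, 15, 17, 18, 18, 24]   # hypothetical; mean = 16, sum of squares = 130
print(variance(data), std_dev(data))                              # sample S^2 and S
print(variance(data, sample=False), std_dev(data, sample=False))  # population sigma^2 and sigma
```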
|
Term
measures of variation: summary characteristics |
|
Definition
the more the data are spread out, the greater the range, variance, and standard deviation. If the values are all the same (no variation), all these measures are zero. None of these measures is ever negative. |
|
|
Term
advantages of variance and standard deviation |
|
Definition
each value in the data set is used in the calculation; values far from the mean are given extra weight (because the deviations from the mean are squared) |
|
|
Term
coefficient of variation (CV) |
|
Definition
measures variation relative to the mean; always expressed as a percentage. It can be used to compare two or more data sets measured in different units, and it shows risk in stocks. $CV = \left(\frac{S}{\bar{X}}\right) \times 100\%$ |
|
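A sketch comparing two hypothetical stocks whose prices have the same sample standard deviation ($2) but different means. The CV shows that the cheaper stock varies twice as much relative to its price. `coefficient_of_variation` and the prices are hypothetical.

```python
import statistics

def coefficient_of_variation(values):
    """CV = (S / mean) * 100%, using the sample standard deviation."""
    return statistics.stdev(values) / statistics.mean(values) * 100

stock_a = [48, 50, 52]     # hypothetical closing prices, mean $50, S = $2
stock_b = [98, 100, 102]   # hypothetical closing prices, mean $100, S = $2

print(f"A: {coefficient_of_variation(stock_a):.1f}%")   # A: 4.0%
print(f"B: {coefficient_of_variation(stock_b):.1f}%")   # B: 2.0%
```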
|
Term
z-score |
|
Definition
we use the standard deviation to standardize scores. A z-score measures distance from the mean in standard deviation units: it is the difference between a value and the mean, divided by the standard deviation, $Z = \frac{X - \bar{X}}{S}$. A z-score above 3.0 or below -3.0 is considered an outlier. |
|
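A sketch that standardizes a hypothetical set of readings and flags any value whose z-score is above 3.0 or below -3.0, using the sample mean and sample standard deviation. `z_scores` and `outliers` are hypothetical helpers.

```python
import statistics

def z_scores(values):
    """Z = (X - mean) / S for every value, using the sample standard deviation."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)
    return [(x - mean) / s for x in values]

def outliers(values, cutoff=3.0):
    """Values whose z-score is above +cutoff or below -cutoff."""
    return [x for x, z in zip(values, z_scores(values)) if abs(z) > cutoff]

readings = [10, 11, 9, 10, 12, 10, 11, 9, 10, 10, 11, 9, 10, 10, 30]   # hypothetical
print(outliers(readings))   # [30]: its z-score is about 3.57
```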
|
Term
left-skewed: is the median > mean or median < mean? |
|
Definition
median > mean (the long left tail pulls the mean below the median) |
|
|
Term
empirical rule |
|
Definition
if the data distribution is approximately bell-shaped, then about 68% of the values in the population or the sample lie within the mean ± 1 standard deviation, about 95% within ± 2 standard deviations, and about 99.7% within ± 3 standard deviations |
|
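The empirical-rule percentages come from the normal (bell-shaped) distribution. Assuming an exactly normal distribution, the share within k standard deviations of the mean is erf(k / sqrt(2)), which this sketch evaluates with the standard-library `math.erf`; `normal_share_within` is a hypothetical name.

```python
import math

def normal_share_within(k):
    """Share of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within ±{k} SD: {normal_share_within(k):.2%}")   # about 68%, 95%, 99.7%
```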
|
Term
Chebyshev rule |
|
Definition
regardless of how the data are distributed, at least (1 - 1/k²) × 100% of the values will fall within k standard deviations of the mean (for k > 1): at least 56% of the data within 1.5 S of the mean, at least 75% within 2 S, and at least 89% within 3 S |
|
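A sketch of the Chebyshev bound, checked against a hypothetical right-skewed data set that is clearly not bell-shaped. The observed share within k sample standard deviations should never fall below 1 - 1/k². `chebyshev_minimum` and `observed_share` are hypothetical helpers.

```python
import statistics

def chebyshev_minimum(k):
    """Smallest possible share of any data set within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("the rule is only informative for k > 1")
    return 1 - 1 / k ** 2

def observed_share(values, k):
    """Actual share of values within k sample standard deviations of the mean."""
    mean, s = statistics.mean(values), statistics.stdev(values)
    return sum(abs(x - mean) <= k * s for x in values) / len(values)

skewed = [1, 1, 2, 2, 2, 3, 3, 4, 9, 20]   # hypothetical, not bell-shaped
for k in (1.5, 2, 3):
    print(f"k={k}: at least {chebyshev_minimum(k):.1%}, observed {observed_share(skewed, k):.0%}")
```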
|
Term
in general, which of the following descriptive summary measures cannot be easily approximated from a box-and-whisker plot? a. the variance b. the range c. the interquartile range d. the median |
|
Definition
a. the variance |
|
|
Term
covariance |
|
Definition
measures the strength of the linear relationship between two numerical variables (bivariate data). It is a non-standardized measure of the joint variation of the two variables and is concerned only with the strength of the relationship; no causal effect is implied. $\text{cov}(X,Y) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1}$ |
|
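A sketch of the sample covariance formula on hypothetical study-hours and exam-score data. Rescaling the scores (points to proportions) shrinks the covariance 100-fold even though the relationship is unchanged, which is why covariance alone cannot compare the strength of different relationships. `sample_covariance` is a hypothetical helper.

```python
def sample_covariance(x, y):
    """cov(X, Y) = sum((Xi - mean_x) * (Yi - mean_y)) / (n - 1)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 pairs")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)

hours = [1, 2, 3, 4, 5]          # hypothetical study hours
scores = [52, 60, 61, 70, 77]    # hypothetical exam scores

print(sample_covariance(hours, scores))                       # 15.0: same direction
print(sample_covariance(hours, [s / 100 for s in scores]))    # about 0.15: same data, new units
```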
|
Term
cov(X,Y) > 0: X and Y move in the ___ direction. cov(X,Y) < 0: X and Y move in the ___ direction. cov(X,Y) = 0: X and Y are ____ |
|
Definition
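same; opposite; uncorrelated (no linear relationship, which by itself does not prove independence). The size of the covariance depends on the units of measurement of X and Y, so it cannot be used to compare the relative strength of relationships between variables. |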
|
|
Term
coefficient of correlation |
|
Definition
measures the relative strength of the linear relationship between two numerical variables. sample coefficient of correlation: $r = \frac{\text{cov}(X,Y)}{S_X S_Y}$ |
|
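A sketch of r = cov(X,Y) / (S_X S_Y) using the sample covariance and sample standard deviations. Unlike the covariance, r does not change when one variable is rescaled, and it stays between -1 and 1. `correlation` and the data are hypothetical.

```python
import statistics

def correlation(x, y):
    """r = cov(X, Y) / (S_X * S_Y), with sample covariance and sample standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    return cov / (statistics.stdev(x) * statistics.stdev(y))

hours = [1, 2, 3, 4, 5]          # hypothetical study hours
scores = [52, 60, 61, 70, 77]    # hypothetical exam scores

print(round(correlation(hours, scores), 3))                        # strong positive, close to 1
print(round(correlation(hours, [s * 100 for s in scores]), 3))     # unchanged by rescaling
```

Strength is judged by the absolute value of r, so an r of -.32 indicates a slightly stronger linear relationship than an r of .30.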
|
Term
features of correlation coefficient |
|
Definition
population: ρ; sample: r. unit free (a standardized measure). ranges from -1 to 1. the closer to -1, the stronger the negative linear relationship; the closer to 1, the stronger the positive linear relationship; the closer to 0, the weaker the linear relationship |
|
|
Term
true or false: a correlation of -.32 is stronger than .30 |
|
Definition
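true; strength depends on the absolute value, and the absolute value of -.32 is .32, which is greater than .30 |
|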
|
Term
True or false: descriptive statistics are used to draw conclusions about a population based on sample data |
|
Definition
false; inferential statistics are used to draw conclusions about a population based on sample data |
|
|
Term
Which of the following is false? A Pareto diagram: 1. is a bar chart where categories are shown in descending order of frequency 2. is used to portray numerical data on an interval scale 3. is often shown with a cumulative polygon 4. is used to separate the "vital few" from the "trivial many" |
|
Definition
2. is false: Pareto diagrams are used to portray categorical data, not numerical data on an interval scale |
|
|
Term
You would like to represent the distribution of students in a class based on class. Which is best for presenting the data? 1. pie chart 2. stem-and-leaf display 3. scatter plot 4. time-series plot |
|
Definition
1. pie chart (class is categorical data) |
|
|
Term
t/f: unlike a grouped frequency distribution, a stem-and-leaf display usually preserves the original data values |
|
Definition
true |
|
|
Term
t/f: scatter diagrams are used to examine possible relationships between numerical and categorical data |
|
Definition
false; they are used only for two numerical variables |
|
|
Term
a priori classical vs empirical classical vs subjective probability |
|
Definition
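a priori classical: based on prior knowledge of the process, with each outcome equally likely; P(event) = number of ways the event can occur / total number of possible outcomes. empirical classical: based on observed data, like relative frequency; P(event) = number of favorable outcomes observed / total number of outcomes observed. subjective: an individual's judgment or opinion about the probability of occurrence |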
|
|
Term
the probability of at least one head in two coin flips is: 1. .33 2. .5 3. .75 4. 1 |
|
Definition
3. .75; P(at least one head) = 1 - P(no heads) = 1 - (.5)(.5) = 1 - .25 = .75 |
|
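A sketch that checks the answer two ways, assuming a fair coin: by listing the four equally likely outcomes (the a priori classical approach) and by the complement rule. Exact fractions avoid rounding.

```python
from fractions import Fraction
from itertools import product

outcomes = list(product("HT", repeat=2))        # HH, HT, TH, TT: equally likely
at_least_one = [o for o in outcomes if "H" in o]

direct = Fraction(len(at_least_one), len(outcomes))
complement = 1 - Fraction(1, 2) ** 2            # 1 - P(no heads) = 1 - P(TT)

print(direct, complement)                       # 3/4 3/4
```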
|
|