Term
Variable |
|
Definition
- Any characteristic that is recorded for the subjects in a study
- Can be Categorical
- Or can be Quantitative
|
|
|
Term
Categorical Variable |
|
Definition
- Each observation belongs to one of a set of categories
- EX. gender, religious affiliation, etc.
|
|
|
Term
Quantitative Variable |
|
Definition
- Observations that take numerical values to represent different magnitudes of the variable
- EX. age, number of siblings
- Key features: Center and spread (variability)
|
|
|
Term
Quantitative Variable (Discrete) |
|
Definition
- values form a set of separate numbers (e.g. 0, 1, 2, 3, ...)
- any variable with a finite number of possible values is discrete
- EX. number of pets in household, number of children in family
|
|
|
Term
Quantitative Variable (Continuous) |
|
Definition
- possible values form an interval
- has an infinite number of possible values
- EX. Height/Weight, age, blood pressure
|
|
|
Term
Proportion |
|
Definition
- frequency (count) of observations in a category divided by the total number of observations
|
|
|
Term
Relative Frequencies |
|
Definition
- AKA proportions / percentages
|
|
|
Term
Frequency Table |
|
Definition
- Listing of the possible values for a variable
- Gives the number of observations (and/or the relative frequency) for each value
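A minimal Python sketch of such a table, assuming the observations sit in a plain list. The `frequency_table` helper and the `pets` sample are made up for illustration, not part of the course material:

```python
from collections import Counter

def frequency_table(observations):
    """Return {value: (count, relative_frequency)} for a list of observations."""
    counts = Counter(observations)
    n = len(observations)
    return {value: (count, count / n) for value, count in counts.items()}

# Hypothetical sample: pet type recorded for 10 subjects
pets = ["dog", "cat", "dog", "fish", "dog", "cat", "dog", "cat", "dog", "fish"]
table = frequency_table(pets)

# One row per value: value, count, relative frequency
for value, (count, rel) in sorted(table.items()):
    print(f"{value:<5} {count:>3} {rel:.2f}")
```

The relative frequencies in a complete table should add up to 1 (or 100%).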
|
|
|
Term
Describing data using graphical summaries:
Distribution |
|
Definition
- A graph or frequency table describes a distribution
- Tells us the possible values a variable takes and how often each value occurs (frequency / relative frequency)
|
|
|
Term
Describing data using graphical summaries:
Pie Chart / Bar graph |
|
Definition
- Pie Chart:
  - Summarizes a categorical variable
- Bar Graph:
  - Summarizes a categorical variable
  - height of bar = frequency (or relative frequency)
  - Pareto chart = bar graph with bars arranged from tallest to shortest
|
|
|
Term
Describing data using graphical summaries:
Dot Plot |
|
Definition
- Summarizes a quantitative variable
- Draw a horizontal line labeled with regularly spaced values of the variable
- for each observation, place a dot above its value on the number line
[image]
|
|
|
Term
Describing data using graphical summaries:
Stem-and-Leaf Plot |
|
Definition
- Separate each observation into a stem (first part of the number) and a leaf (typically the last digit of the number)
[image]
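A rough Python sketch of the stem/leaf split, assuming non-negative whole-number observations where the leaf is the last digit. The `stem_and_leaf` helper and the `scores` sample are hypothetical:

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Map each stem (all digits but the last) to its sorted leaves (last digits)."""
    plot = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)  # e.g. 85 -> stem 8, leaf 5
        plot[stem].append(leaf)
    return dict(plot)

# Hypothetical exam scores
scores = [72, 85, 91, 68, 77, 85, 93, 79, 64, 88]
for stem, leaves in sorted(stem_and_leaf(scores).items()):
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```

This sketch skips stems that have no observations; a hand-drawn plot would normally still list them so that gaps in the data stay visible.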
|
|
|
Term
Describing data using graphical summaries:
Histogram |
|
Definition
- Uses bars to portray the frequencies or relative frequencies of the possible outcomes of a quantitative variable
- divide the range of the data into intervals of equal width
- label the values/endpoints of the intervals on the horizontal axis
- draw a bar over each value/interval w/ height equal to its frequency (or percentage)
[image] |
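The construction steps can be sketched in Python as below, assuming equal-width intervals that include their left endpoint and exclude their right one (one common convention). `histogram_counts` and the `ages` sample are illustrative names, not from the notes:

```python
def histogram_counts(values, start, width, n_bins):
    """Count observations in n_bins equal-width intervals [lo, lo + width)."""
    counts = [0] * n_bins
    for v in values:
        index = int((v - start) // width)
        if 0 <= index < n_bins:  # ignore values outside the covered range
            counts[index] += 1
    return counts

# Hypothetical ages, binned into 20-30, 30-40, ..., 60-70
ages = [21, 25, 33, 38, 39, 41, 47, 52, 58, 64]
counts = histogram_counts(ages, start=20, width=10, n_bins=5)

# Text version of the histogram: bar length = frequency
for i, count in enumerate(counts):
    lo = 20 + 10 * i
    print(f"{lo}-{lo + 10}: {'#' * count}")
```

Dividing each count by the total number of observations gives the relative-frequency version of the same histogram.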
|
|
Term
Interpreting histogram:
Center |
|
Definition
- where the data are centered; found by finding the median (or the mean)
|
|
|
Term
Interpreting histogram:
Spread |
|
Definition
- how much the observations vary around the center (e.g. range, standard deviation, IQR)
|
|
|
Term
Interpreting histogram:
Shape |
|
Definition
- Symmetric
  - left and right sides are mirror images of each other
  - mean and median are close together
- Skewed to the left
  - left tail is longer than right tail
  - mean is less than median (mean is to the left of median)
- Skewed to the right
  - right tail is longer than left tail
  - mean is more than median (mean is to the right of median)
[image] |
|
|
Term
Interpreting Histogram:
Time Plots |
|
Definition
- Displays a data set collected over time
- observations on the vertical scale vs. time on the horizontal scale
[image] |
|
|
Term
Median |
|
Definition
- Midpoint of the observations when ordered from smallest to largest
- if the number of observations is:
  - Odd, the median is the middle observation
  - Even, the median is the average of the two middle observations
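A short Python sketch of the odd/even rule. The function below is written out by hand for illustration; Python's standard library also provides `statistics.median`:

```python
def median(values):
    """Middle observation of the sorted data (average of the two middle ones if n is even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # odd: the middle observation
    return (ordered[mid - 1] + ordered[mid]) / 2  # even: average of the two middle ones

print(median([7, 1, 5]))     # odd number of observations
print(median([7, 1, 5, 3]))  # even number of observations
```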
|
|
|
Term
Resistant |
|
Definition
- A numerical summary measure is resistant if extreme observations have little, if any, influence on its value
- Median is resistant to outliers
- Mean is not resistant to outliers
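A small Python illustration of resistance, using made-up income figures (in thousands). Swapping the largest value for an extreme one moves the mean a lot and leaves the median where it was:

```python
from statistics import mean, median

# Hypothetical incomes, in thousands
incomes = [30, 35, 40, 45, 50]
# Same data with the largest value replaced by an extreme observation
with_outlier = incomes[:-1] + [500]

print(mean(incomes), median(incomes))            # mean 40, median 40
print(mean(with_outlier), median(with_outlier))  # mean 130, median still 40
```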
|
|
|
Term
Range |
|
Definition
- Max value - Min value
- strongly affected by outliers
|
|
|
Term
|
Definition
- deviation of an observation from the mean = observation - mean
- positive deviation if the observation falls above the mean
- negative if it falls below the mean
- sum of the deviations = 0
- [image]
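A quick Python check that the deviations cancel out, on a made-up data set whose mean is a whole number so the arithmetic is exact:

```python
# Hypothetical data with mean 6
data = [2, 4, 6, 8, 10]
mean = sum(data) / len(data)

# Deviation of each observation from the mean
deviations = [x - mean for x in data]
print(deviations)       # values below the mean are negative, above are positive
print(sum(deviations))  # the positive and negative deviations cancel
```

With less tidy data, floating-point rounding can leave the sum a tiny distance from exactly 0.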
|
|
|
Term
Properties of standard deviation |
|
Definition
- s measures the spread of the data
- bigger spread = larger s
- Variance = s^2
- s is not resistant (strongly affected by outliers)
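A minimal Python sketch of s and the variance, assuming s means the sample standard deviation (dividing by n - 1), which the notes do not state explicitly. `statistics.stdev` and `statistics.variance` are standard-library functions that use that divisor:

```python
from statistics import stdev, variance

# Hypothetical data: squared deviations from the mean (6) are 16, 4, 0, 4, 16
data = [2, 4, 6, 8, 10]

var = variance(data)  # sum of squared deviations / (n - 1) = 40 / 4
s = stdev(data)       # square root of the variance

print(var)
print(round(s, 4))
```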
|
|
|
Term
Empirical Rule |
|
Definition
- If the distribution of the data is bell-shaped, then approximately:
  - 68% of observations fall w/i 1 standard deviation of the mean
  - 95% of observations fall w/i 2 standard deviations of the mean
  - nearly all observations fall w/i 3 standard deviations of the mean
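The three intervals can be worked out as in the sketch below. The mean of 65 and s of 3 are invented numbers, and `empirical_rule_intervals` is a hypothetical helper name:

```python
def empirical_rule_intervals(mean, s):
    """Return {k: (low, high)} for k = 1, 2, 3 standard deviations around the mean."""
    return {k: (mean - k * s, mean + k * s) for k in (1, 2, 3)}

# Hypothetical bell-shaped data: heights with mean 65 in. and s = 3 in.
intervals = empirical_rule_intervals(65, 3)
for k, (low, high) in intervals.items():
    print(f"within {k} s: {low} to {high}")
```

Under these assumed numbers, about 68% of heights would fall between 62 and 68, about 95% between 59 and 71, and nearly all between 56 and 74.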
|
|
|
Term
Quartiles |
|
Definition
- ranked data set divided into 4 equal parts
- Median is the second quartile, Q2
- 1st quartile Q1 = median of the lower half of the observations
- 3rd quartile Q3 = median of the upper half of the observations
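A Python sketch of the median-of-halves method, assuming that when the number of observations is odd the middle observation is left out of both halves. Conventions differ between textbooks and software, so other tools may give slightly different quartiles. `quartiles` is a hypothetical helper:

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median of the lower and upper halves."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # For odd n, skip the middle observation so it is in neither half
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return median(lower), median(ordered), median(upper)

print(quartiles([1, 3, 5, 7, 9, 11, 13]))  # (3, 7, 11)
```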
|
|
|
Term
Five-Number Summary |
|
Definition
- Minimum value
- First Quartile
- Median
- Third Quartile
- Maximum value
|
|
|
Term
Interquartile range and detecting potential outliers |
|
Definition
- Interquartile range (IQR) = Q3 - Q1
- Gives the spread of the middle 50% of the data
- An observation is a potential outlier if it is more than 1.5 x IQR below Q1 or more than 1.5 x IQR above Q3
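The 1.5 x IQR criterion can be sketched in Python as follows. Q1 and Q3 are passed in as already-computed values, and the function name and sample data are made up for illustration:

```python
def potential_outliers(values, q1, q3):
    """Return observations more than 1.5 x IQR below Q1 or above Q3."""
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [v for v in values if v < low_fence or v > high_fence]

# Hypothetical data with Q1 = 3 and Q3 = 11, so IQR = 8
data = [1, 3, 5, 7, 9, 11, 40]
print(potential_outliers(data, q1=3, q3=11))  # cutoffs are -9 and 23
```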
|
|
|
Term
Box Plot |
|
Definition
- Box goes from Q1 to Q3
- A line is drawn inside the box at the median
- Lines (whiskers) go from the lower end of the box to the smallest observation that is not a potential outlier, and from the upper end of the box to the largest observation that is not a potential outlier
- Potential outliers are shown separately
|
|
|
Term
z-Score |
|
Definition
- number of standard deviations that an observation falls from the mean
- An observation from a bell-shaped distribution is a potential outlier if its z-score is < -3 or > +3
- [image]
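A minimal Python sketch of the z-score calculation, z = (observation - mean) / standard deviation. The mean of 65 and s of 3 are invented numbers and `z_score` is an illustrative helper:

```python
def z_score(x, mean, s):
    """Number of standard deviations that x falls from the mean."""
    return (x - mean) / s

# Hypothetical bell-shaped data with mean 65 and s = 3
z = z_score(80, mean=65, s=3)
print(z, abs(z) > 3)  # flagged as a potential outlier when |z| > 3
```

A negative z-score means the observation falls below the mean, for example `z_score(62, mean=65, s=3)` gives -1.0.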
|
|
|
Term
Guidelines for constructing effective graphs |
|
Definition
- Label both axes and provide a proper heading
- The vertical axis should start at 0 so that relative sizes can be compared accurately
|
|
|