Term
Descriptive and Multivariate Statistics (Chapter 9)
Purpose: Familiarize readers with the principles of descriptive and multivariate statistics so that they are better producers as well as consumers of research and police data.
|
|
Definition
Goals:
- Summarize large and small data sets
- Examine the integrity of large and small data sets
- Determine which statistics best portray the data
- Compare more than one variable to others
- Apply descriptive statistics to problem solving and data-driven decision-making
|
|
|
Term
Statistics |
Definition
Science of collecting and organizing data and then drawing conclusions based on data.
|
|
|
Term
Question:
What are the three types of statistics? |
|
Definition
Answer:
Descriptive, multivariate, and inferential |
|
|
Term
Descriptive Statistics |
Definition
Descriptive Statistics summarize large amounts of information in an efficient and easily understood manner. |
|
|
Term
Multivariate Statistics |
Definition
Multivariate Statistics allow comparisons among factors by isolating the effects of one factor or variable from others that may distort conclusions.
|
|
|
Term
Inferential Statistics |
Definition
Inferential Statistics allow statements to be made about a population based on a sample drawn from that population. |
|
|
Term
Measurement |
Definition
Measurement is a process of assigning numbers or labels to units of analysis or items under study. |
|
|
Term
Concept of Levels of Measurement
|
|
Definition
Four levels of measurement:
- Nominal
- Ordinal
- Interval
- Ratio
(Each conveys a different amount of information.) |
|
|
Term
Nominal Level of Measurement |
|
Definition
All categories must be exhaustive (covering all observations that may exist).
Categories must be mutually exclusive (each observation can be classified only one way).
Provides names or labels for distinguishing observations.
Lowest level of measurement.
(Ex. race, gender: any numbers assigned are labels only and cannot be used in calculations) |
|
|
Term
Ordinal Level of Measurement |
|
Definition
Categories must be exhaustive and mutually exclusive.
Categories must exhibit a degree of difference that indicates order or ranking.
Categories are ordered in some way, but the actual distance between the categories has no meaning.
(Ex. opinion ratings such as good, better, best) |
|
|
Term
Interval Level of Measurement |
|
Definition
Categories must be exhaustive, mutually exclusive and exhibit a degree of difference.
Assumes that all items on a scale have equal intervals between them.
Logical distances between categories are expressed in meaningful intervals.
(Ex. temperature in Fahrenheit or Celsius, IQ)
|
|
|
Term
Ratio Level of Measurement |
|
Definition
All characteristics of the interval level.
Contains a true ZERO point. (A true zero point allows for measuring the total absence of the concept under measure.)
(Ex. income, weight, time, and age)
|
|
|
Term
Implications regarding Levels of Measurement |
|
Definition
- Ratio is the highest level of measurement (it includes all of the characteristics of the lower levels).
- Researchers should strive for the highest level of measurement possible. Lower levels of measurement cannot be converted to higher levels, but higher levels can be converted to lower levels.
- (Most important) The statistical technique to be applied will determine the level of measurement needed.
|
|
|
Term
Distribution of Data Sets |
|
Definition
Process by which large and cumbersome data sets are described in a manner that is easily understood. |
|
|
Term
Frequency Distribution |
Definition
Allows for a basic description of the data set and for graphical representation. Allows for more efficient management and analysis of large data sets.
Table layout:
- Columns: x (category), f (frequency), fx (f times x)
- Rows: categories listed from the highest numerical value down to the lowest numerical value
- Totals: N = ∑ f and ∑ fx
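A minimal Python sketch of the x / f / fx layout, assuming whole-number scores. The function name `frequency_distribution` and the call counts are hypothetical illustrations, not data from the chapter.

```python
from collections import Counter


def frequency_distribution(scores):
    """Return (x, f, fx) rows from highest x to lowest, plus N and the sum of fx."""
    counts = Counter(scores)  # f for each distinct value x
    rows = [(x, f, f * x) for x, f in sorted(counts.items(), reverse=True)]
    n = sum(f for _, f, _ in rows)         # N = sum of f
    sum_fx = sum(fx for _, _, fx in rows)  # sum of fx
    return rows, n, sum_fx


# Hypothetical data: calls for service per shift
calls = [3, 5, 2, 5, 4, 3, 5, 1, 3, 5]
rows, n, sum_fx = frequency_distribution(calls)

print("x   f   fx")
for x, f, fx in rows:
    print(f"{x:<3} {f:<3} {fx}")
print(f"N = {n}, sum of fx = {sum_fx}")
```

With this data N = 10 and ∑fx = 36; dividing ∑fx by N (3.6) gives the mean of the original scores.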
|
|
Term
Range |
Definition
A summary statistic that provides limited information but allows for condensing a frequency distribution. The RANGE is obtained by subtracting the Lowest Numerical Value from the Highest Numerical Value. The RANGE is used to obtain a class interval. It is also a simple measure of variation.
Class Interval:
$$ i = \frac{\text{Range}}{\text{N of Desired Intervals}} $$
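A short Python sketch of the range and the class interval formula. The helper names and the ages are hypothetical.

```python
def value_range(scores):
    """Range: highest numerical value minus lowest numerical value."""
    return max(scores) - min(scores)


def class_interval(scores, desired_intervals):
    """Class interval width i = range / number of desired intervals."""
    return value_range(scores) / desired_intervals


# Hypothetical ages of ten arrestees
ages = [19, 23, 31, 44, 27, 52, 38, 61, 25, 33]
print(value_range(ages))        # 61 - 19 = 42
print(class_interval(ages, 6))  # 42 / 6 = 7.0
```

When i comes out fractional, it is often rounded to a convenient whole number; that rounding is an assumption here, not part of the formula itself.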
|
|
|
|
Term
Grouped Frequency Distribution |
Definition
Process by which the class interval is used to group a Frequency Distribution.
Note: Although the data have been condensed and the distribution has changed, the nature of the data remains the same. |
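A minimal sketch of grouping scores with a class interval, assuming whole-number scores and a positive whole-number width. The function `grouped_frequency` and the ages are hypothetical.

```python
def grouped_frequency(scores, width):
    """Count whole-number scores in consecutive class intervals of a given width."""
    low, top = min(scores), max(scores)
    groups = []
    while low <= top:
        high = low + width - 1  # inclusive upper limit, e.g. 19-25 for width 7
        f = sum(1 for s in scores if low <= s <= high)
        groups.append((low, high, f))
        low += width
    return groups


ages = [19, 23, 31, 44, 27, 52, 38, 61, 25, 33]  # hypothetical
for low, high, f in grouped_frequency(ages, 7):
    print(f"{low}-{high}: {f}")
```

The frequencies across all intervals still add up to N = 10, which reflects the point that grouping condenses the data without changing its nature. This sketch produces seven intervals rather than six because the last interval must reach the highest score.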
|
|
Term
Question:
What is the purpose of Charts and Graphs? |
|
Definition
Answer:
To portray the distribution of data for a quick and meaningful understanding. |
|
|
Term
Percentages |
Definition
• Percentages are the relation between two or more numbers for which the whole is accorded a value of 100.
• Calculated by dividing the frequency of each interval by the total number of cases and multiplying by 100.
• Useful for managerial reports and policy evaluations.
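A small Python sketch of the percentage calculation. The incident categories and counts are hypothetical.

```python
def percentages(freqs):
    """Percent for each category: f divided by N, times 100."""
    n = sum(freqs.values())
    return {label: f * 100 / n for label, f in freqs.items()}


# Hypothetical counts of incident types
freqs = {"Theft": 45, "Burglary": 30, "Assault": 15, "Other": 10}
for label, pct in percentages(freqs).items():
    print(f"{label}: {pct:.1f}%")
```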
|
|
|
Term
Cumulative Percentage |
Definition
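Calculated by adding the percent column for each class interval, so that each cumulative percentage is the running total of all percentages up to and including that interval. |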
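A minimal sketch of a cumulative percent column using `itertools.accumulate`. The intervals and percentages are hypothetical, listed from the lowest interval up.

```python
from itertools import accumulate

# Hypothetical percent column, lowest class interval first
intervals = ["0-4", "5-9", "10-14", "15-19"]
percent = [10.0, 25.0, 40.0, 25.0]

# Cumulative percent: running total of the percent column
cumulative = list(accumulate(percent))
for label, p, c in zip(intervals, percent, cumulative):
    print(f"{label:>5}  {p:5.1f}%  {c:6.1f}%")
```

The last cumulative value should equal 100 percent; if it does not, a category or missing cases may have been dropped.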
|
|
Term
Question:
What should you do with missing data? |
|
Definition
Answer: In cases containing missing data, you cannot determine whether the missing cases would have fallen into a particular segment or class interval.
Option 1: Include a segment labeled “Missing Cases” and include in the total. (Deflates %)
Option 2: Omit the missing cases altogether. (Inflates %)
In either case document whether missing data has been included or omitted. |
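A sketch comparing the two options for missing data, using hypothetical survey counts. The function `percent_table` is an illustrative helper, not from the chapter.

```python
def percent_table(counts, missing, include_missing):
    """Percent per category, with missing cases either counted in N or omitted."""
    table = dict(counts)
    if include_missing:
        table["Missing Cases"] = missing  # Option 1: own segment, part of the total
    n = sum(table.values())               # Option 2 leaves missing cases out of N
    return {label: f * 100 / n for label, f in table.items()}


counts = {"Yes": 60, "No": 20}  # hypothetical survey responses
missing = 20

print(percent_table(counts, missing, include_missing=True)["Yes"])   # 60.0
print(percent_table(counts, missing, include_missing=False)["Yes"])  # 75.0
```

The same 60 "Yes" responses read as 60 percent or 75 percent depending on the option chosen, which is why the choice should be documented.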
|
|
Term
Measures of Central Tendency |
|
Definition
Most common forms of descriptive statistics. They describe the average value from a distribution of values. The primary measures are MEAN, MEDIAN and MODE. |
|
|
Term
Mean |
Definition
The mean is the arithmetic average and is calculated by dividing the sum of scores by the number of cases. A distribution of data can have only one mean. The main weakness of the mean is that it is affected by extreme scores (outliers) in a distribution. |
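A short Python sketch of the mean and its sensitivity to one extreme score. The response times are hypothetical.

```python
from statistics import mean

response_times = [4, 5, 5, 6, 7]      # hypothetical minutes
with_outlier = response_times + [45]  # add one extreme score

print(sum(response_times) / len(response_times))  # 27 / 5 = 5.4
print(mean(with_outlier))                         # 72 / 6, pulled up to 12 by one case
```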
|
|
Term
Outliers |
Definition
Outliers are extreme scores: single cases or small numbers of exceptional cases that deviate from the general pattern of scores. |
|
|
Term
Median |
Definition
The midpoint or middle score of a distribution once the scores are rank ordered. The median is not significantly affected by outliers. If the distribution contains an even number of scores, rank order the distribution and calculate the average of the two middle scores. |
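A minimal sketch of the median rule, including the even-N case. The function name `median_score` is hypothetical; the standard library's `statistics.median` follows the same rule.

```python
def median_score(scores):
    """Middle score after rank ordering; average of the two middle scores if N is even."""
    ranked = sorted(scores)  # rank order the distribution first
    n = len(ranked)
    mid = n // 2
    if n % 2 == 1:
        return ranked[mid]
    return (ranked[mid - 1] + ranked[mid]) / 2


print(median_score([7, 3, 9, 1, 5]))       # odd N: 5
print(median_score([7, 3, 9, 1, 5, 100]))  # even N: (5 + 7) / 2 = 6.0
```

In the second list the outlier 100 pulls the mean up to about 20.8, while the median stays at 6.0.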
|
|
Term
Mode |
Definition
The mode indicates the most frequently occurring score(s) or label in a distribution. More than one score or interval can tie for the most frequent. The mode is primarily used for nominal measurements because it provides limited information and is not subject to further statistical analysis. |
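A short sketch using `statistics.multimode`, which returns every value tied for the highest frequency. The patrol beat labels are hypothetical nominal data.

```python
from statistics import multimode

beats = ["North", "South", "North", "East", "South", "West"]  # hypothetical labels
print(multimode(beats))         # two modes tie: ['North', 'South']
print(multimode([2, 3, 3, 4]))  # single mode: [3]
```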
|
|
Term
Measures of Dispersion |
Definition
Range, Variance and Standard Deviation |
|
|
Term
Variance |
Definition
Variance is a statistical measure that tells us how measured data vary from the average value of the set of data. Variance is the sum of the squared deviations of each score from the mean, divided by the total number of cases. |
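A minimal sketch of the variance as defined on this card: squared deviations from the mean, summed, divided by N. The scores are hypothetical.

```python
from statistics import pvariance


def population_variance(scores):
    """Sum of squared deviations from the mean, divided by N."""
    n = len(scores)
    mu = sum(scores) / n
    return sum((x - mu) ** 2 for x in scores) / n


scores = [2, 4, 4, 4, 5, 5, 7, 9]     # hypothetical; mean is 5
print(population_variance(scores))   # 32 / 8 = 4.0
print(pvariance(scores))             # standard-library check of the same formula
```

Dividing by N matches the card's definition (population variance). When the data are a sample, the standard library's `statistics.variance` divides by N - 1 instead.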
|
|
Term
Standard Deviation |
Definition
Standard Deviation measures approximately the average distance of each data item from the mean of all data items in a distribution (it is the square root of the variance). It provides insight into how scores in a distribution compare with each other and allows for comparisons between two distributions.
- A standard deviation value on its own has little intuitive meaning;
- It is most useful in a comparative sense;
- Comparing the relative values of the standard deviation and the mean indicates how much variation there is in a group of cases, relative to the average.
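A sketch of the standard deviation as the square root of the variance, plus the standard deviation divided by the mean (sometimes called the coefficient of variation) as one way to compare the two. The scores are hypothetical.

```python
import math
from statistics import pstdev


def population_sd(scores):
    """Square root of the population variance."""
    n = len(scores)
    mu = sum(scores) / n
    return math.sqrt(sum((x - mu) ** 2 for x in scores) / n)


scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical; mean is 5
sd = population_sd(scores)
relative = sd / (sum(scores) / len(scores))  # SD relative to the mean

print(sd, pstdev(scores))
print(relative)  # 2.0 / 5.0 = 0.4
```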
|
|
|
Term
Skewness |
Definition
Skewness illustrates the spread of scores weighted to one side of the mean. There are three distinct patterns that may emerge from unimodal distributions: normal, positive, and negative. |
|
|
Term
Normal Distribution |
Definition
There is no skew. Scores are distributed symmetrically around the mean, so statistical techniques that assume normally distributed data can be applied. In a normal distribution, the mean, median, and mode are equal. |
|
|
Term
Positive Skew |
Definition
Unimodal distribution of scores weighted to the left, with the mean generally being the largest value, followed by the median and then the mode. Hump to the left with an extended tail to the right. |
|
|
Term
Negative Skew |
Definition
Unimodal distribution of scores weighted to the right, with the mode generally being the largest value, followed by the median and then the mean. Hump to the right with an extended tail to the left. |
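A small check of how the three measures of central tendency typically order themselves in skewed data. Both data sets are hypothetical and were chosen to have a clear tail; real data do not always follow this ordering exactly.

```python
from statistics import mean, median, mode

positive = [1, 2, 2, 2, 3, 3, 4, 5, 9, 14]          # tail to the right
negative = [1, 6, 10, 11, 12, 12, 13, 13, 13, 14]   # tail to the left

for name, data in [("positive", positive), ("negative", negative)]:
    print(f"{name}: mean={mean(data)}, median={median(data)}, mode={mode(data)}")
```

In the positively skewed set the mean (4.5) exceeds the median (3.0), which exceeds the mode (2); the negatively skewed set reverses the order (13, then 12.0, then 10.5).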
|
|
Term
Question:
What should you use to represent a skewed distribution? |
|
Definition
Answer:
Depends on the nature of the data and the purpose of the research or project.
|
|
|
Term
Rates |
Definition
Used to standardize some measure for comparative purposes. Calculated by dividing raw numbers by a comparable denominator.
$$ \text{Rate} = \frac{\text{Raw Number of Occurrences}}{\text{Point of Comparison}} \times \text{Unit of Measure} $$
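A minimal sketch of the rate formula. The default unit of measure of 100,000 (a per-capita base often used for crime rates) is an assumption, as are the city counts and populations.

```python
def rate(raw_count, point_of_comparison, unit_of_measure=100_000):
    """Rate = raw number of occurrences / point of comparison x unit of measure."""
    return raw_count * unit_of_measure / point_of_comparison


# Hypothetical burglary counts and populations for two cities
print(rate(240, 80_000))  # 300.0 per 100,000 residents
print(rate(150, 30_000))  # 500.0 per 100,000 residents
```

The second city has fewer burglaries but a higher rate, which is the kind of comparison that raw numbers alone would hide.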
|
|
|
|
Term
Dichotomous Variable |
Definition
A variable with only two categories |
|
|
Term
Proportions |
Definition
Best description of central tendency with dichotomous data. Proportions are the relationships between two or more categories or values. Obtained by dividing the value of the part by the value of the whole. Proportions can be considered the mean of a dichotomous variable (coded 0 and 1) and range in value from 0 to 1. |
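A short sketch showing that a proportion is the mean of a 0/1-coded dichotomous variable. The arrest outcomes are hypothetical.

```python
# Dichotomous variable coded 1 = arrest made, 0 = no arrest (hypothetical)
arrest = [1, 0, 0, 1, 1, 0, 1, 0, 0, 0]

proportion = sum(arrest) / len(arrest)  # part (4 arrests) / whole (10 cases)
print(proportion)  # 0.4, also the mean of the 0/1 codes
```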
|
|
Term
Percent Change |
Definition
• A percentage change is a way to express a change in a variable. It represents the relative change between the old and the new value.
Note: Use caution when comparing only two time periods, especially when a long period of time separates them.
$$ \text{Percent Change} = \frac{\text{After Value} - \text{Before Value}}{\text{Before Value}} \times 100 $$
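A minimal sketch of the percent change formula, assuming a nonzero before value. The counts are hypothetical.

```python
def percent_change(before, after):
    """(after - before) / before x 100; before must be nonzero."""
    return (after - before) * 100 / before


print(percent_change(200, 250))  # 25.0 (increase)
print(percent_change(250, 200))  # -20.0 (decrease)
```

A rise from 200 to 250 is +25 percent, but the return from 250 to 200 is only -20 percent, because each change is measured against a different before value.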
|
|
|
|
Term
|
Definition
Statistics that encompass the simultaneous observation and analysis of more than one statistical variable with a focus on assessing the strength and identifying patterns of association between or among the variables. |
|
|
Term
Regression Analysis |
Definition
Regression analysis is a procedure for pattern recognition. Regression attempts to fit a line (trendline) through a given set of data points on a graph. Given a strong enough pattern of association, a regression line suggests possibilities for predicting future values. |
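A minimal sketch of fitting a straight trendline by ordinary least squares, which is one common regression technique. The weekly incident counts are hypothetical, and the prediction is only as good as the pattern behind it.

```python
def least_squares_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical weekly incident counts
weeks = [1, 2, 3, 4, 5]
incidents = [2, 4, 5, 4, 5]
slope, intercept = least_squares_line(weeks, incidents)
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")
print(f"trendline prediction for week 6: {intercept + slope * 6:.1f}")
```

Python 3.10 and later also provide `statistics.linear_regression(x, y)`, which returns the same slope and intercept for a simple straight-line fit.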
|
|
Term
Correlation |
Definition
A group of statistical techniques used to measure the strength of the relationship between variables. Correlation measures the relative “fit” or degree of association between two or more variables. This technique provides a measure of the quality of the regression line, and therefore of the reliability of any predictions based on it. |
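A sketch of Pearson's r, one common correlation technique among the group this card describes, computed on hypothetical weekly incident counts. Values near +1 or -1 indicate a tight linear fit; values near 0 indicate little linear association.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient r, between -1 and +1."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


weeks = [1, 2, 3, 4, 5]        # hypothetical
incidents = [2, 4, 5, 4, 5]
r = pearson_r(weeks, incidents)
print(f"r = {r:.3f}, r squared = {r * r:.2f}")
```

For a simple straight-line regression, r squared is the share of the variation in y accounted for by the fitted line, which is one way correlation measures the quality of that line. Python 3.10 and later also provide `statistics.correlation(x, y)`.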
|
|