Term
Reliability of Selection Measures
Definition
Legally (via court rulings and the Uniform Guidelines) and strategically, it is important to ensure you are obtaining reliable and valid information from your applicants
Term
Reliability
Definition
The degree to which interviews, tests, and other selection procedures yield consistent data
Term
Validity
Definition
Degree to which the inferences we make from a test score are appropriate
Term
Reliability of Selection Measures
Definition
The degree of dependability, consistency, or stability of scores on a measure (either predictors or criteria) used in selection research.
We always expect some degree of error in measuring psychological traits
Compare measuring height (relatively little error) v. measuring intelligence (much more room for error)
The higher the reliability, the more readily we can use a selection device for making distinctions between applicants
We want tests that minimize differences WITHIN person and maximize differences BETWEEN persons
Term
Measurement Error
Definition
Factors not related to the characteristic, trait, or attribute being measured
- Obtained Score Components
- Xobtained = Xtrue + Xerror
where
- Xobtained = obtained score for a person on a measure
- Xtrue = true score on the measure, that is, the actual amount of the attribute that the person really possesses
- Xerror = error score on the measure, assumed to represent random fluctuations or chance factors
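A minimal sketch of this decomposition in Python (the true score of 50 and the error SD of 3 are invented for illustration):

import numpy as np

rng = np.random.default_rng(0)

# One applicant with a fixed true score, measured 10 times;
# each administration adds random error to the same true score.
x_true = 50.0                        # actual amount of the attribute
x_error = rng.normal(0, 3, size=10)  # random fluctuations, mean zero
x_obtained = x_true + x_error        # the scores actually observed

print(x_obtained.round(1))           # scores scatter around 50
print(round(x_obtained.mean(), 1))   # the average drifts back toward the true score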
Term
True Score
Definition
The average score made by an individual on many different administrations of a measure if external and internal conditions were perfect
Term
Obtained Score
Definition
The score an individual actually obtains, influenced by factors present at the time of measurement that distort it above or below what it would have been on another measurement occasion
Term
Factors Affecting Reliability 1
Definition
- Environmental factors
- Room temperature, lighting, noise
- Personal factors
- Anxiety, mood, fatigue, illness
- Interviewer/recruiter behavior
- E.g., smiling at one applicant but scowling at another
- Individuals scoring the assessment
- Homogeneity of items
- Do all items measure the same KSA?
- Individual differences among respondents
- Will reliability be higher or lower the greater the variability among respondents on the attribute being measured?
Term
Factors Affecting Reliability 2
Definition
- Trait stability
- Mood, anxiety, emotion v. intelligence
- For which traits would fluctuations within a person’s test score be considered error?
- Test item difficulty level
- What difficulty level is most reliable?
- Item response format
- Multiple choice v. true/false – which is more reliable?
- Length of the test (see the Spearman-Brown example later)
- Longer or shorter test – which is more reliable?
Term
Interpreting Reliability Coefficients
Definition
The reliability coefficient represents the extent (in percentage terms) to which individual differences in scores on a measure are due to “true” differences in the attribute measured
- rxx = .90
- 90% of the differences in test scores among individuals who took the test are due to true differences on the attribute; 10% are due to “error”
Dependability of a measure for a group of individuals
The higher the better (rule of thumb rxx > .80 is minimum acceptable)
The more critical the decision, the higher it needs to be
Reliability ranges from 0 to +1.0
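As a hedged illustration of the rxx = .90 interpretation, here is a small Python simulation; the group size and the two standard deviations are assumptions chosen so the ratio comes out near .90:

import numpy as np

rng = np.random.default_rng(1)

# Simulate obtained scores as true score + error for 10,000 test takers.
# Between-person (true) SD = 9, within-person (error) SD = 3.
x_true = rng.normal(100, 9, size=10_000)
x_error = rng.normal(0, 3, size=10_000)
x_obtained = x_true + x_error

# Reliability = true-score variance / obtained-score variance
r_xx = x_true.var() / x_obtained.var()
print(round(r_xx, 2))  # ~0.90: about 90% of the score differences are "true"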
Term
Choosing a Method of Estimating Reliability
Definition
Depends on what question you are asking…
- How consistent are scores on a test over time?
- Test-retest reliability
- How consistent are scores across different forms of a test?
- Parallel forms reliability
- What is the extent to which items on a test are similar in what they are measuring?
- Internal consistency reliability
- When individuals are being rated by more than one rater, to what degree do evaluations vary from one rater to another?
- Interrater reliability
Term
Test-Retest Reliability
Definition
The same measure is used to collect data from the same respondents at two different points in time
The reliability coefficient represents a “coefficient of stability” that indicates the extent to which scores on the test can be generalized from one time period to the next
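A minimal sketch of computing the coefficient of stability (the five pairs of scores are invented):

import numpy as np

# Same five respondents measured at time 1 and time 2 (made-up data).
time1 = np.array([82, 75, 90, 68, 77])
time2 = np.array([80, 78, 88, 70, 75])

# Test-retest reliability is the Pearson correlation between the two administrations.
r_stability = np.corrcoef(time1, time2)[0, 1]
print(round(r_stability, 2))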
Term
Person factors that may affect test-retest reliability
Definition
Memory – Recalling how you responded to the items the first time you took the test
Will memory over- or underestimate a test’s reliability?
Learning – Acquiring new knowledge between the first and second test administration
Will learning over- or underestimate a test’s reliability?
Term
Use test-retest reliability when…
Definition
The trait being measured is stable over time (e.g., intelligence and other abilities)
Term
Do NOT use test-retest reliability when…
Definition
The trait being measured is unstable (attitudes, self-esteem, self-concept)
Changes in these traits reflect instability of the trait itself rather than error in measuring the trait
Term
Parallel or Equivalent Forms Strategy
Definition
Administering two equivalent versions of a measure (forms with different items but assessing the same attribute) to the same respondent group
Use this reliability estimate if you maintain multiple forms for test security purposes
“Coefficient of equivalence” – if both forms are administered at the same time
“Coefficient of equivalence and stability” – if the forms (A and B) are administered over time (which introduces two potential sources of error: test content and test score stability)
Administrations over time would be inappropriate for unstable traits
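A sketch of the coefficient of equivalence, assuming hypothetical scores on Forms A and B taken in the same session:

import numpy as np

# Same six respondents take Form A and Form B at the same time (made-up data).
form_a = np.array([55, 61, 48, 70, 66, 59])
form_b = np.array([57, 60, 50, 68, 64, 62])

# The coefficient of equivalence is the correlation between the two forms.
r_equivalence = np.corrcoef(form_a, form_b)[0, 1]
print(round(r_equivalence, 2))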
Term
Internal Consistency Reliability Estimate
Definition
Shows the extent to which all parts of a measure are similar in what they measure
- Error can be caused by…
- How items were phrased
- How items were interpreted by respondents
- Multiple constructs being measured within the same test
Term
Internal Consistency Procedures
Definition
- Split-half reliability
- Cronbach’s coefficient alpha (α) reliability
Term
Split-Half Reliability
Definition
Single administration; the test is split into two halves, each half is scored, and the resulting scores are correlated to estimate reliability
- Coefficient of equivalence – showing similarity of responses across the “two forms” taken at the same time
- Will be an underestimate of true reliability (reliability rises with test length, and each half is only half as long as the full test)
- Use the Spearman-Brown correction to estimate what the reliability would have been for the full measure rather than the two halves
- rxxc = n(r12) / [1 + (n – 1)(r12)]
- If r12 = .80 between the two halves, corrected rxx = .89 (see the sketch below)
Split-half won’t detect errors of measurement over time
Not appropriate for timed tests (scores depend on # of items completed)
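A sketch of the correction as a plain Python function (the function name is mine):

# Spearman-Brown correction; for a split-half, n = 2 because the
# full test is twice the length of either half.
def spearman_brown(r12, n=2):
    return (n * r12) / (1 + (n - 1) * r12)

print(round(spearman_brown(0.80), 2))  # 0.89, matching the example above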
Term
Cronbach’s Coefficient Alpha
Definition
- Estimates reliability as the average of all possible split-half reliabilities (every way of dividing up the items)
- Can be computed for items scored on an interval scale
- Represents the average correlation of each item with every other item
- Examines whether respondents are responding similarly to items measuring the same trait/construct
- See the sketch below for a hypothetical 4-item measure of conscientiousness
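Since the original example slide is not included here, below is a hedged substitute: a hypothetical 4-item conscientiousness measure with invented 1-5 ratings, and alpha computed from the standard formula.

import numpy as np

# Rows = five respondents, columns = four conscientiousness items (1-5 scale).
items = np.array([
    [4, 5, 4, 4],
    [2, 2, 3, 2],
    [5, 4, 5, 5],
    [3, 3, 2, 3],
    [4, 4, 4, 5],
])

k = items.shape[1]
sum_item_vars = items.var(axis=0, ddof=1).sum()  # sum of the item variances
total_var = items.sum(axis=1).var(ddof=1)        # variance of the total scores

# Cronbach's alpha: (k / (k - 1)) * (1 - sum of item variances / total variance)
alpha = (k / (k - 1)) * (1 - sum_item_vars / total_var)
print(round(alpha, 2))  # high alpha: respondents answer the items similarly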
Term
Sources of Measurement Error
Definition
What is being rated (e.g., applicant)
Ratings of applicant interview or simulation performance will be more subjective than objectively scored tests
Who is doing the rating (rater characteristics)
Biases, opinions, stereotypes
Halo error
Term
The purpose of interrater reliability is…
Definition
to determine whether raters are consistent in their judgments on a group of applicants
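A minimal sketch, assuming two raters and five applicants with invented ratings; the correlation between the raters' judgments is a simple interrater reliability estimate:

import numpy as np

# Two interviewers rate the same five applicants (made-up 1-5 ratings).
rater_1 = np.array([4, 3, 5, 2, 4])
rater_2 = np.array([4, 2, 5, 3, 4])

# Interrater reliability as the correlation between the raters' judgments.
r_interrater = np.corrcoef(rater_1, rater_2)[0, 1]
print(round(r_interrater, 2))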
Term
Standard Error of Measurement
Definition
- All previous reliability estimates are based on a group of scores to describe the reliability of a test
- SEM answers the question: How much error is in any one person’s score?
Term
Calculating the standard error of measurement
Definition
SEM = st × √(1 – rtt)
Where:
st = standard deviation of test takers’ scores
rtt = the reliability estimate of the test
Example: st = 10, rtt = .75; SEM = 10 × √(1 – .75) = 5
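As a quick check, the same computation in Python using the values from this card:

import math

s_t = 10     # standard deviation of test takers' scores
r_tt = 0.75  # reliability estimate of the test

# SEM = s_t * sqrt(1 - r_tt)
sem = s_t * math.sqrt(1 - r_tt)
print(sem)  # 5.0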
Term
Interpreting the Standard Error of Measurement
Definition
- Determines how error in the test impacts individual scores
- Shows that there is a range of scores that should be considered equivalent
- Shows what any one person would have likely scored if they took the test multiple times
- 68% chance that any person’s true score lies within +/- 1 SEM of actual score
- 95% chance that any person’s true score lies within +/- 2 SEM of actual score
- If Person A’s score = 60 (and SEM = 5), there is a 68% chance that if she retook the test, her new score would be in the range of 55 – 65; there is a 95% chance that it would be in the range of 50 – 70
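A short sketch of the equivalence bands for Person A's score of 60:

score, sem = 60, 5

# 68% band: +/- 1 SEM; 95% band: +/- 2 SEM
print(score - sem, score + sem)          # 55 65
print(score - 2 * sem, score + 2 * sem)  # 50 70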
Term
Using the SEM to Compare Applicants
Definition
SEM can be used to determine whether scores for individuals differ significantly from one another
- What if you had the following scores on two applicants and an SEM of 5 for the test?
- Applicant 1 score = 50
- Applicant 2 score = 46
Is Applicant 1 better than Applicant 2?
95% chance that upon retaking the test…
Applicant 1 could score between 40-60
Applicant 2 could score between 36-56
Because these ranges overlap substantially, the two scores should not be treated as significantly different
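The same comparison as a sketch:

sem = 5
applicants = {"Applicant 1": 50, "Applicant 2": 46}

# 95% bands (+/- 2 SEM); heavy overlap means the 4-point difference
# should not be treated as a real difference between the applicants.
for name, score in applicants.items():
    print(name, score - 2 * sem, "to", score + 2 * sem)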
|