Term
Origins and evolution of testing |
|
Definition
-first tests date back 2200-1000 B.C. to see who was the best mail carrier
-test research and development relatively new (last 100yrs)
-interest in testing evolves out of practical need (diagnosis, performance enhancement, prediction) |
|
|
Term
|
Definition
- Educational Assessments
- Personnel Assessments
- Clinical Assessments
|
|
|
Term
|
Definition
perspective shift
-continuous interest in "g"
-shift from CTT towards IRT
-tools/techniques to applications (from perfecting measures to finding applications & limitations)
-measurement validity (important to society)
-concern for ethics |
|
|
Term
|
Definition
-2010
-what is competence, who is informed for consent?
-shall not use obsolete or out-dated tests (when is it outdated, what to do to stay current)
-release of test data (handling data appropriately) |
|
|
Term
|
Definition
- A sample of behavior demonstrating knowledge, skills, abilities, or other attributes (KSAO): not exhaustive; focuses on relevant constructs that are representative & important to the purpose
- Obtained under standardized conditions, reflecting the person's typical ability or disposition without effect from testing (replicable)
- Systematic rules for scoring numeric information that are comprehensive and well defined (objective)
|
|
|
Term
Why is objective scoring hard? |
|
Definition
-decisions must be made regarding rules, which allows for subjective thought |
|
|
Term
|
Definition
Qualitatively different
test- infers right/wrong scoring (knowledge tests)
inventory- assessment of magnitude/level of a dimension (personality); no right or wrong answers
inventory usually classifies people into categories |
|
|
Term
|
Definition
Stable characteristics/traits (e.g. affect, ability) |
|
|
Term
|
Definition
|
|
Term
|
Definition
construct with an applied rule of measure |
|
|
Term
|
Definition
assigned quantitative values |
|
|
Term
|
Definition
speed- lots of easy questions w/ a time limit
power - difficult items w/out a time limit |
|
|
Term
Tests are only good if they _______ |
|
Definition
|
|
Term
types of psychological assessments |
|
Definition
performance tests
observational assessments/behavioral checklists
self-report inventories
best to use a variety bc they each have flaws |
|
|
Term
|
Definition
Objective: assess maximal performance (perfect environment, notice to prepare)
e.g. content knowledge (final), abilities (musical tonality), skill proficiency (scale formations) |
|
|
Term
Observational Assessments/ Behavioral Checklists |
|
Definition
Objective: Assess typical performance
e.g. Interviews, job performance, situational tasks (social skills) |
|
|
Term
|
Definition
Objective: measure latent constructs
e.g. attitudes, beliefs, values, mental states
may have limited face validity |
|
|
Term
Assessment Challenges (6) |
|
Definition
- No single approach to measurement
- limited samples available
- all measures subject to error
- lack of well-defined measurement scales
- operational measures must have demonstrated relationships
- must have individual differences and stability to measure
|
|
|
Term
|
Definition
-principles that estimate the extent to which error influences measure and methods devised to minimize these problems |
|
|
Term
Sources of score variability |
|
Definition
- lasting general characteristics of ind (ability, good test-taking)
- lasting specific characteristics of ind (specific test form, knowing specific items)
- temporary gen characteristics of ind (health, fatigue)
- temp specific characteristics of the ind (systematic or chance factors affecting administration or scoring)
- variance not otherwise accounted for (chance)
|
|
|
Term
Classical Test Theory (CTT) |
|
Definition
X=T+e or e=X-T: X=test score, T=true score, e=error
Goal: to estimate and reduce error in tests. Must understand sources of variability to prevent error |
|
|
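A minimal sketch of X = T + e, assuming a single test-taker with a fixed true score who could be tested many times. The function name, the true score of 75, and the normally distributed error with SD 5 are all hypothetical values chosen for illustration; they are not from the notes.

```python
import random
import statistics

def simulate_observed_scores(true_score, error_sd, n_administrations, seed=0):
    """Return hypothetical observed scores X = T + e over repeated administrations."""
    rng = random.Random(seed)  # seeded so the illustration is reproducible
    return [true_score + rng.gauss(0.0, error_sd) for _ in range(n_administrations)]

true_score = 75.0  # T (assumed known here only because this is a simulation)
observed = simulate_observed_scores(true_score, error_sd=5.0, n_administrations=10_000)
errors = [x - true_score for x in observed]  # e = X - T

# Any single X can be well off from T, but the average X lands close to T
# because the simulated errors average out near zero.
print("mean observed score:", round(statistics.mean(observed), 2))
print("mean error:", round(statistics.mean(errors), 2))
```

In real testing T is never observed directly, which is why CTT works with estimates of error rather than computing e = X - T for a person.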
Term
CTT: what is measurement error |
|
Definition
Score = real score and measurement error
measurement error = systematic error and random error |
|
|
Term
systematic vs. random error |
|
Definition
-systematic: predictable, measurable error (focus of most psychological research)
e.g. poorly designed test or bad item
-random: unpredictable & immeasurable, but can be estimated
e.g. statistical anomalies, irrelevant error sources |
|
|
Term
Assumptions of estimating random error in CTT |
|
Definition
- mean error of measurement = 0
- true score and errors are uncorrelated
- errors on different measures are uncorrelated
|
|
|
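The three assumptions can be checked on simulated data. The sketch below generates scores that satisfy them by construction, so it only illustrates what each assumption says; it does not show that real test data behave this way. The sample size, means, and SDs are hypothetical, and `pearson` is a small helper written for this example.

```python
import random
import statistics

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

rng = random.Random(42)
n_people = 5000
true_scores = [rng.gauss(100, 15) for _ in range(n_people)]  # T for each person
errors_a = [rng.gauss(0, 5) for _ in range(n_people)]        # e on measure A
errors_b = [rng.gauss(0, 5) for _ in range(n_people)]        # e on measure B

mean_error = statistics.fmean(errors_a)          # assumption 1: close to 0
r_true_error = pearson(true_scores, errors_a)    # assumption 2: close to 0
r_error_error = pearson(errors_a, errors_b)      # assumption 3: close to 0

print(round(mean_error, 3), round(r_true_error, 3), round(r_error_error, 3))
```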
Term
|
Definition
-conditions of person, test, & environment that randomly & proportionally affect all test takers/scores
CTT assumes person and indicator invariance (test happens in a vacuum) |
|
|
Term
|
Definition
- Guessing vs. Blind Guessing
- Reliability in Speed vs. power tests
- reliability in multi-scale tests
|
|
|
Term
Guessing vs. Blind guessing |
|
Definition
(Abbott's formula)
-assumes that probability of correct guess is random
-assumes all individuals have the same relative luck
-effect of guessing influenced by # of items; "answer all" approach |
|
|
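A sketch of the conventional correction-for-guessing formula, corrected = right - wrong / (k - 1), where k is the number of answer options per item. It is assumed here that this is the formula the card refers to; the function name and the example numbers are hypothetical.

```python
def corrected_score(num_right, num_wrong, options_per_item):
    """Correction for blind guessing: right - wrong / (k - 1).

    Omitted items count as neither right nor wrong. The formula assumes
    every wrong answer came from a blind guess among k equally likely options.
    """
    if options_per_item < 2:
        raise ValueError("items need at least two answer options")
    return num_right - num_wrong / (options_per_item - 1)

# 40 four-option items: 30 right, 6 wrong, 4 omitted.
print(corrected_score(30, 6, 4))

# A pure blind guesser who answers all 40 items is expected to get about
# 10 right and 30 wrong, which the correction brings down to zero.
print(corrected_score(10, 30, 4))
```

The second call shows why the "answer all" approach matters: without the correction, blind guessing alone would earn about a quarter of the available points on four-option items.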
Term
Reliability in speed vs power tests |
|
Definition
-"speed": more than just a time limit, time limit and pacing function differently
-time limits interact with individual test-taker characteristics (not person invariant)
-don't know if a low score means they didn't know answers or didn't work quickly enough |
|
|
Term
Reliability in multi-scale tests |
|
Definition
-internal consistency will vary depending on item selection
-construction affects error |
|
|
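To illustrate how internal consistency depends on item selection, the sketch below uses Cronbach's alpha, a common internal-consistency index. The card does not name a specific index, so the choice of alpha is an assumption, and the response data are made up for the example.

```python
import statistics

def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of per-person lists (one score per item)."""
    n_items = len(item_scores[0])
    item_variances = [
        statistics.variance([person[i] for person in item_scores])
        for i in range(n_items)
    ]
    total_variance = statistics.variance([sum(person) for person in item_scores])
    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)

# Hypothetical responses: items 1-3 rank people identically, item 4 does not.
all_items = [
    [1, 1, 1, 5],
    [2, 2, 2, 1],
    [3, 3, 3, 4],
    [4, 4, 4, 2],
    [5, 5, 5, 3],
]
consistent_items = [person[:3] for person in all_items]  # drop item 4

alpha_all = cronbach_alpha(all_items)
alpha_consistent = cronbach_alpha(consistent_items)
print(round(alpha_all, 3), round(alpha_consistent, 3))
```

Dropping the one item that does not track the others raises alpha, which is the sense in which scale construction affects the estimated error.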
Term
|
Definition
-when used to estimate an individual's "true score" CTT is susceptible to test bias, decision errors, discrimination, & adverse impact
-relationships DO exist between test-takers and test-items (persons are NOT invariant) |
|
|
Term
|
Definition
-New direction in response to CTT
-paradigm for test design and interpretation based on statistical modeling of the characteristic relationship between the test-taker and item probability (e.g. difficulty, response likelihood)
|
|
|
Term
Item Response Theory example |
|
Definition
Item response functioning estimates item endorsement probability (differential item functioning for group comparisons)
item characteristic curve (ICC): typifies the relationship between an item and response across groups |
|
|
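One common way to draw an item characteristic curve is a logistic function of the test-taker's trait level. The sketch below assumes a two-parameter logistic (2PL) model; the card does not say which model is meant, and the parameter names and values are hypothetical.

```python
import math

def icc_2pl(theta, difficulty, discrimination=1.0):
    """Probability of a correct/endorsed response under an assumed 2PL model.

    theta: test-taker's trait level
    difficulty: trait level at which the probability is 0.5
    discrimination: how steeply the probability rises around the difficulty
    """
    return 1.0 / (1.0 + math.exp(-discrimination * (theta - difficulty)))

# Trace the curve for one hypothetical item across a range of trait levels.
for theta in (-2.0, -1.0, 0.0, 1.0, 2.0):
    probability = icc_2pl(theta, difficulty=0.0, discrimination=1.5)
    print(f"theta={theta:+.1f}  P(response)={probability:.3f}")
```

Comparing curves fitted separately for two groups on the same item is one way to look for differential item functioning: if the curves differ at the same trait level, the item behaves differently across groups.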