Term
assessment: definition and purpose |
|
Definition
any of a variety of ways to look at performance; the underlying question is "how well does the individual perform?" |
|
|
Term
test: definition and purpose |
|
Definition
an instrument OR systematic procedure with uniform questions used to sample behavior ("how well does the individual perform?"); can have an NRT or CRT framework |
|
|
Term
continuum |
Definition
a line that represents a construct; ex: knowledge of chemical properties of bases |
|
|
Term
content standards v. performance standards |
|
Definition
"what do students need to know?" v. "how good is good enough?" (judgement! - cut score) |
|
|
Term
4 parts of assessment procedure |
|
Definition
Establish the assessment's: nature (max/typical), form (MC/constructed response/performance), use (placement/formative/summative/diagnostic), and method of interpretation (CRT v. NRT) |
|
|
Term
nature of assessment (max vs. typical performance) |
Definition
this is one part of the assessment procedure. Are you looking for "maximum/can do" or "typical/will do" performance? implied assessment types: achievement tests vs. surveys/observations |
|
|
Term
illustrative assessments for measuring "max performance" vs. "typical performance" |
|
Definition
max -> achievement or aptitude tests; typical -> attitude surveys, observations. These categories are examples of deciding the "nature" of the assessment |
|
|
Term
form of assessment |
Definition
one part of the assessment procedure. Forms include: MC, constructed response, performance task |
|
|
Term
use of assessment |
Definition
one part of the assessment procedure. Uses include: placement, formative, diagnostic, summative |
|
|
Term
compare assessment types: placement, formative, diagnostic, summative (see p. 41 table 2.1) |
|
Definition
placement and summative are higher stakes. formative is FYI, correction & reinforcement. diagnostic determines causes of struggle. placement can be just for goals/modality |
|
|
Term
questions we are asking with assessment |
|
Definition
what do students know? what are they able to do? what do they need to do next? |
|
|
Term
methods of assessment (hint) |
|
Definition
hint: methods of interpreting: CRT vs. NRT |
|
|
Term
CRT (criterion-referenced test) |
Definition
Criterion referenced test - no details yet |
|
|
Term
NRT (norm-referenced test) |
Definition
norm-referenced test. No details yet |
|
|
Term
How do test scores become meaningful? |
|
Definition
this is an essential question. answer should address all aspects of validity (how many are there?) and reliability (specify the variety of types that might be of interest). |
|
|
Term
How can we use tests to improve education and society? |
|
Definition
another essential question from lecture 2. answer should have lots of hedged recommendations. |
|
|
Term
validity: definition & types |
|
Definition
the degree to which an assessment instrument or procedure can be used/interpreted for a specific purpose (context dependent). An assessment should: (1) cover the content it purports to test, (2) correlate with specified, appropriate criteria, (3) generate results consistent with the implications of stated constructs (difficulty of items; Bloom's taxonomy), (4) have consequences that are fair and appropriate. Validity determinations are largely a matter of judgment. |
|
|
Term
reliability (table 5.1): definition, types, & methods |
|
Definition
the degree of consistency of the outcomes of an assessment. 5 diff. measures of reliability, + method(s) for each. One might measure (1) stability across time [using test-retest], (2) equivalence [using equivalent forms], (3) BOTH [using equivalent forms over a time interval], (4) internal consistency [using split-half, KR20/KR21, or Cronbach's alpha], or (5) consistency of ratings [using interrater methods].
reliability is reported as statistical coefficients (0-1) |
|
|
Term
validity vs. reliability |
Definition
analogous to accuracy (I got what I wanted) vs. precision (I got the same result consistently). With tests, reliability is necessary but not sufficient for validity. VALID is specific to a particular stated purpose. RELIABLE is specific to a particular "sample" of takers (aka, context, group) |
|
|
Term
content-related validity |
Definition
the degree to which an assessment instrument or procedure covers the content it purports to test. 4 steps to establishing it: (1) identify objectives, (2) know-do blueprint (bloom), (3) make test, (4) judge alignment |
|
|
Term
procedure for attaining content validity |
|
Definition
(1) identify objectives/goals, (2) build table of specs (KNOW content, DO bloom), (3) construct test, (4) panel to evaluate alignment |
|
|
Term
criterion-related validity |
|
Definition
Measure of scores' correlation with an "appropriate" criterion, which may be concurrent (eg, current GPA) OR predictive (eg, future GPA). Although this aspect of validity involves a correlation coefficient, judgment is still required to decide what degree of correlation is good enough. |
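A hypothetical illustration (the numbers are invented, not from the course): if scores on a placement test correlate r = .55 with later course GPA, that correlation is predictive criterion-related evidence; whether .55 counts as "good enough" is exactly the judgment call described above.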
|
|
Term
construct-related validity |
|
Definition
the degree to which an assessment generates results that are consistent with the implications of stated constructs (difficulty of items; Bloom's taxonomy); when you PROPOSE that an item fits a specific construct (eg, this is a comprehension question, so it's easy), then that construct implies the sorts of scores you should get (HIGH). If the evidence (scores) fits that prediction, then your proposed construct interpretation is valid. |
|
|
Term
consequence-related validity |
Definition
the degree to which an assessment instrument or procedure (including interpretation) has consequences that are fair and appropriate. |
|
|
Term
test-retest method |
Definition
measure of stability of test scores over time (one type of reliability) |
|
|
Term
equivalent-forms method |
Definition
measure of stability of test scores from different versions of the test (one type of reliability) |
|
|
Term
split-half method |
Definition
measure of stability of test scores from halves of items within a single test. Requires use of the Spearman-Brown formula |
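The Spearman-Brown correction is not written out in these notes, so the standard textbook form is supplied here as a reference: estimated full-test reliability r_full = (2 * r_half) / (1 + r_half), where r_half is the correlation between scores on the two halves. Example with made-up numbers: a half-test correlation of .60 gives 2(.60)/(1 + .60) = .75 for the whole test.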
|
|
Term
KR20, KR21, & Cronbach's alpha |
Definition
KR20, KR21, and Cronbach's alpha coefficient are calculations that measure the internal consistency of a single test, which is one measure of reliability |
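The formulas are not shown in these notes; the standard forms, supplied here as a reference, are: Cronbach's alpha = (k/(k-1)) * (1 - sum of item variances / variance of total scores); KR20 is the same expression with each item variance written as p*q for right/wrong items; KR21 approximates KR20 from summary statistics only: (k/(k-1)) * (1 - M(k - M)/(k * s^2)), where k = number of items, M = mean total score, and s^2 = variance of total scores.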
|
|
Term
interrater methods |
Definition
ways to measure the stability of scores when the same test is scored by different raters. This is one type of reliability. |
|
|
Term
ways that tests may be consistent or not, aka, sources of variation (table 5.4) |
|
Definition
1. testing procedure (use any method except interrater) 2. student "characteristics"/response (use a method with a time interval) 3. sample of items (use eq. forms or internal consistency) 4. judgmental scoring (use interrater methods) |
|
|
Term
consistency in testing procedure |
|
Definition
part of reliability; inconsistency will be detected by all methods of reliability estimation EXCEPT interrater |
|
|
Term
consistency in student characteristics (how kids respond to test) |
|
Definition
part of reliability; inconsistency will be detected by any time interval method, and to a less useful extent by test-retest |
|
|
Term
consistency over diff. samples of items |
|
Definition
part of reliability; inconsistency will be detected by equivalent-forms OR internal consistency methods |
|
|
Term
internal consistency |
Definition
one aspect of reliability; it can be measured by the split-half method (which requires the Spearman-Brown formula), or by KR20, KR21, and Cronbach's alpha coefficient (remember "generalizability theory" too?) |
|
|
Term
SEM (standard error of measurement) |
Definition
Standard error of measurement (need to know formula?). To get a range of likely values for a student's "true" score, add a "confidence band" of +/- 1 SEM around every score for a category/domain.
SEM is determined by the standard deviation of the test scores AND the test's reliability coefficient (table 5.6) |
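The usual textbook formula (supplied here since the notes only point to table 5.6): SEM = SD * sqrt(1 - r), where SD is the standard deviation of the test scores and r is the reliability coefficient. Worked example with invented numbers: SD = 10 and r = .91 give SEM = 10 * sqrt(.09) = 3, so an observed score of 52 gets a +/- 1 SEM band of roughly 49 to 55.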
|
|
Term
table of specifications |
Definition
2nd step of the process for establishing content validity. The table can separate out both content and construct (Bloom's taxonomy). In the 4th step, items from the test are placed into "cells" in the spec table to see if they are distributed in the way you intended |
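A hypothetical miniature table of specs (content areas and item counts are invented for illustration):
                      remember   apply   analyze   total
  acids & bases           3        2        1        6
  reaction rates          2        3        1        6
  total                   5        5        2       12
Each item is assigned to one cell; step 4 checks whether the observed counts match the intended distribution.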
|
|
Term
reliability coefficient |
Definition
A correlation coefficient that relates to the reliability of a test (eg, correlation between 2 forms of test, or test-retest, or between odd & even items, etc). Corr. coefficients range between +/- 1, but reliability can only go as low as zero. A neg. corr. coeff is stated as "zero" reliability. |
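A minimal sketch of how such a coefficient could be computed, assuming hypothetical scores from two forms of a test (the data and variable names are invented for illustration):

  # reliability estimated as the Pearson correlation between two forms
  from statistics import correlation  # available in Python 3.10+

  form_a = [12, 15, 9, 20, 17, 11]   # hypothetical scores, form A
  form_b = [13, 14, 10, 19, 18, 12]  # same students, form B

  r = correlation(form_a, form_b)    # Pearson r, can range from -1 to +1
  reliability = max(r, 0.0)          # a negative r is reported as zero reliability
  print(round(reliability, 2))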
|
|