Term
Item Selection |
|
Definition
Selecting items involves considering each item's relevance, its difficulty level, and its ability to discriminate between examinees with different levels of the characteristic being studied. |
|
|
Term
Content Appropriateness |
|
Definition
Does the item assess the content of the behavioral domain the test is designed to evaluate? |
|
|
Term
Taxonomic Level |
|
Definition
Does it reflect the appropriate cognitive or ability level? |
|
|
Term
Extraneous Abilities |
|
Definition
Does it require knowledge, skills, or abilities outside of the domain of interest? |
|
|
Term
Item Discrimination Index (D) |
|
Definition
- The extent to which a test item distinguishes between high and low scorers on the whole test. - Ranges from -1.0 to +1.0: +1.0 if all examinees in the upper group and none in the lower group answer the item correctly; -1.0 if the opposite. - D = .35 or higher is considered acceptable. - Items of moderate difficulty have the greatest potential for discrimination. (A worked example follows this card.) |
|
|
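A minimal Python sketch of how D is computed; the item responses and group assignments below are hypothetical, invented for illustration:

    import numpy as np

    # Hypothetical item responses (1 = correct, 0 = incorrect) for examinees
    # already sorted into upper and lower groups on the total test score.
    upper = np.array([1, 1, 1, 1, 0])  # top scorers on the whole test
    lower = np.array([1, 0, 0, 0, 0])  # bottom scorers on the whole test

    # D = proportion correct in upper group - proportion correct in lower group
    D = upper.mean() - lower.mean()
    print(D)  # 0.8 - 0.2 = 0.6, above the .35 benchmark, so the item discriminates well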
Term
Item Response Theory (IRT) |
|
Definition
Item characteristics derived from IRT are the same across samples - this makes it possible to equate scores from different sets of items and from different tests, because test scores are reported in terms of level on the trait measured - Analogous to a GPA rather than individual grades in different classes (90% in math means something different than 90% in English) |
|
|
Term
Item Characteristic Curve |
|
Definition
- The difficulty level of an item is indicated by the ability level at which 50% of examinees obtain a correct response. - The item's ability to discriminate between high and low achievers is indicated by the slope of the curve: the steeper the slope, the greater the discrimination. |
|
|
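The card does not name a specific IRT model, but one common way to draw an ICC is the two-parameter logistic function; a minimal sketch with hypothetical parameter values:

    import numpy as np

    def icc(theta, a, b):
        # Two-parameter logistic ICC: probability of a correct response at ability theta.
        # b = difficulty (the theta where P = .50); a = discrimination (slope at b).
        return 1.0 / (1.0 + np.exp(-a * (theta - b)))

    print(icc(0.0, a=1.5, b=0.0))                    # 0.5: theta = b marks the difficulty level
    print(icc(np.array([-1.0, 0.0, 1.0]), a=2.5, b=0.0))  # larger a = steeper curve = sharper discrimination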
Term
Reliability |
|
Definition
How much scores reflect the truth and how much they reflect error - An estimate of the proportion of variability in examinees' obtained scores that is due to true differences among examinees on the attribute measured by the test. - CONSISTENCY = RELIABILITY |
|
|
Term
Reliability Coefficient |
|
Definition
A correlation coefficient ranging from 0 to +1.0, obtained by correlating a test with itself. Ex: .84 indicates that 84% of the variability in scores is due to true score differences among examinees, while 16% is due to measurement error. |
|
|
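A toy arithmetic sketch of the variance interpretation, using hypothetical variance components:

    # r_xx = true-score variance / observed-score variance
    true_var, error_var = 84.0, 16.0       # hypothetical variance components
    r_xx = true_var / (true_var + error_var)
    print(r_xx)  # 0.84 -> 84% true-score variance, 16% measurement error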
Term
Internal Consistency Reliability - Split-Half |
|
Definition
The test is split in two and the scores on the two halves are correlated. Problem: the coefficient is based on only half the length of the test, and reliability decreases as test length decreases - so it usually underestimates the true reliability. |
|
|
Term
Spearman-Brown Prophecy Formula |
|
Definition
Corrects the split-half reliability coefficient by estimating what it would have been if it had been based on the full test length. (A worked example follows this card.) |
|
|
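A minimal Python sketch of the correction, using hypothetical half-test scores:

    import numpy as np

    # Hypothetical scores of six examinees on the two halves of a test.
    half_a = np.array([10, 12, 9, 15, 11, 14])
    half_b = np.array([11, 13, 8, 14, 12, 15])

    r_half = np.corrcoef(half_a, half_b)[0, 1]   # split-half reliability

    # Spearman-Brown correction to full length: r_full = 2r / (1 + r)
    r_full = 2 * r_half / (1 + r_half)
    print(round(r_half, 2), round(r_full, 2))    # the corrected value is always >= r_half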
Term
Cronbach's Coefficient Alpha |
|
Definition
Administer the test one time to a single group. - A formula determines the average degree of inter-item consistency (the average obtained from all possible splits of the test). - When test items are scored dichotomously (right/wrong), use the KUDER-RICHARDSON formula (KR-20). |
|
|
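A minimal Python sketch of coefficient alpha, using a hypothetical examinee-by-item score matrix:

    import numpy as np

    # Hypothetical item scores: rows = examinees, columns = items.
    X = np.array([[3, 4, 3, 5],
                  [2, 2, 3, 2],
                  [4, 5, 4, 5],
                  [1, 2, 1, 2],
                  [3, 3, 4, 4]], dtype=float)

    k = X.shape[1]
    item_vars = X.var(axis=0, ddof=1)       # variance of each item
    total_var = X.sum(axis=1).var(ddof=1)   # variance of examinees' total scores

    # alpha = (k / (k - 1)) * (1 - sum(item variances) / total-score variance)
    alpha = (k / (k - 1)) * (1 - item_vars.sum() / total_var)
    print(round(alpha, 2))
    # With dichotomous (0/1) items, this same computation reduces to KR-20.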
Term
Inter-Rater Reliability |
|
Definition
Calculate a correlation coefficient with the KAPPA statistic - Can determine the % of agreement between 2 raters. - Error sources: lack of rater motivation, rater biases, and the measuring device itself. |
|
|
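A minimal Python sketch of kappa for two raters, using hypothetical dichotomous ratings (this is the two-category case; the general kappa formula extends to more categories):

    import numpy as np

    # Hypothetical pass (1) / fail (0) ratings of ten examinees by two raters.
    rater1 = np.array([1, 1, 0, 1, 0, 1, 1, 0, 0, 1])
    rater2 = np.array([1, 1, 0, 0, 0, 1, 1, 0, 1, 1])

    p_o = (rater1 == rater2).mean()         # observed proportion of agreement
    p1, p2 = rater1.mean(), rater2.mean()
    p_e = p1 * p2 + (1 - p1) * (1 - p2)     # agreement expected by chance alone

    kappa = (p_o - p_e) / (1 - p_e)         # agreement corrected for chance
    print(round(kappa, 2))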
Term
Factors that Affect the Reliability Coefficient |
|
Definition
1. Test Length - the longer the test, the larger the reliability coefficient. 2. Range of Test Scores - the reliability coefficient is maximized when the range of scores is unrestricted. 3. Guessing - as the probability of guessing correctly increases, the reliability coefficient decreases. (A small simulation of the range effect follows this card.) |
|
|
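A small simulation sketch of factor 2 (restriction of range), using hypothetical parallel forms; all parameters, including the 110-point cutoff, are arbitrary:

    import numpy as np

    rng = np.random.default_rng(0)

    # Simulate two parallel forms: observed score = true score + random error.
    true = rng.normal(100, 15, 5000)
    form_a = true + rng.normal(0, 7, 5000)
    form_b = true + rng.normal(0, 7, 5000)

    r_full = np.corrcoef(form_a, form_b)[0, 1]

    # Restrict the range to high scorers only and recompute the correlation.
    keep = form_a > 110
    r_restricted = np.corrcoef(form_a[keep], form_b[keep])[0, 1]
    print(round(r_full, 2), round(r_restricted, 2))  # the restricted r is clearly lower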
Term
Standard Error of Measurement |
|
Definition
An index of the amount of error expected in an individual's obtained score due to the unreliability of the test. * If the raw score is converted to a percentile rank, the confidence interval = percentile band. Ex: polls - 45% +/- 3% (3% is the standard error of measurement). |
|
|
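A minimal sketch of the SEM and the confidence interval built from it, using a hypothetical test SD and reliability coefficient:

    import numpy as np

    sd, r_xx = 15, 0.91   # hypothetical test SD and reliability coefficient

    sem = sd * np.sqrt(1 - r_xx)   # SEM = SD * sqrt(1 - reliability)

    score = 110
    low, high = score - 1.96 * sem, score + 1.96 * sem   # 95% confidence interval
    print(round(sem, 1), (round(low, 1), round(high, 1)))  # 4.5, (101.2, 118.8)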
Term
Validity |
|
Definition
Refers to a test's accuracy. A test is VALID when it measures what it is intended to measure. |
|
|
Term
Content Validity |
|
Definition
Associated with achievement tests that measure knowledge of one or more content domains. Involves clearly identifying the content and then writing or selecting items that represent it. Its establishment relies on the judgment of subject matter experts. |
|
|
Term
Construct Validity |
|
Definition
Measures the hypothetical trait it is intended to measure * Do all items measure the same construct? * Does it accurately distinguish between people who have different levels of the construct? * Do test scores change with manipulation in the direction predicted by theory? |
|
|
Term
Multitrait-multimethod matrix |
|
Definition
Used to systematically organize the data collected when assessing a test's convergent and discriminant validity. |
|
|
Term
Monotrait-monomethod coefficients |
|
Definition
* Same trait, same method. Reliability coefficients indicating the correlation between a measure and itself. |
|
|
Term
Monotrait-heteromethod coefficients |
|
Definition
* Same trait, different methods. Correlations between different measures of the same trait - when large, they indicate convergent validity. |
|
|
Term
Heterotrait-monomethod coefficients |
|
Definition
* Different traits, same method. Correlations between measures of different traits obtained with the same method - when small, they indicate that the test has discriminant validity. |
|
|
Term
Heterotrait-heteromethod coefficients |
|
Definition
* Different traits, different methods. Correlations between different measures of different traits - when small, they indicate discriminant validity. |
|
|
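Putting the four kinds of coefficients together, a hypothetical two-trait, two-method matrix (all values invented for illustration):

    import numpy as np

    # Hypothetical MTMM matrix: two traits (anxiety, depression) x two methods
    # (self-report, clinician rating).
    labels = ["anx/self", "dep/self", "anx/clin", "dep/clin"]
    mtmm = np.array([
        [0.90, 0.30, 0.65, 0.15],
        [0.30, 0.88, 0.20, 0.60],
        [0.65, 0.20, 0.86, 0.25],
        [0.15, 0.60, 0.25, 0.89],
    ])
    # Diagonal (.90, .88, .86, .89): monotrait-monomethod = reliabilities.
    # .65 and .60: monotrait-heteromethod = convergent validity (should be large).
    # .30 and .25: heterotrait-monomethod (should be small).
    # .15 and .20: heterotrait-heteromethod (should be the smallest of all).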
Term
5 Steps of Factor Analysis |
|
Definition
1. Administer several tests to a group. 2. Correlate scores on each test with scores on every other test to obtain a correlation matrix. 3. Using one of several techniques, convert the correlation matrix to a factor matrix. 4. Simplify the interpretation of the factors by rotating them. 5. Interpret and name the factors in the rotated factor matrix. (A minimal sketch of steps 2-3 follows this card.) |
|
|
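A minimal Python sketch of steps 2-3 using random, hypothetical test scores; the rotation of step 4 is omitted for brevity:

    import numpy as np

    # Hypothetical scores of 100 examinees on four tests.
    rng = np.random.default_rng(1)
    scores = rng.normal(size=(100, 4))

    R = np.corrcoef(scores, rowvar=False)    # step 2: correlation matrix

    # Step 3 via eigendecomposition (one of several extraction techniques).
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]                        # largest factors first
    loadings = eigvecs[:, order] * np.sqrt(eigvals[order])   # unrotated factor matrix
    print(loadings[:, :2].round(2))                          # loadings on the first two factors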
Term
Communality |
|
Definition
Indicates common variance - the amount of variability in a test's scores that is due to factors the test shares in common with the other tests; that is, the total amount of variability in test scores explained by the identified factors. |
|
|
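A minimal sketch of the communality computation for a single test, using hypothetical factor loadings:

    import numpy as np

    # Hypothetical loadings of one test on two rotated factors.
    loadings = np.array([0.70, 0.40])

    h2 = (loadings ** 2).sum()   # communality = sum of squared factor loadings
    print(h2)   # 0.49 + 0.16 = 0.65 -> 65% of the test's variance is common variance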
Term
Types of Factor Rotation |
|
Definition
1. Orthogonal - the resulting factors are uncorrelated and independent. 2. Oblique - the resulting factors are correlated and not independent. |
|
|
Term
Criterion-Related Validity |
|
Definition
* Of interest when scores are to be used to draw conclusions about, or predict, how an examinee will likely stand on another measure (the criterion). Assessed by correlating the scores of a sample of individuals on the predictor with their scores on the criterion. |
|
|
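A minimal sketch of how the validity coefficient is obtained, with hypothetical predictor and criterion scores:

    import numpy as np

    # Hypothetical predictor (aptitude test) and criterion (job performance ratings).
    predictor = np.array([55, 62, 48, 70, 66, 59, 74, 51])
    criterion = np.array([3.1, 3.4, 2.8, 4.2, 3.9, 3.0, 4.5, 2.9])

    # The criterion-related validity coefficient is their correlation.
    r_xy = np.corrcoef(predictor, criterion)[0, 1]
    print(round(r_xy, 2))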
Term
Concurrent Validity |
|
Definition
Criterion data are collected prior to, or at about the same time as, data on the predictor, in order to estimate CURRENT status on the criterion. |
|
|
Term
Predictive Validity |
|
Definition
Criterion data are collected some time after the predictor data, in order to predict FUTURE performance on the criterion. |
|
|
Term
Criterion Keying |
|
Definition
Items from the original item pool that are included in the final version of the predictor test are those that correlate most highly with the criterion. |
|
|
Term
Norm-Referenced Interpretation |
|
Definition
Involves comparing an examinee's score to the scores obtained by the people included in a normative sample. The raw score is converted to another score that indicates the examinee's relative standing in the norm group. * Standard scores and percentile ranks. |
|
|
Term
Percentile Rank |
|
Definition
* Expresses an examinee's raw score in terms of the % of examinees in the norm sample who achieved lower scores. * Does not provide information about absolute differences, only order: one score can be said to be higher than another, but not by how much. |
|
|
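A minimal sketch matching this definition (percentage scoring lower), using a hypothetical norm group:

    import numpy as np

    norms = np.array([82, 88, 90, 94, 95, 97, 99, 101, 103, 110])   # norm-group raw scores
    raw = 97

    # Percentile rank: % of the norm sample who achieved lower scores.
    pr = (norms < raw).mean() * 100
    print(pr)   # 50.0 -> the examinee scored higher than 50% of the norm group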
Term
Standard Score |
|
Definition
* When a raw test score is converted to a standard score, the transformed score indicates the examinee's position in the normative sample in terms of standard deviations from the mean. |
|
|
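A minimal sketch of the z-score transformation with hypothetical numbers:

    raw, mean, sd = 65, 50, 10   # hypothetical raw score and norm-group statistics

    z = (raw - mean) / sd   # z = (raw score - mean) / standard deviation
    print(z)   # 1.5 -> the examinee scored 1.5 SDs above the norm-group mean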
Term
Properties of a Z-score distribution |
|
Definition
1. The mean of the z-score distribution is equal to 0. 2. The standard deviation is equal to 1. 3. All raw scores below the mean become negative z-scores; all raw scores above the mean become positive. 4. Unless it is "normalized," the z-score distribution has the same shape as the raw score distribution. |
|
|
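A small sketch verifying properties 1, 2, and 4 on deliberately skewed, hypothetical raw scores:

    import numpy as np

    rng = np.random.default_rng(2)
    raw = rng.exponential(scale=10, size=1000)   # deliberately skewed raw scores

    z = (raw - raw.mean()) / raw.std()
    print(round(z.mean(), 2), round(z.std(), 2))   # ~0 and ~1 (properties 1 and 2)
    # The z distribution keeps the raw scores' skewed shape (property 4);
    # only a separate normalizing transformation would change it.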