Term
Base Rate |
Definition
proportion of people in population who can successfully do the job |
|
|
Term
Selection Ratio (SR) |
Definition
proportion of persons hired/admitted (SR= #selected / #applicants) |
|
|
Term
The accuracy of selection decisions based on test scores depends on 3 factors: |
|
Definition
Base Rate, Selection Ratio, Test Validity |
|
|
Term
There are four possible outcomes of every selection decision. These can be arranged in a table. |
|
Definition
True Positive (TP) False Positive (FP) False Negative (FN) True Negative (TN) |
|
|
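A minimal sketch (not from the source deck) of how the four outcomes combine into the quantities defined above; the counts are made-up illustration values.

```python
# Four selection outcomes arranged as a 2x2 table (illustrative counts only).
tp, fp = 40, 10   # selected & successful, selected & unsuccessful
fn, tn = 20, 30   # rejected & would have succeeded, rejected & would have failed

n = tp + fp + fn + tn
base_rate = (tp + fn) / n          # proportion of people who can do the job
selection_ratio = (tp + fp) / n    # proportion of applicants selected
hit_rate = (tp + tn) / n           # proportion of correct decisions
success_rate_among_hired = tp / (tp + fp)

print(base_rate, selection_ratio, hit_rate, success_rate_among_hired)
```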
Term
When the base rate is high |
|
Definition
When the base rate is high, there will be more TP and FN. |
|
|
Term
When the base rate is low |
|
Definition
When the base rate is low, there will be more TN and FP. |
|
Term
Taylor-Russell Tables |
Definition
--The relationship of validity coefficients to the practical effectiveness of tests in selection.
--Taylor and Russell catalogued expected hit rates (TP) for different base rates, selection ratios, and validity coefficients.
--These tables are based on the premise that an organization wants to maximize True Positive decisions; other outcomes are not considered.
--The tables assume bivariate normality, which can be violated if a test has floor or ceiling effects. Unless this violation is severe, the tables are reasonably accurate. |
|
|
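A hedged Monte Carlo sketch of the Taylor-Russell idea under the bivariate-normality premise described above; the validity, base rate, and selection ratio values are assumptions for illustration, not entries from the published tables.

```python
# Estimate the proportion of successful employees among those hired,
# given a validity coefficient, base rate, and selection ratio.
import numpy as np

rng = np.random.default_rng(0)
validity, base_rate, selection_ratio = 0.40, 0.50, 0.20  # illustrative values
n = 200_000

# Bivariate-normal test score (x) and job performance (y), correlated at the validity level
x = rng.standard_normal(n)
y = validity * x + np.sqrt(1 - validity**2) * rng.standard_normal(n)

hired = x >= np.quantile(x, 1 - selection_ratio)    # top of the score distribution
successful = y >= np.quantile(y, 1 - base_rate)     # performance above the success cutoff

print("TP rate among hired:", successful[hired].mean())  # rises as validity increases
```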
Term
Test reliability can be increased in two ways: |
|
Definition
--by adding items that correlate positively with the other items --by removing items that are problematic (too wordy, tricky/confusing, too hard/easy) |
|
|
Term
Multiple-choice items: 1. Body of a multiple-choice item 2. The choices that follow 3. Incorrect response options |
|
Definition
1. Stem 2. Response Options 3. Distractors |
|
|
Term
2 possibilities for scoring multiple-choice items |
|
Definition
If response options can be ordered to reflect different degrees of correctness, a 3 might be assigned to the right answer, 2 for the next best, 1 for the next, and 0 for a completely incorrect answer. This scheme awards points for partial knowledge (polytomous scoring).
On the other hand, it is more common to simply assign 1 for choosing the correct response and 0 for all other responses (dichotomous scoring). |
|
|
Term
According to Murphy, a "perfect test item" has two characteristics |
|
Definition
--all people who know the answer will choose the correct response --those who do not know the answer will choose randomly among the distractors, which implies that some respondents will guess correctly and that each incorrect response option will be equally popular |
|
|
Term
Distractors that are rarely chosen |
|
Definition
decrease the difficulty of an item |
|
|
Term
Test items may be scored DICHOTOMOUSLY |
|
Definition
**two possible scores for each item EX. math items have a right and a wrong answer --assign 0 if the wrong answer is chosen --assign 1 if the right answer is chosen |
|
|
Term
Survey and multiple choice items may be scored POLYTOMOUSLY |
|
Definition
**3 or more possible scores per item EX. attitude surveys do not have right and wrong answers (e.g., -2 = Strongly Disagree, -1 = Disagree, etc.) |
|
|
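A small sketch contrasting dichotomous and polytomous scoring; the item keys, partial-credit weights, and Likert weights are made up for illustration.

```python
# Dichotomous scoring: 1 for the keyed answer, 0 otherwise.
answer_key = {"q1": "B", "q2": "D"}
# Polytomous scoring: points reflect degrees of correctness or attitude strength.
partial_credit = {"q3": {"A": 3, "C": 2, "D": 1}}
likert_weights = {"SD": -2, "D": -1, "N": 0, "A": 1, "SA": 2}

responses = {"q1": "B", "q2": "A", "q3": "C", "q4": "SA"}

score = 0
score += sum(int(responses[q] == key) for q, key in answer_key.items())
score += sum(partial_credit[q].get(responses[q], 0) for q in partial_credit)
score += likert_weights[responses["q4"]]
print(score)  # 1 + 0 + 2 + 2 = 5
```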
Term
Traditional methods of item analysis |
|
Definition
judge the quality of items with respect to the intended sample of test takers. Two psychometric properties are important:
--how difficult/easy the item is for the target group of examinees
--how well the item discriminates among persons having different levels of ability
Good test items are of moderate difficulty and discriminate well among examinees. |
|
|
Term
To examine the difficulty and discriminating power of items, we often consider 3 basic statistics: |
|
Definition
** P-values ** Item-total correlations ** Inter-item correlations |
|
|
Term
P-values |
Definition
the proportion of persons correctly answering or endorsing an item. A high p-value means the item is easy (too high if > .9). |
|
|
Term
Item-total correlations |
Definition
correlation of responses to individual test items with the total test score (items with correlations greater than .3 are ones you would want to keep!) |
|
|
Term
Inter-item correlations |
Definition
(the correlation of items with each other) --in general, a reliable test can be created by adding items that correlate positively with each other, even if the correlations are small (e.g., 0.2) --inter-item correlations are LARGE when test content is homogeneous and SMALL when it is heterogeneous |
|
|
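A sketch of the three item-analysis statistics (plus coefficient alpha) computed on a toy response matrix; the 0/1 data are invented for illustration.

```python
# Item analysis: p-values, corrected item-total correlations,
# inter-item correlations, and coefficient alpha.
import numpy as np

X = np.array([  # rows = examinees, columns = items, 1 = correct/endorsed
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
], dtype=float)

p_values = X.mean(axis=0)            # proportion answering each item correctly
total = X.sum(axis=1)

# Corrected item-total correlation: correlate each item with the total of the *other* items
item_total = [np.corrcoef(X[:, j], total - X[:, j])[0, 1] for j in range(X.shape[1])]

inter_item = np.corrcoef(X, rowvar=False)   # inter-item correlation matrix

k = X.shape[1]                               # coefficient alpha (assumes homogeneity)
alpha = k / (k - 1) * (1 - X.var(axis=0, ddof=1).sum() / total.var(ddof=1))

print(p_values, item_total, alpha)
```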
Term
Test reliability is influenced |
|
Definition
by the variance of total test scores |
|
|
Term
One way to increase test score variance is to |
|
Definition
select items having p-values near 0.5 |
|
|
Term
Items considered bad or (not useful for the target sample) have the following general properties: |
|
Definition
*p-values less than 0.1 or greater than 0.9 *negative or very low (<.1) inter-item correlations *negative or low item-total correlations (<0.3) |
|
|
Term
P-values less than 0.1 or greater than 0.9 |
|
Definition
Items having p-values less than 0.1 or greater than 0.9 contribute little to test variance. They don't differentiate among examinees of high and low ability, so they can be dropped without loss of measurement precision. |
|
|
Term
Negative or very low (<.1) inter-item correlations |
|
Definition
Negative or very low (<.1) inter-item correlations suggest that the test is measuring more than 1 construct. Removing items having negative or very low inter-item correlations will increase internal consistency reliability (recall, coefficient alpha assumes homogeneity) |
|
|
Term
Negative or low item-total correlations (<.3) |
|
Definition
Items having negative correlations with the total test score must be dropped: people who did well on the test did poorly on the item, indicating a possible problem with its content. Items having low item-total correlations contribute little to measurement precision and are also candidates for revision or removal. |
|
|
Term
When test content is heterogeneous (broad) |
Definition
inter-item correlations will be small (hetero) |
|
|
Term
When test content is homogeneous (narrow) |
Definition
inter-item correlations will be larger (homo)
*if a test is broad, more items are needed to achieve acceptable levels of reliability (.7 or more) |
|
|
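A sketch of the Spearman-Brown prophecy formula, one common way to estimate how much longer a test must become to reach an acceptable reliability such as .70; the starting reliability of .50 is an assumed example.

```python
# Spearman-Brown: reliability of a test lengthened by a given factor,
# and the lengthening factor needed to reach a target reliability.
def lengthened_reliability(r, factor):
    return factor * r / (1 + (factor - 1) * r)

def lengthening_needed(r_current, r_target=0.70):
    return r_target * (1 - r_current) / (r_current * (1 - r_target))

print(lengthened_reliability(0.50, 2))   # doubling a .50 test -> ~.67
print(lengthening_needed(0.50))          # ~2.33x as many items to reach .70
```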
Term
Item Response Theory (IRT) |
|
Definition
*a relatively new and powerful methodology for examining the properties of test items
*items can be compared using parameters that reflect difficulty, discrimination, and the effects of guessing
*the Greek letter theta is used to represent an examinee's trait level (score): ability, skill, or standing on the construct measured by the test. Scores are on a standard-normal metric, ranging from about -3 to +3
*the quality of items is examined using item response functions (IRFs), which graphically illustrate the relationship between trait level and the probability of a correct response (plotted, these functions are S-shaped) |
|
|
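A sketch of a three-parameter logistic (3PL) item response function, the kind of S-shaped curve described above; the a, b, and c parameter values are made-up examples.

```python
# Probability of a correct response as a function of trait level theta.
import numpy as np

def irf(theta, a, b, c):
    """a = discrimination, b = difficulty, c = guessing (lower asymptote)."""
    return c + (1 - c) / (1 + np.exp(-a * (theta - b)))

theta = np.linspace(-3, 3, 7)            # trait levels on the standard-normal metric
print(irf(theta, a=1.2, b=0.0, c=0.20))  # S-shaped curve rising from ~.20 toward 1
```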
Term
Computerized adaptive testing (applications of IRT methods) |
|
Definition
*create tests tailored to examinee ability *administer only items that provide high information about the examinee, thus reducing the error in a person's score *adaptive tests require only about half as many items as nonadaptive tests to obtain a similar level of accuracy |
|
|
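A sketch of the item-selection step in adaptive testing: administer the item with the most Fisher information at the current ability estimate. The 2PL item parameters and the ability estimate are assumptions for illustration.

```python
# Pick the next item by maximum information at the current theta estimate.
import numpy as np

items = [  # (discrimination a, difficulty b)
    (0.8, -1.5), (1.2, -0.5), (1.5, 0.0), (1.0, 0.8), (1.8, 1.5),
]

def info(theta, a, b):
    p = 1 / (1 + np.exp(-a * (theta - b)))
    return a**2 * p * (1 - p)            # Fisher information for a 2PL item

theta_hat = 0.3                          # current ability estimate
next_item = max(range(len(items)), key=lambda j: info(theta_hat, *items[j]))
print("administer item", next_item)
```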
Term
Detecting biased items (applications of IRT methods) |
|
Definition
*by comparing IRFs across groups, one can determine whether a test item exhibits psychometric bias (a.k.a. differential item functioning, DIF) *an item is said to be biased if its IRFs differ across groups of examinees (e.g., men and women) after a process called "linking" |
|
|
Term
|
Definition
tailored examinations (can be applied to groups or to individuals) |
|
|
Term
|
Definition
*Selecting item types *Item Writing *Item Content *Item response alternatives (response format) |
|
|
Term
|
Definition
*constructed response (short answer/essay): demonstrations of skill *low-fidelity simulation (describe how something should be produced) *high-fidelity simulation (actually develop the product or do the task) |
|
|
Term
|
Definition
*the first step in test construction is to generate a pool of items *generally need 2-3 times as many items in the pool as you desire in the final version of the test *items will be selected based on both content and psychometric properties |
|
|
Term
Guidelines for item writing |
|
Definition
AVOID *Long items *Double negatives *Double-barreled statements (mixing different concepts; do not ask two things in one item) *Sexist, racist, or offensive language *Slang that may go out of date quickly *Big, complicated, or esoteric words (EX. the word HOT can have different meanings) DO select an appropriate reading level for the target group (e.g., 5th grade) |
|
|
Term
Item content (approaches to scale development) |
Definition
*generally there are two approaches to scale development: rational and empirical *often a hybrid (mix) of these two approaches is used; call it the rational-empirical method |
|
|
Term
Rational Scales (item content) |
|
Definition
*create items based on a theory of behavior; some underlying thought, belief, or rationale is used as the basis for selecting items. Items are chosen on theoretical grounds. *Advantage- can use theory to make predictions about behavior; good face validity *Disadvantage- items tend to be transparent (i.e., it is clear what they are measuring), so responses are subject to conscious (faking) or unconscious (self-deception) distortion |
|
|
Term
Empirical Scales (item content) |
Definition
*generate a broad range of items, not tied to any theory *compute the correlation between item responses and some criterion variable *select and retain items that predict well (i.e., have the highest correlation with the external criterion) and those that differentiate among members of different groups; for example, select items that best differentiate between schizophrenics and "normal" individuals *items are scored by empirical keying (aka criterion keying) *Advantage- ---------.....--------- *Disadvantage- lower face validity |
|
|
Term
Item Response Alternatives (response format) EXAMPLES |
|
Definition
-the response format refers to the manner in which responses will be collected from the examinees. EX: True-False, Multiple Choice, Free Response, Auditory Response, Likert Type, Forced Choice -MC is popular because it can be scored objectively, but it is difficult to write good distractors -Free Response yields rich information, but requires subjective judgment to score, so one must often examine inter-rater agreement |
|
|
Term
Self-report measures and response sets |
Definition
-Psychologists frequently use self-report measures; *some questions are perceived as too invasive or personal *sometimes persons are concerned about confidentiality, so they consciously distort their responses (fake good, fake bad, respond randomly) -Test developers try to control these effects, which are called RESPONSE SETS; *use scales designed to detect unusual responses *use warnings that unusual responses can be detected and that verifiable information will be examined for accuracy |
|
|
Term
Examples of response sets |
|
Definition
Social Desirability: the idea that persons tend to answer in ways that present themselves in the best light (fake good) or worst light (fake bad), rather than answer honestly -Intentionally distorting one's responses is known as FAKING or DISSIMULATION -Faking is a big issue in noncognitive assessment (personality, biographical data, worker diaries, etc.) -There is sharp disagreement about the ramifications of faking -Can you correct for faking after a measure has been administered? (research suggests no) -Can you prevent faking by strategic construction of items or tests? (maybe) |
|
|
Term
Random Responding.... How can you try to detect it? |
|
Definition
occurs when examinees fail to attend to the content of items because they are unmotivated, in a hurry, or unwilling to cooperate -Try to detect it by using scales containing a mix of negatively and positively worded items and by applying mathematical models |
|
|
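A sketch of one simple consistency check based on mixing positively and negatively worded items; the item pairings, scale range, and cutoff are assumptions for illustration, not a method specified in the source.

```python
# Flag possible random/careless responding: after reverse-scoring the negatively
# worded item of each pair, an attentive respondent's two answers should agree.
MAX_SCORE = 5                            # 1-5 Likert scale
pairs = [(0, 1), (2, 3)]                 # (positively worded, negatively worded) item indices

def inconsistency(responses):
    diffs = [abs(responses[p] - (MAX_SCORE + 1 - responses[n])) for p, n in pairs]
    return sum(diffs) / len(diffs)       # 0 = perfectly consistent

careful = [5, 1, 4, 2]                   # agrees with positive, disagrees with negative
careless = [5, 5, 1, 1]                  # agrees with both a statement and its opposite
print(inconsistency(careful), inconsistency(careless))  # flag large values, e.g. > 2
```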
Term
|
Definition
*a tendency to answer in a certain way; a characteristic you bring to the test -Acquiescence: tendency to agree with items without attending to their content -Criticalness: tendency to disagree with items without attending to their content -Dealing with response styles: response styles are often elicited by items that are ambiguous or confusing. Try to detect them by using negatively and positively worded items and perhaps by including statements that would be clearly false or true; EX, it would be odd if a person agreed with the statement, "I've never drunk water." |
|
|
Term
|
Definition
once you do something bad, that's it. |
|
|
Term
|
Definition
if the rater likes you, they will score you high on everything |
|
|
Term
Normative Scales |
Definition
-allow for inter-individual (between person) comparisons -compare each person's score to those of a normative group -give indication of amount or level of trait exhibited **can compare scores across people |
|
|
Term
Ipsative Scales |
Definition
allow only for intra-individual (within person) comparisons -use a forced-choice format (paired comparison) where examinee must express a preference between two alternatives (think about the Carrots or Broccoli example) -With forced choice like the carrots and broccoli example, you don't know how strong the liking of any vegetable is. -Thus this test's scores cannot be used for inter-individual comparison as in job selection. This is the "challenge" for developing "fake-resistant" personality tests. |
|
|
Term
Normative Scales vs Ipsative Scales |
|
Definition
Normative -can be used for inter-individual comparisons -provide information about absolute standing on trait(s) assessed Ipsative -can only be used for intra-individual comparisons -provide information about relative standing on traits assessed |
|
|
Term
Norming psychological tests |
|
Definition
must choose samples that represent the target population: *good comparison groups provide a "representative" sample (demographic characteristics) *typically have several norm groups for each test; local norms preferred |
|
|
Term
Steps in developing norms |
|
Definition
-Defining the target population: *decide on the composition of the normative group based on the intended use of the test EX, LSAT, MCAT, ACT -Selecting the sample: *obtain samples that are a cross-section of the population *regional samples (rural/urban) *geographical -Standardization: *administer the test the same way to all individuals *standardization decreases error variance by keeping conditions uniform across administrations *use anchor items to equate scores from different test forms |
|
|
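A sketch of scoring against a norm group: converting a raw score to a z-score and a percentile rank relative to a normative sample; the norm data are made up for illustration.

```python
# Compare an individual's raw score to a normative group.
import numpy as np

norm_group = np.array([12, 15, 18, 20, 21, 22, 24, 25, 27, 30])  # raw scores
raw = 24

z = (raw - norm_group.mean()) / norm_group.std(ddof=1)
percentile = (norm_group < raw).mean() * 100   # percent of norm group scoring below

print(z, percentile)
```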
Term
Test publication and revision (WRITING THE MANUAL) |
|
Definition
*state the purpose of the test and directions for administration and scoring, and describe test development and validity evidence EX, describe validation samples, reliability, convergent and discriminant validity with other measures *the manual must be revised with each new form or amendment |
|
|
Term
Test publication and revision (REVISING THE TEST) |
|
Definition
should be revised when: *language is outdated *security is compromised *content has been disclosed *there are changes to content, format, medium of administration, or scoring |
|
|
Term
|
Definition
every 5 years or so you have to go back and update your manual because it's out of date *some tests are not as urgent to change/update as others. EX, a personality test is pretty stable |
|
|