Term
Dichotomous/Continuous Variables |
|
Definition
Continuous variables can take any value between two other values, so there are infinitely many possible values in any interval. Dichotomous variables have only two values, such as yes/no. Point-biserial correlation coefficients examine the relationship between a true (naturally occurring) dichotomous variable and a continuous variable. Biserial correlation coefficients examine the relationship between an artificially created dichotomous variable (one made by splitting a continuous variable) and a continuous variable |
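As a rough illustration of the point-biserial coefficient, the sketch below uses the standard textbook formula (difference between group means, divided by the overall standard deviation, times the square root of pq). The data and the names `passed` and `hours` are made up for the example.

```python
from statistics import mean, pstdev

def point_biserial(groups, scores):
    """Point-biserial r between a 0/1 variable and a continuous variable."""
    ones = [s for g, s in zip(groups, scores) if g == 1]
    zeros = [s for g, s in zip(groups, scores) if g == 0]
    p = len(ones) / len(scores)   # proportion of cases coded 1
    q = 1 - p                     # proportion of cases coded 0
    # Uses the population standard deviation of all scores.
    return (mean(ones) - mean(zeros)) / pstdev(scores) * (p * q) ** 0.5

# Hypothetical data: pass/fail (true dichotomy) and hours studied (continuous).
passed = [0, 0, 0, 1, 1, 1]
hours = [1, 2, 3, 4, 5, 6]
r_pb = point_biserial(passed, hours)
```

The result equals the ordinary Pearson correlation computed with the dichotomous variable coded 0/1.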
|
|
Term
Norm-Referenced Tests |
Definition
Goal: To compare a test taker's performance to other test takers' performance. Requirements: Large, standardized samples that are representative of the population. Examples: GRE, SAT, IQ. Problem: The population of interest may change, which would necessitate obtaining a new standardization sample |
|
|
Term
Predictor and Criterion Variables |
Definition
Predictor variable: 1. Synonymous with independent variable 2. The variable that is used to predict variance in the criterion 3. Plotted on the X-axis
Criterion variable: 1. Synonymous with dependent variable 2. The variance of the criterion is predicted by the predictor 3. Plotted on the Y-axis
Relationship between predictor and criterion: 1. Cannot be assumed to be causal unless the predictor has been manipulated by the researcher 2. Long time periods between assessment of predictor and assessment of criterion can lead to other factors influencing the criterion
Assessment of relationship between predictor and criterion: 1. Beta (β) weights: strength of a predictor when all other predictors are held constant 2. R²: proportion of variance in the criterion accounted for by the predictors taken together (the unique contribution of a single predictor is given by its squared semipartial correlation) 3. Zero-order correlation: relationship between predictor and criterion ignoring all other predictors 4. Multicollinearity (highly correlated predictors) may reduce predictive ability of predictors
Criterion problem: 1. When the criterion is unreliable, the predictor will not be able to adequately predict the criterion
Validity coefficient: 1. Correlation between predictor and criterion 2. Squared validity coefficient indicates the proportion of variance in the criterion that is accounted for by the predictor 3. Greater range of scores in both predictor and criterion increases the validity coefficient; restricted range decreases it 4. Few validity coefficients exceed 0.60
Conceptual criterion: 1. Theoretical standard that researchers seek to understand
Actual criterion: 1. Operational or actual standard that researchers actually assess
Criterion deficiency: 1. Portion of the conceptual criterion that is not measured by the actual criterion
Criterion contamination: 1. Outside variables impacting the criterion 2. Occurs when the actual criterion measure includes aspects that are not related to the conceptual criterion 3. May occur when the rater has knowledge of predictor scores 4. Subjectively scored measures are prone to rater biases
Criterion relevance: 1. Degree of overlap between the actual criterion and the conceptual criterion
Composite criterion: 1. Available criterion measure is a composite of separable attributes
Selection fairness: 1. Social consequences of selection procedures
Criterion of discrimination: 1. A criterion that inaccurately differentiates between groups, resulting in majority members being overrepresented in comparison to minority groups
True and false positives/negatives: 1. True positive: the number of individuals in a given group who exceed the cutoff on both predictor and criterion 2. False positive: the number of individuals in a given group who exceed the cutoff on the predictor but fail to exceed the cutoff on the criterion 3. True negative: the number of individuals in a given group who fail to exceed the cutoff on both predictor and criterion 4. False negative: the number of individuals in a given group who fail to exceed the cutoff on the predictor but exceed the cutoff on the criterion |
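The four selection outcomes can be tallied mechanically once cutoffs are chosen. The following is a minimal sketch; the scores, the cutoffs, and the function name `selection_outcomes` are hypothetical.

```python
def selection_outcomes(predictor, criterion, pred_cut, crit_cut):
    """Count true/false positives and negatives for paired scores."""
    counts = {"true_positive": 0, "false_positive": 0,
              "true_negative": 0, "false_negative": 0}
    for p, c in zip(predictor, criterion):
        selected = p > pred_cut      # exceeds cutoff on the predictor
        successful = c > crit_cut    # exceeds cutoff on the criterion
        if selected and successful:
            counts["true_positive"] += 1
        elif selected:
            counts["false_positive"] += 1
        elif successful:
            counts["false_negative"] += 1
        else:
            counts["true_negative"] += 1
    return counts

# Hypothetical test scores (predictor) and performance ratings (criterion).
test_scores = [55, 62, 48, 70, 40, 66]
ratings = [3.1, 2.4, 3.5, 3.8, 2.0, 3.3]
outcomes = selection_outcomes(test_scores, ratings, pred_cut=50, crit_cut=3.0)
```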
|
|
Term
Single-Subject Designs |
Definition
Single-subject designs use one or more participants and are focused on assessing variables within an individual rather than between individuals. Single-subject designs are idiographic (differences within a participant) rather than nomothetic (differences between participants). Types: Case study or experimental. Case study: describes an individual by using tests or naturalistic observation. Experimental: determines how the introduction of a factor affects behavior. Problems: Autocorrelation, time intensive, generalizability, practice effects. Autocorrelation: When a participant is measured on the same variable multiple times, the variable becomes correlated with itself. Time intensive: multiple assessments and/or in-depth observation in single-subject designs take a great deal of time. Generalizability: Because single-subject designs can make use of only one participant, the results may not be generalizable to other individuals. Practice effects: Because the participant is tested multiple times, scores may increase simply because of practice |
|
|
Term
T-Score |
Definition
T-score: a standardized score that allows a participant's score to be compared to the norm group. Mean of 50 with standard deviation of 10. Similar to a z-score, which is also a standardized score but has a mean of 0 and a standard deviation of 1. In a normal distribution, about 68 percent of scores fall within one standard deviation of the mean (T-scores of 40 to 60) and about 95 percent fall within two standard deviations (T-scores of 30 to 70). Used in many psychological tests |
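The conversion from a raw score to a z-score and then to a T-score follows directly from the means and standard deviations given above. A minimal sketch, with a made-up norm group (mean 100, SD 15):

```python
def z_score(raw, norm_mean, norm_sd):
    """Standardize a raw score against the norm group (mean 0, SD 1)."""
    return (raw - norm_mean) / norm_sd

def t_score(raw, norm_mean, norm_sd):
    """Rescale the z-score to a mean of 50 and an SD of 10."""
    return 50 + 10 * z_score(raw, norm_mean, norm_sd)

# Hypothetical norm group: mean 100, SD 15.
example_t = t_score(115, norm_mean=100, norm_sd=15)   # one SD above the mean
```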
|
|
Term
Criterion-Referenced and Norm-Referenced Tests |
|
Definition
Criterion-referenced testing: Compares the test-taker's performance to an objective standard of achievement. Domain-referenced: A type of criterion-referenced test that examines the degree to which the test taker has mastered a specific area. Objectives-referenced: A type of criterion-referenced test that examines the degree to which the test taker has achieved instructional objectives. Norm-referenced testing: Compares the test-taker's performance to other test-takers' performance. Requirements: Large standardized samples that are representative of the population. Examples: GRE, SAT, IQ. Problem: The population of interest may change, which would necessitate obtaining a new standardization sample |
|
|
Term
Eigenvalues |
Definition
An eigenvalue measures the amount of variance in a set of tests or items that can be accounted for by an underlying factor. Used in factor analysis and principal components analysis. Eigenvalues are often converted into percentages to determine the percentage of variance in a set of tests or items that can be accounted for by an underlying factor. Factor analysis will provide the same number of eigenvalues as there are items or tests. Large eigenvalues indicate that an underlying factor is explaining a large amount of variance in a set of items or tests. |
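Converting eigenvalues to percentages is a matter of dividing each by their total. The sketch below assumes the eigenvalues are already available; the four values are invented and sum to 4, as they would for four standardized items.

```python
def percent_variance(eigenvalues):
    """Express each eigenvalue as a percentage of the total variance."""
    total = sum(eigenvalues)
    return [100 * ev / total for ev in eigenvalues]

# Hypothetical eigenvalues from an analysis of four items.
eigenvalues = [2.0, 1.0, 0.6, 0.4]
percentages = percent_variance(eigenvalues)   # roughly 50, 25, 15, 10
```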
|
|
Term
Naturalistic Inquiry |
Definition
1. A qualitative research approach 2. More concerned with how than what 3. Purpose: To study groups or phenomena in their real-world settings 4. Utilizes observations, in-depth, open-ended interviews, and written documents 5. Yields a rich, in-depth portrait of the phenomena studied 6. Characteristics of Naturalistic Inquiry: a. Carried out in a natural setting b. Case-study format c. Mostly qualitative d. NOT concerned with causality, objectivity, bias, and generalizability 7. In contrast with quantitative research, does not attempt to control and manipulate conditions 8. Criticized for lack of scientific rigor 9. Criteria of authenticity, trustworthiness, and goodness suggested to assess quality and precision 10. Often used in combination with other qualitative research approaches, including: Narrative Inquiry, Grounded Theory, Ethnography, Phenomenology, Kinesics 11. Extensively used in the social sciences 12. Time intensive |
|
|
Term
One-Tailed Test & Two-Tailed Test |
|
Definition
One-tailed test: Also known as a directional test. Tests for rejection in only one tail. Greater chance of rejecting the null hypothesis when the effect lies in the predicted direction, because all of alpha is placed in that tail. Two-tailed test: Also known as a non-directional test. Tests for rejection in both tails. Able to reject the null hypothesis in either tail, but each tail has less chance of rejecting the null hypothesis because alpha is split between the two tails |
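The difference between the two tests can be seen by computing both p-values for the same statistic. This sketch assumes a z statistic and a standard normal sampling distribution; the value z = 1.8 is made up.

```python
import math

def upper_tail_p(z):
    """One-tailed p-value for a directional hypothesis predicting z > 0."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def two_tailed_p(z):
    """Two-tailed p-value: probability of a result this extreme in either tail."""
    return math.erfc(abs(z) / math.sqrt(2))

z = 1.8                          # hypothetical test statistic
p_one = upper_tail_p(z)          # about 0.036
p_two = two_tailed_p(z)          # about 0.072
```

With alpha at 0.05, this hypothetical result would be significant under the one-tailed test but not under the two-tailed test.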
|
|
Term
Threats to Internal Validity |
|
Definition
Seven threats to internal validity. History: Events that take place between the pretest and posttest. Maturation: Changes in participants that have occurred naturally. Testing: Practice effects make participants score better over repeated testing. Mortality: Participants dropping out of a study before it is completed. Selection: Participants in one group are different from participants in another group. Regression effects: Tendency of participants with extreme scores on a first measure to score closer to the mean on a second testing (also known as regression to the mean). Demand characteristics: Participants changing their behavior because of their interpretation of the experiment's purpose. |
|
|
Term
Type I Error, Type II Error and Type III Error |
|
Definition
Type I Error 1. Rejecting the null hypothesis when it is true 2. The probability of a Type I error (alpha) is usually set to 0.05 in the social sciences Type II Error 1. Failing to reject the null hypothesis when it is false 2. The probability of a Type II error is β; related to it is power (1 - β), the probability of rejecting the null hypothesis when it is false 3. With sample size and effect size held constant, as the Type I error rate becomes smaller, the Type II error rate becomes larger Type III Error 1. Rejecting the null hypothesis, but for the wrong reason 2. Because of sampling error, two groups can be correctly identified as being significantly different, but the direction of the difference is the opposite of reality 3. Type III errors are relatively rare Power is impacted by sample size, alpha, effect size, and the test used 1. Sample size: larger samples increase power 2. Alpha: smaller alpha levels (e.g., 0.01 or 0.001 rather than 0.05) decrease power 3. Effect size: greater effect sizes (e.g., a larger difference between the means of two groups) increase power 4. Test used: statistical tests differ in power (e.g., a two-way ANOVA is more powerful than a one-way ANOVA) |
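The effects of sample size, alpha, and effect size on power can be checked numerically. The sketch below is only an approximation for one simple case, a one-tailed, one-sample z-test with a standardized effect size; the function name and the example values are assumptions, not part of the card.

```python
from statistics import NormalDist

def z_test_power(effect_size, n, alpha=0.05):
    """Approximate power of a one-tailed, one-sample z-test.

    effect_size is the standardized mean difference (in SD units).
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)   # critical value for this alpha
    # Probability that the test statistic exceeds the critical value
    # when the true effect is effect_size.
    return 1 - std_normal.cdf(z_crit - effect_size * n ** 0.5)

baseline = z_test_power(0.5, n=25)                   # about 0.80
smaller_sample = z_test_power(0.5, n=16)             # lower power
stricter_alpha = z_test_power(0.5, n=25, alpha=0.01) # lower power
larger_effect = z_test_power(0.8, n=25)              # higher power
```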
|
|
Term
Moderator Variable |
Definition
A moderator variable changes the relationship between a predictor and a criterion variable. Equivalent to an interaction effect in ANOVA. Background variables, such as gender and socioeconomic status, are common moderators. When a moderator is present, a test has differential validity (i.e., the validity of the test is different depending on the level of the moderator, such as whether someone is male or female) |
|
|
Term
Multitrait-Multimethod Matrix |
|
Definition
Assesses construct validity (convergent and divergent). Uses multiple methods (multimethod) of assessing multiple constructs (multitrait). Convergent validity is evidenced by high correlations between measures of the same trait. Divergent validity is evidenced by low correlations between measures of different traits. Four types of correlation coefficients: Monotrait-Monomethod: correlation between two tests that measure one trait using one method. Monotrait-Heteromethod: correlation between two tests that measure one trait using different methods. Heterotrait-Monomethod: correlation between two tests that measure different traits using one method. Heterotrait-Heteromethod: correlation between two tests that measure different traits using different methods. |
|
|
Term
Correction for Guessing Formula |
|
Definition
Adjusts obtained test scores to correct for lucky guesses 1. Usually used for multiple-choice exams 2. Corrected Score = R - W/(n - 1), where R is the number of right answers, W is the number of wrong answers, and n is the number of answer options per question 3. Assumes: a. test-takers either know the answer or they don't b. test-takers who know the answer get it right; those who don't know it guess c. incorrect answers reflect guessing 4. Example: 76 - 24/(4 - 1) = 76 - 8 = 68 (corrected score) a. 76 is the number of right answers, 24 is the number of wrong answers, and 4 is the number of answer options for each question b. To avoid confusion with the math, recall that 24/(4 - 1) is solved first. Once you have that (which is 8), you subtract it from 76 to obtain 68 |
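The worked example above can be written as a one-line function. The function and parameter names are chosen for illustration.

```python
def corrected_score(right, wrong, options_per_item):
    """Correction for guessing: R - W / (n - 1)."""
    # The division is carried out before the subtraction.
    return right - wrong / (options_per_item - 1)

example = corrected_score(right=76, wrong=24, options_per_item=4)   # 68.0
```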
|
|
Term
|
Definition
Measures the degree to which judges agree. A measure of inter-rater reliability. Increases when raters are well-trained and aware of being observed. Applicable only with nominal, ordinal, or discontinuous data. Ranges from -1 to +1. 0.80 - 0.90 indicates good agreement |
|
|
Term
Kuder-Richardson Formula 20 |
|
Definition
1. A method of evaluating internal consistency reliability 2. Used when test items are dichotomously scored 3. Used when test items vary in difficulty 4. Indicates the degree to which test items are homogeneous 5. Falsely elevates internal consistency when used with timed tests |
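For reference, KR-20 is commonly computed as k/(k - 1) times (1 minus the sum of p times q across items, divided by the variance of total scores). The sketch below follows that formula and uses the population variance of total scores, which is one common convention; the response matrix is invented.

```python
from statistics import pvariance

def kr20(item_matrix):
    """KR-20 for dichotomous (0/1) items; rows are examinees, columns are items."""
    n_people = len(item_matrix)
    n_items = len(item_matrix[0])
    totals = [sum(row) for row in item_matrix]
    sum_pq = 0.0
    for j in range(n_items):
        p = sum(row[j] for row in item_matrix) / n_people   # proportion passing item j
        sum_pq += p * (1 - p)
    return n_items / (n_items - 1) * (1 - sum_pq / pvariance(totals))

# Hypothetical responses: 5 examinees by 4 items of increasing difficulty.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
reliability = kr20(responses)   # about 0.80
```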
|
|
Term
Item Characteristic Curve |
|
Definition
Used in Item Response Theory. A graphical representation of a test item's difficulty, discrimination, and chance of false positives. Difficulty (degree of attribute needed to pass the item): indicated by position of the curve on the X axis. Discrimination (ability to differentiate between high and low scorers): indicated by slope of the curve. Chance of false positives (probability of getting the answer correct by guessing): indicated by the Y-intercept (lower asymptote) of the curve |
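One common way to draw such a curve is the three-parameter logistic model. The card does not name a specific model, so treat the function below as an assumed illustration in which `a` is discrimination, `b` is difficulty, and `c` is the guessing level.

```python
import math

def icc_3pl(theta, a, b, c=0.0):
    """Probability of a correct response at trait level theta (3PL model)."""
    return c + (1 - c) / (1 + math.exp(-a * (theta - b)))

# Hypothetical item: discrimination 1.5, difficulty 1.0, guessing 0.25.
at_difficulty = icc_3pl(1.0, a=1.5, b=1.0, c=0.25)   # midway between c and 1
very_low_trait = icc_3pl(-10.0, a=1.5, b=1.0, c=0.25)   # approaches c
```

Shifting `b` moves the curve along the X axis, raising `a` steepens the slope, and `c` sets the floor that the curve approaches at low trait levels.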
|
|
Term
Item Difficulty and Item Discrimination |
Definition
Item difficulty is the proportion of examinees who answer the item correctly; it reflects how much of the attribute an individual must possess to pass the item. Referred to as the item difficulty index, or p. p ranges between 0 and 1; 0 means that no one passed the item (too hard); 1 means that everyone passed the item (too easy); average difficulty should be 0.5. A parameter found in Item Response Theory. Can be visually represented in the item characteristic curve. Difficulty is displayed by the position of the 50 percent mark of the curve on the X-axis; more difficult items are to the right on the item characteristic curve. "Floor" effects limit a test's ability to distinguish people at the low end of a distribution, while "ceiling" effects limit a test's ability to distinguish people at the high end of a distribution. Item discrimination is defined as the ability of the item to unambiguously separate those who fail from those who pass. Can be visually represented in the item characteristic curve. Discrimination is displayed as the slope of the curve, with steeper slopes indicating more discrimination. Item discrimination is assessed by the item discrimination index D: the proportion of high scorers who answered the item correctly minus the proportion of low scorers who answered the item correctly. D ranges from -1 to +1; it is desirable to have positive values of D, which would indicate that more high-scoring examinees (rather than low-scoring examinees) answered the item correctly |
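Both indices are simple proportions. In the sketch below, D is computed as the high-scoring group's proportion correct minus the low-scoring group's, so that positive values are the desirable ones; the response data are invented.

```python
def item_difficulty(responses):
    """p: proportion of examinees answering the item correctly (1 = correct)."""
    return sum(responses) / len(responses)

def discrimination_index(high_scorers, low_scorers):
    """D: proportion correct among high scorers minus proportion among low scorers."""
    return item_difficulty(high_scorers) - item_difficulty(low_scorers)

# Hypothetical responses to one item from the top and bottom scoring groups.
high_group = [1, 1, 1, 0]
low_group = [1, 0, 0, 0]
p = item_difficulty(high_group + low_group)        # 0.5
d = discrimination_index(high_group, low_group)    # 0.5
```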
|
|
Term
Item Response Theory |
Definition
Focuses on determining specific parameters of test items. Makes use of item characteristic curves. Item characteristic curves provide information about item difficulty, item discrimination, and probability of false positives. Difficulty (degree of attribute needed to pass the item): indicated by position of the curve on the X axis. Discrimination (ability to differentiate between high and low scorers): indicated by slope of the curve. Chance of false positives (probability of getting the answer correct by guessing): indicated by the Y-intercept of the curve. Assumptions: Single underlying trait. Relationship between trait and item response can be displayed in an item characteristic curve. Requires large sample size. IRT is used in the development of Computer Adaptive Assessments, which customize tests to the examinee's ability level. By contrast, Classical Test Theory: Observed Score = True Score + Measurement Error |
|
|
Term
Analysis of Variance (ANOVA) |
Definition
Tests for differences in the mean scores of groups based on one or more IVs. DV must be continuous and IV must be categorical. Tests the null hypothesis that the means of the groups are equal. Assumptions: Independence of observations (each participant is in only one cell). Normality (distribution of scores clusters around the mean with fewer observations falling farther from the mean; also known as the bell-shaped curve). Homogeneity of variance (the variance of every group is the same as the variance of every other group), also known as homoscedasticity. Types: One-Way ANOVA: tests the main effect of one IV. Two-Way ANOVA: tests the main effect of the first IV (A), the main effect of the second IV (B), and the interaction of the two IVs (A*B). Interaction effect: The effect of one IV on the DV differs depending on the level of the other IV. F-ratios: Ratios of effect variance to error variance. In One-Way ANOVA, there is one F-ratio for the effect of the IV. In Two-Way ANOVA, there are three F-ratios (main effect A, main effect B, and interaction effect A*B). Advantages of Two-Way ANOVA over One-Way ANOVA: Includes interaction effect (One-Way ANOVA only provides main effect). Increases power (greater ability to detect a significant effect when one exists, which is the same as a greater ability to reject the null hypothesis when it is false). Reduces familywise error rate (when more analyses are conducted, there is a higher chance of making a Type I error; better than doing two One-Way ANOVAs, which would increase the chance of making a Type I error). |
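The F-ratio for a One-Way ANOVA can be computed by hand as the between-groups mean square divided by the within-groups mean square. A minimal sketch with invented scores for three groups:

```python
from statistics import mean

def one_way_anova_f(groups):
    """F-ratio: between-groups mean square over within-groups mean square."""
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)
    # Effect variance: how far group means fall from the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Error variance: how far scores fall from their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Hypothetical scores for three levels of one IV.
f_ratio = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])   # about 13
```

The resulting F would then be compared against the critical value for 2 and 6 degrees of freedom.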
|
|
Term
Factor Analysis |
Definition
A statistical technique that identifies underlying patterns in a dataset. Goals: Identify underlying factors that are responsible for variation in a set of items, variables, or tests. Reduce a large set of variables to a smaller number of underlying factors. Produces eigenvalues (provide information about the amount of variance explained by a single factor). Produces factor loadings (provide a measure of the correlation between an item and an underlying construct). Higher factor loadings indicate that the underlying factor is accounting for a large amount of variance in the item. Factor rotation is used to aid in interpretation of factors. Communality: The sum of the squared factor loadings of one variable across all factors. It shows the variance in one variable that is accounted for by all the factors. |
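Communality is the sum of one variable's squared loadings across the factors. A small sketch with made-up loadings on two factors, assuming the factors are uncorrelated:

```python
def communality(loadings):
    """Sum of squared loadings for one variable across all factors."""
    return sum(loading ** 2 for loading in loadings)

# Hypothetical loadings of one item on two factors.
item_loadings = [0.8, 0.3]
h2 = communality(item_loadings)   # 0.64 + 0.09 = about 0.73
```

Here roughly 73 percent of the item's variance would be accounted for by the two factors together.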
|
|
Term
Trend Analysis |
Definition
An extension of ANOVA. Identifies trends in data when the levels of the IV are ordered quantitatively (e.g., from lowest to highest). An example is one group is given 5 mg of a medication, a second group is given 10 mg, a third group is given 15 mg, and so on. Trends can be linear, quadratic, cubic, quartic, or quintic. Linear: means are arranged in a line. Quadratic: means are arranged in a U shape (one change of direction). Cubic: means change direction twice. Quartic: means change direction three times. Quintic: means change direction four times. Orthogonal polynomial coefficients are used to test trends |
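For three equally spaced dose levels, the standard orthogonal polynomial coefficients are -1, 0, 1 (linear) and 1, -2, 1 (quadratic). The sketch below computes only the contrast values from invented group means; a full trend test would also divide each contrast by an error term.

```python
# Orthogonal polynomial coefficients for three equally spaced levels.
LINEAR = [-1, 0, 1]
QUADRATIC = [1, -2, 1]

def contrast_value(means, coefficients):
    """Weighted sum of group means for one trend component."""
    return sum(c * m for c, m in zip(coefficients, means))

# Hypothetical group means at 5 mg, 10 mg, and 15 mg.
rising = [10, 20, 30]      # purely linear pattern
peaked = [10, 30, 10]      # purely quadratic (inverted U) pattern

rising_linear = contrast_value(rising, LINEAR)        # 20
rising_quadratic = contrast_value(rising, QUADRATIC)  # 0
peaked_linear = contrast_value(peaked, LINEAR)        # 0
peaked_quadratic = contrast_value(peaked, QUADRATIC)  # -40
```

A nonzero contrast on only one component suggests which trend describes the means.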
|
|