Term
Face Validity
Definition
The items in the scale appear to be valid "on the face of it": they seem relevant to the phenomenon of interest, and the scale items make sense "on the surface."

Term
Content Validity (AKA Object Validity)
Definition
Extent to which the items adequately reflect/represent all relevant facets of a concept. The content domain must be known or pre-specified, and the items should cover every concept in the conceptual definition. Used especially for skill-based tests (achievement, SAT, knowledge) and for assessments in the workplace.

Term
Criterion Validity
Definition
The key to criterion validity is (of course) a clear and unambiguous criterion, preferably one that is unimpeachable. The higher the correlation the better; the main requirement is statistical significance. Criterion validity is often (but not invariably) predictive, and it is typically atheoretical. It usually reflects a test's practical value as a tool for classification. Some examples: an employment test tracks status (fired vs. promoted); an SAT score tracks college status or grades; a measure of health status.

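A minimal, hypothetical Python sketch of the idea (not part of the original card): criterion validity checked as a correlation, with its p-value, between test scores and a criterion. All numbers are invented for illustration.

```python
# Hypothetical sketch (all numbers invented): criterion validity as the
# correlation between an employment test and a job-performance criterion.
from scipy.stats import pearsonr

test_scores = [72, 85, 61, 90, 78, 66, 88, 59, 81, 74]
job_ratings = [3.1, 4.2, 2.5, 4.6, 3.8, 2.9, 4.4, 2.3, 4.0, 3.5]

r, p = pearsonr(test_scores, job_ratings)
print(f"criterion validity: r = {r:.2f}, p = {p:.4f}")  # want a significant r
```
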
Term
Construct Validity
Definition
The degree to which a test measures the theoretical construct (a concept with a proven track record in the literature) that it is designed to measure. Evidence is developed from multiple relationships with other variables, not just a single variable or a single theory. Evaluates how well the test fits into a pre-specified network of theories, aka a nomological network. To validate a construct, one must specify which other known measures should, and should not, be correlated with the test or measure. Addresses the measure's meaningfulness, its connection to other theories, and its relationship to other variables, other tests, and other measures.

Term
Discriminant Validity
Definition
The test of interest should differentiate between sub-groups within a relevant category. Note the relation between discriminant validity and both construct validity and criterion validity. Different measures should make important distinctions, OR a single measure should allow groups to be classified appropriately (e.g., with a Duncan Range Test or similar tool).

Term
Convergent Validity
Definition
Different measures of the same construct should be correlated, and members identified by related tests should overlap.

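A simulated sketch (not from the card) of the convergent/discriminant pattern: two measures of the same construct should correlate highly with each other and only weakly with a measure of a different construct.

```python
# Simulated sketch of the expected pattern: high convergent correlation
# between two measures of construct A, low discriminant correlation with
# a measure of an unrelated construct B.
import numpy as np

rng = np.random.default_rng(0)
trait_a = rng.normal(size=200)                  # latent construct A
a1 = trait_a + rng.normal(scale=0.5, size=200)  # measure 1 of A
a2 = trait_a + rng.normal(scale=0.5, size=200)  # measure 2 of A
b = rng.normal(size=200)                        # unrelated construct B

print("convergent r(a1, a2):", round(np.corrcoef(a1, a2)[0, 1], 2))   # high
print("discriminant r(a1, b):", round(np.corrcoef(a1, b)[0, 1], 2))  # near zero
```
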
Term
Threats to Internal Validity
Definition
History, Maturation, Testing effects, Instrumentation, Reactive measures, Selection of participants, Sample attrition, Regression effects, Compensatory rivalry, Resentful demoralization

Term
Levels of Measurement
Definition
Nominal, Ordinal, Interval, Ratio

Term
Test-Retest (Temporal) Reliability
Definition
A measure is administered to someone and then readministered at a later time; a correlation tests how well the two administrations agree. It is important to consider the length of time between measurements (too short, and people might remember and repeat the same responses; too long, and real changes may have occurred that affect their responses).

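A minimal sketch, assuming the same eight people answered at time 1 and time 2 (all scores invented):

```python
# Sketch with invented scores for 8 people measured twice.
from scipy.stats import pearsonr

time1 = [10, 14, 9, 17, 12, 15, 11, 16]
time2 = [11, 13, 10, 16, 12, 14, 10, 17]

r, _ = pearsonr(time1, time2)
print(f"test-retest reliability: r = {r:.2f}")
```
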
Term
Split-half Reliability Coefficient
Definition
Cut the measure in half and use a correlation to see how similar the responses are in the two halves. Measures can be split into even- vs. odd-numbered items or 1st vs. 2nd half.

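A sketch of an odd/even split, plus the standard Spearman-Brown correction (not mentioned on the card) that adjusts the half-test correlation up to full-test length; the response matrix is invented:

```python
# Sketch with an invented 6-person x 6-item response matrix.
import numpy as np

items = np.array([
    [4, 5, 4, 5, 3, 4],
    [2, 1, 2, 2, 1, 2],
    [5, 5, 4, 5, 5, 4],
    [3, 2, 3, 3, 2, 3],
    [4, 4, 5, 4, 4, 5],
    [1, 2, 1, 2, 1, 1],
])

odd_half = items[:, 0::2].sum(axis=1)    # items 1, 3, 5
even_half = items[:, 1::2].sum(axis=1)   # items 2, 4, 6

r_half = np.corrcoef(odd_half, even_half)[0, 1]
r_full = 2 * r_half / (1 + r_half)       # Spearman-Brown correction
print(f"split-half r = {r_half:.2f}, corrected = {r_full:.2f}")
```
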
Term
Item-Total Correlation (Internal Consistency)
Definition
Correlate each item with the entire scale excluding that item, and then obtain the average item/scale correlation. The floor is 0.7 in peer-reviewed journals.

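A sketch of the procedure with an invented response matrix: each item is correlated with the total of the remaining items, and the correlations are averaged:

```python
# Sketch with an invented 6-respondent x 4-item response matrix.
import numpy as np

items = np.array([
    [4, 5, 4, 4],
    [2, 1, 2, 2],
    [5, 5, 4, 5],
    [3, 2, 3, 3],
    [4, 4, 5, 4],
    [1, 2, 1, 2],
])

corrs = []
for j in range(items.shape[1]):
    rest = np.delete(items, j, axis=1).sum(axis=1)  # total excluding item j
    corrs.append(np.corrcoef(items[:, j], rest)[0, 1])

print(f"average item-total correlation = {np.mean(corrs):.2f}")
```
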
Term
Alternate Forms Reliability
Definition
Develop two versions of the same measure and correlate them.

Term
Inter-Rater Reliability
Definition
Measures the extent to which two or more observers, interviewers, or coders get equivalent results using the same instrument.

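A sketch for two coders rating the same cases on a numeric scale (ratings invented); for categorical codes, Cohen's kappa is the usual statistic rather than a plain correlation:

```python
# Sketch with invented ratings from two coders on the same 8 cases.
from scipy.stats import pearsonr

rater_1 = [3, 5, 2, 4, 4, 1, 5, 3]
rater_2 = [3, 4, 2, 4, 5, 1, 5, 2]

r, _ = pearsonr(rater_1, rater_2)
print(f"inter-rater reliability: r = {r:.2f}")
```
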
Term
Conceptual Definition
Definition
Describes what a term means.

Term
Operational Definition
Definition
Describes how you will measure the term. Used to make sure terms have meaning in the sense of verifiability.

Term
Three-Horned Dilemma (McGrath)
Definition
Three conflicting goals: generalizability; precision in control and measurement of variables; realism with respect to context.

Term
Deduction
Definition
Spelling out the logical implications of what is already known or assumed.

Term
Induction
Definition
The process of developing a hypothesis by generalizing from specific instances.

Term
Hypothesis testing may be done for:
Definition
Discovery, Demonstration, Refutation, Replication, Amelioration

Term
A well-formed hypothesis must be:
Definition
Testable, Relevant, Verifiably Predictive, Parsimonious

Term
Type I Error
Definition
The incorrect rejection of a true null hypothesis; a false positive. A Type I error leads one to conclude that a supposed effect or relationship exists when in fact it does not. The significance level α (not the p-value itself) is the probability of making a Type I error, and it is fixed by the chosen threshold rather than by the sample size N.

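A small simulation (illustrative only) showing that when the null is true, the false-positive rate stays near α whatever the sample size:

```python
# Illustrative simulation: with the null hypothesis true (mean = 0),
# the false-positive rate stays near alpha at both small and large N.
import numpy as np
from scipy.stats import ttest_1samp

rng = np.random.default_rng(0)
alpha = 0.05
for n in (20, 200):
    false_positives = 0
    for _ in range(5000):
        sample = rng.normal(loc=0.0, size=n)   # null is true
        if ttest_1samp(sample, 0.0).pvalue < alpha:
            false_positives += 1
    print(f"n = {n}: Type I rate ~ {false_positives / 5000:.3f}")  # ~ 0.05
```
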
Term
Type II Error
Definition
The failure to reject a false null hypothesis; a false negative. A Type II error leads one to conclude that a supposed effect or relationship does not exist when in fact it does. β is the probability of making a Type II error, and statistical power (1 − β) is the probability of avoiding one. A large N increases statistical power and reduces the probability of a Type II error.

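A companion simulation (illustrative only) showing that with a real effect present, a larger N raises power and shrinks β:

```python
# Illustrative simulation: with a real effect (true mean = 0.3), a larger
# sample rejects the false null more often, so power rises and beta falls.
import numpy as np
from scipy.stats import ttest_1samp

rng = np.random.default_rng(1)
alpha = 0.05
for n in (20, 100):
    rejections = sum(
        ttest_1samp(rng.normal(loc=0.3, size=n), 0.0).pvalue < alpha
        for _ in range(5000)
    )
    power = rejections / 5000
    print(f"n = {n}: power ~ {power:.2f}, beta ~ {1 - power:.2f}")
```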