Term
Definition
- Cannot be touched, felt, heard, tasted, etc.
- Examples: depression and anxiety
- Cannot be directly measured
- We rely on behaviors that accompany the construct

Term
Definition
- Presumed to cause item scores
- We rely on a scale to measure the variable, although we are not interested in the scale scores themselves: if studying depression, we are not interested in BDI scores but in the level of depression

Term
Definition
- We cannot directly see what the true score is
- There is always some random variability
- We are left with inter-item relationships

Term
Definition
Observed score = true score + measurement error
- Measurement error is the difference between the variable and how it is represented in the measurement

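A minimal simulation of this decomposition (hypothetical numbers; assumes normally distributed true scores and errors):

```python
import numpy as np

rng = np.random.default_rng(0)
true = rng.normal(50, 10, 10_000)   # latent true scores
error = rng.normal(0, 5, 10_000)    # random error, uncorrelated with true scores
observed = true + error             # observed score = true score + error

# With uncorrelated error, variances are additive, and
# reliability = true variance / observed variance (about 100/125 = 0.80)
print(observed.var(), true.var() + error.var())
print(true.var() / observed.var())
```
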
Term
Assumptions regarding error in CMT
Definition
- The amount of error varies randomly
- Error terms are uncorrelated across items
- Error terms are not correlated with the true score of the latent variable

Term
Relationship between item score and true score (path coefficients)
Definition
- The cross-product of the path coefficients equals the correlation between the items (e.g., two items with standardized loadings of .7 and .8 have a model-implied correlation of .7 x .8 = .56)
- Standardized path coefficients represent the strength of the causal relationship between the latent construct and the items

Term
Relationship between item score, true score, and error: CMT parallel assumptions
Definition
- The amount of influence from the latent construct on each item is the same
- Each item is assumed to have the same amount of error
- We know these assumptions aren't true in practice

Term
Congeneric model (Jöreskog, 1971)
Definition
- All items share a common latent variable
- The latent variable need not exert the same strength of relationship to all items (path coefficients can differ)
- Error variances can differ
- This allows a more accurate representation of the magnitude of the latent variable's influence on the items

Term
Definition of reliability
Definition
- The proportion of variance attributable to the true score of the latent variable; the consistency of measurement
- Can be assessed via internal consistency (coefficient alpha), temporal stability (test-retest correlation), alternate-forms reliability, or split-half reliability

Term
Definition
The homogeneity of items in a measure

Term
Definition
The proportion of a scale's total variance attributable to a common source (presumed to be the true score of the latent variable). Requires the total variance and the common variance (from the covariance matrix).

Term
Definition
The ratio of common-source variation to total variation in scores.
Look at the communal and non-communal variance.

Term
Coefficient alpha formula
Definition
alpha = [k / (k - 1)] x [1 - (sum of item variances / total scale variance)]
where k is the number of items, the item variances are the diagonal (non-communal) elements of the covariance matrix, and the total scale variance is the sum of all elements of the covariance matrix.

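A minimal sketch of this computation (hypothetical data; assumes the items are the columns of a NumPy array):

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Coefficient alpha from an (n_respondents x k_items) score matrix."""
    cov = np.cov(items, rowvar=False)   # k x k covariance matrix
    k = cov.shape[0]
    item_var = np.trace(cov)            # sum of item variances (the diagonal)
    total_var = cov.sum()               # total scale variance (all elements)
    return (k / (k - 1)) * (1 - item_var / total_var)

# Hypothetical example: 5 items driven by one latent variable, 200 respondents
rng = np.random.default_rng(1)
latent = rng.normal(size=(200, 1))
items = latent + rng.normal(size=(200, 5))
print(round(cronbach_alpha(items), 2))
```
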
Term
Value of coefficient alpha that is considered acceptable
Definition
By convention, an alpha of .70 or higher is generally considered acceptable.

Term
Definition
- Test-retest reliability
- Assumptions: the variable is stable over time and there is no error associated with the time of measurement
- Possible moderator involvement

Term
Definition
- Inter-rater reliability
- Two raters rate independently
- Summarized in a 2x2 matrix

Term
Kappa coefficient: 2x2 matrix formulas
Definition
Example agreement matrix (N = 100):

                 Rater 1
                  +     -
Rater 2    +     31     6
           -     11    52

Pr(a) = proportion of agreement among raters (the diagonal):
Pr(a) = .31 + .52 = .83
Pr(e) = hypothetical chance agreement (products of the marginal proportions):
Pr(e) = (.42)(.37) + (.58)(.63) = .16 + .37 = .53 (rounded)
kappa = [Pr(a) - Pr(e)] / [1 - Pr(e)] = (.83 - .53) / .47 ≈ 0.64

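The same computation as a minimal sketch (plain NumPy on the counts above):

```python
import numpy as np

# Rows = Rater 2 (+, -); columns = Rater 1 (+, -)
counts = np.array([[31, 6],
                   [11, 52]])
n = counts.sum()

p_obs = np.trace(counts) / n       # Pr(a): observed agreement = .83
marg1 = counts.sum(axis=0) / n     # Rater 1 marginals: .42, .58
marg2 = counts.sum(axis=1) / n     # Rater 2 marginals: .37, .63
p_chance = (marg1 * marg2).sum()   # Pr(e): chance agreement = .5208
kappa = (p_obs - p_chance) / (1 - p_chance)
print(round(kappa, 2))             # 0.65 (0.64 above reflects rounding Pr(e) to .53)
```
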
Term
Kappa level interpretation
Definition
- 0.40 to 0.59 = moderate
- 0.60 to 0.79 = substantial
- Greater than 0.80 = excellent
- Greater than 0.70 is generally considered acceptable

Term
Importance of reliability
Definition
- Validity coefficients are constrained by reliability
- Increased reliability = increased statistical power

Term
Definition
- Test items should be of appropriate difficulty and should distinguish between people high and low on the characteristic being measured
- Item difficulty should be around .5 (unless the item is dichotomous)
- Average item difficulty should be slightly below the midpoint between 1.0 and the chance level
- The test should include a range of item difficulties

Term
Item-discrimination index
Definition
d_i = (n_hi / h_i) - (n_li / l_i)
where:
n_hi = number of persons in the high-scoring group who passed the item
h_i = number of persons in the high-scoring group
n_li = number of persons in the low-scoring group who passed the item
l_i = number of persons in the low-scoring group

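A minimal worked sketch of the index (hypothetical pass counts):

```python
def discrimination_index(n_hi: int, h_i: int, n_li: int, l_i: int) -> float:
    """Proportion of high scorers passing minus proportion of low scorers passing."""
    return n_hi / h_i - n_li / l_i

# Hypothetical item: 40 of 50 high scorers pass, 15 of 50 low scorers pass
print(discrimination_index(40, 50, 15, 50))   # 0.5
```
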
Term
Definition
The relationship between an item and the rest of the items on a test.
The item-total correlation arises in psychometrics when a number of items are given to individuals and the problem is to construct a useful single quantity for each individual that can be used to compare that individual with others in a given population. The test checks whether any item's responses fail to vary in line with those for the other items across the population. The summary measure is an average of some form, weighted where necessary, and the item-total correlation is used to decide whether responses to a given item should be included in the set being averaged. In some fields of application such a summary measure is called a scale.
An item-total correlation test is performed to check whether any item in the set is inconsistent with the averaged behavior of the others and thus can be discarded. The analysis is performed to purify the measure by eliminating "garbage" items prior to determining the factors that represent the construct, that is, the meaning of the averaged measure.
Each item initially produces a score, where the scores for different items have a similar range across individuals. An overall measure for an individual is constructed as the average of the item scores. A check on whether a given item behaves like the others is done by evaluating the Pearson correlation (across all individuals) between the scores for that item and the average of the scores of the remaining candidate items. In a reliable measure, all items should correlate well with the average of the others.
A small item-total correlation provides empirical evidence that the item is not measuring the same construct measured by the other items. A correlation below about 0.2 or 0.3 indicates that the item does not correlate well with the scale overall and may be dropped.

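A minimal sketch of the corrected item-total correlation (hypothetical item matrix; each item is correlated with the mean of the remaining items):

```python
import numpy as np

def corrected_item_total(items: np.ndarray) -> np.ndarray:
    """Correlation of each item with the mean of the remaining items."""
    k = items.shape[1]
    r = np.empty(k)
    for j in range(k):
        rest = np.delete(items, j, axis=1).mean(axis=1)
        r[j] = np.corrcoef(items[:, j], rest)[0, 1]
    return r

# Items correlating below about 0.2-0.3 with the rest are candidates for removal
```
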
Term
Definition
Does the test measure what it purports to measure?
The meaning of the scores produced by an instrument.

Term
Definition
- Content validity
- Criterion validity
- Construct validity

Term
Definition
The degree to which the elements of an assessment instrument are relevant to and representative of the targeted construct for a given assessment purpose.
A scale has content validity when its items constitute a randomly selected subset of items drawn from the universe of items.

Term
Importance of content validity
Definition
The final content of a scale ultimately determines its reliability and the degree to which other forms of construct validity can be established.

Term
How is content validity compromised?
Definition
- Items that reflect important facets are omitted
- Items measuring facets outside the domain are included
- The aggregate score is disproportionately influenced by any one facet

Term
Definition
An item or scale that has an empirical association with some criterion or gold standard.
Also called predictive validity.

Term
Definition
Directly concerned with the theoretical relationship of a variable (test score) to other variables. It is the extent to which a measure accurately reflects the way the construct it purports to measure should behave relative to measures of other related (and unrelated) constructs.

Term
Hypothesized relational magnitudes in a multitrait-multimethod matrix
Definition
- Same trait, same method (reliability): should be strongest
- Same trait, different method (validity): should be somewhat lower
- Different trait, different method: should be lowest

Term
Convergent and divergent construct validity
Definition
- Convergent: measures that assess the same construct should be highly correlated
- Divergent: measures that assess different constructs should not be highly correlated

Term
Definition
- Used instead of a correlation coefficient to assess accuracy
- A 2x2 matrix of the measure vs. the gold standard
- Diagnose each respondent (dichotomize)
- Dichotomize the test scores

Term
Definition
                         Test result
                          +                   -
Diagnosis   Present      TP (hit)            FN (miss)
            Absent       FP (false alarm)    TN (true negative)

Term
Definition
The probability of a positive test result among those who really should be positive (those with the disorder):
Hits / (Hits + Misses)

Term
Definition
The probability of a negative test result among those who really do not have the disorder:
TN / (TN + False Alarms)

Term
Positive predictive power
Definition
The probability of having the disorder among those with a positive test result:
Hits / (Hits + False Alarms)

Term
Negative predictive power
Definition
The probability of not having the disorder among those with a negative test result:
TN / (TN + Misses)

Term
Definition
Same as efficiency: how efficient the test is, overall, in accurately detecting the presence and absence of pathology:
(Hits + TN) / All cases

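A minimal sketch computing all of these statistics from hypothetical 2x2 counts:

```python
# Hypothetical counts from a diagnosis-by-test 2x2 table
TP, FN = 31, 6    # hits, misses
FP, TN = 11, 52   # false alarms, true negatives
N = TP + FN + FP + TN

sensitivity = TP / (TP + FN)   # positive result given disorder present
specificity = TN / (TN + FP)   # negative result given disorder absent
ppp = TP / (TP + FP)           # disorder present given positive result
npp = TN / (TN + FN)           # disorder absent given negative result
efficiency = (TP + TN) / N     # overall rate of correct classification
print(sensitivity, specificity, ppp, npp, efficiency)
```
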
Term
Definition
P = prevalence of diagnosis (in signal detection)
P' = Total - P (the complement)
P = (TP + FN) / N

Term
Definition
The standard deviation of the sampling distribution.
Can be computed for TN, TP, FP, FN, EFF, Q, and P; e.g., for a proportion such as TN:
SE = sqrt[(TN)(TN') / N]

Term
Definition
Q = proportion of positive test results (the level of the test, in the signal-detection matrix)
Q' = Total - Q
Q = (TP + FP) / N

Term
The standard error of specificity needs to be unbiased
Definition

Term
Definition
The evaluator decides how large a sample to gather (N) and takes a random (or representative) sample of that size from the population of interest; each patient then receives a diagnosis and a test.
This is an unbiased estimator.

Term
Definition
A representative sample of size N0 is drawn from the population and each person is diagnosed; this is the screening sample. Then a random sample of size N1 is drawn from those in the screening sample with a positive diagnosis, and a random sample of size N2 from those with a negative diagnosis. Both samples must have a minimum of 10 people.
This is a biased estimator.

Term
Definition
A representative sample of N0 patients is drawn from the population ("the screening sample") and each patient is tested. Then a random sample of N1 patients with a positive test and N2 patients with a negative test are drawn; these two groups then receive a diagnosis. The proportion of patients with a positive test provides an unbiased estimator of the level of the test, Q.

Term
Benefits of retrospective sampling
Definition
- More powerful
- Power is maximized if the outcome probability = .5

Term
Definition
- As the cutoff for a test changes, the statistics associated with it change as well
- Sensitivity and specificity have a negative (tradeoff) relationship
- The diagonal line is chance level; greater distance from it is better, and the point of greatest distance is the ideal place for the sensitivity/specificity tradeoff
- Bigger distance = better test
- An area of 0.5 represents chance performance, so a good test's area should differ significantly from 0.5

Term
Definition
The degree to which a measure explains or predicts some phenomenon of interest relative to other measures.

Term
How to decide whether to refine or create a new instrument
Definition
- Use IRT to identify poorly performing items
- Check internal consistency
- Check the kappa coefficient
- Check item-factor loadings (> 0.4)
- Look at the proportion of items performing poorly

Term
Goals of the analyses for examining incremental validity
Definition
- Estimate the relative proportions of variance for your measure and others
- Estimate unique variance
- Examine interaction (moderator) effects associated with sex, age, SES, etc.

Term
What to look for in a correlation matrix for incremental validity
Definition
- Degree of collinearity or shared variance among predictors
- Strength of association between each predictor and the criterion

Term
How to do the data analysis for incremental validity
Definition
- Forced stepwise (hierarchical) regression: enter the comparison measures first, then enter the new measure
- The difference in R2 is the index of incremental validity (see the sketch below)
- R2 is variance accounted for; singularity is really bad, as it means the new measure is not explaining any additional variance

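A minimal sketch of the hierarchical-entry analysis (hypothetical data; assumes the statsmodels package is available):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 300
old1, old2 = rng.normal(size=(2, n))        # comparison measures (hypothetical)
new = 0.5 * old1 + rng.normal(size=n)       # new measure, partly redundant
y = old1 + 0.4 * new + rng.normal(size=n)   # criterion

# Step 1: comparison measures only
X1 = sm.add_constant(np.column_stack([old1, old2]))
r2_step1 = sm.OLS(y, X1).fit().rsquared

# Step 2: comparison measures plus the new measure
X2 = sm.add_constant(np.column_stack([old1, old2, new]))
r2_step2 = sm.OLS(y, X2).fit().rsquared

print(r2_step2 - r2_step1)   # delta R2: the index of incremental validity
```
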
Term
Null hypothesis for an EFA
Definition
One factor accounts for all patterns of correlation.

Term
Definition
Sum the items to obtain an estimate of the latent construct:
- Compute item-total correlations (the obtained set)
- Compute the projected inter-item correlations
- Subtract the projected inter-item correlations from the obtained item-total correlations
With a good model they should be the same; the differences give the beginnings of a residual matrix.

Term
Purposes of factor analysis
Definition
- To identify the latent structure of an assessment instrument
- Item refinement and scale development
- Relationship to the content validity and construct validity of the instrument (if you're not explaining variance, you are not capturing facets of the construct)

Term
Definition
More theoretically driven, testing explicit predictions:
- Trying to see whether the model reflects real life (goodness of fit)

Term
Definition
- Need at least a 5-10:1 participant-to-item ratio
- N > 125

Term
Definition
- Independent
- Uncorrelated

Term
Definition
- Correlated and dependent

Term
Definition
When first doing a CFA you get a communality of 1 across all items; you need to regress each item on all remaining items to get an estimate of R2, which serves as an estimate of communality.

Term
Definition
In factor analysis, SPSS will extract factors with eigenvalues over 1; however, other criteria can be used, such as a scree plot.

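A minimal sketch of the eigenvalue-over-1 (Kaiser) rule (hypothetical data; NumPy only):

```python
import numpy as np

rng = np.random.default_rng(3)
data = rng.normal(size=(200, 8))                 # hypothetical 8-item dataset
corr = np.corrcoef(data, rowvar=False)           # 8 x 8 correlation matrix
eigenvalues = np.linalg.eigvalsh(corr)[::-1]     # sorted largest to smallest
print((eigenvalues > 1).sum())                   # factors retained under the rule
# Plotting the eigenvalues against their rank gives the scree plot; look for the elbow
```
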
Term
Definition
- Look for the elbow
- The elbow indicates how many factors you are extracting

Term
Definition
- Can handle measurement error
- Can handle non-measurement error
- Can reject models
- Enables advanced treatment of missing data (full-information maximum likelihood)

Term
Definition
- A covariance matrix or correlation matrix

Term
Assumptions about the data in SEM
Definition
- Intervally scaled
- Multivariate normal distribution
- Sufficient sample size (more for complex models): at least 100 to 150

Term
Definition
- The effect of the latent variable on the measure; if a measure loads on only one factor, the standardized loading is the measure's correlation with the factor
- It can be interpreted via the square root of the measure's reliability

Term
Definition
- The variance in the measure that isn't explained by the latent variable
- This doesn't mean it is random, just that it is unexplained by the latent variable

Term
Definition
- The causal and correlational links between latent variables

Term
Exogenous variable and endogenous variable
Definition
- Exogenous: not caused by another variable in the model
- Endogenous: caused by one or more variables in the model

Term
Definition
Mean of 0 and variance of 1

Term
Definition
A variable in the model that is not measured

Term
Components of a correlation (decomposing a correlation)
Definition
- Direct effect: A -> Y
- Indirect effect: A -> B -> Y
- Spurious effect (common cause): Y <- B -> A
- Unanalyzable components (correlated causes): B -> Y and C -> A, with B and C correlated

Term
Definition
- The correlation between any two variables equals the sum of the products of the paths or correlations from each tracing
- When we trace, we need to set the variances at 1; otherwise we have more unknowns than knowns in the equation, which makes it impossible to solve

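A minimal numeric sketch of the tracing rule for two items sharing one latent factor (hypothetical loadings):

```python
# Standardized path coefficients from one latent factor to two items
loading_1, loading_2 = 0.7, 0.8

# With the factor variance fixed at 1, the only tracing between the items
# runs through the factor, so the implied correlation is the product of the paths
implied_r = loading_1 * loading_2
print(implied_r)   # 0.56
```
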
Term
Definition
1) Specification: describe the nature of the relationships among variables
2) Identification: the model can, in theory and in practice, be estimated with observed data
3) Estimation: the model's parameters are statistically estimated from the data
4) Model fit: the estimated parameters are used to predict the correlations or covariances between measured variables, and the predicted correlations or covariances are compared to the observed ones

Term
Things that affect chi-square
Definition
- Sample size: the higher it is, the easier it is to get significant results (bad in SEM, where significance indicates misfit)
- Correlation size: the higher the correlations in the model, the poorer the fit (significantly different)

Term
Comparative fit index
Incremental fit index
Goodness-of-fit index
Adjusted GFI
Definition
- CFI: takes sample size into account (the sample can be small)
- IFI: compares the model with a saturated model and an independence model
- GFI: calculates the proportion of variance estimated by the covariance matrix
- AGFI: does the same but against the saturated model (rewards parsimony)

Term
Definition
Root mean squared error of approximation (RMSEA)
- An index of model misfit; the lower the better
- 0.07 or lower is desirable

Term
Model evaluation (theoretical, technical, and statistical)
Definition
- Theoretical: appropriateness of the general causal structure, the right variables, and fit with previous findings
- Technical: identification status, appropriate estimation method, and appropriateness of instrumental variables (continuous)
- Statistical: reasonable parameter values, good coefficients, well-explained endogenous variables, and model fit

Term
Model identification (definition and minimum condition)
Definition
A model is said to be identified when a unique solution exists (at least as many knowns as unknowns).

Term
Definition
Everything correlates with everything; knowns and unknowns are equal in number; 0 df

Term
Definition
Cannot offer a unique solution; more unknowns than knowns

Term
Definition
More knowns than unknowns

Term
Empirical underidentification
Definition
The model is theoretically identified but yields unstable estimates; an example is high collinearity in a model.

Term
Definition
The ability of an overidentified model to reproduce the variables' correlation or covariance matrix.

Term
Definition
- Items arranged on a continuum to measure one attribute
- Passing an item implies greater possession of the attribute
- Items differ in level of difficulty but measure the same attribute

Term
Definition
Strength of construct association

Term
ICC (item characteristic curve) components
Definition
- a: slope (the steeper the better)
- b: item difficulty (where the item lies on theta)
- c: pseudo-guessing parameter (the random probability of guessing the right answer)

Term
Definition
The steeper the slope, the better the item discriminates between high and low levels of the construct at that particular theta level.

Term
Definition
Indicates the rate of false positives.

Term
Dichotomous models (1PL, 2PL, 3PL)
Definition
- 1-parameter logistic model: b only
- 2PL: a + b
- 3PL: a + b + c (rarely used)

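A minimal sketch of the ICC under these models (hypothetical item parameters):

```python
import numpy as np

def icc_3pl(theta, a=1.0, b=0.0, c=0.0):
    """3PL item characteristic curve: P(correct | theta)."""
    return c + (1 - c) / (1 + np.exp(-a * (theta - b)))

theta = np.linspace(-4, 4, 9)
print(icc_3pl(theta, a=1.5, b=0.5, c=0.2))   # 3PL item
# Setting c = 0 gives the 2PL; additionally fixing a common a across items gives the 1PL
```
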
Term
Difference between IRT and CMT (standard error)
Definition
- CMT: the standard error is constant across the scale
- IRT: the standard error of measurement differs across scores but generalizes across populations

Term
Difference between IRT and CMT (length of test)
Definition
- CMT: longer tests are more reliable
- IRT: not necessarily

Term
Difference between CMT and IRT (comparisons)
Definition
- CMT: meaningful scale scores are obtained by comparing positions in the score distribution (distance from the mean)
- IRT: meaningful scale scores are obtained by comparing distances from various items

Term
Difference between CMT and IRT (multiple forms)
Definition
- CMT: comparing test scores across multiple forms depends on parallelism and adequate equating
- IRT: comparing scores from multiple forms is optimal when test difficulty levels vary across persons