Term
Evaluating clinically significant change |
|
Definition
There are many problems with relying on statistical significance to evaluate change.
Evaluating clinically significant change has become a large area of research because of this |
|
|
Term
Outcome measures should be: |
|
Definition
Reliable
Valid
Responsive
The first two are pretty easy to meet, but you also have to be able to detect change (responsiveness), and this is often at odds with reliability |
|
|
Term
Reliability |
Definition
produces same result when administered on 2 or more occasions - consistent |
|
|
Term
Validity |
Definition
Measures what it is intended to measure
Indexed by sensitivity, specificity, correlational/regression analyses |
|
|
Term
Responsiveness |
Definition
Ability of an instrument to detect clinically important tx effects
No consensus on how to define or quantify this
No gold standard for summarizing responsiveness
Many responsiveness statistics are available, but no single one has been determined to be best |
|
|
Term
Internal responsiveness |
Definition
Ability of a measure to change over prespecified time frame
Change in a measure within the context of tx
usually evaluated in pretest/posttest repeated measures design
Measures reliable change over time - kind of the opposite of test-retest reliability (a perfectly stable measure wouldn't detect change) |
|
|
Term
External responsiveness |
Definition
More similar to criterion-related validity (being able to predict a person's standing on a criterion based on their score on your measure)
Extent to which changes in a measure over time relate to corresponding changes in reference measure
Reflects relationship b/t changes in a measure & changes in external standard
Changes in measure are not primary interest, changes in external standard are
Examples: changes in blood pressure (measure) and changes in frequency of heart attacks (external standard)
Measures valid change over time of predictor and criterion |
|
|
Term
What are some indices of change you can use to evaluate the internal responsiveness of a measure? |
|
Definition
Paired t test - tests the hypothesis of no change in average response over time - t = (Mean Time 1 − Mean Time 2)/(SD of change scores/√n)
Effect Size I (Cohen's d) = (Time 1 − Time 2)/SD at Time 1 - conventional Cohen standards: .20 small, .50 moderate, .80 large
Effect Size II - Responsiveness-Tx Coefficient (aka Efficiency Index) - ratio of observed change to SD of change scores - (Pre − Post)/SD of change scores
Effect Size III (Guyatt Index of Responsiveness) = (Time 1 − Time 2)/√(2 × MSE) - denominator adjusts for spurious changes arising from measurement error - this makes it unique from the others (accounts for msmt error) - more popular in other fields |
|
|
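These indices can be sketched in Python; this is a minimal illustration, not from the source, and the function names, toy data, and the stable-group MSE argument for the Guyatt index are my own:

```python
import math

def internal_responsiveness(pre, post):
    """Three internal-responsiveness indices for paired pre/post scores.

    Differences are taken as Time 1 - Time 2, matching the card's formulas.
    """
    n = len(pre)
    diffs = [a - b for a, b in zip(pre, post)]
    mean_diff = sum(diffs) / n
    mean_pre = sum(pre) / n
    sd_pre = math.sqrt(sum((x - mean_pre) ** 2 for x in pre) / (n - 1))
    sd_diff = math.sqrt(sum((d - mean_diff) ** 2 for d in diffs) / (n - 1))
    return {
        # Paired t: mean change over the standard error of the change scores
        "paired_t": mean_diff / (sd_diff / math.sqrt(n)),
        # Effect Size I (Cohen's d): mean change over SD at Time 1
        "cohens_d": mean_diff / sd_pre,
        # Effect Size II (efficiency index): mean change over SD of change scores
        "efficiency_index": mean_diff / sd_diff,
    }

def guyatt_index(pre, post, mse_stable):
    """Effect Size III: mean change over sqrt(2 * MSE), where MSE must come
    from a stable (untreated) group and adjusts for measurement error."""
    mean_diff = sum(a - b for a, b in zip(pre, post)) / len(pre)
    return mean_diff / math.sqrt(2 * mse_stable)
```

Note that only the denominators differ across the four statistics, which is why they can rank the same data differently.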
Term
Why is Guyatt Index of Responsiveness unique from other measures of internal responsiveness? |
|
Definition
It accounts for measurement error and the spurious changes that arise from it |
|
|
Term
Interpretation of Effect Size Statistics |
|
Definition
All stats reflect change in a measure over 2 occasions
Observed change in a measure may not reflect important change in an individual's bx (a social validity issue)
One way to validate change is to compare % of changed participants vs % of unchanged participants (a Binomial Effect Size Display - BESD - may be a better way to represent this) |
|
|
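A BESD is easy to compute from an effect-size correlation r; the .50 ± r/2 formula is Rosenthal and Rubin's standard one, while the function name is mine:

```python
def besd(r):
    """Binomial Effect Size Display: re-express an effect-size correlation r
    as 'changed' percentages in the treatment vs. comparison group."""
    return {
        "treatment_changed_pct": 100 * (0.50 + r / 2),
        "control_changed_pct": 100 * (0.50 - r / 2),
    }
```

So even a "small" r = .32 displays as 66% changed vs. 34% changed, which is the intuition behind using BESD to represent change.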
Term
Examples of socially valid change |
|
Definition
addresses whether a change in target bx represents a socially important change
- change in academic engaged time and change in ADHD status (does a change in engaged time result in or represent a change in ADHD status - that's the more important thing)
Change in aggressive bx and change in conduct disorder status etc |
|
|
Term
What are some indices of change you can use to measure the external responsiveness of a measure? |
|
Definition
ROC (Receiver Operating Characteristic) Method
Correlation
Regression models |
|
|
Term
ROC Method for external responsiveness |
|
Definition
Sensitivity (measure correctly classifies Ss who demonstrate change on external criterion)
Specificity (measure correctly classifies Ss who do not demonstrate change on external criterion)
Provides useful overview of relationship b/t a measure (predictor) and external indicator of change
Major disadvantage is external change criteria must be dichotomized (improved/not improved - sacrifices info on magnitude of change) |
|
|
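Sensitivity and specificity against a dichotomized external criterion can be sketched as follows; the function name and the True = improved coding are illustrative assumptions:

```python
def sensitivity_specificity(predicted, criterion):
    """Sensitivity and specificity of a measure's change classification
    against a dichotomized external criterion (True = improved)."""
    tp = sum(p and c for p, c in zip(predicted, criterion))          # hits
    fn = sum((not p) and c for p, c in zip(predicted, criterion))    # misses
    tn = sum((not p) and (not c) for p, c in zip(predicted, criterion))
    fp = sum(p and (not c) for p, c in zip(predicted, criterion))    # false alarms
    return tp / (tp + fn), tn / (tn + fp)
```

The dichotomization is exactly the disadvantage the card notes: any information about the magnitude of change on the criterion is discarded before these two numbers are computed.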
Term
Correlation index of external responsiveness |
|
Definition
Correlation b/t change scores on predictor and criterion
How well do change scores on a predictor predict change scores on a criterion?
Let X = social skills predictor scores; let Y = academic achievement scores
X1 − X2 = change score on social skills (Dx)
Y1 − Y2 = change score on academic achievement (Dy)
Correlate Dx with Dy |
|
|
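The change-score correlation described on this card can be sketched as below; the helper names are mine, not from the source:

```python
import math

def pearson_r(x, y):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def change_score_correlation(x1, x2, y1, y2):
    """Correlate change on the predictor (Dx = X1 - X2) with change on the
    criterion (Dy = Y1 - Y2)."""
    dx = [a - b for a, b in zip(x1, x2)]
    dy = [a - b for a, b in zip(y1, y2)]
    return pearson_r(dx, dy)
```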
Term
Regression model of external responsiveness |
|
Definition
Typical regression model: Dx = a + b1X1 + b2X2 + b3X3 + … + error
Allows for multiple predictors of the change criterion (changed/unchanged)
Several analyses are possible - logistic regression - fewer assumptions, easier to meet - discriminant function analysis - many assumptions |
|
|
Term
Assessment and analysis of clinically significant change |
|
Definition
Traditionally, change is established by NHST (null hypothesis statistical testing) - comparisons made b/t 2 or more groups before and after tx
Researcher tries to reject the null (p < .05)
Statistical sig does not equal clinical sig -- stat sig can be obtained by increasing sample size -- clinical sig is more difficult to establish |
|
|
Term
What is clinical significance? |
|
Definition
can refer to meaningfulness of a symptom in diagnosing a disorder (red spots and measles)
Can refer to reduction of risk factors for disease (reduced blood pressure and fewer heart attacks)
in terms of change, refers to the meaning of observed change in an individual |
|
|
Term
Determining clinical significance - 2 things needed |
|
Definition
Amount of change large enough that it is not due to msmt error (reliable change)
AND a post-tx level of functioning closer to nonclinical population (cutoff point)
Ways of looking at these: - Reliable Change Index - Cut off point: 3 possible methods |
|
|
Term
Reliable Change Index for evaluating first aspect of clinical sig |
|
Definition
RCI = (Posttest − Pretest)/√(2(S_pretest√(1 − r_test-retest))²)
Numerator represents the difference score
Denominator represents msmt error (the standard error of the difference, based on test-retest reliability)
If RCI > 1.96, a reliable change has occurred (p < .05) |
|
|
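A sketch of the RCI computation under the Jacobson-Truax formula on this card; the function and argument names are my own:

```python
import math

def reliable_change_index(pretest, posttest, sd_pretest, r_test_retest):
    """Jacobson-Truax RCI: observed change divided by the standard error of
    the difference, Sdiff = sqrt(2 * SE_measurement^2)."""
    se_measurement = sd_pretest * math.sqrt(1 - r_test_retest)
    s_diff = math.sqrt(2 * se_measurement ** 2)
    return (posttest - pretest) / s_diff
```

For example, with SD = 10 and test-retest r = .80, a 15-point gain yields an RCI above 1.96, so the change is reliable at p < .05.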
Term
Cutoff points: 3 methods for establishing that post-tx functioning is closer to nonclinical pop |
|
Definition
1. 2 SD from the mean of dysfunctional pop (in direction of functionality)
2. 2 SD from the mean of functional pop (in direction of dysfunctionality)
3. halfway between the means of the functional and dysfunctional populations |
|
|
Term
Cutoff point c |
Definition
c is the recommended cutoff point
Calculated by: c = (S(nonclinical)M(clinical) + S(clinical)M(nonclinical))/(S(nonclinical) + S(clinical)) |
|
|
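The cutoff c formula is a weighted midpoint between the two group means; a sketch (argument names are mine):

```python
def cutoff_c(m_clinical, s_clinical, m_nonclinical, s_nonclinical):
    """Cutoff c: midpoint between the clinical and nonclinical means,
    weighted by the two groups' standard deviations."""
    return (s_nonclinical * m_clinical + s_clinical * m_nonclinical) / (
        s_nonclinical + s_clinical
    )
```

When the two SDs are equal, c reduces to the simple halfway point between the means (method 3 on the previous card).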
Term
Reliable change categories |
|
Definition
Improved: reliable change w/o crossing cutoff point
Recovered: reliable change & crosses cutoff point
Deteriorated: change in the direction of dysfunctionality |
|
|
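The categories above (plus "unchanged") can be combined into one classifier. This sketch assumes higher scores mean more dysfunction and uses |RCI| > 1.96 as the reliability test; the function and label names are mine:

```python
def classify_change(pretest, posttest, rci, cutoff):
    """Classify an individual's outcome from the RCI and cutoff point,
    assuming higher scores indicate more dysfunction."""
    if abs(rci) <= 1.96:
        return "unchanged"             # change not beyond measurement error
    if posttest > pretest:
        return "deteriorated"          # reliable change toward dysfunction
    if posttest < cutoff:
        return "recovered"             # reliable change and crosses cutoff
    return "improved"                  # reliable change w/o crossing cutoff
```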
Term
Things to consider in making sure your clinical sig and outcomes measures are valid |
|
Definition
General measures versus specific measures
Monomethod bias (MTMM logic) - sometimes you get consistent change on one method but not on other methods (e.g., child self-report says less anxious but parent and teacher ratings show same rates as before) - this doesn't look good for tx
Construct irrelevant variance
Construct underrepresentation
Social validity of outcome measure
Type I measures: Socially valid outcomes (arrest rates, dropout rates, ODRs, retention rates, referral rates) Type II measures: Correlate with Type I measures but not socially valid (DOs, DBRs, ratings by others)
Type III measures: Not correlated with Type I or Type II measures |
|
|
Term
Reliability considerations for clinical sig and outcome measures |
|
Definition
Regression effects - affected by unreliability - affected by distance from the mean (tails of the distribution)
Use of difference scores
- Errors of measurement are additive - Pretest score has error & posttest score has error
- Error Pretest + Error Posttest
- Could be solved by using residualized difference scores (regressed change scores based on reliability coefficient) |
|
|
Term
Primary means of evaluating clinically significant change |
|
Definition
|
|
Term
Meanings of clinical significance |
|
Definition
Amount or degree of change
Reduction of most symptoms
Reduction of some symptoms (not in normative range - medium change)
No symptom reduction but better able to cope with symptoms (no change)
Measurement issues - standardized instruments - instruments without norms - actual change vs perceived change (change is in the eye of the beholder) |
|
|
Term
Change matrix for determining |
|
Definition
|
|
Term
Indices of change
Gresham (2005) Cheney et al (2008) |
|
Definition
Absolute change in bx (most liberal)
Percent of Non-overlapping data points (PND)
Percent change from baseline - sensitive to change (but we don't know how large a % change we need for a clinically significant change)
Effect size - sensitive to change
Reliable change index (RCI) - most conservative
Identification of change sensitive bx ratings |
|
|
Term
Ways to identify change-sensitive bx rating scales:
indices of change Gresham (2005) Cheney et al (2008) |
|
Definition
1. Paired sample t tests (p < .01)
2. Effect size estimates ((Time 1 − Time 2)/SD pooled)
3. Calculate internal consistency reliabilities (coefficient alpha)
4. Apply Spearman-Brown to estimate reliabilities after item reduction (kr/(1 + (k − 1)r))
Spearman-Brown estimate (10 items): .5(.90)/(1 + (.5 − 1)(.90)) = .45/.55 ≈ .82
Spearman-Brown estimate (5 items): .5(.75)/(1 + (.5 − 1)(.75)) = .375/.625 = .60 |
|
|
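The Spearman-Brown step can be sketched in one line; note that with r = .90 and k = .5 the denominator is 1 + (.5 − 1)(.90) = .55, giving ≈ .82, while r = .75 gives .375/.625 = .60 (the function name is mine):

```python
def spearman_brown(r, k):
    """Spearman-Brown prophecy: projected reliability when test length is
    multiplied by factor k (k = 0.5 halves the number of items)."""
    return (k * r) / (1 + (k - 1) * r)
```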
Term
Absolute change in bx index of change |
|
Definition
Most liberal
Amount of change from BL to post-intervention levels of performance
An individual no longer meeting established criteria for ED
Total elimination of bx px |
|
|
Term
Informant discrepancies and change |
|
Definition
Meta-analysis of 119 studies showed diff informants (teacher-parent-child) produce discrepant ratings (rs in the .20s) (Achenbach et al., 1987)
Informant discrepancies also referred to as level of agreement, informant disagreement, discordance among informant ratings etc
No single measure of social bx is a gold standard
No theoretically relevant rationale provided to explain discrepancies among raters - why do they disagree?
Question: How do we meaningfully compare or combine discrepant ratings?
Impact of discrepant ratings: - Assmt and classification of psychopathology (diff prevalence rates) - Tx of childhood/adolescent psychopathology (meta-analytic outcomes) |
|
|
Term
Correlates of informant discrepancies - child characteristics |
|
Definition
Age (less discrepancies for younger)
Gender (no diff)
Ethnicity/Race (lower agreement among African-American samples)
Social desirability (children rate px bx more favorably than other raters)
Px type (externalizing less discrepant)
All correlations in the low-moderate range |
|
|
Term
Correlates of informant discrepancies - Parent characteristics |
|
Definition
Depression (maternal)
Anxiety (maternal)
stress (few studies)
SES (inconsistent findings)
--- Little attn paid to family characteristics |
|
|
Term
Theoretical framework
ABC Model (attribution-bias-context) |
|
Definition
Actor-observer phenomenon & perspective taking & recall
Actor-Observer Phenomenon - Observers of another person's behavior attribute causes to dispositional/internal qualities - A teacher may attribute a child's hitting another child to a trait of aggressiveness (downplaying context) - The child may attribute their own hitting behavior to being teased (context emphasized) - Parents'/teachers' trait attributions are the most heavily weighted information - Most behavior rating scales focus on dispositional qualities: ----Shy/timid ----Gets distracted ----Argues - Virtually all items are decontextualized
Perspective taking & recall - individuals recall events to support particular views - individuals ignore events that do not conform to their views - studied extensively in cognitive dissonance research - differences in ratings may result from diff perspectives ----parents who want intervention for aggressive bx will likely rate aggressive bx highly - Diff informants access & weight info from memory recall |
|
|
Term
Implications of ABC model |
|
Definition
- Current methods of assmt can be modified to reduce discrepancies - context in which ratings occur is crucial in explaining discrepancies --- teacher's rating ADHD symptoms often discrepant from parent ratings - No single informant's ratings can be used as gold standard - one could build context into extant bx rating scales ----hits others when teased or provoked by peers |
|
|
Term
Ways to evaluate informant discrepancy rates
Indices of informant discrepancies |
|
Definition
Correlational analyses -- Pearson r (correlations among diff informants) -- q correlations (Pearson r applied to different informants - teacher-parent)
Difference scores - raw and unstandardized - standardized (convert to z-scores) - residual (use one rater to predict another rater's ratings - regression-based approach) |
|
|
Term
Evidence-based interventions - criteria
APA task force on promotion and dissemination of psychological procedures |
|
Definition
Random assignments of Ss to intervention and control conditions
Careful specification of pop undergoing intervention
Use of a manual detailing intervention
Multiple outcomes measures (raters naive to conditions) - reduces monomethod bias and rater bias
Statistically significant differences b/t intervention and comparison groups
Replication of findings supporting intervention by independent investigators |
|
|
Term
Construct validity and interventions |
|
Definition
Does the construct the test measures exist?
Are variations in measurement outcomes causally produced by variations in the construct?
Is the intervention capable of changing the construct?
If the intervention can change the construct, does variation in administration of the intervention causally produce changes in outcome measures?
Examples: - does construct of depression exist? - is depression a multidimensional construct? - do tx for depression exist and can they change depression? - are there diff ways of measuring the depression construct? |
|
|
Term
Categories for classifying evidence for change (Kazdin) |
|
Definition
Best evidence for change
Evidence for probable change
Limited evidence for change
No evidence for change |
|
|
Term
Categories for classifying change (Kazdin)
Best evidence for change |
|
Definition
- At least 80% of findings from multiple informants show significant results
- No pattern of measurement-specific, informant-specific, or method-specific results (change appears on all measures)
- Majority of evidence for change across range of informants, measures, and methods suggest intervention changes the construct |
|
|
Term
Categories for classifying change (Kazdin)
Evidence for probable change |
|
Definition
- More than 50% of findings from multiple informants, measures, & methods show significant results
- No clear pattern of informant-specific, measure-specific, or method-specific results
- Simple majority of evidence for change across range of informants, measures, & methods |
|
|
Term
Limited evidence for change |
|
Definition
- 50% or less of findings across informants, measures, or methods show significant results - Pattern of informant-specific, measure-specific, and/or method-specific results - Sparse evidence that intervention changed the dimension of the construct |
|
|
Term
Categories for classifying change (Kazdin)
No evidence for change |
Definition
No significant results reported
No evidence for change - Intervention did not change the construct |
|
|
Term
Meta-analytic results and change |
|
Definition
Average effect sizes in child interventions range from .16 to 1.14
Effect sizes are informant-specific
Average effect size for agoraphobia ranged from .44 to 2.66
Dependent upon source (clinician ratings versus self-report or performance)
Future meta-analyses should classify change according to categories |
|
|