Term
Construct validity |
|
Definition
The degree to which a study/test or manipulation measures the thing that it claims to be measuring. Dependent on many things, like the soundness of the operational definitions used, how reliable the DV measurements are, how valid the DV measurements are, etc. |
|
|
Term
Internal validity |
|
Definition
The degree to which a study establishes that a factor causes a difference in behavior. Heavily linked with design issues: to what extent is the IV unambiguously linked or related to the DV, how good was the control, were there extraneous variables to create bias, etc. |
|
|
Term
External validity |
|
Definition
The extent to which results can be generalized beyond the settings, participant group, etc., of the study. |
|
|
Term
Self-report data collection |
|
Definition
Ways of collecting data by using instruments that require the participant to give direct answers to study questions. Questionnaires, interviews, scales, etc. are often used. Advantageous in that some things can only be accessed by the self (internal states, etc.), offering a unique perspective. Disadvantageous in that memory can be easily manipulated and we tend to be confident in false memories, and it could lead to demand characteristics or the participants figuring out the hypothesis, etc. |
|
|
Term
Open-ended questions |
|
Definition
Allows participants to give a brief response rather than choosing from a given set of options. Pro: Allows for more flexibility and depth of answers. Con: Hard to code. |
|
|
Term
Multiple choice questions |
|
Definition
Provides discrete categories, grouped values from a continuum, etc. |
|
|
Term
Checklist questions |
|
Definition
Provides participant with pre-made checklist that they can choose options from. |
|
|
Term
Forced-choice questions |
|
Definition
Provides participant with two options. Ex) I like being around other people or I don’t like being around other people. |
|
|
Term
|
Definition
Deals with the length and complexity of items: how in-depth should a question be, should it be brief or more involved, etc. Brief frames are positive in that you don't bog participants down with lots of details that could confuse them; the pro of more involved frames is that the participant has a better idea of what you are actually asking. |
|
|
Term
Double-barreled questions |
|
Definition
When more than one concept is asked about in a single question. Ex: How do you feel about universities allotting more money to students' physical and mental health? These kinds of questions are fatal to a questionnaire, because we can never be sure which concept the participant is answering about. |
|
|
Term
Leading questions |
|
Definition
Questions that use a positive or negative phrasing that leads the participant to the answer that we want. Ex: You don’t approve of the horrible policies that Scott Walker is trying to push through, do you? |
|
|
Term
Balanced questions |
|
Definition
Use neutral phrasing that does not indicate what answer the experimenter would like to hear. Balanced questions are especially needed when questions are socially desirable. Ex: Most people don’t actually follow the speed limit; how fast would you go in a 55 mph zone? |
|
|
Term
Funnel/inverted funnel sequences |
|
Definition
Different methods for sequencing of questions within sections. The funnel sequence begins with an overall assessment and filters down to more specific questions. The inverted funnel starts with more specific questions and ends with broader concepts. These introduce potential problems in that there could be bias due to priming or consistency issues. |
|
|
Term
Branching/contingency questions |
|
Definition
Questions that build off of one another, so if one question is not applicable to the participant, there could be a dozen others that they cannot answer. These are best to avoid, since they lead to excluding certain participants from answering a significant portion of study material. |
|
|
Term
|
Definition
Designed to study the same construct from various perspectives. |
|
|
Term
Likert scale |
|
Definition
Typically 5- to 7-point scales asking things like "how much do you agree with this statement, 1 being strongly agree and 7 being strongly disagree," etc. Allows for item analysis to see how much each item was actually related to the construct being assessed. Also allows us to add in questions from other scales (like social desirability scales) to see how much our construct is related to other concepts. |
|
|
Term
Thurstone scale |
|
Definition
Questions are based on an 11-point appraisal system and rated on how much they endorse a certain attitude. Participants are then given a set of questions and asked to check which ones they agree with, and from that the experimenter is able to find a mean and variability of the participant's attitude. |
|
|
Term
Scale value |
|
Definition
Score used in the Thurstone scale method that indicates how much a statement corresponds with a certain attitude. |
|
|
Term
Semantic differential |
|
Definition
A method using bipolar adjectives; participants are asked to place a check mark on a 5-, 7-, or 9-point interval. Ex: Rate puppies on the following scales:
Cute _ _ _ _ _ _ _ _ _ Ugly. Factor analysis is typically used with this method, and dimensions of meaning typically arise, like evaluation (characteristics like good/bad, beautiful/ugly, etc.), potency (hard/soft, rough/smooth, etc.), and activity (fast/slow, hot/cold, etc.). |
|
|
Term
Observed score |
|
Definition
A participant’s score is equal to their true score plus contaminants. |
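Note: a minimal formalization of this card, using standard classical test theory notation (splitting the "contaminants" into random error and systematic bias is an assumption that mirrors the error vs. bias cards below):

```latex
X = T + E + B
% X = observed score, T = true score,
% E = random error (no consistent direction), B = systematic bias
```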
|
|
Term
True score |
|
Definition
A true score truthfully reflects a participant’s attitude. |
|
|
Term
Error vs. bias |
|
Definition
Error is random: it could be anything from reading a recent article that would change their answer (that other participants did not read) to being bored, tired, etc. Bias is more fatal to findings because it is caused by more universal factors that influence most participants' answers (e.g. social desirability). |
|
|
Term
|
Definition
Error threatens reliability. |
|
|
Term
|
Definition
This is part of error. Error is like static in the results: it typically doesn't influence the scores in a particular direction, it merely adds interference. |
|
|
Term
|
Definition
High amounts of error threaten power, lowering one's chances of finding significant results. |
|
|
Term
|
Definition
Bias threatens the validity of results because the results will not be an accurate representation of the construct. |
|
|
Term
|
Definition
Influences the results in a certain direction that is not representative of the construct. |
|
|
Term
|
Definition
We want to hear the signal, but noise and static get in the way. |
|
|
Term
Influences on consistency |
|
Definition
True changes in the participant on the characteristic being assessed, changes in personal characteristics, extraneous situational factors, differences in administration of the measure, in the experimenter/observer, in the sampling of items, in participants' interpretation of the instrument, carryover effects, and differences in recording responses. |
|
|
Term
|
Definition
changes in scores that actually indicate a difference in attitudes or behavior. |
|
|
Term
Internal consistency |
|
Definition
Concerned with how cohesive responses are with the measure. This is why many items are used in a measure. Want items to be homogenous. |
|
|
Term
Item analysis |
|
Definition
Used to assess internal consistency. Assesses whether an item corresponds with the construct. Provides a correlation among items. |
|
|
Term
|
Definition
We want items to be homogenous because they should be assessing the same thing. |
|
|
Term
Inter-item correlations |
|
Definition
Correlations among items. We would hope for high positive or negative correlations (depending on whether the items are reverse coded or not.) |
|
|
Term
Item-total correlation |
|
Definition
Shows the correlation of an item with the scale as a whole. |
|
|
Term
Ceiling effect |
|
Definition
The effect of treatment is underestimated because the DV is not sensitive to psychological states above a certain level. All the scores essentially max out. |
|
|
Term
Floor effect |
|
Definition
The effect of treatment is underestimated because the DV artificially restricts how low the score can be. |
|
|
Term
Cronbach's alpha |
|
Definition
A coefficient of reliability. Used to judge consistency or reliability of a test. |
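Note: assuming this card is Cronbach's alpha (the KR20 card below describes itself as "essentially Cronbach's alpha for dichotomous items"), the standard formula is:

```latex
\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k}\sigma_i^2}{\sigma_X^2}\right)
% k = number of items, \sigma_i^2 = variance of item i,
% \sigma_X^2 = variance of the total score
```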
|
|
Term
Split-half reliability |
|
Definition
Method where you take half of the items on the test, calculate a score for each participant, and see how that score correlates with the score on the second half. |
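Note: a minimal Python sketch of the procedure on hypothetical data (the odd/even split and the Spearman-Brown step-up at the end are common conventions, not prescribed by this card):

```python
import numpy as np

def split_half_reliability(scores):
    """Correlate odd-item and even-item half scores, then apply the
    Spearman-Brown correction to estimate full-length reliability."""
    scores = np.asarray(scores)            # rows = participants, cols = items
    odd = scores[:, 0::2].sum(axis=1)      # score on one half (odd items)
    even = scores[:, 1::2].sum(axis=1)     # score on the other half (even items)
    r_half = np.corrcoef(odd, even)[0, 1]  # correlation between the halves
    return 2 * r_half / (1 + r_half)       # step up to full test length

# Hypothetical data: 5 participants on a 6-item, 7-point scale
data = [[4, 5, 4, 5, 3, 4],
        [7, 6, 7, 6, 7, 7],
        [2, 1, 2, 2, 1, 2],
        [5, 5, 6, 5, 5, 6],
        [3, 4, 3, 3, 4, 3]]
print(split_half_reliability(data))
```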
|
|
Term
Kuder-Richardson 20 (KR20) |
|
Definition
Essentially Cronbach's alpha for dichotomous items (e.g. true/false, multiple choice, etc.). |
|
|
Term
Stability reliability |
|
Definition
Showing that results will be stable over time. This can be assessed through repeated measures. |
|
|
Term
Carryover effects |
|
Definition
Potential source of error in a study. Things that "carry over" into participants' responses (e.g. fatigue, false memory, etc.). |
|
|
Term
Test-retest reliability |
|
Definition
Essentially stability reliability. Administering two tests at different time periods and seeing whether scores are consistent or not. |
|
|
Term
Alternate forms reliability |
|
Definition
Using two slightly different versions of an assessment to see how they correlate. |
|
|
Term
Inter-observer reliability |
|
Definition
Using two observers and seeing how well the two different observation reports correlate. |
|
|
Term
r vs. r² |
|
Definition
r is a literal correlation (found through test-retest, etc.). r² is the percentage of variance linked with the construct (NOT due to random error). It might be slightly more meaningful, but typically both are reported. |
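Note: a one-line worked example with illustrative numbers:

```latex
r = .80 \;\Rightarrow\; r^2 = .64
% 64% of score variance is linked with the construct; the other 36% is error
```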
|
|
Term
Spearman-Brown prophecy formula |
|
Definition
A new reliability value for a lengthened test, calculated as the number of times longer (n) times the original reliability (r), divided by 1 plus (n − 1) times the original reliability. |
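Note: written out, with a worked example using illustrative numbers:

```latex
r_{\text{new}} = \frac{n\,r}{1 + (n-1)\,r}, \qquad
\text{e.g. doubling a test with } r = .70:\;
r_{\text{new}} = \frac{2(.70)}{1 + (1)(.70)} = \frac{1.40}{1.70} \approx .82
```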
|
|
Term
|
Definition
Lengthening the test will increase the reliability. |
|
|
Term
|
Definition
A wider range of scores creates greater reliability since it is easier to differentiate whether differences are significant or not. |
|
|
Term
Validity of data collection |
|
Definition
Concerned with relating the operational definition to the construct. |
|
|
Term
Content validity |
|
Definition
Concerned with whether items reflect the construct that is being studied. Involves evaluation by experts and coverage of the domain of the construct. |
|
|
Term
|
Definition
Essentially the same as construct validity, concerned with whether the items reflect the construct. |
|
|
Term
Criterion-related validity |
|
Definition
There has to be a standard against which to compare; the criterion for comparison must be valid. |
|
|
Term
The criterion problem |
|
Definition
It is hard to determine what criterion is appropriate for comparison. |
|
|
Term
Predictive validity |
|
Definition
Concerned with the results' ability to predict future results and events. This is dependent on the length of time to predict over (it is easier to predict outcomes a month from now as opposed to 5 years from now). |
|
|
Term
Concurrent validity |
|
Definition
Looks for high convergence between results of two different measures (but measures of the same construct) administered at the same time. |
|
|
Term
Nomological network |
|
Definition
In regards to construct validity. Construct validity is made up of a network of anticipated associations. |
|
|
Term
Bootstrapping |
|
Definition
Comes from the idea of "pulling someone up by their bootstraps": you cannot pull yourself up higher than the criterion used in research. |
|
|
Term
Multi-trait multi-method approach |
|
Definition
Using multiple different traits or methods to assess a construct. Tests to see whether a developed measure shows convergence with other methods that are theoretically related. Developed by Campbell and Fiske. |
|
|
Term
Convergent validity |
|
Definition
Validity demonstrated by showing that the measure correlates with other measures of the construct. |
|
|
Term
Discriminant validity |
|
Definition
Validity demonstrated when a measure does not correlate highly with an unrelated construct. Makes sure that the measure isn’t measuring a different construct and can differentiate different constructs. |
|
|
Term
Reliability diagonal |
|
Definition
The main diagonal in the correlation matrix. Values reported could be any reliability (Cronbach's alphas, kappa coefficients, test-retest, split-half, etc.). The stepping-stone values to validity. |
|
|
Term
Sensitivity |
|
Definition
You want your measures to be sensitive enough to detect differences in the measure. The level of sensitivity is dependent on the goal of the research, since it is a value judgment. There is a continuum: diagnostic categories (present/absent) → a few categories (SES) → continuous measures (anxiety). |
|
|
Term
Sensitivity for IV vs. DV |
|
Definition
It is best for DV measures to be more sensitive (better for detecting differences); for IVs it depends on the type of analysis. It is typically more acceptable for an IV to have a low-sensitivity measure than a DV. |
|
|
Term
Skew/ceiling/floor effects |
|
Definition
These problems arise when scales or measures with little spread are used. |
|
|
Term
|
Definition
Audience is a determinant of the level of sensitivity for a measure. Certain measures may be sensitive enough for a high school senior class, but not for a group of PhD candidates. |
|
|
Term
Discriminability |
|
Definition
Level of discriminability should be high in order to detect differences in the results. Determines the threshold of difference required for significance. |
|
|
Term
|
Definition
Data analysis that looks at specific factors of a measure and analyses them for significance. E.g. Shyness measures may be significant while eye contact is not in a measure of introversion. |
|
|
Term
|
Definition
A data analysis test that produces a plot to show how many items should be removed to increase reliability of a measure (?) |
|
|
Term
Observational data collection |
|
Definition
Data collection process involving observation that can be used in most research designs and has differing levels of participant awareness. |
|
|
Term
Design vs. data collection |
|
Definition
Design is a broader concept, involving whether the research uses manipulations, etc. Data collection is the way in which a researcher goes about collecting data from participants, which isn't necessarily linked to design. |
|
|
Term
Reactivity vs. non-reactivity |
|
Definition
A non-reactive method of data collection is unobtrusive and subtle, not evident to the participant. A reactive method of data collection is obtrusive and fairly obvious to the participant. |
|
|
Term
Nonparticipant observation |
|
Definition
The observer has minimal interaction with participants. This could take a variety of forms; the observer could be hidden, a confederate/disguised, or use an unnoticed camera. Ethical concerns arise since there is no informed consent regarding the observer, etc. |
|
|
Term
Participant observation |
|
Definition
The observer joins the group being studied. E.g. joining a religious cult, etc. Potential problems: could be dangerous for the observer if they are found out, no informed consent, very reactive, hard to gain access, etc. |
|
|
Term
Visual observation |
|
Definition
The observation of overt behaviors that are visually noticeable. Could be physical characteristics, social interaction, portrayal of emotions, etc. |
|
|
Term
Contemporary observation vs. archival observation |
|
Definition
Contemporary observation is a data collection technique involving present, ongoing observation. Archival observation is a data collection technique that involves research of past behavior through historic records, legal records, etc. |
|
|
Term
Macro vs. micro observation |
|
Definition
Macro observation involves behaviors that are more obvious, whereas micro observation is harder to detect. |
|
|
Term
Auditory observation |
|
Definition
Involves vocalizations and auditory observations. Specifically investigates voice tone, choice of words, who is speaking, grammar, etc. |
|
|
Term
|
Definition
Things like sound, temporal components, interaction, stylistic features, etc. |
|
|
Term
Erosion measures |
|
Definition
The destruction or depletion of something which indicates past behavior. Used in archival observation. (Ex: looking at broken bindings in books as an indicator of how well used they are.) |
|
|
Term
Accretion measures |
|
Definition
The accumulation of something which indicates a past behavior. Used in archival research. Ex: graffiti, litter, etc. |
|
|
Term
Archival data |
|
Definition
Information collected for some other purpose that can be analysed for psychological research (e.g. government records, private records, mass media, etc) |
|
|
Term
Errors of omission |
|
Definition
Errors due to missing some data that would have been helpful, typically due to sensory limitations or quantitative limitations. |
|
|
Term
Magic number 7 ± 2 |
|
Definition
The number of objects that a human can hold in working memory is 7 plus or minus 2. |
|
|
Term
|
Definition
There are three stages of cognitive processing of information. Most information does not make it past stage one processing, which is largely unconscious, sensory-related, very brief, etc. Info then goes through a filtering/selector system, and stage two processing involves a more manageable amount of info that is more controlled and attentive. Mostly conscious, meaningful, involves working memory, and the info can be maintained longer. |
|
|
Term
Filtering/selector system |
|
Definition
Decreases the amount of info so that the info that is consciously processed is more meaningful and understandable. |
|
|
Term
|
Definition
The amount of information that can be processed in working memory. |
|
|
Term
Errors of commission |
|
Definition
Mistakes involving the claim of something being there that wasn’t. Could happen through coding errors, errors of inference, psychological processes, etc. |
|
|
Term
Errors of inference |
|
Definition
Involves interpretation of the thing being observed. Something could be coded as a smile when it was in fact a smirk. Solid operational definitions are important for minimizing errors of inference. |
|
|
Term
|
Definition
Some behaviors come with universally good or bad judgments. Part of the errors of commission. |
|
|
Term
Archival observational biases |
|
Definition
Original recording or documentation could have errors, definitions change over time, there could be irregular accretion or erosion that is not indicative of typical behavior, there could be contamination in the measure, or there could be errors in the links between the observation and the construct (construct validity issue). |
|
|
Term
|
Definition
Things like medical equipment, recordings, etc. These things expand the observable range and increase the level of detail and intensity. |
|
|
Term
Time sampling |
|
Definition
Rotating sampling after a fixed amount of time. |
|
|
Term
Behavior (event) sampling |
|
Definition
The rotation of things to observe, or having multiple observers assigned to observe different behaviors. |
|
|
Term
Individual sampling |
|
Definition
Helps avoid observer bias. Observer is asked to spend a certain amount of time watching a specific individual and is then asked to switch to a new individual. |
|
|
Term
|
Definition
A "full report" can sometimes be seen as a better depiction of behavior, though it is hard to get the full picture. Ex: Barker's book, "One Boy's Day". Complete coding typically leads to having far too much information and just makes it harder to understand the construct. |
|
|
Term
|
Definition
Option 1: massive recording that can be reduced later.
Option 2: Reduction on the spot – cannot record. Using original experts. |
|
|
Term
Observer/observer agreement |
|
Definition
Asks whether two observers' observations correlate with each other. The typical way to assess this is to have two independent observers and code the agreement between them. |
|
|
Term
Proportion of agreement |
|
Definition
The calculated proportion of how much the observations of two independent observers agree with each other. |
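Note: a minimal Python sketch of the calculation on hypothetical codes (this is raw agreement with no correction for chance; see the base rate and Cohen's kappa cards below):

```python
def proportion_agreement(codes_a, codes_b):
    """Proportion of observation intervals on which two independent
    observers assigned the same code."""
    assert len(codes_a) == len(codes_b)
    matches = sum(a == b for a, b in zip(codes_a, codes_b))
    return matches / len(codes_a)

# Hypothetical codings of 10 intervals ("S" = smile, "N" = neutral)
obs1 = ["S", "S", "N", "S", "N", "N", "S", "S", "N", "S"]
obs2 = ["S", "N", "N", "S", "N", "S", "S", "S", "N", "S"]
print(proportion_agreement(obs1, obs2))  # 0.8
```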
|
|
Term
Pearson r reliability index |
|
Definition
Correlations are calculated for observer/observer agreement, and these values serve as a reliability index of the observations. |
|
|
Term
|
Definition
To resolve observational disagreements, the ratings can be accepted and averaged, the subjects could be discarded, the cause of the disagreement could be explored (though this is hard), the judges could negotiate (if feasible), or a third judge could be added. |
|
|
Term
Self-report/observer agreement |
|
Definition
The degree to which participants' self-reports of their behavior correlate with observers' reports of the same behavior. |
|
|
Term
Advantages/disadvantages of observational vs self-report data collection |
|
Definition
-Disadvantages: observers affect participants and their behavior, observers threaten anonymity and there is no informed consent, there can be errors of commission and omission, operational definition may not be great – leading to disagreements of observations, etc.
-Advantages: Having an observer minimizes self-report biases. Distinguishes between attitude and traits, is less reactive, gives unique access to behavior and internal states via behavior, and can detect things that participants are unaware of. |
|
|
Term
Gosling et al. study |
|
Definition
Used MBA students as participants, asked them to be members of a 6-person committee. Observers recorded how many times they did 34 different behavioral acts (shouting, bringing the members back to order, etc.) and then asked participants to give self-reports of their own behavior. Observer/observer agreement was good; observer/self-report agreement was not so good, though there was a high correlation between observed socially desirable traits and self-reported socially desirable traits. |
|
|
Term
|
Definition
In the Gosling article, 34 different behaviors were coded in participants. |
|
|
Term
|
Definition
Had a low correlation with observer reports. |
|
|
Term
|
Definition
Observability, base rate, and desirability (extraversion traits). |
|
|
Term
Self-serving bias |
|
Definition
Self-serving bias is not universal (at least not in the Gosling article), but desirable acts were over-reported 57% and under-reported 24%. Self-serving bias is when desirable things are over-reported in order to make the self look more favorable. |
|
|
Term
Reactivity of observation and self-report |
|
Definition
Self-report is always reactive since participants typically figure out what is being assessed. Observers can be less reactive if they are disguised, hidden, etc. |
|
|
Term
|
Definition
A disadvantage of observational data collection. There is a danger that internal processes could be missed and that coverage of some important topics will be minimized, as well as a strong focus solely on the observable/behavioral. |
|
|
Term
Attitude/behavior discrepancy |
|
Definition
An advantage of observational data collection. Someone may subscribe to an attitude but not follow through behaviorally; for example, people may say that they are charitable but never give money. |
|
|
Term
|
Definition
Another advantage of observational data collection; self-report is typically retrospective and relies heavily on memory. Many times our memories are false or biased and observational studies won’t have this issue. |
|
|
Term
|
Definition
Sometimes self-reports are good at detecting internal things (since we have unique access to our own thoughts), but many times our insight into internal processes is poor, and behavioral measures can sometimes be better at distinguishing emotions, etc. (assuming that behavior reflects internal states). |
|
|
Term
Participant error vs. bias |
|
Definition
Error results from random momentary differences in mood, interpretation, etc., and bias is caused by consistent factors like social desirability, self-serving bias, narcissism, etc. |
|
|
Term
Experimenter error vs. bias |
|
Definition
Error results from random interactions with participants, friendliness, instructions, etc., whereas bias comes from consistent differences like experimenter expectations, overload, changing of standards, etc. |
|
|
Term
Measurement error vs. bias |
|
Definition
Error results from random slight differences in lighting, volume, tools, etc., whereas bias comes from consistent things like reactivity, leading questions, archival distortion, etc. |
|
|
Term
Situational error vs. bias |
|
Definition
Error results from random factors like distraction, time of day, etc., whereas bias comes from overload differences, consistent presence of others, etc. |
|
|
Term
Social desirability |
|
Definition
The desire to answer falsely in the attempt to make the self look more desirable. |
|
|
Term
|
Definition
Participants could be nervous about the experimenter or tools being used. |
|
|
Term
Corrective tactics for social desirability |
|
Definition
It is beneficial to disguise observer, balance question phrasing, give balanced options for forced choice questions, etc. It is wise to include a social desirability scale to assess how strongly social desirability may be skewing data, or use measures which have low correlations with social desirability or discard participants who have high SD scores. The statistical analysis ANCOVA can be used to assess how much social desirability affects the data and it produces a partial r that shows what the data would look like without social desirability. |
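Note: the partial r mentioned here is the standard first-order partial correlation; with x = predictor, y = outcome, and z = social desirability scores, it is:

```latex
r_{xy\cdot z} = \frac{r_{xy} - r_{xz}\,r_{yz}}
                     {\sqrt{(1 - r_{xz}^2)(1 - r_{yz}^2)}}
% the x-y correlation with the variance shared with z removed
```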
|
|
Term
Demand characteristics |
|
Definition
Participants will many times conform their behavior to what they think the experimenter wants. |
|
|
Term
Overt experimenter influences |
|
Definition
Participants tend to comply very willingly with overt requests of the experimenter, which can be problematic since maybe compliance is not what is looked for in the study. Ex: Orne study: participants continued to work on math problems even when they were told their work would be ripped up when they finished each page and the experimenter had left the room. Milgram obedience study. However, compliance is typically desired. |
|
|
Term
Other influences on results |
|
Definition
Participants typically come into a study with expectations, like that they will be "tricked," or they try to form their own personal hypothesis, which may cause them to act in ways other than they normally would. Ex: when some participants were legitimately hypnotized, while others were simply asked to fake hypnosis, observers could not tell who was in which condition. |
|
|
Term
Demand characteristics vs. Social desirability |
|
Definition
They can compound each other (really bad) or they can work against each other (preferable). |
|
|
Term
Response sets |
|
Definition
Some participants are more prone to answer in the extreme, neutrally, etc., and they do this regardless of the question, so there is a smaller spread of answers. |
|
|
Term
Acquiescence (yea-saying) |
|
Definition
When participants show the tendency to agree regardless of the question content. |
|
|
Term
Nay-saying |
|
Definition
When participants show the tendency to disagree regardless of the question content. |
|
|
Term
Validity diagonal |
|
Definition
A cross between different methods (observation and self-report, etc.) that shows how much overlap there is between the two assessments. We want these values to be high, which shows convergence. Low reliability will cause low validity. |
|
|
Term
Rotation |
|
Definition
A step in factor analysis. The intent is to better align the axes with the factor loadings. |
|
|
Term
Factor loadings |
|
Definition
Indicate the degree to which an individual item contributes to a given factor. Useful in interpretation of the factor. Items with higher factor loadings are most related to the factor. |
|
|
Term
Known-groups validity |
|
Definition
Looking at existing diagnostic classifications and how people within those categories respond differently. Similar to concurrent or convergent validity. |
|
|
Term
Triangulation of operational definitions |
|
Definition
You have two operational definitions of the same construct. Shows how well they are related, how much convergence there is, etc. |
|
|
Term
Triangulation among constructs |
|
Definition
Can have multiple constructs and see how much the different constructs correlate. |
|
|
Term
Base rate problem |
|
Definition
When calculating reliability between two judges by simply calculating the proportion of agreement, agreement can be inflated by unequal base rates; judges are more likely to agree by chance when a base rate is extreme. Base rate refers to an inequality in the distribution of choices (marginal values). Inequality in dichotomous judgments can lead to the base rate problem. |
|
|
Term
Cohen's kappa |
|
Definition
Has a built-in correction for unequal base rates. |
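Note: the standard formula, with a worked example using illustrative numbers (P_o = observed proportion of agreement, P_e = agreement expected by chance from the marginal base rates):

```latex
\kappa = \frac{P_o - P_e}{1 - P_e}, \qquad
\text{e.g. } P_o = .80,\; P_e = .50:\;
\kappa = \frac{.80 - .50}{1 - .50} = .60
```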
|
|
Term
Generalizing to trait judgments |
|
Definition
Most trait judgments are done through self-report. Does that translate to observable behavior? |
|
|