Threats to Validity


It is extremely difficult, if not impossible, to control for all of the possible complications and events which may affect subjects participating in a research project. However, responsible, ethical methodology dictates that all threats to the validity of a study must be considered and accounted for throughout the design process. While a researcher is expected to account for all of the possible influences which may have altered the results, some are reluctant to do so, fearing that such acknowledgment may discredit their work. Life care planners must be aware of these threats so that we may independently evaluate the efficacy of research studies and draw alternative conclusions from the data where appropriate.

The validity of a research design is evaluated in two ways: the internal validity of the study and the external validity of the study.

Internal Validity

Internal validity evaluates the extent to which extraneous factors, rather than the treatment, may have produced the outcomes of the study. When a researcher designs the methodology to be employed throughout the project, careful consideration is given to all factors which may exert an unintended effect and cause subjects to respond differently than they would have otherwise. Internal validity seeks to answer the question:

Was the treatment responsible for the results of the study or was it something else?

1. Sample Selection: Consider the fundamental differences between the control group and the treatment group, or between the subjects who are being compared. There may have been significant differences between these groups from the conception of the project.

How were these groups selected? If subjects were randomly selected and randomly assigned to groups, the threat is decreased. A researcher may administer a pre-test to all subjects then compare the responses of the groups to ensure that they are similar before introducing the treatment or intervention.

2. History: Events may occur during the course of a study that impact the responses of the subjects. For example, a national news story which is closely related to the topic of the study, or a natural disaster occurring in an area where many of the subjects reside may influence their responses to the treatment or intervention.

What did the subjects experience during the course of the study? A researcher may ask all participants to complete the study in an isolated setting or to keep a diary detailing the events of their lives throughout the project. If the subjects were randomly assigned to groups, theoretically, extraneous factors will influence the groups equally.

3. Mortality: Some of the subjects may drop out of the project, move, or be unreachable for follow-up evaluation. This may present difficulty for the researcher if mortality affects groups at different rates. For example, in a two-group study of 50 participants each, if 15 drop out of Group A and only two drop out of Group B the groups may no longer be suitable for comparison. A researcher must try to identify the cause for attrition.

How many subjects dropped out of the study? A researcher may attempt to re-establish contact with subjects or rely upon statistical procedures designed to account for missing data (Campbell & Stanley, 1966).
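The attrition example above comes down to simple arithmetic. The following is a minimal sketch in Python that works through the figures given in the text (two groups of 50, with 15 and 2 dropouts). The function name `attrition_rate` is chosen here for illustration only and is not part of any cited method.

```python
def attrition_rate(enrolled: int, dropped: int) -> float:
    """Return the proportion of enrolled subjects who left the study."""
    if enrolled <= 0:
        raise ValueError("enrolled must be a positive number")
    if not 0 <= dropped <= enrolled:
        raise ValueError("dropped must be between 0 and enrolled")
    return dropped / enrolled


# Figures taken from the two-group example in the text.
ENROLLED_PER_GROUP = 50
rate_a = attrition_rate(ENROLLED_PER_GROUP, 15)  # Group A
rate_b = attrition_rate(ENROLLED_PER_GROUP, 2)   # Group B
remaining_a = ENROLLED_PER_GROUP - 15
remaining_b = ENROLLED_PER_GROUP - 2

print(f"Group A: {rate_a:.0%} dropped, {remaining_a} remain")
print(f"Group B: {rate_b:.0%} dropped, {remaining_b} remain")
print(f"Difference in attrition: {rate_a - rate_b:.0%}")
```

Group A loses 30% of its subjects while Group B loses 4%, which leaves 35 and 48 subjects respectively. A gap of this size is the kind of differential attrition that should prompt a reader to ask why subjects left.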

4. Location: Consider the location and circumstances under which the first set of data was gathered as compared to the location and circumstances under which the second set of data was gathered. If the situations were different, the setting or circumstances under which the data were collected (rather than the treatment or intervention) may have influenced the responses of subjects.

What were the circumstances under which all sets of data were collected? A researcher should consider the quality of testing environments, similarity among sites, and describe the general testing circumstance.

5. Instrumentation: Changes in the calibration of the testing instruments or equipment, or changes in observers or scorers may affect the data. Life care planners should be aware of the conditions under which measurement instruments were normed, how they should be administered, and the purpose for which they were developed. There is also the possibility of scorer bias, whether conscious or unconscious.

Were the measurement instruments correctly used? A researcher may randomly assign scorers to participant groups, or employ blind or double-blind data collection techniques. A researcher may train and then pre-test scorers so that all are clear as to what is/is not to be tabulated and how the scores should/should not be derived. These pretests can be analyzed for inter-rater reliability and intra-rater reliability before scorers are given the responsibility of data collection.
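The text does not name a particular statistic for inter-rater reliability. One commonly used measure for two scorers assigning categories is Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The sketch below assumes that measure; the function name and the ratings are invented for illustration and do not come from any study.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters scoring the same items with categories."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must score the same non-empty set of items")
    n = len(rater_a)

    # Proportion of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Agreement expected by chance, from each rater's category frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)

    if expected == 1.0:
        # Both raters used a single identical category for every item.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical pretest: two scorers code ten observed behaviors.
scorer_1 = ["yes", "yes", "yes", "yes", "yes", "no", "no", "no", "no", "no"]
scorer_2 = ["yes", "yes", "yes", "yes", "no", "no", "no", "no", "no", "yes"]

kappa = cohens_kappa(scorer_1, scorer_2)
print(f"Cohen's kappa: {kappa:.2f}")
```

In this made-up pretest the scorers agree on 8 of 10 items, but half of that agreement would be expected by chance alone, so kappa is lower than the raw agreement rate. The same function applied to one scorer's ratings on two occasions gives a rough check of intra-rater reliability.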

6. Testing: The “practice effect” of pretesting may influence the outcome of posttests, particularly when the contents of these assessments are closely related. In addition, the contents of a pretest may make subjects more sensitive or responsive to the treatment or intervention.

What effect might the pretest have exerted upon the results of the posttest? A researcher may choose not to administer a pretest. Theoretically, and if the sample size is large enough, this threat should equally affect all groups if the subjects were randomly assigned.

7. Maturation: Particularly in a longitudinal study, changes over the course of the study may be attributable to the effects of time, rather than the intervention or treatment. For example, first graders may respond to project assessments much differently at the end of the school year simply due to maturation effects, rather than the intervention applied over several months. In another example, improvements in cognitive functioning may be a result of natural, biological processes rather than the rehabilitation program instituted by therapists.

Were the effects due to the intervention or to maturation? A researcher may select subjects who are relatively mature or exhibit stability on measures of interest. Also, the duration of the experiment may be limited to control for the effects of maturation, fatigue, or physical changes. Theoretically, if subjects were randomly assigned to groups, this threat should affect all groups equally.

8. Attitude of Subjects: The approach and mindset of study participants can affect the outcome of the project. For example, subjects may put forth exceptional effort because they know their performance will be evaluated. Or, subjects may feel insulted by the group to which they were assigned, particularly if the groups are being treated differently beyond the administration of the independent variable. When evaluating research, life care planners should consider whether results were affected by the experience of subjects in the experimental condition or whether results reflect only the influence of the treatment or intervention.

Do the results reflect the subjects’ reaction to the experimental condition or the treatment? A researcher should make a conscious effort to treat all groups the same, aside from the administration of the treatment. Unobtrusive measures may be selected so that scorers are able to observe subjects’ behavior without disrupting the natural circumstances of the environment or calling attention to their task.

9. Implementation: This threat occurs when implementers of the treatment or intervention use different methods in instructing or implementing the independent variable. An implementer may like one intervention better than the others and do a better job of implementing it. For example, if a study was designed to examine the effects of a new teaching method, an implementer who preferred the traditional method may not teach the experimental method as well.

Could the implementer have influenced the results of the study? A researcher may randomly assign implementers to groups (when possible), monitor the administration of the trials, or use the same implementer for all groups.

10. Regression: Groups selected because of unusually high or low scores on pretests (or similar measures) will tend to score closer to the mean on subsequent assessments (Ary, Jacobs, & Razavieh, 1996). This threat occurs when groups are selected on the basis of scores that are not representative of their true performance. For example, a researcher tests all patients in a rehabilitation facility with the same level and type of injury on measures of psychological adjustment. The lowest scorers (i.e., those who show the most significant psychological difficulties in adjusting to their disability) are selected to participate in a six-week intervention program. At the end of the program, all subjects are re-tested, the scores are compared, and the scores of the experimental group are found to have improved.

Actually, two extraneous variables may have influenced the results of this study. First, most patients will experience greater ease in psychological adjustment over time, particularly if counseling support is available in a rehabilitation setting such as the one referenced in this example. Second, there is a tendency for extreme scores to move closer to the mean on subsequent measures.

Is movement in scores over time due to the effects of the intervention or to regression to the mean? A researcher may attempt to control for this threat by excluding subjects with extreme scores from participation in the study or by randomly assigning individuals to groups (theoretically, regression to the mean should occur equally in both groups). By analyzing the raw data for aberrant scores which make extreme moves, a researcher may conclude that this effect is not typical, but a result of measurement error.
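Regression to the mean can be demonstrated with a small simulation in which no intervention is applied at all. The sketch below is illustrative only: the score scale (mean 50, standard deviation 10), the amount of measurement error, and the function name are assumptions made for the example, not values from any study.

```python
import random
import statistics


def simulate_regression_to_mean(n_subjects=10_000, fraction_selected=0.10, seed=0):
    """Select the lowest pretest scorers and compare their pretest and retest means.

    Each observed score is a stable true score plus independent measurement
    error. No treatment is applied between the two testing occasions.
    """
    rng = random.Random(seed)
    true_scores = [rng.gauss(50, 10) for _ in range(n_subjects)]
    pretest = [t + rng.gauss(0, 10) for t in true_scores]
    posttest = [t + rng.gauss(0, 10) for t in true_scores]

    # Pick the subjects with the lowest observed pretest scores.
    n_selected = int(n_subjects * fraction_selected)
    lowest = sorted(range(n_subjects), key=lambda i: pretest[i])[:n_selected]

    pre_mean = statistics.mean(pretest[i] for i in lowest)
    post_mean = statistics.mean(posttest[i] for i in lowest)
    return pre_mean, post_mean


pre_mean, post_mean = simulate_regression_to_mean()
print(f"Selected group, pretest mean:  {pre_mean:.1f}")
print(f"Selected group, posttest mean: {post_mean:.1f}")
```

The selected group's retest mean moves back toward the overall mean of 50 even though nothing was done to the subjects. A study without a randomly assigned comparison group could mistake this shift for a treatment effect.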

11. Statistical Conclusion Validity: This threat occurs when analytical errors are made and these produce invalid results. Numerous problems can corrupt the analysis, such as unreliable measurement instruments, violations of the assumptions of the statistical tests used, or even selection of the wrong statistic for data analysis. Sample size is important to consider, particularly if very few or a large number of subjects were used in the study. Statistical analysis may produce misleading results by being over-sensitive (if the sample size is large) or under-sensitive (if the sample size is small) to differences attributed to the treatment. In other words, when sample sizes are very large, statistical analysis may flag differences as significant even when they are too small to matter in practice. When sample sizes are very small, statistical analysis may not be sensitive enough to detect the differences that exist; so, even though the treatment did have an effect, it is not recognized (Ary, Jacobs, & Razavieh, 1996).

Are the results based on what truly occurred throughout the study, or are they due to statistical errors? A researcher often consults with statisticians during the course of the study to ensure that analytical errors are prevented. Life care planners should be familiar enough with basic statistical analysis to determine whether the conclusions reached by the researcher are plausible.
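The dependence of a significance test on sample size can be illustrated with a short calculation. The sketch below assumes a two-sided, two-sample z-test with a known common standard deviation and equal group sizes, which is a simplification chosen for the example rather than a method named in the text. It also treats the observed difference as fixed. All numbers are hypothetical.

```python
import math


def two_sample_z_p_value(mean_difference, sd, n_per_group):
    """Two-sided p-value for a difference in means, assuming a known common SD."""
    if sd <= 0 or n_per_group <= 0:
        raise ValueError("sd and n_per_group must be positive")
    standard_error = sd * math.sqrt(2 / n_per_group)
    z = mean_difference / standard_error
    # For a standard normal Z, P(|Z| > |z|) equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


# A tiny difference (1 point on a scale with SD 10) at two sample sizes.
p_tiny_small_n = two_sample_z_p_value(mean_difference=1, sd=10, n_per_group=20)
p_tiny_large_n = two_sample_z_p_value(mean_difference=1, sd=10, n_per_group=20_000)

# A moderate difference (5 points, half an SD) with very few subjects.
p_moderate_small_n = two_sample_z_p_value(mean_difference=5, sd=10, n_per_group=10)

print(f"1-point difference, 20 per group:     p = {p_tiny_small_n:.3f}")
print(f"1-point difference, 20,000 per group: p = {p_tiny_large_n:.3g}")
print(f"5-point difference, 10 per group:     p = {p_moderate_small_n:.3f}")
```

Under these assumptions the same one-point difference is far from significant with 20 subjects per group and overwhelmingly significant with 20,000, while a much larger five-point difference fails to reach the conventional 0.05 level with only 10 per group. A reader should therefore weigh the size of a reported effect alongside its p-value.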


External Validity

External validity refers to the extent to which the results of a study generalize to the target population and/or other groups of individuals. In evaluating the external validity of a study, life care planners must consider how well the sample, administration of the treatment, and all related factors match the “real world” experience of those with whom we work. Threats to external validity include the following:

1. Influence of Testing: If all of the participants in the study were pretested, it may or may not be possible to generalize the findings to others. In real world applications, it may not be possible to pretest patients, so to what degree can the results of the study be generalized to others? And, to whom can the results be generalized?

In order to control for this threat, a researcher may not pretest subjects or may use a research design such as a “Solomon Four-Group design.” This procedure randomly assigns subjects to each of four groups; two are pretested, two are not. One non-pretested group and one pretested group receive treatment and all four groups are posttested (Campbell & Stanley, 1966). This design allows the researcher to analyze the effects of pretesting.
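The assignment step of the Solomon Four-Group design described above can be sketched as follows. This is a minimal illustration, assuming subjects are identified by simple IDs and dealt out to the four groups after shuffling; the function name and the dictionary layout are hypothetical.

```python
import random


def solomon_four_group(subject_ids, seed=None):
    """Randomly assign subjects to the four groups of a Solomon design.

    Two groups are pretested and two are not. One group of each kind
    receives the treatment, and all four groups are posttested.
    """
    rng = random.Random(seed)
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)

    groups = [
        {"pretest": True, "treatment": True, "posttest": True, "subjects": []},
        {"pretest": True, "treatment": False, "posttest": True, "subjects": []},
        {"pretest": False, "treatment": True, "posttest": True, "subjects": []},
        {"pretest": False, "treatment": False, "posttest": True, "subjects": []},
    ]

    # Deal shuffled subjects out in turn so group sizes differ by at most one.
    for index, subject in enumerate(shuffled):
        groups[index % 4]["subjects"].append(subject)
    return groups


groups = solomon_four_group(range(1, 41), seed=1)
for group in groups:
    print(
        f"pretest={group['pretest']!s:<5} treatment={group['treatment']!s:<5} "
        f"n={len(group['subjects'])}"
    )
```

Comparing posttest results between the pretested and non-pretested groups that received the same condition is what lets the researcher estimate the effect of pretesting itself.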

2. Influence of Selection: If subjects self-select or volunteer to participate in the study, an unusual sample may result, one which is not representative of the target population. Consider whether the data reflect the effects of the treatment or the desire of participants to cooperate with and please the researcher.

In order to control for this threat, a researcher may randomly select subjects from the target population, if possible. In clinical studies, this option is not often available. Therefore, many researchers utilize the most appropriate design available to them, publish the results, and call upon others to replicate the study to further corroborate the findings.

3. Reactive Effects: The fact that subjects realize that they are participating in a study may affect the results and limit the degree to which results can be generalized. The observed effects may be due to the fact that subjects are, consciously or unconsciously, “performing” in a way which is inconsistent with their typical behavior. Subjects may simply be reacting to the novelty and experience of participation/observation. This is similar to what occurs when subjects given placebo drugs in pharmaceutical trials exhibit improvement in measured symptoms, even though no active treatment was administered.

This is a difficult threat for researchers to control, particularly when issues of informed consent, human subjects restrictions, and ethical responsibilities limit the degree to which covert observation/experimentation can occur. There are statistical techniques which may be useful in detecting “false” behaviors by comparing early subject-specific data to later subject-specific data. In many cases the reactive effect of participation decreases and subjects resume typical behaviors, but this is difficult to control for within a large sample. Results simply have to be replicated over a variety of conditions.

4. Multiple Treatment Interference: Subjects who have participated in other studies, particularly ones of similar design or treatment, may be performing as a function of previous participation experiences, rather than as a function of the treatment. Previous treatment effects cannot be eradicated. Also, if subjects were exposed to multiple treatments throughout the course of the study, the accumulated effects of repeated testing may influence results. It may be problematic to generalize findings from this type of study when all other members of the target population (i.e., those who were not involved in the study) are not similarly exposed to multiple treatments.

A researcher may minimize the effects of multiple treatment interference by choosing the most appropriate research design to control for this threat, calling for replication, and considering the most appropriate sample selection process (Campbell & Stanley, 1966).

5. Interaction of Time and Treatment Effects: Results from a study may not be appropriate for generalization to the target population or other groups if they cannot be sustained over time. While initial posttests may indicate an improvement in a specific measure, the effects of treatment may decrease over time. For example, subjects may lose weight by participating in an experimental intervention program; therefore, the researcher concludes that the intervention was successful. If, after six months, most participants have regained what they had lost, is the intervention still considered to be successful? The answer depends upon how the results were reported and how the data were used in relation to the target population and other groups of individuals.

A researcher may administer posttests for a specified length of time in order to ascertain the extent to which permanent behavior change occurred in subjects as a result of the intervention.

6. Posttest Sensitization: The administration of a posttest to subjects may actually provide a means of solidifying, clarifying, and facilitating the acquisition of concepts instructed through the applied intervention (Ary, Jacobs, & Razavieh, 1996). In other words, by completing a posttest, subjects are given an opportunity to reflect upon their experience and engage in problem-solving, which may serve as an extension of the intervention. Life care planners must consider whether the administration of a posttest may influence results and whether conclusions may be reasonably applied to the target population.

After considering all of the potential threats to the validity of research, one may wonder how this literature can be relied upon to inform any life care planning decisions. This discussion was presented in order to raise the awareness of those consulting research and to caution professionals against unquestioningly accepting the conclusions asserted by authors.

Even in disciplines beyond your field of professional practice, you must be able to critique the design, implementation, and results reported by the researcher. Scientific inquiry encourages consumers of research to evaluate the relevance and accuracy of what is purported to be “known” within a discipline. By becoming familiar with research design and statistics, life care planners will be in a much better position to identify the information most beneficial to the specialty.

