Threats to Validity
It is extremely difficult, if not impossible, to control
for all of the possible complications and events which may
affect subjects participating in a research project. However,
responsible, ethical methodology dictates that all threats
to the validity of a study must be considered and accounted
for throughout the design process. While a researcher is
expected to account for all of the possible influences which
may have altered the results, some are reluctant to do so,
fearing that such acknowledgment may discredit their
work. Life care planners must be aware of these threats so
that we may independently evaluate the efficacy of research
studies and draw alternative conclusions from the data where
appropriate.
The validity of a research design is evaluated in two ways:
the internal validity of the study and the external validity
of the study.
Internal validity evaluates the extent to which extraneous
factors, rather than the treatment, may have produced the
outcomes of the study. When a researcher designs the methodology
to be employed throughout the project, careful consideration
is given to all factors which may exert an unintended effect
and cause subjects to respond differently than they would
have otherwise. Internal validity seeks to answer the question:
Was the treatment responsible for the results of
the study or was it something else?
1. Sample Selection: Consider the fundamental
differences between the control group and the treatment group,
or between the subjects who are being compared. There may
have been significant differences between these groups from
the conception of the project.
How were these groups selected? If subjects were randomly
selected and randomly assigned to groups, the threat is decreased.
A researcher may administer a pretest to all subjects, then
compare the responses of the groups to ensure that they are
similar before introducing the treatment or intervention.
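
The random assignment and pretest comparison described for this threat can be sketched in Python. This is a minimal illustration rather than a prescribed procedure: the subject numbers, the pretest scores, and the function names `randomly_assign` and `standardized_difference` are hypothetical, and a standardized mean difference is only one of several ways to check that groups are similar before treatment.

```python
import random
from statistics import mean, stdev

def randomly_assign(subject_ids, seed=None):
    """Shuffle subjects and split them into two groups of (nearly) equal size."""
    rng = random.Random(seed)
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

def standardized_difference(scores_a, scores_b):
    """Difference in group means divided by the pooled standard deviation."""
    n_a, n_b = len(scores_a), len(scores_b)
    pooled_variance = ((n_a - 1) * stdev(scores_a) ** 2 +
                       (n_b - 1) * stdev(scores_b) ** 2) / (n_a + n_b - 2)
    return (mean(scores_a) - mean(scores_b)) / pooled_variance ** 0.5

# Hypothetical pretest scores for 40 subjects (made-up data for illustration).
data_rng = random.Random(1)
pretest = {subject: data_rng.gauss(50, 10) for subject in range(40)}

treatment, control = randomly_assign(pretest, seed=2)
gap = standardized_difference([pretest[s] for s in treatment],
                              [pretest[s] for s in control])
print(f"Standardized pretest difference between groups: {gap:.2f}")
```

A value near zero suggests the groups started out similar on the pretest; a large value would be a reason to question whether later differences are due to the treatment.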
2. History: Events may occur during the
course of a study that impact the responses of the subjects.
For example, a national news story which is closely related
to the topic of the study, or a natural disaster occurring
in an area where many of the subjects reside may influence
their responses to the treatment or intervention.
What did the subjects experience during the course of the
study? A researcher may ask all participants to complete
the study in an isolated setting or to keep a diary detailing
the events of their lives throughout the project. If the subjects
were randomly assigned to groups, theoretically, extraneous
factors will influence the groups equally.
3. Mortality: Some of the subjects may
drop out of the project, move, or be unreachable for follow-up
evaluation. This may present difficulty for the researcher
if mortality affects groups at different rates. For example,
in a two-group study of 50 participants each, if 15 drop
out of Group A and only two drop out of Group B, the groups
may no longer be suitable for comparison. A researcher must
try to identify the cause of attrition.
How many subjects dropped out of the study? A researcher
may attempt to re-establish contact with subjects or rely
upon statistical procedures designed to account for missing
data (Campbell & Stanley, 1966).
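
The two-group example above (15 of 50 dropping out of Group A, two of 50 dropping out of Group B) can be checked with a short calculation. The sketch below computes each group's attrition rate and applies a two-sided Fisher's exact test, written out with the standard library. The choice of test and the function names are assumptions made for illustration; the source does not prescribe a particular procedure.

```python
from math import comb

def attrition_rate(dropped, enrolled):
    """Proportion of enrolled subjects who left the study."""
    return dropped / enrolled

def fisher_exact_two_sided(dropped_a, stayed_a, dropped_b, stayed_b):
    """Two-sided Fisher's exact test for a 2x2 dropout table."""
    n_a = dropped_a + stayed_a
    n_b = dropped_b + stayed_b
    total_dropped = dropped_a + dropped_b
    denominator = comb(n_a + n_b, total_dropped)

    def probability(x):
        # Chance that exactly x of the dropouts fall in Group A
        # if dropout were unrelated to group membership.
        return comb(n_a, x) * comb(n_b, total_dropped - x) / denominator

    observed = probability(dropped_a)
    lowest = max(0, total_dropped - n_b)
    highest = min(n_a, total_dropped)
    # Sum the probabilities of all tables at least as unlikely as the observed one.
    return sum(p for p in map(probability, range(lowest, highest + 1))
               if p <= observed * (1 + 1e-9))

rate_a = attrition_rate(15, 50)
rate_b = attrition_rate(2, 50)
p_value = fisher_exact_two_sided(15, 35, 2, 48)

print(f"Group A attrition: {rate_a:.0%}, Group B attrition: {rate_b:.0%}")
print(f"Two-sided p-value for the difference in dropout: {p_value:.4f}")
```

A small p-value indicates that the dropout rates differ by more than chance would readily explain, which is a signal to investigate why subjects left rather than proof that the comparison is invalid.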
4. Location: Consider the location and
circumstances under which the first set of data was gathered
as compared to the location or circumstances under which the
second set of data was gathered. If the situations were
different, the setting or circumstances under which data were
collected may have influenced the responses of subjects (rather
than the treatment or intervention).
What were the circumstances under which all sets of data
were collected? A researcher should consider the quality
of testing environments and the similarity among sites, and describe
the general testing circumstances.
5. Instrumentation: Changes in the calibration
of the testing instruments or equipment, or changes in observers
or scorers may affect the data. Life care planners should
be aware of the conditions under which measurement instruments
were normed, how they should be administered, and the purpose
for which they were developed. There is also the possibility
of scorer bias, whether conscious or unconscious.
Were the measurement instruments correctly used? A researcher
may randomly assign scorers to participant groups, or employ
blind or double-blind data collection techniques. A researcher
may train and then pretest scorers so that all are clear
as to what is and is not to be tabulated and how the scores should
and should not be derived. These pretests can be analyzed for inter-rater
reliability and intra-rater reliability before scorers are
given the responsibility of data collection.
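
One common way to quantify inter-rater reliability on categorical codes is Cohen's kappa, which compares the agreement two scorers actually reach with the agreement expected by chance. The sketch below is an illustration under stated assumptions: kappa is a choice made here and is not named in the source, and the scorer data and the function name `cohens_kappa` are hypothetical.

```python
from collections import Counter

def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two scorers who coded the same items."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("Both scorers must rate the same non-empty set of items.")
    n = len(ratings_a)
    # Proportion of items on which the two scorers gave the same code.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance, from each scorer's own code frequencies.
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    expected = sum(counts_a[code] * counts_b[code] for code in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both scorers used one identical code for every item
    return (observed - expected) / (1 - expected)

# Hypothetical pretest codes from two trained scorers (made-up data).
scorer_1 = ["present", "present", "absent", "absent", "present", "absent"]
scorer_2 = ["present", "absent", "absent", "absent", "present", "absent"]
print(f"Cohen's kappa: {cohens_kappa(scorer_1, scorer_2):.2f}")
```

A kappa of 1 means perfect agreement and a kappa near 0 means agreement no better than chance. The same function can be applied to one scorer's codes on two occasions as a rough check of intra-rater reliability.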
6. Testing: The “practice effect” of
pretesting may influence the outcome of posttests, particularly
when the contents of these assessments are closely related.
In addition, the contents of a pretest may make subjects
more sensitive or responsive to the treatment or intervention.
What effect might the pretest have exerted upon the results
of the posttest? A researcher may choose not to administer
a pretest. Theoretically, if the sample size is large
enough, this threat should affect all groups equally if the
subjects were randomly assigned.
7. Maturation: Particularly in a longitudinal
study, changes over the course of the study may be attributable
to the effects of time, rather than the intervention or treatment.
For example, first graders may respond to project assessments
much differently at the end of the school year simply due
to maturation effects, rather than the intervention applied
over several months. In another example, improvements in
cognitive functioning may be a result of natural, biological
processes rather than the rehabilitation program instituted.
Were the effects due to the intervention or to maturation?
A researcher may select subjects who are relatively mature
or exhibit stability on measures of interest. Also, the duration
of the experiment may be limited to control for the effects
of maturation, fatigue, or physical changes. Theoretically,
if subjects were randomly assigned to groups, this threat should affect
all groups equally.
8. Attitude of Subjects: The approach
and mindset of study participants can affect the outcome
of the project. For example, subjects may put forth exceptional
effort because they know their performance will be evaluated.
Alternatively, subjects may feel insulted by how they perceive
the group to which they were assigned, particularly if the
groups are being treated differently beyond the administration
of the independent variable. When evaluating research, life
care planners should consider whether results were affected
by the experience of subjects in the experimental condition
or whether results reflect only the influence of the treatment.
Do the results reflect the subjects’ reaction to the
experimental condition or the treatment? A researcher should
make a conscious effort to treat all groups the same, aside
from the administration of the treatment. Unobtrusive measures
may be selected so that scorers are able to observe subjects’ behavior
without disrupting the natural circumstances of the environment
or calling attention to their task.
9. Implementation: This threat occurs
when implementers of the treatment or intervention use different
methods in instructing or implementing the independent variable.
An implementer may like one intervention better than the
others and do a better job of implementing it. For example,
if a study was designed to examine the effects of a new teaching
method, an implementer who preferred the traditional method
may not teach the experimental method as well.
Could the implementer have influenced the results of the
study? A researcher may randomly assign implementers to groups
(when possible), monitor the administration of the trials,
or use the same implementer for all groups.
10. Regression: Groups selected because
of unusually high or low scores on pretests (or similar measures)
will tend to score closer to the mean on subsequent assessments
(Ary, Jacobs, & Razavieh, 1996). This threat occurs when
groups are selected on the basis of scores that are not representative
of their true performance. For example, a researcher tests
all patients in a rehabilitation facility with the same level
and type of injury on measures of psychological adjustment.
The lowest scorers (i.e., those who show the most significant psychological
difficulties in adjusting to their disability) are selected
to participate in a six-week intervention program. At the
end of the program, all subjects are re-tested, the scores
are compared, and the scores of the experimental group are found to have improved.
Actually, two extraneous variables may have influenced the
results of this study. First, most patients will experience
greater ease in psychological adjustment over time, particularly
if counseling support is available in a rehabilitation setting
such as the one referenced in this example. Second, there
is a tendency for extreme scores to move closer to the mean
on subsequent measures.
Is movement in scores over time due to the effects of the
intervention or to regression to the mean? A researcher may
attempt to control for this threat by excluding subjects with extreme
scores from participation in the study or by randomly assigning
individuals to groups (theoretically, regression to the mean
should occur equally in both groups). By analyzing the raw
data for aberrant scores which make extreme moves, a researcher
may conclude that this effect is not typical, but a result
of measurement error.
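
Regression to the mean can be demonstrated with a small simulation in which no intervention is applied at all. The sketch below assumes each observed score is a stable "true" score plus independent measurement error; the score scale, the 10% selection cutoff, and the function name `simulate_regression_to_mean` are hypothetical choices made only for illustration.

```python
import random
from statistics import mean

def simulate_regression_to_mean(n_subjects=10_000, fraction_selected=0.10, seed=0):
    """Return (pretest mean, retest mean) for the lowest-scoring subjects.

    No treatment is simulated: any change between the two means comes
    only from measurement error in the scores used for selection.
    """
    rng = random.Random(seed)
    true_scores = [rng.gauss(50, 10) for _ in range(n_subjects)]
    # Each testing occasion adds its own independent measurement error.
    pretest = [score + rng.gauss(0, 10) for score in true_scores]
    retest = [score + rng.gauss(0, 10) for score in true_scores]
    # Select the subjects with the lowest pretest scores.
    n_selected = int(n_subjects * fraction_selected)
    lowest = sorted(range(n_subjects), key=lambda i: pretest[i])[:n_selected]
    return (mean(pretest[i] for i in lowest),
            mean(retest[i] for i in lowest))

pretest_mean, retest_mean = simulate_regression_to_mean()
print(f"Selected group, pretest mean: {pretest_mean:.1f}")
print(f"Selected group, retest mean:  {retest_mean:.1f}")
```

The selected group's retest mean sits closer to the overall mean of 50 than its pretest mean did, even though nothing was done to the subjects. An untreated comparison group selected the same way would show the same shift, which is why random assignment helps separate this effect from a real treatment effect.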
11. Statistical Conclusion Validity: This
threat occurs when analytical errors are made and these produce
invalid results. Numerous statistical problems can
undermine the results, such as unreliable measurement
instruments, violations of the assumptions of the statistical
tests used, or even selecting the wrong statistic for data
analysis. Sample size is important to consider, particularly
if very few or a large number of subjects were used in the study.
Statistical analysis may produce misleading results by being
over-sensitive (if the sample size is large) or under-sensitive
(if the sample size is small) to differences attributed to
the treatment. In other words, when sample sizes are very
large, statistical analysis may flag differences as significant
even when they are too small to matter in practice. When sample
sizes are very small, statistical
analysis may not be sensitive enough to detect the differences
that exist; so, even though the treatment did have an effect,
it is not recognized (Ary, Jacobs, & Razavieh, 1996).
Are the results based on what truly occurred throughout
the study, or are they due to statistical errors? A researcher
often consults with statisticians during the course of the
study to help ensure that analytical errors are avoided.
Life care planners should be familiar enough with basic statistical
analysis to determine whether the conclusions reached by
the researcher are plausible.
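
The sensitivity of a significance test to sample size can be illustrated with a short calculation. The sketch below uses a two-sample z-test, which assumes the standard deviation is known and equal in both groups; that simplification, the score scale, and the function name `two_sample_z_p_value` are assumptions made for illustration, and a real analysis would usually call for a different test.

```python
from math import erfc, sqrt

def two_sample_z_p_value(mean_difference, sd, n_per_group):
    """Two-sided p-value for a difference between two group means.

    Assumes equal group sizes and a known, shared standard deviation.
    """
    standard_error = sd * sqrt(2 / n_per_group)
    z = mean_difference / standard_error
    # For a standard normal variable, P(|Z| > |z|) equals erfc(|z| / sqrt(2)).
    return erfc(abs(z) / sqrt(2))

# A trivial half-point difference on a scale with SD 10, very large groups.
p_trivial_large_sample = two_sample_z_p_value(0.5, sd=10, n_per_group=20_000)

# A substantial five-point difference on the same scale, very small groups.
p_substantial_small_sample = two_sample_z_p_value(5, sd=10, n_per_group=5)

print(f"Tiny difference, 20,000 per group: p = {p_trivial_large_sample:.2g}")
print(f"Large difference, 5 per group:     p = {p_substantial_small_sample:.2g}")
```

With 20,000 subjects per group the half-point difference falls well below the conventional 0.05 threshold, while the five-point difference with five subjects per group does not. This is why the size of the effect should be read alongside the p-value.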
External validity refers to the extent to which the results
of a study generalize to the target population and/or other
groups of individuals. In evaluating the external validity
of a study, life care planners must consider how well the
sample, administration of the treatment, and all related
factors match the “real world” experience of
those with whom we work. Threats to external validity include:
1. Influence of Testing: If all of the
participants in the study were pretested, it may or may not
be possible to generalize the findings to others. In real
world applications, it may not be possible to pretest patients,
so to what degree can the results of the study be
generalized to others? And to whom can the results
be generalized?
In order to control for this threat, a researcher may not
pretest subjects or may use a research design such as a “Solomon
Four-Group design.” This procedure randomly assigns
subjects to each of four groups; two are pretested, two are
not. One non-pretested group and one pretested group receive
treatment and all four groups are posttested (Campbell & Stanley,
1966). This design allows the researcher to analyze the effects
of pretesting.
2. Influence of Selection: If subjects
self-select or volunteer to participate in the study, an
unusual sample may result, one which is not representative
of the target population. Consider whether data reflects
the effects of the treatment or the desire of participants
to cooperate with and please the researcher.
In order to control for this threat, a researcher may randomly
select subjects from the target population, if possible.
In clinical studies, this option is often not possible. Therefore,
many researchers utilize the most appropriate design available
to them, publish the results, and call upon others to replicate
the study to further corroborate the findings.
3. Reactive Effects: The fact that subjects
realize that they are participating in a study may affect
the results and limit the degree to which results can be
generalized. The observed effects may be due to the fact
that subjects are, consciously or unconsciously, “performing” in
a way which is inconsistent with their typical behavior.
Subjects may simply be reacting to the novelty and experience
of participation/observation. This is similar to what occurs
when subjects given placebo drugs in pharmaceutical trials
exhibit improvement in measured symptoms, even though no
active treatment was administered.
This is a difficult threat for researchers to control, particularly
when issues of informed consent, human subjects restrictions,
and ethical responsibilities limit the degree to which covert
observation/experimentation can occur. There are statistical
techniques which may be useful in detecting “false” behaviors
by comparing early subject-specific data to later subject-specific
data. In many cases the reactive effect of participation
decreases and subjects resume typical behaviors, but this
is difficult to control for within a large sample. Results
simply have to be replicated over a variety of conditions.
4. Multiple Treatment Interference: Subjects
who have participated in other studies, particularly ones
of similar design or treatment, may be performing as a function
of previous participation experiences, rather than as a function
of the treatment. Previous treatment effects cannot be eradicated.
Also, if subjects were exposed to multiple treatments throughout
the course of the study, the accumulated effects of repeated
testing may influence results. It may be problematic to generalize
findings from this type of study when all other members of
the target population (i.e., those who were not involved
in the study) are not similarly exposed to multiple treatments.
A researcher may minimize the effects of multiple treatment
interference by choosing the most appropriate research design
to control for this threat, calling for replication, and
considering the most appropriate sample selection process
(Campbell & Stanley, 1966).
5. Interaction of Time and Treatment Effects:
Results from a study may not be appropriate for generalization
to the target population or other groups if they cannot be
sustained over time. While initial posttests may indicate
an improvement in a specific measure, the effects of treatment
may decrease over time. For example, subjects may lose weight
by participating in an experimental intervention program;
therefore, the researcher concludes that the intervention
was successful. If, after six months, most participants regained
what they had lost, is the intervention still considered to
be successful? The answer depends upon how the results were
reported and how the data was used in relation to the target
populations and other groups of individuals.
A researcher may administer follow-up posttests over a specified length
of time in order to ascertain the extent to which permanent
behavior change occurred in subjects as a result of the intervention.
6. Posttest Sensitization: The administration
of a posttest to subjects may actually provide a means of
solidifying, clarifying, and facilitating the acquisition
of concepts instructed through the applied intervention (Ary,
Jacobs, & Razavieh, 1996). In other words, by completing
a posttest, subjects are given an opportunity to reflect
upon their experience and engage in problem-solving, which may
serve as an extension of the intervention. Life care planners
must consider whether the administration of a posttest may
influence results and whether conclusions may be reasonably
applied to the target population.
After considering all of the potential threats to the
validity of research, one may wonder how this literature
can be relied upon to inform any life care planning decisions.
This discussion was presented in order to raise the awareness
of those consulting research and to caution professionals
against unquestioningly accepting the conclusions asserted by
researchers.
Even in disciplines beyond your field of professional practice,
you must be able to critique the design, implementation,
and results reported by the researcher. Scientific inquiry
encourages consumers of research to evaluate the relevance
and accuracy of what is purported to be “known” within
a discipline. By becoming familiar with research design and
statistics, life care planners will be in a much better position
to identify the information most beneficial to the specialty.