- RESEARCH DESIGN AND STATS
Statistical analysis provides researchers with a means of
translating data collected from a sample into numerical expressions
that represent the characteristics of the sample. Mathematical
principles are applied to data in an effort to objectively
determine whether change occurred as a result of intervention
or treatment. For many rehabilitation professionals, the
study of statistics is a “necessary evil” and
is generally the least favored coursework of those enrolled
in graduate or continuing education programs.
For the purposes of this chapter, an overview of basic statistical
methods will be helpful to life care planners in understanding
how research conclusions were reached and how findings may,
or may not, relate to a specified area of interest. This
section will reorient you to the basic statistical methods
commonly utilized in rehabilitation and medical research.
Two types of statistics are generally performed on
research data: descriptive and inferential statistics.
Descriptive statistics help researchers to organize, summarize,
and visualize the data collected from samples. At first,
researchers are interested in learning how the response variable(s)
The set of scores or data collected from subjects is referred
to as the “distribution.” The distribution of data
may be displayed by creating an ordered list of scores (e.g.,
from high scores to low scores), a frequency distribution
(i.e., a tally of the raw scores), or a histogram
(i.e., a bar graph) to visually summarize the data. Once the
distribution has been established, additional descriptive
analyses may occur.
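The three displays described above can be sketched in a few lines of Python. This is only an illustration: the scores are hypothetical, and the text histogram stands in for a drawn bar graph.

```python
from collections import Counter

# Hypothetical raw scores from ten subjects (illustrative data only).
scores = [70, 85, 85, 90, 70, 85, 95, 60, 90, 85]

# Ordered list, from high scores to low scores.
ordered = sorted(scores, reverse=True)

# Frequency distribution: a tally of how often each raw score occurs.
frequency = Counter(scores)

# Text histogram: one row per distinct score, one '#' per subject.
for score in sorted(frequency):
    print(f"{score:>3} | {'#' * frequency[score]}")
```

The longest row of the histogram marks the most frequently attained score.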
Measures of Central Tendency
Measures of central tendency simply calculate the mean (i.e.,
the average performance of all subjects), median (i.e., the
central score of all subjects which divides scores into equal
parts), and mode (i.e., the most frequently attained score
observed in all subjects). Each value provides the researcher
with slightly different information, but all describe the “central
tendency” of the participants. For example, the mean
is affected by extreme scores, so in distributions with
many extreme values the median may be a more accurate measure
of central tendency (i.e., the median is far less affected by
extreme scores). If the distribution of scores is perfectly
normal (i.e., a “bell curve”), the mean, median,
and mode will be identical.
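A brief sketch, using hypothetical scores and Python's standard statistics module, shows how one extreme score pulls the mean while leaving the median in place.

```python
import statistics

scores = [70, 75, 80, 80, 85, 90]      # hypothetical scores
with_outlier = scores + [300]          # the same scores plus one extreme value

mean = statistics.mean(scores)
median = statistics.median(scores)
mode = statistics.mode(scores)

# Recompute after adding the extreme score.
mean_with_outlier = statistics.mean(with_outlier)
median_with_outlier = statistics.median(with_outlier)

print(mean, median, mode)
print(mean_with_outlier, median_with_outlier)
```

In this made-up sample the mean, median, and mode all equal 80. After the extreme score is added, the mean rises above 110 while the median stays at 80.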
A normal distribution produces a bell curve when scores
are graphically displayed. Life care planners should note
that this “normal” distribution is a mathematical
model and is rarely produced by most studies. Many descriptive
and inferential statistics are based upon this model of normal
distribution, but specific correction techniques may be employed
to account for a certain degree of non-normality. In reality,
the shape of distributions may be bimodal, asymmetrical,
skewed, flat, or otherwise dissimilar from a bell-shaped
curve.
Measures of Variability
Range and standard deviation describe the extent to which
scores are dispersed across the distribution. The range is
a rough measure: the difference between the highest and lowest
scores. Because it depends on only two scores, the range is not
informative enough on its own, so researchers also calculate the
standard deviation. The standard deviation is the square root of
the variance and reflects the typical deviation of individual
scores from the mean score. In short, the mean and standard
deviation of a distribution help the researcher to identify the
average score and the typical variability of scores around the mean.
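The calculation can be sketched as follows, using hypothetical scores. The variance here divides by the number of scores (the population formula); researchers working with a sample usually divide by one less than the number of scores, which gives a slightly larger value.

```python
import math

scores = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical scores

n = len(scores)
mean = sum(scores) / n
score_range = max(scores) - min(scores)

# Population variance: the mean of the squared deviations from the mean.
variance = sum((x - mean) ** 2 for x in scores) / n

# Standard deviation: the square root of the variance.
std_dev = math.sqrt(variance)

print(score_range, mean, variance, std_dev)
```

For these scores the range is 7, the mean is 5, the variance is 4, and the standard deviation is 2.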
Imagine a normal distribution: a bell-shaped curve. Draw
a line directly in the center of the “bell” at
the highest point of the curve. This represents the mean,
where 50% of the scores fall below and 50% of the scores
fall above. Now, continue dividing the “bell” into
equal parts: four parts above the mean, and four parts below
the mean. Label each of the dividing lines as illustrated below:
SD:             -4    -3    -2    -1     0    +1    +2    +3    +4
Percent below:  --  0.1%    2%   16%   50%   84%   98% 99.9%    --
Theoretically speaking (i.e., in a normal distribution)
68% of the scores lie within plus and minus one standard
deviation of the mean; 95% of the scores lie within plus
and minus two standard deviations of the mean.
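These percentages can be checked against the mathematical model itself. The sketch below uses NormalDist from Python's standard statistics module; it describes the theoretical curve, not any particular data set.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal curve: mean 0, standard deviation 1

# Cumulative percent of scores falling below each standard deviation mark.
for sd in range(-3, 4):
    print(f"{sd:+d} SD: {100 * z.cdf(sd):.1f}%")

# Proportion of scores within one and within two standard deviations of the mean.
within_one = z.cdf(1) - z.cdf(-1)
within_two = z.cdf(2) - z.cdf(-2)
print(f"{within_one:.3f} {within_two:.3f}")
```

The last two values come out at roughly 0.68 and 0.95, matching the figures above.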
Note: Keep in mind that the normal distribution
is not a fact of nature, but is a mathematical model, only!
Measures of Correlation
Measures of correlation describe the extent to which two
variables are related, or co-vary with one another. Correlation
statistics indicate the magnitude (i.e., strength) and direction
(i.e., positive or negative relationship) between two variables.
Correlation coefficients range from -1.0 to +1.0. A value of +1.0
indicates a perfect positive relationship (i.e., an increase in one
variable is accompanied by a proportional increase in the
other variable), -1.0 indicates a perfect negative relationship,
and 0 indicates no linear relationship. Whether the data are graphed
on a scatterplot or statistically analyzed (e.g., contingency tables),
the measure of correlation (i.e., the correlation coefficient) is an
index of the linear relationship of the variables.
A common example of correlation is the relationship between height
and weight. In most cases, individuals who are taller
also weigh more than those who are shorter. Of course, there
are exceptions, so there is not a +1.0 correlation between
these variables. The actual observed relationship is +0.8;
less than perfect, but a strong linear relationship (Bellini & Rumrill,
The correlation coefficient does not tell the researcher
whether statistical significance has been achieved; it only
serves to quantify the relationship that exists.
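A minimal sketch of how a correlation coefficient (Pearson's r) is computed appears below. The function name pearson_r and the height and weight values are hypothetical, chosen only to illustrate a strong positive relationship.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Co-variation of the two variables around their means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical heights (cm) and weights (kg) for six people.
heights_cm = [150, 160, 165, 170, 180, 185]
weights_kg = [52, 60, 58, 72, 75, 90]
r = pearson_r(heights_cm, weights_kg)
print(round(r, 2))
```

The result falls between 0.9 and 1.0: strong and positive, but short of perfect because the third person is taller yet lighter than the second.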
Inferential statistics are based on probability theory and
are used to calculate the degree to which results derived
from the sample can be generalized to the target population.
Empirical data are translated into probability statements
which are used to infer the relationship between
variables within a target population (based upon what was
observed in the sample). Put another way, tests of statistical
significance estimate how likely the findings produced by the
sample would be if no true relationship existed within the
target population.
A Note About Inferential Statistics:
*Statistical tests do not confirm that the research hypotheses
*Statistical tests do not guarantee that the same results
will be obtained if replicated (Cohen, 1990)
In the social sciences, statistical significance is typically
declared when the probability (i.e., the p value) of obtaining
results at least as extreme as those observed, assuming that no
true relationship exists, is less than 5%, or 0.05. When a p value
is reported as being less than or equal to 0.05, the researcher
interprets it as evidence of a statistically
significant relationship between the variables under investigation
within the target population. Conversely, when a p value
is found to be more than 0.05, the researcher concludes that
the data do not provide sufficient evidence of a relationship between
the variables within the target population. This is not proof
that no relationship exists.
*In other words, a statistically significant result (at
the p<.05 level) means that, if there were truly no relationship
between the variables within the target population, results like
these would be expected in fewer than 5% of samples. It does not
mean that there is a 95% probability that the finding is true.
Researchers may set the significance threshold at any value, but most
use 0.05, or 0.01 for a more stringent test, or .10 for a less
stringent test of significance. Determining the statistical
significance of the data gives researchers a means of judging,
with a stated level of confidence, whether the results
of the study are plausibly due to chance rather than to the treatment.
A Note About Inferential Statistics:
*“The only way to know for certain the actual nature
of the relationship between these variables in population
of interest is to sample every member of the population,
an impossible task in nearly every instance of research (Bellini & Rumrill,
Sample size (i.e., the number of subjects participating
in a study) has an enormous effect on tests of statistical
significance. If the sample size is large, statistical tests
may flag even very small correlations as significant, simply because
the large number of subjects makes the test sensitive to
effects that are too small to be of practical importance.
As may be imagined, this fact has created a great deal of
confusion and misinterpretation in the research literature.
Life care planners should be aware of this fact and critically
review the conclusions drawn from large sample sizes (Cohen,
1999; Hunter & Schmidt, 1990).
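The effect of sample size can be illustrated with a short sketch. The function approx_p_for_r is a hypothetical helper: it converts a correlation into a t value and, as a simplifying assumption, uses the normal curve in place of the t distribution to obtain a two-tailed p value. The same very weak correlation is evaluated at two sample sizes.

```python
import math
from statistics import NormalDist

def approx_p_for_r(r, n):
    """Approximate two-tailed p value for a correlation r observed in n subjects.

    Uses t = r * sqrt(n - 2) / sqrt(1 - r^2) and, as a simplifying
    assumption, a normal approximation to the t distribution.
    """
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
    return 2 * (1 - NormalDist().cdf(abs(t)))

# The same weak correlation (r = 0.05) in a small and a very large study.
small_study = approx_p_for_r(0.05, 100)
large_study = approx_p_for_r(0.05, 10_000)
print(f"n = 100:    p = {small_study:.3f}")
print(f"n = 10,000: p = {large_study:.7f}")
```

With 100 subjects the correlation is nowhere near significant; with 10,000 subjects the identical correlation is highly significant, even though the strength of the relationship has not changed.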
Hunter and Schmidt (1990) propose the following exercise:
Imagine that you reviewed all of the research studies
regarding a specific counseling technique or therapy and
tallied the number of studies that concluded that the intervention “worked” and
those that concluded that the intervention did not “work.” After
reading these conflicting reports a student may determine
that the evidence in favor of the intervention is inconclusive
and does not have merit. Is the student correct? Possibly,
but upon closer review the student recognizes that the
studies used samples of varying sizes and probabilities
of varying values. Should this change the student’s
Limitations in the sensitivity of significance tests, and
the practice of using them as the only measure of results,
have led to the development of alternatives such as “effect
size measures.” One family of effect size measures reports the proportion
of variance in one variable (or a set of variables) that
is accounted for by another variable (or set of variables)
(Cohen, 1988). The d statistic is another measure of effect
size; it expresses the difference between two group means
in standard deviation units. Basically,
the d statistic allows research findings from studies with different
sample sizes and outcome measures to be directly compared.
Researchers may report the d statistic of the data
to facilitate cross-study comparisons (Bellini & Rumrill,
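One common way of computing a d statistic divides the difference between two group means by a pooled standard deviation. The sketch below assumes that form; the function name cohens_d and the scores are hypothetical.

```python
import math
import statistics

def cohens_d(group_a, group_b):
    """Mean difference between two groups in pooled standard deviation units."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)   # sample variance (n - 1 denominator)
    var_b = statistics.variance(group_b)
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd

# Hypothetical outcome scores for a treatment group and a control group.
treatment = [24, 26, 28, 30, 32]
control = [20, 22, 24, 26, 28]
d = cohens_d(treatment, control)
print(round(d, 2))
```

Because d is expressed in standard deviation units rather than raw score units, it can be compared across studies that used different outcome measures.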
Methods of Statistical Analysis
There are many statistical techniques by which data are
analyzed. Bellini and Rumrill (1999) note, “Methods
are tools, and the methods of statistical analysis are meaningful
only when they are applied within an appropriately designed
study and interpreted within the theoretical context of the
The following statistical tests are commonly utilized in
rehabilitation and social sciences: The t-test, analysis
of variance (ANOVA), multiple regression, and multivariate
analysis. Each of these tests may sound complicated,
but each is readily understood when its assumptions are known.
One of the least complicated statistical analyses to perform
is the t-test. This statistic measures the mean difference
between two groups, usually the experimental and
control groups. The t-test is one method used by researchers
to determine whether the mean difference between groups
is large enough to be considered “significant” or
whether the results were likely due to chance.
Consider the following scenario which is typical of research
studies in rehabilitation sciences:
Mary has developed a twelve-week vocational adjustment
and training program for adults who have sustained a physical
injury requiring them to locate employment outside of their
field of expertise. She wants to test her program to ascertain
whether the individuals who complete it will exhibit higher
levels of psychological adjustment than those who participated
in the traditional program.
Mary gains the cooperation and consent of a group of
individuals seeking vocational assistance, and randomly
assigns them to two groups. One group will participate
in Mary’s twelve-week program and the others will
receive traditional vocational counseling and guidance.
The study is initiated and after twelve weeks, all participants
complete a self-report questionnaire.
Mary expects that the mean psychological adjustment
scores of the individuals who participated in her vocational
program will be higher than those who completed the traditional
Mary looks at between-group differences because she believes
that the mean scores of these two groups will be unequal
due to the benefits of her vocational adjustment program.
She also realizes that individuals within each group will
be different from one another (this is a fact of most research
studies) so she must analyze the data for within-group differences.
The t-test will provide an analysis of the ratio of between-group
differences to within-group differences. If the ratio of
these differences is large enough, statistical significance
is achieved. In other words, the t-test is applied to the
data in an effort to determine whether the ratio of between-group
differences to within-group differences is large enough
that a researcher is able to attribute these differences
to a treatment or intervention, rather than to chance.
In reality, the t-test identifies significance by analyzing:
*the strength of the treatment or intervention effect (between-group
differences)
*the degree of variance within each group (within-group
differences)
*the sample size
The best scenario for a researcher hoping for statistical
significance is when the effect of the treatment/intervention
is large (substantial between-group differences), when there is
little variability among individual scores within each group,
and when a large sample has been obtained.
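The ratio described above can be sketched as an independent-samples t statistic. The function independent_t and the scores are hypothetical, and the sketch assumes equal variances in the two groups (the classic Student's t-test).

```python
import math
import statistics

def independent_t(group_a, group_b):
    """Student's t statistic (equal variances assumed) and its degrees of freedom."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * statistics.variance(group_a)
                  + (n_b - 1) * statistics.variance(group_b)) / df
    # Between-group difference divided by the within-group standard error.
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (statistics.mean(group_a) - statistics.mean(group_b)) / standard_error
    return t, df

# Hypothetical adjustment scores for Mary's program and the traditional program.
new_program = [24, 26, 28, 30, 32]
traditional = [20, 22, 24, 26, 28]
t, df = independent_t(new_program, traditional)

# The same pattern of scores with four times as many subjects per group.
t_large, df_large = independent_t(new_program * 4, traditional * 4)
print(round(t, 2), df, round(t_large, 2), df_large)
```

With five subjects per group, t is 2.0 on 8 degrees of freedom, which falls short of the two-tailed critical value of 2.306 at the .05 level. Repeating the same scores for twenty subjects per group more than doubles t, although the mean difference is unchanged. This illustrates the role of sample size noted above.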
Analysis of Variance
An analysis of variance (ANOVA) is very similar to a t-test,
as may be inferred by the name, but is used when more than
two groups are involved in the study. Referring back to the
previous example, if Mary were to have developed two different
vocational programs that she wanted to test, Group 1 may
participate in Program A, Group 2 may participate in Program
B, and Group 3 would participate in the traditional program.
Just like the t-test, the ANOVA compares the variability
between groups with the variability within all three (or more) groups. An additional
type of test, the post hoc (or “after the fact”)
test, is used to obtain more information about the mean differences
of the three groups. For example, Mary would likely be interested
in knowing how each of her vocational programs compared with
the traditional program, and which of the two may have been “better” than
Post hoc tests allow researchers to compare the mean differences
of Group 1 to Group 2, Group 2 to Group 3, and Group 1 to
Group 3. This way, researchers are better able to determine
the relative effectiveness of each group as compared to the
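The between-group and within-group comparison at the heart of a one-way ANOVA can be sketched as an F ratio. The function one_way_anova_f and the scores are hypothetical. Post hoc procedures are not implemented here, because they involve additional adjustments for making several comparisons at once.

```python
import statistics

def one_way_anova_f(*groups):
    """F ratio: between-group mean square over within-group mean square."""
    all_scores = [x for g in groups for x in g]
    grand_mean = statistics.mean(all_scores)
    k, n_total = len(groups), len(all_scores)

    # Variability of the group means around the grand mean.
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # Variability of individual scores around their own group mean.
    ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within

# Hypothetical scores for Program A, Program B, and the traditional program.
program_a = [24, 26, 28, 30, 32]
program_b = [22, 24, 26, 28, 30]
traditional = [20, 22, 24, 26, 28]
f_ratio = one_way_anova_f(program_a, program_b, traditional)
print(f_ratio)
```

A larger F ratio means the group means differ more than would be expected from the variability among individuals within the groups.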
By performing a factorial analysis of variance, or factorial
ANOVA, researchers are able to analyze the separate as well
as the interactive effects of two or more categorical (i.e.,
differing in kind, not amount or degree) variables.
Consider the following scenario which is based upon an actual
study conducted by Leierer, et al. (1996):
John is a rehabilitation counselor who has worked with
individuals with physical disabilities for many years and
has noticed certain patterns in consumer behavior. Upon
case closure or termination, consumers are asked to complete
an evaluation of their counselor.
Several counselors on staff at his agency have physical
disabilities themselves and John wonders whether consumers
prefer to discuss issues involving the challenges related
to disability with counselors who also have a disability and who
may personally identify with some of their difficulties.
On the other hand, he wonders whether consumers feel comfortable
discussing general concerns unrelated to disability issues
with any of the counselors on staff, whether with or without
He defines the following parameters:
*John knows that one of the most important skills counselors
possess is the ability to actively attend to the concerns
of consumers, and that this skill is a benchmark for professional
competence. This will be one of the variables.
*John hypothesizes that there is a relationship between a counselor’s
disability status (whether or not the counselor has a disability)
and the consumer’s satisfaction ratings (the dependent
variable) of the counselor. This will be a second variable.
*In addition, he expects this relationship to be influenced by the nature
of the issues discussed with the counselor (consumers
may prefer discussing disability-related issues with a
counselor who has a disability, but have no preference when
issues unrelated to the disability are being discussed).
This will be a third variable.
By performing an ANOVA, John will be able to parcel out
the effect of the status variables (counselor’s disability,
professional competence rating) and independent variable
(nature of the issues discussed) to identify the main effect.
The main effect demonstrates the effect that an independent
variable has on a specific dependent variable without being
influenced by the other variables under study. The ANOVA
will also enable him to analyze the influential effects of
the combinations of independent variables, or interactive
effects, of all of the elements under investigation.
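The difference between main effects and interactive effects can be sketched with cell means. For simplicity the example uses only two of the variables from the scenario (counselor disability status and nature of the issues discussed). The ratings are hypothetical, and equal numbers of consumers per cell are assumed. It describes the effects only; the factorial ANOVA itself tests whether each effect is larger than would be expected by chance.

```python
# Hypothetical mean satisfaction ratings for each cell of a 2 x 2 design
# (equal numbers of consumers per cell are assumed).
cell_means = {
    ("disability", "disability-related"): 8.0,
    ("disability", "general"): 6.0,
    ("no disability", "disability-related"): 5.0,
    ("no disability", "general"): 6.0,
}

def marginal_mean(factor_index, level):
    """Average the cell means that share one level of one factor."""
    values = [m for key, m in cell_means.items() if key[factor_index] == level]
    return sum(values) / len(values)

# Main effect of counselor status: difference between its marginal means.
status_effect = marginal_mean(0, "disability") - marginal_mean(0, "no disability")

# Main effect of issue type.
issue_effect = marginal_mean(1, "disability-related") - marginal_mean(1, "general")

# Interaction: does the status difference change across issue types?
interaction = ((cell_means[("disability", "disability-related")]
                - cell_means[("no disability", "disability-related")])
               - (cell_means[("disability", "general")]
                  - cell_means[("no disability", "general")]))
print(status_effect, issue_effect, interaction)
```

In these made-up ratings, counselors with a disability are rated higher only when disability-related issues are discussed, so the interaction (3.0) is larger than either main effect (1.5 and 0.5).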
Multiple regression is similar to the factorial ANOVA but
may be used to predict and identify causal explanations,
rather than to simply identify relationships. Multiple regression
analyzes the multiple relationships between a set of
independent variables and one dependent variable.
In other words, this statistic is used as a means of predicting
an outcome based upon a combination of two or more variables.
Multiple regression is more flexible than ANOVA, which is
limited to categorical independent variables. Multiple regression techniques
can analyze multiple continuous, dichotomous, or categorical
For example, a researcher may want to identify the variables
which best predict return to work success following physical
injury. Based upon what is known in the field of vocational
rehabilitation, the researcher may select the following variables:
age of onset, previous work history, severity of physical
impairment, marital status, etc., believed to influence vocational
outcome. Multiple regression analysis allows the researcher
to identify the combination of factors most likely to predict
whether individuals successfully return to work after sustaining
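A minimal sketch of how multiple regression combines several predictors into one prediction appears below. Everything in it is hypothetical: the function fit_multiple_regression, the two predictors, and the outcome scores, which are generated from a known formula so that the fitted weights can be checked. Real data would include noise, and researchers would normally rely on a statistical package rather than hand-written code.

```python
def fit_multiple_regression(rows, outcomes):
    """Least-squares coefficients [intercept, b1, b2, ...] via the normal equations."""
    X = [[1.0] + list(r) for r in rows]          # prepend an intercept column
    p = len(X[0])
    # Build the augmented matrix [X'X | X'y].
    A = [[sum(x[i] * x[j] for x in X) for j in range(p)]
         + [sum(x[i] * y for x, y in zip(X, outcomes))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]

# Hypothetical predictors: (age at onset, years of previous work history).
predictors = [(25, 3), (30, 10), (40, 8), (45, 20), (50, 15), (35, 12)]
# Synthetic outcome built from a known formula, so the fit can be verified.
outcome = [10 - 0.1 * age + 0.5 * years for age, years in predictors]

coefficients = fit_multiple_regression(predictors, outcome)
print([round(c, 3) for c in coefficients])
```

The fitted weights recover the formula used to build the data: an intercept near 10, a weight near -0.1 for age at onset, and a weight near 0.5 for years of previous work history.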
According to Bellini and Rumrill (1999), multivariate analysis
is less commonly utilized in rehabilitation sciences than
multiple regression, but may be useful depending upon the
research design of a particular study. Rather than one method,
multivariate analysis refers to a group of statistical techniques
which analyze the effects of one or a set of variables
on a set of continuous variables.
Statistical Significance vs. Practical Significance
Life care planners need to be familiar enough with statistical
methods to distinguish between the
statistical significance of a study and the practical significance
of a study. While statistical significance is the yardstick
by which research findings are measured, it may not always
be a useful criterion for determining whether the results
have any practical importance affecting the welfare of individuals
Knowing the effect of a large sample size on probability
(i.e., very large samples can yield statistical significance for
effects too small to matter), life care planners should pay
close attention to the reported actual differences among
group means and other indicators of magnitudes of
Bellini and Rumrill (1999) assert, “Evaluating the
practical significance of research findings also involves
reassessing the status of the theoretical proposition following
the empirical test, as well as the heuristic and practical
value of the theoretical proposition relative to the goals,
activities, and procedures of the particular agency or program.” Recall
that the purpose of scientific inquiry is to develop theories
which guide a discipline’s philosophy and practice.
Research does not “prove” or “disprove” facts,
but supports or refutes current theoretical propositions. Bellini
and Rumrill (1999) continue, “…it is the theory,
confirmed by research findings, that provides rehabilitation
practitioners with tools for understanding the relationships
among personal values, beliefs about disability, and subsequent
adjustment of persons with acquired disabilities.”