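Say that we have means and SDs at pre-test and post-test for
participants in each of two conditions, 0 and 1, and furthermore that
these summary statistics are reported separately for two different
sub-groups, $A$ and $B$. Let $n_{gc}$ be the sample size for sub-group
$g$ in condition $c$. Let $\bar{y}_{gct}$ be the sample mean of the
outcome at time $t$ for sub-group $g$ in condition $c$ (where $t = 0$
is the pre-test and $t = 1$ is the post-test), and let $s_{gct}$ be the
corresponding sample standard deviation at time $t$.

To recover the summary statistics for the full sample
(pooled across sub-groups), we can do the following:

- The total sample size in condition $c$ is
  $$n_{\cdot c} = n_{Ac} + n_{Bc}.$$
- The average outcome in condition $c$ at time $t$ is
  $$\bar{y}_{\cdot ct} = \frac{1}{n_{\cdot c}}\left(n_{Ac} \bar{y}_{Act} + n_{Bc} \bar{y}_{Bct}\right).$$
- The full-sample variance in condition $c$ at time $t$ is
  $$s_{\cdot ct}^2 = \frac{1}{n_{\cdot c} - 1}\left[(n_{Ac} - 1) s_{Act}^2 + (n_{Bc} - 1) s_{Bct}^2 + \frac{n_{Ac} n_{Bc}}{n_{\cdot c}}\left(\bar{y}_{Act} - \bar{y}_{Bct}\right)^2\right].$$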
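
As a quick check on this kind of pooling, here is a minimal Python sketch for a single condition and time point. The function name `pool_two` and the raw scores are made up for illustration. The scores are there only so that the pooled values can be compared against statistics computed directly on the combined data.

```python
from statistics import mean, stdev


def pool_two(n_a, mean_a, sd_a, n_b, mean_b, sd_b):
    """Pool (n, mean, SD) from two sub-groups into full-sample statistics."""
    n = n_a + n_b
    pooled_mean = (n_a * mean_a + n_b * mean_b) / n
    # Within-sub-group sums of squares plus the between-sub-group term.
    ss = ((n_a - 1) * sd_a**2 + (n_b - 1) * sd_b**2
          + n_a * n_b / n * (mean_a - mean_b) ** 2)
    pooled_sd = (ss / (n - 1)) ** 0.5
    return n, pooled_mean, pooled_sd


# Hypothetical raw scores for sub-groups A and B (illustration only).
group_a = [10.0, 12.0, 11.0, 15.0]
group_b = [20.0, 18.0, 25.0]

n, m, s = pool_two(len(group_a), mean(group_a), stdev(group_a),
                   len(group_b), mean(group_b), stdev(group_b))

# The pooled values should agree with statistics on the combined data.
combined = group_a + group_b
print(n, m, s)
print(len(combined), mean(combined), stdev(combined))
```

The two printed lines should agree up to floating-point error, because the total sum of squares splits exactly into within-sub-group and between-sub-group parts.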
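From these “rehydrated” summary statistics (written with a dot in place
of the sub-group index), one could calculate a
standardized mean difference at post-test, adjusting for pre-test
differences, by taking
$$d = \frac{\left(\bar{y}_{\cdot 11} - \bar{y}_{\cdot 10}\right) - \left(\bar{y}_{\cdot 01} - \bar{y}_{\cdot 00}\right)}{s_{\cdot\cdot 1}},$$
where
$$s_{\cdot\cdot 1}^2 = \frac{(n_{\cdot 0} - 1) s_{\cdot 01}^2 + (n_{\cdot 1} - 1) s_{\cdot 11}^2}{n_{\cdot 0} + n_{\cdot 1} - 2},$$
i.e., $s_{\cdot\cdot 1}$ is the pooled sample standard
deviation at post-test. The sampling variance of $d$
can be approximated as
$$\text{Var}(d) \approx 2(1 - \rho)\left(\frac{1}{n_{\cdot 0}} + \frac{1}{n_{\cdot 1}}\right) + \frac{d^2}{2(n_{\cdot 0} + n_{\cdot 1} - 2)},$$
where $\rho$
is the correlation between the pre-test and the post-test within each
condition and each sub-group.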
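
A rough Python sketch of this calculation follows. It assumes the variance approximation has the usual two-part form: a term for the difference in gain scores, $2(1 - \rho)(1/n_0 + 1/n_1)$, plus $d^2$ over twice the degrees of freedom of the pooled SD. The function names, the input values, and the choice $\rho = 0.5$ are all hypothetical. In practice $\rho$ is rarely reported and has to be imputed.

```python
def pooled_post_sd(n0, sd0, n1, sd1):
    """Pooled post-test SD across the two conditions."""
    var = ((n0 - 1) * sd0**2 + (n1 - 1) * sd1**2) / (n0 + n1 - 2)
    return var ** 0.5


def smd_pre_adjusted(n0, pre0, post0, sd_post0,
                     n1, pre1, post1, sd_post1, rho):
    """Return (d, approximate sampling variance of d)."""
    s = pooled_post_sd(n0, sd_post0, n1, sd_post1)
    # Difference in pre-to-post gains between conditions 1 and 0.
    d = ((post1 - pre1) - (post0 - pre0)) / s
    v = 2 * (1 - rho) * (1 / n0 + 1 / n1) + d**2 / (2 * (n0 + n1 - 2))
    return d, v


# Hypothetical full-sample summary statistics; rho is an assumed value.
d, v = smd_pre_adjusted(n0=50, pre0=10.0, post0=11.0, sd_post0=4.0,
                        n1=50, pre1=10.0, post1=13.0, sd_post1=4.0,
                        rho=0.5)
print(d, v)
```

With these inputs the pooled SD is 4 and the gain difference is 2, so `d` comes out at 0.5.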
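Alternatively, one could take a slightly different approach to
calculating the numerator of the SMD, by instead calculating adjusted
mean differences within each sub-group, and then taking their weighted
average with weights corresponding to the total sample size of the
sub-group. This amounts to using a mean difference that adjusts for
sub-group differences. Denote the difference-in-differences within each
sub-group as
$$DD_g = \left(\bar{y}_{g11} - \bar{y}_{g10}\right) - \left(\bar{y}_{g01} - \bar{y}_{g00}\right).$$
Then the average
difference-in-differences is
$$DD = \frac{n_{A\cdot}}{n_{\cdot\cdot}} DD_A + \frac{n_{B\cdot}}{n_{\cdot\cdot}} DD_B,$$
where $n_{g\cdot} = n_{g0} + n_{g1}$
and $n_{\cdot\cdot} = n_{A\cdot} + n_{B\cdot}$.
This average difference-in-differences could then be used in the
numerator of the SMD, as
$$d_{DD} = \frac{DD}{s_{\cdot\cdot 1}},$$
where $s_{\cdot\cdot 1}$ is the pooled sample standard deviation at
post-test. The sampling variance of $d_{DD}$
can be approximated as
$$\text{Var}(d_{DD}) \approx \frac{2(1 - \rho)}{n_{\cdot\cdot}^2}\left[n_{A\cdot}^2\left(\frac{1}{n_{A0}} + \frac{1}{n_{A1}}\right) + n_{B\cdot}^2\left(\frac{1}{n_{B0}} + \frac{1}{n_{B1}}\right)\right] + \frac{d_{DD}^2}{2(n_{\cdot\cdot} - 2)}.$$
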
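
The sub-group-adjusted version can be sketched in Python as below. This is an illustration under stated assumptions, not a definitive implementation. The dictionary keys, function names, and all numbers are hypothetical. The pooled post-test SD (`s_post`) and the pre-post correlation (`rho`) are simply passed in as assumed values. The variance weights each sub-group's difference-in-differences by the square of its share of the total sample.

```python
def diff_in_diff(g):
    """Difference-in-differences for one sub-group summary."""
    return (g["post1"] - g["pre1"]) - (g["post0"] - g["pre0"])


def smd_dd(sub_groups, s_post, rho):
    """Return (d_DD, approximate sampling variance of d_DD)."""
    n_total = sum(g["n0"] + g["n1"] for g in sub_groups)
    # Sample-size-weighted average of the sub-group DDs.
    avg_dd = sum((g["n0"] + g["n1"]) * diff_in_diff(g)
                 for g in sub_groups) / n_total
    d = avg_dd / s_post
    # Sum over sub-groups of n_g^2 * (1 / n_g0 + 1 / n_g1).
    weighted = sum((g["n0"] + g["n1"]) ** 2 * (1 / g["n0"] + 1 / g["n1"])
                   for g in sub_groups)
    v = 2 * (1 - rho) * weighted / n_total**2 + d**2 / (2 * (n_total - 2))
    return d, v


# Hypothetical summary statistics for sub-groups A and B.
sub_groups = [
    {"n0": 20, "n1": 30, "pre0": 10.0, "post0": 11.0,
     "pre1": 10.0, "post1": 14.0},   # DD_A = 3
    {"n0": 25, "n1": 25, "pre0": 12.0, "post0": 12.0,
     "pre1": 12.0, "post1": 13.0},   # DD_B = 1
]
d_dd, v_dd = smd_dd(sub_groups, s_post=4.0, rho=0.5)
print(d_dd, v_dd)
```

Because the function loops over a list, the same sketch works unchanged when there are more than two sub-groups.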
Multiple sub-groups
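Now suppose that we have the same data as above, but reported
separately for $G$ different sub-groups, indexed by
$g = 1, \ldots, G$.
Let $n_{gc}$ be the sample size for sub-group $g$ in condition $c$.
Let $\bar{y}_{gct}$ be the sample mean of the outcome at time $t$
for sub-group $g$ in condition $c$ (where $t = 0$ is the pre-test and
$t = 1$ is the post-test), and let $s_{gct}$ be the corresponding
sample standard deviation at time $t$.

To recover the summary statistics for the full sample
(pooled across sub-groups), we can do the following:

- The total sample size in condition $c$ is
  $$n_{\cdot c} = \sum_{g=1}^G n_{gc}.$$
- The average outcome in condition $c$ at time $t$ is
  $$\bar{y}_{\cdot ct} = \frac{1}{n_{\cdot c}} \sum_{g=1}^G n_{gc} \bar{y}_{gct}.$$
- The full-sample variance in condition $c$ at time $t$ is
  $$s_{\cdot ct}^2 = \frac{1}{n_{\cdot c} - 1} \sum_{g=1}^G \left[(n_{gc} - 1) s_{gct}^2 + n_{gc}\left(\bar{y}_{gct} - \bar{y}_{\cdot ct}\right)^2\right].$$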
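
As a consistency check, it may help to see how the general pooling reduces to the two-sub-group case. Suppose the full-sample sum of squares is written as a within-sub-group part plus a between-sub-group part $\sum_g n_{gc}\left(\bar{y}_{gct} - \bar{y}_{\cdot ct}\right)^2$. With $G = 2$ and sub-groups labelled $A$ and $B$ (labels used here only for illustration), the pooled mean is a weighted average, so

$$
\bar{y}_{Act} - \bar{y}_{\cdot ct} = \frac{n_{Bc}}{n_{\cdot c}}\left(\bar{y}_{Act} - \bar{y}_{Bct}\right),
\qquad
\bar{y}_{Bct} - \bar{y}_{\cdot ct} = -\frac{n_{Ac}}{n_{\cdot c}}\left(\bar{y}_{Act} - \bar{y}_{Bct}\right),
$$

and therefore

$$
\sum_{g \in \{A, B\}} n_{gc}\left(\bar{y}_{gct} - \bar{y}_{\cdot ct}\right)^2
= \frac{n_{Ac} n_{Bc}^2 + n_{Bc} n_{Ac}^2}{n_{\cdot c}^2}\left(\bar{y}_{Act} - \bar{y}_{Bct}\right)^2
= \frac{n_{Ac} n_{Bc}}{n_{\cdot c}}\left(\bar{y}_{Act} - \bar{y}_{Bct}\right)^2 .
$$

So with two sub-groups the between-sub-group part depends only on the squared difference between the two sub-group means.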
From these “rehydrated” summary statistics, one could calculate a
standardized mean difference at post-test, adjusting for pre-test
differences, as described above.
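Alternatively, one could calculate the numerator of the SMD as the
adjusted mean difference, pooled across sub-groups. The average
difference-in-differences is
$$DD = \frac{1}{n_{\cdot\cdot}} \sum_{g=1}^G n_{g\cdot} DD_g,$$
where $DD_g = \left(\bar{y}_{g11} - \bar{y}_{g10}\right) - \left(\bar{y}_{g01} - \bar{y}_{g00}\right)$
is the difference-in-differences within sub-group $g$,
$n_{g\cdot} = n_{g0} + n_{g1}$,
and $n_{\cdot\cdot} = \sum_{g=1}^G n_{g\cdot}$.
This average difference-in-differences could then be used in the
numerator of the SMD, as
$$d_{DD} = \frac{DD}{s_{\cdot\cdot 1}},$$
where $s_{\cdot\cdot 1}$ is the pooled sample standard deviation at
post-test. The sampling variance of $d_{DD}$
can be approximated as
$$\text{Var}(d_{DD}) \approx \frac{2(1 - \rho)}{n_{\cdot\cdot}^2} \sum_{g=1}^G n_{g\cdot}^2\left(\frac{1}{n_{g0}} + \frac{1}{n_{g1}}\right) + \frac{d_{DD}^2}{2(n_{\cdot\cdot} - 2)}.$$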
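
One way to get a feel for how the two approaches relate is to look at a fully balanced design. Assume, purely for illustration, that every sub-group has the same number of participants $m$ in each condition, so that $n_{g0} = n_{g1} = m$, $n_{g\cdot} = 2m$, $n_{\cdot 0} = n_{\cdot 1} = Gm$, and $n_{\cdot\cdot} = 2Gm$. Assume also that the leading terms of the two variance approximations are $2(1 - \rho)\left(1 / n_{\cdot 0} + 1 / n_{\cdot 1}\right)$ for the fully pooled SMD and $\frac{2(1 - \rho)}{n_{\cdot\cdot}^2} \sum_g n_{g\cdot}^2 \left(1 / n_{g0} + 1 / n_{g1}\right)$ for the sub-group-adjusted SMD. Then

$$
2(1 - \rho)\left(\frac{1}{Gm} + \frac{1}{Gm}\right) = \frac{4(1 - \rho)}{Gm}
\qquad \text{and} \qquad
\frac{2(1 - \rho)}{(2Gm)^2} \cdot G \cdot (2m)^2 \cdot \frac{2}{m} = \frac{4(1 - \rho)}{Gm}.
$$

Under balance, the sample-size weights $n_{g\cdot} / n_{\cdot\cdot}$, $n_{g0} / n_{\cdot 0}$, and $n_{g1} / n_{\cdot 1}$ all equal $1 / G$, so the two numerators coincide as well. The two estimators should therefore differ only when the allocation across conditions varies between sub-groups.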