Extracting Data from Research Papers for Meta-Analysis

This document aims to show you how to extract data from research papers for use in a meta-analysis. It does not show you how to select the papers or undertake a systematic review, which must be completed before any meta-analysis. Support with undertaking a systematic review is available here. We cover categorical and then continuous outcomes, as the issues with these two sorts of data are different.

Categorical Data

The example we will use involves the results of three studies that examined (amongst other things) the incidence of three genotypes labelled TT, MT and MM. Each person has exactly one of these three genotypes. The incidence of these genotypes in patients with hypertension (high blood pressure) was compared with that in a control group of participants without hypertension. The three studies were published in:

Say et al. (2005) https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1090564/
Rodriguez-Perez et al. (2001) https://www.sciencedirect.com/science/article/pii/S073510970101186X?via%3Dihub
Cheng et al. (2012) https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3438648/

Our interest is in the incidence of the TT genotype and how this differs for hypertension patients (Hypertensive) versus those without hypertension (Normotensive). For example, Say et al. (2005) includes the following table (See page 3 Table 1 of that paper):

The only part of this we are interested in is this:

We can see that amongst the Hypertensive patients there were 22 with the TT genotype and the number without the TT genotype was 33+46=79. For the Control group (Normotensives) the number with the TT genotype was 10 and the number without was 77 (59+18).

This allows us to compile the following table, perhaps using Excel:

	Hypertension Group		Control Group
Study ID	TT	Non_TT	TT	Non_TT
Say et al 2005	22	79	10	77
Rodriguez-Perez et al 2001
Cheng et al 2012

We now need to do the same for the remaining studies. If you examine the Rodriguez-Perez et al. (2001) paper you will see this table (See page 1539 Table 2 of that paper):

The only part of this we are interested in is this:

In this paper, amongst the Hypertensive patients (Cases column) there were 87 with the TT genotype and the number without the TT genotype was 145+67=212. For the Control group the number with the TT genotype was 60 and the number without was 255 (158+97).

We can therefore add this to our (Excel) table as:

	Hypertension Group		Control Group
Study ID	TT	Non_TT	TT	Non_TT
Say et al 2005	22	79	10	77
Rodriguez-Perez et al 2001	87	212	60	255
Cheng et al 2012

In the final paper by Cheng et al. (2012) you will see this table (See page 511 Table II of that paper):

In this case, amongst the Hypertensive patients there were 165 with the TT genotype and the number without the TT genotype was 108+27=135. For the Control group (Healthy controls) the number with the TT genotype was 69 and the number without was 81 (66+15).

We can therefore add this to complete our (Excel) table as:

	Hypertension Group		Control Group
Study ID	TT	Non_TT	TT	Non_TT
Say et al 2005	22	79	10	77
Rodriguez-Perez et al 2001	87	212	60	255
Cheng et al 2012	165	135	69	81
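The collapsing of three genotype counts into TT versus non-TT can also be scripted, which helps avoid arithmetic slips when there are many studies. The following is a minimal Python sketch and is not part of the original worksheet. The function name `collapse_to_tt` and the dictionary layout are assumptions made for illustration, and the two non-TT counts are entered in the order they are added in the text above.

```python
# Genotype counts as read from each paper: (TT, other genotype, other genotype).
# The two non-TT counts are listed in the order used in the text above.
studies = {
    "Say et al 2005": {"hypertension": (22, 33, 46), "control": (10, 59, 18)},
    "Rodriguez-Perez et al 2001": {"hypertension": (87, 145, 67), "control": (60, 158, 97)},
    "Cheng et al 2012": {"hypertension": (165, 108, 27), "control": (69, 66, 15)},
}


def collapse_to_tt(counts):
    """Return (TT, Non_TT) from a tuple of (TT, other, other) genotype counts."""
    tt, *others = counts
    return tt, sum(others)


rows = []
for study_id, groups in studies.items():
    case_tt, case_non_tt = collapse_to_tt(groups["hypertension"])
    ctrl_tt, ctrl_non_tt = collapse_to_tt(groups["control"])
    rows.append((study_id, case_tt, case_non_tt, ctrl_tt, ctrl_non_tt))

# Print a tab-separated table that can be pasted into a spreadsheet.
print("Study ID\tTT\tNon_TT\tTT\tNon_TT")
for row in rows:
    print("\t".join(str(value) for value in row))
```

The printed rows should match the completed table above; any mismatch points to a transcription error in either the script or the spreadsheet.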

We are now ready to undertake our meta-analysis – see the worksheet on a Meta-analysis of Categorical Outcomes Using SPSS that uses the above data.

Continuous Data

Extracting continuous data can be much more problematic. Some of the key issues you need to consider are:

The outcomes being measured need to be similar enough to sensibly include in a meta-analysis;
We usually need to obtain the mean and the standard deviation of our measured outcomes, as well as the sample size, in each group;
Many studies consider measures of the outcomes pre and post an intervention such that the work focuses on the change (post minus pre) in each group. In this case we need the standard deviation for the change in each group. Sometimes you may only see the standard deviation for the pre measurements and the post measurements. Since for each person (sampling unit) the pre and post measures will be correlated we can’t just use these two standard deviations to work out the standard deviation of the change pre to post.
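The reason the pre and post standard deviations are not enough on their own can be seen from the standard formula for the spread of a difference between two correlated measures: SD_change = sqrt(SD_pre^2 + SD_post^2 - 2 x r x SD_pre x SD_post), where r is the correlation between the pre and post measurements. The Python sketch below is illustrative only. The function name `sd_of_change` and all input values are hypothetical and are not taken from any of the papers.

```python
import math


def sd_of_change(sd_pre, sd_post, r):
    """SD of (pre - post) given the two SDs and the pre/post correlation r."""
    variance = sd_pre**2 + sd_post**2 - 2 * r * sd_pre * sd_post
    return math.sqrt(variance)


# Hypothetical values: identical pre and post SDs give very different
# SDs of the change depending on the correlation r.
for r in (0.0, 0.5, 0.9):
    print(f"r = {r}: SD of change = {sd_of_change(10.0, 10.0, r):.2f}")
```

Because r is rarely reported in papers, the SD of the change usually cannot be recovered exactly from the pre and post SDs, which is why it has to be reported directly or estimated.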

Help with these sorts of issues can be obtained by consulting Chapter 10 of the Cochrane Handbook or Borenstein, M., Hedges, L., Higgins, J. and Rothstein, H. (2009). Introduction to Meta-Analysis. John Wiley & Sons.

The example we will use comes from four studies (research papers) that examined a method called “SMI” (Suboccipital Muscle Inhibition) versus “Other” methods to improve the flexibility of the knee joint in adults. The outcome of interest was the increase in the angle the knee can be moved (Popliteal Knee Angle) in degrees, after using either “SMI” or an “Other” technique. Hence the data are measured in degrees (for angles) and so are continuous data. The studies looked at the change in knee angle, but note that this is measured as Pre-treatment minus Post-treatment (not post-pre) since a reduction is a good thing (the knee can be bent further) and so a reduction would be recorded as a positive change using pre-post.

The four studies were published in:

Kuan and Haslan (2019) https://healthscopefsk.com/index.php/research/article/view/68
Aparicio et al. (2009) https://www.sciencedirect.com/science/article/pii/S0161475409000943#:~:text=Conclusions,for%20this%20group%20of%20subjects
Cho et al. (2015) https://pubmed.ncbi.nlm.nih.gov/25642072/#:~:text=%5BConclusion%5D%20Application%20of%20the%20SMI,Short%20hamstring%3B%20Suboccipital%20muscle%20inhibition
Joshi et al. (2018) https://www.sciencedirect.com/science/article/pii/S1360859218300573

Kuan and Haslan (2019) includes the following table (See page 238 Table 2 of that paper):

The only part of this we are interested in is that highlighted above, since we are focusing on the Popliteal Knee Angle (PAT).

We have a choice of using the data from the right knee or the left knee; that choice is down to you as the subject discipline expert. The SMI column shows the Popliteal Knee Angle for those using the SMI or “Experimental” technique, whilst the SS column is the “Other” or “Control” technique. We need to read the paper to be sure what the mean values represent, and in this case we see that the numbers do represent the mean change from pre to post. In fact the paper presents a separate Table 1 to show the pre (baseline) PAT data.

The table above also shows the standard deviations, and since the statistics reported are based on the change from pre to post, we understand that these are indeed the standard deviations of the change. We also see the table includes the sample sizes in each group: n=27 for the SMI group and n=27 for the SS group.

This allows us to compile the following table, perhaps using Excel:

	Experimental Group (SMI)			Control Group (Other)
Study ID	Mean Change	SD	n	Mean Change	SD	n
Kuan 2019	7.41	7.12	27	5.37	6.78	27
Aparicio 2009
Cho 2015
Joshi 2018

The second paper, Aparicio et al. (2009), includes this table:

Again, the only part of this we are interested in is that highlighted above, since we are focusing on the Popliteal Knee Angle (PAT) and we have decided to concentrate on the right knee.

However, here we do not have the data on the change from pre to post and so we need to calculate this from the data we do have. Note we might have considered undertaking a meta-analysis on just the post intervention data, but we do not have that for some of the other papers. In addition just focusing on the post intervention data ignores any potential differences in the baseline (starting values) not just between the two groups in this study, but also between the different study groups used in different papers. Hence for these reasons we choose to consider the change.

For the Control group we can see that (for the right knee) the PAT score changed from 27.29 pre to 26.44 post which is a change of 27.29-26.44=0.85.

For the Intervention group we can see that (for the right knee) the PAT score changed from 31.97 pre to 27.83 post which is a change of 31.97-27.83=4.14.
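The same subtraction can be written as a short Python sketch, which makes the pre minus post direction explicit. This is an illustration only; the variable names are assumptions, and the values are the right-knee means quoted in the text above.

```python
# Right-knee mean PAT values as quoted in the text above: (pre, post).
aparicio_means = {
    "Control": (27.29, 26.44),
    "Intervention": (31.97, 27.83),
}

# The change is defined as pre minus post, so an improvement is positive.
mean_change = {
    group: round(pre - post, 2)
    for group, (pre, post) in aparicio_means.items()
}

print(mean_change)
```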

We can add these two mean values to our table (the sample sizes were reported in Table 1 of that paper):

	Experimental Group (SMI)			Control Group (Other)
Study ID	Mean Change	SD	n	Mean Change	SD	n
Kuan 2019	7.41	7.12	27	5.37	6.78	27
Aparicio 2009	4.14		34	0.85		34
Cho 2015
Joshi 2018

We had to find the sample sizes (n=34 for each group) by reading the paper, as these were not reported in the table above. Note that we do not have the standard deviation for the change for either group! We have the standard deviation for the pre measures and post measures but, as we discussed earlier, these are not enough to calculate the standard deviation for the change.

There are various options open to you to “estimate” the standard deviations, which are discussed in Chapter 10 of the Cochrane Handbook. However, these can be quite complicated and problematic to follow. For example, one option we might consider here is using the reported p-value of 0.005 for the PAT right knee data. We can sometimes use the p-value to determine, say, a t-statistic (if, say, a two-sample t-test had been used) and then from that derive the standard error and then the required standard deviation. Apart from this being quite a complex operation for many people, the added complication we have is that the method of analysis used to compare the two groups was a two-way mixed ANOVA, and so it may not be immediately obvious what you should do. In these cases it is suggested you seek advice from a qualified statistician or statistics tutor.
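For the simpler situation mentioned above, where two groups were compared with a two-sample t-test, the chain from p-value to standard deviation can be sketched in Python as follows. This is an illustration only and is not applied to the Aparicio et al. (2009) data, since that paper used a two-way mixed ANOVA. All input values are hypothetical and the function name `sd_from_p_value` is an assumption. The normal distribution is used as a large-sample stand-in for the t distribution so that only the standard library is needed; with small samples the t distribution with n1+n2-2 degrees of freedom should be used instead.

```python
import math
from statistics import NormalDist


def sd_from_p_value(p_two_sided, mean_difference, n1, n2):
    """Approximate the common SD behind a two-sample comparison of means.

    Uses a normal approximation to the t distribution (an assumption that
    is only reasonable for fairly large samples).
    """
    # Test statistic corresponding to the two-sided p-value.
    z = NormalDist().inv_cdf(1 - p_two_sided / 2)
    # Standard error of the difference in means.
    standard_error = abs(mean_difference) / z
    # SE = SD * sqrt(1/n1 + 1/n2), rearranged to give the SD.
    return standard_error / math.sqrt(1 / n1 + 1 / n2)


# Hypothetical inputs, not taken from any of the papers.
estimated_sd = sd_from_p_value(p_two_sided=0.01, mean_difference=3.0, n1=50, n2=50)
print(f"Estimated SD: {estimated_sd:.2f}")
```

The result is a single SD assumed to be common to both groups, which is one reason this route needs care even when a simple t-test was used.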

However, the good news is that a simpler and much more accessible option for most people is to estimate the missing standard deviations of the change from the standard deviation values from other studies. See later.

The third paper, Cho et al. (2015), includes this table:

For the SMI (Experimental) group we can see that the change in the PAT angle (PA here) was 5.5 with a standard deviation for this change of 6.6, whilst for the Control (or “Other”) group we can see that the change was 2.3 with an SD of 5.0. We can now add these to our table (the sample sizes were also reported in Table 1 of that paper):

	Experimental Group (SMI)			Control Group (Other)
Study ID	Mean Change	SD	n	Mean Change	SD	n
Kuan 2019	7.41	7.12	27	5.37	6.78	27
Aparicio 2009	4.14		34	0.85		34
Cho 2015	5.5	6.6	25	2.3	5	25
Joshi 2018

The final paper, Joshi et al. (2018), includes this table. In this case we have to read the paper to determine that Group B received the equivalent of SMI (Experimental treatment) and Group A received the SS (Control or “Other”) treatment. Group C are not of interest here as they received both treatments. We also need to read the paper carefully to determine that we are interested in the change during PT which is treatment given by the practitioner (whereas the Self columns refer to the change from post study to 2 weeks of self-administration which is not relevant to us). Hence we are interested only in the portion highlighted (KEA refers to Knee Angle):

For the SMI (Experimental) group B, we can see that the change in the knee angle (KEA here) was 9.5, whilst for the SS (Control or “Other”) group A we can see that the change was 9.0. The numbers in brackets are not the standard deviations for these changes, but instead are the p-values from the tests used to make the comparison of the two groups. Hence this is another paper where the standard deviations of the change are missing (the sample sizes were also reported in Table 1 of that paper). Adding what information we do have to our Excel table gives us:

	Experimental Group (SMI)			Control Group (Other)
Study ID	Mean Change	SD	n	Mean Change	SD	n
Kuan 2019	7.41	7.12	27	5.37	6.78	27
Aparicio 2009	4.14		34	0.85		34
Cho 2015	5.50	6.6	25	2.30	5	25
Joshi 2018	9.50		20	9.00		19

It was not possible to obtain the standard deviation for the change in knee angle for two of the studies due to a lack of information in the papers. There are various options open to you to “estimate” these which are discussed in Chapter 10 of the Cochrane Handbook. However, for simplicity we will estimate the missing standard deviations using the information we do have.

For example, the estimate for the missing SD for the SMI group could be based on the mean of the values of 7.12 and 6.6 that we do have from the other two studies. A simple way of doing this is (7.12+6.6)/2 = 6.86. There are perhaps better ways of doing this but the aim here is to keep things simple. Hence for Aparicio (2009) and Joshi (2018) we use 6.86 as the estimated SD for the Experimental group.

Similarly for the Control group we estimate the missing SDs as (6.78+5.0)/2 = 5.89. Hence for Aparicio (2009) and Joshi (2018) we use 5.89 as the estimated SD for the Control group.

Adding these results to our Excel table gives us:

	Experimental Group (SMI)			Control Group (Other)
Study ID	Mean Change	SD	n	Mean Change	SD	n
Kuan 2019	7.41	7.12	27	5.37	6.78	27
Aparicio 2009	4.14	6.86	34	0.85	5.89	34
Cho 2015	5.50	6.60	25	2.30	5.00	25
Joshi 2018	9.50	6.86	20	9.00	5.89	19
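The simple imputation described above, replacing each missing SD with the mean of the SDs reported for the same group in the other studies, can be sketched in Python as follows. This is a minimal illustration rather than a recommended method; the function name `fill_missing_sds` and the dictionary layout are assumptions, and `None` is used here to mark an SD that could not be extracted.

```python
from statistics import mean

# Change-score SDs as extracted above; None marks an SD missing from the paper.
sd_table = {
    "Kuan 2019": {"experimental": 7.12, "control": 6.78},
    "Aparicio 2009": {"experimental": None, "control": None},
    "Cho 2015": {"experimental": 6.6, "control": 5.0},
    "Joshi 2018": {"experimental": None, "control": None},
}


def fill_missing_sds(table, group):
    """Replace missing SDs in one group with the mean of the reported SDs."""
    reported = [row[group] for row in table.values() if row[group] is not None]
    estimate = round(mean(reported), 2)
    for row in table.values():
        if row[group] is None:
            row[group] = estimate
    return estimate


experimental_estimate = fill_missing_sds(sd_table, "experimental")
control_estimate = fill_missing_sds(sd_table, "control")
print(experimental_estimate, control_estimate)
```

Keeping a record of which SDs were imputed, rather than extracted, is worthwhile so that the choice can be reported alongside the meta-analysis.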

We are now ready to undertake our meta-analysis – see the worksheet on a Meta-analysis of Continuous Outcomes Using SPSS that uses the above data.

For more resources, see sigma.coventry.ac.uk. Adapted from material developed by Coventry University. Creative Commons License.