How to evaluate a meta-analysis study
Follow the steps below to evaluate a meta-analysis study.
Step 1: What is the relative risk result?
The first step in evaluating a meta-analysis study is considering the magnitude of the relative risk (RR) statistic, the key result of the study. Remember that a relative risk statistic has nothing to do with risk (e.g., the likelihood that exposure will cause disease). The magnitude of the relative risk statistic is merely an indication of the strength of association between exposure/treatment and the occurrence of disease.
Where can you find out what the relative risk is? The RRs calculated by the researchers will be located in the results section of the study abstract and, usually, in a table in the full study.
If you are reading or hearing about the study from the news media, they will likely report the RR as follows: The exposure increased the risk of disease by X%, where X is some number greater than 0. Although this is an utterly incorrect way to talk about RR (see above paragraph), it is unfortunately common usage by the media and junk scientists. To determine what the study's RR is from such a report, simply divide X by 100 and then add the result to 1.0. So, for example, if there was 75% more disease among those exposed, then the RR would be 1.0 plus 0.75 = 1.75.
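The conversion described above can be sketched in a few lines of Python. The function name is illustrative only and is not taken from any study or library.

```python
def rr_from_percent_increase(percent_increase: float) -> float:
    """Convert a news-style 'X% increase in risk' into a relative risk.

    Follows the rule in the text: divide X by 100, then add 1.0.
    """
    return 1.0 + percent_increase / 100.0


# The example from the text: 75% more disease among those exposed.
print(rr_from_percent_increase(75))  # 1.75
```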
Step 2: Evaluating the relative risk: Magnitude
Since meta-analysis studies typically test whether the occurrence of disease is associated with exposure, news touting a positive meta-analysis study will generally feature an RR greater than 1.0 (or, in incorrect shorthand, an increase in risk greater than 0%).
But just because an RR is greater than 1.0, that does not necessarily mean that the disease is actually associated with exposure. We need to evaluate some characteristics of the RR. The first characteristic is its magnitude or size, also referred to as the strength of association.
As a rule of thumb, meta-analysis studies with RRs less than 2.00 (i.e., a 100% increase in risk in junk science-ese) should be viewed with extreme suspicion. The rationale for this judgment is that the low RR is a weak statistical association.
Based on strength of association guidelines, more confidence is to be had in RRs that are 2.00 or greater. The greater the RR, the more confidence may be had in the existence of a statistical association between exposure and disease.
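As a minimal sketch, the magnitude rule of thumb amounts to a single comparison. The cutoff of 2.0 is the guideline stated above. The constant and function names are hypothetical.

```python
# Rule-of-thumb cutoff from the text; not a universal statistical standard.
WEAK_ASSOCIATION_CUTOFF = 2.0


def passes_magnitude_test(rr: float, cutoff: float = WEAK_ASSOCIATION_CUTOFF) -> bool:
    """Return True if the RR is at or above the rule-of-thumb cutoff."""
    return rr >= cutoff


for rr in (1.75, 2.0, 2.5):
    verdict = "passes" if passes_magnitude_test(rr) else "view with suspicion"
    print(f"RR={rr}: {verdict}")
```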
Step 3: Evaluating the relative risk: Statistical significance (Part 1 of 2)
It is not enough that a meta-analysis study RR is 2.0 or greater, i.e., that the disease is highly correlated with exposure. The RR must also be statistically significant. That is, we want to have some confidence that the RR result is not a fluke or happened simply by chance.
There are two standard tests of statistical significance that should be reported for each RR. You will most likely have to obtain a copy of the actual study to verify statistical significance.
The first test of statistical significance concerns the p-value of the RR. The p-value will usually be found in the same table in the study as the RR. The p-value indicates how much confidence can be had that the RR is different from the no-effect level of 1.0.
By convention:
- If the p-value is 0.05 or less, then the RR is statistically significant. This does not necessarily mean that the RR is not junk science; it just means that the RR has passed the p-value test. Go to Step 4: Evaluating the relative risk: Statistical significance (Part 2 of 2).
- If the p-value is greater than 0.05, then the RR is not statistically significant at the 95% confidence level. If the RR is not statistically significant, then it is viewed as a meaningless result and as having been debunked.
If a study does not present the p-values of its RRs, interpret this to mean that the researchers were too embarrassed to publish them because they would expose the RRs as not statistically significant, i.e., meaningless.
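The p-value convention above can be written as a small check. This is a sketch of the guide's rule, including its treatment of an unreported p-value as a failure. The names are hypothetical.

```python
from typing import Optional

P_VALUE_CUTOFF = 0.05  # conventional threshold cited in the text


def passes_p_value_test(p_value: Optional[float]) -> bool:
    """Return True only if a p-value is reported and is 0.05 or less."""
    if p_value is None:
        # Per the guide, a missing p-value is treated as a failed test.
        return False
    return p_value <= P_VALUE_CUTOFF


print(passes_p_value_test(0.03))  # True
print(passes_p_value_test(0.20))  # False
print(passes_p_value_test(None))  # False
```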
Step 4: Evaluating the relative risk: Statistical significance (Part 2 of 2)
The second test of statistical significance for an RR involves its confidence interval.
The confidence interval for an RR is typically found in the same place as the RR and its p-value. It is often indicated with the abbreviation C.I. or CI. The confidence interval is a range of values (typically indicated with parentheses) within which the true RR lies, according to a certain degree of confidence. The standard degree of confidence is 95%, designated as 95% C.I. For example, if the RR=2.50, and its confidence interval ranges from 2.0 to 3.0, then this may be designated as RR=2.50 (95% CI 2.0, 3.0).
The two numbers in the confidence interval are referred to as its lower bound and upper bound. In the above example, 2.0 is the lower bound and 3.0 is the upper bound.
In order for the RR of a meta-analysis study to pass the second test of statistical significance, the lower bound of the confidence interval must be greater than 1.0. The reason for this is that if the confidence interval includes 1.0, then we cannot be sure with 95% confidence that the true value of the RR is greater than 1.0. In other words, we cannot safely exclude the possibility that the true RR is 1.0, the no-effect level. For example, if you see an RR and CI presented as RR=2.60 (95% CI 0.95, 5.25), the RR is not statistically significant because its lower bound (0.95) is less than 1.0, so the interval includes the no-effect level as well as a range (from 0.95 to 1.00) indicating that the exposure may not be statistically associated with the occurrence of disease.
So in the case of a meta-analysis study, if the lower bound of the confidence interval touches or crosses the 1.0 no-effect level, then the RR is not statistically significant, i.e., it is meaningless and debunked.
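A minimal sketch of the confidence interval test, using the two examples from the text. The function name is hypothetical. The check is strict, so a lower bound that merely touches 1.0 fails.

```python
NO_EFFECT_LEVEL = 1.0


def passes_ci_test(lower_bound: float) -> bool:
    """Return True only if the CI lower bound is strictly above 1.0."""
    return lower_bound > NO_EFFECT_LEVEL


# RR=2.50 (95% CI 2.0, 3.0): lower bound clears the no-effect level.
print(passes_ci_test(2.0))   # True
# RR=2.60 (95% CI 0.95, 5.25): lower bound crosses the no-effect level.
print(passes_ci_test(0.95))  # False
# A lower bound that only touches 1.0 also fails.
print(passes_ci_test(1.0))   # False
```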
Confidence intervals are subject to game-playing by junk scientists. The width of the confidence interval may be adjusted by altering the confidence level. For example, a 90% confidence interval is narrower than a 95% confidence interval, while a 99% confidence interval is wider than a 95% confidence interval. This makes sense if you think about it. You will have less confidence in a narrower range and more confidence in a wider range. So here's the game.
Keeping in mind that 1.0, the no-effect level, is the third rail of an epidemiology study and means instant death for an RR whose confidence interval touches or crosses it, there is incentive to narrow the confidence interval so that the lower bound stays above 1.0. This may be accomplished by choosing to use a lower confidence level than the standard 95% level. This lower confidence level is typically 90%. So if you see a 90% confidence interval, this is a red flag that the researcher is trying to conceal the fact that his 95% CI touches or crosses the no-effect level.
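To see how the confidence level changes the interval, here is a hedged sketch. It assumes the common approach of computing the interval on the log scale as exp(ln RR ± z × SE). The text does not say how any particular study computes its interval, and the RR of 1.5 and standard error of 0.22 are made-up illustrative values.

```python
import math
from statistics import NormalDist


def rr_confidence_interval(rr: float, se_log_rr: float, level: float = 0.95):
    """Return (lower, upper) for an RR, assuming a normal approximation
    on the log scale. se_log_rr is the standard error of ln(RR)."""
    # Two-sided critical value, e.g. about 1.96 for a 95% level.
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    log_rr = math.log(rr)
    return math.exp(log_rr - z * se_log_rr), math.exp(log_rr + z * se_log_rr)


# Hypothetical RR and standard error, chosen only for illustration.
for level in (0.90, 0.95, 0.99):
    lower, upper = rr_confidence_interval(1.5, 0.22, level)
    print(f"{level:.0%} CI: ({lower:.2f}, {upper:.2f})")
```

With these illustrative inputs, the lower bound falls below 1.0 at the 95% level but stays above 1.0 at the 90% level, which is the situation the red flag is meant to catch.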
Step 5: Confidence interval width
Another tell-tale sign of a dubious relative risk is a wide confidence interval. The CI indicates the range of values within which the true RR likely falls. A “too-wide” confidence interval indicates that the underlying data are unusually disparate, including too many outliers and oddball data points. A “tight” confidence interval indicates more uniformity and less variance among the data.
How wide is too wide? As a rule of thumb, if a CI is wider than the magnitude of the RR, then you’re looking at goofy data. For example, assume you see an RR=20 (95% CI 5, 167). The width of the confidence interval (162) is many times the size of the RR. A nice tight confidence interval would be, for example, something like RR=20 (95% CI 18, 22), where the width of the CI is only a fraction of the size of the RR.
Consider the width of the confidence interval as analogous to the “margin of error” concept used in polling. When the polling results between, say, two candidates are within the margin of error, the candidates are considered to be in a statistical dead heat. If the CI is wider than the RR is large, then the RR is statistically dead.
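The width rule of thumb can be sketched as a comparison between the CI width and the RR itself, using the two examples from the text. The function name is hypothetical.

```python
def ci_too_wide(rr: float, lower: float, upper: float) -> bool:
    """Return True if the CI width exceeds the magnitude of the RR."""
    return (upper - lower) > rr


# RR=20 (95% CI 5, 167): width 162 is many times the RR.
print(ci_too_wide(20, 5, 167))  # True
# RR=20 (95% CI 18, 22): width 4 is a fraction of the RR.
print(ci_too_wide(20, 18, 22))  # False
```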
Step 6: Examine the component studies
Even if the RR reported for a meta-analysis passes the strength of association and statistical significance tests, more analysis must be undertaken. Remember that a meta-analysis is essentially a review study with empirical components (i.e., the relative risk and confidence interval). In the preceding steps, only these empirical components have been considered. Now the component studies must be examined for their credibility. You may need to get copies of the component studies to do this.
A meta-analysis study may contain summaries of its component studies, but those summaries may be incomplete and/or wrong. A meta-analysis study author may accept at face value the conclusions drawn by the authors of the component studies. But if these conclusions are wrong or unwarranted, then the meta-analysis author’s analysis will be similarly afflicted. Each component study should be examined as you would examine individual clinical trial, cohort and meta-analysis studies.
Step 7: Have any studies been excluded?
Don’t assume that a meta-analysis study contains the entire relevant body of scientific literature. Not only may a biased author exclude studies that don’t fit his predetermined conclusion, but keep in mind the phenomenon of publication bias, which tends to favor the publication of “positive” studies over “negative” studies. While not much can be done about publication bias, the former problem can be checked with the database. Sometimes authors will inadvertently hint that studies have been excluded by, for example, mentioning cut-off dates for study publication, English-language only studies or other exclusion criteria. If you locate excluded studies, what results do they report? If the results of excluded studies are of a contradictory or debunk-atory nature, then you may have just learned why they were excluded. Any review study that excludes contradictory studies should be viewed with prejudice.
Step 8: Not all studies are created equal
Once you have a good sense of the component studies (and what studies, if any, have been excluded), next consider how the meta-analysis study author has evaluated and weighted the individual studies in his own analysis. Keep in mind that epidemiology studies are inherently different from one another. Cohort studies have a better design than meta-analysis studies. Every population is different, demographically and size-wise. Data may be collected, measured or estimated in different ways that affect their reliability. Accordingly, some studies are better conducted and, hence, more credible than others.
Consider also how the review study author interprets and handles contradictory studies. Do positive results inexplicably carry more weight with the review author than negative results? Does the author seem to go out of his way to rationalize or apologize for shortcomings in studies with positive results while treating negative results dismissively? Affirmative responses to these questions may indicate author bias.
Meta-analysis authors typically assign weights to component study RRs in order to calculate the RR for the meta-analysis. What is the weighting scheme? Is it study size? Study type? How does the weighting scheme affect the outcome? How would other weighting schemes affect the outcome?
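One way to explore the weighting questions is to recompute a pooled RR under several schemes and compare the results. The sketch below is an assumption-laden illustration, not the method of any particular meta-analysis. It pools on the log scale as a weighted average of ln(RR), and the three studies (their RRs, standard errors and sizes) are invented for the example.

```python
import math


def pooled_rr(rrs, weights):
    """Weighted average of ln(RR) across studies, converted back to an RR."""
    total = sum(weights)
    pooled_log = sum(w * math.log(rr) for rr, w in zip(rrs, weights)) / total
    return math.exp(pooled_log)


# Hypothetical component studies: RR, standard error of ln(RR), sample size.
studies = [
    {"rr": 3.0, "se": 0.50, "n": 200},
    {"rr": 1.1, "se": 0.10, "n": 5000},
    {"rr": 2.4, "se": 0.40, "n": 350},
]
rrs = [s["rr"] for s in studies]

# Three candidate weighting schemes to compare.
schemes = {
    "equal": [1.0 for _ in studies],
    "study size": [s["n"] for s in studies],
    "inverse variance": [1.0 / s["se"] ** 2 for s in studies],
}

for name, weights in schemes.items():
    print(f"{name}: pooled RR = {pooled_rr(rrs, weights):.2f}")
```

In this made-up data set, equal weighting lets the two small studies with large RRs dominate, while weighting by size or by inverse variance pulls the pooled RR toward the large study's result near 1.0. A pooled RR that swings this much between schemes suggests the headline number depends heavily on the author's choice of weights.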
