How to evaluate a clinical trial

Follow the steps below to evaluate a clinical trial.

Step 1: What is the relative risk result?

The first step in evaluating a clinical trial is considering the magnitude of the relative risk (RR) statistic, the key result of the study. Remember that a relative risk statistic is not itself a risk (e.g., the chances or percentage of the time that the treatment is or is not successful). The magnitude of the relative risk statistic is merely an indication of the strength of association.

Where can you find out what the relative risk is? The RRs calculated by the researchers will be located in the results section of the study abstract and, usually, in a table in the full study.

If you are reading or hearing about the study from the news media, they will likely report the RR as follows: The treatment reduced the risk of the disease by X%, where X is some number between 1 and 100. Although this is an utterly incorrect way to talk about RR (see above paragraph), it is unfortunately common usage by the media and junk scientists. To determine what the study's RR is from such a report, simply divide X by 100 and then subtract the result from 1.0. So, for example, if there was 75% less disease among the group that received the treatment, then the RR would be 1.0 minus 0.75 = 0.25.
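The conversion just described can be sketched in a few lines. This is only an illustration of the arithmetic above; the function name `rr_from_percent_reduction` is made up for this example.

```python
def rr_from_percent_reduction(percent_reduction: float) -> float:
    """Turn a media-style "reduced the risk by X%" figure into a relative risk.

    Follows the rule in the text: divide X by 100, then subtract from 1.0.
    """
    if not 0 <= percent_reduction <= 100:
        raise ValueError("percent reduction must be between 0 and 100")
    return 1.0 - percent_reduction / 100.0


# The example from the text: 75% less disease in the treatment group.
print(rr_from_percent_reduction(75))  # 0.25
```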

Step 2: Evaluating the relative risk: Magnitude

Since clinical trials typically test whether a treatment is effective at treating a disease, news touting a successful trial will generally feature an RR less than 1.0 (or, in incorrect shorthand, a reduction in risk of up to 100%).

But just because an RR is less than 1.0, that does not necessarily mean that the treatment was really successful. We need to evaluate some characteristics of the RR. The first characteristic is its magnitude or size, also referred to as the strength of association.

As a rule of thumb, clinical trials with RRs greater than 0.50 (i.e., up to a 50% reduction in risk in junk science-ese) should be viewed with suspicion. This is a judgment call and the rationale for this judgment is that it is a weak statistical association.

More confidence is to be had in RRs that are 0.50 or less. Some researchers will say that a 0.50 standard is too tough for a clinical trial RR, under the rationale that the data quality and study design are much more reliable than in other forms of epidemiology, like cohort and case-control studies. There may be some merit in that claim. Keep in mind, however, that if a particular treatment operated as a true cure, then the clinical trial RR would approach 0. The stronger case for a clinical trial RR being meaningful is where a treatment seems to help more than half of the study subjects. When a treatment helps fewer than half of the study subjects, or only a few of them, then the clinical trial should be re-thought and re-designed to better identify who, if anyone, the treatment is really helping.
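As a rough sketch, the rule of thumb from this step can be written as a small check. The 0.50 cutoff is the judgment call described above, not a universal standard, and the function name and verdict strings are invented for this example.

```python
def magnitude_verdict(rr: float, threshold: float = 0.50) -> str:
    """Apply the rule of thumb from this step to a clinical trial RR."""
    if rr < 0:
        raise ValueError("a relative risk cannot be negative")
    if rr >= 1.0:
        return "no reduction in disease reported"
    if rr > threshold:
        return "weak association: view with suspicion"
    return "stronger association: go on to check statistical significance"


print(magnitude_verdict(0.25))  # stronger association: go on to check statistical significance
print(magnitude_verdict(0.80))  # weak association: view with suspicion
```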

Step 3: Evaluating the relative risk: Statistical significance (Part 1 of 2)

It is not enough that a clinical trial RR is 0.50 or less, i.e., that the treatment is highly correlated with a reduction in disease. The RR must also be statistically significant. That is, we want to have some confidence that the RR result is not a fluke or happened simply by chance — say, for example, the researchers lucked into picking study subjects to receive the treatment whose disease went away on its own or due to some other non-treatment related factor.

There are two standard tests of statistical significance that should be reported for each RR. You will most likely have to obtain a copy of the actual study to verify statistical significance.

The first test of statistical significance concerns the p-value of the RR. The p-value will usually be found in the same table in the study as the RR. The p-value indicates how much confidence can be had that the RR is different from the no-effect level of 1.0.

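By convention, a p-value of 0.05 or less is taken to mean that the RR is statistically significant; a p-value greater than 0.05 means that it is not.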

If a study does not present the p-values of its RRs, interpret this to mean that the researchers were too embarrassed to publish them because they would expose the RRs as not statistically significant, i.e., meaningless.

Step 4: Evaluating the relative risk: Statistical significance (Part 2 of 2)

The second test of statistical significance for an RR involves its confidence interval.

The confidence interval for an RR is typically found in the same place as the RR and its p-value. It is often indicated with the abbreviation C.I. or CI. The confidence interval is a range of values (typically indicated with parentheses) within which the true RR lies, according to a certain degree of confidence. The standard degree of confidence is 95%, designated as 95% C.I. For example, if the RR=0.50, and its confidence interval ranges from 0.4 to 0.6, then this may be designated as RR=0.50 (95% CI 0.40, 0.60).

The two numbers in the confidence interval are referred to as its lower bound and upper bound. In the above example, 0.40 is the lower bound and 0.60 is the upper bound.

In order for the RR of a clinical trial to pass the second test of statistical significance, the upper bound of the confidence interval must be less than 1.0. The reason for this is that if the confidence interval includes 1.0, then we cannot be sure with 95% confidence that the true value of the RR is less than 1.0. In other words, we cannot safely exclude the possibility that the true RR is 1.0, the no-effect level. For example, if you see an RR and CI presented as RR=0.60 (95% CI 0.35, 1.05), the RR is not statistically significant because its upper bound (1.05) is greater than 1.0, so the interval includes the no-effect level as well as a range (from just above 1.0 to 1.05) indicating that the treatment may actually worsen disease.

So in the case of a clinical trial, if the upper bound of the confidence interval touches or crosses the 1.0 no-effect level, then the RR is not statistically significant, i.e., it is meaningless and debunked.
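The confidence interval test above reduces to one comparison. Here is a minimal sketch, using the two examples from this step plus a borderline case; the function name is made up for illustration.

```python
def ci_passes_significance_test(lower: float, upper: float) -> bool:
    """Second test of statistical significance for a clinical trial RR.

    Passes only if the whole interval sits below the 1.0 no-effect level.
    An upper bound that merely touches 1.0 fails, as the text requires.
    """
    if lower > upper:
        raise ValueError("lower bound must not exceed upper bound")
    return upper < 1.0


print(ci_passes_significance_test(0.40, 0.60))  # True:  RR=0.50 (95% CI 0.40, 0.60)
print(ci_passes_significance_test(0.35, 1.05))  # False: RR=0.60 (95% CI 0.35, 1.05)
print(ci_passes_significance_test(0.50, 1.00))  # False: upper bound touches 1.0
```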

Confidence intervals are subject to game-playing by junk scientists. The width of the confidence interval may be adjusted by altering the confidence level. For example, a 90% confidence interval is narrower than a 95% confidence interval, while a 99% confidence interval is wider than a 95% confidence interval. This makes sense if you think about it. You will have less confidence in a narrower range and more confidence in a wider range. So here's the game.

Keeping in mind that 1.0, the no-effect level, is the third rail of an epidemiology study and means instant death for an RR whose confidence interval touches or crosses it, there is an incentive to narrow the confidence interval so that the upper bound stays below 1.0. This may be accomplished by choosing to use a lower confidence level than the standard 95% level. This lower confidence level is typically 90%. So if you see a 90% confidence interval, this is a red flag that the researcher is trying to conceal the fact that his CI touches or crosses the no-effect level.
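To see the game in numbers, here is a sketch that computes an RR and its confidence interval at different confidence levels. Two assumptions are not from this guide: the interval is built with the common log-scale (normal approximation) method, which a given study may or may not have used, and the trial counts (30 of 200 treated subjects with disease versus 45 of 200 controls) are invented purely for illustration.

```python
import math
from statistics import NormalDist


def relative_risk_ci(events_treated, n_treated, events_control, n_control,
                     confidence=0.95):
    """Return (rr, lower, upper) using the log-scale normal approximation."""
    rr = (events_treated / n_treated) / (events_control / n_control)
    # Standard error of ln(RR) under the usual large-sample approximation.
    se_log_rr = math.sqrt(1 / events_treated - 1 / n_treated
                          + 1 / events_control - 1 / n_control)
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# Hypothetical trial: same data, three confidence levels.
for level in (0.90, 0.95, 0.99):
    rr, lower, upper = relative_risk_ci(30, 200, 45, 200, confidence=level)
    verdict = "below 1.0" if upper < 1.0 else "touches or crosses 1.0"
    print(f"{level:.0%} CI: RR={rr:.2f} ({lower:.2f}, {upper:.2f}) -> {verdict}")
```

With these made-up counts, the 90% interval stays below 1.0 while the 95% and 99% intervals cross it, which is exactly the situation in which a 90% interval should raise suspicion.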

Step 5: Confidence interval width

Another tell-tale sign of a dubious relative risk is a wide confidence interval. The CI indicates the range of values within which the true RR likely falls. A “too-wide” confidence interval indicates that the underlying data are unusually disparate, including too many outliers and oddball data points. A “tight” confidence interval indicates more uniformity and less variance among the data.

How wide is too wide? As a rule of thumb, if a CI is wider than the magnitude of the RR, then you’re looking at goofy data. For example, assume you see an RR=20 (95% CI 5, 167). The width of the confidence interval (162) is many times the size of the RR. A nice tight confidence interval would be, for example, something like RR=20 (95% CI 18, 22), where the width of the CI is only a fraction of the size of the RR.

Consider the width of the confidence interval as an indication of the “margin of error” concept used in polling. When the polling results between, say, two candidates are within the margin of error, the candidates are considered to be in a statistical dead heat. If the CI is wider than the RR is large, then the RR is statistically dead.
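The width rule of thumb from this step can be sketched the same way, using the two examples given above. The function name is hypothetical.

```python
def ci_too_wide(rr: float, lower: float, upper: float) -> bool:
    """Rule of thumb from this step: the CI is too wide if its width exceeds the RR."""
    return (upper - lower) > rr


print(ci_too_wide(20, 5, 167))  # True:  width 162 versus RR 20
print(ci_too_wide(20, 18, 22))  # False: width 4 versus RR 20
```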

Step 6: Non-empirical considerations

If a clinical trial is debunkable, then there's a good chance that you will have already debunked it by following the procedure described in Steps 1 to 5. If that empirical process didn't do it and there is something that still seems fishy about the study, here are a few qualitative issues to explore with a clinical trial:

  • Biological plausibility. This is an essential element of a cause-and-effect relationship. Researchers should be able to provide some sort of physiological explanation for the results, i.e., why their treatment worked/didn't work. But be aware of the limitations of biological plausibility. Marginal statistical results cannot be buffed into science even by the best physiological explanation, unless that explanation can also explain the marginal results.
  • Inclusion/exclusion shenanigans. One of the potential strengths of a clinical trial is the ability of researchers to virtually hand-pick study subjects. But as you can readily imagine, this ability can cut both ways, for sound science and for junk science. On the sound science side, hand-picking study subjects allows researchers to construct better matched treatment and placebo/control groups. On the junk science side, hand-picking study subjects allows unscrupulous researchers to select and group study subjects so as to bias the results, e.g., putting study subjects who are likely to respond more favorably to treatment in the treatment group and putting study subjects less likely to respond favorably to treatment in the control group. Ideally, study subjects in clinical trials should be assigned to the treatment and control groups randomly, and so that neither the researcher nor any study subject knows to which group any particular study subject is assigned. Does this always happen? Doubtful. Researchers may also inexplicably drop certain study subjects from trials, like those that are hurting the desired study result. Inclusion/exclusion shenanigans could be important factors in the case of borderline relative risk or confidence interval statistics. That is, how does the inclusion/exclusion of one or a few study subjects affect the results? Does it push the RR over the 0.50 threshold? Does it push the upper bound of the confidence interval over 1.0?
  • Other clinical trials. A hallmark of science is replication of results. Similar experiments should produce similar results. You should check to see if there are other clinical trials similar in purpose and scope to the one of interest. Compare the results of the trials. Are they consistent? If not, this should be a red flag: something is problematic, either the study of interest or the other study or studies. If it turns out to be a case of one study against many studies, don't assume that the one study is necessarily wrong. It could very well be that the many studies are wrong for a variety of reasons. For example, they may all be associated with a single researcher or a single organization with reasons to cook the books.
  • Publication bias. It may be difficult to locate and compare prior clinical trials because of the phenomenon of publication bias, i.e., the tendency of journals to publish positive results and to ignore the rest. Beyond the tendency of the journals, researchers who don't like their negative results may not submit them for publication.
  • Fraud. Scientists are people, people can commit fraud, so scientists can commit fraud. It's sad, but true — fraud can never be discounted as an explanation for study results.
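The questions asked under inclusion/exclusion shenanigans can be explored with a quick sensitivity check. The sketch below assumes you have the raw counts from the study; the counts used here (11 of 40 treated subjects with disease versus 20 of 40 controls) and the function names are invented for illustration only.

```python
def relative_risk(events_treated, n_treated, events_control, n_control):
    """RR = disease rate in the treatment group / disease rate in the control group."""
    return (events_treated / n_treated) / (events_control / n_control)


def rr_after_dropping(events_treated, n_treated, events_control, n_control,
                      dropped):
    """RR if `dropped` treated subjects who got the disease are excluded."""
    if dropped > events_treated:
        raise ValueError("cannot drop more diseased subjects than there are")
    return relative_risk(events_treated - dropped, n_treated - dropped,
                         events_control, n_control)


# Hypothetical borderline trial.
before = relative_risk(11, 40, 20, 40)
after = rr_after_dropping(11, 40, 20, 40, dropped=2)
print(f"RR with all subjects:   {before:.2f}")  # above the 0.50 threshold
print(f"RR after dropping two:  {after:.2f}")   # below the 0.50 threshold
```

In this made-up case, excluding just two inconvenient subjects moves the RR from the suspicious side of the 0.50 threshold to the other side, which is why unexplained exclusions deserve a close look when the statistics are borderline.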
