How do you communicate the evidence for the practical significance of an intervention using Bayesian methods?

I was listening to Frank Harrell on the Plenary Sessions podcast talk about Bayesian methods applied to clinical trials. I’d recommend anyone interested in or considering applying Bayesian methods listen to the episode.

One thing I liked was their discussion on communicating the evidence for practical significance in a transparent way. They were talking about how a paper they had read had a very good example of this. The episode doesn’t have notes so I did not see the paper.¹ But I understood the basic idea to be that you can calculate and report the probability that the treatment effect exceeds all relevant values of practical significance.

I’ll go through an example to demonstrate this. The data are Kruschke’s hypothetical IQ data from his BEST paper.² The green points are from the control group (placebo) and the red points are from the treatment group who received a “smart” drug:

Kruschke performed a Bayesian analysis, assumed the data were t-distributed which accommodated outliers in the data and permitted the data to be heteroskedastic by group membership. The results suggested a very high probability that the treatment effect exceeded zero. Following Harrell’s advice, one could go further and produce the following graph:

Given this graph, the probability that the treatment improved IQ by more than 1 point was barely over 50%. If we consider a 3-point IQ increase the minimum that is worth paying attention to, then the probability that the smart drug improved IQ points by a value that was of importance was 0. 2 points? Barely above 0. If we set ourselves the low bar of any kind of improvement i.e. more than 0 points, the probability that the smart drug worked was almost 100%.

If researchers stop at the low bar of any kind of improvement, the almost 100% probability of efficacy is very impressive. However, consider any useful level of practical significance and the probability of practical significance plummets. Researchers can produce this type of graph for any effect or relation that is of interest, and the graph provides a transparent way for readers to assess the effect/relation. Some readers have high standards, others have low standards. Given sufficient familiarity with the scale of the effect size, different readers can come to different evaluations given the transparent reporting.

I did not run Kruschke’s exact model. The model I ran was:

$\mathbf y \sim t\big(\nu, \beta_0 + \beta_1\times\mathbf x, \exp(\theta_0 + \theta_1\times \mathbf x)\big)$

$\nu \sim \mathrm{gamma}(2, 0.1),\quad \beta_0 \sim \mathrm{Cauchy}(0, 5),\quad \beta_1 \sim \mathcal{N}(0, 15 / 1.96)$

$\theta_0 \sim t(3, 0, 1),\quad \theta_1\sim \mathcal{N}(0, \ln(3) / 1.96)$

where $\mathbf y$ was the outcome variable and $\mathbf x$ was the treatment indicator. And I calculated the probability that $\beta_1>\text{effect}$ for different values of the effect using the posterior values of $\beta_1$ .

The code is available here:

	data {
	real<lower = 0> sd_m;
	real<lower = 0> sd_m_diff;
	real<lower = 0> sd_st;
	real<lower = 0> sd_st_r;
	int<lower = 0, upper = 1> nu_choice;
	int<lower = 0> N;
	vector<lower = 0, upper = 1>[N] x;
	vector[N] y;
	}
	transformed data {
	real n1 = sum(x);
	real n0 = N - n1;
	}
	parameters {
	real m0;
	real m_diff;
	real ln_st0;
	real ln_st_ratio;
	real<lower = 0> nu;
	}
	transformed parameters {
	vector[N] mu = m0 + x * m_diff;
	vector[N] sigma = exp(ln_st0 + x * ln_st_ratio);
	}
	model {
	m0 ~ cauchy(0, sd_m);
	m_diff ~ normal(0, sd_m_diff);
	ln_st0 ~ student_t(3, 0, sd_st);
	ln_st_ratio ~ normal(0, sd_st_r);
	if (nu_choice == 0) nu ~ gamma(2, 0.1);
	else if (nu_choice == 1) nu ~ exponential(1.0 / 29);

	y ~ student_t(nu + nu_choice, mu, sigma);
	}
	generated quantities {
	vector[N] log_lik;
	vector[N] yhat;
	real m1 = m0 + m_diff;
	real <lower = 0> st0 = exp(ln_st0);
	real <lower = 0> st1 = exp(ln_st0 + ln_st_ratio);
	real<lower = 0> st_ratio = exp(ln_st_ratio);

	for (i in 1:N) {
	yhat[i] = student_t_rng(nu + nu_choice, mu[i], sigma[i]);
	log_lik[i] = student_t_lpdf(y[i] \| nu + nu_choice, mu[i], sigma[i]);
	}
	}

view raw dmdv.stan hosted with ❤ by GitHub

	library(rstan)

	options(mc.cores = parallel::detectCores())
	rstan_options(auto_write = TRUE)

	# Uncomment next line and create stan_scripts folder to recreate this
	# saveRDS(stan_model(stanc_ret = stanc(file = "stan_scripts/dmdv.stan")), "stan_scripts/dmdv.rds")
	dmdv <- readRDS("stan_scripts/dmdv.rds")

	dat <- data.frame(
	y = c(
	c(101, 100, 102, 104, 102, 97, 105, 105, 98, 101, 100, 123, 105, 103, 100, 95, 102, 106,
	109, 102, 82, 102, 100, 102, 102, 101, 102, 102, 103, 103, 97, 97, 103, 101, 97, 104,
	96, 103, 124, 101, 101, 100, 101, 101, 104, 100, 101),
	c(99, 101, 100, 101, 102, 100, 97, 101, 104, 101, 102, 102, 100, 105, 88, 101, 100,
	104, 100, 100, 100, 101, 102, 103, 97, 101, 101, 100, 101, 99, 101, 100, 100,
	101, 100, 99, 101, 100, 102, 99, 100, 99)
	), x = c(rep(1, 47), rep(0, 42))
	)

	library(dplyr)
	library(ggplot2)
	library(scales)
	theme_set(theme_classic())

	image1 <- dat %>%
	mutate(x = factor(x)) %>%
	ggplot(aes(x, y, col = x)) +
	scale_colour_manual(values = c("#00BFC4", "#F8766D")) +
	geom_jitter(height = 0, shape = 1, width = .3) +
	labs(x = "Group assignment", y = "IQ scores", tag = "A") +
	theme(axis.text.y = element_blank(), axis.ticks.y = element_blank(),
	axis.title.y = element_blank()) +
	scale_y_continuous(breaks = c(100, 105, sort(dat$y)[c(1, 2, 3, 87, 89)])) +
	guides(color = FALSE) + coord_flip()
	image1

	stan.dat <- list(
	sd_m = 5, sd_m_diff = 15 / qnorm(.975), sd_st = 1, sd_st_r = log(3) / qnorm(.975),
	nu_choice = 0, N = length(dat$y), y = dat$y, x = dat$x)

	fit.dmdv <- sampling(dmdv, data = stan.dat, seed = 123)
	print(fit.dmdv, pars = c("m0", "m1", "m_diff", "st0", "st1", "st_ratio", "nu"),
	probs = c(.025, .1, .5, .9, .975), digits_summary = 3)
	mdiff.df <- as.data.frame(fit.dmdv, "m_diff")
	effect.tab <- cbind(level = seq(0, 5, .01),
	efficacy = sapply(seq(0, 5, .01), function (x) mean(mdiff.df$m_diff > x))) %>%
	as.data.frame()
	effect.tab %>% ggplot(aes(level, efficacy)) + geom_line() +
	scale_y_continuous(labels = percent_format()) + theme_bw() +
	labs(x = "Practically significant treatment effect", y = "",
	subtitle = "Probability that treatment effect is practically significant")

view raw file_0.R hosted with ❤ by GitHub

I later found the paper after writing this blog post, see Figure 3 in the paper: https://doi.org/10.1136/bmjopen-2018-024256 ↩︎
Kruschke, J. K. (2013). Bayesian estimation supersedes the t test. Journal of Experimental Psychology: General, 142(2), 573–603. https://doi.org/10.1037/a0029146 ↩︎

How do you communicate the evidence for the practical significance of an intervention using Bayesian methods?

James Uanhoro

2019/09/26