Posted on May 5, 2015 by Philipp Neubauer

History matching is a method for fitting complex computer simulation models to data, using likelihood-free methods grounded in the Bayesian analysis of computer simulations (Kennedy and O'Hagan 2001). For more on history matching itself, see Kennedy and O'Hagan (2001); for its use with ecological and epidemiological simulations, see Andrianakis et al. (2015).

The intriguing prospect of history matching is that it can be used with complex simulation models that would be difficult to fit to data using likelihood-based methods. MCMC for a long-running simulation, for example, would require running the full simulation at every iteration to evaluate the likelihood. In history matching, the simulation runs are replaced by statistical emulators based on Gaussian processes, which can make the model fitting orders of magnitude faster.

This post is a first try at using history matching for ecosystem models and, in the context of this project, for size spectra. The history match uses the community size-spectrum slope (and intercept) as data, and tries to figure out which parameter combinations could plausibly have led to the observed slope (here taken from a simulated truth).

History matching size-based models

I start with mizer to get a community model up and running, then see if I can emulate it using Gaussian processes, and finally use history matching to discard parameter values that provide poor matches to the data. I'll closely follow Andrianakis et al. (2015) in this first try.
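The code below assumes the following packages are attached; this list is my reconstruction from the function calls used throughout the post:

# packages used below (reconstructed from the calls in this post)
library(mizer)     # set_community_model(), project(), getCommunitySlope()
library(emulator)  # optimal.scales(), corr.matrix(), interpolant()
library(parallel)  # mclapply()
library(mvtnorm)   # rmvnorm()
library(dplyr)     # filter(), select()
library(ggplot2)   # stat_summary2d()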

# set up the simulated 'truth': a mizer community model
truth <- set_community_model(max_w = 1e+06, min_w = 0.001, r_pp = 10,
                             z0 = 0.1, alpha = 0.2, h = 10, beta = 100, sigma = 2,
                             q = 0.8, n = 0.8, kappa = 10000,
                             f0 = 0.7, gamma = NA,
                             knife_edge_size = 1000)

# project without fishing, and take the slope and intercept of the
# community size spectrum near the end of the run as the 'observed' data
sim <- project(truth, effort = 0, t_max = 20, dt = 0.1)

data <- getCommunitySlope(sim)[20, ]

I chose n, q, h and β as the uncertain parameters. This choice is entirely arbitrary at this stage; I just want to figure out how this works...
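(For reference, the simulated truth above has n = 0.8, q = 0.8, h = 10 and β = 100, so we know what the history match should recover.)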

First, I need to define a hypercube to sample from, and to calibrate the Gaussian process that will (hopefully!) emulate the size-spectrum outputs.

# Set prior search space
cube <- 6   # grid points per dimension
dims <- 4   # number of uncertain parameters

n    <- seq(2/3, 0.9, l = cube)
q    <- seq(0.4, 1, l = cube)
h    <- seq(3, 20, l = cube)
beta <- seq(50, 1000, l = cube)

# build the full grid of parameter combinations
sim_data <- vector('list', cube^dims)
a <- 0
for(ns in n){
  for (qs in q){
    for (hs in h){
      for (bs in beta){
        a <- a + 1
        sim_data[[a]] <- c(ns, qs, hs, bs)
      }
    }
  }
}
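As an aside, the nested loop just enumerates the full factorial design; base R's expand.grid(n = n, q = q, h = h, beta = beta) would build the same 6^4 = 1296 parameter combinations in one call (albeit in a different row order).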

# run the simulator over the whole grid in parallel
simdat <- parallel::mclapply(sim_data, function(x) {
  truth <- set_community_model(max_w = 1e+06, min_w = 0.001,
                                     z0 = 0.1, alpha = 0.2, h = x[3],
                                     beta = x[4], sigma = 2,
                                     q = x[2], n = x[1], kappa = 10000,
                                     f0 = 0.7, r_pp = 10, gamma = NA,
                                     knife_edge_size = 1000)

  sim <- project(truth, effort = 0, t_max = 10, dt=0.1)
  as.matrix(getCommunitySlope(sim)[10,])
}, mc.cores=4)

simdat <- do.call('rbind',simdat)
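Even run in parallel, these 1296 simulator runs are the expensive part of the exercise; this is exactly the cost that the emulator is meant to take over at prediction time.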

So far so good. With those simulations in our pocket, we can move on to the emulation part:

inputs <- do.call('rbind', sim_data)
outputs1 <- simdat[,1]  # community slope
outputs2 <- simdat[,2]  # community intercept

Following Andrianakis et al. (2015), I estimate the length scales from the data. Simulated annealing seems to work best here, as the surface in four dimensions is multimodal and other optim methods get stuck in suboptimal modes. I wonder if there are better alternatives (besides MCMC or numerical integration, which seem to defeat the purpose).

Following Vernon et al. (2010), I set the initial guess at a conservative 0.125 times the range of the inputs, given the cubic regressor.

# cubic mean function for the emulator: intercept plus linear,
# quadratic and cubic terms in each input (no interactions)
meanfunk <- function(x){
  out <- c(1, x, x^2, x^3)
  names(out) <- letters[1:length(out)]
  return(out)
}
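With four inputs, this mean function has 1 + 3 × 4 = 13 regressors: an intercept plus linear, quadratic and cubic terms in each input, with no interaction terms.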

# scale inputs to lie in [-1,1], makes things more numerically stable

ipmax <- apply(inputs,2,max)
ipmean <- colMeans(inputs)

# (note: this masks base::scale for the rest of the session)
scale <- function(x, ipmean, ipmax) (x - ipmean)/(ipmax - ipmean)
rescale <- function(x, ipmean, ipmax) x*(ipmax - ipmean) + ipmean

input <- t(apply(inputs, 1, scale, ipmean, ipmax))

scale_start <- 1/(0.125*as.vector(diff(apply(input,2,range))))^2
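With the scaled inputs spanning [-1, 1], each input range is 2, so the initial correlation length is 0.125 × 2 = 0.25 on the scaled scale, and the starting value for each scale is 1/0.25^2 = 16.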

pars_w1_o1 <- optimal.scales(val = input, scale_start, func = meanfunk, as.matrix(outputs1),
                             method = "SANN",
                             control = list(trace = TRUE, REPORT = 10, maxit = 200, reltol = 0.001))
## sann objective function values
## initial       value -2175.760416
## iter      100 value -2759.539373
## iter      199 value -2768.112450
## final         value -2768.112450
## sann stopped after 199 iterations
pars_w1_o2 <- optimal.scales(val = input, scale_start, func = meanfunk, as.matrix(outputs2),
                             method = "SANN",
                             control = list(trace = TRUE, REPORT = 10, maxit = 200, reltol = 0.001))
## sann objective function values
## initial       value 322.631519
## iter      100 value -238.491252
## iter      199 value -241.687325
## final         value -241.687325
## sann stopped after 199 iterations
A_w1_o1 <- corr.matrix(input, scales = pars_w1_o1)
Ainv_w1_o1 <- solve(A_w1_o1)

A_w1_o2 <- corr.matrix(input, scales = pars_w1_o2)
Ainv_w1_o2 <- solve(A_w1_o2)
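Pre-computing each correlation matrix and its inverse once per wave is what keeps the emulation cheap: every interpolant() call below reuses Ainv rather than re-inverting the correlation matrix for each prediction point.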

Wave 1

Next, I need to predict at a large number of points and calculate the implausibility of each point. I use the emulator package for this:

cube_pred <- 10

preds <- data.frame(n = seq(2/3, 0.9, l = cube_pred),
                    q = seq(0.2, 1, l = cube_pred),  # note: extends below the training range for q (0.4)
                    h = seq(3, 20, l = cube_pred),
                    beta = seq(50, 1000, l = cube_pred))

predicts <- vector('list',cube_pred^4)
a=0
for(ns in preds$n){
  for (qs in preds$q){
    for (hs in preds$h){
      for (bs in preds$beta){
        a=a+1
        predicts[[a]] <- scale(c(ns,qs,hs,bs),ipmean,ipmax)
      }
    }
  }
}
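That's 10^4 = 10,000 prediction points, far more than it would be practical to push through the simulator itself.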
   
predmu_w1_o1 <- parallel::mclapply(predicts, interpolant, outputs1, as.matrix(input),
                                   A = A_w1_o1, Ainv = Ainv_w1_o1, scales = pars_w1_o1,
                                   func = meanfunk, g = TRUE, mc.cores = 4)
predmu_w1_o2 <- parallel::mclapply(predicts, interpolant, outputs2, as.matrix(input),
                                   A = A_w1_o2, Ainv = Ainv_w1_o2, scales = pars_w1_o2,
                                   func = meanfunk, g = TRUE, mc.cores = 4)

predvar_w1_o1 <- unlist(lapply(predmu_w1_o1,function(x) x$Z))
pred_w1_o1 <- unlist(lapply(predmu_w1_o1,function(x) x$mstar.star))

predvar_w1_o2 <- unlist(lapply(predmu_w1_o2,function(x) x$Z))
pred_w1_o2 <- unlist(lapply(predmu_w1_o2,function(x) x$mstar.star))
      
pn <- cbind(pred_w1_o1, pred_w1_o2)
mse <- cbind(predvar_w1_o1, predvar_w1_o2)^2  # square the emulator errors to get variances
modelvar <- c(0.1*var(outputs1), 0.1*var(outputs2))  # model discrepancy: 10% of the output variance

predictn <- data.frame(do.call('rbind',lapply(predicts,rescale,ipmean,ipmax)))

# implausibility: squared, variance-weighted distance between the observed
# (slope, intercept) and the emulator predictions
predictn$impl <- sapply(1:nrow(pn), function(x){
  imp <- as.matrix(data[,c('slope','intercept')] - pn[x,]) %*%
    solve(diag(mse[x,] + modelvar)) %*%
    t(as.matrix(data[,c('slope','intercept')] - pn[x,]))
  return(imp)
})

colnames(predictn) <- c('n','q','h','beta','Implausibility')

ggplot(predictn,aes(x=n,y=q,z=Implausibility)) + stat_summary2d(fun = function(x) mean(x>3))
ggplot(predictn,aes(x=n,y=beta,z=Implausibility)) + stat_summary2d(fun = function(x) mean(x>3))
ggplot(predictn,aes(x=h,y=beta,z=Implausibility)) + stat_summary2d(fun = function(x) mean(x>3))
keepers <- predictn %>% filter(Implausibility<3) %>% select(-Implausibility)
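For reference, the implausibility computed above is (as I read it) a multivariate analogue of the univariate measure in Andrianakis et al. (2015):

$$I^2(x) = \big(z - E[f(x)]\big)\, V(x)^{-1}\, \big(z - E[f(x)]\big)^{\top}, \qquad V(x) = \operatorname{diag}\!\big(\hat{\sigma}^2_{\mathrm{em}}(x) + \sigma^2_{\mathrm{model}}\big),$$

where $z$ is the observed (slope, intercept) pair, $E[f(x)]$ and $\hat{\sigma}^2_{\mathrm{em}}(x)$ are the emulator mean and variance at input $x$, and $\sigma^2_{\mathrm{model}}$ is the model-discrepancy term (10% of the output variances). The plots bin the prediction points in two of the four parameter dimensions and show, for each bin, the proportion of points with this quantity above the cutoff of 3, i.e., the proportion discarded; the points below the cutoff are the keepers.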

In this wave, 8% of the points on the hypercube remain undiscarded.

Now I draw from a multivariate normal (MVN) distribution around the keepers (or a subset thereof), choosing the variance so that roughly 20% of the draws get rejected. These are the starting points for the next wave of history matching.

repmat <- function(X, m, n){
  # R equivalent of Matlab's repmat
  X <- as.matrix(X)
  mx <- dim(X)[1]
  nx <- dim(X)[2]
  matrix(t(matrix(X, mx, nx*n)), mx*m, nx*n, byrow = TRUE)
}
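Here, repmat(keepers_sc, oc, 1) stacks oc copies of the keeper matrix row-wise, so that each keeper becomes the mean of oc multivariate normal proposals.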

# draw ~15000 points in total
oc <- round(15000/nrow(keepers))

keepers_sc <- t(apply(keepers, 1, scale, ipmean, ipmax))
isd <- apply(keepers_sc, 2, sd)
new_points_sim <- t(apply(repmat(keepers_sc, oc, 1), 1,
                          function(x) t(mvtnorm::rmvnorm(1, x, diag(0.01*isd)))))

d <- ncol(keepers)
# keep only draws that fall inside the scaled [-1, 1] design cube
apply.range <- apply(sapply(1:d, function(x) new_points_sim[,x] >= -1 & new_points_sim[,x] <= 1), 1, all)

new_points_sim <- new_points_sim[apply.range,]
dim(new_points_sim)
## [1] 8302    4
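So of the roughly 15,000 MVN proposals, 8302 fall inside the scaled design cube and are kept as candidates.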
ss <- sample(1:nrow(new_points_sim),1000,replace = F)
new_points_sim_sub <- new_points_sim[ss,]

new_points_sim_sub_list <- split(new_points_sim_sub,1:nrow(new_points_sim_sub))

# test the proposal variance: emulate output 1 (slope) at the proposed points
predmu_w1test_o1 <- parallel::mclapply(new_points_sim_sub_list, interpolant, outputs1, as.matrix(input),
                                       A = A_w1_o1, Ainv = Ainv_w1_o1, scales = pars_w1_o1,
                                       func = meanfunk, g = TRUE, mc.cores = 4)
# ... and output 2 (intercept)
predmu_w1test_o2 <- parallel::mclapply(new_points_sim_sub_list, interpolant, outputs2, as.matrix(input),
                                       A = A_w1_o2, Ainv = Ainv_w1_o2, scales = pars_w1_o2,
                                       func = meanfunk, g = TRUE, mc.cores = 4)

predvar_w1test_o1 <- unlist(lapply(predmu_w1test_o1,function(x) x$Z))
pred_w1test_o1 <- unlist(lapply(predmu_w1test_o1,function(x) x$mstar.star))

predvar_w1test_o2 <- unlist(lapply(predmu_w1test_o2,function(x) x$Z))
pred_w1test_o2 <- unlist(lapply(predmu_w1test_o2,function(x) x$mstar.star))
        
pn <- cbind(pred_w1test_o1,pred_w1test_o2)
mse <- cbind(predvar_w1test_o1,predvar_w1test_o2)^2
modelvar <- c(0.1*var(outputs1),0.1*var(outputs2))

test_impl <- sapply(1:nrow(pn), function(x){
  imp <- as.matrix(data[,c('slope','intercept')] - pn[x,]) %*%
    solve(diag(mse[x,] + modelvar)) %*%
    t(as.matrix(data[,c('slope','intercept')] - pn[x,]))
  return(imp)
})


mean(test_impl>3)
## [1] 0.211

About 21 percent of the proposals are rejected on this try, close to the target. Now I re-run the simulator, the GP emulation and the history matching for the next wave...

Wave 2

sim_in_w2 <- t(apply(new_points_sim_sub, 1, rescale, ipmean, ipmax))
sim_in_w2_list <- split(sim_in_w2,1:nrow(sim_in_w2))

sim_data_w2 <- mclapply(sim_in_w2_list,function(x){
  truth <- set_community_model(max_w = 1e+06, min_w = 0.001,
                               z0 = 0.1, alpha = 0.2, h = x[3],
                               beta = x[4], sigma = 2,
                               q = x[2], n = x[1], kappa = 10000,
                               f0 = 0.7, r_pp = 10, gamma = NA,
                               knife_edge_size = 1000)
  
  sim <- project(truth, effort = 0, t_max = 10, dt=0.1)
  sims <- as.matrix(getCommunitySlope(sim)[10,])
  return(sims)
},mc.cores=4)

w2_data <- do.call('rbind',sim_data_w2)
outputs1_w2 <- w2_data[,1]
outputs2_w2 <- w2_data[,2]  


pars_w2_o1 <- optimal.scales(val = new_points_sim_sub, pars_w1_o1, func = meanfunk, as.matrix(outputs1_w2),
                             method = "SANN",
                             control = list(trace = TRUE, REPORT = 10, maxit = 200, reltol = 0.001))
## sann objective function values
## initial       value -1218.120614
## iter      100 value -2205.707784
## iter      199 value -2205.707784
## final         value -2205.707784
## sann stopped after 199 iterations
pars_w2_o2 <- optimal.scales(val = new_points_sim_sub, pars_w1_o2, func = meanfunk, as.matrix(outputs2_w2),
                             method = "SANN",
                             control = list(trace = TRUE, REPORT = 10, maxit = 200, reltol = 0.001))
## sann objective function values
## initial       value 918.677263
## iter      100 value -247.200427
## iter      199 value -247.200427
## final         value -247.200427
## sann stopped after 199 iterations
A_w2_o1 <- corr.matrix(new_points_sim_sub, scales = pars_w2_o1)
Ainv_w2_o1 <- solve(A_w2_o1)

A_w2_o2 <- corr.matrix(new_points_sim_sub, scales = pars_w2_o2)
Ainv_w2_o2 <- solve(A_w2_o2)

I can now use the remaining points in the new_points_sim object from above (those not used for the wave-2 simulator runs) as prediction points:

new_points_sim_list <- split(new_points_sim[-ss,],1:nrow(new_points_sim[-ss,]))

predmu_w2_o1 <- parallel::mclapply(new_points_sim_list, interpolant, outputs1_w2, new_points_sim_sub,
                                   A = A_w2_o1, Ainv = Ainv_w2_o1, scales = pars_w2_o1,
                                   func = meanfunk, g = TRUE, mc.cores = 4)
predmu_w2_o2 <- parallel::mclapply(new_points_sim_list, interpolant, outputs2_w2, new_points_sim_sub,
                                   A = A_w2_o2, Ainv = Ainv_w2_o2, scales = pars_w2_o2,
                                   func = meanfunk, g = TRUE, mc.cores = 4)

predvar_w2_o1 <- unlist(lapply(predmu_w2_o1,function(x) x$Z))
pred_w2_o1 <- unlist(lapply(predmu_w2_o1,function(x) x$mstar.star))

predvar_w2_o2 <- unlist(lapply(predmu_w2_o2,function(x) x$Z))
pred_w2_o2 <- unlist(lapply(predmu_w2_o2,function(x) x$mstar.star))
        
pn <- cbind(pred_w2_o1,pred_w2_o2)
mse <- cbind(predvar_w2_o1,predvar_w2_o2)^2
modelvar <- c(0.1*var(outputs1_w2),0.1*var(outputs2_w2))

predictn <- data.frame(do.call('rbind',lapply(new_points_sim_list,rescale,ipmean,ipmax)))

predictn$impl <- sapply(1:nrow(pn), function(x){
  imp <- as.matrix(data[,c('slope','intercept')] - pn[x,]) %*%
    solve(diag(mse[x,] + modelvar)) %*%
    t(as.matrix(data[,c('slope','intercept')] - pn[x,]))
  return(imp)
})

colnames(predictn) <- c('n','q','h','beta','Implausibility')

require(ggplot2)

ggplot(predictn,aes(x=n,y=q,z=Implausibility)) + stat_summary2d(fun = function(x) mean(x>3))
ggplot(predictn,aes(x=n,y=beta,z=Implausibility)) + stat_summary2d(fun = function(x) mean(x>3))
# wave-1 keepers, for comparison with later waves
hist(keepers$n)
hist(keepers$q)
hist(keepers$beta)
keepers_w2 <- predictn %>% filter(Implausibility<3) %>% select(-Implausibility)
oc <- round(15000/nrow(keepers_w2))
keepers_sc <- t(apply(keepers_w2,1,scale,ipmean,ipmax))
isd <- apply(keepers_sc,2,sd)
new_points_sim_w2 <- t(apply(repmat(keepers_sc, oc, 1), 1,
                             function(x) t(mvtnorm::rmvnorm(1, x, diag(0.01*isd)))))

apply.range <- apply(sapply(1:d, function(x) new_points_sim_w2[,x] >= -1 & new_points_sim_w2[,x] <= 1), 1, all)

new_points_sim_w2 <- new_points_sim_w2[apply.range,]

ss <- sample(1:nrow(new_points_sim_w2),1000,replace = F)
new_points_sim_sub_w2 <- new_points_sim_w2[ss,]

new_points_sim_sub_list_w2 <- split(new_points_sim_sub_w2, 1:nrow(new_points_sim_sub_w2))

predmu_w2test_o1 <- parallel::mclapply(new_points_sim_sub_list_w2, interpolant, outputs1_w2, new_points_sim_sub,
                                       A = A_w2_o1, Ainv = Ainv_w2_o1, scales = pars_w2_o1,
                                       func = meanfunk, g = TRUE, mc.cores = 4)
predmu_w2test_o2 <- parallel::mclapply(new_points_sim_sub_list_w2, interpolant, outputs2_w2, new_points_sim_sub,
                                       A = A_w2_o2, Ainv = Ainv_w2_o2, scales = pars_w2_o2,
                                       func = meanfunk, g = TRUE, mc.cores = 4)

predvar_w2test_o1 <- unlist(lapply(predmu_w2test_o1,function(x) x$Z))
pred_w2test_o1 <- unlist(lapply(predmu_w2test_o1,function(x) x$mstar.star))

predvar_w2test_o2 <- unlist(lapply(predmu_w2test_o2,function(x) x$Z))
pred_w2test_o2 <- unlist(lapply(predmu_w2test_o2,function(x) x$mstar.star))
        
pn <- cbind(pred_w2test_o1, pred_w2test_o2)
mse <- cbind(predvar_w2test_o1, predvar_w2test_o2)^2
modelvar <- c(0.1*var(outputs1_w2), 0.1*var(outputs2_w2))

test_impl <- sapply(1:nrow(pn), function(x){
  imp <- as.matrix(data[,c('slope','intercept')] - pn[x,]) %*%
    solve(diag(mse[x,] + modelvar)) %*%
    t(as.matrix(data[,c('slope','intercept')] - pn[x,]))
  return(imp)
})

mean(test_impl>3)
## [1] 0.068

This looks good: about 7 percent of the proposals are rejected this time. Again, I re-run the simulator and the GP emulation for the next wave...

Wave 3

sim_in_w3 <- t(apply(new_points_sim_sub_w2, 1, rescale, ipmean, ipmax))
sim_in_w3_list <- split(sim_in_w3, 1:nrow(sim_in_w3))
new_points_sim_sub_w3 <- new_points_sim_sub_w2  # wave-3 design points, still on the scaled scale

sim_data_w3 <- mclapply(sim_in_w3_list,function(x){
  truth <- set_community_model(max_w = 1e+06, min_w = 0.001,
                               z0 = 0.1, alpha = 0.2, h = x[3],
                               beta = x[4], sigma = 2,
                               q = x[2], n = x[1], kappa = 10000,
                               f0 = 0.7, r_pp = 10, gamma = NA,
                               knife_edge_size = 1000)
  
  sim <- project(truth, effort = 0, t_max = 10, dt=0.1)
  sims <- as.matrix(getCommunitySlope(sim)[10,])
  return(sims)
}, mc.cores = 4)

w3_data <- do.call('rbind',sim_data_w3)
outputs1_w3 <- w3_data[,1]
outputs2_w3 <- w3_data[,2]  


pars_w3_o1 <- optimal.scales(val = new_points_sim_sub_w3, pars_w2_o1, func = meanfunk, as.matrix(outputs1_w3),
                             method = "SANN",
                             control = list(trace = TRUE, REPORT = 10, maxit = 200, reltol = 0.001))
## sann objective function values
## initial       value -2387.226678
## iter      100 value -2508.817026
## iter      199 value -2509.972328
## final         value -2509.972328
## sann stopped after 199 iterations
pars_w3_o2 <- optimal.scales(val = new_points_sim_sub_w3, pars_w2_o2, func = meanfunk, as.matrix(outputs2_w3),
                             method = "SANN",
                             control = list(trace = TRUE, REPORT = 10, maxit = 200, reltol = 0.001))
## sann objective function values
## initial       value -472.383982
## iter      100 value -500.889419
## iter      199 value -502.824945
## final         value -502.824945
## sann stopped after 199 iterations
A_w3_o1 <- corr.matrix(new_points_sim_sub_w3, scales = pars_w3_o1)
Ainv_w3_o1 <- solve(A_w3_o1)

A_w3_o2 <- corr.matrix(new_points_sim_sub_w3, scales = pars_w3_o2)
Ainv_w3_o2 <- solve(A_w3_o2)

Again, I use the remaining points in new_points_sim_w2 from above as prediction points:

new_points_sim_list_w3 <- split(new_points_sim_w2[-ss,],1:nrow(new_points_sim_w2[-ss,]))

predmu_w3_o1 <- parallel::mclapply(new_points_sim_list_w3, interpolant, outputs1_w3, new_points_sim_sub_w3,
                                   A = A_w3_o1, Ainv = Ainv_w3_o1, scales = pars_w3_o1,
                                   func = meanfunk, g = TRUE, mc.cores = 4)
predmu_w3_o2 <- parallel::mclapply(new_points_sim_list_w3, interpolant, outputs2_w3, new_points_sim_sub_w3,
                                   A = A_w3_o2, Ainv = Ainv_w3_o2, scales = pars_w3_o2,
                                   func = meanfunk, g = TRUE, mc.cores = 4)

predvar_w3_o1 <- unlist(lapply(predmu_w3_o1,function(x) x$Z))
pred_w3_o1 <- unlist(lapply(predmu_w3_o1,function(x) x$mstar.star))

predvar_w3_o2 <- unlist(lapply(predmu_w3_o2,function(x) x$Z))
pred_w3_o2 <- unlist(lapply(predmu_w3_o2,function(x) x$mstar.star))
        
pn <- cbind(pred_w3_o1,pred_w3_o2)
mse <- cbind(predvar_w3_o1,predvar_w3_o2)^2
modelvar <- c(0.1*var(outputs1_w3),0.1*var(outputs2_w3))

predictn <- data.frame(do.call('rbind',lapply(new_points_sim_list_w3,rescale,ipmean,ipmax)))

predictn$impl <- sapply(1:nrow(pn), function(x){
  imp <- as.matrix(data[,c('slope','intercept')] - pn[x,]) %*%
    solve(diag(mse[x,] + modelvar)) %*%
    t(as.matrix(data[,c('slope','intercept')] - pn[x,]))
  return(imp)
})

colnames(predictn) <- c('n','q','h','beta','Implausibility')

require(ggplot2)

ggplot(predictn,aes(x=n,y=q,z=Implausibility)) + stat_summary2d(fun = function(x) mean(x>3))
ggplot(predictn,aes(x=n,y=beta,z=Implausibility)) + stat_summary2d(fun = function(x) mean(x>3))
ggplot(predictn,aes(x=h,y=beta,z=Implausibility)) + stat_summary2d(fun = function(x) mean(x>3))
keepers_w3 <- predictn %>% filter(Implausibility<3) %>% select(-Implausibility)

nrow(keepers_w3)/length(new_points_sim_list_w3)
## [1] 0.3219766

I could probably keep going here, but will leave it at this for a trial, with 32% of points remaining. How do the remaining parameter estimates stack up against the first round (the previous histograms)?

hist(keepers_w3$n)
hist(keepers_w3$q)
hist(keepers_w3$beta)
hist(keepers_w3$h)

It looks as though the history matching got us quite a bit closer to the true values, although the distributions for β and q are peaked towards the extremes, whereas the distribution of n is peaked near the true value of 0.8. h seems poorly constrained by the slope, so I might need to think of other metrics to constrain it. Then again, this could indicate that, for the purposes of a community spectrum, the value of h is less important.

I should really have a look at the predicted spectrum slopes from the GPs versus the simulated slopes, to see how well the emulator is doing at predicting the slope. That'll be next time...

References


Andrianakis, I., I. R. Vernon, N. McCreesh, T. J. McKinley, J. E. Oakley, R. N. Nsubuga, M. Goldstein, and R. G. White. 2015. Bayesian history matching of complex infectious disease models using emulation: a tutorial and a case study on HIV in Uganda. PLoS Computational Biology 11:e1003968.

Kennedy, M. C., and A. O’Hagan. 2001. Bayesian calibration of computer models. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 63:425–464.

Vernon, I., M. Goldstein, R. G. Bower, and others. 2010. Galaxy formation: a Bayesian uncertainty analysis. Bayesian Analysis 5:619–669.