The situation is as follows:

- We are running a large model that takes quite a long time to fit. It takes too long to realistically use the approach in the “leave-future-out” vignette.
- It includes a GP whose kernel has two parameters, which we supply as `data`. One of them has a fairly substantial impact on the predictions and inferences. (Unfortunately this kernel is not one of those that can take advantage of the basis function approach.)

Ideally we would model the kernel parameters directly. Unfortunately this is not realistic for the number of timesteps we have (even with the new ability to move the Cholesky decomposition to the GPU).

What we are thinking of doing is the following:

- Run several models in parallel with different kernel parameters.
- For each model, run a second version with a 90-day holdout period.
- Once the models are done fitting, compute the `elpd` of the 90-day holdout period (for each timestep, take the log of the average density across draws, i.e. log-mean-exp of the pointwise log densities).
- Use `loo::stacking_weights` to obtain model-averaging weights.
- Weight the draws of the non-heldout models by these weights.
- Resample to get a weighted set of draws.
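To make the elpd/stacking steps concrete: in the real workflow you would feed the (timesteps × models) pointwise lpd matrix straight into `loo::stacking_weights` in R; the sketch below just re-implements the same objective (maximize the summed log score of the weight-mixed predictive distribution over the simplex) in Python, with synthetic log-likelihoods — the model count, holdout length, and all numbers are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def pointwise_lpd(log_lik_draws):
    """Per-timestep log predictive density: log-mean-exp across draws.
    log_lik_draws: (n_draws, n_timesteps) holdout log densities."""
    m = log_lik_draws.max(axis=0)
    return m + np.log(np.mean(np.exp(log_lik_draws - m), axis=0))

def stacking_weights(lpd_points, n_iter=2000):
    """Maximize sum_i log( sum_k w_k * exp(lpd_points[i, k]) ) over the
    simplex -- the objective loo::stacking_weights optimizes -- using a
    simple EM-style multiplicative update (not loo's actual optimizer)."""
    # Per-point relative densities; rescaling each row only shifts the
    # objective by a constant, so the optimum is unchanged.
    dens = np.exp(lpd_points - lpd_points.max(axis=1, keepdims=True))
    w = np.full(dens.shape[1], 1.0 / dens.shape[1])
    for _ in range(n_iter):
        mix = dens @ w                 # mixture density at each point
        w *= dens.T @ (1.0 / mix)      # gradient of sum(log mix)
        w /= w.sum()                   # project back onto the simplex
    return w

# Synthetic stand-in for three models' holdout log-likelihood draws over
# a 90-day holdout; model 0 is constructed to fit the holdout best.
n_draws, n_days = 400, 90
lpd = np.column_stack([
    pointwise_lpd(rng.normal(loc=mu, scale=1.0, size=(n_draws, n_days)))
    for mu in (-1.0, -3.0, -5.0)
])
w = stacking_weights(lpd)   # should concentrate on model 0
```

Because stacking scores the *mixture* rather than each model separately, it can zero out models that are dominated pointwise, which is usually what you want here.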
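For the last two bullets, one concrete way to turn the weights into a single set of draws is to resample a model index per retained draw with probability equal to its weight, then pull a random draw (with replacement) from that model's fit. A minimal sketch — the model count, weights, and draw matrices are all made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical posterior draws from three non-holdout fits, plus a set
# of stacking weights (both invented for this example).
n_draws, n_params = 1000, 4
draws = [rng.normal(loc=k, size=(n_draws, n_params)) for k in range(3)]
w = np.array([0.6, 0.3, 0.1])

# Pick a source model for each retained draw according to w, then a
# random row (with replacement) from that model's draw matrix.
model_idx = rng.choice(len(draws), size=n_draws, p=w)
row_idx = rng.integers(0, n_draws, size=n_draws)
mixed = np.stack([draws[m][r] for m, r in zip(model_idx, row_idx)])
```

The resulting `mixed` matrix has the same shape as one model's draws, with roughly `w[k]` of its rows coming from model `k`.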

Is this… okay? Any watchouts with this approach? Any recommendations?