The split-plot design is helpful when we want to determine the effect of one (or more than one) treatment over time and the same measure is repeated on a unit at different times, as we will show here.

We can consider the unit as the whole plot, while the measurements taken at different times as the sub-plots.

Here we will show how to perform this kind of analysis with a dataset coming from the (already cited) Design and Analysis of Experiments with R textbook by John Lawson. The dataset can be found at this link.

import pandas as pd
import seaborn as sns
import numpy as np
import pymc as pm
import arviz as az
import bambi as bmb
from matplotlib import pyplot as plt

rng = np.random.default_rng(42)

kwargs = {'nuts_sampler': 'numpyro', 'random_seed': rng, 'draws': 2000, 'tune': 2000, 'chains': 4, 'target_accept': 0.95,
         'idata_kwargs': dict(log_likelihood = True)}

df = pd.read_csv('/home/stippe/Downloads/dairy.csv')

df.rename(columns={'Unnamed: 0': 'cow'}, inplace=True)

df.head()
  cow Diet pr1 pr2 pr3 pr4
0 1 Barley 3.63 3.57 3.47 3.65
1 2 Barley 3.24 3.25 3.29 3.09
2 3 Barley 3.98 3.6 3.43 3.3
3 4 Barley 3.66 3.5 3.05 2.9
4 5 Barley 4.34 3.76 3.68 3.51

Here the first column represents the cow, the second column the diet, while the remaining four columns represent the protein percentage of the milk measured at four different weeks.

This format is not useful, so it’s better to melt the above dataset

df_melt = df.melt(value_vars=['pr1', 'pr1', 'pr3', 'pr4'],
                  var_name='week', value_name='protein', id_vars=['cow', 'Diet'])

df_melt['x'] = df_melt['week'].str[2].astype(int)

df_melt.head()
  cow Diet week protein x
0 1 Barley pr1 3.63 1
1 2 Barley pr1 3.24 1
2 3 Barley pr1 3.98 1
3 4 Barley pr1 3.66 1
4 5 Barley pr1 4.34 1
fig, ax = plt.subplots()
sns.lineplot(df_melt, y='protein', x='x', hue='Diet', ax=ax)
ax.set_xlim([1, 4])
fig.tight_layout()

There is a clear time dependence, and it also looks like it’s possibly non-linear. From what we said before, it’s straightforward to implement the split-plot design to this dataset.

categorical = ['cow','Diet','week']

model = bmb.Model('protein ~ Diet*week + (1 | cow:Diet)',
                  data=df_melt, categorical=categorical)

idata = model.fit(**kwargs)

The trace of the split-plot model

fig, ax = plt.subplots()
az.plot_forest(idata, var_names=['1|cow:Diet_sigma', 'Diet'], ax=ax)
fig.tight_layout()

From the above figure we see that the diet effect is less than the cow variability, so changing the diet would have an impact which has a negligible practical impact.

We may also ask what’s the impact of discretizing a continuous variables as the time, and this is generally considered suboptimal. We can however easily implement the continuous-time version of the above model. We will use a diet level slope, and an overall quadratic effect to account for possible non-linearity.

model_cont = bmb.Model('protein ~1 + Diet*x + I(x**2) + (1 | cow:Diet)',
                       data=df_melt, categorical=categorical)

idata_cont = model_cont.fit(**kwargs)

az.plot_trace(idata_cont)
fig = plt.gcf()
fig.tight_layout()

The trace of the continuous model

Let us now compare the two models

df_comp = az.compare({'discrete': idata, 'continuous': idata_cont})

df_comp
  rank elpd_loo p_loo elpd_diff weight se dse warning scale
continuous 0 -16.2897 94.8413 0 1 13.8243 0 False log
discrete 1 -20.4463 98.4776 4.15654 1.73195e-14 14.0241 1.77448 True log
fig, ax = plt.subplots()
az.plot_compare(df_comp, ax=ax)
fig.tight_layout()

The model comparison

The continuous model is slightly better than the discrete one.

With Bambi we can easily plot the posterior predictive distribution

fig, ax = plt.subplots()
bmb.interpret.plot_predictions(model=model_cont, idata=idata_cont, average_by=['x','Diet'], ax=ax)
ax.set_xlim([1, 4])
fig.tight_layout()

The posterior predictive distribution
of the continuous model

Conclusions

We have seen how we can use mixed effect models to model repeated measurements in a completely randomized design. We have also seen how to implement models with continuous variables by using Bambi.


%load_ext watermark
%watermark -n -u -v -iv -w -p xarray,numpyro,jax,jaxlib
Last updated: Fri Feb 28 2025

Python implementation: CPython
Python version : 3.12.8
IPython version : 8.31.0

xarray : 2024.11.0
numpyro: 0.16.1
jax : 0.4.38
jaxlib : 0.4.38

bambi : 0.15.0
pandas : 2.2.3
arviz : 0.20.0
seaborn : 0.13.2
matplotlib: 3.10.0
numpy : 1.26.4
pymc : 5.19.1

Watermark: 2.5.0