R formulas

Notes on figuring out how to read and write the R formulas used in lmer and brms.


Summary

I needed to understand a Bayesian mixed-effects model and write about it, so it was time to figure out how those R formulas work.

The evolution of those formulas (Under construction)

  1. Wilkinson-Rogers syntax: the original, most basic kind of formula, as seen in lm(y ~ x + z). On its origin: https://stats.stackexchange.com/questions/284913/the-origin-of-the-wilkinson-style-notation-such-as-1id-for-random-effects-in

  2. Wilkinson-Rogers-Pinheiro-Bates: the random-effects extension used by nlme/lme4. https://twitter.com/BatesDmbates/status/1111283948615802881

  3. Throw Bürkner on the end there for the brms extensions to it.

brms accepts formulas in the form response | aterms ~ pterms + (gterms | group).

Formula components

  • aterms: special brms addition terms, like using trials(5) to sneak in the value of \(n=5\) in \(Binomial(n, p)\); see the example after this list.
  • pterms: population-level terms.
  • gterms: group-level terms.
  • group: grouping factor.
  • 1: the intercept. It is implicit when other predictors are present, so it is often omitted: (1+x|g) is the same as (x|g).
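
Putting the pieces together, a hypothetical formula using all three kinds of terms (the names successes, x, and g are made up for illustration):

# successes | trials(5): aterms supplying n = 5 for Binomial(n, p)
# x: a population-level slope (pterms)
# (1 + x | g): group-level intercepts and slopes (gterms) by the grouping factor g
f1 <- brmsformula(successes | trials(5) ~ x + (1 + x | g),
                  family = binomial())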

TODO: confusing frequentist language below:

Group-level terms

There can be multiple group-level effects gterms and multiple grouping factors group.

  • (1 | random): random group intercepts only
  • (0 + fixed | random) = (-1 + fixed | random): random slopes only
  • (0 + fixed | random) + (1 | random): random slopes and intercepts, independent. Alternatively: (fixed || random).
  • (1 + fixed | random) = (fixed | random): correlated random slopes and intercepts, see below for the correlation coefficients.
  • (1 + gterms |<ID>| group): according to the brms documentation, if there are multiple group-level terms gterms for the same grouping factor group, we can choose to model the correlations among those gterms. This is useful when the group-level terms live in different formulas, say in a non-linear or distributional model (see the sketch after this list). |<ID>| tags the grouping factor, and all gterms sharing the same ID are modeled as correlated.
    • Example: if the model says (1+x|2|g) and (1+z|2|g), brms will estimate the correlation between the group-level effects of x and z.
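
A minimal sketch of the non-linear case with made-up variables y, x, and grouping factor g: the shared tag |p| tells brms to model the group-level effects of the non-linear parameters a and b as correlated.

f_nl <- brmsformula(
  y ~ a * exp(-b * x),    # non-linear predictor with parameters a and b
  a ~ 1 + (1 |p| g),      # group-level intercepts for a
  b ~ 1 + (1 |p| g),      # group-level intercepts for b, correlated with a's via the ID p
  nl = TRUE)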

Incremental examples

Note: s stands for the subject/grouping factor; the notation follows Barr et al. 2013 (see Resources).

x

  • Not a mixed-effects model
  • Estimates two parameters, \(\beta_0\) and \(\beta_1\)
  • \(Y_{si} = \beta_0 + \beta_1 X_i + e_{si}\)
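
In code this is an ordinary linear regression; a sketch assuming a data frame d with columns y, x, and s:

fit0 <- lm(y ~ x, data = d)  # no group-level terms, so plain lm() suffices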

x+(1|s)

  • Random intercepts, fixed slopes
  • Estimates three parameters:
    1. A global intercept, \(\beta_0\)
    2. A global estimate for the fixed effect (slope) of x, \(\beta_1\)
    3. How much the intercepts for levels in s deviate from the global intercept, \(\sigma_{00}\).
  • \(Y_{si} = (\beta_0 + S_{0s}) + \beta_1 X_i + e_{si}\)
    • \(S_{0s} \sim \mathcal{N}(0, \sigma_{00}^2)\)
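
The corresponding lme4 call, with the same hypothetical data frame d:

library(lme4)
fit1 <- lmer(y ~ x + (1 | s), data = d)  # group-varying intercepts, one common slope for x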

x+(0+x|s)+(1|s)

  • Independent random slopes and intercepts
  • Estimates four parameters: the three above, plus
    1. How much the slopes for the levels of s deviate from the global slope \(\beta_1\): \(\sigma_{11}\). In other words, how much the effect of x within each level of s deviates from the global effect of x.
  • Note that \(S_{0s}\) and \(S_{1s}\) are drawn from two independent distributions with different variances.
  • \(Y_{si} = (\beta_0 + S_{0s}) + (\beta_1 + S_{1s}) X_i + e_{si}\)
    • \(S_{0s} \sim \mathcal{N}(0, \sigma_{00}^2)\)
    • \(S_{1s} \sim \mathcal{N}(0, \sigma_{11}^2)\)
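
As an lme4 sketch, with the double-bar form as equivalent shorthand:

fit2 <- lmer(y ~ x + (0 + x | s) + (1 | s), data = d)
# equivalently: lmer(y ~ x + (x || s), data = d)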

x+(1+x|s)

  • Correlated random slopes and intercepts
  • Estimates five parameters: the four above, plus
    1. The covariance \(\sigma_{01}\) between the intercept deviations \(S_{0s}\) and the slope deviations \(S_{1s}\) (reported as a correlation in lmer/brms output).
  • \(Y_{si} = (\beta_0 + S_{0s}) + (\beta_1 + S_{1s}) X_i + e_{si}\). Same as above, but
    • \(\begin{pmatrix}S_{0s} \\ S_{1s}\end{pmatrix} \sim \mathcal{N}\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix},\begin{pmatrix} \sigma_{00}^2 & \sigma_{01}\\ \sigma_{01} & \sigma_{11}^2 \end{pmatrix}\right)\)
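
To make the five parameters concrete, here is a small simulation sketch (all names and parameter values are made up) that generates data from this correlated model and recovers the parameters with lmer:

library(lme4)
library(MASS)  # for mvrnorm()

set.seed(1)
n_s <- 20; n_i <- 10                    # subjects, observations per subject
beta0 <- 1; beta1 <- 2                  # global intercept and slope
Sigma <- matrix(c(0.5^2, 0.06,          # var(S_0s) = 0.5^2, cov(S_0s, S_1s) = 0.06
                  0.06,  0.4^2), 2, 2)  # var(S_1s) = 0.4^2, so cor = 0.3
S <- mvrnorm(n_s, mu = c(0, 0), Sigma = Sigma)

d <- expand.grid(s = 1:n_s, i = 1:n_i)
d$x <- rnorm(nrow(d))
d$y <- (beta0 + S[d$s, 1]) + (beta1 + S[d$s, 2]) * d$x + rnorm(nrow(d), sd = 0.2)

# Should roughly recover beta0, beta1, sd(S_0s), sd(S_1s), their correlation,
# and the residual sd:
summary(lmer(y ~ x + (1 + x | s), data = d))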

Understanding x+(1+x|ID|g)

library(stats)
library(rstan)
library(brms)
library(tidyverse)
library(knitr)
library(kableExtra)
rstan_options(auto_write = TRUE)
options(mc.cores = parallel::detectCores())

The basic distributional model

We can start with a simple model with a normally distributed response variable. The mean \(\mu\) is estimated with the formula x + (x | group), and the standard deviation \(\sigma\) is assumed to be the same across all observations. The brms model output summarizes which parameters get estimated:

# m1 <- brm(
#   brmsformula(y ~ x + (x |group), family = gaussian()),
#   data = df,  control = list(adapt_delta = .99),
#   prior = priors1)
m1
##  Family: gaussian 
##   Links: mu = identity; sigma = identity 
## Formula: y ~ x + (x | group) 
##    Data: df (Number of observations: 30) 
## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup samples = 4000
## 
## Group-Level Effects: 
## ~group (Number of levels: 3) 
##                  Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)        0.12      0.27     0.00     0.70 1.00      617      549
## sd(x)                0.09      0.15     0.00     0.55 1.01      640      503
## cor(Intercept,x)    -0.04      0.62    -0.97     0.96 1.00     1806     1590
## 
## Population-Level Effects: 
##           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept     0.02      0.10    -0.16     0.28 1.00      823      449
## x             1.01      0.08     0.87     1.18 1.01      626      330
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma     0.12      0.02     0.09     0.16 1.00     2010     1284
## 
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

There are five parameters for the group-level and population-level effects combined, matching the count we arrived at above. The sixth parameter is the \(\sigma\) of the normal distribution.
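
One way to double-check the count is to summarize every parameter brms sampled for m1 (a sketch, not part of the original output):

# Includes the population-level effects, the group-level sds and correlation,
# sigma, and the per-group offsets r_group[...]:
posterior_summary(m1)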

With a submodel for variance

Sometimes we are interested in how \(\sigma\) varies across groups. We add a submodel for \(\sigma\), x + (x | group), just like the one for \(\mu\); \(\sigma\) then stops being a single Family Specific Parameter as it was in m1.

Independent group-level covariances

# f2 <- brmsformula(
#   y ~ x + (x|group),
#   sigma ~ x + (x|group),
#   family = gaussian())
# ...
m2
##  Family: gaussian 
##   Links: mu = identity; sigma = log 
## Formula: y ~ x + (x | group) 
##          sigma ~ x + (x | group)
##    Data: df (Number of observations: 30) 
## Samples: 1 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup samples = 1000
## 
## Group-Level Effects: 
## ~group (Number of levels: 3) 
##                              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)                    0.07      0.12     0.00     0.44 1.01      384      541
## sd(x)                            0.08      0.14     0.00     0.50 1.00      261      204
## sd(sigma_Intercept)              1.24      1.74     0.05     6.38 1.00      250      245
## sd(sigma_x)                      1.04      1.51     0.02     5.85 1.02      177      188
## cor(Intercept,x)                -0.02      0.61    -0.97     0.97 1.00      729      463
## cor(sigma_Intercept,sigma_x)    -0.02      0.60    -0.96     0.97 1.01      524      505
## 
## Population-Level Effects: 
##                 Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept           0.01      0.07    -0.14     0.15 1.00      574      546
## sigma_Intercept    -2.46      1.16    -4.45    -0.22 1.00      373      303
## x                   1.02      0.09     0.90     1.22 1.01      398      286
## sigma_x             0.26      1.12    -2.33     2.61 1.01      123       92
## 
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

Note that there are two correlation terms, cor(Intercept,x) and cor(sigma_Intercept,sigma_x): the group-level effects of the \(\mu\) part and of the \(\sigma\) part each get their own correlation, but correlations across the two parts are not modeled.

Correlated group-level covariances with |<ID>|

# f3 <- brmsformula(
#   y ~ x + (x|2|group),
#   sigma ~ x + (x|2|group),
#   family = gaussian())
# ...
m3
##  Family: gaussian 
##   Links: mu = identity; sigma = log 
## Formula: y ~ x + (x | 2 | group) 
##          sigma ~ x + (x | 2 | group)
##    Data: df (Number of observations: 30) 
## Samples: 1 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup samples = 1000
## 
## Group-Level Effects: 
## ~group (Number of levels: 3) 
##                                Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)                      0.12      0.29     0.00     0.82 1.00      328      164
## sd(x)                              0.09      0.20     0.00     0.62 1.00      402      205
## sd(sigma_Intercept)                1.27      1.69     0.04     5.68 1.00      283      351
## sd(sigma_x)                        0.80      1.26     0.01     4.32 1.00      322      399
## cor(Intercept,x)                  -0.01      0.46    -0.83     0.82 1.00      784      573
## cor(Intercept,sigma_Intercept)     0.00      0.45    -0.83     0.83 1.00     1014      598
## cor(x,sigma_Intercept)             0.02      0.46    -0.82     0.79 1.01      654      696
## cor(Intercept,sigma_x)             0.02      0.47    -0.83     0.83 1.00     1426      816
## cor(x,sigma_x)                     0.00      0.46    -0.81     0.85 1.00      902      619
## cor(sigma_Intercept,sigma_x)      -0.00      0.47    -0.82     0.85 1.00      760      839
## 
## Population-Level Effects: 
##                 Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept           0.02      0.14    -0.20     0.41 1.00      271      100
## sigma_Intercept    -2.29      1.09    -4.33     0.62 1.00      384      213
## x                   1.00      0.09     0.81     1.14 1.00      381      137
## sigma_x             0.19      0.69    -1.55     1.43 1.00      446      263
## 
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

We now have six correlation terms estimated. cor(Intercept,x) and cor(sigma_Intercept,sigma_x) are the same as before; the four extra ones fill out the full 4x4 correlation matrix across Intercept, x, sigma_Intercept, and sigma_x. The point here is not the correlation values themselves but which correlations get estimated.
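
If we did want to look at them, the correlations live in parameters named cor_group__...; a sketch assuming a recent brms version that provides as_draws_df():

draws <- as_draws_df(m3)
draws[, grepl("^cor_group__", names(draws))]  # the six correlation columns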

Resources

Barr, D. J., Levy, R., Scheepers, C., & Tily, H. J. (2013). Random effects structure for confirmatory hypothesis testing: Keep it maximal. Journal of Memory and Language, 68(3), 255–278.

Figure 2 of Barr et al. 2013.