
Technical post-stratification details

Here we document the technical details behind GrowthBook's regression adjustment and post-stratification. This approach permits estimation of absolute and relative effects, with unadjusted or CUPED-adjusted inference, for binomial, count, and ratio metrics.

Throughout the document we describe CUPED estimation for ratio metrics, and discuss the simpler cases (e.g., unadjusted estimates, count metrics) along the way.

We assume summary data are available within each cell, i.e., either the number of cells is not too large, or some cells have already been aggregated together.

For each case there are four steps.

  1. In Regression we describe how to construct regression estimates of the treatment effect and control mean for each cell (i.e., dimension level).
  2. In Cell moments we use the regression output to construct cell-specific estimates of absolute treatment effects and control means, together with their joint sampling distribution.
  3. In Combining cell estimates we describe how to combine estimates across cells to estimate population effects and population control means.
  4. Finally, in Delta method we transform the combined estimates into estimates of lift, ratio parameters, etc.

Regression

Below we describe the regression model fit within each cell (dimension level). The regression output is used in the next section to construct the joint sampling distribution of effect estimates and control means within a stratification cell. We present the ratio-metric case and note along the way how count metrics simplify.

  • Define $m_{i1}$ ($d_{i1}$) as the numerator (denominator) outcome for the $i^{\text{th}}$ user, $i = 1, 2, \ldots, N$.
  • Define $x_{im}$ ($x_{id}$) as the pre-exposure numerator (denominator) variable for the $i^{\text{th}}$ user.
  • Define $w_{i}$ as the binary treatment assignment for the $i^{\text{th}}$ user.
  • Define the covariate vector $\textbf{x}_{i} = \left(1, w_{i}, x_{im}, x_{id}\right)$.
  • Define the $N \times 4$ design matrix $\tilde{\textbf{X}}$ whose $i^{\text{th}}$ row equals $\textbf{x}_{i}$.
  • Define the $2N \times 8$ design matrix $\textbf{X} = \textbf{I}_{2} \otimes \tilde{\textbf{X}}$.
  • Define the $2N$-length vector $\textbf{Y} = \left(m_{11}, \ldots, m_{N1}, d_{11}, \ldots, d_{N1}\right)^{\top}$, stacking the numerator outcomes on top of the denominator outcomes so that the ordering matches the block structure of $\textbf{X}$.
  • Define the regression coefficients as $\boldsymbol{\gamma}$.

Our model is of the form $\textbf{Y} = \textbf{X}\boldsymbol{\gamma} + \textbf{E}$.

The least squares solution for the $8 \times 1$ vector of regression coefficients $\boldsymbol{\gamma}$ is

$$
\hat{\boldsymbol{\gamma}} = \left(\textbf{X}^{\top}\textbf{X}\right)^{-1}\textbf{X}^{\top}\textbf{Y}.
$$
  • Define $\tilde{\textbf{E}}$ as the $N \times 2$ matrix of residuals, whose first column contains the numerator residuals and whose second column contains the denominator residuals.
  • Define the $2 \times 2$ covariance of the (numerator, denominator) errors as $\boldsymbol{\Psi}$, estimated from $\tilde{\textbf{E}}$.

The covariance of $\hat{\boldsymbol{\gamma}}$ is

$$
\boldsymbol{\Sigma}_{\boldsymbol{\gamma}} = \text{Cov}\left(\hat{\boldsymbol{\gamma}}\right) = \boldsymbol{\Psi} \otimes \left(\tilde{\textbf{X}}^{\top}\tilde{\textbf{X}}\right)^{-1}.
$$

By Lyapunov's central limit theorem, approximately

$$
\hat{\boldsymbol{\gamma}} \sim \mathcal{N}\left(\boldsymbol{\gamma}, \boldsymbol{\Sigma}_{\boldsymbol{\gamma}}\right).
$$
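To make the setup concrete, here is a minimal numpy sketch of the stacked regression on synthetic data. It is illustrative only: the data-generating process and all variable names are our own, not GrowthBook's production code.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1_000
w = rng.integers(0, 2, N)            # treatment assignment w_i
x_m = rng.normal(5, 1, N)            # pre-exposure numerator covariate x_im
x_d = rng.normal(2, 1, N)            # pre-exposure denominator covariate x_id
m = 1.0 + 0.3 * w + 0.5 * x_m + rng.normal(0, 1, N)   # numerator outcome
d = 0.5 + 0.1 * w + 0.4 * x_d + rng.normal(0, 1, N)   # denominator outcome

# N x 4 design matrix X_tilde with rows (1, w_i, x_im, x_id)
X_tilde = np.column_stack([np.ones(N), w, x_m, x_d])

# 2N x 8 block-diagonal design X = I_2 kron X_tilde; Y stacks the
# numerator block on top of the denominator block to match
X = np.kron(np.eye(2), X_tilde)
Y = np.concatenate([m, d])

# Least squares: gamma_hat = (X'X)^{-1} X'Y   (length 8)
gamma_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)

# N x 2 residual matrix E_tilde and its 2 x 2 covariance Psi
E_tilde = np.column_stack([m - X_tilde @ gamma_hat[:4],
                           d - X_tilde @ gamma_hat[4:]])
Psi = np.cov(E_tilde, rowvar=False)

# Cov(gamma_hat) = Psi kron (X_tilde' X_tilde)^{-1}   (8 x 8)
Sigma_gamma = np.kron(Psi, np.linalg.inv(X_tilde.T @ X_tilde))
```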

Cell moments

In this section we describe how to use the regression output from the previous section to construct the joint sampling distribution of effect estimates and control means within a stratification cell.

In the $k^{\text{th}}$ cell, our inferential focus is the vector $\boldsymbol{\alpha}_{k}$, which has four elements:

  • the numerator control mean for the $k^{\text{th}}$ cell
  • the numerator absolute treatment effect for the $k^{\text{th}}$ cell
  • the denominator control mean for the $k^{\text{th}}$ cell
  • the denominator absolute treatment effect for the $k^{\text{th}}$ cell

Now that we have our summary statistics in the form of a multivariate CLT, we linearly transform them to create our estimates of numerator and denominator effects and control means.

  • Define $\bar{x}_{m}$ ($\bar{x}_{d}$) as the sample mean of the pre-exposure numerator (denominator) variable.
  • Define $\mu_{xm}$ and $\mu_{xd}$ as their population counterparts.
  • Define the $4 \times 8$ contrast matrix

$$
\textbf{A}_{k, reg} = \begin{pmatrix}
1 & 0 & \bar{x}_{m} & \bar{x}_{d} & 0 & 0 & 0 & 0\\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 1 & 0 & \bar{x}_{m} & \bar{x}_{d}\\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 0
\end{pmatrix}.
$$

We estimate $\boldsymbol{\alpha}_{k}$ with $\hat{\boldsymbol{\alpha}}_{k} = \textbf{A}_{k, reg}\hat{\boldsymbol{\gamma}}_{k}$.
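Continuing the sketch above, the contrast matrix for a single cell and the resulting cell-level estimate might look like this (again illustrative; it reuses `X_tilde` and `gamma_hat` from the previous block):

```python
import numpy as np

x_bar_m, x_bar_d = X_tilde[:, 2].mean(), X_tilde[:, 3].mean()

A_k_reg = np.array([
    [1, 0, x_bar_m, x_bar_d, 0, 0, 0,       0      ],  # numerator control mean
    [0, 1, 0,       0,       0, 0, 0,       0      ],  # numerator effect
    [0, 0, 0,       0,       1, 0, x_bar_m, x_bar_d],  # denominator control mean
    [0, 0, 0,       0,       0, 1, 0,       0      ],  # denominator effect
])

alpha_hat_k = A_k_reg @ gamma_hat   # 4-vector of cell-level estimates
```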

We now calculate the covariance of $\hat{\boldsymbol{\alpha}}_{k}$, denoted $\boldsymbol{\Sigma}_{k}$. (Readers who want to skip the details can jump to the next section, Combining cell estimates, where we combine estimates across cells to estimate population absolute effects and control means.) One subtlety is that $\textbf{A}_{k, reg}$ has random components ($\bar{x}_{m}$ and $\bar{x}_{d}$), which must be accounted for. For inference within a cell, we condition on the sample size $n_{k}$ for that cell; we deal with the assignment randomness in the next section. Technically, each of the covariances and expectations below is conditional on $n_{k}$, but we suppress this notation for clarity. Below we compute the mean of $\textbf{A}_{k, reg}$ and the covariances between its individual rows.

The first moment of $\textbf{A}_{k, reg}$ is obtained by replacing the sample means with their population counterparts:

$$
E\left[\textbf{A}_{k, reg}\right] = \begin{pmatrix}
1 & 0 & \mu_{xm} & \mu_{xd} & 0 & 0 & 0 & 0\\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 1 & 0 & \mu_{xm} & \mu_{xd}\\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 0
\end{pmatrix}.
$$

We also need the covariances between individual rows of $\textbf{A}_{k, reg}$.

Note that nothing in the second and fourth rows of $\textbf{A}_{k, reg}$ is random, so any covariance involving those rows is $\textbf{0}$.

There are only four cases to consider: for the $(i, j)^{\text{th}}$ element of $\boldsymbol{\Sigma}_{k}$ we will need the $8 \times 8$ matrix $\text{Cov}\left(\textbf{A}_{k, reg}[j, ]^{\top}, \textbf{A}_{k, reg}[i, ]\right)$, whose $(a, b)^{\text{th}}$ entry is $\text{Cov}\left(\textbf{A}_{k, reg}[j, a], \textbf{A}_{k, reg}[i, b]\right)$, for $i, j \in \{1, 3\}$. We start with the covariance of the first row with itself:

$$
\begin{align*}
\text{Cov}\left(\textbf{A}_{k, reg}[1, ]^{\top}, \textbf{A}_{k, reg}[1, ]\right)
&= E\left[\textbf{A}_{k, reg}[1, ]^{\top}\textbf{A}_{k, reg}[1, ]\right] - E\left[\textbf{A}_{k, reg}[1, ]\right]^{\top}E\left[\textbf{A}_{k, reg}[1, ]\right] \\
&= \begin{pmatrix}
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & \sigma_{xm}^{2}/n & \sigma_{xmd}/n & 0 & 0 & 0 & 0\\
0 & 0 & \sigma_{xmd}/n & \sigma_{xd}^{2}/n & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0
\end{pmatrix},
\end{align*}
$$

where $\sigma_{xm}^{2}$, $\sigma_{xd}^{2}$, and $\sigma_{xmd}$ are the variances and covariance of the pre-exposure covariates.

Using a similar argument for the (3, 3) case:

$$
\text{Cov}\left(\textbf{A}_{k, reg}[3, ]^{\top}, \textbf{A}_{k, reg}[3, ]\right)
= \begin{pmatrix}
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0 & \sigma_{xm}^{2}/n & \sigma_{xmd}/n\\
0 & 0 & 0 & 0 & 0 & 0 & \sigma_{xmd}/n & \sigma_{xd}^{2}/n
\end{pmatrix}.
$$

For the (1, 3) case:

$$
\text{Cov}\left(\textbf{A}_{k, reg}[3, ]^{\top}, \textbf{A}_{k, reg}[1, ]\right)
= \begin{pmatrix}
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & \sigma_{xm}^{2}/n & \sigma_{xmd}/n & 0 & 0 & 0 & 0\\
0 & 0 & \sigma_{xmd}/n & \sigma_{xd}^{2}/n & 0 & 0 & 0 & 0
\end{pmatrix}.
$$

The (3, 1) case is its transpose:

$$
\text{Cov}\left(\textbf{A}_{k, reg}[1, ]^{\top}, \textbf{A}_{k, reg}[3, ]\right)
= \text{Cov}\left(\textbf{A}_{k, reg}[3, ]^{\top}, \textbf{A}_{k, reg}[1, ]\right)^{\top}.
$$

Define $\boldsymbol{\mu}_{k, reg} = E\left[\textbf{A}_{k, reg}\right]$, the mean of $\textbf{A}_{k, reg}$ computed above.

By the law of total covariance,

$$
\begin{align*}
\text{Cov}\left(\hat{\boldsymbol{\alpha}}_{k}\right) &= \text{Cov}\left(\textbf{A}_{k, reg}\hat{\boldsymbol{\gamma}}_{k}\right) \\
&= E\left[\text{Cov}\left(\textbf{A}_{k, reg}\hat{\boldsymbol{\gamma}}_{k} \,\middle|\, \textbf{A}_{k, reg}\right)\right] + \text{Cov}\left[E\left(\textbf{A}_{k, reg}\hat{\boldsymbol{\gamma}}_{k} \,\middle|\, \textbf{A}_{k, reg}\right)\right] \\
&= E\left[\textbf{A}_{k, reg}\,\text{Cov}\left(\hat{\boldsymbol{\gamma}}_{k}\right)\textbf{A}_{k, reg}^{\top}\right] + \text{Cov}\left[\textbf{A}_{k, reg}\boldsymbol{\gamma}_{k}\right].
\end{align*}
$$

The first term has $(i, j)^{\text{th}}$ element

$$
\begin{align*}
E\left[\textbf{A}_{k, reg}\,\text{Cov}\left(\hat{\boldsymbol{\gamma}}_{k}\right)\textbf{A}_{k, reg}^{\top}\right][i, j]
&= E\left[\textbf{A}_{k, reg}[i, ]\,\text{Cov}\left(\hat{\boldsymbol{\gamma}}_{k}\right)\textbf{A}_{k, reg}[j, ]^{\top}\right] \\
&= E\left[\text{trace}\left(\textbf{A}_{k, reg}[i, ]\,\text{Cov}\left(\hat{\boldsymbol{\gamma}}_{k}\right)\textbf{A}_{k, reg}[j, ]^{\top}\right)\right] \\
&= E\left[\text{trace}\left(\text{Cov}\left(\hat{\boldsymbol{\gamma}}_{k}\right)\textbf{A}_{k, reg}[j, ]^{\top}\textbf{A}_{k, reg}[i, ]\right)\right] \\
&= \text{trace}\left(\text{Cov}\left(\hat{\boldsymbol{\gamma}}_{k}\right)E\left[\textbf{A}_{k, reg}[j, ]^{\top}\textbf{A}_{k, reg}[i, ]\right]\right),
\end{align*}
$$

where the second equality holds because a scalar equals its trace, the third uses the cyclic property of the trace, and the fourth uses linearity of expectation.

A similar argument applies to the second term. Therefore,

$$
\begin{align*}
\boldsymbol{\Sigma}_{k}[i, j] = \text{Cov}\left(\hat{\boldsymbol{\alpha}}_{k}\right)[i, j]
&= \text{trace}\left(\text{Cov}\left(\hat{\boldsymbol{\gamma}}_{k}\right)E\left[\textbf{A}_{k, reg}[j, ]^{\top}\textbf{A}_{k, reg}[i, ]\right]\right) \\
&+ \text{trace}\left(\boldsymbol{\gamma}_{k}\boldsymbol{\gamma}_{k}^{\top}\,\text{Cov}\left(\textbf{A}_{k, reg}[j, ]^{\top}, \textbf{A}_{k, reg}[i, ]\right)\right).
\end{align*}
$$

In practice, we substitute $\hat{\boldsymbol{\gamma}}_{k}$ for $\boldsymbol{\gamma}_{k}$.
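The trace formula can be implemented directly. The sketch below treats the whole synthetic dataset as one cell and plugs in sample estimates of the covariate moments; it reuses `gamma_hat`, `Sigma_gamma`, `A_k_reg`, and the sample means from the previous blocks, and all names are illustrative.

```python
import numpy as np

n = N  # cell sample size (single-cell illustration)
sigma2_xm = X_tilde[:, 2].var(ddof=1)
sigma2_xd = X_tilde[:, 3].var(ddof=1)
sigma_xmd = np.cov(X_tilde[:, 2], X_tilde[:, 3])[0, 1]

# E[A_{k,reg}]: the x-bars replaced by their (estimated) means
mu_A = A_k_reg

# 2 x 2 covariance of (x_bar_m, x_bar_d): covariate covariance over n
S = np.array([[sigma2_xm, sigma_xmd],
              [sigma_xmd, sigma2_xd]]) / n

def cov_rows(i, j):
    """Cov(A[i,]^T, A[j,]): nonzero only for the random rows 0 and 2."""
    C = np.zeros((8, 8))
    if i in (0, 2) and j in (0, 2):
        r = [2, 3] if i == 0 else [6, 7]   # random entries of row i
        c = [2, 3] if j == 0 else [6, 7]   # random entries of row j
        C[np.ix_(r, c)] = S
    return C

Sigma_k = np.empty((4, 4))
for i in range(4):
    for j in range(4):
        # E[A[j,]^T A[i,]] = E[A[j,]]^T E[A[i,]] + Cov(A[j,]^T, A[i,])
        EAA = np.outer(mu_A[j], mu_A[i]) + cov_rows(j, i)
        term1 = np.trace(Sigma_gamma @ EAA)
        term2 = np.trace(np.outer(gamma_hat, gamma_hat) @ cov_rows(j, i))
        Sigma_k[i, j] = term1 + term2
```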

Combining cell estimates

At a high level, for each cell we now have estimates of population means (e.g., the control mean and the absolute effect), and uncertainty about those estimates. In this section we describe how to combine these estimates across cells to estimate population absolute effects and control means. This algorithm can be used for count or ratio metrics, unadjusted or adjusted (e.g., CUPED), and absolute or relative inference.

Define the population (sample) proportion for the $k^{\text{th}}$ stratification cell as $\nu_{k}$ ($\hat{\nu}_{k}$), and collect these into the vectors $\boldsymbol{\nu}$ and $\hat{\boldsymbol{\nu}}$.

Under stratified sampling, the $\nu_{k}$ are deterministic, and we could define $\hat{\boldsymbol{\alpha}} = \sum_{k=1}^{K}\nu_{k}\hat{\boldsymbol{\alpha}}_{k}$ and $\hat{\boldsymbol{\Sigma}} = \sum_{k=1}^{K}\nu_{k}^{2}n_{k}^{-1}\hat{\boldsymbol{\Sigma}}_{k}$. However, we do not conduct stratified sampling in GrowthBook. Under simple random sampling the $\hat{\nu}_{k}$ are multinomial random variables, and we could define $\hat{\boldsymbol{\alpha}} = \sum_{k=1}^{K}\hat{\nu}_{k}\hat{\boldsymbol{\alpha}}_{k}$. Define the $4 \times K$ matrix $\boldsymbol{\alpha}_{M}$ whose $k^{\text{th}}$ column is $\boldsymbol{\alpha}_{k}$, and define $\hat{\boldsymbol{\alpha}}_{M}$ analogously. Our point estimate is $\hat{\boldsymbol{\alpha}}_{M}\hat{\boldsymbol{\nu}} = \sum_{k=1}^{K}\hat{\nu}_{k}\hat{\boldsymbol{\alpha}}_{k}$, which has expected value

$$
\begin{align*}
E\left(\hat{\boldsymbol{\alpha}}_{M}\hat{\boldsymbol{\nu}}\right)
&= E_{\hat{\boldsymbol{\nu}}}\left(E_{\hat{\boldsymbol{\alpha}}}\left(\hat{\boldsymbol{\alpha}}_{M}\hat{\boldsymbol{\nu}} \,\middle|\, \hat{\boldsymbol{\nu}}\right)\right) \\
&= \boldsymbol{\alpha}_{M}E_{\hat{\boldsymbol{\nu}}}\left(\hat{\boldsymbol{\nu}}\right) \\
&= \boldsymbol{\alpha}_{M}\boldsymbol{\nu}.
\end{align*}
$$

Below we derive its covariance. The naive covariance estimate is $\hat{\boldsymbol{\Sigma}} = \sum_{k=1}^{K}\hat{\nu}_{k}^{2}n_{k}^{-1}\hat{\boldsymbol{\Sigma}}_{k}$.

Alternatively, we can use Equation 15 of Xie and Aurisset (2016) to define

$$
\hat{\boldsymbol{\Sigma}} = n^{-1}\sum_{k=1}^{K}\left(\boldsymbol{\nu}[k] + \frac{1-\boldsymbol{\nu}[k]}{n}\right)\boldsymbol{\Sigma}_{k}.
$$

Both approaches assume the population cell proportions $\nu_{k}$ are known.

For GrowthBook experiments, the $\hat{\nu}_{k}$ are random variables, so this assumption is not met: there is dependence between the $n_{k}$ (equivalently, between the $\hat{\nu}_{k}$) that the estimators above do not account for. We show in the appendix section Derivation of conditional covariance that

$$
\text{Cov}\left(\hat{\boldsymbol{\alpha}}_{M}\hat{\boldsymbol{\nu}}\right) = \boldsymbol{\alpha}_{M}\,\text{Cov}\left(\hat{\boldsymbol{\nu}}\right)\boldsymbol{\alpha}_{M}^{\top} + n^{-1}\sum_{k=1}^{K}\boldsymbol{\nu}[k]\,\boldsymbol{\Sigma}_{k}.
$$

Note that $\hat{\boldsymbol{\nu}}$ is a multinomial random variable divided by $n$, so its $K \times K$ covariance matrix has $k^{\text{th}}$ diagonal element equal to $\nu_{k}\left(1 - \nu_{k}\right)/n$ and $(i, j)^{\text{th}}$ off-diagonal element equal to $-\nu_{i}\nu_{j}/n$.
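Here is a minimal sketch of this combination step, assuming hypothetical per-cell estimates, per-observation covariances, and cell counts (all inputs below are made up for illustration).

```python
import numpy as np

rng = np.random.default_rng(1)
K = 3
n_k = np.array([400, 350, 250])                      # hypothetical cell counts
alpha_hats = [rng.normal(size=4) for _ in range(K)]  # hypothetical alpha_hat_k
# hypothetical Sigma_k on the per-observation scale,
# so that Cov(alpha_hat_k | n_k) = Sigma_k / n_k
Sigma_ks = [np.eye(4) * 0.1 for _ in range(K)]

n = n_k.sum()
nu_hat = n_k / n                                     # sample cell proportions

alpha_M_hat = np.column_stack(alpha_hats)            # 4 x K, k-th column is alpha_hat_k
alpha_hat = alpha_M_hat @ nu_hat                     # combined point estimate

# Multinomial covariance of nu_hat: nu_k(1 - nu_k)/n on the diagonal,
# -nu_i nu_j / n off the diagonal (plugging in nu_hat for nu)
cov_nu = (np.diag(nu_hat) - np.outer(nu_hat, nu_hat)) / n

# Cov(alpha_M nu_hat) = alpha_M Cov(nu_hat) alpha_M^T + n^{-1} sum_k nu_k Sigma_k
Sigma = alpha_M_hat @ cov_nu @ alpha_M_hat.T \
    + sum(nu * S for nu, S in zip(nu_hat, Sigma_ks)) / n
```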

Delta method

To recapitulate, we now have an estimate of the joint sampling distribution of the vector $\boldsymbol{\alpha}$, which has four elements:

  • numerator control mean
  • numerator absolute effect estimate
  • denominator control mean
  • denominator absolute effect estimate

To estimate lift (relative effects), we use the delta method.

Delta method for ratio metrics

By the central limit theorem, approximately

$$
\hat{\boldsymbol{\alpha}} = \begin{pmatrix}
\hat{\boldsymbol{\alpha}}_{1}\\ \hat{\boldsymbol{\alpha}}_{2}\\ \hat{\boldsymbol{\alpha}}_{3}\\ \hat{\boldsymbol{\alpha}}_{4}
\end{pmatrix}
\sim \mathcal{N}\left(\boldsymbol{\alpha} = \begin{pmatrix}
\boldsymbol{\alpha}_{1}\\ \boldsymbol{\alpha}_{2}\\ \boldsymbol{\alpha}_{3}\\ \boldsymbol{\alpha}_{4}
\end{pmatrix}, \boldsymbol{\Sigma}\right).
$$

Define $g_{abs}(\boldsymbol{\alpha}) = \frac{\boldsymbol{\alpha}[1] + \boldsymbol{\alpha}[2]}{\boldsymbol{\alpha}[3] + \boldsymbol{\alpha}[4]} - \frac{\boldsymbol{\alpha}[1]}{\boldsymbol{\alpha}[3]}$, the difference between the treatment and control ratios. Define

$$
\begin{align*}
g_{rel}(\boldsymbol{\alpha})
&= \frac{\frac{\boldsymbol{\alpha}[1] + \boldsymbol{\alpha}[2]}{\boldsymbol{\alpha}[3] + \boldsymbol{\alpha}[4]} - \frac{\boldsymbol{\alpha}[1]}{\boldsymbol{\alpha}[3]}}{\boldsymbol{\alpha}[1]/\boldsymbol{\alpha}[3]} \\
&= \frac{\frac{\boldsymbol{\alpha}[1] + \boldsymbol{\alpha}[2]}{\boldsymbol{\alpha}[3] + \boldsymbol{\alpha}[4]}}{\boldsymbol{\alpha}[1]/\boldsymbol{\alpha}[3]} - 1 \\
&= \frac{\boldsymbol{\alpha}[3]\left(\boldsymbol{\alpha}[1] + \boldsymbol{\alpha}[2]\right)}{\boldsymbol{\alpha}[1]\left(\boldsymbol{\alpha}[3] + \boldsymbol{\alpha}[4]\right)} - 1 \\
&= \frac{g_{rel, N}}{g_{rel, D}} - 1,
\end{align*}
$$

where $g_{rel, N} = \boldsymbol{\alpha}[3]\left(\boldsymbol{\alpha}[1] + \boldsymbol{\alpha}[2]\right)$ and $g_{rel, D} = \boldsymbol{\alpha}[1]\left(\boldsymbol{\alpha}[3] + \boldsymbol{\alpha}[4]\right)$.

Define $g \in \left\{g_{abs}, g_{rel}\right\}$ and the length-$4$ vector of partials $\boldsymbol{\nabla} = \frac{\partial g}{\partial \boldsymbol{\alpha}}$. If $g = g_{abs}$, then set $\boldsymbol{\nabla}$ equal to $\boldsymbol{\nabla}_{abs}$, where

$$
\begin{align*}
\boldsymbol{\nabla}_{abs}[1] &= \frac{1}{\boldsymbol{\alpha}[3] + \boldsymbol{\alpha}[4]} - \frac{1}{\boldsymbol{\alpha}[3]}\\
\boldsymbol{\nabla}_{abs}[2] &= \frac{1}{\boldsymbol{\alpha}[3] + \boldsymbol{\alpha}[4]}\\
\boldsymbol{\nabla}_{abs}[3] &= -\frac{\boldsymbol{\alpha}[1] + \boldsymbol{\alpha}[2]}{\left(\boldsymbol{\alpha}[3] + \boldsymbol{\alpha}[4]\right)^{2}} + \frac{\boldsymbol{\alpha}[1]}{\boldsymbol{\alpha}[3]^{2}}\\
\boldsymbol{\nabla}_{abs}[4] &= -\frac{\boldsymbol{\alpha}[1] + \boldsymbol{\alpha}[2]}{\left(\boldsymbol{\alpha}[3] + \boldsymbol{\alpha}[4]\right)^{2}}.
\end{align*}
$$

If $g = g_{rel}$, then set $\boldsymbol{\nabla}$ equal to $\boldsymbol{\nabla}_{rel}$, where

$$
\begin{align*}
\boldsymbol{\nabla}_{rel}[1] &= \frac{\boldsymbol{\alpha}[3]\,g_{rel, D} - \left(\boldsymbol{\alpha}[3] + \boldsymbol{\alpha}[4]\right)g_{rel, N}}{g_{rel, D}^{2}}\\
\boldsymbol{\nabla}_{rel}[2] &= \frac{\boldsymbol{\alpha}[3]}{g_{rel, D}}\\
\boldsymbol{\nabla}_{rel}[3] &= \frac{\left(\boldsymbol{\alpha}[1] + \boldsymbol{\alpha}[2]\right)g_{rel, D} - \boldsymbol{\alpha}[1]\,g_{rel, N}}{g_{rel, D}^{2}}\\
\boldsymbol{\nabla}_{rel}[4] &= -\frac{\boldsymbol{\alpha}[3]\left(\boldsymbol{\alpha}[1] + \boldsymbol{\alpha}[2]\right)}{\boldsymbol{\alpha}[1]\left(\boldsymbol{\alpha}[3] + \boldsymbol{\alpha}[4]\right)^{2}}.
\end{align*}
$$

By the delta method, approximately

$$
\hat{\Delta}_{r} = g(\hat{\boldsymbol{\alpha}}) \sim \mathcal{N}\left(\Delta_{r} = g\left(\boldsymbol{\alpha}\right),\; \boldsymbol{\nabla}^{\top}\boldsymbol{\Sigma}\boldsymbol{\nabla}\right).
$$

In summary, the steps of the algorithm are:

  1. Compute the point estimate $\hat{\Delta} = g(\hat{\boldsymbol{\alpha}})$.
  2. Compute the estimated variance $\hat{v} = \boldsymbol{\nabla}^{\top}\hat{\boldsymbol{\Sigma}}\boldsymbol{\nabla}$.
  3. Return $(\hat{\Delta}, \hat{v})$.
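The ratio-metric algorithm can be written compactly. The sketch below is illustrative, with `alpha` ordered as (numerator control mean, numerator effect, denominator control mean, denominator effect); it implements both gradients exactly as derived above.

```python
import numpy as np

def delta_method_ratio(alpha, Sigma, relative=False):
    """Point estimate and delta-method variance for a ratio metric."""
    a1, a2, a3, a4 = alpha
    if not relative:
        # g_abs = (a1 + a2) / (a3 + a4) - a1 / a3
        point = (a1 + a2) / (a3 + a4) - a1 / a3
        grad = np.array([
            1 / (a3 + a4) - 1 / a3,
            1 / (a3 + a4),
            -(a1 + a2) / (a3 + a4) ** 2 + a1 / a3 ** 2,
            -(a1 + a2) / (a3 + a4) ** 2,
        ])
    else:
        # g_rel = g_N / g_D - 1, with g_N = a3 (a1 + a2), g_D = a1 (a3 + a4)
        g_n = a3 * (a1 + a2)
        g_d = a1 * (a3 + a4)
        point = g_n / g_d - 1
        grad = np.array([
            (a3 * g_d - (a3 + a4) * g_n) / g_d ** 2,
            a3 / g_d,
            ((a1 + a2) * g_d - a1 * g_n) / g_d ** 2,
            -a3 * (a1 + a2) / (a1 * (a3 + a4) ** 2),
        ])
    return point, grad @ Sigma @ grad

# Example usage with the combined estimates from the previous sketch:
# point, var = delta_method_ratio(alpha_hat, Sigma, relative=True)
```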

Delta method for count metrics

Define $\hat{\boldsymbol{\alpha}}$ as the $2 \times 1$ vector containing the control mean and the absolute effect estimate. Define $\hat{\boldsymbol{\Sigma}}$ as the $2 \times 2$ covariance of $\hat{\boldsymbol{\alpha}}$. By the central limit theorem, approximately

$$
\hat{\boldsymbol{\alpha}} = \begin{pmatrix}
\hat{\boldsymbol{\alpha}}_{1}\\ \hat{\boldsymbol{\alpha}}_{2}
\end{pmatrix}
\sim \mathcal{N}\left(\boldsymbol{\alpha} = \begin{pmatrix}
\boldsymbol{\alpha}_{1}\\ \boldsymbol{\alpha}_{2}
\end{pmatrix}, \boldsymbol{\Sigma}\right).
$$

Define $g_{abs}(\boldsymbol{\alpha}) = \boldsymbol{\alpha}[2]$ and $g_{rel}(\boldsymbol{\alpha}) = \frac{\boldsymbol{\alpha}[2]}{\boldsymbol{\alpha}[1]}$.

Define $g \in \left\{g_{abs}, g_{rel}\right\}$ and the length-$2$ vector of partials $\boldsymbol{\nabla} = \frac{\partial g}{\partial \boldsymbol{\alpha}}$.

If $g = g_{abs}$, then set $\boldsymbol{\nabla}$ equal to $\boldsymbol{\nabla}_{abs}$, where

$$
\begin{align*}
\boldsymbol{\nabla}_{abs}[1] &= 0\\
\boldsymbol{\nabla}_{abs}[2] &= 1.
\end{align*}
$$

If $g = g_{rel}$, then set $\boldsymbol{\nabla}$ equal to $\boldsymbol{\nabla}_{rel}$, where

$$
\begin{align*}
\boldsymbol{\nabla}_{rel}[1] &= -\frac{\boldsymbol{\alpha}[2]}{\boldsymbol{\alpha}[1]^{2}}\\
\boldsymbol{\nabla}_{rel}[2] &= \frac{1}{\boldsymbol{\alpha}[1]}.
\end{align*}
$$

By the delta method, approximately

$$
\hat{\Delta} = g(\hat{\boldsymbol{\alpha}}) \sim \mathcal{N}\left(\Delta = g\left(\boldsymbol{\alpha}\right),\; \boldsymbol{\nabla}^{\top}\boldsymbol{\Sigma}\boldsymbol{\nabla}\right).
$$

In summary, the steps of the algorithm are:

  1. Compute the point estimate $\hat{\Delta} = g(\hat{\boldsymbol{\alpha}})$.
  2. Compute the estimated variance $\hat{v} = \boldsymbol{\nabla}^{\top}\hat{\boldsymbol{\Sigma}}\boldsymbol{\nabla}$.
  3. Return $(\hat{\Delta}, \hat{v})$.
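The count-metric case is a two-dimensional version of the same sketch (again illustrative, with `alpha` ordered as (control mean, absolute effect)).

```python
import numpy as np

def delta_method_count(alpha, Sigma, relative=False):
    """Point estimate and delta-method variance for a count metric."""
    a1, a2 = alpha
    if not relative:
        point, grad = a2, np.array([0.0, 1.0])                    # g_abs = alpha[2]
    else:
        point, grad = a2 / a1, np.array([-a2 / a1 ** 2, 1 / a1])  # g_rel = alpha[2]/alpha[1]
    return point, grad @ Sigma @ grad
```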

Appendix

Derivation of conditional covariance

In this section we derive the covariance of $\hat{\boldsymbol{\alpha}}_{M}\hat{\boldsymbol{\nu}}$.

We derive the covariance using results from linear models. Recall that if $\textbf{A}$ is a fixed matrix and $\textbf{Z}$ is a random vector with mean $\boldsymbol{\mu}$ and covariance $\boldsymbol{\Psi}$, then $E\left(\textbf{Z}^{\top}\textbf{A}\textbf{Z}\right) = \boldsymbol{\mu}^{\top}\textbf{A}\boldsymbol{\mu} + \text{tr}\left(\textbf{A}\boldsymbol{\Psi}\right)$. Define $\boldsymbol{\alpha}_{l}$ as the $l^{\text{th}}$ row of $\boldsymbol{\alpha}_{M}$, and define $\hat{\boldsymbol{\alpha}}_{l}$ analogously as the $l^{\text{th}}$ row of $\hat{\boldsymbol{\alpha}}_{M}$.

Below we first compute the conditional covariance $\text{Cov}\left(\hat{\boldsymbol{\alpha}}_{l}, \hat{\boldsymbol{\alpha}}_{m} \,\middle|\, \hat{\boldsymbol{\nu}}\right)$, for which we need the conditional second moment:

$$
E\left(\hat{\boldsymbol{\alpha}}_{l}\hat{\boldsymbol{\alpha}}_{m}^{\top} \,\middle|\, \hat{\boldsymbol{\nu}}\right)
= \begin{bmatrix}
E\left(\hat{\boldsymbol{\alpha}}_{l}[1]\hat{\boldsymbol{\alpha}}_{m}[1] \,\middle|\, \hat{\boldsymbol{\nu}}\right) & \cdots & E\left(\hat{\boldsymbol{\alpha}}_{l}[1]\hat{\boldsymbol{\alpha}}_{m}[K] \,\middle|\, \hat{\boldsymbol{\nu}}\right) \\
\vdots & \ddots & \vdots \\
E\left(\hat{\boldsymbol{\alpha}}_{l}[K]\hat{\boldsymbol{\alpha}}_{m}[1] \,\middle|\, \hat{\boldsymbol{\nu}}\right) & \cdots & E\left(\hat{\boldsymbol{\alpha}}_{l}[K]\hat{\boldsymbol{\alpha}}_{m}[K] \,\middle|\, \hat{\boldsymbol{\nu}}\right)
\end{bmatrix}.
$$

Conditional on $\hat{\boldsymbol{\nu}}$, estimates from different cells are independent, so every off-diagonal entry factors as $E\left(\hat{\boldsymbol{\alpha}}_{l}[i] \,\middle|\, \hat{\boldsymbol{\nu}}\right)E\left(\hat{\boldsymbol{\alpha}}_{m}[j] \,\middle|\, \hat{\boldsymbol{\nu}}\right)$ for $i \neq j$; only the diagonal entries retain joint expectations.

Therefore,

$$
\begin{align*}
\text{Cov}\left(\hat{\boldsymbol{\alpha}}_{l}, \hat{\boldsymbol{\alpha}}_{m} \,\middle|\, \hat{\boldsymbol{\nu}}\right)
&= E\left(\hat{\boldsymbol{\alpha}}_{l}\hat{\boldsymbol{\alpha}}_{m}^{\top} \,\middle|\, \hat{\boldsymbol{\nu}}\right) - E\left(\hat{\boldsymbol{\alpha}}_{l} \,\middle|\, \hat{\boldsymbol{\nu}}\right)E\left(\hat{\boldsymbol{\alpha}}_{m}^{\top} \,\middle|\, \hat{\boldsymbol{\nu}}\right) \\
&= \begin{bmatrix}
\text{Cov}\left(\hat{\boldsymbol{\alpha}}_{l}[1], \hat{\boldsymbol{\alpha}}_{m}[1] \,\middle|\, \hat{\boldsymbol{\nu}}\right) & \cdots & 0 \\
\vdots & \ddots & \vdots \\
0 & \cdots & \text{Cov}\left(\hat{\boldsymbol{\alpha}}_{l}[K], \hat{\boldsymbol{\alpha}}_{m}[K] \,\middle|\, \hat{\boldsymbol{\nu}}\right)
\end{bmatrix} \\
&= n^{-1}\begin{bmatrix}
\boldsymbol{\Sigma}_{1}[l, m]/\hat{\boldsymbol{\nu}}[1] & \cdots & 0 \\
\vdots & \ddots & \vdots \\
0 & \cdots & \boldsymbol{\Sigma}_{K}[l, m]/\hat{\boldsymbol{\nu}}[K]
\end{bmatrix}.
\end{align*}
$$

To get the $(l, m)^{\text{th}}$ element of the covariance, we first calculate the $(l, m)^{\text{th}}$ element of the second moment, $E\left(\hat{\boldsymbol{\alpha}}_{l}\hat{\boldsymbol{\nu}}\hat{\boldsymbol{\nu}}^{\top}\hat{\boldsymbol{\alpha}}_{m}^{\top}\right)$:

$$
\begin{align*}
E\left(\hat{\boldsymbol{\alpha}}_{l}\hat{\boldsymbol{\nu}}\hat{\boldsymbol{\nu}}^{\top}\hat{\boldsymbol{\alpha}}_{m}^{\top}\right)
&= E_{\hat{\boldsymbol{\nu}}}\left(E_{\hat{\boldsymbol{\alpha}}}\left(\hat{\boldsymbol{\alpha}}_{l}\hat{\boldsymbol{\nu}}\hat{\boldsymbol{\nu}}^{\top}\hat{\boldsymbol{\alpha}}_{m}^{\top} \,\middle|\, \hat{\boldsymbol{\nu}}\right)\right) \\
&= E_{\hat{\boldsymbol{\nu}}}\left(\boldsymbol{\alpha}_{l}\hat{\boldsymbol{\nu}}\hat{\boldsymbol{\nu}}^{\top}\boldsymbol{\alpha}_{m}^{\top}\right) + E_{\hat{\boldsymbol{\nu}}}\left(\text{tr}\left(\hat{\boldsymbol{\nu}}\hat{\boldsymbol{\nu}}^{\top}\,\text{Cov}\left(\hat{\boldsymbol{\alpha}}_{l}, \hat{\boldsymbol{\alpha}}_{m} \,\middle|\, \hat{\boldsymbol{\nu}}\right)\right)\right) \\
&= E_{\hat{\boldsymbol{\nu}}}\left(\boldsymbol{\alpha}_{l}\hat{\boldsymbol{\nu}}\hat{\boldsymbol{\nu}}^{\top}\boldsymbol{\alpha}_{m}^{\top}\right) + E_{\hat{\boldsymbol{\nu}}}\left(n^{-1}\sum_{k=1}^{K}\hat{\boldsymbol{\nu}}[k]\,\boldsymbol{\Sigma}_{k}[l, m]\right) \\
&= \boldsymbol{\alpha}_{l}\left(\text{Cov}\left(\hat{\boldsymbol{\nu}}\right) + \boldsymbol{\nu}\boldsymbol{\nu}^{\top}\right)\boldsymbol{\alpha}_{m}^{\top} + n^{-1}\sum_{k=1}^{K}\boldsymbol{\nu}[k]\,\boldsymbol{\Sigma}_{k}[l, m].
\end{align*}
$$

The second equality applies the quadratic-form identity above conditionally, with $\textbf{A} = \hat{\boldsymbol{\nu}}\hat{\boldsymbol{\nu}}^{\top}$, extended to the pair $\hat{\boldsymbol{\alpha}}_{l}, \hat{\boldsymbol{\alpha}}_{m}$. The third uses the fact that for a diagonal matrix $\textbf{D}$, $\text{tr}\left(\hat{\boldsymbol{\nu}}\hat{\boldsymbol{\nu}}^{\top}\textbf{D}\right) = \sum_{k=1}^{K}\hat{\boldsymbol{\nu}}[k]^{2}\,\textbf{D}[k, k]$, here with $\textbf{D}[k, k] = n^{-1}\boldsymbol{\Sigma}_{k}[l, m]/\hat{\boldsymbol{\nu}}[k]$. The fourth uses $E\left(\hat{\boldsymbol{\nu}}\hat{\boldsymbol{\nu}}^{\top}\right) = \text{Cov}\left(\hat{\boldsymbol{\nu}}\right) + \boldsymbol{\nu}\boldsymbol{\nu}^{\top}$.
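The trace identity is easy to verify numerically; here is a quick illustrative check.

```python
import numpy as np

# Check: tr(v v' D) = sum_k v[k]^2 D[k, k] for diagonal D
rng = np.random.default_rng(2)
v = rng.normal(size=5)
D = np.diag(rng.normal(size=5))
assert np.isclose(np.trace(np.outer(v, v) @ D), np.sum(v ** 2 * np.diag(D)))
```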

Subtracting the product of first moments, $E\left(\hat{\boldsymbol{\alpha}}_{l}\hat{\boldsymbol{\nu}}\right)E\left(\hat{\boldsymbol{\alpha}}_{m}\hat{\boldsymbol{\nu}}\right) = \boldsymbol{\alpha}_{l}\boldsymbol{\nu}\boldsymbol{\nu}^{\top}\boldsymbol{\alpha}_{m}^{\top}$, the $(l, m)^{\text{th}}$ element of $\text{Cov}\left(\hat{\boldsymbol{\alpha}}_{M}\hat{\boldsymbol{\nu}}\right)$ is

$$
\text{Cov}\left(\hat{\boldsymbol{\alpha}}_{M}\hat{\boldsymbol{\nu}}\right)[l, m] = \boldsymbol{\alpha}_{l}\,\text{Cov}\left(\hat{\boldsymbol{\nu}}\right)\boldsymbol{\alpha}_{m}^{\top} + n^{-1}\sum_{k=1}^{K}\boldsymbol{\nu}[k]\,\boldsymbol{\Sigma}_{k}[l, m].
$$

Therefore, in matrix form, the covariance of $\hat{\boldsymbol{\alpha}}_{M}\hat{\boldsymbol{\nu}}$ is

$$
\text{Cov}\left(\hat{\boldsymbol{\alpha}}_{M}\hat{\boldsymbol{\nu}}\right) = \boldsymbol{\alpha}_{M}\,\text{Cov}\left(\hat{\boldsymbol{\nu}}\right)\boldsymbol{\alpha}_{M}^{\top} + n^{-1}\sum_{k=1}^{K}\boldsymbol{\nu}[k]\,\boldsymbol{\Sigma}_{k}.
$$
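As a sanity check on this formula, one can simulate the two-stage sampling directly. The sketch below uses hypothetical values of $\boldsymbol{\alpha}_{M}$, $\boldsymbol{\Sigma}_{k}$ (on the per-observation scale, so that the conditional covariance of a cell estimate is $\boldsymbol{\Sigma}_{k}/n_{k}$), and $\boldsymbol{\nu}$; all inputs are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
K, n, reps = 3, 10_000, 5_000
nu = np.array([0.5, 0.3, 0.2])                       # hypothetical population proportions
alpha_M = rng.normal(size=(4, K))                    # hypothetical cell-level parameters
Sigma_k = [np.eye(4) * (k + 1) for k in range(K)]    # hypothetical per-observation covariances

draws = np.empty((reps, 4))
for r in range(reps):
    n_k = rng.multinomial(n, nu)                     # random cell counts
    nu_hat = n_k / n
    # cell estimates: alpha_k plus noise with conditional covariance Sigma_k / n_k
    est = np.column_stack([
        rng.multivariate_normal(alpha_M[:, k], Sigma_k[k] / n_k[k])
        for k in range(K)
    ])
    draws[r] = est @ nu_hat                          # combined estimate for this replicate

cov_nu = (np.diag(nu) - np.outer(nu, nu)) / n
theory = alpha_M @ cov_nu @ alpha_M.T \
    + sum(v * S for v, S in zip(nu, Sigma_k)) / n

# Should be small (Monte Carlo error only) if the formula holds
print(np.abs(np.cov(draws, rowvar=False) - theory).max())
```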