
Technical CUPED details

Here we document the technical details behind GrowthBook CUPED variance estimates.

We use the notation below. We describe our approach in terms of revenue, but any binomial or count metric can be substituted.

  1. Define $Y_{C}$ ($Y_{T}$) as the observed post-exposure revenue for a user exposed to control (treatment).
  2. Define $X_{C}$ ($X_{T}$) as the observed pre-exposure revenue for a user exposed to control (treatment).
  3. Define $Y$ ($X$) as the post-exposure (pre-exposure) revenue for all users collectively in the experiment.
  4. Define $\bar{Y}_{C}$ ($\bar{Y}_{T}$) as the sample average post-exposure revenue, and $\bar{X}_{C}$ ($\bar{X}_{T}$) as the sample average pre-exposure revenue, for users exposed to control (treatment).
  5. Define $\mu_{C}$ ($\mu_{T}$) as the population average post-exposure revenue for users exposed to control (treatment).
  6. Define $N_{C}$ ($N_{T}$) as the number of users exposed to control (treatment).

Absolute case

For absolute inference, our target parameter is

$$\Delta_{A} = \mu_{T} - \mu_{C}.$$

As described in Equation 4 of (Deng et al. 2013), we estimate the optimal $\theta$ using user data pooled across control and treatment: $\theta = \text{cov}(Y, X) / \text{var}(X)$. Our estimate of $\Delta_{A}$ is the difference in adjusted means:

$$\hat{\Delta}_{A} = \left(\bar{Y}_{T} - \theta\bar{X}_{T}\right) - \left(\bar{Y}_{C} - \theta\bar{X}_{C}\right).$$
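A minimal NumPy sketch of the absolute estimator, assuming user-level arrays of post-exposure revenue (`y_t`, `y_c`) and pre-exposure revenue (`x_t`, `x_c`). The function names and simulated data are hypothetical illustrations, not GrowthBook's implementation; sample (co)variances with `ddof=1` stand in for the population quantities, and $\theta$ is pooled across both arms as described above.

```python
import numpy as np


def cuped_theta(y: np.ndarray, x: np.ndarray) -> float:
    """Pooled theta = cov(Y, X) / var(X), estimated on users from both arms."""
    return float(np.cov(y, x, ddof=1)[0, 1] / np.var(x, ddof=1))


def cuped_absolute_effect(y_t, x_t, y_c, x_c):
    """Difference in CUPED-adjusted means; returns (delta_hat_A, theta)."""
    theta = cuped_theta(np.concatenate([y_t, y_c]), np.concatenate([x_t, x_c]))
    adj_t = y_t.mean() - theta * x_t.mean()  # treatment adjusted mean
    adj_c = y_c.mean() - theta * x_c.mean()  # control adjusted mean
    return float(adj_t - adj_c), theta


# Hypothetical simulated data: true slope 0.8, true absolute effect 0.5
rng = np.random.default_rng(0)
n = 5_000
x_c = rng.gamma(2.0, 5.0, size=n)
x_t = rng.gamma(2.0, 5.0, size=n)
y_c = 0.8 * x_c + rng.normal(0.0, 2.0, size=n)
y_t = 0.8 * x_t + 0.5 + rng.normal(0.0, 2.0, size=n)

delta_a, theta = cuped_absolute_effect(y_t, x_t, y_c, x_c)
print(f"theta = {theta:.3f}, delta_A = {delta_a:.3f}")
```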

Under a superpopulation framework, and because users are assigned to arms independently at random, the adjusted means $\left(\bar{Y}_{T} - \theta\bar{X}_{T}\right)$ and $\left(\bar{Y}_{C} - \theta\bar{X}_{C}\right)$ are statistically independent.
Therefore, the variance of the difference in adjusted means is the sum of the variances of the adjusted means.
Define the control (treatment) population variance of post-exposure revenue as $\sigma^{2}_{YC}$ ($\sigma^{2}_{YT}$), the population variance of pre-exposure revenue as $\sigma^{2}_{XC}$ ($\sigma^{2}_{XT}$), and the population covariance between post-exposure and pre-exposure revenue as $\sigma_{XY,C}$ ($\sigma_{XY,T}$). We denote the variances of the control and treatment adjusted means as $V_{adj, C}$ and $V_{adj, T}$, respectively; treating $\theta$ as a fixed constant, they are

$$
\begin{align}
V_{adj, C} &= \frac{\sigma^{2}_{YC} + \theta^{2}\sigma^{2}_{XC} - 2\theta\sigma_{XY,C}}{N_{C}}\\
V_{adj, T} &= \frac{\sigma^{2}_{YT} + \theta^{2}\sigma^{2}_{XT} - 2\theta\sigma_{XY,T}}{N_{T}}.
\end{align}
$$

Our estimated variance of $\hat{\Delta}_{A}$ is $\hat{\sigma}^{2}_{\Delta_{A}} = V_{adj, C} + V_{adj, T}$, with the population variances and covariances replaced by their sample estimates.
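The variance can be sketched the same way. This is an illustration under the same assumptions (hypothetical names, simulated data, `ddof=1` sample estimates, pooled $\theta$ treated as a fixed constant); it also compares the CUPED variance with the unadjusted variance of the difference in means.

```python
import numpy as np


def adjusted_mean_variance(y: np.ndarray, x: np.ndarray, theta: float) -> float:
    """V_adj = (var(Y) + theta^2 var(X) - 2 theta cov(X, Y)) / N for one arm."""
    s = np.cov(y, x, ddof=1)  # [[var(Y), cov(Y, X)], [cov(Y, X), var(X)]]
    return float((s[0, 0] + theta**2 * s[1, 1] - 2.0 * theta * s[0, 1]) / len(y))


def absolute_cuped_variance(y_t, x_t, y_c, x_c) -> float:
    """sigma_hat^2_{Delta_A} = V_adj,C + V_adj,T with a pooled theta (treated as fixed)."""
    y_all, x_all = np.concatenate([y_t, y_c]), np.concatenate([x_t, x_c])
    theta = float(np.cov(y_all, x_all, ddof=1)[0, 1] / np.var(x_all, ddof=1))
    return adjusted_mean_variance(y_c, x_c, theta) + adjusted_mean_variance(y_t, x_t, theta)


# Hypothetical simulated data with strongly correlated pre/post revenue
rng = np.random.default_rng(1)
n = 5_000
x_c, x_t = rng.gamma(2.0, 5.0, size=n), rng.gamma(2.0, 5.0, size=n)
y_c = 0.8 * x_c + rng.normal(0.0, 2.0, size=n)
y_t = 0.8 * x_t + 0.5 + rng.normal(0.0, 2.0, size=n)

var_cuped = absolute_cuped_variance(y_t, x_t, y_c, x_c)
var_plain = np.var(y_t, ddof=1) / n + np.var(y_c, ddof=1) / n  # no adjustment
print(f"CUPED variance {var_cuped:.5f} vs unadjusted {var_plain:.5f}")
```

Because the numerator of $V_{adj}$ is the variance of the per-user quantity $Y - \theta X$, each arm's term can equivalently be computed as `np.var(y - theta * x, ddof=1) / len(y)`.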

Relative case

For relative inference (i.e., estimating lift), the parameter of interest is

$$\Delta_{R} = \frac{\mu_{T} - \mu_{C}}{\mu_{C}}.$$

Our estimate of $\Delta_{R}$ is the difference in adjusted means divided by the control sample mean:

$$\hat{\Delta}_{R} = \frac{\left(\bar{Y}_{T} - \theta\bar{X}_{T}\right) - \left(\bar{Y}_{C} - \theta\bar{X}_{C}\right)}{\bar{Y}_{C}}.$$
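With a fixed $\theta$, the relative estimator is a one-line change from the absolute one. A small NumPy sketch with hypothetical inputs chosen so the arithmetic is easy to follow; note that the denominator is the unadjusted control mean $\bar{Y}_{C}$, as in the formula, so with $\theta = 0$ it reduces to the plain lift $\bar{Y}_{T}/\bar{Y}_{C} - 1$.

```python
import numpy as np


def relative_cuped_effect(y_t, x_t, y_c, x_c, theta: float) -> float:
    """Difference in CUPED-adjusted means divided by the unadjusted control mean."""
    adj_diff = (y_t.mean() - theta * x_t.mean()) - (y_c.mean() - theta * x_c.mean())
    return float(adj_diff / y_c.mean())


# Tiny hypothetical example
y_t, x_t = np.array([12.0, 8.0, 10.0]), np.array([9.0, 7.0, 8.0])  # means 10, 8
y_c, x_c = np.array([9.0, 7.0, 8.0]), np.array([8.0, 6.0, 7.0])    # means 8, 7

print(relative_cuped_effect(y_t, x_t, y_c, x_c, theta=0.5))  # (6 - 4.5) / 8 = 0.1875
```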

To derive $\hat{\sigma}^{2}_{\Delta_{R}}$, the estimated variance of $\hat{\Delta}_{R}$, we use the delta method.

  1. Define the control (treatment) population post-exposure variance as $\sigma^{2}_{Y,C}$ ($\sigma^{2}_{Y,T}$), written $\sigma^{2}_{YC}$ ($\sigma^{2}_{YT}$) in the absolute case.
  2. Define the control (treatment) population pre-exposure variance as $\sigma^{2}_{X,C}$ ($\sigma^{2}_{X,T}$), written $\sigma^{2}_{XC}$ ($\sigma^{2}_{XT}$) in the absolute case.
  3. Define the covariance of the sample control means $\boldsymbol{\Lambda}_{C} = \text{Cov}\left[\bar{Y}_{C}, \bar{X}_{C}\right] = \begin{pmatrix} \sigma^{2}_{Y,C} & \sigma_{XY,C}\\ \sigma_{XY,C} & \sigma^{2}_{X,C} \end{pmatrix} / N_{C}$.
  4. Define the covariance of the sample treatment means $\boldsymbol{\Lambda}_{T} = \text{Cov}\left[\bar{Y}_{T}, \bar{X}_{T}\right] = \begin{pmatrix} \sigma^{2}_{Y,T} & \sigma_{XY,T}\\ \sigma_{XY,T} & \sigma^{2}_{X,T} \end{pmatrix} / N_{T}$.
  5. Define the vector of population means $\boldsymbol{\beta}_{0} = \left[\mu_{YT}, \mu_{XT}, \mu_{YC}, \mu_{XC}\right]$, where $\mu_{YT} = \mu_{T}$ and $\mu_{YC} = \mu_{C}$ are the post-exposure means and $\mu_{XT}$ ($\mu_{XC}$) is the population average pre-exposure revenue for treatment (control).
  6. Define their sample counterparts as $\hat{\boldsymbol{\beta}} = \left[\bar{Y}_{T}, \bar{X}_{T}, \bar{Y}_{C}, \bar{X}_{C}\right]$.
  7. Define $\boldsymbol{\Lambda} = \text{Cov}\left(\hat{\boldsymbol{\beta}}\right) = \begin{pmatrix} \boldsymbol{\Lambda}_{T} & \mathbf{0}\\ \mathbf{0} & \boldsymbol{\Lambda}_{C} \end{pmatrix}$, where $\mathbf{0}$ is a $2 \times 2$ matrix of zeros.
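The objects in the list above map directly onto arrays. A hypothetical NumPy helper that assembles $\hat{\boldsymbol{\beta}}$ and the block-diagonal $\boldsymbol{\Lambda}$ from user-level data, using sample covariances with `ddof=1` in place of the population ones (an assumption of this sketch):

```python
import numpy as np


def beta_hat_and_lambda(y_t, x_t, y_c, x_c):
    """Return beta_hat = [Ybar_T, Xbar_T, Ybar_C, Xbar_C] and the 4x4 block-diagonal Lambda."""
    beta_hat = np.array([y_t.mean(), x_t.mean(), y_c.mean(), x_c.mean()])
    lam = np.zeros((4, 4))
    # Lambda_T and Lambda_C: per-arm (Y, X) sample covariance divided by the arm size
    lam[:2, :2] = np.cov(y_t, x_t, ddof=1) / len(y_t)
    lam[2:, 2:] = np.cov(y_c, x_c, ddof=1) / len(y_c)
    # Off-diagonal blocks stay zero because the arms are independent
    return beta_hat, lam


# Tiny hypothetical example
y_t, x_t = np.array([12.0, 8.0, 10.0]), np.array([9.0, 7.0, 8.0])
y_c, x_c = np.array([9.0, 7.0, 8.0]), np.array([8.0, 6.0, 7.0])
beta_hat, lam = beta_hat_and_lambda(y_t, x_t, y_c, x_c)
print(beta_hat)
print(lam)
```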

By the multivariate central limit theorem, $\hat{\boldsymbol{\beta}}$ is approximately multivariate normal:

$$\hat{\boldsymbol{\beta}} \stackrel{\cdot}{\sim} \mathcal{MVN}\left(\boldsymbol{\beta}_{0}, \boldsymbol{\Lambda}\right).$$

For a vector $\boldsymbol{\beta}$, define its $k^{\text{th}}$ element as $\beta[k]$.
Define the function $g(\boldsymbol{\beta}; \theta) = \dfrac{\left(\beta[1] - \theta\beta[2]\right) - \left(\beta[3] - \theta\beta[4]\right)}{\beta[3]}$.

Define the vector of partial derivatives as $\boldsymbol{\nabla}_{r} = \dfrac{\partial g(\boldsymbol{\beta}; \theta)}{\partial\boldsymbol{\beta}}$, where the individual elements are

$$
\begin{align*}
\boldsymbol{\nabla}_{r}[1] &= \frac{1}{\boldsymbol{\beta}[3]} \\
\boldsymbol{\nabla}_{r}[2] &= \frac{-\theta}{\boldsymbol{\beta}[3]} \\
\boldsymbol{\nabla}_{r}[3] &= \frac{-\boldsymbol{\beta}[3] - \left(\boldsymbol{\beta}[1] - \theta\boldsymbol{\beta}[2] - \boldsymbol{\beta}[3] + \theta\boldsymbol{\beta}[4]\right)}{\boldsymbol{\beta}[3]^{2}} = \frac{-\boldsymbol{\beta}[1] + \theta\boldsymbol{\beta}[2] - \theta\boldsymbol{\beta}[4]}{\boldsymbol{\beta}[3]^{2}} \\
\boldsymbol{\nabla}_{r}[4] &= \frac{\theta}{\boldsymbol{\beta}[3]}.
\end{align*}
$$
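Because sign errors are easy to make in hand-derived gradients, one quick check is to compare the analytic partial derivatives against central finite differences. A NumPy sketch; the function names are hypothetical, and Python's 0-based indexing means `beta[0]` here is $\beta[1]$ in the text.

```python
import numpy as np


def g(beta: np.ndarray, theta: float) -> float:
    """Relative CUPED lift; beta = [Ybar_T, Xbar_T, Ybar_C, Xbar_C] (0-based here)."""
    return ((beta[0] - theta * beta[1]) - (beta[2] - theta * beta[3])) / beta[2]


def grad_g(beta: np.ndarray, theta: float) -> np.ndarray:
    """Analytic partial derivatives nabla_r[1..4] from the text, in 0-based order."""
    b1, b2, b3, b4 = beta
    return np.array([
        1.0 / b3,
        -theta / b3,
        (-b1 + theta * b2 - theta * b4) / b3**2,
        theta / b3,
    ])


def numeric_grad(beta: np.ndarray, theta: float, h: float = 1e-6) -> np.ndarray:
    """Central finite differences, used only to check the analytic gradient."""
    out = np.zeros(4)
    for k in range(4):
        step = np.zeros(4)
        step[k] = h
        out[k] = (g(beta + step, theta) - g(beta - step, theta)) / (2 * h)
    return out


beta = np.array([10.5, 9.8, 10.0, 10.1])  # hypothetical sample means
print(grad_g(beta, 0.8))
print(numeric_grad(beta, 0.8))
```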

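By the delta method, $\hat{\Delta}_{R} = g(\hat{\boldsymbol{\beta}}; \theta) \stackrel{\cdot}{\sim} \mathcal{N}\left(\Delta_{R} = g(\boldsymbol{\beta}_{0}; \theta), \boldsymbol{\nabla}_{r}^{\top}\boldsymbol{\Lambda}\boldsymbol{\nabla}_{r}\right)$, with $\boldsymbol{\nabla}_{r}$ evaluated at $\boldsymbol{\beta}_{0}$. (The $\theta$ terms in $g(\boldsymbol{\beta}_{0}; \theta)$ cancel because randomization gives $\mu_{XT} = \mu_{XC}$.)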
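The delta-method variance can be computed directly as the quadratic form, with the gradient evaluated at the sample means. A minimal NumPy sketch under the same assumptions as before (hypothetical names, simulated data, `ddof=1` sample covariances, pooled $\theta$ treated as fixed):

```python
import numpy as np


def delta_method_relative_variance(y_t, x_t, y_c, x_c, theta: float) -> float:
    """nabla_r^T Lambda nabla_r with the gradient evaluated at the sample means."""
    b1, b2, b3, b4 = y_t.mean(), x_t.mean(), y_c.mean(), x_c.mean()
    grad = np.array([1.0 / b3, -theta / b3, (-b1 + theta * b2 - theta * b4) / b3**2, theta / b3])
    lam = np.zeros((4, 4))
    lam[:2, :2] = np.cov(y_t, x_t, ddof=1) / len(y_t)  # Lambda_T
    lam[2:, 2:] = np.cov(y_c, x_c, ddof=1) / len(y_c)  # Lambda_C
    return float(grad @ lam @ grad)


# Hypothetical simulated data
rng = np.random.default_rng(2)
n = 5_000
x_c, x_t = rng.gamma(2.0, 5.0, size=n), rng.gamma(2.0, 5.0, size=n)
y_c = 0.8 * x_c + rng.normal(0.0, 2.0, size=n)
y_t = 0.8 * x_t + 0.5 + rng.normal(0.0, 2.0, size=n)

y_all, x_all = np.concatenate([y_t, y_c]), np.concatenate([x_t, x_c])
theta = float(np.cov(y_all, x_all, ddof=1)[0, 1] / np.var(x_all, ddof=1))
se_cuped = np.sqrt(delta_method_relative_variance(y_t, x_t, y_c, x_c, theta))
se_plain = np.sqrt(delta_method_relative_variance(y_t, x_t, y_c, x_c, 0.0))
print(f"SE of relative lift: CUPED {se_cuped:.4f}, unadjusted {se_plain:.4f}")
```

With $\theta = 0$ the gradient reduces to $[1/\bar{Y}_{C}, 0, -\bar{Y}_{T}/\bar{Y}_{C}^{2}, 0]$, which recovers the usual delta-method variance for a ratio of two independent means.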

Decompose $\boldsymbol{\nabla}_{r}$ into $\boldsymbol{\nabla}_{r} = \left[\boldsymbol{\nabla}_{r}[1:2], \boldsymbol{\nabla}_{r}[3:4]\right]$.

Because $\boldsymbol{\Lambda}$ is block diagonal, the final variance is

$$
\begin{align}
\boldsymbol{\nabla}_{r}^{\top}\boldsymbol{\Lambda}\boldsymbol{\nabla}_{r} &= \boldsymbol{\nabla}_{r}[1:2]^{\top} \boldsymbol{\Lambda}_{T} \boldsymbol{\nabla}_{r}[1:2] + \boldsymbol{\nabla}_{r}[3:4]^{\top} \boldsymbol{\Lambda}_{C} \boldsymbol{\nabla}_{r}[3:4] \nonumber\\
&= \left[\frac{1}{\boldsymbol{\beta}[3]}, \frac{-\theta}{\boldsymbol{\beta}[3]}\right] \begin{pmatrix} \sigma^{2}_{Y,T} & \sigma_{XY,T} \\ \sigma_{XY,T} & \sigma^{2}_{X,T} \end{pmatrix} \left[\frac{1}{\boldsymbol{\beta}[3]}, \frac{-\theta}{\boldsymbol{\beta}[3]}\right]^{\top} / N_{T} \nonumber\\
&+ \left[\frac{-\boldsymbol{\beta}[1] + \theta\boldsymbol{\beta}[2] - \theta\boldsymbol{\beta}[4]}{\boldsymbol{\beta}[3]^{2}}, \frac{\theta}{\boldsymbol{\beta}[3]}\right] \begin{pmatrix} \sigma^{2}_{Y,C} & \sigma_{XY,C} \\ \sigma_{XY,C} & \sigma^{2}_{X,C} \end{pmatrix} \left[\frac{-\boldsymbol{\beta}[1] + \theta\boldsymbol{\beta}[2] - \theta\boldsymbol{\beta}[4]}{\boldsymbol{\beta}[3]^{2}}, \frac{\theta}{\boldsymbol{\beta}[3]}\right]^{\top} / N_{C} \nonumber\\
&= \frac{1}{N_{T}\boldsymbol{\beta}[3]^{2}} \left[1, -\theta\right] \begin{pmatrix} \sigma^{2}_{Y,T} & \sigma_{XY,T} \\ \sigma_{XY,T} & \sigma^{2}_{X,T} \end{pmatrix} \left[1, -\theta\right]^{\top} \nonumber\\
&+ \frac{1}{N_{C}\boldsymbol{\beta}[3]^{2}} \left[\frac{-\boldsymbol{\beta}[1] + \theta\boldsymbol{\beta}[2] - \theta\boldsymbol{\beta}[4]}{\boldsymbol{\beta}[3]}, \theta\right] \begin{pmatrix} \sigma^{2}_{Y,C} & \sigma_{XY,C} \\ \sigma_{XY,C} & \sigma^{2}_{X,C} \end{pmatrix} \left[\frac{-\boldsymbol{\beta}[1] + \theta\boldsymbol{\beta}[2] - \theta\boldsymbol{\beta}[4]}{\boldsymbol{\beta}[3]}, \theta\right]^{\top} \nonumber\\
&= \frac{\sigma^{2}_{Y,T} + \theta^{2}\sigma^{2}_{X,T} - 2\theta\sigma_{XY,T}}{N_{T}\boldsymbol{\beta}[3]^{2}} \nonumber\\
&+ \frac{1}{N_{C}\boldsymbol{\beta}[3]^{2}} \left[\frac{-\boldsymbol{\beta}[1] + \theta\boldsymbol{\beta}[2] - \theta\boldsymbol{\beta}[4]}{\boldsymbol{\beta}[3]}, \theta\right] \begin{pmatrix} \frac{\sigma^{2}_{Y,C}\left(-\boldsymbol{\beta}[1] + \theta\boldsymbol{\beta}[2] - \theta\boldsymbol{\beta}[4]\right)}{\boldsymbol{\beta}[3]} + \theta\sigma_{XY,C} \\ \frac{\sigma_{XY,C}\left(-\boldsymbol{\beta}[1] + \theta\boldsymbol{\beta}[2] - \theta\boldsymbol{\beta}[4]\right)}{\boldsymbol{\beta}[3]} + \theta\sigma^{2}_{X,C} \end{pmatrix} \nonumber\\
&= \frac{\sigma^{2}_{Y,T} + \theta^{2}\sigma^{2}_{X,T} - 2\theta\sigma_{XY,T}}{N_{T}\boldsymbol{\beta}[3]^{2}} \nonumber\\
&+ \frac{1}{N_{C}\boldsymbol{\beta}[3]^{2}} \left(\frac{\sigma^{2}_{Y,C}\left(-\boldsymbol{\beta}[1] + \theta\boldsymbol{\beta}[2] - \theta\boldsymbol{\beta}[4]\right)^{2}}{\boldsymbol{\beta}[3]^{2}} + \frac{2\theta\sigma_{XY,C}\left(-\boldsymbol{\beta}[1] + \theta\boldsymbol{\beta}[2] - \theta\boldsymbol{\beta}[4]\right)}{\boldsymbol{\beta}[3]} + \theta^{2}\sigma^{2}_{X,C}\right) \nonumber\\
&= \frac{\sigma^{2}_{Y,T} + \theta^{2}\sigma^{2}_{X,T} - 2\theta\sigma_{XY,T}}{N_{T}\bar{Y}_{C}^{2}} \nonumber\\
&+ \frac{1}{N_{C}\bar{Y}_{C}^{2}} \left(\frac{\sigma^{2}_{Y,C}\left(-\bar{Y}_{T} + \theta\bar{X}_{T} - \theta\bar{X}_{C}\right)^{2}}{\bar{Y}_{C}^{2}} + \frac{2\theta\sigma_{XY,C}\left(-\bar{Y}_{T} + \theta\bar{X}_{T} - \theta\bar{X}_{C}\right)}{\bar{Y}_{C}} + \theta^{2}\sigma^{2}_{X,C}\right),
\end{align}
$$

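where in the last step we evaluate at the sample means, replacing each $\boldsymbol{\beta}[k]$ with the corresponding element of $\hat{\boldsymbol{\beta}}$ (for example, $\boldsymbol{\beta}[3]$ becomes $\bar{Y}_{C}$), and switch to sample-mean notation.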
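The closed form can be checked numerically against the matrix expression $\boldsymbol{\nabla}_{r}^{\top}\boldsymbol{\Lambda}\boldsymbol{\nabla}_{r}$. A NumPy sketch with hypothetical function names and simulated data (arm sizes deliberately unequal so that $N_{T}$ and $N_{C}$ are exercised separately); the two functions should agree to floating-point precision for any $\theta$, which guards against algebra slips in the hand expansion.

```python
import numpy as np


def quadratic_form_variance(y_t, x_t, y_c, x_c, theta: float) -> float:
    """Matrix form nabla_r^T Lambda nabla_r, gradient evaluated at the sample means."""
    b1, b2, b3, b4 = y_t.mean(), x_t.mean(), y_c.mean(), x_c.mean()
    grad = np.array([1.0 / b3, -theta / b3, (-b1 + theta * b2 - theta * b4) / b3**2, theta / b3])
    lam = np.zeros((4, 4))
    lam[:2, :2] = np.cov(y_t, x_t, ddof=1) / len(y_t)
    lam[2:, 2:] = np.cov(y_c, x_c, ddof=1) / len(y_c)
    return float(grad @ lam @ grad)


def closed_form_variance(y_t, x_t, y_c, x_c, theta: float) -> float:
    """Last line of the derivation, written in sample-mean notation."""
    s_t, s_c = np.cov(y_t, x_t, ddof=1), np.cov(y_c, x_c, ddof=1)
    yc = y_c.mean()
    a = -y_t.mean() + theta * x_t.mean() - theta * x_c.mean()  # -b1 + theta*b2 - theta*b4
    treat = (s_t[0, 0] + theta**2 * s_t[1, 1] - 2 * theta * s_t[0, 1]) / (len(y_t) * yc**2)
    ctrl = (s_c[0, 0] * a**2 / yc**2 + 2 * theta * s_c[0, 1] * a / yc
            + theta**2 * s_c[1, 1]) / (len(y_c) * yc**2)
    return float(treat + ctrl)


# Hypothetical simulated data with unequal arm sizes
rng = np.random.default_rng(4)
x_c, x_t = rng.gamma(2.0, 5.0, size=2_000), rng.gamma(2.0, 5.0, size=3_000)
y_c = 0.8 * x_c + rng.normal(0.0, 2.0, size=2_000)
y_t = 0.8 * x_t + 0.5 + rng.normal(0.0, 2.0, size=3_000)

for theta in (0.0, 0.3, 0.8):
    print(theta, quadratic_form_variance(y_t, x_t, y_c, x_c, theta),
          closed_form_variance(y_t, x_t, y_c, x_c, theta))
```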

For estimating uncertainty in production, we use

$$
\begin{align}
\hat{\sigma}^{2}_{\Delta_{R}} &= \frac{\sigma^{2}_{Y,T} + \theta^{2}\sigma^{2}_{X,T} - 2\theta\sigma_{XY,T}}{N_{T}\bar{Y}_{C}^{2}} \nonumber\\
&+ \frac{1}{N_{C}\bar{Y}_{C}^{2}} \left(\frac{\sigma^{2}_{Y,C}\bar{Y}_{T}^{2}}{\bar{Y}_{C}^{2}} - \frac{2\theta\sigma_{XY,C}\bar{Y}_{T}}{\bar{Y}_{C}} + \theta^{2}\sigma^{2}_{X,C}\right),
\end{align}
$$

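which drops the $\theta\bar{X}_{T} - \theta\bar{X}_{C}$ terms: randomization makes the pre-exposure population means equal ($\mu_{XT} = \mu_{XC}$), so these terms are zero when the gradient is evaluated at $\boldsymbol{\beta}_{0}$.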
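To see what the production simplification changes, a NumPy sketch can compute the closed form with and without the $\theta\bar{X}_{T} - \theta\bar{X}_{C}$ terms. The flag `assume_balanced_pre` and the other names are hypothetical, chosen for this illustration only. When the sample pre-exposure means coincide, the two forms are identical; otherwise they differ only through the chance pre-exposure imbalance between arms.

```python
import numpy as np


def relative_cuped_variance(y_t, x_t, y_c, x_c, theta, assume_balanced_pre=True):
    """Closed-form sigma_hat^2_{Delta_R}.

    assume_balanced_pre=True drops theta*Xbar_T - theta*Xbar_C (the production form);
    False keeps them (the full plug-in form). The flag name is hypothetical.
    """
    s_t, s_c = np.cov(y_t, x_t, ddof=1), np.cov(y_c, x_c, ddof=1)
    yc = y_c.mean()
    a = -y_t.mean()
    if not assume_balanced_pre:
        a += theta * x_t.mean() - theta * x_c.mean()
    treat = (s_t[0, 0] + theta**2 * s_t[1, 1] - 2 * theta * s_t[0, 1]) / (len(y_t) * yc**2)
    ctrl = (s_c[0, 0] * a**2 / yc**2 + 2 * theta * s_c[0, 1] * a / yc
            + theta**2 * s_c[1, 1]) / (len(y_c) * yc**2)
    return float(treat + ctrl)


# Hypothetical simulated data
rng = np.random.default_rng(3)
n = 5_000
x_c, x_t = rng.gamma(2.0, 5.0, size=n), rng.gamma(2.0, 5.0, size=n)
y_c = 0.8 * x_c + rng.normal(0.0, 2.0, size=n)
y_t = 0.8 * x_t + 0.5 + rng.normal(0.0, 2.0, size=n)
y_all, x_all = np.concatenate([y_t, y_c]), np.concatenate([x_t, x_c])
theta = float(np.cov(y_all, x_all, ddof=1)[0, 1] / np.var(x_all, ddof=1))

prod = relative_cuped_variance(y_t, x_t, y_c, x_c, theta)
full = relative_cuped_variance(y_t, x_t, y_c, x_c, theta, assume_balanced_pre=False)
print(f"production {prod:.3e}, full plug-in {full:.3e}")
```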