
Technical CUPED details

Here we document the technical details behind GrowthBook CUPED variance estimates. We begin with Mean and Binomial metrics (using revenue as a running example), then move to ratio metrics.

Mean or Binomial Metric

We use the notation below. We describe our approach in terms of revenue, but any Mean or Binomial metric can be substituted.

  1. Define $Y_{C}$ ($Y_{T}$) as the observed post-exposure revenue for a user exposed to control (treatment).
  2. Define $X_{C}$ ($X_{T}$) as the observed pre-exposure revenue for a user exposed to control (treatment).
  3. Define $Y$ ($X$) as the post-exposure (pre-exposure) revenue for all users collectively in the experiment.
  4. Define $\bar{Y}_{C}$ ($\bar{Y}_{T}$) as the sample average post-exposure revenue for users exposed to control (treatment).
  5. Define $\mu_{C}$ ($\mu_{T}$) as the population average post-exposure revenue for users exposed to control (treatment).
  6. Define $N_{C}$ ($N_{T}$) as the number of users exposed to control (treatment).

Mean or Binomial Metric, Absolute case

For absolute inference, our target parameter is

\begin{align} \Delta_{A}&=\mu_{T}-\mu_{C}. \end{align}

As described in Equation 4 of (Deng et al. 2013), we find the optimal $\theta$ using user data across both control and treatment: $\theta = \text{cov}(Y, X) / \text{var}(X)$. Our estimate of $\Delta_{A}$ is the difference in adjusted means

\begin{align} \hat{\Delta}_{A} &= \left(\bar{Y}_{T} - \theta\bar{X}_{T}\right) - \left(\bar{Y}_{C} - \theta\bar{X}_{C}\right). \end{align}

Under a superpopulation framework and independence of random assignment, the adjusted means $\left(\bar{Y}_{T} - \theta\bar{X}_{T}\right)$ and $\left(\bar{Y}_{C} - \theta\bar{X}_{C}\right)$ are statistically independent.
Therefore, the variance of the difference in adjusted means is the sum of the variances of the adjusted means.
We denote these variances as $V_{adj, C}$ and $V_{adj, T}$, respectively. Define the control (treatment) population post-exposure variance as $\sigma^{2}_{YC}$ ($\sigma^{2}_{YT}$), the pre-exposure variance as $\sigma^{2}_{XC}$ ($\sigma^{2}_{XT}$), and the covariance between post-exposure and pre-exposure revenue as $\sigma_{XY,C}$ ($\sigma_{XY,T}$). Then

\begin{align} V_{adj, C} &= \frac{\sigma^{2}_{YC} + \theta^{2}\sigma^{2}_{XC} - 2\theta\sigma_{XY,C}}{N_{C}} \\ V_{adj, T} &= \frac{\sigma^{2}_{YT} + \theta^{2}\sigma^{2}_{XT} - 2\theta\sigma_{XY,T}}{N_{T}}. \end{align}

Our estimated variance of $\hat{\Delta}_{A}$ is $\hat{\sigma}^{2}_{\Delta_{A}} = V_{adj, C} + V_{adj, T}$.
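The displays above translate directly into code. Below is a minimal numpy sketch (an illustrative helper of ours, not GrowthBook's implementation) that computes $\theta$, $\hat{\Delta}_{A}$, and $\hat{\sigma}^{2}_{\Delta_{A}}$ from raw per-user values:

```python
import numpy as np

def cuped_absolute(y_c, x_c, y_t, x_t):
    """CUPED-adjusted absolute effect and its estimated variance.

    y_*: post-exposure values; x_*: pre-exposure values (one entry per user).
    """
    # theta = cov(Y, X) / var(X), estimated on users pooled across both arms.
    y = np.concatenate([y_c, y_t])
    x = np.concatenate([x_c, x_t])
    theta = np.cov(y, x)[0, 1] / np.var(x, ddof=1)

    # Difference in adjusted means.
    delta_hat = (y_t.mean() - theta * x_t.mean()) - (y_c.mean() - theta * x_c.mean())

    # V_adj for one arm: (sigma2_Y + theta^2 sigma2_X - 2 theta sigma_XY) / N.
    def v_adj(y_arm, x_arm):
        n = len(y_arm)
        return (np.var(y_arm, ddof=1)
                + theta**2 * np.var(x_arm, ddof=1)
                - 2 * theta * np.cov(y_arm, x_arm)[0, 1]) / n

    return delta_hat, v_adj(y_c, x_c) + v_adj(y_t, x_t)
```

When pre- and post-exposure revenue are strongly correlated, the returned variance is substantially smaller than the unadjusted difference-in-means variance, which is the entire point of CUPED.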

Mean or Binomial Metric, Relative case

For relative inference (i.e., estimating lift), the parameter of interest is

\begin{align} \Delta_{R}&=\frac{\mu_{T}-\mu_{C}}{\mu_{C}}. \end{align}

Our estimate of $\Delta_{R}$ is the difference in adjusted means divided by the control mean:

\begin{align} \hat{\Delta}_{R} &= \frac{\left(\bar{Y}_{T} - \theta\bar{X}_{T}\right) - \left(\bar{Y}_{C} - \theta\bar{X}_{C}\right)}{\bar{Y}_{C}}. \end{align}

To derive $\hat{\sigma}^{2}_{\Delta_{R}}$, the estimated variance of $\hat{\Delta}_{R}$, we use the delta method.

  1. Define the control (treatment) population post-exposure variance as $\sigma^{2}_{YC}$ ($\sigma^{2}_{YT}$).
  2. Define the control (treatment) population pre-exposure variance as $\sigma^{2}_{XC}$ ($\sigma^{2}_{XT}$).
  3. Define the covariance of the sample control means $\boldsymbol{\Lambda}_{C} = \text{Cov}\left[\bar{Y}_{C}, \bar{X}_{C}\right] = \begin{pmatrix} \sigma^{2}_{Y,C} & \sigma_{XY,C}\\ \sigma_{XY,C} & \sigma^{2}_{X,C} \end{pmatrix} / N_{C}$.
  4. Define the covariance of the sample treatment means $\boldsymbol{\Lambda}_{T} = \text{Cov}\left[\bar{Y}_{T}, \bar{X}_{T}\right] = \begin{pmatrix} \sigma^{2}_{Y,T} & \sigma_{XY,T}\\ \sigma_{XY,T} & \sigma^{2}_{X,T} \end{pmatrix} / N_{T}$.
  5. Define the vector of population means $\boldsymbol{\beta}_{0} = \left[\mu_{YT}, \mu_{XT}, \mu_{YC}, \mu_{XC}\right]$.
  6. Define their sample counterparts as $\hat{\boldsymbol{\beta}} = \left[\bar{Y}_{T}, \bar{X}_{T}, \bar{Y}_{C}, \bar{X}_{C}\right]$.
  7. Define $\boldsymbol{\Lambda} = \text{Cov}\left(\hat{\boldsymbol{\beta}}\right) = \begin{pmatrix} \boldsymbol{\Lambda}_{T} & \textbf{0}\\ \textbf{0} & \boldsymbol{\Lambda}_{C} \end{pmatrix}$, where $\textbf{0}$ is a $2 \times 2$ matrix of zeros.

By the multivariate central limit theorem:

\begin{align} \hat{\boldsymbol{\beta}} \sim \mathcal{MVN}\left(\boldsymbol{\beta}_{0}, \boldsymbol{\Lambda}\right). \end{align}

For a vector $\boldsymbol{\beta}$, define its $k^{\text{th}}$ element as $\boldsymbol{\beta}[k]$.
Define the function $g(\boldsymbol{\beta}; \theta) = \frac{\left(\boldsymbol{\beta}[1] - \theta\boldsymbol{\beta}[2]\right) - \left(\boldsymbol{\beta}[3] - \theta\boldsymbol{\beta}[4]\right)}{\boldsymbol{\beta}[3]}.$

Define the vector of partial derivatives as $\boldsymbol{\nabla}_{r} = \frac{\partial g(\boldsymbol{\beta})}{\partial\boldsymbol{\beta}}$, where the individual elements are

\begin{align*} \boldsymbol{\nabla}[1] &= \frac{1}{\boldsymbol{\beta}[3]} \\ \boldsymbol{\nabla}[2] &= \frac{-\theta}{\boldsymbol{\beta}[3]} \\ \boldsymbol{\nabla}[3] &= \frac{-\boldsymbol{\beta}[3] - \left(\boldsymbol{\beta}[1] - \theta\boldsymbol{\beta}[2] - \boldsymbol{\beta}[3] + \theta\boldsymbol{\beta}[4]\right)}{\boldsymbol{\beta}[3]^{2}} = \frac{-\boldsymbol{\beta}[1] + \theta\boldsymbol{\beta}[2] - \theta\boldsymbol{\beta}[4]}{\boldsymbol{\beta}[3]^{2}} \\ \boldsymbol{\nabla}[4] &= \frac{\theta}{\boldsymbol{\beta}[3]}. \end{align*}
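These closed-form partials are easy to verify against central finite differences. A quick standalone sketch (the evaluation point for $\boldsymbol{\beta}$ and $\theta$ is arbitrary; indices are 0-based in code):

```python
import numpy as np

def g(beta, theta):
    """g(beta; theta) = ((b1 - theta*b2) - (b3 - theta*b4)) / b3 (1-indexed in the text)."""
    return ((beta[0] - theta * beta[1]) - (beta[2] - theta * beta[3])) / beta[2]

def grad_g(beta, theta):
    """Closed-form partial derivatives from the display above."""
    b1, b2, b3, b4 = beta
    return np.array([
        1 / b3,
        -theta / b3,
        (-b1 + theta * b2 - theta * b4) / b3**2,
        theta / b3,
    ])
```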

By the delta method, $\hat{\Delta}_{r} = g(\hat{\boldsymbol{\beta}}) \sim \mathcal{N}\left(\Delta_{r} = g\left(\boldsymbol{\beta}\right), \boldsymbol{\nabla}_{r}^{\top}\boldsymbol{\Lambda}\boldsymbol{\nabla}_{r}\right)$.

Decompose $\boldsymbol{\nabla}_{r}$ into $\boldsymbol{\nabla}_{r} = \left[\boldsymbol{\nabla}_{r}[1:2], \boldsymbol{\nabla}_{r}[3:4]\right]$.

Then the final variance is

\begin{align} \boldsymbol{\nabla}_{r}^{\top}\boldsymbol{\Lambda}\boldsymbol{\nabla}_{r} &= \boldsymbol{\nabla}_{r}[1:2]^{\top} \boldsymbol{\Lambda}_{T} \boldsymbol{\nabla}_{r}[1:2] + \boldsymbol{\nabla}_{r}[3:4]^{\top} \boldsymbol{\Lambda}_{C} \boldsymbol{\nabla}_{r}[3:4] \nonumber\\ &= \frac{1}{N_{T}\boldsymbol{\beta}[3]^{2}} \left[1, -\theta\right]^{\top} \begin{pmatrix} \sigma^{2}_{Y,T} & \sigma_{XY,T} \\ \sigma_{XY,T} & \sigma^{2}_{X,T} \end{pmatrix} \left[1, -\theta\right] \nonumber\\ &\quad + \frac{1}{N_{C}\boldsymbol{\beta}[3]^{2}} \left[\frac{-\boldsymbol{\beta}[1] + \theta\boldsymbol{\beta}[2] - \theta\boldsymbol{\beta}[4]}{\boldsymbol{\beta}[3]}, \theta\right]^{\top} \begin{pmatrix} \sigma^{2}_{Y,C} & \sigma_{XY,C} \\ \sigma_{XY,C} & \sigma^{2}_{X,C} \end{pmatrix} \left[\frac{-\boldsymbol{\beta}[1] + \theta\boldsymbol{\beta}[2] - \theta\boldsymbol{\beta}[4]}{\boldsymbol{\beta}[3]}, \theta\right] \nonumber\\ &= \frac{\sigma^{2}_{Y,T} + \theta^{2}\sigma^{2}_{X,T} - 2\theta\sigma_{XY,T}}{N_{T}\boldsymbol{\beta}[3]^{2}} + \frac{1}{N_{C}\boldsymbol{\beta}[3]^{2}} \left( \frac{\sigma^{2}_{Y,C} \left(-\boldsymbol{\beta}[1] + \theta\boldsymbol{\beta}[2] - \theta\boldsymbol{\beta}[4]\right)^{2}}{\boldsymbol{\beta}[3]^{2}} + \frac{2\theta\sigma_{XY,C} \left(-\boldsymbol{\beta}[1] + \theta\boldsymbol{\beta}[2] - \theta\boldsymbol{\beta}[4]\right)}{\boldsymbol{\beta}[3]} + \theta^{2}\sigma^{2}_{X,C} \right) \nonumber\\ &= \frac{\sigma^{2}_{Y,T} + \theta^{2}\sigma^{2}_{X,T} - 2\theta\sigma_{XY,T}}{N_{T}\bar{Y}_{C}^{2}} + \frac{1}{N_{C}\bar{Y}_{C}^{2}} \left( \frac{\sigma^{2}_{Y,C} \left(-\bar{Y}_{T} + \theta\bar{X}_{T} - \theta\bar{X}_{C}\right)^{2}}{\bar{Y}_{C}^{2}} + \frac{2\theta\sigma_{XY,C} \left(-\bar{Y}_{T} + \theta\bar{X}_{T} - \theta\bar{X}_{C}\right)}{\bar{Y}_{C}} + \theta^{2}\sigma^{2}_{X,C} \right), \end{align}

where in the last step we move away from $\boldsymbol{\beta}$ notation and use sample mean notation.

For estimating uncertainty in production, we use

\begin{align} \hat{\sigma}^{2}_{\Delta_{R}} &= \frac{\sigma^{2}_{Y,T} + \theta^{2}\sigma^{2}_{X,T} - 2\theta\sigma_{XY,T}}{N_{T}\bar{Y}_{C}^{2}} + \frac{1}{N_{C}\bar{Y}_{C}^{2}} \left( \frac{\sigma^{2}_{Y,C}\bar{Y}_{T}^{2}}{\bar{Y}_{C}^{2}} - \frac{2\theta\sigma_{XY,C}\bar{Y}_{T}}{\bar{Y}_{C}} + \theta^{2}\sigma^{2}_{X,C} \right), \end{align}

which leverages the fact that the pre-exposure revenue population means are equal across treatment and control due to randomization, so the $\theta\bar{X}_{T} - \theta\bar{X}_{C}$ terms drop out in expectation.
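In code, this production formula maps directly onto per-arm sample moments. A sketch (the tuple layout is our own convention, not GrowthBook's API):

```python
import numpy as np

def cuped_relative_variance(stats_c, stats_t, theta):
    """Estimated variance of the relative lift.

    Each stats_* tuple holds per-arm sample moments:
    (var_y, var_x, cov_xy, mean_y, n).
    """
    var_yc, var_xc, cov_xyc, ybar_c, n_c = stats_c
    var_yt, var_xt, cov_xyt, ybar_t, n_t = stats_t
    # Treatment-arm contribution.
    term_t = (var_yt + theta**2 * var_xt - 2 * theta * cov_xyt) / (n_t * ybar_c**2)
    # Control-arm contribution (pre-exposure mean terms cancel under randomization).
    term_c = (var_yc * ybar_t**2 / ybar_c**2
              - 2 * theta * cov_xyc * ybar_t / ybar_c
              + theta**2 * var_xc) / (n_c * ybar_c**2)
    return term_t + term_c
```

With $\theta = 0$ this reduces to the standard delta-method variance of a relative difference in means, which is a convenient sanity check.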

Ratio Metric

Throughout, define the $k^{\text{th}}$ element of vector $\textbf{x}$ as $\textbf{x}[k]$.
Below we define parameters.

  1. Under control, define the numerator (denominator) post-exposure population mean as $\mu_{MYC}$ ($\mu_{DYC}$).
  2. Under treatment, define the numerator (denominator) post-exposure population mean as $\mu_{MYT}$ ($\mu_{DYT}$).
  3. Under control, define the numerator (denominator) pre-exposure population mean as $\mu_{MXC}$ ($\mu_{DXC}$).
  4. Under treatment, define the numerator (denominator) pre-exposure population mean as $\mu_{MXT}$ ($\mu_{DXT}$).

Due to randomization, $\mu_{MXC}$ equals $\mu_{MXT}$ and $\mu_{DXC}$ equals $\mu_{DXT}$, but for bookkeeping purposes it is easier to have separate parameters.

Ratio Metric, Absolute case

For ratio metrics the absolute parameter of interest is

\begin{align} \Delta_{a} &= \frac{\mu_{MYT}}{\mu_{DYT}} - \frac{\mu_{MYC}}{\mu_{DYC}}. \end{align}

Below we define statistics.

  1. Under control, define the numerator (denominator) post-exposure sample mean as $\bar{M}_{YC}$ ($\bar{D}_{YC}$).
  2. Under treatment, define the numerator (denominator) post-exposure sample mean as $\bar{M}_{YT}$ ($\bar{D}_{YT}$).
  3. Under control, define the numerator (denominator) pre-exposure sample mean as $\bar{M}_{XC}$ ($\bar{D}_{XC}$).
  4. Under treatment, define the numerator (denominator) pre-exposure sample mean as $\bar{M}_{XT}$ ($\bar{D}_{XT}$).

Define:

  1. $\boldsymbol{\beta}_{0} = \left[\mu_{MYT}, \mu_{DYT}, \mu_{MXT}, \mu_{DXT}, \mu_{MYC}, \mu_{DYC}, \mu_{MXC}, \mu_{DXC}\right]$.
  2. $\hat{\boldsymbol{\beta}} = \left[\bar{M}_{YT}, \bar{D}_{YT}, \bar{M}_{XT}, \bar{D}_{XT}, \bar{M}_{YC}, \bar{D}_{YC}, \bar{M}_{XC}, \bar{D}_{XC}\right]$.
  3. $\boldsymbol{\Lambda}_{T} = \text{Cov}\left[\bar{M}_{YT}, \bar{D}_{YT}, \bar{M}_{XT}, \bar{D}_{XT}\right] = N_{T}^{-1} \begin{pmatrix} \text{Var}\left(M_{YT}\right) & & & \\ \text{Cov}\left(M_{YT}, D_{YT}\right) & \text{Var}\left(D_{YT}\right) & & \\ \text{Cov}\left(M_{YT}, M_{XT}\right) & \text{Cov}\left(D_{YT}, M_{XT}\right) & \text{Var}\left(M_{XT}\right) & \\ \text{Cov}\left(M_{YT}, D_{XT}\right) & \text{Cov}\left(D_{YT}, D_{XT}\right) & \text{Cov}\left(M_{XT}, D_{XT}\right) & \text{Var}\left(D_{XT}\right) \end{pmatrix}$, where the omitted upper triangle is filled in by symmetry.
  4. $\boldsymbol{\Lambda}_{C} = \text{Cov}\left[\bar{M}_{YC}, \bar{D}_{YC}, \bar{M}_{XC}, \bar{D}_{XC}\right] = N_{C}^{-1} \begin{pmatrix} \text{Var}\left(M_{YC}\right) & & & \\ \text{Cov}\left(M_{YC}, D_{YC}\right) & \text{Var}\left(D_{YC}\right) & & \\ \text{Cov}\left(M_{YC}, M_{XC}\right) & \text{Cov}\left(D_{YC}, M_{XC}\right) & \text{Var}\left(M_{XC}\right) & \\ \text{Cov}\left(M_{YC}, D_{XC}\right) & \text{Cov}\left(D_{YC}, D_{XC}\right) & \text{Cov}\left(M_{XC}, D_{XC}\right) & \text{Var}\left(D_{XC}\right) \end{pmatrix}$, likewise symmetric.
  5. $\boldsymbol{\Lambda} = \text{Cov}\left(\hat{\boldsymbol{\beta}}\right) = \begin{pmatrix} \boldsymbol{\Lambda}_{T} & \underset{4\times4}{\textbf{0}} \\ \underset{4\times4}{\textbf{0}} & \boldsymbol{\Lambda}_{C} \end{pmatrix}$.

Define the function

\begin{align} g\left(\boldsymbol{\beta}; \theta\right)_{a} &= \left(\frac{\boldsymbol{\beta}[1]}{\boldsymbol{\beta}[2]} - \theta\frac{\boldsymbol{\beta}[3]}{\boldsymbol{\beta}[4]}\right) - \left(\frac{\boldsymbol{\beta}[5]}{\boldsymbol{\beta}[6]} - \theta\frac{\boldsymbol{\beta}[7]}{\boldsymbol{\beta}[8]}\right). \end{align}

The CUPED estimator is

\begin{align} \hat{\Delta}_{a} &= g_{a}(\hat{\boldsymbol{\beta}}; \theta) \\ &= \left(\frac{\hat{\boldsymbol{\beta}}[1]}{\hat{\boldsymbol{\beta}}[2]} - \theta\frac{\hat{\boldsymbol{\beta}}[3]}{\hat{\boldsymbol{\beta}}[4]}\right) - \left(\frac{\hat{\boldsymbol{\beta}}[5]}{\hat{\boldsymbol{\beta}}[6]} - \theta\frac{\hat{\boldsymbol{\beta}}[7]}{\hat{\boldsymbol{\beta}}[8]}\right). \end{align}

Define the vector of partial derivatives as $\boldsymbol{\nabla}_{a}\left(\boldsymbol{\beta}; \theta\right) = \frac{\partial g_{a}(\boldsymbol{\beta})}{\partial\boldsymbol{\beta}}$:

\begin{align*} \boldsymbol{\nabla}[1] &= \frac{1}{\boldsymbol{\beta}[2]} \\ \boldsymbol{\nabla}[2] &= \frac{-\boldsymbol{\beta}[1]}{\boldsymbol{\beta}[2]^{2}} \\ \boldsymbol{\nabla}[3] &= \frac{-\theta}{\boldsymbol{\beta}[4]} \\ \boldsymbol{\nabla}[4] &= \frac{\theta\boldsymbol{\beta}[3]}{\boldsymbol{\beta}[4]^{2}} \\ \boldsymbol{\nabla}[5] &= \frac{-1}{\boldsymbol{\beta}[6]} \\ \boldsymbol{\nabla}[6] &= \frac{\boldsymbol{\beta}[5]}{\boldsymbol{\beta}[6]^{2}} \\ \boldsymbol{\nabla}[7] &= \frac{\theta}{\boldsymbol{\beta}[8]} \\ \boldsymbol{\nabla}[8] &= \frac{-\theta\boldsymbol{\beta}[7]}{\boldsymbol{\beta}[8]^{2}}. \end{align*}

Evaluated at the sample means, $\boldsymbol{\nabla}_{a}\left(\hat{\boldsymbol{\beta}}; \theta\right)$ has elements

\begin{align*} \boldsymbol{\nabla}[1] &= \frac{1}{\bar{D}_{YT}} \\ \boldsymbol{\nabla}[2] &= \frac{-\bar{M}_{YT}}{\bar{D}_{YT}^{2}} \\ \boldsymbol{\nabla}[3] &= \frac{-\theta}{\bar{D}_{XT}} \\ \boldsymbol{\nabla}[4] &= \frac{\theta\bar{M}_{XT}}{\bar{D}_{XT}^{2}} \\ \boldsymbol{\nabla}[5] &= \frac{-1}{\bar{D}_{YC}} \\ \boldsymbol{\nabla}[6] &= \frac{\bar{M}_{YC}}{\bar{D}_{YC}^{2}} \\ \boldsymbol{\nabla}[7] &= \frac{\theta}{\bar{D}_{XC}} \\ \boldsymbol{\nabla}[8] &= \frac{-\theta\bar{M}_{XC}}{\bar{D}_{XC}^{2}}. \end{align*}

By the delta method, $\hat{\Delta}_{a} = g_{a}(\hat{\boldsymbol{\beta}}) \sim \mathcal{N}\left(\Delta_{a} = g_{a}\left(\boldsymbol{\beta}\right), \boldsymbol{\nabla}_{a}^{\top}\boldsymbol{\Lambda}\boldsymbol{\nabla}_{a}\right)$.

Decompose $\boldsymbol{\nabla}_{a}$ into its first four elements ($\boldsymbol{\nabla}_{a, T}$) and its last four elements ($\boldsymbol{\nabla}_{a, C}$). Our variance of interest is

\begin{align} \hat{\sigma}^{2}_{\Delta_{a}} &= \boldsymbol{\nabla}_{a}\left(\hat{\boldsymbol{\beta}}; \theta\right)^{\top}\boldsymbol{\Lambda}\boldsymbol{\nabla}_{a}\left(\hat{\boldsymbol{\beta}}; \theta\right) \nonumber\\ &= \boldsymbol{\nabla}_{a, T}^{\top}\boldsymbol{\Lambda}_{T}\boldsymbol{\nabla}_{a, T} + \boldsymbol{\nabla}_{a, C}^{\top}\boldsymbol{\Lambda}_{C}\boldsymbol{\nabla}_{a, C}. \end{align}

All of these moments are available via CupedRatioRegressionAdjustedStatistics.
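Because $\boldsymbol{\Lambda}$ is block diagonal, the 8-dimensional quadratic form splits into two 4-dimensional ones. A numpy sketch (helper names are ours, not GrowthBook's):

```python
import numpy as np

def ratio_abs_grad(means, theta):
    """Gradient of g_a evaluated at the sample means, ordered as
    (M_YT, D_YT, M_XT, D_XT, M_YC, D_YC, M_XC, D_XC)."""
    m_yt, d_yt, m_xt, d_xt, m_yc, d_yc, m_xc, d_xc = means
    return np.array([
        1 / d_yt, -m_yt / d_yt**2, -theta / d_xt, theta * m_xt / d_xt**2,
        -1 / d_yc, m_yc / d_yc**2, theta / d_xc, -theta * m_xc / d_xc**2,
    ])

def ratio_abs_variance(grad, lam_t, lam_c):
    """grad' Lambda grad with Lambda = blockdiag(Lambda_T, Lambda_C)."""
    g_t, g_c = grad[:4], grad[4:]
    return g_t @ lam_t @ g_t + g_c @ lam_c @ g_c
```

Splitting into per-arm blocks avoids ever materializing the full $8 \times 8$ covariance matrix.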

Optimal regression coefficient for ratio metrics

The optimal $\theta$ minimizes Equation (14). We can write

\begin{align*} \boldsymbol{\nabla}_{a, T}^{\top}\boldsymbol{\Lambda}_{T}\boldsymbol{\nabla}_{a, T} &= \boldsymbol{\nabla}_{a, T}[1:2]^{\top}\boldsymbol{\Lambda}_{T}[1:2, 1:2]\boldsymbol{\nabla}_{a, T}[1:2] \nonumber\\ &\quad + 2\boldsymbol{\nabla}_{a, T}[1:2]^{\top}\boldsymbol{\Lambda}_{T}[1:2, 3:4]\boldsymbol{\nabla}_{a, T}[3:4] + \boldsymbol{\nabla}_{a, T}[3:4]^{\top}\boldsymbol{\Lambda}_{T}[3:4, 3:4]\boldsymbol{\nabla}_{a, T}[3:4] \nonumber\\ &= c_{T} + 2\theta\left[\frac{1}{\boldsymbol{\beta}[2]}, \frac{-\boldsymbol{\beta}[1]}{\boldsymbol{\beta}[2]^{2}}\right]^{\top}\boldsymbol{\Lambda}_{T}[1:2, 3:4]\left[\frac{-1}{\boldsymbol{\beta}[4]}, \frac{\boldsymbol{\beta}[3]}{\boldsymbol{\beta}[4]^{2}}\right] + \theta^{2}\left[\frac{-1}{\boldsymbol{\beta}[4]}, \frac{\boldsymbol{\beta}[3]}{\boldsymbol{\beta}[4]^{2}}\right]^{\top}\boldsymbol{\Lambda}_{T}[3:4, 3:4]\left[\frac{-1}{\boldsymbol{\beta}[4]}, \frac{\boldsymbol{\beta}[3]}{\boldsymbol{\beta}[4]^{2}}\right] \end{align*}

where $c_{T}$ is free of $\theta$.

Similarly we can write

\begin{align*} \boldsymbol{\nabla}_{a, C}^{\top}\boldsymbol{\Lambda}_{C}\boldsymbol{\nabla}_{a, C} &= c_{C} + 2\theta\left[\frac{-1}{\boldsymbol{\beta}[6]}, \frac{\boldsymbol{\beta}[5]}{\boldsymbol{\beta}[6]^{2}}\right]^{\top}\boldsymbol{\Lambda}_{C}[1:2, 3:4]\left[\frac{1}{\boldsymbol{\beta}[8]}, \frac{-\boldsymbol{\beta}[7]}{\boldsymbol{\beta}[8]^{2}}\right] + \theta^{2}\left[\frac{1}{\boldsymbol{\beta}[8]}, \frac{-\boldsymbol{\beta}[7]}{\boldsymbol{\beta}[8]^{2}}\right]^{\top}\boldsymbol{\Lambda}_{C}[3:4, 3:4]\left[\frac{1}{\boldsymbol{\beta}[8]}, \frac{-\boldsymbol{\beta}[7]}{\boldsymbol{\beta}[8]^{2}}\right] \end{align*}

where $c_{C}$ is a constant free of $\theta$.

Differentiating the sum of these two equations with respect to $\theta$ and setting the derivative equal to zero shows that the minimum of this quadratic form occurs at

\begin{align} \theta_{\text{opt}} = -\frac{\left[\frac{-1}{\boldsymbol{\beta}[6]}, \frac{\boldsymbol{\beta}[5]}{\boldsymbol{\beta}[6]^{2}}\right]^{\top}\boldsymbol{\Lambda}_{C}[1:2, 3:4]\left[\frac{1}{\boldsymbol{\beta}[8]}, \frac{-\boldsymbol{\beta}[7]}{\boldsymbol{\beta}[8]^{2}}\right] + \left[\frac{1}{\boldsymbol{\beta}[2]}, \frac{-\boldsymbol{\beta}[1]}{\boldsymbol{\beta}[2]^{2}}\right]^{\top}\boldsymbol{\Lambda}_{T}[1:2, 3:4]\left[\frac{-1}{\boldsymbol{\beta}[4]}, \frac{\boldsymbol{\beta}[3]}{\boldsymbol{\beta}[4]^{2}}\right]}{\left[\frac{1}{\boldsymbol{\beta}[8]}, \frac{-\boldsymbol{\beta}[7]}{\boldsymbol{\beta}[8]^{2}}\right]^{\top}\boldsymbol{\Lambda}_{C}[3:4, 3:4]\left[\frac{1}{\boldsymbol{\beta}[8]}, \frac{-\boldsymbol{\beta}[7]}{\boldsymbol{\beta}[8]^{2}}\right] + \left[\frac{-1}{\boldsymbol{\beta}[4]}, \frac{\boldsymbol{\beta}[3]}{\boldsymbol{\beta}[4]^{2}}\right]^{\top}\boldsymbol{\Lambda}_{T}[3:4, 3:4]\left[\frac{-1}{\boldsymbol{\beta}[4]}, \frac{\boldsymbol{\beta}[3]}{\boldsymbol{\beta}[4]^{2}}\right]}. \end{align}

The numerator is the sum of the covariance terms; the denominator is the sum of the variance terms. This is the same $\theta$ as is presented in Appendix B of (Deng et al. 2013).
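A sketch of the optimal-$\theta$ computation from the per-arm covariance blocks (our own helper; `means` follows the ordering of $\hat{\boldsymbol{\beta}}$ above, and `lam_t`, `lam_c` are the $4 \times 4$ per-arm covariance matrices of the sample means):

```python
import numpy as np

def ratio_theta_opt(means, lam_t, lam_c):
    """Optimal theta = -(sum of covariance terms) / (sum of variance terms)."""
    m_yt, d_yt, m_xt, d_xt, m_yc, d_yc, m_xc, d_xc = means
    u_t = np.array([1 / d_yt, -m_yt / d_yt**2])   # treatment post-exposure partials
    v_t = np.array([-1 / d_xt, m_xt / d_xt**2])   # treatment pre-exposure partials / theta
    u_c = np.array([-1 / d_yc, m_yc / d_yc**2])   # control post-exposure partials
    v_c = np.array([1 / d_xc, -m_xc / d_xc**2])   # control pre-exposure partials / theta
    num = u_t @ lam_t[:2, 2:] @ v_t + u_c @ lam_c[:2, 2:] @ v_c
    den = v_t @ lam_t[2:, 2:] @ v_t + v_c @ lam_c[2:, 2:] @ v_c
    return -num / den
```

Because the variance is a convex quadratic in $\theta$ (the denominator is a sum of positive-semidefinite quadratic forms), this closed form is the exact minimizer.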

Ratio Metric, Relative case

For relative inference, much of the approach in the previous section carries over; we simply need to define the appropriate $g$ function and its partial derivatives.

For ratio metrics the relative parameter of interest is

\begin{align} \Delta_{r} &= \frac{\frac{\mu_{MYT}}{\mu_{DYT}} - \frac{\mu_{MYC}}{\mu_{DYC}}}{\frac{\mu_{MYC}}{\mu_{DYC}}} \\ &= \frac{\frac{\mu_{MYT}}{\mu_{DYT}}}{\frac{\mu_{MYC}}{\mu_{DYC}}} - 1. \end{align}

Define the function

\begin{align} g\left(\boldsymbol{\beta}; \theta\right)_{r} &= \frac{\left(\frac{\boldsymbol{\beta}[1]}{\boldsymbol{\beta}[2]} - \theta\frac{\boldsymbol{\beta}[3]}{\boldsymbol{\beta}[4]}\right) - \left(\frac{\boldsymbol{\beta}[5]}{\boldsymbol{\beta}[6]} - \theta\frac{\boldsymbol{\beta}[7]}{\boldsymbol{\beta}[8]}\right)}{\frac{\boldsymbol{\beta}[5]}{\boldsymbol{\beta}[6]}}. \end{align}

We can consistently estimate Equation (18) with the CUPED estimator $g\left(\hat{\boldsymbol{\beta}}; \theta\right)_{r}$.

Define the numerator in Equation (18) as $g\left(\boldsymbol{\beta}; \theta\right)_{r, num}$ and the denominator as $g\left(\boldsymbol{\beta}; \theta\right)_{r, den}$.

Define the vector of partial derivatives as $\boldsymbol{\nabla}_{r}\left(\boldsymbol{\beta}; \theta\right) = \frac{\partial g_{r}(\boldsymbol{\beta})}{\partial\boldsymbol{\beta}}$:

\begin{align*} \boldsymbol{\nabla}[1] &= \frac{\frac{1}{\boldsymbol{\beta}[2]}}{g\left(\boldsymbol{\beta}; \theta\right)_{r, den}} \\ \boldsymbol{\nabla}[2] &= \frac{\frac{-\boldsymbol{\beta}[1]}{\boldsymbol{\beta}[2]^{2}}}{g\left(\boldsymbol{\beta}; \theta\right)_{r, den}} \\ \boldsymbol{\nabla}[3] &= \frac{\frac{-\theta}{\boldsymbol{\beta}[4]}}{g\left(\boldsymbol{\beta}; \theta\right)_{r, den}} \\ \boldsymbol{\nabla}[4] &= \frac{\frac{\theta\boldsymbol{\beta}[3]}{\boldsymbol{\beta}[4]^{2}}}{g\left(\boldsymbol{\beta}; \theta\right)_{r, den}} \\ \boldsymbol{\nabla}[5] &= \frac{\frac{-g\left(\boldsymbol{\beta}; \theta\right)_{r, den}}{\boldsymbol{\beta}[6]} - \frac{g\left(\boldsymbol{\beta}; \theta\right)_{r, num}}{\boldsymbol{\beta}[6]}}{g\left(\boldsymbol{\beta}; \theta\right)_{r, den}^{2}} \\ \boldsymbol{\nabla}[6] &= \frac{\frac{\boldsymbol{\beta}[5] g\left(\boldsymbol{\beta}; \theta\right)_{r, den}}{\boldsymbol{\beta}[6]^{2}} + \frac{\boldsymbol{\beta}[5] g\left(\boldsymbol{\beta}; \theta\right)_{r, num}}{\boldsymbol{\beta}[6]^{2}}}{g\left(\boldsymbol{\beta}; \theta\right)_{r, den}^{2}} \\ \boldsymbol{\nabla}[7] &= \frac{\frac{\theta}{\boldsymbol{\beta}[8]}}{g\left(\boldsymbol{\beta}; \theta\right)_{r, den}} \\ \boldsymbol{\nabla}[8] &= \frac{\frac{-\theta\boldsymbol{\beta}[7]}{\boldsymbol{\beta}[8]^{2}}}{g\left(\boldsymbol{\beta}; \theta\right)_{r, den}}. \end{align*}

Note that

\begin{align*} g\left(\hat{\boldsymbol{\beta}}; \theta\right)_{r, num} &= \left(\frac{\bar{M}_{YT}}{\bar{D}_{YT}} - \theta\frac{\bar{M}_{XT}}{\bar{D}_{XT}}\right) - \left(\frac{\bar{M}_{YC}}{\bar{D}_{YC}} - \theta\frac{\bar{M}_{XC}}{\bar{D}_{XC}}\right) \\ g\left(\hat{\boldsymbol{\beta}}; \theta\right)_{r, den} &= \frac{\bar{M}_{YC}}{\bar{D}_{YC}}. \end{align*}

Note that $\boldsymbol{\nabla}_{r}\left(\hat{\boldsymbol{\beta}}; \theta\right)$ is equal to

\begin{align*} \boldsymbol{\nabla}[1] &= \frac{\frac{1}{\bar{D}_{YT}}}{g\left(\hat{\boldsymbol{\beta}}; \theta\right)_{r, den}} \\ \boldsymbol{\nabla}[2] &= \frac{\frac{-\bar{M}_{YT}}{\bar{D}_{YT}^{2}}}{g\left(\hat{\boldsymbol{\beta}}; \theta\right)_{r, den}} \\ \boldsymbol{\nabla}[3] &= \frac{\frac{-\theta}{\bar{D}_{XT}}}{g\left(\hat{\boldsymbol{\beta}}; \theta\right)_{r, den}} \\ \boldsymbol{\nabla}[4] &= \frac{\frac{\theta\bar{M}_{XT}}{\bar{D}_{XT}^{2}}}{g\left(\hat{\boldsymbol{\beta}}; \theta\right)_{r, den}} \\ \boldsymbol{\nabla}[5] &= \frac{\frac{-g\left(\hat{\boldsymbol{\beta}}; \theta\right)_{r, den}}{\bar{D}_{YC}} - \frac{g\left(\hat{\boldsymbol{\beta}}; \theta\right)_{r, num}}{\bar{D}_{YC}}}{g\left(\hat{\boldsymbol{\beta}}; \theta\right)_{r, den}^{2}} \\ \boldsymbol{\nabla}[6] &= \frac{\frac{\bar{M}_{YC} g\left(\hat{\boldsymbol{\beta}}; \theta\right)_{r, den}}{\bar{D}_{YC}^{2}} + \frac{\bar{M}_{YC} g\left(\hat{\boldsymbol{\beta}}; \theta\right)_{r, num}}{\bar{D}_{YC}^{2}}}{g\left(\hat{\boldsymbol{\beta}}; \theta\right)_{r, den}^{2}} \\ \boldsymbol{\nabla}[7] &= \frac{\frac{\theta}{\bar{D}_{XC}}}{g\left(\hat{\boldsymbol{\beta}}; \theta\right)_{r, den}} \\ \boldsymbol{\nabla}[8] &= \frac{\frac{-\theta\bar{M}_{XC}}{\bar{D}_{XC}^{2}}}{g\left(\hat{\boldsymbol{\beta}}; \theta\right)_{r, den}}. \end{align*}
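As with the Mean-metric case, these partials can be checked numerically against central finite differences (a standalone sketch with arbitrary evaluation points; indices are 0-based in code):

```python
import numpy as np

def g_rel(beta, theta):
    """Relative ratio-metric g: CUPED-adjusted difference over the control ratio."""
    num = ((beta[0] / beta[1] - theta * beta[2] / beta[3])
           - (beta[4] / beta[5] - theta * beta[6] / beta[7]))
    den = beta[4] / beta[5]
    return num / den

def grad_g_rel(beta, theta):
    """Closed-form partials from the display above."""
    b = beta
    num = ((b[0] / b[1] - theta * b[2] / b[3])
           - (b[4] / b[5] - theta * b[6] / b[7]))
    den = b[4] / b[5]
    return np.array([
        (1 / b[1]) / den,
        (-b[0] / b[1]**2) / den,
        (-theta / b[3]) / den,
        (theta * b[2] / b[3]**2) / den,
        (-den / b[5] - num / b[5]) / den**2,
        (b[4] * den / b[5]**2 + b[4] * num / b[5]**2) / den**2,
        (theta / b[7]) / den,
        (-theta * b[6] / b[7]**2) / den,
    ])
```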