Here we document the technical details behind GrowthBook CUPED variance estimates. We begin with revenue metrics, and then move to ratio metrics.
Mean or Binomial Metric
We use the notation below. We describe our approach in terms of revenue, but any Mean or Binomial metric can be substituted.
Define $Y_{C}$ ($Y_{T}$) as the observed post-exposure revenue for a user exposed to control (treatment).
Define $X_{C}$ ($X_{T}$) as the observed pre-exposure revenue for a user exposed to control (treatment).
Define $Y$ ($X$) as the post-exposure (pre-exposure) revenue for all users collectively in the experiment.
Define $\bar{Y}_{C}$ ($\bar{Y}_{T}$) as the sample average post-exposure revenue for users exposed to control (treatment).
Define $\mu_{C}$ ($\mu_{T}$) as the population average post-exposure revenue for users exposed to control (treatment).
Define $N_{C}$ ($N_{T}$) as the number of users exposed to control (treatment).
Mean or Binomial Metric, Absolute case
For absolute inference, our target parameter is
\begin{align}
\Delta_{A} &= \mu_{T} - \mu_{C}.
\end{align}
As described in Equation 4 of Deng et al. (2013), we find the optimal $\theta$ using user data across both control and treatment:
\begin{align*}
\theta = \mathrm{cov}(Y, X) / \mathrm{var}(X).
\end{align*}
Our estimate of $\Delta_{A}$ is the difference in adjusted means
\begin{align}
\hat{\Delta}_{A} &= \left(\bar{Y}_{T} - \theta\bar{X}_{T}\right) - \left(\bar{Y}_{C} - \theta\bar{X}_{C}\right).
\end{align}
Under a superpopulation framework and independence of random assignment, the adjusted means $\left(\bar{Y}_{T} - \theta\bar{X}_{T}\right)$ and $\left(\bar{Y}_{C} - \theta\bar{X}_{C}\right)$ are statistically independent.
Therefore, the variance of the difference in adjusted means is the sum of the variances of the adjusted means.
We denote these variances as $V_{adj, C}$ and $V_{adj, T}$, respectively.
Define the control (treatment) population covariance between post-exposure and pre-exposure revenue as $\sigma_{XY,C}$ ($\sigma_{XY,T}$); the population variances $\sigma^{2}_{YC}$, $\sigma^{2}_{XC}$, $\sigma^{2}_{YT}$, and $\sigma^{2}_{XT}$ are defined analogously. Then
\begin{align}
V_{adj, C} &= \frac{\sigma^{2}_{YC} + \theta^{2}\sigma^{2}_{XC} - 2\theta\sigma_{XY,C}}{N_{C}}\\
V_{adj, T} &= \frac{\sigma^{2}_{YT} + \theta^{2}\sigma^{2}_{XT} - 2\theta\sigma_{XY,T}}{N_{T}}.
\end{align}
Our estimated variance of $\hat{\Delta}_{A}$ is $\hat{\sigma}^{2}_{\Delta_{A}} = V_{adj, C} + V_{adj, T}$.
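As a minimal sketch of the computations above (hypothetical variable names and NumPy sample moments standing in for the population quantities; not GrowthBook's production implementation):

```python
import numpy as np

def cuped_absolute(y_c, x_c, y_t, x_t):
    """CUPED point estimate and variance for the absolute effect of a mean metric.

    y_*: post-exposure values per user; x_*: pre-exposure values per user.
    """
    # Pooled theta = cov(Y, X) / var(X) across both arms (Deng et al. 2013, Eq. 4)
    y = np.concatenate([y_c, y_t])
    x = np.concatenate([x_c, x_t])
    theta = np.cov(y, x)[0, 1] / np.var(x, ddof=1)

    # Difference in adjusted means
    delta_hat = (y_t.mean() - theta * x_t.mean()) - (y_c.mean() - theta * x_c.mean())

    # V_adj per arm: (sigma2_Y + theta^2 sigma2_X - 2 theta sigma_XY) / N
    def v_adj(y_arm, x_arm):
        s2y = np.var(y_arm, ddof=1)
        s2x = np.var(x_arm, ddof=1)
        sxy = np.cov(y_arm, x_arm)[0, 1]
        return (s2y + theta**2 * s2x - 2 * theta * sxy) / len(y_arm)

    var_hat = v_adj(y_c, x_c) + v_adj(y_t, x_t)
    return delta_hat, var_hat
```

With pre-exposure data that is strongly correlated with the outcome, the returned variance is smaller than the unadjusted two-sample variance, which is the entire point of CUPED.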
Mean or Binomial Metric, Relative case
For relative inference (i.e., estimating lift), the parameter of interest is
\begin{align}
\Delta_{R} &= \frac{\mu_{T}-\mu_{C}}{\mu_{C}}.
\end{align}
Our estimate of $\Delta_{R}$ is the difference in adjusted means divided by the control mean:
\begin{align}
\hat{\Delta}_{R} = \frac{\left(\bar{Y}_{T} - \theta\bar{X}_{T}\right) - \left(\bar{Y}_{C} - \theta\bar{X}_{C}\right)}{\bar{Y}_{C}}.
\end{align}
To derive $\hat{\sigma}^{2}_{\Delta_{R}}$, the estimated variance of $\hat{\Delta}_{R}$, we use the delta method.
Define the control (treatment) population post-exposure variance as $\sigma^{2}_{YC}$ ($\sigma^{2}_{YT}$).
Define the control (treatment) population pre-exposure variance as $\sigma^{2}_{XC}$ ($\sigma^{2}_{XT}$).
Define the covariance matrix of the sample control means as
\begin{align*}
\boldsymbol{\Lambda}_{C} = \text{Cov}\left[\bar{Y}_{C}, \bar{X}_{C}\right] = \begin{pmatrix}
\sigma^{2}_{Y,C} & \sigma_{XY,C}\\
\sigma_{XY,C} & \sigma^{2}_{X,C}
\end{pmatrix} / N_{C}.
\end{align*}
Define the covariance matrix of the sample treatment means as
\begin{align*}
\boldsymbol{\Lambda}_{T} = \text{Cov}\left[\bar{Y}_{T}, \bar{X}_{T}\right] = \begin{pmatrix}
\sigma^{2}_{Y,T} & \sigma_{XY,T}\\
\sigma_{XY,T} & \sigma^{2}_{X,T}
\end{pmatrix} / N_{T}.
\end{align*}
Define the vector of population means $\boldsymbol{\beta}_{0} = \left[\mu_{YT}, \mu_{XT}, \mu_{YC}, \mu_{XC}\right]$.
Define their sample counterparts as $\hat{\boldsymbol{\beta}} = \left[\bar{Y}_{T}, \bar{X}_{T}, \bar{Y}_{C}, \bar{X}_{C}\right]$.
Define
\begin{align*}
\boldsymbol{\Lambda} = \text{Cov}\left(\hat{\boldsymbol{\beta}}\right) = \begin{pmatrix}
\boldsymbol{\Lambda}_{T} & \textbf{0}\\
\textbf{0} & \boldsymbol{\Lambda}_{C}
\end{pmatrix},
\end{align*}
where $\textbf{0}$ is a $2 \times 2$ matrix of zeros.
By the multivariate central limit theorem,
\begin{align}
\hat{\boldsymbol{\beta}} \sim \mathcal{MVN}\left(\boldsymbol{\beta}_{0}, \boldsymbol{\Lambda}\right).
\end{align}
For a vector $\boldsymbol{\beta}$, define its $k^{\text{th}}$ element as $\boldsymbol{\beta}[k]$.
Define the function
\begin{align*}
g(\boldsymbol{\beta}; \theta) = \frac{\left(\boldsymbol{\beta}[1] - \theta\boldsymbol{\beta}[2]\right) - \left(\boldsymbol{\beta}[3] - \theta\boldsymbol{\beta}[4]\right)}{\boldsymbol{\beta}[3]}.
\end{align*}
Define the vector of partial derivatives as $\boldsymbol{\nabla}_{r} = \frac{\partial g(\boldsymbol{\beta})}{\partial\boldsymbol{\beta}}$, where the individual elements are
\begin{align*}
\boldsymbol{\nabla}[1] &= \frac{1}{\boldsymbol{\beta}[3]}
\\\boldsymbol{\nabla}[2] &= \frac{-\theta}{\boldsymbol{\beta}[3]}
\\\boldsymbol{\nabla}[3] &= \frac{-\boldsymbol{\beta}[3] - \left(\boldsymbol{\beta}[1] - \theta\boldsymbol{\beta}[2] - \boldsymbol{\beta}[3] + \theta\boldsymbol{\beta}[4]\right)}{\boldsymbol{\beta}[3]^{2}}
= \frac{-\boldsymbol{\beta}[1] + \theta\boldsymbol{\beta}[2] - \theta\boldsymbol{\beta}[4]}{\boldsymbol{\beta}[3]^{2}}
\\\boldsymbol{\nabla}[4] &= \frac{\theta}{\boldsymbol{\beta}[3]}.
\end{align*}
By the delta method,
\begin{align*}
\hat{\Delta}_{R} = g(\hat{\boldsymbol{\beta}}) \sim \mathcal{N}\left(\Delta_{R} = g\left(\boldsymbol{\beta}_{0}\right), \boldsymbol{\nabla}_{r}^{\top}\boldsymbol{\Lambda}\boldsymbol{\nabla}_{r}\right).
\end{align*}
Decompose $\boldsymbol{\nabla}_{r}$ into $\boldsymbol{\nabla}_{r} = \left[\boldsymbol{\nabla}_{r}[1:2], \boldsymbol{\nabla}_{r}[3:4]\right]$.
Then the final variance is
\begin{align}
\boldsymbol{\nabla}_{r}^{\top}\boldsymbol{\Lambda}\boldsymbol{\nabla}_{r} &=
\boldsymbol{\nabla}_{r}[1:2]^{\top}\boldsymbol{\Lambda}_{T}\boldsymbol{\nabla}_{r}[1:2]
+ \boldsymbol{\nabla}_{r}[3:4]^{\top}\boldsymbol{\Lambda}_{C}\boldsymbol{\nabla}_{r}[3:4]
\nonumber\\&=
\left[\frac{1}{\boldsymbol{\beta}[3]}, \frac{-\theta}{\boldsymbol{\beta}[3]}\right]^{\top}
\begin{pmatrix}
\sigma^{2}_{Y,T} & \sigma_{XY,T} \\
\sigma_{XY,T} & \sigma^{2}_{X,T}
\end{pmatrix} / N_{T}
\left[\frac{1}{\boldsymbol{\beta}[3]}, \frac{-\theta}{\boldsymbol{\beta}[3]}\right]
\nonumber\\&+
\left[\frac{-\boldsymbol{\beta}[1] + \theta\boldsymbol{\beta}[2] - \theta\boldsymbol{\beta}[4]}{\boldsymbol{\beta}[3]^{2}}, \frac{\theta}{\boldsymbol{\beta}[3]}\right]
\begin{pmatrix}
\sigma^{2}_{Y,C} & \sigma_{XY,C} \\
\sigma_{XY,C} & \sigma^{2}_{X,C}
\end{pmatrix} / N_{C}
\left[\frac{-\boldsymbol{\beta}[1] + \theta\boldsymbol{\beta}[2] - \theta\boldsymbol{\beta}[4]}{\boldsymbol{\beta}[3]^{2}}, \frac{\theta}{\boldsymbol{\beta}[3]}\right]
\nonumber\\&=
\frac{1}{N_{T}\boldsymbol{\beta}[3]^{2}}\left[1, -\theta\right]^{\top}
\begin{pmatrix}
\sigma^{2}_{Y,T} & \sigma_{XY,T} \\
\sigma_{XY,T} & \sigma^{2}_{X,T}
\end{pmatrix}
\left[1, -\theta\right]
\nonumber\\&+
\frac{1}{N_{C}\boldsymbol{\beta}[3]^{2}}
\left[\frac{-\boldsymbol{\beta}[1] + \theta\boldsymbol{\beta}[2] - \theta\boldsymbol{\beta}[4]}{\boldsymbol{\beta}[3]}, \theta\right]
\begin{pmatrix}
\sigma^{2}_{Y,C} & \sigma_{XY,C} \\
\sigma_{XY,C} & \sigma^{2}_{X,C}
\end{pmatrix}
\left[\frac{-\boldsymbol{\beta}[1] + \theta\boldsymbol{\beta}[2] - \theta\boldsymbol{\beta}[4]}{\boldsymbol{\beta}[3]}, \theta\right]
\nonumber\\&=
\frac{\sigma^{2}_{Y,T} + \theta^{2}\sigma^{2}_{X,T} - 2\theta\sigma_{XY,T}}{N_{T}\boldsymbol{\beta}[3]^{2}}
\nonumber\\&+
\frac{1}{N_{C}\boldsymbol{\beta}[3]^{2}}
\left[\frac{-\boldsymbol{\beta}[1] + \theta\boldsymbol{\beta}[2] - \theta\boldsymbol{\beta}[4]}{\boldsymbol{\beta}[3]}, \theta\right]
\begin{pmatrix}
\frac{\sigma^{2}_{Y,C}\left(-\boldsymbol{\beta}[1] + \theta\boldsymbol{\beta}[2] - \theta\boldsymbol{\beta}[4]\right)}{\boldsymbol{\beta}[3]} + \theta\sigma_{XY,C} \\
\frac{\sigma_{XY,C}\left(-\boldsymbol{\beta}[1] + \theta\boldsymbol{\beta}[2] - \theta\boldsymbol{\beta}[4]\right)}{\boldsymbol{\beta}[3]} + \theta\sigma^{2}_{X,C}
\end{pmatrix}
\nonumber\\&=
\frac{\sigma^{2}_{Y,T} + \theta^{2}\sigma^{2}_{X,T} - 2\theta\sigma_{XY,T}}{N_{T}\boldsymbol{\beta}[3]^{2}}
\nonumber\\&+
\frac{1}{N_{C}\boldsymbol{\beta}[3]^{2}}
\left(
\frac{\sigma^{2}_{Y,C}\left(-\boldsymbol{\beta}[1] + \theta\boldsymbol{\beta}[2] - \theta\boldsymbol{\beta}[4]\right)^{2}}{\boldsymbol{\beta}[3]^{2}}
+ 2\frac{\theta\sigma_{XY,C}\left(-\boldsymbol{\beta}[1] + \theta\boldsymbol{\beta}[2] - \theta\boldsymbol{\beta}[4]\right)}{\boldsymbol{\beta}[3]}
+ \theta^{2}\sigma^{2}_{X,C}
\right)
\nonumber\\&=
\frac{\sigma^{2}_{Y,T} + \theta^{2}\sigma^{2}_{X,T} - 2\theta\sigma_{XY,T}}{N_{T}\bar{Y}_{C}^{2}}
\nonumber\\&+
\frac{1}{N_{C}\bar{Y}_{C}^{2}}
\left(
\frac{\sigma^{2}_{Y,C}\left(-\bar{Y}_{T} + \theta\bar{X}_{T} - \theta\bar{X}_{C}\right)^{2}}{\bar{Y}_{C}^{2}}
+ 2\frac{\theta\sigma_{XY,C}\left(-\bar{Y}_{T} + \theta\bar{X}_{T} - \theta\bar{X}_{C}\right)}{\bar{Y}_{C}}
+ \theta^{2}\sigma^{2}_{X,C}
\right),
\end{align}
where in the last step we move away from $\boldsymbol{\beta}$ notation and use sample mean notation.
For estimating uncertainty in production, we use
\begin{align}
\hat{\sigma}^{2}_{\Delta_{R}} &=
\frac{\sigma^{2}_{Y,T} + \theta^{2}\sigma^{2}_{X,T} - 2\theta\sigma_{XY,T}}{N_{T}\bar{Y}_{C}^{2}}
\nonumber\\&+
\frac{1}{N_{C}\bar{Y}_{C}^{2}}
\left(
\frac{\sigma^{2}_{Y,C}\left(-\bar{Y}_{T}\right)^{2}}{\bar{Y}_{C}^{2}}
+ 2\frac{\theta\sigma_{XY,C}\left(-\bar{Y}_{T}\right)}{\bar{Y}_{C}}
+ \theta^{2}\sigma^{2}_{X,C}
\right),
\end{align}
which leverages the fact that the pre-exposure revenue population means are equal due to randomization.
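The production formula above can be sketched as follows (hypothetical names; NumPy sample moments plugged in for the population variances and covariances):

```python
import numpy as np

def cuped_relative_variance(y_c, x_c, y_t, x_t, theta):
    """Delta-method variance of the relative CUPED lift, using the
    production simplification: pre-exposure population means are equal
    across arms, so the pre-exposure mean-difference terms drop out."""
    n_c, n_t = len(y_c), len(y_t)
    ybar_c, ybar_t = y_c.mean(), y_t.mean()
    s2y_t, s2x_t = np.var(y_t, ddof=1), np.var(x_t, ddof=1)
    s2y_c, s2x_c = np.var(y_c, ddof=1), np.var(x_c, ddof=1)
    sxy_t = np.cov(y_t, x_t)[0, 1]
    sxy_c = np.cov(y_c, x_c)[0, 1]

    # Treatment contribution: (s2_YT + theta^2 s2_XT - 2 theta s_XYT) / (N_T Ybar_C^2)
    term_t = (s2y_t + theta**2 * s2x_t - 2 * theta * sxy_t) / (n_t * ybar_c**2)
    # Control contribution, with (-Ybar_T) substituted for the adjusted-mean difference
    term_c = (
        s2y_c * ybar_t**2 / ybar_c**2
        - 2 * theta * sxy_c * ybar_t / ybar_c
        + theta**2 * s2x_c
    ) / (n_c * ybar_c**2)
    return term_t + term_c
```

Setting `theta = 0` recovers the plain delta-method variance of the unadjusted lift, so the variance reduction from CUPED can be read off directly by comparing the two calls.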
Ratio Metric
Throughout, define the $k^{\text{th}}$ element of vector $\textbf{x}$ as $\textbf{x}[k]$.
Below we define parameters.
Under control, define the numerator (denominator) post-exposure population mean as $\mu_{MYC}$ ($\mu_{DYC}$).
Under treatment, define the numerator (denominator) post-exposure population mean as $\mu_{MYT}$ ($\mu_{DYT}$).
Under control, define the numerator (denominator) pre-exposure population mean as $\mu_{MXC}$ ($\mu_{DXC}$).
Under treatment, define the numerator (denominator) pre-exposure population mean as $\mu_{MXT}$ ($\mu_{DXT}$).
Due to randomization, $\mu_{MXC}$ equals $\mu_{MXT}$ and $\mu_{DXC}$ equals $\mu_{DXT}$, but for bookkeeping purposes it is easier to keep separate parameters.
Ratio Metric, Absolute case
For ratio metrics the absolute parameter of interest is
\begin{align}
\Delta_{a} &= \frac{\mu_{MYT}}{\mu_{DYT}} - \frac{\mu_{MYC}}{\mu_{DYC}}.
\end{align}
Below we define statistics.
Under control, define the numerator (denominator) post-exposure sample mean as $\bar{M}_{YC}$ ($\bar{D}_{YC}$).
Under treatment, define the numerator (denominator) post-exposure sample mean as $\bar{M}_{YT}$ ($\bar{D}_{YT}$).
Under control, define the numerator (denominator) pre-exposure sample mean as $\bar{M}_{XC}$ ($\bar{D}_{XC}$).
Under treatment, define the numerator (denominator) pre-exposure sample mean as $\bar{M}_{XT}$ ($\bar{D}_{XT}$).
Define:
\begin{align*}
\boldsymbol{\beta}_{0} &= \left[\mu_{MYT}, \mu_{DYT}, \mu_{MXT}, \mu_{DXT}, \mu_{MYC}, \mu_{DYC}, \mu_{MXC}, \mu_{DXC}\right]\\
\hat{\boldsymbol{\beta}} &= \left[\bar{M}_{YT}, \bar{D}_{YT}, \bar{M}_{XT}, \bar{D}_{XT}, \bar{M}_{YC}, \bar{D}_{YC}, \bar{M}_{XC}, \bar{D}_{XC}\right]
\end{align*}
\begin{align*}
\boldsymbol{\Lambda}_{T} &= \text{Cov}\left[\bar{M}_{YT}, \bar{D}_{YT}, \bar{M}_{XT}, \bar{D}_{XT}\right] =
N_{T}^{-1}
\begin{pmatrix}
\text{Var}\left(M_{YT}\right) & & &\\
\text{Cov}\left(M_{YT}, D_{YT}\right) & \text{Var}\left(D_{YT}\right) & & \\
\text{Cov}\left(M_{YT}, M_{XT}\right) & \text{Cov}\left(D_{YT}, M_{XT}\right) & \text{Var}\left(M_{XT}\right) & \\
\text{Cov}\left(M_{YT}, D_{XT}\right) & \text{Cov}\left(D_{YT}, D_{XT}\right) & \text{Cov}\left(M_{XT}, D_{XT}\right) & \text{Var}\left(D_{XT}\right) \\
\end{pmatrix}
\\
\boldsymbol{\Lambda}_{C} &= \text{Cov}\left[\bar{M}_{YC}, \bar{D}_{YC}, \bar{M}_{XC}, \bar{D}_{XC}\right] =
N_{C}^{-1}
\begin{pmatrix}
\text{Var}\left(M_{YC}\right) & & &\\
\text{Cov}\left(M_{YC}, D_{YC}\right) & \text{Var}\left(D_{YC}\right) & & \\
\text{Cov}\left(M_{YC}, M_{XC}\right) & \text{Cov}\left(D_{YC}, M_{XC}\right) & \text{Var}\left(M_{XC}\right) & \\
\text{Cov}\left(M_{YC}, D_{XC}\right) & \text{Cov}\left(D_{YC}, D_{XC}\right) & \text{Cov}\left(M_{XC}, D_{XC}\right) & \text{Var}\left(D_{XC}\right) \\
\end{pmatrix}
\\
\boldsymbol{\Lambda} &= \text{Cov}\left(\hat{\boldsymbol{\beta}}\right) =
\begin{pmatrix}
\boldsymbol{\Lambda}_{T} & \underset{4\times4}{\textbf{0}}\\
\underset{4\times4}{\textbf{0}} & \boldsymbol{\Lambda}_{C} \\
\end{pmatrix}
\end{align*}
Only the lower triangles of the symmetric matrices $\boldsymbol{\Lambda}_{T}$ and $\boldsymbol{\Lambda}_{C}$ are shown.
Define the function
\begin{align}
g\left(\boldsymbol{\beta}; \theta\right)_{a} &= \left(\frac{\boldsymbol{\beta}[1]}{\boldsymbol{\beta}[2]} - \theta\frac{\boldsymbol{\beta}[3]}{\boldsymbol{\beta}[4]} \right)
-\left(\frac{\boldsymbol{\beta}[5]}{\boldsymbol{\beta}[6]} - \theta\frac{\boldsymbol{\beta}[7]}{\boldsymbol{\beta}[8]} \right).
\end{align}
The CUPED estimator is
\begin{align}
\hat{\Delta}_{a} &= g_{a}(\hat{\boldsymbol{\beta}}; \theta)
\\&=
\left(\frac{\hat{\boldsymbol{\beta}}[1]}{\hat{\boldsymbol{\beta}}[2]} - \theta\frac{\hat{\boldsymbol{\beta}}[3]}{\hat{\boldsymbol{\beta}}[4]} \right)
-\left(\frac{\hat{\boldsymbol{\beta}}[5]}{\hat{\boldsymbol{\beta}}[6]} - \theta\frac{\hat{\boldsymbol{\beta}}[7]}{\hat{\boldsymbol{\beta}}[8]} \right).
\end{align}
Define the vector of partial derivatives as $\boldsymbol{\nabla}_{a}\left(\boldsymbol{\beta}; \theta\right) = \frac{\partial g_{a}(\boldsymbol{\beta})}{\partial\boldsymbol{\beta}}$, with elements
\begin{align*}
\boldsymbol{\nabla}[1] &= \frac{1}{\boldsymbol{\beta}[2]}
\\\boldsymbol{\nabla}[2] &= \frac{-\boldsymbol{\beta}[1]}{\boldsymbol{\beta}[2]^2}
\\\boldsymbol{\nabla}[3] &= \frac{-\theta}{\boldsymbol{\beta}[4]}
\\\boldsymbol{\nabla}[4] &= \frac{\theta\boldsymbol{\beta}[3]}{\boldsymbol{\beta}[4]^2}
\\\boldsymbol{\nabla}[5] &= \frac{-1}{\boldsymbol{\beta}[6]}
\\\boldsymbol{\nabla}[6] &= \frac{\boldsymbol{\beta}[5]}{\boldsymbol{\beta}[6]^2}
\\\boldsymbol{\nabla}[7] &= \frac{\theta}{\boldsymbol{\beta}[8]}
\\\boldsymbol{\nabla}[8] &= \frac{-\theta\boldsymbol{\beta}[7]}{\boldsymbol{\beta}[8]^2}.
\end{align*}
Evaluated at the sample means, $\boldsymbol{\nabla}_{a}\left(\hat{\boldsymbol{\beta}}; \theta\right)$ has elements
\begin{align*}
\boldsymbol{\nabla}[1] &= \frac{1}{\bar{D}_{YT}}
\\\boldsymbol{\nabla}[2] &= \frac{-\bar{M}_{YT}}{\bar{D}_{YT}^{2}}
\\\boldsymbol{\nabla}[3] &= \frac{-\theta}{\bar{D}_{XT}}
\\\boldsymbol{\nabla}[4] &= \frac{\theta\bar{M}_{XT}}{\bar{D}_{XT}^2}
\\\boldsymbol{\nabla}[5] &= \frac{-1}{\bar{D}_{YC}}
\\\boldsymbol{\nabla}[6] &= \frac{\bar{M}_{YC}}{\bar{D}_{YC}^2}
\\\boldsymbol{\nabla}[7] &= \frac{\theta}{\bar{D}_{XC}}
\\\boldsymbol{\nabla}[8] &= \frac{-\theta\bar{M}_{XC}}{\bar{D}_{XC}^2}.
\end{align*}
By the delta method,
\begin{align*}
\hat{\Delta}_{a} = g_{a}(\hat{\boldsymbol{\beta}}) \sim \mathcal{N}\left(\Delta_{a} = g_{a}\left(\boldsymbol{\beta}_{0}\right), \boldsymbol{\nabla}_{a}^{\top}\boldsymbol{\Lambda}\boldsymbol{\nabla}_{a}\right).
\end{align*}
Decompose $\boldsymbol{\nabla}_{a}$ into its first four elements ($\boldsymbol{\nabla}_{a, T}$) and its last four elements ($\boldsymbol{\nabla}_{a, C}$).
Our variance of interest is
\begin{align}
\hat{\sigma}^{2}_{\Delta_{a}} &=
\boldsymbol{\nabla}_{a}\left(\hat{\boldsymbol{\beta}}; \theta\right)^{\top}\boldsymbol{\Lambda}\boldsymbol{\nabla}_{a}\left(\hat{\boldsymbol{\beta}}; \theta\right)
\nonumber\\
&= \boldsymbol{\nabla}_{a, T}^{\top}\boldsymbol{\Lambda}_{T}\boldsymbol{\nabla}_{a, T}
+ \boldsymbol{\nabla}_{a, C}^{\top}\boldsymbol{\Lambda}_{C}\boldsymbol{\nabla}_{a, C}.
\end{align}
All of these moments are available via CupedRatioRegressionAdjustedStatistics.
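To make the indexing concrete, here is an illustrative NumPy sketch of the gradient pieces and the quadratic form (the moment matrices are hypothetical inputs standing in for whatever statistics object supplies them):

```python
import numpy as np

def gradient_treatment(m_yt, d_yt, m_xt, d_xt, theta):
    """First four elements of nabla_a, evaluated at the treatment sample means."""
    return np.array([1 / d_yt, -m_yt / d_yt**2, -theta / d_xt, theta * m_xt / d_xt**2])

def gradient_control(m_yc, d_yc, m_xc, d_xc, theta):
    """Last four elements of nabla_a, evaluated at the control sample means."""
    return np.array([-1 / d_yc, m_yc / d_yc**2, theta / d_xc, -theta * m_xc / d_xc**2])

def ratio_absolute_variance(grad_t, grad_c, lambda_t, lambda_c):
    """Variance of Delta_a-hat: Lambda is block diagonal, so the quadratic
    form splits into a treatment piece and a control piece."""
    return grad_t @ lambda_t @ grad_t + grad_c @ lambda_c @ grad_c
```

Because the off-diagonal blocks of $\boldsymbol{\Lambda}$ are zero, the full $8 \times 8$ quadratic form never needs to be materialized.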
Optimal regression coefficient for ratio metrics
The optimal $\theta$ minimizes Equation (14).
We can write
\begin{align*}
\boldsymbol{\nabla}_{a, T}^{\top}\boldsymbol{\Lambda}_{T}\boldsymbol{\nabla}_{a, T} &= \boldsymbol{\nabla}_{a, T}[1:2]^{\top}\boldsymbol{\Lambda}_{T}[1:2, 1:2]\boldsymbol{\nabla}_{a, T}[1:2]
\nonumber\\&+2\boldsymbol{\nabla}_{a, T}[1:2]^{\top}\boldsymbol{\Lambda}_{T}[3:4, 1:2]\boldsymbol{\nabla}_{a, T}[3:4]+\boldsymbol{\nabla}_{a, T}[3:4]^{\top}\boldsymbol{\Lambda}_{T}[3:4, 3:4]\boldsymbol{\nabla}_{a, T}[3:4]
\nonumber\\
&=c_{T} + 2\theta
\left[\frac{1}{\boldsymbol{\beta}[2]}, \frac{-\boldsymbol{\beta}[1]}{\boldsymbol{\beta}[2]^2} \right]^{\top}
\boldsymbol{\Lambda}_{T}[3:4, 1:2]\left[\frac{-1}{\boldsymbol{\beta}[4]}, \frac{\boldsymbol{\beta}[3]}{\boldsymbol{\beta}[4]^2} \right]
+
\theta^{2}\left[\frac{-1}{\boldsymbol{\beta}[4]}, \frac{\boldsymbol{\beta}[3]}{\boldsymbol{\beta}[4]^2}\right]^{\top}\boldsymbol{\Lambda}_{T}[3:4, 3:4]\left[\frac{-1}{\boldsymbol{\beta}[4]}, \frac{\boldsymbol{\beta}[3]}{\boldsymbol{\beta}[4]^2}\right],
\end{align*}
where $c_{T}$ is free of $\theta$.
Similarly we can write
\begin{align*}
\boldsymbol{\nabla}_{a, C}^{\top}\boldsymbol{\Lambda}_{C}\boldsymbol{\nabla}_{a, C} &=
c_{C} + 2\theta
\left[\frac{-1}{\boldsymbol{\beta}[6]}, \frac{\boldsymbol{\beta}[5]}{\boldsymbol{\beta}[6]^2} \right]^{\top}
\boldsymbol{\Lambda}_{C}[3:4, 1:2]\left[\frac{1}{\boldsymbol{\beta}[8]}, \frac{-\boldsymbol{\beta}[7]}{\boldsymbol{\beta}[8]^2} \right]
+
\theta^{2}\left[\frac{-1}{\boldsymbol{\beta}[8]}, \frac{\boldsymbol{\beta}[7]}{\boldsymbol{\beta}[8]^2}\right]^{\top}\boldsymbol{\Lambda}_{C}[3:4, 3:4]\left[\frac{-1}{\boldsymbol{\beta}[8]}, \frac{\boldsymbol{\beta}[7]}{\boldsymbol{\beta}[8]^2}\right],
\end{align*}
where $c_{C}$ is a constant free of $\theta$.
Differentiating the sum of these two equations with respect to θ \theta θ and setting equal to zero shows that the minimum of this quadratic form occurs at
\begin{align}
\theta_{\text{opt}} &=
-\frac{
\left[\frac{-1}{\boldsymbol{\beta}[6]}, \frac{\boldsymbol{\beta}[5]}{\boldsymbol{\beta}[6]^2} \right]^{\top}
\boldsymbol{\Lambda}_{C}[3:4, 1:2]\left[\frac{1}{\boldsymbol{\beta}[8]}, \frac{-\boldsymbol{\beta}[7]}{\boldsymbol{\beta}[8]^2} \right] +
\left[\frac{1}{\boldsymbol{\beta}[2]}, \frac{-\boldsymbol{\beta}[1]}{\boldsymbol{\beta}[2]^2} \right]^{\top}
\boldsymbol{\Lambda}_{T}[3:4, 1:2]\left[\frac{-1}{\boldsymbol{\beta}[4]}, \frac{\boldsymbol{\beta}[3]}{\boldsymbol{\beta}[4]^2} \right]
}{\left[\frac{-1}{\boldsymbol{\beta}[8]}, \frac{\boldsymbol{\beta}[7]}{\boldsymbol{\beta}[8]^2}\right]^{\top}\boldsymbol{\Lambda}_{C}[3:4, 3:4]\left[\frac{-1}{\boldsymbol{\beta}[8]}, \frac{\boldsymbol{\beta}[7]}{\boldsymbol{\beta}[8]^2}\right] +
\left[\frac{-1}{\boldsymbol{\beta}[4]}, \frac{\boldsymbol{\beta}[3]}{\boldsymbol{\beta}[4]^2}\right]^{\top}\boldsymbol{\Lambda}_{T}[3:4, 3:4]\left[\frac{-1}{\boldsymbol{\beta}[4]}, \frac{\boldsymbol{\beta}[3]}{\boldsymbol{\beta}[4]^2}\right]
}.
\end{align}
The numerator represents the sum of the covariance terms; the denominator, the sum of the variance terms.
This is the same $\theta$ as is presented in Appendix B of Deng et al. (2013).
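Under the indexing above, $\theta_{\text{opt}}$ can be sketched as follows ($\boldsymbol{\beta}$ is the 8-vector of means, and the $\boldsymbol{\Lambda}$ blocks are $4 \times 4$ per-arm covariance matrices of the sample means; all names are hypothetical):

```python
import numpy as np

def theta_opt(beta, lambda_t, lambda_c):
    """Optimal theta for a ratio metric: minus the summed covariance terms
    over the summed pre-exposure variance terms (Deng et al. 2013, App. B)."""
    b = np.asarray(beta, dtype=float)  # [MYT, DYT, MXT, DXT, MYC, DYC, MXC, DXC]
    # Gradient pieces of the post- and pre-exposure ratios (1-indexed beta[k] -> b[k-1])
    gy_t = np.array([1 / b[1], -b[0] / b[1]**2])
    gx_t = np.array([-1 / b[3], b[2] / b[3]**2])
    gy_c = np.array([-1 / b[5], b[4] / b[5]**2])
    gx_c = np.array([1 / b[7], -b[6] / b[7]**2])
    num = gy_c @ lambda_c[2:4, 0:2] @ gx_c + gy_t @ lambda_t[2:4, 0:2] @ gx_t
    # Quadratic forms are sign-invariant, so gx_c can be used directly here
    den = gx_c @ lambda_c[2:4, 2:4] @ gx_c + gx_t @ lambda_t[2:4, 2:4] @ gx_t
    return -num / den
```

A quick sanity check: if the pre- and post-exposure blocks are uncorrelated (the $[3:4, 1:2]$ blocks are zero), the numerator vanishes and the optimal $\theta$ is zero, i.e., CUPED offers no adjustment.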
Ratio Metric, Relative case
For relative inference, much of the approach from the previous section carries over; we simply need to define the appropriate $g$ function and its partial derivatives.
For ratio metrics the relative parameter of interest is
\begin{align}
\Delta_{r} &= \frac{\frac{\mu_{MYT}}{\mu_{DYT}} - \frac{\mu_{MYC}}{\mu_{DYC}}}{\frac{\mu_{MYC}}{\mu_{DYC}}}
\\&= \frac{\frac{\mu_{MYT}}{\mu_{DYT}}}{\frac{\mu_{MYC}}{\mu_{DYC}}} - 1.
\end{align}
Define the function
\begin{align}
g\left(\boldsymbol{\beta}; \theta\right)_{r} &=
\frac{
\left(\frac{\boldsymbol{\beta}[1]}{\boldsymbol{\beta}[2]} - \theta\frac{\boldsymbol{\beta}[3]}{\boldsymbol{\beta}[4]} \right)
-\left(\frac{\boldsymbol{\beta}[5]}{\boldsymbol{\beta}[6]} - \theta\frac{\boldsymbol{\beta}[7]}{\boldsymbol{\beta}[8]} \right)
}{\frac{\boldsymbol{\beta}[5]}{\boldsymbol{\beta}[6]}}.
\end{align}
We can consistently estimate Equation (18) with the CUPED estimator $g\left(\hat{\boldsymbol{\beta}}; \theta\right)_{r}$.
Define the numerator in Equation (18) as $g\left(\boldsymbol{\beta}; \theta\right)_{r, num}$ and the denominator as $g\left(\boldsymbol{\beta}; \theta\right)_{r, den}$.
Define the vector of partial derivatives as $\boldsymbol{\nabla}_{r}\left(\boldsymbol{\beta}; \theta\right) = \frac{\partial g_{r}(\boldsymbol{\beta})}{\partial\boldsymbol{\beta}}$, with elements
∇ [ 1 ] = 1 β [ 2 ] g ( β ; θ ) r , d e n ∇ [ 2 ] = − β [ 1 ] β [ 2 ] 2 g ( β ; θ ) r , d e n ∇ [ 3 ] = − θ β [ 4 ] g ( β ; θ ) r , d e n ∇ [ 4 ] = θ β [ 3 ] β [ 4 ] 2 g ( β ; θ ) r , d e n ∇ [ 5 ] = − g ( β ; θ ) r , d e n β [ 6 ] − g ( β ; θ ) r , n u m β [ 6 ] g ( β ; θ ) r , d e n 2 ∇ [ 6 ] = β [ 5 ] g ( β ; θ ) r , d e n β [ 6 ] 2 + β [ 5 ] g ( β ; θ ) r , n u m β [ 6 ] 2 g ( β ; θ ) r , d e n 2 ∇ [ 7 ] = θ β [ 8 ] g ( β ; θ ) r , d e n ∇ [ 8 ] = − θ β [ 7 ] β [ 8 ] 2 g ( β ; θ ) r , d e n . \begin{align*}
\boldsymbol{\nabla}[1] &= \frac{\frac{1}{\boldsymbol{\beta}[2]} }{g\left(\boldsymbol{\beta}; \theta\right)_{r, den}}
\\\boldsymbol{\nabla}[2] &= \frac{\frac{-\boldsymbol{\beta}[1]}{\boldsymbol{\beta}[2]^2}}{g\left(\boldsymbol{\beta}; \theta\right)_{r, den}}
\\\boldsymbol{\nabla}[3] &=
\frac{
\frac{-\theta}{\boldsymbol{\beta}[4]}
}{g\left(\boldsymbol{\beta}; \theta\right)_{r, den}}
\\\boldsymbol{\nabla}[4] &= \frac{\frac{\theta\boldsymbol{\beta}[3]}{\boldsymbol{\beta}[4]^2}}{g\left(\boldsymbol{\beta}; \theta\right)_{r, den}}
\\\boldsymbol{\nabla}[5] &=
\frac{
\frac{-g\left(\boldsymbol{\beta}; \theta\right)_{r, den}}{\boldsymbol{\beta}[6]} - \frac{g\left(\boldsymbol{\beta}; \theta\right)_{r, num}}{\boldsymbol{\beta}[6]}
}{g\left(\boldsymbol{\beta}; \theta\right)_{r, den}^{2}}
\\\boldsymbol{\nabla}[6] &=
\frac{
\frac{\boldsymbol{\beta}[5]g\left(\boldsymbol{\beta}; \theta\right)_{r, den}}{\boldsymbol{\beta}[6]^2} +
\frac{\boldsymbol{\beta}[5]g\left(\boldsymbol{\beta}; \theta\right)_{r, num}}{\boldsymbol{\beta}[6]^2}
}{g\left(\boldsymbol{\beta}; \theta\right)_{r, den}^{2}}
\\\boldsymbol{\nabla}[7] &=
\frac{\frac{\theta}{\boldsymbol{\beta}[8]}
}{g\left(\boldsymbol{\beta}; \theta\right)_{r, den}}
\\\boldsymbol{\nabla}[8] &=
\frac{
\frac{-\theta\boldsymbol{\beta}[7]}{\boldsymbol{\beta}[8]^2}
}{g\left(\boldsymbol{\beta}; \theta\right)_{r, den}}.
\end{align*}
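The eight partial derivatives above can be transcribed directly. A minimal Python sketch (0-indexed, same component ordering as before); a central finite-difference check is a standard way to validate hand-derived gradients like this one:

```python
import numpy as np

def grad_g_r(beta, theta):
    """Gradient of g(beta; theta)_r with respect to beta (0-indexed)."""
    beta = np.asarray(beta, dtype=float)
    num = (beta[0] / beta[1] - theta * beta[2] / beta[3]) \
        - (beta[4] / beta[5] - theta * beta[6] / beta[7])
    den = beta[4] / beta[5]
    grad = np.empty(8)
    grad[0] = (1.0 / beta[1]) / den
    grad[1] = (-beta[0] / beta[1] ** 2) / den
    grad[2] = (-theta / beta[3]) / den
    grad[3] = (theta * beta[2] / beta[3] ** 2) / den
    # beta[4] and beta[5] enter both numerator and denominator,
    # so the quotient rule produces two terms each.
    grad[4] = (-den / beta[5] - num / beta[5]) / den ** 2
    grad[5] = (beta[4] * den / beta[5] ** 2
               + beta[4] * num / beta[5] ** 2) / den ** 2
    grad[6] = (theta / beta[7]) / den
    grad[7] = (-theta * beta[6] / beta[7] ** 2) / den
    return grad
```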
Note that
\begin{align*}
g\left(\hat{\boldsymbol{\beta}}; \theta\right)_{r, num} &= \left(\frac{\bar{M}_{YT}}{\bar{D}_{YT}} - \theta\frac{\bar{M}_{XT}}{\bar{D}_{XT}} \right)
-\left(\frac{\bar{M}_{YC}}{\bar{D}_{YC}} - \theta\frac{\bar{M}_{XC}}{\bar{D}_{XC}} \right)
\\g\left(\hat{\boldsymbol{\beta}}; \theta\right)_{r, den} &= \frac{\bar{M}_{YC}}{\bar{D}_{YC}}
\end{align*}
Note that \boldsymbol{\nabla}_{r}\left(\hat{\boldsymbol{\beta}}; \theta\right) is equal to
\begin{align*}
\boldsymbol{\nabla}[1] &= \frac{\frac{1}{\bar{D}_{YT}}}{g\left(\hat{\boldsymbol{\beta}}; \theta\right)_{r, den}}
\\\boldsymbol{\nabla}[2] &= \frac{\frac{-\bar{M}_{YT}}{\bar{D}_{YT}^{2}}}{g\left(\hat{\boldsymbol{\beta}}; \theta\right)_{r, den}}
\\\boldsymbol{\nabla}[3] &=
\frac{
\frac{-\theta}{\bar{D}_{XT}}
}{g\left(\hat{\boldsymbol{\beta}}; \theta\right)_{r, den}}
\\\boldsymbol{\nabla}[4] &= \frac{\frac{\theta\bar{M}_{XT}}{\bar{D}_{XT}^2}}{g\left(\hat{\boldsymbol{\beta}}; \theta\right)_{r, den}}
\\\boldsymbol{\nabla}[5] &=
\frac{
\frac{-g\left(\hat{\boldsymbol{\beta}}; \theta\right)_{r, den}}{\bar{D}_{YC}} - \frac{g\left(\hat{\boldsymbol{\beta}}; \theta\right)_{r, num}}{\bar{D}_{YC}}
}{g\left(\hat{\boldsymbol{\beta}}; \theta\right)_{r, den}^{2}}
\\\boldsymbol{\nabla}[6] &=
\frac{
\frac{\bar{M}_{YC}g\left(\hat{\boldsymbol{\beta}}; \theta\right)_{r, den}}{\bar{D}_{YC}^2} +
\frac{\bar{M}_{YC}g\left(\hat{\boldsymbol{\beta}}; \theta\right)_{r, num}}{\bar{D}_{YC}^2}
}{g\left(\hat{\boldsymbol{\beta}}; \theta\right)_{r, den}^{2}}
\\\boldsymbol{\nabla}[7] &=
\frac{\frac{\theta}{\bar{D}_{XC}}
}{g\left(\hat{\boldsymbol{\beta}}; \theta\right)_{r, den}}
\\\boldsymbol{\nabla}[8] &=
\frac{
\frac{-\theta\bar{M}_{XC}}{\bar{D}_{XC}^2}
}{g\left(\hat{\boldsymbol{\beta}}; \theta\right)_{r, den}}.
\end{align*}
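With the plug-in gradient in hand, the delta-method variance estimate is a quadratic form in the gradient. The covariance matrix of \hat{\boldsymbol{\beta}} is not derived in this section; the sketch below assumes some consistent estimate `sigma_hat` is available as a hypothetical input:

```python
import numpy as np

def delta_method_variance(grad, sigma_hat):
    """Delta-method variance: grad' Sigma_hat grad.

    grad is nabla_r(beta_hat; theta); sigma_hat is an assumed 8x8
    estimate of Cov(beta_hat) — its construction is outside this sketch.
    """
    grad = np.asarray(grad, dtype=float)
    return float(grad @ np.asarray(sigma_hat, dtype=float) @ grad)
```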