Testing Mean and Covariance Equivalence
Implementation
To make our robust optimization scheme incremental, we need an efficient clustering algorithm, where efficiency is measured with respect to both computation and storage. Ideally, we should only need to keep a small batch of the most recent observations in memory.
As an initial step toward our incremental clustering approach, we need to be able to test whether the mean vector and covariance matrix of the current batch of observations are equivalent to those of the components in our prior mixture model. One method for performing these tests is presented below.
Covariance Test
To begin, let’s assume that we have a set of observations $\{x_n\}$, with each $x_n \in \mathbf{R}^d$, and that we want to check whether this set has the same covariance as a hypothesis covariance matrix; i.e., we want to test whether $\Sigma_x = \Sigma_0$, where $\Sigma_x = \text{cov}(x)$ and $\Sigma_0$ is our hypothesis.
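To do this, we first transform the original data set using the Cholesky decomposition of our hypothesis covariance, $\Sigma_0 = L L^T$, mapping each observation to $y_n = L^{-1} x_n$. This step is necessary because the covariance test only works against an identity covariance matrix: if $\Sigma_x = \Sigma_0$, then $\text{cov}(y) = L^{-1} \Sigma_0 L^{-T} = I$.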
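A minimal sketch of the whitening step, assuming the observations are stored as rows of an `(n, d)` NumPy array. The function name `whiten` and the example covariance are hypothetical; the code solves the triangular system $L y_n = x_n$ rather than forming $L^{-1}$ explicitly.

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular

def whiten(x, sigma0):
    """Map rows x_n of x (shape (n, d)) to y_n = L^{-1} x_n, where sigma0 = L L^T."""
    # Lower-triangular Cholesky factor of the hypothesis covariance.
    L = cholesky(sigma0, lower=True)
    # Solve L y_n = x_n for all observations at once (they are the columns of x.T).
    return solve_triangular(L, x.T, lower=True).T

rng = np.random.default_rng(0)
sigma0 = np.array([[2.0, 0.6], [0.6, 1.0]])  # hypothetical hypothesis covariance
x = rng.multivariate_normal(np.zeros(2), sigma0, size=5000)
y = whiten(x, sigma0)
print(np.round(np.cov(y, rowvar=False), 2))  # should be close to the 2x2 identity
```

If the data really do have covariance $\Sigma_0$, the sample covariance of `y` should be close to the identity, which is what the covariance test then checks.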
Using the transformed data set, we can construct the $W$-statistic, which under the null hypothesis is known to have an asymptotic $\chi^2$ distribution with $d(d+1)/2$ degrees of freedom.
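A sketch of the covariance test, assuming the $W$-statistic meant here is the one of Ledoit and Wolf (2002) for testing $\text{cov}(y) = I$: with $S$ the sample covariance of the $n$ whitened observations, $W = \frac{1}{d}\text{tr}[(S - I)^2] - \frac{d}{n}\left[\frac{1}{d}\text{tr}(S)\right]^2 + \frac{d}{n}$, and under the null hypothesis $\frac{n d W}{2}$ is asymptotically $\chi^2$ with $d(d+1)/2$ degrees of freedom. The function name `w_statistic_test` is hypothetical, and the input is assumed to be already whitened.

```python
import numpy as np
from scipy.stats import chi2

def w_statistic_test(y):
    """Test H0: cov(y) = I for whitened data y of shape (n, d).

    Returns (W, p_value), using the asymptotic result n*d*W/2 ~ chi2(d(d+1)/2).
    """
    n, d = y.shape
    # d x d sample covariance (mean removed); atleast_2d handles d = 1.
    S = np.atleast_2d(np.cov(y, rowvar=False))
    A = S - np.eye(d)
    W = np.trace(A @ A) / d - (d / n) * (np.trace(S) / d) ** 2 + d / n
    stat = n * d * W / 2.0
    dof = d * (d + 1) / 2
    return W, chi2.sf(stat, dof)

rng = np.random.default_rng(0)
y_null = rng.standard_normal((1000, 3))     # cov(y) = I holds
y_alt = y_null * np.array([1.0, 1.0, 2.0])  # third variance is 4, so H0 is false
print(w_statistic_test(y_null))
print(w_statistic_test(y_alt))
```

A large $p$-value means the batch is consistent with the hypothesis covariance; a small one means the covariances differ.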
Mean Test
To test the equivalence of mean vectors, we can construct the $T$-statistic, which under the null hypothesis is known to have an asymptotic $F$ distribution.
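A sketch of the mean test, assuming the $T$-statistic is the one-sample Hotelling's $T^2$ against a hypothesis mean $\mu_0$: $T^2 = n(\bar{x} - \mu_0)^T S^{-1} (\bar{x} - \mu_0)$, where $\frac{n-d}{d(n-1)} T^2$ follows an $F_{d,\,n-d}$ distribution under the null hypothesis (exactly for Gaussian data). The function name `hotelling_t2_test` is hypothetical.

```python
import numpy as np
from scipy.stats import f as f_dist

def hotelling_t2_test(x, mu0):
    """One-sample Hotelling T^2 test of H0: E[x] = mu0 for x of shape (n, d)."""
    x = np.asarray(x, dtype=float)
    n, d = x.shape
    if n <= d:
        raise ValueError("need more observations than dimensions")
    diff = x.mean(axis=0) - np.asarray(mu0, dtype=float)
    S = np.atleast_2d(np.cov(x, rowvar=False))
    # Solve S v = diff instead of inverting S explicitly.
    t2 = n * diff @ np.linalg.solve(S, diff)
    # Scale T^2 so that it follows F(d, n - d) under H0.
    f_stat = (n - d) / (d * (n - 1)) * t2
    return t2, f_dist.sf(f_stat, d, n - d)

rng = np.random.default_rng(0)
x_null = rng.standard_normal((200, 3))
x_shift = x_null + np.array([0.0, 0.0, 1.0])  # true mean differs from mu0 = 0
print(hotelling_t2_test(x_null, np.zeros(3)))
print(hotelling_t2_test(x_shift, np.zeros(3)))
```

For $d = 1$ this reduces to the square of the usual one-sample $t$-statistic, which is a convenient sanity check.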
Test
As a simple test to validate our implementation, we used a simulated data set composed of several Gaussian components. The initial test is presented in the video below. This simple test shows that we are able to distinguish when components in our streaming mixture model match those of our global mixture model.
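A rough sketch of how the two tests could be combined to match a batch against the components of a global mixture, assuming a component "matches" when neither its mean nor its covariance is rejected at level `alpha`. The function names, the two-component mixture, and the decision rule (including the absence of any multiple-comparison correction) are hypothetical and not necessarily what produced the video.

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular
from scipy.stats import chi2, f as f_dist

def covariance_pvalue(x, sigma0):
    """Whiten x with the Cholesky factor of sigma0, then apply the W test of cov = I."""
    n, d = x.shape
    L = cholesky(sigma0, lower=True)
    y = solve_triangular(L, x.T, lower=True).T
    S = np.atleast_2d(np.cov(y, rowvar=False))
    A = S - np.eye(d)
    W = np.trace(A @ A) / d - (d / n) * (np.trace(S) / d) ** 2 + d / n
    return chi2.sf(n * d * W / 2.0, d * (d + 1) / 2)

def mean_pvalue(x, mu0):
    """One-sample Hotelling T^2 test of E[x] = mu0."""
    n, d = x.shape
    diff = x.mean(axis=0) - mu0
    S = np.atleast_2d(np.cov(x, rowvar=False))
    t2 = n * diff @ np.linalg.solve(S, diff)
    return f_dist.sf((n - d) / (d * (n - 1)) * t2, d, n - d)

def matching_components(batch, means, covs, alpha=0.05):
    """Indices of components whose mean and covariance are both not rejected."""
    return [k for k, (mu, cov) in enumerate(zip(means, covs))
            if mean_pvalue(batch, mu) > alpha and covariance_pvalue(batch, cov) > alpha]

# Hypothetical "global" mixture with two well-separated components.
means = [np.array([0.0, 0.0]), np.array([5.0, 5.0])]
covs = [np.eye(2), np.array([[2.0, 0.8], [0.8, 1.0]])]

rng = np.random.default_rng(1)
batch = rng.multivariate_normal(means[1], covs[1], size=300)  # drawn from component 1
print(matching_components(batch, means, covs))
```

Because each test is run at level `alpha`, a batch drawn from a matching component will still be rejected occasionally; in a streaming setting that rate is the price of the simple rule above.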
Testing mean and covariance equivalence on a simulated data set.