Variational Inference Characterization
To move our robust optimization approach from a batch estimation technique to one that can run incrementally, we need a fast, efficient clustering technique. One candidate is variational inference, an approximate inference methodology. Specifically, we will use variational inference with a Dirichlet prior over the categorical distribution.
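For reference, a common form of this setup is the Bayesian Gaussian mixture sketched below. The exact model is not stated here, so the Gaussian components, the symmetric prior, and the symbols $\alpha_0$, $\pi$, $z_n$, $r_{nk}$, $\mu_k$, and $\Sigma_k$ are assumptions made for illustration:

$$
\pi \sim \mathrm{Dir}(\alpha_0, \dots, \alpha_0), \qquad z_n \mid \pi \sim \mathrm{Cat}(\pi), \qquad x_n \mid z_n = k \sim \mathcal{N}(\mu_k, \Sigma_k)
$$

Under a mean-field factorization that separates the assignments $z_n$ from the remaining parameters, conjugacy keeps the variational posterior over the weights in the Dirichlet family:

$$
q(\pi) = \mathrm{Dir}(\alpha_1, \dots, \alpha_K), \qquad \alpha_k = \alpha_0 + \sum_{n=1}^{N} r_{nk}, \qquad r_{nk} = q(z_n = k)
$$

Each update is a closed-form pass over the data, which is what makes this family of methods attractive for incremental use.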
We have implemented the Dirichlet variational Bayes clustering approach, and would like to begin quantifying its ability to accurately characterize the provided data. As an initial test, we will run our algorithm over a unimodal Gaussian data-set (i.e., $X \sim \mathcal{N}(10,2)$), where we are interested in the accuracy and run-time of our estimator as the cardinality of the data-set is varied, as provided in Fig. 1.
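A minimal sketch of how such a sweep could be organized is given below. It is not the implementation used here: `fit_stand_in` (sample mean and sample variance) is a hypothetical placeholder for the variational Bayes estimator, the function and field names are invented for illustration, and the second parameter of $\mathcal{N}(10,2)$ is assumed to be the variance.

```python
import random
import statistics
import time


def fit_stand_in(data):
    """Hypothetical placeholder for the VB estimator: sample mean and variance."""
    return statistics.fmean(data), statistics.variance(data)


def sweep(sizes, fit=fit_stand_in, true_mean=10.0, true_var=2.0, seed=0):
    """Time `fit` and record its absolute errors at each data-set size."""
    rng = random.Random(seed)
    sigma = true_var ** 0.5  # assumes N(10, 2) is parameterized by variance
    rows = []
    for n in sizes:
        data = [rng.gauss(true_mean, sigma) for _ in range(n)]
        start = time.perf_counter()
        mean_hat, var_hat = fit(data)
        elapsed = time.perf_counter() - start
        rows.append({
            "n": n,
            "seconds": elapsed,
            "mean_err": abs(mean_hat - true_mean),
            "var_err": abs(var_hat - true_var),
        })
    return rows


if __name__ == "__main__":
    for row in sweep([100, 1_000, 10_000]):
        print(row)
```

Passing the estimator in as `fit` keeps the harness independent of the clustering code, so the same sweep can be rerun against a different estimator later.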
From Fig. 1, we can see that the run-time of the variational Bayes clustering seems to increase exponentially with the cardinality of the data-set. Additionally, we can note that our mean estimate seems to converge rather quickly to a steady-state accuracy level. However, what is disconcerting, specifically for our problem, is that an accurate characterization of the covariance matrix seems to require a rather large data-set.
Fig 1 :: Initial characterization of Variational Bayes clustering on unimodal Gaussian data.
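Whether run-time growth is truly exponential or only polynomial is hard to judge by eye. One possible check, sketched below, is to fit a line to log(run-time) against log(cardinality): a roughly constant slope indicates power-law growth of that order, whereas exponential growth produces a slope that keeps rising as the data-set grows. The function name is hypothetical and the timings in the usage example are synthetic, generated from a known quadratic, not measurements from this experiment.

```python
import math


def loglog_slope(sizes, times):
    """Least-squares slope of log(time) against log(size)."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(t) for t in times]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den


if __name__ == "__main__":
    # Synthetic timings from t = c * n^2, used only to exercise the function.
    sizes = [100, 200, 400, 800]
    times = [3e-6 * n ** 2 for n in sizes]
    print(loglog_slope(sizes, times))  # close to 2 for quadratic growth
```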
One way that we may be able to improve our covariance characterization is through the use of Linear Response Variational Bayes. However, this approach assumes that the mean estimate provided by the estimator is accurate. To check whether this holds for our estimator, we will generate data from the same distribution as above; this time, however, we will run 100 tests at each data-set cardinality. This will provide us with a rough estimate of the variance of our mean estimation as a function of data-set cardinality, as provided in Fig. 2.
Fig 2 :: Repeatability of Variational Bayes mean estimation as a function of data-set size.
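The repeatability test can be sketched as follows. This is an illustration under stated assumptions, not the code behind Fig. 2: the default `estimate_mean` is the plain sample mean standing in for the variational Bayes mean estimate, the function name is hypothetical, and $\mathcal{N}(10,2)$ is again assumed to be parameterized by variance.

```python
import random
import statistics


def mean_estimate_spread(sizes, n_trials=100, estimate_mean=statistics.fmean,
                         true_mean=10.0, true_var=2.0, seed=0):
    """Variance of the mean estimate across repeated trials, per data-set size."""
    rng = random.Random(seed)
    sigma = true_var ** 0.5  # assumes N(10, 2) is parameterized by variance
    spread = {}
    for n in sizes:
        estimates = []
        for _ in range(n_trials):
            data = [rng.gauss(true_mean, sigma) for _ in range(n)]
            estimates.append(estimate_mean(data))
        # Sample variance of the n_trials mean estimates at this size.
        spread[n] = statistics.variance(estimates)
    return spread


if __name__ == "__main__":
    for n, var in mean_estimate_spread([10, 100, 1_000]).items():
        print(n, var)
```

For the sample mean, the variance of the estimate is $\sigma^2 / n$, which gives a baseline to compare the variational Bayes estimator against.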
Next Steps
- Test VDB against collapsed Gibbs sampling.
- Implement Linear Response Variational Bayes (LRVB).