Density Estimation
As has been shown over the previous days, the M-estimator is fairly sensitive to both the user-specified kernel width and the dataset. So, today we shift our focus to methods that can estimate the underlying data distribution as a first step toward an adaptive estimator.
Density Estimation
Assume that we have $n$ observations which are realizations of univariate random variables,

$$x_1, x_2, \ldots, x_n \overset{\text{iid}}{\sim} F,$$

where $F$ is the cumulative distribution function. The goal of this field is to estimate the density,

$$f(x) = \frac{d}{dx} F(x).$$
Gaussian Mixture Model
One method of density estimation is the Gaussian Mixture Model (GMM). This model assumes, as the name suggests, that the data points are generated by a mixture of finitely many Gaussian distributions with unknown parameters, as shown below,

$$p(x) = \sum_{k=1}^{K} \omega_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k), \qquad \sum_{k=1}^{K} \omega_k = 1.$$
One method to fit a GMM is to utilize the Expectation-Maximization (EM) algorithm, which iterates between expectation and maximization steps to find the parameters. The expectation step holds the current estimate of the Gaussian parameters and weights fixed in order to compute the posterior distribution of the latent indicator variables $z_{ik}$ (the responsibilities). Next, the maximization step utilizes these responsibilities to find the new parameters by maximizing the expected complete-data log-likelihood shown below,

$$Q(\Omega, \Theta) = \mathbb{E}_Z\big[\log p(X, Z \mid \Omega, \Theta)\big] = \sum_{i=1}^{n} \sum_{k=1}^{K} \mathbb{E}[z_{ik}] \left( \log \omega_k + \log \mathcal{N}(x_i \mid \mu_k, \Sigma_k) \right),$$

where $\Omega$ is the set of mixture weights, $\Theta$ is the set of Gaussian parameters, and $z_{ik}$ is an indicator variable which takes the value one if observation $i$ was generated by component $k$ and zero otherwise.
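To make the two steps concrete, the following is a minimal sketch of EM for a univariate Gaussian mixture in plain NumPy/SciPy. The function name, initialization scheme, and iteration count are illustrative assumptions, not part of the original write-up.

```python
import numpy as np
from scipy.stats import norm

def em_gmm_1d(x, k, n_iter=100, seed=0):
    """Minimal EM for a univariate K-component Gaussian mixture (illustrative)."""
    rng = np.random.default_rng(seed)
    # Crude initialization: means drawn from the data, shared variance, uniform weights.
    mu = rng.choice(x, size=k, replace=False)
    var = np.full(k, x.var())
    w = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: hold the current (w, mu, var) fixed and compute the posterior
        # responsibility of each component for each observation.
        dens = w * norm.pdf(x[:, None], mu, np.sqrt(var))  # shape (n, k)
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means, and variances from the
        # responsibilities, maximizing the expected complete-data log-likelihood.
        nk = resp.sum(axis=0)
        w = nk / len(x)
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk
    return w, mu, var

# Example: recover two well-separated components.
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(0.0, 1.0, 500), rng.normal(5.0, 1.0, 500)])
print(em_gmm_1d(x, k=2))
```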
One notable disadvantage of GMM is,
When one has insufficiently many points per mixture, estimating the covariance matrices becomes difficult, and the algorithm is known to diverge and find solutions with infinite likelihood unless one regularizes the covariances artificially,
which is significant when utilizing GMM for data rejection, since the outlier distribution is expected to contain substantially fewer data points than the inlier distribution.
Bayesian Gaussian Mixture With Dirichlet Process
An alternative that mitigates this issue is the Bayesian Gaussian Mixture Model with a Dirichlet process prior (BGMM). Instead of committing to a fixed number of components, the Dirichlet process places a prior over the mixture weights that lets the inference drive the weights of unneeded components toward zero, which makes the fit considerably more robust when the components are unevenly sampled.
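In scikit-learn this model is available as BayesianGaussianMixture; below is a minimal sketch of one plausible configuration, where the component cap and concentration value are illustrative choices rather than the settings used here.

```python
from sklearn.mixture import BayesianGaussianMixture

# With a Dirichlet-process weight prior, n_components is only an upper bound;
# variational inference shrinks the weights of unneeded components toward zero
# rather than force-fitting all of them.
bgmm = BayesianGaussianMixture(
    n_components=10,  # upper bound on active components
    weight_concentration_prior_type="dirichlet_process",
    weight_concentration_prior=0.01,  # smaller values favor fewer active components
    max_iter=500,
)
# bgmm.fit(X)  # X: array of shape (n_samples, n_features)
```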
Testing
To begin testing, a simple dataset was generated from two independent Gaussian distributions. This sample dataset is depicted in Fig. 1.
Fig 1 :: Generated dataset
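For reference, a dataset of this shape can be generated along the following lines. The means, standard deviations, and the 5000-sample inlier size are assumptions inferred from the sample counts quoted below, not the exact values used for Fig. 1.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n_inlier=5000, n_outlier=5000):
    """Two independent univariate Gaussians: a clean inlier cluster and a
    faulty outlier cluster. All parameters are illustrative."""
    inliers = rng.normal(loc=0.0, scale=1.0, size=n_inlier)
    outliers = rng.normal(loc=6.0, scale=1.5, size=n_outlier)
    return np.concatenate([inliers, outliers])

x = make_data()  # even sampling, as in Fig. 2
```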
Using this dataset, the Gaussian Mixture Model (GMM) and the Bayesian Gaussian Mixture Model with a Dirichlet Process (BGMM) were tested. The result when both the inlier and outlier distributions are sampled evenly is shown in the figure below. When each cluster has the same number of samples, both models perform well.
Fig 2 :: Testing GMM and BGMM when ratio of faulty to clean observables is 1
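The comparison itself amounts to fitting both models to the same samples. A sketch, reusing the make_data helper from above and assuming the scikit-learn estimators with near-default settings:

```python
from sklearn.mixture import BayesianGaussianMixture, GaussianMixture

X = make_data().reshape(-1, 1)  # sklearn expects shape (n_samples, n_features)

gmm = GaussianMixture(n_components=2).fit(X)
bgmm = BayesianGaussianMixture(
    n_components=10, weight_concentration_prior_type="dirichlet_process"
).fit(X)

# With an even split, both models should place roughly half the mixture
# weight on each cluster; the BGMM's surplus components shrink toward zero.
print("GMM weights :", gmm.weights_.round(3))
print("BGMM weights:", bgmm.weights_.round(3))
```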
As noted above, GMM can perform poorly when insufficiently many points are included in each mixture. To test this, the outlier distribution was progressively down-sampled to see when GMM begins to fail; the next three plots show the results.
Fig 3 :: Testing GMM and BGMM when ratio of faulty to clean observables is 0.1
When the ratio of faulty to clean observables is down-sampled to 0.01 (i.e., 5000 inlier samples and 50 outlier samples), GMM begins to have difficulty discerning between the distributions, while the BGMM model still accurately represents both.
Fig 4 :: Testing GMM and BGMM when ratio of faulty to clean observables is 0.01
When the ratio of faulty to clean observables is down-sampled to 0.005, it can be seen that GMM provides an extremely poor estimate of the densities, while BGMM still accurately represents the data.
Fig 5 :: Testing GMM and BGMM when ratio of faulty to clean observables is 0.005
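The whole down-sampling experiment can be reproduced with a short sweep, again assuming the illustrative make_data generator and the model settings sketched above:

```python
from sklearn.mixture import BayesianGaussianMixture, GaussianMixture

for ratio in [1.0, 0.1, 0.01, 0.005]:
    # make_data is the illustrative generator from the start of the Testing section.
    X = make_data(n_inlier=5000, n_outlier=int(5000 * ratio)).reshape(-1, 1)
    gmm = GaussianMixture(n_components=2).fit(X)
    bgmm = BayesianGaussianMixture(
        n_components=10, weight_concentration_prior_type="dirichlet_process"
    ).fit(X)
    print(f"ratio={ratio}:",
          "GMM means:", gmm.means_.ravel().round(2),
          "| BGMM active weights:", bgmm.weights_[bgmm.weights_ > 0.01].round(3))
```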