Ref: Barfoot, Timothy D. State Estimation for Robotics. Cambridge University Press, 2017. Section 5.3

Deriving the Cauchy M-Estimator for MAP Estimation

We would like to derive a common M-estimator (maximum likelihood-type estimator), the Cauchy cost function, from the point of view of covariance estimation. Specifically, we will show that it arises from maximizing the joint a posteriori distribution over the state and the measurement covariances.

We will begin with the traditional MAP estimation cost function, written over $N$ independent residuals:
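$$
J(x) = \frac{1}{2} \sum_{n=1}^{N} r_n(x)^\top \Sigma_n^{-1} r_n(x)
$$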

where $r_n(x)$ is the $n$-th residual, a function of the state $x$, and $\Sigma_n$ is the corresponding measurement covariance. This provides us with the traditional optimization problem, as provided below.
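$$
\hat{x} = \arg\min_{x} \; \frac{1}{2} \sum_{n=1}^{N} r_n(x)^\top \Sigma_n^{-1} r_n(x)
$$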

In the optimization problem provided above, it is assumed that the covariances are given a priori. However, this is not always a valid assumption, so it is desirable to estimate both the state vector and the covariances concurrently. To do so, writing $z_{1:N}$ for the stacked measurements, we can augment our optimization problem as
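$$
\left\{\hat{x}, \hat{\Sigma}_{1:N}\right\} = \arg\max_{x,\, \Sigma_{1:N}} \; p\left(x, \Sigma_{1:N} \mid z_{1:N}\right)
$$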

which can be factorized as
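$$
p\left(x, \Sigma_{1:N} \mid z_{1:N}\right) \propto p(x) \prod_{n=1}^{N} p\left(z_n \mid x, \Sigma_n\right) p\left(\Sigma_n\right)
$$

where we assume the measurements are conditionally independent, each likelihood is Gaussian with $p(z_n \mid x, \Sigma_n) = \mathcal{N}\left(r_n(x);\, 0, \Sigma_n\right)$, and the prior $p(x)$ is taken as uniform so that it drops out of the objective.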

Now, we need to provide a prior on our covariance matrices. A commonly utilized prior, because it is the conjugate prior for the covariance matrix of a Gaussian likelihood (its support being the symmetric positive-definite matrices), is the Inverse Wishart distribution, which is defined as
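$$
\mathcal{IW}\left(\Sigma_n; \Psi_n, \nu_n\right) = \frac{\left|\Psi_n\right|^{\nu_n/2}}{2^{\nu_n d/2}\, \Gamma_d\!\left(\nu_n/2\right)} \left|\Sigma_n\right|^{-(\nu_n + d + 1)/2} \exp\!\left(-\frac{1}{2} \operatorname{tr}\!\left(\Psi_n \Sigma_n^{-1}\right)\right)
$$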

where $\nu_n > d - 1$ is the degrees-of-freedom parameter, $\Psi_n$ is a symmetric positive-definite scale matrix, $d$ is the dimension of the residual (and of $\Sigma_n$), and $\Gamma_d(\cdot)$ is the multivariate gamma function.

If we plug the Inverse Wishart prior into the factorized objective function and take the negative logarithm, dropping terms independent of $x$ and $\Sigma_n$, we are left with
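$$
J\left(x, \Sigma_{1:N}\right) = \frac{1}{2} \sum_{n=1}^{N} \left[ r_n(x)^\top \Sigma_n^{-1} r_n(x) + \left(\nu_n + d + 2\right) \ln\left|\Sigma_n\right| + \operatorname{tr}\!\left(\Psi_n \Sigma_n^{-1}\right) \right]
$$

where the $\ln|\Sigma_n|$ coefficient collects the $\tfrac{1}{2}\ln|\Sigma_n|$ from the Gaussian normalization and the $\tfrac{\nu_n + d + 1}{2}\ln|\Sigma_n|$ from the prior.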

Now, to find the optimal covariance estimate, we can set the partial derivative of our objective function w.r.t. $\Sigma_n^{-1}$ equal to zero, as provided below.
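$$
\frac{\partial J}{\partial \Sigma_n^{-1}} = \frac{1}{2} \left[ r_n(x)\, r_n(x)^\top - \left(\nu_n + d + 2\right) \Sigma_n + \Psi_n \right]
$$

using $\ln|\Sigma_n| = -\ln|\Sigma_n^{-1}|$ together with the standard identities $\frac{\partial}{\partial \Lambda} a^\top \Lambda\, a = a a^\top$, $\frac{\partial}{\partial \Lambda} \ln|\Lambda| = \Lambda^{-1}$ (for symmetric $\Lambda$), and $\frac{\partial}{\partial \Lambda} \operatorname{tr}(\Psi \Lambda) = \Psi$.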

Setting the expression provided above equal to zero, we are left with the optimal covariance estimate.
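$$
\hat{\Sigma}_n = \frac{\Psi_n + r_n(x)\, r_n(x)^\top}{\nu_n + d + 2}
$$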

Finally, if we plug our expression for the optimal $\hat{\Sigma}_n$ back into the objective function, we are left with,
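$$
J(x) = \sum_{n=1}^{N} \frac{\alpha}{2} \ln\left(1 + r_n(x)^\top \Psi_n^{-1} r_n(x)\right)
$$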

where $\alpha = \nu_n + d + 2$. This is the same objective function as specified by the Cauchy M-estimator, with $\Psi_n$ playing the role of the fixed scale matrix. (In the substitution, the Sherman–Morrison identity collapses the quadratic and trace terms into an $x$-independent constant, and the matrix determinant lemma reduces $\ln|\hat{\Sigma}_n|$ to $\ln\left(1 + r_n(x)^\top \Psi_n^{-1} r_n(x)\right)$ plus a constant.)
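As a quick numerical sanity check, the NumPy sketch below (not from the reference; the values of $d$, $\nu_n$, and $\Psi_n$ are made up) verifies that the closed-form $\hat{\Sigma}_n$ minimizes the per-residual objective and that, after substitution, the objective matches the Cauchy cost up to a constant independent of the residual.

```python
import numpy as np

rng = np.random.default_rng(0)
d, nu = 3, 5.0            # residual dimension and IW degrees of freedom (made up)
alpha = nu + d + 2.0

A = rng.standard_normal((d, d))
Psi = A @ A.T + d * np.eye(d)   # a symmetric positive-definite scale matrix

def J(r, Sigma):
    """Per-residual negative log posterior, constants dropped."""
    Sinv = np.linalg.inv(Sigma)
    _, logdet = np.linalg.slogdet(Sigma)
    return 0.5 * (r @ Sinv @ r + alpha * logdet + np.trace(Psi @ Sinv))

def cauchy(r):
    """Cauchy-type cost obtained after substituting the optimal covariance."""
    return 0.5 * alpha * np.log(1.0 + r @ np.linalg.inv(Psi) @ r)

offsets = []
for _ in range(5):
    r = rng.standard_normal(d)
    Sigma_hat = (Psi + np.outer(r, r)) / alpha   # closed-form optimum

    # A small symmetric perturbation should never decrease the objective,
    # since J is strictly convex in Sigma^{-1} over the positive-definite cone.
    P = 0.05 * rng.standard_normal((d, d))
    assert J(r, Sigma_hat) <= J(r, Sigma_hat + 0.5 * (P + P.T)) + 1e-12

    # J at the optimum should equal the Cauchy cost plus an r-independent constant.
    offsets.append(J(r, Sigma_hat) - cauchy(r))

assert np.allclose(offsets, offsets[0])
print("covariance-optimum and Cauchy-equivalence checks passed")
```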