References:

1) Alvarez, Ignacio, Jarad Niemi, and Matt Simpson. “Bayesian inference for a covariance matrix.” arXiv preprint arXiv:1408.4050 (2014).

Problem setting

In Bayesian inference, the central task is computing the posterior distribution. This is generally difficult, if not intractable, because it requires the marginal likelihood (i.e., the denominator of Bayes' theorem, shown below).

$$p(\theta|x) = \frac{p(x|\theta)\,p(\theta)}{\int p(x|\theta')\,p(\theta')\,d\theta'}$$

To make this calculation tractable (in fact, analytical), we can use conjugate priors.

Definition: If $\mathcal{A}$ is a family of distributions for the likelihood $p(x|\theta)$ and $\mathcal{B}$ is a family of prior distributions for $p(\theta)$, then $\mathcal{B}$ is conjugate to $\mathcal{A}$ if the posterior $p(\theta|x) \propto p(x|\theta)p(\theta)$ satisfies $p(\theta|x) \in \mathcal{B} \quad \forall \quad p(x|\theta) \in \mathcal{A}, p(\theta) \in \mathcal{B}$.

Conjugate Prior for P.D. matrices

For this work, we are interested in estimating covariance matrices. Below, we review several conjugate priors for positive definite (P.D.) matrices.

The Inverse Wishart Prior

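The inverse Wishart (IW) density is defined as

$$p(\Sigma) \propto |\Sigma|^{-(\nu + d + 1)/2} \exp\left(-\frac{1}{2}\mathrm{tr}\left(\Lambda \Sigma^{-1}\right)\right)$$

where $\Sigma$ is the $d \times d$ covariance matrix, $\Lambda$ is a P.D. $d$-dimensional scale matrix, and $\nu$ is the degrees of freedom. Under this prior, each variance term marginally follows an inverse chi-square distribution.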
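Conjugacy is what makes the IW convenient in practice. As a minimal sketch, assume zero-mean Gaussian data $x_i \sim \mathcal{N}(0, \Sigma)$ with prior $\Sigma \sim \mathrm{IW}(\nu, \Lambda)$; the posterior is then $\mathrm{IW}(\nu + n, \Lambda + \sum_i x_i x_i^T)$. The function name `iw_posterior`, the zero-mean likelihood, and the simulated data are assumptions made for illustration only.

```python
import numpy as np

def iw_posterior(X, nu, Lambda):
    """Conjugate update for Sigma ~ IW(nu, Lambda) when rows of X ~ N(0, Sigma).

    Returns the posterior degrees of freedom and posterior scale matrix.
    (Hypothetical helper; assumes a known zero mean.)
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    S = X.T @ X                      # scatter matrix: sum_i x_i x_i^T
    return nu + n, Lambda + S

# Simulated data from an assumed "true" covariance.
rng = np.random.default_rng(0)
d = 2
true_sigma = np.array([[1.0, 0.5], [0.5, 2.0]])
X = rng.multivariate_normal(np.zeros(d), true_sigma, size=2000)

nu_post, Lambda_post = iw_posterior(X, nu=d + 1, Lambda=np.eye(d))

# The mean of an IW(nu, Lambda) is Lambda / (nu - d - 1), defined for nu > d + 1.
posterior_mean = Lambda_post / (nu_post - d - 1)
```

No integration is needed: the update only adds the sample size to $\nu$ and the scatter matrix to $\Lambda$.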

This prior has two major issues:

  • Uncertainty for all variance parameters is controlled by a single hyperparameter ($\nu$), so there is no way to include prior information on individual variance components.

  • The variance and correlation terms are dependent: large variances are associated with correlations near $\pm 1$, while small variances are associated with correlations near zero.
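The second issue can be checked by simulation. The sketch below draws from $\mathrm{IW}(d + 1, I)$ and compares the average absolute correlation among draws with a large first variance against draws with a small one. The helper `sample_inverse_wishart` is a hypothetical name, and it assumes an integer $\nu \geq d$ so that the draw can be built from Gaussian samples.

```python
import numpy as np

def sample_inverse_wishart(nu, Lambda, rng):
    """Draw Sigma ~ IW(nu, Lambda), assuming integer nu >= d.

    Uses Sigma = A W0^{-1} A^T with Lambda = A A^T and W0 ~ Wishart(nu, I).
    """
    d = Lambda.shape[0]
    A = np.linalg.cholesky(Lambda)
    Z = rng.standard_normal((nu, d))      # Z^T Z ~ Wishart(nu, I)
    return A @ np.linalg.inv(Z.T @ Z) @ A.T

rng = np.random.default_rng(1)
d, n_draws = 2, 5000
draws = np.array([sample_inverse_wishart(d + 1, np.eye(d), rng)
                  for _ in range(n_draws)])

var1 = draws[:, 0, 0]                                   # first variance term
abs_rho = np.abs(draws[:, 0, 1]) / np.sqrt(draws[:, 0, 0] * draws[:, 1, 1])

# Split the draws at the median of the first variance.
large = var1 > np.median(var1)
mean_abs_rho_large = abs_rho[large].mean()
mean_abs_rho_small = abs_rho[~large].mean()
print(mean_abs_rho_large, mean_abs_rho_small)
```

If the variance and correlation were independent a priori, the two averages would be roughly equal; under the IW the large-variance group shows the higher average.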

The Scaled Inverse Wishart Prior

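For the scaled inverse Wishart (SIW), we define the covariance as $\Sigma := \Delta Q \Delta$, where $\Delta$ is a diagonal matrix with $\Delta_{ii} = \delta_i$. The densities for $Q$ and $\Delta$ are defined below.

$$Q \sim \mathrm{IW}(\nu, \Lambda), \qquad \log(\delta_i) \overset{ind}{\sim} \mathcal{N}(b_i, \xi_i^2)$$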

This prior is recommended over the IW because prior information can be incorporated about the individual standard deviation components.
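A draw from the SIW can be sketched in a few lines, assuming the specification in Alvarez et al.: $Q \sim \mathrm{IW}(\nu, \Lambda)$ and $\log(\delta_i) \sim \mathcal{N}(b_i, \xi_i^2)$ independently. The function names and the hyperparameter values below are hypothetical, and the IW helper assumes an integer $\nu \geq d$.

```python
import numpy as np

def sample_inverse_wishart(nu, Lambda, rng):
    """Draw from IW(nu, Lambda), assuming integer nu >= d."""
    d = Lambda.shape[0]
    A = np.linalg.cholesky(Lambda)        # Lambda = A A^T
    Z = rng.standard_normal((nu, d))      # Z^T Z ~ Wishart(nu, I)
    return A @ np.linalg.inv(Z.T @ Z) @ A.T

def sample_siw(nu, Lambda, b, xi, rng):
    """Draw Sigma = Delta Q Delta with Q ~ IW(nu, Lambda), log(delta_i) ~ N(b_i, xi_i^2)."""
    Q = sample_inverse_wishart(nu, Lambda, rng)
    delta = np.exp(rng.normal(b, xi))     # independent log-normal scale factors
    Sigma = delta[:, None] * Q * delta[None, :]   # same as diag(delta) @ Q @ diag(delta)
    return Sigma, Q, delta

def to_correlation(M):
    """Convert a covariance matrix to its correlation matrix."""
    s = np.sqrt(np.diag(M))
    return M / np.outer(s, s)

rng = np.random.default_rng(2)
d = 3
Sigma, Q, delta = sample_siw(nu=d + 2, Lambda=np.eye(d),
                             b=np.zeros(d), xi=np.ones(d), rng=rng)
```

Because $\Delta$ is diagonal, $\Sigma$ and $Q$ share the same correlation matrix; the $\delta_i$ only rescale the standard deviations, which is where component-specific prior information enters.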

Hierarchical Half-t Prior

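For the hierarchical half-t prior, we define the density as

$$\Sigma \sim \mathrm{IW}(\nu + d - 1, 2\nu\Lambda)$$

where $\Lambda$ is a diagonal matrix with $\Lambda_{i,i} = \lambda_i$ such that

$$\lambda_i \overset{ind}{\sim} \mathrm{Ga}\left(\frac{1}{2}, \frac{1}{\xi_i^2}\right)$$

Here $\mathrm{Ga}(a, b)$ denotes a gamma distribution with shape $a$ and rate $b$.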
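The two-level structure can be sampled directly. The sketch below assumes the specification $\Sigma \sim \mathrm{IW}(\nu + d - 1, 2\nu\Lambda)$ with $\lambda_i \sim \mathrm{Ga}(1/2, 1/\xi_i^2)$ in the shape/rate parameterization; under that assumption each standard deviation is marginally half-t with $\nu$ degrees of freedom and scale $\xi_i$. Function names and hyperparameter values are hypothetical, and the IW helper assumes an integer $\nu$.

```python
import numpy as np

def sample_inverse_wishart(nu, Lambda, rng):
    """Draw from IW(nu, Lambda), assuming integer nu >= d."""
    d = Lambda.shape[0]
    A = np.linalg.cholesky(Lambda)        # Lambda = A A^T
    Z = rng.standard_normal((nu, d))      # Z^T Z ~ Wishart(nu, I)
    return A @ np.linalg.inv(Z.T @ Z) @ A.T

def sample_half_t_prior(nu, xi, rng):
    """Draw Sigma from the hierarchical half-t prior (assumed form, see lead-in)."""
    xi = np.asarray(xi, dtype=float)
    d = xi.size
    # NumPy's gamma takes shape and scale, so rate 1/xi^2 becomes scale xi^2.
    lam = rng.gamma(shape=0.5, scale=xi ** 2)
    return sample_inverse_wishart(nu + d - 1, 2.0 * nu * np.diag(lam), rng)

rng = np.random.default_rng(3)
nu, xi = 2, np.array([1.0, 2.0])
draws = np.array([sample_half_t_prior(nu, xi, rng) for _ in range(5000)])

# Standard deviations sigma_i = sqrt(Sigma_ii) for each draw.
sds = np.sqrt(np.diagonal(draws, axis1=1, axis2=2))
sd_medians = np.median(sds, axis=0)
```

For $\nu = 2$, the median of a half-t with scale $\xi_i$ is $\xi_i\sqrt{2/3}$, which gives a quick sanity check on `sd_medians`.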

To Do:

  • Test each of the priors defined above in our collapsed Gibbs sampling implementation to see their effect.
  • Look into separation strategy methods (i.e., model the standard deviations and correlation coefficients independently and then combine them to form a prior).