Testing Covariance Matrix Priors
Problem setting
When utilizing Bayesian inference, a core interest is the calculation of the posterior distribution. This calculation is generally difficult, if not intractable, because it requires evaluating the marginal likelihood (i.e., the denominator of Bayes' theorem, as depicted below).
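$$p(\theta \mid X) = \frac{p(X \mid \theta)\, p(\theta)}{p(X)}, \qquad p(X) = \int p(X \mid \theta)\, p(\theta)\, d\theta$$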
To make this calculation tractable (in fact, analytical), we need to select our priors intelligently.
The Inverse Wishart Prior
The inverse Wishart (IW) density is defined as
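$$p(\Sigma \mid \Lambda, \nu) = \frac{|\Lambda|^{\nu/2}}{2^{\nu d/2}\, \Gamma_d(\nu/2)}\, |\Sigma|^{-(\nu + d + 1)/2} \exp\!\left(-\frac{1}{2}\operatorname{tr}\!\left(\Lambda \Sigma^{-1}\right)\right)$$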
where $\Lambda$ is a positive-definite $d \times d$ scale matrix and $\nu$ is the degrees of freedom. Under this prior, each variance term marginally follows a scaled inverse chi-square distribution. The distribution can easily be sampled using the Bartlett decomposition, as presented below.
""" Bartlett decomposition based sampling """
def wishartrand(self):
dim = self.inv_psi.shape[0]
chol = np.linalg.cholesky(self.inv_psi)
foo = np.zeros((dim, dim))
for i in range(dim):
for j in range(i + 1):
if i == j:
foo[i, j] = np.sqrt(chi2.rvs(self.nu - (i + 1) + 1))
else:
foo[i, j] = np.random.normal(0, 1)
return np.dot(chol, np.dot(foo, np.dot(foo.T, chol.T)))
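The routine above returns a draw from the Wishart distribution; a draw from the inverse Wishart prior follows from the fact that $\Sigma \sim \mathrm{IW}(\Lambda, \nu)$ if and only if $\Sigma^{-1} \sim \mathrm{Wishart}(\Lambda^{-1}, \nu)$. A minimal helper (the name invwishartrand is ours) is sketched below.

def invwishartrand(nu, psi):
    # Sigma ~ IW(psi, nu)  <=>  Sigma^{-1} ~ Wishart(psi^{-1}, nu)
    return np.linalg.inv(wishartrand(nu, np.linalg.inv(psi)))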
Utilizing the implementation presented above, we can sample from the prior distribution. For this evaluation, we drew 10,000 samples from the inverse Wishart prior with hyperparameters $\nu = d + 1$ and $\Lambda = I$. The figure shows the first and second variance terms, with color corresponding to the magnitude of the correlation between the two dimensions. From this figure, we can see a clear dependence between the variance and correlation terms.
Fig 1 :: Sampling from Inverse Wishart Prior
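As a rough sketch (not the original plotting code), the experiment behind Fig. 1 can be reproduced with the invwishartrand helper above; matplotlib is assumed for the scatter plot.

d = 2
samples = [invwishartrand(d + 1, np.eye(d)) for _ in range(10000)]
var1 = np.array([s[0, 0] for s in samples])  # first variance term
var2 = np.array([s[1, 1] for s in samples])  # second variance term
corr = np.array([s[0, 1] / np.sqrt(s[0, 0] * s[1, 1]) for s in samples])
# plt.scatter(var1, var2, c=np.abs(corr)) then reproduces the figure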
The Scaled Inverse Wishart Prior
One possible covariance prior that may allow us to overcome the dependency between the variance and correlation terms is the scaled inverse Wishart (SIW). This prior has the added benefit of allowing prior information to be incorporated about the individual standard deviation components.
We can define the scaled inverse Wishart as $\Sigma := \Delta Q \Delta$, where $\Delta_{ii} = \delta_i$. The densities for $Q$ and $\Delta$ are defined below.
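Following the standard SIW construction (and matching the hyperparameters $b$ and $\zeta$ used below), the components are distributed as

$$Q \sim \mathrm{IW}(\Lambda, \nu), \qquad \log \delta_i \sim \mathcal{N}(b_i, \zeta_i^2),$$

so each scale factor $\delta_i$ is log-normal.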
The scaled inverse Wishart can be sampled utilizing the code presented below.
""" Scaled Inverse Wishart """
def siw(self):
Q = invwishart.rvs(self.dim, self.scale)
D = norm.rvs(self.mean, self.var, self.dim)
siw = np.zeros(shape=(self.dim, self.dim))
for j in range(0, self.dim):
for k in range(0, self.dim):
if j is k:
siw[j, k] = np.exp(D[j]) * np.sqrt(Q[j, j])
else:
siw[j, k] = D[j] * D[k] * Q[j, k]
return siw
Again, utilizing the implementation presented above, we can sample from the prior distribution. For this evaluation, we drew 10,000 samples from the scaled inverse Wishart prior with hyperparameters $\nu = d + 1$, $\Lambda = 0.8I$, $b = 0$, and $\zeta = 1$. The figure shows the first and second variance terms, with color corresponding to the magnitude of the correlation between the two dimensions. Compared to the inverse Wishart, the dependence between the variance and correlation terms is noticeably weaker.
Fig 2 :: Sampling from Scaled Inverse Wishart Prior
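A minimal sketch of the corresponding draw, mapping the hyperparameters above onto the siw signature with $d = 2$:

d = 2
samples = [siw(dim=d, dof=d + 1, scale=0.8 * np.eye(d), b=0.0, zeta=1.0)
           for _ in range(10000)]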
Hierarchical Half-t Prior
Finally, we can test the hierarchical half-t prior, which we define as
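$$\Sigma \mid \lambda_1, \ldots, \lambda_d \sim \mathrm{IW}\!\left(2\nu\Lambda,\; \nu + d - 1\right)$$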
where $\Lambda$ is a diagonal matrix with $\Lambda_{i,i} = \lambda_i$ such that
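$$\lambda_i \sim \mathrm{Gamma}\!\left(\tfrac{1}{2},\; \zeta_i\right),$$

with $\zeta_i$ a scale parameter (this matches the sampler below, which passes the scale directly to the Gamma draw). This Gamma-inverse-Wishart hierarchy induces a half-t marginal prior on each standard deviation, which gives the prior its name.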
""" Hierarchical Half-t Prior """
def hiw(self):
n = self.dim
d = self.dof
g = self.scale
Lambda = np.diag(gamma.rvs(1.0 / 2.0, loc=0, scale=g, size=n))
return invwishart.rvs(n + d - 1, 2 * d * Lambda)
Utilizing the implementation presented above, we can sample from the prior distribution. For this evaluation, we drew 10,000 samples from the hierarchical half-t prior with hyperparameters $\nu = 2$ and $\zeta = 1.04$. The figure shows the first and second variance terms, with color corresponding to the magnitude of the correlation between the two dimensions.
Fig 3 :: Sampling from Hierarchical Half-t Prior
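A minimal sketch of the corresponding draw, using the hiw signature above with $d = 2$:

d = 2
samples = [hiw(dim=d, dof=2, scale=1.04) for _ in range(10000)]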
To Do:
- Need to test the prior models in a collapsed Gibbs sampling implementation to see their effect.
- Need to test separation-strategy methods.