PatternRecognition-C10-Approximate-Inference

Keywords: Variational Linear Regression, Variational Logistic Regression, Variational Inference, Python

These are the Chapter 10 reading notes from the book Bishop-Pattern-Recognition-and-Machine-Learning-2006. [Code_Python]

👉Simply Understand Variational Inference >>

Variational Inference

A functional, such as the entropy $H[p]$, takes a probability distribution $p(x)$ as its input and returns the quantity
$$
H[p] = -\int p(x) \ln p(x) dx
\tag{10.1}
$$
as its output.
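As a quick check of (10.1), here is a minimal Python sketch (assuming NumPy and SciPy are available) that evaluates the entropy functional numerically for a unit Gaussian and compares it against the closed form $\frac{1}{2}\ln(2\pi e \sigma^2)$:

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

# Evaluate the entropy functional H[p] = -∫ p(x) ln p(x) dx numerically
# for p(x) = N(x | 0, 1), and compare with the Gaussian closed form.
p = norm(loc=0.0, scale=1.0).pdf
H_numeric, _ = quad(lambda x: -p(x) * np.log(p(x)), -10.0, 10.0)
H_closed = 0.5 * np.log(2.0 * np.pi * np.e * 1.0**2)
print(H_numeric, H_closed)  # both ≈ 1.4189
```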

We can then introduce the concept of a functional derivative, which expresses how the value of the functional changes in response to infinitesimal changes to the input function.

Suppose we have a fully Bayesian model in which all parameters are given prior distributions.

Our probabilistic model specifies the joint distribution $p(\pmb{X},\pmb{Z})$, and our goal is to find an approximation for the posterior distribution $p(\pmb{Z}|\pmb{X})$ as well as for the model evidence $p(\pmb{X})$.

Compare

As before, we can maximize the lower bound $\mathcal{L}(q)$ by optimization with respect to the distribution $q(\pmb{Z})$, which is equivalent to minimizing the $KL$ divergence. If we allow any possible choice for $q(\pmb{Z})$, then the maximum of the lower bound occurs when the $KL$ divergence vanishes, which occurs when $q(\pmb{Z})$ equals the posterior distribution $p(\pmb{Z}|\pmb{X})$.
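For reference, the decomposition this relies on (Bishop, eqs. 10.2–10.4): the log marginal likelihood splits into the lower bound and the KL divergence,
$$
\ln p(\pmb{X}) = \mathcal{L}(q) + KL(q \| p)
\tag{10.2}
$$
$$
\mathcal{L}(q) = \int q(\pmb{Z}) \ln \left\{ \frac{p(\pmb{X}, \pmb{Z})}{q(\pmb{Z})} \right\} d\pmb{Z}
\tag{10.3}
$$
$$
KL(q \| p) = -\int q(\pmb{Z}) \ln \left\{ \frac{p(\pmb{Z}|\pmb{X})}{q(\pmb{Z})} \right\} d\pmb{Z}
\tag{10.4}
$$
Since $\ln p(\pmb{X})$ is fixed with respect to $q$, raising $\mathcal{L}(q)$ necessarily lowers the KL term.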

However, in practice, $p(\pmb{Z}|\pmb{X})$ is intractable.

We therefore consider instead a restricted family of distributions $q(\pmb{Z})$ and then seek the member of this family for which the $KL$ divergence is minimized.

Approximation

One way to restrict the family of approximating distributions is to use a parametric distribution $q(\pmb{Z}|\pmb{\omega})$ governed by a set of parameters $\pmb{\omega}$. The lower bound $\mathcal{L}(q)$ then becomes a function of $\pmb{\omega}$, and we can exploit standard nonlinear optimization techniques to determine the optimal values for the parameters.
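A minimal sketch of this parametric approach, assuming a Gaussian family $q(z|\pmb{\omega}) = \mathcal{N}(z|\mu, \sigma^2)$ and a made-up unnormalized target standing in for $\ln p(\pmb{X}, z)$; the lower bound is estimated by Monte Carlo with fixed base samples (reparameterization) and maximized with SciPy's general-purpose optimizer:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

# Fit q(z | omega) = N(mu, sigma^2) by maximizing a Monte Carlo estimate of
# the lower bound L(omega) = E_q[ln p~(z)] + H[q].

def log_p_tilde(z):
    # Hypothetical unnormalized log target: a two-component Gaussian mixture.
    return np.logaddexp(norm.logpdf(z, -2.0, 1.0), norm.logpdf(z, 2.0, 1.0))

eps = np.random.default_rng(0).normal(size=2000)  # fixed base samples

def neg_elbo(omega):
    mu, log_sigma = omega
    z = mu + np.exp(log_sigma) * eps              # reparameterized samples
    entropy = 0.5 * np.log(2 * np.pi * np.e) + log_sigma  # H[q] in closed form
    return -(log_p_tilde(z).mean() + entropy)

res = minimize(neg_elbo, x0=np.array([0.0, 0.0]), method="Nelder-Mead")
print(f"q(z) = N({res.x[0]:.2f}, {np.exp(res.x[1]):.2f}^2)")
```

Because the objective is $KL(q\|p)$, a unimodal $q$ fitted to a multimodal target tends to settle on one mode rather than averaging across them, which is the mode-seeking behavior Bishop discusses in §10.1.2.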

Factorized distributions
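Here $q$ is restricted to the factorized form $q(\pmb{Z}) = \prod_{i=1}^{M} q_i(\pmb{Z}_i)$ (Bishop, eq. 10.5), and the general solution for the optimal factors is (Bishop, eq. 10.9):
$$
\ln q_j^*(\pmb{Z}_j) = \mathbb{E}_{i \neq j}\left[\ln p(\pmb{X}, \pmb{Z})\right] + \text{const}
$$
where the expectation is taken with respect to all factors $q_i$ with $i \neq j$. Cycling through the factors and updating each in turn gives a coordinate-ascent scheme that never decreases the bound.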

Properties of factorized approximations

Example: The univariate Gaussian
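A compact sketch of the coordinate-ascent updates for this example, assuming the standard conjugate setup from §10.1.3: likelihood $\mathcal{N}(x_n|\mu, \tau^{-1})$, prior $p(\mu|\tau) = \mathcal{N}(\mu|\mu_0, (\lambda_0\tau)^{-1})$, $p(\tau) = \text{Gam}(\tau|a_0, b_0)$, and factorization $q(\mu)q(\tau)$; the data and hyperparameter values below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=0.5, size=100)   # synthetic data
N, xbar = len(x), x.mean()

mu0, lam0, a0, b0 = 0.0, 1.0, 1e-3, 1e-3       # broad priors (assumed values)

E_tau = 1.0                                    # initial guess for E[tau]
for _ in range(50):
    # q(mu) = N(mu_N, lam_N^-1)
    mu_N = (lam0 * mu0 + N * xbar) / (lam0 + N)
    lam_N = (lam0 + N) * E_tau
    E_mu, E_mu2 = mu_N, mu_N**2 + 1.0 / lam_N

    # q(tau) = Gam(a_N, b_N)
    a_N = a0 + (N + 1) / 2
    b_N = b0 + 0.5 * (np.sum(x**2) - 2 * E_mu * np.sum(x) + N * E_mu2
                      + lam0 * (E_mu2 - 2 * mu0 * E_mu + mu0**2))
    E_tau = a_N / b_N

print(f"E[mu] = {mu_N:.3f}, E[tau] = {E_tau:.3f}  (true: 2.0 and {1 / 0.5**2:.1f})")
```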

Model comparison

Illustration: Variational Mixture of Gaussians

Variational distribution

Variational lower bound

Predictive density

Determining the number of components

Induced factorizations

Variational Linear Regression

Variational distribution
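For the standard model in this section — likelihood $p(\pmb{t}|\pmb{w}) = \prod_n \mathcal{N}(t_n|\pmb{w}^T\pmb{\phi}_n, \beta^{-1})$ with known $\beta$, prior $p(\pmb{w}|\alpha) = \mathcal{N}(\pmb{w}|\pmb{0}, \alpha^{-1}\pmb{I})$, and $p(\alpha) = \text{Gam}(\alpha|a_0, b_0)$ — the factorized solution $q(\pmb{w})q(\alpha)$ comes out as (Bishop, eqs. 10.93–10.96):
$$
q^*(\alpha) = \text{Gam}(\alpha|a_N, b_N), \qquad a_N = a_0 + \frac{M}{2}, \qquad b_N = b_0 + \frac{1}{2}\mathbb{E}\left[\pmb{w}^T\pmb{w}\right]
$$
$$
q^*(\pmb{w}) = \mathcal{N}(\pmb{w}|\pmb{m}_N, \pmb{S}_N), \qquad \pmb{m}_N = \beta \pmb{S}_N \pmb{\Phi}^T \pmb{t}, \qquad \pmb{S}_N = \left(\mathbb{E}[\alpha]\pmb{I} + \beta \pmb{\Phi}^T \pmb{\Phi}\right)^{-1}
$$
The two factors are coupled through $\mathbb{E}[\alpha] = a_N/b_N$ and $\mathbb{E}[\pmb{w}^T\pmb{w}] = \pmb{m}_N^T\pmb{m}_N + \text{Tr}(\pmb{S}_N)$, so they are iterated to convergence.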

Predictive distribution

Lower bound

Exponential Family Distributions

Lower bound

Local Variational Methods
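The central tool in this section is a variational lower bound on the logistic sigmoid, with one variational parameter $\xi$ per data point (Bishop, §10.5):
$$
\sigma(x) \geq \sigma(\xi)\exp\left\{\frac{x - \xi}{2} - \lambda(\xi)\left(x^2 - \xi^2\right)\right\}, \qquad \lambda(\xi) = \frac{1}{2\xi}\left[\sigma(\xi) - \frac{1}{2}\right]
$$
The bound is exact at $x = \pm\xi$, and because the exponent is quadratic in $x$, it keeps Gaussian computations tractable.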

Variational Logistic Regression

Variational posterior distribution
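With the sigmoid bound applied to each data point, the variational posterior over the weights is Gaussian, $q(\pmb{w}) = \mathcal{N}(\pmb{w}|\pmb{m}_N, \pmb{S}_N)$, with (Bishop, eqs. 10.156–10.157):
$$
\pmb{S}_N^{-1} = \pmb{S}_0^{-1} + 2\sum_{n=1}^{N} \lambda(\xi_n)\pmb{\phi}_n\pmb{\phi}_n^T
$$
$$
\pmb{m}_N = \pmb{S}_N\left(\pmb{S}_0^{-1}\pmb{m}_0 + \sum_{n=1}^{N}\left(t_n - \frac{1}{2}\right)\pmb{\phi}_n\right)
$$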

Optimizing the variational parameters
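Maximizing the bound with respect to each $\xi_n$ (for example by EM, treating $\pmb{w}$ as a latent variable) gives the closed-form re-estimation (Bishop, eq. 10.163):
$$
(\xi_n^{\text{new}})^2 = \pmb{\phi}_n^T\left(\pmb{S}_N + \pmb{m}_N\pmb{m}_N^T\right)\pmb{\phi}_n
$$
Alternating between updating $q(\pmb{w})$ and the $\{\xi_n\}$ monotonically tightens the bound.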

Inference of hyperparameters

Expectation Propagation

Example: The clutter problem

Expectation propagation on graphs

https://zhuanlan.zhihu.com/p/59615785?utm_id=0
