Tweedie’s Formula

A simple empirical Bayesian posterior estimate

Tweedie’s formula (Efron, 2011) is a key tool in empirical Bayes for obtaining posterior estimates from observed data. Assume we have data $\mathrm{X_0}$ and a noisy measurement $\mathrm{X_1}$ that follow

\[\begin{align} \mathrm{X_1 = \alpha X_0 + \sigma Z},\label{transform} \end{align}\]

where $\mathrm{X_0 \sim p_0}$, $\mathrm{X_1 \sim p_1}$, and $\mathrm{Z \sim N(0, {I})}$.

We next study the unconditional score function $\mathrm{\nabla_{x_1} \log p_1(x_1)}$. Note that $\mathrm{p_{1|0}(x_1|x_0)=N(\alpha x_0, \sigma^2 I)}$, so $\mathrm{\nabla_{x_1} \log p_{1|0}(x_1|x_0)=\frac{\alpha x_0-x_1}{\sigma^2}}$, which gives:

\[\begin{align} \mathrm{\nabla_{x_1} \log p_1(x_1)} &= \mathrm{\frac{1}{p_1(x_1)} \nabla_{x_1} p_1(x_1)} \notag \\ &= \mathrm{\frac{1}{p_1(x_1)} \nabla_{x_1} \int p_{1|0}(x_1|x_0) p_0(x_0) d x_0} \notag \\ &= \mathrm{\frac{1}{p_1(x_1)} \int p_{1|0}(x_1|x_0) \nabla_{x_1} \log p_{1|0}(x_1|x_0) p_0(x_0) d x_0} \notag \\ &= \mathrm{\int p_{0|1}(x_0|x_1) \nabla_{x_1} \log p_{1|0}(x_1|x_0) d x_0} \notag \\ &= \mathrm{\int p_{0|1}(x_0|x_1) \frac{\alpha x_0-x_1}{\sigma^2} d x_0} \notag \\ &= \mathrm{\frac{\alpha E[x_0|x_1] - x_1}{\sigma^2}} \notag \\ \end{align}\]

The posterior mean $\mathrm{E[x_0|x_1]}$ behaves like a denoiser, as described in (Chung et al., 2023):

\[\begin{align} \mathrm{E[x_0|x_1]=\frac{1}{\alpha}\big(x_1+\sigma^2 \nabla_{x_1} \log p_1(x_1)\big)}.\label{tweedie} \end{align}\]
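As a quick sanity check, here is a small numerical sketch comparing the posterior mean computed by direct quadrature against the Tweedie estimate built from the marginal score; the 1D Gaussian-mixture $\mathrm{p_0}$ and the values of $\alpha$, $\sigma$ are arbitrary illustrative assumptions, not part of the formula.

```python
import numpy as np
from scipy.stats import norm

# Toy 1D check of Tweedie's formula: x1 = alpha * x0 + sigma * z.
# The Gaussian-mixture p0 and the values of alpha, sigma are illustrative.
alpha, sigma = 0.7, 0.5
w  = np.array([0.3, 0.7])    # mixture weights
mu = np.array([-2.0, 1.5])   # component means
s  = np.array([0.6, 0.4])    # component standard deviations

x0 = np.linspace(-8.0, 8.0, 8001)   # quadrature grid for x0
dx = x0[1] - x0[0]
p0 = (w * norm.pdf(x0[:, None], mu, s)).sum(axis=1)

def p1(x1):
    """Marginal p1(x1) = ∫ N(x1; alpha*x0, sigma^2) p0(x0) dx0 (Riemann sum)."""
    return (norm.pdf(x1, alpha * x0, sigma) * p0).sum() * dx

def posterior_mean(x1):
    """E[x0 | x1] computed directly from the posterior p(x0 | x1)."""
    lik = norm.pdf(x1, alpha * x0, sigma)
    post = lik * p0 / ((lik * p0).sum() * dx)
    return (x0 * post).sum() * dx

def tweedie_mean(x1, eps=1e-4):
    """(x1 + sigma^2 * d/dx1 log p1(x1)) / alpha, score by finite differences."""
    score = (np.log(p1(x1 + eps)) - np.log(p1(x1 - eps))) / (2 * eps)
    return (x1 + sigma**2 * score) / alpha

for x1 in (-2.0, 0.3, 1.1):
    # the two columns agree up to quadrature / finite-difference error
    print(f"x1={x1:+.1f}  E[x0|x1]={posterior_mean(x1):.4f}  Tweedie={tweedie_mean(x1):.4f}")
```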

Connections to Diffusion Models

Consider a simple Ornstein-Uhlenbeck (OU) process (Song et al., 2021; Ho et al., 2020):

\[\begin{align} \mathrm{d x_t = -\frac{\beta_t}{2} x_t dt + \sqrt{\beta_t} d w_t,}\label{vpsde} \end{align}\]

where $t\in[0, 1]$, $\mathrm{x_0 \sim p_0}$, and $\mathrm{x_1 \sim p_1}$. With the help of an integrating factor, the above process admits a simple closed-form solution of the same form as Eq.\eqref{transform}:

\[\begin{align} \mathrm{x_1 = \alpha x_0 + \sigma Z},\notag \end{align}\]

where $\mathrm{\alpha=e^{-\frac{1}{2}\int_0^1 \beta_s d s}}$, $\mathrm{\sigma=\sqrt{1-e^{-\int_0^1 \beta_s d s}}}$.
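For concreteness, here is a minimal sketch of these coefficients under the commonly used linear schedule $\mathrm{\beta_t=\beta_{min}+t(\beta_{max}-\beta_{min})}$; the schedule and its endpoints are assumptions for illustration, not part of the derivation.

```python
import numpy as np

# alpha and sigma of the VP-SDE at time t, assuming the commonly used linear
# schedule beta_t = beta_min + t * (beta_max - beta_min); the endpoints below
# are illustrative, not prescribed by the derivation.
beta_min, beta_max = 0.1, 20.0

def alpha_sigma(t):
    """Closed form of ∫_0^t beta_s ds for the linear schedule, then
    alpha = exp(-0.5 * integral) and sigma = sqrt(1 - exp(-integral))."""
    integral = beta_min * t + 0.5 * (beta_max - beta_min) * t**2
    return np.exp(-0.5 * integral), np.sqrt(1.0 - np.exp(-integral))

alpha, sigma = alpha_sigma(1.0)
print(alpha, sigma)   # alpha ≈ 0.0066, sigma ≈ 1.0: x1 is essentially pure noise
```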

Additionally, the time reversal of Eq.\eqref{vpsde} (Anderson, 1982) is given by

\[\begin{align} \mathrm{d x_t = -\frac{\beta_t}{2} x_t dt -\beta_t \nabla \log p_t(x_t)dt + \sqrt{\beta_t} d \bar{w}_t},\label{bsde} \end{align}\]

where $\mathrm{\bar{w}_t}$ is a standard Wiener process running backward in time. The above provides an iterative process to generate $\mathrm{X_0}$ using the family of scores \(\mathrm{\{\nabla \log p_t\}_{t=0}^1}\). In contrast, Tweedie’s formula in Eq.\eqref{tweedie} requires only a single score function, $\mathrm{\nabla_{x_1} \log p_1(x_1)}$, to recover an estimate of $\mathrm{X_0}$, namely its posterior mean.
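As an illustration of the iterative process, here is a minimal Euler-Maruyama sketch of Eq.\eqref{bsde}; the 1D Gaussian $\mathrm{p_0}$ and the linear $\beta$ schedule are illustrative assumptions chosen so that $\mathrm{\nabla \log p_t}$ is available in closed form.

```python
import numpy as np

# Minimal Euler-Maruyama sketch of the reverse-time SDE. To keep it
# self-contained, p0 = N(m0, s0^2) in 1D and beta_t is a linear schedule,
# so the score of p_t is available in closed form; both are illustrative
# assumptions.
rng = np.random.default_rng(0)
m0, s0 = 2.0, 0.5
beta_min, beta_max = 0.1, 20.0

def beta(t):
    return beta_min + t * (beta_max - beta_min)

def alpha_t(t):
    # exp(-0.5 * ∫_0^t beta_s ds), closed form for the linear schedule
    return np.exp(-0.5 * (beta_min * t + 0.5 * (beta_max - beta_min) * t**2))

def score(x, t):
    """∇ log p_t(x) with p_t = N(alpha_t*m0, alpha_t^2*s0^2 + 1 - alpha_t^2)."""
    a = alpha_t(t)
    var = a**2 * s0**2 + (1.0 - a**2)
    return -(x - a * m0) / var

# integrate the reverse-time SDE from t=1 down to t=0 in N steps
N, n_samples = 1000, 5000
dt = 1.0 / N
x = rng.standard_normal(n_samples)    # x_1 is approximately N(0, 1) here
for i in range(N, 0, -1):
    t = i * dt
    drift = -0.5 * beta(t) * x - beta(t) * score(x, t)
    x = x - drift * dt + np.sqrt(beta(t) * dt) * rng.standard_normal(n_samples)

print(x.mean(), x.std())   # approximately (m0, s0) = (2.0, 0.5)
```

With an exact score the sampler recovers $\mathrm{p_0}$ up to discretization error; in practice the score is replaced by a learned estimator $\mathrm{s_\theta(x, t)}$.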

Does this imply that the iterative process in Eq.\eqref{bsde} is unnecessary? It might be, if we could obtain a sufficiently accurate score estimator, $\mathrm{s_{\theta} \approx \nabla_{x_1} \log p_1(x_1)}$, from a large enough dataset $\mathrm{x_0\sim p_0}$. However, this is far from trivial for highly complex, high-dimensional, multimodal real-world data, and decomposing generation into a sequence of simpler maps often makes the computation more tractable in practice.

Second-order Tweedie

The second-order formula can be derived from the exponential-family form of Eq.\eqref{transform} (Meng et al., 2021):

\[\begin{align} \mathrm{p_{1|0}(x_1 | x_0)=e^{x_0^\intercal T(x_1) -\varphi(x_0)} p(x_1)},\notag \end{align}\]

where $\mathrm{T(x)=\frac{\alpha x}{\sigma^2}}$, $\mathrm{\varphi(x_0)}$ is the log-normalizer that makes \(\mathrm{p_{1|0}(x_1 | x_0)}\) integrate to one, and $\mathrm{p(x_1)\propto e^{-\frac{\|x_1\|_2^2}{2\sigma^2}}}$.
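Concretely, this factorization can be checked by expanding the Gaussian likelihood, with $\mathrm{p(x_1)}$ taken to be the density of $\mathrm{N(0, \sigma^2 I)}$:

\[\begin{align} \mathrm{p_{1|0}(x_1|x_0) \propto e^{-\frac{\|x_1-\alpha x_0\|_2^2}{2\sigma^2}} = e^{\frac{\alpha x_0^\intercal x_1}{\sigma^2}-\frac{\alpha^2\|x_0\|_2^2}{2\sigma^2}}\, e^{-\frac{\|x_1\|_2^2}{2\sigma^2}} \propto e^{x_0^\intercal T(x_1)-\varphi(x_0)}\, p(x_1)},\notag \end{align}\]

which identifies $\mathrm{\varphi(x_0)=\frac{\alpha^2\|x_0\|_2^2}{2\sigma^2}}$.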

By Bayes’ rule, the posterior satisfies

\[\begin{align} \mathrm{p_{0|1}(x_0| x_1) = \frac{p_{1|0}(x_1 |x_0) p_0(x_0)}{p_1(x_1)}=e^{x_0^\intercal T(x_1) - \varphi(x_0) -\lambda(x_1)}p_0(x_0)}, \notag \end{align}\]

where \(\mathrm{\lambda(x_1)=\log\frac{p_1(x_1)}{p(x_1)}}\). Since the posterior is normalized, it follows that

\[\begin{align} \mathrm{\int e^{x_0^\intercal T(x_1) - \varphi(x_0) -\lambda(x_1)}p_0(x_0)d x_0=1}.\label{normalized_property} \end{align}\]

Taking the \(1^{\text{st}}\)- and \(2^{\text{nd}}\)-order derivatives of Eq.\eqref{normalized_property} w.r.t. $\mathrm{x_1}$ yields

\[\begin{align} &\mathrm{\int \frac{\alpha x_0}{\sigma^2}\, p_{0|1}(x_0 |x_1)d x_0= J(x_1)},\label{jacobian}\\ &\mathrm{\int \bigg( \frac{\alpha x_0}{\sigma^2} - J(x_1)\bigg) \bigg(\frac{\alpha x_0}{\sigma^2} - J(x_1)\bigg)^\intercal p_{0|1}(x_0|x_1)d x_0=H(x_1)},\label{hessian} \end{align}\]

where $\mathrm{J(\cdot)=\nabla \lambda(\cdot)}$ and $\mathrm{H(\cdot)=\nabla^2 \lambda(\cdot)}$ are the gradient (Jacobian) and Hessian of $\mathrm{\lambda(\cdot)}$, respectively.
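Concretely, the first identity follows by differentiating Eq.\eqref{normalized_property} under the integral sign: since $\mathrm{T(x_1)=\frac{\alpha x_1}{\sigma^2}}$,

\[\begin{align} \mathrm{0 = \int \Big(\frac{\alpha x_0}{\sigma^2} - \nabla_{x_1}\lambda(x_1)\Big) e^{x_0^\intercal T(x_1) - \varphi(x_0) -\lambda(x_1)} p_0(x_0) d x_0 = \int \Big(\frac{\alpha x_0}{\sigma^2} - J(x_1)\Big) p_{0|1}(x_0|x_1) d x_0},\notag \end{align}\]

which rearranges to Eq.\eqref{jacobian}; differentiating once more gives Eq.\eqref{hessian}.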

Plugging Eq.\eqref{jacobian} into Eq.\eqref{hessian} and using $\mathrm{J(x_1)=\nabla_{x_1}\lambda(x_1)=\nabla_{x_1}\log p_1(x_1)+\frac{x_1}{\sigma^2}}$, which follows from the definitions of $\mathrm{\lambda}$ and $\mathrm{p}$, we have

\[\begin{align} &\mathrm{E[x_0 |x_1]=\frac{\sigma^2}{\alpha} J(x_1)=\frac{1}{\alpha}\big(x_1 + \sigma^2 \nabla_{x_1} \log p_1(x_1)\big)},\label{jacob_v2}\\ &\mathrm{E[x_0 x_0^\intercal |x_1]=\frac{\sigma^4}{\alpha^2} \bigg(\nabla_{x_1}^2 \log p_1 (x_1)+\frac{1}{\sigma^2}I + J(x_1) J(x_1)^\intercal\bigg)}.\label{hessian_v2} \end{align}\]

Eq.\eqref{jacob_v2} matches the first-order result in Eq.\eqref{tweedie}, and Eq.\eqref{hessian_v2} provides the second-order estimate. Subtracting $\mathrm{E[x_0|x_1]E[x_0|x_1]^\intercal=\frac{\sigma^4}{\alpha^2}J(x_1)J(x_1)^\intercal}$ recovers the posterior covariance $\mathrm{Cov[x_0|x_1]=\frac{\sigma^4}{\alpha^2}\big(\nabla_{x_1}^2 \log p_1(x_1)+\frac{1}{\sigma^2}I\big)}$.
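Reusing the toy mixture from the first sketch, here is a quick 1D numerical check of Eq.\eqref{hessian_v2}; all parameters remain illustrative assumptions, and the derivatives of $\mathrm{\log p_1}$ are taken by finite differences.

```python
import numpy as np
from scipy.stats import norm

# 1D numerical check of the second-order formula, reusing the toy mixture
# from the first sketch (all parameters remain illustrative assumptions).
alpha, sigma = 0.7, 0.5
w  = np.array([0.3, 0.7]); mu = np.array([-2.0, 1.5]); s = np.array([0.6, 0.4])
x0 = np.linspace(-8.0, 8.0, 8001)
dx = x0[1] - x0[0]
p0 = (w * norm.pdf(x0[:, None], mu, s)).sum(axis=1)

def log_p1(x1):
    return np.log((norm.pdf(x1, alpha * x0, sigma) * p0).sum() * dx)

def posterior_second_moment(x1):
    """E[x0^2 | x1] by direct quadrature."""
    lik = norm.pdf(x1, alpha * x0, sigma)
    post = lik * p0 / ((lik * p0).sum() * dx)
    return (x0**2 * post).sum() * dx

def second_order_tweedie(x1, eps=1e-3):
    """sigma^4/alpha^2 * (d^2/dx1^2 log p1 + 1/sigma^2 + J^2), J = d/dx1 lambda."""
    d1 = (log_p1(x1 + eps) - log_p1(x1 - eps)) / (2 * eps)                  # score
    d2 = (log_p1(x1 + eps) - 2 * log_p1(x1) + log_p1(x1 - eps)) / eps**2    # Hessian
    J = d1 + x1 / sigma**2
    return sigma**4 / alpha**2 * (d2 + 1.0 / sigma**2 + J**2)

for x1 in (-2.0, 0.3, 1.1):
    print(f"x1={x1:+.1f}  E[x0^2|x1]={posterior_second_moment(x1):.4f}  "
          f"formula={second_order_tweedie(x1):.4f}")
```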

  1. Efron, B. (2011). Tweedie’s Formula and Selection Bias. Journal of the American Statistical Association, 106 (496), 1602–1614.
  2. Chung, H., Kim, J., McCann, M. T., Klasky, M. L., & Ye, J. C. (2023). Diffusion Posterior Sampling for General Noisy Inverse Problems. ICLR.
  3. Song, Y., Sohl-Dickstein, J., Kingma, D. P., Kumar, A., Ermon, S., & Poole, B. (2021). Score-Based Generative Modeling through Stochastic Differential Equations. ICLR.
  4. Ho, J., Jain, A., & Abbeel, P. (2020). Denoising Diffusion Probabilistic Models. NeurIPS.
  5. Anderson, B. D. O. (1982). Reverse-time diffusion equation models. Stochastic Processes and Their Applications, 12(3), 313–326.
  6. Meng, C., Song, Y., Li, W., & Ermon, S. (2021). Estimating High Order Gradients of the Data Distribution by Denoising. NeurIPS.