Tweedie’s Formula

A simple empirical Bayesian posterior estimate

Tweedie’s formula (Efron, 2011) is a key tool in empirical Bayes for obtaining posterior estimates from observed data. Assume we have data $\mathrm{X_0}$ and a noisy measurement $\mathrm{X_1}$ that follow

\[\begin{align} \mathrm{X_1 = \alpha X_0 + \sigma Z},\label{transform} \end{align}\]

where $\mathrm{X_0 \sim p_0}$, $\mathrm{X_1 \sim p_1}$, and $\mathrm{Z \sim N(0, {I})}$.

We next study the unconditional score function $\mathrm{\nabla_{x_1} \log p_1(x_1)}$. Note that $\mathrm{p_{1|0}(x_1|x_0)=N(\alpha x_0, \sigma^2 I)}$, so $\mathrm{\nabla_{x_1} \log p_{1|0}(x_1|x_0)=\frac{\alpha x_0-x_1}{\sigma^2}}$, which gives:

\[\begin{align} \mathrm{\nabla_{x_1} \log p_1(x_1)} &= \mathrm{\frac{1}{p_1(x_1)} \nabla_{x_1} p_1(x_1)} \notag \\ &= \mathrm{\frac{1}{p_1(x_1)} \nabla_{x_1} \int p_{1|0}(x_1|x_0) p_0(x_0) d x_0} \notag \\ &= \mathrm{\frac{1}{p_1(x_1)} \int p_{1|0}(x_1|x_0) \nabla_{x_1} \log p_{1|0}(x_1|x_0) p_0(x_0) d x_0} \notag \\ &= \mathrm{\int p_{0|1}(x_0|x_1) \nabla_{x_1} \log p_{1|0}(x_1|x_0) d x_0} \notag \\ &= \mathrm{\int p_{0|1}(x_0|x_1) \frac{\alpha x_0-x_1}{\sigma^2} d x_0} \notag \\ &= \mathrm{\frac{\alpha E[x_0|x_1] - x_1}{\sigma^2}} \notag \\ \end{align}\]

The posterior mean $\mathrm{E[x_0|x_1]}$ behaves like a denoiser, as described in (Chung et al., 2023):

\[\begin{align} \mathrm{E[x_0|x_1]=\frac{1}{\alpha}\big(x_1+\sigma^2 \nabla_{x_1} \log p_1(x_1)\big)}.\label{tweedie} \end{align}\]
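As a quick sanity check, here is a small numerical sketch comparing the posterior mean computed by direct quadrature against the Tweedie estimate built from the marginal score; the 1D Gaussian-mixture $\mathrm{p_0}$ and the values of $\alpha$, $\sigma$ are arbitrary illustrative assumptions, not part of the formula.

```python
import numpy as np
from scipy.stats import norm

# Toy 1D check of Tweedie's formula: x1 = alpha * x0 + sigma * z.
# The Gaussian-mixture p0 and the values of alpha, sigma are illustrative.
alpha, sigma = 0.7, 0.5
w  = np.array([0.3, 0.7])    # mixture weights
mu = np.array([-2.0, 1.5])   # component means
s  = np.array([0.6, 0.4])    # component standard deviations

x0 = np.linspace(-8.0, 8.0, 8001)   # quadrature grid for x0
dx = x0[1] - x0[0]
p0 = (w * norm.pdf(x0[:, None], mu, s)).sum(axis=1)

def p1(x1):
    """Marginal p1(x1) = ∫ N(x1; alpha*x0, sigma^2) p0(x0) dx0 (Riemann sum)."""
    return (norm.pdf(x1, alpha * x0, sigma) * p0).sum() * dx

def posterior_mean(x1):
    """E[x0 | x1] computed directly from the posterior p(x0 | x1)."""
    lik = norm.pdf(x1, alpha * x0, sigma)
    post = lik * p0 / ((lik * p0).sum() * dx)
    return (x0 * post).sum() * dx

def tweedie_mean(x1, eps=1e-4):
    """(x1 + sigma^2 * d/dx1 log p1(x1)) / alpha, score by finite differences."""
    score = (np.log(p1(x1 + eps)) - np.log(p1(x1 - eps))) / (2 * eps)
    return (x1 + sigma**2 * score) / alpha

for x1 in (-2.0, 0.3, 1.1):
    # the two columns agree up to quadrature / finite-difference error
    print(f"x1={x1:+.1f}  E[x0|x1]={posterior_mean(x1):.4f}  Tweedie={tweedie_mean(x1):.4f}")
```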

Connections to Diffusion Models

Consider a simple Ornstein-Uhlenbeck (OU) process (Song et al., 2021; Ho et al., 2020):

\[\begin{align} \mathrm{d x_t = -\frac{\beta_t}{2} x_t dt + \sqrt{\beta_t} d w_t,}\label{vpsde} \end{align}\]

where $t\in[0, 1]$, $\mathrm{x_0 \sim p_0}$, and $\mathrm{x_1 \sim p_1}$. With the help of an integrating factor, the above process admits a simple closed-form solution of the same form as Eq.\eqref{transform}:

\[\begin{align} \mathrm{x_1 = \alpha x_0 + \sigma Z},\notag \end{align}\]

where $\mathrm{\alpha=e^{-\frac{1}{2}\int_0^1 \beta_s d s}}$, $\mathrm{\sigma=\sqrt{1-e^{-\int_0^1 \beta_s d s}}}$.
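For concreteness, here is a minimal sketch of these coefficients under the commonly used linear schedule $\mathrm{\beta_t=\beta_{min}+t(\beta_{max}-\beta_{min})}$; the schedule and its endpoints are assumptions for illustration, not part of the derivation.

```python
import numpy as np

# alpha and sigma of the VP-SDE at time t, assuming the commonly used linear
# schedule beta_t = beta_min + t * (beta_max - beta_min); the endpoints below
# are illustrative, not prescribed by the derivation.
beta_min, beta_max = 0.1, 20.0

def alpha_sigma(t):
    """Closed form of ∫_0^t beta_s ds for the linear schedule, then
    alpha = exp(-0.5 * integral) and sigma = sqrt(1 - exp(-integral))."""
    integral = beta_min * t + 0.5 * (beta_max - beta_min) * t**2
    return np.exp(-0.5 * integral), np.sqrt(1.0 - np.exp(-integral))

alpha, sigma = alpha_sigma(1.0)
print(alpha, sigma)   # alpha ≈ 0.0066, sigma ≈ 1.0: x1 is essentially pure noise
```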

Additionally, the time reversal of Eq.\eqref{vpsde} (Anderson, 1982) is given by

\[\begin{align} \mathrm{d x_t = -\frac{\beta_t}{2} x_t dt -\beta_t \nabla \log p_t(x_t)dt + \sqrt{\beta_t} d \bar{w}_t},\label{bsde} \end{align}\]

where $\mathrm{\bar{w}_t}$ is a standard Wiener process running backward in time. The above provides an iterative process to generate $\mathrm{X_0}$ using the family of scores \(\mathrm{\{\nabla \log p_t\}_{t=0}^1}\). In contrast, Tweedie’s formula in Eq.\eqref{tweedie} requires only a single score function, $\mathrm{\nabla_{x_1} \log p_1(x_1)}$, to recover an estimate of $\mathrm{X_0}$, namely its posterior mean.
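As an illustration of the iterative process, here is a minimal Euler-Maruyama sketch of Eq.\eqref{bsde}; the 1D Gaussian $\mathrm{p_0}$ and the linear $\beta$ schedule are illustrative assumptions chosen so that $\mathrm{\nabla \log p_t}$ is available in closed form.

```python
import numpy as np

# Minimal Euler-Maruyama sketch of the reverse-time SDE. To keep it
# self-contained, p0 = N(m0, s0^2) in 1D and beta_t is a linear schedule,
# so the score of p_t is available in closed form; both are illustrative
# assumptions.
rng = np.random.default_rng(0)
m0, s0 = 2.0, 0.5
beta_min, beta_max = 0.1, 20.0

def beta(t):
    return beta_min + t * (beta_max - beta_min)

def alpha_t(t):
    # exp(-0.5 * ∫_0^t beta_s ds), closed form for the linear schedule
    return np.exp(-0.5 * (beta_min * t + 0.5 * (beta_max - beta_min) * t**2))

def score(x, t):
    """∇ log p_t(x) with p_t = N(alpha_t*m0, alpha_t^2*s0^2 + 1 - alpha_t^2)."""
    a = alpha_t(t)
    var = a**2 * s0**2 + (1.0 - a**2)
    return -(x - a * m0) / var

# integrate the reverse-time SDE from t=1 down to t=0 in N steps
N, n_samples = 1000, 5000
dt = 1.0 / N
x = rng.standard_normal(n_samples)    # x_1 is approximately N(0, 1) here
for i in range(N, 0, -1):
    t = i * dt
    drift = -0.5 * beta(t) * x - beta(t) * score(x, t)
    x = x - drift * dt + np.sqrt(beta(t) * dt) * rng.standard_normal(n_samples)

print(x.mean(), x.std())   # approximately (m0, s0) = (2.0, 0.5)
```

With an exact score the sampler recovers $\mathrm{p_0}$ up to discretization error; in practice the score is replaced by a learned estimator $\mathrm{s_\theta(x, t)}$.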

Does this imply that the iterative process in Eq.\eqref{bsde} is unnecessary? It might be, if we could obtain a sufficiently accurate score estimator, $\mathrm{s_{\theta} \approx \nabla_{x_1} \log p_1(x_1)}$, from a large enough dataset $\mathrm{x_0\sim p_0}$. However, this is far from trivial for highly complex, high-dimensional, multimodal real-world data, and decomposing generation into a sequence of simpler maps often makes the computation more tractable in practice.

Second-order Tweedie

The second-order formula can be derived from the exponential-family form of Eq.\eqref{transform} (Meng et al., 2021):

\[\begin{align} \mathrm{p_{1|0}(x_1 | x_0)=e^{x_0^\intercal T(x_1) -\varphi(x_0)} p(x_1)},\notag \end{align}\]

where $\mathrm{T(x)=\frac{\alpha x}{\sigma^2}}$, $\mathrm{\varphi(x_0)}$ is the log-normalizer that makes \(\mathrm{p_{1|0}(x_1 | x_0)}\) integrate to one, and $\mathrm{p(x_1)\propto e^{-\frac{\|x_1\|_2^2}{2\sigma^2}}}$.
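Concretely, this factorization can be checked by expanding the Gaussian likelihood, with $\mathrm{p(x_1)}$ taken to be the density of $\mathrm{N(0, \sigma^2 I)}$:

\[\begin{align} \mathrm{p_{1|0}(x_1|x_0) \propto e^{-\frac{\|x_1-\alpha x_0\|_2^2}{2\sigma^2}} = e^{\frac{\alpha x_0^\intercal x_1}{\sigma^2}-\frac{\alpha^2\|x_0\|_2^2}{2\sigma^2}}\, e^{-\frac{\|x_1\|_2^2}{2\sigma^2}} \propto e^{x_0^\intercal T(x_1)-\varphi(x_0)}\, p(x_1)},\notag \end{align}\]

which identifies $\mathrm{\varphi(x_0)=\frac{\alpha^2\|x_0\|_2^2}{2\sigma^2}}$.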

By Bayes’ rule, the posterior satisfies

\[\begin{align} \mathrm{p_{0|1}(x_0| x_1) = \frac{p_{1|0}(x_1 |x_0) p_0(x_0)}{p_1(x_1)}=e^{x_0^\intercal T(x_1) - \varphi(x_0) -\lambda(x_1)}p_0(x_0)}, \notag \end{align}\]

where \(\mathrm{\lambda(x_1)=\log\frac{p_1(x_1)}{p(x_1)}}\). Since the posterior is normalized, it follows that

\[\begin{align} \mathrm{\int e^{x_0^\intercal T(x_1) - \varphi(x_0) -\lambda(x_1)}p_0(x_0)d x_0=1}.\label{normalized_property} \end{align}\]

Taking the \(1^{\text{st}}\)- and \(2^{\text{nd}}\)-order derivatives of Eq.\eqref{normalized_property} w.r.t. $\mathrm{x_1}$ yields

\[\begin{align} &\mathrm{\int \frac{\alpha x_0}{\sigma^2}\, p_{0|1}(x_0 |x_1)d x_0= J(x_1)},\label{jacobian}\\ &\mathrm{\int \bigg( \frac{\alpha x_0}{\sigma^2} - J(x_1)\bigg) \bigg(\frac{\alpha x_0}{\sigma^2} - J(x_1)\bigg)^\intercal p_{0|1}(x_0|x_1)d x_0=H(x_1)},\label{hessian} \end{align}\]

where $\mathrm{J(\cdot)=\nabla \lambda(\cdot)}$ and $\mathrm{H(\cdot)=\nabla^2 \lambda(\cdot)}$ are the gradient (Jacobian) and Hessian of $\mathrm{\lambda(\cdot)}$, respectively.
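Concretely, the first identity follows by differentiating Eq.\eqref{normalized_property} under the integral sign: since $\mathrm{T(x_1)=\frac{\alpha x_1}{\sigma^2}}$,

\[\begin{align} \mathrm{0 = \int \Big(\frac{\alpha x_0}{\sigma^2} - \nabla_{x_1}\lambda(x_1)\Big) e^{x_0^\intercal T(x_1) - \varphi(x_0) -\lambda(x_1)} p_0(x_0) d x_0 = \int \Big(\frac{\alpha x_0}{\sigma^2} - J(x_1)\Big) p_{0|1}(x_0|x_1) d x_0},\notag \end{align}\]

which rearranges to Eq.\eqref{jacobian}; differentiating once more gives Eq.\eqref{hessian}.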

Plugging Eq.\eqref{jacobian} into Eq.\eqref{hessian} and using $\mathrm{J(x_1)=\nabla_{x_1}\lambda(x_1)=\nabla_{x_1}\log p_1(x_1)+\frac{x_1}{\sigma^2}}$, which follows from the definitions of $\mathrm{\lambda}$ and $\mathrm{p}$, we have

\[\begin{align} &\mathrm{E[x_0 |x_1]=\frac{\sigma^2}{\alpha} J(x_1)=\frac{1}{\alpha}\big(x_1 + \sigma^2 \nabla_{x_1} \log p_1(x_1)\big)},\label{jacob_v2}\\ &\mathrm{E[x_0 x_0^\intercal |x_1]=\frac{\sigma^4}{\alpha^2} \bigg(\nabla_{x_1}^2 \log p_1 (x_1)+\frac{1}{\sigma^2}I + J(x_1) J(x_1)^\intercal\bigg)}.\label{hessian_v2} \end{align}\]

Eq.\eqref{jacob_v2} matches the first-order result in Eq.\eqref{tweedie}, and Eq.\eqref{hessian_v2} provides the second-order estimate. Subtracting $\mathrm{E[x_0|x_1]E[x_0|x_1]^\intercal=\frac{\sigma^4}{\alpha^2}J(x_1)J(x_1)^\intercal}$ recovers the posterior covariance $\mathrm{Cov[x_0|x_1]=\frac{\sigma^4}{\alpha^2}\big(\nabla_{x_1}^2 \log p_1(x_1)+\frac{1}{\sigma^2}I\big)}$.
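Reusing the toy mixture from the first sketch, here is a quick 1D numerical check of Eq.\eqref{hessian_v2}; all parameters remain illustrative assumptions, and the derivatives of $\mathrm{\log p_1}$ are taken by finite differences.

```python
import numpy as np
from scipy.stats import norm

# 1D numerical check of the second-order formula, reusing the toy mixture
# from the first sketch (all parameters remain illustrative assumptions).
alpha, sigma = 0.7, 0.5
w  = np.array([0.3, 0.7]); mu = np.array([-2.0, 1.5]); s = np.array([0.6, 0.4])
x0 = np.linspace(-8.0, 8.0, 8001)
dx = x0[1] - x0[0]
p0 = (w * norm.pdf(x0[:, None], mu, s)).sum(axis=1)

def log_p1(x1):
    return np.log((norm.pdf(x1, alpha * x0, sigma) * p0).sum() * dx)

def posterior_second_moment(x1):
    """E[x0^2 | x1] by direct quadrature."""
    lik = norm.pdf(x1, alpha * x0, sigma)
    post = lik * p0 / ((lik * p0).sum() * dx)
    return (x0**2 * post).sum() * dx

def second_order_tweedie(x1, eps=1e-3):
    """sigma^4/alpha^2 * (d^2/dx1^2 log p1 + 1/sigma^2 + J^2), J = d/dx1 lambda."""
    d1 = (log_p1(x1 + eps) - log_p1(x1 - eps)) / (2 * eps)                  # score
    d2 = (log_p1(x1 + eps) - 2 * log_p1(x1) + log_p1(x1 - eps)) / eps**2    # Hessian
    J = d1 + x1 / sigma**2
    return sigma**4 / alpha**2 * (d2 + 1.0 / sigma**2 + J**2)

for x1 in (-2.0, 0.3, 1.1):
    print(f"x1={x1:+.1f}  E[x0^2|x1]={posterior_second_moment(x1):.4f}  "
          f"formula={second_order_tweedie(x1):.4f}")
```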

  1. Efron, B. (2011). Tweedie’s Formula and Selection Bias. Journal of the American Statistical Association, 106 (496), 1602–1614.
  2. Chung, H., Kim, J., McCann, M. T., Klasky, M. L., & Ye, J. C. (2023). Diffusion Posterior Sampling for General Noisy Inverse Problems. ICLR.
  3. Song, Y., Sohl-Dickstein, J., Kingma, D. P., Kumar, A., Ermon, S., & Poole, B. (2021). Score-Based Generative Modeling through Stochastic Differential Equations. ICLR.
  4. Ho, J., Jain, A., & Abbeel, P. (2020). Denoising Diffusion Probabilistic Models. NeurIPS.
  5. Anderson, B. D. O. (1982). Reverse-time diffusion equation models. Stochastic Processes and Their Applications, 12(3), 313–326.
  6. Meng, C., Song, Y., Li, W., & Ermon, S. (2021). Estimating High Order Gradients of the Data Distribution by Denoising. NeurIPS.