Replica Exchange and Variance Reduction
Running multiple MCMC chains at different temperatures to explore the solution space more thoroughly.
Variance-reduced sampling algorithms (Dubey et al., 2016; Xu et al., 2018) are not widely adopted in practice. Instead of reducing the variance of the stochastic gradients, we therefore focus on reducing the variance of the energy estimator, which is what enables the exponential acceleration of replica exchange.
To this end, we consider a standard sampling algorithm, stochastic gradient Langevin dynamics (SGLD), which is a mini-batch numerical discretization of a Langevin stochastic differential equation (SDE):

$$\beta_{k+1} = \beta_k - \eta \nabla \widetilde{U}(\beta_k) + \sqrt{2\eta\tau}\,\xi_k,$$

where $\eta$ is the learning rate, $\tau$ is the temperature, $\nabla \widetilde{U}(\beta_k)$ is an unbiased mini-batch estimate of the energy gradient $\nabla U(\beta_k)$, and $\xi_k$ is a standard Gaussian vector.
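For concreteness, here is a minimal NumPy sketch of the update above; the toy energy $U(\beta) = \|\beta\|^2/2$ and its exact gradient stand in for a real mini-batch gradient, so all names are illustrative:

```python
import numpy as np

def sgld_step(beta, grad_u_hat, lr, temperature, rng):
    """One SGLD update: a gradient step plus temperature-scaled Gaussian noise."""
    noise = rng.standard_normal(beta.shape)
    return beta - lr * grad_u_hat(beta) + np.sqrt(2.0 * lr * temperature) * noise

# Toy example: energy U(beta) = ||beta||^2 / 2, so grad U(beta) = beta and the
# stationary distribution at temperature tau is N(0, tau * I).
rng = np.random.default_rng(0)
beta = rng.standard_normal(2)
for _ in range(5_000):
    beta = sgld_step(beta, grad_u_hat=lambda b: b, lr=1e-2, temperature=1.0, rng=rng)
```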
To accelerate the simulations, replica exchange runs multiple SGLD chains at different temperatures and lets them interact in a manner that encourages large jumps between modes (Yin & Zhu, 2010; Chen et al., 2019; Deng et al., 2020; Deng et al., 2021).
In particular, a pair of chains at temperatures $\tau^{(1)} < \tau^{(2)}$ swap their positions with probability $\min\{1, S\}$, where

$$S = \exp\Big\{\Big(\tfrac{1}{\tau^{(1)}} - \tfrac{1}{\tau^{(2)}}\Big)\Big(\widetilde{U}(\beta^{(1)}) - \widetilde{U}(\beta^{(2)}) - \Big(\tfrac{1}{\tau^{(1)}} - \tfrac{1}{\tau^{(2)}}\Big)\tfrac{\hat\sigma^2}{2}\Big)\Big\},$$

where $\widetilde{U}$ is the mini-batch energy estimator and $\hat\sigma^2$ is an estimate of the variance of the energy difference; the correction term removes the bias that the noisy energies would otherwise introduce into the swap rate (Deng et al., 2020).
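A small sketch of the swap step, assuming the corrected rate above; `sigma2_hat` (the variance estimate of the energy difference) and the chain layout are illustrative:

```python
import numpy as np

def swap_probability(u_low, u_high, tau_low, tau_high, sigma2_hat):
    """Bias-corrected swap rate between a low- and a high-temperature chain."""
    dtau = 1.0 / tau_low - 1.0 / tau_high
    log_s = dtau * (u_low - u_high - dtau * sigma2_hat / 2.0)
    return float(np.exp(min(log_s, 0.0)))  # min(1, S), computed in log space

def maybe_swap(beta_low, beta_high, u_low, u_high, tau_low, tau_high, sigma2_hat, rng):
    """Swap the two chains' positions with probability min(1, S)."""
    if rng.random() < swap_probability(u_low, u_high, tau_low, tau_high, sigma2_hat):
        return beta_high, beta_low  # accepted: positions are exchanged
    return beta_low, beta_high
```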
The desire to obtain more effective swaps drives us to design more efficient energy estimators. To reduce the variance of the noisy energy estimator $\widetilde{U}(\beta)$, we use a control-variate construction: the full-data energy is evaluated at a periodically refreshed snapshot of the parameters, and each mini-batch energy is corrected by the difference between the current and snapshot mini-batch energies (Deng et al., 2021).
The following shows a demo that explains how variance-reduced reSGLD works.
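As a rough, self-contained illustration (not the repository's official demo), the sketch below combines the SGLD updates, the control-variate energy estimator, and the bias-corrected swap on a toy 1-D Gaussian posterior; the data set, step size, snapshot period, and the fixed variance estimate `sigma2_hat` are all assumptions made for this example:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: energy U(beta) = sum_i (beta - x_i)^2 / 2, i.e. a 1-D Gaussian posterior.
N, n = 10_000, 100                          # data size and mini-batch size
x = rng.normal(loc=1.0, scale=1.0, size=N)

def energy(beta, idx=None):
    """Mini-batch (or full-data) energy estimate, rescaled to the full data set."""
    xs = x if idx is None else x[idx]
    return (N / len(xs)) * 0.5 * np.sum((beta - xs) ** 2)

def grad_energy(beta, idx):
    """Mini-batch estimate of the energy gradient."""
    return (N / len(idx)) * np.sum(beta - x[idx])

taus = np.array([1.0, 10.0])   # low-temperature (exploitation) and high-temperature (exploration) chains
betas = rng.standard_normal(2)
lr = 1e-5
sigma2_hat = 1.0               # assumed fixed variance estimate; estimated online in the papers
update_period = 50             # refresh the control-variate snapshot every 50 steps

for k in range(2_000):
    # Periodically refresh the control variate: a parameter snapshot and its full-data energy.
    if k % update_period == 0:
        snapshot = betas.copy()
        snapshot_full_energy = np.array([energy(b) for b in snapshot])

    idx = rng.choice(N, size=n, replace=False)

    # SGLD update for each chain at its own temperature.
    for i in range(2):
        noise = rng.standard_normal()
        betas[i] += -lr * grad_energy(betas[i], idx) + np.sqrt(2.0 * lr * taus[i]) * noise

    # Variance-reduced energy estimates: mini-batch energy corrected by the snapshot.
    u_vr = np.array([energy(betas[i], idx) - energy(snapshot[i], idx) + snapshot_full_energy[i]
                     for i in range(2)])

    # Bias-corrected swap: exchange positions with probability min(1, S).
    dtau = 1.0 / taus[0] - 1.0 / taus[1]
    log_s = dtau * (u_vr[0] - u_vr[1] - dtau * sigma2_hat / 2.0)
    if rng.random() < np.exp(min(log_s, 0.0)):
        betas = betas[::-1].copy()

print("low-temperature sample:", betas[0])   # should land near the posterior mean (about 1.0)
```

In practice, $\hat\sigma^2$ would be estimated rather than fixed, and the snapshot period trades occasional full-data energy evaluations for a lower-variance estimator and, in turn, more effective swaps.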
References

- Dubey, A., Reddi, S. J., Póczos, B., Smola, A. J., Xing, E. P., & Williamson, S. A. (2016). Variance Reduction in Stochastic Gradient Langevin Dynamics. NeurIPS.
- Xu, P., Chen, J., Zou, D., & Gu, Q. (2018). Global Convergence of Langevin Dynamics Based Algorithms for Nonconvex Optimization. NeurIPS.
- Yin, G., & Zhu, C. (2010). Hybrid Switching Diffusions: Properties and Applications. Springer.
- Chen, Y., Chen, J., Dong, J., Peng, J., & Wang, Z. (2019). Accelerating Nonconvex Learning via Replica Exchange Langevin Diffusion. ICLR.
- Deng, W., Feng, Q., Gao, L., Liang, F., & Lin, G. (2020). Non-Convex Learning via Replica Exchange Stochastic Gradient MCMC. ICML.
- Deng, W., Feng, Q., Karagiannis, G., Lin, G., & Liang, F. (2021). Accelerating Convergence of Replica Exchange Stochastic Gradient MCMC via Variance Reduction. ICLR.