SMC2015 : Sequential Monte Carlo workshop 26-28 Aug 2015 Paris (France)

# List of talks (by alphabetic order of speakers)

Click on title to see the slides

## Some applications of asymmetric Metropolis-Hastings updates in the context of exact approximate algorithms

Christophe Andrieu (joint work with A. Doucet and S. Yildirim)

There has recently been some interest in approximating MH updates in situations where quantities required for their implementation are not tractable. This is often the case for the acceptance ratio involved in the MH accept/reject mechanism, for examples in situations where the target density is not tractable. In this scenario one can for example resort to pseudo-marginal strategies which exploit the possibility to unbiasedly estimate a function proportional to the target density. Other more general strategies are however available, cover more scenarios, and the aim of the talk is to review such scenarios and show how “asymmetric” MH updates can be useful, and in particular allow for the reduction of the variability of these approximations and improve performance at virtually no cost on a parallel computer.

## Fighting malaria with SMC

John Aston (joint work with Adam Hall, Michael Chappell and Adam Johansen)

Malaria is a mosquito borne parasitic illness with affects millions of people worldwide each year. There are a number of treatments for the disease, but currently some of the most effective are based on the Artemisinin compounds. These drugs have complex pharamacokinectic/pharamacodynamic (PK/PD) properties but also are only effective against the disease at certain times during the parasite'slifecycle. We propose to look at PK/PD models for Artemisinins using SMC samplers to investigate regimes currently used for the treatment of malaria in SE Asia. A compartmental model for the drug's kinetics will be developed along with a model for the lifecycle of the parasite. This will then be analyzed using SMC, and will involve embedding ODE solvers within the SMC likelihood evaluations as well as having deterministically changing models depending on the current stage of the parasites' lifecycle. The models will be investigated from both an identifiability and estimation point of view. It is hoped that models such as these will be able to enhance treatment regimes by allowing a greater understanding of drug and parasite interactions.

## On the convergence of adaptive SMC Methods

Alex Beskos

In several implementations of SMC methods it is important, in terms of algorithmic efficiency, to exploit the information of the history of the samples to optimally tune their subsequent propagations. We look at examples of such "Adaptive" SMC implementations at applications in the literature. We provide a carefully formulated asymptotic theory for a class of such algorithms. There are only limited results about the theoretical underpinning of such adaptive methods: we bridge this gap by providing a WLLN and a CLT for algorithms that adapt the mutation steps or the temperatures. The CLT in particular seems to be the first result of its kind in the literature and provides a formal justification of algorithms used in many real data contexts. We obtain that for a general class of adaptive SMC algorithms the asymptotic variance of the estimators from the adaptive SMC method is identical to a so-called ‘perfect’ SMC algorithm which uses ideal proposal kernels. Our results are supported by application on a complex high-dimensional posterior distribution associated with the Navier-Stokes model, where adapting high-dimensional parameters of the proposal kernels is critical for the efficiency of the algorithm.

## Monte Carlo methods and nonlinear stochastic optimal control approximation

This talk will introduce a newly proposed framework for nonlinear stochastic optimal control. Loosely speaking, the optimal controller in this nonlinear stochastic setting is related to the solution of a deterministic partial differential equation (PDE): the Hamilton-Jacobi-Bellman (HJB) equation. We will discuss how the solution to the HJB equation can be formulated in terms of an expectation over a stochastic trajectory defined by an uncontrolled stochastic differential equation. Indeed, this relationship is a simple consequence of the Feynman-Kac formula (and more generally its nonlinear variants). It then follows that this expectation (or path-integral) can be approximated via Monte Carlo simulation and since the optimal controller is related to the solution of this HJB equation it is a short leap from there to the optimal controller (or more specifically a Monte Carlo approximation of the optimal controller). We will discuss the intricacies involved in carrying out this Monte Carlo simulation; some of which are particular to the control framework. We will discuss the stability of controlled system given the approximated controller. A number of other recent extensions and relevant (open) discussion points will also be touched upon. And more generally, we hope to stimulate further interest in this topic.

## Divide-and-conquer Sequential Monte Carlo

Alexandre Bouchard-Côté (joint work with Fredrik Lindsten, Adam Johansen, Christian Naesseth, Bonnie Kirkpatrick, Thomas Schön, and John Aston)

I will describe Divide-and-Conquer Sequential Monte Carlo (D&C SMC), a method for performing inference on a collection of auxiliary distributions organized into a tree. In contrast to standard SMC samplers, D&C SMC exploits multiple populations of weighted particles, while still being an exact approximate method. D&C SMC is applicable to a broad class of probabilistic graphical models, including models with loops. I will provide several examples of tree-structured auxiliary distributions, including one derived from a regular lattice undirected model, and one derived from a hierarchical Bayesian model. These examples are constructed using a simple recipe that applies to certain "self-similar" collections of models. I will also briefly discuss how D&C SMC can take advantage of distributed computing architectures.

## Nested particle filters for online parameter estimation in discrete–time state–space Markov models

Dan Crisan (joint work with Joaquin Miguez)

We propose a new method for approximating the posterior probability distribution of the fixed parameters of a state-space dynamic system using a sequential Monte Carlo method. The method relies on a nested structure that employs two layers of particle filters to approximate the posterior probability measure of the static parameters and the dynamic variables of the system of interest, in a vein similar to the recent “sequential Monte Carlo square” (SMC2) algorithm. However, unlike the SMC2 scheme, the proposed algorithm operates in a purely sequential and recursive manner. In particular, the computational complexity of the recursive steps of the proposed method is constant over time.

## Random weight SMC for Bayesian model comparison with intractable likelihoods

Richard Everitt

Markov random field models are used widely in computer science, statistical physics and spatial statistics and network analysis. However, Bayesian analysis of these models using standard Monte Carlo methods is not possible due to their intractable likelihood functions. Several methods have been developed that permit exact, or close to exact, simulation from the posterior distribution. However, estimating the evidence and Bayes' factors (BFs) for these models remains challenging in general. This talk describes new random weight importance sampling and sequential Monte Carlo methods for estimating BFs that use simulation to circumvent the evaluation of the intractable likelihood, and compares them to existing methods. In some cases we observe an advantage in the use of biased weight estimates; an initial investigation into the theoretical and empirical properties of this class of methods is presented.This is joint work with Adam Johansen, Ellen Rowing and Melina Evdemon-Hogan.

## Particle Metropolis adjusted Langevin algorithms

Paul Fearnhead (joint work with Chris Nemeth and Chris Sherlock)

Particle MCMC has recently been introduced as a class of algorithms that can be used to analyse state-space models. They use MCMC moves to update the parameters of the model, and particle filters to both propose values for the latent state and to obtain estimates of the posterior density that are used to calculate the acceptance probability. For many applications it is easy to adapt the particle filter so that it also gives an estimate of the gradient of the log-posterior, and this estimate can then be used within the proposal distribution for the parameters. This results in a particle version of a Metropolis adjusted Langevin algorithm (particle MALA).

Here I will present results on the theoretical properties of particle MALA under standard asymptotics, which correspond to an increasing dimension of the parameters. Our results show that the behaviour of particle MALA depends crucially on how accurately we can estimate the gradient of the log-posterior. If the error in the estimate of the gradient is not controlled sufficiently well as we increase dimension then asymptotically there will be no advantage in using particle MALA over a particle MCMC algorithm using a random-walk proposal. However if the error is well-behaved, then particle MALA has the same advantage over particle MCMC using a random-walk proposal that standard MALA has over random-walk MCMC. Furthermore, we show that asymptotically the optimal acceptance rate is 15.47% and that we should tune the number of particles so that the variance of our estimate of the log-posterior is roughly 3.

http://arxiv.org/pdf/1412.7299v1

## Pseudo-Marginal Monte Carlo optimisation

Axel Finke (joint work with Adam M. Johansen)

We extend existing Monte Carlo schemes for performing optimisation in latent-variable settings, i.e. in situations in which the objective function is intractable. This often occurs when performing (marginal) maximum-likelihood or (marginal) maximum-a-posteriori estimation. To this end, we present a flexible framework for combining the SAME algorithm from Doucet, Godsill & Robert (2002) with state-of-the-art MCMC kernels such as pseudo-marginal MCMC or conditional SMC kernels. We also construct population-based approaches by incorporating these ideas into SMC samplers.

## Sequential Quasi-Monte Carlo

Mathieu Gerber (joint work with Nicolas Chopin)

We derive and study SQMC (Sequential Quasi-Monte Carlo), a class of algorithms obtained by introducing QMC point sets in particle filtering. SQMC is related to, and may be seen as an extension of, the array-RQMC algorithm of L'Ecuyer et al. (2006). The complexity of SQMC is $O(N\log N)$, where N is the number of simulations at each iteration, and its error rate is smaller than the Monte Carlo rate $O_P(N^{-1/2})$. The only requirement to implement SQMC is the ability to write the simulation of particle $x_t^n$ given $x_{t-1}^n$ as a deterministic function of $x_{t-1}^n$ and a fixed number of uniform variates. We show that SQMC is amenable to the same extensions as standard SMC, such as forward smoothing, backward smoothing, unbiased likelihood evaluation, and so on. In particular, SQMC may replace SMC within a PMCMC (particle Markov chain Monte Carlo) algorithm. We establish several convergence results. We provide numerical evidence that SQMC may significantly outperform SMC in practical scenarios.

## Multilevel SMC samplers

Ajay Jasra (joint work with Alex Beskos, Kody Law, Raul Tempone and and Yan Zhou)

In this talk we consider the approximation of expectations w.r.t. probability distributions associated to the solution of partial differential equations (PDEs); this scenario appears routinely in Bayesian inverse problems. In practice, one often has to solve the associated PDE numerically, using, for instance finite element methods and leading to a discretisation bias, with the step-size level h_L. In addition, the expectation cannot be computed analytically and one often resorts to Monte Carlo methods. In the context of this problem, it is known that the introduction of the multilevel Monte Carlo (MLMC) method can reduce the amount of computational effort to estimate expectations, for a given level of error. This is achieved via a telescoping identity associated to a Monte Carlo approximation of a sequence of probability distributions. In many practical problems of interest, one cannot achieve an i.i.d. sampling of the associated sequence of probability distributions. A sequential Monte Carlo (SMC) version of the MLMC method is introduced to deal with this problem. It is shown that under appropriate assumptions, the attractive property of a reduction of the amount of computational effort to estimate expectations, for a given level of error, can be maintained within the SMC context.

## Markov interacting importance samplers

Robert Kohn (joint work with Eduardo F. Mendes and Marcel Scharth)

We introduce a new Markov chain Monte Carlo (MCMC) sampler called the Markov Interacting Importance Sampler (MIIS). The MIIS sampler uses conditional importance sampling (IS) approximations to jointly sample the current state of the Markov Chain and estimate conditional expectations, possibly by incorporating a full range of variance reduction techniques. We compute Rao-Blackwellized estimates based on the conditional expectations to construct control variates for estimating expectations under the target distribution. The control variates are particularly efficient when there are substantial correlations between the variables in the target distribution, a challenging setting for MCMC. An important motivating application of MIIS occurs when the exact Gibbs sampler is not available because it is infeasible to directly simulate from the conditional distributions. In this case the MIIS method can be more efficient than a Metropolis-within-Gibbs approach. We also introduce the MIIS random walk algorithm, designed to accelerate convergence and improve upon the computational efficiency of standard random walk samplers. Simulated and empirical illustrations for Bayesian analysis show that the method significantly reduces the variance of Monte Carlo estimates compared to standard MCMC approaches, at equivalent implementation and computational effort.

## Statistical inference for oscillation processes

Sylvain Le Corff

In this talk, a new model for time series with a specific oscillation pattern is proposed. This model consists of a hidden phase process and a nonparametric curve characterizing the pattern f leading together to a generalized hidden Markov model. The identifiability is established under mild assumptions on the pattern f. In the case of time-varying amplitude and baseline a Rao Blackwellised SMC procedure based on fixed lag particle smoother is introduced in a nonparametric EM procedure to estimate f. The performance of this method is assessed using human electrocardiogram recordings.

## Variance estimation and asymptotically optimal allocation in the particle filter

Anthony Lee (joint work with Nick Whiteley)

Particle filters provide approximations of marginal likelihoods and filtering expectations in hidden Markov models. However, estimating the Monte Carlo variance of these approximations, without generating multiple independent realizations of the approximations themselves, is not straightforward. We present an unbiased estimator of the variance of the marginal likelihood approximation, and consistent estimators of the asymptotic variance of the approximations of the marginal likelihood and filtering expectations. These estimators are byproducts of a single run of a particle filter and have no added computational complexity or storage requirements. With additional storage requirements, one can also consistently estimate higher-order terms in the non-asymptotic variance. When the number of particles is allowed to vary over time, we show how one can approximate their asymptotically optimal allocation.

# Two simple schemes for the parallelisation of particle filters

Joaquin Miguez (joint work with Katrin Achutegui, Dan Crisan, Gonzalo Ríos and Manuel Vázquez)

Considerable effort has been devoted in the past decade to the design of schemes for the parallel, or distributed, implementation of particle filters, which are needed in many practical applications. In the talk we discuss two approaches. The first one is possibly the simplest practical scheme, which consists in splitting the computational budget into M fully independent particle filters with N particles each, and then obtaining the desired estimators by averaging over the M independent outcomes of the filters. This approach minimises the parallelisation overhead yet displays some desirable theoretical properties. In particular, it can be proved that the mean square estimation error (of integrals with respect to the filter measure) vanishes asymptotically with the same rate, proportional to 1/MN, as the standard (centralised) particle filter with the same total number of particles. A time-error index to quantify the improvement in running times, and to compare schemes with different degrees of parallelisation, is also described. The second approach involves the interaction of the M separate filters by periodically exchanging small subsets of particles. We also present results regarding the asymptotic convergence, uniform over time, of this algorithm and discuss some sufficient conditions needed to guarantee it. Finally, we present illustrative computer simulation examples and use the two schemes to discuss the trade-off between performance and computational overhead in the parallelisation of particle filters.

## Efficient sampling of log-concave distributions over high-dimensional spaces

Eric Moulines

Sampling over high-dimensional space has become a prerequisite in the applications of Bayesian statistics to machine learning problem. In many situations of interest, the log-posterior distribution is concave. The likelihood part is generally smooth and gradient Lipshitz while the prior is concave but typically not smooth (the archetypical problem is the LASSO or the elastic-net penalty, but many other problems can be cast into this framework). We will describe methods to sample such distributions, which are adapted from the state-of-the-art optimization procedures which have been developed in this context. We will also provide convergence in Wasserstein distance to the equilibrium, showing explicitly the dependence in the dimension of the parameter space and the sparsity (effective dimension of the model). Several examples will be presented to illustrate our results.

## Sequential Monte Carlo on High-Performance Hardware Using LibBi

Lawrence Murray

LibBi (<www.libbi.org>) is a software package for Sequential Monte Carlo on modern computer hardware, including multi-core central processing units (CPUs), many-core graphics processing units (GPUs) and distributed-memory clusters of such devices. The software includes a domain-specific language for specifying a model, and a compiler for generating code that will run a particular method with this model on a chosen hardware platform. This talk will provide an introduction to the LibBi software, and an update on its latest developments.

## Nested Sequential Monte Carlo Methods

Christian Anderson Naesseth (joint work with Fredrik Lindsten and Thomas B. Schön)

In this talk I present nested sequential Monte Carlo (NSMC), a methodology to sample from sequences of probability distributions, even where the random variables are high-dimensional. NSMC generalises the SMC framework by requiring only approximate, so called properly weighted, samples from the SMC proposal distribution, while still resulting in a correct SMC algorithm. Furthermore, NSMC can in itself be used to produce such properly weighted samples. Consequently, one NSMC sampler can be used to construct an efficient high-dimensional proposal distribution for another NSMC sampler, and this nesting of the algorithm can be done to an arbitrary degree. These properties allow us to consider complex and high-dimensional models using SMC. I will illustrate the merits of the method on a climatological model for drought prediction, with a 1056-dimensional state space.

Related publication: http://arxiv.org/abs/1502.02536

## The intrinsic dimension of importance sampling

Omiros Papaspiliopoulos

We study importance sampling and particle filters in high dimensions and link the collapse of importance sampling to results about absolute continuity of measures in Hilbert spaces and the notion of effective dimension.

## Efficient implementation of Markov chain Monte Carlo when using an unbiased likelihood estimator

Michael Pitt

When an unbiased estimator of the likelihood is used within a Metropolis–Hastings chain, it is necessary to trade off the number of Monte Carlo samples used to construct this estimator against the asymptotic variances of averages computed under this chain. Many Monte Carlo samples will typically result in Metropolis–Hastings averages with lower asymptotic variances than the corresponding Metropolis–Hastings averages using fewer samples. However, the computing time required to construct the likelihood estimator increases with the number of Monte Carlo samples. Under the assumption that the distribution of the additive noise introduced by the log-likelihood estimator is Gaussian with variance inversely proportional to the number of Monte Carlo samples and independent of the parameter value at which it is evaluated, we provide guidelines on the number of samples to select. We demonstrate our results by considering a stochastic volatility model applied to stock index returns.

## Stability of the optimal filter in continuous time: beyond the Beneš filter

Sylvain Rubenthaler

We exhibit a case in which the optimal filter is stable with respect to its initial condition. The computations are such that this result will allow to prove the stability of an associated particle filter. The set of assumptions we propose is a neighbourhood of the Beneš filter. Observations in continuous time could be understood as "arbitrarily frequent observations". The proof is based on a decomposition which takes us back to optimal filters with observations which are not continuous in time.

## Bayesian smoothing of neural sources in MEG/EEG

Alberto Sorrentino

Estimating brain activity from Magneto/Electro-encephalographytime series can be reduced to estimating the location, orientation and strengths of a small ($<10$) but unknown number of current dipoles. In the last decade, particle filters have been proposed as a viable approach for tracking the time-varying posterior distribution of these parameters. However, particle filters tend to provide poor estimates ofthe sources at their first appearance, because the signal strength is relatively low at the source onset. Therefore, for off-line inference purposes there may be benefit in exploiting information from the subsequent time points by studying the smoothing distribution, rather than the filtering distribution. In this talk I will discuss a slight modification of the two-filter smoothing that we have used to improvethe localization of the sources at their first appearance. I will show analysis on synthetic data and application to experimental data recorded from an epileptic subject.

## Particle filtering in high dimension

Ramon van Handel

The challenge of designing particle filters that (provably)perform efficiently in high dimension remains largely open. In this talk, I will try to identify the fundamental difficulty of the problem, propose a general program to surmount this difficulty, and explain to what extent this program has been realized. I will also try to clarify some basic misconceptions that appear in the recent literature on this topic.

## Fluctuations, stability and instability of a distributed particle filter with local exchange

Nick Whiteley (joint work with Kari Heine)

We study a distributed particle filter proposed by Bolic et al. (IEEE Trans. Sig. Proc. 2005). This algorithm involves m groups of M particles, with interaction between groups occurring through a "local exchange" mechanism. We establish a central limit theorem in the regime where $M$ is fixed and $m$ grows. A formula we obtain for the asymptotic variance can be interpreted in terms of colliding Markov chains, enabling analytic and numerical evaluations of how the asymptotic variance behaves over time, with comparison to a benchmark algorithm consisting of m independent particle filters. Subject to regularity conditions, when $m$ is fixed both algorithms converge time-uniformly at rate $M^{-1/2}$. Through use of our asymptotic variance formula we give counter-examples satisfying the same regularity conditions to show that when $M$ is fixed neither algorithm, in general, converges time-uniformly at rate $m^{-1/2}$.

http://arxiv.org/abs/1505.02390

 Online user: 1