A useful metric for judging the efficiency of the algorithm's draws is the acceptance rate. Gelman, Roberts, and Gilks (1996) show that, asymptotically in the dimension of the problem, the optimal acceptance rate for the random-walk Metropolis-Hastings algorithm is around 0.25. Rates between 0.15 and 0.40 still produce results close to optimal efficiency. See Roberts and Rosenthal (2001) for a more detailed treatment of choosing the scale parameters to attain these acceptance rates.
The standard deviation, σ, of the normal candidate (proposal) distribution is critical in determining the acceptance rate: a large σ results in a low acceptance rate, while a small σ results in a high one.
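To see this effect concretely, here is a minimal random-walk Metropolis sketch on a standard normal target. The function name and details are illustrative, not from the post.

```r
# Run a random-walk Metropolis sampler on a N(0,1) target and
# return the observed acceptance rate for a given candidate sd.
mh_accept_rate = function(sigma, n = 10000) {
  x = 0
  accepts = 0
  for (i in 1:n) {
    cand = rnorm(1, mean = x, sd = sigma)
    # Symmetric proposal, so the acceptance ratio is just the
    # ratio of target densities (compared on the log scale).
    if (log(runif(1)) < dnorm(cand, log = TRUE) - dnorm(x, log = TRUE)) {
      x = cand
      accepts = accepts + 1
    }
  }
  accepts / n
}

mh_accept_rate(0.1)  # small steps: high acceptance rate
mh_accept_rate(10)   # large steps: low acceptance rate
```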
Suppose we have a high-dimensional problem with many such σ's; setting their values by hand to achieve good acceptance rates becomes a pain. Here I present one method (there are plenty) to automatically adjust the σ's toward their optimal values.
Here is the function used to adjust the σ's, in R:
f = function(accept, target, k) {
  # endpoint constants chosen so that f(0) = 1/k and f(1) = k
  k = c((1 - 1/k)/(cosh(target) - 1), (k - 1)/(cosh(target - 1) - 1))
  1 + sign(accept - target)*(cosh(accept - target) - 1)*
    k[(sign(accept - target) + 1)/2 + 1]
}
The function expects accept to be in [0, 1], target to be in (0, 1), and k > 1. It looks like this:
It is a piecewise combination of hyperbolic cosine functions: flat near the target and growing roughly exponentially away from it. The flat region (at 0.25 in the picture) is centered at the optimal acceptance rate.
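A quick way to see the behavior at the extremes is to evaluate the function at accept = 0, the target, and 1. The function body below is the one given above; calling it f matches the notation used later in the post.

```r
f = function(accept, target, k) {
  # endpoint constants chosen so that f(0) = 1/k and f(1) = k
  k = c((1 - 1/k)/(cosh(target) - 1), (k - 1)/(cosh(target - 1) - 1))
  1 + sign(accept - target)*(cosh(accept - target) - 1)*
    k[(sign(accept - target) + 1)/2 + 1]
}

f(0, 0.25, 3)     # = 1/3: sigma shrinks by the maximum factor 1/k
f(0.25, 0.25, 3)  # = 1:   sigma is unchanged at the target rate
f(1, 0.25, 3)     # = 3:   sigma grows by the maximum factor k
```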
16 May 2014 edit: Here is a similar function that results in steeper drops when the acceptance rate is below the target; it is also symmetric.
During the burn-in period, calculate the acceptance rate A for each dimension within some window (say, the latest 100 iterations of the algorithm). Then for each σ, let σ_new = σ_current · f(A). This increases σ when a high acceptance rate is observed, hence reducing future acceptance rates; the reverse applies when acceptances are low.
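The burn-in adaptation can be sketched on a one-dimensional toy target as follows, reusing the adjustment function from above (reproduced here as f). The window size, starting σ, and target are illustrative choices, not prescriptions from the post.

```r
f = function(accept, target, k) {
  k = c((1 - 1/k)/(cosh(target) - 1), (k - 1)/(cosh(target - 1) - 1))
  1 + sign(accept - target)*(cosh(accept - target) - 1)*
    k[(sign(accept - target) + 1)/2 + 1]
}

target = 0.25; kmax = 1.5; window = 100
sigma = 50               # deliberately poor starting value
x = 0
acc = numeric(window)    # acceptance indicators for the current window
for (iter in 1:2000) {
  cand = rnorm(1, x, sigma)
  accept = log(runif(1)) < dnorm(cand, log = TRUE) - dnorm(x, log = TRUE)
  if (accept) x = cand
  acc[(iter - 1) %% window + 1] = accept
  if (iter %% window == 0)      # end of a window: rescale sigma
    sigma = sigma * f(mean(acc), target, kmax)
}
sigma  # should have drifted toward a value giving roughly 25% acceptance
```

In a multi-parameter model, the same update is applied to each dimension's σ separately using that dimension's windowed acceptance rate.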
The parameter k controls the maximum change made to the candidate distribution's standard deviation: in the function I provided, the most σ can increase by in one update is a factor of k, and the most it can decrease by is a factor of 1/k.
Using this method on a Bayesian model with 2,500 parameters, I was able to get every parameter's acceptance rate well within [0.15, 0.40] with a burn-in of 2,000 iterations. The figure shows the acceptance rates over 5,000 post-burn-in draws.