Adaptation Schemes for IMA
I have been reading about adaptation schemes, since my original scheme has some major problems (as has been discussed before) and I am trying to be somewhat less naive in my approach. Interestingly enough, my big issue at the moment is coming up with an intelligent way to do something called “diminishing adaptation”, whereby the adaptations become less and less severe as time goes on.
One simple approach, which I have a particular problem with based on my own experience, is simply taking the covariance matrix of the chain's values up to this point. As I have already seen, degeneracy can happen in practice, even though there is a theoretical result in a preprint on Cambridge’s MCMC preprint server which discusses the fact that the eigenvalues of the covariance matrix will not become degenerate under certain conditions. I am studying this to see how I may leverage the result and to understand why I was having the problem before. Yet, as I have seen, chains may fail to mix in the initial 50 points, which would then (if I immediately adapted based on those 50 points) cause my candidate distribution to collapse.
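As a rough illustration of guarding against that collapse, the sketch below refuses to adapt until the chain has a minimum number of points, and it adds a small ridge to the diagonal so that no eigenvalue can reach zero. The function name `safe_covariance`, the threshold `min_points`, and the ridge size `eps` are all assumptions made for illustration; none of them come from the preprint or from the sampler discussed here.

```python
import numpy as np

def safe_covariance(samples, min_points=200, eps=1e-6):
    """Regularized sample covariance, or None if the chain is too short.

    min_points and eps are illustrative assumptions, not tuned values.
    """
    samples = np.asarray(samples, dtype=float)
    n, d = samples.shape
    if n < min_points:
        # Too little history: keep the current candidate distribution.
        return None
    cov = np.atleast_2d(np.cov(samples, rowvar=False))
    # The ridge keeps every eigenvalue at or above eps, even for a stuck chain.
    return cov + eps * np.eye(d)
```

With a chain that never moves, the raw covariance is exactly zero, yet the returned matrix still has strictly positive eigenvalues, so the candidate distribution cannot collapse to a point.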
I am considering two possible schemes for adapting the variance-covariance matrix. In either case I want to be able to target a particular mixing rate, and to that end I am keeping track of what the candidate distributions have been, along with their associated mixing rates, in an attempt to use past information to calibrate choices rather than take whatever the latest set of values from the chain happens to be. So my ideas are as follows:
1) Use a multiple of the old covariance matrix based on the new mixing rate, where the multiplier monotonically approaches one from above or below for the cases of too high and too low a mixing rate, respectively.
2) Use the mean (arithmetic? geometric? hyper-geometric?) of the eigenvalues of the old and new covariance matrices.
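The two ideas above can be sketched roughly as follows. This assumes that "mixing rate" means the acceptance rate of the chain, that the step size decays as `n_adapt ** (-decay)`, and that old and new eigenvalues are paired by rank (smallest with smallest). The names `scale_multiplier` and `blend_eigenvalues` and the parameter `decay` are hypothetical, and the rebuilt matrix simply reuses the newest eigenvectors.

```python
import numpy as np

def scale_multiplier(accept_rate, target_rate, n_adapt, decay=0.6):
    """Idea 1: multiplier for the old covariance matrix.

    Above 1 when the acceptance rate is too high, below 1 when it is too
    low, and closer to 1 as the adaptation count n_adapt (>= 1) grows.
    """
    gamma = n_adapt ** (-decay)  # diminishing step size (assumed form)
    return float(np.exp(gamma * (accept_rate - target_rate)))

def blend_eigenvalues(old_cov, new_cov, kind="geometric"):
    """Idea 2: average old and new eigenvalues, keep the new eigenvectors."""
    old_vals = np.linalg.eigvalsh(old_cov)        # ascending order
    new_vals, new_vecs = np.linalg.eigh(new_cov)  # ascending order
    # Clip tiny negative values caused by round-off before taking roots.
    old_vals = np.clip(old_vals, 0.0, None)
    new_vals = np.clip(new_vals, 0.0, None)
    if kind == "geometric":
        vals = np.sqrt(old_vals * new_vals)
    else:
        vals = 0.5 * (old_vals + new_vals)
    # Rebuild V diag(vals) V^T from the newest eigenvectors.
    return (new_vecs * vals) @ new_vecs.T
```

One thing the sketch makes visible is that the geometric mean is dragged to zero by a single degenerate eigenvalue, whereas the arithmetic mean is not, which may matter given the collapse problem described earlier.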
In either case I am unsure how to adapt the eigenvectors, as they play a substantial role in making the candidate distribution match up with the posterior (which is exactly why we would expect better mixing with adaptation!). I am thinking that I should merely take the most recent set and leave it at that, although it would perhaps be better to use some sort of average set of eigenvectors too.
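Averaging eigenvectors directly is awkward, because each eigenvector is only defined up to sign (v and -v are equally valid), so a naive average can cancel out. One possible workaround, offered here as an assumption rather than as the scheme I will end up using, is to take a weighted average of the covariance matrices themselves and then read the eigenvectors off the result. The name `blend_covariances` and the `weight` parameter are hypothetical.

```python
import numpy as np

def blend_covariances(old_cov, new_cov, weight):
    """Convex combination of two covariance matrices and its eigensystem.

    weight in [0, 1] is the share given to new_cov; shrinking it over time
    would make the adaptation diminish.
    """
    blended = (1.0 - weight) * old_cov + weight * new_cov
    blended = 0.5 * (blended + blended.T)  # remove round-off asymmetry
    vals, vecs = np.linalg.eigh(blended)   # eigenvalues in ascending order
    return blended, vals, vecs
```

A convex combination of positive semi-definite matrices is itself positive semi-definite, so the blended eigenvectors form a valid orthonormal set that sits between the old and new orientations.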