1. On one of our old 0.7 clusters, I recently noticed that VM list always returns 500 (wiggle timeout). I then found that some sniffle nodes keep complaining about no_proc for some hash keys, which I believe is caused by a failure to open some leveldb files.
2. After running eleveldb:repair on those db files, VM list works in most cases.
So I am thinking: if what I described above is correct, the high availability of the cluster could be improved by guarding against this failure mode.