A single corrupted leveldb file can cause all coverage queries to fail

Description

1. On one of our old 0.7 clusters, I recently noticed that VM list always returns 500 (wiggle timeout). I then found that some sniffle nodes keep complaining about no_proc for some hash keys, which is presumably caused by a failure to open some leveldb files.

2. After running eleveldb:repair on those db files, VM list works in most cases.

So, if my diagnosis above is correct, the high availability of the cluster could be improved by guarding against this failure mode.
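
The suspected failure mode can be illustrated with a toy Python model. This is only a sketch and not sniffle or riak_core code: the names `VnodeDown`, `fetch`, `strict_coverage`, `tolerant_coverage`, and the partition layout are all hypothetical. It assumes a coverage query visits every partition, and that a partition whose leveldb file cannot be opened raises an error (standing in for no_proc).

```python
from typing import Callable, List, Tuple


class VnodeDown(Exception):
    """Toy stand-in for a no_proc error from a partition whose backend failed to open."""


# Hypothetical data: three partitions, partition 1 has a corrupted db file.
DATA = {0: ["vm-a"], 1: ["vm-b"], 2: ["vm-c"]}
BROKEN = {1}
PARTITIONS = sorted(DATA)


def fetch(partition: int) -> List[str]:
    """Return the VMs stored on one partition, or raise if its backend is broken."""
    if partition in BROKEN:
        raise VnodeDown(f"partition {partition} is unavailable")
    return DATA[partition]


def strict_coverage(partitions: List[int],
                    fetch_fn: Callable[[int], List[str]]) -> List[str]:
    """Query every partition; the first failure aborts the whole query."""
    results: List[str] = []
    for p in partitions:
        results.extend(fetch_fn(p))  # an exception here propagates to the caller
    return results


def tolerant_coverage(partitions: List[int],
                      fetch_fn: Callable[[int], List[str]]
                      ) -> Tuple[List[str], List[int]]:
    """Query every partition; collect failures instead of aborting."""
    results: List[str] = []
    failed: List[int] = []
    for p in partitions:
        try:
            results.extend(fetch_fn(p))
        except VnodeDown:
            failed.append(p)  # remember which partitions could not answer
    return results, failed
```

In this model, `strict_coverage(PARTITIONS, fetch)` raises `VnodeDown` because of the single broken partition, while `tolerant_coverage(PARTITIONS, fetch)` returns the VMs from the healthy partitions together with the list of failed ones. Whether partial results, or a retry against another replica, would be acceptable for VM list is a design question this sketch does not answer.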

Environment

None

Status

Assignee

Heinz N. Gies

Reporter

刘振

Labels

None

Components

Affects versions

Priority

Low