Failure Tolerance with the Rafting Algorithm (Kubernetes)

link to visualizer
link to writeup on algorithm

Play with the algorithm (break stuff!) as implemented in etcd.

Why?

Failure resiliency and “uptime” are two hallmarks of good sysops. In addition, many operations-level goals boil down to:

- Raising the mean (average) time between failures (MTBF)
- Lowering the mean (average) time to repair …