OpenEBS

OpenEBS is a storage platform that provides persistent and containerized block storage for DevOps and container environments. www.openebs.io

HA vs. DR and “extra” HA for your DB

Principles to prevent cascading failures

In the vein of earlier blogs, this post shares observations built from a couple of decades of helping enterprises build resilient systems, filtered through a lot of listening to Kubernetes and StackStorm users over the last 4–5 years in particular. As such, mileage may vary. I’m learning here and offer this as a way for us all to learn together, so feedback is not just welcome, it is loved and needed.

Here is what I have seen too often, and it is the basis of all the principles I share below. All of us are building systems of systems with many more dependencies, and more change and dynamism, than any one human could possibly fully understand. Whatever we do, we want to make sure we don’t spawn opaque cascading failures.

In short, Don’t Injure Yourself. DIY.

For example, don’t have your automation so intelligent that it knows to kill nodes that are not responding without also looking at why that node might be moving slowly. Maybe — true story — your load has peaked the day after Thanksgiving and by pulling the slow nodes out of the queue you are simply shortening the time before all the other nodes get overwhelmed. These are the things brownouts are made of.
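
To make that concrete, here is a minimal sketch of a remediation check that looks at fleet-wide load before pulling a slow node. Everything in it is hypothetical and only for illustration: the `Node` type, the `nodes_safe_to_evict` function, the 0.8 load ceiling and the one-eviction-per-pass budget are assumptions, not something taken from Kubernetes, StackStorm or any real tool. It also assumes, crudely, that an evicted node's work is spread evenly over the survivors.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    name: str
    responsive: bool
    utilization: float  # fraction of capacity in use, 0.0 to 1.0


def nodes_safe_to_evict(
    nodes: List[Node],
    load_ceiling: float = 0.8,  # assumed threshold, tune for your system
    max_evictions: int = 1,     # assumed per-pass eviction budget
) -> List[str]:
    """Return names of unresponsive nodes that can be pulled without
    pushing the remaining nodes past load_ceiling."""
    slow = [n for n in nodes if not n.responsive]
    candidates = slow[:max_evictions]
    survivors = len(nodes) - len(candidates)
    if not candidates or survivors <= 0:
        return []

    # Crude model: the total load stays the same and is spread evenly
    # across the nodes that remain after eviction.
    total_load = sum(n.utilization for n in nodes)
    projected = total_load / survivors

    # If the survivors would be overwhelmed, the slowness is probably a
    # capacity problem, so evicting nodes would only speed up the brownout.
    if projected > load_ceiling:
        return []
    return [n.name for n in candidates]


# Day-after-Thanksgiving peak: every node is busy, one has stopped responding.
peak = [Node(f"node-{i}", True, 0.75) for i in range(4)]
peak.append(Node("node-4", False, 0.95))

# Quiet day: same unresponsive node, but the fleet has plenty of headroom.
quiet = [Node(f"node-{i}", True, 0.30) for i in range(4)]
quiet.append(Node("node-4", False, 0.90))

peak_result = nodes_safe_to_evict(peak)
quiet_result = nodes_safe_to_evict(quiet)
print(peak_result, quiet_result)
```

In the peak case the check declines to evict anything, because removing the slow node would leave the other four running close to full capacity. On the quiet day the same node is flagged for removal. The point is not the particular thresholds but that the automation asks why the node is slow before it acts.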

So how can you avoid being thrown off the end of your own automation treadmill?

A few hard-learned principles that I draw upon below:

  1. Shift down — tackle failure as close as possible to the failure to limit the risk of injuring yourself with cascading failures
  2. Build every layer — every system — so that it is built to fail
  3. Build every layer — don’t think you have DR when you have HA; don’t think you have HA just because you have one workload that spans clusters
  4. Related to 3 — infrastructure as code. Always. No black boxes. The desired state is in the repo. Yes, the control loop at the center of Kubernetes will make…

Continue Reading the article in MayaData’s Blog

Published in OpenEBS

Written by Evan Powell

Founding CEO of a few companies including StackStorm (BRCD) and Nexenta, and more recently DeepTempo, which builds LogLMs for cyber security collective defense.
