FLUSHALL on Spinnaker's Redis Cache



Should I do a Redis flushall on my Redis database?


Before we go into the details, a good rule of thumb is to never use flushall with Spinnaker. Now let me explain why, and what issues we have seen.

First, it’s important to know what role Redis plays in Spinnaker. Redis has three primary functions:
As a cache for Igor.
As a cache for Clouddriver.
As a queue for Orca.

More Details:

  1. Igor provides a single point of integration with Jenkins, Travis, and Git repositories (Bitbucket, Stash, and GitHub) within Spinnaker.

    Igor keeps track of the credentials for multiple Jenkins and/or Travis hosts and sends events to Echo whenever build information has changed.

  2. Orca is the orchestration engine for Spinnaker. It is responsible for taking a pipeline or task definition and managing the stages and tasks, coordinating the other Spinnaker services.

    Orca pipelines are composed of stages, which in turn are composed of tasks. The tasks of a stage share a common context and can publish to a global context shared across the entire pipeline, allowing multiple stages to coordinate. For example, a bake stage publishes details of the image it creates, which is then used by a deploy stage.

    Orca persists a running execution to Redis.

  3. Many of Spinnaker’s microservices need data from the provider systems quite frequently. To be less taxing, Clouddriver and Igor poll the provider for the necessary information roughly every 30 seconds and store it in Redis. This helps reduce the impact Spinnaker has on the provider system.
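The poll-and-cache idea in item 3 can be sketched roughly as follows. This is a toy model, not Clouddriver's or Igor's actual code: the `PollingCache` class, the `fetch` callback, and the in-memory dictionary standing in for Redis are all assumptions made for illustration, and the refresh happens lazily on read here, whereas the real services poll in the background.

```python
import time
from typing import Callable, Dict, Optional


class PollingCache:
    """Toy stand-in for a service that polls a provider and caches the result."""

    def __init__(
        self,
        fetch: Callable[[], Dict[str, str]],
        interval_seconds: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch                # hypothetical call to the provider
        self._interval = interval_seconds  # roughly 30 seconds in the article
        self._clock = clock                # injectable so the sketch is testable
        self._store: Dict[str, str] = {}   # plays the role of Redis
        self._last_poll: Optional[float] = None
        self.provider_calls = 0            # how often the provider was hit

    def refresh_if_due(self) -> None:
        """Poll the provider only if the interval has elapsed."""
        now = self._clock()
        if self._last_poll is None or now - self._last_poll >= self._interval:
            self._store = dict(self._fetch())
            self._last_poll = now
            self.provider_calls += 1

    def get(self, key: str) -> Optional[str]:
        """Serve reads from the cache, refreshing it first when it is stale."""
        self.refresh_if_due()
        return self._store.get(key)
```

However many reads arrive within one interval, the provider is contacted only once, which is why the cache takes load off the provider system.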

Example Of Why Not To Flushall

We worked with a user running Igor 1.87.4. Their Spinnaker system was not running at the speed it needed to after a rollback, so the user decided to run a flushall on Igor’s Redis store. This completely cleared Igor’s cache, and despite the “disable concurrent pipeline execution” option being checked, this is when the party really started.

[Image: the “disable concurrent pipeline execution” option]

With its cache cleared, Igor reached out and fetched all the Jenkins jobs again, which in turn caused Echo to blast out notifications to Spinnaker to kick off all the corresponding pipelines. In this case, that meant 500 different pipeline executions. With a little luck, we were able to stop nearly all of them before any major issues occurred.
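To see why an empty cache leads to a flood of triggers, here is a deliberately simplified model of the behavior described above. It is not Igor's implementation: the `poll_builds` function, the `job#number` key format, and the Python set standing in for the Redis cache are assumptions made for illustration.

```python
from typing import Dict, List, Set


def poll_builds(latest_builds: Dict[str, int], seen: Set[str]) -> List[str]:
    """Return the jobs whose latest build is not yet recorded in the cache.

    `latest_builds` maps a job name to its most recent build number.
    `seen` plays the role of the Redis-backed record of builds already handled.
    """
    events = []
    for job, number in sorted(latest_builds.items()):
        key = f"{job}#{number}"  # hypothetical key format
        if key not in seen:
            seen.add(key)
            events.append(job)   # in Spinnaker this would become a trigger event
    return events


jobs = {f"job-{i}": 1 for i in range(500)}
cache: Set[str] = set()

first = poll_builds(jobs, cache)        # initial poll: every build is unseen
steady = poll_builds(jobs, cache)       # nothing changed, so no events
cache.clear()                           # stand-in for FLUSHALL
after_flush = poll_builds(jobs, cache)  # every build looks new again

print(len(first), len(steady), len(after_flush))  # prints: 500 0 500
```

In this model the cache is the only memory of which builds were already processed, so wiping it makes old builds indistinguishable from new ones.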


