<p>At Subskribe, our velocity of feature development is high. Our development philosophy follows the model of Continuous Integration and Deployment (CI/CD). Ideally, when releasing a feature to a specific stage of our environment, we want to isolate that release from the release of any other feature. This enables us to make functionality available only when it has cleared our quality bar. To facilitate this isolation, we recently introduced Feature Flags into our environment, leveraging the <a href="https://docs.aws.amazon.com/appconfig/latest/userguide/what-is-appconfig.html" rel="nofollow">AppConfig service from AWS</a>.</p>
<p>The concept of Feature Flags has been extensively covered elsewhere, so we will not go into depth here. <a href="https://martinfowler.com/articles/feature-toggles.html" rel="nofollow">This article</a> is a good starting point. Note that it covers use cases more complex than what we are targeting at Subskribe. At a high level, Feature Flags provide the ability to dynamically enable and disable application functionality at runtime. This allows code to be deployed as part of our continuous release methodology and then toggled on or off in different environments.</p>
Design
AWS AppConfig
A few options were considered, including other commercial solutions as well as rolling our own from scratch. Ultimately we decided to use AWS AppConfig. There were a number of reasons:
Frontend and Backend Support
The Subskribe application uses Node.js, TypeScript, React, and GraphQL for its frontend and Java for its backend. Our approach therefore needs to support querying whether a flag is enabled from both TypeScript hosted in Node.js and Java code running in the JVM.
While AWS AppConfig exposes an API that can be called from Node.js and TypeScript, we want our frontend and backend to have a consistent view of which features are enabled. For that reason, we decided to expose a GraphQL query from our Java backend that returns the flag values held by the backend. The backend handles all queries to the AppConfig API.
Data Fetching and Caching
Feature Flag settings are stored both locally, in configuration files deployed with the Subskribe application, and remotely, in the AWS AppConfig service. This allows developers to change settings locally without depending on AWS. That said, if a feature has a setting in both a local configuration file and AWS AppConfig, the AWS setting takes precedence (whether enabled or disabled). This lets us toggle a feature on or off irrespective of the configuration settings deployed with the application.
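The precedence rule above can be sketched in a few lines of Java. This is a minimal illustration, not our production code: the `FlagResolver` class, its two maps, and the choice to treat unknown flags as disabled are all assumptions made for the example.

```java
import java.util.Map;

/**
 * Hypothetical resolver illustrating the precedence rule: a value from the
 * remote store (AWS AppConfig), when present, overrides the local file value.
 */
final class FlagResolver {
    private final Map<String, Boolean> localFlags;   // values from the deployed config file
    private final Map<String, Boolean> remoteFlags;  // values fetched from the remote store

    FlagResolver(Map<String, Boolean> localFlags, Map<String, Boolean> remoteFlags) {
        this.localFlags = Map.copyOf(localFlags);
        this.remoteFlags = Map.copyOf(remoteFlags);
    }

    boolean isEnabled(String feature) {
        Boolean remote = remoteFlags.get(feature);
        if (remote != null) {
            return remote; // the remote setting wins whether it is true or false
        }
        // Assumption: a flag defined nowhere is treated as disabled.
        return localFlags.getOrDefault(feature, false);
    }
}
```

With a local file that enables a flag and a remote setting that disables it, `isEnabled` returns `false`, which is what allows a feature to be switched off without redeploying the configuration file.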
At application start, the flag values stored in AWS must be read successfully. Once the application is up and running, the behavior is more forgiving: if a call to AWS to fetch the Feature Flag configuration fails (after some retries), we simply log an error and return the old, cached value. Subsequent queries of the feature flag values from Java or UI code will trigger new attempts to fetch the configuration from AWS.
Why do we require the flag values stored in AWS to be successfully read by the application on startup, rather than falling back to the config files? Because we view AWS AppConfig as the source of truth for this data. If we relied on the local config file values in the face of AWS download failures, we would either need to keep the local config values continually updated (which would defeat the purpose of dynamic flag settings) or live with an inconsistent set of flags whenever we had a hiccup contacting AWS on application startup.
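The startup and runtime behavior described above could look roughly like the following. This is a sketch under stated assumptions: `CachedFlagSource` is a hypothetical name, the `Supplier` stands in for the real call to AWS, and the example re-fetches on every query for simplicity, whereas a real implementation would presumably refresh only when the cached value is stale.

```java
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical cache: strict at startup, tolerant of fetch failures afterwards. */
final class CachedFlagSource {
    private final Supplier<Map<String, Boolean>> fetcher; // stand-in for the AWS call; throws on failure
    private final int maxAttempts;
    private volatile Map<String, Boolean> cached;

    CachedFlagSource(Supplier<Map<String, Boolean>> fetcher, int maxAttempts) {
        this.fetcher = fetcher;
        this.maxAttempts = maxAttempts;
        // At startup there is no cached value to fall back on, so a failure propagates.
        this.cached = fetchWithRetries();
    }

    private Map<String, Boolean> fetchWithRetries() {
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return Map.copyOf(fetcher.get());
            } catch (RuntimeException e) {
                last = e; // remember the failure and try again
            }
        }
        throw new IllegalStateException("flag fetch failed after " + maxAttempts + " attempts", last);
    }

    /** Each query triggers a new fetch attempt; on failure the old value is kept. */
    boolean isEnabled(String feature) {
        try {
            cached = fetchWithRetries();
        } catch (IllegalStateException e) {
            System.err.println("Flag refresh failed, serving cached values: " + e.getMessage());
        }
        return cached.getOrDefault(feature, false);
    }
}
```

The asymmetry is deliberate: the constructor lets the exception escape so the application does not start with flags that may disagree with the source of truth, while `isEnabled` swallows it and keeps serving the last known good values.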
API Design
While the AppConfig library has an easy-to-use API, we wanted something simpler for our backend and frontend developers to query. As such, we built a very simple Features class which provides an interface to query whether a specific feature is enabled, abstracting away calls to AWS as well as lookups to any internal configuration.
As an implementation detail, we created a wrapper class around AWS’s AppConfigDataClient that abstracts away the loading, caching, and fetching that was described above.
To query from application Java code:
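A minimal sketch of such a query, assuming a hypothetical shape for the `Features` class: the `Feature` enum, the `isEnabled` method name, and the `InvoiceService` caller are illustrative only and are not taken from the Subskribe codebase.

```java
import java.util.Set;

/** Hypothetical feature identifiers; the real flag names are not listed in this post. */
enum Feature { NEW_INVOICE_UI, BULK_EXPORT }

/** Hypothetical stand-in for the Features class; the real one consults AWS and local config. */
final class Features {
    private final Set<Feature> enabled;

    Features(Set<Feature> enabled) {
        this.enabled = Set.copyOf(enabled);
    }

    boolean isEnabled(Feature feature) {
        return enabled.contains(feature);
    }
}

/** Example caller that guards a new code path behind a flag. */
final class InvoiceService {
    private final Features features;

    InvoiceService(Features features) {
        this.features = features;
    }

    String render() {
        // The new path runs only when the flag is on; the old path stays the default.
        if (features.isEnabled(Feature.NEW_INVOICE_UI)) {
            return "new invoice view";
        }
        return "legacy invoice view";
    }
}
```

The point of the abstraction is that calling code sees a single boolean question and never needs to know where the answer came from.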
As noted above, we created a GraphQL query so these values can also be retrieved by our UI. The query has a simple definition, can be called using your favorite GraphQL client library, and returns a boolean value.
The block diagram below provides an overview of the architecture we use in the Subskribe application.
Deploying Feature Flag Updates
We ended up defining our flag settings in simple YAML files.
Those files are stored in GitHub. We have a separate deployment pipeline that listens to GitHub for updates to those YAML files; when a change is committed, it makes AWS API calls (via the AWS CLI), pushing the changes to the appropriate environment in AWS.
The following diagram illustrates our different deployment flows.
While our code deployment pipeline can take minutes or hours to work through (depending on various factors), our Feature Flag deployments complete within a few seconds.
Conclusion
We have been working with Feature Flags for a while. They have been successful in reducing the number of deployment issues we see when pushing new functionality. When we have found an issue with a feature in one of our deployment environments, we have been able to quickly disable the offending functionality without needing to roll back or redeploy our code.