Apache ZooKeeper Configuration Management
One of the steps towards building a successful distributed software system is establishing effective configuration management. It is a complex engineering process responsible for planning, identifying, tracking and verifying changes in the software and its configuration as well as maintaining configuration integrity throughout the life cycle of the system.
Let's consider how to store and manage configuration settings of the entire system and its components using Apache ZooKeeper, a high-performance coordination service for distributed applications.
Configuration of a Distributed System
In this article we describe an approach to managing configuration settings for the following types of distributed systems, or a combination of them:
- A server application in a cluster (several instances of the same application are deployed to a clustered environment for load-balancing and/or high-availability support);
- A set of services with different functionality that communicate with each other via a common protocol and together form a custom software platform.
Generally, configuration items could be arranged by scope in the following groups:
- Global: settings that are the same for the entire system in any sub-configuration (system name, company website URL, etc.)
- Environment-specific: settings that may differ between environments such as development, test and production (security settings, database server URLs, backup settings, etc.)
- Service-specific: settings related to the functionality of a particular service (database constants, timeouts, links to external resources, etc.)
- Instance-specific: settings that usually identify a specific instance in a cluster (host, role in the ensemble, recovery options and so on)
However, the groups in this classification do not have clear boundaries. Where a given setting belongs depends on the system's architecture, size and complexity.
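The four scopes above can be thought of as layers that are merged into one effective configuration for each running instance. The following is only a minimal sketch of that idea in plain Python, independent of any storage backend. The setting names, the sample values and the rule that a more specific scope overrides a less specific one are assumptions made for illustration, not something prescribed by ZooKeeper.

```python
from collections import ChainMap

# Hypothetical sample settings, one mapping per scope from the list above.
GLOBAL = {"system.name": "acme-platform", "company.url": "https://example.com"}
ENVIRONMENT = {
    "development": {"db.url": "jdbc:postgresql://dev-db/app", "backup.enabled": False},
    "production": {"db.url": "jdbc:postgresql://prod-db/app", "backup.enabled": True},
}
SERVICE = {"billing": {"request.timeout.ms": 3000}}
INSTANCE = {"billing-1": {"host": "10.0.0.11", "request.timeout.ms": 5000}}


def effective_config(environment, service, instance):
    """Merge the four scopes; the most specific scope wins (assumed rule)."""
    # ChainMap looks keys up left to right, so scopes are listed
    # from most specific (instance) to least specific (global).
    layers = ChainMap(
        INSTANCE.get(instance, {}),
        SERVICE.get(service, {}),
        ENVIRONMENT.get(environment, {}),
        GLOBAL,
    )
    return dict(layers)


config = effective_config("production", "billing", "billing-1")
```

Here `config` contains the global system name, the production database URL and the instance-level timeout, which replaces the service-level default. Keeping each scope in its own mapping means a shared value is defined once instead of being copied into every component's settings.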
The default way to manage settings is to use configuration files, which usually contain both common and component-specific sections. As the system grows in scale and complexity, the volume of unique configuration data increases. At the same time, common configuration entries are copied between different components, and the risk of inconsistency across the system grows. The situation is often aggravated by the presence of several environments (development, test, production, etc.), each of which requires its own runtime configuration for both system-wide and service-specific settings.
The continuously increasing volume and variability of configuration data kept in configuration files makes ensuring its integrity, scalability and security complex and resource-consuming. In this article, we will show how to use Apache ZooKeeper to design centralized configuration storage for distributed systems as an alternative to file-based solutions.


