Update a heavy traffic system with minimum or zero downtime
How to update a live, high-traffic system with minimal or zero downtime is a recurring concern. Below I discuss a few procedures that can be used to update a live site with minimal or no downtime.
- If the services are exposed through a load balancer such as Nginx, you can take one node at a time out of the cluster and update the service on it. The load balancer sends traffic to the remaining available nodes. Repeat this process until every node in the cluster is upgraded.
- If the database is shared between nodes and the upgrade includes a breaking change, such as dropping a table column, the new features need to be implemented in a backward-compatible way, so that nodes still running the old code do not fail during the window in which the nodes are being upgraded. This might require:
  - Implementing feature toggles at the application level
  - Maintaining multiple code versions
  - Running some data migration after the upgrade
- Keeping the application backward compatible in this way can increase its technical debt over time.
- Consider using a tool like Liquibase to version the database schema.
- With a blue-green deployment, you run the current production nodes (blue) alongside a production-like set of upgraded nodes (green). You point the load balancer to send part of the traffic to the green nodes until you are comfortable with the upgrade, then move all traffic to the new nodes over time.
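
The gradual blue-green shift described in the last bullet can be sketched as a small simulation. This is only an illustration under stated assumptions, not a real load balancer configuration: the names `pick_pool` and `shift_traffic`, the weight steps, and the `is_green_healthy` callback are all hypothetical, and a real setup would change weights in the load balancer itself and read health from monitoring.

```python
import random
from collections import Counter


def pick_pool(green_weight, rng):
    """Route one request: 'green' with probability green_weight, else 'blue'."""
    return "green" if rng.random() < green_weight else "blue"


def shift_traffic(steps, is_green_healthy, requests_per_step=1000, seed=0):
    """Raise the green share step by step; send everything back to blue on failure.

    steps is a list of green weights in increasing order, e.g. [0.1, 0.5, 1.0].
    is_green_healthy is a hypothetical callback standing in for real monitoring.
    Returns the final green weight: the last step reached, or 0.0 after a rollback.
    """
    rng = random.Random(seed)
    current = 0.0
    for weight in steps:
        # Simulate a batch of requests routed at this weight.
        counts = Counter(pick_pool(weight, rng) for _ in range(requests_per_step))
        print(f"green weight {weight:.0%}: {dict(counts)}")
        if not is_green_healthy(weight):
            print("green failed its check, sending all traffic back to blue")
            return 0.0
        current = weight
    return current


if __name__ == "__main__":
    final = shift_traffic([0.1, 0.5, 1.0], is_green_healthy=lambda w: True)
    print("final green weight:", final)
```

Keeping the blue nodes running until the last step is what makes the rollback cheap: going back is just a weight change, not a redeploy.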
Modern architectures that are based on microservices and have an API gateway in front are easier to upgrade with the techniques above, because the changesets are smaller, which helps achieve zero or minimal downtime.
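
To make the node-by-node upgrade from the first bullet concrete, here is a minimal sketch. It models the cluster in memory only: the `Node` class, the `health_check` placeholder, and `rolling_upgrade` are hypothetical names, and the deploy step is a stand-in for whatever actually installs the new version. A real script would instead mark the node as out of rotation in the load balancer and call the service's own health endpoint.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    name: str
    version: str
    in_rotation: bool = True  # True while the load balancer routes to this node


def health_check(node: Node, target_version: str) -> bool:
    """Placeholder check; a real one would call the node's health endpoint."""
    return node.version == target_version


def rolling_upgrade(nodes: List[Node], target_version: str) -> None:
    """Upgrade nodes one at a time so the remaining nodes keep serving traffic."""
    for node in nodes:
        others_serving = [n for n in nodes if n is not node and n.in_rotation]
        if not others_serving:
            raise RuntimeError("refusing to drain the last serving node")
        node.in_rotation = False       # load balancer stops routing here
        node.version = target_version  # stand-in for the real deploy step
        if not health_check(node, target_version):
            # Leave the node out of rotation and stop before touching the rest.
            raise RuntimeError(f"{node.name} failed its health check; stopping")
        node.in_rotation = True        # put the node back behind the balancer
        print(f"{node.name} upgraded to {target_version}")


if __name__ == "__main__":
    cluster = [Node("node-1", "v1"), Node("node-2", "v1"), Node("node-3", "v1")]
    rolling_upgrade(cluster, "v2")
```

Stopping at the first failed health check limits the damage to a single node, which is already out of rotation, while the others keep serving the old version.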