Part of keeping your UrbanCode Deploy infrastructure healthy is ensuring that you have appropriate cleanup policies in place. One challenge here is identifying what your corporation's actual retention policies are, and establishing the ultimate source of truth for deployable artifact information. A key to retaining the right data is to first understand what makes up your meaningful data.
One problem that I find in working across various organizations comes back to a simple truth: no one actually knows what they “have to keep” and what the audit requirements actually are. In this case we want to identify the dimensions of a normal audit. For a simplified example, think in terms of the deployable artifacts, the process run, the separation of duties/authorizations, and the variables/properties used to drive the deployment process. This is the standard what was deployed, when it was deployed, how it was deployed, and who deployed it. This article focuses on the what, and on how you manage the retention of deployables in relation to where they have been deployed.
In even a fairly small continuous integration and continuous deployment pipeline, you could be generating hundreds of builds a day and performing dozens of deployments a day as well, if not one deployment per build. The number of meaningful builds tends to drop off dramatically, from the available CI builds down to those build artifacts which deployed successfully and met some level of validation. As we continue further along the delivery pipeline towards production, the subset of deployed artifacts that is meaningful becomes even smaller.
UrbanCode Deploy understands this concept and provides a mechanism to configure these settings in a meaningful way. This configuration is available at three levels: the System, each Component, and each target Environment. The System setting is not combined with the Component and Environment settings; it only provides the default value for Components where no value is set. The least aggressive setting (the one that retains the most) wins, so it is important that you do not set the System and Component level defaults too high, and that you focus on a strategy that keeps the number of artifacts you must retain from exploding as more and more teams come on board. This means that you should really only be focusing on understanding your retention policies for your production environments. If anything more than that is being asked for, it is worth understanding what the actual requirements are to see why this is the case.
The key concepts here are that UrbanCode Deploy is capturing information about each version of each component as well as capturing the information about each version that has been deployed to an application environment. These two pieces of information form the basis of how we can analyze what is eligible for archival/deletion.
Archival of Component Versions
At the system level we specify an archive path, which is where artifacts are moved before being deleted from the system. The default behavior is to archive component versions: when an archive path is defined, a version's artifacts are first archived, then deleted from the file system, and its metadata is deleted from the database. Note that this is a one-way archive, so while it stores a copy of the artifacts, they will no longer be accessible or restorable through the tool (that is, without a full system restore to a date prior to archival). The archive process consists of creating a simple zip file, <component_name>_<version_name>.zip, of all the files that make up the component version and saving it to the archive path.
This gives you a long-term storage option for artifacts that you are not comfortable deleting. One additional consideration is that you can and should put the archive path on another storage device; otherwise you will never see any space returned to the system, even when components are archived and deleted.
The high-level concepts are designed to be simple enough to avoid a lot of complexity but powerful enough to meet enterprise customer needs. In any event, it is helpful to take a look through the process flow to get a complete understanding of what is going on here. A key to keeping your UrbanCode Deploy servers healthy is to understand how to clean up older, unwanted data you no longer need. One aspect of this is to ensure that you are not keeping too many versions. It is easy to see how a few misconfigurations in this space can lead to runaway growth in the storage required to maintain your assets. At the same time, it is pretty easy to inspect your system for these values, correct them without too much fuss, and reclaim lots of disk space.
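The archive step and the storage advice above can be illustrated with a small Python sketch. This is not how UrbanCode Deploy implements archiving; it only mirrors the described behavior (one zip per component version, named `<component_name>_<version_name>.zip`, written to the archive path). The function names, the directory layout, and the idea of checking device IDs are assumptions made for illustration.

```python
import os
import zipfile
from pathlib import Path


def archive_version(version_dir, archive_path, component_name, version_name):
    """Zip every file under version_dir into <component_name>_<version_name>.zip.

    Hypothetical helper: version_dir stands in for wherever the files of one
    component version live on disk.
    """
    version_dir = Path(version_dir)
    archive_path = Path(archive_path)
    archive_path.mkdir(parents=True, exist_ok=True)
    target = archive_path / f"{component_name}_{version_name}.zip"
    with zipfile.ZipFile(target, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(version_dir.rglob("*")):
            if path.is_file():
                # Store paths relative to the version root so the zip is portable.
                zf.write(path, arcname=path.relative_to(version_dir).as_posix())
    return target


def on_same_device(path_a, path_b):
    """Return True when both paths report the same device ID."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev
```

If `on_same_device` returns True for your artifact storage directory and your archive path, archiving will not free any space on that device, which is the situation the advice above is meant to avoid.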
There are three levels to consider (for each environment a version has been deployed to, the least aggressive setting wins):
- System (sets the Component defaults and the Archive Path)
- Component (used in place of the System defaults if set)
- Environment (compared with the Component level settings; the highest value is used, aka the least aggressive)
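As a rough illustration of how these three levels combine, here is a minimal Python sketch. The `Retention` class and both functions are hypothetical names, and the sentinel values (0 for "not set", -1 for "keep all") are taken from this article's pseudocode rather than from the product's API.

```python
from dataclasses import dataclass
from typing import Iterable

KEEP_ALL = -1  # "keep everything", as used in the article's pseudocode
UNSET = 0      # "not set at this level", as used in the article's pseudocode


@dataclass(frozen=True)
class Retention:
    num_versions: int
    num_days: int

    @property
    def keep_all(self) -> bool:
        return self.num_versions == KEEP_ALL or self.num_days == KEEP_ALL


def component_policy(system: Retention, component: Retention) -> Retention:
    """Component values replace the System defaults when they are set."""
    if component.num_versions != UNSET or component.num_days != UNSET:
        return component
    return system


def effective_policy(system: Retention, component: Retention,
                     environments: Iterable[Retention]) -> Retention:
    """Least aggressive wins: take the highest value across component and envs."""
    candidates = [component_policy(system, component), *environments]
    if any(p.keep_all for p in candidates):
        return Retention(KEEP_ALL, KEEP_ALL)
    return Retention(
        num_versions=max(p.num_versions for p in candidates),
        num_days=max(p.num_days for p in candidates),
    )
```

For example, with a System default of 5 versions / 5 days, no Component override, and a version that has been deployed to environments configured as 2 / 4 and 10 / 360, `effective_policy` yields 10 versions / 360 days.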
The interesting part of this discussion is how we identify whether an artifact is eligible for deletion. As mentioned above, there are three different scopes to keep in mind, and for simplicity we will talk about the process as it happens for a single version of a component at a time.
Pseudo code for the process:
    for comp in system.getAllComponents():
        # The system provides the default component values
        num_days = system.getCleanupNumDays()
        num_versions = system.getCleanupNumVersions()

        # Use component values if they are set
        if comp.num_days != 0 || comp.num_versions != 0:
            num_days = comp.num_days
            num_versions = comp.num_versions

        # Check if we should clean up anything at the component level
        if num_days == -1 || num_versions == -1:
            # Config says to keep all versions; cleanup is over for this component
            continue

        # Now iterate over each component version
        foreach version in comp:
            # Active versions are never cleaned up, so we are done with this version
            if version.isCurrentlyDeployed():
                continue

            # Start each version from the component-level values
            keep_days = num_days
            keep_versions = num_versions
            should_cleanup = true

            foreach env in version.hasBeenDeployedTo():
                if env.num_days == -1 || env.num_versions == -1:
                    # At least one env says keep all; stop checking this version
                    should_cleanup = false
                    break

                # Keep the larger value
                keep_days = max(env.num_days, keep_days)
                keep_versions = max(env.num_versions, keep_versions)

                # At each environment
                if keep_days < abs(version.age - env.request_age):
                    should_cleanup = false
                    break

            # Perform the cleanup check.
            # Both keep_versions and keep_days are checked and must be met:
            #   - is it currently deployed (handled above)
            #   - is it old enough to clean up
            #   - are there more than keep_versions newer versions in each env and in the comp
            if should_cleanup
               && keep_versions < comp.numVersionsSince(version)
               && keep_versions < env.numVersionsSince(version) for every env in version.hasBeenDeployedTo()
               && keep_days < version.age:
                # If an archive path is set, archive the component version first
                if system.isArchivePathSet():
                    system.archive(version)
                # Delete it from the system
                system.delete(version)
Example Retention Policy
A typical retention policy may look like this:
- System level - Keep 5 versions, 5 day retention
- Component level - Keep 5 versions, 7 day retention
- Environment level
  - DEV - Keep 2 versions, 4 days
  - SIT - Keep 2 versions, 7 days
  - UAT - Keep 2 versions, 14 days
  - PROD - Keep 10 versions, 360 days
Now this may look aggressive, and the length of your development cycles could be taken into account to make the retention days longer. But given that you will have the current version and the last version for each environment, plus the 4 latest versions of the component, this should be a sufficient starting point. If you find that this is too aggressive you can always scale it back on a case-by-case basis, but starting this way helps to avoid hoarding items that no one ever looks at again.
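To get a feel for what this example policy keeps, the following Python sketch applies a simplified reading of the pseudocode above to the numbers in the list. It is not the product's implementation: `Policy`, `is_eligible`, and the way inputs are passed in (age in days, counts of newer versions) are all assumptions chosen to keep the example small.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Policy:
    keep_versions: int
    keep_days: int


# Values copied from the example retention policy in this article.
COMPONENT = Policy(keep_versions=5, keep_days=7)
ENVIRONMENTS = {
    "DEV": Policy(2, 4),
    "SIT": Policy(2, 7),
    "UAT": Policy(2, 14),
    "PROD": Policy(10, 360),
}


def is_eligible(age_days: int, newer_in_component: int,
                newer_in_env: Mapping[str, int],
                currently_deployed: bool) -> bool:
    """Decide whether one component version may be archived/deleted.

    newer_in_env maps each environment the version was deployed to onto the
    number of newer versions deployed there since.
    """
    if currently_deployed:
        return False  # active versions are never cleaned up
    policies = [COMPONENT] + [ENVIRONMENTS[name] for name in newer_in_env]
    keep_versions = max(p.keep_versions for p in policies)  # least aggressive wins
    keep_days = max(p.keep_days for p in policies)
    if age_days <= keep_days:
        return False  # not old enough yet
    if newer_in_component <= keep_versions:
        return False  # still among the newest versions of the component
    return all(count > keep_versions for count in newer_in_env.values())


# A 10-day-old build that only ever reached DEV and has been superseded 8 times:
print(is_eligible(10, 8, {"DEV": 8}, currently_deployed=False))             # True
# The same build, had it also been deployed to PROD, falls under the PROD policy:
print(is_eligible(10, 8, {"DEV": 8, "PROD": 3}, currently_deployed=False))  # False
```

The second call shows why production retention dominates the discussion: once a version reaches PROD, the 10 version / 360 day setting governs it regardless of the lower environments.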
Key concepts here:
- A version that is currently deployed is Active and will not be removed
- Components hold an internal index, incremented per new version; since the version name is user specified, the order can’t always be easily derived from the name
- Component versions and Environment Process Requests are used to track ‘age’
- In the case of an artifact deployed to multiple environments, the highest retention policy will win
- Using the Archive Path can provide you a long term storage option if you would like to have a safety net
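The point about the internal index can be seen in a few lines of Python. This is only an illustration of the idea: the `Version` class, its `index` field, and `versions_since` are hypothetical names, not the product's data model.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Version:
    name: str   # user-specified, free-form
    index: int  # hypothetical per-component counter, incremented per new version


def versions_since(history: Iterable[Version], target: Version) -> int:
    """Count versions created after target, using the index rather than the name."""
    return sum(1 for v in history if v.index > target.index)


history = [
    Version("1.9", 1),
    Version("1.10", 2),
    Version("hotfix-a", 3),
    Version("2.0-rc1", 4),
]

# Sorting by name does not reproduce creation order:
print(sorted(v.name for v in history))       # ['1.10', '1.9', '2.0-rc1', 'hotfix-a']
# The index does, so "how many newer versions exist" is well defined:
print(versions_since(history, history[0]))   # 3
```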
Designing a solution for your organization will require understanding both what your actual audit and business requirements mandate and what your enterprise asset management strategy looks like. UrbanCode Deploy’s CodeStation solution is built to provide a smart staging area for artifact storage, replication, and governance in conjunction with your automated release process. In that regard it is not optimized to act as your ultimate Definitive Software Library/Definitive Media Library in the ITIL sense, or to provide a permanent source of truth for this information throughout the organization. Understanding that and using CodeStation accordingly will help guide your decisions and improve the quality of your deployments as they grow in size, scale, and complexity.