Some years ago I was a technical lead for a large migration project (around one billion pieces of data). I’ve previously described the transformation structure and would like to share some further advice: practice, practice, practice! If, as in my migration projects, there are a lot of complexities such as in-flight direct debits and batch timing issues, experience has shown that practice really pays off.
What do I mean by practice? Once you have reached the point where development of transformations and reconciliations is complete, the whole migration should be run against an accurate target (copies of live systems) and to the intended timing of the live migration (weekends, evenings, whatever your choice). Make it as close as you can to the live environment (without actually issuing live transactions, of course… another topic) and I can more-or-less guarantee you will find some issues. But that’s the point: you don’t want any on the actual live run.
How many practices will it take? I would suggest two to three, but if you are migrating in stages you’ll get better at the practising (meta practice?) so maybe just one will be enough.
Good luck with your migration and simplify that landscape!
Migration: Beyond ETL
When approaching a migration it is tempting to try to do the simplest thing that could possibly work, most likely a re-key or one-off ETL, because any code or process created for the migration is probably going to be discarded once complete. Assuming a code solution is proposed, it’s probably going to follow the familiar ETL shape: extract from the source system, transform, and load into the target system.
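As a rough sketch of that one-off shape, here is a minimal extract, transform and load in Python. Everything in it is an assumption made for illustration: the field names (`cust_no`, `status_cd`, `customer_id`), the status code mapping, and the use of in-memory lists in place of real source and target systems.

```python
# Hypothetical one-off ETL: all names and codes are illustrative only.

def extract(source_rows):
    """Read raw rows from the source system (here just an in-memory list)."""
    return list(source_rows)

def transform(row):
    """Reshape one source row into the target system's layout."""
    return {
        "customer_id": row["cust_no"],
        # Assumed mapping from source status codes to target status codes.
        "status": {"1": "A", "2": "C"}[row["status_cd"]],
    }

def load(target_table, rows):
    """Write transformed rows into the target (here a plain list)."""
    target_table.extend(rows)
    return len(rows)

source = [{"cust_no": 101, "status_cd": "1"}, {"cust_no": 102, "status_cd": "2"}]
target = []
loaded = load(target, [transform(row) for row in extract(source)])
```

Note that `transform` is the only step that needs knowledge of both systems at once: the source field names and codes on one side, the target layout on the other.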
So who is going to do the work? If it’s a single team then maybe this approach will work, but if there are two teams, one focussed on the source system and one focussed on the target system, then the question is who works on what. Most likely the source system team gets to do the extract and the target system team does the load, but who does the transform?
Working on the transform is probably going to take both teams. This has some problems:
- both teams need to understand each other’s data model which, of course, can be very complex, especially if one party forgot the most important step of normalisation.
- codes need to be mapped and this is often not a one-to-one relationship
- the target system team need to understand where data might be missing in the source system
- the target system team need to understand what to do about data quality problems in the source system
- reconciling the migration will be difficult if both data models are complex.
The answer: introduce an intermediate data model
The main design consideration of the intermediate data model is that it should be clear and simple to understand, with no ambiguity. It has the following features:
- The structure is as simple as possible: code tables are denormalised and merged back into the major entities; user management data can be removed along with system log or other management tables. Both teams are likely to need to come together to agree the structure because this is the common representation of the data from both systems.
- The structure does not need to match either the source or target system (if it does, maybe you don’t need this step).
- All codes are turned into a description, e.g. replace a status code of “1” with “Active”
- Column names for tables should be as descriptive as possible.
- Optimisation for performance involving changes to structure should only be considered if absolutely essential.
- Missing data should be explicitly marked as such.
- Data quality issues should be resolved by the source system team as part of their ETL.
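To make these features concrete, here is a minimal sketch of how the source system team might turn one source row into an intermediate record. Only the “1” to “Active” mapping comes from the text above; the other codes, the field names, the `MISSING` marker and the `to_intermediate` helper are all assumptions for illustration.

```python
# Hypothetical source-side step: source row -> intermediate record.

MISSING = "MISSING"  # assumed convention for explicitly marking missing data

# The source code table is merged into the record instead of being shipped
# as a lookup. Two source codes share one description (assumed example),
# showing a mapping that is not one-to-one.
ACCOUNT_STATUS = {"1": "Active", "2": "Closed", "9": "Closed"}

def to_intermediate(source_row):
    """Build one self-describing intermediate record from a source row."""
    email = source_row.get("eml")
    return {
        # Descriptive column names replace terse source names.
        "customer_number": source_row["cust_no"],
        "account_status_description": ACCOUNT_STATUS[source_row["stat"]],
        "customer_email_address": email if email else MISSING,
    }

record = to_intermediate({"cust_no": 101, "stat": "1", "eml": None})
```

The resulting record can be read without access to the source system's code tables, and the absent email address is visible as an explicit marker and not as an empty value that the target system team would have to interpret.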
The advantages of using the intermediate data model are:
- Each team can focus on their area of expertise
- There is less scope for ambiguity leading to mistakes and subsequent rework
- Reconciliation can be split into two, simpler, reconciliations (source to intermediate and intermediate to target)
- The target system team are not bound to the source system team making data available; they should understand the intermediate model well enough to generate test data
- The intermediate data model serves as an archive for subsequent audit or enquiries needing to understand how data was created in the target system
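The split reconciliation can be sketched as the same simple check run twice, once on each side of the intermediate model. This is a minimal illustration that compares record keys only; the `reconcile` helper and the sample keys are hypothetical, and a real reconciliation would probably also compare totals and field values.

```python
def reconcile(left_keys, right_keys):
    """Compare two collections of record keys and report the differences."""
    left, right = set(left_keys), set(right_keys)
    return {
        "missing_on_right": sorted(left - right),
        "unexpected_on_right": sorted(right - left),
        "matched": len(left & right),
    }

# Illustrative keys only: record 102 never reached the target.
source_keys = [101, 102, 103]
intermediate_keys = [101, 102, 103]
target_keys = [101, 103]

source_to_intermediate = reconcile(source_keys, intermediate_keys)
intermediate_to_target = reconcile(intermediate_keys, target_keys)
```

In this example the first reconciliation is clean and the second reports record 102 as missing, so the problem is isolated to the load into the target and the source system team does not need to be involved.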
It is especially worth considering using an intermediate data model if the migration is split into phases, or there will be multiple source systems over time, as it can be extended and modified to represent any unique requirements at each phase, or source system, rather than having to understand all of these complexities at one time.
These advantages are also applicable to integrations that follow the ETL model.
Migrations: when not to
The first question to ask about a migration project is: do you really want to? Maybe this is a strange question, but I don’t think due consideration is given to the benefits of keeping two (or maybe more) solutions with duplicate capabilities.
Migrations broadly come from four sources:
(1) Acquisition: in general the goal of the migration is to move onto systems of either the acquiring or acquired organisation to realise cost-savings
(2) Divestiture: sometimes an organisation sells a part of its offering; in this case it is only interested in the migration to remove data and systems it no longer requires
(3) Internal rationalisation of systems with the same motive as (1) but different politics
(4) Moving one system to a new, replacement, system
I don’t address (2) here as there is no option to keep more than one system, but before undergoing the pain and expense of a migration for (1), (3) or (4), consider the benefits of having two:
- I have written about the size of effective groups when it comes to communication. Unless the result of the migration will be a significant reduction in the number of developers, ops, users, managers, customers, etc., it becomes harder for the resulting ecosystem to be agile. Keeping two systems, and their surrounding ecosystems, could give an organisation two agile components as opposed to a single, rather more stodgy, one.
- It’s easy to sell one part of the organisation if it is neatly packaged-up around its own system(s)
- The focus of each system can be different, and make use of different skills. For example one system could focus on retaining existing business whilst another can be focussed on acquiring new business.
- Some friendly rivalry between teams can help drive innovation
- Being aware there is another team that could take over the work of a team might help control pay demands, but be careful teams don’t become overprotective and put up barriers (e.g. by hiding knowledge)
Also consider that a migration can be a major distraction and there may be other opportunities lost; if the business case is marginal then look for other opportunities before committing resources to a migration.