Data migration is the process of transferring data from one application or format to another. It is often required when implementing a new application, which may involve moving data from an incompatible proprietary format to one that is future-proof and can be integrated with new applications.
Considerations and activities
Understand your data and its quality
A data catalogue, an information asset register, a recent data holdings audit or an information review are all useful sources when you need to migrate. These sources may indicate that data from the old application is low quality or not compatible with the new application. If you don’t have access to current sources such as a catalogue or register, take advantage of automated indexing and discovery tools to speed up a data holdings audit or information review.
TIP: it is critical to identify where the master data or authoritative source is held before the data migration, to ensure the latest version of the data is migrated.
A data quality assessment can be undertaken to measure the quality of the data and determine the remediation rules and actions required.
Data profiling can help you determine the quality of data, including: relevance, format, consistency, validity, complexity, completeness, accuracy, accessibility, compliance and structure of the data. Automated data profiling tools can be used to streamline this process, especially when there is a large quantity of data to profile.
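For example, a first-pass profile can be scripted. The sketch below uses pandas; the extract file, column names and rules are illustrative assumptions, not requirements of any particular tool:

```python
import pandas as pd

# Hypothetical legacy extract; file and column names are illustrative.
df = pd.read_csv("legacy_customers.csv")

# Completeness: share of populated values per column.
completeness = 1 - df.isna().mean()

# Validity: rows whose email field matches a simple pattern.
valid_email = df["email"].str.match(r"[^@\s]+@[^@\s]+\.[^@\s]+", na=False)

# Consistency: duplicate records on the assumed business key.
duplicates = df.duplicated(subset=["customer_id"]).sum()

print(completeness)
print(f"valid emails: {valid_email.mean():.1%}")
print(f"duplicate customer_ids: {duplicates}")
```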
Identify stakeholders
Prior to starting the transfer of data, it is important to ensure the necessary planning and profiling have been completed and the migration plan has been created and approved by the business. Data migration can only succeed if the business is engaged throughout, with stakeholders ranging from the senior leadership team to data analysts.
You can start by identifying team members that need to be involved in the migration, testing, auditing, review and sign-off stages.
TIP: ensure that end users are also involved in business rule validation and testing throughout the migration process.
Data extraction
This stage involves copying or moving the data from legacy stores to a secure location so it can be prepared for migration. During this process, data aggregation may be required to bring together several datasets to make the data more meaningful and fit for purpose for the new application. This may involve the creation of a new combined master dataset which maps out the linkages between the datasets to be aggregated.
TIP: ensure the necessary backups have been put in place in the event of data corruption.
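As a minimal illustration of this stage, the sketch below copies a table from a hypothetical SQLite legacy store into a staging CSV, taking a backup copy first (file and table names are illustrative):

```python
import csv
import shutil
import sqlite3
from pathlib import Path

LEGACY_DB = Path("legacy.db")  # hypothetical legacy store
STAGING = Path("staging")
STAGING.mkdir(exist_ok=True)

# Back up the legacy store before touching it (see TIP above).
shutil.copy2(LEGACY_DB, STAGING / "legacy.db.bak")

# Extract a table into a staging CSV for profiling and transformation.
with sqlite3.connect(LEGACY_DB) as conn:
    cursor = conn.execute("SELECT * FROM customers")  # table name is illustrative
    headers = [col[0] for col in cursor.description]
    with open(STAGING / "customers.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(headers)
        writer.writerows(cursor)
```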
Data remediation – improving the quality of data
Data migration processes can provide an opportunity to improve the quality of the data. Once the quality of the data has been measured, remediation actions can be applied. Data remediation can occur before, during or after the transfer of data to the new application. ETL tools are often used to categorise and improve data quality; a sketch of remediation during ETL follows the table below.
Examples of data quality actions before, during or after a data transfer:
| Data quality action | Description |
| --- | --- |
| No action | Data issues are small, not meaningful, and will not cause a problem post migration. |
| Remediate during the Extract Transform Load (ETL) process | Data issues should be remediated during the ETL transformation of data from the legacy to the new application. |
| Remediate in source databases | Data issues should be remediated in the source database. |
| Remediate in target application | Data issues should be remediated once the data has been loaded into the new application. This may cause complications, as the data may not pass target validation or may produce errors under the new business rules. |
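As a rough illustration of remediation during ETL, the sketch below applies some assumed cleaning rules (whitespace trimming, date normalisation, code mapping) inside a transform step; the rules and field names are illustrative, not prescriptive:

```python
from datetime import datetime

def transform(record: dict) -> dict:
    """Apply illustrative remediation rules during the ETL transform step."""
    cleaned = dict(record)
    # Trim stray whitespace picked up from the legacy application.
    cleaned["name"] = record["name"].strip()
    # Normalise legacy dd/mm/yyyy dates to ISO 8601 for the new application.
    cleaned["created"] = datetime.strptime(record["created"], "%d/%m/%Y").date().isoformat()
    # Standardise inconsistent legacy category codes.
    cleaned["status"] = {"Y": "active", "N": "inactive"}.get(record["status"], "unknown")
    return cleaned

print(transform({"name": " Ada Lovelace ", "created": "01/02/1998", "status": "Y"}))
```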
Data migration techniques
A wide range of techniques can be used to perform a data migration. The level of automation that is possible during a data migration will depend on the maturity of your data. For example, it may be necessary to first scan physical documents for capture before Optical Character Recognition (OCR) software can be used to convert scanned copies to digital structured data.
Once data is in a digital format potential data migration technologies could involve:
- Optical Character Recognition (OCR) software for character recognition and digitisation
- Extract Transform Load (ETL) software for data transformations such as format conversion (see the sketch after this list)
- OGR2OGR software for spatial data migrations
- Machine Learning (ML) software for automated mapping of data structures for migration
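As a small example of the ETL case, the sketch below converts a staged CSV extract into JSON for a new application; the file names are illustrative:

```python
import csv
import json

# Convert a staged CSV extract into JSON for the new application.
with open("staging/customers.csv", newline="") as f:
    rows = list(csv.DictReader(f))

with open("staging/customers.json", "w") as f:
    json.dump(rows, f, indent=2)
```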
Data can be transferred in one go, or in stages so that quality is maintained and the work can be completed in smaller agile sprints.
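A staged transfer might look something like the sketch below, where `load_batch` is a hypothetical callable that writes one batch to the new application; batch size and failure handling would come from your migration plan:

```python
def load_in_stages(records, load_batch, batch_size=500):
    """Load records into the target in fixed-size stages.

    `load_batch` is a hypothetical callable that writes one batch to the
    new application and raises on failure, so a bad stage stops the run
    before more data is affected.
    """
    for start in range(0, len(records), batch_size):
        batch = records[start:start + batch_size]
        load_batch(batch)
        print(f"loaded records {start}..{start + len(batch) - 1}")
```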
Data integrity: validation, auditing and verification
Testing and quality checkpoints should occur before, during and after the data migration, with all results recorded for auditing. Unit, application, system and volume tests should be undertaken as early as possible in the migration to ensure there is time to update any code or business rules.
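For instance, business rules can be unit tested early. The sketch below is pytest style, and the status-mapping rule it tests is illustrative:

```python
# Minimal early unit tests for a migration business rule (pytest style).
# The rule under test is illustrative: legacy status codes must map to
# the new application's vocabulary.

STATUS_MAP = {"Y": "active", "N": "inactive"}

def map_status(code: str) -> str:
    return STATUS_MAP.get(code, "unknown")

def test_known_codes_map_cleanly():
    assert map_status("Y") == "active"
    assert map_status("N") == "inactive"

def test_unknown_codes_are_flagged():
    assert map_status("X") == "unknown"
```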
TIP: all new processes and data transformations should be documented to ensure there is traceability and auditability of the data. The migrated data needs to be demonstrably authentic so it can be relied on as evidence.
Automated data validation will help you check the volume, format and quality of the data being migrated, to ensure no data is lost and the data is fit for purpose. Ensure any subsequent metadata, lineage and data quality statements are updated with the outcome. Manual validation is an alternative, but it is only suitable for small volumes of data.
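A basic automated validation pass might look like the sketch below, comparing a source extract with an export from the target; file and column names are illustrative assumptions:

```python
import pandas as pd

# Compare a source extract with the data loaded into the target.
# In practice the target side would be queried from the new application.
source = pd.read_csv("staging/customers.csv")
target = pd.read_csv("target_export/customers.csv")

assert len(source) == len(target), "row counts differ: possible data loss"
assert list(source.columns) == list(target.columns), "schema mismatch"
assert target["customer_id"].notna().all(), "mandatory field has nulls"
print("validation passed: volume, format and mandatory fields check out")
```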
To verify the integrity of data, use a checksum. Ensure checksums generated from the original file location are later verified against checksums generated from the final data storage or transfer solution. This will tell you if the data has been modified or corrupted during migration. Checksums generated after data migration to permanent storage should be regularly monitored against existing file data to verify file fixity.
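A minimal checksum comparison might look like the following, using SHA-256 from Python's standard library (the file paths are illustrative):

```python
import hashlib

def sha256sum(path: str) -> str:
    """Return the SHA-256 checksum of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Generate at the original location before migration, verify afterwards.
before = sha256sum("staging/customers.csv")
after = sha256sum("target_storage/customers.csv")
print("fixity verified" if before == after else "data modified or corrupted")
```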
TIP: consider running legacy and new systems concurrently until you are confident with the new process, data and backup capability. When stakeholders are confident with the new application the legacy system can be decommissioned.