What is a Staging Data Area? According to Wikipedia:
A staging area, or landing zone, is an intermediate storage area used for data processing during the extract, transform and load (ETL) process. The data staging area sits between the data source(s) and the data target(s), which are often data warehouses, data marts or other data repositories.
Data staging areas are often transient in nature, with their contents being erased prior to running an ETL process or immediately following successful completion of an ETL process. There are staging area architectures, however, which are designed to hold data for extended periods of time for archival or troubleshooting purposes.
Practically speaking, this data area can be:
- A network file folder where exported data from a source system can be held.
- A Microsoft(R) SQL database where queried or exported data can be sent.
- An external network share such as an FTP or Secure FTP site, Dropbox, or OneDrive.
There are a few things to be mindful of when setting up this staging data area. First, there should be a facility to connect back to these records and update the tables with the newly created record IDs. That is, during the process of creating an Account record in SFDC, one will want to write the record ID back into the staged data. This serves two purposes: it indicates the successful migration of a record (a record lacking an ID has not been “moved” into SFDC), and the ID makes it possible to parent child objects to the proper Account via that record ID.
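The write-back step can be sketched with a small SQLite example. The table and column names here (`staged_accounts`, `sfdc_id`, and the sample ID value) are illustrative assumptions, not part of any particular tool's schema:

```python
import sqlite3

# Minimal sketch: a staging table for Accounts with a column reserved for
# the Salesforce record ID. Names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE staged_accounts (
        source_key TEXT PRIMARY KEY,   -- key from the source system
        name       TEXT NOT NULL,
        sfdc_id    TEXT                -- NULL until the record is migrated
    )
""")
conn.execute("INSERT INTO staged_accounts (source_key, name) VALUES ('A-100', 'Acme Corp')")

def write_back(conn, source_key, sfdc_id):
    """After a successful insert into SFDC, record the new ID in staging."""
    conn.execute(
        "UPDATE staged_accounts SET sfdc_id = ? WHERE source_key = ?",
        (sfdc_id, source_key),
    )

# Pretend the migration tool just created the Account and returned its ID.
write_back(conn, "A-100", "001XX000003DHPh")

# Any row still missing an sfdc_id has not been migrated yet.
unmigrated = conn.execute(
    "SELECT COUNT(*) FROM staged_accounts WHERE sfdc_id IS NULL"
).fetchone()[0]
```

A simple `WHERE sfdc_id IS NULL` query then doubles as the migration checklist: anything it returns still needs to be moved.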
The format of these tables and the structure of this data should reflect the basic structure of the target entity relationships; that is to say, the data should contain references to their parent records. As a practical matter, the structure of this staging data should be built to replicate each object and field in SFDC. One trick Scribe users can employ is Scribe Online Replication Services (RS), which creates tables in Microsoft SQL or MySQL that mirror the structure and customizations of each object.
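To make the parent-reference idea concrete, here is a minimal sketch (again with assumed, illustrative names) of a child staging table that carries its parent's source key. When Contacts are loaded, a join back to the parent row picks up the Account's SFDC ID that was written during the earlier pass:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE staged_accounts (
        source_key TEXT PRIMARY KEY,
        name       TEXT,
        sfdc_id    TEXT
    );
    CREATE TABLE staged_contacts (
        source_key         TEXT PRIMARY KEY,
        last_name          TEXT,
        account_source_key TEXT REFERENCES staged_accounts(source_key),
        sfdc_id            TEXT
    );
    -- The Account was migrated in an earlier pass, so it has an ID.
    INSERT INTO staged_accounts VALUES ('A-100', 'Acme Corp', '001XX000003DHPh');
    -- The Contact has not been migrated yet (sfdc_id is NULL).
    INSERT INTO staged_contacts VALUES ('C-200', 'Smith', 'A-100', NULL);
""")

# Resolve the parent's SFDC ID at load time via the source-key reference.
row = conn.execute("""
    SELECT c.last_name, a.sfdc_id AS parent_sfdc_id
    FROM staged_contacts c
    JOIN staged_accounts a ON a.source_key = c.account_source_key
""").fetchone()
```

This is why the write-back step matters: without the parent's `sfdc_id` in staging, the child load would have nothing to parent against.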
You may be asking, “Why go through all of this work to manage a one-time process?” The answer is rooted in the common experience of managing many migrations: most migration projects go through many cycles.
- Developing maps and formulas to migrate each record.
- Unit testing to make sure that all of the fields, relationships, and attachments come across correctly.
- First-pass migration. Testing to estimate the total amount of time it will take to move all the data over. Depending on the number of records, this may take a few hours, a few days, or several weeks.
- User acceptance testing to have end users review the migrated data and compare it to the source applications.
- Refreshing the staged data. During development and testing the core data has changed, because users continue to use the source system, so the staging data needs to be updated to reflect the new data.
- Second pass. In a perfect world this would be the final pass: the data will all tie out properly and the users will be happy with the results. If all is good, your project is done.
- User acceptance. This is the final stage of the project, where users confirm the mappings are correct.
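The refresh step has one subtlety worth sketching: re-pulling source data must not wipe out the SFDC IDs already written back to staging. A minimal approach (assumed names, SQLite for illustration) is to update existing rows in place and insert only the rows that are genuinely new:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE staged_accounts (
        source_key TEXT PRIMARY KEY,
        name       TEXT,
        sfdc_id    TEXT
    )
""")
# One account was already migrated in the first pass.
conn.execute("INSERT INTO staged_accounts VALUES ('A-100', 'Acme Corp', '001XX000003DHPh')")

def refresh(conn, source_rows):
    """Re-pull source data into staging without losing written-back IDs."""
    for key, name in source_rows:
        # Update the row if it already exists in staging...
        cur = conn.execute(
            "UPDATE staged_accounts SET name = ? WHERE source_key = ?",
            (name, key),
        )
        # ...otherwise it is new since the last pull: insert it.
        if cur.rowcount == 0:
            conn.execute(
                "INSERT INTO staged_accounts (source_key, name) VALUES (?, ?)",
                (key, name),
            )

# The source kept changing: one renamed account, one brand-new one.
refresh(conn, [("A-100", "Acme Corporation"), ("A-101", "Beta LLC")])

rows = conn.execute(
    "SELECT source_key, name, sfdc_id FROM staged_accounts ORDER BY source_key"
).fetchall()
```

After the refresh, the renamed account keeps its SFDC ID (so the second pass knows it is an update, not an insert), while the new account has a NULL ID and will be inserted.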
I should mention that in some projects, end user SFDC training takes place as part of the migration. Often migrations happen department by department, or division by division. This presents a unique challenge because users are still creating new data in the source system while we are migrating, which turns a migration project into a short-term integration project. Scribe is a great platform for complex project issues like this, as it can manage not only inserting new records but also updating already-migrated records with new and changed data.
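The insert-or-update decision described above can be sketched in a few lines. The `FakeTarget` class below is a stand-in for the real SFDC connection, purely so the logic is runnable; the real thing would be your migration tool's connector:

```python
import itertools

class FakeTarget:
    """Stand-in for the real SFDC connection; illustrative only."""
    def __init__(self):
        self.records = {}
        self._ids = itertools.count(1)

    def insert(self, row):
        new_id = f"001-{next(self._ids):04d}"
        self.records[new_id] = dict(row)
        return new_id

    def update(self, rec_id, row):
        self.records[rec_id].update(row)

def sync_record(row, target):
    """Insert rows never migrated; push changes for rows that have an ID."""
    if row.get("sfdc_id") is None:
        # No ID yet: this record has never been migrated, so insert it
        # and write the new ID back into the staged row.
        row["sfdc_id"] = target.insert(row)
    else:
        # Already migrated: update the existing target record in place.
        target.update(row["sfdc_id"], row)
    return row["sfdc_id"]

target = FakeTarget()
row = {"source_key": "A-100", "name": "Acme Corp", "sfdc_id": None}
first_id = sync_record(row, target)    # first pass: inserts
row["name"] = "Acme Corporation"       # source data changed between passes
second_id = sync_record(row, target)   # later pass: updates, no duplicate
```

Running the same record through twice yields one target record, updated rather than duplicated, which is exactly the behavior a multi-pass migration needs.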