Based on our experience, data poses one of the most significant obstacles to the success of a technology project. The importance of data activities such as mapping, extraction, transformation, and loading is often grossly underestimated. We recommend the following:

  • Gather Requirements: It is essential to identify all data elements as well as their interrelationships early in the project lifecycle.
  • Map Data Elements: In our opinion, the importance of data mapping cannot be overstated. An accurate and thorough data mapping document provides the foundation for all downstream data activities. The mapping document also becomes an invaluable project artifact for GUI development, interface development, report development, and testing.
  • Identify Data Sources: Data sources often impose constraints on resources and timelines as other organizations are brought into the sphere of the project. Identifying these data sources early in the process provides greater lead-time for extracting, transforming, and loading the data.
  • Extract Data: Data extraction involves designing data formats, determining transport protocols, identifying staging areas, and creating record validation routines.
  • Transform Data: Transformation is often the most complex data activity. An accurate data mapping document lessens this complexity, but the activity still presents unique challenges. Transformation typically involves de-duplication, formatting, cleansing, truncation, and data type conversion. These tasks are frequently accomplished through a combination of programmatic solutions and manual effort.
  • Load Data: Data loading involves creating data validation routines, designing exception reports, identifying staging areas, and tuning load performance. Performance tuning is the primary focus of this activity.
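
To illustrate how a data mapping document can drive the transformation and validation steps described above, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the field names (CUST_NO, CUST_NM, EMAIL_ADDR), the target names, the 20-character limit, and the helper names FieldMapping and transform are all hypothetical and do not come from any particular project.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class FieldMapping:
    """One row of a (hypothetical) data mapping document."""
    source: str                       # field name in the source extract
    target: str                       # field name in the target system
    convert: Callable[[str], object]  # formatting / data type conversion
    max_length: Optional[int] = None  # truncate string values to this length


# Hypothetical mapping entries, for illustration only.
MAPPINGS = [
    FieldMapping("CUST_NO", "customer_id", int),
    FieldMapping("CUST_NM", "name", lambda s: s.strip().title(), max_length=20),
    FieldMapping("EMAIL_ADDR", "email", lambda s: s.strip().lower()),
]


def transform(rows, mappings, key):
    """Apply the mappings to each row.

    Returns (loaded, exceptions): records that passed, and an exception
    report of (row number, reason) pairs for records that did not.
    """
    loaded, exceptions, seen = [], [], set()
    for row_no, row in enumerate(rows, start=1):
        try:
            record = {}
            for m in mappings:
                value = m.convert(row[m.source])    # formatting and type conversion
                if m.max_length is not None and isinstance(value, str):
                    value = value[:m.max_length]    # truncation
                record[m.target] = value
        except (KeyError, ValueError) as exc:
            # Missing source field or a value that cannot be converted.
            exceptions.append((row_no, f"{type(exc).__name__}: {exc}"))
            continue
        if record[key] in seen:                     # de-duplication on the key field
            exceptions.append((row_no, f"duplicate {key}: {record[key]}"))
            continue
        seen.add(record[key])
        loaded.append(record)
    return loaded, exceptions


# Made-up sample extract: one clean row, one duplicate, one bad ID, one missing field.
SAMPLE_ROWS = [
    {"CUST_NO": "101", "CUST_NM": "  ada lovelace ", "EMAIL_ADDR": "ADA@Example.com "},
    {"CUST_NO": "101", "CUST_NM": "Ada Lovelace", "EMAIL_ADDR": "ada@example.com"},
    {"CUST_NO": "abc", "CUST_NM": "Bad Id", "EMAIL_ADDR": "bad@example.com"},
    {"CUST_NM": "No Id", "EMAIL_ADDR": "noid@example.com"},
]

loaded, exceptions = transform(SAMPLE_ROWS, MAPPINGS, key="customer_id")
for row_no, reason in exceptions:
    print(f"row {row_no}: {reason}")
```

Keeping the mapping as data rather than hard-coding it in the transformation logic is one way to let the same document serve interface development, reporting, and testing. Rejected rows are collected in an exception report instead of halting the run, which leaves room for the manual effort mentioned above.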

Below is a breakdown of the above activities and their relative levels of effort.

Activity                     Level of Effort
1. Requirements Gathering    15% of the overall data effort
2. Data Mapping              30% of the overall data effort
3. Data Sources              10% of the overall data effort
4. Extraction                10% of the overall data effort
5. Transformation            25% of the overall data effort
6. Loading                   10% of the overall data effort
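
As a worked example of applying these percentages, the short Python sketch below converts a total data budget into hours per activity. The 400-hour total and the function name allocate are assumptions made for illustration only. The percentages are those listed above.

```python
# Relative effort per activity, as percentages of the overall data effort.
EFFORT_SHARE = {
    "Requirements Gathering": 15,
    "Data Mapping": 30,
    "Data Sources": 10,
    "Extraction": 10,
    "Transformation": 25,
    "Loading": 10,
}


def allocate(total_hours):
    """Split a total data-effort budget across activities by percentage."""
    if sum(EFFORT_SHARE.values()) != 100:
        raise ValueError("effort shares must total 100%")
    return {name: total_hours * pct / 100 for name, pct in EFFORT_SHARE.items()}


# Hypothetical budget of 400 hours for all data activities.
for name, hours in allocate(400).items():
    print(f"{name:<24}{hours:>6.0f} h")
```

Under this assumed budget, data mapping and transformation together account for 220 of the 400 hours, more than half of the data effort.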

In summary, we feel that an understanding of the importance and complexity of data activities early in the project lifecycle pays huge dividends in the success and eventual acceptance of the application.