Data Discovery, Analysis and Interpretation – this involves collating, classifying, analyzing and interpreting the data aggregated from a multitude of sources within an organization. We perform trend analysis and present our findings so that management can make informed decisions about current and future processes.
Data Modeling and Visualization – our consultants work closely with an organization’s subject matter experts to design dimensional models and visualization solutions that support the organization’s requirements and goals. Rather than designing everything up front, we create incrementally extensible designs, giving our customers access to powerful new visualizations much sooner.
Data Warehousing – our data warehouse experts help an enterprise better manage and leverage its data by integrating multiple sources of data. We provide powerful, scalable solutions that meet the volume, velocity, and growth of the organization’s data. We use the Kimball-Corr agile data warehouse method, which leverages techniques such as model-storming to quickly create a bus matrix that guides and prioritizes our efforts.
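For illustration, the bus matrix that emerges from model-storming can be sketched as a simple mapping of business processes to the conformed dimensions they share. The process and dimension names below are hypothetical; the dimensions used by more than one process are the ones worth conforming first, since they let early deliverables integrate:

```python
from collections import Counter

# Hypothetical bus matrix: rows are business processes, columns are
# the conformed dimensions each process needs.
bus_matrix = {
    "Flight Operations": {"Date", "Aircraft", "Airport", "Crew"},
    "Maintenance":       {"Date", "Aircraft", "Part", "Technician"},
    "Parts Procurement": {"Date", "Part", "Supplier"},
}

# Count how many processes use each dimension; shared dimensions are
# the highest-priority candidates for conforming.
dimension_usage = Counter(d for dims in bus_matrix.values() for d in dims)
shared = sorted(d for d, n in dimension_usage.items() if n > 1)
print(shared)  # prints ['Aircraft', 'Date', 'Part']
```

Prioritizing the shared dimensions this way is what lets each incremental delivery plug into the warehouse “bus” rather than becoming a stovepipe.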
Data Cleansing and Conforming – Before data can be transformed into knowledge (e.g., visualizations, tabular reports, analytics, predictive analytics), it must be loaded and prepared via a process often called “Extract Transform Load,” or ETL. We recommend a variation called “Extract Load Transform” (ELT), in which data is first staged (loaded) into the data lake and then transformed in multiple passes. We perform steps such as the following:
- Data profiling: checking the volume of the data, how quickly it changes, and what schema it conforms to.
- Conforming: what is the granularity of the data? Do we need to adjust it to make certain queries possible? Are there missing records?
- Data cleansing: are there inconsistencies in the data? Does the data violate the schema in some cases?
- Generating metadata: We need to create metadata for the data source itself, and also maintain reference data (for example, aircraft type or maintenance status) so the data is easier to query later in tools like Apache Atlas.
- Master data management: business rule enforcement, reference data management, hierarchy management, entity resolution.
- Error events: do we need to generate error log events and hold certain records in “suspense” because they do not meet the minimum quality threshold? Are errors escalated appropriately and do alerts get sent out as needed?
- Audit metadata: we need to track the load date, job name and user so we know when and how items are loaded into the data lake. This information is sent to tools such as Apache Atlas.
- De-duplication: are there duplicate records? What are the survivorship rules?
- Unique ID generation and cross-referencing: we need to generate unique IDs for columns that warrant them, and create cross-references.
- Lineage and dependencies: what is the origin of the data, and does it have any dependencies on other data sets?
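A minimal sketch of several of the steps above (profiling, cleansing and conforming, suspense handling for error events, de-duplication, unique ID generation, and audit metadata) over a staged batch of records. The field names, quality rules, and survivorship rule are hypothetical, chosen only to make the flow concrete:

```python
import uuid
from datetime import datetime, timezone

# Hypothetical records staged (loaded) into the data lake; note the
# duplicate tail number and the record with a missing aircraft type.
staged = [
    {"tail_number": "N123AB", "aircraft_type": "B737 ", "status": "active"},
    {"tail_number": "N123AB", "aircraft_type": "B737",  "status": "active"},
    {"tail_number": "N987CD", "aircraft_type": None,    "status": "ACTIVE"},
]

def profile(records):
    """Data profiling: record count and per-field null counts."""
    nulls = {k: sum(1 for r in records if r.get(k) in (None, ""))
             for k in records[0]}
    return {"row_count": len(records), "null_counts": nulls}

def transform(records):
    """Cleanse, conform, de-duplicate, and assign IDs and audit metadata."""
    clean, errors, seen = [], [], set()
    for r in records:
        if r["aircraft_type"] is None:
            # Error event: hold the record in "suspense" because it
            # does not meet the minimum quality threshold.
            errors.append({"record": r, "reason": "missing aircraft_type"})
            continue
        row = {
            "tail_number": r["tail_number"].strip(),
            "aircraft_type": r["aircraft_type"].strip(),
            "status": r["status"].strip().lower(),  # conform casing
        }
        if row["tail_number"] in seen:
            continue  # survivorship rule (hypothetical): first record wins
        seen.add(row["tail_number"])
        row["aircraft_id"] = str(uuid.uuid4())  # unique ID generation
        row["load_ts"] = datetime.now(timezone.utc).isoformat()  # audit
        clean.append(row)
    return clean, errors

stats = profile(staged)
clean, errors = transform(staged)
```

Because the transform is a pure function over the staged batch, it can be re-run in multiple passes over the data lake without side effects, which is the property the ELT approach relies on.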
Data Migration and Cloud Migration – our consulting team has the breadth of knowledge and industry experience to manage all aspects of migrating legacy systems to the cloud. Setting up a data solution in the cloud may be an excellent way to start an organization’s cloud migration, particularly if the organization has a lot of interconnected legacy systems that may be difficult to migrate.
Data Governance – As new, more automated capabilities come online and enhance human decision-making, it becomes ever more critical that the data is at all times consistent, available, and fit for its intended purposes. Within big data environments, this entails navigating complex tradeoffs and ensuring we always have transparency and context (i.e., metadata). For example, we cannot simply enforce perfect data quality, because doing so could preclude us from accessing legacy data sources that contain dirty (but very valuable) data. Instead, we apply consistent design concepts such as assigning metadata tags to flag data quality issues, and we accommodate conflicting or inconsistent data so it can be referenced when appropriate. While fixing an error directly in the source system is preferable, it is not always possible, so our transformation scripts must be able to consistently reapply fixes, lookups, de-duplication, and other improvements during ELT.
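The tagging approach described above can be sketched as follows: rather than rejecting dirty legacy records outright, each record carries metadata flags noting the quality issues found, so downstream consumers can include or exclude flagged data as appropriate. The quality rules and field names are hypothetical; the key property is that tagging is deterministic, so the same flags are consistently reapplied on every ELT run:

```python
# Hypothetical quality rules: each returns a flag name if the record
# fails the check, or None if it passes.
def missing_status(record):
    return "missing_status" if not record.get("status") else None

def stale_timestamp(record):
    return "stale_timestamp" if record.get("year", 0) < 2000 else None

QUALITY_RULES = [missing_status, stale_timestamp]

def tag_quality(record):
    """Attach metadata flags instead of discarding dirty but valuable data.

    Re-running this on the same input yields the same flags, so the
    transform can be reapplied consistently during ELT.
    """
    flags = [f for rule in QUALITY_RULES if (f := rule(record))]
    return {**record, "_quality_flags": flags}

# Hypothetical legacy data: one clean record, one dirty but valuable one.
legacy = [
    {"tail_number": "N123AB", "status": "active", "year": 2015},
    {"tail_number": "N555EF", "status": None,     "year": 1998},
]
tagged = [tag_quality(r) for r in legacy]
```

A query that requires high quality can filter on an empty `_quality_flags` list, while an analyst mining the legacy data can deliberately include the flagged records, which is exactly the tradeoff described above.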