Powering data management on OSDU Data Platform

by Rakesh Sharma and Yuriy Gubanov | on

Overview

Data is an essential asset for energy companies. A comprehensive view of trusted data that follows consistent business rules helps businesses make informed, accurate, and timely decisions and streamline their data management practices. Putting data at the center of the open-source OSDU Data Platform makes the platform future-proof for the energy industry's digital transformation. Its aim is a comprehensive, technology-agnostic data platform that stimulates innovation and reduces time to market for new solutions. The core services of OSDU Data Platform lay the groundwork for energy industry data managers, but the platform still needs comprehensive tools to ensure overall data trustworthiness and consistency.

QuestLabs Alloy, a data-quality-as-a-service (DQaS) solution built on Amazon Web Services (AWS), works together with the OSDU Data Platform. Through an intuitive web-based user interface, it lets data managers and operational staff alike view and manage OSDU Data Platform data and check its compliance with their organization's business rules. Relying on open-source APIs, the solution helps companies efficiently build and manage trusted data in their OSDU Data Platform implementation. This approach demonstrates how flexibly data management tools can be integrated with OSDU, and it shows the overall value of a standards-based data platform for customizing the data foundation to a company's needs, adding extensions, and incorporating new domains for new energy streams.

OSDU Data Platform and data management

OSDU Data Platform delivers an open-source, standards-based, technology-agnostic data platform for the energy industry. The goal is to liberate data from applications so that companies can focus on business value and innovation rather than on common data interoperability and basic data management challenges. The overall conceptual architecture of the OSDU Data Platform implementation on Amazon Web Services is shown in figure 1.

Figure 1. Conceptual architecture of OSDU Data Platform implementation on Amazon Web Services

Data can be ingested into OSDU through the core services APIs, after which it is cataloged and made available for consumption by applications and artificial intelligence (AI) or machine learning (ML) solutions. This greatly simplifies finding and consuming data across an organization, but how can one be sure that the data has been validated against the organization's business rules? Answering that question requires a data management and data quality solution for OSDU data.
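
As a rough illustration of ingestion through the core services, the Python sketch below builds, but does not send, a request to the OSDU Storage API (`PUT /api/storage/v2/records`) carrying one record with the `kind`, `acl`, `legal`, and `data` sections that OSDU records use. The base URL, the `opendes` data partition, the access-control group names, the legal tag, and the well name are all placeholders; in this architecture the access token would come from the deployment's identity provider.

```python
import json
import urllib.request


def build_storage_request(base_url: str, partition: str, token: str,
                          records: list) -> urllib.request.Request:
    """Build (but do not send) a PUT request to the OSDU Storage API."""
    return urllib.request.Request(
        url=f"{base_url}/api/storage/v2/records",
        data=json.dumps(records).encode("utf-8"),
        method="PUT",
        headers={
            "Authorization": f"Bearer {token}",
            "data-partition-id": partition,  # OSDU data partition to write to
            "Content-Type": "application/json",
        },
    )


# Placeholder record: group names, legal tag, and data values are hypothetical.
well_record = {
    "kind": "osdu:wks:master-data--Well:1.0.0",
    "acl": {
        "viewers": ["data.default.viewers@opendes.example.com"],
        "owners": ["data.default.owners@opendes.example.com"],
    },
    "legal": {
        "legaltags": ["opendes-example-legal-tag"],
        "otherRelevantDataCountries": ["US"],
    },
    "data": {"FacilityName": "Example Well 1"},
}

req = build_storage_request("https://osdu.example.com", "opendes",
                            "<access-token>", [well_record])
```

Sending it with `urllib.request.urlopen(req)` against a real deployment would additionally require a valid token, existing entitlement groups, and a registered legal tag.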

QuestLabs Alloy data-quality-as-a-service solution

The QuestLabs Alloy DQaS solution, built on the Amazon Web Services Cloud, lets companies easily configure and validate OSDU data for trustworthiness and consistency against their own business rules. The solution is installed and managed through a web-based user interface and connects to the OSDU Data Platform through common OSDU APIs. The architecture for integrating the QuestLabs Alloy application with the OSDU Data Platform on Amazon Web Services is shown in figure 2.
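
Connecting through common OSDU APIs typically means calling services such as Search with a bearer token and a `data-partition-id` header. The minimal sketch below builds a `POST /api/search/v2/query` request for one kind; it is an illustration under placeholder values, not Alloy's own connector, whose internals are not described here.

```python
import json
import urllib.request
from typing import Optional


def build_search_request(base_url: str, partition: str, token: str, kind: str,
                         query: Optional[str] = None,
                         limit: int = 100) -> urllib.request.Request:
    """Build (but do not send) a POST request to the OSDU Search API."""
    body = {"kind": kind, "limit": limit}
    if query is not None:
        body["query"] = query  # optional query-string filter on indexed fields
    return urllib.request.Request(
        url=f"{base_url}/api/search/v2/query",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "data-partition-id": partition,
            "Content-Type": "application/json",
        },
    )


# Placeholder endpoint, partition, and token.
req = build_search_request("https://osdu.example.com", "opendes",
                           "<access-token>",
                           kind="osdu:wks:master-data--Well:1.0.0")
```

Keeping request construction separate from sending makes it easy to inspect or log exactly what a connector would send before pointing it at a live deployment.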

Figure 2. QuestLabs Alloy integration with OSDU Data Platform on Amazon Web Services

Alloy is deployed on secure and reliable Amazon Web Services infrastructure. It runs on Amazon Elastic Kubernetes Service (Amazon EKS), a managed Kubernetes service for running Kubernetes in the Amazon Web Services Cloud, which manages the availability and scalability of the application according to user and data demand. Authentication is handled by Amazon Cognito, a customer identity and access management solution. The OSDU Data Platform, also deployed on Amazon Web Services but in a separate account, takes full advantage of multiple Amazon Web Services services so that it can scale to accommodate nearly any type and volume of data. Amazon Simple Storage Service (Amazon S3), an object storage service, stores seismic, well, and other subsurface data types, and Amazon DynamoDB, a fully managed, serverless, key-value NoSQL database, stores schema definitions and metadata. As a result, the overall solution can scale on demand with the usage of the OSDU Data Platform.

Building trusted data in the OSDU Data Platform

Alloy relies on OSDU Core Services APIs, such as search, entitlements, schema, legal, and data services, to scan OSDU data and evaluate it against data quality rules. Data can be examined before or after ingestion into the OSDU Data Platform. The application comes with built-in rules derived from the Professional Petroleum Data Management Association (PPDM) and adapted to OSDU. Using Alloy's OSDU Data browser, shown in figure 3, businesses can view the completeness, integrity, validity, and accuracy of their OSDU data and track whether data quality improves over time.

Figure 3. QuestLabs Alloy OSDU Data browser showing OSDU data scan

To build and maintain trusted data in the OSDU Data Platform, Alloy provides a rich set of APIs and an intuitive user interface for running quality checks on data in the preingest and postingest stages. In the preingest stage, before any data is ingested into the OSDU Data Platform, Alloy APIs can be built into the integration to prevent or reduce erroneous data. OSDU has many data publishers, such as domain and data science applications; developer portals; command-line interface (CLI) tools; extract, transform, load (ETL) pipelines; and other mechanisms. Alloy APIs help evaluate the quality of the data they publish so that fit-for-purpose data is what ultimately gets ingested into OSDU. Preingest checks also reduce the enrichment steps needed after ingestion to clean, correct, and prepare the data for consumption.
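
A preingest check can act as a gate in an ingestion pipeline: records that pass every rule move on to OSDU, and the rest are held back with the names of the rules they failed. The sketch below assumes hypothetical rules and flat field names (`FacilityName`, `SpudDate`); in a real integration, the checks would be calls to Alloy APIs, whose signatures are not documented here.

```python
from datetime import date
from typing import Callable


def _is_iso_date(value) -> bool:
    """True if value parses as an ISO 8601 calendar date."""
    try:
        date.fromisoformat(value)
        return True
    except (TypeError, ValueError):
        return False


# Hypothetical preingest rules: (rule name, predicate over the "data" block).
RULES: list[tuple[str, Callable[[dict], bool]]] = [
    ("FacilityName is required", lambda d: bool(d.get("FacilityName"))),
    ("SpudDate is an ISO 8601 date", lambda d: _is_iso_date(d.get("SpudDate"))),
]


def preingest_gate(records: list) -> tuple[list, list]:
    """Split records into those ready to ingest and those held back."""
    accepted, rejected = [], []
    for record in records:
        data = record.get("data", {})
        failures = [name for name, check in RULES if not check(data)]
        if failures:
            rejected.append((record, failures))
        else:
            accepted.append(record)
    return accepted, rejected


batch = [
    {"data": {"FacilityName": "Well A", "SpudDate": "2021-03-15"}},
    {"data": {"FacilityName": "Well B", "SpudDate": "15/03/2021"}},
]
accepted, rejected = preingest_gate(batch)
print(len(accepted), [failures for _, failures in rejected])
# 1 [['SpudDate is an ISO 8601 date']]
```

Returning the failed rule names alongside each rejected record gives the publishing application something concrete to fix before retrying.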

Figure 4. QuestLabs Alloy preingest data workflow with OSDU Data Platform

In the postingest stage, Alloy can run quality scans on data that has already been loaded into the OSDU Data Platform. This helps data governance teams check the health of their OSDU data and then plan strategies to fix and enrich it according to the organization's business needs. Data managers can start a scan from the Alloy user interface, either over the complete dataset defined by an OSDU kind or over selected records.
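
A postingest scan over a complete kind has to page through potentially large result sets. The OSDU Search API offers a cursor-based endpoint (`/api/search/v2/query_with_cursor`); this sketch assumes its responses carry a `results` list and a `cursor` for the next page, and it takes the page-fetching function as a parameter so it can be exercised without a live deployment. The check function is a hypothetical stand-in for a quality rule.

```python
from typing import Callable, Iterator, Optional

# fetch_page(cursor) -> one response page, e.g. {"results": [...], "cursor": "..."}
FetchPage = Callable[[Optional[str]], dict]


def iter_kind_records(fetch_page: FetchPage) -> Iterator[dict]:
    """Yield every record of a kind, following the search cursor page by page."""
    cursor: Optional[str] = None
    while True:
        page = fetch_page(cursor)
        results = page.get("results", [])
        yield from results
        cursor = page.get("cursor")
        if not cursor or not results:
            break  # no further pages


def scan_kind(fetch_page: FetchPage, check: Callable[[dict], bool]) -> dict:
    """Apply a quality check to all records and report the failing IDs."""
    scanned, failed_ids = 0, []
    for record in iter_kind_records(fetch_page):
        scanned += 1
        if not check(record.get("data", {})):
            failed_ids.append(record.get("id"))
    return {"scanned": scanned, "failed": failed_ids}


# Fake two-page search response standing in for a live OSDU deployment.
pages = {
    None: {"cursor": "c1", "results": [
        {"id": "opendes:master-data--Well:1", "data": {"FacilityName": "Well A"}},
        {"id": "opendes:master-data--Well:2", "data": {"FacilityName": ""}},
    ]},
    "c1": {"cursor": None, "results": [
        {"id": "opendes:master-data--Well:3", "data": {"FacilityName": "Well C"}},
    ]},
}

report = scan_kind(pages.__getitem__, lambda d: bool(d.get("FacilityName")))
print(report)  # {'scanned': 3, 'failed': ['opendes:master-data--Well:2']}
```

Against a real deployment, `fetch_page` would send the kind on the first request and the returned cursor on each later one. Scanning only selected records is the same loop over a fixed list of records instead of the paged iterator.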

As part of its data observability feature, Alloy can also consume data change events from the OSDU Data Platform. In this case, a quality scan starts automatically whenever a new record is added or an existing one is updated. Governance teams can monitor the health of the organization's OSDU data and take action to build and maintain fit-for-purpose data. They can see the exact issues, as shown in figure 5, along with suggestions on how to fix and enrich the data.
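
Event-driven scanning can be sketched as a handler for record change notifications. This example assumes that a notification body is a JSON list of entries with `id`, `kind`, and `op` fields, in the style of OSDU record change messages; `fetch_record` and `check` are hypothetical hooks for retrieving a record and applying a quality rule. Deletions are skipped because only added or updated records need a new scan.

```python
import json
from typing import Callable


def handle_record_change(message_body: str,
                         fetch_record: Callable[[str], dict],
                         check: Callable[[dict], bool]) -> list:
    """Re-scan records that a change notification reports as created or updated."""
    results = []
    for event in json.loads(message_body):
        if event.get("op") not in ("create", "update"):
            continue  # e.g. deletions need no quality scan
        record = fetch_record(event["id"])
        results.append({
            "id": event["id"],
            "kind": event.get("kind"),
            "passed": check(record.get("data", {})),
        })
    return results


# Hypothetical in-memory store in place of an OSDU Storage API lookup.
KIND = "osdu:wks:master-data--Well:1.0.0"
store = {
    "opendes:master-data--Well:1": {"data": {"FacilityName": "Well A"}},
    "opendes:master-data--Well:3": {"data": {"FacilityName": ""}},
}
body = json.dumps([
    {"id": "opendes:master-data--Well:1", "kind": KIND, "op": "create"},
    {"id": "opendes:master-data--Well:2", "kind": KIND, "op": "delete"},
    {"id": "opendes:master-data--Well:3", "kind": KIND, "op": "update"},
])

has_name = lambda d: bool(d.get("FacilityName"))
for result in handle_record_change(body, store.__getitem__, has_name):
    print(result["id"], result["passed"])
```

With the sample notification, the handler re-scans records 1 and 3, reports record 3 as failing, and never looks up the deleted record 2.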

Figure 5. QuestLabs Alloy quality scan results

Conclusion

OSDU Data Platform offers a new way to store, manage, and consume energy industry data. However, establishing the trustworthiness and consistency of that data can still be a challenge, even for energy companies that have already adopted OSDU. QuestLabs Alloy, built on Amazon Web Services, helps alleviate many of these challenges and takes full advantage of the modern OSDU reference architecture. Deployed together with OSDU, the overall solution helps companies:

  • monitor data across organizations and intervene when required
  • validate data based on out-of-the-box or company-specific rules for conventional and new energy streams
  • deploy workflows for complex corporate-specific data quality and enrichment requirements
  • substantially lower implementation time for new data entities or new domains (for example, solar or wind)
  • build robust applications on the stable data foundation using full data lifecycle APIs and built-in data and business rules
  • spend less time and fewer resources to deploy fit-for-purpose data