Problems with your data? You need DataOps*
[*DataOps = Proactive and preventive data governance]
The DataOps Manifesto, published in 2017, brings together a series of practices aimed at solving problems related to inefficient data generation and processing, as well as data quality problems such as inconsistencies and incoherencies between data. Despite what one might initially think, DataOps is not just DevOps for data, although the idea does stem from applying that concept, widespread and well established in software development and operations, to the field of data.
What the manifesto says, complemented by the 2018 initiative called 'The Philosophy of DataOps', is that DataOps is a combination of agile methodologies, DevOps concepts, and everything known as Lean Manufacturing. In this way, in addition to DevOps concepts, it also incorporates management concepts closely related to agile methodologies and others drawn from industry, manufacturing, and production processes.
Thus, the purpose of DataOps is to manage efficiently all aspects of DevOps (idea, development, software production) together with the data value chain and lifecycle. In this context, DataOps is a set of techniques, methodologies, tools, and processes that together help an organisation or a specific project extract greater value from data by automating the processes that occur across the data lifecycle, all with the aim of making data and analytics initiatives more profitable.

From a functional architecture perspective, a DataOps platform pursues the following:
- Integrating data from different sources, automating its ingestion, loading and availability.
- Controlling data storage across its different versions over time, recording information and data transformation processes.
- Providing centralised metadata management that serves not only to access the available information but also to activate and configure the platform's processes.
- Controlling everything related to requests, authorisations and permissions for data access, consumption and exploitation.
- And finally, applying analytics, reporting and dashboarding mechanisms and techniques to monitor and track what is happening across the entire platform.
All of this is operated by a DataOps team that facilitates data sharing and the development of projects and initiatives on the platform for information producers and consumers; a minimal sketch of this metadata-driven approach follows.
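To make the idea concrete, here is a hypothetical Python sketch of a centralised metadata registry that both documents the platform and drives its automated processes. The names (DataSource, MetadataRegistry, register_version) are illustrative assumptions, not any particular product's API.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class DataSource:
    """A source system registered on the platform."""
    name: str
    kind: str      # e.g. "jdbc", "file", "api"
    schedule: str  # e.g. a cron expression for automated intake
    owner: str

@dataclass
class DatasetVersion:
    """One version of a dataset's structure, recorded over time."""
    dataset: str
    version: int
    schema: dict   # column name -> type
    registered_at: datetime

@dataclass
class MetadataRegistry:
    """Central metadata store: it both describes and configures the platform."""
    sources: dict[str, DataSource] = field(default_factory=dict)
    versions: list[DatasetVersion] = field(default_factory=list)

    def register_source(self, source: DataSource) -> None:
        self.sources[source.name] = source

    def register_version(self, dataset: str, schema: dict) -> DatasetVersion:
        # A new schema yields a new version; old versions are kept for traceability.
        current = [v for v in self.versions if v.dataset == dataset]
        version = DatasetVersion(dataset, len(current) + 1, schema,
                                 datetime.now(timezone.utc))
        self.versions.append(version)
        return version

# The same metadata that documents the platform also activates it:
registry = MetadataRegistry()
registry.register_source(DataSource("crm", "jdbc", "0 2 * * *", "sales-team"))
registry.register_version("crm.customers", {"id": "int", "email": "string"})
```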

Additionally, on the one hand, the platform must fit with automated software deployments to achieve continuous integration; on the other hand, regarding data flow, data pipelines must be orchestrated, tested, automated, and monitored within the platform itself: moving data from the areas where information lands raw, through subsequent layers where it is processed, refined, and enriched, until analytics can finally be performed on it in the exploitation layers.
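As an illustration, the following is a minimal sketch of such a layered pipeline in plain Python, with invented stage names (raw to refined to exploitation) and a simple quality test and monitoring hook at each step; a real platform would delegate this to an orchestrator.

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("pipeline")

def to_refined(raw_rows: list[dict]) -> list[dict]:
    """Refine raw records: drop incomplete rows, normalise fields."""
    return [
        {**row, "email": row["email"].strip().lower()}
        for row in raw_rows
        if row.get("email")  # quality rule: email must be present
    ]

def to_exploitation(refined_rows: list[dict]) -> dict:
    """Aggregate refined data for the analytics/exploitation layer."""
    return {"customers": len(refined_rows)}

def run_pipeline(raw_rows: list[dict]) -> dict:
    """Orchestrate the layers, testing and monitoring each step."""
    refined = to_refined(raw_rows)
    # Automated test: refinement must never add records.
    assert len(refined) <= len(raw_rows), "refinement added rows"
    log.info("refined layer: %d of %d rows kept", len(refined), len(raw_rows))
    result = to_exploitation(refined)
    log.info("exploitation layer: %s", result)
    return result

run_pipeline([{"email": " A@B.com "}, {"email": None}, {"name": "no email"}])
```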
In this context, Data Governance is often positioned at the end of the data value chain and lifecycle, which is completely wrong, and we will explain why.
Throughout the entire data value chain, a multitude of roles coexist and must collaborate with each other: developers, business users, architects, operations teams, systems technicians, etc. It is therefore necessary to adopt agile management methodologies as a basic principle, together with continuous change management, both fundamental parts of Data Governance. Only in this way can we achieve the automation of technical processes that brings greater efficiency and security, which is one of the main objectives DataOps pursues.

Additionally, achieving the objectives of DataOps requires complete control over the data lifecycle: separating information into logical layers, knowing exactly how it flows through the data ecosystem as a whole (lineage and traceability), and considering how the data lifecycle fits with the software lifecycle. We must also not forget the importance of full integration with the technical architecture, since without integrating its different technical parts, the corresponding data processes cannot be automated. And, at the end of the day, these are all areas where Data Governance has a lot to contribute.
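Lineage and traceability can be pictured as a simple graph of dataset-to-dataset derivations, as in this illustrative Python sketch (all dataset names are made up):

```python
from collections import defaultdict

# Lineage: which datasets each dataset was derived from.
lineage: dict[str, set[str]] = defaultdict(set)

def record_step(inputs: list[str], output: str) -> None:
    """Record one transformation step of the data lifecycle."""
    lineage[output].update(inputs)

def trace_back(dataset: str) -> set[str]:
    """Full upstream traceability for a dataset."""
    upstream = set()
    stack = list(lineage[dataset])
    while stack:
        parent = stack.pop()
        if parent not in upstream:
            upstream.add(parent)
            stack.extend(lineage[parent])
    return upstream

record_step(["raw.crm_export"], "refined.customers")
record_step(["refined.customers", "raw.web_events"], "exploitation.churn_report")
print(trace_back("exploitation.churn_report"))
# {'refined.customers', 'raw.crm_export', 'raw.web_events'}
```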
Finally, to achieve complete integration and automation of processes, three essential management layers must be incorporated (the third is sketched after the list):
- Demand management in data projects: changes to existing information, new use and exploitation cases, capture and ingestion of new data, etc.
- Metadata and version management of data structures, since metadata is ultimately the central element that enables the automation of many DataOps functions.
- Centralised management of data access permissions, so that it is ultimately possible to know who consumes what information and for what purpose.
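A minimal sketch of that third layer, under assumed names (AccessGrant, who_consumes): an access-permission ledger that records who consumes what information and for what purpose.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class AccessGrant:
    """One approved data-access permission, with its declared purpose."""
    user: str
    dataset: str
    purpose: str
    granted_at: datetime

grants: list[AccessGrant] = []

def grant_access(user: str, dataset: str, purpose: str) -> AccessGrant:
    grant = AccessGrant(user, dataset, purpose, datetime.now(timezone.utc))
    grants.append(grant)
    return grant

def who_consumes(dataset: str) -> list[tuple[str, str]]:
    """Answer the governance question: who uses this data, and why?"""
    return [(g.user, g.purpose) for g in grants if g.dataset == dataset]

grant_access("ana", "refined.customers", "churn model training")
grant_access("luis", "refined.customers", "quarterly sales report")
print(who_consumes("refined.customers"))
```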
Therefore, having outlined everything needed to implement a DataOps model, we can understand it as the natural evolution of Data Governance: a data governance by design (governance-first), proactive and preventive, positioned at the beginning of the data value chain, which accompanies the different stakeholders throughout the entire data lifecycle, serving as the central axis of the processes and providing at all times the global vision that enables us to achieve the desired effectiveness and efficiency.

Despite what one might think at first glance, proactive and preventive data governance located at the beginning of the data value chain minimises bureaucracy through clear and concise policies and procedures, giving processes (especially management processes) flexibility and agility, thereby maximising synergies between different projects and use cases and promoting the reuse of both information and existing processes on the platform. In short, thanks to this vision of Data Governance, we enable a single point of entry to the data ecosystem for all data stakeholders, also applying a collaborative approach to:
- Integration with demand management.
- Integration between technologies and ecosystem components.
- Automation of common technical processes.
- Incremental and iterative approach by use cases.
- Democratisation and governed data self-service.
- Monitoring for continuous improvement.
All this is only possible by building a single, centralised metadata repository as the centrepiece of the data ecosystem, and by separating data governance and management from the underlying technologies and platforms. The reason is that Data Governance has a very different focus from that of the technologies for capturing, storing, treating, processing, and exploiting data, which are much more concerned with process performance than with proper data management.
Finally, complementing this repository with a common, unique language based on knowledge of the business and the organisation (known as a Business Glossary) will in turn enable us to build a semantic map of information assets (incorporating the corresponding taxonomies), bringing data ever closer to business users and continuing to build bridges with technical teams.
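As an illustration, here is a minimal sketch of how Business Glossary terms could be linked to physical assets, with their taxonomies, to form such a semantic map; all term and asset names are invented.

```python
from dataclasses import dataclass, field

@dataclass
class GlossaryTerm:
    """A business concept in the common language of the organisation."""
    name: str
    definition: str
    taxonomy: list[str]                            # e.g. ["Party", "Customer"]
    assets: set[str] = field(default_factory=set)  # linked physical assets

glossary: dict[str, GlossaryTerm] = {}

def define_term(name: str, definition: str, taxonomy: list[str]) -> None:
    glossary[name] = GlossaryTerm(name, definition, taxonomy)

def link_asset(term: str, asset: str) -> None:
    """Connect a business term to a table/column in the data ecosystem."""
    glossary[term].assets.add(asset)

define_term("Customer", "A party with at least one active contract.",
            ["Party", "Customer"])
link_asset("Customer", "crm.customers")
link_asset("Customer", "exploitation.churn_report.customer_id")

# Business users search by concept; the semantic map resolves the assets.
print(glossary["Customer"].assets)
```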
If you would like to learn about some DataOps case studies, you can see this presentation, and if you want to know how to succeed in implementing a DataOps model thanks to Anjana Data, write to us and we will be delighted to help you 🙂
