TRACING THE DATA LINEAGE CHALLENGE
Heather McKenzie looks at the hurdles and solutions to one of data management’s thornier problems.
Regulations such as the Basel Committee on Banking Supervision’s BCBS 239, the General Data Protection Regulation (GDPR) and the US Federal Reserve’s CFO Attestation require banks to be accountable for their data sets and strengthen their risk data aggregation capabilities and reporting. In response, financial institutions are evaluating and implementing data lineage tools that will enable them to track data throughout the trade lifecycle more effectively.
However, tracking data and ensuring its quality is difficult, and it has long troubled these organisations. As Varun Singhal, senior vice-president and product manager at AxiomSL, puts it, “At an enterprise level, firms use a combination of various in-house and vendor solutions, meaning that no one technology system hosts the complete data tracking capabilities. Hence, getting data lineage right and in an automated fashion is challenging work. Firms are manually extracting the relevant data lineage information from the various sub-systems to achieve data tracking.”
Richard Hogg, global GDPR specialist at IBM, says the company’s clients are working on their information governance programmes around data. This means compiling a central inventory and policy catalogue as the key framework for information governance. “Organisations are recognising that it is key to tie defensible policies against whatever the legal citations are, across all compliance regulations in all jurisdictions they operate.”
Historically, this was known as the records, or retention, schedule but it is far wider reaching and covers things such as data residency, privacy, and security even through to data breach reporting obligations. “Most recently, the impending GDPR has put an even greater focus on data lineage along with data minimisation (retention), data protection (security/cyber) and processing activities,” adds Hogg. “In parallel, it is key to manage and use a common business language or terminology, visualising the data flow and use across the business, how it is connected across the business, and tracking where it came from and where it ends up.”
The regulatory steer
There has been a ‘continuous drumbeat’ from regulators, rather than a specific set of regulations, that is driving firms to improve their data lineage practices, says Arnold Wachs, principal, practice lead data management, at US-based buyside consultancy Cutter Associates.
In general, data lineage and data tracking fall under data governance functions. Any data that goes out to regulators or clients must come under this umbrella, including documentation and ensuring that someone is responsible for data quality at every stage of the trade lifecycle.
Singhal adds that as bank executives are responsible for submitting reports to regulators and attesting to the numbers therein, senior management needs to trust their governance processes. “The person responsible for reporting should have confidence that the numbers and positions are correct.”
Wachs notes there is nothing “magical” in solving the challenges of data tracking and data governance: it requires hard work. Approaches to data lineage differ. One large Swiss bank, he says, is putting in place data governance rules that require personnel to obtain data from a specific source. This idea of using an authorised, mandated source of data is getting traction, he says, as firms tire of the challenges inherent in data inconsistencies caused by disparate data sources. “The bigger firms are doing this, and the approach is beginning to trickle down to the medium and smaller buyside firms.”
An integrated approach is required, though. Since the same financial instrument, a loan for example, can flow into different regulatory reports, firms cannot solve the data lineage problem in isolation for each report, says Singhal. Rather, they need to take a holistic approach. “Firms are working towards building a more comprehensive solution with the ability to track the financial instrument from the point of origin where the instrument was booked to the final stage where it is reported to the regulator,” he adds.
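The end-to-end tracking Singhal describes can be thought of as a directed graph of hops between systems, which can then be walked backwards from a reported figure to its point of origin. The sketch below is a toy illustration of that idea; the system and report names are hypothetical, not a real bank architecture.

```python
from collections import defaultdict

class LineageGraph:
    """Toy directed graph tracking a data element from booking to report.

    System names and hops are illustrative assumptions, not any vendor's API.
    """

    def __init__(self):
        self.parents = defaultdict(list)  # downstream -> [upstream, ...]

    def record_hop(self, source, target):
        """Record that data flows from one system into another."""
        self.parents[target].append(source)

    def trace_back(self, node):
        """Walk upstream from a reported figure to its points of origin."""
        path, frontier, seen = [], [node], set()
        while frontier:
            current = frontier.pop()
            if current in seen:
                continue
            seen.add(current)
            path.append(current)
            frontier.extend(self.parents[current])
        return path

# One loan flowing through a risk engine into two regulatory reports.
g = LineageGraph()
g.record_hop("loan_booking_system", "risk_engine")
g.record_hop("risk_engine", "regulatory_report_A")
g.record_hop("risk_engine", "regulatory_report_B")

print(g.trace_back("regulatory_report_A"))
# → ['regulatory_report_A', 'risk_engine', 'loan_booking_system']
```

A report owner asked to attest to a number can then answer “where did this come from?” by reading the trace from report back to booking system.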
Hogg says a governance catalogue is the key framework. This provides the shared business terms and inventory of what each type of information, or data, is, where it comes from, how it is used or processed, against what policies apply, and where it ends up. This is intimately linked with the policy catalogue.
For metadata discovery and data actions, organisations should use open interchange formats and application programming interfaces (APIs). Aligning data, policies and automation is culturally essential for success, Hogg adds. “With this in place, for GDPR or any other regulation, you then have a defensible, authentic source of data across all the stakeholders, regardless of their function and differing needs. This allows everyone, up through to the chief executive, to answer the who, where, when, why – and most importantly, what – is the purpose and processing of the data depending on its lineage, business value, usage and policy decisions, transparently.”
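In practice, the governance catalogue Hogg describes ties each data element to a business term, its origin, where it is used, and the policies that apply, and serialises to an open interchange format for exchange over APIs. The entries below are a minimal hypothetical sketch, not drawn from any real governance product.

```python
import json

# Hypothetical catalogue entries: field names, systems and policy labels
# are illustrative assumptions only.
catalogue = {
    "client_email": {
        "business_term": "Client contact email",
        "origin": "crm_system",
        "used_by": ["client_reporting"],
        "policies": ["GDPR:data_minimisation", "retention:7y"],
    },
    "loan_notional": {
        "business_term": "Loan notional amount",
        "origin": "loan_booking_system",
        "used_by": ["risk_engine", "regulatory_report_A"],
        "policies": ["BCBS239:aggregation"],
    },
}

def policies_for(field):
    """Look up which policies apply to a data element."""
    return catalogue[field]["policies"]

# Open interchange: the catalogue serialises to JSON for exchange via APIs.
exported = json.dumps(catalogue, indent=2)

print(policies_for("client_email"))
# → ['GDPR:data_minimisation', 'retention:7y']
```

The point of the structure is that one lookup answers both “where does this field come from?” and “what rules govern it?”, which is what makes the policies defensible across jurisdictions.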
Handing over the data challenge
Many firms have outsourced data management and Wachs says the third parties that provide services are having to “step up their game”. There is even more push from clients for third-party providers to deliver clean, scrubbed data and to show the lineage of that data. He believes suppliers such as State Street, BNY Mellon and JP Morgan are responding by increasing the transparency of their underlying data structure.
David Pagliaro, EMEA head of State Street’s data analytics arm, Global Exchange, says a hybrid approach will emerge among buyside firms. Certain data and content sets will be outsourced to firms such as his, while other, more sensitive, data will remain in-house. Under GDPR rules, there will be some content, such as operational data related to trade management and fund accounting, that service providers may not want to take on because of the potential fines for violating information privacy rules. More standard data could be moved to managed services, where a firm will benefit from cost and scale. This would include operational, regulatory and risk data related to trade management and fund accounting.
The key to a hybrid model will be how the two models work together, he says. “In the past, firms have had very structured data warehouses and the cost of managing data was high. Now as we are moving to the cloud-enabled systems, the cost of data management is very much lower. A potential model is that the sensitive information sits within the client’s internal cloud, while standard data sets reside within the managed services cloud. There could be cross-pollination between the two, but a firewall that prevents the third-party from seeing the sensitive data.”
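The routing logic behind the hybrid model Pagliaro outlines can be sketched simply: classify each record by sensitivity, keep sensitive records in the client-controlled store, and expose only the managed-service store to the third party. The tags and store names below are assumptions for illustration, not State Street's actual design.

```python
# Sketch of a hybrid data-residency rule: sensitive records stay in the
# client's internal cloud; standard records go to the managed service.
# Tag names and stores are hypothetical.
SENSITIVE_TAGS = {"personal_data", "client_identifying"}

internal_cloud = []         # client-controlled store
managed_service_cloud = []  # third-party store

def route(record):
    """Send a record to the appropriate store based on sensitivity tags."""
    if SENSITIVE_TAGS & set(record.get("tags", [])):
        internal_cloud.append(record)
    else:
        managed_service_cloud.append(record)

def third_party_view():
    """The 'firewall': the provider sees only the managed-service store."""
    return list(managed_service_cloud)

route({"id": 1, "tags": ["personal_data"], "value": "alice@example.com"})
route({"id": 2, "tags": ["market_data"], "value": 101.25})

print(len(third_party_view()))  # only the standard record is visible
# → 1
```

The “cross-pollination” in Pagliaro's model would happen through controlled queries across the two stores, while `third_party_view` enforces that the provider never reads the sensitive side.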
Singhal says that understanding the origin of the data, its flows and the transformations it undergoes gives an organisation a complete picture that ensures data integrity. For regulatory disclosure, it is critical to be able to respond quickly to enquiries about the numbers on regulator-facing reports. Complying with constantly changing legislation and technology also requires tools that demonstrate the impact of those changes across all the systems and reports that use the affected data.
“A successful data lineage solution provides the end user – including business users, such as a report owner, and technology users, such as a system owner – the ability to track the lifecycle of a data element from the point of origin to the end report,” he says.
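The impact analysis Singhal alludes to is the forward direction of the same lineage idea: given a change in one system, find every downstream system and report that consumes its data. A minimal sketch, with hypothetical system names:

```python
# Hypothetical data flows: upstream system -> systems/reports it feeds.
flows = {
    "loan_booking_system": ["risk_engine", "finance_ledger"],
    "risk_engine": ["regulatory_report_A", "regulatory_report_B"],
    "finance_ledger": ["regulatory_report_B"],
}

def impacted(system):
    """Return everything downstream of a changed system (reachability)."""
    seen, frontier = set(), [system]
    while frontier:
        for downstream in flows.get(frontier.pop(), []):
            if downstream not in seen:
                seen.add(downstream)
                frontier.append(downstream)
    return sorted(seen)

print(impacted("loan_booking_system"))
# → ['finance_ledger', 'regulatory_report_A', 'regulatory_report_B', 'risk_engine']
```

Before changing the booking system, a system owner can run this traversal to list every report whose numbers could move, which is exactly the question a regulator or attesting executive will ask.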
Wachs adds that many firms now realise that getting data ‘right’ for regulatory reporting also benefits client reporting. “Firms are now documenting the data lineage of individual fields; this has never happened before. It represents a huge improvement and means firms have people who really understand the data they have and can start to use that data to address day-to-day business problems.”