Missing the functional piece in a data project puzzle
The financial industry is going through a disruptive phase, in which buzzwords such as blockchain, big data and deep learning are enticing financial institutions to ride the technological wave. Solid data management is the foundation of these developments. Financial institutions not only have an internal drive to create this foundation, as this improves analysis and decision-making, but also face challenging regulations imposed by national and international supervisors. Regulatory focus has become more stringent and more data-intensive, thereby challenging the capabilities of financial institutions to perform timely and accurate data aggregation while maintaining consistent risk and finance reporting. A failure to produce timely and accurate risk and finance reports can ultimately lead to financial penalties or additional capital charges that directly impact the profitability of the firm.
Functional data management: What makes data projects excessively costly and never-ending?
Quite often, data transformation projects are seen as a technical exercise to bring data from source systems to end users. This undercuts the focus required for the functional part of the process, where a lot of added value can be gained.
In this paper, we advocate a functionally driven data flow throughout the whole reporting chain. In addition to the technical perspective, the functional perspective ensures focus on the long-term strategy and business as usual, while providing a foundation that can adapt to new regulations and changing business priorities.
Furthermore, we outline the principles for a foundation on which to build a resilient and robust finance and risk data landscape. A clear functional data flow is defined, with in-depth analyses of the application and implementation of the flow.
Assessment of current data landscape

The increased focus on quantifying risk substantiated with reliable data, and even more on being in control of the risk figures, has required financial institutions to move their focus to a more data-driven environment. The road to this robust data landscape is theoretically sound and rational, but the execution is always difficult as financial institutions face several challenges:
• A landscape full of legacy systems and long (manual) data chains, making change difficult.
• Ever-changing regulatory requirements that frequently derail the strategic roadmap.
• A multidisciplinary set of end users, each with their own very specific requirements and definitions.
• Increasing integration over different functional domains, which emphasizes the need for consistent data across different end users.
This can also be seen, for example, with the implementation of PERDARR (BCBS 239)1. PERDARR is a principles-based guideline for banks, emphasizing the importance of data-related topics such as achieving the desired data quality, data definitions, data availability and data accountability, as well as the data storage and retrieval process.2
From small local banks to the global systemically
important banks (G-SIBs), countless programs and
projects have been initiated in order to tackle the data
management challenges financial institutions are
facing. But countless programs and projects have also
been terminated before objectives were met.
On the flip side of these huge challenges, there are
also huge benefits. Regulators have pushed financial
institutions to align the use of data within their orga-
nization over different departments. Recent examples
are the alignment between credit risk and finance (e.g.
IFRS 9) and ALM/market risk and finance (e.g. IRRBB,
EBA stress test). However, compliance with regulatory
requirements is not the only driver for a solid data
foundation. Institutions with such a foundation achieve perfect reconciliation of data, while end users obtain better insight into their risk positions, spend far less time on periodic reconciliation and are less prone to operational risks.
Furthermore, they are far better positioned to adopt and
implement the next (regulatory) change in their organi-
zation, benefit from increased client analysis potential
and improved input for management decisions. Growth
is achieved more easily on a scalable data landscape, as
is evidenced by the emergence of fintechs, which have the luxury of not having any legacy systems.
The case for functional data management

The key to achieving these benefits is to ensure
involvement of key persons with functional knowledge
in setting up the IT landscape (systems, applications,
databases, etc.) which can support the entire risk and
finance reporting and analytics data chain.
Functional knowledge is essential in the design phase
of data models and data flows to create a resilient
data landscape which is scalable and flexible enough
for future developments in regulations and changes
in business strategy of the firm. This enables financial
institutions to swiftly adapt to new regulations such as
IFRS 9 or IFRS 17, new requirements for the stress tests
or a data request for AnaCredit. Our belief is supported
by the latest assessment of EBA on the progress of
banks adopting the PERDARR guidelines.3
Three of the key features that the EBA identifies in the
failure to comply with PERDARR are:
1. Incomplete integration and implementation of bank-wide data architecture and frameworks (e.g.
data taxonomies, data dictionaries, risk data policies)
This is a direct consequence of not having a holistic
and functional view over the complete chain.
Alignment between all layers in the chain is bound to fail if different data taxonomies and risk data policies are used, and if different quality standards are adhered to between layers or even between business units.
2. Flaws in data quality controls (e.g. reconciliation,
validation checks, data quality standards)
The business must be involved in data quality controls, extending technical data quality controls with functional data quality controls based on business logic. The EBA states that data quality is often deemed insufficient for regulatory reporting.

“On the flip side of the huge challenges, there are also huge benefits”

1 Source: The Principles of Effective Risk Data Aggregation and Risk Reporting - BCBS 239, Bank for International Settlements, January 2013, https://www.bis.org/publ/bcbs239.pdf
2 For more information about the introduction of BCBS 239, see also: Why is implementing BCBS 239 so challenging? https://zanders.eu/en/latest-insights/why-is-implementing-bcbs-239-so-challenging/
3 Source: Progress in adopting the Principles for effective risk data aggregation and risk reporting, Bank for International Settlements, March 2017, https://www.bis.org/bcbs/publ/d399.pdf
3. Over-reliance on manual processes and interventions to produce risk reports
As a result of the second point, many manual processes and interventions are created in order to produce risk reports. Consequently, the quality of the end-to-end reporting cannot be guaranteed if even a slight change is made at the start of the chain, a so-called snowball effect. In addition, due to the many manual adjustments by different users in the chain, numbers in end reports can no longer be reconciled.
These findings can be attributed to a missing functional perspective in data management and the lack of a holistic view of the entire risk and finance reporting chain. The reporting unit has thorough knowledge of the regulatory requirements and the creation of the risk and finance reports, while the architects and developers creating the data landscape have in-depth knowledge of data management from a technical point of view. This creates a gap in overall data management, increasing the risk of misalignment. Positioning key persons with a functional background to oversee the entire data chain will bridge this gap.
Involvement of these key persons starts at the founda-
tion, when defining the single source of truth (SSOT).
We apply a framework with five crucial principles to set
a solid foundation for a robust data landscape and
address the key drivers of failures identified by EBA.
Key principles of the single source of truth

The key principles are the starting points when defining the data landscape and are to be adhered to throughout the whole reporting chain. The most essential element in this chain is the introduction of the single source of truth, a Generic Data Layer (GDL) that forms the basis for all data deliveries to end users. The benefit of an SSOT is that all reports and analytics are based on a single version of the truth, hence there are no reconciliation, definition or timing differences between reports.
Involvement of functional knowledge starts when the
internal and external data requirements and definitions
are defined. This is crucial for building a robust and
resilient SSOT. This functional view creates a level of
comprehension on how to structure and design the data
landscape. The requirements must provide a clear over-
view of the known and expected future risk and finance
attributes. Moreover, as development of the regulatory
landscape is always ongoing, an effective SSOT in line
with the key principles will be able to absorb the changing requirements without affecting the chosen setup
and structure.
At Zanders we have defined the following key principles
for an effective SSOT:
1. Data in the GDL must be stored at the lowest possible
level of granularity, making aggregation and derivation of calculated information further down the line
more structured and interpretable across reporting
purposes. This avoids the inclusion of redundant
information in the SSOT.
2. Generic external data must be stored in the GDL.
This entails data such as economic variables
(e.g. interest rates and bond prices), ensuring that
the same market data is used throughout the
organization.
3. A unique key must be generated for all loans, counterparties and other instruments in the GDL. This is paramount in the data lineage process and reporting alignment (a minimal sketch of such a key follows after this list).
4. The source systems are the owners of the data and
not the GDL. No corrections are executed in the GDL; data quality issues or data gaps are resolved at the source.
Data enrichments in subsequent layers are owned by
the specific layer.
5. The GDL combines all requirements from all end
users, resulting in a generic setup of definitions,
dimensions and dictionaries that is understood by all
the end users and, more importantly, is accepted by
the end users. Involvement of key persons with functional knowledge is crucial in this step.
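To make principles 1 and 3 more tangible, the following minimal sketch (in Python, purely for illustration) shows one possible way to store a position at the lowest level of granularity and to derive a deterministic unique key from the source system and its native identifier. The field names and the hashing scheme are illustrative assumptions, not a prescribed data model.

```python
import hashlib
from dataclasses import dataclass
from datetime import date


def gdl_key(source_system: str, source_id: str, as_of: date) -> str:
    """Derive a deterministic unique key from the source system, its native
    identifier and the reporting date (illustrative scheme, not a standard)."""
    raw = f"{source_system}|{source_id}|{as_of.isoformat()}"
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()[:16]


@dataclass(frozen=True)
class GdlPosition:
    """A single position stored at the lowest level of granularity (one
    facility, one reporting date), rather than a pre-aggregated balance."""
    gdl_key: str          # unique key (principle 3)
    source_system: str    # the source remains the owner of the data (principle 4)
    source_id: str        # native identifier in the source system
    as_of: date
    counterparty_key: str
    notional: float
    currency: str


# Example: two deliveries of the same loan from the same source system on the
# same date map to the same key, so downstream layers can reconcile on it.
loan_key = gdl_key("loan_origination", "LN-000123", date(2017, 12, 31))
position = GdlPosition(loan_key, "loan_origination", "LN-000123",
                       date(2017, 12, 31), "CP-42", 1_000_000.0, "EUR")
print(position.gdl_key)
```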
These key principles should be governed by a clear interdepartmental operating model, and roles, responsibilities and accountabilities must be made explicit to have an effective data management process.
Graph 1. A functional data flow from source to output
4 Note: Functional work flow in this context means the logical flow of data and information, which is independent of any particular storage technology or data warehouse and its technical implementation.
“The single source of truth is a Generic Data Layer (GDL) that forms the basis for
all data deliveries to end users”
Ultimately, depending on the organization structure, the chief data or risk officer is the owner of, and accountable for, the data and data quality of the whole reporting chain.
With these key principles in mind, the functional data flow4
can be further outlined when setting out the data landscape.
Functional data flow

In a typical data work flow for finance and risk, five distinct layers can be distinguished, each with its own purpose.
1. Source layer
The source layer contains raw data from all sources
of all assets, liabilities and off-balance sheet items.
The data is source system-specific and often does
not align with other source systems, which limits the
possibilities to perform calculations directly on the
data. Hence, all source data should be loaded into
the GDL. A distinction can be made between data
from within the bank, i.e. from its own IT systems, and
external data from market data vendors or
subsidiaries.
• Internal
Position data: This layer contains all position data
regarding the asset and liability portfolios of a bank
that are administered in front office systems such as
loan origination systems and deal capture systems.
• External
Product-related data: External instrument-related data, e.g. market prices and trade volume.
Subsidiary data: For larger institutions it is common
that subsidiaries, entities specialized in fields such
as real estate, leasing or securitizations, have their
own data landscape which needs to be consolidated
with the institutions’ balance sheet.
Generic external data: In addition to external position data, other generic data is required, such as interest
rates, FX rates or macroeconomic indexes, e.g.
CBOE Volatility Index (VIX) and the Gross Domestic
Product (GDP).
2. Generic Data Layer (GDL)
All data from the source systems is transformed to fit the target data model and is integrated into the data
layer. Limiting the data flow to one recipient, the GDL,
creates clarity and decreases the operational burden
for the source systems, while also ensuring minimum
vulnerabilities in the distribution of data. The setup
focuses on durability and stability as this is the core
of the data landscape and changes will be difficult
and costly to implement.
To achieve clear communication between the delivering parties and the GDL, a set of agreements must be in place. This ensures that both parties know what to expect and that the delivery to the GDL is the only expected delivery. This set of agreements is based on the input requirements throughout the data chain. A clear owner of the agreements must be specified.
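As a purely illustrative sketch of such a set of agreements, the snippet below encodes the expected fields and types for one hypothetical source system delivery and checks an incoming record against it. The agreement contents and field names are assumptions, not a prescribed standard.

```python
from datetime import date

# Illustrative delivery agreement between one source system and the GDL:
# which fields are expected, their types, and whether they may be empty.
LOAN_DELIVERY_AGREEMENT = {
    "source_system": "loan_origination",
    "frequency": "daily",
    "fields": {
        "source_id": (str, False),      # (expected type, nullable)
        "as_of": (date, False),
        "counterparty_id": (str, False),
        "notional": (float, False),
        "currency": (str, False),
        "maturity_date": (date, True),
    },
}


def validate_delivery(record: dict, agreement: dict) -> list[str]:
    """Return a list of violations of the agreement for one delivered record."""
    issues = []
    for field, (expected_type, nullable) in agreement["fields"].items():
        if field not in record:
            issues.append(f"missing field: {field}")
        elif record[field] is None and not nullable:
            issues.append(f"field may not be empty: {field}")
        elif record[field] is not None and not isinstance(record[field], expected_type):
            issues.append(f"wrong type for {field}: {type(record[field]).__name__}")
    return issues


record = {"source_id": "LN-000123", "as_of": date(2017, 12, 31),
          "counterparty_id": "CP-42", "notional": "1e6", "currency": "EUR"}
print(validate_delivery(record, LOAN_DELIVERY_AGREEMENT))
# ['wrong type for notional: str', 'missing field: maturity_date']
```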
At the GDL the key principles are essential. This is
where the data is stored at the lowest possible level
of granularity and a unique key is generated for all loans, counterparties and other instruments available. The GDL is not the owner of the data itself, as this remains with the source system; however, the GDL is the owner of data definitions, dimensions, dictionaries and transformations. All historical data, as per availability,
is stored. In the setup of the GDL, the alignment
between the definitions across sources should be
monitored and tested, ensuring all sources provide
the same data. It is essential that the definitions and
dimensions as defined in the GDL are well documented, understood and clear to the entire chain.
Ambiguity at this stage leads to inconsistent results
further downstream in the data chain.
3. Business Information Layer (BIL)
For each specific internal and external requirement,
specific data requirements exist which can range from
different categorization of counterparties to specific
risk metrics that aren’t relevant in other regulatory
reports. Each data hub in the BIL is filled with only
the relevant data attributes from the GDL, preparing
the data for the calculation or reporting layer.
Within the BIL, data is enriched and specified to the requirements, which also implies that calculations may take place within the BIL to prepare risk factors. The BIL is the owner of, and responsible for, all data enrichments in this layer. There might be overlapping factors and enrichment steps between data hubs; these should be shared in order to ensure consistent treatment.
An exception to the above is, for example, the internal and external stress test requirements. This stress test overlay depends on multiple data hubs to perform the overall risk calculations. The external regulator has its own set of requirements for the calculations, but the input data should align with the BILs for reconciliation purposes and alignment of the results. The macroeconomic factors are the drivers of the shocks and are, therefore, added separately to the regular data stream and the calculator for stress test purposes.
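As a minimal sketch of the enrichment step within a BIL data hub, the snippet below selects only the attributes a hypothetical credit risk hub needs from the GDL and derives one illustrative risk factor. The attribute names and the derivation rule are assumptions for illustration only.

```python
# A GDL extract: full-granularity records with many attributes (illustrative).
gdl_records = [
    {"gdl_key": "a1f3", "counterparty_key": "CP-42", "notional": 1_000_000.0,
     "collateral_value": 800_000.0, "currency": "EUR", "sector": "Real estate"},
    {"gdl_key": "b7c9", "counterparty_key": "CP-77", "notional": 250_000.0,
     "collateral_value": 0.0, "currency": "EUR", "sector": "Retail"},
]

# Attributes that this particular credit risk data hub actually needs.
CREDIT_HUB_ATTRIBUTES = ["gdl_key", "counterparty_key", "notional", "collateral_value"]


def to_credit_hub(record: dict) -> dict:
    """Copy only the relevant GDL attributes and enrich the record with a
    derived risk factor (loan-to-value), which is owned by this BIL hub."""
    hub_record = {attr: record[attr] for attr in CREDIT_HUB_ATTRIBUTES}
    collateral = record["collateral_value"]
    hub_record["loan_to_value"] = (
        record["notional"] / collateral if collateral > 0 else None
    )
    return hub_record


credit_hub = [to_credit_hub(r) for r in gdl_records]
print(credit_hub[0]["loan_to_value"])  # 1.25
```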
4. Calculation Layer
In the calculation layer, specific risk models are applied to calculate the required metrics, for example the lifetime expected credit loss calculator for IFRS 9. The input to the calculation and the output that is expected from this calculation engine are the driving force behind the data requirements. Without the correct level and accuracy of the input data, the quality of the output cannot be guaranteed, potentially leading to greater risks or actual losses.
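As a strongly simplified illustration of such a calculation engine, the sketch below computes a lifetime expected credit loss as the discounted sum of marginal PD × LGD × EAD per period. The input values are fabricated and the formula deliberately ignores staging, multiple scenarios and other IFRS 9 details.

```python
def lifetime_ecl(marginal_pd: list[float], lgd: float,
                 ead: list[float], discount_rate: float) -> float:
    """Sum of discounted expected losses per period:
    ECL = sum_t PD_t * LGD * EAD_t / (1 + r)^t  (simplified, single scenario)."""
    assert len(marginal_pd) == len(ead)
    return sum(
        pd_t * lgd * ead_t / (1.0 + discount_rate) ** (t + 1)
        for t, (pd_t, ead_t) in enumerate(zip(marginal_pd, ead))
    )


# Illustrative 3-year profile for one exposure.
ecl = lifetime_ecl(
    marginal_pd=[0.02, 0.015, 0.01],          # probability of default per year
    lgd=0.40,                                  # loss given default
    ead=[1_000_000.0, 800_000.0, 600_000.0],   # exposure at default per year
    discount_rate=0.03,
)
print(round(ecl, 2))
```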
5. Reporting Layer
The final layer of the data flow is the reporting layer, consisting of a reporting cube and a reporting engine. The reporting cube is filled with the results from the calculation layer and directly from the BIL if no calculations are required. Reports, both internal and external, are compiled by the reporting engine on the reporting cube. In all cases, reports should solely be based on the reporting cube and should not be filled with data from a different layer. This ensures transparency of the chain and consistency in the reports.
Relevant information, which is (re)used in other risk
calculations or financial processes, such as IFRS 9
provisions or the predicted cash flows of certain
instruments, is fed back to the GDL or BIL.
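The sketch below illustrates the reporting-cube constraint: reports are compiled purely by aggregating records that are already in the cube, without reaching back into the GDL or BIL. The dimensions and figures are illustrative assumptions.

```python
from collections import defaultdict

# Illustrative reporting cube: pre-calculated results keyed by report dimensions.
reporting_cube = [
    {"portfolio": "Mortgages", "stage": 1, "exposure": 5_000_000.0, "ecl": 12_000.0},
    {"portfolio": "Mortgages", "stage": 2, "exposure": 1_200_000.0, "ecl": 45_000.0},
    {"portfolio": "SME",       "stage": 1, "exposure": 2_500_000.0, "ecl": 30_000.0},
]


def report_by(dimension: str, measure: str) -> dict:
    """Compile a report solely from the reporting cube by aggregating one
    measure over one dimension (no data is pulled from other layers)."""
    totals = defaultdict(float)
    for row in reporting_cube:
        totals[row[dimension]] += row[measure]
    return dict(totals)


print(report_by("portfolio", "ecl"))   # {'Mortgages': 57000.0, 'SME': 30000.0}
print(report_by("stage", "exposure"))  # {1: 7500000.0, 2: 1200000.0}
```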
As a minimum requirement, data validation and quality
controls should be in place in the first three layers
(source systems, GDL and BIL). When data reaches
the calculation engine, the data is already validated
and checked for specific calculation purposes. At each
layer, the nature of data quality controls can differ, as
data quality controls at the BIL are undeniably more
functional in nature with more business logic. Note that
data quality issues are not solved in each layer, a pitfall of the current landscape mentioned above, but should be looped back to the source. Furthermore, data
quality controls between all layers should be in place
checking on the completeness, correctness and lineage
of the data.
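As one possible shape of such a control between two adjacent layers, the sketch below reconciles record completeness and a summed measure on the shared unique key, and reports rather than fixes any differences. The thresholds and field names are assumptions for illustration.

```python
def reconcile_layers(upstream: list[dict], downstream: list[dict],
                     key: str = "gdl_key", measure: str = "notional",
                     tolerance: float = 0.01) -> dict:
    """Completeness: every upstream key must reach the downstream layer.
    Correctness: the summed measure must match within a small tolerance.
    Issues are only reported here; fixes are looped back to the source."""
    up_keys = {r[key] for r in upstream}
    down_keys = {r[key] for r in downstream}
    up_total = sum(r[measure] for r in upstream)
    down_total = sum(r[measure] for r in downstream)
    return {
        "missing_downstream": sorted(up_keys - down_keys),
        "unexpected_downstream": sorted(down_keys - up_keys),
        "measure_difference": down_total - up_total,
        "within_tolerance": abs(down_total - up_total) <= tolerance,
    }


gdl = [{"gdl_key": "a1f3", "notional": 1_000_000.0},
       {"gdl_key": "b7c9", "notional": 250_000.0}]
bil = [{"gdl_key": "a1f3", "notional": 1_000_000.0}]

print(reconcile_layers(gdl, bil))
# {'missing_downstream': ['b7c9'], 'unexpected_downstream': [],
#  'measure_difference': -250000.0, 'within_tolerance': False}
```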
Implementation of the functional data flow

A key factor in the implementation of a functional data
flow for a reporting chain is being able to maintain an overview of the entire chain while considering all the principles defined in the previous section. Alignment with all the stakeholders is one of the major hurdles that needs to be tackled, and an understanding of all parties' needs should be clearly in scope. Implementation of the functional data model and flow requires all stakeholders, functional and technical, to collaborate
and bridge the gaps in terms of understanding the requirements. While risk and finance should be closely involved in drafting the requirements for the data model, flow and controls, the IT manager should be more involved in the actual purpose of the data and have more functional knowledge about its usage.

Key elements for a successful data project
• Add functional knowledge to the team, specifically to govern the entire data chain
• Actively involve the end user in the setup of a data model
• Set up strict data governance rules and follow through
• Create a single source of truth and assign data owner(s)
• Implement data validation and quality controls between all layers
• Create a set-up that is adaptable to regulatory changes or new end users
• Create clear requirement documentation starting from the end-user perspective
This implementation process is not straightforward
and does imply a considerable amount of effort from
all stakeholders prior to the realization of potential
benefits.
Besides bridging the functional gap between stakeholders, the introduction of functional and technical data validation and quality controls at each of the layers will ensure that the data is in line with the requirements. This provides a direct lineage overview of where the data comes from and of any alterations or business logic applied.
To ensure a form of standardization and a feasible level of implementation, principles need to be set for the data
flow as new issues and challenges will arise and will
require solutions within the existing data landscape.
It is important to keep the five layers consistent with the
intended purpose of each layer.
The last piece of the puzzle

The pitfall of current large transformation projects is
the lack of functional knowledge throughout the whole
reporting chain. Functional knowledge is key at all
levels and layers: on a strategic level where the
strategic roadmap of the IT landscape is defined, but
also on the lowest level when determining the
requirements for the GDL or a specific BIL.
Zanders believes that functional knowledge over the
whole reporting chain is the last piece of the puzzle for
completing the finance and risk architecture and
corresponding IT landscape.
Jasper van [email protected]
Scott Lee
Vincent [email protected]
Save time and money

Do you want to get in control, stay in control, and save time and money on your (big) data processes?
Contact us:
The added value of Zanders
Our track record and expertise can help you overcome
challenges in the areas of:
• Managing large transformation projects
• Improving functional knowledge through the entire
data flow
• Translating regulatory guidelines into functional
requirements and data models
• Creating risk processes and reports
• Adaptation to changing regulatory requirements
• Optimizing risk and finance models
• Modeling assets and liabilities at financial institutions
• Data validation and data quality controls