The goal of this article is to help answer questions about:
- When a master data management solution, also known as a master data hub, is needed
- Why it normally makes sense to introduce it in a complex enterprise ecosystem
- How to address it as an OutSystems application.
A follow-up version of this article will cover additional considerations for an MDM in an enterprise-grade environment, where extra technical challenges may arise, such as massive request throughput or real-time master data updates, and will explain why these topics may become mandatory for certain business purposes.
What is master data?
People talk about master data, but what really is master data? Is it simply a name for all the data that is relevant to a company, and especially to its business? No. Master data is certainly relevant, but the term means more than that.
Master data defines the critical objects for a business, and it is normally grouped into four different areas: entities, things, locations and concepts. Entities could apply, for instance, to a customer, a supplier, an employee or a company. When we talk about things, we mean, for example, a product, a car or a building. Locations define place nouns, like city, country or any other term that may reference a geographical location. By concepts we refer to abstractions that do not fit into the previous categories, like a contract, a license or a requirement.
Just to be clear, these types of data are not master data: unstructured data, transactional data, metadata and hierarchical data. Unstructured data is not formatted; examples are document files and emails. Transactional data is often thought to be master data, but because it describes events rather than objects, it is not. Metadata characterizes the data stored by the company but does not represent business objects, and neither does hierarchical data, which defines relationships between data.
What master data should I manage?
Master data is found throughout a company's information systems, which enable the business to run efficiently, but not all of it must be managed. Instead, the master data to manage should be selected according to the following criteria:
- Business value: Master data has to be of great value for a company’s business; otherwise it doesn’t make sense to make the effort to manage it in a very dedicated and centralized way.
An example of master data with business value could be the storage locations in a logistics business case because the place that holds inventory is crucial to running the business.
- Cardinality: The more elements a data type contains, the more likely it is to be master data. So, if a company sells only one product, even if that product is crucial for the business, it would probably not make sense to manage it as master data, because a single element is a very low cardinality.
- Complexity: The simpler the data, the less likely it is to be master data, because it probably does not justify the effort of managing it as such. In a telecommunications company, the different types of voice and data services available to customers are quite complex and a very good example of a concept that should be defined as master data.
- Lifespan: Master data typically lives for long periods of time, usually months or years, even though it can change over time. In an aircraft flight system, for example, information about individual flights would not be considered master data, because flights are short-lived and more likely to be transactional.
- Reusability: Data that is likely to be reused often should be defined as master data. This is one of the biggest reasons to establish certain information as master data, because if there’s no need to reuse, it can simply be defined in the specific context where it is required. Imagine the case of a shopping center: the stores available for rent are a basic concept for reuse in the different systems needed to manage this kind of business.
- Volatility: If data changes frequently, it is most likely transactional and not master data.
Conversely, when data is almost static, changing only rarely, it is probably more correct to define it as metadata than as master data. The world's time zones are not a good example of master data but of metadata, since they are very unlikely to change.
By contrast, a checking account is something that changes so often that it should also not be considered master data but, in this case, transactional data instead.
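The criteria above can be read as a rough screening checklist. The sketch below is a hypothetical Python illustration of that reasoning, not an OutSystems feature: the `DataCandidate` fields and every numeric threshold are assumptions chosen only to mirror the examples in this section, and a real data governance group would set its own.

```python
from dataclasses import dataclass


@dataclass
class DataCandidate:
    """Hypothetical profile of a data type being screened as master data."""
    name: str
    business_value: bool     # is it crucial to running the business?
    cardinality: int         # number of distinct elements of this type
    is_complex: bool         # rich enough to justify central management?
    lifespan_days: int       # typical lifetime of one record
    reused_by_systems: int   # how many systems need this data
    changes_per_year: float  # how often a single record changes


def classify(c: DataCandidate) -> str:
    # Volatility and lifespan rule out transactional data first.
    # All thresholds below are illustrative assumptions.
    if c.changes_per_year > 365 or c.lifespan_days < 30:
        return "transactional"
    # Almost static data is better treated as metadata.
    if c.changes_per_year < 0.1:
        return "metadata"
    # Business value, cardinality, complexity and reusability must all hold.
    if (c.business_value and c.cardinality > 1
            and c.is_complex and c.reused_by_systems > 1):
        return "master data"
    return "not worth managing centrally"


print(classify(DataCandidate("storage location", True, 250, True, 3650, 4, 1.0)))  # master data
print(classify(DataCandidate("flight", True, 100000, True, 1, 3, 5.0)))            # transactional
print(classify(DataCandidate("time zone", False, 400, False, 36500, 10, 0.01)))    # metadata
print(classify(DataCandidate("single product", True, 1, True, 3650, 5, 2.0)))      # not worth managing centrally
```

The order of the checks matters: volatility and lifespan are evaluated first so that valuable but short-lived data, such as flights, is never promoted to master data.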
Why do I need a master data management solution?
Imagine your company has just grown significantly and already has a vast application ecosystem spanning several departments and business units. Most critical business concepts touch these business areas and their applications. Under these conditions, it is crucial for the business to base its decisions on the same critical data, for the sake of customer satisfaction and better business results.
Alternatively, the customer concept may already be spread over several different systems holding all kinds of information, some of which conflicts with other data. In this scenario, a single view of the company's customers is essential to achieve the goals every corporation pursues, such as being profitable or growing in a sustainable way.
Thus, dispersion and unsynchronized duplication of critical data could jeopardize the success of any business or enterprise. Consistency, consolidation, uniqueness and reliability of master data are the key reasons to adopt a master data management (MDM) solution approach. MDM is a prime foundation for trusted data and more efficient business processes.
OutSystems-based MDM solution
After identifying the need for an MDM solution, companies face a make-or-buy decision, which requires analysis to determine which option fits their needs. Many aspects must be taken into account, such as:
- The costs of buying and configuring a new, and typically less flexible, MDM product
- The costs for creating and maintaining a new customized MDM system
- The complexity of the master data hub needed
- All non-functional requirements expected to be part of the solution
If the decision is to make rather than buy, OutSystems Platform is an option to consider, especially if you already use it and have several applications built on top of it. In that scenario, you can build your own MDM solution without acquiring third-party tools.
When designing an MDM-based solution with OutSystems Platform, there are architecture best practices to consider, covering both the development of OutSystems applications and the standard requirements of a master data hub. The following MDM design considers a hybrid model approach, which assumes that some master data has external ownership while, at the same time, some master data domains are entirely the responsibility of the MDM itself.
Therefore the OutSystems-based MDM solution consists of the following essential functionalities:
- Master data catalogue, which consists of all meta information related to each master data domain
- Master data repository, which includes the whole database model that supports the managed master data
- Exposed services, which allow master data to be accessed, created or edited from external applications
- Synchronization engine, which enables external ownership master data to be synchronized with the master data hub
- MDM BackOffice, which grants access to manage master data directly by business or technical roles
Identifying master data
As explained in What master data should I manage?, it’s important to decide which master data to manage. This can be difficult, and, for this reason, it is vital to involve all key business and technical stakeholders to achieve the best result possible. Remember, a broad knowledge of the application ecosystem must be gathered to find the optimal set of master data domains we should include in the final MDM solution.
In any organization, implementing an MDM is a transformational path that should be approached incrementally on the way to a significant objective. So, even if defining the subset of data to manage is arduous, the end result is worth it.
One way to accomplish this task is to incrementally gather the following information in a spreadsheet:
- Domains (see details below)
- Domain ownership and responsibility
- Domain hybrid ownership
- Domain criticality and priority
- Relationships between entities
- Applications dependent on each identified entity
- Additional entity management information to be managed in the MDM
- Sources of information in the case of domain external ownership
Apart from being essential for implementing the MDM, this information defines a transformational plan for introducing the MDM in the enterprise.
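To make the spreadsheet idea concrete, here is a minimal Python sketch of one inventory row per domain and a CSV export sorted by priority, so the same sheet reads as a rollout plan. The column names, the `DomainInventoryRow` type and the sample domains are hypothetical; they simply map one-to-one onto the list above.

```python
import csv
import io
from dataclasses import asdict, dataclass, field, fields


@dataclass
class DomainInventoryRow:
    """One row of a hypothetical master data inventory spreadsheet."""
    domain: str
    owner: str                       # business owner / responsible team
    ownership: str                   # "internal", "external" or "hybrid"
    priority: int                    # 1 = introduce first in the rollout
    related_domains: list = field(default_factory=list)
    dependent_apps: list = field(default_factory=list)
    mdm_extra_info: list = field(default_factory=list)
    external_sources: list = field(default_factory=list)


def to_csv(rows):
    """Render the inventory as CSV, sorted by priority (i.e. as a rollout plan)."""
    buf = io.StringIO()
    names = [f.name for f in fields(DomainInventoryRow)]
    writer = csv.DictWriter(buf, fieldnames=names, lineterminator="\n")
    writer.writeheader()
    for row in sorted(rows, key=lambda r: r.priority):
        # Flatten list-valued columns into a single spreadsheet cell.
        writer.writerow({k: ";".join(v) if isinstance(v, list) else v
                         for k, v in asdict(row).items()})
    return buf.getvalue()


rows = [
    DomainInventoryRow("Product", "Marketing", "internal", 2,
                       dependent_apps=["Webshop"]),
    DomainInventoryRow("Supplier", "Procurement", "hybrid", 1,
                       related_domains=["Location"],
                       dependent_apps=["Purchasing", "Invoicing"],
                       mdm_extra_info=["rating"],
                       external_sources=["SAP"]),
]
print(to_csv(rows))
```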
OutSystems uses an architecture best practice approach called the 4 Layer Canvas (4LC), which is detailed in the knowledge base topic Designing the architecture of your OutSystems applications.
The recommended 4LC architecture for the OutSystems-based MDM solution is the following:
The Back-Office is the only end-user module of the MDM; it implements the UI to manage master data entities.
The Core Business layer holds the main functionality: the catalogue and repositories, along with the exposed API services and the synchronization engine.
The source-based synchronization modules and the application theme belong to the Library layer.
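The 4LC guidance includes reference rules such as "no upward references" and "no side references between end-user modules". The Python sketch below checks those two rules against a hypothetical module-to-layer assignment for the MDM described above; the module names are made up for illustration, and the related rule against cycles among Core and Library modules is not covered here.

```python
# Lower rank = lower layer; a module may only reference its own layer or below.
LAYER_RANK = {"library": 0, "core": 1, "end-user": 2}

# Hypothetical module-to-layer assignment mirroring the MDM layout.
MDM_MODULES = {
    "MDM_BackOffice": "end-user",
    "MDM_API": "core",
    "MDM_Catalogue": "core",
    "MDM_SyncEngine": "core",
    "MDM_Repository_Supplier": "core",
    "MDM_Connector_SAP": "library",
    "MDM_Theme": "library",
}


def check_references(modules, references):
    """Return 4LC violations for a list of (consumer, producer) references."""
    problems = []
    for consumer, producer in references:
        src = LAYER_RANK[modules[consumer]]
        dst = LAYER_RANK[modules[producer]]
        if dst > src:
            problems.append(f"upward reference: {consumer} -> {producer}")
        elif src == dst == LAYER_RANK["end-user"]:
            problems.append(
                f"side reference between end-user modules: {consumer} -> {producer}")
    return problems


refs = [
    ("MDM_BackOffice", "MDM_Catalogue"),
    ("MDM_BackOffice", "MDM_Theme"),
    ("MDM_SyncEngine", "MDM_Catalogue"),
    ("MDM_Connector_SAP", "MDM_Repository_Supplier"),  # library -> core: not allowed
]
for problem in check_references(MDM_MODULES, refs):
    print(problem)
```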
Consumers of the MDM
Consumers are all the applications and external systems that, in some way, may need to interact with the master data hub, and because of this, they will depend on the MDM solution.
There are three types of MDM consumers, the Replicator, the Caller and the Referrer, as explained next and shown in the picture below:
Replicator
As the name indicates, this type of MDM-dependent application or system typically needs to replicate all or part of the available master data domains. It can be characterized as follows:
- Calls the MDM API periodically to get the master data and stores it locally (persistently or cached)
- Needs to implement a local synchronization mechanism
- SOA ready, because it enables loosely coupled integration based on services (in our case, the MDM API) and independent components, which facilitates a building-block architecture that can be managed without affecting the overall system
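A Replicator can be sketched as a local copy plus a scheduled refresh. The Python below is a minimal illustration, assuming a `fetch_all` callable that stands in for an MDM API call returning the records of one domain; in an OutSystems consumer the periodic trigger would typically be a Timer, which here is reduced to a `tick()` method that a scheduler is expected to call. All names are hypothetical.

```python
import time


class Replicator:
    """Hypothetical Replicator consumer keeping a local copy of one domain."""

    def __init__(self, fetch_all, interval_seconds, clock=time.monotonic):
        self._fetch_all = fetch_all      # stands in for an MDM API call (assumption)
        self._interval = interval_seconds
        self._clock = clock
        self._records = {}
        self._last_sync = None

    def tick(self):
        """Called on a schedule; pulls from the MDM only when the interval has elapsed."""
        now = self._clock()
        if self._last_sync is not None and now - self._last_sync < self._interval:
            return False
        # Local synchronization: replace the local copy with the MDM's current view.
        self._records = {r["id"]: r for r in self._fetch_all()}
        self._last_sync = now
        return True

    def get(self, record_id):
        # Reads are served locally and never hit the MDM API.
        return self._records.get(record_id)


replicator = Replicator(lambda: [{"id": "S1", "name": "Acme"}], interval_seconds=300)
replicator.tick()              # the first call always synchronizes
print(replicator.get("S1"))    # {'id': 'S1', 'name': 'Acme'}
```

Injecting the clock keeps the refresh logic testable; the trade-off of this consumer type is that reads may be up to one interval stale.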
Caller
This consumer simply sends service requests to the MDM API whenever it needs master data. Its attributes are:
- Calls the MDM API to get master data every time it needs it
- Depends more on enterprise network latency, since master data is fetched remotely
- Should be avoided when master data is needed constantly
- SOA ready, as explained before
Referrer
The last type is the one we should definitely avoid if we want a truly decoupled MDM solution, as a best practice scenario requires. Its main facets are:
- An OutSystems consumer that directly references entities from the MDM repositories
- The best option to minimize the impact of migrating existing OutSystems applications to the new MDM
- Can query master data entities directly
- Should be avoided because it does not allow a decoupled, self-contained MDM
- Compromises the MDM's SOA approach
Catalogue
The Catalogue of the master data hub contains all relevant metadata that characterizes the data managed by the MDM.
All master data domain definitions should be placed in the catalogue: entity relationships, ownership, how data is fetched where relevant, what additional information may be added to each domain, and so on.
This module is where MDM metadata is defined and where we configure the desired behavior of each domain.
The following are the areas the MDM Catalogue should address:
Domains correspond to conceptual groups of master data entities, i.e., the logical way to separate different sets of master data entities, which should be decoupled from one another to enable physical data separation and maintenance, along with a consistent MDM technical architecture.
It also defines the set of master data domains that are controlled by the hub.
It defines who can do what on each domain based on OutSystems roles.
For each domain there should be one and only one repository, where all the entities will be stored.
Entities and domain hierarchy should also be defined in the catalogue to ease master data synchronization and visualization.
Each domain must have an ownership definition showing exactly which scenario applies in each case: internal (local MDM ownership), external (external system ownership) or hybrid (both MDM and external system ownership).
A hybrid scenario means that some physical entities of a specific domain may be extended with information controlled by the master data hub itself.
A domain with external ownership may have more than one external system to query for data, which is orchestrated at the integration level.
- Polling synchronization
An external ownership domain must define the polling synchronization details (frequency, connection details, delay between requests, …) needed to perform this operation.
- Extended Information
In case of hybrid ownership, it should be defined which additional information is to be managed by the MDM itself.
For caching purposes, as we'll see in section Caching, a version should be defined for each master data domain to enable cache invalidation.
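The catalogue areas above can be summarized as a small data model. The Python sketch below is a hypothetical illustration, not the OutSystems entity design: the class and field names are assumptions, and so is the rule that hybrid domains, like external ones, need polling details (since part of their data still comes from outside). The `validate` function checks the constraints this section states: one and only one repository per domain, polling details for externally sourced data, and extended information for hybrid domains.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Ownership(Enum):
    INTERNAL = "internal"   # the MDM owns the data
    EXTERNAL = "external"   # an external system owns the data
    HYBRID = "hybrid"       # external data extended with MDM-owned fields


@dataclass
class PollingConfig:
    frequency_minutes: int
    connection: str                     # hypothetical connection identifier
    delay_between_requests_ms: int = 0
    batch_size: int = 500


@dataclass
class DomainDefinition:
    name: str
    repository_module: str              # one and only one repository per domain
    ownership: Ownership
    parent: Optional[str] = None        # domain hierarchy
    integration_module: Optional[str] = None
    polling: Optional[PollingConfig] = None
    extended_fields: list = field(default_factory=list)
    role_permissions: dict = field(default_factory=dict)  # role -> set of operations
    version: int = 1                    # incremented on every change (caching)


def validate(catalogue):
    """Return human-readable errors for a list of DomainDefinition entries."""
    errors = []
    repo_owner = {}
    for d in catalogue:
        owner = repo_owner.setdefault(d.repository_module, d.name)
        if owner != d.name:
            errors.append(f"{d.name}: repository {d.repository_module} is already used by {owner}")
        sourced_outside = d.ownership in (Ownership.EXTERNAL, Ownership.HYBRID)
        if sourced_outside and (d.polling is None or d.integration_module is None):
            errors.append(f"{d.name}: external data needs polling details and an integration module")
        if d.ownership is Ownership.HYBRID and not d.extended_fields:
            errors.append(f"{d.name}: hybrid ownership needs MDM-managed extended fields")
    return errors
```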
The MDM Repositories are the modules where master data entities are stored and grouped as master data domains. There should be one and only one Repository, configured at the Catalogue, for each master data domain defined there.
These repository modules must also provide all the interoperability and business logic that feeds the MDM exposed services and, in case of hybrid or external ownership, the synchronization engine. The reason is that they must implement all the local specifics of each logical domain in a loosely coupled way, preventing more generic modules from becoming tied to the particularities of a specific domain.
How master data entities are grouped into the same MDM repository depends on how related these entities are, how volatile the master data is and how decoupled we need them to be.
Different repositories cannot reference each other, since they may be stored in different database catalogues.
Master data entities should only be exposed as read-only, and change operations should be made available as public user actions, as OutSystems architecture best practices recommend.
Finally, master data entities should comprise a versioning feature and, in some cases, auditing may also be necessary.
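As a language-neutral illustration of these repository rules, the Python sketch below exposes a hypothetical supplier entity through a read-only view, allows changes only through public "actions", and keeps a per-record version plus an audit trail. It is an analogy of the OutSystems pattern, not its implementation; note that `MappingProxyType` only protects the top-level mapping, not the nested record dictionaries.

```python
from datetime import datetime, timezone
from types import MappingProxyType


class SupplierRepository:
    """Hypothetical repository: read-only exposure, changes through actions only."""

    def __init__(self):
        self._rows = {}
        self.audit_log = []   # (timestamp, user, operation, record id, version)

    @property
    def suppliers(self):
        # Read-only view of the entity (only the top-level mapping is protected).
        return MappingProxyType(self._rows)

    def save_supplier(self, supplier_id, data, user):
        """Public 'user action': creates or updates a record and bumps its version."""
        previous = self._rows.get(supplier_id)
        version = 1 if previous is None else previous["version"] + 1
        self._rows[supplier_id] = {**data, "id": supplier_id, "version": version}
        self.audit_log.append(
            (datetime.now(timezone.utc), user, "save", supplier_id, version))
        return version

    def delete_supplier(self, supplier_id, user):
        """Public 'user action': removes a record and audits the last known version."""
        removed = self._rows.pop(supplier_id)
        self.audit_log.append(
            (datetime.now(timezone.utc), user, "delete", supplier_id, removed["version"]))
```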
Exposing services as REST API
The MDM should deliver services that let external systems interact with it in a SOA-based architecture. These services should be exposed as a REST API to enable synchronous access to the catalogue and repository master data. Because there is no fixed contract between the client and the service provider, changes to the API can be restricted to the master data domain they apply to.
MDM API should be the standard way to retrieve master data from the MDM, and it is also how Replicator and Caller consumers do it, as we showed before in section Consumers of the MDM.
In the following example, you can see API methods to retrieve all or a specific record of the catalogue or repository.
In case there is a need to change master data from outside the hub, it may also be possible to include create, update and delete methods at the API level.
The way the MDM API exposes its synchronous REST services should be completely agnostic of the master data entities being handled; the conversion logic should be delegated to the repository modules.
The flow of control should be as shown in the picture and explained below:
- Consumer requests all supplier master data via REST API
- MDM API asks MDM Catalogue which repository is responsible for these entities
- MDM Catalogue returns the responsible repository to the MDM API
- MDM API requests supplier entities from the correct repository (MDM Repository Supplier)
- MDM Repository Supplier returns all supplier entities to the MDM API as JSON documents (the repository is responsible for creating JSON documents that the consumer knows how to read)
- MDM API sends the response (JSON documents with all entities) to the consumer
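The flow above can be sketched in a few lines of Python. In this minimal illustration the handler never knows what a supplier looks like: it asks a catalogue for the responsible repository and returns whatever JSON documents that repository builds. The class names, the `handle_get` function and the implied routes (something like `GET /{entity}` and `GET /{entity}/{id}`) are assumptions, not the actual OutSystems REST API definition.

```python
import json
from typing import Optional


class SupplierRepository:
    """Hypothetical repository: owns the conversion of its rows to JSON documents."""

    def __init__(self, rows):
        self._rows = rows   # e.g. {"S1": ("Acme", "PT")}

    def _to_document(self, record_id, row):
        name, country = row
        return {"id": record_id, "name": name, "country": country}

    def list_documents(self):
        return [self._to_document(k, v) for k, v in self._rows.items()]

    def get_document(self, record_id) -> Optional[dict]:
        row = self._rows.get(record_id)
        return None if row is None else self._to_document(record_id, row)


class Catalogue:
    """Maps each entity name to its responsible repository."""

    def __init__(self, routes):
        self._routes = routes

    def repository_for(self, entity):
        return self._routes.get(entity)


def handle_get(catalogue, entity, record_id=None):
    """Entity-agnostic handler returning (HTTP status, JSON body)."""
    # Ask the catalogue which repository is responsible for this entity.
    repository = catalogue.repository_for(entity)
    if repository is None:
        return 404, json.dumps({"error": f"unknown entity {entity}"})
    # The repository builds the JSON documents; the API only forwards them.
    if record_id is None:
        return 200, json.dumps(repository.list_documents())
    document = repository.get_document(record_id)
    if document is None:
        return 404, json.dumps({"error": f"{entity} {record_id} not found"})
    return 200, json.dumps(document)
```

Adding a new domain then means registering a new repository in the catalogue, with no change to `handle_get`.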
External data synchronization
One of the core features of the MDM is the ability to synchronize master data that is sourced outside the hub.
Two major modules are dedicated to the synchronization of the data contained in the external system against the MDM: the Sync Engine and the Repository Integration Service.
The Sync Engine is where the polling mechanism, based on OutSystems BPT technology, is defined. It should act as a mere synchronization orchestrator, with no knowledge of how to update the master data itself. Working in an agnostic way, it knows when to start and stop synchronizing, how much data to include in each iteration, from whom to request the data (the specific MDM Repository Integration Service) and to whom to send the requested data so the MDM is updated (the corresponding MDM Repository).
The Repository Integration Service is the integration orchestrator, knowing from which connectors the data should be requested (could be more than one) and how to assemble this data in one single business vision.
In the picture below, the data synchronization flow is outlined:
- MDM Sync Engine starts by requesting all necessary sync metadata from MDM Catalogue
- For each logical entity that needs to be synchronized, it requests from the MDM Repository IS the entities that need to be updated
- MDM Repository IS orchestrates and requests all necessary information from each specific connector (SAP, GO, ...)
- Connectors request data from each external system source
- After getting all refreshed data from the sources, MDM Sync Engine sends it to the corresponding MDM Repository (the repository is responsible for synchronizing the master data, taking into account the metadata defined in the catalogue)
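The division of responsibilities in this flow can be sketched in Python. This is a simplified, hypothetical illustration: the real engine is described as BPT-based, whereas here it is a plain loop, and connectors are modeled as callables that take a list of ids and return partial records keyed by id. The engine only reads batch sizes and module references from a catalogue dictionary; the Integration Service merges connector data into one business view, and the repository stores it.

```python
class SupplierIntegrationService:
    """Hypothetical Repository IS: merges connector data into one business view."""

    def __init__(self, connectors, changed_ids):
        self._connectors = connectors     # callables: list of ids -> {id: partial record}
        self._changed_ids = changed_ids   # callable returning ids that need updating

    def pending_ids(self):
        return list(self._changed_ids())

    def fetch(self, ids):
        merged = {record_id: {"id": record_id} for record_id in ids}
        for connector in self._connectors:
            for record_id, partial in connector(ids).items():
                merged.setdefault(record_id, {"id": record_id}).update(partial)
        return list(merged.values())


class SupplierRepository:
    """Hypothetical repository: knows how to store synchronized records."""

    def __init__(self):
        self.rows = {}

    def apply(self, records):
        for record in records:
            self.rows[record["id"]] = record


def run_sync(catalogue):
    """Entity-agnostic orchestration: decides how much to sync per iteration,
    but delegates fetching to the IS and storing to the repository."""
    batches = 0
    for meta in catalogue.values():
        ids = meta["integration"].pending_ids()
        size = meta["batch_size"]
        for start in range(0, len(ids), size):
            records = meta["integration"].fetch(ids[start:start + size])
            meta["repository"].apply(records)
            batches += 1
    return batches
```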
There should be a user interface module, the Back-Office, where catalogue configuration and repository maintenance are performed.
For catalogue configuration, it should be possible to define the Repository and Integration Service modules associated with each master data domain, along with the other specific information explained before in Catalogue.
Regarding repository maintenance, which is in fact the same as master data entity maintenance, the Back-Office module should enable the creation, update and deletion of the master data owned by the MDM, or of the extended information available for entities with external responsibility, as well as view capabilities over all master data.
Who can do what on each domain and entity should be controlled with OutSystems roles.
Caching
The goal of setting up a cache is to speed up access to the master data hub, thereby reducing the impact on external consumers.
Since caching may also have a negative impact on the system, such as high memory consumption on the server, it should be applied with care, taking the following aspects into consideration:
- Only use it in the read user actions of MD Repository modules
- Only apply it to MD Repository modules that can really benefit from caching, i.e., whose data does not change constantly and may be reused several times during the caching timeframe
OutSystems Platform allows you to define a cache for each action; by doing so, you keep the action's output in memory for calls with the same parameters:
- For each set of input parameters, an output is cached
- The logic runs once and, if no errors occur, its output is cached
To invalidate the cache whenever master data changes at the domain level, the master data domain version is of great help. To apply this cache invalidation pattern:
- Include a version field at the master data domain level in the Catalogue
- Increment the version of the master data domain every time a master data entity belonging to this domain is changed in the repository
- Add a version input parameter to the Repository read user actions used by the MDM API
- Always call the Repository read user actions using the latest domain version contained in the Catalogue
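The version-based invalidation pattern can be demonstrated outside OutSystems with Python's `functools.lru_cache`, which, like the per-action cache described above, caches one output per set of input parameters and does not cache calls that raise. In this hypothetical sketch the `version` parameter is never used inside the read logic; it exists only so that bumping the domain version in the catalogue changes the cache key and forces a fresh read.

```python
from functools import lru_cache


class Catalogue:
    """Hypothetical catalogue holding one version number per domain."""

    def __init__(self):
        self.versions = {"Supplier": 1}

    def bump(self, domain):
        self.versions[domain] += 1


class SupplierRepository:
    def __init__(self, catalogue):
        self._catalogue = catalogue
        self._rows = {"S1": "Acme"}
        self.db_reads = 0   # counts real (uncached) reads, for illustration
        # One cache per instance, keyed on the input parameters.
        self.get_all = lru_cache(maxsize=128)(self._get_all)

    def _get_all(self, version):
        # `version` only changes the cache key; it is not used in the body.
        self.db_reads += 1
        return tuple(sorted(self._rows.items()))

    def save(self, supplier_id, name):
        self._rows[supplier_id] = name
        self._catalogue.bump("Supplier")   # invalidates previously cached reads


def api_get_suppliers(catalogue, repository):
    # Always read with the latest domain version from the catalogue.
    return repository.get_all(catalogue.versions["Supplier"])
```

Entries cached under older versions are never read again; here they are eventually evicted by the `maxsize` limit, which plays a role similar to a cache expiration time.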
An MDM system is normally used by many different external systems and applications, and some master data domains should be managed completely separately. With this in mind, this section explains how to achieve this using OutSystems Zones and Multiple Database Catalogues.
Infrastructure improvement with Zones
Two different zones may be defined to enable future horizontal scaling of the MDM solution:
- MDM API Zone
- MDM Sync Zone
The MDM API Zone should be created to have dedicated front-ends serving the exposed MDM services, giving the solution the following benefits:
- Enable future scaling by adding front-ends (FEs) to the MDM API zone
- Separate MDM API requests from the MDM Back-Office and MDM Sync load
The MDM Sync Zone should be created to have dedicated front-ends synchronizing the data contained in MDM Repositories, which also brings the following improvements:
- Enable future scaling by adding FEs to the MDM Sync zone
- Separate MDM Sync load from MDM Back-Office and MDM API requests
Infrastructure improvement with Multiple Database Catalogues
Adopting a Multiple Database Catalogue approach for the MDM repositories enables a set of advanced database management options that benefit the whole solution:
- Improve database backup policies per application
- Improve database maintenance plans per application
- Optimize I/O performance by splitting the application's data across storage systems
- Smaller database sizes, since data is distributed across several databases
- The possibility of restoring non-BPM databases without affecting other applications
- The ability to have different database transaction logging policies
Extending master data
The MDM is an ongoing and extendable solution, which means it may be improved with new domains, entities or even extra MDM-managed information. Of course, to grow the solution, the necessary changes must be implemented without redesigning or re-architecting the whole system.
Here is a short list of activities focused on what should be performed to enhance the MDM master data:
- Analyse the new master data and identify its module (existing or new one) and entity (existing or new one)
- Configure the catalogue in case the master data should be defined in a new domain
- Create the new MDM Repository (for a new domain) or extend the existing one (when extending an existing domain):
  - Add logic to serve the MDM API
  - Add logic to enable synchronization, in case of external ownership of the new domain
- Create the new MDM Repository IS or extend the existing one, in case of external ownership:
  - Add logic to get data from each external connector
  - Create a new external connector in case data comes from a new source, or extend an existing external connector in case the type of data is not yet being held
How to succeed when putting MDM in place?
As a centralized, core piece of an enterprise ecosystem, introducing an MDM solution is a hard transformation with many different kinds of impact that should not be underestimated.
Keep the following topics in mind when putting an MDM solution into practice in your organization:
- Find an executive sponsorship for the MDM implementation
- Correctly identify the impacted business use cases
- Set up the MDM as a transformation path and not as a “big bang”
- Plan the impact on your organization and take technology, organization and processes into account
- Define a data governance group to clearly define what master data to manage and how to do so
- Define metrics to measure MDM transformation success
The following references served as inspiration for this article:
- The What, Why, and How of Master Data Management
- Master Data Management (MDM) Hub Architecture
- The 8 Worst Practices in Master Data Management and How to Avoid Them