What is the difference between ETL and MDM?

Qual a diferença entre ETL e MDM?
ETL

ETL is a well-known concept. ETL stands for Extract – Transform – Load, or, in “almost good Portuguese”: Extract, Transform, and Load. In recent years, some vendors, notably IBM, have introduced “several variations,” such as TEL and T-ETL, for example. But at its core, “ETL,” despite the meaning of its combined letters, is a data migration or transport process.

The main goal of an ETL tool is to transport data from one or more source origins (A) to a destination (B). Normally, (A) is a relational database and/or text files (xml, csv, “plain text”). And the destination (B) is usually a Data Mart or Data Warehouse (relational database, NoSQL, Big Data, etc.). Obviously, there are other more specific (and even cooler) uses for ETL.

So, in an ETL process, in summary:

– Phase 1 – Extraction (Extract):

Data is extracted from one or more sources/origins (A)

– Phase 2 – Transformation (Transform):

  • Applying “mapping” rules: SP to São Paulo, RJ to Rio de Janeiro, or 1 to Male, 2 to Female, and 3 to … 🙂
  • Calculations: Total Sales Price = Quantity x Unit Price in USD x Dollar Exchange Rate on the sale date
  • Grouping: Total by Category, Demographic density of a macro region
  • Transpose/Rotation: transforming rows into columns and vice versa
  • Selection/Exclusion of data, for example, only customers from the state of SP will be delivered to the destination
  • Creation and/or replacement of keys (ids – surrogate keys)
ETL

Some standardization may occur: Refrigerator = Fridge, SPaulo, São Paulo, S.Paulo, San Paolo all become São Paulo, for example.

– Phase 3 – Load

Load the data into the destination (Data Warehouse (DW), Data Mart, Big Data (NoSQL), etc.)

Note that all the magic happens in phase 2, the “T” of “ETL,” however, all phases are equally important because a failure in any of them can negatively affect the entire process outcome.

So, I have no hesitation in defining ETL as a major data movement process, which can indeed happen fully, partially, incrementally, or differentially. And, since there is a source (A), it is possible at any time (given the necessary scope and effort) to recreate the destination (B). Therefore, (A) is fundamentally more important than (B).

Usually, ETL tools are used mostly operated through creating data flows. Among the best-known tools: InfoSphere DataStage (IBM), ODI – Oracle Data Integrator (Oracle), OWB – Oracle Warehouse Builder (Oracle), PowerCenter (Informatica), CloverETL (Clover), and free ones such as Open Studio for Data Integration (Talend), CloverETL Community (Clover), Kettle (Pentaho).

An MDM (Master Data Management) process contains ETL within it, although many specialists disagree. So, within an MDM process, data is also collected… extracted from one or more sources (A), many transformations occur, data is transported… and finally, delivered to one or more destinations (B).

ETL

The big difference now lies in the T… Ah, this little “t” works a lot inside an MDM process.

In the T, besides everything done in a traditional ETL, the following is also applied (in MDM’s case):

– Geolocation: address correction and adjustment (for example: Rua Paulista becomes Avenida Paulista), ZIP codes, city names, states, streets, neighborhoods are corrected. Address numbering is checked against the street’s number range, and geographic coordinates are assigned: Latitude and Longitude

– Standardization: product description, category, classification, person names, measurements, etc.

– Cleansing:

– Some business rules are applied

– Data grouping

– Data deduplication: elimination of identical (or similar as needed) data.

What is MDM?

Master Data Management or Master Data Governance. “Master data” refers to all non-transactional (movements) data that are important for the company. Example: People (customers, fornecedores, parceiros, revendedores, distribuidores, colaboradores, etc), Regras de Negócios, Items (produtos e suas definições).

In Brazil, we have been led to believe that non-important registries, but rather the transactions (non-master data): sales, purchases, inventory, accounts payable and receivable, etc.

ETL

Is the customer registry not important? Let’s see:

– Imagine launching a campaign for articles for pregnant women targeting 15-year-old girls? (This happened in the United States with the giant Target.com. It resulted in several lawsuits and thousands of dollars wasted.)

– Imagine one of the largest private banks in the country opening two branches in a region it believes has many potential customers? It turns out their geolocation data was flawed. This happened with a large national bank, which was forced to close both branches and implement one of the largest MDM projects in Latin America.

– What if a well-known retail chain decides to run a major promotion, investing thousands of reais in TV campaigns, aiming for a sales increase of over 25% in a certain period, with a 7% profit increase for the same period? After the campaign, it was analyzed that sales actually increased by a record 37% in that period. But soon came the harsh reality: backlog of orders (deliveries stuck), and a loss of nearly 15%. Reason: business rules, freight, and poorly defined items – flawed data.

I’m not saying transactional data isn’t important. Indeed, it is. A lot. But my generation, which started in IT (formerly Data Processing) in the 1980s, was taught, indoctrinated, motivated, and misled to believe that master data (aka registries) are not that important, as they are “recoverable.” “Oh! If we lose the customer registry, just get Joãozinho and Zezinho to type it back in.” No, it’s not that simple.

Today, increasingly complex and intelligent systems, digital robots, behavior analyzers, sensors capture “transactional” data, but are guided and make decisions based on “information” extracted from master data (again: registries).

Therefore, “flaw data,” flawed data, can lead intelligent and complex systems to make or lead to equally flawed decisions.

ETL

We then have that the MDM – Master Data Management process came precisely to manage these “registries,” improving the quality of the data contained therein, so that they can produce high-quality information.

MDM and Big Data may or may not go hand in hand. It is incorrect to say that MDM only coexists in a Big Data environment. They are totally different concepts that may or may not complement each other. However, MDM is much more connected with the RDBMS (Relational Data Base Management System) world – that is, the good old relational database (Oracle, MySQL, DB2, Sybase, SQL Server, etc).

MDM is not a process that ends. It is a continuous, infinite process. As long as there is entry and/or alteration of master data, the MDM process will exist. Including, feeding back data that has already passed through the process.

Will MDM replace ETL? I don’t think so. ETL is more about transporting any kind of data; MDM is about improving data quality.

ETL can transform data, but the simple ETL process does not concern itself with the quality of data delivered to the destination. MDM is a process that uses ETL tools to improve data quality through enrichment databases.

ETL is a tool in the MDM process, a small part of a complex process involving tools, people, and processes.

I led the first MDM project in 2008. And there have been almost six completed projects since then. It’s not possible to segment, as each client belonged to a different segment. However, if we look at the graph, we will see the largest global consumers of MDM:

The demand for MDM is still incipient in Brazil. There are no trained professionals, and most solo attempts result in irreparable losses, both financial and reputational, for companies. These ill-fated attempts to implement MDM with a “just-go-for-it” approach have contributed to tarnishing the image of the involved teams and damaging the concept of MDM, creating true “Frankensteins” that verge on uselessness.

MDM is a complex process. Often expensive. It can be applied in any segment, in any size company or government. However, only medium-sized or larger companies will be able to implement a successful MDM and achieve returns from it.

Schedule a meeting here

Visit our Blog

Learn more about databases

Learn about monitoring with advanced tools

ETL

Have questions about our services? Visit our FAQ

Want to see how we’ve helped other companies? Check out what our clients say in these testimonials!

Discover the History of HTI Tecnologia

Compartilhar:

Quem somos

Somos especialistas em otimizar e proteger bancos de dados. Evoluímos para oferecer as soluções mais inovadoras em consultoria, monitoramento e sustentação de sistemas de dados. Nosso compromisso é garantir que sua infraestrutura tecnológica proporcione vantagens estratégicas.

Acesse a página

Compromisso com a Inovação

Foco na Segurança

Parcerias Estratégicas

Soluções Personalizadas

Expertise Comprovada

Nossa História

Mergulhe na nossa jornada, conheça os marcos que definiram nosso caminho e descubra como nos tornamos líderes em tecnologia da informação.

Acesse a página

Blog

Nossos especialistas compartilham estratégias e práticas recomendadas para otimizar a gestão dos seus dados, garantindo segurança, eficiência e inovação.

Acesse a página

FAQ

Tem dúvidas sobre nossos serviços? Confira nossa seção de Perguntas Frequentes para obter respostas detalhadas sobre sustentação, monitoramento e consultoria de banco de dados.

Acesse a página

Sustentação

Monitoramento

Consultoria

Testemunhos

Nossos clientes destacam nossa dedicação, expertise e a qualidade das soluções oferecidas, reforçando nosso compromisso com a excelência e a inovação.

Acesse a página