Objective :
Let's think of a large scale enterprise where we have a heterogeneous data stores such as SQL servers, No-SQL stores, Big data stores like ADL, ADF..etc spreads across different business groups.
Our objective is to build a lineage service at an enterprise level, which will provide insights on the data lineage between different data store instances within an enterprise. This will help us in gaining insights on the data proliferation within the enterprise.
For this, as a phase 1 we want to take SQL server and want to build the lineage within and across SQL servers and It's databases.
Is there a way we can build / extract (if available) the data lineage (table & Column level) from a SQL server by leveraging SQL profiling or transaction logs within or across instances.
I looked at Determining Impact and Data Lineage and looks like the SQL server should have the Dependency Services enabled.
As a enterprise level service, I may have access to the SQL Servers and may not have control over the sql server instances on enabling any services.
Could anyone share any insights or experiences with respect to building or extracting Data Lineage within or across SQL Servers.
Thanks in Advance !
Mydeen
2条答案
按热度按时间68bkxrlz1#
You may export database structure into a SQL file which includes CREATE TABLE/CREATE VIEW/ CREATE PROCEDURE and other SQL statements. Then using a SQL parser to analyze those scripts to get metadata from those SQL scripts and build data lineage of the table/columns.
Take this SQL for example:
You will get the data flow like: deptsal.salary depends on emp.sal, emp.com
yv5phkfx2#
Hello Peer Mohamed Mydeen, Have you ever been considering to use a third-party tool to do that? As far as I know, there are plenty of professional data lineage tools if you just Google "sql data lineage tool". In case of you choose to use these existing data lineage tool, what you need to do is just provide some access to the lineage tool and let it finish the rest of your work. It would be easy and time-saving.
You can check this page for some paid-tool: https://www.dpriver.com/blog/2022/05/11/best-data-lineage-tools/
You can also check this blog for open-source tool: https://www.dpriver.com/blog/2022/07/29/best-open-source-data-lineage-tools/