如何使用python分析sql查询?

iezvtpos  于 2021-08-01  发布在  Java
关注(0)|答案(1)|浏览(402)

我想分析sql查询以知道哪些硬编码值属于哪个列?例如,我有一个query:-

SELECT *
FROM (
      SELECT DISTINCT id 
              ,substring([data], 0, 497) AS [Instructions]
              ,'500' AS [Care_Code]
              ,cast(id AS VARCHAR) + cast(number AS VARCHAR) + 'pp' AS key
      FROM people
      WHERE ([data] LIKE '%communicated %')

      UNION ALL

      SELECT DISTINCT Patientid
              ,substring(pp, 0, 497) AS [Instructions]
              ,'500' AS [Care_Code]
              ,cast(id AS VARCHAR) + cast(number AS VARCHAR) + 'aa' AS key
      FROM people
      WHERE  Instructions LIKE '%[A-Z]%'

我希望输出是like:-

Harcoded_value    Column_Name
500               Care_Code
%communicated %   data
%[A-Z]%           Instructions

例2:-
查询:-

select distinct eid, count(distinct d.pid)   
 from SOAP s      
inner join demographics d on s.pid=d.pid     
inner join PS p on p.providerId=s.pid     
where      
p.npi in ('1316987761','1437366473','1912915638','1740253822')    
and Convert(datetime,Convert(varchar,EncounterDate,101)) >='08/01/2016'    
and  Convert(datetime,Convert(varchar,EncounterDate,101)) <= '07/31/2017'   
group by eid

预期output:-

Harcoded_value                                              Column
  ('1316987761','1437366473','1912915638','1740253822')       p.npi
ttcibm8c

ttcibm8c1#

你可以试试

import re
import pandas as pd

s = """SELECT *
FROM (
      SELECT DISTINCT id 
              ,substring([data], 0, 497) AS [Instructions]
              ,'500' AS [Care_Code]
              ,cast(id AS VARCHAR) + cast(number AS VARCHAR) + 'pp' AS key
      FROM people
      WHERE ([data] LIKE '%communicated %')

      UNION ALL

      SELECT DISTINCT Patientid
              ,substring(pp, 0, 497) AS [Instructions]
              ,'500' AS [Care_Code]
              ,cast(id AS VARCHAR) + cast(number AS VARCHAR) + 'aa' AS key
      FROM people
      WHERE  Instructions LIKE '%[A-Z]%'
      and p.npi in ('1316987761','1437366473','1912915638','1740253822')
           """

results = {}
for value in re.findall(r"(([A-Za-z.]+ in )*(((\[.*\]|\w*) LIKE )*\(*'%*.+%*'\)*( AS (\w|\[.*\])*)*))", s):
    splited_values = value[0].split(" ")
    val = "".join(splited_values[2:])
    if "AS" in value[0] and splited_values[2] != "key":
        results[re.sub("\'|\"", "", splited_values[0])] = re.sub(r"\W", "", val)
    elif "LIKE" in value[0] or "in" in value[0]:
        val = val[:-1] if val[-1] == ")" and val[0] != "(" else val
        results[re.sub("\'|\"", "", val)] = re.sub(r"\[|\]", "", splited_values[0])

df = pd.DataFrame(results.items(), columns=["Harcoded_value", "Column_Name"])
print(df)

输出

Harcoded_value       Column_Name
                                            500         Care_Code
                                  communicated%              data
                                        %[A-Z]%       nstructions
   (1316987761,1437366473,1912915638,1740253822)            p.npi

此代码将从查询中提取包含“”的所有值,以及like之前或as之后的单词,而不是名称 key 并将其存储在结果字典中。
在收集了所有的值之后,它将创建一个包含列的Dataframe "Harcoded_value" 以及 "Column_Name"

相关问题