So I am trying to optimize the following SQL query:
SELECT E.EMPNO, POSITION
FROM EMPLOYEE E, JOBHISTORY J
WHERE E.EMPNO = J.EMPNO
AND STARTDATE <= ENDDATE
AND SALARY <= 3000;
Here I want to remove the STARTDATE <= ENDDATE condition, so that the user only has to type the following to get the result:
SELECT E.EMPNO, POSITION
FROM EMPLOYEE E, JOBHISTORY J
WHERE E.EMPNO = J.EMPNO
AND SALARY <= 3000;
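I initially started writing my own parser in Python but got really stuck, so I looked around and found the sqlparse library. It has a get_name() method as part of its token API, but when I use it in my code I keep getting AttributeError: 'Token' object has no attribute 'get_name', and I'm not sure what I'm doing wrong. Here is my Python code: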
import sqlparse
import networkx as nx
import matplotlib.pyplot as plt

# example query
query = "SELECT E.EMPNO, POSITION FROM EMPLOYEE E, JOBHISTORY J WHERE E.EMPNO = J.EMPNO AND STARTDATE <= ENDDATE AND SALARY <= 3000"

# parse the query
parsed_query = sqlparse.parse(query)[0]
select_stmt = parsed_query.tokens[0]  # the SELECT keyword token

# extract the tables and conditions from the query
tables = []
conditions = []
for token in parsed_query.tokens:
    if isinstance(token, sqlparse.sql.IdentifierList):
        for identifier in token.get_identifiers():
            tables.append(identifier.get_name())
    elif isinstance(token, sqlparse.sql.Where):
        for condition in token.tokens:
            if isinstance(condition, sqlparse.sql.Comparison):
                conditions.append(condition)

# remove unnecessary conditions
# (normalized keeps the original case for a Comparison, so compare in upper case)
new_conditions = []
for condition in conditions:
    if "STARTDATE" not in condition.normalized.upper():
        new_conditions.append(condition)
conditions = new_conditions

# generate query tree
G = nx.Graph()
for table in tables:
    G.add_node(table)
for condition in conditions:
    table1 = condition.left.get_name()
    table2 = condition.right.get_name()
    G.add_edge(table1, table2)

# visualize query tree
nx.draw(G, with_labels=True)
plt.show()

# generate optimized query
new_query = str(select_stmt)
new_query += " FROM " + ", ".join(tables)
new_query += " WHERE " + " AND ".join([str(condition) for condition in conditions])
print(new_query)
Answer:
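A plain Token has no get_name attribute (get_name() is defined on sqlparse.sql.TokenList, which Identifier inherits from), so you need to check the type before calling it. For example, POSITION is a SQL keyword, so it is parsed as a plain sqlparse.sql.Token rather than a sqlparse.sql.Identifier, and get_identifiers() yields that bare Token.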
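The same applies to condition.left and condition.right, for a similar reason: in SALARY <= 3000 the right-hand operand 3000 is a plain Token (an integer literal), not an Identifier, so calling get_name() on it fails as well.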