pandas read_sql_query带有与多个列匹配的参数

r1zk6ea1 于 2022-11-27 发布在其他

关注(0)|答案(2)|浏览(124)

我正在尝试使用pandas.read_sql_query查询一个表，其中我想将多个列与作为param参数传入的python列表进行匹配。在尝试完成此操作时遇到各种psycopg2错误。
理想情况下，我会提供一个可重复的示例，但不幸的是，由于SQL连接要求，这在这里是不可能的。如果有某种方法可以提供一个可重复的示例，请告诉我，我将编辑下面的代码。假设col1的条目是字符串，col2的条目是数字值。
请注意，我试图确保col1和col2的每一行都与list1和list2的相应组合相匹配，因此不可能为每一行（即where col1 = any(%(list1)s) and col2 = any(%(list2)s)）执行单独的where子句。
首先，我尝试将列表作为单独的参数传递，然后在SQL查询中将它们组合成一个数组：

import pandas as pd
list1 = ['a', 'b', 'c']
list2 = [1,2,3]

pd.read_sql_query(
    """
    select * from table
    where (col1, col2) = any(array[(%(list1)s, %(list2)s)])
    """,
    con = conn,
    params = {'list1': list1, 'list2':list2}

)

当我尝试这个时，我得到Datatype Mismatch: cannot compare dissimilar columns of type text and text[] at column 1。
还尝试了以下变体，其中我将一个列表列表传递给param：

pd.read_sql_query(
    """
    select * from table
    where (col1, col2) = any(%(arr)s)
    """,
    con = conn,
    params = {'arr': [[x,y] for x,y in zip(list1,list2)]}

)

在这里，我得到了DataError: (psycopg2.errors.InvalidTextRepresentation) in valid input syntax for integer: "a"。
尝试了上面的一些其他小的变体，但是每次尝试都抛出某种错误。那么，为了完成这个任务需要什么语法呢？

**编辑：**包括一个可重现的示例：

import numpy as np
import pandas as pd
from sqlalchemy import create_engine

list1 = ["a", "b", "c"]
list2 = [1, 2, 3]
n = 100

engine = create_engine(
    "postgresql+psycopg2://postgres:password@localhost:5432/database"
)

np.random.seed(2022)
df = pd.DataFrame(
    {
        "col1": np.random.choice(list1, n, replace=True),
        "col2": np.random.choice(list2, n, replace=True),
    }
)

# write table to database
df.to_sql("toy_table", engine, if_exists="replace", index=False)

# query with where any
df_query = pd.read_sql_query(
    """
    select * from toy_table
    where col1 = any(%(list1)s) and col2=any(%(list2)s)
    """,
    con=engine,
    params={"list1": list1, "list2": list2},
)

# expected output
rows = [(x, y) for x, y in zip(list1, list2)]
df_expected = df.loc[df.apply(lambda x: tuple(x.values) in rows, axis=1)]

# Throws assertion error
assert df_expected.equals(df_query)

pandas

来源：https://stackoverflow.com/questions/74575599/pandas-read-sql-query-with-params-matching-multiple-columns

2条答案

按热度按时间

v8wbuo2f1#

要对精确的对进行比较，可以将数组转换为字典，然后再转换为JSON，利用PostgreSQL JSON函数，如下所示：

#combine lists into a dictionary then convert to json
json1 = json.dumps(dict(zip(list1, list2)))

则查询请求应该被

df = pd.read_sql_query(
            """
            select * 
            from "table"
            where concat('{"',col1,'":',col2,'}')::jsonb <@ %s::jsonb
            """,
            con = conn,
            params = (json1,)
        )

使用python 3.10.6进行测试

赞(0）回复(0）举报 2022-11-27

exdqitrt2#

找到了一个适用于任意数量输入列表的解决方案：

df_query = pd.read_sql_query(
    """
    select * from toy_table
    join (select unnest(%(list1)s) as list1, unnest(%(list2)s) as list2) as tmp
    on col1 = list1 and col2 = list2
    """,
    con=engine,
    params={"list1": list1, "list2": list2},
)

# expected output
rows = [(x, y) for x, y in zip(list1, list2)]
df_expected = df.loc[df.apply(lambda x: tuple(x.values) in rows, axis=1)]

assert df_query.equals(df_expected)

赞(0）回复(0）举报 2022-11-27

我来回答

pandas read_sql_query带有与多个列匹配的参数

2条答案

相关问题

热门标签

最新问答