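I want to load an entire database table into a Pandas DataFrame using the SQLAlchemy ORM. I have already managed to query the number of rows in the table, as shown here: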
from local_modules import RemoteConnector
from sqlalchemy import Integer, Column
from sqlalchemy.orm import sessionmaker
from sqlalchemy.ext.automap import automap_base
import pandas as pd
Base = automap_base()
class Calculations(Base):
    __tablename__ = "calculations"
    id = Column("ID", Integer, primary_key=True)
Base.prepare()
connection = RemoteConnector('server', 'calculations_database')
connection.connect()
Session = sessionmaker(bind=connection.engine)
session = Session()
result = session.query(Calculations).count()
print('Record count:', result)
Output:
Record count: 13915
Process finished with exit code 0
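If possible, I would like to use automap_base from sqlalchemy.ext.automap to define the table without having to declare every column manually. I only declared 'id' because I got an error asking me to set a primary key (is there a better way to do this?).
To get any results at all, I have been able to do the following: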
results = session.query(Calculations).all()
Output:
[<__main__.Calculations object at 0x000001AF2324F510>, <__main__.Calculations object at 0x000001AF2324F6D0>, <__main__.Calculations object at 0x000001AF2324F810>, <__main__.Calculations object at 0x000001AF2324F910>, <__main__.Calculations object at 0x000001AF2324FA50>, <__main__.Calculations object at 0x000001AF2324FB90>, <__main__.Calculations object at 0x000001AF2324FCD0>, <__main__.Calculations object at 0x000001AF2324FE10>, <__main__.Calculations object at 0x000001AF2324FF50>, <__main__.Calculations object at 0x000001AF22CD40D0>, <__main__.Calculations object at 0x000001AF22CD4210>, <__main__.Calculations object at 0x000001AF22CD4350>, <__main__.Calculations object at 0x000001AF22CD4490>, <__main__.Calculations object at 0x000001AF22CD45D0>, <__main__.Calculations object at 0x000001AF22CD4710>, <__main__.Calculations object at 0x000001AF22CD4850>, <__main__.Calculations object at 0x000001AF22CD4990>, <__main__.Calculations object at 0x000001AF22CD4AD0>, <__main__.Calculations object at 0x000001AF22CD4C10>, <__main__.Calculations object at 0x000001AF22CD4D50>, <__main__.Calculations object at 0x000001AF22CD4E90>, <__main__.Calculations object at 0x000001AF22CD4FD0>, <__main__.Calculations object at 0x000001AF22CD5110>, <__main__.Calculations object at 0x000001AF22CD5250>, <__main__.Calculations object at 0x000001AF22CD53D0>, <__main__.Calculations object at 0x000001AF22CD5510>, <__main__.Calculations object at 0x000001AF22CD5650>, <__main__.Calculations object at 0x000001AF22CD5790>, <__main__.Calculations object at 0x000001AF22CD58D0>, <__main__.Calculations object at 0x000001AF22CD5A10>, <__main__.Calculations object at 0x000001AF22CD5B50>, <__main__.Calculations object at 0x000001AF22CD5C90>, <__main__.Calculations object at 0x000001AF22CD5DD0>, <__main__.Calculations object at 0x000001AF22CD5F10>, <__main__.Calculations object at 0x000001AF22CD6050>, <__main__.Calculations object at 0x000001AF22CD6190>, <__main__.Calculations object at 0x000001AF22CD62D0>, 
<__main__.Calculations object at 0x000001AF22CD6410>, <__main__.Calculations object at 0x000001AF22CD6550>, <__main__.Calculations object at 0x000001AF22CD6690>, <__main__.Calculations object at 0x000001AF22CD67D0>, <__main__.Calculations object at 0x000001AF22CD6910>, <__main__.Calculations object at 0x000001AF22CD6A50>, <__main__.Calculations object at 0x000001AF22CD6B90>, <__main__.Calculations object at 0x000001AF22CD6CD0>, <__main__.Calculations object at 0x000001AF22CD6E10>, <__main__.Calculations object at 0x000001AF22CD6F50>, <__main__.Calculations object at 0x000001AF22CD7090>]
This returns the table contents as a list of objects. My best attempt at extracting the values is:
for result in results:
    print(result.__dict__)
Output:
{'_sa_instance_state': <sqlalchemy.orm.state.InstanceState object at 0x00000232E0A91730>, 'id': 1.0}
{'_sa_instance_state': <sqlalchemy.orm.state.InstanceState object at 0x00000232E0A90E90>, 'id': 2.0} ... and so on
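Not only do I not get the values, it also does not print the columns, only the ID I defined in the class. I thought the columns would be carried over automatically when I used automap_base. When I define them explicitly, they do show up, like this: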
class Calculations(Base):
    __tablename__ = "Calculations"
    id = Column("Trade ID", Integer, primary_key=True)
    Amount = Column("Amount", Integer)
    Yield = Column("Yield", Integer)
Output:
{'_sa_instance_state': <sqlalchemy.orm.state.InstanceState object at 0x000001BFD2092090>, 'Amount': 34303.0, 'Yield': 0.01141, 'id': 1.0}
{'_sa_instance_state': <sqlalchemy.orm.state.InstanceState object at 0x000001BFD2091010>, 'Amount': 10000.0, 'Yield': 0.01214, 'id': 2.0}
{'_sa_instance_state': <sqlalchemy.orm.state.InstanceState object at 0x000001BFD2090FB0>, 'Amount': 43515.0, 'Yield': 0.01206, 'id': 3.0}
... and so on
What I ultimately want to do is what is suggested in "SQLAlchemy ORM conversion to pandas DataFrame":
df = pd.read_sql_query(sql=session.query(Calculation).all(), con=connection.engine)
But I get the following error:
raise exc.ObjectNotExecutableError(statement) from err
sqlalchemy.exc.ObjectNotExecutableError: Not an executable object: [<__main__.CALC_TFSB_INVESTMENTS object at 0x000001FF42966E50>, ... and so on
I also tried:
df = pd.read_sql_query(sql=select(Calculations), con=connection.engine)
print(df.head())
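How do I load the DataFrame? How can I automate schema detection, presumably with automap_base? How can I improve my code, and is there anything else I could add, perhaps dunder fields, to make things better?
2 Answers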
Answer 1
If you want to load all records from one or more tables, you can use read_sql_table instead of read_sql_query. To load every table in the database:
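A minimal sketch of loading every table, assuming an in-memory SQLite engine and two made-up tables (`calculations`, `trades`) in place of the real server. With the setup from the question, `connection.engine` would be passed as `con` instead:

```python
import pandas as pd
from sqlalchemy import create_engine, inspect, text

# Assumption: a throwaway in-memory SQLite database stands in for the real one.
engine = create_engine("sqlite:///:memory:")

# Illustrative tables and rows (hypothetical, only for this demo).
with engine.begin() as conn:
    conn.execute(text(
        'CREATE TABLE calculations ("ID" INTEGER PRIMARY KEY, "Amount" REAL, "Yield" REAL)'
    ))
    conn.execute(text(
        "INSERT INTO calculations VALUES (1, 34303.0, 0.01141), (2, 10000.0, 0.01214)"
    ))
    conn.execute(text('CREATE TABLE trades ("ID" INTEGER PRIMARY KEY, "Label" TEXT)'))
    conn.execute(text("INSERT INTO trades VALUES (1, 'a')"))

# Ask the database which tables exist, then read each one in full.
table_names = inspect(engine).get_table_names()
frames = {name: pd.read_sql_table(name, con=engine) for name in table_names}

for name, frame in frames.items():
    print(name, frame.shape)
```

`frames` maps each table name to its DataFrame. Because `read_sql_table` reflects the columns from the database itself, no ORM class or manual column declaration is needed for this step.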
To load only a single table from the database:
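A sketch for the single-table case, assuming a throwaway in-memory SQLite engine and illustrative rows. The table name `calculations` follows the question, and `index_col` is an optional `read_sql_table` parameter used here only for demonstration:

```python
import pandas as pd
from sqlalchemy import create_engine, text

# Assumption: in-memory SQLite replaces the real server for this demo.
engine = create_engine("sqlite:///:memory:")

# Illustrative data modelled on the output shown in the question.
with engine.begin() as conn:
    conn.execute(text(
        'CREATE TABLE calculations ("ID" INTEGER PRIMARY KEY, "Amount" REAL, "Yield" REAL)'
    ))
    conn.execute(text(
        "INSERT INTO calculations VALUES "
        "(1, 34303.0, 0.01141), (2, 10000.0, 0.01214), (3, 43515.0, 0.01206)"
    ))

# Read the whole table by name; every column is picked up automatically.
df = pd.read_sql_table("calculations", con=engine, index_col="ID")
print(df.head())
```

With the question's connector, the call would presumably become `pd.read_sql_table("calculations", con=connection.engine)`.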
Answer 2
The answer is
This did the trick.
Corralien's answer is much more detailed, though.