pandas simple_salesforce python中的父子关系查询,从有序字典中提取

plicqrtu  于 2023-05-05  发布在  Python
关注(0)|答案(3)|浏览(144)

我尝试使用python中的simple_salesforce包从salesforce查询信息。
问题是,它是将作为父子关系一部分的字段嵌套到有序字典中
我想从Opportunity对象中找到与该记录关联的id和accountid。
SOQL查询可能看起来像..

query = "select id, account.id from opportunity where closedate = last_n_days:5"

在SOQL(Salesforce对象查询语言)中,点表示数据库中的父子关系。所以我尝试从机会对象中获取id,然后从该记录上的帐户对象中获取相关id。
出于某种原因,Id可以很好地使用,但是account.id嵌套在有序dict中的有序dict中:

q = sf.query_all(query)

这拉回了一个有序的字典。

OrderedDict([('totalSize', 455),
             ('done', True),
             ('records',
              [OrderedDict([('attributes',
                             OrderedDict([('type', 'Opportunity'),
                                          ('url',

我将提取ordereddictrecords部分以创建df

df = pd.DataFrame(q['records'])

这给了我3列,一个有序的dict称为'attributes'Id和另一个有序的dict称为'Account'。我正在寻找一种方法,从嵌套的有序dict 'Account'中提取('BillingCountry', 'United States')片段

[OrderedDict([('attributes',
               OrderedDict([('type', 'Opportunity'),
                            ('url',
                             '/services/data/v34.0/sobjects/Opportunity/0061B003451RhZgiHHF')])),
              ('Id', '0061B003451RhZgiHHF'),
              ('Account',
               OrderedDict([('attributes',
                             OrderedDict([('type', 'Account'),
                                          ('url',
                                           '/services/data/v34.0/sobjects/Account/001304300MviPPF3Z')])),
                            ('BillingCountry', 'United States')]))])

编辑:澄清我在寻找什么。
我想以一个 Dataframe 结束,其中每个查询字段都有一列。
当我使用df = pd.DataFrame(sf.query_all(query)['records'])'records'片段放入DataFrame时,它会给我:

attributes  Id  Account
OrderedDict([('type', 'Opportunity'), ('url', '/services/data/v34.0/sobjects/Opportunity/0061B003451RhZgiHHF')])    0061B003451RhZgiHHF OrderedDict([('attributes', OrderedDict([('type', 'Account'), ('url', '/services/data/v34.0/sobjects/Account/0013000000MvkRQQAZ')])), ('BillingCountry', 'United States')])
OrderedDict([('type', 'Opportunity'), ('url', '/services/data/v34.0/sobjects/Opportunity/0061B00001Pa52QQAR')]) 0061B00001Pa52QQAR  OrderedDict([('attributes', OrderedDict([('type', 'Account'), ('url', '/services/data/v34.0/sobjects/Account/0011300001vQPxqAAG')])), ('BillingCountry', 'United States')])
OrderedDict([('type', 'Opportunity'), ('url', '/services/data/v34.0/sobjects/Opportunity/0061B00001TRu5mQAD')]) 0061B00001TRu5mQAD  OrderedDict([('attributes', OrderedDict([('type', 'Account'), ('url', '/services/data/v34.0/sobjects/Account/0011300001rfRTrAAE')])), ('BillingCountry', 'United States')])

删除'attributes'列后,我希望输出为

Id BillingCountry
0061B003451RhZgiHHF 'United States'
0061B00001Pa52QQAR 'United States'
0061B00001TRu5mQAD 'United States'
5n0oy7gb

5n0oy7gb1#

Pandas是一个用于表格数据的惊人工具。但是,尽管它可以包含Python对象,但这不是它的最佳位置。我建议您在将数据插入pandas.Dataframe之前,先从查询中提取数据:

提取记录:

提取所需字段作为字典列表是简单的:

records = [dict(id=rec['Id'], country=rec['Account']['BillingCountry'])
           for rec in data['records']]

将记录插入数据框:

有了一个dicts列表,一个dataframe就像下面这样简单:

df = pd.DataFrame(records)

测试码:

import pandas as pd
from collections import OrderedDict

data = OrderedDict([
    ('totalSize', 455),
    ('done', True),
    ('records', [
        OrderedDict([
            ('attributes', OrderedDict([('type', 'Opportunity'), ('url', '/services/data/v34.0/sobjects/Opportunity/0061B003451RhZgiHHF')])),
            ('Id', '0061B003451RhZgiHHF'),
            ('Account', OrderedDict([('attributes', OrderedDict([('type', 'Account'), ('url', '/services/data/v34.0/sobjects/Account/0013000000MvkRQQAZ')])),
                                     ('BillingCountry', 'United States')])),
        ]),
        OrderedDict([
            ('attributes', OrderedDict([('type', 'Opportunity'), ('url', '/services/data/v34.0/sobjects/Opportunity/0061B00001Pa52QQAR')])),
            ('Id', '0061B00001Pa52QQAR'),
            ('Account', OrderedDict([('attributes', OrderedDict([('type', 'Account'), ('url', '/services/data/v34.0/sobjects/Account/0011300001vQPxqAAG')])),
                                     ('BillingCountry', 'United States')])),
        ]),
        OrderedDict([
            ('attributes', OrderedDict([('type', 'Opportunity'), ('url', '/services/data/v34.0/sobjects/Opportunity/0061B00001TRu5mQAD')])),
            ('Id', '0061B00001TRu5mQAD'),
            ('Account', OrderedDict([('attributes', OrderedDict([('type', 'Account'), ('url', '/services/data/v34.0/sobjects/Account/0011300001rfRTrAAE')])),
                                     ('BillingCountry', 'United States')])),
        ]),
    ])
])

records = [dict(id=rec['Id'], country=rec['Account']['BillingCountry'])
           for rec in data['records']]
for r in records:
    print(r)

print(pd.DataFrame(records))

测试结果:

{'country': 'United States', 'id': '0061B003451RhZgiHHF'}
{'country': 'United States', 'id': '0061B00001Pa52QQAR'}
{'country': 'United States', 'id': '0061B00001TRu5mQAD'}

         country                   id
0  United States  0061B003451RhZgiHHF
1  United States   0061B00001Pa52QQAR
2  United States   0061B00001TRu5mQAD
0dxa2lsx

0dxa2lsx2#

Pandas能读有序的字典。

import pandas as pd
from simple_salesforce import Salesforce

sf = Salesforce(username='your_username',   
                password='your_password',
                security_token='your_token')

query = "select id, account.id from opportunity where closedate = last_n_days:5"
df = pd.DataFrame(sf.query_all(query)['records']).drop(columns='attributes')
5m1hhzi4

5m1hhzi43#

对于任何仍然对这个主题感兴趣的人,我推荐burner_process包。它可以递归地取消嵌套,这样你就可以进行多级关系查询(而不仅仅是两级),并得到一个普通的 Dataframe 作为输出。
pip install simpler-sf

import simpler_sf
simpler_sf.simple_salesforce()
import simple_salesforce
sf = simple_salesforce.Salesforce(...)
query = 'SELECT Contact.Id, Account.Name, Campaign FROM CampaignMember'
df = sf.smart_query(query)
print(df)

输出:

Contact.FirstName   Account.Name           Campaign
0   Emily                     Amazon   CampaignA_2023Q2
1   Jasmine                   Amazon   CampaignA_2023Q2
2   Míng                   Microsoft   CampaignB_2022Q4
3   Magdalena                 Google   CampaignC_2023Q1

GitHub repo

相关问题