Multiple conditions in a Python list comprehension

Posted by w1jd8yoj on 2023-02-15 in Python

I have a list of nested dictionaries like this:

messages_all = [{'type': 'message',
      'subtype': 'bot_message',
      'text': "This content can't be displayed.",
      'ts': '1573358255.000100',
      'username': 'Userform',
      'icons': {'image_30': 'www.example.com'},
      'bot_id': 'JOD4K22SJW',
      'blocks': [{'type': 'section',
        'block_id': 'yCKUB',
        'text': {'type': 'mrkdwn',
         'text': 'Your *survey* has a new response.',
         'verbatim': False}},
       {'type': 'section',
        'block_id': '37Mt4',
        'text': {'type': 'mrkdwn',
         'text': '*Thanks for your response. Where did you first hear about us?*\nFriend',
         'verbatim': False}},
       {'type': 'section',
        'block_id': 'hqps2',
        'text': {'type': 'mrkdwn',
         'text': '*How would you rate your experience?*\n9',
         'verbatim': False}},
       {'type': 'section',
        'block_id': 'rvi',
        'text': {'type': 'mrkdwn', 'text': '*city*\nNew York', 'verbatim': False}},
       {'type': 'section',
        'block_id': 'q=L+',
        'text': {'type': 'mrkdwn',
         'text': '*order_id*\n123456',
         'verbatim': False}}]},

{'type': 'message',
  'subtype': 'channel_join',
  'ts': '1650897290.290259',
  'user': 'T01CTZE4MB6',
  'text': '<@U03CTDZ4MA6> has joined the channel',
  'inviter': 'A033AHJCK'},

{'type': 'message',
  'subtype': 'channel_leave',
  'ts': '1650899175.290259',
  'user': 'T01CTZE4MB6',
  'text': '<@U03CTDZ4MA6> has left the channel',
  'inviter': 'A033AHJCK'},

{'client_msg_id': '123456jk-a19c-97fe-35c9-3c9f643cae19',
  'type': 'message',
  'text': '<@ABC973RJD>',
  'user': 'UM1922AJG',
  'ts': '1573323860.000300',
  'team': 'B09AJR39A',
  'reactions': [{'name': '+1', 'users': ['UM1927AJG'], 'count': 1}]},

{'client_msg_id': '1234CAC1-FEC8-4F25-8CE5-C135B7FJB2E',
  'type': 'message',
  'text': '<@UM1922AJG> ',
  'user': 'UM1922AJG',
  'ts': '1573791416.000200',
  'team': 'AJCR23H',
  'thread_ts': '1573791416.000200',
  'reply_count': 3,
  'reply_users_count': 2,
  'latest_reply': '1573829538.002000',
  'reply_users': ['UM3HRC74J', 'UM1922AJG'],
  'is_locked': False,
  'subscribed': False}

]

I want to filter out the dictionaries that match any of the following:

client_msg_id
channel_join
channel_leave
reply_users_count

My code for this is:

filtered_messages = [elem for elem in messages_all if not elem.get('client_msg_id')
                     or (elem.get('type') == 'message' and elem.get('subtype') == 'channel_join') 
                     or (elem.get('type') == 'message' and elem.get('subtype') == 'channel_leave')
                     or (elem.get('type') == 'message' and elem.get('reply_users_count') == 2)
                ]

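From testing, it looks like only the client_msg_id condition is applied and none of the others are.
Can someone help me understand the syntax of this list comprehension?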
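The behaviour comes from operator precedence. In Python, `not` binds more tightly than `and` and `or`, so `not a or b` is parsed as `(not a) or b`, not as `not (a or b)`. Here is a minimal sketch on two hypothetical, trimmed-down messages shaped like the question's data:

```python
# Hypothetical, trimmed-down messages shaped like the question's data
messages = [
    {"type": "message", "subtype": "bot_message"},
    {"type": "message", "subtype": "channel_join"},
]

# `not` binds tighter than `or`, so this parses as
# (not m.get("client_msg_id")) or (m.get("subtype") == "channel_join")
buggy = [m for m in messages
         if not m.get("client_msg_id") or m.get("subtype") == "channel_join"]

# Parenthesised: `not` now negates the whole `or` of the conditions
fixed = [m for m in messages
         if not (m.get("client_msg_id") or m.get("subtype") == "channel_join")]

print(len(buggy), len(fixed))  # 2 1
```

In the unparenthesised version, every message without a client_msg_id already passes the first test. The later `or` branches can therefore only keep extra messages. They can never remove any.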

Answer #1, by 8cdiaqws

IIUC, you're just missing the parentheses that make `not` negate the `or` of all the conditions:

filtered_messages = [elem for elem in messages_all if not (elem.get('client_msg_id')
                     or (elem.get('type') == 'message' and elem.get('subtype') == 'channel_join') 
                     or (elem.get('type') == 'message' and elem.get('subtype') == 'channel_leave')
                     or (elem.get('type') == 'message' and elem.get('reply_users_count') == 2))
                ]

This keeps only the first element of the example input.
Output:

[{'type': 'message', 'subtype': 'bot_message', 'text': "This content can't be displayed.", 'ts': '1573358255.000100', 'username': 'Userform', 'icons': {'image_30': 'www.example.com'}, 'bot_id': 'JOD4K22SJW', 'blocks': [{'type': 'section', 'block_id': 'yCKUB', 'text': {'type': 'mrkdwn', 'text': 'Your *survey* has a new response.', 'verbatim': False}}, {'type': 'section', 'block_id': '37Mt4', 'text': {'type': 'mrkdwn', 'text': '*Thanks for your response. Where did you first hear about us?*\nFriend', 'verbatim': False}}, {'type': 'section', 'block_id': 'hqps2', 'text': {'type': 'mrkdwn', 'text': '*How would you rate your experience?*\n9', 'verbatim': False}}, {'type': 'section', 'block_id': 'rvi', 'text': {'type': 'mrkdwn', 'text': '*city*\nNew York', 'verbatim': False}}, {'type': 'section', 'block_id': 'q=L+', 'text': {'type': 'mrkdwn', 'text': '*order_id*\n123456', 'verbatim': False}}]}
]
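By De Morgan's law, `not (A or B or C)` is the same as `not A and not B and not C`, so the same filter can also be written as a chain of negated conditions joined with `and`. The sketch below uses abbreviated, hypothetical stand-ins for the question's messages. It also merges the channel_join and channel_leave tests into a single membership test, which is an editorial simplification and not part of the answer above:

```python
# Abbreviated, hypothetical stand-ins for the question's messages
messages = [
    {"type": "message", "subtype": "bot_message", "text": "survey response"},
    {"type": "message", "subtype": "channel_join"},
    {"type": "message", "subtype": "channel_leave"},
    {"type": "message", "client_msg_id": "abc"},
    {"type": "message", "client_msg_id": "def", "reply_users_count": 2},
]

def keep(elem):
    # De Morgan: not (A or B or C) == not A and not B and not C
    is_message = elem.get("type") == "message"
    return (not elem.get("client_msg_id")                                   # not A
            and not (is_message
                     and elem.get("subtype") in ("channel_join", "channel_leave"))  # not B
            and not (is_message and elem.get("reply_users_count") == 2))   # not C

filtered_messages = [elem for elem in messages if keep(elem)]
print(filtered_messages)
```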

Answer #2, by dfuffjeb

Given how long the resulting listcomp gets, I would rewrite it like this instead:

def filterdict(d):
    """Return True for messages that should be filtered out."""
    subtypes = {"channel_join", "channel_leave"}
    return any(
        test(d)
        for test in (
            lambda d: d["type"] == "message" and d.get("subtype") in subtypes,
            lambda d: d["type"] == "message" and d.get("reply_users_count") == 2,
            lambda d: d.get("client_msg_id"),
        )
    )

msgs = [x for x in messages_all if not filterdict(x)]

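In this form:

  • we have a filter fn that returns False for the messages we want to keep, so it can also be used directly with itertools.filterfalse
  • the conditions are all spelled out explicitly
  • using lambdas with any keeps each test encapsulated, so a misplaced parenthesis can't cause the kind of bug that prompted this question
  • the two otherwise identical tests (channel_join and channel_leave) are folded into a single membership test, which is clearer and easier to read.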
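The itertools.filterfalse point can be checked directly. `filterdict` returns `True` for the messages to discard, so `itertools.filterfalse` yields exactly the remaining ones. This is a self-contained sketch on hypothetical sample data. It differs from the code above in one deliberate way: it uses `d.get("type")` instead of `d["type"]`, so a dict without a `type` key does not raise `KeyError`.

```python
from itertools import filterfalse

def filterdict(d):
    """Return True for messages that should be filtered OUT."""
    subtypes = {"channel_join", "channel_leave"}
    return any(
        test(d)
        for test in (
            lambda d: d.get("type") == "message" and d.get("subtype") in subtypes,
            lambda d: d.get("type") == "message" and d.get("reply_users_count") == 2,
            lambda d: d.get("client_msg_id"),
        )
    )

# Hypothetical sample data
messages = [
    {"type": "message", "subtype": "bot_message"},
    {"type": "message", "subtype": "channel_leave"},
    {"type": "message", "client_msg_id": "abc", "reply_users_count": 2},
]

# filterfalse yields only the items for which the predicate is falsy
kept = list(filterfalse(filterdict, messages))
print(kept)  # [{'type': 'message', 'subtype': 'bot_message'}]
```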

Whether you like this kind of thing is ultimately a matter of taste.

Answer #3, by jvlzgdj9

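I've found the get method to be much slower than checking whether a key is in the dict, so if you have a lot of data, checking for the key with `in` will be faster: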

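filtered_messages = [elem for elem in messages_all
                     # drop anything that carries a client_msg_id
                     if "client_msg_id" not in elem
                     # drop "message" entries that joined/left the channel or have 2 reply users
                     and not ("type" in elem and elem["type"] == "message"
                              and (("subtype" in elem
                                    and elem["subtype"] in ("channel_join", "channel_leave"))
                                   or ("reply_users_count" in elem
                                       and elem["reply_users_count"] == 2)))]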
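The speed claim is easy to check on your own data with the standard `timeit` module. This rough sketch times a `get`-based version and an `in`-based version of the same filter on hypothetical sample data. Both functions are illustrative rewrites, not code from the answers. They agree here, but they can differ for a message whose client_msg_id is present but falsy (for example an empty string). Relative timings depend on the Python version and the shape of the data, so no numbers are given.

```python
import timeit

# Hypothetical sample data, repeated to get a measurable workload
messages = [
    {"type": "message", "subtype": "bot_message"},
    {"type": "message", "subtype": "channel_join"},
    {"type": "message", "client_msg_id": "abc", "reply_users_count": 2},
] * 10_000

def with_get(msgs):
    """Filter using dict.get() lookups."""
    return [e for e in msgs
            if not (e.get("client_msg_id")
                    or (e.get("type") == "message"
                        and (e.get("subtype") in ("channel_join", "channel_leave")
                             or e.get("reply_users_count") == 2)))]

def with_in(msgs):
    """Same filter using `in` membership checks before indexing."""
    return [e for e in msgs
            if "client_msg_id" not in e
            and not ("type" in e and e["type"] == "message"
                     and (("subtype" in e
                           and e["subtype"] in ("channel_join", "channel_leave"))
                          or ("reply_users_count" in e
                              and e["reply_users_count"] == 2)))]

# Make sure both versions agree on this data before comparing speed
assert with_get(messages) == with_in(messages)

for fn in (with_get, with_in):
    seconds = timeit.timeit(lambda: fn(messages), number=10)
    print(f"{fn.__name__}: {seconds:.3f}s")
```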

Answer #4, by lnvxswe2

As @mozway said, some parentheses are simply missing.
Personally I'd go a step further and move such a large if condition into a function:

def my_filter(elem):
    if not (elem.get('client_msg_id') 
      or (elem.get('type') == 'message' and elem.get('subtype') == 'channel_join') 
      or (elem.get('type') == 'message' and elem.get('subtype') == 'channel_leave') 
      or (elem.get('type') == 'message' and elem.get('reply_users_count') == 2)):
      return True
    return False

filtered_messages = [elem for elem in messages_all if my_filter(elem)]

Edit: removed the extra boolean variable
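`my_filter` is just a predicate that returns `True` for messages to keep, so it can also be passed straight to the built-in `filter()`. This small sketch uses hypothetical sample data. It also shows that the `if ...: return True` / `return False` pattern can collapse into a single `return not (...)`, which returns the same booleans.

```python
def my_filter(elem):
    """Return True for messages to KEEP (negated `or` of all drop conditions)."""
    return not (elem.get("client_msg_id")
                or (elem.get("type") == "message" and elem.get("subtype") == "channel_join")
                or (elem.get("type") == "message" and elem.get("subtype") == "channel_leave")
                or (elem.get("type") == "message" and elem.get("reply_users_count") == 2))

# Hypothetical sample data
messages = [
    {"type": "message", "subtype": "bot_message"},
    {"type": "message", "subtype": "channel_join"},
    {"type": "message", "client_msg_id": "abc"},
]

# Equivalent to [elem for elem in messages if my_filter(elem)]
filtered_messages = list(filter(my_filter, messages))
print(filtered_messages)  # [{'type': 'message', 'subtype': 'bot_message'}]
```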
