How do I sort comma-separated time values in pandas?

kyvafyod  posted on 2023-03-28 in Other

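I have the values below. I want to sort the dates within each row in ascending order, so that 2023-02-24 comes before 2023-02-25.
I tried to get this result with my code, but it raised the error shown below.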

df['request_time_list']

//Result looks as below
0     2023-02-25,2023-02-24
1     2023-02-25,2023-02-24
2     2023-02-24,2023-02-25
3     2023-02-24,2023-02-25
4     2023-02-24,2023-02-25
5     2023-02-24,2023-02-25
6     2023-02-25,2023-02-24
7     2023-02-24,2023-02-25
8     2023-02-25,2023-02-24
9     2023-02-25,2023-02-24
10    2023-02-24,2023-02-25



df['request_time_list'].apply(lambda x: ','.join(sorted(x.split(','))))


Fail to execute line 6: df['request_time_list'].apply(lambda x: ','.join(sorted(x.split(','))))
Traceback (most recent call last):
  File "/tmp/zeppelin_pyspark-6146174346709974932.py", line 380, in <module>
    exec(code, _zcUserQueryNameSpace)
  File "<stdin>", line 6, in <module>
  File "/usr/local/lib/python3.7/dist-packages/pandas/core/series.py", line 4357, in apply
    return SeriesApply(self, func, convert_dtype, args, kwargs).apply()
  File "/usr/local/lib/python3.7/dist-packages/pandas/core/apply.py", line 1043, in apply
    return self.apply_standard()
  File "/usr/local/lib/python3.7/dist-packages/pandas/core/apply.py", line 1101, in apply_standard
    convert=self.convert_dtype,
  File "pandas/_libs/lib.pyx", line 2859, in pandas._libs.lib.map_infer
  File "<stdin>", line 6, in <lambda>
  File "/usr/local/lib/python3.7/dist-packages/pandas/util/_decorators.py", line 311, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/pandas/core/frame.py", line 6242, in sort_values
    keys = [self._get_label_or_level_values(x, axis=axis) for x in by]
  File "/usr/local/lib/python3.7/dist-packages/pandas/core/frame.py", line 6242, in <listcomp>
    keys = [self._get_label_or_level_values(x, axis=axis) for x in by]
  File "/usr/local/lib/python3.7/dist-packages/pandas/core/generic.py", line 1779, in _get_label_or_level_values
    raise KeyError(key)
KeyError: '2023-02-25'

How can I achieve this?
Thanks.

sz81bmfz1#

For two dates, you can use:

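# Capture the two comma-separated dates into columns dt1 and dt2
dt = df['request_time_list'].str.extract(r'(?P<dt1>[^,]+),(?P<dt2>.+)')
# The row-wise min and max of the two strings give the ascending order
df['request_time_list'] = dt.min(axis=1) + ',' + dt.max(axis=1)
print(df)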

# Output
        request_time_list
0   2023-02-24,2023-02-25
1   2023-02-24,2023-02-25
2   2023-02-24,2023-02-25
3   2023-02-24,2023-02-25
4   2023-02-24,2023-02-25
5   2023-02-24,2023-02-25
6   2023-02-24,2023-02-25
7   2023-02-24,2023-02-25
8   2023-02-24,2023-02-25
9   2023-02-24,2023-02-25
10  2023-02-24,2023-02-25
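
The extract approach above is limited to exactly two dates per row. As a minimal sketch for rows with any number of dates, the strings can be split into lists, each list sorted, and the pieces joined again. The sample frame below is made up for illustration, and its three-date row is hypothetical (it does not appear in the question). Plain string sorting is enough here only because YYYY-MM-DD strings sort in chronological order.

```python
import pandas as pd

# Illustrative data; the third row (three dates) is a hypothetical addition
df = pd.DataFrame({"request_time_list": [
    "2023-02-25,2023-02-24",
    "2023-02-24,2023-02-25",
    "2023-02-26,2023-02-24,2023-02-25",
]})

# str.split turns each cell into a list, map(sorted) sorts each list,
# and str.join glues the sorted pieces back into one string
df["request_time_list"] = (
    df["request_time_list"].str.split(",").map(sorted).str.join(",")
)
print(df)
```

This still iterates row by row inside `map`, so it is not faster than `apply`; it only handles a variable number of dates.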

The code above avoids the apply method. However, your code works perfectly with your sample:

>>> df['request_time_list'].apply(lambda x: ','.join(sorted(x.split(','))))

0     2023-02-24,2023-02-25
1     2023-02-24,2023-02-25
2     2023-02-24,2023-02-25
3     2023-02-24,2023-02-25
4     2023-02-24,2023-02-25
5     2023-02-24,2023-02-25
6     2023-02-24,2023-02-25
7     2023-02-24,2023-02-25
8     2023-02-24,2023-02-25
9     2023-02-24,2023-02-25
10    2023-02-24,2023-02-25
Name: request_time_list, dtype: object
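
One detail in the question's traceback is worth noting: the call inside the lambda ends up in DataFrame.sort_values, which the built-in sorted never calls. A possible explanation, and this is only an assumption since the question does not show the rest of the session, is that the name sorted was rebound to something else earlier in the notebook. The sketch below simulates such a rebinding and shows that going through the builtins module sidesteps it. The helper name sort_cell is hypothetical.

```python
import builtins

# Simulate an accidental rebinding of the name `sorted` (assumed scenario)
sorted = lambda values: values

def sort_cell(cell: str) -> str:
    # builtins.sorted always refers to the real built-in,
    # regardless of what the bare name `sorted` points to
    return ",".join(builtins.sorted(cell.split(",")))

result = sort_cell("2023-02-25,2023-02-24")
print(result)

# Remove the simulated rebinding so the bare name resolves to the built-in again
del sorted
```

With a DataFrame, the same helper could be used as `df['request_time_list'].apply(sort_cell)`.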

wpx232ag2#

You can also use numpy to reorder the substrings:

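import numpy as np
import pandas as pd

# Split into one column per date, sort each row with NumPy, then join back
df['request_time_list'] = (
 pd.DataFrame(np.sort(df['request_time_list'].str.split(',', expand=True), axis=1),
              index=df.index).agg(','.join, axis=1)
)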

Output:

        request_time_list
0   2023-02-24,2023-02-25
1   2023-02-24,2023-02-25
2   2023-02-24,2023-02-25
3   2023-02-24,2023-02-25
4   2023-02-24,2023-02-25
5   2023-02-24,2023-02-25
6   2023-02-24,2023-02-25
7   2023-02-24,2023-02-25
8   2023-02-24,2023-02-25
9   2023-02-24,2023-02-25
10  2023-02-24,2023-02-25
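
Both answers compare the dates as plain strings, which gives chronological order only for formats such as YYYY-MM-DD. As a hedged sketch for other formats, each piece can be parsed just for the comparison while the original text is kept. The helper name sort_dates, its fmt parameter, and the day/month/year sample are all hypothetical and not part of the question.

```python
from datetime import datetime

def sort_dates(cell: str, fmt: str = "%Y-%m-%d") -> str:
    """Sort comma-separated dates chronologically, keeping the original text."""
    parts = [p.strip() for p in cell.split(",")]
    # Parse each piece only to build the sort key; the strings are not reformatted
    return ",".join(sorted(parts, key=lambda p: datetime.strptime(p, fmt)))

iso_sorted = sort_dates("2023-02-25,2023-02-24")
# In day/month/year format a plain string sort would put 01/03/2023 first
dmy_sorted = sort_dates("01/03/2023,15/02/2023", fmt="%d/%m/%Y")
print(iso_sorted)
print(dmy_sorted)
```

With a DataFrame, this could be applied as `df['request_time_list'].apply(sort_dates)`.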
