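I have a huge dataset (104259 rows), and somewhere in the game column there are one or more values that contain more than one " - " separator.
I am trying to split this column into two columns, home_team and away_team.
My sample DataFrame is: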
df:
+-----+--------------+----------------+-------------+--------+---------------------------------------+----------+-------------+-------------+-------------+-----------+------------------------------+
| | Unnamed: 0 | Unnamed: 0.1 | date | time | game | score | home_odds | draw_odds | away_odds | country | league |
+=====+==============+================+=============+========+=======================================+==========+=============+=============+=============+===========+==============================+
| 0 | 0 | 0 | nan | 15:30 | Iliria Kruja - Cerrik | 0:3 | - | - | - | Albania | First Division |
+-----+--------------+----------------+-------------+--------+---------------------------------------+----------+-------------+-------------+-------------+-----------+------------------------------+
| 1 | 1 | 1 | 25 Jul 2020 | 15:30 | Elbasani - Devolli | 3:1 | - | - | - | Albania | First Division |
+-----+--------------+----------------+-------------+--------+---------------------------------------+----------+-------------+-------------+-------------+-----------+------------------------------+
| 2 | 2 | 2 | 11 Jul 2020 | 15:30 | Beselidhja Lezha - Kastrioti | 2:0 | 1.46 | 3.80 | 6.40 | Albania | First Division |
+-----+--------------+----------------+-------------+--------+---------------------------------------+----------+-------------+-------------+-------------+-----------+------------------------------+
| 3 | 3 | 3 | 05 Jul 2020 | 15:30 | Lushnja - Apolonia Fier | 1:2 | 2.39 | 3.56 | 2.44 | Albania | First Division |
+-----+--------------+----------------+-------------+--------+---------------------------------------+----------+-------------+-------------+-------------+-----------+------------------------------+
When I run this line of code:
df[['home_team', 'away_team']] = df['game'].str.split(' - ', expand=True)
I get this error:
Traceback (most recent call last):
File "C:/Users/harsh/AppData/Roaming/JetBrains/PyCharmCE2021.1/scratches/scratch_37.py", line 22, in <module>
df[['home_team', 'away_team']] = df['game'].str.split(' - ', expand=True)
File "C:\Python\lib\site-packages\pandas\core\frame.py", line 3160, in __setitem__
self._setitem_array(key, value)
File "C:\Python\lib\site-packages\pandas\core\frame.py", line 3189, in _setitem_array
raise ValueError("Columns must be same length as key")
ValueError: Columns must be same length as key
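I suspect that for one or more rows str.split returns more than two parts, so the result has more columns than the two names I am assigning to.
However, I am not sure which rows those are.
As I see it, I have two options:
If only a few rows contain such data (fewer than 10), I can safely drop them.
If there are more than 10 such rows, I can split them at the first occurrence of the separator.
I just don't know how to do either of these in code.
How can I find these rows and handle them?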
1 Answer
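Some of your rows probably contain the " - " string more than once, so the split produces more than two columns.
If you are sure about the structure and want to split only at the first occurrence of the separator, pass n=1 to str.split. The n parameter is described in the pandas documentation for Series.str.split.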
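The code that originally accompanied this answer is not available here, so what follows is only a minimal sketch of how the check and both options from the question could look. The small DataFrame and its third row ("Team A - B - Team C") are made up for illustration, as are the names sep, sep_count, bad_rows and clean; only the game, home_team and away_team column names come from the question.

```python
import re

import pandas as pd

# Made-up stand-in for the real data; the third row is a hypothetical
# problem row that contains the separator twice.
df = pd.DataFrame({
    "game": [
        "Elbasani - Devolli",
        "Lushnja - Apolonia Fier",
        "Team A - B - Team C",
    ]
})

sep = " - "

# Count how often the separator occurs in each row.
# str.count takes a regular expression, so the separator is escaped.
sep_count = df["game"].str.count(re.escape(sep))

# Rows that do not contain exactly one separator are the suspects.
bad_rows = df[sep_count != 1]
print(len(bad_rows), "problem row(s)")
print(bad_rows["game"].tolist())

# Option 1: drop the problem rows, then split the remaining ones.
clean = df[sep_count == 1].copy()
clean[["home_team", "away_team"]] = clean["game"].str.split(sep, n=1, expand=True)

# Option 2: keep every row and split only at the first separator (n=1),
# so the result never has more than two columns.
df[["home_team", "away_team"]] = df["game"].str.split(sep, n=1, expand=True)
print(df)
```

With n=1, everything after the first " - " ends up in away_team, so it is worth looking at bad_rows before deciding whether that is the right split for those games or whether dropping them is safer.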