pandas: How to remove illegal characters so a dataframe can be written to Excel

Posted by nimxete2 on 2023-03-28 in Other

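I'm trying to write a dataframe to an Excel spreadsheet using ExcelWriter, but it keeps returning an error:

openpyxl.utils.exceptions.IllegalCharacterError

I'm guessing the dataframe contains some characters that ExcelWriter doesn't like. That seems odd, because the dataframe is built from three Excel spreadsheets, so I can't see how it could contain characters Excel doesn't like!
Is there any way to iterate through a dataframe and replace the characters ExcelWriter doesn't like? I don't even mind if it simply deletes them.
What is the best way to remove or replace illegal characters in a dataframe?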

pandas

Source: https://stackoverflow.com/questions/42306755/how-to-remove-illegal-characters-so-a-dataframe-can-write-to-excel

8 Answers

hk8txs48 #1

Based on Haipeng Su's answer, I added a function that does this:

dataframe = dataframe.applymap(lambda x: x.encode('unicode_escape').
                 decode('utf-8') if isinstance(x, str) else x)

Basically, it escapes the unicode characters if they exist. It worked, and I can now write to Excel spreadsheets again!
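
To see what this escaping does to a single cell value, here is a minimal stdlib-only sketch. The helper name `escape_cell` is hypothetical and is not part of pandas or openpyxl. Note that `unicode_escape` rewrites every non-ASCII character, and also tabs and newlines, not only the control characters that openpyxl rejects, so the text that ends up in the spreadsheet changes.

```python
def escape_cell(value):
    """Escape a string the same way the lambda above does; leave other types alone."""
    if isinstance(value, str):
        return value.encode('unicode_escape').decode('utf-8')
    return value

# Hypothetical sample values standing in for dataframe cells
samples = ['plain', 'bad\x00char', 'café', 'a\tb', 42]
for s in samples:
    print(repr(s), '->', repr(escape_cell(s)))
```

If only the control characters should go and accented text should stay intact, a targeted removal may be a better fit than escaping everything.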

klsxnrf1 #2

The same problem happened to me, and I solved it like this:
1. Install the Python package xlsxwriter:

pip install xlsxwriter

2. Replace the default engine 'openpyxl' with 'xlsxwriter':

dataframe.to_excel("file.xlsx", engine='xlsxwriter')

ubby3x7f #3

Trying a different Excel writer engine solved my problem.

writer = pd.ExcelWriter('file.xlsx', engine='xlsxwriter')

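um6iljoc #4

If you don't want to install another Excel writer engine (such as xlsxwriter), you can remove the illegal characters yourself by looking up the pattern that triggers IllegalCharacterError.
Open cell.py in /path/to/your/python/site-packages/openpyxl/cell/ and look for the check_string function. You'll see that it uses a predefined regular expression, ILLEGAL_CHARACTERS_RE, to detect illegal characters. Track down its definition and you'll find this line:
ILLEGAL_CHARACTERS_RE = re.compile(r'[\000-\010]|[\013-\014]|[\016-\037]')
This line is all you need to remove those characters. Copy it into your program (along with import re) and run the following before writing your dataframe to Excel:
dataframe = dataframe.applymap(lambda x: ILLEGAL_CHARACTERS_RE.sub(r'', x) if isinstance(x, str) else x)
The line above removes those characters from every cell.
However, where these characters come from may be the real issue. As you said, the dataframe is built from three Excel spreadsheets. If the source spreadsheets contain these characters, you will keep running into this problem. So if you control how the source spreadsheets are generated, try removing the characters there first.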
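
To help track down where the characters come from before stripping them, here is a small stdlib-only sketch built on the pattern quoted above. The helper names `strip_illegal` and `find_illegal` and the sample rows are hypothetical; the rows are a list of dicts standing in for dataframe records.

```python
import re

# Pattern copied from the answer above (as quoted from openpyxl's cell.py)
ILLEGAL_CHARACTERS_RE = re.compile(r'[\000-\010]|[\013-\014]|[\016-\037]')

def strip_illegal(value):
    """Remove characters matched by ILLEGAL_CHARACTERS_RE; non-strings pass through."""
    if isinstance(value, str):
        return ILLEGAL_CHARACTERS_RE.sub('', value)
    return value

def find_illegal(rows):
    """Yield (row_index, column_name, value) for each cell holding an illegal character."""
    for i, row in enumerate(rows):
        for col, value in row.items():
            if isinstance(value, str) and ILLEGAL_CHARACTERS_RE.search(value):
                yield i, col, value

# Hypothetical sample data: tab is allowed, vertical tab and NUL are not
rows = [
    {'name': 'ok', 'note': 'tab\tis fine'},
    {'name': 'bad\x0bname', 'note': 'nul\x00byte'},
]
offenders = list(find_illegal(rows))
cleaned = [{col: strip_illegal(v) for col, v in row.items()} for row in rows]
print(offenders)
print(cleaned)
```

With a real dataframe, `df.to_dict('records')` produces rows of this shape, and `strip_illegal` can be passed to `applymap` (or to `DataFrame.map`, which replaces `applymap` from pandas 2.1 onward).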

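jm2pwxwz #5

I also ran into some odd characters when writing a dataframe to HTML or CSV. Accented characters, for example, could not be written to my HTML file, so I needed to convert them to unaccented ones.
My approach may not be the best, but it helped me convert unicode strings into ASCII-compatible ones.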

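# install unidecode first (pip install unidecode)
from unidecode import unidecode

def FormatString(s):
    # Only text needs converting (this check was `unicode` on Python 2)
    if isinstance(s, str):
        try:
            s.encode('ascii')
            return s
        except UnicodeEncodeError:
            # Not pure ASCII: transliterate to the closest ASCII text
            return unidecode(s)
    else:
        return s

df2 = df1.applymap(FormatString)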

In your case, if you just want to get rid of the illegal characters, change return unidecode(s) to return 'StringYouWantToReplace'.
Hope this gives you some ideas for dealing with your problem.
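
If installing unidecode is not an option, a rough stdlib-only approximation of the accent removal can be sketched with `unicodedata`. This is a swapped-in technique, not what unidecode does internally, and the helper name `to_ascii` is hypothetical. Characters that have no ASCII decomposition (for example 'ß') are dropped rather than transliterated.

```python
import unicodedata

def to_ascii(value):
    """Strip accents by decomposing characters and discarding the non-ASCII parts."""
    if not isinstance(value, str):
        return value
    # NFKD splits 'é' into 'e' plus a combining accent, which 'ignore' then drops
    decomposed = unicodedata.normalize('NFKD', value)
    return decomposed.encode('ascii', 'ignore').decode('ascii')

for text in ['café', 'naïve résumé', 'plain', 7]:
    print(repr(text), '->', repr(to_ascii(text)))
```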

gdx19jrr #6

You can use the built-in strip() method of Python strings.
For each cell:

text = str(illegal_text).strip()

For the whole dataframe:

dataframe = dataframe.applymap(lambda t: str(t).strip())
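
As a note on scope, the sketch below shows what `strip()` with no arguments actually removes: whitespace at the two ends of a string only. Some characters from the illegal range count as whitespace in Python (such as `\x0b` and `\x0c`), so they are removed at the edges, but anything in the middle of the text and non-whitespace control characters such as `\x00` are left in place. The sample values are hypothetical. Also keep in mind that `str(t)` turns every cell, including numbers, into a string.

```python
# Hypothetical cell values containing control characters in different positions
samples = {
    'edges': '  \x0bvalue\x0c  ',   # vertical tab / form feed at the ends
    'interior': 'val\x0bue',        # vertical tab in the middle
    'nul_prefix': '\x00value',      # NUL is not whitespace
}

stripped = {label: str(text).strip() for label, text in samples.items()}
for label, text in stripped.items():
    print(label, repr(text))
```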

0sgqnhkj #7

If you're still struggling to clean up the characters, this worked well for me:

import xlwings as xw
import pandas as pd
df = pd.read_pickle('C:\\Users\\User1\\picked_DataFrame_notWriting.df')
topath = 'C:\\Users\\User1\\tryAgain.xlsx'
wb = xw.Book(topath)
ws = wb.sheets['Data']
ws.range('A1').options(index=False).value = df
wb.save()
wb.close()

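ru9i0ody #8

I ran into the same problem as you. I needed to loop through a folder, copy the data from each .csv, and paste it into an .xlsx with the same file name. I needed to overwrite the data rather than create a new tab.
A try/except took care of my IllegalCharacterError: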

import os
import pandas as pd
import openpyxl.utils.exceptions as xlerr

# Define the folder path containing the CSV files
csv_folder = r"C:\folderpath"

# Loop through each CSV file in the folder
for filename in os.listdir(csv_folder):
    if filename.endswith(".csv"):
        # Base name shared by the workbook and the sheet
        name = filename.split('.')[0]

        # Define the output Excel file path (raw string so the backslash is literal)
        output_file = r'C:\Temp/' + name + '.xlsx'

        # Read the CSV file into a pandas dataframe
        csv_path = os.path.join(csv_folder, filename)
        df = pd.read_csv(csv_path, encoding='cp1252', header=None)

        # The original snippet never defines selected_data; the whole frame is assumed here
        selected_data = df

        # Write the selected data to the output Excel file
        # (mode='a' appends, so the workbook must already exist)
        try:
            with pd.ExcelWriter(output_file, mode='a', engine='openpyxl', if_sheet_exists='overlay') as writer:
                selected_data.to_excel(writer, sheet_name=name, startrow=1, index=False, header=False)
        except xlerr.IllegalCharacterError as e:
            # The file is skipped and the offending value is reported
            print(f"Error: {e}")
