Pandas truncates strings in numpy list

Posted by 7kqas0il on 2023-03-23 in Other

Consider the following minimal example:

from dataclasses import dataclass

import numpy
import pandas

@dataclass
class ExportEngine:

    def __post_init__(self):
        self.list = pandas.DataFrame(columns=list(MyObject.CSVHeaders()))

    def export(self):
        self.prepare()
        self.list.to_csv("~/Desktop/test.csv")

    def prepare(self):
        values = numpy.concatenate(
            (
                numpy.array(["Col1Value", "Col2Value", " Col3Value", "Col4Value"]),
                numpy.repeat("", 24),
            )
        )
        for x in range(8): #not the best way, but done due to other constraints
            start = 3 + (x * 3) - 2
            end = start + 3
            values[start:end] = [
                "123",
                "some_random_value_that_gets_truncated",
                "456",
            ]
        self.list.loc[len(self.list)] = values

When export() is called, some_random_value_that_gets_truncated is truncated to some_rando:

['Col1Value', '123', 'some_rando', '456', '123', 'some_rando', '456', '123', 'some_rando', '456', '123', 'some_rando', '456', '123', ...]

I tried setting pandas.set_option("display.max_colwidth", 10000), but that didn't change anything.
Why does this happen, and how can I prevent the truncation?

numpy

Source: https://stackoverflow.com/questions/75806509/pandas-truncates-strings-in-numpy-list

2 Answers

#1 3qpi33ja

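By default, numpy picks a fixed-length unicode dtype that is just wide enough for the longest string present when the array is created.
Note the dtype: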

In [1]: import numpy

In [2]: values = numpy.concatenate(
   ...:     (
   ...:         numpy.array(["Col1Value", "Col2Value", " Col3Value", "Col4Value"]),
   ...:         numpy.repeat("", 24),
   ...:     )
   ...: )

In [3]: values
Out[3]:
array(['Col1Value', 'Col2Value', ' Col3Value', 'Col4Value', '', '', '',
       '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '',
       '', '', '', ''], dtype='<U10')
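
To make the effect of that dtype concrete, here is a minimal sketch (the three-element array is a shortened stand-in for the one above, not the asker's exact data) showing that assigning a longer string into a fixed-width array silently cuts it to the array's width:

```python
import numpy

# The width is fixed by the longest initial string (" Col3Value" has 10 characters).
values = numpy.array(["Col1Value", " Col3Value", ""])
dtype_kind = values.dtype.kind               # "U" means fixed-width unicode
chars_per_item = values.dtype.itemsize // 4  # numpy stores 4 bytes per character

# Assigning a longer string does not raise; it is cut to 10 characters.
values[2] = "some_random_value_that_gets_truncated"
truncated = str(values[2])
print(chars_per_item, truncated)
```

The truncation happens inside numpy at assignment time, before pandas ever sees the data, which is why a pandas display option has no effect on it.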

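A <U10 array holds at most 10 characters per element, so any longer string assigned later is silently cut off. You probably shouldn't be using numpy directly here, but a quick fix is to replace: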

values = numpy.concatenate(
    (
        numpy.array(["Col1Value", "Col2Value", " Col3Value", "Col4Value"]),
        numpy.repeat("", 24),
    )
)

with:

values = numpy.array(
    ['Col1Value', 'Col2Value', ' Col3Value', 'Col4Value', *[""]*24],
    dtype=object
)

Note the dtype=object: the array then only stores pointers to Python str objects, so there is no limit on the length of the strings.
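
As a rough check of this fix, the sketch below rebuilds the row with dtype=object and runs the same slice assignments as the question. The numeric column labels are an assumption made here because MyObject.CSVHeaders() is not shown in the question.

```python
import numpy
import pandas

# Object array: each element is a reference to a normal Python str.
values = numpy.array(
    ["Col1Value", "Col2Value", " Col3Value", "Col4Value", *[""] * 24],
    dtype=object,
)
for x in range(8):
    start = 3 + (x * 3) - 2  # same index arithmetic as in the question
    values[start:start + 3] = ["123", "some_random_value_that_gets_truncated", "456"]

# Hypothetical numeric column labels stand in for MyObject.CSVHeaders().
frame = pandas.DataFrame(columns=range(len(values)))
frame.loc[len(frame)] = values
full_value = frame.loc[0, 2]
print(full_value)
```

With the object dtype the long string should reach the DataFrame intact, at the cost of losing numpy's compact fixed-width storage, which hardly matters for a single row of CSV data.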

Posted 2023-03-23

#2 pdsfdshx

Here is an alternative version of the code that doesn't rely on numpy at all (and so avoids numpy's fixed-width unicode string type):

from dataclasses import dataclass

import pandas as pd

@dataclass
class ExportEngine:

    def __post_init__(self):
        # changed in this example to use numeric column labels equal in number:
        #self.list = pd.DataFrame(columns=list(MyObject.CSVHeaders()))
        self.list = pd.DataFrame(columns=range(4 + 24))

    def export(self):
        self.prepare()
        self.list.to_csv("~/Desktop/testXYZ.csv")

    def prepare(self):
        values = ["Col1Value", "Col2Value", " Col3Value", "Col4Value"] + [""] * 24
        values[1:1 + 24] = [
                "123",
                "some_random_value_that_gets_truncated",
                "456"
            ] * 8
        self.list.loc[0] = values
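
To see that the list-based version really preserves the full strings, here is a small sketch of the same prepare logic outside the class. It writes the CSV to an in-memory buffer rather than to the Desktop path, which is a choice made here purely so the output can be inspected.

```python
import io

import pandas as pd

# Plain Python list: no fixed-width dtype is involved at any point.
values = ["Col1Value", "Col2Value", " Col3Value", "Col4Value"] + [""] * 24
values[1:1 + 24] = ["123", "some_random_value_that_gets_truncated", "456"] * 8

frame = pd.DataFrame(columns=range(4 + 24))
frame.loc[0] = values

# Write to an in-memory buffer instead of a file so the text can be checked.
buffer = io.StringIO()
frame.to_csv(buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

The slice assignment replaces 24 elements with 24 new ones, so the list keeps its 28 entries and matches the 28 columns of the DataFrame.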

Posted 2023-03-23
