How can I reshape data in a CSV into a structured format?

qrjkbowd  posted on 2023-02-01  in Other

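I have some .csv files generated by computational fluid dynamics simulations. They contain values such as velocity, pressure, and density at given points in space. For each point, its coordinates and the field values at that point are printed on one line of the csv file. For a 2D grid with x values 1, 2, 3 and y values 4, 5, 6, the data is arranged as follows: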

X Y (field variables)
1 4         :
2 4         :
3 4         :
1 5         :
2 5         :
3 5         :
1 6         :
2 6         :
3 6         :

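We start at the smallest y value, loop over all the x values, then move to the next y value and repeat.
What I want to do is convert this data into a structured format. That is, I want to put the data into an xarray Dataset that uses the x and y values as coordinate axes, or into a numpy ndarray of the appropriate shape (3x3 in this example). I can load the file into a Pandas DataFrame and then restructure the data by hand with for loops, but that is very slow even for moderately sized data files. I would like a faster approach that uses built-in functions from the Pandas, numpy, and xarray libraries.
Does anyone have any ideas?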


vyswwuz2  #1

I believe this can be done manually with a single for loop that iterates only over the list of your state variable (i.e. rho).

# https://stackoverflow.com/questions/75278985/how-can-i-reshape-data-in-a-csv-into-a-structured-format
import time
start = time.time()

import numpy as np
import pandas as pd

df = pd.read_csv('test_data.csv')

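# smallest coordinate on each axis, used as the index offset
min_x_coor = min(df['x'])
min_y_coor = min(df['y'])

# grid size; this assumes integer coordinates with unit spacing
x_dim = max(df['x']) - min_x_coor + 1
y_dim = max(df['y']) - min_y_coor + 1

rho_array = np.zeros((x_dim, y_dim))
for p in range(0, len(df['rho'])):
    # map each row's coordinates to zero-based array indices
    x_coor = df['x'][p] - min_x_coor
    y_coor = df['y'][p] - min_y_coor

    rho_array[x_coor, y_coor] = df['rho'][p]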

print(rho_array)
print(time.time() - start)

For the 3x3 data:

x,y,rho
1,4,0.503
2,4,0.642
3,4,0.041
1,5,0.340
2,5,0.269
3,5,0.288
1,6,0.511
2,6,0.732
3,6,0.195

Output:

[[0.503 0.34  0.511]
 [0.642 0.269 0.732]
 [0.041 0.288 0.195]]
0.31889796257019043

For the 4x4 data:

x,y,rho
1,4,0.503
2,4,0.642
3,4,0.041
4,4,0.964
1,5,0.340
2,5,0.269
3,5,0.288
4,5,0.702
1,6,0.511
2,6,0.732
3,6,0.195
4,6,0.226
1,7,0.957
2,7,0.032
3,7,0.304
4,7,0.607

Output:

[[0.503 0.34  0.511 0.957]
 [0.642 0.269 0.732 0.032]
 [0.041 0.288 0.195 0.304]
 [0.964 0.702 0.226 0.607]]
0.48914408683776855
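
The row-by-row loop above can also be written as a single vectorized assignment. The following is only a sketch under assumptions, not part of the original answer: it reuses the 3x3 sample data inline, and names such as `csv_text`, `x_idx`, and `y_idx` are made up for illustration. It uses `np.unique(..., return_inverse=True)` to turn coordinates into array indices, so it should not depend on the coordinates being integers with unit spacing.

```python
from io import StringIO

import numpy as np
import pandas as pd

# 3x3 sample data from above, inlined so the example is self-contained
csv_text = """x,y,rho
1,4,0.503
2,4,0.642
3,4,0.041
1,5,0.340
2,5,0.269
3,5,0.288
1,6,0.511
2,6,0.732
3,6,0.195
"""

df = pd.read_csv(StringIO(csv_text))

# np.unique returns the sorted axis values and, for every row, the position
# of that row's coordinate along the axis
x_vals, x_idx = np.unique(df["x"].to_numpy(), return_inverse=True)
y_vals, y_idx = np.unique(df["y"].to_numpy(), return_inverse=True)

# start from NaN so any grid point missing from the file stays visible
rho_array = np.full((x_vals.size, y_vals.size), np.nan)

# one fancy-indexed assignment replaces the Python-level loop
rho_array[x_idx, y_idx] = df["rho"].to_numpy()

print(rho_array)
```

Here `x_vals` and `y_vals` hold the coordinate values for each axis, so they can be kept alongside `rho_array` if the physical coordinates are needed later.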

chy5wohz  #2

Here is an example that uses both Xarray and Pandas:

from io import StringIO

import pandas as pd
import xarray as xr

s = StringIO("""x,y,rho
1,4,0.503
2,4,0.642
3,4,0.041
4,4,0.964
1,5,0.340
2,5,0.269
3,5,0.288
4,5,0.702
1,6,0.511
2,6,0.732
3,6,0.195
4,6,0.226
1,7,0.957
2,7,0.032
3,7,0.304
4,7,0.607
""")

# open csv, create a MultiIndex from x/y columns
df = pd.read_csv(s, index_col=['x', 'y'])

# convert to an Xarray Dataset; to_xarray() unstacks the (x, y) MultiIndex
# into separate x and y dimensions, so no further reshaping is needed
ds = df.to_xarray()

This produces an Xarray Dataset that looks like this:

<xarray.Dataset>
Dimensions:  (x: 4, y: 4)
Coordinates:
  * x        (x) int64 1 2 3 4
  * y        (y) int64 4 5 6 7
Data variables:
    rho      (x, y) float64 0.503 0.34 0.511 0.957 ... 0.964 0.702 0.226 0.607
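
If a plain numpy ndarray is wanted instead of a Dataset, two other routes may be worth trying. This is a hedged sketch, not taken from the answers here: it reuses the 3x3 sample data, and `csv_text`, `rho_pivot`, and `rho_reshape` are illustrative names. The first route uses `DataFrame.pivot`; the second relies on the row order described in the question (x varies fastest, y slowest) and only holds if the file is complete and strictly in that order.

```python
from io import StringIO

import numpy as np
import pandas as pd

# 3x3 sample data, inlined so the example is self-contained
csv_text = """x,y,rho
1,4,0.503
2,4,0.642
3,4,0.041
1,5,0.340
2,5,0.269
3,5,0.288
1,6,0.511
2,6,0.732
3,6,0.195
"""

df = pd.read_csv(StringIO(csv_text))

# Route 1: pivot so that rows are x values and columns are y values
grid = df.pivot(index="x", columns="y", values="rho")
rho_pivot = grid.to_numpy()

# Route 2: reshape directly, assuming every grid point is present and the
# rows are ordered with x varying fastest; the reshaped rows then correspond
# to y, so transpose to get an (x, y) layout
nx = df["x"].nunique()
ny = df["y"].nunique()
rho_reshape = df["rho"].to_numpy().reshape(ny, nx).T

print(rho_pivot)
print(np.array_equal(rho_pivot, rho_reshape))
```

The pivot route does not care about row order and leaves NaN where a point is missing, while the reshape route avoids building an index at all but gives wrong results without warning if the ordering assumption is violated.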
