Keeping only 2 columns of a CSV table (Python, Pandas)

Posted by 0aydgbwb on 2022-12-09 in Python

I have a .csv file with lines like these:

result,table,_start,_stop,_time,_value,_field,_measurement,device
,0,2022-10-23T08:22:04.124457277Z,2022-11-22T08:22:04.124457277Z,2022-10-24T12:12:35Z,44.61,power,shellies,Shelly_Kitchen-C_CoffeMachine/relay/0
,0,2022-10-23T08:22:04.124457277Z,2022-11-22T08:22:04.124457277Z,2022-10-24T12:12:40Z,17.33,power,shellies,Shelly_Kitchen-C_CoffeMachine/relay/0
,0,2022-10-23T08:22:04.124457277Z,2022-11-22T08:22:04.124457277Z,2022-10-24T12:12:45Z,41.2,power,shellies,Shelly_Kitchen-C_CoffeMachine/relay/0
,0,2022-10-23T08:22:04.124457277Z,2022-11-22T08:22:04.124457277Z,2022-10-24T12:12:51Z,33.49,power,shellies,Shelly_Kitchen-C_CoffeMachine/relay/0
,0,2022-10-23T08:22:04.124457277Z,2022-11-22T08:22:04.124457277Z,2022-10-24T12:12:56Z,55.68,power,shellies,Shelly_Kitchen-C_CoffeMachine/relay/0
,0,2022-10-23T08:22:04.124457277Z,2022-11-22T08:22:04.124457277Z,2022-10-24T12:12:57Z,55.68,power,shellies,Shelly_Kitchen-C_CoffeMachine/relay/0
,0,2022-10-23T08:22:04.124457277Z,2022-11-22T08:22:04.124457277Z,2022-10-24T12:13:02Z,25.92,power,shellies,Shelly_Kitchen-C_CoffeMachine/relay/0
,0,2022-10-23T08:22:04.124457277Z,2022-11-22T08:22:04.124457277Z,2022-10-24T12:13:08Z,5.71,power,shellies,Shelly_Kitchen-C_CoffeMachine/relay/0

I need them to look like this:

                   time  value
0  2022-10-24T12:12:35Z  44.61
1  2022-10-24T12:12:40Z  17.33
2  2022-10-24T12:12:45Z  41.20
3  2022-10-24T12:12:51Z  33.49
4  2022-10-24T12:12:56Z  55.68

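I need this for my anomaly-detection code, so that I don't have to delete columns and so on by hand, or at least not all of them. I can't do it in the program that collects the wattage data from the machine. I tried this, but it doesn't work well enough: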

import pandas as pd

df = pd.read_csv('coffee_machine_2022-11-22_09_22_influxdb_data.csv')
df['_time'] = pd.to_datetime(df['_time'], format='%Y-%m-%dT%H:%M:%SZ')
df = pd.pivot(df, index='_time', columns='_field', values='_value')
df = df.interpolate(method='linear')  # not necessary
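The pivot step of this attempt can be inspected on a few of the sample rows. Because every row has _field equal to power, pivoting on _field yields a frame with a single column named power, indexed by _time, rather than the two-column time/value layout wanted. interpolate also returns a new DataFrame instead of modifying the existing one, so its result only survives if it is assigned. A minimal sketch, assuming the rows are read from an in-memory string (the name CSV is hypothetical) instead of the real export:

```python
import io

import pandas as pd

# Hypothetical in-memory stand-in for the export: three rows copied from the sample above
CSV = """result,table,_start,_stop,_time,_value,_field,_measurement,device
,0,2022-10-23T08:22:04.124457277Z,2022-11-22T08:22:04.124457277Z,2022-10-24T12:12:35Z,44.61,power,shellies,Shelly_Kitchen-C_CoffeMachine/relay/0
,0,2022-10-23T08:22:04.124457277Z,2022-11-22T08:22:04.124457277Z,2022-10-24T12:12:40Z,17.33,power,shellies,Shelly_Kitchen-C_CoffeMachine/relay/0
,0,2022-10-23T08:22:04.124457277Z,2022-11-22T08:22:04.124457277Z,2022-10-24T12:12:45Z,41.2,power,shellies,Shelly_Kitchen-C_CoffeMachine/relay/0
"""

df = pd.read_csv(io.StringIO(CSV))
df["_time"] = pd.to_datetime(df["_time"], format="%Y-%m-%dT%H:%M:%SZ")

# Every row has _field == "power", so the pivot produces exactly one column, "power",
# and _time moves into the index instead of staying a regular column
wide = df.pivot(index="_time", columns="_field", values="_value")
print(wide.columns.tolist())  # ['power']

# interpolate() returns a new DataFrame; assign it back to keep the result
wide = wide.interpolate(method="linear")
```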

It gives the following output:

0
9      83.908
10     80.342
11     79.178
12     75.621
13     72.826
...       ...
73522  10.726
73523   5.241

Answer #1 (flvtvl50)

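The typical way in the pandas ecosystem to project a DataFrame down onto a subset of its columns is to index it with a list of column names: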

df = df[['_time', '_value']]
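A small illustration of this projection, assuming a hypothetical frame that holds only a few of the export's columns: passing a list of labels returns a DataFrame with the columns in the order of the list, while a single label without the inner list returns a Series.

```python
import pandas as pd

# Hypothetical frame holding a few of the export's columns
df = pd.DataFrame({
    "table": [0, 0, 0],
    "_time": ["2022-10-24T12:12:35Z", "2022-10-24T12:12:40Z", "2022-10-24T12:12:45Z"],
    "_value": [44.61, 17.33, 41.2],
    "_field": ["power", "power", "power"],
})

# A list of labels inside [] selects those columns and returns a DataFrame,
# with the columns in the order given in the list
subset = df[["_time", "_value"]]
print(subset)

# A single label (no inner list) returns a Series instead of a DataFrame
values = df["_value"]
print(type(subset).__name__, type(values).__name__)  # DataFrame Series
```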

Answer #2 (zkure5ic)

You can simply use the usecols keyword argument of pandas.read_csv:

df = pd.read_csv('coffee_machine_2022-11-22_09_22_influxdb_data.csv', usecols=["_time", "_value"])
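One detail worth knowing about usecols: the pandas documentation states that element order in usecols is ignored, so the selected columns come back in the order they appear in the file. A minimal sketch, assuming two sample rows read from an in-memory string (CSV is a hypothetical name) in place of the real file:

```python
import io

import pandas as pd

# Hypothetical in-memory stand-in: two rows copied from the question's sample
CSV = """result,table,_start,_stop,_time,_value,_field,_measurement,device
,0,2022-10-23T08:22:04.124457277Z,2022-11-22T08:22:04.124457277Z,2022-10-24T12:12:35Z,44.61,power,shellies,Shelly_Kitchen-C_CoffeMachine/relay/0
,0,2022-10-23T08:22:04.124457277Z,2022-11-22T08:22:04.124457277Z,2022-10-24T12:12:40Z,17.33,power,shellies,Shelly_Kitchen-C_CoffeMachine/relay/0
"""

# Only the listed columns are kept; the others are skipped while reading
raw = pd.read_csv(io.StringIO(CSV), usecols=["_value", "_time"])

# The order of the usecols list is ignored: columns keep their file order
print(raw.columns.tolist())  # ['_time', '_value']

# Select with a list afterwards if a specific column order is required
ordered = raw[["_value", "_time"]]
print(ordered.columns.tolist())  # ['_value', '_time']
```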

NB: If you need to read the entire .csv file and only then select a subset of columns, the pandas core developers suggest using pandas.DataFrame.loc. Otherwise, with the df = df[subset_of_cols] syntax, you'll get a warning as soon as you start assigning values in the new sub-DataFrame:

SettingWithCopyWarning:

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

So, in your case, you can use:

df = pd.read_csv('coffee_machine_2022-11-22_09_22_influxdb_data.csv')
df = df.loc[:, ["_time", "_value"]]  # instead of df[["_time", "_value"]]

Another option is pandas.DataFrame.copy:

df = pd.read_csv('coffee_machine_2022-11-22_09_22_influxdb_data.csv')
df = df[["_time", "_value"]].copy()
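With an explicit copy, the subset is an independent DataFrame, so later assignments (for example converting _time to datetimes) change only the subset and leave the original frame untouched. A minimal sketch, assuming a hypothetical two-row frame in place of the one returned by pd.read_csv:

```python
import pandas as pd

# Hypothetical stand-in for the frame returned by pd.read_csv on the full file
df = pd.DataFrame({
    "_time": ["2022-10-24T12:12:35Z", "2022-10-24T12:12:40Z"],
    "_value": [44.61, 17.33],
    "_field": ["power", "power"],
})

# .copy() makes the subset independent of df, so assigning to it is safe
subset = df[["_time", "_value"]].copy()
subset["_time"] = pd.to_datetime(subset["_time"], format="%Y-%m-%dT%H:%M:%SZ")

# Only the copy was converted; the original column still holds strings
print(pd.api.types.is_datetime64_any_dtype(subset["_time"]))  # True
print(pd.api.types.is_datetime64_any_dtype(df["_time"]))      # False
```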

Answer #3 (gpnt7bae)

.read_csv has a usecols parameter for specifying which columns you want in the DataFrame.

df = pd.read_csv(f, header=0, usecols=['_time', '_value'])  # f: path to the CSV file
print(df)

                  _time  _value
0  2022-10-24T12:12:35Z   44.61
1  2022-10-24T12:12:40Z   17.33
2  2022-10-24T12:12:45Z   41.20
3  2022-10-24T12:12:51Z   33.49
4  2022-10-24T12:12:56Z   55.68
5  2022-10-24T12:12:57Z   55.68
6  2022-10-24T12:13:02Z   25.92
7  2022-10-24T12:13:08Z    5.71
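The desired output in the question uses the headers time and value without the leading underscores. A minimal sketch that combines usecols with DataFrame.rename to get that layout, assuming sample rows read from an in-memory string (CSV is a hypothetical name); the datetime conversion is an optional extra step for time-based analysis, not something the question requires:

```python
import io

import pandas as pd

# Hypothetical in-memory stand-in: three rows copied from the question's sample
CSV = """result,table,_start,_stop,_time,_value,_field,_measurement,device
,0,2022-10-23T08:22:04.124457277Z,2022-11-22T08:22:04.124457277Z,2022-10-24T12:12:35Z,44.61,power,shellies,Shelly_Kitchen-C_CoffeMachine/relay/0
,0,2022-10-23T08:22:04.124457277Z,2022-11-22T08:22:04.124457277Z,2022-10-24T12:12:40Z,17.33,power,shellies,Shelly_Kitchen-C_CoffeMachine/relay/0
,0,2022-10-23T08:22:04.124457277Z,2022-11-22T08:22:04.124457277Z,2022-10-24T12:12:45Z,41.2,power,shellies,Shelly_Kitchen-C_CoffeMachine/relay/0
"""

df = pd.read_csv(io.StringIO(CSV), usecols=["_time", "_value"])

# Drop the leading underscores so the headers read "time" and "value"
df = df.rename(columns={"_time": "time", "_value": "value"})

# Optional: parse the timestamps into datetimes
df["time"] = pd.to_datetime(df["time"], format="%Y-%m-%dT%H:%M:%SZ")

print(df)
```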
