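I have a 104 x 4 DataFrame, but columns 3 and 4 carry what looks like index data, and I need to remove it. Below is a sample from the 104 x 4 DataFrame. The numbers starting with 279978, then 568296, and so on need to be removed. I tried resetting the index with drop=True, but it did nothing. Thanks, everyone.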
week_number plant_name normalized_wind_speed normalized_temperature
0 52 MONTAGUE "0 0.701286
1 0.767225
2 0.789204
3 0.921082
4 1.074940
...
279978 -1.101045
279979 -0.969167
279980 -0.947187
279981 -1.035106
279982 -1.057085
Name: wind_speed_ms, Length: 5520, dtype: float64" "0 0.933228
1 0.951533
2 1.043057
3 1.043057
4 0.988143
...
279978 -2.746031
279979 -2.746031
279980 -2.727726
279981 -2.764335
279982 -2.855859
Name: air_temp_c, Length: 5520, dtype: float64"
1 52 STAR POINT "288264 -0.131748
288265 -0.078411
288266 0.054931
288267 -0.051743
288268 0.454959
...
568296 -1.411837
568297 -1.251826
568298 -1.331832
568299 -1.385169
568300 -1.305163
Name: wind_speed_ms, Length: 5520, dtype: float64" "288264 0.969358
288265 0.988484
288266 0.988484
288267 0.950232
288268 0.873729
...
568296 -2.836685
568297 -2.817559
568298 -2.741056
568299 -2.721930
568300 -2.721930
Name: air_temp_c, Length: 5520, dtype: float64"
Here is the result showing more of the data:
print(norm_data.head().to_dict("list"))
{'week_number': [52, 52, 51, 51, 50], 'plant_name': ['MONTAGUE', 'STAR POINT', 'MONTAGUE', 'STAR POINT', 'MONTAGUE'], 'normalized_wind_speed': [0 0.701286
1 0.767225
2 0.789204
3 0.921082
4 1.074940
279978 -1.101045
279979 -0.969167
279980 -0.947187
279981 -1.035106
279982 -1.057085
Name: wind_speed_ms, Length: 5520, dtype: float64, 288264 -0.131748
288265 -0.078411
288266 0.054931
288267 -0.051743
288268 0.454959
568296 -1.411837
568297 -1.251826
568298 -1.331832
568299 -1.385169
568300 -1.305163
Name: wind_speed_ms, Length: 5520, dtype: float64, 137 -0.437615
138 -0.574035
139 -0.733191
140 -0.801401
141 -0.733191
279738 1.040267
279739 0.972058
279740 0.858374
279741 0.790164
279742 0.767428
Name: wind_speed_ms, Length: 5544, dtype: float64, 288376 -1.123354
288377 -1.038684
288378 -1.123354
288379 -1.038684
288380 -0.925791
568357 -1.716045
568358 -0.869344
568359 -0.700004
568360 -0.558887
568361 -0.530664
Name: wind_speed_ms, Length: 5544, dtype: float64, 224 1.381725
225 1.559191
226 1.537008
227 1.448275
228 1.359542
280304 -0.259837
280305 -0.503853
280306 -0.636953
280307 -0.636953
280308 -0.171104
Name: wind_speed_ms, Length: 5544, dtype: float64], 'normalized_temperature': [0 0.933228
1 0.951533
2 1.043057
3 1.043057
4 0.988143
279978 -2.746031
279979 -2.746031
279980 -2.727726
279981 -2.764335
279982 -2.855859
Name: air_temp_c, Length: 5520, dtype: float64, 288264 0.969358
288265 0.988484
288266 0.988484
288267 0.950232
288268 0.873729
568296 -2.836685
568297 -2.817559
568298 -2.741056
568299 -2.721930
568300 -2.721930
Name: air_temp_c, Length: 5520, dtype: float64, 137 -0.428008
138 -0.375312
139 -0.498269
140 -0.691487
141 -0.884705
279738 0.239473
279739 0.221907
279740 0.186777
279741 0.186777
279742 0.221907
Name: air_temp_c, Length: 5544, dtype: float64, 288376 -2.485504
288377 -2.688291
288378 -2.872642
288379 -3.038558
288380 -3.167603
568357 -3.591611
568358 -3.683786
568359 -3.720657
568360 -3.646916
568361 -3.462565
Name: air_temp_c, Length: 5544, dtype: float64, 224 0.445641
225 0.408846
226 0.335257
227 0.224873
228 0.059297
280304 1.199931
280305 1.181534
280306 1.181534
280307 1.236726
280308 1.236726
Name: air_temp_c, Length: 5544, dtype: float64]}
Here is the loop that builds the df shown above:
for week in week_numbers:
    # Get the data for the current week number
    current_week_data = ncDatad[ncDatad['week'] == week]
    # Loop over each plant name
    for site in sites:
        # Get the data for the current plant name
        # (mask built from current_week_data so its index matches the frame being filtered)
        current_plant_data = current_week_data[current_week_data['plant_name'] == site]
        # Calculate the mean and standard deviation for wind speed
        wind_speed_mean = current_plant_data['wind_speed_ms'].mean()
        wind_speed_std = current_plant_data['wind_speed_ms'].std()
        # Calculate the mean and standard deviation for temperature
        temperature_mean = current_plant_data['air_temp_c'].mean()
        temperature_std = current_plant_data['air_temp_c'].std()
        # Normalize the wind speed values (the result is a whole Series, index included)
        normalized_wind_speed = (current_plant_data['wind_speed_ms'] - wind_speed_mean) / wind_speed_std
        # Normalize the temperature values (also a whole Series)
        normalized_temperature = (current_plant_data['air_temp_c'] - temperature_mean) / temperature_std
        # Create a new row for the current plant name
        new_row = {
            'week_number': week,
            'plant_name': site,
            'normalized_wind_speed': normalized_wind_speed,
            'normalized_temperature': normalized_temperature
        }
        # Add the new row to the dataframe
        # (DataFrame.append was removed in pandas 2.0, so this line needs an older pandas)
        norm_data = norm_data.append(new_row, ignore_index=True)
# Print the normalized values
norm_data = norm_data.reset_index(drop=True)
print(norm_data)
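A quick check helps explain why reset_index(drop=True) appears to do nothing here: the numbers such as 279978 are not the index of norm_data, they are the index of a Series stored inside each cell. The sketch below reproduces that with a made-up one-row frame; the values and the names inner, cells and df are illustrative assumptions, not the real data.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of one cell from the question: a Series with its own index.
inner = pd.Series([0.70, 0.77, -1.10], index=[0, 1, 279978], name="wind_speed_ms")

# Put the Series into a one-element object array so it is stored as a single cell.
cells = np.empty(1, dtype=object)
cells[0] = inner

df = pd.DataFrame({"week_number": [52], "plant_name": ["MONTAGUE"]})
df["normalized_wind_speed"] = cells

# Each cell is a Series, which is why its index shows up when the frame is printed.
cell_types = df["normalized_wind_speed"].map(type)
print(cell_types.iloc[0])

# reset_index only touches the outer index of df; the Series inside the cell is unchanged.
reset = df.reset_index(drop=True)
first_cell = reset["normalized_wind_speed"].iloc[0]
print(list(first_cell.index))
```

So the index-looking numbers can only go away by changing what is stored in the cells, not by resetting the outer index.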
1 Answer
IIUC, you need to modify your code slightly (using to_list) to get a list instead of a Series.
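The answer's original code did not survive here, so the following is only a sketch of the to_list idea, not the answerer's exact code. It assumes a small made-up ncDatad with the column names from the question, swaps the nested loops for a groupby over week and plant_name, and collects rows in a list (the names zscore and rows are illustrative) rather than calling DataFrame.append.

```python
import pandas as pd

# Hypothetical stand-in for ncDatad; the real frame has thousands of rows per group.
ncDatad = pd.DataFrame({
    "week": [52, 52, 52, 52, 51, 51],
    "plant_name": ["MONTAGUE", "MONTAGUE", "STAR POINT", "STAR POINT",
                   "MONTAGUE", "MONTAGUE"],
    "wind_speed_ms": [1.0, 3.0, 2.0, 6.0, 4.0, 8.0],
    "air_temp_c": [10.0, 14.0, 5.0, 7.0, 0.0, 2.0],
})


def zscore(values):
    """Standardize a Series: subtract its mean, divide by its sample std."""
    return (values - values.mean()) / values.std()


rows = []
# One iteration per (week, plant) pair, in order of first appearance.
for (week, site), group in ncDatad.groupby(["week", "plant_name"], sort=False):
    rows.append({
        "week_number": week,
        "plant_name": site,
        # to_list() drops the Series index, so only the values end up in the cell.
        "normalized_wind_speed": zscore(group["wind_speed_ms"]).to_list(),
        "normalized_temperature": zscore(group["air_temp_c"]).to_list(),
    })

norm_data = pd.DataFrame(rows)
print(norm_data)
```

Each cell then holds a plain Python list, so the numbers like 279978 no longer appear. If one normalized value per row is preferred over lists in cells, keeping the data in long format is another option.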