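I have a DataFrame (df2) containing 30 years of hourly weather data. The data is repeated across multiple runs (see the run_file_year column). Here is a sample of the DataFrame: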
                                    Date  DHI  ...      WD    run_file_year
Date                                           ...
1991-01-01 00:00:00  01/01/1991 00:00:00  0.0  ...  281.70  1991_r1_r10i2p1
1991-01-01 01:00:00  01/01/1991 01:00:00  0.0  ...  281.01  1991_r1_r10i2p1
1991-01-01 02:00:00  01/01/1991 02:00:00  0.0  ...  274.43  1991_r1_r10i2p1
1991-01-01 03:00:00  01/01/1991 03:00:00  0.0  ...  280.94  1991_r1_r10i2p1
1991-01-01 04:00:00  01/01/1991 04:00:00  0.0  ...  272.53  1991_r1_r10i2p1
...                                  ...  ...  ...     ...              ...
2021-12-31 19:00:00  31/12/2021 19:00:00  0.0  ...  289.06   2021_r5_r9i2p1
2021-12-31 20:00:00  31/12/2021 20:00:00  0.0  ...  301.39   2021_r5_r9i2p1
2021-12-31 21:00:00  31/12/2021 21:00:00  0.0  ...  301.30   2021_r5_r9i2p1
2021-12-31 22:00:00  31/12/2021 22:00:00  0.0  ...  313.21   2021_r5_r9i2p1
2021-12-31 23:00:00  31/12/2021 23:00:00  0.0  ...  313.29   2021_r5_r9i2p1
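To make the layout concrete, here is a minimal toy frame with the same structure. It is a sketch only: the values are made up, and it assumes the part of run_file_year before the first underscore is always the year and the rest identifies the run. Because two runs cover the same hours, the timestamps repeat, and only run_file_year tells the rows apart; splitting it gives separate year and run columns.

```python
import pandas as pd

# Toy frame mimicking the layout above: a datetime index plus a
# 'run_file_year' string column. Two runs cover the same three hours,
# so every timestamp appears twice.
idx = pd.to_datetime(["2001-01-01 00:00", "2001-01-01 01:00", "2001-01-01 02:00"])
df = pd.DataFrame(
    {
        "WD": [281.70, 281.01, 274.43, 289.06, 301.39, 301.30],
        "run_file_year": ["2001_r1_r10i2p1"] * 3 + ["2001_r5_r9i2p1"] * 3,
    },
    index=idx.append(idx),
)
df.index.name = "Date"

# Assumption: text before the first underscore is the year, the remainder
# identifies the run.
parts = df["run_file_year"].str.split("_", n=1, expand=True)
df["year"] = parts[0].astype(int)
df["run"] = parts[1]

print(df[["year", "run"]].drop_duplicates())
```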
My current code is below (the lines marked with >>>> are the ones to focus on):
import numpy as np
import pandas as pd


def selectYear(d, m, config):
    """
    Use the Sandia method to select the most typical year of data
    for the given month.
    """
    d = d[d.index.month == m]  # >>>> line of interest
    n_bins = config['cdf_bins']
    weights = dict(config['weights'])
    total = weights.pop('total')
    score = dict.fromkeys(d.index.year, 0)
    fs = dict.fromkeys(weights)
    cdfs = dict.fromkeys(weights)
    i = 0
    # One row of bin midpoints per weighted variable
    x2 = np.zeros((len(weights), n_bins))
    for w in weights:
        cdfs[w] = {}
        fs[w] = {}
        # Calculate the long-term CDF for this weight
        # (cdf() is a helper defined elsewhere, not shown here)
        cdfs[w]['Long-Term'], bin_edges = cdf(d, w, n_bins)
        x = bin_edges[:-1] + np.diff(bin_edges) / 2  # midpoint of each bin
        x2[i, :] = x
        i += 1
        for yr in set(d.index.year):  # >>>> line of interest
            dy = d[d.index.year == yr]
            # Calculate the CDF for this weight for a specific year,
            # reusing the long-term bin edges
            cdfs[w][yr], b = cdf(dy, w, bin_edges)
            # Finkelstein-Schafer statistic (mean absolute difference
            # between the long-term CDF and this year's CDF)
            fs[w][yr] = np.mean(abs(cdfs[w]['Long-Term'] - cdfs[w][yr]))
            # Add the weighted FS value to this year's score
            score[yr] += fs[w][yr] * weights[w] / total
    # Select the top 5 years (lowest weighted scores = most typical)
    top5 = sorted(score, key=score.get)[:5]
    # ... (rest of the function, including what it returns as c, Q, not shown)


# Note: passing columns= keeps only the listed columns, so run_file_year
# is dropped from df2 at this point.
df2 = pd.DataFrame(df2, columns=['dry_bulb_temp', 'dew_point_temp', 'WS', 'GIR',
                                 'max_temp', 'min_temp', 'max_dew_point',
                                 'min_dew_point', 'max_wind'])
for i in range(12):
    c, Q = selectYear(df2, i + 1, config)
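The code above relies on a cdf() helper that isn't shown. For anyone who wants to run it, here is a hypothetical stand-in built on np.histogram. It assumes cdf(df, column, bins) returns the empirical CDF evaluated at each bin's right edge together with the bin edges, and that bins may be either a bin count or an array of edges; the real helper may well differ.

```python
import numpy as np
import pandas as pd


def cdf(df, column, bins):
    """Hypothetical stand-in for the question's cdf() helper.

    Returns the empirical CDF of df[column] at the right edge of each bin,
    plus the bin edges. `bins` may be an int (edges derived from the data)
    or an array of edges (reused so yearly CDFs share the long-term bins).
    """
    values = df[column].dropna().to_numpy()
    counts, bin_edges = np.histogram(values, bins=bins)
    total = counts.sum()
    # Guard against an empty subset so we never divide by zero
    cum = np.cumsum(counts) / total if total else np.zeros(len(counts))
    return cum, bin_edges


long_term, edges = cdf(pd.DataFrame({"t": [1.0, 2.0, 3.0, 4.0]}), "t", 4)
print(long_term, edges)
```

Passing the long-term bin_edges back in, as the yearly loop does, keeps both CDFs on the same grid so they can be subtracted element-wise; bin_edges always has one more entry than the number of bins.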
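At the moment, my code filters the data by month and then compares the data for each year. In other words, January of every year is evaluated (its CDF is computed) and the years are then ranked.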
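A small sketch of why the runs get merged: filtering with d.index.year == yr selects every row from that calendar year regardless of run, whereas grouping on run_file_year keeps one group per run. The data is a made-up toy example.

```python
import pandas as pd

# Toy data: the same January 2001 hours appear once per run.
hours = pd.to_datetime(["2001-01-01 00:00", "2001-01-01 01:00"])
d = pd.DataFrame(
    {
        "dry_bulb_temp": [5.0, 6.0, 9.0, 10.0],
        "run_file_year": ["2001_r1_r10i2p1"] * 2 + ["2001_r5_r9i2p1"] * 2,
    },
    index=hours.append(hours),
)

# Keying on the calendar year pools both runs into a single group...
by_year = {int(yr): len(d[d.index.year == yr]) for yr in set(d.index.year)}
# ...whereas keying on run_file_year keeps each run separate.
by_run = d.groupby("run_file_year").size().to_dict()

print(by_year)
print(by_run)
```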
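The problem is that, because there are multiple runs, there are several Januarys of 2001. My code currently pools their data instead of treating January 2001 from run 1 and January 2001 from run 2 as separate entities to be compared. My question is: is there a way to index on my column run_file_year (a string) and have the code loop over every run_file_year value (without listing them)?
Currently, the DataFrame d (already filtered to one month) is then indexed by year. I'd like to know whether I can index by the run_file_year column instead of by year, without having to write out every value to iterate over.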
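One possible approach, sketched under assumptions: replace the inner loop over set(d.index.year) with d.groupby('run_file_year'), which yields one sub-frame per distinct run_file_year value without listing them. The function name select_typical and its key parameter are hypothetical; the config layout ('cdf_bins', and 'weights' with a 'total' entry) mirrors the question's code, and cdf() is again a hypothetical stand-in so the block runs on its own. Note that pd.DataFrame(df2, columns=[...]) keeps only the listed columns, so run_file_year would have to be added to that list for this to work.

```python
import numpy as np
import pandas as pd


def cdf(df, column, bins):
    # Hypothetical stand-in for the question's cdf() helper: empirical CDF
    # at the right edge of each histogram bin, plus the bin edges.
    counts, edges = np.histogram(df[column].dropna().to_numpy(), bins=bins)
    total = counts.sum()
    cum = np.cumsum(counts) / total if total else np.zeros(len(counts))
    return cum, edges


def select_typical(d, m, config, key="run_file_year"):
    """Rank each distinct value of `key` for calendar month m by its
    weighted Finkelstein-Schafer score (lower = closer to the long term)."""
    d = d[d.index.month == m]
    n_bins = config["cdf_bins"]
    weights = dict(config["weights"])
    total = weights.pop("total")
    # One sub-frame per run_file_year value, discovered automatically
    groups = {name: grp for name, grp in d.groupby(key)}
    score = dict.fromkeys(groups, 0.0)
    for w in weights:
        # The long-term CDF pools every run for this month
        long_term, bin_edges = cdf(d, w, n_bins)
        for name, dy in groups.items():
            run_cdf, _ = cdf(dy, w, bin_edges)
            fs = np.mean(np.abs(long_term - run_cdf))
            score[name] += fs * weights[w] / total
    top5 = sorted(score, key=score.get)[:5]
    return top5, score


# Toy usage: three made-up runs covering the same four January hours
hours = pd.to_datetime(["2001-01-01 00:00", "2001-01-01 01:00",
                        "2001-01-01 02:00", "2001-01-01 03:00"])
runs = ["2001_r1_demo", "2001_r2_demo", "2001_r3_demo"]
d = pd.DataFrame(
    {"dry_bulb_temp": [1.0, 2.0, 3.0, 4.0] * 2 + [10.0] * 4,
     "run_file_year": [r for r in runs for _ in range(4)]},
    index=hours.append([hours, hours]),
)
config = {"cdf_bins": 3, "weights": {"dry_bulb_temp": 1, "total": 1}}
top5, score = select_typical(d, 1, config)
print(top5)
```

Grouping once, outside the weight loop, avoids re-filtering the month's data for every variable. The score dictionary is keyed by strings such as 2001_r1_r10i2p1, so the same calendar year from different runs is ranked separately. An equivalent but slower loop would iterate over d['run_file_year'].unique() and filter with d[d['run_file_year'] == rfy].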