Fix well analysis

Posted by dfddblmv on 2023-06-30 in Other

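So, I'm currently stuck on a bug.
I'm working with a huge dataset that contains the following:
Multiple samples for many wells. Each well is labeled with its own unique well ID, a radium contamination level, and a sample date.
For example: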

Well ID: AT091
Radium Level: 44.9
Sample Date: 3/18/2015

Well ID: AT091
Radium Level: 50.2
Sample Date: 2/18/2015

Well ID: AT091
Radium Level: 33.7 PCI/L
Sample Date: 7/28/2020

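I was asked to write a Python script that filters the original dataset and creates a new Excel worksheet based on the following conditions:
For each well, if the well was sampled only once in a given year, keep that sample. For each well, if the well was sampled multiple times in a given year, keep the sample date with the highest contamination level.
For example, if a well was sampled three times: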

Well ID: AT091
Radium Level: 44.9
Sample Date: 3/18/2015

Well ID: AT091
Radium Level: 50.2
Sample Date: 2/18/2015

Well ID: AT091
Radium Level: 33.7 PCI/L
Sample Date: 7/28/2020

The code should update the spreadsheet with the following:

Well ID: AT091
Radium Level: 50.2
Sample Date: 2/18/2015

Well ID: AT091
Radium Level: 33.7 PCI/L
Sample Date: 7/28/2020
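
For reference, the rule described above (one row per well and year, keeping the highest reading) can be sketched with a pandas groupby. This is only an illustration under stated assumptions: the column names "Well ID", "Radium Level" and "Sample Date" are taken from the sample records, the dates are assumed to be in M/D/YYYY form, and the helper name keep_highest_per_well_year is hypothetical. Because one sample value carries a unit suffix ("33.7 PCI/L"), the sketch extracts the numeric part before comparing.

```python
import pandas as pd


def keep_highest_per_well_year(df):
    """Keep, for each (well, year), the row with the highest radium level.

    Assumes columns named "Well ID", "Radium Level" and "Sample Date".
    """
    out = df.copy()
    # Pull out the numeric part so values such as "33.7 PCI/L" compare as floats.
    out["Radium Level"] = (
        out["Radium Level"]
        .astype(str)
        .str.extract(r"([-+]?\d*\.?\d+)")[0]
        .astype(float)
    )
    # Unparseable dates become NaT and are dropped below (assumed M/D/YYYY).
    out["Sample Date"] = pd.to_datetime(
        out["Sample Date"], format="%m/%d/%Y", errors="coerce"
    )
    out = out.dropna(subset=["Sample Date", "Radium Level"])
    year = out["Sample Date"].dt.year
    # idxmax returns the index label of the highest reading in each group;
    # a well sampled once in a year is trivially its own maximum.
    idx = out.groupby([out["Well ID"], year])["Radium Level"].idxmax()
    return out.loc[idx].sort_values(["Well ID", "Sample Date"]).reset_index(drop=True)
```

With the three AT091 records above, this keeps the 2/18/2015 row (50.2) and the 7/28/2020 row (33.7). If two samples in the same year tie for the highest level, idxmax keeps the first one it encounters.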

Here is the code I wrote:

def wells_sampled_once_per_year(well_numbers, formatted_dates, concentration):
    well_count = {}
    max_contamination = {}

    for well, date, conc in zip(well_numbers, formatted_dates, concentration):
        if date is None:
            continue
        try:
            year = pd.to_datetime(date).year
        except AttributeError:
            continue
        well_year = (well, year)
        if well_year in well_count:
            well_count[well_year] += 1
            max_contamination[well_year] = max(max_contamination[well_year], conc)
        else:
            well_count[well_year] = 1
            max_contamination[well_year] = conc

    sampled_once_per_year = [
        (well, date, conc, max_contamination[(well, pd.to_datetime(date).year)])
        for well, date, conc in zip(well_numbers, formatted_dates, concentration)
        if well_count[(well, pd.to_datetime(date).year)] == 1
    ]
    return sorted(sampled_once_per_year)

def wells_sampled_multiple_times_per_year(well_numbers, formatted_dates, concentration):
    well_count = {}
    max_contamination = {}
    
    for well, date, conc in zip(well_numbers, formatted_dates, concentration):
        if date is None:
            continue
        try:
            year = pd.to_datetime(date).year
        except AttributeError:
            continue
        well_year = (well, year)
        if well_year in well_count:
            well_count[well_year] += 1
            if conc > max_contamination[well_year]:
                max_contamination[well_year] = conc
        else:
            well_count[well_year] = 1
            max_contamination[well_year] = conc
    
    sampled_multiple_times_per_year = [
        (well, date, conc, max_contamination[(well, pd.to_datetime(date).year)])
        for well, date, conc in zip(well_numbers, formatted_dates, concentration)
        if well_count[(well, pd.to_datetime(date).year)] > 1 and conc == max_contamination[(well, pd.to_datetime(date).year)]
    ]
    
    # Remove duplicates from the list
    sampled_multiple_times_per_year = list(set(sampled_multiple_times_per_year))
    
    return sorted(sampled_multiple_times_per_year)

debugging

Source: https://stackoverflow.com/questions/76570230/fix-well-analysis

1 Answer

yb3bgrhw1#

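After the for loop, max_contamination contains almost all of the required information, except for the date. To simplify building the return value, I added the date inside the loop, i.e. change the last five lines of the loop to: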

…
            if conc > max_contamination[well_year][1]:  # [1]: conc
                max_contamination[well_year] = (date, conc)
        else:
            well_count[well_year] = 1
            max_contamination[well_year] = (date, conc)
    return [(well, date, conc) for (well, _), (date, conc) in max_contamination.items()]

(or sort the result, if needed).
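
Put together, the change suggested above might look like the following self-contained sketch. It is not the answerer's exact code: the function name highest_sample_per_well_year is hypothetical, datetime.strptime stands in for pd.to_datetime so the example runs without pandas (which assumes M/D/YYYY date strings as in the sample data), and the concentrations are assumed to be numeric already. The separate well_count dictionary is dropped here because a well sampled once in a year is its own maximum, so a single pass covers both cases.

```python
from datetime import datetime


def highest_sample_per_well_year(well_numbers, dates, concentration):
    """Return one (well, date, conc) tuple per well and year: the highest reading."""
    best = {}  # (well, year) -> (date, conc)
    for well, date, conc in zip(well_numbers, dates, concentration):
        try:
            # Assumption: dates look like "3/18/2015".
            year = datetime.strptime(date, "%m/%d/%Y").year
        except (TypeError, ValueError):
            continue  # skip missing or unparseable dates
        key = (well, year)
        # Store the date alongside the concentration, as the answer suggests.
        if key not in best or conc > best[key][1]:
            best[key] = (date, conc)
    # Sorting the items orders the result by well, then by year.
    return [(well, date, conc) for (well, _), (date, conc) in sorted(best.items())]
```

Calling it with the three AT091 samples from the question returns the 2/18/2015 reading for 2015 and the 7/28/2020 reading for 2020.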

Answered 2023-06-30
