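I have a DataFrame with a column of geographic coordinate lists (grouped by the transmission line they belong to), plus a separate list of coordinates outside the DataFrame. For each coordinate in that list whose sample value is 'DS_3', I need to find which row of the DataFrame contains it, but with almost 6000 coordinates this takes far too long. I suspect there is a vectorized way to do this, since I've learned that looping over a DataFrame is impractical, but I don't know much about vectorized methods.
Here is the DataFrame: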
         id                                        coordinates  voltage        length  voltage x length
0        36  [[-81.443569, 28.470022], [-81.446726, 28.4740...      230   2788.450481      6.413436e+05
1        69  [[-82.402208, 27.907588], [-82.406592, 27.9084...       69   1486.634968      1.025778e+05
2        87  [[-80.38392, 25.748665], [-80.383758, 25.74358...       69   3395.795388      2.343099e+05
3       128  [[-81.423956, 28.410278], [-81.424811, 28.4053...       69   5231.189711      3.609521e+05
4       138  [[-81.843314, 30.572359], [-81.844404, 30.5685...      230   2716.984353      6.249064e+05
...     ...                                                ...      ...           ...               ...
3061  68184  [[-81.251491, 28.53718], [-81.250396, 28.53283...       69  19243.512450      1.327802e+06
3062  68189  [[-82.669886, 28.961533], [-82.664782, 28.9615...      230  27463.901761      6.316697e+06
3063  68196  [[-81.157196, 29.000982], [-81.157041, 28.9958...      500  90524.038042      4.526202e+07
3064  68199  [[-80.549594, 28.481094], [-80.551733, 28.4857...      115   7185.881445      8.263764e+05
3065  68211                            [[-80.44025, 25.81403]]      115    673.881802      7.749641e+04
Here is the beginning of the coordinate list:
[[-81.708274, 31.095992], [-81.708763, 31.090911], [-81.709349, 31.085841], [-81.710002, 31.080779], [-81.710627, 31.075713], [-81.711167, 31.070638], [-81.711649, 31.065557], [-81.712316, 31.060497], [-81.713036, 31.055444], [-81.713757, 31.050391], [-81.714478, 31.045338], [-81.715199, 31.040285], [-82.184384, 31.058297], [-82.188367, 31.061488], [-82.192045, 31.065027], [-82.195735, 31.068554], [-82.199426, 31.072079], [-82.20315, 31.075567], [-82.207127, 31.078767], [-82.211101, 31.081969], [-82.215077, 31.08517], [-82.219057, 31.088366], [-82.223033, 31.091567], [-82.227002, 31.094776], [-82.230978, 31.097977], [-82.234959, 31.101171], [-82.238934, 31.104373], [-82.242912, 31.107571], [-82.24689, 31.110769], [-82.250862, 31.113975], [-82.255483, 31.116065], [-82.260227, 31.117947], [-82.26497, 31.119832], [-82.269711, 31.121722], [-82.274457, 31.123602], [-82.279199, 31.125489], [-82.283947, 31.127364], [-82.28869, 31.129249], [-82.293435, 31.131131], [-82.298182, 31.133006], [-82.30292, 31.134905] ...
This is the code I'm currently using; any tips would be greatly appreciated!
for index, row in df.iterrows():
    for i in range(len(coordinates)):
        if coordinates[i] in row['coordinates'] and sample[i]['sample_0'] == 'DS_3':
            if row['id'] in damaged_lines:
                break
            else:
                damaged_lines.append(row['id'])
Updated code:
for i in range(len(coordinates)):
    for index, row in df.iterrows():
        if coordinates[i] in row['coordinates'] and sample[i]['sample_0'] == 'DS_3':
            if row['id'] in damaged_lines:
                break
            else:
                damaged_lines.append(row['id'])
                break
1 Answer
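Here is an example of how to do this with a binary search. It doesn't check for 'DS_3', but that is easy to add. I generated 100 rows, each with a random number of coordinate pairs. From those I extracted a flat list of coordinates along with the index of the row each one came from. I then drew a random sample from that list to look up, and ran a simple loop over the sample, doing a binary search on the list for each coordinate.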
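The steps above could be sketched roughly as follows. This is only a minimal sketch under assumptions: the random data, the value ranges, and the names `rows`, `flat`, `find_rows`, and `sample_points` are all hypothetical, and the flat list is sorted once up front because binary search needs sorted input.

```python
import bisect
import random

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical stand-in for df['coordinates']: 100 rows, each holding
# between 1 and 20 random [lon, lat] pairs.
rows = [
    [[round(random.uniform(-83, -80), 6), round(random.uniform(25, 31), 6)]
     for _ in range(random.randint(1, 20))]
    for _ in range(100)
]

# Flatten to (lon, lat, row_index) tuples and sort once so bisect can be used.
flat = sorted(
    (lon, lat, idx) for idx, coords in enumerate(rows) for lon, lat in coords
)

def find_rows(coord, flat):
    """Return the row indices whose coordinate list contains coord."""
    lon, lat = coord
    # The 2-tuple (lon, lat) sorts before every (lon, lat, idx), so
    # bisect_left lands on the first matching entry, if there is one.
    pos = bisect.bisect_left(flat, (lon, lat))
    found = []
    while pos < len(flat) and flat[pos][:2] == (lon, lat):
        found.append(flat[pos][2])
        pos += 1
    return found

# Draw a random sample of known coordinates and look each one up.
sample_points = random.sample(flat, 5)
for lon, lat, expected_idx in sample_points:
    print((lon, lat), "->", find_rows([lon, lat], flat))
```

Each lookup costs O(log n) after the one-time sort, instead of scanning every row for every coordinate.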
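The 'DS_3' check can be folded into the lookup loop. The sketch below is one possible way to do it, and it swaps binary search for a plain dictionary lookup. It assumes, as the question's code suggests, that `sample` is a list parallel to `coordinates` whose items have a 'sample_0' key; the toy values and the names `ids`, `line_coords`, and `coord_to_ids` are made up for illustration. Unlike the updated code in the question, it collects every line that contains a matching coordinate rather than stopping at the first.

```python
from collections import defaultdict

# Hypothetical toy data standing in for df['id'] and df['coordinates'].
ids = [36, 69, 87]
line_coords = [
    [[0.0, 0.0], [0.1, 0.1]],
    [[1.0, 1.0]],
    [[2.0, 2.0], [0.1, 0.1]],
]

# Hypothetical stand-ins for the external coordinate list and its samples.
coordinates = [[0.1, 0.1], [1.0, 1.0], [2.0, 2.0], [9.0, 9.0]]
sample = [
    {'sample_0': 'DS_3'},
    {'sample_0': 'DS_1'},
    {'sample_0': 'DS_3'},
    {'sample_0': 'DS_3'},
]

# Build the lookup once: coordinate tuple -> ids of the lines containing it.
coord_to_ids = defaultdict(set)
for line_id, coords in zip(ids, line_coords):
    for lon, lat in coords:
        coord_to_ids[(lon, lat)].add(line_id)

# One pass over the external list, keeping only the 'DS_3' entries.
damaged_lines = set()
for coord, s in zip(coordinates, sample):
    if s['sample_0'] == 'DS_3':
        # .get avoids adding empty entries for coordinates that match nothing.
        damaged_lines |= coord_to_ids.get(tuple(coord), set())

print(sorted(damaged_lines))  # [36, 87]
```

With the real data, `zip(df['id'], df['coordinates'])` would take the place of `zip(ids, line_coords)`, so the DataFrame is traversed only once.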