我有一个这样的数据框
Chr Start Ref Alt Revel_Score RefSeq CHR BP SNP P
21 9 133710834 A C 0.429 NM_005157.6 9 133710834 1 0.571001
22 9 133710834 A G 0.424 NM_005157.6 9 133710834 1 0.576001
23 9 133710834 A T 0.432 NM_005157.6 9 133710834 1 0.568001
24 9 133710835 T A 0.395 NM_005157.6 9 133710835 1 0.605001
25 9 133710835 T C 0.386 NM_005157.6 9 133710835 1 0.614001
26 9 133710835 T G 0.389 NM_005157.6 9 133710835 1 0.611001
27 9 133710836 G A 0.429 NM_005157.6 9 133710836 1 0.571001
28 9 133710836 G C 0.418 NM_005157.6 9 133710836 1 0.582001
29 9 133710836 G T 0.418 NM_005157.6 9 133710836 1 0.582001
30 9 133710837 T A 0.381 NM_005157.6 9 133710837 1 0.619001
31 9 133710837 T G 0.278 NM_005157.6 9 133710837 1 0.722001
32 9 133710838 T C 0.378 NM_005157.6 9 133710838 1 0.622001
33 9 133710838 T G 0.327 NM_005157.6 9 133710838 1 0.673001
34 9 133710839 G C 0.352 NM_005157.6 9 133710839 1 0.648001
35 9 133710839 G T 0.352 NM_005157.6 9 133710839 1 0.648001
# The same in a dict to allow to work
IGV_table_limited.head(15).to_dict()
{'Chr': {21: '9',
22: '9',
23: '9',
24: '9',
25: '9',
26: '9',
27: '9',
28: '9',
29: '9',
30: '9',
31: '9',
32: '9',
33: '9',
34: '9',
35: '9'},
'Start': {21: 133710834,
22: 133710834,
23: 133710834,
24: 133710835,
25: 133710835,
26: 133710835,
27: 133710836,
28: 133710836,
29: 133710836,
30: 133710837,
31: 133710837,
32: 133710838,
33: 133710838,
34: 133710839,
35: 133710839},
'Ref': {21: 'A',
22: 'A',
23: 'A',
24: 'T',
25: 'T',
26: 'T',
27: 'G',
28: 'G',
29: 'G',
30: 'T',
31: 'T',
32: 'T',
33: 'T',
34: 'G',
35: 'G'},
'Alt': {21: 'C',
22: 'G',
23: 'T',
24: 'A',
25: 'C',
26: 'G',
27: 'A',
28: 'C',
29: 'T',
30: 'A',
31: 'G',
32: 'C',
33: 'G',
34: 'C',
35: 'T'},
'Revel_Score': {21: 0.429,
22: 0.424,
23: 0.432,
24: 0.395,
25: 0.386,
26: 0.389,
27: 0.429,
28: 0.418,
29: 0.418,
30: 0.381,
31: 0.278,
32: 0.378,
33: 0.327,
34: 0.352,
35: 0.352},
'RefSeq': {21: 'NM_005157.6',
22: 'NM_005157.6',
23: 'NM_005157.6',
24: 'NM_005157.6',
25: 'NM_005157.6',
26: 'NM_005157.6',
27: 'NM_005157.6',
28: 'NM_005157.6',
29: 'NM_005157.6',
30: 'NM_005157.6',
31: 'NM_005157.6',
32: 'NM_005157.6',
33: 'NM_005157.6',
34: 'NM_005157.6',
35: 'NM_005157.6'},
'CHR': {21: '9',
22: '9',
23: '9',
24: '9',
25: '9',
26: '9',
27: '9',
28: '9',
29: '9',
30: '9',
31: '9',
32: '9',
33: '9',
34: '9',
35: '9'},
'BP': {21: 133710834,
22: 133710834,
23: 133710834,
24: 133710835,
25: 133710835,
26: 133710835,
27: 133710836,
28: 133710836,
29: 133710836,
30: 133710837,
31: 133710837,
32: 133710838,
33: 133710838,
34: 133710839,
35: 133710839},
'SNP': {21: 1,
22: 1,
23: 1,
24: 1,
25: 1,
26: 1,
27: 1,
28: 1,
29: 1,
30: 1,
31: 1,
32: 1,
33: 1,
34: 1,
35: 1},
'P': {21: 0.571001,
22: 0.5760010000000001,
23: 0.5680010000000001,
24: 0.605001,
25: 0.614001,
26: 0.611001,
27: 0.571001,
28: 0.5820010000000001,
29: 0.5820010000000001,
30: 0.619001,
31: 0.722001,
32: 0.622001,
33: 0.6730010000000001,
34: 0.648001,
35: 0.648001}}
我想做下面这件事
如果Chr、Start、Ref和Revel_Score列中的值相同,则通过添加0.01到0.1之间的随机数来修改P列中的值
因此,条件发生在第28、29和34、35行,所以0.08 + 0.582001、0.06 + 0.582001、0.05 + 0.648001和0.07 + 0.648001也是如此
Chr Start Ref Alt Revel_Score RefSeq CHR BP SNP P
21 9 133710834 A C 0.429 NM_005157.6 9 133710834 1 0.571001
22 9 133710834 A G 0.424 NM_005157.6 9 133710834 1 0.576001
23 9 133710834 A T 0.432 NM_005157.6 9 133710834 1 0.568001
24 9 133710835 T A 0.395 NM_005157.6 9 133710835 1 0.605001
25 9 133710835 T C 0.386 NM_005157.6 9 133710835 1 0.614001
26 9 133710835 T G 0.389 NM_005157.6 9 133710835 1 0.611001
27 9 133710836 G A 0.429 NM_005157.6 9 133710836 1 0.571001
28 9 133710836 G C 0.418 NM_005157.6 9 133710836 1 0.662001
29 9 133710836 G T 0.418 NM_005157.6 9 133710836 1 0.642001
30 9 133710837 T A 0.381 NM_005157.6 9 133710837 1 0.619001
31 9 133710837 T G 0.278 NM_005157.6 9 133710837 1 0.722001
32 9 133710838 T C 0.378 NM_005157.6 9 133710838 1 0.622001
33 9 133710838 T G 0.327 NM_005157.6 9 133710838 1 0.673001
34 9 133710839 G C 0.352 NM_005157.6 9 133710839 1 0.698001
35 9 133710839 G T 0.352 NM_005157.6 9 133710839 1 0.718001
我知道如何应用这个条件
IGV_table_limited[IGV_table_limited.duplicated(subset=['Chr','Start', 'Ref','Revel_Score' ], keep=False)]
我知道如何在那一列做数学运算,但我不知道如何得到原始表格的变化
1条答案
按热度按时间8e2ybdfx1#
使用
numpy.random.normal
和DataFrame.loc
,对于计数True
,使用sum
:如果确实需要步骤
0.01
,则使用numpy.random.choice
:我测试
P
列中索引28
发生变化且无负值的数据: