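I have been trying to use Python and bioinfokit to create a volcano plot of gene expression data from an Excel file. I used pandas to create a DataFrame and then removed some negative values. In the last line of code I try to create the volcano plot.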
import pandas as pd
import numpy as np
import bioinfokit
from bioinfokit import analys, visuz
panda_brie = pd.read_csv("C:\\Users\\amorgan\\Documents\\brie_gRNA_stats.csv", encoding='ISO-8859-1', low_memory=False)
shape = panda_brie.shape
print(shape)
panda_brie = panda_brie.loc[(panda_brie[("fold_change")] > 0)]
shape = panda_brie.shape
print(shape)
bioinfokit.visuz.gene_exp.volcano(df=panda_brie, lfc="log_fold_change", pv="log_p_value")
I get the following error and am not sure what to do about it.
Traceback (most recent call last):
File "C:/Users/amorgan/AppData/Local/Programs/Python/Python39/graphing brie data.py", line 19, in <module>
bioinfokit.visuz.gene_exp.volcano(df=panda_brie, lfc="log_fold_change", pv="log_p_value")
File "C:\Users\amorgan\AppData\Local\Programs\Python\Python39\lib\site-packages\bioinfokit\visuz.py", line 397, in volcano
df.loc[(df[lfc] >= lfc_thr[0]) & (df[pv] < pv_thr[0]), 'color_add_axy'] = color[0] # upregulated
File "C:\Users\amorgan\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\ops\common.py", line 69, in new_method
return method(self, other)
File "C:\Users\amorgan\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\arraylike.py", line 52, in __ge__
return self._cmp_method(other, operator.ge)
File "C:\Users\amorgan\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\series.py", line 5501, in _cmp_method
res_values = ops.comparison_op(lvalues, rvalues, op)
File "C:\Users\amorgan\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\ops\array_ops.py", line 284, in comparison_op
res_values = comp_method_OBJECT_ARRAY(op, lvalues, rvalues)
File "C:\Users\amorgan\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\ops\array_ops.py", line 73, in comp_method_OBJECT_ARRAY
result = libops.scalar_compare(x.ravel(), y, op)
File "pandas\_libs\ops.pyx", line 107, in pandas._libs.ops.scalar_compare
TypeError: '>=' not supported between instances of 'str' and 'int'
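The final frames of the traceback go through `comp_method_OBJECT_ARRAY`, which hints that the `lfc` column is stored as Python objects (strings) rather than floats. As a minimal sketch, assuming a made-up two-value Series in place of the real column, the same error can be reproduced without bioinfokit:

```python
import pandas as pd

# Hypothetical stand-in for the lfc column: numbers stored as text
s = pd.Series(["0.421772618", "0.678371984"], dtype=object)

message = None
try:
    # Same kind of comparison that visuz.py performs against lfc_thr[0]
    s >= 1
except TypeError as exc:
    message = str(exc)
    print(message)
```

If this prints the same message as the traceback, the plotting call itself is probably fine and the problem lies in the column's dtype.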
The head of my pandas DataFrame, in case it helps:
Unnamed: 0 control.avg ... log_fold_change log_p_value
0 Syt15_GGTACCACAAATGGTACACT 7.80 ... 0.421772618 9.665546
1 Fbxo21_CTTGTGTGCAAAACCCTCCG 3.67 ... 0.678371984 8.397940
2 Irgc1_GAGGCCCTCGGGTTTCAGCG 3.10 ... 0.736525011 8.151195
3 Ttll12_CCTGTGTCTAGGTCCCTTAG 3.98 ... 0.622833399 9.659556
4 Kdm4b_ATGTCATCATACGTCTGCCG 4.41 ... 0.545893109 9.899629
[5 rows x 24 columns]
Output of panda_brie.info():
<class 'pandas.core.frame.DataFrame'>
Int64Index: 50629 entries, 0 to 53135
Data columns (total 24 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Unnamed: 0 50629 non-null object
1 control.avg 50629 non-null float64
2 Tg50.avg 50629 non-null float64
3 Tg100.avg 50629 non-null float64
4 Tg150.avg 50629 non-null float64
5 Tg250.avg 50629 non-null float64
6 Treated.vs.Nontreated.p 50629 non-null float64
7 Treated.vs.Nontreated.FDR 50629 non-null float64
8 Treated.vs.Nontreated.logFC 50629 non-null float64
9 Treated.vs.Nontreated.FC 50629 non-null float64
10 Dose.Regression.p 50629 non-null float64
11 Dose.Regression.FDR 50629 non-null float64
12 Dose.Regression.Slope 50629 non-null float64
13 gene 50629 non-null object
14 gRNASeq 50629 non-null object
15 Unnamed: 15 0 non-null float64
16 Unnamed: 16 0 non-null float64
17 Unnamed: 17 13 non-null object
18 Unnamed: 18 3 non-null object
19 Unnamed: 19 3 non-null object
20 Unnamed: 20 1 non-null object
21 fold_change 50629 non-null float64
22 log_fold_change 50629 non-null object
23 log_p_value 50629 non-null float64
dtypes: float64(16), object(8)
memory usage: 9.7+ MB
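The `info()` output lists `log_fold_change` with dtype `object` while `log_p_value` is `float64`, which matches the `'str'` in the error message. A possible fix, sketched here on a small hypothetical frame (the `"#NUM!"` cell is only an assumed example of what a non-numeric entry might look like), is to coerce the column with `pd.to_numeric` and inspect whatever fails to parse before plotting:

```python
import pandas as pd

# Hypothetical stand-in for panda_brie; values are illustrative only
df = pd.DataFrame({
    "fold_change": [2.64, 4.77, 0.50],
    "log_fold_change": ["0.421772618", "0.678371984", "#NUM!"],
    "log_p_value": [9.665546, 8.397940, 8.151195],
})

# Convert text to floats; anything unparseable becomes NaN
converted = pd.to_numeric(df["log_fold_change"], errors="coerce")

# Show the rows whose value could not be parsed, to see what caused the object dtype
bad_rows = df[converted.isna()]
print(bad_rows)

# Keep the numeric version and drop the rows that failed to convert
df["log_fold_change"] = converted
df = df.dropna(subset=["log_fold_change"])
print(df["log_fold_change"].dtype)
```

With the column numeric, the same `volcano` call should get past the comparison at `visuz.py` line 397. That line also compares `df[pv]` against `pv_thr`, which suggests the function expects raw p-values rather than log-transformed ones, so passing `log_p_value` may be worth checking against the bioinfokit documentation.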