pandas: Removing the leading comment lines from a txt file in Python

Posted by bttbmeg0 on 2023-06-20 in Python

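I have several txt files like this one:
https://ftp.ncbi.nlm.nih.gov/dbgap/studies/phs001672/analyses/phs001672.pha004730.txt
The file is saved at C:\Users\test.txt.
How can I remove the comment lines at the top (say, the first 20 lines) and save a new CSV file that contains only the table, using Python?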

Answer #1 by c7rzv4ha

You can use read_table with a custom comment character:

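import pandas as pd

# Adjacent string literals inside parentheses are joined into one string
url = ("https://ftp.ncbi.nlm.nih.gov/dbgap/studies/"
       "phs001672/analyses/phs001672.pha004730.txt")

# Lines that start with "#" are treated as comments and skipped
df = pd.read_table(url, comment="#")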

Output:

print(df)

              ID  Analysis ID       SNP ID  ...  Coded Allele  Sample size  Bin ID
0      506214698         4730    rs1300646  ...             A         8542       6
1      506218329         4730   rs76749734  ...             A          942     158
2      506216207         4730   rs80286553  ...             A        90924      26
...          ...          ...          ...  ...           ...          ...     ...
31662  506245867         4730   rs71334010  ...             A       317118    1422
31663  506245880         4730  rs113480342  ...             A       314121    1422
31664  506245884         4730  rs140069817  ...             T       307546    1422

[31665 rows x 22 columns]
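
To make the effect of comment="#" easier to see without downloading anything, here is a minimal sketch that parses a tiny in-memory stand-in for the file. The sample text and its column values are made up for illustration; only the general layout (lines starting with "#" followed by a tab-separated table) is assumed to match the real file.

```python
import io

import pandas as pd

# Hypothetical miniature of the file: "#" comment lines, then a tab-separated table.
sample = (
    "# Name: example analysis\n"
    "# Description: made-up header for illustration\n"
    "ID\tSNP ID\tP-value\n"
    "1\trs1\t0.05\n"
    "2\trs2\t0.01\n"
)

# comment="#" tells the parser to ignore the rest of a line from "#" onward;
# a line that begins with "#" is therefore skipped entirely.
df = pd.read_table(io.StringIO(sample), comment="#")

print(list(df.columns))
print(df.shape)
```

One caveat: because the rest of a line after "#" is not parsed, a data value that itself contains "#" would be cut off. If that could occur in your files, skipping a known number of lines with skiprows may be the safer choice.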

Answer #2 by kd3sttzy

Just use the comment parameter:

import pandas as pd
df = pd.read_csv('C:/Users/test.txt', sep='\t', comment='#')

Output:

>>> df
              ID  Analysis ID       SNP ID       P-value   Rank  Plot data  Chr ID  ...  Call Rate |β|      SE R-Squared Coded Allele Sample size Bin ID
0      506214698         4730    rs1300646  9.550000e-05  31125          5       1  ...   0.401995   2.8862  0.7397       NaN            A        8542      6
1      506218329         4730   rs76749734  4.012000e-05  22885          5       2  ...   0.404813   9.1915  2.2381       NaN            A         942    158
2      506216207         4730   rs80286553  1.016000e-05  14751          5       1  ...   0.412902   0.6668  0.1511       NaN            A       90924     26
3      506225782         4730  rs149962677  1.248000e-05  15682          5       5  ...   0.425870   8.8466  2.0249       NaN            A         991    462
4      506237176         4730  rs544433886  7.388000e-05  28473          5      10  ...   0.409761   0.4791  0.1209       NaN            A      125985    885
...          ...          ...          ...           ...    ...        ...     ...  ...        ...      ...     ...       ...          ...         ...    ...
31660  506245847         4730   rs35219138  8.893000e-07   8179          7      21  ...   0.994380   0.1380  0.0281       NaN            A      316702   1422
31661  506245862         4730   rs60998147  7.986000e-07   7981          7      21  ...   0.995193   0.1409  0.0286       NaN            T      316961   1422
31662  506245867         4730   rs71334010  8.449000e-07   8091          7      21  ...   0.995686   0.1383  0.0281       NaN            A      317118   1422
31663  506245880         4730  rs113480342  3.812000e-07   6650          7      21  ...   0.986276   0.1440  0.0283       NaN            A      314121   1422
31664  506245884         4730  rs140069817  4.145000e-07   6808          7      21  ...   0.965632   0.1428  0.0282       NaN            T      307546   1422

[31665 rows x 22 columns]
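
The question also asks about saving the table as a new CSV file and mentions several input files. A possible sketch of that step is below; the helper names txt_to_csv and convert_folder are hypothetical, and it assumes every .txt file in the folder has the same layout (tab-separated, comments marked with "#").

```python
from pathlib import Path

import pandas as pd


def txt_to_csv(src: Path, dst: Path) -> None:
    """Read a tab-separated file, skipping "#" comments, and write it as CSV."""
    df = pd.read_csv(src, sep="\t", comment="#")
    # index=False keeps the pandas row index out of the output file
    df.to_csv(dst, index=False)


def convert_folder(folder: Path) -> list[Path]:
    """Convert every .txt file in `folder`; return the CSV paths written."""
    written = []
    for src in sorted(folder.glob("*.txt")):
        dst = src.with_suffix(".csv")  # e.g. test.txt -> test.csv
        txt_to_csv(src, dst)
        written.append(dst)
    return written
```

For the single file from the question, txt_to_csv(Path("C:/Users/test.txt"), Path("C:/Users/test.csv")) would be the equivalent call.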

Answer #3 by smdnsysy

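Open the file and split its contents on the newline character '\n'; this gives you a list with one entry per line. Iterate over the list and drop every line that contains '#'. You now have a list without any comment lines, which you can write back out as the CSV file.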
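
A rough sketch of this approach without pandas might look like the following. The function name strip_comments_to_csv is hypothetical, UTF-8 encoding is an assumption, and the check is narrowed to lines that start with '#' (rather than lines that merely contain it) so that data rows with a '#' somewhere in a value are not dropped. The csv module handles the conversion from tabs to commas.

```python
import csv
from pathlib import Path


def strip_comments_to_csv(src: Path, dst: Path) -> int:
    """Copy the tab-separated table in `src` to `dst` as CSV, skipping "#" lines.

    Returns the number of rows written (header row included).
    """
    rows_written = 0
    with open(src, newline="", encoding="utf-8") as fin, \
         open(dst, "w", newline="", encoding="utf-8") as fout:
        # Drop blank lines and comment lines before the csv reader sees them.
        data_lines = (
            line for line in fin
            if line.strip() and not line.startswith("#")
        )
        reader = csv.reader(data_lines, delimiter="\t")
        writer = csv.writer(fout)
        for row in reader:
            writer.writerow(row)
            rows_written += 1
    return rows_written
```

Reading line by line keeps memory use low for large files, at the cost of leaving every value as text; the pandas answers above are more convenient if you also want typed columns.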
