R: Extracting summary statistics (beta, SE, EA, non-EA, EA frequency) from a CSV file (extracted from a VCF) is not working

Posted by b1payxdu on 2023-10-13 in Other

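I downloaded a VCF file plus its index (https://gwas.mrcieu.ac.uk/datasets/ebi-a-GCST90001942/) and tried to open it in R (see the script below). However, R just keeps loading and never opens the file. I then decompressed the VCF into a CSV file, which R can read. Now I need to extract the summary statistics (beta, SE, EA, non-EA, EA frequency). This is what I am doing: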

install.packages("tidyverse")
install.packages("remotes")
install.packages("gwasrapidd")

library(tidyverse)
library(remotes)
library(gwasrapidd)

remotes::install_github("mrcieu/ieugwasr")
remotes::install_github("MRCIEU/TwoSampleMR")
remotes::install_github("MRCIEU/gwasvcf")
remotes::install_github("MRCIEU/gwasglue")

library(ieugwasr)
library(TwoSampleMR)
library(gwasvcf)
library(gwasglue)

Working with GWAS VCF files

library(gwasvcf)
set_bcftools('/path/to/bcftools')
set_plink('/path/to/plink')

remotes::install_github('mrcieu/genetics.binaRies', force = TRUE)
set_plink()
set_bcftools()

suppressWarnings(suppressPackageStartupMessages({
  library(gwasvcf)
  library(VariantAnnotation)
  library(dplyr)
  library(magrittr)
}))

set_bcftools()

Reading in everything
To read in the entire dataset, use the readVcf function.

vcffile <- "/Users/path/Desktop/ebi-a-GCST90001942.vcf.gz"
vcf_data <- readVcf(vcffile)

R keeps loading indefinitely and never opens the file...

Answer #1 (bfrts1fy)

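I haven't tried the gwasvcf package, but I did try converting the summary-statistics VCF file with the MungeSumstats R package, and it worked well.
After installing MungeSumstats, you can simply run: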

library(MungeSumstats)
MungeSumstats::format_sumstats("path/ebi-a-GCST90001942.vcf", ref_genome = "GRCh37")

The package automatically converts the VCF into a standard summary-statistics file.
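To make concrete what such a conversion involves at the level of a single record, here is a minimal base-R sketch. It is not the MungeSumstats implementation. It assumes the GWAS-VCF layout in which the FORMAT column names the per-variant fields (ES, SE, LP, AF) and the sample column holds their values. The example record is hypothetical: its numbers are taken from the first row of the preview in this answer, the FORMAT order is an assumption, and `parse_gwas_vcf_line()` is an illustrative helper, not a function from any package.

```r
# Hypothetical GWAS-VCF record. The values come from the first preview row of
# the munged output; the FORMAT field order is an assumption.
vcf_line <- paste(
  "1", "10472", "rs1238646298", "G", "C", ".", "PASS", ".",
  "ES:SE:LP:AF:ID",
  "-0.1599:0.08462:1.22944:0.0622:rs1238646298",
  sep = "\t"
)

# Illustrative helper (not part of any package): split one tab-separated
# record and pair each FORMAT key with its value from the sample column.
parse_gwas_vcf_line <- function(line) {
  fields <- strsplit(line, "\t", fixed = TRUE)[[1]]
  keys   <- strsplit(fields[9],  ":", fixed = TRUE)[[1]]  # FORMAT column
  values <- strsplit(fields[10], ":", fixed = TRUE)[[1]]  # sample column
  names(values) <- keys
  data.frame(
    SNP = fields[3],
    CHR = fields[1],
    BP  = as.integer(fields[2]),
    REF = fields[4],
    ALT = fields[5],
    ES  = as.numeric(values[["ES"]]),  # effect size (beta)
    SE  = as.numeric(values[["SE"]]),  # standard error
    LP  = as.numeric(values[["LP"]]),  # -log10 p-value
    AF  = as.numeric(values[["AF"]]),  # ALT allele frequency
    stringsAsFactors = FALSE
  )
}

rec <- parse_gwas_vcf_line(vcf_line)
print(rec)
```

Because the values are looked up by FORMAT key rather than by position, the sketch does not depend on the order of the fields. The console log that follows is from the `format_sumstats` call, not from this sketch.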

Dropping 1 duplicate column(s).
1 sample detected: ebi-a-GCST90001942
Constructing ScanVcfParam object.
VCF contains: 15,147,117 variant(s) x 1 sample(s)
Reading VCF file: single-threaded
Converting VCF to data.table.
Expanding VCF first, so number of rows may increase.
Dropping 1 duplicate column(s).
Checking for empty columns.
Unlisting 3 columns.
Time difference of 14.5 mins
VCF data.table contains: 15,141,982 rows x 11 columns.
Time difference of 18 mins
Renaming ID as SNP.
VCF file has -log10 P-values; these will be converted to unadjusted p-values in the 'P' column.
No INFO (SI) column detected.
Standardising column headers.
First line of summary statistics file: 
SNP chr BP  end REF ALT FILTER  AF  ES  SE  LP  P   
Summary statistics report:
   - 15,141,982 rows
   - 15,141,982 unique variants
   - 7 genome-wide significant variants (P<5e-8)
   - 22 chromosomes
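
The log notes that the VCF stores -log10 P-values (LP) and that these are converted to plain p-values. As a minimal sketch of that arithmetic in base R, using the LP values shown in the output preview of this answer (the variable names are illustrative):

```r
# LP values copied from the preview of the munged output in this answer.
lp <- c(1.229440, 0.247567, 0.397181, 0.261616)

# LP is -log10(P), so the p-value is recovered as P = 10^(-LP).
p <- 10^(-lp)

# The genome-wide significance cut-off P < 5e-8 from the report is
# equivalent to LP > -log10(5e-8).
lp_threshold <- -log10(5e-8)
significant  <- lp > lp_threshold

print(signif(p, 4))
print(significant)
```

Filtering on LP directly avoids underflow for very small p-values, which is one reason a file might store LP rather than P.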

The final output is what you would expect:

Done munging in 37.326 minutes.
Successfully finished preparing sumstats file, preview:
Reading header.
            SNP CHR    BP A1 A2   END FILTER    FRQ    BETA      SE       LP          P
1: rs1238646298   1 10472  G  C 10472   PASS 0.0622 -0.1599 0.08462 1.229440 0.05896034
2: rs1434325972   1 10711  A  G 10711   PASS 0.9958  0.1305 0.22700 0.247567 0.56550051
3: rs1476353024   1 12673  G  A 12673   PASS 0.0006 -0.4995 0.59420 0.397181 0.40069968
4:   rs62028691   1 13118  A  G 13118   PASS 0.9958 -0.1151 0.19130 0.261616 0.54749984
Returning path to saved data.
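
To get from this table to the fields asked for in the question (beta, SE, EA, non-EA, EA frequency), a minimal base-R sketch follows. It rests on one assumption that should be checked against the MungeSumstats documentation: that A2 is the effect allele (the VCF's ALT), A1 the non-effect allele, and FRQ the frequency of A2. The rows are copied from the preview above, and the output column names are illustrative. In practice the table would be read from the file path that `format_sumstats` returns.

```r
# Rows copied from the preview above (only the columns needed here).
munged <- data.frame(
  SNP  = c("rs1238646298", "rs1434325972", "rs1476353024", "rs62028691"),
  CHR  = c(1, 1, 1, 1),
  BP   = c(10472, 10711, 12673, 13118),
  A1   = c("G", "A", "G", "A"),
  A2   = c("C", "G", "A", "G"),
  FRQ  = c(0.0622, 0.9958, 0.0006, 0.9958),
  BETA = c(-0.1599, 0.1305, -0.4995, -0.1151),
  SE   = c(0.08462, 0.22700, 0.59420, 0.19130),
  P    = c(0.05896034, 0.56550051, 0.40069968, 0.54749984),
  stringsAsFactors = FALSE
)

# Assumption: A2 is the effect allele and FRQ is its frequency.
sumstats <- data.frame(
  SNP           = munged$SNP,
  beta          = munged$BETA,
  se            = munged$SE,
  effect_allele = munged$A2,
  other_allele  = munged$A1,
  eaf           = munged$FRQ,
  pval          = munged$P,
  stringsAsFactors = FALSE
)

print(sumstats)
```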

Hope this helps.
