R: Parse tags from an input annotation file and paste them into the final output file

Posted by zdwk9cvp on 2023-11-14 in Other

My input annotation file looks like this:

GEO_ID  Title   Platform    Description Group   Sample_Name
GSM344422   Healthy-8   GPL20301    human Leukocytes_2016-06-01 Leukocytes_at_gestational_age_of_18-23wk_from_healthy_fetal_hearts  Healthy
GSM3444233  R-A7    GPL20301    human Leukocytes_2018-06-04 Leukocytes_at_gestational_age_of_18-23wk_from_healthy_fetal_hearts  Healthy
GSM344434   R2-FITC GPL20301    human Leukocytes_2018-06-04 Leukocytes_at_gestational_age_of_18-23wk_from_healthy_fetal_hearts  Healthy
GSM3444235  CHB-4   GPL20301    human Leukocytes_2016-06-01 Leukocytes_at_gestational_age_of_19-23wk_from_fetal_hearts_with_congenital_heart_block_(CHB)    Heartblock
GSM344236   CHB_Luekocytes  GPL20301    human Leukocytes_2017-11-22 Leukocytes_at_gestational_age_of_19-23wk_from_fetal_hearts_with_congenital_heart_block_(CHB)    Heartblock
GSM344237   CHB-F   GPL20301    human Leukocytes_2018-06-04 Leukocytes_at_gestational_age_of_19-23wk_from_fetal_hearts_with_congenital_heart_block_(CHB)    Heartblock
-----   -----   -----   -----   -----   
Comparison  Trat    Ctrl    Title   Paired  Tags
1   Leukocytes_at_gestational_age_of_19-23wk_from_fetal_hearts_with_congenital_heart_block_(CHB)    Leukocytes_at_gestational_age_of_18-23wk_from_healthy_fetal_hearts  Leukocytes from fetal hearts with congenital heart block _vs_ healthy   FALSE   Primary cells|Leukocytes|Heart|Fetus|Disease vs. normal|Congenital heart block

A small subset of my final output file looks like this. It is generated after an analysis step that compares multiple samples.

Bioset summary = Neurons_from_dentate_gyrus_of_11wk_old_female_mice_housed_in_enriched_conditions vs. Early_progenitor_cells_(NSPCs)_from_dentate_gyrus_of_11wk_old_female_mice_housed_in_enriched_conditions

Data pre-processing = FASTQ files were downloaded from Sequence Read Archive (SRA). No other preprocessing was performed.

Analysis summary = Reads are aligned to mm10 using Dragen RNA 3.9.5. The differential methylation calling is done by methylKit R package using following parameters: qvalue - 0.01 and methylation difference (%) - 10

Test samples = Neurons_from_dentate_gyrus_of_11wk_old_female_mice_housed_in_enriched_conditions(total replicates = 5)
Control samples = Early_progenitor_cells_(NSPCs)_from_dentate_gyrus_of_11wk_old_female_mice_housed_in_enriched_conditions(total replicates = 5)

chromosome      start   stop    % differential  p-Value q-Value
2       98662301        98663300        11.4222829122048        0       0
2       98662401        98663400        13.6033864634586        0       0

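What I need is for whatever is in the Tags column of the annotation file to be appended to my final output file.
I want this to be generic, meaning it should always pick the **Tags** column from the input annotation.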

Bioset summary = Neurons_from_dentate_gyrus_of_11wk_old_female_mice_housed_in_enriched_conditions vs. Early_progenitor_cells_(NSPCs)_from_dentate_gyrus_of_11wk_old_female_mice_housed_in_enriched_conditions
    
    Data pre-processing = FASTQ files were downloaded from Sequence Read Archive (SRA). No other preprocessing was performed.
    
    Analysis summary = Reads are aligned to mm10 using Dragen RNA 3.9.5. The differential methylation calling is done by methylKit R package using following parameters: qvalue - 0.01 and methylation difference (%) - 10
    
    Test samples = Neurons_from_dentate_gyrus_of_11wk_old_female_mice_housed_in_enriched_conditions(total replicates = 5)
    Control samples = Early_progenitor_cells_(NSPCs)_from_dentate_gyrus_of_11wk_old_female_mice_housed_in_enriched_conditions(total replicates = 5)

Tags:Primary cells|Leukocytes|Heart|Fetus|Disease vs. normal|Congenital heart block

 
    chromosome      start   stop    % differential  p-Value q-Value
    2       98662301        98663300        11.4222829122048        0       0
    2       98662401        98663400        13.6033864634586        0       0

Any suggestions or help?

ev7lccsx #1

If the first code block is read in as a data frame named dat, and the second code block as a character vector named inp, then this produces the desired result:

c(inp[1:8], paste0(dat[8:9, "Sample_Name"], collapse=":"), inp[10:length(inp) ] )
#----------------
[1] "Bioset summary = Neurons_from_dentate_gyrus_of_11wk_old_female_mice_housed_in_enriched_conditions vs. Early_progenitor_cells_(NSPCs)_from_dentate_gyrus_of_11wk_old_female_mice_housed_in_enriched_conditions"         
 [2] ""                                                                                                                                                                                                                      
 [3] "Data pre-processing = FASTQ files were downloaded from Sequence Read Archive (SRA). No other preprocessing was performed."                                                                                             
 [4] ""                                                                                                                                                                                                                      
 [5] "Analysis summary = Reads are aligned to mm10 using Dragen RNA 3.9.5. The differential methylation calling is done by methylKit R package using following parameters: qvalue - 0.01 and methylation difference (%) - 10"
 [6] ""                                                                                                                                                                                                                      
 [7] "Test samples = Neurons_from_dentate_gyrus_of_11wk_old_female_mice_housed_in_enriched_conditions(total replicates = 5)"                                                                                                 
 [8] "Control samples = Early_progenitor_cells_(NSPCs)_from_dentate_gyrus_of_11wk_old_female_mice_housed_in_enriched_conditions(total replicates = 5)"                                                                       
 [9] "Tags:Primary cells|Leukocytes|Heart|Fetus|Disease vs. normal|Congenital heart block"                                                                                                                                   
[10] ""                                                                                                                                                                                                                      
[11] "chromosome      start   stop    % differential  p-Value q-Value"                                                                                                                                                       
[12] "2       98662301        98663300        11.4222829122048        0       0"                                                                                                                                             
[13] "2       98662401        98663400        13.6033864634586        0       0"
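The one-liner above hard-codes rows 8:9, and the answer does not show how dat was read. As a rough sketch only, assuming the annotation file is tab-separated, the row could be located by value instead. The file content below is a shortened stand-in for the real annotation, and the names ann_file, tag_row and tag_line are made up for illustration:

```r
# Assumption: tab-separated annotation file; a shortened stand-in is written to a temp file.
ann_file <- tempfile(fileext = ".txt")
writeLines(c(
  "GEO_ID\tTitle\tPlatform\tDescription\tGroup\tSample_Name",
  "GSM344422\tHealthy-8\tGPL20301\thuman Leukocytes_2016-06-01\tLeukocytes_healthy\tHealthy",
  "-----\t-----\t-----\t-----\t-----\t",
  "Comparison\tTrat\tCtrl\tTitle\tPaired\tTags",
  "1\tLeukocytes_CHB\tLeukocytes_healthy\tCHB _vs_ healthy\tFALSE\tPrimary cells|Leukocytes|Heart"
), ann_file)

# fill = TRUE pads short rows; quoting and comments are disabled so that
# characters such as "(", "|" and "#" are read literally.
dat <- read.table(ann_file, header = TRUE, sep = "\t", quote = "",
                  comment.char = "", fill = TRUE, stringsAsFactors = FALSE)

# The second table's header row ends up inside the data, so "Tags" appears
# as a value in the Sample_Name column, with the tag string in the next row.
tag_row  <- which(dat$Sample_Name == "Tags")
tag_line <- paste0(dat[tag_row:(tag_row + 1), "Sample_Name"], collapse = ":")
tag_line

# inp would then come from readLines() on the output file (hypothetical path):
# inp <- readLines("final_output.txt")
```

This only works when the annotation has a single comparison row; with several, each row after the "Tags" row would need to be handled separately.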

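There is no "Tags" column *per se*. The column that holds both the value "Tags" and the tag values is named "Sample_Name". If you don't want the line numbers, use cat or print with appropriate arguments to display the result on the console. The line numbers are not actually part of the result object.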
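For a version that does not depend on fixed row numbers or on the "Sample_Name" coincidence, one possible approach is to work on the raw lines of both files. The sketch below is not the answerer's method: extract_tags and insert_tags are hypothetical helpers, the annotation is assumed to be tab-separated, and the sample data is shortened. It places the tags after the "Control samples" line with a blank line in between, as in the layout requested in the question.

```r
# Hypothetical helper: return the value(s) of a named column from raw annotation lines.
extract_tags <- function(ann_lines, column = "Tags", sep = "\t") {
  fields <- strsplit(ann_lines, sep, fixed = TRUE)
  # Header row of the comparison table: first line that has a field named `column`.
  hdr <- which(vapply(fields, function(f) column %in% f, logical(1)))[1]
  if (is.na(hdr)) stop("no '", column, "' column found in annotation")
  col  <- match(column, fields[[hdr]])
  rows <- fields[seq_along(fields) > hdr]
  rows <- rows[lengths(rows) >= col]   # skip rows too short to hold that column
  vapply(rows, function(f) f[col], character(1))
}

# Hypothetical helper: insert "Tags:..." line(s) after the last "Control samples" line.
insert_tags <- function(inp, tags, after_pattern = "^\\s*Control samples") {
  pos <- grep(after_pattern, inp)
  if (length(pos) == 0) stop("no 'Control samples' line found in output")
  append(inp, c("", paste0("Tags:", tags)), after = max(pos))
}

# Shortened stand-ins for readLines() on the two files.
ann_lines <- c(
  "GEO_ID\tTitle\tPlatform\tDescription\tGroup\tSample_Name",
  "GSM344422\tHealthy-8\tGPL20301\thuman Leukocytes_2016-06-01\tLeukocytes_healthy\tHealthy",
  "-----\t-----\t-----\t-----\t-----\t",
  "Comparison\tTrat\tCtrl\tTitle\tPaired\tTags",
  "1\tLeukocytes_CHB\tLeukocytes_healthy\tCHB _vs_ healthy\tFALSE\tPrimary cells|Leukocytes|Heart"
)
inp <- c(
  "Bioset summary = A vs. B",
  "",
  "Test samples = A(total replicates = 5)",
  "Control samples = B(total replicates = 5)",
  "",
  "chromosome\tstart\tstop"
)

res <- insert_tags(inp, extract_tags(ann_lines))
writeLines(res)   # prints the lines without the [1], [2], ... index prefixes
```

writeLines(res, "some_file.txt") would write the same lines to a file (the file name here is a placeholder). If the annotation holds several comparison rows, this sketch emits one "Tags:" line per row, which may or may not be what is wanted.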
