linux - Print the first column and every nth column using awk [duplicate]

utugiqy6 posted on 2023-02-07 in Linux
    • This question already has answers here:

Print the 1st and every nth column of a text file using awk (3 answers)
Closed 3 days ago.
I want to print the first column (gene) and all of the raw_counts columns of a tab-delimited file.
I tried the following prog.awk:

BEGIN {FS = "\t"}
{for (i = 3; i <= NF; i += 1) printf ("%s%c", $i, i + 1 <= NF ? "\t" : "\n");}

But the output is the same as the input when I run:

awk -f prog.awk < input.csv > output.csv

Input data:
head -3 input.txt

Hybridization REF       TCGA-A3-3306-01A-01R-0864-07    TCGA-A3-3306-01A-01R-0864-07    TCGA-A3-3306-01A-01R-0864-07   TCGA-A3-3307-01A-01R-0864-07     TCGA-A3-3307-01A-01R-0864-07    TCGA-A3-3307-01A-01R-0864-07
gene    raw_counts      median_length_normalized        RPKM    raw_counts      median_length_normalized        RPKM  
?|100130426     1       0.122549019607843       0.0330807728010661      0       0       0

Expected output:

Hybridization REF       TCGA-A3-3306-01A-01R-0864-07      TCGA-A3-3307-01A-01R-0864-07       
gene    raw_counts    raw_counts       RPKM   
?|100130426     1       0
Answer #1 (6xfqseft):

A few tweaks:

  • start the loop counter at 2
  • increment the loop counter by 3 on each pass
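The effect of the two tweaks can be seen by listing which fields each loop header visits. Below is a small sketch that assumes 7 fields per line, as in the sample rows. The helper name show_loops is hypothetical.

```shell
#!/bin/sh
# show_loops (hypothetical helper): list the fields each loop header visits
# on a line with 7 fields (the sample rows have 7 tab-separated fields).
show_loops() {
  awk 'BEGIN {
    n = 7                                              # assumed NF
    s = ""; for (i = 3; i <= n; i += 1) s = s " $" i   # original loop
    print "original loop:" s
    s = ""; for (i = 2; i <= n; i += 3) s = s " $" i   # start at 2, step 3
    print "tweaked loop: $1" s                         # $1 is printed separately
  }'
}

show_loops
```

The original header walks every field from $3 onward. The tweaked one prints $1 and then only $2 and $5, which are the raw_counts columns in the sample.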

Modifying the OP's code:

$ awk 'BEGIN {FS=OFS="\t"} {printf "%s",$1; for (i=2;i<=NF;i+=3) printf "%s%s",OFS,$i; print ""}' input.csv    
gene    raw_counts      raw_counts      raw_counts      raw_counts      raw_counts

After several rounds of changes to the sample input and expected output, the latest version is:

$ cat input.csv
Hybridization REF       TCGA-A3-3306-01A-01R-0864-07    TCGA-A3-3306-01A-01R-0864-07    TCGA-A3-3306-01A-01R-0864-07    TCGA-A3-3307-01A-01R-0864-07    TCGA-A3-3307-01A-01R-0864-07      TCGA-A3-3307-01A-01R-0864-07
gene    raw_counts      median_length_normalized        RPKM    raw_counts      median_length_normalized        RPKM
?|100130426     1       0.122549019607843       0.0330807728010661      0       0       0

The awk above produces:

Hybridization REF       TCGA-A3-3306-01A-01R-0864-07    TCGA-A3-3307-01A-01R-0864-07
gene    raw_counts      raw_counts
?|100130426     1       0
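The same loop can also be written as a reusable sketch that takes the starting column and the step through awk -v. It runs against the latest sample, rebuilt with printf so that the separators are real tabs. The function name every_nth and the variables start and step are hypothetical, not part of the answer.

```shell
#!/bin/sh
# every_nth START STEP (hypothetical helper): read tab-separated lines on stdin
# and print column 1 followed by columns START, START+STEP, START+2*STEP, ...
# STEP must be a positive integer.
every_nth() {
  awk -v start="$1" -v step="$2" '
    BEGIN { FS = OFS = "\t" }        # tab-separated in and out
    {
      out = $1                       # always keep the first column
      for (i = start; i <= NF; i += step)
        out = out OFS $i             # append every step-th column
      print out
    }'
}

# Rebuild the latest sample (printf reuses its format for each group of
# 7 arguments) and keep column 1 plus columns 2, 5, ...
printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
  'Hybridization REF' TCGA-A3-3306-01A-01R-0864-07 TCGA-A3-3306-01A-01R-0864-07 \
  TCGA-A3-3306-01A-01R-0864-07 TCGA-A3-3307-01A-01R-0864-07 \
  TCGA-A3-3307-01A-01R-0864-07 TCGA-A3-3307-01A-01R-0864-07 \
  gene raw_counts median_length_normalized RPKM raw_counts \
  median_length_normalized RPKM \
  '?|100130426' 1 0.122549019607843 0.0330807728010661 0 0 0 |
  every_nth 2 3
```

This should print the same three lines as the awk above. Building the output in a string and printing it once avoids having to manage a separator variable.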
Answer #2 (velaa5lx):

You could do something like this:

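awk 'BEGIN{FS=OFS="\t"}
FNR==1{
    # assumes the raw_counts labels are on the first line
    header[1]                        # always keep column 1
    for(i=2;i<=NF;i++) if($i=="raw_counts") header[i]
}
{
    sep=""                           # reset the separator on every line
    for (i=1;i<=NF;i++)
        if(i in header) {printf("%s%s", sep, $i); sep=OFS}
    print ""
}' file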

The first line printed is the header line itself; after that, only the values in the columns under those headers are printed.
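The FNR==1 rule assumes the raw_counts labels are on the first line. In the latest sample they are on the second line, the one that begins with gene. One way to cope with that, sketched here as an assumption rather than as the answerer's method, is the common two-pass awk idiom: read the file twice, find the labels during the first pass (NR==FNR), and print during the second. The helper name raw_counts_cols and the shortened sample IDs S1/S2 are hypothetical.

```shell
#!/bin/sh
# raw_counts_cols FILE (hypothetical helper): print column 1 plus every column
# labelled raw_counts on the line whose first field is "gene".
# FILE is read twice, so it must be a regular file, not a pipe.
raw_counts_cols() {
  awk 'BEGIN { FS = OFS = "\t" }
    # Pass 1 (NR == FNR holds only while reading the first copy): find labels
    NR == FNR {
      if ($1 == "gene")
        for (i = 2; i <= NF; i++)
          if ($i == "raw_counts") keep[i] = 1
      next
    }
    # Pass 2: print column 1 and the remembered columns
    {
      out = $1
      for (i = 2; i <= NF; i++)
        if (i in keep) out = out OFS $i
      print out
    }' "$1" "$1"
}

# Usage: the latest sample with sample IDs shortened to S1/S2 (placeholders)
tmp=$(mktemp)
printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
  'Hybridization REF' S1 S1 S1 S2 S2 S2 \
  gene raw_counts median_length_normalized RPKM raw_counts median_length_normalized RPKM \
  '?|100130426' 1 0.122549019607843 0.0330807728010661 0 0 0 > "$tmp"
raw_counts_cols "$tmp"
rm -f "$tmp"
```

Because the label line is located before anything is printed, the Hybridization REF line above it is filtered with the same columns.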

Answer #3 (uqjltbpv):

**Update 1:** getting the whole thing working end to end (https is essential):

  • I used bsd-tar instead of gnu-tar
curl -s -L -f -g 'https://gdac.broadinstitute.org/runs/stddata__2016_01_28/data/KIPAN/20160128/gdac.broadinstitute.org_KIPAN.Merge_rnaseq__illuminahiseq_rnaseq__unc_edu__Level_3__gene_expression__data.Level_3.2016012800.0.0.tar.gz' |
  tar -xvO -f- |
  # split on tabs: the first header field "Hybridization REF" contains a space
  mawk '{ print $1 '"$( jot -s '' -w ',$%d' - 2 14 3 )"' }' FS='\t' OFS='\t' |
  gcat -n

Don't waste time looping over columns; *use seq or jot to generate static awk code on the fly:*

gawk -be '{ print $1 '"$( jot -s '' -w ',$%d' - 2 14 3 )"' }'

  # gawk profile, created Fri Feb  3 13:08:10 2023

  # Rule(s)

 1  {
 1      print $1, $2, $5, $8, $11, $14
     }

gene raw_counts raw_counts raw_counts raw_counts raw_counts
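jot is a BSD tool and is often not installed on Linux. Here is a sketch of the same trick using seq, whose -s (separator) and -f (format) options exist in GNU coreutils and on the BSDs. seq formats numbers as floating point, so the directive is %g rather than %d. The variable names fields and fields7 are hypothetical.

```shell
#!/bin/sh
# Build ",$2,$5,$8,$11,$14" with seq instead of jot:
#   -s ''      no separator between the generated items
#   -f ',$%g'  printf-style template applied to each number 2, 5, ..., 14
fields=$(seq -s '' -f ',$%g' 2 3 14)
printf '%s\n' "$fields"

# The sample rows have only 7 fields, so stop at 7 to get ",$2,$5",
# then splice the list into a static awk program as in the jot version.
fields7=$(seq -s '' -f ',$%g' 2 3 7)
printf 'gene\traw_counts\tm\tRPKM\traw_counts\tm\tRPKM\n' |
  awk -F '\t' -v OFS='\t' '{ print $1 '"$fields7"' }'
```

The shell expands "$fields7" before awk sees the program, so awk runs the fixed statement print $1 ,$2,$5 with no per-line loop.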
