csv: Formatting multi-line grep output into columns, adding/replacing the filename as an output field

oyxsuwqo  posted on 2023-01-28  in  Other

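I am trying to format the output of a multi-line egrep query into a CSV-compatible format.
I need to pull a handful of values from a large list of files (some of which may not contain the values I am looking for).
The grep command I am using is: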

grep -e Name -e Type -e Schedule -e Pool -e Storage *|awk -F' = '  '{print $1,$2}'|sort

This returns output like the following:

IRVLinuxDefault.cfg:  Name "IRVLinuxDefault"
IRVLinuxDefault.cfg:  Pool "IRV_DD890_Full60"
IRVLinuxDefault.cfg:  Schedule "IRV_Backups"
IRVLinuxDefault.cfg:  Storage "IRV_SD_DD890"
IRVLinuxDefault.cfg:  Type "Backup"
LVS_60Day_NDMP_Defs.cfg:  Name "LVS_60Day_NDMP_Defs"
LVS_60Day_NDMP_Defs.cfg:  Pool "LVS_DD_AV_NDMP"
LVS_60Day_NDMP_Defs.cfg:  Schedule "LVS_NDMP_Monthly"
LVS_60Day_NDMP_Defs.cfg:  Storage "LVS_SD_DD990_AV_NDMP"
LVS_60Day_NDMP_Defs.cfg:  Type "Backup"
LVS_60Day_NDMP_NOFileSet_Defs.cfg:  Name "LVS_60Day_NDMP_NOFileSet_Defs"
LVS_60Day_NDMP_NOFileSet_Defs.cfg:  Pool "LVS_DD_AV_NDMP"
LVS_60Day_NDMP_NOFileSet_Defs.cfg:  Schedule "LVS_NDMP_Monthly"
LVS_60Day_NDMP_NOFileSet_Defs.cfg:  Storage "LVS_SD_DD990_AV_NDMP"
LVS_60Day_NDMP_NOFileSet_Defs.cfg:  Type "Backup"
LVS_Datalake2_Defs.cfg:  Name "LVS_Datalake2_Defs"
LVS_Datalake2_Defs.cfg:  Pool "LVS_WAS_SD101_13Mo-cloud"
LVS_Datalake2_Defs.cfg:  Schedule "WeeklyCycle"
LVS_Datalake2_Defs.cfg:  Storage "LVS_WAS_SD101_13Mo-cloud"
LVS_Datalake2_Defs.cfg:  Type "Backup"

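I am trying to output these value fields in the following format: FILE,NAME,POOL,SCHEDULE,STORAGE,TYPE, with a column header for each column. If one of the files does not contain a grepped-for value, I would like an empty record output in that position.
The output I *want* looks like CSV (example below), with any quotes and colons stripped (note that the bottom row of the desired output is missing the Pool field, so two consecutive commas preserve the empty cell):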

FILE,NAME,POOL,SCHEDULE,STORAGE,TYPE
IRVLinuxDefault.cfg,IRVLinuxDefault,IRV_DD890_Full60,IRV_Backups,IRV_SD_DD890,Backup
LVS_60Day_NDMP_Defs.cfg,LVS_60Day_NDMP_Defs,LVS_DD_AV_NDMP,LVS_NDMP_Monthly,LVS_SD_DD990_AV_NDMP,Backup
LVS_60Day_NDMP_NOFileSet_Defs.cfg,LVS_60Day_NDMP_NOFileSet_Defs,,LVS_NDMP_Monthly,LVS_SD_DD990_AV_NDMP,Backup

I have tried several approaches with awk, sed, and GNU datamash (transpose), but without much luck.
Any suggestions?

v9tzhpje  #1

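When a matched line carries an empty value, this script *lets you specify* the string to substitute (the defval variable).
It is also adaptable: you can specify the delimiter (for the input) used to extract the desired value.

Note: quoting a single or double quote as the split() delimiter inside a shell-quoted awk program is awkward, so I put a sed step between the input you provided and the script that converts it to the desired output; it turns every double quote into a | character.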

#!/bin/bash

### Original command
#grep -e Name -e Type -e Schedule -e Pool -e Storage *|awk -F' = '  '{print $1,$2}'|sort

sample="grepOutput.txt"

cat >"${sample}" <<"EnDoFiNpUt"
IRVLinuxDefault.cfg:  Name "IRVLinuxDefault"
IRVLinuxDefault.cfg:  Pool "IRV_DD890_Full60"
IRVLinuxDefault.cfg:  Schedule "IRV_Backups"
IRVLinuxDefault.cfg:  Storage "IRV_SD_DD890"
IRVLinuxDefault.cfg:  Type "Backup"
LVS_60Day_NDMP_Defs.cfg:  Name "LVS_60Day_NDMP_Defs"
LVS_60Day_NDMP_Defs.cfg:  Pool "LVS_DD_AV_NDMP"
LVS_60Day_NDMP_Defs.cfg:  Schedule "LVS_NDMP_Monthly"
LVS_60Day_NDMP_Defs.cfg:  Storage "LVS_SD_DD990_AV_NDMP"
LVS_60Day_NDMP_Defs.cfg:  Type "Backup"
LVS_60Day_NDMP_NOFileSet_Defs.cfg:  Name "LVS_60Day_NDMP_NOFileSet_Defs"
LVS_60Day_NDMP_NOFileSet_Defs.cfg:  Pool "LVS_DD_AV_NDMP"
LVS_60Day_NDMP_NOFileSet_Defs.cfg:  Schedule "LVS_NDMP_Monthly"
LVS_60Day_NDMP_NOFileSet_Defs.cfg:  Storage "LVS_SD_DD990_AV_NDMP"
LVS_60Day_NDMP_NOFileSet_Defs.cfg:  Type "Backup"
LVS_Datalake2_Defs.cfg:  Name "LVS_Datalake2_Defs"
LVS_Datalake2_Defs.cfg:  Pool "LVS_WAS_SD101_13Mo-cloud"
LVS_Datalake2_Defs.cfg:  Schedule "WeeklyCycle"
LVS_Datalake2_Defs.cfg:  Storage "LVS_WAS_SD101_13Mo-cloud"
LVS_Datalake2_Defs.cfg:  Type "Backup"
EnDoFiNpUt

### reading the sample file emulates the original grep command output;
### sed turns every double quote into the "|" delimiter that awk splits on
sed 's/"/|/g' "${sample}" |
awk -v delim='|' -v defval="" 'BEGIN{
    printf("FILENAME,NAME,POOL,SCHEDULE,STORAGE,TYPE") ;
    lastFN="" ;
}
{
    pos=index($0,":") ;
    if( pos > 0 ){
        FN=substr($0, 1, pos-1) ;
        split($0, vals, delim );

        if( FN != lastFN ){
            printf("\n%s", FN) ;
            lastFN=FN ;
        } ;
        if( vals[2] == "" ){
            printf(",%s", defval ) ;
        }else{
            printf(",%s", vals[2] ) ;
        } ;
    } ;
}
END{
    print "" ;
}'

The output looks like this:

FILENAME,NAME,POOL,SCHEDULE,STORAGE,TYPE
IRVLinuxDefault.cfg,IRVLinuxDefault,IRV_DD890_Full60,IRV_Backups,IRV_SD_DD890,Backup
LVS_60Day_NDMP_Defs.cfg,LVS_60Day_NDMP_Defs,LVS_DD_AV_NDMP,LVS_NDMP_Monthly,LVS_SD_DD990_AV_NDMP,Backup
LVS_60Day_NDMP_NOFileSet_Defs.cfg,LVS_60Day_NDMP_NOFileSet_Defs,LVS_DD_AV_NDMP,LVS_NDMP_Monthly,LVS_SD_DD990_AV_NDMP,Backup
LVS_Datalake2_Defs.cfg,LVS_Datalake2_Defs,LVS_WAS_SD101_13Mo-cloud,WeeklyCycle,LVS_WAS_SD101_13Mo-cloud,Backup
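
The script above prints values in the order the lines arrive, so a file that lacks, say, its Pool line would yield a shorter row rather than an empty cell. Below is a minimal sketch of one way around that, storing each value under its field name before printing. The function name grep_to_csv is hypothetical, and the sketch assumes the input lines are grouped by file and that filenames contain no colon:

```shell
#!/bin/bash
# Hypothetical helper: reads lines such as
#   some.cfg:  Pool "value"
# on stdin and writes one CSV row per file.
# Optional $1 is the text to use for a missing field (default: empty).
grep_to_csv() {
    awk -v defval="${1:-}" '
    BEGIN {
        OFS = ","
        n = split("Name,Pool,Schedule,Storage,Type", cols, ",")
        print "FILE,NAME,POOL,SCHEDULE,STORAGE,TYPE"
    }
    # Print the row for the current file, filling gaps with defval.
    function flush_row(   i, row) {
        if (fn == "") return
        row = fn
        for (i = 1; i <= n; i++)
            row = row OFS (((fn, cols[i]) in val) ? val[fn, cols[i]] : defval)
        print row
    }
    {
        pos = index($0, ":")
        if (pos == 0) next                  # not a "file:..." line
        cur = substr($0, 1, pos - 1)
        if (cur != fn) { flush_row(); fn = cur }
        rest = substr($0, pos + 1)
        split(rest, q, "\"")                # q[2]: text between the first pair of double quotes
        split(rest, w, " ")                 # w[1]: the key, e.g. Name
        val[fn, w[1]] = q[2]
    }
    END { flush_row() }
    '
}
```

It could be fed directly from grep, for example `grep -H -e Name -e Type -e Schedule -e Pool -e Storage *.cfg | grep_to_csv`; passing an argument such as `N/A` fills missing cells with that string instead of leaving them empty.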
hiz5n14c  #2

Once awk is part of the solution, there is usually no need for grep.
Reverse-engineering OP's grep|awk|sort output into some sample files:

$ head *.cfg
==> IRVLinuxDefault.cfg <==
  Name = "IRVLinuxDefault"
  Pool = "IRV_DD890_Full60"
  Schedule = "IRV_Backups"
  Storage = "IRV_SD_DD890"
  Type = "Backup"

==> LVS_60Day_NDMP_Defs.cfg <==
  Name = "LVS_60Day_NDMP_Defs"
  Pool = "LVS_DD_AV_NDMP"
  Schedule = "LVS_NDMP_Monthly"
  Storage = "LVS_SD_DD990_AV_NDMP"
  Type = "Backup"

==> LVS_60Day_NDMP_NOFileSet_Defs.cfg <==                   # NOTE: missing an entry for "Pool"
  Name = "LVS_60Day_NDMP_NOFileSet_Defs"
  Schedule = "LVS_NDMP_Monthly"
  Storage = "LVS_SD_DD990_AV_NDMP"
  Type = "Backup"

==> LVS_Datalake2_Defs.cfg <==
  Name = "LVS_Datalake2_Defs"
  Pool = "LVS_WAS_SD101_13Mo-cloud"
  Schedule = "WeeklyCycle"
  Storage = "LVS_WAS_SD101_13Mo-cloud"
  Type = "Backup"
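
To illustrate the point about grep, here is a minimal sketch of awk doing the selection step on its own. The function name awk_grep is hypothetical, and unlike the original grep (which matches the words anywhere on a line) it only accepts lines whose first field is exactly one of the five keys:

```shell
#!/bin/bash
# Hypothetical helper: prints "file:line" for lines whose first
# whitespace-separated field is one of the five keys of interest.
awk_grep() {
    awk '$1 ~ /^(Name|Pool|Schedule|Storage|Type)$/ { print FILENAME ":" $0 }' "$@"
}
```

Called as `awk_grep *.cfg`, it prints lines in the same file:line shape that grep produces for multiple files.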

One awk idea:

awk '

function print_record(  ) {
    if (fname)
        print fname,record["name"],record["pool"],record["schedule"],record["storage"],record["type"]

    delete record                                                   # clear previous line contents
}

BEGIN         { OFS=","

                hdr="FILE,NAME,POOL,SCHEDULE,STORAGE,TYPE"
                print hdr

                n=split(tolower(hdr),a,",")                         # build array of field names
                for (i=2;i<=n;i++)                                  # convert field names to ...
                    fields[a[i]]                                    # associative array indices
              }

FNR==1        { print_record()                                      # print previous file contents
                fname=FILENAME
              }

              { split($0,a,"\"")                                    # split line on double quotes
                key=tolower($1)                                     # need lowercase field name to match fields[] array indices
              }

key in fields { record[key]=a[2] }                                  # if 1st field is an index in fields[] array then save the 2nd double-quote delimited field

END           { print_record()   }                                  # flush last file contents to stdout
' *cfg > all.csv

This generates:

$ cat all.csv
FILE,NAME,POOL,SCHEDULE,STORAGE,TYPE
IRVLinuxDefault.cfg,IRVLinuxDefault,IRV_DD890_Full60,IRV_Backups,IRV_SD_DD890,Backup
LVS_60Day_NDMP_Defs.cfg,LVS_60Day_NDMP_Defs,LVS_DD_AV_NDMP,LVS_NDMP_Monthly,LVS_SD_DD990_AV_NDMP,Backup
LVS_60Day_NDMP_NOFileSet_Defs.cfg,LVS_60Day_NDMP_NOFileSet_Defs,,LVS_NDMP_Monthly,LVS_SD_DD990_AV_NDMP,Backup
LVS_Datalake2_Defs.cfg,LVS_Datalake2_Defs,LVS_WAS_SD101_13Mo-cloud,WeeklyCycle,LVS_WAS_SD101_13Mo-cloud,Backup
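
With many input files it may be worth confirming that every row has as many fields as the header. A rough sketch of such a check follows; the function name check_csv_columns is hypothetical, and it assumes that no value contains a comma:

```shell
#!/bin/bash
# Hypothetical helper: reports rows whose field count differs from the
# header row. Exit status is 1 if any such row is found, 0 otherwise.
check_csv_columns() {
    awk -F',' '
    NR == 1    { want = NF; next }          # remember the header width
    NF != want { print "line " NR ": " NF " fields, expected " want; bad = 1 }
    END        { exit bad }
    ' "$1"
}
```

Running `check_csv_columns all.csv` prints nothing when every row matches the header width.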
