shell: Slicing a dataset by value and assigning first-column entries as variables

bjp0bcyl, posted on 2023-05-29 in Shell

I have a dataset in a CSV file in the following format:

Category;Name;email;Functions;Owner_ID;Backup_ID
A;Bos;user1@mail.com;Driver;123;321
A;Bos;user1@mail.com;Driver;123;321
A;Bos;user1@mail.com;Driver;123;321
B;Nos;user2@mail.com;Builder;456;654
C;Kos;user2@mail.com;Engineer;789;987
C;Kos;user2@mail.com;Engineer;789;987
D;Los;user3@mail.com;Architect;100;1

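I want to loop over it and first get the subset of rows that share the same Category. So at the start I want to select all rows whose Category is A. Having selected them, I want to assign each row's entries to variables named after the header fields, so that I have Category=A, Name=Bos, email=user1@mail.com, and so on. These variables will be used for analysis.
After finishing Category A, I want to move on to Category B and do the same, then C, and so on.
My existing code is as follows: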

for file in ${outputfolder}*.csv
do  
    awk -F';' 'FNR==1{split($0, a); next}{for (i=1;i<=NF;i++)print a[i] "=" $i; print ""}' $file
done

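For each row, it prints the row's entries as name=value pairs, using the header fields as the names.
Edit:
In the first iteration, I want to get all the items that belong to category A: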

A;Bos;user1@mail.com;Driver;123;321
A;Bos;user1@mail.com;Driver;123;321
A;Bos;user1@mail.com;Driver;123;321

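Once I have only these rows, I want to assign their values to variables named after the header fields,
so Category=A, Name=Bos, email=user1@mail.com, Functions=Driver, Owner_ID=123, Backup_ID=321.
After these assignments, I want to perform some operations. I want only the entries that belong to category A, not the others.
Then I will move on to category B and do the same thing, and likewise for category C.
The number of categories is not fixed; there may be 10 or 20 different ones.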


gv8xihay #1

If the file content is simple (no quoting and no semicolons inside fields):

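for f in "$outputfolder"/*.csv; do
    # Read the header line into an array of column names
    IFS=\; read -r -a cols < <(head -n 1 -- "$f")
    # Read each data row into variables named after the columns
    while IFS=\; read -r "${cols[@]}"; do
        # do something here
        declare -p "${cols[@]}"
        echo
    done < <(tail -n +2 -- "$f" | sort -t\; -k1,1 -s)
done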

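Passing -s to a sort that supports it (a stable sort), while sorting only on the first field, keeps rows with the same Category in their original relative order.
I assume the header names are valid bash variable names.
With the sample data, this outputs: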

declare -- Category="A"
declare -- Name="Bos"
declare -- email="user1@mail.com"
declare -- Functions="Driver"
declare -- Owner_ID="123"
declare -- Backup_ID="321"

declare -- Category="A"
declare -- Name="Bos"
declare -- email="user1@mail.com"
declare -- Functions="Driver"
declare -- Owner_ID="123"
declare -- Backup_ID="321"

declare -- Category="A"
declare -- Name="Bos"
declare -- email="user1@mail.com"
declare -- Functions="Driver"
declare -- Owner_ID="123"
declare -- Backup_ID="321"

declare -- Category="B"
declare -- Name="Nos"
declare -- email="user2@mail.com"
declare -- Functions="Builder"
declare -- Owner_ID="456"
declare -- Backup_ID="654"

declare -- Category="C"
declare -- Name="Kos"
declare -- email="user2@mail.com"
declare -- Functions="Engineer"
declare -- Owner_ID="789"
declare -- Backup_ID="987"

declare -- Category="C"
declare -- Name="Kos"
declare -- email="user2@mail.com"
declare -- Functions="Engineer"
declare -- Owner_ID="789"
declare -- Backup_ID="987"

declare -- Category="D"
declare -- Name="Los"
declare -- email="user3@mail.com"
declare -- Functions="Architect"
declare -- Owner_ID="100"
declare -- Backup_ID="1"
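
Because the sort places equal Category values next to each other, a per-category step can be triggered wherever the first field changes. The following is a minimal sketch of that idea, not part of the answer's code: `process_by_category` and the underscore-prefixed local names are hypothetical, and it assumes a `;`-separated file whose headers are valid bash identifiers that do not start with an underscore.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints a marker whenever the first column changes.
process_by_category() {
    local _file=$1 _prev="" _groups=0 _key _name
    local -a _cols
    IFS=\; read -r -a _cols < <(head -n 1 -- "$_file")
    if (( ${#_cols[@]} == 0 )); then
        echo "no header in $_file" >&2
        return 1
    fi
    # Reject header names that are not valid bash identifiers.
    for _name in "${_cols[@]}"; do
        if [[ ! $_name =~ ^[A-Za-z_][A-Za-z0-9_]*$ ]]; then
            echo "invalid header name: $_name" >&2
            return 1
        fi
    done
    local "${_cols[@]}"     # keep the per-row variables out of the global scope
    _key=${_cols[0]}        # name of the grouping column, e.g. Category
    while IFS=\; read -r "${_cols[@]}"; do
        # Rows arrive sorted on column 1, so a changed key starts a new group.
        if (( _groups == 0 )) || [[ ${!_key} != "$_prev" ]]; then
            _groups=$(( _groups + 1 ))
            _prev=${!_key}
            printf '[%s=%s]\n' "$_key" "${!_key}"
        fi
        # Placeholder for the real per-row work: print the remaining columns.
        for _name in "${_cols[@]:1}"; do
            printf ' %s=%s' "$_name" "${!_name}"
        done
        echo
    done < <(tail -n +2 -- "$_file" | sort -t\; -k1,1 -s)
}
```

It could be called as `process_by_category "$f"` inside the `for f in ...` loop; the `printf` lines mark where per-group and per-row processing would go.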

8nuwlpux #2

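I wrote some code that splits the header line and the data lines. Because the variable names are not known in advance, they have to be kept in arrays.
Here is the code: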

#!/bin/bash

file="input.csv"
if [[ ! -f "$file" ]]
then
    echo "ERROR: $file does not exist."
    exit 1
fi

# Build an array of the desired variables
firstline=$(head -n 1 "$file")
# Split on ';' with read so that IFS is changed only for this one command
IFS=';' read -r -a firstlinearray <<< "$firstline"
numberofheaders=${#firstlinearray[@]}

# DEBUG
echo "----- HEADERS -----" >&2
echo "number of headers: $numberofheaders" >&2
for header in "${firstlinearray[@]}"
do
    echo "1 $header" >&2
done

while IFS= read -r line
do
    # Split the row on ';' without changing IFS for the rest of the script
    IFS=';' read -r -a linearray <<< "$line"

    # DEBUG
    echo "----------" >&2
    for item in "${linearray[@]}"
    do
        echo "$item" >&2
    done

    # Assign variables
    echo "----------"
    for (( i=0; i<numberofheaders; i++ ))
    do
        echo "${firstlinearray[$i]} == ${linearray[$i]}"

        # Put your code here to use the variables: ${firstlinearray[$i]}
        # and the values: ${linearray[$i]}

    done

done < <(tail -n +2 "$file" | sort)
  • To run it normally, save it to a file and run: $ script.bash 2>/dev/null
  • If you want the debug log, run: $ script.bash

Its output is:

----------
Category == A
Name == Bos
email == user1@mail.com
Functions == Driver
Owner_ID == 123
Backup_ID == 321
----------
Category == A
Name == Bos
email == user1@mail.com
Functions == Driver
Owner_ID == 123
Backup_ID == 321
----------
Category == A
Name == Bos
email == user1@mail.com
Functions == Driver
Owner_ID == 123
Backup_ID == 321
----------
Category == B
Name == Nos
email == user2@mail.com
Functions == Builder
Owner_ID == 456
Backup_ID == 654
----------
Category == C
Name == Kos
email == user2@mail.com
Functions == Engineer
Owner_ID == 789
Backup_ID == 987
----------
Category == C
Name == Kos
email == user2@mail.com
Functions == Engineer
Owner_ID == 789
Backup_ID == 987
----------
Category == D
Name == Los
email == user3@mail.com
Functions == Architect
Owner_ID == 100
Backup_ID == 1
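
Since the column names are only known at run time, one alternative to indexing two parallel arrays is a bash associative array keyed by header name. This is a minimal sketch under the assumption that bash 4.0 or newer is available; `load_row` and the global array `row` are hypothetical names, not part of the script above.

```shell
#!/usr/bin/env bash
# Sketch only: associative arrays need bash 4.0+.
# load_row and the global array "row" are illustrative names.
declare -A row

load_row() {
    # $1 = header line, $2 = data line, both ';'-separated
    local -a names values
    local i
    IFS=';' read -r -a names  <<< "$1"
    IFS=';' read -r -a values <<< "$2"
    row=()      # forget the previous row
    for (( i = 0; i < ${#names[@]}; i++ )); do
        # Missing trailing fields become empty strings.
        row[${names[i]}]=${values[i]-}
    done
}

load_row 'Category;Name;email' 'A;Bos;user1@mail.com'
echo "${row[Name]} is in category ${row[Category]}"
# prints: Bos is in category A
```

Inside the `while` loop, `load_row "$firstline" "$line"` would then allow lookups such as `${row[Owner_ID]}` by name instead of by position.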

Edit, May 28

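If you want to process only unique rows, and so eliminate duplicates, replace the last line of the code above with this one, which adds the uniq command.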

done < <(tail -n +2 "$file" | sort | uniq)
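
For whole-line comparison on plain data like this, `sort -u` should give the same result as `sort | uniq` with one process fewer. The small check below uses a throwaway file holding a few rows of the sample data; the file and the variable names are illustrative only.

```shell
#!/usr/bin/env bash
# Illustrative only: a temporary file with a few rows of the sample data.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
Category;Name;email;Functions;Owner_ID;Backup_ID
C;Kos;user2@mail.com;Engineer;789;987
A;Bos;user1@mail.com;Driver;123;321
A;Bos;user1@mail.com;Driver;123;321
C;Kos;user2@mail.com;Engineer;789;987
EOF

# Skip the header, then deduplicate in two ways.
with_uniq=$(tail -n +2 "$tmp" | sort | uniq)
with_u=$(tail -n +2 "$tmp" | sort -u)
rm -f "$tmp"

if [ "$with_uniq" = "$with_u" ]; then
    echo "same result"
fi
printf '%s\n' "$with_u"
```

With that in mind, the loop could also end in `done < <(tail -n +2 "$file" | sort -u)`.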
