csv macOS - Merge multiple text files into one spreadsheet, one file per column?

4dbbbstv  posted on 2023-04-18  in  Mac

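Using the macOS Terminal, is it possible to take a directory of text files and combine them all into one spreadsheet (CSV or Numbers format), so that:

  • Each file is in its own column.
  • Each line of a txt file is in its own row.
  • The files are placed in the spreadsheet in alphabetical order (by the first letter of each text file's name).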

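Example 1: Here is what my text files look like before merging:

Example 2: Here is what my text files look like in the spreadsheet after merging:

(These examples are partial excerpts. I actually have 100 files to merge.)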

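Steps I have tried:

1. I searched Stack Overflow for answers, but all the other questions about this task use Python or pandas. I would prefer a solution that works directly from the macOS Terminal, without installing packages such as Python or pandas.
2. From my research, I think the paste command could be used:
paste -d '\t' *.txt > ^0-merged.csv
However, when I try this, it gives the following error message: paste: Too many open files. It also produces a completely blank CSV file.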

u0njafvf1#

You can append each file in a loop.

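touch merged.csv
# paste puts each new file on the left, so feed the names in reverse order to end up alphabetical;
# "$f" is quoted so file names with spaces work
printf '%s\n' *.txt | sort -r | while IFS= read -r f; do paste -d '\t' "$f" merged.csv > temp; mv temp merged.csv; done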

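You have to create the file first, because paste fails if it cannot find the file. Note that this empty starting file leaves a trailing tab (an empty last column) on every row.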
https://unix.stackexchange.com/questions/205642/combining-large-amount-of-files
Added a new approach for file names that contain spaces.

#!/bin/bash
touch merged.csv

# save and change IFS
OLDIFS=$IFS
IFS=$'\n'
 
# read all file names into an array
fileArray=($(find ./ -name "*.txt" | sort))
 
# restore it
IFS=$OLDIFS
 
# get length of an array
tLen=${#fileArray[@]}
 
# loop over the file names in reverse: paste puts each new file on the left,
# so the columns end up in alphabetical order
for (( i=tLen-1; i>=0; i-- ));
do
  paste -d '\t' "${fileArray[$i]}" merged.csv > temp
  mv temp merged.csv
done

zbwhf8kr2#

Ruby is part of macOS.
Given:

head -n 3 *.txt
==> GOOD THINGS IN LIFE.txt <==
Art
Fun
Hugs

==> IN THE BACKYARD.txt <==
Hose
Tree
Soil

==> KITCHEN CUPBOARD ESSENTIALS.txt <==
Tea
Rice
Milk

==> KNITTING STITCHES.txt <==
Rib
Dip
Seed

# and the rest of your lines in each case...

You can do:

ruby -e '
a=[]
ARGV.sort.each{|fn|
    a<<[fn]+File.read(fn).split(/\R/)   # File.read closes each file right away
}
a.transpose.each{|sa|
    puts sa.join(",")
}
' *.txt

Prints:

GOOD THINGS IN LIFE.txt,IN THE BACKYARD.txt,KITCHEN CUPBOARD ESSENTIALS.txt,KNITTING STITCHES.txt
Art,Hose,Tea,Rib
Fun,Tree,Rice,Dip
Hugs,Soil,Milk,Seed
Earth,Fence,Salt,Tile
Honor,Porch,Pesto,Linen
Space,Patio,Flour,Cable
Sport,Grass,Honey,Wicker
Intelligence,Wading Pool,Baking Powder,Knotted Boxes
Innovation,Welcome Mat,Vegetable Oil,Chinese Wave
Confidence,Back Stoop,Tomato Paste,Checkerboard
Good Deeds,Fruit Tree,Black Pepper,Herringbone
Creativity,Downspout,Baking Soda,Stockinette
Education,Birdbath,Ketchup,Garter
Kindness,Terrace,Surer,Waffle
Integrity,Planter,Sugar,Puri Ridge
Faith,Carport,Coffee,Netted
Friends,Flowerbed,Cinnamon,Elongated
Respect,Shovel,Cheese,Farrow Rib
People,Hedges,Bread,Plaited
Yourself,Rocks,Olive Oil,Clamshell
Happiness,Lawnmower,Crackers,Bamboo
Heart,Hot Tub,Pasta,English Rib
Religion,Garden,Scissors,Basket
Wisdom,Stoop,Garlic,Raspberry

If you want a "proper" CSV with quoted fields, which works better with Excel, you can use the CSV module that ships with Ruby:

ruby -r csv -e '
a=[]
ARGV.sort.each{|fn|
    a<<[fn]+File.read(fn).split(/\R/)   # File.read closes each file right away
}
a=a.transpose
puts CSV.generate(**{headers:true, quote_empty:true, force_quotes:true}){|csv|
    csv<<a[0]
    a[1..].each{|row|
        csv<<row
    }
}
' *.txt

Prints:

"GOOD THINGS IN LIFE.txt","IN THE BACKYARD.txt","KITCHEN CUPBOARD ESSENTIALS.txt","KNITTING STITCHES.txt"
"Art","Hose","Tea","Rib"
"Fun","Tree","Rice","Dip"
"Hugs","Soil","Milk","Seed"
"Earth","Fence","Salt","Tile"
"Honor","Porch","Pesto","Linen"
"Space","Patio","Flour","Cable"
"Sport","Grass","Honey","Wicker"
"Intelligence","Wading Pool","Baking Powder","Knotted Boxes"
"Innovation","Welcome Mat","Vegetable Oil","Chinese Wave"
"Confidence","Back Stoop","Tomato Paste","Checkerboard"
"Good Deeds","Fruit Tree","Black Pepper","Herringbone"
"Creativity","Downspout","Baking Soda","Stockinette"
"Education","Birdbath","Ketchup","Garter"
"Kindness","Terrace","Surer","Waffle"
"Integrity","Planter","Sugar","Puri Ridge"
"Faith","Carport","Coffee","Netted"
"Friends","Flowerbed","Cinnamon","Elongated"
"Respect","Shovel","Cheese","Farrow Rib"
"People","Hedges","Bread","Plaited"
"Yourself","Rocks","Olive Oil","Clamshell"
"Happiness","Lawnmower","Crackers","Bamboo"
"Heart","Hot Tub","Pasta","English Rib"
"Religion","Garden","Scissors","Basket"
"Wisdom","Stoop","Garlic","Raspberry"
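
To see why the quoted form is safer, here is a minimal sketch, assuming a made-up row in which one entry contains a comma. The sample values and the method names naive_line and quoted_line are illustrative only and do not come from the files above.

```ruby
require "csv"

# Joins fields with bare commas, as in the first one-liner; nothing is escaped.
def naive_line(row)
  row.join(",")
end

# Lets the CSV module quote every field (force_quotes), so embedded commas survive.
def quoted_line(row)
  CSV.generate_line(row, force_quotes: true)
end

row = ["Salt, sea", "Tea"]   # hypothetical sample data
puts naive_line(row)
puts quoted_line(row)
```

The plain join turns the two-field row into three comma-separated pieces, while the line produced by the CSV module parses back to the original two fields.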

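Notes:
"It also puts the file name at the top of each column. Is there any way to omit the file names? Also, it seems to sort uppercase A-Z and lowercase a-z separately (so the A-Z file names come first, then the a-z file names). Thanks!"
The version below leaves out the file names and sorts them case-insensitively. If you have files of different lengths, you can also pad the ends of the shorter files so that you still have a proper matrix to transpose: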

ruby -r csv -e '
a=[]
ARGV.sort_by{|s| s.downcase}.each{|fn|
    a<<File.read(fn).split(/\R/)   # File.read closes each file right away
}
max_length=a.max_by{|sa| sa.length}.length
a.each.with_index{|sa,i| 
    if sa.length<max_length then a[i].concat [""]*(max_length-sa.length) end }
a=a.transpose
puts CSV.generate(**{headers:true, quote_empty:true, force_quotes:true}){|csv|
    csv<<a[0]
    a[1..].each{|row|
        csv<<row
    }
}
' *.txt
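
The sorting and padding steps can also be written as a small standalone method. This is only a sketch: columns_to_rows and the sample hash are hypothetical names and data, and the method takes file contents as in-memory arrays instead of reading *.txt. Padding matters because Array#transpose raises an IndexError when the inner arrays differ in length.

```ruby
# Sorts the file names case-insensitively, pads the shorter columns with
# empty strings, and transposes the columns into spreadsheet rows.
# `columns` is a hypothetical Hash of file name => array of lines.
def columns_to_rows(columns)
  names  = columns.keys.sort_by(&:downcase)
  height = columns.values.map(&:length).max || 0
  padded = names.map do |name|
    lines = columns[name]
    lines + [""] * (height - lines.length)   # builds a new array, input is untouched
  end
  padded.transpose
end

# Made-up sample data with mixed-case names and uneven lengths.
sample = {
  "banana.txt" => ["Hose", "Tree"],
  "Apple.txt"  => ["Art", "Fun", "Hugs"],
  "cherry.txt" => ["Tea"],
}
columns_to_rows(sample).each { |row| puts row.join(",") }
```

With this sample, Apple.txt becomes the first column even though a plain sort would place the lowercase names after all the uppercase ones, and the shorter columns are filled with empty cells.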
