Linux: split a huge file into multiple text files

Posted by jvlzgdj9 on 2023-06-21 in Linux

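I have a huge text file that I want to split based on:

  • size (2 GB per file; the last file may be smaller than 2 GB)
  • newlines (each file should end at the end of a line, never mid-line)
  • output as text files
  • with "this is first line" added at the beginning of each file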

For example, given a file like this:

textdata1
textdata2
textdata3
textdata4
textdata5
textdata6

I would like the output to be textfile_1.txt:

this is first line
textdata1
textdata2

and possibly textfile_2.txt:

this is first line
textdata3
textdata4
textdata5
textdata6

I tried using split with the -b <size> option, but it splits in the middle of a line.

bfnvny8b #1

If every character is a single byte, you could do something like this (untested, should work with any awk):

awk '
    BEGIN {
        maxLgth = 2 * (1000 ^ 3)     # 2 GB; use 1024 ^ 3 for 2 GiB if appropriate
        hdr = "this is first line"
        outLgth = maxLgth + 1        # forces the first input line to open the first output file
    }
    {
        lineLgth = length($0) + 1    # +1 for the newline that print adds
        if ( (outLgth + lineLgth) > maxLgth ) {
            if ( out != "" ) {
                close(out)           # release the previous output file
            }
            out = "textfile_" (++outCnt) ".txt"
            print hdr > out
            outLgth = length(hdr) + 1    # header plus its newline
        }
        print > out
        outLgth += lineLgth
    }
' file
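
The single-byte caveat matters because, in some awk implementations and locales, length() counts characters rather than bytes, so the size accounting can undercount for non-ASCII text. Running awk under LC_ALL=C is one commonly used way to make it count bytes. The Python sketch below only illustrates the gap between the two counts; the sample strings and the helper name byte_len are made up for illustration.

```python
# Hypothetical sample lines, chosen only to show where the counts diverge.
samples = ["textdata1", "naïve", "数据"]


def byte_len(line: str, encoding: str = "utf-8") -> int:
    """Bytes the line occupies on disk, including one trailing newline."""
    return len(line.encode(encoding)) + 1


for s in samples:
    # Character count (plus newline) versus byte count (plus newline).
    print(s, len(s) + 1, byte_len(s))
```

For pure ASCII input the two numbers are identical, which is the case the awk script above assumes.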

nmpmafwu #2

This Python script may help:

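def split_file(file_path, size_limit=2 * 1024**3, first_line="this is first line"):
    """Split file_path into textfile_N.txt parts of at most size_limit bytes,
    breaking only at line ends and starting each part with first_line."""
    header = (first_line + "\n").encode("utf-8")
    counter = 0
    written = 0          # bytes written to the current part, header included
    output_file = None

    # Binary mode keeps the size accounting in exact bytes and avoids decoding errors
    with open(file_path, "rb") as f:
        for line in f:
            # Start a new part for the first line, or when this line would exceed the limit
            if output_file is None or written + len(line) > size_limit:
                if output_file is not None:
                    output_file.close()
                counter += 1
                output_file = open(f"textfile_{counter}.txt", "wb")
                output_file.write(header)
                written = len(header)
            output_file.write(line)
            written += len(line)

    if output_file is not None:
        output_file.close()


# Call the function
split_file("yourfile.txt")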
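
Whichever approach is used, it may be worth checking the result before deleting the original. The following is a minimal sketch of such a check, not part of either answer: the function name check_parts and its parameters are hypothetical. It assumes every part starts with the same header line and that the parts are passed in order.

```python
import os


def check_parts(original, parts, header=b"this is first line\n",
                size_limit=2 * 1024**3, chunk=1 << 20):
    """Return True if parts (in order) are a line-aligned split of original."""
    total = os.path.getsize(original)
    with open(original, "rb") as src:
        for part in parts:
            # Each part, header included, must respect the size limit.
            if os.path.getsize(part) > size_limit:
                return False
            with open(part, "rb") as f:
                # Each part must begin with the expected header line.
                if f.read(len(header)) != header:
                    return False
                last = b"\n"
                while True:
                    data = f.read(chunk)
                    if not data:
                        break
                    # The body must match the next bytes of the original.
                    if src.read(len(data)) != data:
                        return False
                    last = data[-1:]
            # A part may end without a newline only at the very end of the original.
            if last != b"\n" and src.tell() != total:
                return False
        # Nothing from the original may be left over.
        return src.read(1) == b""
```

It could be called as check_parts("yourfile.txt", ["textfile_1.txt", "textfile_2.txt"]). The comparison streams through the files in chunks, so it does not need to hold a 2 GB part in memory.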
