Shell: find groups of files that end with the same 17 characters

Posted by sqserrrh on 2023-11-21 in Shell

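I'm scraping files whose names have a unique part and a common part, and I'm trying to match on the common part. I'm currently trying this with bash, but I could use Python or something else.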

file1_02_01_2021_002244.mp4
file2_02_01_2021_002244.mp4
file3_02_01_2021_002244.mp4
# _02_01_2021_002244.mp4 should be the 'match all files that contain this string'

file1_03_01_2021_092200.mp4
file2_03_01_2021_092200.mp4
file3_03_01_2021_092200.mp4
# _03_01_2021_092200.mp4 is the match
...    
file201_01_01_2022_112230.mp4
file202_01_01_2022_112230.mp4
file203_01_01_2022_112230.mp4
# _01_01_2022_112230.mp4 is the match

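The goal is to find all files that match from the end of the file name back to the first unique character, and then move them into a folder. The action part will be easy; I just need help with the matching.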

find -type f $("all that match the same last 17 characters of the file name"); do
    do things
done

Here is my sample directory:

total 28480
drwxr-xr-x  2 user  user    64B Feb 24 10:49 dir1
drwxr-xr-x  2 user  user    64B Feb 24 10:49 dir2
-rw-r--r--  2 user  user   6.8M Feb 24 08:59 file1_02_01_2021_002244.mp4
-rw-r--r--  2 user  user   468K Feb 24 09:06 file1_03_01_2021_092200.mp4
-rw-r--r--  2 user  user   4.5M Feb 24 08:59 file2_02_01_2021_002244.mp4
-rw-r--r--  2 user  user   665K Feb 24 09:06 file2_03_01_2021_092200.mp4
-rw-r--r--  1 user  user     0B Feb 24 10:49 otherfile1
-rw-r--r--  1 user  user     0B Feb 24 10:49 otherfile2

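I got it working using the suggested answer marked as correct. The Python approach would probably work better (especially for file names with spaces in them), but I'm not proficient enough in Python to make it do everything I want. The complete script is below: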

#!/usr/local/bin/bash
# this is my solution
# create array with patterns (everything from the first underscore onward)
# note: this relies on word splitting, so it breaks on names containing spaces
aPATTERN=($(find . -type f -name "*.mp4" | sed 's/^[^_]*//' | sort -u))

# iterate through all patterns, do things
for each in "${aPATTERN[@]}"; do
        # create a temp working directory for files that match the pattern
        vDIR=$(gmktemp -d -p "$(pwd)")
        # create array of all files found matching the pattern
        aFIND+=($(find . -mindepth 1 -maxdepth 1 -type f -iname "*$each"))
        # move all files that match the pattern to the working temp directory
        for file in "${aFIND[@]}"; do
                mv -iv "$file" "$vDIR"
        done
        # reset the found files array, get ready for next pattern
        aFIND=()
done

Source: https://stackoverflow.com/questions/71242031/find-groups-of-files-that-end-with-the-same-17-characters

4 Answers

cgfeq70w1#

In Python:

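import os

os.chdir("folder_path")

# Pair each file name with its last 22 characters, e.g. "_02_01_2021_002244.mp4"
data = [[file[-22:], file] for file in os.listdir()]

# Group the file names by that shared suffix
output = {}
for pattern, filename in data:
    output.setdefault(pattern, []).append(filename)
print(output)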

This builds a dict that maps each pattern to the list of files that end with it.

Output:

{
    '_03_01_2021_092200.mp4': ['file1_03_01_2021_092200.mp4', 'file3_03_01_2021_092200.mp4', 'file2_03_01_2021_092200.mp4'], 
    '_01_01_2022_112230.mp4': ['file202_01_01_2022_112230.mp4', 'file201_01_01_2022_112230.mp4', 'file203_01_01_2022_112230.mp4'], 
    '_02_01_2021_002244.mp4': ['file1_02_01_2021_002244.mp4', 'file2_02_01_2021_002244.mp4', 'file3_02_01_2021_002244.mp4']
}
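
The grouping above stops at printing the dict, while the question also asks to move each group into a folder. A minimal sketch of that extra step, assuming each group should land in a subfolder named after its shared suffix; the function names and the folder naming are illustrative choices, not part of the original answer:

```python
import shutil
from collections import defaultdict
from pathlib import Path


def group_by_suffix(folder, suffix_len=22):
    """Map the last `suffix_len` characters of each .mp4 name to its files."""
    groups = defaultdict(list)
    # sorted() materializes the listing before anything is moved
    for path in sorted(Path(folder).iterdir()):
        if path.is_file() and path.suffix == ".mp4":
            groups[path.name[-suffix_len:]].append(path)
    return dict(groups)


def move_groups(folder, suffix_len=22):
    """Move every group into a subfolder named after its shared suffix."""
    for suffix, paths in group_by_suffix(folder, suffix_len).items():
        # Assumed naming: "_02_01_2021_002244.mp4" -> "02_01_2021_002244"
        target = Path(folder) / Path(suffix).stem.lstrip("_")
        target.mkdir(exist_ok=True)
        for path in paths:
            shutil.move(str(path), str(target / path.name))
```

Calling `move_groups("folder_path")` would leave files such as `otherfile1` where they are, because only names ending in `.mp4` are grouped. Working with `Path` objects also sidesteps the word-splitting problems that file names with spaces cause in the shell version.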

vcudknz32#

Try playing with this.
First, get all the patterns, sorted and unique:

find ./data -type f -name "*.mp4" | sed 's/^[^_]*//'|sort -u

Or, using a regex:

find ./data -type f -regextype sed -regex '.*_[0-9]\{2\}_[0-9]\{2\}_[0-9]\{4\}_[0-9]\{6\}\.mp4$'| sed 's/^[^_]*//'|sort -u

Then loop over the patterns with a while loop to find the files for each pattern:

while IFS= read -r pattern
do
   # find and exec
   find ./data -type f -name "*$pattern" -exec mv {} /to/whatever/you/want/ \;
   # or, alternatively, find and xargs (use one or the other)
   # find ./data -type f -name "*$pattern" -print0 | xargs -0 -I {} mv {} /to/whatever/you/want/
done < <(find ./data -type f -name "*.mp4" | sed 's/^[^_]*//'|sort -u)
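
One caveat with `sed 's/^[^_]*//'`: it cuts at the first underscore of the whole path, so an underscore in a directory name or in the unique part of the file name would produce a longer pattern than intended. As a rough Python counterpart to the pattern-extraction step, the sketch below takes the suffix from the end of the base name with a regex. It assumes the suffix always has the `_NN_NN_NNNN_NNNNNN.mp4` shape used in the question; `SUFFIX_RE` and `unique_patterns` are illustrative names:

```python
import re
from pathlib import PurePath

# Same shape as the find -regex filter: _NN_NN_NNNN_NNNNNN.mp4 at the end of the name
SUFFIX_RE = re.compile(r"_[0-9]{2}_[0-9]{2}_[0-9]{4}_[0-9]{6}\.mp4$")


def unique_patterns(paths):
    """Return the sorted, de-duplicated suffixes, similar to `sed ... | sort -u`."""
    found = set()
    for path in paths:
        # Only look at the base name, so underscores in directories are ignored
        match = SUFFIX_RE.search(PurePath(path).name)
        if match:
            found.add(match.group(0))
    return sorted(found)


paths = [
    "./data/file1_02_01_2021_002244.mp4",
    "./data/sub_dir/my_file2_02_01_2021_002244.mp4",
    "./data/file1_03_01_2021_092200.mp4",
    "./data/otherfile1",
]
print(unique_patterns(paths))
# prints ['_02_01_2021_002244.mp4', '_03_01_2021_092200.mp4']
```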

8tntrjer3#

There are several ways to do this, including writing a bash script, but if it were me I would take the quick and simple route. Use grep and read:

PATTERN=_02_01_2021_002244.mp4
# pipe the grep output into the loop; -F treats the pattern as a literal string
find . -name '*.mp4' | grep -F -- "$PATTERN" | while IFS= read -r A; do echo "$A"; done

There may be a better way that I haven't thought of, but this gets the job done.

qyyhg6bp4#

Try this:

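#!/bin/bash

# +(...) is an extended glob ("one or more"); enable it explicitly
# in case the running bash does not do so inside [[ ]] on its own
shopt -s extglob

while IFS= read -r line
do
    if [[ "$line" == *_+([0-9])_+([0-9])_+([0-9])_+([0-9])\.mp4 ]]
    then
        echo "MATCH: $line"
    else
        echo "no match: $line"
    fi
done < <(/bin/ls -c1)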

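Remember that when you build the pattern, [[ ... == ... ]] uses globbing, not regex.
That is why I did not use [0-9]{2} to match 2 digits: in globbing, {} does not mean what it means in a regex.
To use a regular expression instead, use: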

#!/bin/bash

while IFS= read -r line
do
    # grep -q exits with status 0 on a match; the pattern is anchored at the end of the name
    if grep -qE '_[0-9]{2}_[0-9]{2}_[0-9]{4}_[0-9]{6}\.mp4$' <<< "$line"
    then
        echo "MATCH: $line"
    else
        echo "no match: $line"
    fi
done < <(/bin/ls -c1)

This is a more precise match, because you can specify how many digits each sub-pattern accepts.
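
The same glob-versus-regex distinction can be illustrated in Python, where `fnmatch` does shell-style globbing and `re` does regular expressions. This is only a sketch with made-up sample names; note that Python's `fnmatch` has no equivalent of bash's `+(...)`, so the glob below spells out each digit instead:

```python
import fnmatch
import re

names = ["file1_02_01_2021_002244.mp4", "file1_2_1_21_1.mp4", "otherfile1"]

# Glob: [0-9] matches exactly one digit and there is no {n} repetition,
# so a fixed width has to be written one digit at a time.
GLOB = "*_[0-9][0-9]_[0-9][0-9]_[0-9][0-9][0-9][0-9]_" + "[0-9]" * 6 + ".mp4"

# Regex: {n} states the number of digits directly.
REGEX = re.compile(r"_[0-9]{2}_[0-9]{2}_[0-9]{4}_[0-9]{6}\.mp4$")

glob_hits = [n for n in names if fnmatch.fnmatchcase(n, GLOB)]
regex_hits = [n for n in names if REGEX.search(n)]
print(glob_hits)   # prints ['file1_02_01_2021_002244.mp4']
print(regex_hits)  # prints ['file1_02_01_2021_002244.mp4']
```

The second sample name, `file1_2_1_21_1.mp4`, would be accepted by the looser `+([0-9])` glob in the first script, since that only requires one or more digits per field; both fixed-width patterns here reject it.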
