How do I sort os.walk(path) in alphanumeric order in Python 3, with duplicates coming after the original file?

Posted by f45qwnt8 on 2023-03-04 in Python

In Python 3 (specifically Python 3.10.6), how can I change the order in which os.walk(path) returns the files it finds? This is the order I want:

IMG0001.jpg
IMG0002.jpg
IMG0002(1).jpg
IMG0002(2).jpg
IMG0003.jpg

How would you sort the files so that every (n) duplicate comes after its original? At the moment, os.walk(path) orders them like this:

IMG0001.jpg
IMG0002(1).jpg
IMG0002(2).jpg
IMG0002.jpg
IMG0003.jpg

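I guess the main issue is that the default sort ranks ( (and also -) ahead of the . in the extension. If that is correct, how would you change which special characters sort before others?
I tried sorted(files), but it sorts the same way os.walk(path) does. If I try sorted(files, reverse=True), the originals do come before their duplicates, but the duplicates are now in reverse order and so are all the originals, i.e.: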

IMG0003.jpg
IMG0002.jpg
IMG0002(2).jpg
IMG0002(1).jpg
IMG0001.jpg
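
The guess about character ordering can be checked directly. Python compares strings by Unicode code point, and both ( and - have smaller code points than ., so a name that continues with ( sorts before the same name continuing with the extension dot. A minimal sketch, using file names from the listings above (the variable names are only for illustration):

```python
# '(' and '-' both have smaller code points than '.', so they sort first.
print(ord("("), ord("-"), ord("."))  # 40 45 46

names = ["IMG0002.jpg", "IMG0002(1).jpg", "IMG0002(2).jpg"]

# Default string comparison is code point by code point.
# The names first differ right after "IMG0002": '(' versus '.'.
default_order = sorted(names)
print(default_order)
# ['IMG0002(1).jpg', 'IMG0002(2).jpg', 'IMG0002.jpg']
```

This ordering is built into string comparison itself, so changing the result means passing a custom key to sorted() rather than changing how individual characters rank.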

python-3.x

Source: https://stackoverflow.com/questions/75599415/how-to-sort-os-walkpath-in-alphanumeric-order-with-duplicates-coming-after-th

2 Answers

tcomlyy61#

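Strings sort lexicographically, so if you want something different you need a custom sort key. This is a little more involved than you might expect, but something like the following should work: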

import os
import re

def key(fname):
    basename, ext = os.path.splitext(fname)
    v = 0
    if m := re.match(r"(.*)\((\d+)\)$", basename):
        basename, v = m.groups()
        v = int(v)
    return basename, ext, v

Now you should be able to use something like files.sort(key=key).
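
To connect this back to os.walk, here is a minimal sketch of applying the key per directory. The helper name walk_sorted is hypothetical and not part of the answer. Note that os.walk does not promise any particular order itself (it reports entries in whatever order the operating system returns them), so the sorting has to happen on the files list it yields.

```python
import os
import re

def key(fname):
    # Same key as above: (base name, extension, duplicate number).
    basename, ext = os.path.splitext(fname)
    v = 0
    if m := re.match(r"(.*)\((\d+)\)$", basename):
        basename, v = m.groups()
        v = int(v)
    return basename, ext, v

def walk_sorted(path):
    """Yield file paths under `path`, originals before their (n) duplicates.

    Hypothetical helper, shown only as an illustration.
    """
    for root, dirs, files in os.walk(path):
        # Sorting dirs in place controls the order os.walk descends in
        # (this works because topdown=True is the default).
        dirs.sort()
        for name in sorted(files, key=key):
            yield os.path.join(root, name)
```

Usage would look like `for p in walk_sorted(path): print(p)`.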

2023-03-04

6tr1vspr2#

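Use pathlib.Path to get a better grasp of the filename semantics, build a tuple with the special-case parts as the leading elements and the filename as the last element, sort the list of tuples, and keep only the last element of each.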

def test():

    from pathlib import Path

    def filenamesort(inp : list[str]):
        """build a list of custom tuples from the filename list, sort it and 
        return the rightmost field, which is the filename.
        """
        
        def tupleize(v):
            """ returns a tuple of strings based on Path.stem for the filename

            special case.  split into the part before the last `(` and what comes after

            IMG0002(1).jpg => ('IMG0002', 1, 'IMG0002(1).jpg')

            normal case, return the stem and an empty value

            IMG0002.jpg  => ('IMG0002', 0, 'IMG0002.jpg')

            The last element, least significant to sort is the filename

            to be more solid foo(xxx).jpg should be ignored as xxx is not a numeric.
            
            """
            
            pa = Path(v)
            stem = pa.stem
            if stem.endswith(")"):
                lead, seq = stem.rsplit("(",maxsplit=1)
                return (lead,int(seq.rstrip(")")),v)
            else:
                # no "(n)" marker: 0 sorts before any duplicate number
                return (stem,0,v)

        li = [tupleize(v) for v in inp]

        #sort the list then return the last position in the tuple: the filename proper
        return [v[-1] for v in sorted(li)]

    
    def fmt(sin : str):
        res = [v for line in sin.splitlines() if (v:=line.strip())]
        return res

    inp = fmt("""IMG0001.jpg
IMG0002(1).jpg
IMG0002(2).jpg
IMG0002(11).jpg
IMG0002.jpg
IMG0003.jpg
""")

    exp = fmt("""IMG0001.jpg
IMG0002.jpg
IMG0002(1).jpg
IMG0002(2).jpg
IMG0002(11).jpg
IMG0003.jpg""")


    dataexp = [
        (inp,exp),
        ]

    for inp, exp in dataexp:
        for f in [filenamesort]:
            got = f(inp)
            msg = f"\n{f.__name__} for {str(inp):100.100} \nexp :{exp}:\ngot :{got}:\n"
            if exp == got:
                print(f"✅! {msg}")
            else:
                print(f"❌!  {msg}")

test()

Output:

✅! 
filenamesort for ['IMG0001.jpg', 'IMG0002(1).jpg', 'IMG0002(2).jpg', 'IMG0002(11).jpg', 'IMG0002.jpg', 'IMG0003.jpg'] 
exp :['IMG0001.jpg', 'IMG0002.jpg', 'IMG0002(1).jpg', 'IMG0002(2).jpg', 'IMG0002(11).jpg', 'IMG0003.jpg']:
got :['IMG0001.jpg', 'IMG0002.jpg', 'IMG0002(1).jpg', 'IMG0002(2).jpg', 'IMG0002(11).jpg', 'IMG0003.jpg']:

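I checked whether tupleize could be used as the key argument to sort. It can: sorted(inp, key=tupleize) works as well.
Wim is right, this was failing when the extensions differ. The fix is the tweak below: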

from pathlib import Path

def tupleize(v):
    pa = Path(v)
    stem = pa.stem
    if stem.endswith(")"):
        lead, seq = stem.rsplit("(", maxsplit=1)
        return (lead, pa.suffix, int(seq.rstrip(")")), v)
    else:
        # no "(n)" marker: 0 sorts before any duplicate number
        return (stem, pa.suffix, 0, v)
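
The docstring above notes that a name like foo(xxx).jpg should be ignored because xxx is not numeric; with the code as posted, int() would raise ValueError on it. Below is a sketch of one way to handle that, assuming a regular expression is an acceptable replacement for the endswith/rsplit check. The pattern name _DUP and the sample names are made up for illustration.

```python
import re
from pathlib import Path

# Matches a stem that ends in "(digits)", e.g. "IMG0002(11)".
_DUP = re.compile(r"(.*)\((\d+)\)")

def tupleize(v):
    pa = Path(v)
    m = _DUP.fullmatch(pa.stem)
    if m:
        # duplicate: (base stem, extension, duplicate number, filename)
        return (m.group(1), pa.suffix, int(m.group(2)), v)
    # original, or a non-numeric "(xxx)" suffix: treat as number 0
    return (pa.stem, pa.suffix, 0, v)

names = [
    "IMG0002(1).jpg", "foo(xxx).jpg", "IMG0002.png",
    "IMG0002.jpg", "IMG0002(11).jpg", "IMG0002(2).jpg",
]
result = sorted(names, key=tupleize)
print(result)
# ['IMG0002.jpg', 'IMG0002(1).jpg', 'IMG0002(2).jpg',
#  'IMG0002(11).jpg', 'IMG0002.png', 'foo(xxx).jpg']
```

Because the extension is the second tuple element, IMG0002.png sorts after all the IMG0002 .jpg files instead of being interleaved with their duplicates.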

2023-03-04
