Building a JSON directory tree with Python's os.walk

Posted by ntjbwcob on 2022-12-30 in Python

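I've hit a mental block on this problem, and none of the other questions I looked at really fit my use case. One came close, but I couldn't work out how to adapt it. Basically, I have a script that uses os.walk() to rename every file in a target directory (and in any subdirectories). The specific problem is that I'm trying to log the results of the operation as JSON, with output like this: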

{
    "timestamp": "2022-12-26 09:40:55.874718",
    "files_inspected": 512,
    "files_renamed": 256,
    "replacement_rules": {
        "%20": "_",
        " ": "_"
    },
    "target_path": "/home/example-user/example-folder",
    "data": [
        {
            "directory": "/home/example-user/example-folder",
            "files": [
                {
                    "original_name": "file 1.txt",
                    "new_name": "file_1.txt"
                },
                {
                    "original_name": "file 2.txt",
                    "new_name": "file_2.txt"
                },
                {
                    "original_name": "file 3.txt",
                    "new_name": "file_3.txt"
                }
            ],
            "children": [
                {
                    "directory": "/home/example-user/example-folder/sub-folder",
                    "files": [
                        {
                            "original_name": "file 1.txt",
                            "new_name": "file_1.txt"
                        },
                        {
                            "original_name": "file 2.txt",
                            "new_name": "file_2.txt"
                        },
                        {
                            "original_name": "file 3.txt",
                            "new_name": "file_3.txt"
                        }
                    ]
                }
            ]
        }
    ]
}

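On the first iteration, the first item of the 3-tuple (dirpath) is the target directory, and the second item (dirnames) is the list of directories inside that dirpath, if any. What trips me up is that on the second iteration, dirpath becomes the first item of dirnames from the previous iteration (if there was one). I'm struggling to work out the logic that turns this stream of 3-tuples into the nested hierarchy above. Ideally, a directory object with no subdirectories would have no children key at all, but setting it to an empty list is fine too.
I'd really appreciate any advice or insight on how to get the desired log structure out of what os.walk() provides. Suggestions for improving the JSON structure itself are also welcome. Thanks!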
https://github.com/dblinkhorn/file_renamer
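
To make the iteration order described above concrete, here is a minimal sketch that walks a throwaway tree. The helper name walk_dirpaths and the sub/deep layout are made up for illustration; only os.walk itself comes from the question.

```python
import os
import tempfile


def walk_dirpaths(top):
    """Return the dirpath of every tuple os.walk yields, in iteration order."""
    return [dirpath for dirpath, dirnames, filenames in os.walk(top)]


# Build a throwaway tree <tmp>/sub/deep and print what os.walk reports.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "sub", "deep"))
    for dirpath, dirnames, filenames in os.walk(tmp):
        # Every directory appears exactly once as dirpath; its subdirectories
        # are listed only by name in dirnames and get their own tuple later.
        print(os.path.relpath(dirpath, tmp), dirnames, filenames)
```

With the default top-down order, a directory is always yielded before any of its subdirectories, so the stream is flat but a parent is always seen before its children.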

Answer #1 (bzzcjhmw)

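One problem with your approach is that you want a hierarchical result, which is most naturally obtained through recursion, whereas os.walk flattens the hierarchy.
For that reason I suggest using os.scandir instead, which also happens to be one of the best-performing tools for working with directory trees.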

import json  # used below to print the result
import os
from datetime import datetime

def rename(topdir, rules, result=None, verbose=False, dryrun=False):
    is_toplevel = result is None
    if is_toplevel:
        result = dict(
            timestamp=datetime.now().isoformat(sep=' ', timespec='microseconds'),
            dryrun=dryrun,
            directories_inspected=0,
            files_inspected=0,
            files_renamed=0,
            replacement_rules=rules,
            target_path=topdir,
        )
    files = []
    children = []
    with os.scandir(topdir) as it:
        for entry in it:
            if entry.is_dir():
                children.append(rename(entry.path, rules, result, verbose, dryrun))
            else:
                result['files_inspected'] += 1
                # Apply every rule in turn, so a name that matches several
                # patterns (e.g. both '%20' and ' ') is cleaned completely.
                newname = entry.name
                for old, new in rules.items():
                    newname = newname.replace(old, new)
                if newname != entry.name:
                    dst = os.path.join(topdir, newname)
                    if not dryrun:
                        os.rename(entry.path, dst)
                        result['files_renamed'] += 1
                    if verbose:
                        print(f'{"[DRY-RUN] " if dryrun else ""}rename {entry.path!r} to {dst!r}')
                    files.append(dict(original_name=entry.name, new_name=newname))
    result['directories_inspected'] += 1
    res = dict(directory=topdir)
    if files:
        res.update(dict(files=files))
    if children:
        res.update(dict(children=children))
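    if is_toplevel:
        # Merge the summary counters into the root node.
        # The dict union operator (|) requires Python 3.9 or later.
        res = result | res
    return res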

Example

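Let's build a reproducible example: a directory example/example-folder containing file 1.txt, file 2.txt, and foo bar 1.txt, plus a nested sub/folder holding the same three files.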
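
One way to create such a tree from Python is sketched below. The layout is inferred from the dry-run output further down (four directories, six files); the helper name build_example_tree is an assumption for illustration and is not part of the original answer.

```python
from pathlib import Path


def build_example_tree(base="example"):
    """Create base/example-folder and base/example-folder/sub/folder,
    each holding the same three empty files. Returns base as a Path."""
    names = ["file 1.txt", "file 2.txt", "foo bar 1.txt"]
    top = Path(base) / "example-folder"
    for folder in (top, top / "sub" / "folder"):
        # parents=True also creates the intermediate 'sub' directory.
        folder.mkdir(parents=True, exist_ok=True)
        for name in names:
            (folder / name).touch()
    return Path(base)
```

Calling build_example_tree() with no argument creates the tree under ./example in the current working directory.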
Now, using the rename() function above:

rules = {'%20': '_', ' ': '_'}
res = rename('example', rules, verbose=True, dryrun=True)
# [DRY-RUN] rename 'example/example-folder/file 2.txt' to 'example/example-folder/file_2.txt'
# [DRY-RUN] rename 'example/example-folder/file 1.txt' to 'example/example-folder/file_1.txt'
# [DRY-RUN] rename 'example/example-folder/sub/folder/file 2.txt' to 'example/example-folder/sub/folder/file_2.txt'
# [DRY-RUN] rename 'example/example-folder/sub/folder/file 1.txt' to 'example/example-folder/sub/folder/file_1.txt'
# [DRY-RUN] rename 'example/example-folder/sub/folder/foo bar 1.txt' to 'example/example-folder/sub/folder/foo_bar_1.txt'
# [DRY-RUN] rename 'example/example-folder/foo bar 1.txt' to 'example/example-folder/foo_bar_1.txt'

>>> print(json.dumps(res, indent=4))
{
    "timestamp": "2022-12-29 15:24:06.930252",
    "dryrun": true,
    "directories_inspected": 4,
    "files_inspected": 6,
    "files_renamed": 0,
    "replacement_rules": {
        "%20": "_",
        " ": "_"
    },
    "target_path": "example",
    "directory": "example",
    "children": [
        {
            "directory": "example/example-folder",
            "files": [
                {
                    "original_name": "file 2.txt",
                    "new_name": "file_2.txt"
                },
                {
                    "original_name": "file 1.txt",
                    "new_name": "file_1.txt"
                },
                {
                    "original_name": "foo bar 1.txt",
                    "new_name": "foo_bar_1.txt"
                }
            ],
            "children": [
                {
                    "directory": "example/example-folder/sub",
                    "children": [
                        {
                            "directory": "example/example-folder/sub/folder",
                            "files": [
                                {
                                    "original_name": "file 2.txt",
                                    "new_name": "file_2.txt"
                                },
                                {
                                    "original_name": "file 1.txt",
                                    "new_name": "file_1.txt"
                                },
                                {
                                    "original_name": "foo bar 1.txt",
                                    "new_name": "foo_bar_1.txt"
                                }
                            ]
                        }
                    ]
                }
            ]
        }
    ]
}
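
To read such a nested log back, a short recursive generator is usually enough. The sketch below assumes the directory/files/children keys shown in the output above; the name iter_renames is made up for illustration.

```python
import os


def iter_renames(node):
    """Yield (old_path, new_path) for every rename recorded under node."""
    directory = node["directory"]
    # 'files' and 'children' are optional keys, so default to empty lists.
    for entry in node.get("files", []):
        yield (os.path.join(directory, entry["original_name"]),
               os.path.join(directory, entry["new_name"]))
    for child in node.get("children", []):
        yield from iter_renames(child)
```

Because every node stores its full directory path, the full old and new paths can be rebuilt without tracking the parent chain during traversal.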
Answer #2 (fruv7luv)

I'm not sure I've understood correctly, but to get a simple nested structure from os.walk you could try the following:

import json
import os
from typing import Union

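structure = {}

def get_element(dirpath: str) -> Union[dict, None]:
    """Return the nested dict for dirpath, or None if it is not in structure yet."""
    # `root` is the module-level parent directory of the target, set below.
    _element = structure[root]
    # Make the path relative to root before splitting it into components.
    if dirpath.startswith(root):
        dirpath = dirpath[len(root)+1:]
    # Descend one path component at a time.
    for key in dirpath.split(os.sep):
        try:
            _element = _element[key]
        except KeyError:
            return None
    return _element

target = os.path.abspath('sample')
# Split the absolute target into its parent directory (root) and its own name.
root, _ = target.rsplit(os.sep, 1)
structure[root] = {}
for path, children, files in os.walk(target):
    element = get_element(path)
    if element is None:
        # Only the target itself is missing here; every deeper directory was
        # registered as a child while its parent was being visited.
        element = structure[root][path.split(os.sep)[-1]] = {}
    element['files'] = files
    # Pre-create an empty dict per subdirectory so later iterations find it.
    for child in children:
        element[child] = {}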

print(json.dumps(structure, sort_keys=True, indent=4))

which produces output like this:

{
    "/path/to/folder": {
        "sample": {
            "dir": {
                "files": [
                    "more_samples.txt"
                ],
                "subdir": {
                    "files": [
                        "important.txt"
                    ]
                },
                "with": {
                    "children": {
                        "files": [
                            "other.txt",
                            "some.txt"
                        ]
                    },
                    "files": []
                }
            },
            "files": [
                "test.txt"
            ]
        }
    }
}

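Does this help?
Note: this is a minimal example that tries to address the main part of the request. You'll need to build the rest of your structure around it.
Note 2: if you have a subfolder named files somewhere, the files key will collide with it. Choose your key names wisely.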
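
One way to sidestep the key collision, sketched here as an assumption rather than as part of the answer above, is to keep subdirectories in a children list, as in the asker's desired layout, and to look parents up in a dict keyed by path. The function name tree_from_walk is hypothetical.

```python
import os


def tree_from_walk(top):
    """Build nested {'directory', 'files', 'children'} nodes from os.walk."""
    top = os.path.normpath(top)  # drop any trailing separator
    nodes = {}                   # dirpath -> node, used to find parents
    for dirpath, dirnames, filenames in os.walk(top):
        node = {"directory": dirpath, "files": sorted(filenames)}
        nodes[dirpath] = node
        if dirpath != top:
            # Top-down order guarantees the parent was registered earlier.
            parent = nodes[os.path.dirname(dirpath)]
            # 'children' is only created for directories that have any.
            parent.setdefault("children", []).append(node)
    return nodes[top]
```

Since directory names only ever appear as values of the directory key, a subfolder called files or children cannot clash with the structural keys.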
