How do I stream-read a directory in Node.js?

Posted by abithluo on 2023-02-03 in Node.js

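Suppose I have a directory containing 100K+ or even 500K+ files. I want to read the directory with fs.readdir, but it is asynchronous rather than streaming, and I have been told that the async call holds the whole file list in memory until it has finished reading.
So what is the solution? I would like to read the directory with a stream-based approach. Is that possible?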

node.js

Source: https://stackoverflow.com/questions/25757293/how-to-stream-read-directory-in-node-js

6 Answers

qxsslcnc1#

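On a modern computer, traversing a directory with 500K files is nothing. When you call fs.readdir asynchronously in Node.js, all it does is read the list of file names in the specified directory. It does not read the contents of the files. I just tested it with 700K files in a directory, and loading the list of file names took only 21MB of memory.
Once the list is loaded, you can walk through the names one by one, or process them in parallel with a concurrency limit, and consume them all easily.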

var async = require('async'),
    fs = require('fs'),
    path = require('path'),
    parentDir = '/home/user';

async.waterfall([
    function (cb) {
        fs.readdir(parentDir, cb);
    },
    function (files, cb) {
        // `files` is just an array of file names, not full path.

        // Consume 10 files in parallel.
        async.eachLimit(files, 10, function (filename, done) {
            var filePath = path.join(parentDir, filename);

            // Do whatever you want with this file.
            // Then don't forget to call `done()`.
            done();
        }, cb);
    }
], function (err) {
    err && console.trace(err);

    console.log('Done');
});
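
The same list-then-limit idea can also be sketched without the third-party async package, using only fs/promises. This is a minimal sketch, not the answer's own code: the helper name eachFileLimit and its parameters are made up for illustration.

```javascript
const fs = require('node:fs/promises');
const path = require('node:path');

// Hypothetical helper: runs `worker` on every entry of `dir`,
// with at most `limit` workers in flight at any time.
async function eachFileLimit(dir, limit, worker) {
    // `files` is an array of names, not full paths.
    const files = await fs.readdir(dir);
    let next = 0;

    // Each runner keeps claiming the next unprocessed index
    // until the list is exhausted.
    async function runner() {
        while (next < files.length) {
            const filename = files[next++];
            await worker(path.join(dir, filename));
        }
    }

    const runners = Array.from(
        { length: Math.min(limit, files.length) },
        () => runner()
    );
    await Promise.all(runners);
    return files.length;
}
```

It could be called as `await eachFileLimit('/home/user', 10, async (filePath) => { /* ... */ })`. The whole name list is still held in memory, exactly as in the code above; only the processing is throttled.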

rbpvctlc2#

There is now a way to do this with async iteration! You can do:

import fs from 'fs'

const dir = fs.opendirSync('/tmp')

for await (let file of dir) {
  console.log(file.name)
}

To turn it into a stream:

import util from 'util'
import { pipeline, Readable } from 'stream'

const _pipeline = util.promisify(pipeline)
await _pipeline([
  Readable.from(dir),
  ... // consume!
])
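
As one possible way to fill in the consuming side, here is a sketch that pipes the directory entries into an object-mode Writable. It assumes a Node version that ships stream/promises; the helper name streamDir and the onEntry callback are illustrative only and not part of the answer.

```javascript
const { opendir } = require('node:fs/promises');
const { Readable, Writable } = require('node:stream');
const { pipeline } = require('node:stream/promises');

// Hypothetical helper: streams every Dirent of `dirPath` into `onEntry`,
// one entry at a time, with backpressure handled by the pipeline.
async function streamDir(dirPath, onEntry) {
    const dir = await opendir(dirPath);
    await pipeline(
        // A Dir is async iterable, so Readable.from can wrap it directly.
        Readable.from(dir),
        new Writable({
            // Entries are Dirent objects, not bytes.
            objectMode: true,
            write(entry, _encoding, callback) {
                // Support both sync and async callbacks; errors fail the pipeline.
                Promise.resolve()
                    .then(() => onEntry(entry))
                    .then(() => callback(), callback);
            },
        })
    );
}
```

For example, `await streamDir('/tmp', (entry) => console.log(entry.name))` would print each name as it is read.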

ewm0tg9j3#

A more modern answer is to use opendir (added in v12.12.0), which lets you iterate over each file as it is found:

import { opendirSync } from "fs";

const dir = opendirSync("./files");
for await (const entry of dir) {
  console.log("Found file:", entry.name);
}

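fsPromises.opendir / opendirSync return an instance of Dir, which is an iterable that yields a Dirent (directory entry) for each file in the directory.
This is more efficient because each file is returned as it is found, instead of waiting until all of them have been collected.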
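
Because each item is a Dirent rather than a bare name, the entry type can be checked while iterating. A small sketch follows; the helper name countEntries is made up for illustration.

```javascript
const { opendir } = require('node:fs/promises');

// Hypothetical helper: tallies entry types in `dirPath` without ever
// holding the full listing in memory.
async function countEntries(dirPath) {
    const counts = { files: 0, directories: 0, other: 0 };
    const dir = await opendir(dirPath);
    for await (const entry of dir) {
        if (entry.isFile()) {
            counts.files++;
        } else if (entry.isDirectory()) {
            counts.directories++;
        } else {
            // Symlinks, sockets, FIFOs, and so on.
            counts.other++;
        }
    }
    return counts;
}
```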

xpszyzbs4#

Here are two workable solutions:
1. An async generator. You can use the fs.opendir function to create a Dir object, which has a Symbol.asyncIterator property.

import { opendir } from 'fs/promises';

// An async generator that accepts a directory name
const openDirGen = async function* (directory: string) {
    // Create a Dir object for that directory
    const dir = await opendir(directory);

    // Iterate through the items in the directory asynchronously
    for await (const file of dir) {
        // (yield whatever you want here)
        yield file.name;
    }
};

It is used like this:

for await (const name of openDirGen('./src')) {
    console.log(name);
}

2. A Readable stream can be created from the async generator above.

// ...
import { Readable } from 'stream';

// ...

// A function accepting the directory name
const openDirStream = (directory: string) => {
    return new Readable({
        // Set encoding to utf-8 to get the names of the items in
        // the directory as utf-8 strings.
        encoding: 'utf-8',
        // Create a custom read method which is async, but works
        // because it doesn't need to be awaited, as Readable is
        // event-based anyways.
        async read() {
            // Asynchronously iterate through the items names in
            // the directory using the openDirGen generator.
            for await (const name of openDirGen(directory)) {
                // Push each name into the stream, emitting the
                // 'data' event each time.
                this.push(name);
            }
            // Once iteration is complete, manually destroy the stream.
            this.destroy();
        },
    });
};

You can use it like any other Readable stream:

const myDir = openDirStream('./src');

myDir.on('data', (name) => {
    // Logs the file name of each file in my './src' directory
    console.log(name);
    // You can do anything you want here, including actually reading
    // the file.
});

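Both solutions let you iterate over the names of the items in a directory asynchronously, instead of pulling them all into memory at once as fs.readdir does.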

to94eoyn5#

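The answer given by @mstephen19 is correct, but it uses an async generator, which Readable.read() does not support. If you try to turn opendirGen() into a recursive function that descends into subdirectories, it stops working.
Using Readable.from() is the solution here. Below is a modified version of his solution (opendirGen() is still not recursive):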

import { opendir }  from 'node:fs/promises';
import { Readable } from 'node:stream';

async function* opendirGen(dir) {
    for await ( const file of await opendir(dir) ) {
        yield file.name;
    }
};

Readable
    .from(opendirGen('/tmp'), {encoding: 'utf8'})
    .on('data', name => console.log(name));
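
A recursive variant can be sketched along the same lines. This is an illustration rather than part of the answer: the names walk and walkStream are made up, and symbolic links are deliberately not followed (a Dirent for a symlink does not report isDirectory()).

```javascript
const { opendir } = require('node:fs/promises');
const path = require('node:path');
const { Readable } = require('node:stream');

// Hypothetical recursive generator: yields the full path of every
// non-directory entry below `dir`, descending depth-first.
async function* walk(dir) {
    for await (const entry of await opendir(dir)) {
        const fullPath = path.join(dir, entry.name);
        if (entry.isDirectory()) {
            // Delegate to the nested generator for subdirectories.
            yield* walk(fullPath);
        } else {
            yield fullPath;
        }
    }
}

// Wraps the generator in an object-mode Readable stream.
function walkStream(dir) {
    return Readable.from(walk(dir));
}
```

Usage mirrors the code above, for example `walkStream('/tmp').on('data', name => console.log(name))`. One directory handle stays open per nesting level while the walk is in progress.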

uurv41yg6#

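As of version 10, there is still no good solution. Node is simply not that mature yet.
Modern file systems can easily handle millions of files in a single directory, and you can certainly make a good case for doing so in large-scale operations, as you suggest.
The underlying C library iterates over the directory listing one entry at a time, as it should. But every Node implementation I have seen that claims to iterate actually uses fs.readdir, which reads everything into memory as fast as it can.
As I understand it, you have to wait for a new version of libuv to be adopted by Node, and then for the maintainers to address this old issue.
Some improvements are coming in version 12.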
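
For readers on a Node version that has opendir (v12.12.0 or later, as noted in another answer), the one-entry-at-a-time iteration described here can be sketched with Dir.read(). The helper name firstEntries is made up for illustration; it stops early instead of loading the whole listing.

```javascript
const { opendir } = require('node:fs/promises');

// Hypothetical helper: returns at most `max` entry names from `dirPath`,
// reading one directory entry per call instead of the full listing.
async function firstEntries(dirPath, max) {
    const names = [];
    const dir = await opendir(dirPath);
    try {
        let entry;
        // dir.read() resolves to null once the directory is exhausted.
        while (names.length < max && (entry = await dir.read()) !== null) {
            names.push(entry.name);
        }
    } finally {
        // Manual reads do not close the handle automatically.
        await dir.close();
    }
    return names;
}
```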
