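I am trying to track down a memory leak that appears when the code below, runeveryminute.js, runs once a minute. On every run, memory usage grows by about 5 MB and is never garbage collected. When I inspect the process with devtools, Window objects appear to take up most of the space, which makes me think this is a jsdom issue, but I can't figure out why.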
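**Example of runeveryminute.js**
For the purposes of this example:
- "Homepage.html" is a saved copy of the Hacker News homepage (https://news.ycombinator.com/)
- "intermediate.pem" is a file containing the set of SSL certificates I am using

**The code is as follows:**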
import {JSDOM} from 'jsdom';
import fetch from 'node-fetch';
import UserAgent from 'user-agents';
import path from 'path';
import sslrootcas from 'ssl-root-cas';
const rootCas = sslrootcas.create();
import {fileURLToPath} from 'url';
const __filename = fileURLToPath(import.meta.url);
const __dirname = path.dirname(__filename);
rootCas.addFile(path.resolve(__dirname,'intermediate.pem'));
import http from 'node:http';
import https from 'node:https';
import fs from 'fs';
export const emailstripperSO = function(){
  fs.readFile(__dirname+'/config/Homepage.html', 'utf8', function(err, storeBody) {
    let dom = new JSDOM(storeBody);
    let atagpromises = [];
    const httpAgent = new http.Agent({
      keepAlive: true
    });
    const httpsAgent = new https.Agent({ca: rootCas});
    for (var alist of dom.window.document.querySelectorAll("a")){
      let atagpromise = new Promise ((resolve,reject)=>{
        let oldURL = alist.href;
        let requestOptions = {
          host: oldURL.split('/')[2]
          ,path: '/'+oldURL.split('/').slice(3).join('/')
          ,headers: {"User-Agent": new UserAgent() }
        };
        if (oldURL.split(":")[0] && ["http", "https", "mailto"].includes(oldURL.split(":")[0] )
        ){
          if (oldURL.substr(0,4)!="http"){
            return resolve(oldURL);
          }
          if (oldURL.substr(0,5)=="http:" ) {
            var myAgent = httpAgent;
          }
          if (oldURL.substr(0,5)=="https" ) {
            var myAgent = httpAgent;
          }
          // // return fetch(oldURL, {
          fetch(oldURL, {
            method: "GET"
            ,headers: {"User-Agent": new UserAgent() }
            , redirect: "manual"
            // , agent: myAgent
            ,agent: function(_parsedURL) {
              if (_parsedURL.protocol == 'http:') { return httpAgent;}
              if (_parsedURL.protocol == 'https:') { return httpsAgent;}
            }
          })
          .then(response => {
            var newURL = response.url;
            console.log("new linke "+newURL);
            alist.href = newURL ;
          })
          .catch(err=>{
            console.log("error");
            console.log(err);
            resolve(oldURL);
          });
        }
        // }
        else {
          console.log("Skipped visiting <"+oldURL+">");
          resolve(oldURL);
        }
      });
      atagpromises.push(atagpromise);
    };
    Promise.all(atagpromises)
    .then(data=>{
      var serializedDom = dom.serialize();
      //do more stuff with serializedDom here
    })
    .catch(err=>{
      console.log("Error in atagpromises: "+err);
    });
  });
}
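**What I have tried:** I suspected the problem might lie in the Agent, so I tried several different ways of writing Agent(), as you can see in the commented-out sections, but every one of them produced the memory leak. The Agent needs to switch between http and https because redirect links sometimes redirect between http and https URLs.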
1 Answer
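It is hard to say whether you actually have a memory leak, or whether the garbage collector simply decides not to clean up as long as enough memory is still available.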
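One way to tell the two cases apart is to sample the heap after a forced collection and see whether it still climbs from run to run. The following is a minimal sketch under assumptions, not part of the original answer: `heapUsedMB` and `isGrowingSteadily` are hypothetical helper names, and the forced collection only happens when Node.js is started with the `--expose-gc` flag.

```javascript
// Hypothetical helper (not from the question): heap in use, in MB.
// When Node.js is started with --expose-gc, a full collection is forced
// first, so what remains is memory the collector could not reclaim.
function heapUsedMB() {
  if (typeof globalThis.gc === 'function') {
    globalThis.gc();
  }
  return process.memoryUsage().heapUsed / (1024 * 1024);
}

// Hypothetical check: true when every sample exceeds the previous one by
// more than minGrowthMB, which points to a leak rather than lazy collection.
function isGrowingSteadily(samples, minGrowthMB) {
  if (samples.length < 2) return false;
  for (let i = 1; i < samples.length; i++) {
    if (samples[i] - samples[i - 1] <= minGrowthMB) return false;
  }
  return true;
}

// Usage sketch: record one sample after each run of the job.
const samples = [];
samples.push(heapUsedMB());
console.log(`heap after GC: ${samples[samples.length - 1].toFixed(1)} MB`);
```

If the samples keep rising even after forced collections, something is still holding references; if they level off, the collector was most likely just deferring its work.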
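You can limit the memory Node.js uses by capping the old space with the `--max-old-space-size` flag. The number you pass is the maximum size in MB. Finding a suitable value takes some experimentation with several values.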
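As a rough illustration, the script could be started with a cap and then report which cap it was given. The value 256 below is an arbitrary assumption, not a number from the answer, and `configuredOldSpaceMB` is a hypothetical helper that only sees flags passed directly on the command line (not ones set through `NODE_OPTIONS`) and only the `=value` form.

```javascript
// Launch with an assumed example cap of 256 MB:
//   node --max-old-space-size=256 runeveryminute.js

// Hypothetical helper: extracts the old-space limit (in MB) from a list of
// command-line flags, or returns null when the flag is absent.
function configuredOldSpaceMB(flags) {
  for (const flag of flags) {
    const match = /^--max[-_]old[-_]space[-_]size=(\d+)$/.exec(flag);
    if (match) return Number(match[1]);
  }
  return null;
}

// process.execArgv holds the Node.js/V8 options given before the script name.
const limitMB = configuredOldSpaceMB(process.execArgv);
console.log(limitMB === null
  ? 'No old-space limit on the command line; the V8 default applies.'
  : `Old-space limit: ${limitMB} MB`);
```

With a cap in place, a process that is merely collecting lazily should settle below the limit, whereas a real leak should eventually end in a "JavaScript heap out of memory" crash.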