Removing unused libraries in R

Posted by c3frrgcw on 2023-03-05 in Other

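I just finished my first script, which compiles a Weibull analysis from a text file. After all my tinkering, I suspect I may have loaded some libraries that aren't used in the final script. Is there a quick way to check which libraries a script actually uses, without having to go through every function?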

Answer #1 (yks3o0rb)

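Below is a script that finds the packages you have loaded but not used in your script. It needs to be run in a clean session, because there is no way to verify that the state of your current session matches the state the script would create. It assumes that packages are loaded only with library() or require(), which is good practice anyway. I haven't tested it extensively, but it seems fairly reasonable.
The comments explain how the code works. It was a fun exercise to write it entirely in base R, so that it doesn't have to load any packages itself.
The idea of using getParseData as a starting point came from Eric Green's answer to this related question.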
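For readers who haven't used getParseData before, here is a minimal illustration on an inline string (the snippet and variable names are made up for the demo). Every name that appears in call position is tagged with the token SYMBOL_FUNCTION_CALL, which is the token this answer's script filters on.

```r
# Parse a small made-up script; keep.source = TRUE is needed so that
# getParseData() has source references to work with.
code <- "library(stats)\nx <- median(c(1, 5, 3))\nprint(x)"
exprs <- parse(text = code, keep.source = TRUE)
pd <- getParseData(exprs, includeText = TRUE)

# Every name used in call position is tagged SYMBOL_FUNCTION_CALL;
# "stats" and "x" are plain SYMBOL tokens and are not included.
calls <- unique(pd$text[pd$token == "SYMBOL_FUNCTION_CALL"])
print(calls)
```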

# Define the file to test in the line below. That is the only per-run configuration needed.
fileToTest <- "Plot.R"

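# Get the parse data for the file. keep.source = TRUE is required: without it
# (the default under Rscript) getParseData() returns NULL.
parseData <- getParseData(parse(fileToTest, keep.source = TRUE), includeText = TRUE)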

# Extract all the function calls and keep a unique list of them.
functionCalls <- unique(parseData[parseData$token == "SYMBOL_FUNCTION_CALL", "text"])

# Look for any calls to `library` or `require` and go two steps up the
# call tree to find the complete call (with arguments).
libraryCalls <- parseData[parseData$token == "SYMBOL_FUNCTION_CALL" & parseData$text %in% c("library", "require"),]
libraryCalls <- parseData[parseData$id %in% libraryCalls$parent,]
libraryCalls <- parseData[parseData$id %in% libraryCalls$parent,]
libraryCalls <- libraryCalls$text

# Execute all the library/require calls to attach them to this session
eval(parse(text = libraryCalls))

# For each function called,
# * Use `getAnywhere` to find out where it is found. That information is in a character
# vector which is the `where` component of the returned list.
# * From that vector of locations, keep only the ones starting with "package:",
# getting rid of those starting with "namespace:".
# * Take the first of these, which should be the first package on the search
# path where the function is found and thus the one that would be used.
names(functionCalls) <- functionCalls
matchPkg <- vapply(functionCalls, 
                   FUN = (\(f) grep("^package:", getAnywhere(f)$where, value = TRUE)[1]), 
                   FUN.VALUE = character(1))

# get a list of all packages from the search path, keep only those that are
# actually packages (not .GlobalEnv, Autoloads, etc.), ignore those that are
# automatically attached (base, methods, datasets, utils, grDevices, graphics, stats),
# and then see which of those did not show up in the list of packages used
# by the functions.
packages <- search()
packages <- grep("^package:", packages, value = TRUE)
packages <- setdiff(packages, c("package:base", "package:methods", "package:datasets", "package:utils", "package:grDevices", "package:graphics", "package:stats"))
packages <- setdiff(packages, unique(matchPkg))

# Report results
if(length(packages) > 0) {
  cat("Unused packages:\n")
  print(packages)
} else {
  cat("No unused packages found.\n")
}
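One caveat of the script above: it only looks at SYMBOL_FUNCTION_CALL tokens, so a function that is only ever passed as an argument (for example nchar in sapply(x, nchar)) is never seen, and its package could be reported as unused. The sketch below, on a made-up snippet, shows how such names appear as plain SYMBOL tokens and how explicit pkg::fun references carry the package name in a SYMBOL_PACKAGE token. Folding these into functionCalls and the final comparison is left as an assumption about how you might extend the script, not something the answer does.

```r
code <- "words <- c(' a ', ' b ')\nn <- sapply(words, nchar)\ny <- tools::file_ext('a.txt')"
pd <- getParseData(parse(text = code, keep.source = TRUE))

# Names called directly, e.g. c(), sapply() and file_ext()
direct_calls <- unique(pd$text[pd$token == "SYMBOL_FUNCTION_CALL"])

# Names passed as arguments (nchar here) are plain SYMBOL tokens, as are
# variables; keep only those that resolve to a function on the search path
symbols <- unique(pd$text[pd$token == "SYMBOL"])
passed_funs <- Filter(function(s) exists(s, mode = "function"), symbols)

# Packages referenced explicitly with pkg::fun or pkg:::fun
ns_pkgs <- unique(pd$text[pd$token == "SYMBOL_PACKAGE"])

print(passed_funs)
print(ns_pkgs)
```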
Answer #2 (4zcjmb1e)

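If you attach libraries with library() or require(), the easiest approach is to search your code for those calls. If you use libraries without attaching them, through the <library>::<export> syntax, search for ::. If you are worried about transitive dependencies, or just want to create a reproducible environment, take a look at the packrat package: http://rstudio.github.io/packrat/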
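A rough base-R sketch of that text search follows. The helper name referenced_packages is hypothetical; it collects package names from library()/require() calls (quoted or unquoted) and from pkg:: / pkg::: prefixes. Because it is a plain regex search, it will also pick up matches inside comments and strings, so treat the result as a list of candidates to review.

```r
# Hypothetical helper: list the packages a script's text refers to.
referenced_packages <- function(text) {
  # library(pkg), library("pkg"), require('pkg'), library(pkg, quietly = TRUE), ...
  attach_pat <- "(library|require)\\(\\s*[\"']?([A-Za-z0-9.]+)[\"']?\\s*[,)]"
  attach_hits <- regmatches(text, gregexpr(attach_pat, text, perl = TRUE))[[1]]
  attached <- sub(attach_pat, "\\2", attach_hits, perl = TRUE)

  # pkg::fun and pkg:::fun
  ns_pat <- "([A-Za-z][A-Za-z0-9.]*):::?"
  ns_hits <- regmatches(text, gregexpr(ns_pat, text, perl = TRUE))[[1]]
  via_ns <- sub(ns_pat, "\\1", ns_hits, perl = TRUE)

  list(attached = unique(attached), via_namespace = unique(via_ns))
}

# Usage on a script file (path is a placeholder):
# referenced_packages(paste(readLines("/path/to/your/script.R"), collapse = "\n"))
```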

Answer #3 (lfapxunr)

This isn't particularly pretty or efficient, but it should do the job (in most cases):

library("stringr")

script_path = "/path/to/your/script.R"
# package names may contain letters, digits and dots (e.g. data.table)
load_command_pattern <- "library\\(\"[A-Za-z0-9.]+\"\\)"

text <- readChar(script_path, file.info(script_path)$size)
pck <- str_extract_all(text, pattern = load_command_pattern)

# Find all instances where packages are loaded
packages <- list()
for(i in seq_along(pck[[1]])){
  p = pck[[1]][i]
  # drop "library" and keep only the package name between the quotes
  name <- str_extract(gsub("library", "", p), "[A-Za-z0-9.]+")
  packages <- append(packages, name, after = length(packages))
}

# Load packages
for(i in seq_along(packages)){
  library(packages[[i]], character.only = TRUE)
}

# Make a list to store packages from which no function is called
remove <- list()
for(i in seq_along(packages)){
  p <- packages[[i]]
  # list all functions contained in the package
  funs <- ls(paste0("package:", p))
  # for every function in the package, check whether "name(" appears in the script;
  # the opening bracket skips bare mentions, and fixed = TRUE avoids regex errors
  # for names containing metacharacters
  in_script <- vapply(funs, function(f) grepl(paste0(f, "("), text, fixed = TRUE), logical(1))
  # if none of the functions are contained in the script, add the package to the list
  if(!any(in_script)){
    remove <- append(remove, p)
  }
}

# Remove loading commands for all unused packages
# (seq_along() makes this a no-op when nothing needs removing)
for(i in seq_along(remove)){
  to_remove <- paste0("library(\"", remove[[i]], "\")")
  text <- gsub(to_remove, "", text, fixed = TRUE)
}

# Save output (to a new file! Don't overwrite your existing script without testing!)
sink(file = "/path/to/your/new_script.R")
cat(gsub("\\r", "", text))
sink()

Note that I assume you load packages with library("package_name"); you may need to adjust the regex pattern.
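What the code does:
1. Read the R script as text.
2. Find every place where packages are loaded. Here I specifically search for calls to library(...) and extract the package name, assuming it consists only of letters, digits and dots.
3. Load the packages and list the functions each one contains. If none of a package's functions is found in the script, add the package name to the list of packages to remove.
4. Remove every statement that loads an unnecessary package. (You could also remove the line breaks.)
5. Write the script text to a new file. Check that the output matches what you expect, and test the new script.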
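Note that this isn't perfect. For example, functions with the same name can exist in several packages. Also, the code currently doesn't distinguish between a match on the full function name and a match on the end of a function name: searching for my_function( will give a false positive for another_my_function(. You could add an extra check to see whether the function name is preceded by a symbol, a line break, or a space. Still, I think the code should work in most cases.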
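One way to add that extra check, sketched with a hypothetical helper calls_function: a PCRE lookbehind rejects a match when the character before the name could itself be part of an R name (letter, digit, dot or underscore), and \Q...\E makes the name match literally. A call to the name inside a comment or string would still count as a match.

```r
# Hypothetical helper: TRUE if `fun` appears in `text` as a call "fun(",
# not merely as the tail of a longer name such as another_fun(.
calls_function <- function(fun, text) {
  pattern <- paste0("(?<![A-Za-z0-9._])\\Q", fun, "\\E\\s*\\(")
  grepl(pattern, text, perl = TRUE)
}

calls_function("my_function", "y <- another_my_function(1)")
calls_function("my_function", "y <- my_function(1)")
```

In the answer's loop, this would replace the grepl(paste0(f, "("), ...) test for each function name.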
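Of course, if you load all your packages at the start of the script, you can also write the list of loaded packages by hand. Likewise, you could just print the list of unused packages and remove them manually.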
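If you do write the list by hand, a small base-R helper along these lines could print the unused ones. The name unused_packages is hypothetical, the listed packages must be installed, and it combines the parse-data idea from the first answer with getNamespaceExports(); like that answer, it only sees direct calls of the form f(...).

```r
# Hypothetical helper: packages from `pkgs` none of whose exported
# functions is called directly in `script_text`.
unused_packages <- function(script_text, pkgs) {
  pd <- getParseData(parse(text = script_text, keep.source = TRUE))
  called <- unique(pd$text[pd$token == "SYMBOL_FUNCTION_CALL"])
  used <- vapply(pkgs, function(p) any(called %in% getNamespaceExports(p)),
                 logical(1))
  pkgs[!used]
}

# Usage (path and package list are placeholders):
# script <- paste(readLines("/path/to/your/script.R"), collapse = "\n")
# print(unused_packages(script, c("dplyr", "ggplot2", "stringr")))
```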
