R: compute the share of resources used by node i relative to its neighbors on a large network

pgx2nnw8 posted on 2023-03-10 in Other


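Objective: the main goal is to compute the share of resources used by node i relative to its neighbors: r_i / sum_j^i{r_j}

where r_i is the resource of node i, and sum_j^i{r_j} is the sum of the resources of i's neighbors.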
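As a minimal worked example of the formula, assuming a hypothetical three-node network (the names `resources` and `neighbors` and all values are illustrative, not taken from the data):

```python
# Hypothetical toy network: resources r_i and neighbor lists (illustrative values).
resources = {"a": 10.0, "b": 30.0, "c": 20.0}
neighbors = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}


def share(node):
    """Return r_i divided by the sum of r_j over the neighbors j of node.

    Returns None when the neighbors' resources sum to zero (share undefined).
    """
    total = sum(resources[j] for j in neighbors[node])
    return resources[node] / total if total else None


shares = {node: share(node) for node in resources}
print(shares)  # {'a': 0.2, 'b': 3.0, 'c': 2.0}
```

Node a holds 10 units against 50 held by its neighbors, hence 0.2.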
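I am open to any R, Python, or Stata solution, as long as it gets the job done; I am close to giving up. Snippets of my previous attempts are shown below.
To reach this goal, I tried to perform the following kind of search: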
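| Node | col 1 | col 2 | col 3 |
| --- | --- | --- | --- |
| i | [A] | [list] | list |
| j | [A, B, i] | | |

Search for i in col 1; if it is found (here in the list of j), update col 1 of i:

| Node | col 1 | col 2 | col 3 |
| --- | --- | --- | --- |
| i | [A, j] | [list] | list |
| j | [A, B, i] | | |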
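The search-and-update step above can be sketched as a single pass over the lists. This is only an illustration: the function name `symmetrize` and the dictionary layout are assumptions, and insertion-ordered dicts are used as ordered sets so that membership checks stay cheap.

```python
def symmetrize(adjacency):
    """Return a copy in which j is listed under i whenever i is listed under j."""
    # One insertion-ordered dict per node, used as an ordered set of neighbors.
    result = {node: dict.fromkeys(nbrs) for node, nbrs in adjacency.items()}
    for j, nbrs in adjacency.items():
        for i in nbrs:
            # Found i in the list of j, so add j to the list of i.
            result.setdefault(i, {})[j] = None
    return {node: list(nbrs) for node, nbrs in result.items()}


col1 = {"i": ["A"], "j": ["A", "B", "i"]}
print(symmetrize(col1)["i"])  # ['A', 'j']
```

Nodes that only ever appear inside a list (A and B here) also receive an entry of their own.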

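Data: the data frame has about 700k rows, and each list can contain up to 20 elements. The lists in col 1 to col 3 can be empty. Entries are stored as strings, e.g. "1579301860".

The first 10 entries of df: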

df[["ID","s22_12","s22_09","s22_04"]].head(10)
,ID,s22_12,s22_09,s22_04
0,547232925,[],[],[]
1,1195452119,[],[],[]
2,543827523,[],[],[]
3,1195453927,[],[],[]
4,1195456863,[],[],[]
5,403735824,[],[],[]
6,403985344,[],[],[]
7,1522725190,"['547232925', '1561895862', '1195453927', '1473969746', '1576299336', '1614620375', '1526127302', '1523072827', '398988727', '1393784634', '1628271142', '1562369345', '1615273511', '1465706815', '1546795725']","['1550103038', '547232925', '1614620375', '1500554025', '1526127302', '1523072827', '1554793443', '1393784634', '1603417699', '1560658585', '1533511207', '1439071476', '1527861165', '1539382728', '1545880720']","['1529732185', '1241865116', '1524579382', '1523072827', '1526127302', '1560851415', '1535455909', '1457280850', '1577015775', '1600877852', '1549989930', '1528007558', '1533511207', '1527861165', '1591602766']"
8,789656124,[],[],[]
9,662539468,[1195453927],[],[]
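Since the list columns arrive as text, one way to get from this layout to a long edge list is to parse each cell and emit one row per element. The sketch below assumes every cell is a Python-style list literal stored as a string, as in the sample; `parse_neighbors` and `to_long` are hypothetical helper names.

```python
import ast


def parse_neighbors(cell):
    """Parse a cell such as "['547232925', '1561895862']" or "[]" into ID strings."""
    return [str(item) for item in ast.literal_eval(cell)]


def to_long(rows, columns):
    """Yield one (column, node_id, neighbor_id) triple per list element."""
    for row in rows:
        for column in columns:
            for neighbor in parse_neighbors(row[column]):
                yield column, str(row["ID"]), neighbor


# Two rows copied from the sample above (rows 0 and 9, first two list columns).
rows = [
    {"ID": 547232925, "s22_12": "[]", "s22_09": "[]"},
    {"ID": 662539468, "s22_12": "[1195453927]", "s22_09": "[]"},
]
edges = list(to_long(rows, ["s22_12", "s22_09"]))
print(edges)  # [('s22_12', '662539468', '1195453927')]
```

Converting every ID to a string keeps quoted and unquoted entries comparable.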

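What I tried: in R, I exploded the lists and put the data into long format. Then I tried two main approaches in R:

1. Load the long data into igraph, apply neighbors() to each node of the graph, save the results in a list, and use plyr to build neighbor_df (this works, but 2 nodes take 67 seconds).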

library(igraph)  # provides neighbors()

# Initialize the result data frame
result <- data.frame(Node = nodes)
#result <- as.data.frame(matrix(NA, nrow = n_nodes, ncol = 0))
# Collect the neighbor names of every node; NA marks a node without neighbors
neighbor_lists <- lapply(nodes, function(x) {
  nb <- names(neighbors(graph, x))
  if (length(nb) == 0) {
    nb <- NA
  }
  nb
})
# Bind the ragged lists into one row per node, padded with NA
neighbor_df <- plyr::ldply(neighbor_lists, rbind)
names(neighbor_df) <- paste0("Neighbor", seq_len(ncol(neighbor_df)))
result <- cbind(result, neighbor_df)

2. Use data.table: read the long format, split it, and apply dcast to each split (<- memory overload).

library(data.table)

# Long format: one row per edge, numbered in input order
result_long <- edges[, .(Node = from, Neighbor = to)][, Number := .I][order(Number)]
# Batch id per 100,000 rows, intended for splitting the dcast
result_long[, cast_cat := findInterval(Number, seq(100000, 6000000, 100000))]
# reshape to wide
result_wide <- dcast(result_long, Node ~ Number, value.var = "Neighbor", fill = "")
# Only tested on sample data; the target data has 19 million rows, so dcast has to be split, but then it consumes 200 GB of RAM
value_cols <- names(result_wide)[-1]
result_wide[, (value_cols) := lapply(.SD, function(x) ifelse(x == "", NA, x)), .SDcols = value_cols]
# na_move() is presumably dedupewider::na_move(), which shifts NA values to the right
result_wide <- na_move(result_wide, cols = value_cols)
# Drop columns that are entirely NA
result_wide <- Filter(function(x) !all(is.na(x)), result_wide)

I posted these attempts because Andy asked for them, but I think they make the question more confusing.


Source: https://stackoverflow.com/questions/75333666/compute-share-of-resources-used-by-node-i-w-r-t-neighbors-on-a-large-network

1 answer


osh3o9ms1#

Thanks to the comment by @Stefano Barbi:

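library(igraph)
library(Matrix)
library(data.table)

# extract the resource attribute of every vertex
r <- vertex_attr(g, "rcount", index = V(g))

# create a sparse (dgCMatrix) adjacency matrix from the graph;
# as_adjacency_matrix() replaces the deprecated get.adjacency()
m <- as_adjacency_matrix(g)

# premultiply the adjacency matrix to get the sum of the neighbors' resources;
# as.numeric() keeps zero entries that a sparse @x slot would drop
sum_of_rj <- as.numeric(r %*% m)

# add the node's own resources
sum_of_r <- sum_of_rj + r

# compute the vector of shares
share <- r / sum_of_r

sh_tab <- data.table(i = colnames(m), sh = share)
sh_tab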
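For readers who prefer Python, the same computation can be sketched without a matrix library by accumulating over an edge list in one pass. This is an illustrative equivalent rather than the answer's code: it assumes an undirected graph with each edge listed once, and, like the R snippet above, it adds the node's own resources to the denominator. The name `resource_shares` and the toy data are hypothetical.

```python
from collections import defaultdict


def resource_shares(resources, edges):
    """Return r_i / (r_i + sum of r_j over the neighbors j of i) for every node.

    resources: dict mapping node -> r_i
    edges: iterable of undirected (u, v) pairs, each listed once
    """
    neighbor_sum = defaultdict(float)
    for u, v in edges:
        # Each edge contributes the other endpoint's resources to both ends.
        neighbor_sum[u] += resources[v]
        neighbor_sum[v] += resources[u]
    shares = {}
    for node, r in resources.items():
        total = r + neighbor_sum[node]
        # A zero denominator leaves the share undefined.
        shares[node] = r / total if total else float("nan")
    return shares


resources = {"a": 10.0, "b": 30.0, "c": 20.0}
edges = [("a", "b"), ("a", "c")]
print(resource_shares(resources, edges))
```

The work grows linearly with the number of edges and never materializes a wide table, which is the same reason the sparse matrix product scales.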

2023-03-10
