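I'm currently struggling to transform CSV data for visualization with the ggsankey package in R. My goal is to build a new data frame from the data stored in a CSV file, in which I can match symbols between species. The data frame should contain the following columns: "x", "node", "next_x", and "next_node". Here, "x" is the species name, "node" is the current symbol, "next_x" is the name of the following species, and "next_node" should be the matching symbol in the following species.
Here is my CSV data: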
species symbol start stop orientation
Homo_sapiens SLC35A1 1 2 1
Homo_sapiens RARS2 2 3 -1
Homo_sapiens ORC3 3 4 1
Homo_sapiens AKIRIN2 4 5 -1
Homo_sapiens SPACA1 5 6 1
Homo_sapiens CNR1 6 7 -1
Homo_sapiens RNGTT 7 8 -1
Homo_sapiens PNRC1 8 9 1
Homo_sapiens PM20D2 9 10 1
Homo_sapiens SRSF12 10 11 -1
Homo_sapiens GABRR1 11 12 -1
Mus_musculus GABRR1 1 2 1
Mus_musculus PM20D2 2 3 -1
Mus_musculus SRSF12 3 4 1
Mus_musculus PNRC1 4 5 -1
Mus_musculus RNGTT 5 6 1
Mus_musculus CNR1 6 7 1
Mus_musculus SPACA1 7 8 -1
Mus_musculus AKIRIN2 8 9 1
Mus_musculus ORC3 9 10 -1
Mus_musculus RARS2 10 11 1
Mus_musculus SLC35A1 11 12 -1
Rattus_norvegicus GABRR1 1 2 1
Rattus_norvegicus PM20D2 2 3 -1
Rattus_norvegicus SRSF12 3 4 1
Rattus_norvegicus PNRC1 4 5 -1
Rattus_norvegicus RNGTT 5 6 1
Rattus_norvegicus CNR1 6 7 1
Rattus_norvegicus SPACA1 7 8 -1
Rattus_norvegicus AKIRIN2 8 9 1
Rattus_norvegicus ORC3 9 10 -1
Rattus_norvegicus RARS2 10 11 1
Rattus_norvegicus SLC35A1 11 12 -1
Canis_lupus_familiaris SLC35A1 1 2 1
Canis_lupus_familiaris RARS2 2 3 -1
Canis_lupus_familiaris ORC3 3 4 1
Canis_lupus_familiaris AKIRIN2 4 5 -1
Canis_lupus_familiaris SPACA1 5 6 1
Canis_lupus_familiaris CNR1 6 7 -1
Canis_lupus_familiaris RNGTT 7 8 -1
Canis_lupus_familiaris PNRC1 8 9 1
Canis_lupus_familiaris SRSF12 9 10 -1
Canis_lupus_familiaris PM20D2 10 11 1
Canis_lupus_familiaris GABRR1 11 12 -1
Monodelphis_domestica SLC35A1 1 2 1
Monodelphis_domestica RARS2 2 3 -1
Monodelphis_domestica ORC3 3 4 1
Monodelphis_domestica AKIRIN2 4 5 -1
Monodelphis_domestica SPACA1 5 6 1
Monodelphis_domestica CNR1 6 7 -1
Monodelphis_domestica RNGTT 7 8 -1
Monodelphis_domestica PNRC1 8 9 1
Monodelphis_domestica SRSF12 9 10 -1
Monodelphis_domestica PM20D2 10 11 1
Monodelphis_domestica GABRR1 11 12 -1
Ornithorhynchus_anatinus SLC35A1 1 2 1
Ornithorhynchus_anatinus RARS2 2 3 -1
Ornithorhynchus_anatinus ORC3 3 4 1
Ornithorhynchus_anatinus AKIRIN2 4 5 -1
Ornithorhynchus_anatinus SPACA1 5 6 1
Ornithorhynchus_anatinus CNR1 6 7 -1
Ornithorhynchus_anatinus RNGTT 7 8 -1
Ornithorhynchus_anatinus PNRC1 8 9 1
Ornithorhynchus_anatinus PM20D2 9 10 1
Ornithorhynchus_anatinus LOC100076186 10 11 -1
Ornithorhynchus_anatinus LOC114805750 11 12 1
Gallus_gallus PM20D2 1 2 -1
Gallus_gallus PNRC1 2 3 -1
Gallus_gallus BORCS6 3 4 1
Gallus_gallus RNGTT 4 5 1
Gallus_gallus LOC101749895 5 6 1
Gallus_gallus CNR1 6 7 1
Gallus_gallus SPACA1 7 8 -1
Gallus_gallus AKIRIN2 8 9 1
Gallus_gallus ORC3 9 10 -1
Gallus_gallus RARS2 10 11 1
Gallus_gallus SLC35A1 11 12 -1
Taeniopygia_guttata CFAP206 1 2 1
Taeniopygia_guttata SLC35A1 2 3 1
Taeniopygia_guttata RARS2 3 4 -1
Taeniopygia_guttata ORC3 4 5 1
Taeniopygia_guttata AKIRIN2 5 6 -1
Taeniopygia_guttata CNR1 6 7 -1
Taeniopygia_guttata RNGTT 7 8 -1
Taeniopygia_guttata BORCS6 8 9 -1
Taeniopygia_guttata PNRC1 9 10 1
Taeniopygia_guttata PM20D2 10 11 1
Taeniopygia_guttata GABRR1 11 12 -1
Chelonia_mydas SLC35A1 1 2 1
Chelonia_mydas RARS2 2 3 -1
Chelonia_mydas ORC3 3 4 1
Chelonia_mydas AKIRIN2 4 5 -1
Chelonia_mydas SPACA1 5 6 1
Chelonia_mydas CNR1 6 7 -1
Chelonia_mydas RNGTT 7 8 -1
Chelonia_mydas LOC102938330 8 9 -1
Chelonia_mydas PNRC1 9 10 1
Chelonia_mydas PM20D2 10 11 1
Chelonia_mydas GABRR1 11 12 -1
Anolis_carolinensis PM20D2 1 2 -1
Anolis_carolinensis SRSF12 2 3 1
Anolis_carolinensis PNRC1 3 4 -1
Anolis_carolinensis RNGTT 4 5 1
Anolis_carolinensis LOC107982676 5 6 -1
Anolis_carolinensis CNR1 6 7 1
Anolis_carolinensis SPACA1 7 8 -1
Anolis_carolinensis AKIRIN2 8 9 1
Anolis_carolinensis ORC3 9 10 -1
Anolis_carolinensis RARS2 10 11 1
Anolis_carolinensis SLC35A1 11 12 -1
Xenopus_laevis GABRR2.S 1 2 1
Xenopus_laevis GABRR1.S 2 3 1
Xenopus_laevis PM20D2.S 3 4 -1
Xenopus_laevis LOC108717975 4 5 1
Xenopus_laevis RNGTT.S 5 6 1
Xenopus_laevis CNR1.S 6 7 1
Xenopus_laevis AKIRIN2.S 7 8 1
Xenopus_laevis ORC3.S 8 9 -1
Xenopus_laevis RARS2.S 9 10 1
Xenopus_laevis SLC35A1.S 10 11 -1
Xenopus_laevis LOC108717977 11 12 1
Latimeria_chalumnae DDX24 1 2 -1
Latimeria_chalumnae PPP4R4 2 3 1
Latimeria_chalumnae SERPINA10B 3 4 -1
Latimeria_chalumnae ARRDC3A 4 5 1
Latimeria_chalumnae LOC102360869 5 6 -1
Latimeria_chalumnae CNR1 6 7 1
Latimeria_chalumnae SPACA1 7 8 -1
Latimeria_chalumnae AKIRIN2 8 9 1
Latimeria_chalumnae ORC3 9 10 -1
Latimeria_chalumnae RARS2 10 11 1
Latimeria_chalumnae LOC102362557 11 12 1
Protopterus_annectens LOC122794922 1 2 1
Protopterus_annectens LOC122794923 2 3 1
Protopterus_annectens LOC122794924 3 4 1
Protopterus_annectens FBXL5 4 5 1
Protopterus_annectens CC2D2A 5 6 -1
Protopterus_annectens CNR1 6 7 1
Protopterus_annectens CPEB2 7 8 -1
Protopterus_annectens BOD1L1 8 9 -1
Protopterus_annectens C1QTNF7 9 10 -1
Protopterus_annectens NKX3-2 10 11 1
Protopterus_annectens RAB28 11 12 1
Danio_rerio MYO6A 1 2 1
Danio_rerio LOC569340 2 3 -1
Danio_rerio MEI4 3 4 1
Danio_rerio NT5E 4 5 1
Danio_rerio SNX14 5 6 -1
Danio_rerio CNR1 6 7 -1
Danio_rerio RNGTT 7 8 -1
Danio_rerio PNRC1 8 9 1
Danio_rerio GABRR1 9 10 -1
Danio_rerio GABRR2B 10 11 -1
Danio_rerio UBE2J1 11 12 -1
Oreochromis_niloticus SI:DKEY-174M14.3 1 2 1
Oreochromis_niloticus RDH14B 2 3 -1
Oreochromis_niloticus LOC102078481 3 4 1
Oreochromis_niloticus RNGTT 4 5 1
Oreochromis_niloticus LOC112842425 5 6 -1
Oreochromis_niloticus CNR1 6 7 1
Oreochromis_niloticus AKIRIN2 7 8 1
Oreochromis_niloticus RARS2 8 9 1
Oreochromis_niloticus SLC35A1 9 10 -1
Oreochromis_niloticus LOC100692709 10 11 -1
Oreochromis_niloticus LOC102081816 11 12 1
Scyliorhinus_canicula SLC35A1 1 2 1
Scyliorhinus_canicula RARS2 2 3 -1
Scyliorhinus_canicula ORC3 3 4 1
Scyliorhinus_canicula AKIRIN2 4 5 -1
Scyliorhinus_canicula LOC119967921 5 6 1
Scyliorhinus_canicula CNR1 6 7 -1
Scyliorhinus_canicula RNGTT 7 8 -1
Scyliorhinus_canicula LOC119967175 8 9 -1
Scyliorhinus_canicula PNRC1 9 10 1
Scyliorhinus_canicula LOC119967178 10 11 1
Scyliorhinus_canicula LOC119967180 11 12 -1
Petromyzon_marinus LOC116953416 1 2 -1
Petromyzon_marinus LOC116953419 2 3 -1
Petromyzon_marinus CEP162 3 4 1
Petromyzon_marinus FBXL22 4 5 -1
Petromyzon_marinus RNGTT 5 6 1
Petromyzon_marinus CNR1 6 7 1
Petromyzon_marinus AKIRIN2 7 8 1
Petromyzon_marinus ORC3 8 9 -1
Petromyzon_marinus RARS2 9 10 1
Petromyzon_marinus SLC35A1 10 11 -1
Petromyzon_marinus RHBDL2 11 12 1
To achieve this, I wrote the following code, which I believed should produce the desired output:
library(dplyr)
# Read the CSV data
data <- read.csv("cnr1.csv")
# Create a new DataFrame with the desired columns
new_data <- data.frame(x = character(0), node = character(0), next_x = character(0), next_node = character(0), stringsAsFactors = FALSE)
# Fill the new DataFrame based on the given rules
for (i in 1:(nrow(data) - 1)) {
  row <- data[i, ]
  next_row <- data[i + 1, ]
  next_species <- next_row$species
  next_symbol <- next_row$symbol
  matching_symbol <- ifelse(next_species == next_row$species, next_symbol, "NA")
  new_row <- data.frame(
    x = row$species,
    node = row$symbol,
    next_x = next_species,
    next_node = matching_symbol
  )
  new_data <- bind_rows(new_data, new_row)
}
# Add the last row with NA values
last_row <- data[nrow(data), ]
new_data <- bind_rows(new_data, data.frame(
  x = last_row$species,
  node = last_row$symbol,
  next_x = "NA",
  next_node = "NA"
))
# Write the transformed data to a new CSV file
output_filename <- "output.csv"
write.csv(new_data, file = output_filename, row.names = FALSE)
cat(paste("Output written to", output_filename))
However, the current output is not what I expected. I get the following:
x,node,next_x,next_node
Homo_sapiens,SLC35A1,Homo_sapiens,NA
Homo_sapiens,RARS2,Homo_sapiens,NA
Homo_sapiens,ORC3,Homo_sapiens,NA
Homo_sapiens,AKIRIN2,Homo_sapiens,NA
Homo_sapiens,SPACA1,Homo_sapiens,NA
Homo_sapiens,CNR1,Homo_sapiens,NA
Homo_sapiens,RNGTT,Homo_sapiens,NA
Homo_sapiens,PNRC1,Homo_sapiens,NA
Homo_sapiens,PM20D2,Homo_sapiens,NA
Homo_sapiens,SRSF12,Homo_sapiens,NA
Homo_sapiens,GABRR1,Mus_musculus,Mus_musculus
Mus_musculus,GABRR1,Mus_musculus,NA
Mus_musculus,PM20D2,Mus_musculus,NA
Mus_musculus,SRSF12,Mus_musculus,NA
Mus_musculus,PNRC1,Mus_musculus,NA
Mus_musculus,RNGTT,Mus_musculus,NA
Mus_musculus,CNR1,Mus_musculus,NA
Mus_musculus,SPACA1,Mus_musculus,NA
Mus_musculus,AKIRIN2,Mus_musculus,NA
Mus_musculus,ORC3,Mus_musculus,NA
Mus_musculus,RARS2,Mus_musculus,NA
Mus_musculus,SLC35A1,Rattus_norvegicus,NA
.
.
.
Petromyzon_marinus,LOC116953416,Petromyzon_marinus,NA
Petromyzon_marinus,LOC116953419,Petromyzon_marinus,NA
Petromyzon_marinus,CEP162,Petromyzon_marinus,NA
Petromyzon_marinus,FBXL22,Petromyzon_marinus,NA
Petromyzon_marinus,RNGTT,Petromyzon_marinus,NA
Petromyzon_marinus,CNR1,Petromyzon_marinus,NA
Petromyzon_marinus,AKIRIN2,Petromyzon_marinus,NA
Petromyzon_marinus,ORC3,Petromyzon_marinus,NA
Petromyzon_marinus,RARS2,Petromyzon_marinus,NA
Petromyzon_marinus,SLC35A1,Petromyzon_marinus,NA
Petromyzon_marinus,RHBDL2,NA,NA
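The main problem lies in assigning values to the "next_x" and "next_node" columns. These values should be determined by the following species and its associated symbol, but my code seems to assign incorrect values. In addition, for the last species, both "next_x" and "next_node" should be "NA".
The expected output format is as follows: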
x node next_x next_node
Homo_sapiens CNR Mus_musculus CNR
Mus_musculus CNR Rattus_norvegicus CNR
.
.
.
Gallus_gallus LOC101749895 NA NA
.
.
.
Petromyzon_marinus RHBDL2 NA NA
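I would greatly appreciate any insights or suggestions on how to correct the code so that it produces the expected output. Thank you for your help!
1 Answer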
dplyr::lag() and dplyr::lead() are well suited to fetching values from the previous or next row. The code below replaces everything after the data <- read.csv("cnr1.csv") line.
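What follows is a minimal sketch of one way to do this with dplyr::lead() and two joins. It is not necessarily the answerer's original code. It assumes that "next species" means the next distinct species in file order, and that a row with no matching symbol in that species gets NA in both "next_x" and "next_node", as in the expected output. The small data frame stands in for the real file, and the helper names species_order, present, found and matched are hypothetical.

```r
library(dplyr)

# Toy stand-in for read.csv("cnr1.csv"); only the two columns used here.
# (Assumption: the real data has character columns named species and symbol.)
data <- data.frame(
  species = c("Homo_sapiens", "Homo_sapiens",
              "Mus_musculus", "Mus_musculus",
              "Gallus_gallus", "Gallus_gallus"),
  symbol = c("CNR1", "SPACA1", "CNR1", "RNGTT", "CNR1", "LOC101749895"),
  stringsAsFactors = FALSE
)

# One row per species in order of first appearance, paired with the species
# that follows it; lead() yields NA for the last species.
species_order <- data %>%
  distinct(species) %>%
  mutate(next_species = lead(species))

# Lookup of every (species, symbol) pair that exists in the data.
present <- data %>%
  distinct(species, symbol) %>%
  mutate(found = TRUE)

new_data <- data %>%
  # Attach the following species to every row.
  left_join(species_order, by = "species") %>%
  # Check whether the same symbol occurs in that following species.
  left_join(present, by = c("next_species" = "species", "symbol" = "symbol")) %>%
  mutate(
    matched = !is.na(found),
    next_x = if_else(matched, next_species, NA_character_),
    next_node = if_else(matched, symbol, NA_character_)
  ) %>%
  select(x = species, node = symbol, next_x, next_node)

print(new_data)
```

The result can then be written out as in the question, for example with write.csv(new_data, "output.csv", row.names = FALSE). Real NA values are used here instead of the string "NA"; write.csv prints them as NA by default.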