Pulling out data based on 2 conditions from long data in R

Posted by kq0g1dla 12 months ago in Other


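I hope I can state the problem clearly.
I am new to R and am currently working with long-format text data from 3 different corpora. My goal is to find out how frequently (in %) a given string occurs in a given corpus. Here is a sample of the long data: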

   Center  n   tag
 1 apple   19  poetry
 2 melon   34  media
 3 lemon   1   spoken
 4 lemon   1   poetry
 5 peach   1   spoken
 6 apple   1   poetry
 7 orange  1   media
 8 banana  1   spoken
 9 banana  1   media
10 melon   1   media
...

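So I would like to pull out the strings "apple" and "melon", conditioned on the tag, along with their frequencies. That way I can compare the frequency (%) of each string across the different tags.
Example: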

   Center  %   tag
 1 apple   4   poetry
 2 apple   34  media
 3 apple   23  spoken
 4 melon   15  poetry
 5 melon   23  spoken
 6 melon   2   poetry

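My ultimate goal is to visualize the data frame as a ggplot bar plot.
I am still trying to figure out how to extract the data this way from the long data. My current visualization looks like this: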

[image]
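It is just a bar plot showing the frequency of each tag. What I want is for it to show, in addition to the tag, the two strings I want to compare. The measure should not be the frequency as a raw count, but the frequency as a percentage relative to the tag.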
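
To make "percentage relative to the tag" concrete, here is a minimal base R sketch on the ten visible rows of the sample only (the rows hidden behind "..." are not reproduced). It assumes the percentage means: the summed n of a string within a tag, divided by the summed n of all strings in that tag. The object names `corpus`, `counts` and `perc` are illustrative, not from the original post.

```r
# The ten visible rows of the sample data (hidden rows are omitted)
corpus <- data.frame(
  Center = c("apple", "melon", "lemon", "lemon", "peach",
             "apple", "orange", "banana", "banana", "melon"),
  n      = c(19, 34, 1, 1, 1, 1, 1, 1, 1, 1),
  tag    = c("poetry", "media", "spoken", "poetry", "spoken",
             "poetry", "media", "spoken", "media", "media")
)

# Summed counts: one row per string, one column per tag
counts <- xtabs(n ~ Center + tag, data = corpus)

# Column-wise shares, so that each tag's column adds up to 100
# (assumption: "relative to the tag" means "share of that tag's total")
perc <- round(100 * prop.table(counts, margin = 2), 1)

# Keep only the two strings of interest
perc[c("apple", "melon"), ]
```

On these ten rows alone, apple accounts for about 95.2% of the poetry total (20 of 21) and melon for about 94.6% of the media total (35 of 37); with the full data the numbers will of course differ.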


Source: https://stackoverflow.com/questions/77733050/pulling-out-a-data-based-on-2-conditions-of-long-data-r

1 Answer


xvw2m8pv1#

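I am simulating your data based on the "diamonds" data set. I agree with the comments that it is not very clear what you want to achieve, but maybe the following helps: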

library(tidyverse)

## make a similar data frame from the built-in diamonds data
df <- diamonds %>%
  select(Center = cut, n = table, tag = color)

## define your strings
strings <- c("Premium", "Good")

## keep only the rows for those strings
df %>%
  filter(Center %in% strings) %>%
  ## sum "n" for each Center/tag combination
  ## this first step might not be necessary depending on how your data looks in real life
  group_by(Center, tag) %>%
  summarise(n = sum(n)) %>%
  ## this calculates the percentage of each tag within each Center
  group_by(Center) %>%
  reframe(perc = round(100 * n / sum(n)), tag = tag) %>%
  ## then pass this new data frame to ggplot
  ggplot() +
  geom_col(aes(Center, perc, fill = tag))
#> `summarise()` has grouped output by 'Center'. You can override using the
#> `.groups` argument.

Created on 2023-12-29 with reprex v2.0.2
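
The pipeline in this answer computes each tag's share within a Center. If the percentage should instead be relative to the tag, as the question words it, one possible variant is sketched below, still on the simulated diamonds data. It assumes the denominator is the summed n of all Centers in a tag, so the shares are computed before filtering down to the two strings. The name `perc_by_tag` and the dodged bar layout are illustrative choices, not part of the original answer.

```r
library(dplyr)
library(ggplot2)

## same simulated data frame as in the answer
df <- diamonds %>%
  select(Center = cut, n = table, tag = color)

strings <- c("Premium", "Good")

perc_by_tag <- df %>%
  ## sum "n" for each tag/Center combination
  group_by(tag, Center) %>%
  summarise(n = sum(n), .groups = "drop") %>%
  ## share of each Center within its tag (assumed meaning of "relative to the tag")
  group_by(tag) %>%
  mutate(perc = 100 * n / sum(n)) %>%
  ungroup() %>%
  ## filter last, so the denominator still covers every Center in the tag
  filter(Center %in% strings)

## one group of bars per tag, one bar per string
ggplot(perc_by_tag, aes(tag, perc, fill = Center)) +
  geom_col(position = "dodge") +
  labs(y = "% of tag total")
```

Filtering before computing the shares would instead make the two strings add up to 100% within each tag, which may or may not be what is wanted.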
