Flag every observation in the first year a given pattern is observed, in R

bkkx9g8r  posted on 2023-01-15  in  Other

I have a dataset like this:

year = c("2000", "2000", "2000", "2002", "2000", "2002", "2007")
id = c("X", "X", "X", "X", "Z", "Z", "Z")
product = c("apple", "orange", "orange", "orange", "cake", "cake", "bacon")
market = c("CHN", "USA", "USA", "USA", "SPA", "CHL", "CHL")
df = data.frame(year, id, product, market)

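I want to create 3 variables indicating:

  1. FPFM = takes the value 1 if it is the first sale of that product in that given market
  2. FP = takes the value 1 if it is the first sale of that product
  3. FM = takes the value 1 if it is the first entry into that market

So the new data would look like this: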
year = c("2000", "2000", "2000", "2002", "2000", "2002", "2007")
id = c("X", "X", "X", "X", "Z", "Z", "Z")
product = c("apple", "orange", "orange", "orange", "cake", "cake", "bacon")
market = c("CHN", "USA", "USA", "USA", "SPA", "CHL", "CHL")
FPFM = c(1, 1, 1, 0, 1, 1, 1)
FP = c(1, 1, 1, 0, 1, 0, 1)
FM = c(1, 1, 1, 0, 1, 1, 0)
df_desired = data.frame(year, id, product, market, FPFM, FP, FM)

I tried the following code for df_new, but it did not work:

df_new <- df %>%
  arrange(id, year) %>% 
  group_by(id, product, market) %>% 
  mutate(FPFM = row_number(year) == 1) %>% 
  as.data.frame() %>% 
  group_by(id, product) %>% 
  mutate(FP = row_number(year) == 1) %>% 
  as.data.frame() %>% 
  group_by(id, market) %>% 
  mutate(FM = row_number(year) == 1) %>% 
  as.data.frame()

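It only flags the first observation. I want the flag set for every observation in the first year in which the product, the market, or the combination of both is observed.
Row 3 should be "TRUE; TRUE; TRUE" rather than "FALSE; FALSE; FALSE", because it falls in the same year.
Another solution I thought of is to summarise df three times with unique values and then right-join the results to the original df. However, that would cost a lot of time and memory, because my dataset is large.
Is there a more efficient, more integrated solution?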

shyt4zoc1#

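I would just write a small helper function to make the code more concise. Note that we can use arithmetic (the unary `+`) to turn the logical values into binary 0/1 values.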

library(tidyverse)

which.firsts <- function(.data, ...){
  .data %>%
    arrange(id, year) %>% 
    group_by(...) %>%
    mutate(.val = `+`(year == first(year))) %>%
    pull(.val)
}

df %>%
  mutate(FPFM = which.firsts(., id, product, market),
         FP = which.firsts(., id, product),
         FM  = which.firsts(., id, market))
#>   year id product market FPFM FP FM
#> 1 2000  X   apple    CHN    1  1  1
#> 2 2000  X  orange    USA    1  1  1
#> 3 2000  X  orange    USA    1  1  1
#> 4 2002  X  orange    USA    0  0  0
#> 5 2000  Z    cake    SPA    1  1  1
#> 6 2002  Z    cake    CHL    1  0  1
#> 7 2007  Z   bacon    CHL    1  1  0
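
One caveat worth sketching: the helper sorts its own copy of the data and returns the flags in that sorted order, so the result lines up with df only when df is already ordered by id and year (as it is in the question). The variant below is a hypothetical alternative, not part of the original answer. The name flag_first_year and the shuffled data frame df_shuffled are assumptions made for illustration. It compares each row against the group minimum instead of the first row, so it should not depend on row order.

```r
library(dplyr)

# Hypothetical variant of the helper: flags rows that fall in each group's
# earliest year. Uses min(year) rather than first(year), so no sorting is needed
# and the flags come back in the original row order.
# Note: year is a character column here; min() on strings is fine for
# four-digit years, but converting to integer first would be more robust.
flag_first_year <- function(.data, ...) {
  .data %>%
    group_by(...) %>%
    mutate(.val = as.integer(year == min(year))) %>%
    pull(.val)
}

# Same rows as in the question, deliberately shuffled
df_shuffled <- data.frame(
  year    = c("2002", "2000", "2007", "2000", "2002", "2000", "2000"),
  id      = c("X", "X", "Z", "X", "Z", "Z", "X"),
  product = c("orange", "apple", "bacon", "orange", "cake", "cake", "orange"),
  market  = c("USA", "CHN", "CHL", "USA", "CHL", "SPA", "USA")
)

res <- df_shuffled %>%
  mutate(FPFM = flag_first_year(., id, product, market),
         FP   = flag_first_year(., id, product),
         FM   = flag_first_year(., id, market))
```

Because nothing is sorted, this also skips the arrange() step on each call, which may help a little on large data.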

gpfsuwkq2#

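Change row_number(year) == 1 to year == year[1]. row_number() breaks ties, so only one row per group gets rank 1, whereas year == year[1] is TRUE for every row that shares the group's first year.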

df_new <- df %>%
  arrange(id, year) %>% 
  group_by(id, product, market) %>% 
  mutate(FPFM = year == year[1]) %>% 
  group_by(id, product) %>% 
  mutate(FP = year == year[1]) %>% 
  group_by(id, market) %>% 
  mutate(FM = year == year[1])
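
To see the difference in isolation, here is a minimal sketch on a single made-up group with a tie in the first year. The vector yr and the names by_rank and by_year are assumptions chosen only for this illustration.

```r
library(dplyr)

# One hypothetical group, already sorted, with two rows in the first year
yr <- c("2000", "2000", "2002")

# row_number() gives tied values distinct ranks, so only one row equals 1
by_rank <- row_number(yr) == 1   # TRUE FALSE FALSE

# Comparing against the first year keeps every row from that year
by_year <- yr == yr[1]           # TRUE TRUE FALSE
```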

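Also, the repeated as.data.frame calls seem unnecessary. If you really want a data.frame rather than a tibble, you can keep the last one, but in my opinion a tibble is the better choice. See this section of "Advanced R" for some of the reasons.
Result: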

> df_new
  year id product market  FPFM    FP    FM
1 2000  X   apple    CHN  TRUE  TRUE  TRUE
2 2000  X  orange    USA  TRUE  TRUE  TRUE
3 2000  X  orange    USA  TRUE  TRUE  TRUE
4 2002  X  orange    USA FALSE FALSE FALSE
5 2000  Z    cake    SPA  TRUE  TRUE  TRUE
6 2002  Z    cake    CHL  TRUE FALSE  TRUE
7 2007  Z   bacon    CHL  TRUE  TRUE FALSE
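
The question asks for 0/1 values, while this result holds TRUE/FALSE. One way to convert, sketched here under the assumption that dplyr 1.0.0 or later is available (for across()), is shown below. The small data frame flags is a stand-in built from the last three rows of the result, and flags01 is a name chosen for illustration.

```r
library(dplyr)

# Stand-in for df_new: logical flag columns taken from rows 5 to 7 of the result
flags <- data.frame(
  year = c("2000", "2002", "2007"),
  FPFM = c(TRUE, TRUE, TRUE),
  FP   = c(TRUE, FALSE, TRUE),
  FM   = c(TRUE, TRUE, FALSE)
)

# Convert only the three flag columns from logical to integer 0/1
flags01 <- flags %>%
  mutate(across(c(FPFM, FP, FM), as.integer))
```

On the real result the same step could be applied as df_new %>% ungroup() %>% mutate(across(c(FPFM, FP, FM), as.integer)), since df_new is still grouped after the last mutate.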
