Time series line plot for categorical data in R

mrwjdhj3  posted on 2023-06-03  in  Other

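[Image: Time series plot example] [Image: Snippet of my data]
The time series data covers 2015-2021. I need to draw line plots like the one I have shown. Unfortunately, I am getting some strange plots. What code should I use?
This is the code I used in R: [image]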

2q5ifsrm #1

@Dimitri, thanks for the earlier code, but I still did not get what I wanted.

    > dput(head(df_new2))
    structure(list(Year = c(2015, 2015, 2015, 2015, 2015, 
    2015), 
    Age = c(3, 3, 3, 3, 3, 3), Race = c(2, 2, 2, 2, 2, 2), 
    Gender = c(1, 
    1, 1, 1, 1, 1), Life_expectancy = c(71.2, 70.8, 66.9, 62, 
    57.1, 52.3)), row.names = c(NA, 6L), class = "data.frame")

    > #Time series plot for Life expectancy
    > library(ggplot2)
    > library(tseries)
    > 
    > ts_df1 <- ts(df_new2$Life_expectancy, start=2015, 
    end=2021, 
    frequency=12)
    > class(ts_df1)
   [1] "ts"
    > 
    > autoplot(ts_df1) + ylab("Life expectancy") + xlab("Year")
    > #Time series plot for Age
    > library(ggplot2)
    > library(tseries)
    > 
    > ts_df2 <- ts(df_new2$Age, start=2015, end=2021, 
    frequency=12)
    > class(ts_df2)
   [1] "ts"
    > 
    > autoplot(ts_df2) + ylab("Age") + xlab("Year")
    > #Time series plot for Gender
    > library(ggplot2)
    > library(tseries)
    > 
    > ts_df3 <- ts(df_new2$Gender, start=2015, end=2021, 
    frequency=50)
    > class(ts_df3)
   [1] "ts"
    > 
    > autoplot(ts_df3) + ylab("Gender") + xlab("Year")

    The code above is the one I used for my time series plots.

    [Time series plot for dependent and independent variables][1]

   I want something like the image above. I need to draw time series line plots to find the variables that most influence life expectancy, and I was asked to plot a line graph for every y and x variable. When I did, I got plots like these:
   [Time series plot for y variable][2]
   [Time series plot for variable age][3]
   [Time series plot for variable gender][4]

   [1]: https://i.stack.imgur.com/0GJwv.png
   [2]: https://i.stack.imgur.com/WzU63.png
   [3]: https://i.stack.imgur.com/58FT1.png
   [4]: https://i.stack.imgur.com/KMcLc.png

brccelvz #2

It is not very clear what you want to plot. My guess is that you want something like this:

ggplot(df_new2) + 
   aes(x=Year, y=Life_expectancy, group="") + 
   geom_line(col="red") +
   labs(title="Life expectancy from 2015-2021",
        x="Year",
        y="Life expectancy")

Or maybe:

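library(ggplot2)
library(gridExtra)

# One panel per variable, stacked vertically.
grid.arrange(
  # Life expectancy over time as a line
  ggplot(df_new2) + 
    aes(x=Year, y=Life_expectancy, group="") + 
    geom_line(col="red") +
    labs(x="Year", y="Life expectancy"),
  # Coded variables as stacked counts per year;
  # factor() makes the codes discrete so that fill splits the bars
  ggplot(df_new2) + 
    aes(x=Year, fill=factor(Age)) + 
    geom_bar() +
    labs(x="Year", y="Count", fill="Age"),
  ggplot(df_new2) + 
    aes(x=Year, fill=factor(Race)) + 
    geom_bar() +
    labs(x="Year", y="Count", fill="Race"),
  ggplot(df_new2) + 
    aes(x=Year, fill=factor(Gender)) + 
    geom_bar() +
    labs(x="Year", y="Count", fill="Gender"),
  nrow=4
)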

Edit 2:

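I think I understand better what you want to do. The problem is that there are multiple values for each year. For example, in the dummy data there are 6 life expectancy values for 2015. So when you plot 2015, geom_line has to pass through every 2015 value (71.2, 70.8, 66.9, 62, 57.1 and 52.3), which produces a vertical line between the minimum (52.3) and the maximum (71.2).
So, if you want a single line, you have to summarize your variable in some way. For example, you can use dplyr::summarize with group_by to compute the mean or the median for each date.
(I also added 2 rows to your dput so that there are values for 2016, and modified the Gender column to get more variety.)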

df_new2=structure(list(Year = c(2015, 2015, 2015, 2015, 2015, 
                        2015,2016,2016), 
               Age = c(3, 3, 3, 3, 3, 3,2,3), Race = c(2, 2, 2, 2, 2, 2,1,2), 
               Gender = c(1, 
                          1, 1, 2, 1, 1,2,1), Life_expectancy = c(71.2, 70.8, 66.9, 62, 
                                                              57.1, 52.3,75,52)), row.names = c(NA, 8L), class = "data.frame")
library(dplyr)

df_new2 %>% group_by(Year) %>% summarize(mean=mean(Life_expectancy)) %>% ggplot() +
  aes(x=Year, y=mean, group="") +
  geom_line()
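
Before summarizing, it can help to confirm that the vertical segments really come from repeated years. The following is a minimal sketch on the extended dummy data above, assuming dplyr is installed; `df_check`, `rows_per_year` and `yearly_mean` are names chosen here for illustration only.

```r
library(dplyr)

# Same extended dummy data as above (only the two columns needed here).
df_check <- data.frame(
  Year = c(2015, 2015, 2015, 2015, 2015, 2015, 2016, 2016),
  Life_expectancy = c(71.2, 70.8, 66.9, 62, 57.1, 52.3, 75, 52)
)

# Rows per year: any n > 1 means geom_line() on the raw data
# has several y values to connect at that x position.
rows_per_year <- df_check %>% count(Year)
print(rows_per_year)

# After summarizing there is exactly one value per year,
# so the line has a single point to pass through at each x.
yearly_mean <- df_check %>%
  group_by(Year) %>%
  summarize(mean = mean(Life_expectancy))
print(yearly_mean)
```

If `rows_per_year` shows one row per year in your real data, the strange plot has a different cause and summarizing will not change it.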

To see the dispersion, you can add the standard deviation to the plot:

library(dplyr)

df_new2 %>% group_by(Year) %>% summarize(mean=mean(Life_expectancy), sd=sd(Life_expectancy)) %>% ggplot() +
  aes(x=Year,  group="") +
  geom_line(aes(y=mean)) +
  geom_ribbon(aes(ymin=mean-sd, ymax=mean+sd), fill="coral", alpha=0.2)

You can also plot the points around a linear regression (here you don't have to summarize):

df_new2 %>% ggplot() +
  aes(x=Year, y=Life_expectancy) +
  geom_point(shape=1) +
  geom_smooth(method="lm")
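
To read off the trend that the regression line represents, the same straight line can be fitted directly with `lm()`. This is a minimal sketch on the extended dummy data, with `df_demo`, `fit` and `slope` as illustrative names; with only two distinct years in the dummy data, the fitted line simply joins the two yearly means.

```r
# Extended dummy data from above (only the two columns needed here).
df_demo <- data.frame(
  Year = c(2015, 2015, 2015, 2015, 2015, 2015, 2016, 2016),
  Life_expectancy = c(71.2, 70.8, 66.9, 62, 57.1, 52.3, 75, 52)
)

# Ordinary least squares fit of life expectancy on year,
# the same kind of model geom_smooth(method = "lm") fits for y ~ x.
fit <- lm(Life_expectancy ~ Year, data = df_demo)

# Slope = estimated change in life expectancy per year.
slope <- unname(coef(fit)["Year"])
print(slope)

# Fitted values at each year, for comparison with the yearly means.
fitted_by_year <- predict(fit, newdata = data.frame(Year = c(2015, 2016)))
print(fitted_by_year)
```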

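Or with the mean (here we cannot see the difference, but if you try it with more years you will see that the line is not linear; the drawback is that one year's results may be completely different from those of the other years):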

df_new2 %>% group_by(Year) %>% mutate(mean=mean(Life_expectancy)) %>% ggplot() +
  aes(x=Year, y=Life_expectancy) +
  geom_point(shape=1) +
  geom_smooth(method="lm")

With the median:

library(dplyr)

df_new2 %>% group_by(Year) %>% summarize(median=median(Life_expectancy)) %>% ggplot() +
  aes(x=Year, y=median, group="") +
  geom_line()
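
As an alternative to summarizing first, ggplot2 can compute the per-year summary while drawing. The sketch below assumes ggplot2 3.3.0 or later (for the `fun` argument of `stat_summary()`); `df_demo`, `p` and `line_data` are illustrative names, and the raw points are kept so the spread stays visible.

```r
library(ggplot2)

# Extended dummy data from above (only the two columns needed here).
df_demo <- data.frame(
  Year = c(2015, 2015, 2015, 2015, 2015, 2015, 2016, 2016),
  Life_expectancy = c(71.2, 70.8, 66.9, 62, 57.1, 52.3, 75, 52)
)

# stat_summary() collapses the y values at each x with `fun` before drawing,
# so no group_by()/summarize() step is needed.
p <- ggplot(df_demo, aes(x = Year, y = Life_expectancy)) +
  geom_point(shape = 1) +
  stat_summary(fun = median, geom = "line", colour = "red")

print(p)

# The values the line passes through (one row per year); layer 2 is the line.
line_data <- layer_data(p, 2)
print(line_data[, c("x", "y")])
```

Swapping `median` for `mean` gives the mean line instead.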

Or you can draw one line per gender, for example:

df_new2 %>% mutate(Gender=as.factor(Gender)) %>% group_by(Year,Gender) %>% summarize(median=median(Life_expectancy)) %>% ggplot() +
  aes(x=Year, y=median, group=Gender, col=Gender) +
  geom_line()

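(You can do the same with the Age variable only if it is numeric, meaning that 3 stands for 3 years old rather than age category number 3. If it means age category number 3, treat it as categorical.)
For categorical variables, you cannot draw a line plot directly. Here you run into the same problem as before: ggplot treats the Gender codes (1 and 2 in this data) as numbers, so for every year it draws a line between them.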

# First you have to specify that these variables are categorical and not numerical

df_new2=df_new2 %>% mutate(Gender=as.factor(Gender),
                   Race=as.factor(Race))

Then you can count how many people of each gender you have per year, for example:

df_new2 %>% ggplot() + 
  aes(x=Year, fill= Gender) +
  geom_histogram(binwidth=1)

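Or treat the Year variable as categorical (this is more useful than treating it as a time series, because you do not know the exact date within 2016: is it 2016/01/01? 2016/05/24?):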

df_new2 %>% ggplot() + 
  aes(x=as.factor(Year), fill= Gender) +
  geom_bar()
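
If a continuous date axis is wanted anyway, a date has to be chosen by convention, precisely because the data only record the year. The sketch below assumes January 1 stands in for each year, which is an arbitrary choice and not something the data say; `year_to_date` is a hypothetical helper name.

```r
# Hypothetical helper: map a year to a Date by convention (January 1).
# The day and month are an assumption; the data only contain the year.
year_to_date <- function(year) {
  as.Date(paste0(year, "-01-01"))
}

years <- c(2015, 2016)
dates <- year_to_date(years)

print(dates)
print(class(dates))
```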

Or, if you really want a line, you can plot, for example, the number of males/females per year:

df_new2 %>% group_by(Gender,Year) %>% summarize(number=n()) %>% ggplot() +
  aes(x=Year, y=number, group=Gender, col=Gender) +
  geom_line()
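
The counts behind these plots can be checked with a plain cross-tabulation. This is a minimal sketch in base R on the extended dummy data (`df_demo`, `counts` and `shares` are illustrative names). A year/gender combination with no rows appears as 0 in the table, whereas in the summarized data frame it is simply absent, so the line would skip that year.

```r
# Extended dummy data from above, with Gender already treated as categorical.
df_demo <- data.frame(
  Year = c(2015, 2015, 2015, 2015, 2015, 2015, 2016, 2016),
  Gender = factor(c(1, 1, 1, 2, 1, 1, 2, 1))
)

# Rows = years, columns = gender codes, cells = number of observations.
counts <- table(Year = df_demo$Year, Gender = df_demo$Gender)
print(counts)

# Share of each gender within a year (each row sums to 1).
shares <- prop.table(counts, margin = 1)
print(round(shares, 2))
```

Plotting shares instead of raw counts can be easier to compare across years when the number of rows per year differs a lot, as it does here (6 rows in 2015, 2 in 2016).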
