R: Mapping example data to actual CSV data

hkmswyz6, posted 2023-09-27 in Other

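Thanks to David and everyone; I think I've made progress. It still doesn't produce a line chart, but nothing in the code looks logically wrong. I take no credit here: I'm just cutting and pasting what people smarter than me have already figured out, and I still don't get a chart. A link to the CSV on GitHub is at the end.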

data = read.csv("C:/Users/12083/Desktop/librarydata.csv") # Read the data into R

head(data)                                            # Quality control, looks good
str(data)
data$dates = as.Date(data$dates, format = "%d/%m/%Y") # This formats the date as dates for R
library(tidyverse)                                    # This will import some functions that you need, specifically %>% and ggplot
# Step 0: check that the data makes sense to you
summary(data$dates)
summary(data$city)

# Step 1: filter the right data
start.date = as.Date("2003-01-02")
end.date   = as.Date("2010-05-04")

filtered = data %>% 
  filter(dates >= start.date & 
           dates <= end.date) # This will only take rows between those dates
summary(filtered)
colnames(filtered)

library(dplyr)

filtered_agg <- filtered %>%
  group_by(city, dates, Location) %>%
  summarize(location_sum=n()) 

filtered_agg
summary(filtered_agg)
# Step 2: Plotting
# Now you can create the plot with ggplot:
# Notes: 
# I added geom_point() so that each X value gets a point. 
# I think it's easier to read. You can remove this if you like
# Also added color, because I like it, feel free to delete


# The problem is in here - somewhere
Plot = ggplot(filtered_agg, aes(x=dates, y=Location, group = city)) + geom_line(aes(linetype=city, color = city)) + geom_point(aes(color=city))
Plot
dput

https://github.com/karl1776/chart

colnames(filtered)
 [1] "ï..Class.ID"                "city"                       "dates"                      "year"                       "month"                     
 [6] "day"                        "cit"                        "Department.College"         "Course.Level"               "Course.Title"              
[11] "Tour."                      "TILT."                      "Date.Taught"                "Session.Number"             "AM.PM"                     
[16] "Hour.Count"                 "Library.Instructor"         "Other.Library.Instructor"   "Duplicate."                 "Course.Instructor"         
[21] "ACRL"                       "IPED"                       "Location"                   "Building.Room"              "Distance.Class."           
[26] "Location.of.Site.1"         "Site.1.Number.of.Students"  "Location.of.Site.2"         "Site.2.Number.of.Students"  "Location.of.Site.3"        
[31] "Site.3.Number.of.Students"  "Location.of.Site.4"         "Site.4.Number.of.Students"  "Location.of.Site.5"         "Site.5.Number.of.Students" 
[36] "Location.of.Site.6"         "Site.6.Number.of.Students"  "Location.of.Site.7"         "Site.7.Number.of.Students"  "Location.of.Site.8"        
[41] "Site.8.Number.of.Students"  "Location.of.Site.9"         "Site.9.Number.of.Students"  "Location.of.Site.10"        "Site.10.Number.of.Students"
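
A side note on the first column name above: a leading `ï..` in front of `Class.ID` is typically what `read.csv()` produces when the file starts with a UTF-8 byte order mark (BOM), which spreadsheet exports often add. The sketch below is an assumption about this file, not something confirmed from it. It builds a small hypothetical BOM-prefixed file and reads it with `fileEncoding = "UTF-8-BOM"` so the mark is dropped.

```r
# Hypothetical two-column file that starts with a UTF-8 BOM (bytes EF BB BF)
path <- tempfile(fileext = ".csv")
bom  <- as.raw(c(0xEF, 0xBB, 0xBF))
writeBin(c(bom, charToRaw("Class ID,city\n4438,Pocatello\n")), path)

# "UTF-8-BOM" asks R to strip the mark before the header is parsed
clean <- read.csv(path, fileEncoding = "UTF-8-BOM")
names(clean)
```

If the real file behaves the same way, the same argument could be added to the `read.csv()` call at the top of the question.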

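Maybe I'm just not seeing it, but I'm having a hard time looking at examples built on dummy data and translating them into loading actual data from a CSV file. The picture shows my output from the dummy data, which is exactly what I want. When I use my actual data, nothing happens. Am I missing a ggplot command to print the plot?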

library(readxl)
require(tidyverse)
require(ggplot2)
require(lubridate)
#load data
df <- read_excel("C:/Users/12083/Desktop/librarydata.xlsx")
#plot data
df_example %>%
  ggplot(aes(date,city, color=city))+
  geom_line(aes(linetype=lt))+ #you can use single string for the same linetype for all lines or a vector of strings for each data point
  scale_linetype_identity()+ #this removes the linetype from the legend
  theme_minimal()

df_example

I get this output, which is exactly right, but no plot comes with it.

         city      dates classes       lt
1       Boise 2020-01-01      52    solid
2       Boise 2020-02-01      36    solid
3       Boise 2020-03-01      69    solid
4       Boise 2020-04-01     100    solid
5       Boise 2020-05-01      72    solid
6   Pocatello 2020-01-01      82   dashed
7   Pocatello 2020-02-01      15   dashed
8   Pocatello 2020-03-01      68   dashed
9   Pocatello 2020-04-01      17   dashed
10  Pocatello 2020-05-01      51   dashed
11  Salt Lake 2020-01-01      71   dotted
12  Salt Lake 2020-02-01      65   dotted
13  Salt Lake 2020-03-01      33   dotted
14  Salt Lake 2020-04-01      44   dotted
15  Salt Lake 2020-05-01      16   dotted
16 Twin Falls 2020-01-01       3  dotdash
17 Twin Falls 2020-02-01      30  dotdash
18 Twin Falls 2020-03-01      19  dotdash
19 Twin Falls 2020-04-01      34  dotdash
20 Twin Falls 2020-05-01      69  dotdash
21  Elsewhere 2020-01-01      62 longdash
22  Elsewhere 2020-02-01      14 longdash
23  Elsewhere 2020-03-01      59 longdash
24  Elsewhere 2020-04-01      35 longdash
25  Elsewhere 2020-05-01      91 longdash

dput

structure(list(`Class ID` = c(4438, 4439, 4428, 4437, 4430, 4431, 
4432, 4433, 4434, 4435, 4436, 4427, 4440, 4417, 4414, 4407, 4413, 
4412, 4418, 4410), city = c("Pocatello", "Pocatello", "Pocatello", 
"Pocatello", "Pocatello", "Pocatello", "Pocatello", "Pocatello", 
"Pocatello", "Pocatello", "Pocatello", "Pocatello", "Pocatello", 
"Meridian", "Pocatello", "Pocatello", "Pocatello", "Pocatello", 
"Pocatello", "Idaho Falls"), date = structure(c(1468972800, 1468972800, 
1468886400, 1468800000, 1468454400, 1468454400, 1468368000, 1468368000, 
1468368000, 1468281600, 1468281600, 1466553600, 1466553600, 1461283200, 
1460592000, 1460419200, 1460419200, 1460073600, 1460073600, 1459987200
), tzone = "UTC", class = c("POSIXct", "POSIXt")), year = c(2016, 
2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 
2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016), month = c(7, 
7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 6, 6, 4, 4, 4, 4, 4, 4, 4), day = c(20, 
20, 29, 18, 14, 14, 13, 13, 13, 12, 12, 22, 22, 22, 13, 12, 12, 
8, 8, 7), cit = c("Pocatello", "Pocatello", "Pocatello", "Pocatello", 
"Pocatello", "Pocatello", "Pocatello", "Pocatello", "Pocatello", 
"Pocatello", "Pocatello", "Pocatello", "Pocatello", "Meridian", 
"Pocatello", "Pocatello", "Pocatello", "Pocatello", "Pocatello", 
"Idaho Falls"), `Department/College` = c("College of Arts and Letters", 
"College of Arts and Letters", "College of Arts and Letters", 
"College of Arts and Letters", "College of Arts and Letters", 
"College of Arts and Letters", "Library", "Library", "Library", 
"College of Arts and Letters", "College of Arts and Letters", 
"College of Education", "Library", "Division of Health Sciecnes", 
"College of Arts and Letters", "College of Arts and Letters", 
"College of Arts and Letters", "College of Arts and Letters", 
"College of Arts and Letters", "College of Arts and Letters"), 
    `Course Level` = c("Lower Division", "Lower Division", "Lower Division", 
    "Lower Division", "Lower Division", "Lower Division", "K-12", 
    "K-12", "K-12", "Lower Division", "Lower Division", "Lower Division", 
    "K-12", "Graduate", "Lower Division", "Lower Division", "Lower Division", 
    "Lower Division", "Lower Division", "Lower Division"), `Course Title` = c("ACAD 1111", 
    "ACAD 1111", "POLS 1110", "ENGL 1123", "ACAD 1111", "ACAD 1111", 
    "Kid University", "Kid University", "Kid University", "ACAD 1111", 
    "ACAD 1111", "EDUC 1110", "Kid University", "Nursing_Orientation", 
    "ENGL 1102", "ENGL 1101", "ENGL 1101", "ENGL 1102", "ENGL 1102", 
    "ENGL 1102"), `Tour?` = c(FALSE, FALSE, FALSE, TRUE, FALSE, 
    FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, 
    FALSE, FALSE, TRUE, TRUE, FALSE), `TILT?` = c(FALSE, FALSE, 
    FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, 
    FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE
    ), `Date Taught` = structure(c(1468972800, 1468972800, 1468886400, 
    1468800000, 1468454400, 1468454400, 1468368000, 1468368000, 
    1468368000, 1468281600, 1468281600, 1466553600, 1466553600, 
    1461283200, 1460592000, 1460419200, 1460419200, 1460073600, 
    1460073600, 1459987200), tzone = "UTC", class = c("POSIXct", 
    "POSIXt")), `Session Number` = c("Third Session", "Third Session", 
    "Single Session", NA, "Second Session", "Second Session", 
    "Single Session", "Single Session", "Single Session", "First Session", 
    "First Session", "Single Session", "Single Session", "Single Session", 
    "Single Session", "Single Session", "First Session", "Third Session", 
    "Third Session", "Second Session"), `AM/PM` = c("AM", "PM", 
    "PM", "PM", "AM", "PM", "PM", "PM", "PM", "AM", "PM", "PM", 
    "PM", "AM", "PM", "PM", "AM", "AM", "AM", "AM"), `Hour Count` = c(1.5, 
    1.5, 1, 1.5, 1.5, 1.5, 0.5, 0.5, 1, 1.5, 1.5, 1.5, 1, 1, 
    1.5, 1.5, 1.5, 1, 1, 1.5), 
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, "Cathy Gray", 
    NA, NA, NA, NA, "Monte Asche", "Philip Homan", NA), `Duplicate?` = c(FALSE, 
    FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, 
    FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, 
    FALSE), ACRL = c(0, 0, 7, 5, 0, 0, 7, 7, 7, 22, 9, 
    8, 13, 35, 19, 6, 8, 0, 0, 0), IPED = c(22, 9, 7, 5, 23, 
    9, 7, 7, 7, 22, 9, 8, 13, 35, 19, 6, 8, 19, 19, 22), `Location of Instructor` = c("Pocatello", 
    "Pocatello", "Pocatello", "Pocatello", "Pocatello", "Pocatello", 
    "Pocatello", "Pocatello", "Pocatello", "Pocatello", "Pocatello", 
    "Pocatello", "Pocatello", "Meridian", "Pocatello", "Pocatello", 
    "Pocatello", "Pocatello", "Pocatello", "Idaho Falls"), `Building/Room` = c("LIBR 212", 
    "LIBR 212", "LIBR 212", "LIBR 212", "LIBR 212", "LIBR 212", 
    "Special Collections", "LIBR 212", "LIBR 212", "LIBR 212", 
    "LIBR 212", "LIBR 212", "LIBR 212", "Meridian", "LIBR 212", 
    "LIBR 212", "LIBR 212", "LIBR 212", "LIBR 212", "CHE 306"
    ), `Distance Class?` = c(FALSE, FALSE, FALSE, FALSE, FALSE, 
    FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, 
    FALSE, FALSE, FALSE, FALSE, FALSE, FALSE), `Location of Site 1` = c("Boise", 
    "Boise", "Boise", "Boise", "Boise", "Boise", "Boise", "Boise", 
    "Boise", "Boise", "Boise", "Boise", "Boise", "Boise", "Boise", 
    "Boise", "Boise", "Boise", "Boise", "Boise"), `Site 1 Number of Students` = c(0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), 
    `Location of Site 2` = c("Idaho Falls", "Idaho Falls", "Idaho Falls", 
    "Idaho Falls", "Idaho Falls", "Idaho Falls", "Idaho Falls", 
    "Idaho Falls", "Idaho Falls", "Idaho Falls", "Idaho Falls", 
    "Idaho Falls", "Idaho Falls", "Idaho Falls", "Idaho Falls", 
    "Idaho Falls", "Idaho Falls", "Idaho Falls", "Idaho Falls", 
    "Idaho Falls"), `Site 2 Number of Students` = c(0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), `Location of Site 3` = c("Twin Falls", 
    "Twin Falls", "Twin Falls", "Twin Falls", "Twin Falls", "Twin Falls", 
    "Twin Falls", "Twin Falls", "Twin Falls", "Twin Falls", "Twin Falls", 
    "Twin Falls", "Twin Falls", "Twin Falls", "Twin Falls", "Twin Falls", 
    "Twin Falls", "Twin Falls", "Twin Falls", "Twin Falls"), 
    `Site 3 Number of Students` = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), `Location of Site 4` = c(NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_), `Site 4 Number of Students` = c(0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), 
    `Location of Site 5` = c(NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_), `Site 5 Number of Students` = c(0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), `Location of Site 6` = c(NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
    NA, NA, NA, NA), `Site 6 Number of Students` = c(0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), `Location of Site 7` = c(NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
    NA, NA, NA, NA), `Site 7 Number of Students` = c(0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), `Location of Site 8` = c(NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
    NA, NA, NA, NA), `Site 8 Number of Students` = c(0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), `Location of Site 9` = c(NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
    NA, NA, NA, NA), `Site 9 Number of Students` = c(0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), `Location of Site 10` = c(NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
    NA, NA, NA, NA), `Site 10 Number of Students` = c(0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)), row.names = c(NA, 
-20L), class = c("tbl_df", "tbl", "data.frame"))
>
uplii1fm #1

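OP, it looks like you're having some trouble importing data from a *.csv file and turning it into the plot you want. Since you seem to be able to create a plot, I'll gloss over that part and walk you through an example of a good way to import the data, then make sure you can get from there to your plot.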

Importing the .csv file and preparing the data

I'll start with a .csv file that I created from the output of the df_example you posted in your question. I exported that data to a *.csv file, and now we can import it:

df <- read.csv('OP_example.csv')

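The first step after importing data is to make sure it "looks right" and to understand its structure. Even for a file you created yourself, it's important to check what df looks like. Here, head(), str(), and summary() are your friends.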

> head(df)
  X      city      dates classes     lt
1 1     Boise 2020-01-01      52  solid
2 2     Boise 2020-02-01      36  solid
3 3     Boise 2020-03-01      69  solid
4 4     Boise 2020-04-01     100  solid
5 5     Boise 2020-05-01      72  solid
6 6 Pocatello 2020-01-01      82 dashed

> str(df)
'data.frame':   25 obs. of  5 variables:
 $ X      : int  1 2 3 4 5 6 7 8 9 10 ...
 $ city   : chr  "Boise" "Boise" "Boise" "Boise" ...
 $ dates  : chr  "2020-01-01" "2020-02-01" "2020-03-01" "2020-04-01" ...
 $ classes: int  52 36 69 100 72 82 15 68 17 51 ...
 $ lt     : chr  "solid" "solid" "solid" "solid" ...

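You can see that writing the *.csv file created an "X" column, which is just the row numbers. No big deal. Everything else looks fine, except that df$dates was read in as chr rather than as Date or another date-like class. Since I'm going to use this column to create a plot, it needs to be a date: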

> df$dates <- as.Date(df$dates, format='%Y-%m-%d')

> str(df)
'data.frame':   25 obs. of  5 variables:
 $ X      : int  1 2 3 4 5 6 7 8 9 10 ...
 $ city   : chr  "Boise" "Boise" "Boise" "Boise" ...
 $ dates  : Date, format: "2020-01-01" "2020-02-01" "2020-03-01" "2020-04-01" ...
 $ classes: int  52 36 69 100 72 82 15 68 17 51 ...
 $ lt     : chr  "solid" "solid" "solid" "solid" ...

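Note that I specified format= for the date. You'll find information on the % notation in the documentation for the format= argument of strptime(). When I run str() on df again, you'll see that df$dates is now of class Date rather than chr.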
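
To make the import steps above concrete, here is a minimal round-trip sketch. The data frame `toy` and the temporary file are hypothetical stand-ins, not the OP's file. It shows that `row.names = FALSE` avoids the extra "X" column, that dates do not come back as Date, and that a format string which doesn't match the file (such as the `%d/%m/%Y` used in the question, if the file actually holds `YYYY-MM-DD` values) gives NA instead of an error.

```r
# Hypothetical miniature of the example data
toy <- data.frame(
  city    = c("Boise", "Boise"),
  dates   = as.Date(c("2020-01-01", "2020-02-01")),
  classes = c(52, 36)
)

path <- tempfile(fileext = ".csv")
write.csv(toy, path, row.names = FALSE)  # row.names = FALSE avoids the extra "X" column

back <- read.csv(path)
class(back$dates)   # no longer Date after the round trip

# The format string must match how the dates are written in the file
good <- as.Date(back$dates, format = "%Y-%m-%d")
bad  <- as.Date(back$dates, format = "%d/%m/%Y")  # deliberately mismatched

good
anyNA(bad)  # a mismatched format produces NA values, not an error
```

Running `summary()` on the converted column is a quick way to spot this: a column full of NA usually means the format string is wrong.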

Plotting

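Now for the plot: just make sure you're reading and plotting the correct data frame. In your code example you plot df_example but read the file into df. I'm not sure whether that's a typo.
You seem to prefer the pipe operator %>% over declaring the data frame inside ggplot(), so I'll do the same here: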

df %>%
  ggplot(aes(x=dates, y=classes, color=city)) +
  geom_line() + geom_point() + theme_bw()

Which gives you:

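Hope that helps. Since we don't have your specific *.csv file, and you had no problem plotting that particular data frame, the most likely place you're running into trouble is in making sure that, once the file is read, the columns and classes of the data are in the format you expect. Also, make sure your code is actually plotting the correct data frame.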
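
As a rough sketch of that advice, the checks can be written down so the script stops early when a column is missing or has the wrong class. The data frame here is a hypothetical stand-in for the imported file, and the column names are the ones from df_example, not necessarily those in the real CSV. The last lines also touch on the question about printing: a bare plot object is drawn automatically at the console, but not inside a function, a loop, or a file run with source().

```r
library(ggplot2)

# Hypothetical stand-in for the imported data
df <- data.frame(
  city    = rep(c("Boise", "Pocatello"), each = 3),
  dates   = rep(as.Date(c("2020-01-01", "2020-02-01", "2020-03-01")), times = 2),
  classes = c(52, 36, 69, 82, 15, 68)
)

# Fail early if a column is missing or has an unexpected class
stopifnot(
  all(c("city", "dates", "classes") %in% names(df)),
  inherits(df$dates, "Date"),
  is.numeric(df$classes)
)

p <- ggplot(df, aes(x = dates, y = classes, color = city)) +
  geom_line() +
  geom_point()

# Explicit print() is needed inside functions, loops, and sourced scripts
print(p)
```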

rur96b6h #2

Aggregating and plotting

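dplyr makes it easy to aggregate data. This code creates a new dataset containing the number of times each value of the "Location" variable occurs within each unique combination of city and date: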

library(dplyr)

filtered_agg <- filtered %>%
  group_by(city, dates, Location) %>%
  summarize(location_sum=n()) 

filtered_agg
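
To see what that aggregation produces, here is a small sketch on a hypothetical four-row stand-in for `filtered` (the values are made up for illustration). The `.groups = "drop"` argument is an addition here; it only returns an ungrouped result and silences the grouping message.

```r
library(dplyr)

# Hypothetical four-row stand-in for `filtered`
toy <- data.frame(
  city     = c("Pocatello", "Pocatello", "Pocatello", "Meridian"),
  dates    = as.Date(c("2016-07-20", "2016-07-20", "2016-07-19", "2016-04-22")),
  Location = c("Pocatello", "Pocatello", "Pocatello", "Meridian")
)

toy_agg <- toy %>%
  group_by(city, dates, Location) %>%
  summarize(location_sum = n(), .groups = "drop")  # one row per combination

toy_agg
# The two Pocatello rows dated 2016-07-20 collapse into one row with location_sum = 2
```

The count column `location_sum` is numeric, which is what makes it suitable for the y axis of a line chart.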

For the plot, something like this should give you a result:

Plot = ggplot(filtered_agg, aes(x=dates, y=location_sum, group = city)) + geom_line(aes(linetype=city, color = city)) + geom_point(aes(color=city)) 

Plot

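But you seem to have too many dimensions for a simple line chart. If the number of cities isn't too large (you could also switch city and location_sum), facet_wrap will make the plot more readable: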

ggplot(filtered_agg, aes(x=dates, y=location_sum)) + geom_line(aes(linetype=Location, color = Location)) + geom_point(aes(color=Location)) + facet_wrap(~city)

Loading the data

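Does log = df$city work (if it doesn't, it will return an error message)? If so, it looks like you're overthinking this. You can skip the steps involved in creating df_example and use df directly in the ggplot command: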

library(readxl)
library(ggplot2)
library(dplyr)   # provides the %>% pipe used below

df <- read_excel("C:/Users/12083/Desktop/librarydata.xlsx")

df %>%
   ggplot(aes(dates,classes, color=city))+
   geom_line(aes(linetype=lt))+ 
   scale_linetype_identity()+ #this removes the linetype from the legend
   theme_minimal()

If this doesn't work, you may need to adjust the options in the read_excel command.
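
One thing worth checking after read_excel(): the dput() output in the question appears to show the date column stored as POSIXct, and named date rather than dates. Assuming that holds for the full file, a sketch of converting such a value to Date, so that it compares cleanly against as.Date() bounds, could look like this. The number is one of the values from the dput() output.

```r
# One of the date values shown in the question's dput() output
x <- as.POSIXct(1468972800, origin = "1970-01-01", tz = "UTC")
class(x)

# Convert to Date so comparisons against as.Date() values use the same class
d <- as.Date(x, tz = "UTC")
d

d >= as.Date("2016-01-01")
```

Applied to a whole column, the same call would be something like `df$date <- as.Date(df$date, tz = "UTC")`, with the column name adjusted to whatever `colnames(df)` reports.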
