I have a dataset where one column's values have the form "{'a': 1, 'b': 2, 'c': 3}". I'd like to convert this into a matrix, or at least a list, like:
| a | b | c |
|---|---|---|
| 1 | 2 | 3 |
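I've tried using regular expressions with stringr's str_extract_all(), which extracts the names and values fairly consistently, although there may be edge cases I haven't run into yet. For the names: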
str_extract_all(df[row, "tags"], "[a-zA-Z- ]{2,}|[0-9]+[']*s", simplify = TRUE)
And this works for the values:
str_extract_all(str_extract_all(df[row, "tags"], "[0-9]+[,}]+", simplify = T), "[0-9]+", simplify = T)
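I know there is probably a better way than nesting extractions, but that's all I've come up with so far. What really has me stumped is taking those values and programmatically turning them into a matrix.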
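For a single string, one way to get from names and values to a matrix is to extract each whole key: value pair first, then split every pair into its key and its number, and hand the numbers to matrix() with the keys as column names. This is a minimal base-R sketch, not a tested solution for every row: parse_tags is a hypothetical helper, and it assumes keys are wrapped in single or double quotes and values are integers.

```r
# Hypothetical helper: parse one "{'key': n, ...}" string into a one-row matrix.
# Assumes keys are quoted with ' or " and values are integers.
parse_tags <- function(s) {
  # Each match is a whole pair, e.g. 'Action': 2681 or "1990's": 564;
  # the backreference \1 makes the closing quote match the opening one.
  pairs <- regmatches(s, gregexpr("(['\"])(.*?)\\1\\s*:\\s*-?[0-9]+", s, perl = TRUE))[[1]]
  # Key: the text between the opening quote and the matching closing quote
  keys <- sub("^(['\"])(.*)\\1\\s*:.*$", "\\2", pairs, perl = TRUE)
  # Value: everything after the last colon
  vals <- as.numeric(sub(".*:\\s*", "", pairs))
  matrix(vals, nrow = 1, dimnames = list(NULL, keys))
}

m <- parse_tags("{'a': 1, 'b': 2, 'c': 3}")
print(m)
#      a b c
# [1,] 1 2 3
```

Matching the whole pair at once keeps each name aligned with its value, so a key such as "1990's" (double-quoted because it contains an apostrophe) cannot shift the two vectors out of step.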
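Edit: The dataset is a data frame, which can be found here, specifically the "steamspy_data" csv. I'm trying to convert the "tags" column from a single string into separate rows or some kind of list, so that I can easily analyze the relationship between the tags and their associated values (frequencies).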
> head(steam_super[,"tags"], 5)
[1] "{'Action': 2681, 'FPS': 2048, 'Multiplayer': 1659, 'Shooter': 1420, 'Classic': 1344, 'Team-Based': 943, 'First-Person': 799, 'Competitive': 790, 'Tactical': 734, \"1990's\": 564, 'e-sports': 550, 'PvP': 480, 'Military': 367, 'Strategy': 329, 'Score Attack': 200, 'Survival': 192, 'Old School': 164, 'Assassin': 151, '1980s': 144, 'Violent': 40}"
[2] "{'Action': 208, 'FPS': 188, 'Multiplayer': 172, 'Classic': 152, 'Shooter': 134, 'Class-Based': 124, 'Team-Based': 115, 'First-Person': 109, \"1990's\": 71, 'Co-op': 62, 'Competitive': 48, 'Old School': 46, 'Fast-Paced': 39, 'Online Co-Op': 28, 'Retro': 27, 'Remake': 27, 'Violent': 26, 'Mod': 24, 'Funny': 20, 'Adventure': 15}"
[3] "{'FPS': 138, 'World War II': 122, 'Multiplayer': 115, 'Action': 99, 'Shooter': 95, 'War': 80, 'Team-Based': 79, 'Classic': 61, 'Class-Based': 55, 'First-Person': 50, 'Historical': 28, 'Military': 19, 'Singleplayer': 16, 'Tactical': 14, 'Co-op': 12, 'World War I': 5}"
[4] "{'Action': 85, 'FPS': 71, 'Multiplayer': 58, 'Classic': 50, 'Shooter': 49, 'First-Person': 33, 'Arena Shooter': 22, 'Sci-fi': 16}"
[5] "{'FPS': 235, 'Action': 211, 'Sci-fi': 166, 'Singleplayer': 148, 'Classic': 146, 'Shooter': 144, 'First-Person': 126, 'Aliens': 122, 'Adventure': 87, \"1990's\": 77, 'Atmospheric': 73, 'Military': 50, 'Story Rich': 40, 'Silent Protagonist': 33, 'Co-op': 27, 'Great Soundtrack': 25, 'Puzzle': 18, 'Gore': 18, 'Moddable': 16, 'Masterpiece': 16}"
dput:
structure(list(appid = 10L, type = "game", name = "Counter-Strike",
is_free = "False", dlc = "", detailed_description = "Play the world's number 1 online action game. Engage in an incredibly realistic brand of terrorist warfare in this wildly popular team-based game. Ally with teammates to complete strategic missions. Take out enemy sites. Rescue hostages. Your role affects your team's success. Your team's success affects your role.",
about_the_game = "Play the world's number 1 online action game. Engage in an incredibly realistic brand of terrorist warfare in this wildly popular team-based game. Ally with teammates to complete strategic missions. Take out enemy sites. Rescue hostages. Your role affects your team's success. Your team's success affects your role.",
short_description = "Play the world's number 1 online action game. Engage in an incredibly realistic brand of terrorist warfare in this wildly popular team-based game. Ally with teammates to complete strategic missions. Take out enemy sites. Rescue hostages. Your role affects your team's success. Your team's success affects your role.",
fullgame = NA, developers = "['Valve']", publishers = "['Valve']",
price_overview = "{'currency': 'GBP', 'initial': 719, 'final': 719, 'discount_percent': 0, 'initial_formatted': '', 'final_formatted': '£7.19'}",
platforms = "{'windows': True, 'mac': True, 'linux': True}",
metacritic = "{'score': 88, 'url': 'https://www.metacritic.com/game/pc/counter-strike?ftag=MCD-06-10aaa1f'}",
reviews = "", categories = "[{'id': 1, 'description': 'Multi-player'}, {'id': 36, 'description': 'Online Multi-Player'}, {'id': 37, 'description': 'Local Multi-Player'}, {'id': 8, 'description': 'Valve Anti-Cheat enabled'}]",
genres = "[{'id': '1', 'description': 'Action'}]", release_date = "{'coming_soon': False, 'date': '1 Nov, 2000'}",
content_descriptors = "{'ids': [2, 5], 'notes': 'Includes intense violence and blood.'}",
developer = "Valve", publisher = "Valve", score_rank = NA_integer_,
positive = 124534L, negative = 3339L, userscore = 0L, owners = "10,000,000 .. 20,000,000",
average_forever = 17612L, average_2weeks = 709L, median_forever = 317L,
median_2weeks = 26L, price = 999L, initialprice = 999L, discount = 0L,
languages = "English, French, German, Italian, Spanish - Spain, Simplified Chinese, Traditional Chinese, Korean",
genre = "Action", ccu = 14923L, tags = "{'Action': 2681, 'FPS': 2048, 'Multiplayer': 1659, 'Shooter': 1420, 'Classic': 1344, 'Team-Based': 943, 'First-Person': 799, 'Competitive': 790, 'Tactical': 734, \"1990's\": 564, 'e-sports': 550, 'PvP': 480, 'Military': 367, 'Strategy': 329, 'Score Attack': 200, 'Survival': 192, 'Old School': 164, 'Assassin': 151, '1980s': 144, 'Violent': 40}"), row.names = 1L, class = "data.frame")
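For the "separate rows" layout described in the edit, a possible base-R sketch is to split every game's tag string into (tag, count) pairs, stack them into a long data frame keyed by appid, and then cross-tabulate that into a games x tags count matrix with xtabs(). Only the appid and tags column names come from the dataset above; tags_long is a hypothetical helper, and the toy steam data frame is illustrative (tag strings shortened from the head() output, and the second appid, 20, is made up). It assumes no tags value is NA.

```r
# Hypothetical helper: one row per (appid, tag) pair from a "tags" column of
# "{'Tag': count, ...}" strings. Assumes quoted keys and integer counts.
tags_long <- function(df) {
  # One match per entry, e.g. 'Action': 2681 or "1990's": 564
  pattern <- "(['\"])(.*?)\\1\\s*:\\s*-?[0-9]+"
  pieces <- lapply(seq_len(nrow(df)), function(i) {
    pairs <- regmatches(df$tags[i], gregexpr(pattern, df$tags[i], perl = TRUE))[[1]]
    data.frame(
      appid = rep(df$appid[i], length(pairs)),
      tag   = sub("^(['\"])(.*)\\1\\s*:.*$", "\\2", pairs, perl = TRUE),
      count = as.numeric(sub(".*:\\s*", "", pairs)),
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, pieces)
}

# Toy input (illustrative values, second appid is made up)
steam <- data.frame(
  appid = c(10L, 20L),
  tags  = c("{'Action': 2681, 'FPS': 2048, \"1990's\": 564}",
            "{'Action': 208, 'Class-Based': 124}"),
  stringsAsFactors = FALSE
)

long <- tags_long(steam)                         # long format: appid, tag, count
wide <- xtabs(count ~ appid + tag, data = long)  # games x tags, 0 where a tag is absent
```

The long table is usually the easier one to analyze (grouping, filtering by tag); the xtabs() result is the matrix form, with zeros filled in for tags a game does not have.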
1 answer
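Use the jsonlite or rjson package to handle JSON data. Convert any ' that follows a { or a space into ", and also convert any ' that is followed by a colon into ". Then apply fromJSON.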
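A minimal sketch of the JSON route, assuming the jsonlite package is installed. The two gsub() calls apply the substitution rule described above; py_dict_to_json is a hypothetical helper name, and the rule is a heuristic that can miss unusual keys (for example one containing a space followed by an apostrophe).

```r
library(jsonlite)

# Hypothetical helper: turn a Python-style dict string into JSON.
# A ' after "{" or a space opens a key; a ' right before ":" closes it.
# Keys already in double quotes (e.g. "1990's") are left untouched.
py_dict_to_json <- function(s) {
  s <- gsub("([{ ])'", "\\1\"", s)
  gsub("':", "\":", s, fixed = TRUE)
}

tags <- "{'Action': 2681, 'FPS': 2048, \"1990's\": 564, 'Score Attack': 200}"
v <- unlist(fromJSON(py_dict_to_json(tags)))  # named vector: tag -> count
m <- t(v)                                     # 1 x 4 matrix, tags as column names
```

Going through a real JSON parser means quoting and escaping are handled by the parser rather than by hand-written extraction regexes; only the quote conversion is heuristic.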
Another approach is to convert the input to DCF format and then run read.dcf.
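A sketch of the DCF route, under the assumption that no tag name contains ", " (that sequence is used here to split the pairs): rewrite the dict as one "Tag: count" line per entry, which is the "field: value" layout read.dcf() parses, and read it back as a one-row matrix. dict_to_dcf_lines is a hypothetical helper name.

```r
# Hypothetical helper: "{'Action': 2681, ...}" -> c("Action: 2681", ...)
dict_to_dcf_lines <- function(s) {
  body  <- gsub("^\\{|\\}$", "", s)                # drop the outer braces
  lines <- strsplit(body, ", ", fixed = TRUE)[[1]] # one key: value pair per line
  sub("^['\"](.*)['\"]:", "\\1:", lines)           # strip the quotes around the key
}

tags <- "{'Action': 2681, 'FPS': 2048, \"1990's\": 564}"
con <- textConnection(dict_to_dcf_lines(tags))
m <- read.dcf(con)     # one-row character matrix, fields as column names
close(con)
mode(m) <- "numeric"   # the counts come back as text; convert them
```

read.dcf() always returns character values, hence the final conversion; the dim and dimnames of the matrix are kept.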