Splitting a data frame string column on \

Question

I am trying to split a string column in a data frame on the \ character. My data looks like this:

     Rk Player  Season   Age Tm    Lg       WS     G    GS    MP    FG   FGA  `2P` `2PA`  `3P` `3PA`    FT   FTA   ORB   DRB   TRB   AST   STL   BLK   TOV    PF
  <dbl> <chr>   <chr>  <dbl> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1     1 "Lebro~ 2010-~    26 MIA   NBA    15.6    79    79  3063   758  1485   666  1206    92   279   503   663    80   510   590   554   124    50   284   163
2     2 "Pau G~ 2010-~    30 LAL   NBA    14.7    82    82  3037   593  1120   592  1117     1     3   354   430   268   568   836   273    48   130   142   203

The column I am trying to manipulate is the "Player" column. Its entries look like this:

"Lebron James\\jamesle01"        "Pau Gasol\\gasolpa01"           "Dwight Howard\\howardw01"

I want to split it into two columns, pName and pID:

pName            pID

Lebron James    jamesle01
Pau Gasol       gasolpa01
Dwight Howard   howardw01

I have tried gsub, sub, str_replace and separate, but I have not been able to figure it out:

player_stats2010 %>%
+   str_replace(Player,"\\\.*","")

player_stats2010 %>%
+   sub("\\\","",player_stats2010$Player)

player_stats2010 %>%
+   sub("\.*",player_stats2010$Player)

player_stats2010 %>%
+   gsub("\.*",player_stats2010$Player)

player_stats2010_test <- player_salaries_2010 %>%
+   separate(Player,c("pName","pID"),"\")

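Despite looking online and at several other questions, I really do not understand the syntax for this. If you could help me understand what I am missing, that would be great. Thanks a lot :)

Answers

1) read.table. Using x from the Note at the end and only base R, call read.table as follows: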

read.table(text = x,sep = "\\",col.names = c("pName","pID"))

giving:

          pName       pID
1  LeBron James jamesle01
2     Pau Gasol gasolpa01
3 Dwight Howard howardw01
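The confusion in the question seems to come from how many characters "\\" really is, so here is a minimal base R sketch of that point. It redefines the same sample strings so it runs on its own; strsplit with fixed = TRUE is an alternative swapped in for illustration, not part of the answer above, and parts and res are made-up names.

```r
# Same sample input as in this answer, repeated so the snippet is self-contained.
x <- c("LeBron James\\jamesle01", "Pau Gasol\\gasolpa01", "Dwight Howard\\howardw01")

nchar("\\")        # the literal "\\" holds exactly one backslash character
writeLines(x[1])   # shows the string as stored, with a single backslash

# fixed = TRUE makes strsplit treat the separator as a literal string,
# so no regex escaping is needed on top of the R string escaping.
parts <- strsplit(x, "\\", fixed = TRUE)
res <- data.frame(pName = sapply(parts, `[`, 1),
                  pID   = sapply(parts, `[`, 2),
                  stringsAsFactors = FALSE)
res
```

Because sep in read.table is also a literal character rather than a regex, the single-backslash string "\\" is enough there as well.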

2) tidyr

With tidyr we can do this (the raw string syntax r"{...}" requires R 4.0 or later):

library(tidyr)
data.frame(x) %>%
  separate(x,c("pName","pID"),sep = r"{\\}")

Note

The input is assumed to be:

x <- c("LeBron James\\jamesle01","Pau Gasol\\gasolpa01","Dwight Howard\\howardw01")

Alternatively, you can use str_extract from the stringr package together with a character class that does not include \\:

library(stringr)
player_stats$pName <- str_extract(player_stats$Player,"^[\\w\\s]+")
player_stats$pID <- str_extract(player_stats$Player,"[\\w\\s]+$")

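In both cases you define a character class that only allows word characters (\\w: letters, digits and underscore) and whitespace characters (\\s). Because the backslash is in neither set, the match stops there. The difference between the two calls is that pName anchors the pattern at the start of the string (^), whereas pID anchors it at the end ($).

Result: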

player_stats
                    Player         pName       pID
1  LeBron James\\jamesle01  LeBron James jamesle01
2     Pau Gasol\\gasolpa01     Pau Gasol gasolpa01
3 Dwight Howard\\howardw01 Dwight Howard howardw01

Data:

player_stats <- data.frame(
  Player = c("LeBron James\\jamesle01","Pau Gasol\\gasolpa01","Dwight Howard\\howardw01"))
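One limitation of the [\\w\\s] class is that names containing an apostrophe, hyphen or period would be cut short. A hedged sketch of a variant using a negated class ("anything that is not a backslash") follows. It uses base R regmatches and regexpr rather than stringr so it has no dependencies, and the second row of p is an illustrative assumption, not part of the original data.

```r
# Second entry is a made-up example of a name with an apostrophe.
p <- c("LeBron James\\jamesle01", "Shaquille O'Neal\\onealsh01")

# perl = TRUE so that \w and \s are recognised inside the bracket expression.
name_ws <- regmatches(p, regexpr("^[\\w\\s]+", p, perl = TRUE))

# "[^\\\\]+" is the regex [^\\]+ : one or more characters that are not a backslash.
name_neg <- regmatches(p, regexpr("^[^\\\\]+", p, perl = TRUE))
id_neg   <- regmatches(p, regexpr("[^\\\\]+$", p, perl = TRUE))

data.frame(name_ws, name_neg, id_neg)
```

Here name_ws stops at the apostrophe, while name_neg keeps the whole name. The same pattern strings should also be usable with str_extract.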
