比方說,我想知道四名籃球運動員中哪一名是最好的,我設立了一個小比賽,兩名球員1對1比賽,我記錄了一組數(shù)據(jù)
#rm(list=ls())
set.seed(1234)
# some made up scores from my tournament
df <- data.frame(
player1 = c("a", "a", "b", "c", "d", "d"),
player2 = c("b", "c", "d", "b", "a", "c"),
date = c("2021-01-01", "2021-01-02", "2021-01-04", "2021-01-05", "2021-01-06", "2021-01-08"),
p1_dunks = sample(c(4:11), 6, replace = TRUE),
p2_dunks = sample(c(3:12), 6, replace = TRUE),
p1_blocks = sample(c(8:10), 6, replace = TRUE),
p2_blocks = sample(c(10:12), 6, replace = TRUE),
p1_threepointers = sample(c(2:7), 6, replace = TRUE),
p2_threepointers = sample(c(1:5), 6, replace = TRUE)
)
為了計算一個球員在比賽的任何一點上表現(xiàn)得有多好,我可以將其旋轉兩次,并用每個統(tǒng)計的累積和替換每個統(tǒng)計的計數(shù)
# cast to long and get cumulative stats per player
melted_df <- df %>%
pivot_longer(cols = starts_with(c("p1", "p2")), names_to = "stat", values_to = "number") %>%
pivot_longer(cols = starts_with("player"), names_to = "player", values_to = "name") %>%
filter(
(player == "player1" & grepl("^p1", stat)) |
(player == "player2" & grepl("^p2", stat))
) %>%
arrange(date) %>%
group_by(player, stat) %>%
mutate(number = cumsum(number))
那么我就可以很容易地對此提出質疑了
melted_df %>%
filter(date < "2021-01-05") %>%
filter(!duplicated(name)) %>%
filter(grepl("dunks$", stat))
但是假設在我的用例中,我需要將這個長格式的數(shù)據(jù)強制恢復到其原始形式(使用player1、player2,然后是每個player1和player2的統(tǒng)計數(shù)據(jù))。我可以試試
# try to cast back to original format
back_to_wider_df <- melted_df %>%
pivot_wider(names_from = "player", values_from = "name") %>%
pivot_wider(names_from = "stat", values_from = "number")
但這反而會給出一個數(shù)據(jù)幀,該數(shù)據(jù)幀是每匹配一行的“偏移量”,其中有一半的NA值:
> head(back_to_wider_df)
# A tibble: 6 × 9
date player1 player2 p1_dunks p1_blocks p1_threepointers p2_dunks p2_blocks p2_threepointers
<chr> <chr> <chr> <int> <int> <int> <int> <int> <int>
1 2021-01-01 a NA 7 9 6 NA NA NA
2 2021-01-01 NA b NA NA NA 11 11 4
3 2021-01-02 a NA 18 18 9 NA NA NA
4 2021-01-02 NA c NA NA NA 18 22 8
5 2021-01-04 b NA 23 27 15 NA NA NA
6 2021-01-04 NA d NA NA NA 26 32 11
是否有一種簡單的方法將其修復回原始形狀,以便前三行應為:
> df
date player1 player2 p1_dunks p1_blocks p1_threepointers p2_dunks p2_blocks p2_three_pointers
1 2021-01-01 a b 7 9 6 11 11 4
2 2021-01-02 a c 18 18 9 18 22 8
3 2021-01-04 b d 23 27 15 26 32 11
thanks,
一種方法是使用
lead
函數(shù)并刪除NA