How to Calculate the Time Difference from the Previous Row of a Data Frame by Group in R

Last modified: 2022-05-13 01:54:57.561000    Author: Mango

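A data frame may contain values that belong to different groups. Its columns can hold different data types, including date-times stored as POSIXct objects. POSIXct values support arithmetic directly: subtracting one from another yields a difftime. This means the time difference between each row and the previous row in the same group can be computed in the following ways: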
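As a small illustration of that arithmetic, the sketch below subtracts two of the timestamps used in the examples. It assumes tz = "UTC" so the numbers do not depend on local daylight-saving rules. The examples themselves use the session's local time zone. Plain subtraction returns a difftime whose unit R picks automatically, so it is safer to convert to an explicit unit such as seconds.

```r
# Two timestamps from the examples, parsed in UTC (an assumption made here so
# the result does not depend on the local time zone or daylight saving)
t1 <- as.POSIXct("2021-05-08 08:32:07", tz = "UTC")
t2 <- as.POSIXct("2021-07-18 00:21:07", tz = "UTC")

# Subtracting POSIXct values gives a difftime; the unit is chosen
# automatically (here "days", because the gap is longer than one day)
gap <- t2 - t1
units(gap)   # "days"

# Convert the difftime to a plain number of seconds
gap_secs <- as.numeric(gap, units = "secs")
gap_secs
```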

Method 1: Using the dplyr package

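The group_by() function splits the data into groups according to the values in one or more columns. The grouping column is passed as an argument to this function, and several column names can be given.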

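Syntax:

group_by(.data, ...)

Here .data is the data frame (supplied automatically when piping with %>%), and ... lists one or more grouping columns.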
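The following is a minimal sketch of group_by() with more than one grouping column. It uses a small hypothetical data frame; the columns site, day and value are invented for illustration. Grouping does not change the rows. It only records which rows belong together, and n_groups() reports how many distinct combinations were found.

```r
library(dplyr)

# Hypothetical data: readings taken at two sites on two days
readings <- data.frame(
  site  = c("A", "A", "B", "B", "A"),
  day   = c(1, 1, 1, 2, 2),
  value = c(5, 7, 3, 4, 6)
)

# Group by two columns: each unique (site, day) pair becomes one group
grouped <- readings %>% group_by(site, day)

n_groups(grouped)    # 4 groups: A-1, A-2, B-1, B-2
nrow(grouped)        # still 5 rows; grouping does not drop rows
group_vars(grouped)  # "site" "day"
```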

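This is followed by mutate(), which adds new columns (or modifies existing ones) computed from the existing columns. The new column's name is written on the left of =. The difference from the previous row is computed with dplyr's lag() function. This function shifts a vector by one position so that each element lines up with its predecessor.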



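The first row of each group has no previous row, so its difference is NA. Sorting with arrange(col1, col3) before grouping puts each group's rows in time order, so "previous row" means the previous time within the group.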
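Here is a minimal sketch, with made-up numbers, of how lag() behaves on its own and inside a grouped mutate(). The data frame df and its columns g and v are hypothetical. The sketch also shows lag()'s default argument. This argument can replace the leading NA (here with the first value) so that the first difference becomes 0, which matches the convention used by the other two methods.

```r
library(dplyr)

# lag() shifts a vector by one position, pairing each element with the
# one before it; the first position has no predecessor, so it becomes NA
x <- c(10, 20, 35)
lag(x)                       # NA 10 20
x - lag(x)                   # NA 10 15

# default replaces the leading NA; using the first value makes the
# first difference 0 instead of NA
x - lag(x, default = x[1])   # 0 10 15

# Hypothetical grouped data: in a grouped mutate() the shift restarts
# in every group, so the first row of each group gets NA
df <- data.frame(g = c("a", "a", "b", "b"), v = c(1, 4, 10, 13))
res <- df %>% group_by(g) %>% mutate(d = v - lag(v))
res$d                        # NA 3 NA 3
```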

Example:

R
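library(dplyr)

# creating a data frame; col1 holds random group labels (6 to 9),
# so the groups differ from run to run
data_frame <- data.frame(col1 = sample(6:9, 5, replace = TRUE),
                         col3 = c(as.POSIXct("2021-05-08 08:32:07"),
                                  as.POSIXct("2021-07-18 00:21:07"),
                                  as.POSIXct("2020-11-28 23:32:09"),
                                  as.POSIXct("2021-05-11 18:32:07"),
                                  as.POSIXct("2021-05-08 08:32:07"))
                         )
print("Original DataFrame")
print(data_frame)

# computing the difference from the previous row within each group:
# sort by group and time, then subtract the lagged time; units = "secs"
# fixes the unit instead of letting R choose one automatically
data_frame %>%
  arrange(col1, col3) %>%
  group_by(col1) %>%
  mutate(diff = difftime(col3, lag(col3), units = "secs"))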


Output

[1] "Original DataFrame" 
   col1                col3
1    8 2021-05-08 08:32:07 
2    8 2021-07-18 00:21:07 
3    7 2020-11-28 23:32:09 
4    6 2021-05-11 18:32:07 
5    7 2021-05-08 08:32:07 
# A tibble: 5 x 3 
# Groups:   col1 [3]    
   col1 col3                diff
1     6 2021-05-11 18:32:07       NA secs 
2     7 2020-11-28 23:32:09       NA secs 
3     7 2021-05-08 08:32:07 13856398 secs 
4     8 2021-05-08 08:32:07       NA secs 
5     8 2021-07-18 00:21:07  6104940 secs

Method 2: Using tapply()

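The tapply() function applies a function to subsets of a vector. The subsets are defined by one or more grouping vectors passed as INDEX. The function can be user-defined or built in. The results are returned one entry per group, ordered by the sorted group levels rather than by the original row order.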
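The short sketch below uses made-up values to show how tapply() splits a vector by a grouping vector and applies a function to each piece. The results come back ordered by the sorted group levels, not by the original row order. If the per-group results are unlisted back into a data frame, the rows therefore need to be sorted by the grouping column first for the values to line up.

```r
# Made-up values and group labels, deliberately not sorted by group
values <- c(5, 1, 4, 2, 3)
groups <- c(7, 6, 8, 7, 6)

# Sum the values in each group; the result is named by group level
# and ordered by the sorted levels (6, 7, 8)
sums <- tapply(values, INDEX = groups, FUN = sum)
names(sums)        # "6" "7" "8"
as.vector(sums)    # 4 7 4

# When FUN returns several values per group, tapply() returns a list,
# and unlist() concatenates the pieces in group order (6, then 7, then 8)
pieces <- tapply(values, INDEX = groups, FUN = function(x) x)
unlist(pieces, use.names = FALSE)   # 1 3 5 2 4
```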



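In this example the function takes the differences between consecutive times within a group with diff() and converts them to seconds with `units<-`. Because diff() returns one value fewer than its input, a 0 is prepended. As a result, the first row of every group gets a difference of 0 rather than NA.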
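The per-group function can be checked on its own before handing it to tapply(). The sketch below applies it to two timestamps from the examples. These are parsed in UTC, an assumption made so the result ignores daylight saving. The sketch then passes an equivalent function to base R's ave(), a different grouping helper from the tapply() used in this method. ave() returns its results in the original row order, so no re-sorting is needed. It works on numbers, so the times are converted to seconds first.

```r
# Two timestamps from the examples, parsed in UTC (assumption: avoids
# daylight-saving effects so the gap is an exact number of seconds)
x <- as.POSIXct(c("2021-05-08 08:32:07", "2021-05-11 18:32:07"), tz = "UTC")

# The per-group function from the example: differences in seconds,
# with a leading 0 for the first row of the group
per_group <- function(x) c(0, `units<-`(diff(x), "secs"))
per_group(x)

# Alternative sketch with ave(): convert the times to seconds since the
# epoch, then compute per-group differences; results keep the row order
times  <- as.POSIXct(c("2021-05-08 08:32:07", "2021-07-18 00:21:07",
                       "2020-11-28 23:32:09", "2021-05-11 18:32:07",
                       "2021-05-08 08:32:07"), tz = "UTC")
groups <- c(7, 6, 8, 7, 6)
secs_diff <- ave(as.numeric(times), groups, FUN = function(s) c(0, diff(s)))
secs_diff   # 0 0 0 295200 -6104940
```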

Example:

R

# creating a data frame; col1 holds random group labels (6 to 9),
# so the groups differ from run to run
data_frame <- data.frame(col1 = sample(6:9, 5, replace = TRUE),
                         col3 = c(as.POSIXct("2021-05-08 08:32:07"),
                                  as.POSIXct("2021-07-18 00:21:07"),
                                  as.POSIXct("2020-11-28 23:32:09"),
                                  as.POSIXct("2021-05-11 18:32:07"),
                                  as.POSIXct("2021-05-08 08:32:07"))
                         )
print("Original DataFrame")
print(data_frame)

# tapply() returns the results grouped by col1 in sorted order, so sort
# the rows by col1 first; otherwise unlist() assigns values to the wrong rows
data_frame <- data_frame[order(data_frame$col1), ]

# computing the difference from the previous row within each group, in seconds
data_frame$diff <- unlist(tapply(data_frame$col3, INDEX = data_frame$col1,
                          FUN = function(x) c(0, `units<-`(diff(x), "secs"))))

print("Modified DataFrame")
print(data_frame)

Output

[1] "Original DataFrame"
  col1                col3
1    7 2021-05-08 08:32:07
2    6 2021-07-18 00:21:07
3    8 2020-11-28 23:32:09
4    7 2021-05-11 18:32:07
5    6 2021-05-08 08:32:07
[1] "Modified DataFrame"
  col1                col3     diff
2    6 2021-07-18 00:21:07        0
5    6 2021-05-08 08:32:07 -6104940
1    7 2021-05-08 08:32:07        0
4    7 2021-05-11 18:32:07   295200
3    8 2020-11-28 23:32:09        0

Method 3: Using data.table

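A new column holding the time difference between rows can be added to a data.table. The difftime() function computes this difference: difftime(time1, time2, units = ...) returns time1 - time2 in the requested units.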
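This is a minimal sketch of difftime() with explicit units, using two timestamps from the examples. They are parsed in UTC, an assumption so that daylight-saving shifts do not affect the numbers. It shows that the units argument controls the scale of the result and that the order of the two arguments controls the sign.

```r
# Two timestamps from the examples, parsed in UTC (assumption: avoids
# daylight-saving shifts, so the differences are exact)
t1 <- as.POSIXct("2021-05-08 08:32:07", tz = "UTC")
t2 <- as.POSIXct("2021-05-11 18:32:07", tz = "UTC")

# difftime(time1, time2) computes time1 - time2 in the requested units
in_secs  <- difftime(t2, t1, units = "secs")
in_hours <- difftime(t2, t1, units = "hours")
as.numeric(in_secs)    # 295200
as.numeric(in_hours)   # 82

# Swapping the arguments flips the sign
as.numeric(difftime(t1, t2, units = "secs"))   # -295200
```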

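The value to subtract, i.e. the time2 argument of difftime(), is obtained with shift(). This function shifts a vector or list by a given number of positions; it lags by default and can also lead. In the example, fill = col3[1L] pads the first position of each group with that group's first time. Each group's first row therefore gets a difference of 0.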
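The sketch below illustrates shift() on a small made-up numeric vector. It covers the default lag, a lead, and the fill argument. Padding with the first value is what makes the first difference 0.

```r
library(data.table)

x <- c(10, 20, 35)

shift(x)                     # lag by one (the default type): NA 10 20
shift(x, type = "lead")      # lead by one: 20 35 NA
shift(x, fill = x[1L])       # lag, padding with the first value: 10 10 20

# Padding with the first value makes the first difference 0
x - shift(x, fill = x[1L])   # 0 10 15
```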

The by argument makes the calculation run separately within each group of the named column.
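Here is a minimal sketch of by = on a hypothetical table; dt_demo and its columns g and v are invented for illustration. Because the expression runs once per group, shift() starts over in each group, and := adds the result as a new column in place.

```r
library(data.table)

# Hypothetical table: two groups, already in time order within each group
dt_demo <- data.table(g = c("a", "a", "b", "b"), v = c(1, 4, 10, 13))

# by = g evaluates the expression separately for each group, so the
# first row of every group has no previous value and gets NA
dt_demo[, d := v - shift(v), by = g]
dt_demo$d   # NA 3 NA 3
```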

Example:

R

library("data.table")

# creating a data.table; col1 holds random group labels (6 to 9),
# so the groups differ from run to run
dt <- data.table(col1 = sample(6:9, 5, replace = TRUE),
                 col3 = c(as.POSIXct("2021-05-08 08:32:07"),
                          as.POSIXct("2021-07-18 00:21:07"),
                          as.POSIXct("2020-11-28 23:32:09"),
                          as.POSIXct("2021-05-11 18:32:07"),
                          as.POSIXct("2021-05-08 08:32:07"))
                 )
print("Original DataFrame")
print(dt)

# computing the difference from the previous row within each group;
# filling with the group's first time makes the first difference 0
dt[, diff := difftime(col3, shift(col3, fill = col3[1L]),
                      units = "secs"), by = col1]

print("Modified DataFrame")
print(dt)

Output

[1] "Original DataFrame"
   col1                col3
1:    7 2021-05-08 08:32:07
2:    7 2021-07-18 00:21:07
3:    8 2020-11-28 23:32:09
4:    8 2021-05-11 18:32:07
5:    8 2021-05-08 08:32:07
[1] "Modified DataFrame"
   col1                col3          diff
1:    7 2021-05-08 08:32:07        0 secs
2:    7 2021-07-18 00:21:07  6104940 secs
3:    8 2020-11-28 23:32:09        0 secs
4:    8 2021-05-11 18:32:07 14151598 secs
5:    8 2021-05-08 08:32:07  -295200 secs