How do I get trend values from different time series in one data frame?

Problem description

Here is the sample data:

data<-read.table(textConnection('customer_ID transaction_num sales
                                    Josh         1              $35
                                    Josh         2              $50
                                    Josh         3              $65
                                    Ray          1              $65
                                    Ray          2              $52
                                    Ray          3              $49
                                    Eric         1              $10
                                    Eric         2              $13
                                    Eric         3              $9'),header=TRUE,stringsAsFactors=FALSE)

  # strip the leading dollar sign so that sales is numeric
  data$sales<-as.numeric(sub('\\$','',data$sales))

It is clear how to get the trend values for one of the customer IDs:

  library(reshape2)  # assumption: dcast() here is the one from reshape2
  dataTransformed<-dcast(data,transaction_num ~ customer_ID,value.var="sales",fun.aggregate=sum)
  transaction_num Eric Josh Ray
               1   10   35  65
               2   13   50  52
               3    9   65  49
  fitted(lm(dataTransformed$Eric ~ dataTransformed$transaction_num))
    1        2        3 
11.16667 10.66667 10.16667 

But I would like a data frame with a "trend value" column for each customer ID, either in place of the "sales" column or next to it.

Any help would be much appreciated. Thanks.

Solution

You can simply fit a single lm with an interaction term, which gives each customer its own intercept and slope:

data$trend <- fitted(lm(sales ~ customer_ID * transaction_num,data))
data
#>   customer_ID transaction_num sales    trend
#> 1        Josh               1    35 35.00000
#> 2        Josh               2    50 50.00000
#> 3        Josh               3    65 65.00000
#> 4         Ray               1    65 63.33333
#> 5         Ray               2    52 55.33333
#> 6         Ray               3    49 47.33333
#> 7        Eric               1    10 11.16667
#> 8        Eric               2    13 10.66667
#> 9        Eric               3     9 10.16667
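
To see why the interaction model works, it may help to check that it reproduces a separate regression per customer. The following is a minimal sketch in base R. It rebuilds the question's data with data.frame() for self-containment, and the names trend_interaction and trend_separate are introduced here purely for illustration.

```r
# Sample data from the question, with sales already numeric.
data <- data.frame(
  customer_ID      = rep(c("Josh", "Ray", "Eric"), each = 3),
  transaction_num  = rep(1:3, times = 3),
  sales            = c(35, 50, 65, 65, 52, 49, 10, 13, 9),
  stringsAsFactors = FALSE
)

# One model: the interaction gives every customer its own intercept and slope.
trend_interaction <- fitted(lm(sales ~ customer_ID * transaction_num, data))

# Separate models: one regression per customer, written back in row order.
trend_separate <- numeric(nrow(data))
for (id in unique(data$customer_ID)) {
  rows <- data$customer_ID == id
  trend_separate[rows] <- fitted(lm(sales ~ transaction_num, data = data[rows, ]))
}

# The two sets of fitted values should agree up to floating-point error.
all.equal(unname(trend_interaction), trend_separate)
```

The single-model form is shorter, while the loop makes it explicit that each customer's trend line is estimated from that customer's rows only.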

You can apply lm separately for each customer_ID:

library(dplyr)

data %>% 
    group_by(customer_ID) %>% 
    mutate(trend = fitted(lm(sales ~ transaction_num)))

#  customer_ID transaction_num sales trend
#  <chr>                 <int> <dbl> <dbl>
#1 Josh                      1    35  35  
#2 Josh                      2    50  50. 
#3 Josh                      3    65  65  
#4 Ray                       1    65  63.3
#5 Ray                       2    52  55.3
#6 Ray                       3    49  47.3
#7 Eric                      1    10  11.2
#8 Eric                      2    13  10.7
#9 Eric                      3     9  10.2
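
If dplyr is not available, the same grouped fit can be sketched in base R with ave(). This is an alternative, not part of the answer above. It passes row indices rather than values to ave(), so that each group's rows can be handed to lm(), and it rebuilds the question's data with data.frame() for self-containment.

```r
# Sample data from the question, with sales already numeric.
data <- data.frame(
  customer_ID      = rep(c("Josh", "Ray", "Eric"), each = 3),
  transaction_num  = rep(1:3, times = 3),
  sales            = c(35, 50, 65, 65, 52, 49, 10, 13, 9),
  stringsAsFactors = FALSE
)

# ave() splits the row indices by customer, applies FUN to each group,
# and puts the results back in the original row order.
data$trend <- ave(
  seq_len(nrow(data)), data$customer_ID,
  FUN = function(i) fitted(lm(sales ~ transaction_num, data = data[i, ]))
)

data
```

The result is an ordinary data frame with the trend column next to sales, with no grouping attribute to remove afterwards.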