Automatically export plots of all time series in a given data.frame, column by column, every X data points

Problem description

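Thank you very much in advance for your help.

I hope you can help me, because I cannot manage this on my own. I apologise in advance if the question is not appropriate.

I need to visually inspect a large number of plots (thousands) of two time series. One of the two is fixed (let's call it GW) and does not change, while the other (let's call them Pz1, Pz2, Pz3, ... Pz n) is different in every plot: basically, I need to visually compare GW with every possible Pz, over a window of fixed length (e.g. 10 data points).

The data are in data frame format, i.e.: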

structure(list(GW = c(0.08,0.04,0.35,0.54,0.39,0.94,0.51,0.01,0.44,0.63,0.14,0.79,0.43,0.73,NA,NA),Pz1 = c(-2.459826614,-2.905007821,-2.241113224,-1.549264338,-1.761962438,-1.282612221,-0.428828702,1.659042479,2.63518648,3.076022461,1.886216859,0.124473561,-1.025720789,-1.969461882,-2.216075605,-2.508858567,-2.495831241,-2.36513321,-2.295002655,-2.103511241,-1.009051105,-0.619616011,-0.941517493
),Pz2 = c(-2.916090168,-3.262459435,-2.455396094,-1.488106654,-0.417171756,-1.781014095,-0.605012986,1.037062685,1.977265974,2.587846362,2.499228916,1.0852274,-0.503736287,-1.829562138,-2.410220303,-2.135932209,-2.105646454,-2.79131286,-2.446274505,-2.412041743,-1.467408853,-1.207262381,-1.286558601),Pz3 = c(-2.507944967,-3.718722989,-2.812847708,-1.702389524,-0.356014073,-0.436223413,-1.10341486,0.860878402,1.355286181,1.929925855,2.011052817,1.698239458,0.457017552,-1.307577636,-2.270320559,-2.330076907,-1.732720095,-2.401128073,-2.872454155,-2.563313593,-1.775939355,-1.665620129,-1.874204971),Pz4 = c(-2.526729696,-3.310577787,-3.269111262,-2.059841138,-0.570296943,-0.375065729,0.241375822,0.362476528,1.179101897,1.307946062,1.35313231,1.210063358,1.07002961,-0.346823797,-1.748336058,-2.190177163,-1.926864794,-2.028201714,-2.482269368,-2.989493243,-1.927211205,-1.974150631,-2.332562719),Pz5 = c(-3.284551238,-3.329362517,-2.86096606,-2.516104692,-0.927748557,-0.5893486,0.302533506,1.70726721,0.680700023,1.131761778,0.731152517,0.552142851,0.58185351,0.266188261,-0.787582218,-1.668192661,-1.78696505,-2.222346412,-2.109343009,-2.599308456,-2.353390855,-2.125422481,-2.641093221
),Pz6 = c(-4.011896321,-4.087184059,-2.87975079,-2.107959491,-1.38401211,-0.946800213,0.088250636,1.768424893,2.025490705,0.633359905,0.554968233,-0.069836942,-0.076066996,-0.221987839,-0.174570161,-0.707438822,-1.264980548,-2.082446669,-2.303487708,-2.226382097,-1.963206068,-2.551602131,-2.792365071),Pz7 = c(-4.769878994,-4.814529142,-3.637572331,-2.12674422,-0.975866909,-1.403063767,-0.269200978,1.554142023,2.086648388,1.978150587,0.05656636,-0.246021225,-0.698046789,-0.879908346,-0.66274626,-0.094426764,-0.304226709,-1.560462167,-2.163587964,-2.420526796,-1.59027971,-2.161417344,-3.218544722)),class = "data.frame",row.names = c(NA,-23L))

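Basically, I am looking for something that automates the whole process and saves every plot as a separate file in one folder, so that I can patiently inspect them one by one.

On top of that, A and B... should be normalised (Z-score) before plotting (i.e. Z = (x - mean) / standard deviation), and each plot should include some kind of coordinates so that I can locate the data in the matrix (e.g. plot #4 = B5-B8). Ideally (this would be a dream) each plot would also show the Spearman correlation coefficient and the RMSE value (please see the image linked at the bottom).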

Schematic summary of what I am trying to achieve

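Unfortunately, I do not know where to start, so any help is much appreciated.

I am sorry if this is confusing. If you need further clarification, please let me know and I will try to explain it better.

Thank you very much!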

Solution

I can help you with the first part of the problem - saving the plots automatically.


library(tidyverse)

# this is your data, I have called it 'my_data'
my_data <- structure(list(GW = c(0.08,0.04,0.35,0.54,0.39,0.94,0.51,0.01,0.44,0.63,0.14,0.79,0.43,0.73,NA,NA),Pz1 = c(-2.459826614,-2.905007821,-2.241113224,-1.549264338,-1.761962438,-1.282612221,-0.428828702,1.659042479,2.63518648,3.076022461,1.886216859,0.124473561,-1.025720789,-1.969461882,-2.216075605,-2.508858567,-2.495831241,-2.36513321,-2.295002655,-2.103511241,-1.009051105,-0.619616011,-0.941517493),Pz2 = c(-2.916090168,-3.262459435,-2.455396094,-1.488106654,-0.417171756,-1.781014095,-0.605012986,1.037062685,1.977265974,2.587846362,2.499228916,1.0852274,-0.503736287,-1.829562138,-2.410220303,-2.135932209,-2.105646454,-2.79131286,-2.446274505,-2.412041743,-1.467408853,-1.207262381,-1.286558601),Pz3 = c(-2.507944967,-3.718722989,-2.812847708,-1.702389524,-0.356014073,-0.436223413,-1.10341486,0.860878402,1.355286181,1.929925855,2.011052817,1.698239458,0.457017552,-1.307577636,-2.270320559,-2.330076907,-1.732720095,-2.401128073,-2.872454155,-2.563313593,-1.775939355,-1.665620129,-1.874204971),Pz4 = c(-2.526729696,-3.310577787,-3.269111262,-2.059841138,-0.570296943,-0.375065729,0.241375822,0.362476528,1.179101897,1.307946062,1.35313231,1.210063358,1.07002961,-0.346823797,-1.748336058,-2.190177163,-1.926864794,-2.028201714,-2.482269368,-2.989493243,-1.927211205,-1.974150631,-2.332562719),Pz5 = c(-3.284551238,-3.329362517,-2.86096606,-2.516104692,-0.927748557,-0.5893486,0.302533506,1.70726721,0.680700023,1.131761778,0.731152517,0.552142851,0.58185351,0.266188261,-0.787582218,-1.668192661,-1.78696505,-2.222346412,-2.109343009,-2.599308456,-2.353390855,-2.125422481,-2.641093221),Pz6 = c(-4.011896321,-4.087184059,-2.87975079,-2.107959491,-1.38401211,-0.946800213,0.088250636,1.768424893,2.025490705,0.633359905,0.554968233,-0.069836942,-0.076066996,-0.221987839,-0.174570161,-0.707438822,-1.264980548,-2.082446669,-2.303487708,-2.226382097,-1.963206068,-2.551602131,-2.792365071),Pz7 = 
c(-4.769878994,-4.814529142,-3.637572331,-2.12674422,-0.975866909,-1.403063767,-0.269200978,1.554142023,2.086648388,1.978150587,0.05656636,-0.246021225,-0.698046789,-0.879908346,-0.66274626,-0.094426764,-0.304226709,-1.560462167,-2.163587964,-2.420526796,-1.59027971,-2.161417344,-3.218544722)),class = "data.frame",row.names = c(NA,-23L))


#~~~ step 1: transform the data from 'wide' format to 'long' format ~~~

# every column except GW is stacked into a key/value pair
my_data_long <- my_data %>%
    pivot_longer(cols = -GW, names_to = 'key', values_to = 'value')

# the data now looks like this:
# this 'long' format reduces the number of columns, and increases the number of rows.
# Each row now records which column the value came from, and the value itself, for each value of GW
  # GW    key   value
  # <dbl> <chr> <dbl>
  # 0.08  Pz1   -2.46
  # 0.08  Pz2   -2.92
  # 0.08  Pz3   -2.51
  # with many more rows.....


#~~~ step 2: write a function to plot the data and save the plot ~~~

saveplot_function <- function(i){

    # first we make the plot

     # we take the long format data    
     my_plot <- my_data_long %>%  

      # then we keep only one 'key' (column in the 'wide' data format)
      filter(  
        key == i
      ) %>%

      # and make a plot (you didn't specify the type of plot, but this can be changed as you please)
      ggplot(aes(GW,value))+
      geom_point()+
      ylab(i)     

    # then we save the plot (including the variable name in the file name)
    #***NOTE*** plots are saved into a folder 'my_folder' in your working directory;
    # it is created here if it does not exist yet
    dir.create('my_folder', showWarnings = FALSE)
    ggsave(paste0('my_folder/my_plot_',i,'.pdf'),my_plot)

  }


#~~~ step 3: loop through all the values of 'key' (column names in the wide format data) ~~~

 for(i in unique(my_data_long$key)){
    
    saveplot_function(i)
    
  }


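Of course, you can customise the plots (labels, titles, colours, sizes, file format and so on), but this should give you an idea of how to do it.

I don't fully understand what you are after with the normalisation and the rest, so I will leave that part to someone more helpful than me.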
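That said, if the normalisation meant in the question is the usual z-score, a minimal base R sketch of it, together with the Spearman correlation and the RMSE, might look like the following. The helper names `zscore` and `rmse` are my own and do not come from any package, the toy vectors are invented for illustration, and dropping NA values with `na.rm = TRUE` is an assumption about how the trailing NAs in GW should be handled.

```r
# Z-score: centre on the mean and scale by the standard deviation.
# na.rm = TRUE is an assumption so that missing values are ignored.
zscore <- function(x) {
  (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
}

# Root-mean-square error between two equally long vectors.
rmse <- function(a, b) {
  sqrt(mean((a - b)^2, na.rm = TRUE))
}

# Toy vectors, invented purely for illustration (not the asker's data).
a <- c(1, 2, 3, 4, 5)
b <- c(2, 1, 4, 3, 6)

za <- zscore(a)
zb <- zscore(b)

# Spearman rank correlation, using cor() from base R's stats package.
rho <- cor(za, zb, method = "spearman", use = "complete.obs")

# RMSE between the two normalised series.
err <- rmse(za, zb)
```

Because Spearman's coefficient depends only on ranks, it is the same before and after the z-score step; the RMSE, on the other hand, only becomes comparable between series once both are on the same scale.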


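Thank you very much for your help! By running this code I managed to automatically export one plot for each Pz (which is great!). However, it is not quite what I need (I am sorry, I did not explain clearly what I wanted).

Basically, both GW and Pz should be shown on the Y axis as geom_line(), while the X axis can be anything; it does not matter.

Right now, the code you wrote produces one plot per Pz. Would it be possible to get several plots per Pz, one for each interval? That is: Plot #1 = GW (fixed) and Pz1 (first 14 values); Plot #2 = GW (fixed) and Pz1 (14 values, shifted by 1 from the previous window); Plot #3 = GW (fixed) and Pz1 (14 values, shifted by 1 again); Plot #4 = GW (fixed) and Pz1 (14 values, shifted by 1 again); and so on. When a column (or its rows, once the data are converted to 'long' format) is exhausted, the process moves on to the next one (i.e. it finishes all of Pz1, then starts with Pz2).

Please see the attached Excel document for an illustration!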

https://easyupload.io/pvxohj

Thank you very much for your help, I really appreciate it!
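A rough sketch of the windowing described above, written in base R so that it runs without extra packages. Everything named here is an assumption for illustration: the helper `window_starts`, the toy data frame `toy`, the fixed series `gw`, the temporary output folder, and the file-name pattern that encodes the column and row range. Base graphics stand in for ggplot2, and the fixed series is assumed to have exactly as many points as the window.

```r
# Hypothetical helper (not from any package): start indices of every window
# of `width` consecutive points in a series of length `n`, shifted by `step`.
window_starts <- function(n, width, step = 1) {
  if (n < width) return(integer(0))
  as.integer(seq(1, n - width + 1, by = step))
}

# Z-score, ignoring missing values.
zscore <- function(x) (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)

# Toy data invented for illustration; swap in the real data frame.
set.seed(1)
width <- 14
toy <- data.frame(Pz1 = rnorm(23), Pz2 = rnorm(23))
gw  <- runif(width)   # the fixed series, exactly `width` points

# Assumed output folder; a temporary directory keeps the sketch side-effect free.
out_dir <- file.path(tempdir(), "my_folder")
dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)

saved <- character(0)
g <- zscore(gw)
for (col in names(toy)) {                       # finish Pz1, then move to Pz2, ...
  for (s in window_starts(nrow(toy), width)) {  # slide the window by one row
    e   <- s + width - 1L
    pz  <- zscore(toy[[col]][s:e])
    rho <- cor(g, pz, method = "spearman")
    err <- sqrt(mean((g - pz)^2))
    # The file name records the column and the row range of the window.
    file <- file.path(out_dir, sprintf("%s_rows_%02d-%02d.pdf", col, s, e))
    pdf(file)
    plot(g, type = "l", ylim = range(c(g, pz)),
         xlab = "index within window", ylab = "z-score",
         main = sprintf("%s rows %d-%d | rho = %.2f, RMSE = %.2f",
                        col, s, e, rho, err))
    lines(pz, lty = 2)                          # Pz window as a dashed line
    dev.off()
    saved <- c(saved, file)
  }
}
```

With 23 rows and a window of 14 this gives 10 windows per column. Whether each window should be normalised on its own, as done here, or the whole column normalised once before slicing is a design choice the question leaves open.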