How to create a DataFrame from a DelimitedFiles.readdlm() object?

Problem description

I am trying to create a DataFrame as follows:

[root@srvr0 ~]# julia
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.4.1 (2020-04-14)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |
    
julia> using DataFrames

julia> using DelimitedFiles
    
julia> P, H = readdlm("programminglanguages.csv", ','; header=true);

julia> P
73×2 Array{Any,2}:
 1951  "Regional Assembly Language"
 1952  "Autocode"
 1954  "IPL"
 1955  "FLOW-MATIC"
 1957  "FORTRAN"
 1957  "COMTRAN"
 1958  "LISP"
 1958  "ALGOL 58"
 1959  "FACT"
 1959  "COBOL"
 1959  "RPG"
 1962  "APL"
 1962  "Simula"
 1962  "snobol"
 1963  "CPL"
 1964  "Speakeasy"
 1964  "BASIC"
 1964  "PL/I"
 1966  "JOSS"
 1967  "BCPL"
 1968  "logo"
 1969  "B"
 1970  "Pascal"
 1970  "Forth"
    ⋮  
 1995  "Ada 95"
 1995  "Java"
 1995  "Delphi "
 1995  "JavaScript"
 1995  "PHP"
 1997  "Rebol"
 2000  "ActionScript"
 2001  "C#"
 2001  "D"
 2002  "Scratch"
 2003  "Groovy"
 2003  "Scala"
 2005  "F#"
 2006  "PowerShell"
 2007  "Clojure"
 2009  "Go"
 2010  "Rust"
 2011  "Dart"
 2011  "Kotlin"
 2011  "Red"
 2011  "Elixir"
 2012  "Julia"
 2014  "Swift"

julia> H
1×2 Array{AbstractString,2}:
 "year"  "language"

julia> typeof(P)
Array{Any,2}

julia> typeof(H)
Array{AbstractString,2}

julia> vec(H)
2-element Array{AbstractString,1}:
 "year"
 "language"

julia> typeof(vec(H))
Array{AbstractString,1}

julia> DataFrame(P, H)

But I get the following error:

ERROR: MethodError: no method matching DataFrame(::Array{Any,2}, ::Array{AbstractString,2})
Closest candidates are:
  DataFrame(::AbstractArray{T,2} where T) at /opt/julia/julia-1.4.1/share/julia/stdlib/v1.4/packages/DataFrames/yH0f6/src/dataframe/dataframe.jl:209
  DataFrame(::AbstractArray{T,2} where T, ::AbstractArray{Symbol,1}; makeunique) at /opt/julia/julia-1.4.1/share/julia/stdlib/v1.4/packages/DataFrames/yH0f6/src/dataframe/dataframe.jl:209
  DataFrame(::T; copycols) where T at /opt/julia/julia-1.4.1/share/julia/stdlib/v1.4/packages/DataFrames/yH0f6/src/other/tables.jl:23
Stacktrace:
 [1] top-level scope at REPL[10]:1

Update 1: Following Dr. Bogumil's suggestion:

julia> DataFrame(P, vec(H))
ERROR: MethodError: no method matching DataFrame(::Array{Any,1})
Closest candidates are:
  DataFrame(::AbstractArray{T,1}; makeunique) at /opt/julia/julia-1.4.1/share/julia/stdlib/v1.4/packages/DataFrames/yH0f6/src/dataframe/dataframe.jl:209
  DataFrame(::T; copycols) where T at /opt/julia/julia-1.4.1/share/julia/stdlib/v1.4/packages/DataFrames/yH0f6/src/other/tables.jl:23
Stacktrace:
 [1] top-level scope at REPL[13]:1

julia> 

Please advise how to create a DataFrame that uses the header returned by readdlm as its column names.

Update 2:

I got it working by trial and error:

julia> df1 = DataFrame(P, Symbol.(vec(H)))
73×2 DataFrame
│ Row │ year │ language                   │
│     │ Any  │ Any                        │
├─────┼──────┼────────────────────────────┤
│ 1   │ 1951 │ Regional Assembly Language │
│ 2   │ 1952 │ Autocode                   │
│ 3   │ 1954 │ IPL                        │
│ 4   │ 1955 │ FLOW-MATIC                 │
│ 5   │ 1957 │ FORTRAN                    │
│ 6   │ 1957 │ COMTRAN                    │
│ 7   │ 1958 │ LISP                       │
│ 8   │ 1958 │ ALGOL 58                   │
│ 9   │ 1959 │ FACT                       │
│ 10  │ 1959 │ COBOL                      │
│ 11  │ 1959 │ RPG                        │
│ 12  │ 1962 │ APL                        │
│ 13  │ 1962 │ Simula                     │
│ 14  │ 1962 │ snobol                     │
│ 15  │ 1963 │ CPL                        │
│ 16  │ 1964 │ Speakeasy                  │
│ 17  │ 1964 │ BASIC                      │
│ 18  │ 1964 │ PL/I                       │
│ 19  │ 1966 │ JOSS                       │
│ 20  │ 1967 │ BCPL                       │
│ 21  │ 1968 │ logo                       │
⋮
│ 52  │ 1995 │ Java                       │
│ 53  │ 1995 │ Delphi                     │
│ 54  │ 1995 │ JavaScript                 │
│ 55  │ 1995 │ PHP                        │
│ 56  │ 1997 │ Rebol                      │
│ 57  │ 2000 │ ActionScript               │
│ 58  │ 2001 │ C#                         │
│ 59  │ 2001 │ D                          │
│ 60  │ 2002 │ Scratch                    │
│ 61  │ 2003 │ Groovy                     │
│ 62  │ 2003 │ Scala                      │
│ 63  │ 2005 │ F#                         │
│ 64  │ 2006 │ PowerShell                 │
│ 65  │ 2007 │ Clojure                    │
│ 66  │ 2009 │ Go                         │
│ 67  │ 2010 │ Rust                       │
│ 68  │ 2011 │ Dart                       │
│ 69  │ 2011 │ Kotlin                     │
│ 70  │ 2011 │ Red                        │
│ 71  │ 2011 │ Elixir                     │
│ 72  │ 2012 │ Julia                      │
│ 73  │ 2014 │ Swift                      │

Solution

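This is hard to answer exactly, but the error is simply telling you that you can't pass two matrices to the DataFrame constructor: readdlm returns the header H as a 1×2 matrix, while the column names have to be passed as a vector.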
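A minimal way to see the shape mismatch without the real file. The two-row IOBuffer contents are a hypothetical stand-in for programminglanguages.csv; only DelimitedFiles is needed:

```julia
using DelimitedFiles

# Hypothetical two-row sample standing in for programminglanguages.csv
io = IOBuffer("year,language\n1951,Regional Assembly Language\n2012,Julia\n")

# With header=true, readdlm returns a (data, header) tuple
P, H = readdlm(io, ','; header=true)

size(H)              # (1, 2): the header is a 1×2 matrix, not a vector
colnames = vec(H)    # reshaped into a 2-element vector: "year", "language"
```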

The available DataFrame constructors are listed in the DataFrames documentation. The one closest to what you want looks to be

DataFrame(columns::AbstractVecOrMat, names::Union{AbstractVector,Symbol};
          makeunique::Bool=false, copycols::Bool=true)
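A sketch of that constructor applied to readdlm-shaped data, assuming a recent DataFrames release (1.x) that accepts strings as column names. P and H are small hypothetical stand-ins with the same types readdlm returns, not the question's data:

```julia
using DataFrames

# Hypothetical stand-ins with the shapes readdlm(...; header=true) returns
P = Any[1951 "Regional Assembly Language"; 2012 "Julia"]   # 2×2 Matrix{Any}
H = AbstractString["year" "language"]                       # 1×2 Matrix{AbstractString}

# vec turns the 1×2 header matrix into a vector of column names
df = DataFrame(P, vec(H))

# Older DataFrames releases (like the one in the question) only accept
# Symbol names here, which is what Update 2 arrived at
df_sym = DataFrame(P, Symbol.(vec(H)))
```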

Adapted to your use case (I don't have your data, so here I create a random P and a simple vector H of column names):

julia> P = Any[rand() for i ∈ 1:3, j ∈ 1:3]
3×3 Matrix{Any}:
 0.0413352  0.41672   0.266163
 0.487072   0.308392  0.810582
 0.470833   0.459017  0.165082

julia> H = string.('a':'c')
3-element Vector{String}:
 "a"
 "b"
 "c"

julia> DataFrame(P, H)
3×3 DataFrame
 Row │ a          b         c        
     │ Any        Any       Any      
─────┼───────────────────────────────
   1 │ 0.0413352  0.41672   0.266163
   2 │ 0.487072   0.308392  0.810582
   3 │ 0.470833   0.459017  0.165082
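Both here and in the question's Update 2, every column keeps element type Any, inherited from the Matrix{Any} it was built from. One possible way to narrow the column types afterwards, sketched with hypothetical stand-in data and assuming DataFrames 1.x, is to re-broadcast identity over each column with mapcols:

```julia
using DataFrames

# Hypothetical stand-in for the Any-typed data readdlm returns
P = Any[1951 "Regional Assembly Language"; 2012 "Julia"]
df = DataFrame(P, ["year", "language"])   # both columns have eltype Any

# Broadcasting identity over a Vector{Any} builds a new vector whose element
# type is computed from the actual values (Int, String), not copied as Any
df2 = mapcols(col -> identity.(col), df)
```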

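Edit: I should also recommend simply using the excellent CSV package. As Bogumil pointed out in the comments, the problem you are facing is that readdlm returns the header as a matrix. With CSV you could have just done: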

using CSV, DataFrames

df = CSV.read("programminglanguages.csv", DataFrame)
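For comparison, a self-contained version of the CSV approach, assuming CSV.jl and DataFrames 1.x are installed; the IOBuffer contents are a hypothetical two-row sample rather than the real file. Unlike readdlm, CSV.read takes the column names from the first row and infers a concrete type for each column:

```julia
using CSV, DataFrames

# Hypothetical in-memory sample standing in for programminglanguages.csv
io = IOBuffer("year,language\n1951,Regional Assembly Language\n2012,Julia\n")

# The first row becomes the column names; each column gets an inferred type
df = CSV.read(io, DataFrame)
```

The string column may come back as a fixed-width string type from InlineStrings.jl rather than String, depending on the CSV.jl version; both are AbstractString.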