Julia's decode function doesn't convert GitHub's base64 data to UTF-8. What can I do?

Problem

I am trying to fetch data files from GitHub and read their contents.

using HTTP
using JSON
using GitHub
using StringEncodings

# authenticate with GitHub to increase query limits
mytoken = ENV["GITHUB_AUTH"]
myauth = GitHub.authenticate(mytoken)

GitHub.repo("JuliaRegistries/General")
data_rootdir = GitHub.directory("JuliaRegistries/General","/")
data_rootdir = data_rootdir[1]
data_dir = GitHub.directory("JuliaRegistries/General",data_rootdir[3].path)
data_dir = data_dir[1]
manifest_file_data = GitHub.file("JuliaRegistries/General",joinpath(data_dir[3].path,"Deps.toml"))
data_Deps_file = decode(manifest_file_data.content,manifest_file_data.encoding)

After running this code in the REPL, I get:

julia> manifest_file_data = GitHub.file("JuliaRegistries/General",joinpath(data_rootdir[3].path,"Deps.toml"))
Content (all fields are Union{Nothing, T}):
  typ: "file"
  name: "Deps.toml"
  path: "A/ACTRModels/Deps.toml"
  encoding: "base64"
  content: "WzBdCkRpc3RyaWJ1dGVkID0gIjhiYTg5ZTIwLTI4NWMtNWI2Zi05MzU3LTk0\nNzAwNTIwZWUxYiIKRGlzdHJpYnV0aW9ucyA9ICIzMWMyNGUxMC1hMTgxLTU0\nNzMtYjhlYi03OTY5YWNkMDM4MmYiClBhcmFtZXRlcnMgPSAiZDk2ZTgxOWUt\nZmM2Ni01NjYyLTk3MjgtODRjOWM3NTkyYjBhIgpQa2cgPSAiNDRjZmU5NWEt\nMWViMi01MmVhLWI2NzItZTJhZmRmNjliNzhmIgpSYW5kb20gPSAiOWEzZjgy\nODQtYTJjOS01ZjAyLTlhMTEtODQ1OTgwYTFmZDVjIgpSZWV4cG9ydCA9ICIx\nODlhMzg2Ny0zMDUwLTUyZGEtYTgzNi1lNjMwYmE5MGFiNjkiClN0YXRzQmFz\nZSA9ICIyOTEzYmJkMi1hZThhLTVmNzEtOGM5OS00ZmI2Yzc2ZjNhOTEiClN0\nYXRzRnVucyA9ICI0YzYzZDJiOS00MzU2LTU0ZGItOGNjYS0xN2I2NGMzOWU0\nMmMiClRlc3QgPSAiOGRmZWQ2MTQtZTIyYy01ZTA4LTg1ZTEtNjVjNTIzNGYw\nYjQwIgoKWyIwLjItMCJdClNhZmVUZXN0c2V0cyA9ICIxYmM4M2RhNC0zYjhk\nLTUxNmYtYWNhNC00ZmUwMmY2ZDgzOGYiCgpbIjAuNC43LTAiXQpTZXF1ZW50\naWFsU2FtcGxpbmdNb2RlbHMgPSAiMGU3MWEyYTYtMmIzMC00NDQ3LTg3NDIt\nZDA4M2E4NWU4MmQxIgoKWyIwLjQuOC0wIl0KRGF0YUZyYW1lcyA9ICJhOTNj\nNmYwMC1lNTdkLTU2ODQtYjdiNi1kODE5M2YzZTQ2YzAiCg==\n"
  sha: "0164a1bffc3e4284b12af434f1faea56db22d875"
  url: HTTP.URI("https://api.github.com/repos/JuliaRegistries/General/contents/A/ACTRModels/Deps.toml?ref=master")
  git_url: HTTP.URI("https://api.github.com/repos/JuliaRegistries/General/git/blobs/0164a1bffc3e4284b12af434f1faea56db22d875")
  html_url: HTTP.URI("https://github.com/JuliaRegistries/General/blob/master/A/ACTRModels/Deps.toml")
  download_url: HTTP.URI("https://raw.githubusercontent.com/JuliaRegistries/General/master/A/ACTRModels/Deps.toml")
  size: 664

Then, when I try to convert the data, I get an error:

julia> data_Deps_file = decode(manifest_file_data.content,manifest_file_data.encoding)
ERROR: MethodError: no method matching decode(::String,::String)
Closest candidates are:
  decode(::Array{UInt8,1},::AbstractString) at /Users/vserge/.julia/packages/StringEncodings/B9gIH/src/StringEncodings.jl:524
  decode(::Array{UInt8,1}, ::Union{AbstractString,Encoding}) at /Users/vserge/.julia/packages/StringEncodings/B9gIH/src/StringEncodings.jl:525
Stacktrace:
 [1] top-level scope at none:1

Can you help me understand how to correctly get the data from GitHub?

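Solution

The decode function from StringEncodings converts byte arrays between character encodings, which is why it has no method for two String arguments. Base64 is a binary-to-text encoding rather than a character encoding, so it needs a different tool. The Base64 standard library provides everything needed to work with base64-encoded strings, in particular the base64decode function: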

julia> using Base64

julia> content = "WzBdCkRpc3RyaWJ1dGVkID0gIjhiYTg5ZTIwLTI4NWMtNWI2Zi05MzU3LTk0\nNzAwNTIwZWUxYiIKRGlzdHJpYnV0aW9ucyA9ICIzMWMyNGUxMC1hMTgxLTU0\nNzMtYjhlYi03OTY5YWNkMDM4MmYiClBhcmFtZXRlcnMgPSAiZDk2ZTgxOWUt\nZmM2Ni01NjYyLTk3MjgtODRjOWM3NTkyYjBhIgpQa2cgPSAiNDRjZmU5NWEt\nMWViMi01MmVhLWI2NzItZTJhZmRmNjliNzhmIgpSYW5kb20gPSAiOWEzZjgy\nODQtYTJjOS01ZjAyLTlhMTEtODQ1OTgwYTFmZDVjIgpSZWV4cG9ydCA9ICIx\nODlhMzg2Ny0zMDUwLTUyZGEtYTgzNi1lNjMwYmE5MGFiNjkiClN0YXRzQmFz\nZSA9ICIyOTEzYmJkMi1hZThhLTVmNzEtOGM5OS00ZmI2Yzc2ZjNhOTEiClN0\nYXRzRnVucyA9ICI0YzYzZDJiOS00MzU2LTU0ZGItOGNjYS0xN2I2NGMzOWU0\nMmMiClRlc3QgPSAiOGRmZWQ2MTQtZTIyYy01ZTA4LTg1ZTEtNjVjNTIzNGYw\nYjQwIgoKWyIwLjItMCJdClNhZmVUZXN0c2V0cyA9ICIxYmM4M2RhNC0zYjhk\nLTUxNmYtYWNhNC00ZmUwMmY2ZDgzOGYiCgpbIjAuNC43LTAiXQpTZXF1ZW50\naWFsU2FtcGxpbmdNb2RlbHMgPSAiMGU3MWEyYTYtMmIzMC00NDQ3LTg3NDIt\nZDA4M2E4NWU4MmQxIgoKWyIwLjQuOC0wIl0KRGF0YUZyYW1lcyA9ICJhOTNj\nNmYwMC1lNTdkLTU2ODQtYjdiNi1kODE5M2YzZTQ2YzAiCg==\n";

julia> String(base64decode(content)) |> println
[0]
Distributed = "8ba89e20-285c-5b6f-9357-94700520ee1b"
Distributions = "31c24e10-a181-5473-b8eb-7969acd0382f"
Parameters = "d96e819e-fc66-5662-9728-84c9c7592b0a"
Pkg = "44cfe95a-1eb2-52ea-b672-e2afdf69b78f"
Random = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"
Reexport = "189a3867-3050-52da-a836-e630ba90ab69"
StatsBase = "2913bbd2-ae8a-5f71-8c99-4fb6c76f3a91"
StatsFuns = "4c63d2b9-4356-54db-8cca-17b64c39e42c"
Test = "8dfed614-e22c-5e08-85e1-65c5234f0b40"

["0.2-0"]
SafeTestsets = "1bc83da4-3b8d-516f-aca4-4fe02f6d838f"

["0.4.7-0"]
SequentialSamplingModels = "0e71a2a6-2b30-4447-8742-d083a85e82d1"

["0.4.8-0"]
DataFrames = "a93c6f00-e57d-5684-b7b6-d8193f3e46c0"
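
To tie this back to the code in the question, here is a minimal sketch of how the decoding step could be wrapped and the result parsed as TOML. The helper name decode_github_content is hypothetical (it is not part of GitHub.jl), and the sample data is a short stand-in built locally instead of a real API response. It assumes Julia 1.6 or later, where TOML is a standard library.

```julia
using Base64
using TOML  # standard library since Julia 1.6

# Hypothetical helper (not part of GitHub.jl): turn the `content` field of a
# GitHub contents response into a String, checking the declared encoding first.
function decode_github_content(content::AbstractString, encoding::AbstractString)
    encoding == "base64" || error("unsupported encoding: $encoding")
    # base64decode returns a Vector{UInt8}; String() wraps those bytes as text.
    return String(base64decode(content))
end

# Stand-in for manifest_file_data.content: encode a small TOML snippet and
# break it after 60 characters, mimicking the "\n" line breaks GitHub inserts.
sample_toml = "[0]\nPkg = \"44cfe95a-1eb2-52ea-b672-e2afdf69b78f\"\n"
encoded = base64encode(sample_toml)
wrapped = encoded[1:60] * "\n" * encoded[61:end] * "\n"

text = decode_github_content(wrapped, "base64")
deps = TOML.parse(text)   # Dict keyed by version range, then by package name
println(deps["0"]["Pkg"])
```

With the object from the question, the call would presumably be decode_github_content(manifest_file_data.content, manifest_file_data.encoding), since both fields are plain strings.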