Is there a convenient way to replicate R's concept of "named vectors" in Raku, possibly using Mixins?

Problem description

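Recent StackOverflow questions about Mixins in Raku have piqued my interest as to whether Mixins can be applied to replicate features present in other programming languages.

For example, in the R language, the elements of a vector can be given a name (i.e. an attribute), which is very convenient for data analysis. For an excellent example, see "How to Name the Values in Your Vectors in R" by Andrie de Vries and Joris Meys, who illustrate this feature using R's built-in islands dataset. Below is a more prosaic example (code run in the R REPL):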

> #R-code
> x <- 1:4
> names(x) <- LETTERS[1:4]
> str(x)
 Named int [1:4] 1 2 3 4
 - attr(*,"names")= chr [1:4] "A" "B" "C" "D"
> x
A B C D 
1 2 3 4 
> x[1]
A 
1 
> sum(x)
[1] 10

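Below I try to replicate a "named vector" of islands, using the same R dataset that de Vries and Meys use. While the script below runs and (generally, see #3 below) produces the desired/expected output, I am left with three main questions, listed at the bottom: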

#Raku-script below;

put "Read in data.";

my $islands_A = <11506,5500,16988,2968,16,184,23,280,84,73,25,43,21,82,3745,840,13,30,30,89,40,33,49,14,42,227,16,36,29,15,306,44,58,43,9390,32,13,29,6795,16,15,183,14,26,19,13,12,82>.split(","); #Area

my $islands_N = <<"Africa" "Antarctica" "Asia" "Australia" "Axel Heiberg" "Baffin" "Banks" "Borneo" "Britain" "Celebes" "Celon" "Cuba" "Devon" "Ellesmere" "Europe" "Greenland" "Hainan" "Hispaniola" "Hokkaido" "Honshu" "Iceland" "Ireland" "Java" "Kyushu" "Luzon" "Madagascar" "Melville" "Mindanao" "Moluccas" "New Britain" "New Guinea" "New Zealand (N)" "New Zealand (S)" "Newfoundland" "North America" "Novaya Zemlya" "Prince of Wales" "Sakhalin" "South America" "Southampton" "Spitsbergen" "Sumatra" "Taiwan" "Tasmania" "Tierra del Fuego" "Timor" "Vancouver" "Victoria">>; #Name

"----".say;

put "Count elements (Area): ",$islands_A.elems; #OUTPUT 48
put "Count elements (Name): ",$islands_N.elems; #OUTPUT 48

"----".say;

put "Create 'named vector' array (and output):\n";
my @islands;
my $i=0;
for (1..$islands_A.elems) { 
    @islands[$i] := $islands_A[$i] but $islands_N[$i].Str;
    $i++;
};

say "All islands (returns Area): ",@islands;             #OUTPUT: returns 48 areas (above)
say "All islands (returns Name): ",@islands>>.Str;       #OUTPUT: returns 48 names (above)
say "Islands--slice (returns Area): ",@islands[0..3];       #OUTPUT: (11506 5500 16988 2968)
say "Islands--slice (returns Name): ",@islands[0..3]>>.Str; #OUTPUT: (Africa Antarctica Asia Australia)
say "Islands--first (returns Area): ",@islands[0];          #OUTPUT: 11506
say "Islands--first (returns Name): ",@islands[0]>>.Str;    #OUTPUT: (Africa)

put "Islands--first (returns Name): ",@islands[0];          #OUTPUT: Africa
put "Islands--first (returns Name): ",@islands[0]>>.Str;    #OUTPUT: Africa
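
  1. Is there a simpler way to write the Mixin loop ...$islands_A[$i] but $islands_N[$i].Str;? Can the loop be eliminated entirely?

  2. Can a named-vector or nvec wrapper be written around put that returns (name)\n(value) in the same way R does, even for single elements? Might Raku's Pair method be useful here?

  3. Related to #2 above, calling put on the single element @islands[0] returns the name Africa rather than the Area value 11506. [Note that this does not happen with the call to say.] Is there any simple code that can be implemented to ensure that put always returns the (numeric) value, or always returns the (Mixin) name, for slices of every length of an array?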

Solutions

  1. Is there a simpler way? Use the zip metaoperator Z combined with infix but:

    my @islands = $islands_A[] Z[but] $islands_N[];
    
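  2. Why not modify the array to change the format?

  3. put calls .Str on the values it gets; say calls .gist.

If you want put to output some specific text, make sure that the .Str method outputs that text.

I don't think you actually want put to output that format, though. I think you want say to output that format. That is because say is meant for humans to read, and you want the output to be nicer for humans.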
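
The distinction can be seen in a minimal sketch. The Island class below is hypothetical and is not part of the question's script; it only illustrates that put prints what .Str returns, that say prints what .gist returns, and that mixing in a Str with but overrides .Str alone.

```raku
# Hypothetical class for illustration: .Str and .gist are defined separately.
class Island {
    has $.name;
    has $.area;
    method Str  { $!name }                # used by put and by string contexts
    method gist { "$!name ($!area)" }     # used by say
}

my $africa = Island.new(name => 'Africa', area => 11506);
put $africa;    # Africa
say $africa;    # Africa (11506)

# The same rule explains the mixin: `but` with a Str overrides .Str only,
# so the underlying Int is still used for arithmetic.
my $mixed = 11506 but 'Africa';
put $mixed;      # Africa
put $mixed + 1;  # 11507
```

So if a named vector should print as name-over-value for human readers, a gist method is probably the place to put that formatting, leaving .Str free for the plain value or name.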


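When you have a question of the form "can Raku do X", the answer is invariably yes; it is just a matter of how much work it would take, and whether you would still call it Raku at that point.

The question you really want to ask is how easy it is to do X.


I went and implemented something like what the link you provided describes.

Note that this is just a quick implementation that I wrote right before bed, so treat it as a rough first draft.

If I were to do this for real, I would probably throw it away and start over after spending a few days learning enough R to figure out what it is actually doing.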

class NamedVec does Positional does Associative {
  has @.names is List;
  has @.nums is List handles <sum>;
  has %!kv is Map;

  class Partial {
    has $.name;
    has $.num;
  }

  submethod TWEAK {
    %!kv := %!kv.new: @!names Z=> @!nums;
  }

  method from-pairlist ( +@pairs ) {
    my @names;
    my @nums;
    for @pairs -> (:$key,:$value) {
      push @names,$key;
      push @nums,$value;
    }
    self.new: :@names,:@nums
  }

  method from-list ( +@list ){
    my @names;
    my @nums;
    for @list -> (:$name,:$num) {
      push @names,$name;
      push @nums,$num;
    }
    self.new: :@names,:@nums
  }

  method gist () {
    my @widths = @!names».chars Zmax @!nums».chars;
    sub infix:<fmt> ( $str,$width is copy ){
      $width -= $str.chars;
      my $l = $width div 2;
      my $r = $width - $l;
      (' ' x $l) ~ $str ~ (' ' x $r)
    }
    (@!names Zfmt @widths) ~ "\n" ~ (@!nums Zfmt @widths)
  }

  method R-str () {
    chomp qq :to/END/
    Named num [1:@!nums.elems()] @!nums[]
     - attr(*,"names")= chr [1:@!names.elems()] @!names.map(*.raku)
    END
  }

  method of () {}
  method AT-POS ( $i ){
    Partial.new: name => @!names[$i],num => @!nums[$i]
  }
  method AT-KEY ( $name ){
    Partial.new: :$name,num => %!kv{$name}
  }
}

multi sub postcircumfix:<{ }> (NamedVec:D $v,Str:D $name){
  $v.from-list: callsame
}
multi sub postcircumfix:<{ }> (NamedVec:D $v,List \l){
  $v.from-list: callsame
}
 

my $islands_A = <11506,5500,16988,2968,16,184,23,280,84,73,25,43,21,82,3745,840,13,30,30,89,40,33,49,14,42,227,16,36,29,15,306,44,58,43,9390,32,13,29,6795,16,15,183,14,26,19,13,12,82>.split(","); #Area
my $islands_N = <<"Africa" "Antarctica" "Asia" "Australia" "Axel Heiberg" "Baffin" "Banks" "Borneo" "Britain" "Celebes" "Celon" "Cuba" "Devon" "Ellesmere" "Europe" "Greenland" "Hainan" "Hispaniola" "Hokkaido" "Honshu" "Iceland" "Ireland" "Java" "Kyushu" "Luzon" "Madagascar" "Melville" "Mindanao" "Moluccas" "New Britain" "New Guinea" "New Zealand (N)" "New Zealand (S)" "Newfoundland" "North America" "Novaya Zemlya" "Prince of Wales" "Sakhalin" "South America" "Southampton" "Spitsbergen" "Sumatra" "Taiwan" "Tasmania" "Tierra del Fuego" "Timor" "Vancouver" "Victoria">>; 

# either will work
#my $islands = NamedVec.from-pairlist( $islands_N[] Z=> $islands_A[] );
my $islands = NamedVec.new( names => $islands_N,nums => $islands_A );

put $islands.R-str;

say $islands<Asia Africa Antarctica>;

say $islands.sum;
,

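A named vector essentially combines a vector with a map from names to integer positions, and lets you address elements by name. Naming a vector alters the behavior of the vector, not that of its elements. So in Raku we need to define a role for an array: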

role Named does Associative {
    has $.names;
    has %!index;

    submethod TWEAK {
        my $i = 0;
        %!index = map { $_ => $i++ },$!names.list;
    }

    method AT-KEY($key) {
        with %!index{$key} { return-rw self.AT-POS($_) }
        else { self.default }
    }

    method EXISTS-KEY($key) {
        %!index{$key}:exists;
    }

    method gist() {
        join "\n",$!names.join("\t"),map(*.gist,self).join("\t");
    }
}

multi sub postcircumfix:<[ ]>(Named:D \list,\index,Bool() :$named!) {
    my \slice = list[index];
    $named ?? slice but Named(list.names[index]) !! slice;
}

multi sub postcircumfix:<{ }>(Named:D \list,\names,Bool() :$named!) {
    my \slice = list{names};
    $named ?? slice but Named(names) !! slice;
}

Mixing in this role gives you most of the functionality of an R named vector:

my $named = [1,2,3] but Named(<first second last>);
say $named;                 # OUTPUT: «first␉second␉last␤1␉2␉3␤»
say $named[0,1]:named;     # OUTPUT: «first␉second␤1␉2␤»
say $named<last> = Inf;     # OUTPUT: «Inf␤»
say $named<end>:exists;     # OUTPUT: «False␤»
say $named<last end>:named; # OUTPUT: «last␉end␤Inf␉(Any)␤»
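
The core idea behind the role, a map from names to integer positions kept alongside the values, can be seen in isolation. This is only a sketch with made-up variable names (@values, @names, %index); it assumes the names are unique and leaves out everything the role adds on top, such as gist and the :named adverb.

```raku
my @values = 1, 2, 3;
my @names  = <first second last>;

# antipairs yields name => position pairs, the same kind of index
# that the role builds in its TWEAK submethod.
my %index = @names.antipairs;

say @values[ %index<last> ];          # 3
say @values[ %index<first second> ];  # (1 2)
```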

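As this is just a proof of concept, the Named role does not handle the naming of non-existent elements well. It also does not support modifying a slice of the names. It probably does support creating a pun that can be mixed into more than one list.

Note that this implementation relies on the undocumented fact that the subscript operators are multis. If you want to put the role and the operators in a separate file, you will probably want to apply the is export trait to the operators.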

,

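This might not be the most optimal way of doing it (or what you are looking for), but as soon as I saw this particular problem's statement, the first thing that came to mind was Raku's allomorphs, which are types with two related values that are accessible separately depending on context.

my $areas = (11506,5500,16988,2968,16,184,23,280,84,73,25,43,21,82,3745,840,13,30,30,89,40,33,49,14,42,227,16,36,29,15,306,44,58,43,9390,32,13,29,6795,16,15,183,14,26,19,13,12,82);
my $names = <<"Africa" "Antarctica" "Asia" "Australia" "Axel Heiberg" "Baffin" "Banks" "Borneo" "Britain" "Celebes" "Celon" "Cuba" "Devon" "Ellesmere" "Europe" "Greenland" "Hainan" "Hispaniola" "Hokkaido" "Honshu" "Iceland" "Ireland" "Java" "Kyushu" "Luzon" "Madagascar" "Melville" "Mindanao" "Moluccas" "New Britain" "New Guinea" "New Zealand (N)" "New Zealand (S)" "Newfoundland" "North America" "Novaya Zemlya" "Prince of Wales" "Sakhalin" "South America" "Southampton" "Spitsbergen" "Sumatra" "Taiwan" "Tasmania" "Tierra del Fuego" "Timor" "Vancouver" "Victoria">>;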

my @islands;

for (0..^$areas) -> \i {
    @islands[i] := IntStr.new($areas[i],$names[i]);   
}

say "Areas: ",@islands>>.Int;
say "Names: ",@islands>>.Str;
say "Areas slice: ",(@islands>>.Int)[0..3];
say "Names slice: ",(@islands>>.Str)[0..3];
say "Areas first: ",(@islands>>.Int)[0];
say "Names first: ",(@islands>>.Str)[0];
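
For a single element, the context-dependent behavior of an allomorph can be sketched as follows. The variable name $island is just for illustration; IntStr.new takes the Int value first and the Str value second.

```raku
# One allomorph holding both the area (Int) and the name (Str).
my $island = IntStr.new(11506, 'Africa');

say $island.Int;       # 11506
say $island.Str;       # Africa
say $island + 1;       # 11507  (numeric context uses the Int part)
say "Name: $island";   # Name: Africa  (string context uses the Str part)
```

Unlike the but mixin in the question, nothing is overridden here: the two values simply live side by side, and the context picks one.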
,

I think I would do something like this:

class MyRow {
    has Str      $.island is rw;
    has Numeric  $.area   is rw;

    method Str {
        $!island;
        }

    method Numeric { 
        +$!area;
        }

    # does Cool coercion of strings that look numeric
    submethod BUILD ( Numeric(Cool) :$!area,:$!island ) {
    }; 
} 

class MyTable {
    has        @.data;                     
    has MyRow  @.rows  is rw;
    has        %!lookup;

    submethod TWEAK {
        @!rows = gather 
        for @!data -> ( $island,$area ) {
            my $row = MyRow.new( :$island,:$area );
            %!lookup{ $island } = $row;
            take $row;
        }
    }

    method find_island( $island ) {
        return %!lookup{ $island };
    }
}

To set up a table:

my @raw = @island_names Z @island_areas;
my $table = MyTable.new( data => @raw );

Accessing rows of the table by name:

my $row = $table.find_island('Africa');
say $row;   # MyRow.new(island => "Africa",area => 11506)

Using the row element like a string gets you the name; using it like a number gets you the area:

say ~$row;  # Africa
say +$row;  # 11506

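One of the features here is that you can add more fields to your rows; you are not limited to just a value and a name.

The "find_island" method uses the internal %lookup hash to index the rows by island name, but unlike a simple hash solution there is no uniqueness constraint: if you have a duplicated island name, "find_island" will locate the last row in the set with that name, but the other row will still be there.

Caveat: I have not thought very much about how well this supports dynamically adding more rows to the table.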
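
The duplicate-name behavior can be sketched without the classes, using a plain array of rows and a lookup hash filled in order. The data here is made up for illustration (the second 'Java' row and its area of 50 are not from the islands dataset).

```raku
# Hypothetical rows; 'Java' deliberately appears twice.
my @rows = ('Java', 49), ('Cuba', 43), ('Java', 50);

# Fill the lookup in row order, so a later row overwrites an earlier
# one with the same name, as in MyTable's TWEAK.
my %lookup;
%lookup{ $_[0] } = $_ for @rows;

say @rows.elems;        # 3   (both 'Java' rows are still stored)
say %lookup<Java>[1];   # 50  (the lookup finds the last one)
```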