所以,我从这开始:
http://en.wikibooks.org/wiki/Algorithm_Implementation/Strings/Levenshtein_distance#Ruby
这适用于非常小的字符串.但是,我的字符串长度超过10,000个字符 – 由于Levenshtein距离是递归的,这会导致我的Ruby on Rails应用程序中出现堆栈太深的错误.
解决方法
考虑非递归版本以避免过多的调用堆栈开销.
Seth Schroeder有一个
iterative implementation in Ruby,它使用多维数组;它似乎与Levenshtein距离的动态规划方法有关(如
pseudocode for the Wikipedia article所述).赛斯的ruby代码转载如下:
def levenshtein(s1,s2) d = {} (0..s1.size).each do |row| d[[row,0]] = row end (0..s2.size).each do |col| d[[0,col]] = col end (1..s1.size).each do |i| (1..s2.size).each do |j| cost = 0 if (s1[i-1] != s2[j-1]) cost = 1 end d[[i,j]] = [d[[i - 1,j]] + 1,d[[i,j - 1]] + 1,d[[i - 1,j - 1]] + cost ].min next unless @@damerau if (i > 1 and j > 1 and s1[i-1] == s2[j-2] and s1[i-2] == s2[j-1]) d[[i,j]] = [d[[i,j]],d[[i-2,j-2]] + cost ].min end end end return d[[s1.size,s2.size]] end