Filtering matching strings between two text files in Clojure

Problem description

Each text file contains a list of paths, and each file uses a different prefix.

Suppose before.txt looks like this:

before/pictures/img1.jpeg
before/pictures/img2.jpeg
before/pictures/img3.jpeg

and after.txt looks like this:

after/pictures/img1.jpeg
after/pictures/img3.jpeg

The deleted-files function should strip the differing prefixes (before, after), compare the two files, and print the entries that are missing from after.txt.

Code so far

(ns dirdiff.core
  (:gen-class))

(defn deleted-files [prefix-file1 prefix-file2 file1 file2]
  (let [before (slurp "resources/davor.txt")
        after  (slurp "resources/danach.txt")]
    ;; TODO: strip the prefixes and compare the two lists
    ))

Expected output: the entry that was deleted

/pictures/img2.jpeg

How can I filter the lists in Clojure so that only the missing entries are shown?

Solutions

Here is how I would approach it, starting from this template project:

(ns tst.demo.core
  (:use tupelo.core tupelo.test)
  (:require
    [clojure.set :as set]
    [tupelo.string :as str]
    ))

;; Split a whitespace-separated dump of paths and remove the prefix from each one.
(defn file-dump->names
  [file-dump-str prefix]
  (it-> file-dump-str
    (str/whitespace-collapse it)
    (str/split it #" ")
    (mapv #(str/replace % prefix "") it)))

(defn delta-files
  [before-files-in after-files-in
   before-prefix after-prefix]
  (let-spy [before-files     (file-dump->names before-files-in before-prefix)
            after-files      (file-dump->names after-files-in after-prefix)
            before-files-set (set before-files)
            after-files-set  (set after-files)
            delta-sorted     (vec (sort (set/difference before-files-set after-files-set)))]
    delta-sorted))

And a unit test to show it in action:

(dotest
  (let [before-files  "before/pictures/img1.jpeg
                       before/pictures/img2.jpeg
                       before/pictures/img3.jpeg "

        after-files   "after/pictures/img1.jpeg
                       after/pictures/img3.jpeg "
        before-prefix "before"
        after-prefix  "after"]
    (is= (delta-files before-files after-files before-prefix after-prefix)
      ["/pictures/img2.jpeg"])
    ))
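
For readers who do not want to pull in tupelo, the same idea can be sketched with only clojure.string and clojure.set. This is a rough equivalent, not the code above: the namespace name demo.plain is hypothetical, and unlike str/replace this version only strips the prefix when it appears at the start of a path.

```clojure
(ns demo.plain
  (:require [clojure.set :as set]
            [clojure.string :as str]))

(defn file-dump->names
  "Splits a whitespace-separated dump of paths and strips `prefix`
  from the start of each path that begins with it."
  [file-dump-str prefix]
  (->> (str/split (str/trim file-dump-str) #"\s+")
       (remove str/blank?)
       (mapv #(if (str/starts-with? % prefix)
                (subs % (count prefix))
                %))))

(defn delta-files
  "Returns a sorted vector of the paths present in the before dump
  but absent from the after dump, with prefixes removed."
  [before-dump after-dump before-prefix after-prefix]
  (vec (sort (set/difference
               (set (file-dump->names before-dump before-prefix))
               (set (file-dump->names after-dump after-prefix))))))

(delta-files "before/pictures/img1.jpeg
              before/pictures/img2.jpeg
              before/pictures/img3.jpeg "
             "after/pictures/img1.jpeg
              after/pictures/img3.jpeg "
             "before" "after")
;; => ["/pictures/img2.jpeg"]
```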

Be sure to study these documentation sources, including books such as Getting Clojure and the Clojure CheatSheet.


Notes:

I like to use let-spy and let-spy-pretty to illustrate how the code progresses. It produces output like the following:

-------------------------------
   Clojure 1.10.2    Java 15
-------------------------------

Testing tst.demo.core
before-files => ["/pictures/img1.jpeg" "/pictures/img2.jpeg" "/pictures/img3.jpeg"]
after-files => ["/pictures/img1.jpeg" "/pictures/img3.jpeg"]
before-files-set => #{"/pictures/img3.jpeg" "/pictures/img2.jpeg" "/pictures/img1.jpeg"}
after-files-set => #{"/pictures/img3.jpeg" "/pictures/img1.jpeg"}
delta-sorted => ["/pictures/img2.jpeg"]

Ran 2 tests containing 1 assertions.
0 failures, 0 errors.

The spyx macro is also very useful for debugging. See the README and the API docs.


You probably want to compute the set difference between the two sets of file names after removing the prefixes:

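(require 'clojure.set 'clojure.string)

;; Transducer: keep only the lines that start with prefix, then strip it.
(defn deprefixing [prefix]
  (comp (filter #(clojure.string/starts-with? % prefix))
        (map #(subs % (count prefix)))))

;; Read filename and collect its lines, transformed by xf, into a set.
(defn load-string-set [xf filename]
  (->> filename
       slurp
       clojure.string/split-lines
       (into #{} xf)))

;; Paths listed in file1 but missing from file2, with prefixes removed.
(defn deleted-files [prefix-file1 prefix-file2 file1 file2]
  (clojure.set/difference (load-string-set (deprefixing prefix-file1) file1)
                          (load-string-set (deprefixing prefix-file2) file2)))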

(deleted-files "before" "after"
               "/tmp/before.txt" "/tmp/after.txt")
;; => #{"/pictures/img2.jpeg"}
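
The question asked for the missing entries to be printed, while the function above returns a set. A minimal sketch of a printing variant follows, assuming one path per line in each file. The names load-names and print-deleted-files are hypothetical, and the temp files only exist so the example runs anywhere.

```clojure
(require '[clojure.set :as set]
         '[clojure.string :as str])

(defn load-names
  "Reads `filename` and returns the set of its lines with `prefix` stripped.
  Lines that do not start with `prefix` are skipped."
  [prefix filename]
  (into #{}
        (comp (filter #(str/starts-with? % prefix))
              (map #(subs % (count prefix))))
        (str/split-lines (slurp filename))))

(defn print-deleted-files
  "Prints, sorted and one per line, the paths listed in file1
  that are missing from file2."
  [prefix-file1 prefix-file2 file1 file2]
  (run! println
        (sort (set/difference (load-names prefix-file1 file1)
                              (load-names prefix-file2 file2)))))

;; Hypothetical demo data written to temp files.
(let [before (java.io.File/createTempFile "before" ".txt")
      after  (java.io.File/createTempFile "after" ".txt")]
  (spit before (str "before/pictures/img1.jpeg\n"
                    "before/pictures/img2.jpeg\n"
                    "before/pictures/img3.jpeg\n"))
  (spit after (str "after/pictures/img1.jpeg\n"
                   "after/pictures/img3.jpeg\n"))
  (print-deleted-files "before" "after" (.getPath before) (.getPath after)))
;; prints: /pictures/img2.jpeg
```

Sorting before printing keeps the output stable, since a Clojure set has no guaranteed iteration order.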