The long way through Software Craftsmanship

Validating CSV data in clojure

Dec 15, 2015 - 2 minute read - Comments - clojurehofcsvvalidationyagniidea-for-another-postreplrepl-based-developmentselectorlambda

At a client, we have CSVs of data that can be simplified to this 1:

(def data [
           ["total" 6 8 13]
           ["0"     1 2 3]
           ["0"     2 0 4]
           ["0"     3 0 6]
          ])

In this case, some of the row named total is the sum of the rest of the rows, but only for some columns (second and fourth). We do not want to get rid of the columns, as they need to be printed at the end.

This is what we need to validate:

sum (rest [1]) = total [1]

sum (rest [3]) = total [3]

this could be written as a one-off program but a better alternative for us was to write a program and let users decide what columns to validate. In the future, power users will write their own validations 2, thus creating an environment where users are no longer dependent on programmers as coupling business users to programmers does not scale.

Implementation

The full source code is here

(defn sum-eq-total [selector dataset]
  (letfn [(total-and-rest [coll]
             [(first coll) (rest coll)])]
  (let [[total rest-dataset] (total-and-rest dataset)
        selected-total (selector total)
        selected-column (map selector rest-dataset)
        sum-of-column (reduce + selected-column)]
        (= selected-total sum-of-column))))

(defn validate-columns [indices data]
  (let [generate-selector #(fn [dataset] (nth dataset %))
         selectors (map generate-selector indices)
         check-selector #(sum-eq-total % data)]
         (map check-selector selectors)))

We define the domain concept of selector for pointing to a dataset column

generate-selector #(fn [dataset] (nth dataset %))

This expression creates selectors based on a given index. It is a lambda that returns a function, thus being a HOF

Users can use the application in this fashion:

simple.core=> (validate-columns [1 3] data)
(true true)

The full source code is here

Conclusion

Working on the REPL for this problem has been a very good idea, but working in a spreadsheet software has been even better. Even faster feedback cycle 3.

The inclusion of the HOF for validating the data has paid its cost (good RoI)

Do not limit what your users can do, let them decide but do not complicate things unnecessarily. Know when to stop adding features and limit work to prevent YAGNI.


  1. for more information and a spike on reading CSV data in clojure, this spike may be useful ↩︎

  2. for the time being, there are no power users and no need to enable these custom validators. Doing it now would be YAGNI ↩︎

  3. in the case here, the data and business rules are so simple that there is no need for this software. ↩︎

Disclaimer about AI/GenAI

As of 2026-05-06, the text in these articles and blog entries has been written without AI/GenAI, except I sometimes use a spellchecker to fix errors. Think Word's spellchecker, not ChatGPT.

Notes, as of today (2026-05-06):

  • No code snippet has been automatically generated, nor vibe-coded, nor generated and reviewed.
  • I don’t have any article with AI contribution.

For future entries:

  • I may have used GenAI for the code in the repo. The code I exemplify/copy in the article will always be reviewed and tested, not vibe-coded. I will specify it in each snippet or at the top/bottom of the article.
  • I normally don't use it for the text contents, although if I have used it for the article text, it would be indicated as such.

Any entry before 2026-05-06 does not contain any AI/GenAI.

For more information, read the AI/GenAI Policy

Clojure and the macro and Books read in 2015Q4