Code is just data man.


Lisp more precisely, Common Lisp is often described as a data-oriented programming language. What does “data-oriented” mean here? In simple terms, it is a jargon that describes a programming model where the main task of a program is to transform data from one form into another. At least, that is how I understand it.

What I mean by “data” in this article is the program’s own code data. In my opinion, one of the programming languages that truly deserves the title data-oriented is Lisp, because in this language we can freely transform programs themselves.

NOTE: What I mean here is not Lisp macros, but transforming data at runtime.

Because Lisp programs are built from S-expressions, I feel that Lisp does not really have “syntax” in the traditional sense. Lisp programmers usually solve problems by creating DSLs (Domain-Specific Languages).

In Lisp, when we type (+ 1 1), the Lisp eval function evaluates it into 2. However, before we pass it to eval, we can transform it into another form. For example, we can transform addition into multiplication:

(let ((data '(+ 1 1)))
  (setf (car data) '*)
  data)
;; ==> (* 1 1)

In the Lisp world, more precisely, Common Lisp anything can be modified while the program is running. In this example, we modify the first element of the list into the symbol *.

The example above is probably not very useful, so let’s create something more practical. In almost every online forum, there is always someone who says that Lisp has too many parentheses. What if we removed them? Coincidentally, I also wanted to try implementing the Shunting Yard parsing algorithm.

Preparation

How do we build this? What I did was create a transform function that accepts variable arguments, because I want to be able to write something like this:

(transform 70 '* 6 '+ 10 '- 10)

The program above will be transformed into:

(- (+ (* 70 6) 10) 10)

After that, we can simply call eval.

In a function that uses &rest exprs, exprs is a list. Therefore, we first need to transform it into a string, concatenate everything, and then perform lexing and parsing.

(defun transform (&rest exprs)
  "transform exprs into lisp exprs then it can pass into eval,
   each expr should be symbol"
  (flet ((conc-str (a b) (concatenate 'string a b)))
    (let* ((input  (reduce #'conc-str (mapcar #'write-to-string exprs)))
           (tokens (lexer input))
           (parsed (parser tokens)))
      parsed)))

Lexer

The lexer is the step where we transform a string into tokens so that they can be easily recognized by the parser.

In this lexer, there are effectively two passes. The first loop identifies tokens in the form (cons TYPE VALUE). Because NUMBER tokens are built in reverse order during this process, I perform a second pass to fix their order so that the NUMBER values end up in the correct form.

(defun lexer (string)
  "Transform string to sequence of tokens,
   (lexer \"100+100\")
   ==> ((NUMBER . 100) (OP . ADD) (NUMBER . 100))"
  (let ((tokens '()))
    (dolist (ch (coerce string 'list))
      (if (digit-char-p ch 10)
          (if (and (consp (car tokens)) (eq (caar tokens) 'NUMBER))
              (push ch (cdar tokens))
              (push (cons 'NUMBER (list ch)) tokens))
          (ecase ch
            (#\+ (push (cons 'OP 'ADD) tokens))
            (#\- (push (cons 'OP 'SUB) tokens))
            (#\* (push (cons 'OP 'MUL) tokens))
            (#\( (push (cons 'OP 'OPR) tokens))
            (#\) (push (cons 'OP 'CPR) tokens)))))
    (mapcar (lambda (tok)
              (when (and (consp tok) (eq (car tok) 'NUMBER))
                (setf (cdr tok)
                      (parse-integer
                       (coerce (nreverse (cdr tok)) 'string))))
              tok)
            (nreverse tokens))))

Parsing

As mentioned earlier, I use the Shunting Yard algorithm. This algorithm was invented by Edsger Dijkstra. He named it that way because the algorithm works similarly to rearranging train cars in a railway yard.

The algorithm uses two stacks: an operator stack and an output stack. We shift input tokens between these two stacks. You can easily find visual explanations on YouTube, but the core idea is simply moving operators and numbers between the two stacks according to precedence rules.

Because Shunting Yard produces RPN (Reverse Polish Notation), also known as postfix notation, we just need to redirect the “train tracks” into a Lisp-style prefix expression. Below is the code I wrote:

(defun precedence (op)
  (ecase op
    (MUL 3)
    (ADD 2)
    (SUB 1)))

(defun symbol-op (op)
  (ecase op
    (ADD '+)
    (SUB '-)
    (MUL '*)))

(defun apply-op (ops out)
  (let ((op  (car ops))
        (rhs (car out))
        (lhs (cadr out)))
    (setf ops (cdr ops))
    (values ops
            (cons (list (symbol-op op) lhs rhs)
                  (cddr out)))))

(defun parser (tokens)
  "Make Lisp AST from sequence of tokens, with Shunting yard
   ((NUMBER . 100) (OP . ADD) (NUMBER . 100))
   ==> (+ 100 100)"
  (let ((ops '())
        (out '()))
    (dolist (tok tokens)
      (cond
        ((eq (car tok) 'NUMBER) (push (cdr tok) out))
        ((member (cdr tok) '(ADD SUB MUL))
         (do () ((or (null ops)
                     (not (member (car ops) '(ADD SUB MUL)))
                     (< (precedence (car ops)) (precedence (cdr tok))))
                 (push (cdr tok) ops))
           (multiple-value-setq (ops out)
             (apply-op ops out))))
        (t (case (cdr tok)
             (OPR (push tok ops))
             (CPR (do () ((eq (car ops) 'OPR) (setf ops (cdr ops)))
                    (multiple-value-setq (ops out)
                      (apply-op ops out))))))))
    (do () ((null ops) (car out))
      (multiple-value-setq (ops out)
        (apply-op ops out)))))

Eval

Because we already have a lexer and a parser, the next step is evaluation.

Since we are transforming infix syntax into Lisp’s prefix form, we can directly use Common Lisp’s eval function.

CL-USER> (transform 70 '* 6 '+ 10 '- 10)
(- (+ (* 70 6) 10) 10)
CL-USER> (eval (transform 70 '* 6 '+ 10 '- 10))
420 (9 bits, #x1A4)
CL-USER> 

That is what I have in mind when people use the phrase “code is data.” I hope this article is useful to the reader, and thank you for reading.

Bonus Header

As an exercise, try optimizing this expression:

(transform 10 '* 70 '* 6 '+ 10 '* 4 '- 10)
;; ==> (- (+ (* (* 10 70) 6) (* 10 4)) 10)

Make it into:

(transform 10 '* 70 '* 6 '+ 10 '* 4 '- 10)
;; ==> (- (+ (* 10 70 6) (* 10 4)) 10)