28th
This post is available as a PDF
Many ideas introduced originally in Lisp, such as conditional execution (the if-then-else construct), garbage collection, and first-class functions1, can be found in modern non-Lisp programming languages.
Other ideas pioneered by Lisp, like the idea that programs can be composed solely of expressions, and homoiconicity, or the idea that a program can be written in its own data structures, are rarer.
Learning a Lisp dialect, whether it’s Clojure, Scheme, or Common Lisp, is a great way to see for yourself how powerful all of these ideas can be when combined.
An even better way to understand the ideas is to implement your own Lisp23.
After we explore some fundamental concepts, we’ll look at how a language like Javascript works.
Then, we’ll build our own small Lisp to Javascript compiler on top of Javascript in about 32 lines of Javascript code.
Programs in languages like Javascript are composed of two things:
Lisp programs are composed only of expressions.
Both statements and expressions are, in some sense, instructions for the computer to do some piece of work. The difference between them is that a statement does not return a value, but an expression does.
Consider Javascript’s if-then-else statement:
if (x > 10) {
alert("x was bigger than 10!");
} else {
alert("x was not bigger than 10!")
}
We call alert inside both the then and else bodies of the expression because if does not return a value. If it did, we could pass it directly to a single alert call, as this wishful pseudo-Javascript demonstrates:
alert(if (x > 10) {
return "x was bigger than 10!"
} else {
return "x was not bigger than 10!"
});
If we eliminate statements from our wishful pseudo-Javascript entirely, there’s no longer a need for the return operator. It’s assumed that values “return” themselves:
alert(if (x > 10) {
"x was bigger than 10!"
} else {
"x was not bigger than 10!"
});
Javascript actually provides an if-then-else construct that returns a value, the ternary operator4, but it is seldom used and often discouraged5.
With expressions, it’s possible to write code in the imperative style the first example demonstrated. The opposite, writing expressions in terms of statements, is not.
If we agree that “power” means “possibility”, then languages like Lisp, which are based on expressions instead of statements, might be considered powerful.
Homoiconicity is a property of a programming language’s syntax, and means that programs in a language are expressed using that language’s own data structures. The property of homoiconicity is key to the way Lisp empowers the programmer to fabricate and manipulate syntax.
Homoiconicity might be understood by exploring the properties of source code and interpretation of that source code in a non-homoiconic language first.
Javascript, while influenced by Lisp in other ways6, is not homoiconic, and its popularity, availability, and conventional syntax make it a candidate for exploration.
Consider this fragment of Javascript code, which passes the sum of 1 and the product of 2 and 3 to a web browser’s alert function:
alert(1 + 2 * 3);
When a browser passes this code, or data, as a string to its Javascript interpreter, something like the following happens:
alert(+*);alert followed by (, it knows to expect zero or more tokens that must be succeeded by a closing ).2 * 3 happens first, because, * has higher precedence than + and because the interpreter knows it needs a value or identifier to pass to alert.1 + 6 is evaluated, returning 7.alert(7) is evaluated, and a message is displayed.If you were to write your own Javascript interpreter or compiler, you would need to implement a lexical analyzer, a parser, and an evaluator in order to complete the above steps.
Even if you were to implement your interpreter in Javascript, you’d have a long way to go, because none of these functions are provided by Javascript interpreters to Javascript programmers.
Javascript provides the eval function, but it accepts only strings, and you need to create source code strings yourself. Creating source code strings to pass to eval is effectively compilation, and done correctly could be as difficult as writing a compiler.
The task is not as daunting if we take the Lisp approach and build our programs in a data format richer than strings. Let’s meet the problem half way, and invent a Javascript syntax based on Javascript data.
Consider this alternative representation of the code fragment, in a language we just invented, JSONScript:
['alert', ['+', 1, ['*', 2, 3]]]
Unlike the earlier example, which was a string that required multiple phases of analysis to understand and run, the above code is written using data structures native to Javascript: string, number, and array.
JSON, which stands for “Javascript Object Notation”, is a notation for expressing data in terms of Javascript data structures. JSON is expressive enough for us to use it to represent code for a simple programming language.
We could easily write a function called compile_jsonjs that converted this representation into a string that could be passed to eval, because:
compile_jsonjs doesn’t need to know about operator precedence because we’re using s-expressions.S-expressions, short for “symbolic expressions”, are a convention for writing code that allows us to represent a function call with an array. This convention is also known as prefix notation. The first element of the array is the function, and subsequent elements are arguments to that function that are evaluated first.
S-expressions leave no ambiguity about the order in which functions must be applied. In the original example, the expression 1 + 2 * 3 evaluated to 7 because it’s in the Javascript specificaton that * must evaluate before +.
In JSONScript, we express our intent to multiply first by making ['*', 2, 3] an argument to +.
S-expressions don’t have to be arrays. 42 and "hamster dance" are also s-expressions, because they also evaluate, but to themselves.
Lisp uses lists instead of arrays as s-expressions, but the idea is the same.
There are at least two implementation details we face in implementing compile_jsonjs:
alert or *, must only appear as a string and as the first element of an array.
compile_jsonjs needs to recognize functions that Javascript considers to be operators, and return the correct syntax for them.'alert'(7).
compile_jsonjs must return function names as symbols.Conflating symbol and string is necessary but has unfortunate implications. For instance, it will be impossible to pass a function name as an argument to a function in JSONScript.
Adding the ability to pass function names as arguments equires us to go beyond JSON as our source data format, because JSON permits symbol literals only as an object field names.
We’re now equipped to reason about how compile_jsonjs will work.
* or +
compile_jsonjs(['+', 1, 2]) should return "1+2".alert, we return a function call. compile_jsonjs(['alert', 'hello']) should return "alert('hello')"compile_jsonjs(42) should return "42".Next, we can translate our plan into code. We know that compile_jsonjs will need helper functions for:
compile_jsonjs whether we’re compiling a function call or a literal. Literals are data that evaluate to themselves, like numbers or strings.The first helper function we need, isArray(), isn’t available on some browsers, and is actually difficult to write in a widely compatible way.
So, we can borrow it from the underscore.js8 library:
var isArray = function (obj) {
return Object.prototype.toString.call(obj) === '[object Array]';
}
Next, we need a function for testing if an object is in an array. Such a function is also not available in some browsers, but we can implement it like so:
var inArray = function(array, item) {
for(var i = 0; i < array.length; i++)
if (array[i] === item) return true;
return false;
}
With these functions in place, we can finally write our compiler:
var compile_jsonjs = function(expression) {
var operators = ['*', '+'];
if (isArray(expression)) {
var func_name = expression[0];
var func_args = expression.slice(1);
var compiled_args = [];
for (var i = 0; i < func_args.length; i++)
compiled_args.push(compile_jsonjs(func_args[i]));
if(inArray(operators, func_name)) {
return compiled_args.join(func_name);
} else {
return func_name + "(" + compiled_args.join(',') + ")";
}
} else {
return expression.toString()
}
}
The full source listing, which you can copy and paste into a web browser’s Javascript console, follows:
var isArray = function (obj) {
return Object.prototype.toString.call(obj) === '[object Array]';
}
var inArray = function(array, item) {
for(var i = 0; i < array.length; i++)
if (array[i] === item) return true;
return false;
}
var compile_jsonjs = function(expression) {
var operators = ['*', '+'];
if (isArray(expression)) {
var func_name = expression[0];
var func_args = expression.slice(1);
var compiled_args = [];
for (var i = 0; i < func_args.length; i++)
compiled_args.push(compile_jsonjs(func_args[i]));
if(inArray(operators, func_name)) {
return compiled_args.join(func_name);
} else {
return func_name + "(" + compiled_args.join(',') + ")";
}
} else {
return expression.toString()
}
}
We can test our compiler by seeing what code it generates with our JSONScript fragment:
compile_jsonjs(['alert', ['+', 1, ['*', 2, 3]]]);
And we can evaluate our compiled code by passing that result to eval:
eval(compile_jsonjs(['alert', ['+', 1, ['*', 2, 3]]]));
JSONScript probably isn’t a language you would want to use for your next programming project, but hopefully you now have a better understanding of what Lisp is and why it is different.
Further extending JSONScript will help you understand Lisp even more.
You can begin by adding support for other Javascript operators.
Next, you might implement lambda and quote. These will allow you to define functions in JSONScript, and to pass arrays to functions without evaluating them.
Once you have lambda and quote, you can implement other special forms, using Paul Graham’s The Roots of Lisp as a guide9.
If you’re feeling particularly ambitious, you might consider adding support for one of Lisp’s most famous and unique features - macros, or functions that:
What Made Lisp Different by Paul Graham ↩
The Roots of Lisp by Paul Graham ↩
Popularity by Brendan Eich ↩
The Roots of Lisp by Paul Graham ↩
This rant is brought to you by my reaction to Daniel Lemire’s post, “You probably misunderstand XML”
In school I took a class called “Data Interchange” or something, that taught primarily XML and the ecosystem around it. We learned about XSD, RELAX NG, XSLT, XPath, JAXB, SOAP, and lots of other things I thought to be loads of horrendous bullshit at the time. I hated that class.
I made a special effort to go, though. I was on a personal mission to point out every counterexample to, failing of, and argument against XML that I could. ”It’s the illegitimate half-brother of the holiest representation of all time, the S-expression, and it’s slower than shit to boot!” I cried.
I’ve learned a lot since then. The biggest thing I’ve learned is that it’s easy to dismiss any technology with cultural baggage on religious terms. It’s also easy to conflate your hatred of some technology, like SOAP, with some other technology, like XML, that is only loosely or incidentally related. Without assessing pragmatically to yourself whether SOAP sucks because it uses XML, or SOAP just plain sucks, you have not adequately justified your hatred.
The more you work without asking yourself these questions, the more likely you are to commit the cardinal sin of anyone in the computing profession who gives a shit about what they do - use the wrong tool for the job.
What’s much harder, and much more useful, is to take a pragmatic approach to determining a technology’s deficiencies and capabilities. I still don’t do this enough. This was one of the themes of Rich Hickey’s “Hammock Time” talk at the first Clojure Conj.
But it’s not what led me to later realize the legitimate use cases for XML. I digress.
What I later realized about XML as I programmed more is that the intrinsic structure, semi-structure, or non-structure of the data you are working with is only part of the question, and only leads you to a conclusion - well, opinion, mostly - on how to best represent it. The more complicated the domain, the more complicated your opinion, and the hairier the ultimate representation.
As you bang out your implementation, the less intuitive but really insidious part of the question lurks: “Will people in the future working with this thing I created be able to use it and modify it for their own needs? Will they know my intent well enough to improve upon it, or determine if some variant of it is “valid” in the sense that it is true to my original opinion, on which other consumers of this thing may depend?”
For domains that are simple in the ontological sense, it seems really dumb to use XML, because that question doesn’t seem worth addressing.
That’s because you’re confident other people could sit down, read a few your examples, and be comfortable adapting and extending your stuff. JSON and YAML are great for applications that fit this scenario, especially when you don’t have to worry about transport to or from places outside of your control. Like on a web app, when you’re doing Ajax stuff with your own backend.
Sometimes though, you need to transfer data between applications - applications that are coded by different people, who might have different perspectives of the underlying domain that the data being transported is derived from. And that’s when you need to look into XML, because of all the formats out there, it has the most tooling and documentation around writing schemas. XML schemas let you express your understanding of the domain, and how you chose to interpret pieces of it, declaratively, in an almost human readable format. More readable than BNF in my opinion, anyway.
It’s easy to know when you’re doing it wrong. Is your code littered with assertions about the structure of data coming from some outside source? Are there failure cases for your application that can be triggered by malformed data sneaking in? Do you have tons of unit tests around this?
If you do, you might be comfortable. You might feel good. But what about the other guys? If you have multiple developers working multiple code bases, all of which depend on some common interchange format, then the data on Bob’s box running application Q actually depends on the unit tests running on Sally’s box running application Z. You can commit another cardinal sin and duplicate the tests, version and integration-test all of these apps together, or you can suck it up and investigate a transfer format that supports schemas. Depending on your transport and other requirements, this might be XML, or it might be something like Thrift.
Schemas are not the answer to every question, and, like any tool, they can be the wrong one for the job. But in my limited experience, there’s a place for XML and it is easy to overlook because of your own cultural bias.
Suppose you’d like to send a short message, like an account number, to a friend, but have no secure way to transfer it and can’t hand the number off in person.
If there’s something that only you and your friend know, and this something in text is at least as long as the secret you’d like to send, you’re in luck.
Using your shared secret, Perl, Base64, and the XOR cipher, you can communicate small secrets securely.
An Example
Alice needs to send a 7 digit account number, like 9876543, to her friend Bob over e-mail. Alice knows that Bob’s first phone number was 555-4241 . To encode the account number using this shared secret, she can Base64 encode the result of XORing “5554241” and “9876543”.
On her computer, she runs this small Perl script:
perl -MMIME::Base64 -e 'print encode_base64("5554241"^"9876543")'
which returns the string “DA0CAgcAAg==”
Next, she composes an e-mail to Bob:
Hey Bob, here is that account number. Run the following command, substituting the asterisks for your first phone number, no area code, and without the dash: perl -MMIME::Base64 -e ‘print “*******”^decode_base64(“DA0CAgcAAg==”)’
When Bob gets the message, he runs the command on his machine:
perl -MMIME::Base64 -e 'print "5554241"^decode_base64("DA0CAgcAAg==")'
which prints 9876543.