Show HN: Anansi – a NoWeb-inspired literate programming preprocessor (john-millikin.com)
57 points by jmillikin on July 8, 2018 | 23 comments


I came to conclusions similar to the author's about the usefulness of literate programming, even after "taking it seriously" and "doing it right" and all those other no-true-Scotsman-type objections (see HATEOAS/REST for more examples of these).

Initially, I was super excited about the idea. Imagine a world where you take a book off the shelf and peruse a master programmer's in-depth explanation of the details of some famous software. What they were thinking, the abstractions they decided on, the algorithms they chose, the tradeoffs made. Wouldn't that be an amazing world to live in?

In practice, very few pieces of software are written in a way that lends itself to a cohesive narrative. Software is written by numerous developers, layer by layer, often with hacks thrown in. New people come on, don't understand the whole thing, change parts of it, and move on. The requirements change, and the system contorts itself to serve multiple purposes that weren't envisioned by the original coders.

Keeping normal auto-generated documentation up to date under those circumstances is already a herculean task; literate programming asks more: maintaining a story, a flow, and an overarching rationale. It's just not possible for most software.

There is a type of software that lends itself to literate programming: the kind of software Donald Knuth writes. It is written by a single author with encyclopedic knowledge of the problem domain and an excellent writing style (Knuth is very funny if you've read his work), and above all, the programs Knuth writes tend to be "done" at some point. There are bugs to be fixed, but no new features are added that might drastically change the narrative.

Most software isn't like that, but if yours is, then literate programming can be fun.


I agree that literate programming works well for Knuth because of the kind of programs he writes: mainly, as you say, that the program was written for a particular purpose, and when that's achieved it's "done".

I wonder, though, whether it's necessarily true that more software cannot be like that. We could in principle move a little more in the direction of declaring programs done, and when new requirements come up, writing a new program to cover them. (Knuth has no qualms about having similar code in different programs; it's almost universally believed by other programmers today that that's a terrible thing.) The old program would continue to work well for its original purpose, and those who need the newer program would use that one instead.

As with books: sometimes you need a new edition of a book, sometimes a reprint with corrections, and sometimes a new book entirely. It's ok if multiple books cover the same topics and even if they do so in slightly different ways; all that matters is that each is internally consistent — we don't demand that everything related to a certain domain and written by the same author(s) be in a single book. With programs, almost always they only grow and expand, with incremented version numbers.


> I agree that literate programming works well for Knuth because of the kind of programs he writes: mainly, as you say, that the program was written for a particular purpose, and when that's achieved it's "done".

A piece of software in itself really isn't much of anything; the true value lies in the support you get from the development team. "Done" software is inherently unsupported, therefore close to worthless and probably won't be used in a production setting.

> (Knuth has no qualms about having similar code in different programs; it's almost universally believed by other programmers today that that's a terrible thing.)

It's a waste of programmer effort. In the open source realm at least, it would be a far more efficient use of programmer time and energy -- and easier on the users -- for programmers to collaborate on a single definitive program for each task, rather than reinvent wheels and confuse the marketplace with competing implementations of the same abstract process.


Let's talk specifics. Here are four programs written by Knuth, in roughly chronological order:

1. The Algol-58 compiler he wrote for Burroughs (specifically, for their B205 machine). You can read about it in many places (http://ed-thelen.org/comp-hist/B5000-AlgolRWaychoff.html#7 , https://www.youtube.com/watch?v=QeiuVNDQg4k&list=PLVV0r6CmEs... , or in great detail at http://datatron.blogspot.com/2015/11/knuths-algol-58-compile...). This was written in the summer of 1960, debugged by Christmas, and put on their computers. The machine didn't sell very well in the US, but apparently it (and the compiler) was being used in Brazil over the next decade, successfully. I wouldn't call this "worthless" by any means; it did its job. (Of course one might argue that the prevailing model at the time was for software to get "done", so it wasn't really an exception.)

2. TeX. This is the most famous example. After developing it for about 10 years, he declared it done (https://www.tug.org/TUGboat/tb11-4/tb30knut.pdf) except of course for bugfixes. (He still looks at bug reports once every few years (https://cs.stanford.edu/~knuth/abcde.html#bugs), but there was exactly one bug reported during 2007–2013, and it was a really inconsequential one about the whitespace for how an "empty" macro would be printed in the error logs.) TeX is stable, well-understood (at one point there were hundreds of people who "knew" the entire program, which is unprecedented for a program of that size), and very well-supported (see TUG, tex.stackexchange.com, etc.) — and in any case most of the questions these days are about LaTeX (a set of macros, with horrible error-handling, the opposite of TeX) or other packages, not about TeX (the program) itself. At Knuth's request, extensions are released as new programs (pdfTeX, XeTeX, LuaTeX, etc.), and TeX stays the same (and even these programs have approached stability). This is definitely neither "inherently unsupported" nor "close to worthless" nor "probably won't be used in a production setting" — at any given point in time several publishers are using it in production, not to mention various others who are not even making physical books.

3. The Stanford GraphBase. This is a suite of programs, also published in book form (as literate programs). There are people still making use of these books, and these programs can be used as building blocks for other programs, e.g. for many of the ones that Knuth writes mainly for himself (see https://cs.stanford.edu/~knuth/programs.html or https://github.com/shreevatsa/knuth-literate-programs/tree/m...). I don't think continuing to work on it, versus calling them done for now, would change anything about them.

4. Any of the programs on that page, e.g. say SETSET, which was written in February 2001, and “Enumerates nonisomorphic unplayable hands in the game of SET®”. Or the first two (SHAM, written December 1992, and OBDD, written May 1996) — both written mainly to find that there are “exactly 2,432,932 knight's tours [that] are unchanged by 180-degree rotation of the chessboard”. Once the job is done, just what is to be achieved by refusing to declare them “done”? (Incidentally, note that Knuth did figure out an improvement to SETSET, but wrote it as a new program. The original program is fine though.)

In fact, if you look at the blog post by Joel Spolsky from 2002 on “Five Worlds”, in at least three of them (and maybe four), software can quite commonly be declared “done” (always, of course, except for bug fixes: “done” just means we've finally decided what the software is supposed to do; it's not the same as abandoning it even when it's not doing what we decided it's supposed to do). Throwaway code is often indeed thrown away, games get changes released as sequels or separate expansion packs, embedded software often cannot be updated anyway, and internal software can also often be "done". It's only the first world (shrinkwrap software) in which software usually isn't.


You can sometimes find self-contained chunks of bigger software that lend themselves to that kind of style. For example, I once had to write a highly-compressed on-disk memory-mapped data structure for searching for large sets of phrases in text, along with associated payloads for each one. About half of that code is actual code; the other half is paragraph-length comments walking the reader through it. How are the payloads encoded? How does the index work? How do we handle prefix compression? What's the deal with unicode normalization? And so on.
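For a flavor of that style, here's a hypothetical Python sketch - the on-disk layout, names, and hashing are all invented for illustration, not the actual structure described above:

  import mmap
  import struct
  import zlib

  HEADER = struct.Struct("<I")   # record count / payload length
  RECORD = struct.Struct("<IQ")  # (phrase hash, payload offset)

  def lookup(path, phrase):
      # The file begins with a record count, followed by a table of
      # fixed-width records sorted by phrase hash, each pointing at a
      # length-prefixed payload later in the file. A fixed-width table
      # lets us binary-search the memory map directly, without parsing
      # anything up front.
      with open(path, "rb") as f:
          buf = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
      (count,) = HEADER.unpack_from(buf, 0)
      key = zlib.crc32(phrase.encode("utf-8"))
      lo, hi = 0, count
      while lo < hi:
          mid = (lo + hi) // 2
          h, off = RECORD.unpack_from(buf, HEADER.size + mid * RECORD.size)
          if h < key:
              lo = mid + 1
          elif h > key:
              hi = mid
          else:
              # Payloads are a 4-byte length followed by raw bytes. The
              # real structure also prefix-compressed keys and normalized
              # unicode; this sketch omits both.
              (n,) = HEADER.unpack_from(buf, off)
              return buf[off + HEADER.size : off + HEADER.size + n]
      return None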

It was so much fun to write! These opportunities don't come along very often, but they do come along sometimes.


My experience exactly.

One type of software that lends itself to literate programming especially well is software with lots of math behind it: the kind with deceptively short functions which, no matter how well expressed and in no matter how clean a programming language, still look like magic.

Seeing axioms, assumptions, and approximations grow into theorems and formulas, budding a lush crown of graphs, examples, and intuitions, and in the end bearing sweet code fruits, all in beautifully typeset form, is just... sweet.

Oh! And good luck to all those who skip the documentation and go straight to the source code. Not this time, my friend. You can try, but you will fail. She's just not one of those girls.

When literate programming stops being some "hipster sh*t" and simply becomes a necessity, using it is all the more rewarding.


> Eventually the pain of working within a single massive source file became overwhelming, and I decided to write my own literate programming tool that could consume a filesystem hierarchy

I wonder if that’s where they went wrong.

Perhaps when your code is too big to fit in a file, it is too big to describe serially at all. And it's time to split it into two separate pieces: two modules with a formal interface between them.

The idea that you could have a nonlinear narrative without any formalism dictating the relationships between paths is perhaps asking too much of the narrative form.

Perhaps that’s the whole point of a module system: creating formalisms so that you have a chance at holding in your head the relationship between one series of procedures and another.

(This is not entirely academic. I have about 100 richly connected single-file modules on NPM.)


> Perhaps when your code is too big to fit in a file, it is too big to describe serially at all. And it's time to split it into two separate pieces: two modules with a formal interface between them.

I'm not sure these two issues are related. The "woven" PDF output of haskell-dbus 0.9 was split into chapters, each chapter being roughly a single Haskell module (with a specified API). And books are by nature serial, yet they can express enormously complex ideas -- think of a compilers textbook.

This was all six years ago so my memory's a bit fuzzy, but I remember feeling lost in my own code when it was all in one file. I couldn't do things like "split window and go to top" to find the imports/exports of the current module, and searches for a symbol name returned far noisier results when everything was implicitly project-global.

It might be possible to fix these with better tooling, like an editor that could parse literate source and show the rendered output. But then you'd be treating the "source" as a sort of opaque input and editing at the level of "compiled" output, which is behavior I associate more with reverse-engineering than typical software development.


> And books are by nature serial, yet they can express enormously complex ideas -- think of a compilers textbook.

I'd argue that, in many cases, well-organized books have split their material up into chunks with formal interfaces between them. A textbook's table of contents and introduction create a high-level structure, cross-references allow like material to be grouped with like material to make a coherent picture of each subsystem or topic, and chapters can be read out of order or even skipped entirely by some readers. Textbooks for mathematics or computer science are the most obvious examples, but I find the same level of organization and ability to jump around even in popular science books and philosophical discourses. Even the simplistic "five paragraph essay" that is taught in elementary school can be interpreted as an implementation of a formal interface.


All of that is true of a single source file too -- editors can generate tables of contents and cross-references. But the experiences of using a well-organized paper encyclopedia and of using Wikipedia are fundamentally different. The point of "serial" is that if I want to read a book, I have to pick an origin point and then start scanning linearly. There's no search bar in a textbook.


I agree.

Many textbooks also include a dependency graph that suggests different orders in which the chapters can be read and understood.


Oddly, I prefer a larger file over a myriad of smaller ones. If, line for line, they are the same, then fewer files is my preference.

Not that I want everything in one. Having a sort of chapter concept can help break a program into consumable chunks.

There are several examples, but I'm finding The Stanford GraphBase to be very consumable, even for somewhat dense C code.


As a side note: I find the literate programming pattern more appropriate for combined demos and tests. I am not sure the implementation really belongs in the same source location.


I saw https://news.ycombinator.com/item?id=17483242 ("Literate Programming: Empower Your Writing with Emacs Org-Mode") and figured HN might be interested in a real-world attempt to use literate programming for a larger project (a Haskell implementation of D-Bus).


Very candid. I've written before about how Literate Programming gets misused: http://akkartik.name/post/literate-programming

This feels like supporting evidence.


Not misused -- I think I gave a pretty good shot at writing literate code as Knuth originally intended it. It's just that the goal of literate programming, the transformation of source code into a document that can be read like a book, doesn't seem to be useful.

It's worth noting that Knuth wrote WEB in 1981, ten years before the web. There's no way he could have known at the time that hyperlinks and search would be a far more useful interaction model for reference documentation.


It's important to remember that for Knuth, the typeset documentation is the “real” program; neither the input you type (the .web or .w file) nor the code generated (the .pas or .c file) are intended to be looked at much. (You may look at them sometimes, the same way you may sometimes look at the generated assembly of your C program for debugging, but that should be rare.) He writes programs on paper; he programmed the whole of TeX and Metafont by writing them with pen/pencil on notepads for several months, before approaching a computer and typing it all in. He also reads a lot of programs written by others, something most programmers today don't really do. (http://www.gigamonkeys.com/code-reading/) When he wants to understand or remember what some program does, he pulls out the printout, reads it like a book, etc. As a scholar, reading and writing books is his natural activity, so everything is optimized for that.

I read a few of Knuth's programs (https://github.com/shreevatsa/knuth-literate-programs/tree/m...) a while ago, the intended way: printed them out and read them while sitting at a table, pen in hand, no computers nearby. The 60s/70s style is quite different — since then a bunch of solutions to the challenges in programming have evolved (abstraction, structured programming, modules with information hiding, OOP, etc), but instead Knuth has evolved his own different solutions, which takes some getting used to. But after a while it was fine and illuminating; you understand there are different ways of doing things, and they can be effective too.


I don't think this is right. In Knuth's own words at the top of his site:

"The main idea is to treat a program as a piece of literature, addressed to human beings rather than to a computer." (https://www-cs-faculty.stanford.edu/~knuth/lp.html)

This is the consistent message I've gotten from his writings: the goal is to communicate to other people. "Transformation of source code into a document that can be read like a book" feels far more low-level than that.

> Knuth wrote WEB in 1981, ten years before the web. There's no way he could have known at the time that hyperlinks and search would be a far more useful interaction model for reference documentation.

Knuth certainly knew about hyperlinks. On the same page, Knuth says:

"The program is also viewed as a hypertext document, rather like the World Wide Web. (Indeed, I used the word WEB for this purpose long before CERN grabbed it!)"

There's a pleasing non-linearity to Knuth's creations, both in the source (with fragments being named and referring to each other) and in typeset form (with all the attention to the index of fragments, and to showing with each fragment all the places that refer to it).

---

In any case, we may be splitting hairs here. I'm not a scholar of Knuth's work, and maybe your interpretation is correct. I agree that if you define Literate Programming as "transformation of source code into a document that can be read [linearly] like a book", then that goal is not useful. If you define it as "better communicating programs to other people," then I think that goal is still relevant. All programmers should keep this goal in mind, while loosening their grips a tad on the precise details of how they happen to aim for the goal at a specific point in time. There's still lots of room for improvement.


Hi Kartik! Still at it I see...

me too ^_^


I tried writing some Java code with noweb - and at least for Java 1.3 - the ability to reuse code at any level (like a while loop working with an ADT) helped avoid some of Java's tedious verbosity. This helped keep the narrative/document to a manageable length - but it led to rather confusing Java output (only really a problem if people try to read just the code, ignoring the document).

Overall, I've come to feel that with more powerful languages, some doctests and block comments can go a long way toward achieving similar results.
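For example, a Python doctest keeps a small worked demo right next to the code it documents (a toy example of the idea, not from the Java experiment above):

  def mean(xs):
      """Arithmetic mean of a non-empty sequence.

      >>> mean([1, 2, 3])
      2.0
      >>> mean([2.5, 3.5])
      3.0
      """
      return sum(xs) / len(xs)

  if __name__ == "__main__":
      # Running the module directly checks every example in the docstrings.
      import doctest
      doctest.testmod()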

Although for reading, properly laid-out text with a nice font and good support for mathematical notation does help.

But a language with rich abstractions and a sane use of Unicode (from the simple case of using π as a constant, to the more sophisticated case of arrows and actual not-equal signs) - along with freedom of ordering, so that you can write:

  complex_procedure:
    simple_subtask1
    simple_subtask2

  simple_subtask1
  ...
and not get a reference error because simple_subtask1 is referenced before it's declared - all of these go a long way toward source code that reads more like prose than code.
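In noweb terms, that's exactly what named chunks provide; a minimal sketch of the same structure (the chunk names and the Python body are invented):

  @ The top-level structure comes first; the pieces are filled in below.
  <<complex procedure>>=
  <<simple subtask 1>>
  <<simple subtask 2>>
  @

  @ Subtasks can then be defined in whatever order reads best.
  <<simple subtask 2>>=
  report(totals)
  @

  <<simple subtask 1>>=
  totals = [sum(row) for row in table]
  @

Tangling with "notangle -R'complex procedure' doc.nw" resolves the references regardless of definition order.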

[ed: I should add that I don't really think simple text formats are the best way to offer rich editing experiences - we see this with most IDEs that "understand" code - but I think the Lisp/Smalltalk idea of images, and being able to go back and forth between byte code and text form, really is the way forward.

Likewise, word processors, spreadsheets, and image editors all use rich file formats for their working sets.

One such approach is the leo editor: http://leoeditor.com/ ]


I've been working full-time on this problem since March.

Although not ready for prime-time, Orb[0] is self-hosting, and I hope to have a full release ready in a month or so.

[0]: https://github.com/mnemnion/orb/

I've been using it to develop several related projects. It has a long way to go.


See also http://literate.zbyedidia.webfactional.com/ and https://github.com/zyedidia/Literate for information about "Literate", which is my favorite literate programming tool.


I've also tried literate programming several times. I actually like it for small libraries that require good documentation but little maintenance.

What I like about this experience is how literate programming forces me to spell out each of my assumptions and justify every choice, just because I then have to program with the reader in mind. And I found that sometimes those assumptions were not as sound as I thought. For that reason I would recommend that anyone give it a try.

By the way, I also had to write my own tool, one that extracts the code from the documentation instead of generating both code and documentation; this allowed me to use both my favorite document format and my favorite programming language.
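A rough sketch of that extraction approach (not the commenter's actual tool - this version assumes Markdown-style fenced code blocks):

  import re
  import sys

  def tangle(doc):
      # Pull the contents out of every fenced code block; everything
      # else is documentation and is simply dropped. A real tool would
      # also track chunk names and source positions for error messages.
      blocks = re.findall(r"^```\S*\n(.*?)^```", doc, re.M | re.S)
      return "\n".join(blocks)

  if __name__ == "__main__":
      # Usage: python tangle.py < notes.md > module.py
      sys.stdout.write(tangle(sys.stdin.read()))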



