Lua code to process a LuaTeX node list

Introduction

LuaTeX provides access to the deepest internal structures of the TeX engine: nodes, the fundamental building blocks created and assembled by the typesetting engine. I won’t try to explain nodes in detail here but instead refer you to an excellent article on the LuaTeX wiki.

If you are interested in exploring node structures, for example the internal structure of a vbox or hbox, the following code will get you started. It presents nothing radically new; it is simple boilerplate that you can expand to suit your own interests. For example, I used it to convert a node list to a PostScript representation of a paragraph.

Here is an example representation of a node structure.

How were these node diagrams built? I built this diagram using a DLL I wrote for LuaTeX: a customised build of the graphviz library with a Lua binding using the excellent LuaGRAPH library. I also used Patrick Gundlach’s Lua code, LuaTeX nodelist visualization, to create the data for graphviz to process (thanks Patrick!). The node graphs were converted to EPS (via graphviz) and PDFs were generated on the fly using GhostScript in a DLL with a Lua binding. You can of course use Patrick’s code to generate the graphviz data and run graphviz from the command line or via system/shell calls from Lua. I just prefer to have everything callable from DLLs.

Basic background information

Internally, LuaTeX defines quite a number of different node types; for a full list refer to the LuaTeX Reference Manual. You can generate a list of the node types using the LuaTeX API call

node.types()

which returns a table mapping each numeric node id to the name of its node type.

For example:

\directlua{
for i,v in pairs(node.types()) do
   print(i,v)
end
}
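Because pairs() does not guarantee any particular order, the listing above may come out shuffled. Here is a minimal sketch of a sorted listing. The function name sortedtypes is made up for this illustration, and the id-to-name table is passed in as an argument so the function can also be tried outside LuaTeX. It is written for a separate Lua file (loaded with dofile or require) rather than for \directlua, where line ends are turned into spaces and a -- comment would swallow the rest of the code.

```lua
-- Hypothetical helper: return "id name" strings sorted by numeric id.
-- types: an id -> name table such as the one returned by node.types().
function sortedtypes(types)
  local ids = {}
  for id in pairs(types) do
    ids[#ids + 1] = id
  end
  table.sort(ids)
  local lines = {}
  for _, id in ipairs(ids) do
    lines[#lines + 1] = id .. " " .. types[id]
  end
  return lines
end
```

In LuaTeX you would call sortedtypes(node.types()) and print each entry of the result.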

If you look at the sample node structure diagram above you can see that a node list is a nested linked list. To process this data structure you need to “walk over” the node list with a recursive function. Recursion is needed because TeX builds nested data structures internally: it lets you have boxes within boxes within boxes… and each nested list has to be traversed in turn. So, the idea is that you start with the first node in the list and then visit and examine each node in turn. As we’ve noted, there are quite a few different types of node, so the “action” you may want to perform for each node will depend on the type (id) of that node.

The way I’ve chosen to do this is to have a set of functions and to execute the appropriate function when you see a node of a particular type. One way to do this is with a table indexed by node id, where the value stored under each id is a function. For example, suppose we have a function called “processnode”:

\directlua{
function processnode(n)
   print("processnode called")
end
}

The argument n is the particular node you are looking at. (I’ve avoided calling it “node” because a parameter with that name would hide LuaTeX’s node library inside the function.) Using the LuaTeX API function node.types() you can quickly populate a table with code such as this:

\directlua{
nodedispatch={}
for i,v in pairs(node.types()) do
   nodedispatch[i]=processnode
end
}

Here, nodedispatch is our table indexed by node id, with each value set to the function processnode. Calling the function is very easy. Suppose you have a node n whose id is idvalue; then all you need to do is something like this:

nodedispatch[idvalue](n)

nodedispatch[idvalue] returns the function and (n) calls it with your node object.
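In practice you will probably want different functions for different node types rather than one function for everything. The following is a minimal sketch of that idea, not part of the downloadable code. The name makedispatch and its parameters are invented for this example, and the id-to-name table is passed in as an argument (in LuaTeX you would pass node.types()). Like any Lua code containing -- comments, it belongs in a separate Lua file rather than inside \directlua.

```lua
-- Hypothetical helper: build a dispatch table indexed by node id.
-- types    : id -> name table, e.g. node.types() in LuaTeX
-- handlers : name -> function table for the node types you care about
-- default  : function used for every other node type
function makedispatch(types, handlers, default)
  local dispatch = {}
  for id, name in pairs(types) do
    dispatch[id] = handlers[name] or default
  end
  return dispatch
end
```

In LuaTeX this might be used as nodedispatch = makedispatch(node.types(), {glyph = processglyph}, processnode), where processglyph is a function of your own. Keying the handlers by name avoids hard-coding numeric ids, which can differ between LuaTeX versions.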

And whatsits too!

One very important node type is the “whatsit” (see the LuaTeX Reference Manual). All of TeX’s whatsits share the same node id; the different kinds of whatsit are distinguished by the subtype field of the whatsit node. Similar to node.types(), LuaTeX provides a handy API function, node.whatsits(), which we can use to build another function table, this time for processing whatsits.

\directlua{
whatsitdispatch={}
for i,v in pairs(node.whatsits()) do
   whatsitdispatch[i]=processwhatsit
end
}

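Here processwhatsit is another function, this time for processing whatsits. It must be defined before this code runs; otherwise every value assigned is nil and the table stays empty.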
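One way to connect the two tables is to store, under the whatsit node id, a function that looks at the subtype field and forwards the node to the matching whatsit function. The sketch below is only an illustration of that idea: makewhatsithandler and its parameter names are my own invention, and the code is written for a separate Lua file rather than \directlua because it contains -- comments.

```lua
-- Hypothetical helper: wrap a whatsit dispatch table in a single function
-- that can be stored in nodedispatch under the whatsit node id.
-- subtypedispatch : subtype -> function table (like whatsitdispatch)
-- fallback        : called when no function is registered for a subtype
function makewhatsithandler(subtypedispatch, fallback)
  return function(n)
    local handler = subtypedispatch[n.subtype] or fallback
    return handler(n)
  end
end
```

In LuaTeX you could then write nodedispatch[node.id('whatsit')] = makewhatsithandler(whatsitdispatch, processwhatsit), so that whatsits are routed by subtype while all other nodes keep their usual function.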

Wrapping it all together

The above gives a brief summary of the approach but we now need to hook this all together into something you can use (you can download the full code below). Firstly, we need our recursive function to process the node list:

\directlua{
function listnodes(head)
   local hlist = node.id('hlist')
   local vlist = node.id('vlist')
   while head do
      local id = head.id
      nodedispatch[id](head)
      if id == hlist or id == vlist then
         listnodes(head.list)
      end
      head = head.next
   end
end
}

Note that the recursion happens when we see a node type of hlist or vlist because these contain links to further lists which we need to “recurse into”. We now need to glue this into our TeX code which we can do with a simple TeX macro as follows:

\def\dobox#1{\directlua{listnodes(tex.box[#1])}}

An example of using this would be:

\setbox100=\vbox{I love Lua\TeX!}
\dobox{100}
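If you want to see the nesting rather than just visit each node, a variation on the same recursive walk can record the depth as it goes. This is a rough sketch under a few assumptions: describenodes is a made-up name, the id-to-name table is passed in (node.types() in LuaTeX), and the function only relies on the id, next and list fields, so it also runs on plain Lua tables. It is meant for a separate Lua file, since -- comments do not work inside \directlua.

```lua
-- Hypothetical variant of a node list walk: collect one line per node,
-- indented by nesting depth, instead of calling a dispatch function.
-- head  : first node of a list (any value with id, next and list fields)
-- types : id -> name table, e.g. node.types() in LuaTeX
-- depth : current nesting depth (omit on the first call)
-- out   : array collecting the lines (omit on the first call)
function describenodes(head, types, depth, out)
  depth = depth or 0
  out = out or {}
  while head do
    local name = types[head.id] or "unknown"
    out[#out + 1] = string.rep("  ", depth) .. name
    if name == "hlist" or name == "vlist" then
      describenodes(head.list, types, depth + 1, out)
    end
    head = head.next
  end
  return out
end
```

In LuaTeX, describenodes(tex.box[100], node.types()) would return the lines for the box set above, which you can then print or write to the log one by one.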

Download sample code

I’ve put some sample code (in a TeX file) for download here.