{"id":1449,"date":"2011-11-15T11:40:02","date_gmt":"2011-11-15T11:40:02","guid":{"rendered":"http:\/\/www.readytext.co.uk\/?p=1449"},"modified":"2013-11-28T09:03:05","modified_gmt":"2013-11-28T09:03:05","slug":"lua-code-to-process-a-luatex-node-list","status":"publish","type":"post","link":"https:\/\/www.readytext.co.uk\/?p=1449","title":{"rendered":"Lua code to process a LuaTeX node list"},"content":{"rendered":"<h1>Introduction<\/h1>\n<p>LuaTeX provides access to the deepest internal structures of the TeX engine: <em>nodes<\/em>, the fundamental building blocks created and assembled by the typesetting engine. I won&#8217;t try to explain nodes in detail here but instead refer you to an <a href=\"http:\/\/wiki.luatex.org\/index.php\/TeX_without_TeX\">excellent article on the LuaTeX wiki<\/a>. <\/p>\n<p>If you are interested in exploring node structures, for example the internal structure of a vbox or hbox, you can use the following code to get started. It does not present anything radically new but simply provides some boilerplate code that you can expand to suit your own interests. For example, I used it to convert a node list to a PostScript representation of a paragraph.<\/p>\n<p>Here is an example representation of a node structure.<\/p>\n<p><iframe src=\"http:\/\/docs.google.com\/gview?url=http:\/\/readytext.co.uk\/files\/samplenodelist.pdf&amp;embedded=true\" style=\"width:100%; height:400px;\" frameborder=\"0\"><\/iframe><\/p>\n<blockquote><p><strong>How to build these node diagrams?<\/strong> I built this diagram using a DLL I wrote for LuaTeX: a customised build of the <a href=\"http:\/\/www.graphviz.org\/\">graphviz library<\/a> with a Lua binding using the excellent <a href=\"http:\/\/luagraph.luaforge.net\/\">LuaGRAPH<\/a> library. 
I also used <a href=\"http:\/\/www.luatex.de\/\">Patrick Gundlach<\/a>&#8217;s Lua code <a href=\"https:\/\/gist.github.com\/556247\">LuaTeX nodelist visualization<\/a> to create the data for graphviz to process (Thanks Patrick!). The node graphs were converted to EPS (via graphviz) and PDFs were generated on the fly using GhostScript in a DLL with a Lua binding. You can of course use Patrick&#8217;s code to generate the graphviz data and run graphviz via the command line or via system\/shell calls using Lua. I just prefer to have everything callable from DLLs. <\/p><\/blockquote>\n<h1>Basic background information<\/h1>\n<p>Internally, LuaTeX defines quite a number of different node types; for a full list refer to the <a href=\"http:\/\/www.luatex.org\/manuals\/luatexref-t-070.pdf\">LuaTeX Reference Manual<\/a>. You can generate a list of the node types using the LuaTeX API call <\/p>\n<pre class=\"brush: plain; light: false; title: ; toolbar: true; notranslate\" title=\"\">node.types()<\/pre>\n<p> which returns a table. <\/p>\n<p>For example:<\/p>\n<pre class=\"brush: plain; light: false; title: ; toolbar: true; notranslate\" title=\"\">\r\n\\directlua{\r\nfor i,v in pairs(node.types()) do\r\n   print(i,v)\r\nend\r\n}\r\n<\/pre>\n<p>If you look at the sample node structure diagram above you can see that node lists are a nested linked list structure. To process this data structure you need to &#8220;walk over&#8221; the node list with a recursive function. Recursion is needed because TeX builds nested data structures internally: it lets you have boxes within boxes within boxes&#8230; These nested structures have to be traversed recursively. So, the idea is that you start with the first node in the list and then visit and examine each node in turn. 
As we&#8217;ve noted, there are quite a few different types of node, so the &#8220;action&#8221; you may want to perform for each node will depend on the type (id) of that node.<\/p>\n<p>The way I&#8217;ve chosen to do this is to have a set of functions and to execute the appropriate function when you see a node of a particular type. One way to do this is with a table indexed by node id, where each value is the function to call for nodes of that type. For example, suppose we have a function called &#8220;processnode&#8221;:<\/p>\n<pre class=\"brush: plain; light: false; title: ; toolbar: true; notranslate\" title=\"\">\r\n\\directlua{\r\nfunction processnode(node)\r\n   print(&quot;processnode called&quot;)\r\nend\r\n}\r\n<\/pre>\n<p>The argument to the function, &#8220;node&#8221;, is the particular node you are looking at. Using the LuaTeX API function <code>node.types()<\/code> you can quickly populate a table with code such as this:<\/p>\n<pre class=\"brush: plain; light: false; title: ; toolbar: true; notranslate\" title=\"\">\r\n\\directlua {\r\n   nodedispatch={}\r\n   for i,v in pairs(node.types()) do\r\n      nodedispatch&#x5B;i]=processnode\r\n   end\r\n}<\/pre>\n<p>Here, <code>nodedispatch<\/code> is our table indexed by node type, with each value set to a function called <code>processnode<\/code>. Calling the <code>processnode<\/code> function is very easy. Suppose you have a node id value <code>idvalue<\/code>; then all you need to do is something like this:<\/p>\n<p><code>nodedispatch[idvalue](node)<\/code><\/p>\n<p><code>nodedispatch[idvalue]<\/code> returns the function and <code>(node)<\/code> calls the function with your <code>node<\/code> object.<\/p>\n<h2>And whatsits too!<\/h2>\n<p>One very important node type is the &#8220;whatsit&#8221; (see the <a href=\"http:\/\/www.luatex.org\/manuals\/luatexref-t-070.pdf\">LuaTeX Reference Manual<\/a>). 
TeX&#8217;s whatsits all have the same node id, but the different kinds of whatsit are distinguished by the subtype field of the main whatsit node. Similar to <code>node.types()<\/code>, LuaTeX provides a handy API function <code>node.whatsits()<\/code> which we can use to build another function table, this time for processing whatsits.<\/p>\n<pre class=\"brush: plain; light: false; title: ; toolbar: true; notranslate\" title=\"\">\r\n\\directlua {\r\n   whatsitdispatch={}\r\n   for i,v in pairs(node.whatsits()) do\r\n      whatsitdispatch&#x5B;i]=processwhatsit\r\n   end\r\n}<\/pre>\n<p>Here, <code>processwhatsit<\/code> is another function to process whatsits.<\/p>\n<h1>Wrapping it all together<\/h1>\n<p>The above gives a brief summary of the approach but we now need to hook this all together into something you can use (you can download the full code below). First, we need our recursive function to process the node list:<\/p>\n<pre class=\"brush: plain; light: false; title: ; toolbar: true; notranslate\" title=\"\">\r\n\\directlua{\r\nfunction listnodes(head)\r\n\twhile head do\r\n\t\tlocal id = head.id\r\n\t\tnodedispatch&#x5B;id](head)\r\n\t\tif id == node.id('hlist') or id == node.id('vlist') then\r\n\t\t\tlistnodes(head.list)\r\n\t\tend\r\n\t\thead = head.next\r\n\tend\r\nend\r\n}\r\n<\/pre>\n<p>Note that the recursion happens when we see a node type of <code>hlist<\/code> or <code>vlist<\/code> because these contain links to further lists which we need to &#8220;recurse into&#8221;. 
We now need to glue this into our TeX code which we can do with a simple TeX macro as follows:<\/p>\n<pre class=\"brush: plain; light: false; title: ; toolbar: true; notranslate\" title=\"\">\r\n\\def\\dobox#1{\\directlua{listnodes(tex.box&#x5B;#1])}}\r\n<\/pre>\n<p>An example of using this would be:<\/p>\n<pre class=\"brush: plain; light: false; title: ; toolbar: true; notranslate\" title=\"\">\r\n\\setbox100=\\vbox{I love Lua\\TeX!}\r\n\\dobox{100}\r\n<\/pre>\n<h1>Download sample code<\/h1>\n<p>I&#8217;ve put some sample code (in a TeX file) for <a href=\"http:\/\/readytext.co.uk\/files\/nodelister.zip\">download here<\/a>. <\/p>\n","protected":false},"excerpt":{"rendered":"<p>Introduction LuaTeX provides access to the deepest internal structures of the TeX engine: nodes, the fundamental building blocks created and assembled by the typesetting engine. I won&#8217;t try to explain nodes in detail here but instead refer you to an excellent article on the LuaTeX wiki. If you are interested to explore node structures, for 
[&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[15,3],"tags":[],"class_list":["post-1449","post","type-post","status-publish","format-standard","hentry","category-examples","category-luatex"],"blocksy_meta":[],"_links":{"self":[{"href":"https:\/\/www.readytext.co.uk\/index.php?rest_route=\/wp\/v2\/posts\/1449","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.readytext.co.uk\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.readytext.co.uk\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.readytext.co.uk\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.readytext.co.uk\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=1449"}],"version-history":[{"count":25,"href":"https:\/\/www.readytext.co.uk\/index.php?rest_route=\/wp\/v2\/posts\/1449\/revisions"}],"predecessor-version":[{"id":3273,"href":"https:\/\/www.readytext.co.uk\/index.php?rest_route=\/wp\/v2\/posts\/1449\/revisions\/3273"}],"wp:attachment":[{"href":"https:\/\/www.readytext.co.uk\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=1449"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.readytext.co.uk\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=1449"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.readytext.co.uk\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=1449"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}