STM publishing: tools, technologies and change A WordPress site for STM Publishing

12 Nov 2013

More progress with HarfBuzz/LuaTeX (update)

Posted by Graham Douglas

Just a short post to share another example from my ongoing work on HarfBuzz/LuaTeX. It is a rather pointless example, showing randomly coloured Arabic glyphs without any code to correctly place mark glyphs (e.g., vowels). Thanks to the power of HarfBuzz and the superb Lua C API (especially C closures and "for loop" iterators), the code to process the Arabic text is about 25 lines of Lua script.

Source of text for typesetting example: BBC Arabic. I don't know what the text says but Google Translate indicated it was neither controversial nor offensive. I hope that is the case!

Download PDF

Update

Just to add an example with mark glyph positioning and random colours. Vowel positioning added about 10 lines of Lua script :-) .

Download PDF

24 Sep 2013

Early results of integrating HarfBuzz into LuaTeX

Posted by Graham Douglas

Building on the work of porting LuaTeX to build on Windows, I decided to explore adding HarfBuzz to provide Arabic shaping. The excellent HarfBuzz API lends itself to some interesting solutions, so here's a quick post to show some early results.

Source of text for typesetting fully vowelled Arabic examples: http://en.wikipedia.org/wiki/Arabic_language#Studying_Arabic

Download PDF

21 Sep 2013

Exploring LuaTeX nodes and boxes with Graphviz on Windows

Posted by Graham Douglas

If you are interested in exploring the inner structures of TeX boxes created in LuaTeX, you can do this very conveniently using the following free resources:

  • viznodelist.lua by Patrick Gundlach. This is an excellent Lua script that generates a text file containing a graph representation of the structures and nodes inside a \vbox{...} or \hbox{...}. The file output by viznodelist.lua can be opened and displayed using GVEdit (see below).
  • GVEdit is part of the Graphviz distribution and you can download a Windows installer from the Graphviz website.

Installing Graphviz should be straightforward using the MSI installer provided. To use viznodelist.lua you'll need to put the file in the appropriate place within your texmf tree. To find the right location you may need to look into your texmf.cnf file to examine the LUAINPUTS variable – which typically looks something like this:

LUAINPUTS = .;$TEXMF/scripts/{$progname,$engine,}/{lua,}//;$TEXMF/tex/{luatex,plain,generic,}//

For example, if your texmf folder is located at h:\texmf, you could put viznodelist.lua in the folder h:\texmf\scripts\lua.
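
To see which search-path elements the first LUAINPUTS entry stands for, here is a small C sketch that expands the {a,b,...} groups. It is only an illustration: expand_braces, MAX_RESULTS and MAX_LEN are names made up for this example, nested braces are not handled, variables such as $TEXMF are left unexpanded, and the order of the results is not claimed to match kpathsea's.

```c
#include <stdio.h>
#include <string.h>

#define MAX_RESULTS 32
#define MAX_LEN 256

/* Expand the first {a,b,...} group in `pattern`, then recurse on the
   result until no braces remain. Fully expanded strings are appended
   to `out`; the updated count is returned. Nested braces are not
   supported in this sketch. */
static int expand_braces(const char *pattern, char out[][MAX_LEN], int count)
{
    const char *lb = strchr(pattern, '{');
    const char *rb = lb ? strchr(lb, '}') : NULL;
    if (!lb || !rb) {                        /* nothing left to expand */
        if (count < MAX_RESULTS)
            snprintf(out[count++], MAX_LEN, "%s", pattern);
        return count;
    }
    const char *alt = lb + 1;                /* start of current alternative */
    for (;;) {
        const char *end = alt;
        while (end < rb && *end != ',')
            end++;
        char next[MAX_LEN];
        snprintf(next, sizeof next, "%.*s%.*s%s",
                 (int)(lb - pattern), pattern,   /* text before '{' */
                 (int)(end - alt), alt,          /* this alternative, may be empty */
                 rb + 1);                        /* text after '}' */
        count = expand_braces(next, out, count);
        if (end == rb)
            break;                           /* that was the last alternative */
        alt = end + 1;
    }
    return count;
}
```

Calling expand_braces("$TEXMF/scripts/{$progname,$engine,}/{lua,}//", out, 0) fills out with six strings. The last two come from the empty alternatives: $TEXMF/scripts//lua// and $TEXMF/scripts////. As I understand kpathsea, a trailing // asks for a recursive search of subdirectories, which is how a file placed in h:\texmf\scripts\lua can be found.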

Here's an ultra-minimal plain LuaTeX example:

\directlua{require("viznodelist")}
\setbox1001= \vbox{\hsize=50 mm Hello \hbox{Hello}}
\directlua{viznodelist.nodelist_visualize(1001,"h:/texmf/mybox.gv")}
\bye

The above code will parse the contents of box 1001 and output a file called mybox.gv which you can open in GVEdit to view a graph of the node structures in box 1001. The following screenshot displays this:

GVEdit can export the graph in numerous formats including PDF, PNG etc.
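
If you have not seen a .gv file before, it is plain text in Graphviz's DOT language. The C sketch below writes a tiny DOT graph for a hand-built tree. It is not viznodelist's real output, which records the actual node types and fields; toy_node, write_edges and write_dot are hypothetical names used only for this illustration.

```c
#include <stdio.h>

/* Hypothetical, much-simplified stand-in for a TeX node. */
typedef struct toy_node {
    const char *label;               /* text shown in the graph; must be unique */
    const struct toy_node *child;    /* first node of an enclosed list, or NULL */
    const struct toy_node *next;     /* next sibling in the same list, or NULL */
} toy_node;

/* Write one DOT edge for every parent-to-child link below `parent`.
   Returns the number of edges written. */
static int write_edges(FILE *out, const toy_node *parent)
{
    int edges = 0;
    for (const toy_node *n = parent->child; n; n = n->next) {
        fprintf(out, "  \"%s\" -> \"%s\";\n", parent->label, n->label);
        edges += 1 + write_edges(out, n);
    }
    return edges;
}

/* Wrap the edges in a digraph block, which is what GVEdit expects. */
static int write_dot(FILE *out, const toy_node *root)
{
    fprintf(out, "digraph box {\n");
    int edges = write_edges(out, root);
    fprintf(out, "}\n");
    return edges;
}
```

Building a small tree (a vbox holding some text and an hbox, the hbox holding more text) and passing it to write_dot with an open file produces three edges inside a digraph block, which GVEdit can display.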

Filed under: Examples, LuaTeX

30 Aug 2013

Happy Days: A fully native Windows Build of LuaTeX using Visual Studio

Posted by Graham Douglas

Well, today I finally achieved my ambition to build LuaTeX using Visual Studio. It took me about 25 hours of my evenings to do it but at long last I can now step through the code with a nice visual debugger to begin to understand more about this marvellous TeX engine. It wasn't trivial but neither was it quite as complex as I'd feared. Simply Happy Days! Here's a screenshot of it in action.

23 Nov 2012

Adding a UTF-8-capable regular expression library to LuaTeX

Posted by Graham Douglas

Introduction

In this post I'm going to sketch out adding the free PCRE C library to LuaTeX through a DLL and outline how you can get PCRE to call LuaTeX! The following is just an outline of an experiment, not a tutorial on PCRE, and I've not tried this in a production environment. So, do please undertake all necessary testing and due diligence in your own code!

PCRE: Perl Compatible Regular Expressions

PCRE is a mature C library which provides a very powerful regular expression engine. It is also capable of working with UTF-8 encoded strings, which is, of course, very useful because LuaTeX uses UTF-8 input. I'm not going to cover the entire PCRE build process in this post because, frankly, it'll take too long. But in outline...

Building PCRE as a static library (.lib)

  1. I used CMake to create a Visual Studio 2008 project via the PCRE-supplied CMakeLists.txt file. Using the CMake tool you can set the appropriate compile-time flags for UTF-8 support: PCRE_SUPPORT_UTF and PCRE_SUPPORT_UNICODE_PROPERTIES. The latter is very useful for searching UTF-8 strings based on their Unicode character properties. Full details are in the PCRE documentation.
  2. After you finish configuring the PCRE build, and have selected your build environment, press Generate and CMake will output a complete Visual Studio project that you can open and start working on. Wonderful!
  3. Getting PCRE to build as a static library was fine, but I did have a few hassles getting the DLL I was building to link correctly against it. It took me a bit of time to figure out which additional PCRE preprocessor directives I needed to set in the DLL C code to ensure everything was #define'd properly.
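
To see why the UTF-8 flags in step 1 matter, note that a byte-oriented engine and a UTF-8-aware one disagree about how long a string is. The following plain C sketch (no PCRE calls; utf8_codepoints is a name made up for this example) counts code points by skipping UTF-8 continuation bytes, and assumes the input is valid UTF-8.

```c
#include <stddef.h>

/* Count Unicode code points in a NUL-terminated UTF-8 string.
   Continuation bytes have the bit pattern 10xxxxxx, so every byte that
   does NOT match that pattern starts a new code point.
   Assumes the input is valid UTF-8. */
static size_t utf8_codepoints(const char *s)
{
    size_t n = 0;
    for (; *s; s++)
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    return n;
}
```

For the four-letter Arabic word encoded as the bytes D8 B3 D9 84 D8 A7 D9 85, strlen reports 8 while utf8_codepoints reports 4. A regex engine working in bytes would treat each letter as two separate units.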

Building a DLL for LuaTeX

I wrote a very brief overview of building DLLs for LuaTeX in an earlier post, so I won't repeat the details here. Instead, I'll give a summary indicating how you can get PCRE to call LuaTeX. One word of advice: PCRE comes with a lot of documentation and you'll need to read through it very carefully! Asking PCRE to call LuaTeX sounds strange, but you can do it because PCRE lets you register a callback function that it calls at marked points in the pattern while it is matching a string. Perl has a similar ability to execute Perl code during matching. From the PCRE documentation:

"PCRE provides a feature called 'callout', which is a means of temporarily passing control to the caller of PCRE in the middle of pattern matching. The caller of PCRE provides an external function by putting its entry point in the global variable pcre_callout."

Calling LuaTeX

OK, so how do we do that? There are two parts to this story: create a Lua function you want to call from C and create the C function which calls the Lua function.

  1. From within LuaTeX, use \directlua{...} to create a simple Lua function printy that we are going to call from PCRE. This Lua function takes a string and sends it to LuaTeX via tex.print(). In these examples I sent LuaTeX a simple text string "Yo! I was called!", which LuaTeX then typeset. Of course, you could also send LuaTeX the string that was matched by PCRE!
           \directlua{
                  function printy (str)
                         tex.print(str)
                  end
           }
    
  2. The next part is to create the C code to call a Lua function. This C function is the callout that PCRE will call when it matches a string.
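           /* Callout invoked by PCRE at each (?C) point in the pattern. */
           int mycallout(pcre_callout_block *cb){
                  lua_State *L = cb->callout_data;   /* supplied by the caller */
                  if (L){
                         lua_getglobal(cb->callout_data, "printy");
                         if (!lua_isfunction(L, -1)) {
                                lua_pop(L, 1);   /* discard the non-function value */
                                return 0;
                         }
                         lua_pushstring(L, "Yo! I was called!");   /* push 1st argument */
                         /* Now make the call to printy with 1 argument and 0 results */
                         if (lua_pcall(L, 1, 0, 0) != 0) {
                                /* report your error, then pop the error message */
                                lua_pop(L, 1);
                                return 0;
                         }
                  }
                  return 0;   /* zero: matching proceeds as normal */
           }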
    

    A few points here are worth noting.

    • From the PCRE documentation:

      "The external callout function returns an integer to PCRE. If the value is zero, matching proceeds as normal. If the value is greater than zero, matching fails at the current point, but the testing of other matching possibilities goes ahead, just as if a lookahead assertion had failed. If the value is less than zero, the match is abandoned, the matching function returns the negative value"

    • The lua_State variable, *L, is passed in via a mechanism I'll outline below.
    • The line lua_getglobal(cb->callout_data, "printy") does the main work of pushing the value of the global variable printy onto Lua's stack. Of course, in effect this is a pointer to the function we defined in LuaTeX, and which we call through lua_pcall(...). Further details are in the Lua documentation.
    • The above code does near-zero error checking; it is purely to demonstrate the ideas!

Other PCRE bits and pieces

There are a few other points to consider, namely how do you set up the callout and how do you pass lua_State *L to the callout? I'm not going to explain in great detail how all these parts hang together in a full application, simply point out some key pieces.

  1. You have to set the PCRE global variable pcre_callout, a function pointer, to your callout function. Simply, pcre_callout = mycallout; Yes, it does work.
  2. Before you can start searching, you need to "compile" your regular expression pattern. In the code below, re represents our compiled regular expression pattern. Note that you must use the PCRE_UTF8 option if you are searching UTF-8 encoded text.
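           re = pcre_compile(pattern,
                             PCRE_UTF8|PCRE_UCP,   /* options */
                             &err_msg,             /* receives the error message */
                             &err,                 /* receives the error offset */
                             NULL);                /* use the default character tables */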
    
  3. Note, to use PCRE callouts you need to use the appropriate syntax in your regular expression; from the PCRE documentation, "Within a regular expression, (?C) indicates the points at which the external function is to be called." Once you have compiled your search pattern, and done your error checking, you need to run the search engine using the compiled pattern and your target string (s) in the code below.
  4. The next step is to create a pointer to a pcre_extra struct. This struct has a field called callout_data, a pointer in which you can store whatever you want to pass into the mycallout function: here, I'm setting it to the lua_State variable, L. PCRE hands that pointer back in the callout_data field of the pcre_callout_block it passes to the callout, so each time the callout function is called, the lua_State variable, L, will be available for our use! Clearly, you'll need to do this from within the appropriate function you call from LuaTeX. Once this is done you are ready to begin your searching using pcre_exec(...).

           pcre_extra *p;
           p = (pcre_extra*) malloc(sizeof(pcre_extra));
           memset(p, 0, sizeof(pcre_extra));
           p->callout_data = L;                  /* handed back to mycallout */
           p->flags = PCRE_EXTRA_CALLOUT_DATA;   /* tell PCRE that callout_data is set */
           res = pcre_exec(re,
                           p,
                           s,
                           len,
                           0,
                           0,
                           offsets,
                           OVECMAX);
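
The two ideas above, an opaque data pointer handed through to the callout and a return value that steers the search, can be tried without PCRE at all. The sketch below is a toy scanner in plain C, not PCRE's API: toy_callout_block, toy_search, TOY_NOMATCH and the two example callouts are hypothetical names, and the scanner only looks for a literal substring. It fires the callout once per candidate match and treats the result the way the PCRE documentation quoted earlier describes: zero to carry on, a positive value to fail at this point and try elsewhere, a negative value to abandon the search.

```c
#include <stddef.h>
#include <string.h>

#define TOY_NOMATCH (-1)

/* Hypothetical stand-in for a callout block: just the fields used here. */
typedef struct {
    int   position;        /* offset in the subject where the callout fired */
    void *callout_data;    /* opaque pointer supplied by the caller */
} toy_callout_block;

typedef int (*toy_callout_fn)(toy_callout_block *);

/* Look for the literal `needle` in `subject`, firing `callout` for every
   candidate. Returns the accepted offset, TOY_NOMATCH, or the negative
   value returned by the callout. */
static int toy_search(const char *subject, const char *needle,
                      toy_callout_fn callout, void *user_data)
{
    size_t slen = strlen(subject), nlen = strlen(needle);
    for (size_t i = 0; i + nlen <= slen; i++) {
        if (strncmp(subject + i, needle, nlen) != 0)
            continue;
        toy_callout_block cb = { (int)i, user_data };
        int rc = callout ? callout(&cb) : 0;
        if (rc < 0)
            return rc;          /* abandon the whole search */
        if (rc == 0)
            return (int)i;      /* accept this candidate */
        /* rc > 0: this candidate fails, keep trying later offsets */
    }
    return TOY_NOMATCH;
}

/* Example callout: counts its calls through the data pointer and
   rejects the first candidate it sees. */
static int count_and_skip_first(toy_callout_block *cb)
{
    int *calls = cb->callout_data;   /* void * converts implicitly in C */
    return (*calls)++ == 0 ? 1 : 0;
}

/* Example callout: abandons the search with an arbitrary negative code. */
static int abandon(toy_callout_block *cb)
{
    (void)cb;
    return -9;
}
```

With an int counter passed as user_data, searching "abcabc" for "abc" using count_and_skip_first skips the match at offset 0 and returns 3, leaving the counter at 2. In the real code the counter's place is taken by the lua_State pointer.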
    

Summary

PCRE is a marvellous and powerful C library, with copious documentation that you'll need to read very carefully! The ability to provide LuaTeX with a UTF-8-enabled regex engine could open the way to some useful applications, particularly when combined with LuaTeX's own callback mechanism. One example is the process_input_buffer callback, which allows you to change the contents of the line input buffer just before LuaTeX actually starts looking at it. The mind boggles at the possibilities!

19 Nov 2012

Browsing LuaTeX source with NetBeans

Posted by Graham Douglas

Introduction

It's been a long time since I posted anything on this blog, mainly because my job has been keeping me very busy. As time permits I've been reading parts of the LuaTeX source code in an attempt to better understand how it all works, cross-referencing the source code to explanations in the LuaTeX Reference. A couple of days ago I stumbled on the NetBeans IDE, a free Integrated Development Environment. I was interested to see that NetBeans has a Subversion Checkout Wizard (i.e., built-in SVN capabilities), so you can check out a copy of the LuaTeX code repository and import it directly into NetBeans as a new project. So, I downloaded NetBeans (with C/C++ support) and checked out a copy of the LuaTeX code base, directly from within NetBeans. After completing the download, NetBeans automatically imported the LuaTeX code to create a new project. Very nice!

I have not yet tried to build LuaTeX using NetBeans (because I need to understand more about the build process), but I have found that it provides excellent tools to search and browse the source code, allowing you to very quickly explore and probe some of the deeper mysteries of TeX.

Tip: tell NetBeans about .w files

Much of the LuaTeX code base is written in CWEB (integrated C source code and documentation); consequently, many of the source files have a .w extension. You'll need to configure NetBeans to tell it about .w files: see Tools --> Options --> Miscellaneous.

Here's a screenshot showing a search for the build_page() function, part of TeX's page-building machinery, showing you where and when TeX exercises the page builder.

Filed under: LuaTeX