Typesetting Arabic with LuaTeX [via a C plug-in] (Part 1)

Introduction

In this new series of posts I’m going to attempt an overview of the topics, concepts, ideas and technologies involved in typesetting Arabic with LuaTeX, via a DLL I’m writing in C. Actually, the C code is largely platform-independent, so it should compile on non-Windows machines… one day, when it’s “finished”…

Up until 2 years ago I was teaching myself Arabic (see my Amazon book reviews) and had reached the point where I wanted to write up my notes and worked exercises: I needed to typeset Arabic and wanted to use a TeX-based solution. Having looked around, I stumbled upon some truly amazing video presentations of Arabic typesetting work being undertaken by Idris Hamid and Hans Hagen, using a tool called LuaTeX: something I’d never heard of. I was truly stunned by what I saw. The quality of their Arabic typesetting was (is) incredible, so I had to find out more. A few hours later I’d worked out that the typesetting was being achieved through Hans Hagen’s ConTeXt package, with LuaTeX as the underlying TeX engine. I’m personally not a user of ConTeXt, but the LuaTeX engine was just so interesting that I had to explore it. Well, two years later I’ve not done any further learning of Arabic, having replaced that activity with plenty of explorations into LuaTeX and a host of other technologies, particularly OpenType and Unicode.

Coming up to the present day, I’ve finally reached the point where I have puzzled out enough detail of the “big picture” to attempt a home-grown Arabic typesetting solution for LuaTeX, but one where most of the “heavy lifting” is done in C, with Lua code to interface with and talk to LuaTeX. For sure, there are ready-made options such as XeTeX or the range of Arabic typesetting solutions created by the TeX community. However, my interest is in creating a solution that will just as easily output SVG or other non-PDF formats, plus allow the automated production of new and novel “typeset structures” and diagrams that will really help with learning Arabic: things I wish had been present in the many books I have bought and studied, but which may just be too time-consuming, or too difficult or expensive, to produce with “conventional” applications. These are big goals, but definitely achievable, albeit over a year or two of further work.

Sample

Just by way of an early example, see the following PDF, as usual, through the Google Docs viewer or download PDF here. The trained eye will certainly spot a few issues that need fixing but so far it’s not looking too bad :-). But there is a long, long way to go yet. The font used is Microsoft’s “Arabic Typesetting” because it contains a substantial number of OpenType features, including cursive positioning, mark-to-base positioning and an enormous range of ligatures, plus many other features which make it an ideal choice of font to work with (in my opinion). In the example (the made-up words) you can see the non-horizontal baseline achieved with cursive positioning, plus the ability to control vowel placement with great flexibility.
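
To give a rough feel for how cursive positioning produces that non-horizontal baseline, here is a minimal sketch in C. It is not the code from my plug-in: the types `Anchor` and `CursiveGlyph` and the function `cursive_attach_y` are made up purely for illustration. It assumes every glyph in the run has both an entry and an exit anchor, that the last glyph (in logical order) sits on the baseline, and it ignores horizontal adjustment entirely. The idea is simply that each glyph is shifted vertically until its exit anchor meets the entry anchor of the glyph that follows it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types for illustration only (not from the actual plug-in). */
typedef struct {
    int x, y;              /* anchor position in font design units */
} Anchor;

typedef struct {
    Anchor entry;          /* point where the previous glyph's exit attaches */
    Anchor exit;           /* point where the next glyph's entry attaches */
    int    y_offset;       /* computed vertical shift of this glyph */
} CursiveGlyph;

/*
 * Compute vertical offsets for a run of cursively joined glyphs, given in
 * logical order. Assumption: the last glyph stays on the baseline and each
 * earlier glyph is shifted so that its exit anchor lands at the same height
 * as the (already shifted) entry anchor of the glyph after it.
 */
void cursive_attach_y(CursiveGlyph *glyphs, size_t n)
{
    assert(glyphs != NULL || n == 0);
    if (n == 0) {
        return;
    }

    glyphs[n - 1].y_offset = 0;            /* last glyph rests on the baseline */

    /* Walk backwards so each glyph sees its successor's final offset. */
    for (size_t i = n - 1; i > 0; --i) {
        glyphs[i - 1].y_offset = glyphs[i].y_offset
                               + glyphs[i].entry.y
                               - glyphs[i - 1].exit.y;
    }
}
```

With three glyphs whose exit anchors are all at height 0 and whose entry anchors are at heights 0, 200 and 150, the offsets accumulate to 350, 150 and 0, so the word steps down towards the baseline as it is read. A real font will also have glyphs with a missing entry or exit anchor, which ends the chain; that case is deliberately left out of this sketch.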

But it’s still far from perfect, I’ll readily admit. I hope I can finish this work, and find the time to complete these articles. I’ll certainly try!