From Unicode code points to Arabic text

In Unicode, the range 0600 to 06FF (in hex) is used for Arabic characters. Each value in that range is referred to as a code point: in simple terms, just think of it as a number allocated to an Arabic character. For example, 0630 is allocated to ARABIC LETTER THAL (ذ). In advance of more detailed step-by-step tutorials, I thought I would post a small C program which converts the 256 values 0600 to 06FF into UTF-8 encoding. Each code point in this range takes two bytes in UTF-8, so the following C code creates a 512-byte UTF-8 encoded text file that you can open with BabelPad, for example. You can download the text file and C source here. I would not enter this code into a beauty contest, but it is simple and it works.

#include <stdio.h>

int main(void) {

	unsigned short unicode_min = 0x0600;
	unsigned short unicode_max = 0x06FF;
	unsigned char arabic_utf_byte1;
	unsigned char arabic_utf_byte2;

	FILE * arabic = fopen("arabic.txt", "wb");
	if (arabic == NULL) {
		perror("arabic.txt");
		return 1;
	}

	for(unsigned short p = unicode_min; p <= unicode_max; p++)
	{
		/* Lead byte 110xxxxx: bits 10..6 of the code point */
		arabic_utf_byte1 = (unsigned char)(((p & 0x07c0) >>6) + 0xC0);
		/* Continuation byte 10xxxxxx: bits 5..0 of the code point */
		arabic_utf_byte2 = (unsigned char)((p & 0x003F) + 0x80);
		fwrite(&arabic_utf_byte1,1,1,arabic);
		fwrite(&arabic_utf_byte2,1,1,arabic);
	}
	fclose(arabic);
	return 0;
}
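To make the bit arithmetic in the loop above a little easier to check by hand, here is a minimal sketch of the same two-byte encoding pulled out into a function. The name utf8_encode_2byte and its return convention are my own, for illustration only; they are not part of the program above. It assumes the code point lies in the range 0080 to 07FF, which is the only range where UTF-8 uses exactly two bytes.

```c
#include <stddef.h>

/* Hypothetical helper (not from the original program): encode one code
   point in the two-byte UTF-8 range U+0080..U+07FF into out[0] and out[1].
   Returns the number of bytes written, or 0 if cp is outside that range. */
size_t utf8_encode_2byte(unsigned int cp, unsigned char out[2])
{
    if (cp < 0x0080 || cp > 0x07FF)
        return 0;

    /* Lead byte 110xxxxx: carries bits 10..6 of the code point. */
    out[0] = (unsigned char)(0xC0 | (cp >> 6));

    /* Continuation byte 10xxxxxx: carries bits 5..0 of the code point. */
    out[1] = (unsigned char)(0x80 | (cp & 0x3F));

    return 2;
}
```

Working through 0630 (ARABIC LETTER THAL): the upper bits are 0630 >> 6 = 18, giving a lead byte of C0 + 18 = D8, and the lower six bits are 30, giving a continuation byte of 80 + 30 = B0. So ذ is stored as the two bytes D8 B0. The first and last code points in the range, 0600 and 06FF, come out as D8 80 and DB BF.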

If you open arabic.txt in BabelPad you should see something like the following. Note that, from within BabelPad, you need to switch off complex rendering; otherwise Windows’ Uniscribe shaping engine will be activated. In BabelPad, choose Options --> Simple Rendering. What you see will depend on the font you choose in BabelPad; the following uses the OpenType font “Arabic Typesetting” (shipped with Windows Vista). Of course, some code points in the range do not correspond to actual characters, and for others the Arabic Typesetting font does not have the appropriate glyph: these are shown as a question mark (?) in a box.

FriBidi and HarfBuzz: bidirectional text and text-shaping

It’s been more than a week since my last post, so you may be forgiven for thinking that the blog novelty has worn off, but not so 🙂 Over the last week or so I have been working on building the GNU FriBidi and HarfBuzz engines under Windows using Visual Studio rather than MSYS and MinGW. I have been rather short of time recently, but today I got them both to build. GNU FriBidi provides support for bidirectional text (e.g., Hebrew or Arabic mixed with English) and HarfBuzz provides an OpenType text-shaping engine for complex scripts. In theory, GNU FriBidi and HarfBuzz could be plugged into LuaTeX to provide typesetting solutions for Arabic or any other complex script that the HarfBuzz engine supports. So, the next step is to create a Lua binding for FriBidi and HarfBuzz and figure out the best way to communicate with the LuaTeX engine. Once I have some working code I’ll post some further notes based on what I find out. Stay tuned…

Compiling the FriBidi Unicode bidi algorithm on Windows

I’m exploring the Unicode Bidirectional Algorithm (UBA) and found GNU FriBidi, an implementation of the UBA (in addition to Unicode’s own implementation). The Unicode implementation compiles quite easily with Visual Studio, but GNU FriBidi requires MSYS and some code edits, as documented beautifully on kemovitra.blogspot.com. I know very little about Linux-based builds and usually have to resort to all sorts of edits to get some distributions to build with Visual Studio, but the excellent notes on kemovitra.blogspot.com worked perfectly, first time, so a huge thank you to the author of that blog post.

Brilliant resources for learning Arabic

Just a short post to suggest a site which has some absolutely superb resources for learning Arabic. About a year ago I purchased Basic Arabic Grammar – Part A and have nothing but the greatest praise for the quality of the materials. Having worked my way through all 250 or so exercises and watched all the videos, I urge you to buy it and support this initiative. I have no relationship with arabic-studio.com other than being a totally satisfied customer. Absolutely, utterly outstanding. If you want to see my reviews of Arabic language resources on amazon.co.uk, then click here.