{"id":838,"date":"2011-02-16T23:26:08","date_gmt":"2011-02-16T23:26:08","guid":{"rendered":"http:\/\/www.readytext.co.uk\/?p=838"},"modified":"2013-11-28T09:00:58","modified_gmt":"2013-11-28T09:00:58","slug":"from-unicode-code-points-to-arabic-text","status":"publish","type":"post","link":"https:\/\/www.readytext.co.uk\/?p=838","title":{"rendered":"From Unicode code points to Arabic text"},"content":{"rendered":"<p>In Unicode, the range (in hex) <code>0600<\/code> to <code>06FF<\/code> is used for Arabic characters. Each value in the range <code>0600<\/code> to <code>06FF<\/code> is referred to as a code point. In simple terms, just think of it as a number allocated to an Arabic character. For example, <code>0630<\/code> is allocated to ARABIC LETTER THAL (\u0630). In advance of more detailed step-by-step tutorials, I thought I would post a small C program that converts the 256 values <code>0600<\/code> to <code>06FF<\/code> into UTF-8 encoding. Every code point in this range encodes to exactly two bytes in UTF-8, so the following C code creates a 512-byte UTF-8 encoded text file that you can open with BabelPad, for example. You can download the text file and C source <a href=\"http:\/\/readytext.co.uk\/files\/arabic.zip\">here<\/a>. 
I would not enter this code into a beauty contest, but it is simple and it works.<\/p>\n<pre class=\"brush: cpp; light: false; title: ; toolbar: true; notranslate\" title=\"\">\r\n#include &lt;stdio.h&gt;\r\n\r\nint main(void) {\r\n\r\n\tunsigned short unicode_min = 0x0600;\r\n\tunsigned short unicode_max = 0x06FF;\r\n\tunsigned char arabic_utf_byte1;\r\n\tunsigned char arabic_utf_byte2;\r\n\r\n\tFILE * arabic = fopen(&quot;arabic.txt&quot;, &quot;wb&quot;);\r\n\tif (arabic == NULL) {\r\n\t\tperror(&quot;arabic.txt&quot;);\r\n\t\treturn 1;\r\n\t}\r\n\r\n\tfor(unsigned short p = unicode_min; p &lt;= unicode_max; p++)\r\n\t{\r\n\t\t\/* Lead byte 110xxxxx: bits 6 to 10 of the code point *\/\r\n\t\tarabic_utf_byte1 = (unsigned char)(((p &amp; 0x07C0) &gt;&gt; 6) + 0xC0);\r\n\t\t\/* Continuation byte 10xxxxxx: the low 6 bits of the code point *\/\r\n\t\tarabic_utf_byte2 = (unsigned char)((p &amp; 0x003F) + 0x80);\r\n\t\tfwrite(&amp;arabic_utf_byte1,1,1,arabic);\r\n\t\tfwrite(&amp;arabic_utf_byte2,1,1,arabic);\r\n\t}\r\n\tfclose(arabic);\r\n\treturn 0;\r\n}\r\n<\/pre>\n<p>If you open <code>arabic.txt<\/code> in BabelPad you should see something like the following. Note that within BabelPad you need to switch off complex rendering; otherwise Windows&#8217; Uniscribe shaping engine will be activated. In BabelPad, choose <code>Options --&gt; Simple Rendering<\/code>. What you see will depend on the font you choose in BabelPad; the following screenshot uses the OpenType font &#8220;Arabic Typesetting&#8221; (shipped with Windows Vista). Of course, some code points in the range do not correspond to actual characters, and for others the Arabic Typesetting font does not have the appropriate glyphs: both cases are shown as a question mark (?) in a box.<\/p>\n<p><img decoding=\"async\" src=\"..\/files\/arabicbabelpad.png\" alt=\"\" width=\"100%\" \/><\/p>\n","protected":false},"excerpt":{"rendered":"<p>In Unicode, the range (in hex) 0600 to 06FF is used for Arabic characters. Each value in the range 0600 to 06FF is referred to as a code point. In simple terms, just think of it as a number allocated to an Arabic character. For example, 0630 is allocated to ARABIC LETTER THAL (\u0630). 
In [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[17,10],"tags":[],"class_list":["post-838","post","type-post","status-publish","format-standard","hentry","category-unicode-arabic","category-unicode"],"blocksy_meta":[],"_links":{"self":[{"href":"https:\/\/www.readytext.co.uk\/index.php?rest_route=\/wp\/v2\/posts\/838","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.readytext.co.uk\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.readytext.co.uk\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.readytext.co.uk\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.readytext.co.uk\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=838"}],"version-history":[{"count":19,"href":"https:\/\/www.readytext.co.uk\/index.php?rest_route=\/wp\/v2\/posts\/838\/revisions"}],"predecessor-version":[{"id":3254,"href":"https:\/\/www.readytext.co.uk\/index.php?rest_route=\/wp\/v2\/posts\/838\/revisions\/3254"}],"wp:attachment":[{"href":"https:\/\/www.readytext.co.uk\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=838"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.readytext.co.uk\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=838"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.readytext.co.uk\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=838"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}