MetaPost: Direct to PDF via MPlib

Introduction

Using the Cairo graphics library (under Windows/Visual Studio) I have, with some caveats, been able to create a direct-to-PDF backend for MetaPost via the brilliant MPlib C library. Of course, Cairo does not support the CMYK colour space which is a real shame, despite there being a lot of discussion on the need for that. I might look at using LibHaru or possibly PoDoFo, both of which I’ve managed to build on Windows – although I found PoDoFo somewhat difficult to build as a native Windows library. In addition, I have not yet added support for including text in the MetaPost graphics which is, of course, a pretty big omission! That’s on the “TODO” list. An example PDF is included in this post, based on the MetaPost code available on this site. If you look at the example PDF you will see it is created with Cairo 1.12.16, the latest release available at the time I wrote this post (25 October 2014).

Download PDF

Quick overview of the process

At the moment, the PDF backend seems to work well, at least with the MetaPost code I’ve tried it with (minus text, of course!). The lack of CMYK support in Cairo is a nuisance and at the moment I do a very simple, and wholly inadequate, “conversion” of CMYK to RGB, which really makes me cringe. Perhaps I might put in a “callback” feature to use other PDF libraries at the appropriate points in my C code.

MPlib itself is a superb C library and the API documentation (version 1.800) that’s currently available was a helpful start but, as a very non-expert MetaPost user, I did need to resort to John Hobby’s original work in order to understand just a little more about some MetaPost internals. In writing the PDF backend I pretty much had to go through the PostScript backend and replace PostScript output with the appropriate Cairo API calls. The trickiest part, at least for me, was implementing management of the graphics state (as MetaPost sees it). In the PostScript backend the graphics state is managed internally by the MetaPost interpreter (MPlib). In the end, I chose to use MPlib’s ability to register a userdata pointer (void*) with the MetaPost interpreter and keep the graphics state there. I can’t quite recall why I chose to externalise the graphics state code but I think it was to give me a bit more flexibility; either way, so far it basically works well.

I chose to build MPlib as a static Windows .lib file – no particular reason, just that’s what I prefer to do – although building a DLL is no more difficult. Much of MPlib is released as a set of CWEB files so you will need to extract the C code via CTANGLE.EXE. I use Windows and Visual Studio so, not surprisingly, I found that the MPlib C code would not compile immediately “out of the box”, but a few minor (pretty trivial) adjustments to the header files (and some manual #defines) soon resolved the problems and it compiled fine after that.
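Returning to the colour handling mentioned at the start of this section: the kind of simple CMYK-to-RGB “conversion” described there can be sketched as below. This is the common textbook formula with no colour management at all, and it is only an assumption that the backend does something similar; the names rgb_color and naive_cmyk_to_rgb are made up for this illustration.

```c
/* Hypothetical helper type: RGB components in the range 0.0 to 1.0. */
typedef struct {
    double r, g, b;
} rgb_color;

/* Naive, device-independent CMYK -> RGB conversion.
 * All inputs are expected in the range 0.0 to 1.0.
 * No ICC profiles, no black generation: just the textbook formula. */
static rgb_color naive_cmyk_to_rgb(double c, double m, double y, double k)
{
    rgb_color out;
    out.r = (1.0 - c) * (1.0 - k);
    out.g = (1.0 - m) * (1.0 - k);
    out.b = (1.0 - y) * (1.0 - k);
    return out;
}
```

The three components could then be handed to cairo_set_source_rgb(cr, out.r, out.g, out.b). The weakness is obvious: distinct CMYK values collapse onto the same RGB triple, and the original CMYK values never reach the PDF.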

A little deeper

Assuming you have a working compilation of MPlib, how do you actually use it? I won’t repeat the information available in the MPlib API documentation but will give a brief summary of additional considerations that might be helpful to others. Firstly, in my implementation I create an instance of the MP interpreter like this:

	MP mp = init_metapost((void*)create_mp_graphics_state());
	if ( ! mp ) exit ( EXIT_FAILURE ) ;

where create_mp_graphics_state() is a function that creates a new graphics state; its return value is cast to void* and registered as the userdata item stored in the MPlib instance – see the code for init_metapost(void* userdata) below (note: this is a work-in-progress and the error checking is very minimal! :-)). Provided the initialization succeeds, you will get a new MetaPost interpreter instance returned to you. As part of the initialization you have to provide a callback that tells MetaPost how to find input files – my callback is called file_finder and uses recursive directory searching: no kpathsea involved at all. One very important setting in MP_options is math_mode, which affects how MetaPost performs its internal calculations: later versions of MPlib (after 1.800) support all 4 of the possible options. As part of the initialization I also preload the plain.mp macro collection.

MP init_metapost(void* userdata)
{

	MP mp;
	MP_options * opt = mp_options () ;
	opt -> command_line = NULL;
	opt -> noninteractive = 1 ;
	opt->find_file = file_finder;
	opt->print_found_names = 1;
	opt->userdata = userdata;

	/*
	typedef enum{
	mp_math_scaled_mode= 0,
	mp_math_double_mode= 1,
	mp_math_binary_mode= 2,
	mp_math_decimal_mode= 3
	}mp_math_mode;
	*/

	opt->math_mode = mp_math_scaled_mode;
	opt->ini_version = 1;
	mp = mp_initialize ( opt ) ;
	if ( ! mp ) 
		//exit ( EXIT_FAILURE )
		return NULL;
	else
	{
		char * input= "let dump = endinput ; input plain; ";
		mp_execute(mp, input, strlen(input));
		mp_run_data * res = mp_rundata(mp);
		
		if(mp->history > 0)
		{
			printf("Error text (%s)\n", res->term_out.data);
			return NULL;
		}
		else{
		
			return mp;
		}
	}

}

Got a working instance, now what?

Once you have a working MP instance the next task is, of course, to feed it some MetaPost code (using mp_execute(mp, your_code, strlen(your_code));) and check whether MetaPost successfully interpreted your_code. I’m not going to give full details of the checks you need to perform as this is pretty routine and the API documentation contains enough help already. In essence, if MPlib was able to run your MetaPost code successfully, it stores the individual graphics (produced from your_code) as a linked list of so-called edge structures (mp_edge_object). Each edge structure is a graphic that you want to output and results from the successful execution of the code contained in one beginfig(x) ... endfig; pair. In turn, each edge structure (individual graphic to output) is itself made up from smaller building blocks: 8 types of fundamental graphics object (mp_graphic_object). Each mp_graphic_object has a type to tell you what sort of graphic object it is, so you can call the appropriate function to render it – as the equivalent PostScript, PDF, PNG, SVG etc.
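To make the shape of that data concrete, here is a toy model of the traversal: an outer list of figures, each holding a list of typed objects that is dispatched on with a switch. The types and names (toy_figure, toy_object, toy_walk) are stand-ins invented for this sketch and are not the real MPlib structures; the NULL-terminated inner list simply mirrors the rendering loop shown later in this post.

```c
#include <stddef.h>

/* Hypothetical stand-ins for illustration only; NOT the MPlib types. */
typedef enum { TOY_FILL = 0, TOY_STROKE = 1, TOY_TEXT = 2 } toy_type;

typedef struct toy_object {
    toy_type type;              /* what kind of graphic object this is */
    struct toy_object *next;    /* NULL marks the end of the figure body */
} toy_object;

typedef struct toy_figure {
    toy_object *body;           /* first object of this figure */
    struct toy_figure *next;    /* next figure, or NULL */
} toy_figure;

/* Walk every figure and every object in it, counting objects per type.
 * A real backend would call a rendering function in each case instead.
 * Returns the number of figures visited. */
static int toy_walk(const toy_figure *fig, int counts[3])
{
    int nfigs = 0;
    for (; fig != NULL; fig = fig->next) {
        const toy_object *p;
        nfigs++;
        for (p = fig->body; p != NULL; p = p->next) {
            switch (p->type) {
            case TOY_FILL:   counts[TOY_FILL]++;   break;
            case TOY_STROKE: counts[TOY_STROKE]++; break;
            case TOY_TEXT:   counts[TOY_TEXT]++;   break;
            }
        }
    }
    return nfigs;
}
```

The design point is that the output format is decided entirely inside the switch: swapping PostScript for PDF means swapping what each case calls, not how the lists are walked.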

In summary

If your MetaPost interpreter instance is called, say, mp, then to gain access to the linked list of edge structures you do something like this:

 
mp_run_data * res = mp_rundata(mp);
mp_edge_object* graphics = res->edges;

Note that the edge structures form a simple linked list. As the loop below shows, the components within each individual edge structure (the mp_graphic_object objects) are also walked until a NULL link is reached; as far as I can tell, it is the knots of each path that form a circularly-linked list, so you have to be careful to check when you get back to the start of a path: see the API docs for an example. In closing, here’s the loop from my code to process an individual edge structure into PDF – not including all the additional functions to process the various types of the mp_graphic_object objects.

int draw_mp_graphic_on_pdf(mp_edge_object* single_graphic, cairo_t *cr)
{
    mp_graphic_object* p;
    MP mp = single_graphic->parent;

    p = single_graphic->body;

    // Inherited this weirdness from core MP engine...
    init_graphics_state(mp, 0);

    // Here we are looping over all the objects in a single graphic
    // resulting from a beginfig(x) ... endfig pair
    while (p != NULL)
    {
        mp_gr_fix_graphics_state(mp, p, cr);
        switch (gr_type(p))
        {
            case mp_fill_code:
            {
                if (gr_pen_p((mp_fill_object*)p) == NULL)
                {
                    //mp_dump_solved_path(gr_path_p((mp_fill_object*)p));
                    cairo_gr_pdf_fill_out(mp, gr_path_p((mp_fill_object*)p), cr);
                }
                else if (pen_is_elliptical(gr_pen_p((mp_fill_object*)p)))
                {
                    //mp_dump_solved_path(gr_path_p((mp_fill_object*)p));
                    cairo_gr_stroke_ellipse(mp, p, true, cr);
                }
                else
                {
                    //mp_dump_solved_path(gr_path_p((mp_stroked_object*)p));
                    cairo_gr_pdf_fill_out(mp, gr_path_p((mp_fill_object*)p), cr);
                    cairo_gr_pdf_fill_out(mp, gr_htap(p), cr);
                }

                if (((mp_fill_object*)p)->post_script != NULL)
                {
                    // just something I'm experimenting with
                    //ondraw(cr, ((mp_fill_object*)p)->post_script);
                }
            }
            break;

            case mp_stroked_code:
            {
                mp_dump_solved_path(gr_path_p((mp_stroked_object*)p));
                if (pen_is_elliptical(gr_pen_p((mp_stroked_object*)p)))
                    cairo_gr_stroke_ellipse(mp, p, false, cr);
                else
                {
                    //mp_dump_solved_path(gr_path_p((mp_stroked_object*)p));
                    cairo_gr_pdf_fill_out(mp, gr_path_p((mp_stroked_object*)p), cr);
                }

                if (((mp_stroked_object*)p)->post_script != NULL)
                {
                    ondraw(cr, ((mp_stroked_object*)p)->post_script);
                }
            }
            break;

            case mp_text_code: // not yet implemented
            {
                mp_text_object* to;
                to = (mp_text_object*)p;
                char * po = to->post_script;
                char * ps = to->pre_script;
            }
            break;

            case mp_start_clip_code:
                cairo_save(cr);
                cairo_gr_pdf_path_out(mp, gr_path_p((mp_clip_object*)p), cr);
                cairo_clip(cr);
            break;

            case mp_stop_clip_code:
                cairo_restore(cr);
            break;

            case mp_start_bounds_code: // ignored
                //mp_bounds_object *sbo;
                //sbo = (mp_bounds_object *)gr;
            break;

            case mp_stop_bounds_code: // ignored
                //mp_special_object
            break;

            case mp_special_code: // just more experimenting, ignore
            {
                // braces give the declaration its own scope after the case label
                mp_special_object *speco;
                speco = (mp_special_object *)p;
                printf("%s", speco->pre_script);
                ondraw(cr, speco->pre_script);
            }
            break;
        }
        p = gr_link(p);
    }
    return 0;
}

Conclusion

I wish I could switch on the commenting feature but, sadly, spammers make this impossible. So, I just hope the above is a useful starting point for anyone wanting to explore the marvellous MPlib C library.

PDF file of John Hobby’s original MetaPost code (version 0.64)

MetaPost MPlib

I’m currently implementing a project built around the MetaPost library MPlib. I managed to build MPlib as a Windows .lib (library) file without “too much” difficulty… In order to understand the workings of the powerful, but complex, MPlib library I found it was very helpful to read parts of Hobby’s original code – mainly in relation to generating output from the low-level MPlib/MetaPost edge structures. I also benefitted enormously from reading the C code of the Lua binding so a huge thank you to Taco Hoekwater for his utterly brilliant work on the MPlib/lmplib source code.

I tracked down the MetaPost 0.64 source code (the .web code) and ran TIE and WEAVE to generate the TeX documentation. After a few tiny fixes (for fonts I don’t have) I produced a PDF file which I thought others might find useful. You can download it here. The MPlib API documentation (again by Taco) was also very helpful – documentation for version 1.800 of the MPlib API is available here.

Building Cairo 1.12.16 as a .lib on Windows using Visual Studio

A real gotcha! (well, it got me)

I recently built Cairo 1.12.16 as a Windows .lib file using Visual Studio. It was a somewhat painful process but the result seems to work fine. One detail that caught me out (and took hours to track down) was that I had not set a critically important preprocessor definition: HAVE_FT_LOAD_SFNT_TABLE. This is important if you are using FreeType: without HAVE_FT_LOAD_SFNT_TABLE, Cairo uses a “fallback” process for embedding fonts, which is not ideal.

The C source files for a Windows build

Through trial-and-error I eventually reduced the C files I needed to the list below. So, the .lib file I built is a slightly cut-down build of Cairo but so far it seems to work OK, at least for what I need. You will also need to manually create a header file called cairo-features.h.
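For orientation, a hand-written cairo-features.h is just a list of CAIRO_HAS_* feature macros. The sketch below is a guess at a set that would match the source files listed in this post (PDF, PS, SVG and script surfaces, PNG, FreeType and Win32 support); it is not the header actually used for this build, so treat the selection as an assumption and adjust it to the files you compile.

```c
/* cairo-features.h: hand-written sketch for a cut-down static Windows build.
 * ASSUMPTION: the feature set below is inferred from the list of source
 * files in this post; it is not the author's actual header. */
#ifndef CAIRO_FEATURES_H
#define CAIRO_FEATURES_H

#define CAIRO_HAS_IMAGE_SURFACE     1
#define CAIRO_HAS_RECORDING_SURFACE 1
#define CAIRO_HAS_OBSERVER_SURFACE  1
#define CAIRO_HAS_MIME_SURFACE      1
#define CAIRO_HAS_USER_FONT         1

#define CAIRO_HAS_PDF_SURFACE       1
#define CAIRO_HAS_PS_SURFACE        1
#define CAIRO_HAS_SVG_SURFACE       1
#define CAIRO_HAS_SCRIPT_SURFACE    1
#define CAIRO_HAS_PNG_FUNCTIONS     1

#define CAIRO_HAS_WIN32_SURFACE     1
#define CAIRO_HAS_WIN32_FONT        1

/* May already be set via the project's preprocessor definitions. */
#ifndef CAIRO_HAS_FT_FONT
#define CAIRO_HAS_FT_FONT           1
#endif

#endif /* CAIRO_FEATURES_H */
```

If a feature macro is defined here but its .c file is left out of the project (or the other way round), the usual symptom is an unresolved external at link time, which is a reasonable way to converge on the right set by trial and error.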

Preprocessor definitions

I used the following:

WIN32
_DEBUG
_LIB
CAIRO_WIN32_STATIC_BUILD
CAIRO_HAS_FT_FONT
HAVE_FT_LOAD_SFNT_TABLE

Other libraries you will need (I do recommend using FreeType)


FreeType
Pixman
libpng
ZLib

List of C source files required (for a cut-down build)

cairo-analysis-surface.c
cairo-arc.c
cairo-array.c
cairo-atomic.c
cairo-base64-stream.c
cairo-base85-stream.c
cairo-bentley-ottmann-rectangular.c
cairo-bentley-ottmann-rectilinear.c
cairo-bentley-ottmann.c
cairo-botor-scan-converter.c
cairo-boxes-intersect.c
cairo-boxes.c
cairo-cache.c
cairo-cff-subset.c
cairo-clip-boxes.c
cairo-clip-polygon.c
cairo-clip-region.c
cairo-clip-surface.c
cairo-clip-tor-scan-converter.c
cairo-clip.c
cairo-color.c
cairo-composite-rectangles.c
cairo-compositor.c
cairo-contour.c
cairo-damage.c
cairo-debug.c
cairo-default-context.c
cairo-deflate-stream.c
cairo-device.c
cairo-error.c
cairo-fallback-compositor.c
cairo-fixed.c
cairo-font-face-twin-data.c
cairo-font-face-twin.c
cairo-font-face.c
cairo-font-options.c
cairo-freed-pool.c
cairo-freelist.c
cairo-ft-font.c
cairo-gstate.c
cairo-hash.c
cairo-hull.c
cairo-image-compositor.c
cairo-image-info.c
cairo-image-source.c
cairo-image-surface.c
cairo-lzw.c
cairo-mask-compositor.c
cairo-matrix.c
cairo-mempool.c
cairo-mesh-pattern-rasterizer.c
cairo-misc.c
cairo-mono-scan-converter.c
cairo-mutex.c
cairo-no-compositor.c
cairo-observer.c
cairo-output-stream.c
cairo-paginated-surface.c
cairo-path-bounds.c
cairo-path-fill.c
cairo-path-fixed.c
cairo-path-in-fill.c
cairo-path-stroke-boxes.c
cairo-path-stroke-polygon.c
cairo-path-stroke-traps.c
cairo-path-stroke-tristrip.c
cairo-path-stroke.c
cairo-path.c
cairo-pattern.c
cairo-pdf-operators.c
cairo-pdf-shading.c
cairo-pdf-surface.c
cairo-pen.c
cairo-png.c
cairo-polygon-intersect.c
cairo-polygon-reduce.c
cairo-polygon.c
cairo-ps-surface.c
cairo-raster-source-pattern.c
cairo-recording-surface.c
cairo-rectangle.c
cairo-rectangular-scan-converter.c
cairo-region.c
cairo-rtree.c
cairo-scaled-font-subsets.c
cairo-scaled-font.c
cairo-script-surface.c
cairo-shape-mask-compositor.c
cairo-slope.c
cairo-spans-compositor.c
cairo-spans.c
cairo-spline.c
cairo-stroke-dash.c
cairo-stroke-style.c
cairo-surface-clipper.c
cairo-surface-fallback.c
cairo-surface-observer.c
cairo-surface-offset.c
cairo-surface-snapshot.c
cairo-surface-subsurface.c
cairo-surface-wrapper.c
cairo-surface.c
cairo-svg-surface.c
cairo-time.c
cairo-tor-scan-converter.c
cairo-tor22-scan-converter.c
cairo-toy-font-face.c
cairo-traps-compositor.c
cairo-traps.c
cairo-tristrip.c
cairo-truetype-subset.c
cairo-type1-fallback.c
cairo-type1-glyph-names.c
cairo-type1-subset.c
cairo-type3-glyph-surface.c
cairo-unicode.c
cairo-user-font.c
cairo-version.c
cairo-wideint.c
cairo-win32-debug.c
cairo-win32-device.c
cairo-win32-display-surface.c
cairo-win32-font.c
cairo-win32-gdi-compositor.c
cairo-win32-printing-surface.c
cairo-win32-surface.c
cairo-win32-system.c
cairo.c

Looking inside TeX: strings and pool files

Introduction

In this post we’ll cover TeX’s handling of strings and explain .pool files. Using Web2C to build (Knuthian) TeX from Knuth’s TeX.WEB source code involves many steps as explained elsewhere on this site. One of the initial steps when building TeX is combining Knuth’s master source file (TeX.WEB) with a “change file” (TeX.CH) to produce a modified WEB source file (let’s call it TeXk.WEB) which can be processed via the Web2C process. The TeX.CH change file applies many modifications to the master TeX.WEB source code – i.e., in preparation for conversion to C code and adding support for the kpathsea file-searching library. After the change file has been applied, the next step is to process our modified TeX.WEB (i.e., TeXk.WEB) via the TANGLE program. If TANGLE successfully parses our TeXk.WEB source code it will output two files (download links are provided for the inquisitive):

  • TeXk.p: the source code of TeX (in Pascal).
  • TeXk.pool: a file containing the string constants defined in TeXk.WEB

Here’s a small fragment of TeXk.pool as produced during my Web2C process:

....
11expandafter
04font
09fontdimen
06halign
05hrule
12ignorespaces
10mathaccent
08mathchar
10mathchoice
08multiply
07noalign
10noboundary
08noexpand
04omit
07penalty
08prevgraf
07radical
04read
05relax
06setbox
03the
06valign
07vcenter
05vrule
09save size
15grouping levels
08curlevel
09retaining
09restoring
05SAVE(
28Incompatible magnification (
02);
36 the previous value will be retained
58I can handle only one magnification ratio per job. So I've
59reverted to the magnification you used earlier on this run.
46Illegal magnification has been changed to 1000
52The magnification ratio must be between 1 and 32768.
...
*413816964

TeXk.pool consists of many lines of the format [string length][string text][end_of_line], where the string length is written as two decimal digits, followed by a final line containing *CHECKSUM (413816964 in the above example). Once upon a time, .pool files had to be preserved as external files for use when building .fmt files via INITEX but in 2008 this was changed and the .pool file is now compiled into the TeX binaries – I’ll explain this below. For example, the following note is contained in more recent texmf.cnf files:

As of 2008, pool files don't exist any more (the strings are compiled into the binaries), but just in case something expects to find these:
TEXPOOL = .;$TEXMF/web2c
MFPOOL = ${TEXPOOL}
MPPOOL = ${TEXPOOL}

As you can see from the above fragment, the TeXk.pool file contains string constants for TeX’s primitive commands plus all the strings contained in help/error messages that TeX outputs to the terminal and/or log file.
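The line format of the .pool fragment shown earlier is simple enough to decode by hand. The sketch below assumes a two-digit decimal length prefix, exactly as in that fragment, and treats anything else (including the final *CHECKSUM line) as “not a string line”. The function name parse_pool_line is invented for this illustration; it is not part of TeX or Web2C.

```c
#include <ctype.h>
#include <string.h>

/* Decode one line of a .pool file (without its end-of-line character).
 * On success, copies the string text into out (NUL-terminated) and returns
 * the declared length. Returns -1 for lines that do not start with two
 * digits (e.g. the "*CHECKSUM" line), that are shorter than declared,
 * or that do not fit into out. */
static int parse_pool_line(const char *line, char *out, size_t outsz)
{
    int len;

    if (!isdigit((unsigned char)line[0]) || !isdigit((unsigned char)line[1]))
        return -1;

    len = (line[0] - '0') * 10 + (line[1] - '0');
    if (strlen(line + 2) < (size_t)len || (size_t)len >= outsz)
        return -1;

    memcpy(out, line + 2, (size_t)len);
    out[len] = '\0';
    return len;
}
```

Feeding it "04font" yields the 4-character string font, while the checksum line is rejected with -1.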

TeX’s internal handling of strings

In addition to the string constants defined in TeXk.pool, TeX will, of course, encounter new strings – for example, when you define new macro names; consequently, TeX needs a way to store the string constants in TeXk.pool and the strings it encounters during its run-time processing of your TeX files. It should not be a surprise that TeX’s internal handling of strings is achieved through methods designed to ensure portability.

From TeX.WEB: The TEX system does nearly all of its own memory allocation, so that it can readily be transported into environments that do not have automatic facilities for strings, garbage collection, etc., and so that it can be in control of what error messages the user receives... Control sequence names and diagnostic messages are variable-length strings of eight-bit characters. Since PASCAL does not have a well-developed string mechanism, TeX does all of its string processing by homegrown methods.

How does TeX use/store strings?

In vanilla C, a simple 8-bit string is an array of characters terminated by the null character ('\0'). TeX does not store its strings as individually named string variables but allocates a single large array and uses integer offsets into that array to identify strings (and calculate lengths). Here’s how it works.

From TeX.WEB: The array |str_pool| contains all of the (eight-bit) ASCII codes in all of the strings, and the array |str_start| contains indices of the starting points of each string. Strings are referred to by integer numbers, so that string number |s| comprises the characters |str_pool[j]| for |str_start[s]<=j<str_start[s+1]|. Additional integer variables |pool_ptr| and |str_ptr| indicate the number of entries used so far in |str_pool| and |str_start|, respectively; locations |str_pool[pool_ptr]| and |str_start[str_ptr]| are ready for the next string to be allocated.

It is worth noting that when TANGLE produces Pascal code (from the WEB source) it strips out all underscores from variables defined in the WEB code. For example, the |str_pool| variable mentioned above is called strpool in the final C code produced from the Pascal.

After processing via Web2C, the WEB variables |str_pool|, |str_start|, |pool_ptr| and |str_ptr| are global variables declared as follows (near the start of TeX.C):


packedASCIIcode * strpool ;
poolpointer * strstart ;
poolpointer poolptr ;
strnumber strptr ;

The types packedASCIIcode and poolpointer are simply typedefs:


typedef unsigned char packedASCIIcode ;
typedef int integer;
typedef integer poolpointer ;

Stripping away all typedefs introduced by Web2C gives:


unsigned char* strpool ;
int* strstart ;
int poolptr ;
int strptr ;

To see what’s going on, i.e., how TeX identifies a string, let’s first look at the global variable strpool (practically all key variables are declared with global scope in TeX.C…!). During initialization (in INITEX mode, and when TeX is reading/unpacking a .fmt file to initialize a particular format (plain.fmt, latex.fmt etc)) the strpool and strstart variables are initialized as follows:

strpool = xmallocarray (packedASCIIcode , poolsize) ;
strstart = xmallocarray (poolpointer , maxstrings) ;

where xmallocarray is a #define:


/* Allocate an array of a given type. Add 1 to size to account for the fact that Pascal arrays are used from [1..size], unlike C arrays which use [0..size). */
#define xmallocarray(type,size) ((type*)xmalloc((size+1)*sizeof(type)))

and xmalloc(...) is a small utility function wrapped around the standard C function malloc(...).

A Pascal legacy: In many places within TeX.C you have to account for the fact that the Pascal arrays are indexed from 1 whereas C arrays start at index 0. This is a consequence of Knuthian TeX being written in Pascal, not C.

The allocation of memory for strpool uses an integer variable called poolsize: the value of poolsize is calculated at run-time from the value of other variables – including some variables whose value can be defined by settings in texmf.cnf. So, in essence:

strpool = (unsigned char *) malloc(sizeof(unsigned char)*(poolsize +1));

– which looks very much like one huge C string. And, of course, it is. strpool stores all of TeX’s strings BUT within strpool all strings are contiguous (stored end-to-end) without any delimiter characters between them (such as the null character '\0', a space, etc.). Clearly, there needs to be a mechanism to define where each individual string starts and stops: i.e., to partition strpool into individual strings. That is the task of the integer array variable called strstart. Perhaps an example will make this clearer.

We can declare a variable fakestrpool as follows:

unsigned char fakestrpool[]="ThisismyfakeTeXstrpool";

Here, we have concatenated the 6 strings "This", "is", "my", "fake","TeX" and "strpool" into one long string. These 6 strings start at the following offsets in fakestrpool:


string 0 ("This"): offset 0
string 1 ("is"): offset 4
string 2 ("my"): offset 6
string 3 ("fake"): offset 8
string 4 ("TeX"): offset 12
string 5 ("strpool"): offset 15

So, if we define an array of integers, strstart, to record these offsets:

int strstart[6] ; // for 6 strings numbered 0 to 5


strstart[0]=0
strstart[1]=4
strstart[2]=6
strstart[3]=8
strstart[4]=12
strstart[5]=15

Then for some string identified by a number k (where 0 <= k <= 5), strstart[k] gives the offset into fakestrpool where the kth string starts. And this is exactly how TeX identifies strings: it identifies them using some integer value, k, say, where strstart[k] tells you where that string starts (in strpool) and allows the length of string number k, length(k), to be easily calculated using

length(k) = strstart[k + 1] - strstart[k]

For example, let us use this method to calculate the length of the string with number 4 (k=4) ("TeX" in our test array fakestrpool).


length(4) = strstart[5] - strstart[4]
length(4) = 15 - 12 = 3

Of course there is one minor complication – calculating the length of string 5, but we have other variables (poolptr and strptr) to solve issues like this.
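One way to see how the complication goes away: the TeX.WEB description quoted earlier says that str_start[str_ptr] is “ready for the next string”, which in effect means there is always one extra entry recording where the pool currently ends. The sketch below applies that idea to the fake pool by adding a seventh entry; the names fakestrstart, fake_length and fake_copy are made up for this example.

```c
#include <string.h>

static const unsigned char fakestrpool[] = "ThisismyfakeTeXstrpool";

/* Start offsets of strings 0..5, plus one extra entry (22) that marks the
 * end of the used part of the pool. In TeX this extra entry is
 * strstart[strptr], whose value is poolptr. */
static const int fakestrstart[7] = { 0, 4, 6, 8, 12, 15, 22 };

/* Length of string number k, valid for 0 <= k <= 5. */
static int fake_length(int k)
{
    return fakestrstart[k + 1] - fakestrstart[k];
}

/* Copy string number k into out as an ordinary NUL-terminated C string.
 * out must have room for fake_length(k) + 1 bytes. */
static void fake_copy(int k, char *out)
{
    int len = fake_length(k);
    memcpy(out, fakestrpool + fakestrstart[k], (size_t)len);
    out[len] = '\0';
}
```

With the extra entry in place, the last string needs no special case: fake_length(5) is 22 - 15 = 7, the length of "strpool".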

Back to .pool files

We started this discussion by noting that running the TANGLE program on TeXk.WEB produces two output files:

  • TeXk.p: the source code of TeX (in Pascal).
  • TeXk.pool: a file containing the string constants defined in TeXk.WEB

The next stage in the discussion covers the mechanism for processing .pool files that was introduced circa 2008. Prior to (circa) 2008, you needed to keep .pool files available (part of the TeX distribution) as separate files for use whenever you ran INITEX to generate a new .fmt file. As noted, the contents of the .pool files are string constants generated by TANGLE from string constants defined in the main WEB source code of TeX. Given that those strings don’t change (they are constants), it makes more sense to build them into the TeX executable file rather than having to access them each time a new .fmt file is created by INITEX. Part of the Web2C process now involves using a small utility program called makecpool.exe (on Windows) – makecpool.C was written by Taco Hoekwater. The input to makecpool.exe is the TeXk.pool file and the output is another C file (called texpool.C or similar) which defines a function called loadpoolstrings(...):

int loadpoolstrings (int spare_size)

Downloads

If you just want to see the inputs/outputs you can download the files I produced during my private build of Knuthian TeX:

  • TeXk.pool: The .pool file input for makecpool.exe
  • texpool.C: The C file output by makecpool.exe, defining the function loadpoolstrings(...).

Once you have generated texpool.C you no longer need the original TeXk.pool file because the contents of TeXk.pool are now stored within texpool.C as an array of strings:

static const char *poolfilearr[] = {
  "buffer size",
  "pool size",
  "number of strings",
  "" "?" "?" "?",
  "m2d5c2l5x2v5i",
  "End of file on the terminal!",
  "! ",
  "(That makes 100 errors; please try again.)",
  "" "? ",
  "Type <return> to proceed, S to scroll future error messages,",
  "R to run without stopping, Q to run quietly,",
  "I to insert something, ",
...
...
...
NULL };

Of course, when you build TeX you will need to compile TeXk.C and texpool.C so that the function loadpoolstrings(...) is made available. The function loadpoolstrings(...) is called from TeX.C when TeX is in INITEX mode (i.e., the --ini option is set on the command line). Specifically, the loadpoolstrings(...) function is called by the function getstringsstarted(...) just after it has initialized the first 256 strings in TeX’s main string container: the strpool array discussed above.

Modifying loadpoolstrings (…) to see what it does

The function loadpoolstrings(…) depends on a few of TeX’s internal global variables and the function makestring() (we’ll discuss that shortly); notably, we need to declare the following as extern in texpool.C:


extern int makestring ( void ) ;
extern unsigned char * strpool;
extern int poolptr;

Here is my slightly modified version of loadpoolstrings(...) which outputs a file called "datadump.txt" to list the strings and corresponding string numbers generated by makestring():

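#include <stdio.h>   /* fopen, fprintf, fclose */
#include <string.h>  /* strlen */

int loadpoolstrings (int spare_size) {
  const char *s;
  int g=0;
  FILE* dumpvals;
  int i=0,j=0;
  dumpvals=fopen("datadump.txt", "wb");
  if (dumpvals == NULL) return 0;  /* could not create the dump file */

  while ((s = poolfilearr[j++])) {
    int l = strlen (s);
    fprintf(dumpvals, "//string \"%s\" = number ", s);
    i += l;
    if (i>=spare_size) {
      fclose(dumpvals);  /* not enough room left in strpool */
      return 0;
    }
    /* append the characters to strpool, then register them as a new string */
    while (l-- > 0) strpool[poolptr++] = *s++;
    g = makestring();
    fprintf(dumpvals, "%d\n", g);  /* g is an int, so %d rather than %ld */
  }
  fclose(dumpvals);
  return g;
}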

datadump.txt

Those who might be interested to see the contents of datadump.txt can download it here. In any case, here’s a listing of the first few lines in datadump.txt:

//string "buffer size" = number 256
//string "pool size" = number 257
//string "number of strings" = number 258
//string "???" = number 259
//string "m2d5c2l5x2v5i" = number 260
//string "End of file on the terminal!" = number 261
//string "! " = number 262
...
...
//string "Using character substitution: " = number 1329

As you can see, the string number of the first string is 256 (i.e., the first string originally contained in TeXk.pool). Assuming that the string numbers start at 0 (they do), TeX has already initialized strings 0..255 before loading the strings from the TeXk.pool file. I hate to do this to you, dear reader, but can you guess what those 256 strings (0..255) might be?

The function makestring()

Here is TeX’s makestring() function, which returns a string number after checking for overflow – i.e., checking that TeX has enough space to store another string.

strnumber makestring (void) 
{
  register strnumber Result; makestring_regmem
  if (strptr == maxstrings) 
  overflow (258 , maxstrings - initstrptr) ;
  incr (strptr) ;
  strstart[strptr] = poolptr ;
  Result = strptr - 1 ;
  return Result ;
}
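To see makestring() and loadpoolstrings(...) working together without building TeX, here is a small stand-alone model. It is a sketch only: the toy_ names and the fixed array sizes are invented for the illustration, and the overflow handling is reduced to returning -1 instead of calling overflow(...).

```c
#include <string.h>

#define TOY_POOL_SIZE   256   /* capacity of the character pool */
#define TOY_MAX_STRINGS 16    /* maximum number of strings */

static unsigned char toy_strpool[TOY_POOL_SIZE];
static int toy_strstart[TOY_MAX_STRINGS + 1];  /* toy_strstart[0] is 0 */
static int toy_poolptr = 0;   /* first unused position in toy_strpool */
static int toy_strptr  = 0;   /* number of the next string to be created */

/* Same bookkeeping as makestring(): close off the characters appended
 * since the previous string and return the new string's number. */
static int toy_makestring(void)
{
    toy_strptr++;
    toy_strstart[toy_strptr] = toy_poolptr;
    return toy_strptr - 1;
}

/* Append the characters of s to the pool, then register them as a string.
 * Returns the string number, or -1 if there is no room. */
static int toy_add_string(const char *s)
{
    size_t len = strlen(s);
    if (toy_strptr == TOY_MAX_STRINGS ||
        (size_t)toy_poolptr + len > TOY_POOL_SIZE)
        return -1;
    memcpy(toy_strpool + toy_poolptr, s, len);
    toy_poolptr += (int)len;
    return toy_makestring();
}

/* Length of string number s, exactly as computed earlier in this post. */
static int toy_length(int s)
{
    return toy_strstart[s + 1] - toy_strstart[s];
}
```

Note that the characters go into the pool first and the string only comes into existence when toy_makestring() records the new end offset; that is the same order of events as in the loop inside loadpoolstrings(...).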

Time to stop

Dear reader, writing this post has absorbed the greater part of my Sunday (14 September 2014) so you’ll forgive me if I call it a day and leave it here – I’ll fix any typos tomorrow :-). I hope it is of use, or interest, to someone “out there”, somewhere.