Regular expressions are part of many programmers’ toolkits, but they can be quite fiddly to get right. At the moment, I’m trying to “sanitize” the C code generated for TeX (via Web2C) by post-processing the TeX.c file to make the C source code far more readable. To do that, I’m using the original definitions in TeX.WEB to generate C #define statements that I can use in TeX.c. For example, in TeX.WEB you see the following “WEB macros” related to entries in TeX’s “equivalence table”:
@d eq_level_field(#)==#.hh.b1
@d eq_type_field(#)==#.hh.b0
@d equiv_field(#)==#.hh.rh
@d eq_level(#)==eq_level_field(eqtb[#]) {level of definition}
@d eq_type(#)==eq_type_field(eqtb[#]) {command code for equivalent}
@d equiv(#)==equiv_field(eqtb[#]) {equivalent value}
When WEB expressions using the above macros are processed by TANGLE and Web2C, the resulting C code contains many statements that look like the following:
eqtb [curval ].hh.b1 = 1 ;
eqtb [curval ].hh.b0 = c ;
eqtb [curval ].hh .v.RH = o ;
Not very readable but, of course, it is machine-generated C code, so what would you expect? Using regular expressions, I’m (slowly and carefully) replacing many raw C statements with #define macros such as the following:
#define equivalence_level(a) eqtb[a].hh.b1
#define command_code_equivalence(a) eqtb[a].hh.b0
#define set_value_of_equivalent(a) eqtb[a].hh.v.RH
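To see how these macros read at the point of use, here is a minimal sketch. The `memoryword_sketch` type, the `EQTB_SIZE` constant and the `define_entry` helper are hypothetical names invented for this illustration; the real Web2C memory word is more involved, and only the field paths (`.hh.b1`, `.hh.b0`, `.hh.v.RH`) are taken from the generated code shown earlier.

```c
#include <assert.h>

/* Hypothetical, much-simplified stand-in for Web2C's memory word.
   Only the field paths used by the three macros are modelled. */
typedef struct {
    struct {
        unsigned short b0;      /* eq_type: command code for equivalent */
        unsigned short b1;      /* eq_level: level of definition */
        struct { int RH; } v;   /* equiv: equivalent value */
    } hh;
} memoryword_sketch;

#define EQTB_SIZE 16            /* arbitrary size for this sketch */
static memoryword_sketch eqtb[EQTB_SIZE];

#define equivalence_level(a)        eqtb[a].hh.b1
#define command_code_equivalence(a) eqtb[a].hh.b0
#define set_value_of_equivalent(a)  eqtb[a].hh.v.RH

/* The same three assignments as the machine-generated statements,
   written with the macros instead of raw field accesses. */
static void define_entry(int curval, int c, int o)
{
    assert(curval >= 0 && curval < EQTB_SIZE);
    equivalence_level(curval) = 1;
    command_code_equivalence(curval) = c;
    set_value_of_equivalent(curval) = o;
}
```

Because each macro expands to an lvalue, it can sit on either side of an assignment, which is what lets a purely textual substitution in TeX.c work without changing the surrounding statement.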
As part of this work, I use two very useful tools for building and testing regular expressions: RegexBuddy and RegexMagic (the tools are compared/explained here). They help you build, test and develop regular expressions, and they support the syntax and options of many regular expression engines. Once you have a working regex, RegexBuddy and RegexMagic will generate code that allows you to use the regex in a language of your choice (many languages are supported), including C code to use the regex with PCRE, which is my favourite regex library. Again, this is not an advert for these tools, just some notes from someone who has found them extremely useful: they have saved me considerable amounts of time in building, testing and using powerful regular expressions with PCRE.
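For a feel of what one such rewrite involves, here is a rough sketch of a single replacement done in C. It is not the code RegexBuddy generates and it does not use PCRE: to keep it free of external libraries it uses the POSIX `<regex.h>` functions (`regcomp`, `regexec`, `regfree`) as a stand-in. The function name `rewrite_eq_level` and the pattern are my own assumptions for illustration, and the pattern only handles a plain identifier as the index.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>
#include <string.h>

/* Rewrites every "eqtb [X ].hh.b1" in `in` as "equivalence_level(X)",
   where X is a simple identifier. Returns 0 on success, -1 if the regex
   fails to compile or `out` is too small. */
int rewrite_eq_level(const char *in, char *out, size_t outsz)
{
    /* POSIX ERE: optional spaces around the brackets and dots,
       one capture group for the index identifier. */
    static const char *pattern =
        "eqtb[ ]*\\[[ ]*([A-Za-z_][A-Za-z0-9_]*)[ ]*][ ]*\\.hh[ ]*\\.b1";
    regex_t re;
    regmatch_t m[2];
    size_t used = 0, tail;
    int rc = 0;

    assert(in != NULL && out != NULL);
    if (outsz == 0 || regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    out[0] = '\0';

    while (regexec(&re, in, 2, m, 0) == 0) {
        /* Copy the text before the match, then the macro call
           built from the captured identifier. */
        int n = snprintf(out + used, outsz - used,
                         "%.*sequivalence_level(%.*s)",
                         (int)m[0].rm_so, in,
                         (int)(m[1].rm_eo - m[1].rm_so), in + m[1].rm_so);
        if (n < 0 || (size_t)n >= outsz - used) { rc = -1; break; }
        used += (size_t)n;
        in += m[0].rm_eo;       /* continue after this match */
    }

    if (rc == 0) {
        /* Append whatever follows the last match. */
        tail = strlen(in);
        if (tail >= outsz - used)
            rc = -1;
        else
            memcpy(out + used, in, tail + 1);
    }
    regfree(&re);
    return rc;
}
```

Called on the line `eqtb [curval ].hh.b1 = 1 ;`, this leaves the assignment intact and only swaps the left-hand side for the macro call. A real pass over TeX.c would need one pattern per macro and more careful handling of index expressions that are not plain identifiers.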
Screenshot: RegexBuddy
Processing INITEX’s primitive(...) function code with RegexBuddy to extract data for preparing C #defines.