User:Benwbrum/Cuneiform Perl Scripts

Goals

I'm trying to develop some Perl or sed scripts for processing transliterated cuneiform, as found in the Old Hittite and Codex Hammurabi articles.

The constraints are as follows:

  1. Convert input Wikisource to output Wikisource
  2. Render all output as 7-bit ASCII, with special characters HTML-encoded.
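As a minimal sketch of constraint 2, the snippet below replaces every non-ASCII character with an HTML numeric character reference. It is written in Python rather than Perl purely for illustration, and the function name `to_ascii_html` is hypothetical, not part of any script on this page.

```python
def to_ascii_html(text: str) -> str:
    """Return text as 7-bit ASCII, with non-ASCII characters HTML-encoded.

    Uses Python's built-in 'xmlcharrefreplace' error handler, which turns
    each unencodable character into a decimal reference such as &#353;.
    """
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


if __name__ == "__main__":
    # Hypothetical sample input; plain ASCII passes through unchanged.
    print(to_ascii_html("\u0161u-up-pa"))
```

Decimal references are an assumption here; named entities would satisfy the constraint equally well.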

The goals are as follows:

  1. Convert bad source encodings such as "0xab" to good source encodings like «
  2. Convert ASCII encodings like $ to standard ANE representation like š (&#353;)
  3. Convert signs numbered 2 and 3 to their accented forms (e.g. u3 becomes ù)
  4. Add subscripts to other numbered signs (e.g. ma4 becomes ma<sub>4</sub>)
  5. Add superscripts to determinatives (e.g. DINGIR becomes <sup>DINGIR</sup> or <sup>d</sup>)
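Goals 3 and 4 can be sketched as a single regular-expression pass. This is an illustration in Python, not one of the Perl scripts listed below, and it rests on several assumptions: index 2 maps to an acute accent and index 3 to a grave accent (consistent with the u3 example above), the accent goes on the first vowel of the sign, subscripts are written with `<sub>` tags, and the names `convert_sign` and `convert_line` are hypothetical.

```python
import re

# Assumed convention: index 2 = acute accent, index 3 = grave accent.
ACUTE = {"a": "\u00e1", "e": "\u00e9", "i": "\u00ed", "u": "\u00fa"}
GRAVE = {"a": "\u00e0", "e": "\u00e8", "i": "\u00ec", "u": "\u00f9"}

# A sign is a run of letters followed by a run of digits, e.g. "u3" or "ma4".
SIGN = re.compile(r"\b([^\W\d_]+)(\d+)\b")


def convert_sign(match: re.Match) -> str:
    letters, index = match.group(1), int(match.group(2))
    if index in (2, 3):
        table = ACUTE if index == 2 else GRAVE
        # Assumption: the accent is placed on the first vowel of the sign.
        for pos, ch in enumerate(letters):
            if ch.lower() in table:
                accented = table[ch.lower()]
                if ch.isupper():
                    accented = accented.upper()
                return letters[:pos] + accented + letters[pos + 1:]
        return match.group(0)  # no vowel found: leave the sign untouched
    if index >= 4:
        return f"{letters}<sub>{index}</sub>"
    return match.group(0)  # index 0 or 1: leave as written


def convert_line(text: str) -> str:
    converted = SIGN.sub(convert_sign, text)
    # Keep the output 7-bit ASCII, per constraint 2.
    return converted.encode("ascii", "xmlcharrefreplace").decode("ascii")
```

The sketch does not guard against letter-plus-digit sequences inside existing markup (for example a named entity such as `&sup2;`), so a real script would need to skip those.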


Tests

o User:Benwbrum/Cuneiform Perl Scripts/Hittite Test 1

Scripts

o User:Benwbrum/Cuneiform Perl Scripts/Hittite Cleaning Script
o User:Benwbrum/Cuneiform Perl Scripts/Akkadian Consonant Script

Problems
