Press Release: December 7, 2011
The importance of Unicode
This week’s guest poster is LBT Missionary Paul Federwitz, who serves as an Information Technology Consultant and Trainer at the Ghana Institute of Linguistics, Literacy, and Bible Translation.
I wrote some time back about what Unicode is and why it is important. I just ran into a situation this week that I think illustrates the point.
A New Testament translation was published in 1995 and now the group has started working on the Old Testament. They have the printed copy of the New Testament to reference, but for some reason did not have the original electronic copy. This week it was found and brought to me because they could not read it. Here is what it looked like (Matthew 1:1):
Yesu Kirisito gewo-tama giidu waan¥ ye tiitu Abiraam be sere egyu Dafidi awi yaan¥ w¨ lœ.
You may not notice any problems yet because it all looks like Greek to you. The problem is that every character that is not in the English alphabet displays as a weird symbol instead of the special character it should be.
This happens because a special font was created to display these characters. We no longer have access to that font, so we get what you see above whenever we open the file. Even if we could find the font, any computer that did not have it installed would show what you are seeing above.
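To make the font problem concrete, here is a small Python sketch that asks what the garbled characters really are. It assumes the characters exactly as they appear in the sample verse above; the function name describe_non_ascii is only for illustration. The file stores ordinary symbols such as the yen sign, and only the lost font drew them as the letters of the language.

```python
import unicodedata

# Three words from the sample verse, exactly as they display without the font:
# "waan¥ w¨ lœ"
garbled = "waan\u00a5 w\u00a8 l\u0153"

def describe_non_ascii(text):
    """List (character, code point, Unicode name) for each distinct
    non-ASCII character, in order of first appearance."""
    seen = []
    for ch in text:
        if ord(ch) > 127 and ch not in seen:
            seen.append(ch)
    return [(ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
            for ch in seen]

for row in describe_non_ascii(garbled):
    print(row)
```

None of the names that come back describe the letters printed in the New Testament, which is why the text is unreadable on any computer that lacks the custom font.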
I had a printed copy of the New Testament in front of me, so I knew what each of those characters was supposed to look like. For example, I knew that every ‘œ’ was supposed to be ‘ɛ’. That seems simple enough, but it meant that for every special character in the language, I needed to find a verse in the Bible that used it (I found out later that two of the characters first show up in Revelation…I would have been doing a lot of reading). The language had all of these special characters (Note: if you are on an older computer you may still not see all of the characters below correctly):
ɛɛ̀ƐƐ̀, ɔɔ̀ƆƆ̀, ŋŊ, ɩƖɩ̀Ɩ̀, ʋƲʋ̀Ʋ̀, àÀ, èÈ, ìÌ, òÒ, and ùÙ
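The character-by-character approach described above amounts to a lookup table. Here is a minimal Python sketch of that idea, not the conversion that was actually run. Only the œ to ɛ pair is stated in this post; the other two pairs are inferred by comparing the garbled verse with the printed one, and a real table would need an entry for every special character listed above.

```python
# Translation table from the characters the file displays without its font
# to the Unicode characters that were printed.
LEGACY_TO_UNICODE = str.maketrans({
    "\u0153": "\u025b",  # œ -> ɛ (stated in the post)
    "\u00a5": "\u0269",  # ¥ -> ɩ (inferred from the sample verse)
    "\u00a8": "\u028b",  # ¨ -> ʋ (inferred from the sample verse)
})

def legacy_to_unicode(text):
    """Replace each legacy character with its Unicode equivalent."""
    return text.translate(LEGACY_TO_UNICODE)

garbled = ("Yesu Kirisito gewo-tama giidu waan\u00a5 ye tiitu Abiraam "
           "be sere egyu Dafidi awi yaan\u00a5 w\u00a8 l\u0153.")
print(legacy_to_unicode(garbled))
```

The table is the hard part: each entry has to be confirmed against the printed text, which is why finding the original font or another source for the mapping saves so much work.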
I started looking through the computer archives to see if I could find the original font somewhere. With it, I could have used a font mapping program to create a conversion file from that font to Unicode. That is by no means an automated task, but it would have made life simpler and the conversion more accurate. I did not find the font, but I found the next best thing…the typesetting files, which marked the special characters in a way that made find and replace much easier. They used slashes to mark a special character. For example, /e stood for ɛ. A tone mark was written with a backtick in front: `o for ò and `/e for ɛ̀. The same verse from Matthew above looked like this:
Yesu Kirisito gewo-tama giidu waan/i ye tiitu Abiraam be sere egyu Dafidi awi yaan/i w/u l/e.
This was still not very readable to the human eye, but it was much easier to convert. Because I knew the system, I did not have to hunt for a verse that used each special character to figure out what it showed up as. I was able to run a series of find and replace operations across the entire New Testament and came out with the verse the way it is supposed to look:
Yesu Kirisito gewo-tama giidu waanɩ ye tiitu Abiraam be sere egyu Dafidi awi yaanɩ wʋ lɛ.
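The find and replace passes described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the exact procedure used: /e is the only slash code named in this post, /i and /u are inferred from the sample verse, and /o and /n are guesses by analogy. It also assumes that slashes and backticks never appear in the text as ordinary punctuation.

```python
import re
import unicodedata

SLASH_CODES = {
    "/e": "\u025b",  # ɛ (stated in the post)
    "/i": "\u0269",  # ɩ (inferred from the sample verse)
    "/u": "\u028b",  # ʋ (inferred from the sample verse)
    "/o": "\u0254",  # ɔ (assumed by analogy)
    "/n": "\u014b",  # ŋ (assumed by analogy)
}
COMBINING_GRAVE = "\u0300"

def typeset_to_unicode(text):
    """Convert slash codes and backtick tone marks to Unicode."""
    # Pass 1: plain find and replace for each slash code.
    for code, char in SLASH_CODES.items():
        text = text.replace(code, char)
    # Pass 2: a backtick in front of a letter becomes a combining grave
    # accent placed after that letter.
    text = re.sub(r"`([^\W\d_])",
                  lambda m: m.group(1) + COMBINING_GRAVE, text)
    # NFC merges pairs that have a single code point (o + grave -> ò) and
    # leaves the others (ɛ + grave) as letter plus combining mark.
    return unicodedata.normalize("NFC", text)

verse = ("Yesu Kirisito gewo-tama giidu waan/i ye tiitu Abiraam "
         "be sere egyu Dafidi awi yaan/i w/u l/e.")
print(typeset_to_unicode(verse))
```

The slash codes are replaced first so that a sequence like `/e has already become a backtick plus ɛ by the time the tone mark pass runs.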
Now that it is in Unicode, I no longer need to find the correct font in order to read it. I can send this file to almost any computer in the world and it will be readable.
For more information about Paul and Ali Federwitz, visit their page on the LBT web site. Or, you can visit their personal blog.