Monday, 16 November 2009

Re: [Tyndale STEP - Programming] Hebrew transliteration

Tyndale STEP Project wrote:
> <https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh2Byn9yl_72O3ZjFJ8J4UBgz65wTHFAf2Mm5EN5Ld-nXGsXUTQU6uYRsfWoquW7BLu-ZMnFkykhVyrYOl1xfI3WzF36Fo76PK2KejGWI97rowaGG3egq9CKmAOJAsIaSPMbU_ECtVS5RDQ/s1600-h/Trans-egs-776642.gif>
>
> <https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgCD6eiTpI3KFjmBEeWPAU9ZlSTzbB341yhDdECRhshDvaukoTA5lnYfGJBh2PDvYAr4j_d8hyFUqXs2P_OSI5ceBK4aFBXxNSHtlzLPMqudEe6TprTO0yYfWLop41EZRBkJwzLAASrmznU/s1600-h/Trans-scheme-777866.gif>
>
> Colin, you have a good point about /hesed/
> and an especially good point about not going against a standard.
>
> I'm sorry if this discussion is continuing longer than you wanted,
> but your question has prompted something which I think could be very useful
> for users who want to get to the original language, but don't have Hebrew.
> I think we are on the cusp of coming up with a system which is both
> easy to read in a fairly intuitive way
> and conveys the actual Hebrew to someone who needs it.

This is definitely something sufficiently important that we want to spend
time coming up with the best answer, so no problem prolonging it!

Essentially we have four design aims:

1. We want a system which provides a means of pronouncing the Hebrew in
a standard manner as easily as possible, so that the user who knows no
Hebrew can communicate with other people, talk about "hesed" and be
understood.

2. We want a system which enables the user to look up Hebrew terms in
lexicons and other reference resources. This includes resources other
than ours.

3. We want to serve a varied user base, from those who are totally
ignorant of Hebrew, through those who know a bit (can read the Hebrew
alphabet) to those who can read Hebrew.

4. We want a transliteration scheme which can be generated automatically
(or with relatively little manual intervention) from Hebrew Unicode or
equivalent. I don't think this rules out anything we've considered, so
I'm just including it for completeness.
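
As a rough illustration of aim 4, here is a minimal Python sketch that
derives a consonant-only transliteration from pointed Hebrew Unicode. The
mapping table and the function name are assumptions made up for this
example; they are neither the IVP scheme nor the scheme proposed in this
thread.

```python
import unicodedata

# Hypothetical consonant table, for illustration only.
CONSONANTS = {
    "\u05d0": "'",                   # alef
    "\u05d1": "b",                   # bet
    "\u05d2": "g",                   # gimel
    "\u05d3": "d",                   # dalet
    "\u05d4": "h",                   # he
    "\u05d5": "w",                   # waw
    "\u05d6": "z",                   # zayin
    "\u05d7": "ch",                  # chet
    "\u05d8": "t",                   # tet
    "\u05d9": "y",                   # yod
    "\u05da": "k", "\u05db": "k",    # final kaf, kaf
    "\u05dc": "l",                   # lamed
    "\u05dd": "m", "\u05de": "m",    # final mem, mem
    "\u05df": "n", "\u05e0": "n",    # final nun, nun
    "\u05e1": "s",                   # samekh
    "\u05e2": "`",                   # ayin
    "\u05e3": "p", "\u05e4": "p",    # final pe, pe
    "\u05e5": "ts", "\u05e6": "ts",  # final tsade, tsade
    "\u05e7": "q",                   # qof
    "\u05e8": "r",                   # resh
    "\u05e9": "sh",                  # shin (sin is handled below)
    "\u05ea": "t",                   # tav
}
SHIN_LETTER = "\u05e9"
SIN_DOT = "\u05c2"


def consonant_skeleton(hebrew: str) -> str:
    """Return the consonants of a pointed Hebrew word in Latin letters."""
    text = unicodedata.normalize("NFD", hebrew)
    out = []
    for i, ch in enumerate(text):
        if ch not in CONSONANTS:
            continue  # vowel points, dagesh, accents, punctuation
        if ch == SHIN_LETTER:
            # Collect the combining marks attached to this letter and
            # look for the sin dot among them.
            j = i + 1
            marks = ""
            while j < len(text) and unicodedata.combining(text[j]):
                marks += text[j]
                j += 1
            out.append("s" if SIN_DOT in marks else "sh")
        else:
            out.append(CONSONANTS[ch])
    return "".join(out)


# chet-segol-samekh-segol-dalet ("hesed")
print(consonant_skeleton("\u05d7\u05b6\u05e1\u05b6\u05d3"))
```

Note that this illustrative table maps both sin and samekh to "s", and
both tet and tav to "t", which is exactly the information aim 2 needs
to keep.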

The first two to some extent conflict - for aim 2 we need to include
redundant information not in the pronunciation to distinguish homophones
like sin and samek, so we need to compromise. (Unless of course we
provide two separate transliterations - we could, for instance, provide
a mouseover over a Hebrew or Greek word which says "pronounced [however]".)

Which users we're intending to focus on more in 3 also affects things.
I'd argue that we certainly need to cater for the "ignorant" end. The
experts can take care of themselves, and will probably want to use the
genuine article anyway.

I think the choice for our main transliteration comes down to using one
or other of the scholarly approaches, or one we grow ourselves. Let's
take consonants and vowels separately.

For consonants, there seems to be a relatively standard approach between
scholarly systems, with each letter mapped the same. The only common
variations are whether waw is v or w, and whether bgdkpt letters are
distinguished, which is done by underlining the non-dagesh version.

The differences between your system and the IVP one, which I'll use as a
representative of the standards, are primarily:
* you've generally used underline where IVP would use a dot (eg in chet).
* in yours, sin gets s, shin gets sh, samekh s with underline, whereas
IVP gives samekh s and sin and shin are differently accented s.
* tsade becomes a z rather than an s.
* I think you're intending to distinguish bet-with-dagesh at least by
doubling it if it's not at the start of a word. (IVP doesn't distinguish
bgdkpt forms.)

In terms of pronunciation, your system I think wins, but not by a lot
(there isn't a lot of mileage to be gained over the IVP system).

When the user looks something up, they can do it in one of three ways:

1. Clicking on a word and being taken straight to the resource.
2. Typing the word into a search box. (They've read about hesed
somewhere before and want to know more about it now.)
3. Scrolling through a dictionary (or reading a book).

I'm assuming our resources will be in Hebrew word order, regardless of
how they're indexed internally. External resources will be.

For 1, the transliteration is irrelevant.
For 2, we need to consider what they're going to type in. Do we expect
them to type in the whole word (including vowels with English on)? Or
just the consonants? (There are wider questions here.)
For 3, they're going to need to know the order of the Hebrew alphabet,
at least approximately.

If we're expecting that they're going to have to, or will benefit from,
having an idea of the Hebrew alphabet order, then it's a reasonable
expectation that as they learn it, they'll also pick up the
pronunciations. This would lessen the advantages of an easier-to-read
pronunciation scheme.

The reason for raising 2 is that if we're going for consonant-only
schemes, then rendering shin as "sh" may cause us difficulties
distinguishing it from sin + het. (And if we're happy with shin as "sh",
why not tsade as "ts" or "tz", and chet as "ch" or "kh"?)
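
To make the ambiguity concrete, here is a small Python sketch that lists
every way a typed, accent-free consonant string could be split into Hebrew
letters. The key table is a hypothetical fragment invented for this
example (it assumes chet is typed as a plain "h" once its dot or
underline is dropped); it is not a proposed scheme.

```python
# Hypothetical accent-free keys a user might type, for illustration only.
KEYS = {
    "s": ["sin"],
    "h": ["he", "chet"],  # chet with its dot or underline dropped
    "t": ["tav"],
    "sh": ["shin"],
    "ts": ["tsade"],
}


def readings(typed: str) -> list[tuple[str, ...]]:
    """Return every sequence of Hebrew letters that `typed` could stand for."""
    if not typed:
        return [()]
    results = []
    for key, letters in KEYS.items():
        if typed.startswith(key):
            # Try this key for the start, then every reading of the rest.
            for rest in readings(typed[len(key):]):
                for letter in letters:
                    results.append((letter,) + rest)
    return results


for candidate in readings("sh"):
    print(candidate)
```

With this table, "sh" can be read as shin, as sin + he, or as sin + chet,
and "ts" as tsade or as tav + sin. A search box could cope by trying all
readings, at the cost of more false matches.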

The other issue is to what extent people will be confused when, having
got used to, for instance sin as "s", they come across other resources
in which s is pretty consistently samekh. (Or _t_ which is tet in our
system, but a soft tav in others.) Do other advantages outweigh this
problem?


On to the vowels:

Here there's a little less standardisation, although in general the
differences reflect whether pairs of vowels are distinguished by accents
or not, and if they are, then the accents used seem to be relatively
consistent. The more substantive disagreements seem to be:

* how are vocal schwa and the schwa-plus vowels represented?
* are combinations like tsere + yod represented using a y or an accented e?

Here I think the issues are rather less clear cut. The user is going to
have to make a little effort in working out how to pronounce the vowels
in any case, since there's no one way of doing it in English. There will
be less confusion with standards, since there are no clear standards.
And the vowels are less important in looking things up, so the whole
issue becomes less important.

The things we need to watch out for are in typing things in and looking
things up:

* Is there a danger of confusion between "vowel" yod and "consonant"
yod, both for the user and for our system when it tries to work out what
word the user wants?
* Typing in accented letters is in general a pain, so it would be good
to reduce that.
* Is the fact that a vowel is represented by a waw or yod sufficiently
important to mark separately?
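
On the typing point, one way to avoid making users enter accented letters
is to fold accents away on both the typed query and the indexed
transliterations before comparing them. The following Python sketch shows
the idea using only the standard library; the function name is made up
for this example, and it assumes the accents are ordinary Unicode
combining marks (dots, underlines, macrons and so on).

```python
import unicodedata


def fold_accents(text: str) -> str:
    """Strip combining marks and case so accented and plain input compare equal."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.casefold()


# h with a dot below, as in some scholarly transliterations of chet
print(fold_accents("\u1e25esed"))
```

The trade-off is that folding discards distinctions: letters that differ
only by an accent become the same key, so a search built this way has to
return every word that shares the folded form.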


So, my thinking currently is:

1. This "search" issue seems to me to be a fairly important factor we
haven't previously considered and needs to be resolved first. It seems
to favour a less complicated vowel system, and possibly a consonant
system which involves fewer accented letters, provided we can distinguish
those that map to a double letter from two single letters.

2. The consonant system as proposed looks sufficiently close to the
existing standards to be confusing when people use other resources, and
I'm not convinced the gains are that substantial. I'd favour either
sticking to something consistent with them, or moving further away (if
possible).

3. Vowels I'm more ambivalent about, except that I remain to be
convinced that marking the waw/yod vowels is worthwhile, particularly if
we're going for a simpler system in other regards. I can't see many
people who will care about the distinction and not be able to read the
Hebrew and see it for themselves, to be honest.

I wonder if we could do some user tests on people who are theologically
aware and know variable amounts of Hebrew, and see what they think?

Colin
