Thursday, 19 November 2009

Re: [Tyndale STEP - Programming] Re: Hebrew transliteration

Colin, I think you may have solved all the issues - great!

And I hang my head in shame about the composite shewa.
If you ever want to put me in my place, you can remind me about that.

The big problem I had was the sin/shin one, and your ss/sh solution is perfect.
My main problem with the 'standard' systems is that they simply can't be typed.
 
All word processors can underline but have you tried putting a dot under narrow and wide letters in Word?
(I have a macro for it, but it isn't perfect!).
Displaying dots under letters on our webpages won't be easy either. There aren't standard glyphs available.

I like the idea of a consistent rule "underline instead of underdot",
and with the sin/shin problem solved, there is no reason we shouldn't follow that.
I'm also sympathetic to using dots if we can figure out how to do it.
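For what it's worth, Unicode does define a combining dot below (U+0323) and a combining low line (U+0332). After NFC normalisation, h, t and s plus a dot below become the single precomposed characters ḥ, ṭ and ṣ, so at least those three need no macro. Underlined letters have no precomposed form and still depend on font support. A minimal Python sketch (the helper name `mark_under` is only for illustration):

```python
import unicodedata

COMBINING_DOT_BELOW = "\u0323"  # COMBINING DOT BELOW
COMBINING_LOW_LINE = "\u0332"   # COMBINING LOW LINE (underlines one letter)


def mark_under(letter: str, mark: str = COMBINING_DOT_BELOW) -> str:
    """Put a combining mark under one letter and return the NFC form.

    NFC composes h/t/s + dot below into one precomposed code point
    (U+1E25, U+1E6D, U+1E63). Underlined letters have no precomposed
    form, so they stay as base letter + combining mark.
    """
    return unicodedata.normalize("NFC", letter + mark)


if __name__ == "__main__":
    print(mark_under("h") + "esed")                      # ḥesed
    print(mark_under("h", COMBINING_LOW_LINE) + "esed")  # h + U+0332, then "esed"
```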

BTW I didn't mean to imply dropping the distinction between aleph and ayin.
The 'ir example was a typo.

A lot of systems allow the use of "(" or ")" instead of curly single quotes.
I'm not sure we can allow the dropping of these, or the amalgamation into '
because it will make lexical lookups very difficult.

So, the alphabet would be:

        )-`  b-v  g  d  h  w  z  _h_  _t_  y-î  k  l  m  n  s  (-´  p  _s_  q  r  ss/sh  t/th  

The display alphabet could use curly single quotes and we could allow
users to type ) and ( instead.
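A sketch of that typed/display split in Python. Which curly quote stands for aleph and which for ayin is an assumption here (’ for aleph, ‘ for ayin, matching the shapes of ) and (); the reverse table also accepts the ` and ´ forms from the alphabet above. Both function names are hypothetical.

```python
# Hypothetical mapping between what users type and what STEP displays.
# Assumption: aleph is shown as a right single quote, ayin as a left one.
TYPED_TO_DISPLAY = {")": "\u2019", "(": "\u2018"}

# When converting back, accept the curly quotes plus the ` and ´ forms.
DISPLAY_TO_TYPED = {"\u2019": ")", "\u2018": "(", "`": ")", "\u00b4": "("}


def to_display(typed: str) -> str:
    """Replace typed ) and ( with the curly quotes used for display."""
    return "".join(TYPED_TO_DISPLAY.get(ch, ch) for ch in typed)


def to_typed(shown: str) -> str:
    """Map any accepted aleph/ayin sign back to the typed ) and ( forms."""
    return "".join(DISPLAY_TO_TYPED.get(ch, ch) for ch in shown)
```

Storing the ) and ( forms internally would keep lexical lookup keys independent of whichever quote character a font or keyboard happens to produce.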

But I'm not sure about the vowels, especially e.
I'd like to keep simple "e" for shewa, so the vowels would be:

        a  â  e  ?  ê  êy  i  î  o  ô  u  û 

I take your point about ë for tsere - it causes unnecessary confusion
though I'm not sure what to do about segol.
I'd like to use "e" for shewa for simplicity of transliteration, so we'd need a different accent for segol.
Suggestions?
Also, any suggestions for qames hatuph?

I didn't know about the diaeresis for a non-diphthong (e.g. hammaïm).
Do you know if this works in every case, i.e. does this problem only happen with "i"?


I'm also uncertain about how to tell people to type in a word.
I don't want them to bother with the accents on vowels, cos they are so hard to type,
but I do want them to mark the presence of letters used as vowels.
(I'm not sure why you think this is unimportant - how could they look up an entry with some letters missing?
That's why I gave the example of 'city' - the search algorithm needs to know that the "i" shouldn't be stripped out as a vowel)
If we used underlining of vowels which are letters, they could follow a simple general rule and type them as upper case.
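To make the 'city' example concrete: if an underlined (typed upper-case) i, o or u means the vowel is written with a yod or vav, the consonantal Hebrew can be rebuilt directly from what the user typed. The letter table below is deliberately tiny and purely illustrative, final letter forms are ignored, and `consonantal` is a hypothetical name.

```python
# Tiny illustrative table: typed symbol -> Hebrew consonant.
HEBREW = {
    ")": "\u05d0",  # aleph
    "(": "\u05e2",  # ayin
    "q": "\u05e7",  # qof
    "l": "\u05dc",  # lamed
    "r": "\u05e8",  # resh
    "I": "\u05d9",  # underlined i: written with yod
    "O": "\u05d5",  # underlined o: written with vav
    "U": "\u05d5",  # underlined u: written with vav
}
PLAIN_VOWELS = set("aeiou")  # vowels with no letter of their own


def consonantal(typed: str) -> str:
    """Rebuild the consonantal spelling, keeping yods/vavs only where marked."""
    return "".join(HEBREW[ch] for ch in typed if ch not in PLAIN_VOWELS)
```

Typed without the capital, "(ir" comes back as ayin-resh, which is exactly the missing-letter problem described above.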

So I would prefer for the vowels:


        a  â  e  ?  ê  êy  i  î  o  ô  u  û

or

        a  â  e  ?  ê  êy  i  iou

so that they would read _h_esed and ´_i_r
and they could type in Hesed and (Ir
or they could use h_esed or h.esed (better, esp if we can figure out how to display dots under letters)
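All of these typing conventions could be reduced to one form before lookup. A Python sketch, assuming that a letter followed by "_" or "." (or wrapped as "_h_", as in the examples elsewhere in this thread) means the same as typing that letter as a capital; `canonical_typed` is a hypothetical name.

```python
import re


def canonical_typed(word: str) -> str:
    """Reduce the alternative underline/underdot spellings to capitals."""
    # "_h_esed": marker on both sides of the letter -> "Hesed"
    word = re.sub(r"_([A-Za-z])_", lambda m: m.group(1).upper(), word)
    # "h_esed" or "h.esed": marker after the letter -> "Hesed"
    word = re.sub(r"([A-Za-z])[_.]", lambda m: m.group(1).upper(), word)
    return word
```

One consequence of the capitals rule is that users must not capitalise the first letter of an ordinary word, since "Hesed" and "hesed" would then stand for different consonants.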

I think the only important issue of disagreement left is:
> 4.* Tagging long o and long u because they happen to be represented by
> waws is pointless - anyone who knows any Hebrew will know this, anyone
> who doesn't won't care, and it doesn't help us in searches (see above).

You'll have to explain this one to me.
How do we look up a word if we don't know whether there are missing yods or vavs?
Clearly there are differences in plene spelling (esp at Qumran)
but it is easier to get rid of surplus yods and vavs than to have to worry that there might be a letter missing.
And someone who knows Hebrew well could guess the absence of a yod or vav.
But how do we get a computer to look up an identical form accurately?
We could probably get it to do a dictionary lookup without too many errors, but not a lookup of the actual form.
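One way to exploit this asymmetry (surplus yods and vavs are easy to remove, missing ones are hard to restore) is to index every consonantal form under a deliberately lossy key with all yods and vavs removed, then rank exact matches above spelling variants. A Python sketch; the function names are hypothetical and the key is intentionally crude.

```python
from collections import defaultdict

YOD, VAV = "\u05d9", "\u05d5"


def skeleton(form: str) -> str:
    """Lossy key: drop every yod and vav so plene and defective spellings meet."""
    return form.replace(YOD, "").replace(VAV, "")


def build_index(forms):
    """Group consonantal forms by their skeleton."""
    index = defaultdict(list)
    for form in forms:
        index[skeleton(form)].append(form)
    return index


def lookup(index, query: str):
    """Return (exact matches, other spellings sharing the skeleton)."""
    candidates = index.get(skeleton(query), [])
    exact = [f for f in candidates if f == query]
    variants = [f for f in candidates if f != query]
    return exact, variants
```

The skeleton alone gives the dictionary-level lookup; an exact-form answer is only possible when the query keeps its yods and vavs.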

I suspect that there is something simple which I have misunderstood.

I haven't got to your post about qamats yet, so perhaps the answer is there.


David IB


At 12:15 18/11/2009, Tyndale STEP Project wrote:
Tyndale STEP Project wrote:
[a fair bit which I'm entirely happy with has been excised. Will address
the qamats issue in a separate post]

> Typing is the main reason I have avoided dots under letters and complex
> accents over letters like S.
> I have aimed at something which uses only the extended ASCII character set.

Which extended character set are you looking at? I haven't found one
which contains underlined consonants - so at least for display and
typing there seems to be no difference between underline and underdot.
(Agreed for typing in that going for capitals makes sense.) Underline
has one advantage in that it's easier to see. However it has one
disadvantage in that other sources use it for bgdkpt differences, but
that's livable with.

> You asked:
>> * how are vocal schwa and the schwa-plus vowels represented?
>
> I don't think there is any need to distinguish because the
> shewa+vowel is simply the way that the vowel is written under letters
> like Aleph - ie every time a vowel occurs under an Aleph etc, a shewa
> is added. So we only need to record the vowel itself.

Unless I've misunderstood what you're saying, this isn't true. (Look at
"elohim" and "erets" in Gen 1:10 for instance - both with segols on
alephs, one with a shewa, one not.) However, I think it's a further
distinction in the transliteration we can disregard.
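If the distinction is disregarded, the simplest implementation is to transliterate each composite shewa exactly as its full vowel. A small Python illustration using the Unicode Hebrew point characters; only a handful of points are listed, and `vowels_in` is a hypothetical helper.

```python
# Unicode Hebrew vowel points -> transliterated vowel.
# Each hataf (composite shewa) shares the letter of its full vowel.
VOWEL_POINTS = {
    "\u05b6": "e",  # SEGOL
    "\u05b1": "e",  # HATAF SEGOL
    "\u05b7": "a",  # PATAH
    "\u05b2": "a",  # HATAF PATAH
    "\u05b3": "o",  # HATAF QAMATS (read as short o)
}


def vowels_in(pointed: str) -> list:
    """List the transliterated vowels for the points covered by the table."""
    return [VOWEL_POINTS[ch] for ch in pointed if ch in VOWEL_POINTS]
```

With this table the first vowel of "elohim" (hataf segol on aleph) and of "erets" (segol on aleph) both come out as e, which is the loss of distinction described above.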

> I don't share your concern that people are going to confuse this system
> with others, because other systems don't use underlining and they have
> lots of funny accents.
>
> You make a good point about samek being normally "s".
> Do you think it would be better if /sin/ were _s_ and samek s?
>
> I'm not sure about confusion with _t_. I think people will quickly get
> used to the idea that letters which normally have a dot under them in
> other systems tend to be underlined in this one.

Your first and third paragraphs can be paraphrased as:
* people are going to regard our system as completely different from the
standard ones.
* people are going to regard our system as the same as the standard ones
but with underlines replacing dots.

Which do you want? ;)

I don't know why I'm failing to convince you that when, as far as I'm
aware, _every single book in English_ which uses a scholarly-type
transliteration maps the consonants in almost the same way, doing
something which is subtly different (for a marginal advantage) is going
to confuse and frustrate people the minute they encounter other books.
I'll have one more go and then shut up.

1. Some changes between systems are fine. So if we _consistently_ use
underlines where other books use dots, that's an easy rule to remember.
(Particularly as they'll have to remember underline->capital for typing
things in.) So it's not a major problem that waw is w in some sources, v
in others, because they can mentally say "v and w" are the same.

2. Similarly, if we drop '` for aleph/ayin consistently, and use ' for a
vocal break instead, that's fine. (We'd presumably drop the ' when doing
the search, so people could type in ...aim words without the '.)

3. When we're remapping individual letters, however, this is going to
throw people. We expect (and hope) that people using our system will
internalise the transliteration and probably learn key words as well.
However, if they have learnt _h_e_s_ed from our system, and then
encounter h.esed in a book, they're going to do a double-take and think
"is this the same word or not" because their natural assumption is to
expect h.es.ed. Things are going to be worse coming the other way.
People are going to read a Hebrew transliterated word in a book, and
want to know what STEP says about it, so type the word into our browser
as Hesed (to type into STEP "convert dots -> capitals and ignore all
accents on vowels") and get back "word not found". And I would expect
even experts to make this mistake regularly. To be able to use STEP
alongside other resources, you're requiring people to carry around two
different transliteration schemes and some fairly subtle differences to
map between the two, and this is a lot of mental baggage that could be
more usefully employed in thinking about the theological point at issue!

4. It looks like accessing STEP resources by type-in is our worst
use-case, so let's consider what the major differences between
transliteration and type-in are:

a. dots (in standard)/underlines (in our system) -> capitals. Ignore
bgdkpt underlines in standard system for the (rare) sources that have
them. Nice and consistent. No problem.

b. ignore accents on vowels. Ditto. This gets rid of most of the
differences between transliteration schemes.

c. different use of '`. In general this isn't a problem, because we can
just tell people not to type in the '` (or ignore it if they do). The
only disadvantage is we lose the distinction between words that are
identical except for having an aleph or ayin. I don't think there are
too many of these though.

d. yod and waw as vowels.

e. different consonants and the multiple S problem. (In the standard
system, sin and shin are the only consonants written with an accent
other than dot.)
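Rules a to c, plus the standard ś/š accents from e, fit in a single normalisation pass. A Python sketch, assuming Unicode input with either precomposed dotted letters (ḥ) or combining marks; `type_in_key` is a hypothetical name, and d is left out here.

```python
import unicodedata

DOT_BELOW = "\u0323"  # combining dot below (standard systems)
LOW_LINE = "\u0332"   # combining low line (our underline)
# Aleph/ayin signs dropped by rule c: ' ` ’ ‘ ʾ ʿ ´ and the typed ( )
ALEPH_AYIN_SIGNS = set("'`\u2019\u2018\u02be\u02bf\u00b4()")
# Rule e: standard accented sin/shin -> typed digraphs
SIN_SHIN = {"\u015b": "ss", "\u0161": "sh"}  # ś, š


def type_in_key(text: str) -> str:
    """Reduce a transliteration from any source to the type-in form."""
    text = unicodedata.normalize("NFC", text)  # so ś/š are single code points
    for accented, digraph in SIN_SHIN.items():
        text = text.replace(accented, digraph)
    out = []
    for ch in unicodedata.normalize("NFD", text):
        if ch in (DOT_BELOW, LOW_LINE):
            if out:
                out[-1] = out[-1].upper()  # rule a: dot/underline -> capital
        elif unicodedata.combining(ch):
            # rule b: drop vowel accents; this also drops the macron
            # below (U+0331) some sources use for bgdkpt
            continue
        elif ch in ALEPH_AYIN_SIGNS:
            continue  # rule c: drop aleph/ayin signs
        else:
            out.append(ch)
    return "".join(out)
```

For example, `type_in_key("šālôm")` gives "shalom" and `type_in_key("ḥesed")` gives "Hesed".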

On d: we're proposing to render (eg) tsere-yod as ëy. Other systems vary
in whether they render these as "ey" or "e", with varied accents. It
would be helpful if people could type in the "e" form and get the right
entry back. This should be easy to do.
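Making "e" find an "ey" entry could be as simple as folding both forms to one key at index time and at query time. A Python sketch; the rule (a y after e counts as a vowel letter unless a vowel follows it) is a heuristic assumption rather than a settled part of the scheme, and `loose_key` is a hypothetical name.

```python
import re

# Heuristic: "ey" not followed by a vowel is tsere + yod used as a vowel letter.
TSERE_YOD = re.compile(r"ey(?![aeiou])")


def loose_key(typed: str) -> str:
    """Fold "ey" to "e" so that both spellings reach the same entry."""
    return TSERE_YOD.sub("e", typed)
```

Both the stored transliteration and the user's query would pass through `loose_key`, so "elohey" and "elohe" compare equal while a consonantal yod before a vowel is left alone.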

On e: if we were to adopt the standard system, we would need to find a
means of entering sin and shin. (s is samekh, S is tsade), and so we
need to make a change here anyway. One common option seems to be {,},
however "sh" for shin and perhaps "ss" for sin would work better, and we
could reasonably use these for the transliterations too. (Although see
below on how lookup is going to work - this may scupper this idea.)

This is the minimal change and just requires people to remember the
correspondence between ss and sh and the s's with the various accents -
it also doesn't contradict the basic dot/underline->capital rule.
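The catch with digraphs is that "sh" can also be samekh followed by he (and "ss" a doubled samekh), so a lookup that accepts them has to consider more than one segmentation. A small Python sketch that enumerates every reading over a toy inventory; the inventory and the name `parses` are illustrative only.

```python
# Toy inventory of typed tokens; None marks a vowel (no consonant).
TOKENS = {
    "sh": "shin", "ss": "sin", "s": "samekh", "S": "tsade",
    "h": "he", "H": "het", "l": "lamed", "m": "mem",
    "a": None, "e": None, "i": None, "o": None, "u": None,
}


def parses(typed: str) -> list:
    """Return every consonant sequence the typed string could stand for."""
    if not typed:
        return [[]]
    results = []
    for length in (2, 1):  # try digraphs first, then single letters
        piece = typed[:length]
        if len(piece) == length and piece in TOKENS:
            label = TOKENS[piece]
            for rest in parses(typed[length:]):
                results.append(([label] if label else []) + rest)
    return results
```

Here "shalom" yields both shin-lamed-mem and samekh-he-lamed-mem, so every "sh" or "ss" in a query produces more than one candidate for the lookup to resolve against known forms.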

> You ask about bothering with yod and vav when used as a vowel.
> I think we need to mark these for looking up as a concordance and lexicon.
> If they double-click on '_i_r (ie 'city') we need the software to look
> up ayin-yod-resh,
> so even if the user doesn't care whether the "i" is underlined, the
> software needs this information.

Where's this going to come from if the user types in the word "ir"? We
won't have the underlining information, or the ayin (it could have been
aleph). We're going to need a transliteration -> Hebrew table for these
entries, and probably also for words involving shin, so we know that sh
is shin and not sin (or samekh) + het.
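Such a table could be keyed on exactly what a bare query contains. A Python sketch, assuming the key drops aleph/ayin signs and folds case (i.e. it throws away the underline information a user typing "ir" would not supply); `search_key` and `build_table` are hypothetical names.

```python
from collections import defaultdict

# Signs dropped from the search key: typed ( ) and the quote forms.
ALEPH_AYIN_SIGNS = set("()'`\u2018\u2019")


def search_key(translit: str) -> str:
    """Drop aleph/ayin signs and fold case, as a bare user query would."""
    return "".join(ch for ch in translit if ch not in ALEPH_AYIN_SIGNS).lower()


def build_table(entries):
    """Map each search key to every Hebrew form that produces it."""
    table = defaultdict(list)
    for translit, hebrew in entries:
        table[search_key(translit)].append(hebrew)
    return table
```

Folding case also merges het with he and the other capital/lower-case pairs, so one key can return several candidates (e.g. "al" for both ʿal "upon" and ʾal "do not"), and the software would then have to offer a choice.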

Also, can I confirm we're requiring people to type in the vowels, and
won't permit "Hsd" for hesed? (If we permit consonant-only systems, we
have to be more careful with the transliteration to prevent ambiguities.)


So to summarise and try and focus future discussions, if given a free
rein, this is what I'd do, together with my opinion on proposed
differences. My starting point is the usual standards, and I'm trying to
make changes only where they're either needed or useful.

The ones I've marked with * are points which, to be honest, I now feel
sufficiently strongly about that I wouldn't bother trying to convince me
otherwise, and we'll just have to agree to disagree. I can work with
whatever you decide, however much I like it or not.

Consonants: start with the standard system
1. waw not standardised: w preferable, but v would be OK.
2. would probably prefer to mark bgdkpt differences (as some do) to aid
pronunciation, but not strongly.
3. underline instead of dot for display: seems a largely pointless
change, unless there are advantages for typing that I haven't seen.
4. dot/underline -> capital for typing in. Necessary change.
5. sin/shin -> "ss"/"sh" for typing in. Have to do something, and this
seems best.
6. sin/shin -> "ss"/"sh" in transliteration. Given 5 this seems a good
idea. Would be fine to stick with the standard accents though.
7. drop representation of aleph/ayin unless necessary for pronunciation.
Not sure if this is being proposed or not. Would prefer to keep them in
display, to link with what's in the Hebrew, but drop them for search.
Could live with losing them.
8.* Any other consonant changes. Don't gain us anything and just add
confusion. (So stick with samekh = s, tsade = _s_.)

Vowels: no one standard system to compare against.
1. I'd prefer long-short vowels to be marked (more information and easy
for the user to filter out) but no problem if we decide to drop this
information.
2.* However we mark vowels, it needs to be consistent, so if we're
distinguishing long/short vowels, use the standard circumflex for long
vowels, rather than the current mix (â is long, but ê and ô are
short!). I'm not convinced that "visual link" to Hebrew is a good
reason for these differences, particularly as it's not a strong link
and the accent sits above the letter.
3. yod as vowels. Adding y for yods seems reasonable and generally
reflects pronunciation. Would prefer to do it in all cases, but î is a
reasonable alternative for i plus yod.
4.* Tagging long o and long u because they happen to be represented by
waws is pointless - anyone who knows any Hebrew will know this, anyone
who doesn't won't care, and it doesn't help us in searches (see above).
5.* Similarly ë for tsere. Pointless and an unnecessary difference from
(I think) everyone else.
6. Effectively dropping schwa from schwa+ vowels. Fine, and no obvious
sensible way to represent them otherwise.
7. Vocal schwa. Can be either e or '. Not sure what's being proposed.
Either would be fine - e is more standard, ' reflects pronunciation
better, so undecided.
8. ' for representing "non-diphthong" in eg hammaim ending. Using a
diaeresis (hammaïm) is standard orthography (in linguistics generally,
including in NT Greek) so would recommend going for that - prevents
confusion with aleph/ayin as well. (I think we only need to do this with
a short i, correct? If we need to do it before a long vowel, obviously
this doesn't work so back to ').
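The diaeresis rule, with the fallback to ' before a long vowel, is easy to apply when syllables are joined for display. A Python sketch, assuming the transliteration is built syllable by syllable (an assumption about the pipeline, not something settled in this thread); `join_syllables` is a hypothetical name.

```python
# Plain vowels, circumflexed long vowels, and i with diaeresis.
VOWELS = set("aeiou\u00e2\u00ea\u00ee\u00f4\u00fb\u00ef")


def join_syllables(syllables) -> str:
    """Join syllables, marking a vowel that does not form a diphthong with the one before."""
    out = ""
    for syl in syllables:
        if out and out[-1] in VOWELS and syl[:1] in VOWELS:
            if syl[0] == "i":
                syl = "\u00ef" + syl[1:]  # short i: diaeresis, as in hammaïm
            else:
                syl = "'" + syl           # e.g. long vowel: fall back to '
        out += syl
    return out
```

So ["ham", "ma", "im"] joins to "hammaïm", while ["ma", "îm"] becomes "ma'îm" because î cannot also carry a diaeresis.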

--
Posted By Tyndale STEP Project to Tyndale STEP - Programming on 11/18/2009 04:15:00 AM
