Tyndale STEP - Programming: Re: [Tyndale STEP - History] Bible dictionary format

Tyndale STEP Project wrote:
> Colin, I think Dropbox works better than GoogleDocs.
> It is probably best to regard our use of GoogleDocs as a failed experiment.

Fair enough. This is what I thought, but wanted to check. Will transfer
it across.

> I like the idea of a default display of a Bible Reference, Geographical
> location, and/or timeline link for display.
> I don't think we need more than one, because people can click on refs or
> names or dates in the article itself.
> When there are many and no obvious default one, it is still nice to have
> one - so pick one arbitrarily.
> Where there aren't any, it might look silly to add something. We don't
> want "Aaron" appearing as a default person on every article where no
> people are referred to. So it is probably best to leave it blank and we
> can think about what to do in that situation - we might put the start of
> an alphabetic list, or a zoomed out timeline.

Good, that's what I thought. Just for clarity:
- each entry will have one pointer to the Bible, unless none makes sense.
- each entry will have one pointer to the timeline, unless none makes sense.
- each entry will have one pointer to the geographical location, unless
none makes sense.

On the third, we have all the minor characters in the genealogies, lists
of priests, etc. I guess these go to Israel unless there's a reason to
associate them with a particular place. (The Nehemiah lists could all
point to Jerusalem, I guess - this would be easy to do automatically.)

There's no intention to have default displays of people in articles,
since that would be a link to another article in the dictionary (the
other three are external resources).

This also raises an interface question. I'm assuming things would work
as follows:
- If the user clicks on a Bible reference in the article, the
appropriate text appears but we stay in the same article.
- If the user clicks on a link which is the name of a person, we go to
that article. If they put the mouse over that link, we display the
summary of that article.
- So far so good. What about links which are the name of a place, eg a
reference to Bethel? They will have an article attached, so we want the
user to be able to get to the article. But we would also like them to be
able to use the geographical information - for instance one natural
semantic would be that putting the mouse over the name could highlight
the location on the map. However this conflicts with the mouseover
bringing up the summary. We could probably resolve this by highlighting
links to places differently from links to everything else. This
information would be available. (Note that I'm making the assumption
that everything in the geographical database will have an associated
article in the bible database - this is virtually true already, since
Easton has articles on almost all placenames in the Bible.)

This issue doesn't affect my work in any way, since the information
needed to make these decisions is already in the table.

> Does each name have a unique ID? - ie when a person clicks on "Saul"
> could the link search for "Saul01" or "Saul02" depending on the article?

Absolutely. I've gone for "Saul (1)" and "Saul (2)" as the headwords
which works for both search and display - it's consistent over all
articles and ([number]) isn't used for anything else so it's easy to
handle. (I don't think parentheses appear in any other headword as it
happens.)
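
To show why the "Name (n)" convention is easy to handle, here is a minimal sketch in Python of splitting a headword into its name and number. The function name split_headword and the regular expression are my own for illustration; they are not part of any existing STEP code.

```python
import re

# Hypothetical helper: "Saul (2)" -> ("Saul", 2); "Bethel" -> ("Bethel", None).
# Assumes the only use of parentheses in a headword is the trailing " (n)".
HEADWORD_RE = re.compile(r"^(?P<name>.+?)(?: \((?P<num>\d+)\))?$")


def split_headword(headword):
    """Split a headword into (name, sense number or None)."""
    match = HEADWORD_RE.match(headword.strip())
    if match is None:
        raise ValueError("empty headword")
    num = match.group("num")
    return match.group("name"), (int(num) if num is not None else None)
```

Headwords without a number simply come back with None, so the same routine can be used for search and for display.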

It's easy to get the vast majority of such links done automatically: if,
for instance, the article in Easton for Zechariah (6) mentions a verse,
then link the occurrence of Zechariah in that verse to Zechariah (6). If
no such link occurs then link to Zechariah (1). Plus a few one-off rules
like "Saul in Acts is always Saul (2)". They'll need checking, but
most will be right.
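
The rule above could be sketched roughly as follows in Python. This is only an illustration under assumptions: the function name resolve_sense, the (book, chapter, verse) tuples and the shape of the two lookup tables are all hypothetical, and the verse data in the usage note is made up rather than taken from Easton.

```python
import re

SENSE_RE = re.compile(r"^(.+) \((\d+)\)$")


def resolve_sense(name, verse, article_verses, overrides=None):
    """Choose the headword for an occurrence of `name` in `verse`.

    verse          -- (book, chapter, verse) tuple
    article_verses -- dict: headword -> set of verses cited in that article
    overrides      -- dict: (name, book) -> headword, for one-off rules
    """
    book = verse[0]
    # One-off rules such as "Saul in Acts is always Saul (2)" win outright.
    if overrides and (name, book) in overrides:
        return overrides[(name, book)]
    # Otherwise look for an article of this name that cites the verse.
    matches = []
    for headword, verses in article_verses.items():
        m = SENSE_RE.match(headword)
        if m and m.group(1) == name and verse in verses:
            matches.append((int(m.group(2)), headword))
    if matches:
        return min(matches)[1]  # lowest-numbered article citing the verse
    # No article mentions the verse: fall back to the first entry.
    return f"{name} (1)"
```

For example, with article_verses = {"Zechariah (6)": {("2Chr", 24, 20)}} (illustrative data only), an occurrence in ("2Chr", 24, 20) resolves to "Zechariah (6)" and any other occurrence falls back to "Zechariah (1)".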

Fortunately, in almost all these cases, only the most significant people
appear in more than a single verse or a short range that is already
listed in Easton. The only case I can see which is likely to be a pain
is distinguishing references to Israel which refer to the whole land
from those which refer to the northern kingdom. (Some are ambiguous
anyway.)

> I like the idea of ranges, but I think the best way to store them is
> expanded into individual verses.
> For retrieval purposes, it would be best to have a database of every
> verse containing every name which occurs in it.
> This would look and act like a Bible version, so we can use the same
> search tools which already exist for searching Bibles.
> So every verse in Gen.1.26--5.5 would have the name "Adam01" in it,
> and every verse in Gen.1.27-4.25 would have the name "Eve01" in it,
> and each verse in Josh.3.14-17 would have "Adam02" in it
> OR (and this is easier) every verse which contains the name "Adam" would
> contain "Adam01" or "Adam02"

> The way this would be used:
> When someone is looking at a verse, the names of people and places would
> appear as a list or dropdown automatically.
> If we employ ranges then these names would include those who are not
> actually named in the verse but who are 'active' at the time.
>
> But I have no idea how you would construct these ranges!

If I supply a database with ranges, this can all be generated easily.
Start with "Adam" "Gen.1.26-5.5" "Adam (1)", run through your text of
those verses, and tag any which include the word "Adam" with "Adam (1)".

If you only want to do this tagging to distinguish people, we could be
cruder with our ranges - tag Adam (1) for the whole of Genesis, Adam (2)
for the whole of Joshua for instance.
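
A minimal sketch of that tagging step, in Python, might look like this. The function name tag_range and the representation of the Bible as a dict keyed by (book, chapter, verse) are assumptions for illustration, not the format of any existing STEP database.

```python
import re


def tag_range(bible, name, book, start, end, headword):
    """Tag verses of `book` between start and end that contain `name`.

    bible      -- dict: (book, chapter, verse) -> verse text
    start, end -- (chapter, verse) tuples, both inclusive
    Returns a dict: (book, chapter, verse) -> headword.
    """
    word = re.compile(r"\b" + re.escape(name) + r"\b")
    tags = {}
    for (b, chapter, verse), text in bible.items():
        in_range = b == book and start <= (chapter, verse) <= end
        if in_range and word.search(text):
            tags[(b, chapter, verse)] = headword
    return tags
```

Called as tag_range(bible, "Adam", "Gen", (1, 26), (5, 5), "Adam (1)"), it tags only the Genesis verses that actually contain the word; the cruder whole-book ranges would just use wider start and end values.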

If we want links for active people, then this will require some
hand-editing, but the number of people who feature for long enough
(more than one verse) is not actually that large, and the articles about
such people are the ones that will need attention anyway, so providing
the correct range won't be extra effort.

We can generate approximate ranges by doing something like the following
(making the assumption that we've correctly disambiguated people/places
with the same name):

- if we only have references in one chapter, create a range which
encompasses all of them. (So: if someone appears in 1 Kings 6, verses 5,
12, 13, 14 and 26, mark them as 1 Kings 6:5-26.)
- if we have references in two or more adjacent chapters, mark them for
the whole of those chapters. (So for the above plus two references in 1
Kings 7 and six in 1 Kings 8, put them down for 1 Kings 6-8.)

This is a reasonable starting point, and means that we won't miss any
when we come to hand-edit.
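
The two rules above can be sketched in Python as below. The function name approximate_ranges is hypothetical, and two details are my own assumptions rather than anything decided here: references in non-adjacent chapters are treated as separate ranges, and a lone reference is reported as a single verse.

```python
def approximate_ranges(book, refs):
    """Build approximate 'active' ranges from (chapter, verse) references.

    refs -- iterable of (chapter, verse) pairs for one person in one book,
            assumed to be already disambiguated.
    """
    by_chapter = {}
    for chapter, verse in refs:
        by_chapter.setdefault(chapter, []).append(verse)

    # Group chapters into runs of adjacent chapter numbers.
    runs = []
    for chapter in sorted(by_chapter):
        if runs and chapter == runs[-1][-1] + 1:
            runs[-1].append(chapter)
        else:
            runs.append([chapter])

    ranges = []
    for run in runs:
        if len(run) > 1:
            # Two or more adjacent chapters: take the whole chapters.
            ranges.append(f"{book} {run[0]}-{run[-1]}")
        else:
            verses = by_chapter[run[0]]
            first, last = min(verses), max(verses)
            if first == last:
                ranges.append(f"{book} {run[0]}:{first}")
            else:
                # One chapter: a range that encompasses every reference.
                ranges.append(f"{book} {run[0]}:{first}-{last}")
    return ranges
```

With the references from the first example, approximate_ranges("1 Kings", [(6, 5), (6, 12), (6, 13), (6, 14), (6, 26)]) gives ["1 Kings 6:5-26"]; adding references in chapters 7 and 8 gives ["1 Kings 6-8"].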

Active places might be harder, since there are a lot of problems as to
where things took place. Just like we had to make some arbitrary choices
with timing, there are plenty of events where we aren't told explicitly
in the text, but there are enough clues for scholars to express an
opinion, with whatever degree of certainty.

> What kind of editing do you mean? I have someone who is happy to do
> editing which doesn't involve programming.

From the automatic process we're going to end up with a copy of Easton
which is tagged with links to the Bible and cross-references to other
articles, plus information to tag the Bible with references into Easton.
This is being done by searching the Bible text, extracting all words
which match an Easton headword and tagging them. In the same process I'm
compiling a list of KJV->ESV spelling differences and converting over,
so our dictionary defaults to modern spelling.
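
As a rough illustration of that extraction pass (not the actual script), the Python sketch below tags single words that match a headword after applying a spelling table. The function name tag_verse and the sample spelling entries are assumptions; the real KJV->ESV list is the one being compiled.

```python
import re


def tag_verse(text, headwords, spelling):
    """Return (word as printed, headword) pairs found in one verse.

    headwords -- set of headword names in modern spelling
    spelling  -- dict mapping an older spelling to the modern one
    Only single words are matched, so multiple-word entries and
    inflected forms are missed and need adding by hand.
    """
    tags = []
    for word in re.findall(r"[A-Za-z]+", text):
        modern = spelling.get(word, word)  # convert spelling if listed
        if modern in headwords:
            tags.append((word, modern))
    return tags
```

For instance, tag_verse("as it was in the days of Noe", {"Noah"}, {"Noe": "Noah"}) returns [("Noe", "Noah")], so the verse keeps its printed spelling while the link goes to the modern headword.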

To get to a full "perfect" bible dictionary, we are going to need to do
the following. This isn't a suggested order since the tasks overlap, and
we can get to something usable without doing all of them.

1. Go through the Bible, check the references. Add references which this
process has missed (multiple word entries like "Mount Ephraim",
singular-plural-inflected cases like "Egyptians" to "Egypt", spelling
differences missed). Confirm the right person is being talked about
where it's ambiguous.

2. For the articles on people/places/groups:

a. Same process as 1 for cross-references.
b. Check/edit the default bible-reference, timeline-reference and
geographical-reference.
c. Check/edit the active-people (etc) ranges

(b and c are going to be trivial for most articles since the people
appear in only one verse)

3. For major articles (>50 words):
a. Review text. (As well as Easton, we have text for three other Bible
dictionaries, plus there's Wikipedia, but that largely draws on these
four sources, so we can either pick the best of those, or produce a
composite. I also have scans of the Hastings DB, which could be OCRed,
but the output is at the "just about comprehensible" stage only from the
trials I've done. However, it provides a lot more information so could
be used as a reference source if we're doing a composite.)
b. Write a <=50 word summary.
c. There are some articles we probably want to write from scratch,
notably "historical survey" articles, like "Paul's second missionary
journey" which doesn't occur in any of the sources we have, I think. (In
general not difficult - this one would cover Paul wanting to go back and
see churches, one or two major events, places he went and a link to a map.)

4. For articles on theological concepts:
a. Most of these are going to need a rewrite, and this obviously needs
appropriate theological knowledge.
b. "Commentary" articles (survey article on "The Gospel of Mark" for
instance) go in this category if we're intending to keep them rather
than relying on the actual commentaries we're supplying. I assume we aren't.

5. For "object" articles:
a. Some are fine as they stand.
b. Some have been superseded by modern archaeology (so we'll need to
compare equivalent articles in, say, an IVP dictionary, just to see
whether this is so.)
c. Some depend on translations (eg. the articles on animal/bird/plant
types typically say "the text uses [this term] but in reality it's more
one of [these]"). Here the text is the KJV or RSV and modern translations
often do something different, so these articles need review depending on
what our "reference translations" say. We're not going to want to cite,
in all cases, which translations use the particular word or not, so we
need some formulation.

6. Language and other standardisations:
a. For (eg) a Hebrew name, our sources either render it in
transliteration, in Hebrew (with points or not), or don't render it at
all. We'd want a standard way of doing this. (Probably full Hebrew which
we can then autotransliterate.)
b. Other "style guide" things for consistency.
c. Spell-checking/proofreading/general review.

The easiest way to proceed is to do 1 and 2 (for the short articles) in
parallel in Bible order, and that's what I was intending to do next
(automatic extraction is virtually done). I'm writing a mini-GUI (in
Perl/Tk, since it's something I know already - could do the same in
Javascript but I don't know enough of it) to do this - effectively it
just presents things in the right order, enables you to see what bible
reference/article is tagged by clicking on it, and edit if need be. It
also provides an easy way to type and edit Hebrew and Greek (which I've
stolen from another program). Crude, a few hours to get working, but
will save a lot of pain later.

3/4/5 can be done article-by-article, and that seems like the best thing
to give our editor to do. (I don't know how they are theologically, so
they might want to avoid some of the articles in 4.) I can supply them
with a list of articles, the text from our various sources and the
relevant page scans from Hastings (if that's useful) and they can then
edit them in whatever text/WP package they choose, return them in .txt
format and we can reparse and put back in the DB when they're done.

All make sense?

Colin

Wednesday, 25 November 2009
