Wednesday, 25 November 2009

Dictionary: scale of task

Just done some statistics on what we have out of Easton, to give a feel
for the amount of work required.

There are 5079 articles in total. (This is after splitting articles that
cover two or more people with the same name into separate articles.)

Of these, 2105 are (human) people, 991 are geographical (locations or
regions), 692 are objects, 474 are "non-transliterated words" or "words
used in odd senses in the KJV", 370 are theological concepts, 144 are
groups of people, and the rest total 303.

2980 are 50 words or under (so the entire article can serve as the
summary), 2099 are over. 105 articles are 500 words or over, and 23
exceed 1000 words (so would need editing down, but most of the long
articles will need attention for content anyway). The longest article is
on David at 3330 words, closely followed by Paul at 3310.
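
A rough sketch of how length figures like these could be produced, assuming the articles are already loaded into a dict mapping headword to article text. The function name `length_stats` and the whitespace-based word count are assumptions for illustration; a different tokenisation would shift the numbers slightly.

```python
def length_stats(articles, summary_limit=50, long_limit=500, very_long_limit=1000):
    """Bucket articles by word count.

    articles: dict mapping headword -> article text (assumed layout).
    Words are counted by splitting on whitespace, which is only approximate.
    """
    counts = {head: len(text.split()) for head, text in articles.items()}
    sizes = list(counts.values())
    stats = {
        "total": len(sizes),
        # short enough that the whole article can serve as the summary
        "short": sum(1 for n in sizes if n <= summary_limit),
        "over": sum(1 for n in sizes if n > summary_limit),
        "long": sum(1 for n in sizes if n >= long_limit),
        "very_long": sum(1 for n in sizes if n > very_long_limit),
    }
    # the two longest articles, longest first
    longest = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)[:2]
    return stats, longest
```

The "short" and "over" buckets always add up to the total, which is a handy sanity check on the counts.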


In terms of linking the Bible up, I did a preliminary run to try and
find all proper names (in the KJV) which don't correspond to a headword
in Easton (so would need manual attention), by extracting words with a
capital letter not at the start of a sentence. Disregarding "I", "Lord"
and a few others, there are still about 12000 (i.e., one per 3 verses), but
it looks like the majority are the beginning of speech, which the KJV
marks off with a comma but no speech marks, so they can't be
distinguished automatically. Probably not so bad to patch up.
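
A minimal sketch of this sort of pass, assuming the verses are available as a list of strings and the Easton headwords as a list. The function name `unmatched_names`, the ignore list, and the rule that every verse begins a new sentence are all assumptions for illustration, not a record of the actual run.

```python
import re
from collections import Counter

# Hypothetical ignore list; the real one would hold "a few others" too.
IGNORE = {"I", "Lord"}

# A word (letters, apostrophes, hyphens) or a sentence-ending mark.
TOKEN = re.compile(r"[A-Za-z][A-Za-z'-]*|[.?!]")

def unmatched_names(verses, headwords, ignore=IGNORE):
    """Count capitalised words that are not sentence-initial and have no headword."""
    known = {h.lower() for h in headwords}
    found = Counter()
    for verse in verses:
        at_start = True  # assumption: each verse opens with a capital regardless
        for token in TOKEN.findall(verse):
            if token in ".?!":
                at_start = True
                continue
            # drop a possessive ending so "David's" matches "David"
            word = token[:-2] if token.endswith("'s") else token
            if (word[0].isupper() and not at_start
                    and word not in ignore and word.lower() not in known):
                found[word] += 1
            at_start = False
    return found
```

Because a comma does not reset the sentence-start flag, the first word of quoted speech ("... said unto Saul, Let no man ...") is counted just like a real name, which is exactly the kind of false positive that would need patching up by hand.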
