Monday, 25 January 2010

Scripts uploaded, future plans

I've now sorted out the Perl scripts I've been working on and uploaded
them. Some will be of more use than others of course, but they include
code for parsing the dictionary as it stands (in a plain text format,
which is going to be easier for automated tools - DOC and RTF are both
pretty badly-specified...) and assembling things.

I haven't put the dictionaries under SVN on the grounds that the files may
change quite radically in the near future (as advised by Troy). The Perl
scripts could well go in though. I'm not sure how the editing workflow
is going to work out.

In terms of dictionary content, we now have plenty to be getting on
with, and the issue is more one of linking it up and editing it, I think.
I'm not sure what the most useful things to do next are. Options are:

1. Make sure the things that need to be Unicode-safe are. (Fairly
trivial, and definitely needs to be done.)

2. Set up "verse ranges" in the dictionary for efficiency, and also to
disambiguate people of the same name. (Also fairly important, I think,
and not too much work.)

3. Convert Easton headwords to ESV spelling of proper names. I think
this can largely be done automatically. May be a nice-to-do rather than
a priority though.

4. Try to get Hebrew and Greek headwords into articles. This might be
possible from the Wikipedia entries, but I haven't considered what the
practicalities are - we might find it needs a lot of manual work.

5. Try to join up articles in different dictionaries. I suspect this
may be too hard to do automatically, and would be more easily done by
the editor at the point when they're compiling articles.

6. Something else?

7. (Something completely different). I have access to the original
Expositor's Bible set of commentaries (late 19th century) which are in
the process of being digitised and proofed for Project Gutenberg and
CCEL. Would they be useful for STEP? I'm already going to be producing
them in three separate formats (plain text, HTML, CCEL-flavour XML) so a
fourth for STEP would be fairly minimal work. (A working day to write
appropriate conversion scripts, perhaps.) I've put a copy of the Hebrews
volume on the Dropbox so you can see the text.
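To make option 2 a bit more concrete, here is a minimal sketch of how verse ranges might be used, assuming each article carries one or more inclusive verse ranges and that verses are encoded as (book, chapter, verse) number tuples. The `Entry` class, the `lookup` function, the article IDs and the ranges are all hypothetical and for illustration only; they are not taken from the uploaded Perl scripts or from the dictionary data.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# A verse reference as (book number, chapter, verse). Tuples compare
# lexicographically, so ordinary <= works for range checks.
Verse = Tuple[int, int, int]


@dataclass
class Entry:
    headword: str
    article_id: str                     # hypothetical article identifier
    ranges: List[Tuple[Verse, Verse]]   # inclusive (start, end) pairs

    def covers(self, verse: Verse) -> bool:
        """True if the verse falls inside any of this entry's ranges."""
        return any(start <= verse <= end for start, end in self.ranges)


def lookup(entries: List[Entry], name: str, verse: Verse) -> Optional[Entry]:
    """Return the first entry for `name` whose verse ranges contain `verse`.

    The cheap range check lets us skip articles that cannot apply to the
    passage, and it separates people who share the same name.
    """
    for entry in entries:
        if entry.headword == name and entry.covers(verse):
            return entry
    return None


# Illustrative data only: two articles sharing a headword, told apart by range.
ENTRIES = [
    Entry("Example", "example-1", [((1, 1, 1), (1, 10, 32))]),
    Entry("Example", "example-2", [((2, 3, 1), (2, 5, 20))]),
]

if __name__ == "__main__":
    hit = lookup(ENTRIES, "Example", (2, 4, 7))
    print(hit.article_id if hit else "no match")  # prints example-2
```

A linear scan is fine for a sketch; with a full dictionary one would probably index entries by headword first, so the range check only runs over articles that share a name.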

For reference, I'm now looking at full-time jobs which will cut down
what I'm able to do for STEP. However (on the assumption that they'll be
reasonably paid) that means I'll probably be able to offer what time I
do give for free. I'll keep you posted.

Colin
