Wednesday, 13 January 2010

Wiki Bible Dictionary

Colin,
You are probably wise not to push their servers too much, but I think they allow a huge amount before they complain
(e.g. people sharing large video files with lots of users).
I looked inside the zip (A to G) - looks great!
Don't worry if the headwords are a little non-standard. A human will be looking at these.

I was amazed to see Unicode in the TXT files - how does that work?
Am I behind the times?

David IB

David,

Yes, Bible places is in there too (as well as a lot of stuff we don't
want - went for an inclusivist approach on the first pass).

I've uploaded what I have to the dropbox (A-G - the rest of the alphabet
is downloading at one page per 5 seconds to be nice to their servers) so
you can have a look. The format is Unicode text, which is what it arrived
in, but it could easily be converted to something else, and the excess
Wiki formatting could just as easily be discarded.
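
As a rough illustration of discarding the Wiki formatting, here is a minimal
Python sketch. It is not the script used for the download. The function name
strip_wiki_markup is made up for this example, and it only handles a few
common constructs (non-nested templates, links, bold/italic quotes and
headings), so real pages would need more care.

```python
import re


def strip_wiki_markup(text: str) -> str:
    """Remove a few common wiki constructs (hypothetical helper, not exhaustive)."""
    # Drop simple, non-nested templates such as {{citation needed}}.
    text = re.sub(r"\{\{[^{}]*\}\}", "", text)
    # [[Target|label]] -> label, [[Target]] -> Target.
    text = re.sub(r"\[\[(?:[^\[\]|]*\|)?([^\[\]|]*)\]\]", r"\1", text)
    # Remove the runs of apostrophes used for bold and italic.
    text = re.sub(r"'{2,5}", "", text)
    # == Heading == -> Heading
    text = re.sub(r"^=+\s*(.*?)\s*=+\s*$", r"\1", text, flags=re.MULTILINE)
    return text


sample = (
    "== History ==\n"
    "'''Beersheba''' is a city in the [[Negev|Negev desert]] "
    "of [[Israel]].{{citation needed}}"
)
print(strip_wiki_markup(sample))
```

Because the function works on Python strings, the result can be written out
in whatever encoding is wanted afterwards.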

Linking by headword can obviously be done if the headwords are
identical, but it gets harder with different spellings (Soundex
may work but will give false positives). This goes for the various
different PD dictionaries as well as Wikipedia, of course.
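
To make the Soundex point concrete, here is a minimal Python sketch of the
standard American Soundex code. The function names and the sample headwords
are made up for illustration and are not taken from the downloaded data. It
shows both a spelling variant that Soundex links correctly and unrelated
headwords that collide.

```python
from collections import defaultdict

# Standard American Soundex letter groups.
_CODES = {}
for _letters, _digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(word: str) -> str:
    """Return the 4-character Soundex code, or '' if the word has no letters."""
    letters = [c for c in word.upper() if c.isalpha()]  # hyphens etc. are dropped
    if not letters:
        return ""
    result = letters[0]
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        digit = _CODES.get(ch, "")
        if digit and digit != prev:
            result += digit
        prev = digit  # a vowel resets prev, so a repeated code is kept
    return (result + "000")[:4]


def group_by_soundex(headwords):
    """Group headwords that share a Soundex code (hypothetical helper)."""
    groups = defaultdict(list)
    for word in headwords:
        groups[soundex(word)].append(word)
    return dict(groups)


words = ["Beersheba", "Beer-sheba", "Haran", "Hiram", "Elisha", "Elijah"]
for code, members in group_by_soundex(words).items():
    print(code, members)
```

Here "Beersheba" and "Beer-sheba" share a code, which is the intended match,
but "Haran" and "Hiram" also share one, as do "Elisha" and "Elijah". A human
would still need to check each candidate link.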

Colin

--
Posted By Tyndale STEP Project to Tyndale STEP - Programming on 1/13/2010 05:16:00 AM
