> Colin,
> You are probably wise not to push their servers too much but I think
> they allow a huge amount before they complain
> (eg people sharing large video files with lots of users).
> I looked inside the Zip to G - looks great!
> Don't worry if the headwords are a little non-standard. A human will be
> looking at these.
>
> I was amazed to see Unicode in the TXT files - how does that work?
> Am I behind the times?
The full set is now uploaded.
The 5-second delay between requests comes from the general scraping
etiquette I've seen. It's not like we're in a major rush... I just set
the computer running and let it work in the background.
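
For anyone curious, here is a minimal sketch of that kind of pause in Python. It is not the actual script: `fetch_politely`, `fetch` and `delay_seconds` are hypothetical names made up for illustration, and the download step is passed in as a function so the sketch doesn't assume any particular site or library.

```python
import time
from typing import Callable, Iterable, List


def fetch_politely(
    urls: Iterable[str],
    fetch: Callable[[str], str],
    delay_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Fetch each URL in turn, pausing between consecutive requests.

    `fetch` is whatever function actually downloads a page (hypothetical
    here); `sleep` can be swapped out so the loop can be tried without
    really waiting.
    """
    results = []
    for index, url in enumerate(urls):
        if index > 0:
            # Wait before every request except the first one.
            sleep(delay_seconds)
        results.append(fetch(url))
    return results
```

With three URLs this pauses twice, so the whole run takes roughly ten seconds longer than it would without the delay, which hardly matters for a job left running in the background.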
Afraid you are a bit behind the times! It's been possible to have
Unicode in .txt files for ages. A .txt file is just a run of bytes, and
the encoding decides how those bytes map to characters. These ones use
UTF-8, the most common Unicode encoding: plain ASCII characters take one
byte each and the more complex characters take two to four bytes.
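
If you want to see this for yourself, here is a small Python sketch. The sample characters are my own picks for illustration, not taken from the uploaded files, and `utf8_length` is just a hypothetical helper name.

```python
def utf8_length(ch: str) -> int:
    """Return how many bytes a string occupies when encoded as UTF-8."""
    return len(ch.encode("utf-8"))


# Illustrative characters: ASCII, accented Latin, a currency sign,
# and a musical symbol from outside the Basic Multilingual Plane.
for ch in ["a", "é", "€", "𝄞"]:
    encoded = ch.encode("utf-8")
    print(repr(ch), utf8_length(ch), encoded.hex())

# Encoding and then decoding gives back the original text unchanged.
text = "naïve café"
assert text.encode("utf-8").decode("utf-8") == text
```

The loop reports 1, 2, 3 and 4 bytes respectively, which is why a UTF-8 file containing only plain ASCII is byte-for-byte the same as an old-style ASCII .txt file.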
Colin