Crowdsourcing at Minerva
Our session at the Minerva conference took place this morning from 11.30 to 13.00. Before that I sat in on a session about APENET (Archives Portal Europe) and access to the Swedish National Archives’ digitized collection (AKA the Sondera project). It's not something I deal with regularly, but as an ex-Medieval studies student I found the post-presentation discussion very interesting. The idea of crowdsourcing, drawing on the archives’ own users to deal with tricky issues such as hard-to-decipher scripts, was new to me. Anyone who’s ever tried to read a medieval manuscript without amazing paleography skills will know how difficult it is to interpret the text. Giving digital archive users the option to post their own transcriptions of the documents tackles this problem by drawing on the expertise of people infinitely better at paleography than, well, most of us.
And this wasn’t the only crowdsourcing idea I came across today. After our session, which discussed some of the problems of using OCR (Optical Character Recognition) on Arabic text, an audience member came up to me and suggested that we look at the work of the reCAPTCHA project (“Digitizing books one word at a time”!). The project makes use of the verification tests (CAPTCHAs) we have all met at some point: the part of a website that asks you to look at a wavy image of a word and type it into a text box. Get it right and you are not designated a spambot, and can go on to register for the charity race, buy that toaster you have always wanted, and so on. reCAPTCHA tackles the problem of imperfect OCR in digitisation projects by sending words that computers cannot read out to the web as CAPTCHAs for humans to decipher. The answers are then fed back to the digitisation project. There’s a lot more about how it works here. Our audience member suggested it might work for Arabic script. I can safely say that no one on our panel had thought of this, and it's something we could look into…
So yes, despite some inevitable technical problems, the session went well, and we had plenty of questions and interested folk at the end. I’ll be adding the PowerPoint presentation to the IFLA website when I return from this trip, but for now: we heard about projects to digitise Palestinian newspapers at two libraries, the Givat Haviva Peace Library in Israel and the Al Aqsa Mosque library in East Jerusalem. IFLA, through the FAIFE Committee, has been supporting both projects so that they can share experience and expertise and develop best practice in the digitisation of Arabic newspapers. There is still a long way to go before these projects are truly successful, and that will depend on how much more support we can get for the work. Historical Palestinian newspaper collections exist in several archives in Israel and Palestine (they are also held at the Dayan Centre in Tel Aviv, Nablus University and the Jewish National and University Library in Jerusalem). Increased communication and co-operation between these institutions, and with those elsewhere working on the digitisation of Arabic script, would really move the project forward, and might even have an exponential effect on access to information through Israeli and Palestinian libraries and archives. We are hoping that the work between Givat Haviva and Al Aqsa might lead to something bigger.
Thanks today must go to Merav Mack of the Van Leer Institute for all of our panel organisation, to Dudu and Samira for coming down from Givat Haviva to give presentations, and to Qasem from Al Aqsa for his great presentation as well. And thanks to everyone who came up afterwards: lots of new contacts…