Workshop Using Large-Scale Text Collections for Research: Status and Needs

Date: Wednesday 21 November 2012 – Place: Huygens ING, The Hague, The Netherlands
This is the first workshop of the working group Using Large-Scale Text Collections for Research of the Network for Digital Methods in the Arts and Humanities (NeDiMAH). It is organized as a pre-conference workshop to the 9th conference of the European Society for Textual Scholarship, Editing Fundamentals: Historical and Literary Paradigms in Source Editing, 22-24 November 2012, Amsterdam, The Netherlands.
ICT tools and methods, such as information retrieval and extraction methods (including text and data mining), can reveal new knowledge from large amounts of textual data by extracting hidden patterns, analysing the results and summarising them in a useful format. The NeDiMAH working group ‘Using Large-Scale Text Collections’ (WP 5) will examine practices in this area, building on the work of corpus linguistics and related disciplines to develop a greater understanding of how large-scale text collections can be used for research.
The first workshop of the group takes place on 21 November 2012. The meeting will be used to take stock of the text corpora available to researchers from different disciplines in the participating countries and languages. How large are the available corpora? For what purposes were they created? What kinds of mark-up do they contain? Which tools are available to help mine the corpora? What is missing in both texts and tools to make a corpus useful for research disciplines other than the one it was originally created for?
The first part of the day-long workshop will consist of an introductory paper by the group leader, followed by short papers by the participants sketching the situation in their country and language(s) and the needs of their own research discipline. The rest of the day will be dedicated to discussion of the topics raised in the first part: What strengths are shared across the different countries, languages and disciplines? Is there overlap in the needs that were expressed? What can we learn from each other? Where can we push developments further through a shared approach? At the end of the day, the participants will have an overview of the current status. NeDiMAH will use the needs identified to decide on the topics of the next workshops and/or seminars to be organized by this working group.
Ten participants will be reimbursed for their travel and subsistence costs up to a maximum of € 700 per person. They will be selected on the basis of abstracts, to be submitted before 10 September 2012. Decisions will be mailed at the end of September 2012. Abstracts of 300-500 words should describe which country or area the applicant will deal with, listing the languages and time periods in question as well as the research discipline(s) covered. Preliminary remarks about concrete plans for the future development of a large corpus, or about specific needs already identified, are welcome. Abstracts can be sent to
Based on the submissions, the Steering Committee of NeDiMAH will select a diverse group of participants for reimbursement, making sure the workshop program covers as wide a range of countries, languages and research disciplines as possible. Applicants whose submissions are not chosen for reimbursement are very welcome to join the workshop at their own cost (there is no fee for attending).