Workshop on Natural Language Processing for Digital Humanities (NLP4DH)
The Workshop on Natural Language Processing for Digital Humanities is co-located with ICON 2021, and its proceedings are published in the ACL Anthology. The workshop will take place virtually on the 19th of December 2021.
The focus of the workshop is on applying natural language processing techniques to digital humanities research. Topics can be anything of interest to the digital humanities that has a natural language processing or generation aspect. Suitable topics include, but are not limited to:
- Text analysis and processing related to humanities using computational methods
- Dataset creation and curation for NLP (e.g. digitization, digitalization, datafication, and data preservation)
- Research on cultural heritage collections such as national archives and libraries using NLP
- NLP for error detection, correction, normalization, and data denoising
- Generation and analysis of literary works such as poetry and novels
- Analysis and detection of text genres
Paper submission
We solicit original and unpublished work related to digital humanities and natural language processing. Short papers can be up to 4 pages in length and long papers up to 8 pages. Both submission formats can have an unlimited number of pages for references. All submissions must follow the ACL stylesheet (Overleaf template). We don’t accept submissions that consist of an abstract only.
Submissions must be anonymous and will be peer-reviewed by our program committee. The peer review is double-blind. Please see “Paper Submission Information” on the main conference website for more information.
Papers must be submitted using SoftConf by the workshop deadline. At least one of the authors of an accepted paper must register for the main conference and present the paper.
Accepted papers (short and long) will be published in the workshop proceedings that will appear in the ACL Anthology. Accepted papers will also be given an additional page to address the reviewers’ comments. A camera-ready submission can therefore be up to 5 pages for a short paper and up to 9 pages for a long paper, with an unlimited number of pages for references.
New: The authors of accepted papers will be invited to submit an extended version of their workshop paper to a special issue of the Journal of Data Mining & Digital Humanities.
Important dates
- Paper submission (full and short): the 14th of November 2021 (extended)
- Notification of acceptance: the 28th of November 2021
- Camera-ready deadline: the 5th of December 2021
- Workshop: the 19th of December 2021
Workshop schedule
All times are in Finnish time (GMT+2).
December 19, 2021 | |
---|---|
9:45 – 10:00 | Opening |
10:00 – 11:00 | Session 1: Sentiment |
10:00 – 10:20 | Sentiment Dynamics of Success: Fractal Scaling of Story Arcs Predicts Reader Preferences |
Yuri Bizzoni, Telma Peura, Mads Thomsen and Kristoffer Nielbo | |
10:20 – 10:40 | The Validity of Lexicon-based Sentiment Analysis in Interdisciplinary Research |
Emily Öhman | |
10:40 – 11:00 | How Does the Hate Speech Corpus Concern Sociolinguistic Discussions? A Case Study on Korean Online News Comments |
Won Ik Cho and Jihyung Moon | |
11:00 – 11:15 | Coffee break |
11:15 – 12:15 | Session 2: Historical data |
11:15 – 11:35 | MacBERTh: Development and Evaluation of a Historically Pre-trained Language Model for English (1450-1950) |
Enrique Manjavacas Arevalo and Lauren Fonteyn | |
11:35 – 11:55 | Named Entity Recognition for French medieval charters |
Sergio Torres Aguilar and Dominique Stutzmann | |
11:55 – 12:15 | Processing M.A. Castrén’s Materials: Multilingual Historical and Handwritten Manuscripts |
Niko Partanen, Jack Rueter, Khalid Alnajjar and Mika Hämäläinen | |
12:15 – 13:15 | Lunch |
13:15 – 14:15 | Session 3: Literature |
13:15 – 13:35 | Lotte and Annette: A Framework for Finding and Exploring Key Passages in Literary Works |
Frederik Arnold and Robert Jäschke | |
13:35 – 13:55 | Using Referring Expression Generation to Model Literary Style |
Nick Montfort, Ardalan SadeghiKivi, Joanne Yuan and Alan Zhu | |
13:55 – 14:15 | The concept of nation in nineteenth-century Greek fiction through computational literary analysis |
Fotini Koidaki, Despina Christou, Katerina Tiktopoulou and Grigorios Tsoumakas | |
14:15 – 14:30 | Coffee break |
14:30 – 16:00 | Session 4: Posters |
14:30 – 16:00 | Logical Layout Analysis Applied to Historical Newspapers |
Nicolas Gutehrlé and Iana Atanassova | |
14:30 – 16:00 | “Don’t worry, it’s just noise”: quantifying the impact of files treated as single textual units when they are really collections |
Thibault Clérice | |
14:30 – 16:00 | NLP in the DH pipeline: Transfer-learning to a Chronolect |
Aynat Rubinstein and Avi Shmidman | |
14:30 – 16:00 | Using Computational Grounded Theory to Understand Tutors’ Experiences in the Gig Economy |
Lama Alqazlan, Rob Procter and Michael Castelle | |
14:30 – 16:00 | Can Domain Pre-training Help Interdisciplinary Researchers from Data Annotation Poverty? A Case Study of Legal Argument Mining with BERT-based Transformers |
Gechuan Zhang, David Lillis and Paul Nulty | |
14:30 – 16:00 | Japanese Beauty Marketing on Social Media: Critical Discourse Analysis Meets NLP |
Emily Öhman and Amy Metcalfe | |
14:30 – 16:00 | Text Zoning of Theater Reviews: How Different are Journalistic from Blogger Reviews? |
Mylene Maignant, Thierry Poibeau and Gaëtan Brison | |
14:30 – 16:00 | Word Sense Induction with Attentive Context Clustering |
Moshe Stekel, Amos Azaria and Shai Gordin | |
14:30 – 16:00 | Transferring Modern Named Entity Recognition to the Historical Domain: How to Take the Step? |
Baptiste Blouin, Benoit Favre, Jeremy Auguste and Christian Henriot | |
14:30 – 16:00 | TFW2V: An Enhanced Document Similarity Method for the Morphologically Rich Finnish Language |
Quan Duong, Mika Hämäläinen and Khalid Alnajjar | |
14:30 – 16:00 | Did You Enjoy the Last Supper? An Experimental Study on Cross-Domain NER Models for the Art Domain |
Alejandro Sierra-Múnera and Ralf Krestel | |
14:30 – 16:00 | An Exploratory Study on Temporally Evolving Discussion around Covid-19 using Diachronic Word Embeddings |
Avinash Tulasi, Asanobu Kitamoto, Ponnurangam Kumaraguru and Arun Balaji Buduru | |
Organizers
Mika Hämäläinen, Rootroo Ltd and University of Helsinki
Khalid Alnajjar, Rootroo Ltd and University of Helsinki
Niko Partanen, University of Helsinki
Jack Rueter, University of Helsinki
You can contact us by email at hello@rootroo.com.
Program committee
Iana Atanassova, Université de Bourgogne Franche-Comté
Yuri Bizzoni, Aarhus University
Miriam Butt, University of Konstanz
Jeremy Bradley, University of Vienna
Won Ik Cho, Seoul National University
Stefania Degaetano-Ortlieb, Saarland University
Quan Duong, University of Helsinki
Valts Ernštreits, University of Latvia Livonian Institute
Luke Gessler, Georgetown University
Hugo Gonçalo Oliveira, University of Coimbra
Kenichi Iwatsuki, ARIKTTA
Maciej Janicki, University of Helsinki
Heiki-Jaan Kaalep, University of Tartu
Maximilian Koppatz, Sanoma Media Finland
Mikko Kurimo, Aalto University
Leo Leppänen, University of Helsinki
Enrique Manjavacas Arevalo, University of Leiden
Matej Martinc, Jozef Stefan Institute
Flammie Pirinen, UiT The Arctic University of Norway
Lidia Pivovarova, University of Helsinki
Tyler Shoemaker, University of California, Davis
Liisa Lotta Tarvainen-Li, Acolad
Jörg Tiedemann, University of Helsinki
Jouni Tuominen, Aalto University
Linda Wiechetek, UiT The Arctic University of Norway
Joshua Wilbur, University of Tartu
Shuo Zhang, Bose Corporation
Emily Öhman, Waseda University