Carola Trips, chairholder of English Linguistics/Diachrony: Natural Language Processing (November 2022)
What is your current research topic?
My current research project is part of the DFG research unit “Structuring the Input in Language Processing, Acquisition and Change” (SILPAC, FOR 5157). Here we are investigating language change from a psycholinguistic perspective, i.e. how changes in the input and during language processing may lead to language change.
For those who have not yet delved deeply into the topic of Data Science: How would you explain to a child what you are working on?
What I am working on can probably be described as “language archaeology”: I use texts from Medieval English and try to find out how people back then spoke.
Everyone talks about Data Science – how would you describe the importance of the topic for yourself in three words?
Corpus linguistics, linguistic annotation, natural language processing
What points of contact with Data Science does your work have? Which methods do you already use, and which would be interesting for you in the future?
I primarily work with linguistically annotated corpora. Enriching existing corpora with further information that can be extracted automatically (e.g. verb lemmatisation) is often necessary to answer a research question. These annotations are a prerequisite for the use of quantitative and statistical methods. Since I'm also interested in language contact in historical translations, methods and tools that align texts and make them comparable (historical parallel corpora) will be of interest to me in the future.
How high is the value of Data Science for your work? Would your research even be possible without Data Science?
According to my definition of Data Science, the focus is on Natural Language Processing. Text annotation and the corresponding tools are highly valuable for my research, at least for quantitative studies. However, the non-linguistic, philological properties of these texts shouldn't be neglected. The kind of historical linguistics that I pursue in my projects is heavily shaped by Data Science, but it combines both approaches in a fruitful way.
What development opportunities do you see for the topic of Data Science in relation to your field?
Working with annotated corpora is essential for historical linguistics today, and fortunately many good corpora exist for older periods of a language (cross-linguistically). One problem that I have encountered, however, is that not many corpus resources are shared within the community yet, often because they are developed in research projects and are not known or not available to everyone. This is a point that should definitely be improved.