Dr. Dmitry Ustalov

Post-Doctoral Researcher

B6, 26, Room C 1.02
D-68159 Mannheim

Email: dmitry (at) informatik.uni-mannheim.de

Research Group: Natural Language Processing and Information Retrieval

About

I joined the Data and Web Science group in November 2017. My research focuses on computational semantics, especially word sense induction and disambiguation, and on automatic thesaurus construction and evaluation using unstructured data and crowdsourcing. In February 2018, I defended my Kandidat Nauk (PhD) thesis.

I am working on the JOIN-T project funded by the Deutsche Forschungsgemeinschaft (DFG).

You can find my publications listed on Google Scholar, Scopus, dblp, and arXiv.

My ORCID iD is https://orcid.org/0000-0002-9979-2188 and my ResearcherID is P-6307-2014.

Research Interests

  • Natural Language Processing
  • Computational Semantics
  • Crowdsourcing

Selected Results

  • Watset, an efficient meta-algorithm for fuzzy graph clustering. This algorithm creates an intermediate representation of the input graph that naturally reflects the “ambiguity” of its nodes. Then, it uses hard clustering to discover clusters in this intermediate graph. Watset shows excellent results on the synset induction task for multiple languages, as reported in our ACL 2017 paper: doi:10.18653/v1/P17-1145.
  • Hyperstar, a regularized projection learning approach that transforms hyponym word embeddings into the corresponding hypernym word embeddings. The asymmetry of the “is-a” semantic relation is enforced by adding a regularization term to the loss function. As a result, more accurate hypernyms are generated from the same data, as reported in our EACL 2017 paper: doi:10.18653/v1/E17-2087.
  • Mechanical Tsar, an open-source engine for microtask-based crowdsourcing. This highly customizable engine enables Web-based data annotation by volunteers invited either from the Internet or from a private crowd. Mechanical Tsar automatically handles task allocation, worker ranking, answer aggregation, and agreement assessment, as described in the paper: doi:10.15514/ISPRAS-2015-27(3)-25.
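The two-stage idea behind Watset can be illustrated with a toy sketch. This is not the published implementation: it assumes an undirected, unweighted graph given as an edge list, uses connected components as a stand-in for the hard clustering algorithm at both the local and the global stage, and disambiguates neighbours by simple context overlap. The function names `connected_components` and `watset` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def connected_components(nodes, edges):
    """Stand-in hard clustering: connected components found by iterative DFS."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, clusters = set(), []
    for start in sorted(nodes):  # sorted for deterministic sense indices
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            component.add(node)
            for other in adj[node]:
                if other not in seen:
                    seen.add(other)
                    stack.append(other)
        clusters.append(component)
    return clusters


def watset(edges, cluster=connected_components):
    """Toy Watset-style fuzzy clustering of an undirected, unweighted graph."""
    graph = defaultdict(set)
    for u, v in edges:
        graph[u].add(v)
        graph[v].add(u)

    # Step 1 (local): cluster each node's neighbourhood (the node itself
    # excluded); every local cluster becomes one sense (node, i) of that node.
    senses = {}
    for node in list(graph):
        neighbours = graph[node]
        local_edges = [(a, b) for a, b in combinations(sorted(neighbours), 2)
                       if b in graph[a]]
        for i, context in enumerate(cluster(neighbours, local_edges)):
            senses[(node, i)] = context

    senses_of = defaultdict(list)
    for node, i in senses:
        senses_of[node].append((node, i))

    # Step 2 (intermediate graph): for each neighbour in a sense's context,
    # link to the neighbour's sense whose context overlaps most with this
    # sense's context plus the node itself (ties go to the lower index).
    sense_edges = []
    for (node, i), context in senses.items():
        bag = context | {node}
        for neighbour in context:
            best = max(senses_of[neighbour],
                       key=lambda s: (len(bag & senses[s]), -s[1]))
            sense_edges.append(((node, i), best))

    # Step 3 (global): hard-cluster the sense graph, then drop sense indices,
    # so an ambiguous node may end up in several clusters.
    return [{node for node, _ in c} for c in cluster(list(senses), sense_edges)]


if __name__ == "__main__":
    edges = [("a", "b"), ("a", "hub"), ("b", "hub"),
             ("c", "d"), ("c", "hub"), ("d", "hub")]
    print(sorted(sorted(c) for c in watset(edges)))
    # [['a', 'b', 'hub'], ['c', 'd', 'hub']]
```

On two triangles that share the node `hub`, hard clustering of the original graph puts all five nodes into a single cluster, whereas clustering the intermediate sense graph splits `hub` into two senses, so it belongs to both resulting clusters.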

Invited Talks and Tutorials

Graph Clustering for Natural Language Processing at AINL 2018 in Saint Petersburg, Russia, October 17–19, 2018

Recent Publications

  • Logacheva, V., Teslenko, D., Shelmanov, A., Remus, S., Ustalov, D., Kutuzov, A., Artemova, E., Biemann, C., Ponzetto, S. P. and Panchenko, A. (2020). Word sense disambiguation for 158 languages using word embeddings only. In LREC 2020: Twelfth International Conference on Language Resources and Evaluation, May 11–16, 2020, Palais du Pharo, Marseille, France, Conference Proceedings (pp. 5943–5952). European Language Resources Association (ELRA): Paris.
  • Anwar, S., Ustalov, D., Arefyev, N., Ponzetto, S. P., Biemann, C. and Panchenko, A. (2019). HHMM at SemEval-2019 Task 2: Unsupervised frame induction using contextualized word embeddings. In NAACL HLT 2019: The International Workshop on Semantic Evaluation, Proceedings of the Thirteenth Workshop, June 6–7, 2019, Minneapolis, Minnesota, USA (pp. 125–129). Association for Computational Linguistics (ACL): Stroudsburg, PA.
  • Panchenko, A., Lopukhina, A., Ustalov, D., Lopukhin, K., Arefyev, N., Leontyev, A. and Loukachevitch, N. (2018). RUSSE'2018: a shared task on word sense induction for the Russian language. In Computational Linguistics and Intellectual Technologies: Papers from the Annual Conference “Dialogue” 2018: 24th International Conference on Computational Linguistics and Intellectual Technologies, May 30–June 2, 2018, Moscow (pp. 547–564). Dialogue, RSUH: Moscow, Russia.
  • Panchenko, A., Ustalov, D., Faralli, S., Ponzetto, S. P. and Biemann, C. (2018). Improving hypernymy extraction with distributional semantic classes. In LREC 2018, 11th International Conference on Language Resources and Evaluation, 7–12 May 2018, Miyazaki, Japan (pp. 1541–1551). European Language Resources Association, ELRA-ELDA: Paris.
  • Ustalov, D., Chernoskutov, M., Panchenko, A. and Biemann, C. (2018). Fighting with the sparsity of the synonymy dictionaries for automatic synset induction. In Analysis of Images, Social Networks and Texts: 6th International Conference, AIST 2017, Moscow, Russia, July 27–29, 2017, Revised Selected Papers (pp. 94–105). Lecture Notes in Computer Science, Springer: Berlin.
  • Ustalov, D., Panchenko, A., Biemann, C. and Ponzetto, S. P. (2018). Unsupervised sense-aware hypernymy extraction. In Proceedings of Konvens 2018 (Proceedings of the 14th Conference on Natural Language Processing), Vienna, Austria, September 19–21, 2018 (pp. 192–201). ÖGAI, Österreichische Akademie der Wissenschaften: Vienna.
  • Ustalov, D., Panchenko, A., Kutuzov, A., Biemann, C. and Ponzetto, S. P. (2018). Unsupervised semantic frame induction using triclustering. In The 56th Annual Meeting of the Association for Computational Linguistics: ACL 2018: Proceedings of the Conference, Vol. 2 (Short Papers), July 15–20, 2018, Melbourne, Australia (pp. 55–62). Association for Computational Linguistics: Stroudsburg, PA.
  • Ustalov, D., Teslenko, D., Panchenko, A., Chernoskutov, M., Biemann, C. and Ponzetto, S. P. (2018). An unsupervised word sense disambiguation system for under-resourced languages. In LREC 2018, 11th International Conference on Language Resources and Evaluation, 7–12 May 2018, Miyazaki, Japan (pp. 1018–1022). European Language Resources Association, ELRA-ELDA: Paris.
  • Chernoskutov, M. and Ustalov, D. (2017). Synonymy graph connectivity in graph-based word sense induction. In 2017 Siberian Symposium on Data Science and Engineering (SSDSE), Novosibirsk, Akademgorodok, Russia, 12–13 April 2017 (pp. 14–17). IEEE: Piscataway, NJ.
  • Panchenko, A., Marten, F., Ruppert, E., Faralli, S., Ustalov, D., Ponzetto, S. P. and Biemann, C. (2017). Unsupervised, knowledge-free, and interpretable word sense disambiguation. In The Conference on Empirical Methods in Natural Language Processing: Proceedings of System Demonstrations, September 9–11, 2017, Copenhagen, Denmark: EMNLP 2017 (pp. 91–96). Association for Computational Linguistics: Stroudsburg, PA.
  • Panchenko, A., Ustalov, D., Arefyev, N., Paperno, D., Konstantinova, N., Loukachevitch, N. and Biemann, C. (2017). Human and machine judgements for Russian semantic relatedness. In Analysis of Images, Social Networks and Texts: 5th International Conference, AIST 2016, Yekaterinburg, Russia, April 7–9, 2016, Revised Selected Papers (pp. 221–235). Communications in Computer and Information Science, Springer: Cham.
  • Ustalov, D. (2017). Expanding hierarchical contexts for constructing a semantic word network. In Computational Linguistics and Intellectual Technologies: Papers from the Annual Conference “Dialogue 2017”, Moscow, May 31–June 3, 2017 (pp. 369–381). Dialogue, RSUH: Moscow, Russia.
  • Ustalov, D., Arefyev, N., Biemann, C. and Panchenko, A. (2017). Negative sampling improves hypernymy extraction based on projection learning. In 15th Conference of the European Chapter of the Association for Computational Linguistics: Proceedings of the Conference, April 3–7, 2017, Valencia, Spain: EACL 2017; Vol. 2: Short Papers (pp. 543–550). Association for Computational Linguistics: Stroudsburg, PA.
  • Ustalov, D. and Panchenko, A. (2017). A tool for effective extraction of synsets and semantic relations from BabelNet. In 2017 Siberian Symposium on Data Science and Engineering (SSDSE), Novosibirsk, Akademgorodok, Russia, 12–13 April 2017 (pp. 10–13). IEEE: Piscataway, NJ.
  • Ustalov, D., Panchenko, A. and Biemann, C. (2017). Watset: automatic induction of synsets from a graph of synonyms. In The 55th Annual Meeting of the Association for Computational Linguistics: Proceedings of the Conference, July 30–August 4, 2017, Vancouver, Canada: ACL 2017; Vol. 1 (Long Papers) (pp. 1579–1590). Association for Computational Linguistics: Stroudsburg, PA.
  • Ustalov, D., Teslenko, D., Panchenko, A. and Chernoskutov, M. (2017). Mnogoznal: an unsupervised system for word sense disambiguation. In 2017 International Multi-Conference on Engineering, Computer and Information Sciences (SIBIRCON) (pp. 147–150). IEEE: Piscataway, NJ.
  • Braslavski, P., Ustalov, D., Mukhin, M. and Kiselev, Y. (2016). YARN: spinning-in-progress. In Proceedings of the Eighth Global WordNet Conference (GWC-16), January 27–30, Bucharest, Romania (pp. 58–65). Global WordNet Association: Bucharest.
  • Kiselev, Y., Ustalov, D. and Porshnev, S. (2016). Eliminating fuzzy duplicates in crowdsourced lexical resources. In Proceedings of the Eighth Global WordNet Conference (GWC-16), January 27–30, Bucharest, Romania (pp. 161–167). Global WordNet Association: Bucharest.
  • Ustalov, D. and Igushkin, S. (2016). Sense inventory alignment using lexical substitutions and crowdsourcing. In Proceedings of the International FRUCT Conference on Intelligence, Social Media and Web (ISMW FRUCT 2016), 28 August–4 September 2016, Saint Petersburg, Russia (pp. 56–61). IEEE: Piscataway, NJ.
  • Panchenko, A., Loukachevitch, N., Ustalov, D., Paperno, D., Meyer, C. and Konstantinova, N. (2015). RUSSE: the first workshop on Russian semantic similarity. In Computational Linguistics and Intellectual Technologies: Papers from the Annual Conference “Dialogue 2015”, Moscow, May 27–30, 2015 (pp. 89–105). Dialogue, RSUH: Moscow.
  • Ustalov, D. (2015). Crowdsourcing synset relations with genus-species-match. In Proceedings of Artificial Intelligence and Natural Language & Information Extraction, Social Media and Web Search (AINL-ISMW) FRUCT Conference, 9–14 November 2015, St. Petersburg, Russia (pp. 118–124). IEEE: Piscataway, NJ.
  • Ustalov, D. (2015). Russian thesauri as Linked Open Data. In Computational Linguistics and Intellectual Technologies: Papers from the Annual Conference “Dialogue 2015”, Moscow, May 27–30, 2015 (pp. 616–625). Dialogue, RSUH: Moscow.
  • Ustalov, D. (2015). TagBag: annotating a foreign language lexical resource with pictures. In Analysis of Images, Social Networks and Texts: 4th International Conference, AIST 2015, Yekaterinburg, Russia, April 9–11, 2015, Revised Selected Papers (pp. 361–369). Communications in Computer and Information Science, Springer: Cham.
  • Ustalov, D. (2015). Towards crowdsourcing and cooperation in linguistic resources. In Information Retrieval: 8th Russian Summer School, RuSSIR 2014, Nizhny Novgorod, Russia, August 18–22, 2014, Revised Selected Papers (pp. 348–358). Communications in Computer and Information Science, Springer: Cham.
  • Ustalov, D. and Kiselev, Y. (2015). Add-remove-confirm: crowdsourcing synset cleansing. In 2015 9th International Conference on Application of Information and Communication Technologies (AICT 2015), Rostov-on-Don, Russia, 14–16 October 2015 (pp. 143–147). IEEE; Curran: Piscataway, NJ; Red Hook, NY.
  • Braslavski, P., Ustalov, D. and Mukhin, M. (2014). A spinning wheel for YARN: user interface for a crowdsourced thesaurus. In Proceedings of the Demonstrations at the 14th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2014, April 26–30, 2014, Gothenburg, Sweden (pp. 101–104). Association for Computational Linguistics: Stroudsburg, PA.
  • Ustalov, D. (2014). Enhancing Russian wordnets using the force of the crowd. In Analysis of Images, Social Networks and Texts: Third International Conference, AIST 2014, Yekaterinburg, Russia, April 10–12, 2014, Revised Selected Papers (pp. 257–264). Communications in Computer and Information Science, Springer: Cham.
  • Ustalov, D. (2014). NLPub: каталог и сообщество русских лингвистических ресурсов [NLPub: a catalogue and community of Russian linguistic resources]. In RCDL-2014: Selected Papers of the XVI All-Russian Scientific Conference “Digital Libraries: Advanced Methods and Technologies, Digital Collections”, Dubna, Russia, October 13–16, 2014 (pp. 56–60). CEUR Workshop Proceedings, RWTH Aachen: Aachen, Germany.
  • Ustalov, D. (2014). Words worth attention: predicting words of the week on the Russian Wiktionary. In Knowledge Engineering and the Semantic Web: 5th International Conference, KESW 2014, Kazan, Russia, September 29–October 1, 2014, Proceedings (pp. 196–207). Communications in Computer and Information Science, Springer: Cham.