Still has the tagger5/10/2023 In the example below, Dutch words are marked in italics: Speakers often fill in the lexical gaps with Dutch words and, in everyday communication, code-switching is almost the norm. Sranan Tongo has a somewhat small vocabulary. Zakharov is with the Mathematical Linguistics Department, Saint Petersburg State University, Saint Petersburg, Russia, ORCID 0000-00030522-7469 (e-mail: and, despite attempts at standardization, written Sranan Tongon has significant variation in spelling. Cortegoso Vissio is with the Mathematical Linguistics Department, Saint Petersburg State University, Saint Petersburg, Russia, ORCID 00000003-1683-7270 (e-mail: V. Although literary works have been published in Sranan Tongo since 1960, this is still primarily a spoken language used in everyday ![]() After the independence from the Netherlands, Dutch remained as the official language of Suriname, and for this reason, the press and the government administration are carried out in the language of the former metropolis. ![]() There is no corpus available for Sranan Tongo in the public domain and, since written material is scarce, compiling one is not an easy task. Dutch became the superstratum when the colony passed from British hands to the Netherlands.įrom an NLP perspective, Sranan Tongo is a low-resource language. English is the main lexifier of Sranan Tongo, while Gwe and other languages from west-Africa are considered to be its substratum. It also counts two hundred thousand speakers from the Surinamese diaspora living in The Netherlands. Nowadays Sranan Tongo is spoken by more than four hundred thousand people in urban areas along the coastline of the country and it is often used as lingua franca between the different ethnic groupManuscript received October 14, 2021.s. Like many other Atlantic Creoles, it emerged among the slaves that were brought to America five centuries ago to work on the plantations. Sranan Tongo (literally "language of Suriname") is the most widespread Creole of the Republic of Suriname in South America. Keywords- part-of-speech tagger, Sranan Tongo, low-resource, Hidden Markov Model A comparison is shown between the performance of the POS tagger on three texts before and after the inclusion of the new training data. The tagging results were hand-corrected and employed to retrain the model. For this matter, the tagger was used to annotate 2,406 sentences. In this new contribution, the development of the POS tagger for Sranan Tongo goes a step further with the addition of more training data. Since Sranan Tongo does not have a written corpus and text annotation is an expensive and time-consuming task, it was proposed to take a first step in training a POS tagger using only 550 hand-annotated sentences with part of speech tags. On that occasion, a rule-based stochastic hybrid part-of-speech tagger (POS) was introduced for Sranan Tongo, a Creole language from South America with around half a million speakers. Nicolás Cortegoso Vissio, Viktor ZakharovĪbstract-This paper is the continuation of a work submitted to the International Conference Corpus Linguistics 2021. Towards a part-of-speech tagger for Sranan Tongo ![]() This paper is the continuation of a work submitted to the International Conference Corpus Linguistics 2021.
0 Comments
Leave a Reply. |