R codes and curated dataset for “EnoLEX: A Diachronic Lexical Database for the Enggano Language”
This repository contains the R code and the curated dataset for the Shiny web application “EnoLEX, a diachronic lexical database for the Enggano language” (Krauße et al. 2024). The database is available at https://enggano.shinyapps.io/enolex/. See also the source GitHub repository and its Releases page.
This repository captures the latest version released on GitHub (1.0.0). For later updates, see the releases of the source repository on GitHub and its Zenodo archive.
EnoLEX represents a network of independent research materials consisting of a source dataset, this repository of R source code and curated data, the online database, and a conference paper.
------------------------------------------------------
This repository also includes the underlying data for the Austronesian Comparative Dictionary Online (ACD, see the file data/acd.rds), which must be cited independently (Blust, Trussel, and Smith 2023). This Cross-Linguistic Data Format (CLDF) dataset was read into R using the rcldf R package by Simon J. Greenhill. The ACD dataset is used to link an Enggano form to its reconstructed Proto-Austronesian and/or Proto-Malayo-Polynesian forms, which can then be viewed online in the ACD.
The pyconcepticon Python module was used to create an initial mapping of the English glosses to Concepticon concepts, followed by manual post-editing (changes are tracked in the GitHub repository).
Funding
Lexical resources for Enggano, a threatened language of Indonesia
Arts and Humanities Research Council
Enggano in the Austronesian family: Historical and typological perspectives
Arts and Humanities Research Council
Categories
- Database systems
- Historical, comparative and typological linguistics
- Lexicography and semantics
- Language documentation and description
- Digital heritage
- Digital curation and preservation
- Digital history
- Computational linguistics
- Corpus linguistics
- Linguistics not elsewhere classified
- Indonesian languages
- Linguistic structures (incl. phonology, morphology and syntax)
- Programming languages
- Data management and data science not elsewhere classified
- Indigenous data and data technologies
- Data communications
- Other Indigenous data, methodologies and global Indigenous studies not elsewhere classified
- Natural language processing
- Comparative language studies
- Language studies not elsewhere classified
- Data models, storage and indexing