Digital Museum: Crowdsourcing historical and cultural data in Namibian languages (Oshiwambo and Khoekhoegowab) for Automatic Speech Recognition and edutainment.
The project builds on ongoing efforts to reach the goal of co-creating 1 million sentences with war veterans and Indigenous Knowledge holders, to power speech and text technologies. It is part of Masakhane, a grassroots organisation whose mission is to strengthen and spur NLP research in African languages, for Africans, by Africans.
- WiMLDS Project Lead: Wilhelmina Onyothi Nekoko (Windhoek, Namibia) LinkedIn, Twitter
- Passionately curious, African, cultured, storyteller, wildlife enthusiast, ex-rugby player, software engineer, data scientist, Researcher @ Masakhane
We are seeking donations to help support this project!
To donate, please use our PayPal link and add a note that says “NamibiaNLP” so we know you’d like your funds to be directed towards this project.
Project Abstract
Namibia is home to 2.5 million people, with a rich culture and a colonial history spanning more than 100 years. However, the history, cultural practices, and knowledge of the Namibian people have not been told from their own perspectives. As Göring said at the Nuremberg trials, “The victor will always be the judge, and the vanquished the accused.” This project therefore builds on prior efforts to co-create cultural and historical texts, and now aims to capture a sizeable dataset of knowledge and practice, set in its historical and cultural context, for building tools for endangered and low-resource Namibian languages, towards:
- Storytelling for the healing and rehabilitation of veterans.
- Documenting the stories of the women of the liberation struggle for policy support.
- Providing data for Natural Language Processing (NLP) tasks.
- Building an automatic speech recognition system, so that Namibians have access to technology in their own languages, as well as the ownership needed to document and preserve their cultures further.
- Supporting edutainment: empowering creativity with content that tells the Namibian story, by Namibians for Namibians, so that history is told and not censored.
The project thus aims to crowdsource a text and speech dataset, in the participants' native languages, from a focus group of 90 war veterans and potentially from as many as 10,000 Namibian war veterans. The project will consider various data-gathering methods, such as interviews, surveys, ideation, and web apps, to capture the data. The speech data will be annotated and translated into English and other languages.
Project Media Coverage
Read about the poetic process in this feature article on Quartz Africa
Watch our invited talk “Culture in Data” at the Africa Futures and Beyond Session (2021),
and our invited talk “The wisdom of ages: Our Kinda Oil – Building NLP Datasets” at the dotdotAstronomy Conference (2020).