Preprocessing of Nepali News Corpus for Downstream Tasks

Published in Nepalese Linguistics, 2022

In this paper, we discuss a Nepali text preprocessing pipeline for generating a clean corpus. The pipeline is tested with a language model to observe the impact of each step on the learning task. The relevance of this work lies in systematizing the procedure for developing a standard Nepali corpus.

Recommended citation: Awale, S., Prasai, S., Rijal, B., & Basnet, S. B. (2022). Preprocessing of Nepali News Corpus for Downstream Tasks. Nepalese Linguistics, 1-6.