Current Projects

(2021-2023) Chinese Intelligent Language Learning (CHILL)
This is a Marie Skłodowska-Curie Action funded by the European Commission to work on the development of theoretical and computational implementations of the Mandarin NP in HPSG, along with the design of mal-rules targeting common errors made by learners of Mandarin Chinese. A full description of this project and its results can be found

(2019-) The Cantonese Wordnet
This project is done in collaboration with Joanna Sio, Palacky University. The project is still in its early stages of development, and aims to provide a linguistically rich lexicon of Hong Kong Cantonese. The project will be used for both documentation and research purposes.

(2018-) The Coptic Wordnet
The Coptic Wordnet is an international collaboration, coordinated by the University of Oslo. It is contextualized within the broader scope of Digital Humanities, specifically in Coptic Studies. Early experiments are being conducted in the field of text-reuse. The data is currently maintained on GitHub, here.

(2017-) iTELL: Intelligent Technologically Enhanced Language Learning
This project is the focus of my PhD studies. Its goal is to apply rich linguistic parsing models to grammatical error detection and correction. This work is being conducted for both English and Mandarin Chinese. It is a web-based open-source project, currently maintained on GitHub, here.

(2017-) NTUCLE: NTU Corpus of Learner's English
This learner corpus is a collaboration with the Language and Communication Centre at Nanyang Technological University. It contains writing samples from engineering students, annotated for grammatical and stylistic errors. The corpus contains both human-annotated and automatically annotated sections.

(2017-) The Open Kristang Wordnet (OKWN) & Pinchah Kristang
The Open Kristang Wordnet and its companion, Pinchah Kristang (an online dictionary), were developed within the context of Kodrah Kristang (“Awaken, Kristang”). Kodrah Kristang is a grassroots community initiative to revitalise Kristang, a critically endangered language in Singapore and Malaysia. I have volunteered with Kodrah Kristang since 2017, providing guidance and support to the digital efforts that aid Kristang's education and maintenance.

(2016-) Collaborative Interlingual Index (CILI) & Global Wordnet Grid (GWG)
As a member of the Global Wordnet Association, I am an active contributor to the global effort to link concepts across all languages in a single interlingual index. I am one of the main developers of the machinery enabling the online governance of this project, which allows multiple wordnet projects to collaboratively enrich and link to this interlingual index with concepts from wordnets around the world.

(2015-) The Chinese Open Wordnet (COW)
I've been involved with the development of the Chinese Open Wordnet (COW) since early 2015. This wordnet is one of multiple wordnets being developed in parallel with the NTU Multilingual Corpus (see below). I've assisted in coordinating multiple phases of its expansion, including: training students to sense-tag the NTU Multilingual Corpus using COW; adding classifiers, chengyu and exclamatives; coordinating efforts to translate Princeton WordNet definitions into Mandarin Chinese; and coordinating sense-tagging efforts of Mandarin educational data to evaluate and expand its potential as an educational tool.

(2014-) The Open Multilingual Wordnet
Within the Open Multilingual Wordnet (OMW), I have been one of the main contributors (both front-end and back-end development) to this rich multilingual semantic resource. I've been involved in decisions at multiple levels, from the redesign and implementation of database schemas to the development of the web interface that serves this resource. I was the main developer of OMWEdit, a web service to edit and expand wordnets. I've also been involved in the expansion of wordnet resources to include new concept classes like pronouns, determiners, classifiers and interjections.

(2014-) NTU Multilingual Corpus & IMI
I’ve been assisting with the development and tagging of this multilingual corpus since 2014. I’ve been a main contributor to IMI – A Multilingual Semantic Annotation Environment, which covers multiple levels of monolingual and cross-lingual annotation (e.g. sense and sentiment annotation), as well as tools to display and search this corpus.

Previous Projects

(2013-2014) QTLeap
Within the QTLeap project, at NLX-Group – University of Lisbon, I was mainly responsible for the maintenance and development of Language Resources for Deep Machine Translation, namely Portuguese Deep Parallel TreeBanking using LxGram.

(2011-2013) Centro Virtual Camões
My main responsibilities at the Centro Virtual Camões, Camões, I.P. – Institute for Cooperation and Language included managing and producing online content concerning Portuguese culture and language teaching worldwide; maintaining the Camões Digital Library; and supporting the e-learning center and the Portuguese language certification division.

(2009-2010) Orion: Portuguese Orientalism (19th-20th centuries)
As a member of the Centre for Comparative Studies – Faculty of Letters, University of Lisbon, I was mainly responsible for the conceptualization and development of an online database designed to collect and analyse primary written sources of Portuguese Orientalism.
