First, this action works towards an in-depth syntactical analysis and computational implementation of Mandarin Chinese Noun Phrases (NPs). It uses an implementable framework (Head-Driven Phrase Structure Grammar, HPSG) to design and test high-precision linguistic analyses for Mandarin Chinese text. The linguistic analysis will pay particular attention to NP quantification, reference/deixis, cognitive status and modification. The computational implementation of theoretical analyses will be made available as part of the Mandarin Resource Grammar.
Second, this action will also work towards exploiting the computational implementations produced as part of this project to create a high-precision error detection parser for Mandarin Chinese. To accomplish this, we use a concept known as "mal-rules". Mal-rules detect grammatical errors by allowing the violation or relaxation of linguistic constraints. They are able to pinpoint, with high precision, the source of language problems, and can be used to correct ungrammatical sentences or to provide corrective feedback on them.
This action is strongly data-driven and employs an interdisciplinary methodology, integrating standard methodologies of formal linguistic analysis with those of software development in Computer Science. It is situated within the context of DELPH-IN, an international consortium that shares a commitment to developing open-source resources for deep linguistic processing using Head-Driven Phrase Structure Grammar and Minimal Recursion Semantics.
Luis Morgado da Costa:
Luis is the ‘excellent fellow’ of this action. He is an interdisciplinary researcher with a strong background in computational linguistics. He recently completed his PhD in Using Rich Models of Language in Grammatical Error Detection, from the Interdisciplinary Graduate Program at Nanyang Technological University, Singapore. Luis brings ample expertise in computational linguistics, grammar engineering and in the development of grammatical error detection applications targeting educational contexts. lmorgado.dacosta@gmail.com / luis.morgadodacosta@upol.cz
Joanna Ut-Seong Sio:
Joanna is the supervisor of this action. She is a linguist and comedian, originally from Hong Kong. She received her PhD from Leiden University, in the Netherlands, where she worked on modification and reference in the Chinese nominal. Her research interests include Chinese languages, especially in the area of syntax and semantics, as well as the use of verbal arts in the training of communication skills. She brings her unparalleled knowledge of the syntax of the Chinese NP to this project. joannautseong.sio@upol.cz
Owsiankova Hana:
Hana is the project officer of this action. She is a project manager of national and Horizon Europe projects and mobility coordinator at the Faculty of Arts, Palacký University in Olomouc. Her main interests are science diplomacy and science business. Hana provides invaluable administrative support to the execution of this action. hana.owsiankova@upol.cz
Latest Events and Milestones
(2022.05.13) Workshop at the Department of Asian Studies (UPOL):
I was happy to run a workshop entitled "Learner Treebanks: Building Richer Error Detection Models through Learner Corpora", where I covered some topics concerning the future direction of computer assisted language education. This workshop was hosted for colleagues from the Faculty of Arts. I discussed some previous work on the development and exploitation of learner corpora in automatic error detection and the production of automatic corrective feedback. I also discussed how my MSCA project aligns with these larger goals. I was happy to have participants from different departments, and also from UPOL's Language Learning Center and UPLift. I was particularly happy to have met Dr. Silvie Válková, from UPOL's Language Learning Center, with whom I discussed potential lines of collaboration in the near future.
(2022.03.17) MSCA seminar for future supervisors of MSCA Fellows (UPOL/MUNI): I took part in an online seminar looking to promote Postdoctoral Fellowships within the context of Marie Skłodowska-Curie Actions. The workshop's main goal was to coach potential future supervisors of MSCA Postdoctoral Fellows (MSCA PF) on best practices and on what to expect from the process (from ideation to supervision of MSCA PF projects). The workshop was a joint effort between Masaryk University and Palacký University Olomouc. I participated, together with Joanna Sio (my supervisor), to provide a successful example of how to approach MSCA PFs by sharing our experience.
(2022.03.02) Leadership skills for Researchers (Czech Chapter of the Marie Curie Alumni Association): I attended a workshop entitled 'Leadership skills for researchers', organized by the Czech Chapter of the Marie Curie Alumni Association. The workshop was led by Radka Pittnerová, targeting specifically current and alumni MSCA researchers, and discussed essential tools and concepts related to leadership and HR management, preparing researchers who wished to pursue their academic career by setting up their own research lab.
(2022.01.19-20) Invited talk at the International Forum on Education of the English Language and Literature in the New Normal Era (INU): I had the privilege of giving an invited talk at the International Forum on Education of the English Language and Literature in the New Normal Era. This international forum was organized by the Department of English Language and Literature, Incheon National University (INU). The forum's central topic was the need to define new goals and use new methodologies for teaching language at university level, with particular interest in English. My talk was entitled "Learner Treebanks. Building Richer Error Detection Models through Learner Corpora". This talk summarized some of my previous research and motivated the current direction of my MSCA project, presenting computational grammars as invaluable resources for computer assisted language education.
This project uses constraint-based linguistic language models (i.e., computational grammars) to explicitly model common grammatical errors made by learners of Mandarin Chinese. It implements a theoretical concept known as mal-rules to identify and reconstruct ungrammatical sentences with enough precision to perform grammatical error detection, and to provide clear linguistic explanations of why a given sentence is ungrammatical.
In constraint-based linguistic language models, such as HPSG, robustness is an early and ever-present concern. When compared with shallow parsing methods (i.e., statistical methods that analyze sentences without fully specifying their internal structure, or accounting for deep linguistic features such as agreement), the explicit nature of constraint-based linguistic language models tends to make these models much less robust. In other words, forms of input that were not explicitly accounted for in the grammar are simply rejected. This is not necessarily a bad thing, since constraint-based models, such as HPSG, are theorized to make an implicit grammaticality judgment when they parse or reject an input – which is usually not true for statistical parsers.
As a result, this rigidity, which may be considered a problem for some Natural Language Processing (NLP) applications, becomes an invaluable tool for dealing with problems concerning grammaticality.
In HPSG, mal-rules can be seen as drawing inspiration from constraint relaxation or partial constraint satisfaction. However, instead of relaxing existing constraints, mal-rules effectively perform targeted constraint relaxation by adding new rules that are less constrained than what would be expected in a prescriptive grammar – i.e., they can parse ungrammatical input which should, in principle, be rejected by the grammar.
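The idea of adding a less constrained rule next to a strict one can be sketched in a few lines of Python. This is only a toy illustration, not HPSG and not the Mandarin Resource Grammar: the English determiner-noun lexicon, the rule names `det_noun` and `mal_det_noun_no_agr`, and the choice to try strict rules before mal-rules are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str
    agree: bool  # must the two daughters share a number value?


# Toy lexicon (hypothetical): word -> (category, number)
LEXICON = {
    "this": ("Det", "sg"),
    "these": ("Det", "pl"),
    "dog": ("N", "sg"),
    "dogs": ("N", "pl"),
}

# The strict rule enforces agreement; the mal-rule is the same rule
# with the agreement constraint left out.
STRICT_RULES = [Rule("det_noun", agree=True)]
MAL_RULES = [Rule("mal_det_noun_no_agr", agree=False)]


def licensed_by(words, rules):
    """Return the names of the rules that license a Det + N phrase."""
    if len(words) != 2 or any(w not in LEXICON for w in words):
        return []
    (cat1, num1), (cat2, num2) = LEXICON[words[0]], LEXICON[words[1]]
    if (cat1, cat2) != ("Det", "N"):
        return []
    return [r.name for r in rules if not r.agree or num1 == num2]


def analyse(words, use_mal_rules=True):
    """Prefer a strict analysis; fall back on mal-rules only if none exists."""
    strict = licensed_by(words, STRICT_RULES)
    if strict:
        return ("grammatical", strict[0])
    if use_mal_rules:
        mal = licensed_by(words, MAL_RULES)
        if mal:
            return ("error", mal[0])
    return ("rejected", None)
```

With mal-rules switched off, `analyse(["these", "dog"], use_mal_rules=False)` is simply rejected. With them switched on, the same input is parsed, and the name of the mal-rule records which constraint was violated.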
Within implemented grammars, mal-rules can be selectively available for parsing but not for generation, or to allow certain types of errors but not others. For grammars that produce a semantic representation, as is the case in this project, mal-rules can be designed to reconstruct the semantics of ungrammatical sentences in a way that allows the generation of corrected counterparts. And, in some cases, a single ungrammatical sentence can trigger multiple parses using mal-rules, each reconstructing different semantics that define different possible intended meanings of that specific ungrammatical input.
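A minimal sketch of this kind of selective availability is given below, assuming a flat rule inventory in which every mal-rule is tagged with an error type. The rule names, the error types and the `active_rules` helper are hypothetical and do not reflect how the Mandarin Resource Grammar is actually configured.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GrammarRule:
    name: str
    error_type: Optional[str] = None  # None marks an ordinary (non-mal) rule

    @property
    def is_mal(self) -> bool:
        return self.error_type is not None


# Hypothetical rule inventory, for illustration only.
RULES = [
    GrammarRule("head_complement"),
    GrammarRule("head_specifier"),
    GrammarRule("mal_numeral_choice", error_type="numeral"),
    GrammarRule("mal_missing_classifier", error_type="classifier"),
]


def active_rules(mode, allowed_errors=frozenset()):
    """Select the rule names that are available in a given processing mode.

    Ordinary rules are always active. Mal-rules are active only when
    parsing, and only for the error types that were explicitly allowed.
    """
    if mode not in ("parse", "generate"):
        raise ValueError(f"unknown mode: {mode!r}")
    selected = []
    for rule in RULES:
        if not rule.is_mal:
            selected.append(rule.name)
        elif mode == "parse" and rule.error_type in allowed_errors:
            selected.append(rule.name)
    return selected
```

Keeping mal-rules out of generation mode is what would allow a reconstructed semantic representation to be realized only as a grammatical sentence.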
The implementation of mal-rules in HPSG grammars can be done through three major classes of linguistic objects: syntactic rules, lexical rules, and lexical items. Each method has some degree of specificity, making them useful in detecting different kinds of errors, but there is also some overlap in their explanatory power (i.e., similar errors can be captured in more than one way). Using different combinations of mal-rules essentially enables a grammar to offer multiple ways to correct a single sentence.
Below is an example of what this project is able to achieve. We contrast the parses that the Mandarin Resource Grammar produces for two different sentences, both intended to mean 'I bought two dogs':
(1) * 我买了二只狗。
(2) 我买了两只狗。
Sentence (1) is ungrammatical. It contains a common grammatical mistake made by learners of Mandarin Chinese. Mandarin Chinese has two words for the numeral two. Sentence (1) is ungrammatical due to the incorrect use of the numeral 二 (èr, two) as a numeral quantifier. Instead, 二 (èr, two) can be seen as the cardinal version of the concept two. When used as a quantifier, the word 两 (liǎng, two) should be used instead — as shown in (2).
The Mandarin Resource Grammar is able to make this clear distinction because the words 二 (èr, two) and 两 (liǎng, two) have profoundly different representations within the computational grammar. In order to capture this common mistake, a mal-rule (named mal_card_二_j) allows the word 二 (èr, two) to behave as if it were identical to 两 (liǎng, two). Without this mal-rule, the Mandarin Resource Grammar would not provide a parse for sentence (1). Using this mal-rule, the Mandarin Resource Grammar is not only able to detect the error, it is also able to provide the span of the error (i.e., which words are involved), along with a linguistically grounded explanation of why sentence (1) is ungrammatical. This information can be used to provide corrective feedback (useful for learners) or it could ultimately be used to correct the sentence automatically.
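The kind of output described above (rule name, error span, corrected sentence) can be approximated with a surface-level Python sketch. This is a crude string heuristic and not the grammar-based analysis of the Mandarin Resource Grammar. Only the rule name mal_card_二_j comes from the project; the classifier set, the `ErrorReport` structure and the function name are assumptions made for the example.

```python
from dataclasses import dataclass

# Tiny illustrative sets (assumptions); a real grammar covers far more.
CLASSIFIERS = {"只", "个", "本"}
NUMERAL_CHARS = set("零一二三四五六七八九十百千万")


@dataclass(frozen=True)
class ErrorReport:
    rule: str        # name of the mal-rule that would fire
    start: int       # character offset where the error starts
    end: int         # character offset where the error ends (exclusive)
    suggestion: str  # sentence with the numeral replaced


def detect_er_as_quantifier(sentence):
    """Flag a bare 二 directly in front of a classifier.

    二 inside a larger numeral (e.g. 十二只, 'twelve') is left alone.
    """
    reports = []
    for i, ch in enumerate(sentence[:-1]):
        if ch != "二" or sentence[i + 1] not in CLASSIFIERS:
            continue
        if i > 0 and sentence[i - 1] in NUMERAL_CHARS:
            continue  # part of a compound numeral, not a bare 'two'
        fixed = sentence[:i] + "两" + sentence[i + 1:]
        reports.append(ErrorReport("mal_card_二_j", i, i + 1, fixed))
    return reports
```

For sentence (1), this sketch reports the numeral at character offsets 3 to 4 and proposes sentence (2) as the correction. Sentence (2) itself produces no report.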
The mal-rule named mal_card_二_j could, for example, be linked to a suitable feedback message targeting learners of Mandarin Chinese. One possible feedback message would be: “It seems you have used the character 二 (èr, two) to count something in your sentence. Please remember that Mandarin Chinese has a special form for the word 'two' that must be used when counting. Try to use 两 (liǎng) instead of 二 (èr)”. It is, however, important to note that the form and quality of corrective feedback messages is not within the scope of this MSCA project.
This project has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No. 101028782.
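One simple way to link mal-rules to messages is a lookup table keyed by rule name, sketched below in Python. The table, the fallback text and the `feedback_for` helper are hypothetical. Since feedback design is outside the scope of the project, this only shows where such a link could sit.

```python
# Hypothetical mapping from mal-rule names to learner-facing messages.
FEEDBACK = {
    "mal_card_二_j": (
        "It seems you have used the character 二 (èr, two) to count "
        "something in your sentence. Please remember that Mandarin Chinese "
        "has a special form for the word 'two' that must be used when "
        "counting. Try to use 两 (liǎng) instead of 二 (èr)."
    ),
}

# Assumed generic message for mal-rules without a dedicated entry.
FALLBACK = "This sentence seems to contain a grammatical error."


def feedback_for(rule_names):
    """Return one feedback message per mal-rule that fired, in order."""
    return [FEEDBACK.get(name, FALLBACK) for name in rule_names]
```

Calling `feedback_for(["mal_card_二_j"])` returns the message quoted above, while a rule name that is missing from the table falls back to the generic message.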