LuisMC

Datasets

Swadesh Lists linked to the Open Multilingual Wordnet (OMW)

This resource compiles 1212 Swadesh lists and maps them to the Open Multilingual Wordnet through Princeton Wordnet 3.0 synsets.

The canonical citation and description of this resource is:


@inproceedings{MorgadoDaCosta:Bond:Kratochvil:2016,
 title      = {Linking and Disambiguating Swadesh Lists: Expanding the
               {Open Multilingual Wordnet} Using Open Language Resources},
 author     = {Morgado da Costa, Luis and
               Bond, Francis and
               Kratochv{\'\i}l, Franti{\v{s}}ek},
 booktitle  = {Proceedings of GLOBALEX 2016 Lexicographic Resources for
               Human Language Technology, 10th edition of the International Conference
               on Language Resources and Evaluation (LREC 2016)},
 pages      = {29--36},
 year       = {2016}
}

Both this work and all 1212 original lists it uses are shared under a CC-BY-3.0 license. The original data and metadata shared by The Rosetta Project can be found as an appendix to this data.

Click here to download this dataset.


Chinese Classifiers linked to the Chinese Open Wordnet (COW)

This resource maps Mandarin Chinese lemmas to their candidate classifiers and links that mapping to the Open Multilingual Wordnet through Princeton Wordnet 3.0 synsets. Two files are made available:

(Raw, text-based) Chinese lemma mapping, with frequencies, to a list of 204 sortal classifiers
Princeton Wordnet synset mapping, with frequencies, to a list of 204 sortal classifiers

The frequencies presented here are raw (i.e. no filtering was applied), referring to the dataset described as Tau=1 in the appended publication.
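As a rough illustration of how a frequency-annotated mapping of this kind might be consumed, the sketch below reads rows of the form key, classifier, frequency and picks the most frequent classifier for a key. The three-column tab-separated layout, the function names, and the sample rows (with invented frequencies) are all assumptions made for the example; the actual files may be organised differently, so check them before adapting this.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows with invented frequencies: key <TAB> classifier <TAB> count.
# The real files' column layout is NOT confirmed by this page.
SAMPLE = "狗\t只\t120\n狗\t条\t45\n书\t本\t300\n"


def load_classifier_counts(handle):
    """Read (key, classifier, frequency) rows into a nested dict."""
    counts = defaultdict(dict)
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) != 3:
            continue  # skip blank or malformed lines
        key, classifier, freq = row
        # Sum repeated (key, classifier) pairs instead of overwriting them.
        counts[key][classifier] = counts[key].get(classifier, 0) + int(freq)
    return dict(counts)


def best_classifier(counts, key):
    """Return the most frequent classifier for key, or None if the key is unseen."""
    options = counts.get(key)
    if not options:
        return None
    return max(options, key=options.get)


counts = load_classifier_counts(io.StringIO(SAMPLE))
print(best_classifier(counts, "狗"))
```

The same functions would work for either file under these assumptions, since the key can be a lemma or a synset identifier; to read from disk, pass `open(path, encoding="utf-8", newline="")` in place of the `io.StringIO` object.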

The canonical citation and description of this resource is:


@inproceedings{Morgado:Bond:Gao:2016,
  title     = {Mapping and Generating Classifiers using an Open Chinese Ontology},
  author    = {Morgado da Costa, Luís and
               Bond, Francis and
               Gao, Helena},
  booktitle = {Proceedings of the 8th Global WordNet Conference (GWC 2016)},
  address   = {Bucharest, Romania},
  year      = {2016}
}

Both files in this dataset (i.e. lemma and wordnet mappings) are released under a CC-BY-4.0 license.

Click here to download this dataset.
