The first version of the Basque to English translator has been released.
This translator was produced with funding from the European Association for Machine Translation, based on data from the Basque to Spanish package, and from Matxin en-eu.
Accompanying the data is a small corpus of Kafka's "Metamorphosis", taken from the Basque and English editions of Wikisource (cc-by-sa/gfdl and public domain, respectively) and aligned using mALIGNa.
Thanks to the developers of apertium eu-es and matxin en-eu; to Mireia and Mikel for answering numerous questions; and to Gema for her help.
This is a new release of apertium-en-ca!
New things:
- Wide support for Valencian forms (financed by Universitat Politècnica
de València, developed by Prompsit and Jim O'Regan)
- Improved vocabulary and transfer rules (financed by Universitat Oberta
de Catalunya, developed by Prompsit and Jim O'Regan)
This note announces the release of the first version of
apertium-is-en (Icelandic → English). This has been the joint work of
Tungutæknisetur (Icelandic Centre for Language Technology) and the
Universitat d'Alacant.
This is the first released pair to include a rule-based lexical
selection module based on Constraint Grammar.
Some statistics below:
==Coverage==
Fri Mar 5 12:51:17 GMT 2010 Total: 2069639, Known: 1731691 (83.67%)
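The percentage above is simply known tokens divided by total tokens. The following is a minimal sketch of how such a figure could be computed, not the script actually used for this release. It assumes analyser output in the Apertium stream format, where each lexical unit is written `^surface/analysis$` and an unknown word is marked with a leading `*` in its analysis; the function names and the sample tags are hypothetical, and escaped characters in the stream are ignored.

```python
import re
from typing import Tuple

# One lexical unit in the stream format: ^surface/analysis1/analysis2$
# (assumption: no escaped ^ or $ inside a unit)
LEXICAL_UNIT = re.compile(r"\^(.*?)\$")


def count_units(analysed: str) -> Tuple[int, int]:
    """Return (total, known) lexical units in analyser output."""
    units = LEXICAL_UNIT.findall(analysed)
    # An unknown word looks like ^word/*word$
    known = sum(1 for unit in units if "/*" not in unit)
    return len(units), known


def coverage_percent(known: int, total: int) -> float:
    """Coverage as a percentage of tokens the analyser recognised."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * known / total


# Illustrative input only; the tags are made up for the example.
sample = "^a/a<n>$ ^b/*b$ ^c/c<vblex>$"
total, known = count_units(sample)
print(total, known)  # 3 2

# Reproducing the figure reported above from its two counts.
print(f"{coverage_percent(1731691, 2069639):.2f}%")  # 83.67%
```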
==Dictionary and rules==
* Lexical entries (is): ~9,069
* Lexical entries (is-en): ~23,305
* Disambiguation: 162
* Lexical selection: 30
* Transfer: 106 (t1x: 75, t2x: 2, t3x: 24, t4x: 5)
==Edit distance==
Number of words in reference: 1202
Number of words in test: 1141
Edit distance: 500
Word error rate (WER): 43.82 %
Number of position-independent word errors: 330
Position-independent word error rate (PER): 28.92 %
The evaluation text can be found in:
https://apertium.svn.sourceforge.net/svnroot/apertium/trunk/apertium-is-en/dev/eval
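For readers unfamiliar with the two metrics, here is a minimal sketch of word-level edit distance and a position-independent error count. It is not the evaluation script used for the figures above; the function names are hypothetical, and the position-independent formula shown (longer length minus bag-of-words overlap) is one common formulation that may differ from the one the evaluation tool applies. Note that the reported percentages are consistent with dividing by the 1141 words listed as "test" (500/1141 and 330/1141), so the denominator is passed in explicitly rather than assumed.

```python
from collections import Counter
from typing import List


def edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Word-level Levenshtein distance (insert, delete, substitute = 1)."""
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def position_independent_errors(ref: List[str], hyp: List[str]) -> int:
    """Errors when word order is ignored (assumed formulation)."""
    matches = sum((Counter(ref) & Counter(hyp)).values())
    return max(len(ref), len(hyp)) - matches


def rate(errors: int, n_words: int) -> float:
    """Error rate as a percentage of n_words."""
    return 100.0 * errors / n_words


ref = "the cat sat on the mat".split()
hyp = "the cat sat the mat".split()
print(edit_distance(ref, hyp))                # 1
print(position_independent_errors(ref, hyp))  # 1

# Reproducing the reported percentages from the reported counts.
print(f"WER: {rate(500, 1141):.2f} %")  # WER: 43.82 %
print(f"PER: {rate(330, 1141):.2f} %")  # PER: 28.92 %
```

Reordered words count as errors for edit distance but not for the position-independent count, which is why PER is never higher than WER under this formulation.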
==Future work==
In version 0.2.0 we hope to:
* Include support for IceNLP as well as lttoolbox/CG/apertium-tagger for
analysis and disambiguation.
* Increase coverage of the lttoolbox dictionary.
* Improve disambiguation with constraint grammar.
* Increase the number of multiword units recognised.
* Improve transfer rules to cover more cases.
* Increase the number of lexical selection rules.
The pages
* http://wiki.apertium.org/wiki/Icelandic_and_English/Regression_tests
* http://wiki.apertium.org/wiki/Icelandic_and_English/Pending_tests
also give an idea of future work.
A fully compatible Java port of the upcoming lttoolbox 3.2 has been released.
Compared to lttoolbox, some experimental code for compounding and flag match (for multiwords and inner inflection) is included.
Contact Jacob Nordfalk for more information on how (and whether) to use this.
Acknowledgements
Nic Cottrell contributed an initial version of a Java port of lttoolbox.
During GSoC 2009, Raphaël and Sergio worked on it but didn't get processing to work (compilation and expansion worked).
In November 2009, Jacob Nordfalk finished it and optimized it.
For more information, see http://wiki.apertium.org/wiki/Lttoolbox-java
Jacob Nordfalk
Apertium is growing to be a rather large project, and is ready for a more formal system of governance.
Voting for members of the governing board is currently taking place on the apertium-stuff mailing list (https://sourceforge.net/mailarchive/forum.php?forum_name=apertium-stuff); a status page is being maintained on our wiki: http://wiki.apertium.org/wiki/Governance
All developers are invited to register their votes.
Great news: the Apertium Spanish-Catalan translator can now generate Valencian forms according to the language model agreed by all the Valencian universities (for more information see http://www.ua.es/spv/assessorament/criteris.pdf). This is possible thanks to the Servei de Promoció del Valencià and the Transducens research group at the Departament de Llenguatges i Sistemes Informàtics of the Universitat d'Alacant, and to Prompsit Language Engineering. In the coming months this will be the default translation system for generating rough translations from Spanish to Catalan (or the other way round) of all the web content at the Universitat d'Alacant. The university has decided to strongly support the free/open-source technology of Apertium, adapting it to its needs. Well done!
1.1 English:
We've just released a new language pair: Norwegian Nynorsk–Norwegian Bokmål, apertium-nn-nb. It's the first released automatic translator for Norwegian developed with the free and open-source Apertium machine translation engine. The pair will be available for testing at http://www.apertium.org/index.php?id=translatetext .
In developing this system, we used the Free language resources Norsk Ordbank (a full form dictionary with morphological annotations, http://www.edd.uio.no/prosjekt/ordbanken/) and the Oslo-Bergen tagger (a Constraint Grammar disambiguator, http://omilia.uio.no/obt/). Both of these resources are released under the GPL as Free software.
Although a lot of conversion work was involved, the availability of high-quality Free data led to much higher coverage (~88%) and accuracy than would have been possible otherwise.
In addition to the reuse and conversion of these existing monolingual resources, a lot of work was done on the translational dictionary (partly assisted by the tool ReTraTos, which turns Giza++ corpus alignments into bi-dictionary entries), and we have added transfer rules to handle e.g. the differences in passive verb phrases, gender system and possessive noun phrases.
Future goals include handling simple coordination in possessives, improving the rule-based disambiguator along with retraining the statistical tagger, and of course expanding and improving the translational dictionary.
This language pair was developed as part of a Google Summer of Code (GSoC) project. For more information on Apertium and GSoC, see http://socghop.appspot.com/org/home/google/gsoc2009/apertium . Many thanks to mentors Trond Trosterud (University of Tromsø) and Francis Tyers (Universitat d'Alacant and Prompsit Language Engineering) for advice and help on development, and to the other members of the Apertium project; also to Paul Meurer (Unifob AKSIS) and Kristin Hagen (University of Oslo) for help on the GPL Oslo-Bergen tagger, and to various Wikipedia contributors for help on the translation dictionary. Many thanks to all those who developed the open-source tools and free language resources that contributed to the development of this new translator.
For more details on development and the language pair, see http://wiki.apertium.org/wiki/Norsk
1.2 Norsk (English translation):
We have just released a new language pair: Nynorsk–Bokmål, apertium-nn-nb. This is the first automatic translator for Norwegian developed with Apertium, a machine translation engine with free and open source code. The language pair will be available for testing at http://www.apertium.org/index.php?id=translatetext&lang=nn .
To develop the system we used the free language resources Norsk Ordbank (a full-form word list with part-of-speech and inflection information, http://www.edd.uio.no/prosjekt/ordbanken/) and the Oslo-Bergen tagger (a constraint grammar for disambiguating part of speech etc., http://omilia.uio.no/obt/). Both of these resources are released under the GPL licence as Free Software. Converting the formats took a fair amount of work, but having access to free resources of such high quality led to much higher coverage (approx. 88%) and accuracy than we could otherwise have achieved in such a short time.
In addition to reusing and converting these monolingual resources, we worked a lot on the translation dictionary (partly helped by the tool ReTraTos, which converts Giza++ alignments from parallel corpora into entries in the translation dictionary), and we added transfer rules to handle e.g. the differences in passive verb phrases, grammatical gender, and the genitive in noun phrases.
In the future we would like to get the system to handle simple coordination in possessive phrases, to improve the rule-based disambiguator and retrain the statistical tagger, and of course to expand and improve the translation dictionary.
This language pair received project funding from Google Summer of Code (GSoC). More information on Apertium and GSoC can be found at http://socghop.appspot.com/org/home/google/gsoc2009/apertium . Many thanks to my mentors Trond Trosterud (University of Tromsø) and Francis Tyers (Universitat d'Alacant and Prompsit Language Engineering) for good advice and help with development, and to the other members of the Apertium project; thanks also to Paul Meurer (Unifob AKSIS) and Kristin Hagen (University of Oslo) for help with the free Oslo-Bergen tagger, and to various Wikipedia authors for help with the translation dictionary. Many thanks to everyone who has taken part in developing the free tools and language resources that contributed to the development of the translator.
See http://wiki.apertium.org/wiki/Norsk if you want to know more about the development of the language pair.
Deuet eo er-maez stumm 0.1.0 hon troer brezhoneg-galleg. Disoc'h ur c'henlabour etre Prompsit Language Engineering, Skol-veur Alacant hag Ofis ar Brezhoneg eo an troer-mañ. Well-wazh e c'holo an troer 85% a skrid ha sevel a ra e feur faziañ gerioù e-tro 35%--45%.
We have released version 0.1.0 of our Breton--French translator. The translator is the result of a collaboration between Prompsit Language Engineering, the Universitat d'Alacant and Ofis ar Brezhoneg. It has a coverage of around 85% and a word error rate of between 35% and 45%.
The free/open-source machine translation project Apertium (http://www.apertium.org), one of the 150 projects of the 2009 edition of Google Summer of Code (http://socghop.appspot.com), has finally been granted 9 three-month scholarships.
Nine students have been selected to work for Apertium this summer (see the list and their tasks at http://socghop.appspot.com/org/home/google/gsoc2009/apertium), and each will get a US$4,500 scholarship.
The list of students, mentors and projects can be found on our Wiki (http://wiki.apertium.org/wiki/Google_Summer_of_Code#Active_projects).
The free/open-source machine translation project Apertium (http://www.apertium.org) has been selected by Google as one of the 151 projects of the 2009 edition of their Google Summer of Code (http://socghop.appspot.com).
Students wishing to work this summer on one of the ideas that the Apertium project has published at http://socghop.appspot.com/org/show/google/gsoc2009/apertium can apply from 23 March to try to get a 4,500-dollar scholarship.