Publications
Refereed Publications
- Factored Translation Models, Philipp Koehn and Hieu Hoang, EMNLP 2007.
- Moses: Open Source Toolkit for Statistical Machine Translation, Philipp Koehn, Hieu Hoang, Alexandra Birch, Chris Callison-Burch, Marcello Federico, Nicola Bertoldi, Brooke Cowan, Wade Shen, Christine Moran, Richard Zens, Chris Dyer, Ondrej Bojar, Alexandra Constantin, Evan Herbst, ACL 2007, demonstration session.
- (Meta-) Evaluation of Machine Translation, Chris Callison-Burch, Cameron Fordyce, Philipp Koehn, Christof Monz and Josh Schroeder, ACL Workshop on Statistical Machine Translation 2007.
- Experiments in Domain Adaptation for Statistical Machine Translation, Philipp Koehn and Josh Schroeder, ACL Workshop on Statistical Machine Translation 2007.
- CCG Supertags in Factored Statistical Machine Translation, Alexandra Birch, Miles Osborne and Philipp Koehn, ACL Workshop on Statistical Machine Translation 2007.
- Statistical Post-Edition on SYSTRAN Rule-Based Translation System, Loïc Dugast, Jean Senellart, Michel Simard and Philipp Koehn, ACL Workshop on Statistical Machine Translation 2007.
- Multi-Engine Machine Translation with an Open-Source SMT Decoder, Yu Chen, Andreas Eisele, Christian Federmann, Eva Hasler, Michael Jellinghaus and Silke Theison, ACL Workshop on Statistical Machine Translation 2007.
- English-to-Czech Factored Machine Translation, Ondrej Bojar, ACL Workshop on Statistical Machine Translation 2007.
- The University of Edinburgh System Description for IWSLT 2007, Josh Schroeder and Philipp Koehn, International Workshop on Spoken Language Translation, 2007.
- Presentation of the EuroMatrix project, Gábor Prószéky, Annual Conference of Hungarian Computational Linguistics, Szeged, Hungary, December 6-7, 2007.
- The Challenge of Syntax-Based Machine Translation. Ondřej Bojar. HCSNet SummerFest 2007, speed paper presentation, December 2007.
- Large and Diverse Language Models for Statistical Machine Translation, Holger Schwenk and Philipp Koehn, International Joint Conference on Natural Language Processing 2008
- Towards English-to-Czech MT via Tectogrammatical Layer. Ondřej Bojar, Silvie Cinková, and Jan Ptáček. In Proceedings of the Sixth International Workshop on Treebanks and Linguistic Theories (TLT 2007), Bergen, Norway, December 2007.
- CzEng 0.7: Parallel Corpus with Community-Supplied Translations. Ondřej Bojar, Miroslav Janíček, Zdeněk Žabokrtský, Pavel Češka, and Peter Beňa. In Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08), Marrakech, Morocco, May 2008. ELRA.
- Improving Statistical Machine Translation Efficiency by Triangulation. Yu Chen, Andreas Eisele, and Martin Kay. In Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08), Marrakech, Morocco, May 2008.
- Edinburgh University System Description for the 2008 NIST Machine Translation Evaluation, Philipp Koehn, Josh Schroeder and Miles Osborne, NIST MT Evaluation Meeting.
- Design of the Moses Decoder for Statistical Machine Translation. Hieu Hoang and Philipp Koehn, ACL Workshop on Software engineering, testing, and quality assurance for NLP 2008.
- Further Meta-Evaluation of Machine Translation, Chris Callison-Burch, Cameron Fordyce, Philipp Koehn, Christof Monz and Josh Schroeder, ACL Workshop on Statistical Machine Translation 2008.
- Can we Relearn an RBMT System? Loic Dugast, Jean Senellart and Philipp Koehn, ACL Workshop on Statistical Machine Translation 2008.
- Phrase-Based and Deep Syntactic English-to-Czech Statistical Machine Translation. Ondřej Bojar and Jan Hajič. In Proceedings of the Third Workshop on Statistical Machine Translation, pages 143-146, Columbus, Ohio, June 2008. Association for Computational Linguistics.
- The MetaMorpho translation system. Attila Novák, László Tihanyi and Gábor Prószéky. In Proceedings of the Third Workshop on Statistical Machine Translation, pages 111-114, Columbus, Ohio, June 2008. Association for Computational Linguistics.
- Enriching Morphologically Poor Languages for Statistical Machine Translation. Eleftherios Avramidis and Philipp Koehn. Proceedings of ACL-08: HLT, p. 763–770. Columbus, Ohio, USA, June 2008.
- Using Moses to Integrate Multiple Rule-Based Machine Translation Engines into a Hybrid System. Andreas Eisele, Christian Federmann, Herve Saint-Amand, Michael Jellinghaus, Teresa Herrmann, and Yu Chen. In Proceedings of the Third Workshop on Statistical Machine Translation, p. 179–182. Columbus, Ohio, USA, June 2008.
- Towards better Machine Translation Quality for the German–English Language Pairs. Philipp Koehn, Abhishek Arun, and Hieu Hoang. In Proceedings of the Third Workshop on Statistical Machine Translation, p. 139–142. Columbus, Ohio, USA, June 2008.
- Hybrid Machine Translation Architectures within and beyond the EuroMatrix project Andreas Eisele, Christian Federmann, Hans Uszkoreit, Herve Saint-Amand, Martin Kay, Michael Jellinghaus, Sabine Hunsicker, Teresa Herrmann, Yu Chen, European Machine Translation Conference, Hamburg, September 2008.
- UMC 0.1: Czech-Russian-English Multilingual Corpus. Natalia Klyueva and Ondřej Bojar. In Proc. of International Conference Corpus Linguistics, October 2008. in print.
- Hybrid Architectures for Multi-Engine Machine Translation. Andreas Eisele, Christian Federmann, Hans Uszkoreit, Hervé Saint-Amand, Martin Kay, Michael Jellinghaus, Sabine Hunsicker, Teresa Herrmann, and Yu Chen, Translating and the Computer 30, London, November 2008.
- Predicting Success in Machine Translation. Alexandra Birch, Miles Osborne and Philipp Koehn, EMNLP 2008.
- Evaluation of Machine Translation Metrics for Czech as the Target Language, Kamil Kos and Ondrej Bojar. Prague Bulletin of Mathematical Linguistics, 90, December 2008. in print.
- English-Hindi Translation in 21 Days. Ondřej Bojar, Pavel Straňák, and Daniel Zeman. In Proceedings of the 6th International Conference On Natural Language Processing (ICON-2008) NLP Tools Contest, Pune, India, December 2008. NLP Association of India.
- Combining Multi-Engine Translations with Moses. Yu Chen, Michael Jellinghaus, Andreas Eisele, Yi Zhang, Sabine Hunsicker, Silke Theison, Christian Federmann, and Hans Uszkoreit. Proceedings of the Fourth Workshop on Statistical Machine Translation, p. 42-46. March, 2009, Athens, Greece.
- Translation Combination using Factored Word Substitution. Christian Federmann, Silke Theison, Andreas Eisele, Hans Uszkoreit, Yu Chen, Michael Jellinghaus, and Sabine Hunsicker. Proceedings of the Fourth Workshop on Statistical Machine Translation, p. 70-74. March, 2009, Athens, Greece.
- Edinburgh's Submission to all Tracks of the WMT 2009 Shared Task with Reordering and Speed Improvements to Moses. Philipp Koehn and Barry Haddow. Proceedings of the Fourth Workshop on Statistical Machine Translation, p. 160-164. March, 2009, Athens, Greece.
- Findings of the 2009 Workshop on Statistical Machine Translation. Chris Callison-Burch, Philipp Koehn, Christof Monz, and Josh Schroeder. Proceedings of the Fourth Workshop on Statistical Machine Translation, p. 1-28. March, 2009, Athens, Greece.
- A Systematic Analysis of Translation Model Search Spaces. Michael Auli, Adam Lopez, Hieu Hoang, and Philipp Koehn. Proceedings of the Fourth Workshop on Statistical Machine Translation, p. 224-232. March, 2009, Athens, Greece.
- MorphoLogic's submission for the WMT 2009 Shared Task. Attila Novák. In Proceedings of the Fourth Workshop on Statistical Machine Translation, Athens, Greece, March 2009. Association for Computational Linguistics.
- Improved minimum error rate training in Moses. Nicola Bertoldi, Barry Haddow, and Jean-Baptiste Fouet. Prague Bulletin of Mathematical Linguistics, No. 91:7-16, 2009.
- Word Lattices for Multi-Source Translation, Josh Schroeder, Trevor Cohn and Philipp Koehn, EACL 2009.
- Translation as Weighted Deduction. Adam Lopez, EACL 2009.
- Improving Mid-Range Re-Ordering using Templates of Factors. Hieu Hoang and Philipp Koehn, EACL 2009.
- Findings of the 2009 Workshop on Statistical Machine Translation, Chris Callison-Burch, Philipp Koehn, Christof Monz and Josh Schroeder, EACL Workshop on Statistical Machine Translation 2009.
- English-Czech MT in 2008. Ondřej Bojar, David Mareček, Václav Novák, Martin Popel, Jan Ptáček, Jan Rouš, and Zdeněk Žabokrtský. In Proceedings of the Fourth Workshop on Statistical Machine Translation, Athens, Greece, March 2009. Association for Computational Linguistics.
- Statistical Post Editing and Dictionary Extraction: Systran/Edinburgh submissions for ACL-WMT2009. Loïc Dugast, Jean Senellart and Philipp Koehn, EACL Workshop on Statistical Machine Translation 2009.
- Edinburgh’s Submission to all Tracks of the WMT2009 Shared Task with Reordering and Speed Improvements to Moses, Philipp Koehn and Barry Haddow, EACL Workshop on Statistical Machine Translation 2009.
- A Systematic Analysis of Translation Model Search Spaces. Michael Auli, Adam Lopez, Hieu Hoang and Philipp Koehn, EACL Workshop on Statistical Machine Translation 2009.
- Statistical Machine Translation. Philipp Koehn, textbook, Cambridge University Press, Spring 2009.
- Intersecting Multilingual Data for Faster and Better Statistical Translations. Yu Chen, Martin Kay, Andreas Eisele, accepted for publication at NAACL-HLT, June 2009.
- Evaluation of Machine Translation Metrics for Czech as the Target Language. Kamil Kos and Ondřej Bojar. Prague Bulletin of Mathematical Linguistics, 92, 2009. in print.
- Ondřej Bojar, Chris Callison-Burch, Jan Hajič and Philipp Koehn (guest editors). Prague Bulletin of Mathematical Linguistics - Special Issue on Open Source Machine Translation Tools, 91, 2009.
Theses
- "Automatic Domain Adaptation for Statistical Machine Translation", Dimitrios Mavroeidis, MSc thesis, University of Edinburgh, 2007.
- "Enriching the Input to Machine Translation", Elefteris Avramidis, MSc thesis, University of Edinburgh, 2007.
- "Improving Machine Translation with Linguistic Information", Grace Mbipom, MSc thesis, University of Edinburgh, 2007.
- “Very large language models for machine translation”, Christian Federmann, Diploma thesis, Saarland University, 2007
- "Automatic Acquisition of Semantic Transfer Rules for Machine Translation", Michael Jellinghaus, Diplom thesis, Saarland University, 2007.
- "Exploiting Linguistic Data in Machine Translation", Ondřej Bojar, PhD thesis, UFAL, MFF UK, Prague, Czech Republic, October 2008.
- Silke Theison: “Optimizing Rule-Based Machine Translation Output with the Help of Statistical Methods”, Diploma Thesis, October 2007.
- Herve Saint-Amand: “Gathering a Parallel Corpus from the Web”, Master’s Thesis, September 2008.
- Alejandra Lopez-Fernandez: “Error characterization of Rule-based translation with statistical post-editing.”, Master’s Thesis, September 2008.
- Yu Chen: “Improving Statistical Machine Translation Efficiency by Triangulation”, Master’s Thesis, October 2008.
- Teresa Herrmann: “Comparing Hybrid Approaches to Machine Translation”, Master’s Thesis, October 2008.
- Tobias Kellner: “Combining Machine Transliteration and Language Modeling for Input of Non-Roman Languages”, Master’s Thesis, October 2008.
Invited talks
- November 22, 2006: Networking Session on "Multilinguism and Language Technology: a Challenge for Europe" at IST conference, with two contributions on EuroMatrix
- September 13, 2007: Invited Talk by Philipp Koehn at MT Summit XI in Copenhagen
- October 1, 2007: Presentation by Andreas Eisele at SMART project meeting in Bled, Slovenia.
- Invited Talk by Philipp Koehn at NLP Winter School, Hyderabad, India, January 2008
- Invited Talk by Philipp Koehn at LangTech2008, Rome, February 2008
- November 2008: “Hybrid Architectures for Machine Translation”, presentation by Andreas Eisele at “Translating and the Computer 30”, London.
- January 2009: "Research Avenues: A new Hype? A Paradigm Shift?", presentation by Hans Uszkoreit at Language Technology Days, Luxembourg.
Other Talks/Events
- November 22, 2006: Networking Session on "Multilinguism and Language Technology: a Challenge for Europe" at IST 2006 (Helsinki), with two contributions on EuroMatrix
- April 16-20, 2007: First Machine Translation Marathon at the University of Edinburgh
- ACL Workshop on Statistical Machine Translation, ACL 2007, Prague, Czech Republic.
- Course on Statistical Machine Translation, Philipp Koehn, Kevin Knight, and Philip Resnik, Linguistic Summer School, Stanford University, July 2007
- Tutorial on Machine Translation, Philipp Koehn and Kevin Knight, MT Summit, September 2007
- October 1, 2007: Presentation by Andreas Eisele at SMART project meeting in Bled, Slovenia.
- Statistical Machine Translation, Philipp Koehn and Ashish Venugopal, Nordic Doctoral School, Tartu University, Estonia, November 2007
- Talk by Michael Jellinghaus at the PIRE Colloquium, Saarland University, March 2008.
- Hloubkový syntaktický strojový překlad: dosud nesplněná očekávání (Deep Syntactic Machine Translation: Expectations Not Yet Fulfilled). Ondřej Bojar. In MIS 2008, Josefův Důl, Czech Republic, January 2008. MATFYZPRESS.
- Strojový překlad přes tektogramatickou rovinu (Machine Translation via the Tectogrammatical Layer). Ondřej Bojar. Talk at ÚFAL seminar, March 2008.
- English-to-Czech Machine Translation: Should We Go Shallow or Deep? Ondřej Bojar. Talk at KEG seminar, University of Economics, Prague, March 2008.
- Second Machine Translation Marathon, Berlin, Germany, May 12-20, 2008
- A Data-Driven Approach to Deep Machine Translation. Michael Jellinghaus. Research talk at the Second MT Marathon, Berlin, Germany, May 2008.
- Tree-based Translation. Ondřej Bojar and Adam Lopez. Handout for MT Marathon Tutorial, May 2008.
- Problems of Deep Syntactic English-to-Czech MT. Ondřej Bojar. Research talk at MT Marathon, May 2008.
- ACL Workshop on Statistical Machine Translation, ACL 2008, Columbus, Ohio.
- Acquiring Rules for Machine Translation. Michael Jellinghaus. Talk at the 4th DELPH-IN Summit, Keihanna, Japan, August 2008.
- Wrestling with Deep Syntactic Translation from English to Czech. Ondřej Bojar. Talk at Saarland University, August 2008.
- Talk by Michael Jellinghaus at the 8th IRTG Annual Meeting, University of Edinburgh, September 2008.
- Prószéky Gábor. Többnyelvűség és gépifordító-szolgáltatások: EuroMatrix és EuroFord. (Multilinguality and Machine Translation Services: EuroMatrix and EuroFord). Presentation at the Autumn Conference of Translators and Interpreters. 26 September 2008, Budapest University of Technology
- Statistical Machine Translation, Chris Callison-Burch and Philipp Koehn, European Summer School for Logic, Language and Information (ESSLLI), 2008.
- Phrase-Based and Factored Statistical Machine Translation, Philipp Koehn, AERFAI Summer School, Basque Country, Spain, 2008.
- Prószéky Gábor. Fordítás, többnyelvűség, szótárak (Translation, Multilinguality, Dictionaries). Presentation at the Conference of the Hungarian Platform on Language and Speech Technology: With Machines - in Human Language
- The Best of Two Worlds – Combining Rule-Based and Statistical Approaches to Machine Translation. Michael Jellinghaus. Talk at CLSP (The Center for Language and Speech Processing) Seminar, Johns Hopkins University, Baltimore, Maryland, November 2008.
- 'Deep' Rule-Based MT without Hand-Written Rules. Michael Jellinghaus. Talk at PIRE Annual Meeting, Charles University, Prague, Czech Republic, December 2008.
- Introduction to Statistical Machine Translation, Philipp Koehn and Dennis Mehay, Meeting of the Association for Machine Translation in the Americas (AMTA), 2008.
- "State of the Art in Statistical Machine Translation", workshop with EU translation services, Luxembourg, 2009
- "Statistical Machine Translation at the University of Edinburgh", Microsoft Research Asia, Beijing, 2008
- "Introduction to Statistical Machine Translation", Chinese Workshop for Machine Translation, 2008
- "The German Challenge to Statistical Machine Translation", Polytechnic University of Catalonia, Barcelona, 2008
- "The German Challenge to Statistical Machine Translation", Dublin City University, DCU, 2008
- "The Emerging Role of Grammars in Statistical Machine Translation", Workshop on Formal Grammars, Hamburg, 2008
- "State of the Art in Statistical Machine Translation", Systran, 2008
- "Statistical Machine Translation - Where are we now?", SMART project meeting, Bristol, United Kingdom, 2008
- "Results of the EuroMatrix Shared Task in Machine Translation", Translingual Europe, Berlin, 2008
- "State of the Art in Statistical Machine Translation", Translingual Europe, Berlin, 2008
- "Open Source Tools for Statistical Machine Translation", LinguaTech, Rome, Italy, 2008
- "Moving towards Linguistically Grounded Statistical Machine Translation with an Open Source Research Environment", MT Symposium, Tokyo, Japan, 2008
- "Moses: Moving Open Source MT towards Linguistically Richer Models", Workshop on Mixing Approaches to Machine Translation (MAMT), 2008
- "Statistical Machine Translation with an Eye on Linguistics", NLP Winter School 2008, Hyderabad, India, 2008
- Third Machine Translation Marathon, Charles University, Prague, Czech Republic, January 26-30, 2009
- HLT in Hungary - 2009. Presentation at CORDIS Language Technology Days. Gábor Prószéky, 14-15 January 2009, Luxembourg.
Press coverage
(see also folder with press releases)
- April 22, 2007: Article about EuroMatrix in heise.de
- April 27, 2007: Article about MT, with section on EuroMatrix in computerwoche.de
- May 2007: Article about the EuroMatrix project, Süddeutsche Zeitung
- April 21, 2007: Report on EuroMatrix by the German Press Agency dpa. Several German newspapers have used this report as the basis for their own articles.
- February 2007: “European project to develop automatic learning and translation system for EU languages”, article in CORDIS focus Newsletter 275
- January 24, 2007: “Für das Europa der 23 Sprachen: Der Computer lernt das Übersetzen/ For the Europe of 23 languages: the computer learns to translate”, press release of the Department for Computational Linguistics (e.g. idwonline, innovationsreport)
TV or radio presentations
- Report in 3sat nano, broadcast on May 18, 2007 (170MB MPEG with some technical problems, 43 MB AVI, also see accompanying text)
- Radio feature on EuroMatrix in the science programme “IQ-Wissenschaft und Forschung” on the German radio station “Bayern 2”, aired on 19.04.2008
- Feature on the Euromatrix project on the Czech TV, public channel 2, in a scientific weekly series called "Port", aired on 13 June 2007 at 17:30.
- Phone interview (with USAAR) in the German internet radio programme Computer:club2, June 4, 2007
- RTL Klub - biggest commercial channel in Hungary, In: Infomania - IT magazine, April 26, 2007, Ep. 109
Release of resources
- Tree Aligner, Tree Decoder, and QuickJudge: http://ufal.mff.cuni.cz/euromatrix/
- CzEng 0.7: http://ufal.mff.cuni.cz/czeng/
- UMC 0.1, Czech-English-Russian Corpus: http://ufal.mff.cuni.cz/umc/
- Test sets and human evaluation data, WMT 2008, http://www.statmt.org/wmt08/
- Test sets and human evaluation data, WMT 2009, http://www.statmt.org/wmt09/
- Fourth release of Europarl parallel corpus, http://www.statmt.org/europarl/
- Incremental release of News Commentary multi-parallel corpus, http://www.statmt.org/wmt09/
- Monolingual training data (100s of millions of words per language), http://www.statmt.org/wmt09/
- Billion word French-English parallel corpus - through Callison-Burch/JHU contracting, http://www.statmt.org/wmt09/
- German-English word alignment data (in progress, to be completed within EuroMatrixPlus)
- Europarl corpus v3: 11 language parallel corpus, 40 million words per language, http://www.statmt.org/europarl/
- Moses MT system: open source statistical machine translation system, with documentation, tutorials and email mailing list. http://www.statmt.org/moses/
- Moses-based demo for English-Czech available online at https://blackbird.ms.mff.cuni.cz/cgibin/bojar/mt_cgi.pl
Public demonstrations
- Public translation web site to demonstrate Moses, http://demo2.statmt.org/
- Online evaluation campaign, http://matrix.statmt.org