Background

Daniel Ortiz-Martínez is a researcher in natural language processing at Webinterpret. Formerly, he was a member of the PRHLT research centre as well as an assistant professor at the Statistics Department of the Technical University of Valencia. His research interests focus on pattern recognition and machine learning and their application to statistical machine translation. His most recent work applies online learning techniques to incrementally train the model parameters of statistical translation tools. Daniel has worked on several research projects, including the MIPRCV project (funded by the Spanish Government and the European Commission within the Consolider programme) and the CASMACAT project (funded by the European Commission under the Seventh Framework Programme). He has published nearly 50 research papers in international conferences and journals and has co-supervised one PhD thesis. Additionally, he has served as a scientific reviewer for the European Commission as well as for various scientific and program committees of conferences and journals. Daniel is also the creator and maintainer of the open source Thot toolkit, a software package for statistical machine translation. Finally, Daniel has recently become interested in the field of Bioinformatics, completing a Master's degree in this area at the University of Valencia.

Professional Experience

Natural Language Processing Research Scientist

Webinterpret

Feb 2016-Present

  • Research on natural language processing and statistical machine translation techniques

Research Consultant

Webinterpret

April 2015-Jan 2016

  • Introduction of natural language processing techniques into Webinterpret's workflow

Assistant Professor

Technical University of Valencia

Dec 2010-Jan 2016

  • Statistics (undergrad level, Spanish and English): Courses 2011/12, 2012/13, 2013/14, 2014/15
  • Operational Research (undergrad level, Spanish): Courses 2010/11, 2012/13

Post-Doctoral Researcher

PRHLT Research Centre, Technical University of Valencia

Feb 2012-Dec 2014

  • CASMACAT research project, funded by the 7th Framework Programme of the European Commission

Independent Expert (FP7 research project reviewer)

European Commission

Mar 2011-May 2011

Research Assistant

Instituto Tecnológico de Informática, Technical University of Valencia

Jul 2008-Jan 2012

  • MIPRCV research project, part of the CONSOLIDER programme of the Spanish Government

Computational Linguistics Researcher

Technical University of Valencia

Mar 2003-Jun 2008

  • Participation in several research projects funded by the Spanish Government

Education

MSc in Bioinformatics

University of Valencia, Valencia, Spain

Sep 2016

  • Thesis title: Systems Biology Strategies to Study Cancer Metabolism

PhD in Pattern Recognition and Artificial Intelligence

Technical University of Valencia, Valencia, Spain

Oct 2011

  • Thesis title: Advances in Fully-Automatic and Interactive Phrase-Based Statistical Machine Translation

MSc in Pattern Recognition and Artificial Intelligence

Technical University of Valencia, Valencia, Spain

Nov 2005

  • Thesis topic: Search Algorithms for Phrase-based Statistical Machine Translation

BSc in Computer Science Engineering

University of Castilla La Mancha, Albacete, Spain

Sep 2003

  • Specialization in the Pattern Recognition and Artificial Intelligence program
  • Thesis topic: Stack Decoding Algorithms for Statistical Machine Translation

Selected Publications

  • Daniel Ortiz-Martínez. Online learning for statistical machine translation. Computational Linguistics, Vol 42, No. 1, 2016; DOI: 10.1162/COLI_a_00244
  • Antonio L. Lagarda, Daniel Ortiz-Martínez, Vicent Alabau, Francisco Casacuberta. Translating without In-domain Corpus: Machine Translation Post-Editing with Online Learning Techniques. Computer Speech & Language Journal, 11/2014; DOI: 10.1016/j.csl.2014.10.004
  • Daniel Ortiz-Martínez, Francisco Casacuberta. The New Thot Toolkit for Fully Automatic and Interactive Statistical Machine Translation. Proceedings of the European Chapter of the Association for Computational Linguistics (EACL) conference, Gothenburg, Sweden, April 2014
  • Daniel Ortiz-Martínez, Ismael García-Varea, Francisco Casacuberta. Online Learning for Interactive Statistical Machine Translation. Proceedings of the North American Chapter of the Association for Computational Linguistics - Human Language Technologies (NAACL HLT) conference, Los Angeles, US, 2010
  • Daniel Ortiz-Martínez, Ismael García-Varea, Francisco Casacuberta. Phrase-level alignment generation using a smoothed loglinear phrase-based statistical alignment model. Proceedings of the XII European Association for Machine Translation (EAMT) conference, Hamburg, Germany, October 2008 (Best paper award)
  • Full publication list

Open Source Software

Daniel is the creator and maintainer of the Thot toolkit for statistical machine translation. The toolkit is written in C, C++ and shell scripting, is publicly available under the LGPL license, and comprises more than 50 000 lines of code. It offers many tools for statistical modelling and search in the field of statistical machine translation, with a strong focus on online learning techniques that incrementally train statistical model parameters. Among other functionalities, the toolkit includes an implementation of the incremental EM algorithm for HMM models that can be applied to datasets of arbitrary size using Map-Reduce. The Thot toolkit has been one of the two official statistical toolkits used within the CASMACAT project.