Daniel Ortiz Martínez is an assistant professor in the Department of Mathematics and Computer Science of the University of Barcelona. His research interests focus on machine learning and data science and their application to biomedical and natural language processing research. Daniel has worked on several research projects, including the MIPRCV project (funded by the Spanish Government and the European Commission within the Consolider programme) and the CASMACAT project (funded by the European Commission under the Seventh Framework Programme). He has published more than 50 research papers in international conferences and journals and has co-supervised one PhD thesis. He has also served as a scientific reviewer for the European Commission and on the scientific and program committees of various conferences and journals. Daniel is also the creator and maintainer of several open source software packages.

Professional Experience

Assistant Professor

University of Barcelona

Feb 2021-Present

Postdoctoral Researcher

Institut d'Investigacions Biomèdiques August Pi i Sunyer (IDIBAPS)

Jun 2020-Jan 2021

  • Translational genomics research using statistical and machine learning methods

Adjunct Professor

University of Barcelona

Oct 2019-Jan 2021

  • Computer Vision (undergrad level, Spanish): Course 2019/20
  • Algorithms (undergrad level, Spanish): Courses 2019/20, 2020/21
  • Object-Oriented Programming (undergrad level, Spanish): Courses 2019/20, 2020/21

Research Assistant

Institute for Research in Biomedicine (IRB)

Apr 2018-May 2020

  • Study of cancer genomics from a data science and machine learning perspective

Adjunct Professor

Technical University of Catalonia

Sep 2018-Aug 2019

  • Programming (undergrad level, Spanish): Course 2018/19

Natural Language Processing Engineer / Technical Leader


Feb 2016-Mar 2018

  • Development of natural language processing and statistical machine translation techniques

Visiting Lecturer

University of Valencia

Mar 2017

  • Introduction to big data in natural language processing, taught in the Master's Degree in Data Science at the University of Valencia

Research Consultant


Apr 2015-Jan 2016

  • Introduction of natural language processing techniques into Webinterpret's workflow

Adjunct Professor

Technical University of Valencia

Dec 2010-Jan 2016

  • Statistics (undergrad level, Spanish and English): Courses 2011/12, 2012/13, 2013/14, 2014/15
  • Operational Research (undergrad level, Spanish): Courses 2010/11, 2012/13

Postdoctoral Researcher

PRHLT Research Centre, Technical University of Valencia

Feb 2012-Dec 2014

  • CASMACAT research project, funded by the 7th Framework Programme of the European Commission

Independent Expert (FP7 research project reviewer)

European Commission

Mar 2011-May 2011

Research Assistant

Instituto Tecnológico de Informática, Technical University of Valencia

Jul 2008-Jan 2012

  • MIPRCV research project, part of the CONSOLIDER programme of the Spanish Government

Computational Linguistics Researcher

Technical University of Valencia

Mar 2003-Jun 2008

  • Participation in several research projects funded by the Spanish Government


Education

MSc in Bioinformatics

University of Valencia, Valencia, Spain

Oct 2016

  • Thesis title: Systems Biology Strategies to Study Cancer Metabolism

PhD in Pattern Recognition and Artificial Intelligence

Technical University of Valencia, Valencia, Spain

Oct 2011

  • Thesis title: Advances in Fully-Automatic and Interactive Phrase-Based Statistical Machine Translation

MSc in Pattern Recognition and Artificial Intelligence

Technical University of Valencia, Valencia, Spain

Nov 2005

  • Thesis topic: Search Algorithms for Phrase-based Statistical Machine Translation

BSc in Computer Science Engineering

University of Castilla La Mancha, Albacete, Spain

Jan 2003

  • Specialization in Pattern Recognition and Artificial Intelligence
  • Thesis topic: Stack Decoding Algorithms for Statistical Machine Translation

Selected Publications

  • Daniel Ortiz-Martínez. Online learning for statistical machine translation. Computational Linguistics, Vol 42, No. 1, 2016; DOI: 10.1162/COLI_a_00244
  • Antonio L. Lagarda, Daniel Ortiz-Martínez, Vicent Alabau, Francisco Casacuberta. Translating without In-domain Corpus: Machine Translation Post-Editing with Online Learning Techniques. Computer Speech & Language Journal, 11/2014; DOI: 10.1016/j.csl.2014.10.004
  • Daniel Ortiz-Martínez, Francisco Casacuberta. The New Thot Toolkit for Fully Automatic and Interactive Statistical Machine Translation. Proceedings of the European Chapter of the Association for Computational Linguistics (EACL) conference, Gothenburg, Sweden, April 2014
  • Daniel Ortiz-Martínez, Ismael García-Varea, Francisco Casacuberta. Online Learning for Interactive Statistical Machine Translation. Proceedings of the North American Chapter of the Association for Computational Linguistics - Human Language Technologies (NAACL HLT) conference, Los Angeles, US, 2010
  • Daniel Ortiz-Martínez, Ismael García-Varea, Francisco Casacuberta. Phrase-level alignment generation using a smoothed loglinear phrase-based statistical alignment model. Proceedings of the XII European Association for Machine Translation (EAMT) conference, Hamburg, Germany, October 2008 (Best paper award)

Open Source Software

Daniel is the creator and maintainer of the Thot toolkit for statistical machine translation. The toolkit is written in C, C++, Python and shell script and is publicly available under the LGPL license. It comprises more than 50,000 lines of code and offers a wide range of tools for statistical modelling and search in statistical machine translation. Thot is strongly focused on online learning techniques that train statistical model parameters incrementally. Among other features, it implements a version of the incremental EM algorithm for HMMs that can be applied to datasets of arbitrary size using MapReduce. Thot was one of the two official statistical toolkits used within the CASMACAT project.

Daniel is also the author of the PanPipe Workflow Manager, a software package for executing general-purpose pipelines. A PanPipe pipeline is composed of steps, each of which is implemented in a module. The Bio-PanPipe package provides modules related to bioinformatics.

Daniel also created and currently maintains two additional bioinformatics software packages: the Flux Capacitor toolkit for systems biology and the snptools package for working with single nucleotide polymorphisms (SNPs).