Hendra Setiawan is a computer scientist who just loves to design algorithms that automate the analysis of human language, with the goal of helping humans cope with the overwhelming amount of textual information. He works in a field that goes by many names: Natural Language Processing (NLP), Computational Linguistics (CL), and text mining, to name just a few. The focus of his work is a particular task called Statistical Machine Translation (SMT), where the goal is to translate a text in one language, say Chinese, into another language, say English. The task is challenging even for a human translator, let alone an average person, and even more so for a computer. At the same time, it is super fun for the likes of him because it touches many core NLP problems; in fact, some consider SMT the holy grail of NLP research.
In more technical terms, he is interested in modeling latent synchronous derivation trees, which define the space over which an SMT system has to perform inference during translation. Inference over this space is often intractable, partly because SMT has to learn from translation examples that come with no linguistic annotation. In particular, he is interested in the part of this space related to reordering, which arises because the source and target languages have different syntactic structures. Unlike other researchers who insist on imposing linguistic structures, he takes an unsupervised approach that relies on whatever information is embedded in function words. These words carry little semantic value but often carry essential syntactic information.
His dream is to develop a *unified* translation model that serves as the core model in all stages of an SMT system, not only at decoding time. Currently, the models used to learn translation rules are much simpler than the models used at decoding time. A unified model is computationally challenging, but I get the feeling that he is getting closer by the day. To get there, he cannot avoid the popular field of machine learning. In fact, his models are nothing more than applications of machine learning techniques to deal with ambiguity in language. Take a look at his publications below to find out whether his approach is successful (spoiler alert: it is).
To push for better translations, he also pursues other topics, including the application of paraphrase lattices, the application of sentence simplification techniques, and discriminative training of phrase translation probabilities, among many others.
Currently, he is a Scientist at Raytheon BBN Technologies, exploring deep learning techniques to advance the state of the art in Statistical Machine Translation. Previously, he was a postdoctoral researcher at the IBM Thomas J. Watson Research Center in Yorktown Heights, NY, the birthplace of SMT. At IBM, he focused on improving the IBM Statistical Machine Translation system to perform well in the DARPA BOLT project evaluation. Before that, he was a postdoctoral researcher at the Computational Linguistics and Information Processing (CLIP) lab at the University of Maryland Institute for Advanced Computer Studies, the birthplace of state-of-the-art hierarchical phrase-based SMT, where he worked with the awesome Philip Resnik. Prior to UMD, he was a Ph.D. student at the School of Computing, National University of Singapore, co-advised by two wonderful supervisors: Haizhou Li (I2R) and Min-Yen Kan (NUS). At NUS, he was affiliated with the Web Information Retrieval / Natural Language Processing (WING) Group. He obtained his Bachelor's degree in Informatics/Computer Science from Institut Teknologi Bandung (ITB), the oldest institute of technology in Indonesia.
- Hendra Setiawan, Zhongqiang Huang, Jacob Devlin, Thomas Lamar, Rabih Zbib, Richard Schwartz and John Makhoul. Statistical Machine Translation Features with Multitask Tensor Networks. To appear in The Annual Meeting of the Association for Computational Linguistics (ACL), July 26-31, 2015. [arxiv-pdf]
- Hendra Setiawan, Bowen Zhou and Bing Xiang. Anchor Graph: Global Reordering Contexts for Statistical Machine Translation. To appear in Proc. of The 2013 Conference on Empirical Methods in Natural Language Processing (EMNLP), October 18-21, 2013. [pdf]
- Hendra Setiawan, Bowen Zhou, Bing Xiang and Libin Shen. Two-Neighbor Orientation Model with Cross-Boundary Global Contexts. In Proc. of The Annual Meeting of the Association for Computational Linguistics (ACL), August 4-9, 2013. [pdf] [pre-print pdf] [slides]
- Hendra Setiawan and Bowen Zhou. Discriminative Training of 150 Million Translation Parameters and Its Application to Pruning. In Proc. of The Annual Meeting of the North American Chapter of the Association for Computational Linguistics (NAACL), June 9-15, 2013. [pdf] [bib]
- Hendra Setiawan and Philip Resnik. Generalizing Hierarchical Phrase-based Translation using Rules with Adjacent Nonterminals. In Proc. of The Annual Meeting of the North American Chapter of the Association for Computational Linguistics (NAACL), June 1-6, 2010. [pdf] [bib]
- Hendra Setiawan, Chris Dyer and Philip Resnik. Discriminative Word Alignment with a Function Word Reordering Model. In Proc. of The 2010 Conference on Empirical Methods in Natural Language Processing (EMNLP), October 9-11, 2010. [pdf] [bib]
- Chris Dyer, Adam Lopez, Juri Ganitkevitch, Jonathan Weese, Ferhan Ture, Phil Blunsom, Hendra Setiawan, Vladimir Eidelman, and Philip Resnik. cdec: A Decoder, Alignment, and Learning Framework for Finite-State and Context-Free Translation Models. In Proc. of The Annual Meeting of the Association for Computational Linguistics (ACL Demonstration track), July 11-16, 2010. [pdf][bib]
- Hendra Setiawan, Min-Yen Kan, Haizhou Li and Philip Resnik. Topological Ordering of Function Words in Hierarchical Phrase-based Translation. In Proc. of The Annual Meeting of the Association for Computational Linguistics (ACL), August 2-7, 2009. [pdf] [bib]
- Chris Dyer, Hendra Setiawan, Yuval Marton, and Philip Resnik. The University of Maryland Statistical Machine Translation System for the Third Workshop on Machine Translation. In Proc. of The EACL-2009 Workshop on Statistical Machine Translation, March 30 – April 3, 2009. [pdf][bib]
- Hendra Setiawan, Min-Yen Kan, and Haizhou Li. Ordering Phrases with Function Words. In Proc. of The Annual Meeting of the Association for Computational Linguistics (ACL), June 23-30, 2007. [pdf][bib]
- Min Zhang, Haizhou Li, Jian Su, and Hendra Setiawan. A Phrase-based Context-dependent Joint Probability Model for Named Entity Translation. In Proc. of The Second International Joint Conference on Natural Language Processing (IJCNLP), October 11-13, 2005. [pdf]
- Hendra Setiawan, Haizhou Li, Min Zhang, and Beng Chin Ooi. Phrase-based Statistical Machine Translation: A Level of Detail Approach. In Proc. of The Second International Joint Conference on Natural Language Processing (IJCNLP), October 11-13, 2005. [pdf]
- Hendra Setiawan, Haizhou Li, and Min Zhang. Learning Phrase Translation using Level of Detail Approach. In Proc. of The Tenth Machine Translation Summit (MT-Summit X), September 12-16, 2005. [pdf]
He is an Indonesian citizen of Chinese descent. He currently lives in New York City, the concrete jungle where dreams are made of and a melting pot of the world. His more famous self is a leading men’s doubles badminton player who has won many major titles, including an Olympic gold medal.