
Thesis

Scalable machine translation in memory constrained environments

Abstract:

Machine translation is the discipline concerned with developing automated tools for translating from one human language to another. Statistical machine translation (SMT) is the dominant paradigm in this field. In SMT, translations are generated by means of statistical models whose parameters are learned from bilingual data. Scalability is a key concern in SMT, as one would like to make use of as much data as possible to train better translation systems.

In recent years, mobile devices with adequate computing power have become widely available. Despite being very successful, mobile applications relying on NLP systems continue to follow a client-server architecture, which is of limited use because access to the internet is often limited and expensive. The goal of this dissertation is to show how to construct a scalable machine translation system that can operate with the limited resources available on a mobile device.

The main challenge in porting translation systems to mobile devices is memory usage. The amount of memory available on a mobile device is far less than what is typically available on the server side of a client-server application. In this thesis, we investigate alternatives to the two components whose high memory usage prevents standard translation systems from working on mobile devices. We show that once these standard components are replaced with our proposed alternatives, we obtain a scalable translation system that can work on a device with limited memory.

The first two chapters of this thesis are introductory. Chapter 1 discusses the task we undertake in greater detail and highlights our contributions. Chapter 2 provides a brief introduction to statistical machine translation.

In Chapter 3, we explore online grammar extractors as a memory-efficient alternative to phrase tables. We propose a faster and simpler extraction algorithm for translation rules containing gaps, thereby improving the extraction time for hierarchical phrase-based translation systems.
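
To make the phrase-table alternative concrete, the sketch below illustrates the general idea behind an online grammar extractor: the parallel corpus is kept in memory together with a suffix array over its source side, source phrases are looked up on demand, and the aligned target phrases are read off the word alignment. The toy corpus, alignment, and function names are illustrative assumptions, and the sketch omits rules with gaps, which are the focus of the chapter; it is not the thesis's extraction algorithm.

```python
# Minimal sketch of online grammar extraction with a suffix array: look up
# source phrases on demand and project them onto the target side through the
# word alignment, instead of storing a precomputed phrase table. Toy data
# and names below are illustrative, not the thesis's algorithm.
from bisect import bisect_left, bisect_right

source = "the cat sat on the mat".split()            # source side of the corpus
target = "le chat était assis sur le tapis".split()  # target side of the corpus
alignment = [(0, 0), (1, 1), (2, 2), (2, 3), (3, 4), (4, 5), (5, 6)]  # (src, tgt) links

# Suffix array: source positions sorted by the lexicographic order of their suffixes.
suffix_array = sorted(range(len(source)), key=lambda i: source[i:])

def occurrences(phrase):
    """Find every start position of `phrase` in the source side."""
    # A real extractor would binary-search against the corpus lazily;
    # materialising the truncated suffixes here just keeps the sketch short.
    keys = [source[i:i + len(phrase)] for i in suffix_array]
    lo, hi = bisect_left(keys, phrase), bisect_right(keys, phrase)
    return [suffix_array[k] for k in range(lo, hi)]

def extract_target_span(start, length):
    """Project a source span onto the target side through the word alignment."""
    points = [t for s, t in alignment if start <= s < start + length]
    return target[min(points):max(points) + 1] if points else None

phrase = ["the", "cat"]
for start in occurrences(phrase):
    print(phrase, "->", extract_target_span(start, 2))  # ['the', 'cat'] -> ['le', 'chat']
```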

In Chapter 4, we conduct a thorough investigation of how neural language models should be integrated into translation systems. We settle on a novel combination of noise contrastive estimation and factoring the output layer using Brown clusters. We obtain a high-quality translation system that is fast in both training and decoding, and we use it to show that neural language models outperform traditional n-gram models in memory-constrained environments.
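
The sketch below illustrates the class-factored output layer that makes normalisation cheap: each word belongs to one Brown-style cluster and P(word | history) is computed as P(cluster | history) * P(word | cluster, history), so the inner softmax runs only over the words in one cluster. The sizes, the round-robin cluster assignment, and the variable names are illustrative assumptions; the sketch shows only the factored probability and omits noise contrastive estimation and the thesis's actual model.

```python
# Minimal sketch of a class-factored output layer for a neural language model:
# P(w | h) = P(c(w) | h) * P(w | c(w), h), so normalisation runs over one
# cluster instead of the whole vocabulary. Sizes, the cluster assignment, and
# names are illustrative; this is not the thesis's model and it omits NCE.
import numpy as np

vocab_size, num_clusters, hidden_dim = 10, 2, 4
rng = np.random.default_rng(0)

# Stand-in for Brown clustering: assign words to clusters round-robin.
word_to_cluster = np.arange(vocab_size) % num_clusters
cluster_members = [np.where(word_to_cluster == c)[0] for c in range(num_clusters)]

cluster_weights = rng.normal(size=(num_clusters, hidden_dim))  # cluster softmax weights
word_weights = rng.normal(size=(vocab_size, hidden_dim))       # per-word score weights

def softmax(scores):
    exp = np.exp(scores - scores.max())
    return exp / exp.sum()

def word_probability(hidden, word):
    """P(word | history) = P(cluster | history) * P(word | cluster, history)."""
    c = word_to_cluster[word]
    p_cluster = softmax(cluster_weights @ hidden)[c]
    members = cluster_members[c]
    # Normalise only over this cluster's words: O(|cluster|) instead of O(|V|).
    p_in_cluster = softmax(word_weights[members] @ hidden)
    return p_cluster * p_in_cluster[np.where(members == word)[0][0]]

hidden = rng.normal(size=hidden_dim)  # stand-in for the network's hidden state
print(sum(word_probability(hidden, w) for w in range(vocab_size)))  # sums to ~1.0
```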

Chapter 5 concludes our work by showing that online grammar extractors and neural language models allow us to build scalable, high-quality systems that can translate text with the limited resources available on a mobile device.


Authors


Division: MPLS
Department: Department of Computer Science
Role: Author

Contributors

Department: Department of Computer Science
Role: Supervisor


Type of award: MSc by Research
Level of award: Masters
Awarding institution: University of Oxford


UUID: uuid:427a58ed-9727-454c-92a1-7f481f7d246b
Deposit date: 2016-11-04
