Tuesday, February 15, 2011

An Exploration of Post-Editing MT – Part I

The topic of post-editing MT (PEMT) (yes, somebody has already come up with an acronym) continues to gain prominence in the professional translation world and is also a subject of heated and sometimes disdainful discussion amongst many translators in the blogosphere. There is a lot of confusion and conflation in the discussion, and this post is an attempt to define and clarify the issues around this growing phenomenon, to see if it is possible to have a more constructive dialogue. 

This is a subject of some importance, so I think it is worth exploring over a series of entries, but we can hopefully frame the key issues in this post. I should also say at the outset that I maintain a perspective that assumes that much of the investment in MT is considered as creation of long-term translation production infrastructure rather than as something used for a single one-time project. This perspective I think makes MT at least somewhat different from most other “CAT Tools”.

There are at least three (and probably more) areas of general concern on PEMT:
  1. A clear definition of what PEMT actually is
  2. Compensation for PEMT work
  3. The quality of the work experience

    What is the task of post-editing machine translation?

    The most commonly understood definition of PEMT in the localization world is described graphically below. It is usually understood to be the linguistic work needed to correct MT output to a linguistic quality level that is close to, or indiscernible from, the output of a standard TEP process. At its worst, this corrective work can be a very tedious and perhaps even mind-numbing process.
    However, this view neglects several factors that affect the actual PEMT process, e.g.
    • MT system output varies, and some MT systems produce much better output than others, and thus the PEMT experience can be very different and is highly dependent on the actual quality produced by each individual MT system or engine. “Good” MT engines (systems) can noticeably enhance the productivity of translators and vice versa.
    • Google Translate is often not an example of the best that MT can do, and customized, domain-focused systems usually outperform generic engines.
    • Increasingly, MT is used to translate content that would never be translated through a standard TEP process. Usually this involves much larger volumes of content that is also frequently updated and changed.
    • There are many MT applications where the editing work is only focused on making sure the content is understandable and accurate in meaning, even if it is grammatically imperfect.
    • It is possible to do linguistic analysis and a priori terminology and linguistic work to enhance the ability of the MT system to produce better output, and this can also be considered a kind of PEMT activity.
    • Many equate PEMT to “janitorial” work (often accurately so) that no self-respecting translator would ever resort to, but many different bilingual skill levels are required in the development process of an MT engine, and very few of the critics seem to realize this.
     Thus, I would expand the definition of PEMT to:

    All tasks that are intended to improve the linguistic quality of output produced by an MT engine. This includes the a priori analysis, the post-editing, and the structural linguistic analysis work involved in the development of an MT engine. The objective is always to improve translator productivity (as well as to reduce cost and time) on every future translation project in that domain.


    The experience at Asia Online is more closely characterized by the following graphic, which shows an MT system that evolves rapidly (within weeks) and produces continuously improving MT quality as corrective feedback is fed back into the MT engine and error patterns are identified and gradually eliminated. This process has been in use in the Asia Online project to translate the English Wikipedia into Thai, and it is transferable to other languages. A highly collaborative process underlies the MT systems here: linguists look to eliminate as many linguistic error patterns as possible, which improves the ongoing PEMT experience and expedites the translation of a billion-word corpus to human quality levels in the quickest time possible.

    Reducing Post-Editing Efforts

    It is also often assumed that translators are the only people who are capable of doing post-editing work. However, many translators do not care for the “janitorial” aspect of the work, so it is not for everybody. MT systems that produce generally “high-quality” output could also be edited by monolingual speakers of the target language with expertise in the subject domain. Thus, it is possible to use humans who are less skilled than your average translator to accomplish business objectives, especially where there are really large volumes involved and/or when grammatical perfection matters less than immediate access. There are many students and housewives who may actually find the flexibility and money offered by PEMT tasks attractive, and for very large ongoing projects they may indeed have a role. Problems arise when it is assumed that the work done by professionals can easily be done by non-professionals, so it is wise to be clear about your objectives and understand the nature of the work being offered and the skills required for competent performance. In general, open-minded professionals will always have an edge in producing better quality, but bringing in hostile or reluctant translators is also a sure way to fail.

    PEMT Skills Hierarchy

    The issues that get the most attention in the broader discussions on PEMT are the other two issues which are stated in brief below. These two issues will be examined in more detail in future posts.

    Compensation for PEMT: The early experience in the industry has been to arbitrarily reduce standard rates for PEMT work, as it is assumed that to some extent the translation is already done. This practice causes a lot of resistance amongst translators who are expected to actually produce a usable translation (at low rates) when the starting point MT output is not useful or viable. Unfortunately, it is already common practice today to link per-word payment rates to TM matching rates. This has caused commoditization of the actual translation work, and translators today are expected to track baskets of words and get paid different rates in schemes such as the hypothetical one shown below:
    Matching Rate    Pay Rate Per Word (% of full rate)
    -------------    ----------------------------------
    Repetitions      25%
    100%             25%
    85% to 99%       45%
    75% to 84%       60%
    0 to 74%         100%
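
    As a rough illustration of how a translator ends up tracking these "baskets of words", the hypothetical scheme above can be sketched in a few lines of Python. The function names, the per-word rate, and the word counts are assumptions made up for this example; only the bands and percentages come from the table.

```python
# Bands from the hypothetical table above:
# (minimum TM match percentage, fraction of the full per-word rate paid).
BANDS = [
    (100, 0.25),
    (85, 0.45),
    (75, 0.60),
    (0, 1.00),
]
REPETITION_FRACTION = 0.25  # repetitions are paid at 25% in the table


def pay_fraction(match_percent, is_repetition=False):
    """Return the fraction of the full per-word rate for one segment."""
    if is_repetition:
        return REPETITION_FRACTION
    if not 0 <= match_percent <= 100:
        raise ValueError("match_percent must be between 0 and 100")
    # Bands are ordered from highest to lowest threshold, so the first
    # threshold that the match rate reaches decides the pay fraction.
    for minimum, fraction in BANDS:
        if match_percent >= minimum:
            return fraction


def job_payment(baskets, full_rate):
    """Total payment for a job.

    baskets: iterable of (word_count, match_percent, is_repetition) tuples.
    full_rate: the full per-word rate (an assumed figure, any currency).
    """
    return sum(
        words * full_rate * pay_fraction(match, repetition)
        for words, match, repetition in baskets
    )


if __name__ == "__main__":
    # Made-up job: 1,000 new words and 1,000 words of 100% matches
    # at an assumed full rate of 0.10 per word.
    example = [(1000, 50, False), (1000, 100, False)]
    print(job_payment(example, full_rate=0.10))
```

    Even in this toy form, the payment depends entirely on how the words are sorted into bands, which is exactly the bookkeeping burden described above.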

    MT is sometimes used as a way to push these rates even lower. Thus, to be fair to translators, there needs to be an assessment of the scope of the PEMT task. An MT system that produces 50% usable output should be compensated differently from one that produces 75% usable output, assuming the content has to be raised to the same target quality level. Logically, the greater the quality gap that needs to be filled, the greater the pay rate for doing the work. To do this accurately and fairly, we require rapid and widely accepted MT engine quality measures, which do not exist today. As our understanding of this issue evolves, I would not be surprised to see more hourly, consulting fee and project-based payment schemes develop in future, as linguists and tech-savvy translators get more involved in steering MT engine development initiatives.
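
    To make the "quality gap" idea concrete, here is a minimal sketch of one possible gap-based rate. This is purely an assumption for illustration and not an established industry formula: it supposes that the usable share of the MT output can somehow be measured (which, as noted above, is the hard part), scales the rate linearly with the remaining gap, and applies a hypothetical floor so that reviewing good output is still paid.

```python
def pemt_rate(full_rate, usable_fraction, floor_fraction=0.2):
    """Hypothetical per-word PEMT rate that grows with the quality gap.

    full_rate: the translator's full per-word rate (assumed figure).
    usable_fraction: share of MT output judged usable as-is, from 0.0 to 1.0.
    floor_fraction: assumed minimum share of the full rate that is always
        paid, since even good output has to be read and checked.
    """
    if not 0.0 <= usable_fraction <= 1.0:
        raise ValueError("usable_fraction must be between 0.0 and 1.0")
    quality_gap = 1.0 - usable_fraction
    # Linear model: the bigger the gap to fill, the higher the rate,
    # but never less than the floor.
    return full_rate * max(quality_gap, floor_fraction)


if __name__ == "__main__":
    # The two engines from the paragraph above, at an assumed 0.10 full rate.
    print(pemt_rate(0.10, 0.50))
    print(pemt_rate(0.10, 0.75))
```

    A linear model is only the simplest choice; the point is that the rate should follow a measured gap rather than an arbitrary discount.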

    The Nature Of PEMT Work: Another common complaint about PEMT is about the drudgery and mind-numbing nature of error correction work. (This seems surprising to me ;-) as I notice that many translators love to point out and correct linguistic errors, typos and errors of expression in blog comments and social networking discussions.) It does, however, suggest that not all translators want to do this kind of work or are well suited to it. I have seen very positive and very negative translator feedback, but often the implementation of the technology has as much to do with it as the work process itself. Success with this technology often requires close collaboration with key translators, who often provide great value in the process. And of course they must be compensated for these contributions. For very large projects it will become necessary to use less skilled workers or the “crowd”. I think that the tools will improve and possibly even make translation work more fun, whole and organic. (The fragmentation caused by the current TM matching rate approach has got to be a nightmare for many.) TAUS offers some guidelines for PEMT, and this is a subject of great and growing interest to many. There are many levels of human steering interaction possible with MT, and as some translators engage more often with MT systems they will start to understand where they have a long-term role to play as quality drivers. I expect that they will become critical members of teams who undertake to make massive content repositories increasingly multilingual.

    There are in fact environments where MT systems have been developed in close collaboration with translators, and thus offer clearly understood productivity advantages. In these cases the translators really want to use the systems and would see not having access to the MT system as a disadvantage, e.g. at PAHO, Asia Online and Andovar. Most of the time, MT has yet to really rise to a level where it is a must-have tool for a professional translator, but I think that is the direction we are heading in. The content deluge shows no sign of slowing down, and any global enterprise worth anything realizes that it needs to translate a whole lot more than user manuals and spec sheets if it wants to build a strong international business in future. So this is worth some attention and worth doing well.


    1. Thanks Kirti for another lesson in MT. I have been thinking recently about MT adoption among translators and I have noticed a behavioral element that might explain the resistance to PEMT. When the translator performs the MT AND does the post-editing, he or she can see the improvement in speed (hours of work done in seconds) and the resistance to PEMT is small. When the translator gets a preprocessed MT and is asked to do the post-editing, he or she only sees the "janitorial" part as you call it. They lose the sense of wow, and see it just as a chore. Maybe we should always give the translator access to the engine and let the professional process the file and then edit it. Just thinking.

    2. Hi Kirti,

      Thanks for an informative first post on the matter of post-editing. I think you cover the issues well and fairly, but I am confused by this statement:

      "...investment in MT is considered as creation of long-term translation production infrastructure rather than as something used for a single one-time project. This perspective I think makes MT at least somewhat different from most other 'CAT Tools'."

      I don't understand the distinction you are trying to make here. I agree with your assessment of MT, but I can't think of any CAT tool that is designed for a "single one-time project." I wonder if you could elaborate on this.

      Also, your expanded definition of PEMT (ugh, as you say... but I suppose it is better than MTPE, which sounds like a gasoline additive) seems a little too broad to me. Would it also include source clean-up efforts before MT is ever run? Given that the process is recursive, I don't think one could just say, "any activity that is done after MT is introduced". I would think that the definition of PEMT should focus on activity done directly on the MT output (to raise the quality level) or that feeds back into the system based on that output (to improve the future output). This is tricky, but possibly a worthy discussion. I'm reminded of the glory days when MT developers were up in arms about standardizing the definition of MT, because certain systems that were little more than automated dictionary-based word replacement products were sullying their good names.

      Renato brings up an interesting psychological point. It reminds me of a story I read (and cannot find anymore) about someone who built a computerized system for diagnosing particular medical conditions. Apparently it was quite accurate, but it was so fast that people had little confidence in the results. So the designers went back and made two adjustments:
      1) they actively slowed it down so that it took 10-30 seconds(?) to produce a result, and
      2) they added a red flashing light to it so that it looked like it was thinking.

      The story goes that due to these changes humans gave much more credence to its output. I think the audible whirring or grinding of gears might have been a nice touch too ;)

      Thanks again. Looking forward to the rest.


    3. In a comment about your article, John Weisgerber wrote: I would think that the definition of PEMT should focus on activity done directly on the MT output (to raise the quality level) or that feeds back into the system based on that output (to improve the future output).

      I agree. To use the term 'post-editing MT' to mean preparatory work is contradictory.

      Instead of extending the meaning of 'post-editing' to include preparation, use a different term. Possibly, the term 'MT optimization' is suitable.

    4. John

      I think you are right, my "definition" is too over-reaching. Your definition is a better one:
      "The definition of PEMT should focus on activity done directly on the MT output (to raise the quality level) or that feeds back into the system based on that output (to improve the future output)."

      So linguistic work directed at improving existing output or improving future MT output is a good way to define the scope of PEMT.

      In terms of the MT vs CAT differentiation - I guess what I am saying is that MT systems are like production lines, e.g. you build an EN to ES Travel engine that serves long-term needs for that purpose only. This engine may not be very useful for IT or Engineering. MT systems can only be used as general tools if you build generic engines which also usually means lower quality. CAT tools are more like word-processors, they can be used from project to project (agnostic of the words they process) while MT systems are investments in "production capacity in a specific area."

      Hopefully that clarifies what I meant to say. Please ping me if you would like further elaboration.

      You also raise another good question:
      "Would it also include source clean-up efforts before MT is ever run?"

      I think success with MT and best practices would suggest that Source Cleanup + PEMT is a better model than PEMT alone. My point is that as we move our focus from translating static chunks of text to translating flowing streams of words related to customer conversations, we should look at the problem in a more holistic and comprehensive way.

      Thank you for your comments.

    5. Mike

      MT Optimization may be a useful way to describe the overall process of using MT successfully.

      One of the reasons I like to blog, is that I find that comments like the ones from you and others all add to bringing the subject into better focus.

    6. Hi Kirti,

      Re: MT vs. CAT differentiation

      Thanks for clarifying. I simply misinterpreted your original paragraph. All clear now.

    7. @Renato
      "When the translator performs the MT AND does the post-editing, he/she can see the improvement in speed and the resistance to PEMT is small. When the translator gets a preprocessed MT and is asked to do the post-editing, he or she only sees the "janitorial" part as you call it. They lose the sense of wow, and see it just as a chore."

      good point!

      anyway, I find even the second task a funny one, not boring at all when you master the technique

      but maybe this is simply because I'm biased by my non-linguistic background ...


    8. Kirti, I think you are right about the role of post-editing. There are really 2 types, and we should give them different names. One role focuses on text correction and readying docs for publication. The other role is more of an analyst who looks at the MT-produced docs, recognizes language patterns that can be improved and gives feedback to the MT administrators based on some knowledge of how the system can be configured, so the role is more of a collaboration.

      The former role will have a large volume of work opportunity, but with rapidly decreasing value (and hourly rates) as authoring, MT, and post-editing tools improve. The latter role is higher value but requires not only linguistic and analytical skills but the ability to relate to the MT administrators and MT architecture.

      Posted by Tex Texin

    9. ".. we require a rapid and widely accepted MT engine quality measures which do not exist today."

      I agree that a measure for MT quality is required. But I also think that a "widely accepted quality measure" might currently be too distant a goal. At this point, any measure of MT quality would be very beneficial.

      Based on my experience, back-translating MT is one way forward in measuring the quality. Limited tests show that real cost-savings can be achieved with it.

      The measure of translation quality (including machine translation) should focus on analysing the meaning of the translation. Linguistic preferences should be left out of the measure; otherwise, developing the measure will be too difficult. Linguistic preferences can be very subtle and a matter of taste.