
Automatic Translation of Nominal Compounds from English to
International Conference on Natural Language Processing
Centre for Language Technologies Research Centre, International Institute of Information Technology
Automatic Translation of English Nominal Compound in Hindi
Abstract
[...] method comprises the following steps: [...] English corpus (3) Finding the appropriate [...] using WSD tool (4) Lexical substitution of [...] significantly improves the performance of [...] distinct from all the previous works done [...]

1.0 Introduction

[...] frequently occurring expression in English¹. [...] preceding noun the modifier as found in ‘cow milk’, ‘road condition’, ‘machine translation’ and so on². Rackow et al. (1992) have rightly observed that the two main issues [...] correctly in the target language involve a) correctness in the choice of the appropriate target lexeme during lexical substitution and b) correctness in the selection of the right target construct type. The issue stated in (b) [...] parallel corpus of English and Hindi that we [...] found that English nominal compounds can [...]

‘Hindu texts’ → hindU SastroM, ‘milk production’ → dugdha utpAdana
[...] temperature’ → kamare ke tApamAn
‘nature cure’ → prAkrtik cikitsA, ‘hill [...]

The words prAkrtik and pahARI being adjectives derived from prakriti and [...]

wax work → mom par ciwroM ‘work [...]
body pain → SarIr meM dard ‘pain in body’
Hand luggage → haat meM le jaaye jaane vaale saamaan ‘luggage to be [...]

However, no definite clue is available in the data that helps one in selecting the right construction type of Hindi for translating a [...] system attempting to translate a corpus will run across NCs with high frequency, but that [...] occurring only once). The upshot of this for [...] compounds are too varied to be able to pre-compile in an exhaustive list of translated NN compounds. The system must be able to deal with novel NN compounds on the fly.

Building an automatic translation system for [...] language (SL) English to the target language (TL) Hindi thus becomes a very challenging [...] could achieve an accuracy of 45% with the same test data that we have used to evaluate [...] Hindi. When an NC is translated in genitive construction in Hindi, the translator could return the correct result 10% of cases. For other cases such as when NC translated as Adjective noun pair or as a single word, the performance of Google translator is poor. This paper presents the architecture of a “Nomin [...] that has been able to give an accuracy of [...] test data. We limit our discussion to English two word nominal compounds in this paper. The approach adopted to build the system has a close resemblance to the approaches described in Bungum and Oepen (2009) for [...] translation and Tanaka and Baldwin (2004) (English to Japanese nominal compound and vice versa). All these works including the one described in this paper follow a template based corpus search approach. However, the present system distinctly differs from the aforementioned works for the analysis stage. Our system, unlike others, attempts to select the correct sense of nominal components by running a WSD system on the SL data. As a [...] translation candidates to be searched in the [...] reduced. Translation of nominal compound [...] Selection from target language Hindi (2) Extraction of NCs from English corpus (3) Finding relevant sense of the components of [...]

The next section describes the data in some detail. In section 3, we review earlier works that have followed similar approaches as the present work. Our approach is described in section 4. Finally the result and analysis is [...]

1 Tanaka and Baldwin (2004) report that the BNC corpus (84 million words: Burnard (2000)) has 2.6% and the Reuters corpus (108M words: Rose et al. (2002)) has 3.9% of bigram nominal compounds.
2 A nominal compound may be constituted of a more complex structure as ‘customer satisfaction indices’, ‘social service department’ and so on.

Proceedings of ICON-2009: 7th International Conference on Natural Language Processing, Macmillan Publishers, India. Also accessible from http://ltrc.iiit.ac.in/proceedings/ICON-2009

At the time of taking up the present project [...] English-Hindi parallel corpora in order to identify the distribution of various construct types which English NC are aligned to. We took a parallel corpora of around 50,000 sentences in which we got 9246 sentences (i.e. 21% cases of the whole corpus) that has [...] various translations is given in Table 1. We have also come across some cases where an NC corresponds to a paraphrase construct for which we have not given a count in this table. There are .08% cases (see table 1) when an English NC becomes a single word [...] either be a simple word as in (‘cattle dung’ → gobar) or a compounded word such as ‘blood pressure’ → raktacApa, ‘transition [...]

Table 1: Distribution of translations of English NC from an English Hindi parallel [...]

The above table records major translation types. There are 1208 cases (approximately [...] is not translated but transliterated in Hindi. They are mostly technical terms, names of [...] The figure given in Table 1 is a report of the empirical study performed on English-Hindi [...] translation templates that represents the construct types of Hindi (as in table 1). In [...] templates are used for searching possible translation in Hindi raw corpus. From table 1, we come to know that the frequency of [...] second highest construction is the genitive construct. In parallel, we have performed a study with Hindi informants to find out how [...] syntactic genitive construct even when it can have other more accurate translation. Our experiment shows that a nominal compound is well accepted as a genitive construct in Hindi in 59% of cases. This is an interesting finding which we have used in designing the [...]

3.0 Related Works

While working on the automatic translation of English nominal compound to Hindi, we [...] (Rackow et al. (1992)) and b) corpus search based probabilistic approach (Bungum and Oepen (2009) (henceforth B&O), Tanaka and Baldwin (2004) (henceforth T&B)). Rackow et al. tried to set a mapping between the head noun of source language and target language in terms of some grammatical and [...] selecting the right lexical item for the target [...] B&O and T&B have close similarity to ours as far as the template generation and the procedure of corpus search is concerned. [...] represent various construct types of the [...] templates in a huge corpus. The two works differ in using different strategy for ranking of the possible translated candidates that are found in the corpus. We have adopted the T&B proposal for ranking. T&B suggests ranking candidate translation based on target [...] essentially corpus frequency. They develop [...] (Corpus-based translation quality) metric” which extracts frequency counts from the target language corpus (for the details see [...] both B&O and T&B disregard local contexts and do not attempt to identify the sense of [...] They have, on the other hand, taken into account all possible translations of the [...] corpus search. In this way the number of [...] compound in the given context, that is, the sentence in which it has occurred. In this [...] other works referred to in this section.

4.0 Preparation of Data and Approach

[...] following stages: a) Preparation of data and template generation b) Determining sense of the component nouns in the given context, c) [...] dictionary, d) corpus search using translation templates and e) Ranking of the possible [...]

4.1 Preparation of Source Language Data

Two sets of language data are prepared for the work. The first set is a parallel corpus of [...] identified in the Hindi target language³. The [...] sentences of English on which we have run [...] tagger not only gives part of speech of the words but also outputs the lemma for each word. The lemma is required in the later stage for searching the word in the wordnet. [...] nominal compounds are strictly restricted to be two consecutive noun construction type. [...] processing. These sentences are manually translated into Hindi and used half of it as [...]

3 In order to execute this task we have used a JAVA based interface “Sanchay” that has been developed in-house. Using an interface to do this task helped us to maintain consistency in tagging the corpus of 1.7M words. It gave an [...]

4.2 Generation of Translation Templates

One of the most important subtasks in this [...] translated construct type of English NC in [...] inspected and generalized into translation templates. As shown in section 2, the two templates <E1⁵ E2> → <H1 H2> and <E1 E2> → <H1 kA⁶ H2> are the most frequent ones. The other interesting candidate is Adjective noun phrase in Hindi. Hindi has a [...] formation. In this work we have identified [...]

5 E stands for English and H stands for Hindi.
6 kA is a genitive marker in Hindi. It has variants kI and ke. Therefore <H1 kA H2>, <H1 ke H2> and <H1 kI H2> form three translation [...]

4.3 Sense Selection for Source Language

The context determines the sense of a given [...] component nouns are taken independently, they might represent more than one sense. For each sense the English word might be [...] bilingual dictionary. Let me explain the complexity of lexical substitution with data [...] harm (deaths, injuries, and property damage) resulting from crashes of road vehicles’ [...] sentence (a) and (b) are ‘border area’ and ‘road safety’ respectively. All four words can be used in more than one sense as given [...]

Table 2: Number of Senses Listed in Wordnet
For each sense there exists a synset which [...] consider all words for all senses of the component nouns and attempt to translate all of them using a bilingual dictionary, the number of translation candidates will be [...] searching for those candidates that are not relevant for the English NC in the given context. In order to avoid the proliferation of data, we have chosen to use a WSD tool. [...] al.) on our data for the purpose. This tool specifies the wordnet sense id for each noun component within NC as shown in table 3:

Table 3: Output of WSD tool

The third column of table 3 presents the synset associated with the sense selected by the WSD tool. Once the synsets are acquired in this process, the translation for each word in the synset is obtained from a bilingual dictionary. Once we look into a bilingual dictionary, again we may come across many equivalents of a word which do not match the sense id selected for that word. For example, the word ‘border’ (a member of the synset of ‘border’) has one equivalent, jhaalar, in the bilingual dictionary that is used in the domain of ‘decoration’ and not ‘location’. We would like to discard such equivalents; otherwise the whole attempt of using a WSD tool on the source language side will be lost. The ideal situation would have been to have a mapping from the synset id [...] wordnet. Since that was not available to us, we have maintained the following strategy. We first acquire all possible translations for [...] possible dictionary resources. Then we take [...] translations to all English words of a synset, if there is one. For example, we got the following translations for the two synsets <‘road’, ‘route’> from bilingual dictionaries:

Table 4: Translation using a bilingual dictionary

From table 4, we find out that maarg, saDak, raastaa are common translations for ‘road’ and ‘route’. Once the Hindi equivalents are [...] translation candidates which are searched in [...] equivalent(s) is not found for all member [...] common translation is available. The worst [...] members of ‘border’ as well as ‘safety’ we [...] translations of all synset members one by one for generating the translation templates.

Translation Candidates

We have performed the corpus search on a Hindi indexed corpus of 28 million words. For ranking, a reference ranking based on the frequency of occurrence of the translation candidates in full in the TL corpora is taken as baseline. To improve on the baseline, a stronger ranking measure is borrowed from Baldwin and Tanaka (2004). It rates a given translation candidate according to corpus [...] translation and its parts in the context of the [...] measure is called interpolated CTQ metric that extracts the frequency counts from the [...]

\[
\mathrm{CTQ}(w_1^H, w_2^H, t) = \alpha\, p(w_1^H, w_2^H, t) + \beta\, p(w_1^H, t)\, p(w_2^H, t)\, p(t)
\]

Here $\alpha\, p(w_1^H, w_2^H, t)$ is the probability of occurrence of template $t$ with $w_1$ and $w_2$ as its instances, and $\beta\, p(w_1^H, t)\, p(w_2^H, t)\, p(t)$ is the probability of occurrence of translation template $t$ with $w_1$ as its instance at one time multiplied by the probability of occurrence of translation template $t$ with $w_2$ as its instance at another time multiplied by the occurrence of translation template $t$. Naturally the first term will be given higher priority than the second term. The result presented in the next section will show that the incorporation of frequency of occurrence of $\beta\, p(w_1^H, t)\, p(w_2^H, t)$ has distinctly [...]

5.0 Result and Analysis

[...] various experiments performed as part of [...] nominal compounds into Hindi equivalents and the result obtained for each method is presented at table 1 and table 2. As part of the first method we have not done any word [...] words of source language NC; on the contrary we have straightaway used the [...] Hindi equivalents. For the second method, [...] been selected as default sense and all the members of synset of the first sense have been substituted using a bilingual dictionary. The motivation for this approach is twofold: a) a word occurs mostly in its default sense, which is listed as the first sense in any lexicon; b) if the input word is not available in a bilingual dictionary for substitution, a synset gives us other equivalent words. This increases the robustness of the system. The third method is the one we have adopted for the present task: using a WSD tool on the [...] appropriate sense of the given word in that context.

The purpose of trying out various [...] brings in any improvement to the overall performance of the translator tool. The table below shows that it does. The pre-processed [...] substitution is not humanly analyzed data but is actually obtained as the output of [...] has produced 80% accurate case for nominal compound disambiguation⁷. The results of corpus search of the translation candidates are given in the following two tables. The baseline frequency model performs in the [...] improvement in performance as we go from baseline ranking method to CTQ method of [...]

Table 5: Ranking using Baseline Frequency

[...] improved as shown in the following table:

The recall of this experiment was very low. [...] verify on the development data whether the [...] corpus search can legitimately be translated as a genitive construct. We found that the [...] Therefore we incorporated this as a default translation case for our system. Whenever a corpus search for a translation candidate fails, we assign a genitive translation for that nominal compound. This results in a steep improvement in recall although the precision falls down a little. We ran the experiment [...] substitution methods. The result is reported [...]

Table 7: Ranking after inclusion of default translation (X kA Y, X kI Y, X ke Y as templates)

7 It is interesting to note that the accuracy reported for the WordNet-SenseRelate output on general data is 58%. When we tested the tool for nominal compound, it gave an accuracy of around 80% for the same.

6.0 Conclusion and Future Work

This paper describes the architecture of a [...] translating English nominal compound into [...] translated into Hindi. However no clue is available to determine which type of Hindi [...] have, therefore, adopted a corpus search [...] found out that adjectival templates are hard [...] from noun is a complex derivational process in Hindi. It does not only involve attaching an adjectival suffix on the noun but also many a time requires a change in the vowel of the stem. In the present work, we have [...] includes the correct generation of adjectival form from the modifier nouns so that correct templates for ‘Adjective Noun’ construct [...] approach is that a translation if it exists in the corpus will never be missed. Therefore accuracy of translation will depend largely [...] searched for the translation candidates.

7. References

Helmut Schmid. 1994. Probabilistic Part-of- [...] International Conference on New Methods in Language Processing. Manchester, UK.
Lou Burnard. 2000. User Reference Guide for the British National Corpus. Technical [...]
Pierrette Bouillon, Katharina Boesefeldt, [...] nouns in a unification-based MT system. In Proc. of the 4th Conference on Applied Natural Language Processing (ANLP), Stuttgart, Germany.
Siddharth Patwardhan, Satanjeev Banerjee and [...] SenseRelate::TargetWord - A Generalized Framework for Word Sense Disambiguation. Proceedings of the ACL Interactive Poster and Demonstration Sessions, Ann Arbor, MI.
Sparck Jones, K. 1983. "So what about parsing compound nouns?," in Automatic Natural Language Processing, K. Sparck Jones and Y. A. Wilks, eds., Ellis Horwood, Chichester, 164-168.
Su Nam Kim, Timothy Baldwin. Automatic Interpretation of Noun Compounds Using WordNet Similarity. IJCNLP 2005: 945-956.
[...] interpretation of nominal compounds. In Proc. of the 1st Conference on Artificial Intelligence (AAAI-80).
Timothy Baldwin and Takaaki Tanaka. 2004. Translation by Machine of Complex Nominals: Getting it right. In Proceedings of the ACL04 Workshop on Multiword Expressions: [...]
Tanaka, Takaaki and Timothy Baldwin. 2003b. Translation Selection for Japanese-English [...] Proceedings of Machine Translation Summit IX, New Orleans, LA, USA.
Ulrike Rackow, Ido Dagan, Ulrike Schwall. 1992. Automatic Translation of Noun Compounds. COLING 1992, 1249-1253.
Zouhair Maalej. 1994. English-Arabic Machine Translation of Nominal Compounds. In Proceedings of the Workshop on Compound Nouns: Multilingual Aspects of Nominal Composition. Geneva: ISSCO, pp. 135–146.
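As an illustration of the candidate generation, interpolated CTQ ranking and default genitive translation described in Sections 4 and 5, the following is a minimal Python sketch, not the authors' implementation. It assumes that the probabilities in the CTQ formula are estimated as relative frequencies over corpus counts. The function names, the weights `alpha=0.9` and `beta=0.1`, the romanized form `surakSA`, and all counts are hypothetical and chosen only for the example. Picking `kA` for the default genitive is a simplification, since the choice among kA, kI and ke depends on agreement.

```python
from collections import Counter
from itertools import product

# Bare noun-noun template plus the three genitive variants (kA, kI, ke).
TEMPLATES = ["H1 H2", "H1 kA H2", "H1 kI H2", "H1 ke H2"]


def generate_candidates(modifier_words, head_words, templates=TEMPLATES):
    """Pair every modifier translation with every head translation under every template."""
    return list(product(modifier_words, head_words, templates))


def _rel(count, total):
    """Relative frequency, defined as 0.0 when nothing was observed."""
    return count / total if total else 0.0


def ctq(w1, w2, t, full, part1, part2, tmpl, alpha=0.9, beta=0.1):
    """Interpolated CTQ score: alpha*p(w1,w2,t) + beta*p(w1,t)*p(w2,t)*p(t).

    full  counts (w1, w2, t) matches; part1 counts (w1, t); part2 counts (w2, t);
    tmpl  counts t. alpha > beta (assumed values) so the full match dominates.
    """
    p_full = _rel(full[(w1, w2, t)], sum(full.values()))
    p_w1 = _rel(part1[(w1, t)], sum(part1.values()))
    p_w2 = _rel(part2[(w2, t)], sum(part2.values()))
    p_t = _rel(tmpl[t], sum(tmpl.values()))
    return alpha * p_full + beta * p_w1 * p_w2 * p_t


def rank(candidates, full, part1, part2, tmpl):
    """Return (candidate, score) pairs sorted by decreasing CTQ score."""
    scored = [(c, ctq(*c, full, part1, part2, tmpl)) for c in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def translate(modifier_words, head_words, full, part1, part2, tmpl):
    """Best-ranked candidate, or a default genitive when the corpus search fails."""
    ranked = rank(generate_candidates(modifier_words, head_words),
                  full, part1, part2, tmpl)
    best, score = ranked[0]
    if score == 0.0:
        # Corpus search found nothing: fall back to a genitive construct.
        return (modifier_words[0], head_words[0], "H1 kA H2")
    return best


# Hypothetical corpus counts for the candidates of 'road safety'.
FULL = Counter({("saDak", "surakSA", "H1 H2"): 12,
                ("saDak", "surakSA", "H1 kI H2"): 3})
PART1 = Counter({("saDak", "H1 H2"): 40, ("saDak", "H1 kI H2"): 25,
                 ("maarg", "H1 H2"): 35})
PART2 = Counter({("surakSA", "H1 H2"): 30, ("surakSA", "H1 kI H2"): 20})
TMPL = Counter({"H1 H2": 600, "H1 kI H2": 400})

if __name__ == "__main__":
    for cand, score in rank(generate_candidates(["saDak", "maarg"], ["surakSA"]),
                            FULL, PART1, PART2, TMPL):
        print(cand, round(score, 4))
```

Calling `translate` with empty counters exercises the fallback path: every score is zero, so the sketch returns the genitive candidate built from the first modifier and head translations.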

Source: https://hlt.fbk.eu/sites/hlt.fbk.eu/files/prashant-mathur-camera-ready.pdf
