• Title/Summary/Keyword: English vocabulary (영어어휘)

English-Korean Transfer Dictionary Extension Tool in English-Korean Machine Translation System (영한 기계번역 시스템의 영한 변환사전 확장 도구)

  • Kim, Sung-Dong
    • KIPS Transactions on Software and Data Engineering / v.2 no.1 / pp.35-42 / 2013
  • Developing an English-Korean machine translation system requires building linguistic knowledge about both languages, and the amount of information in the English-Korean transfer dictionary is especially critical to translation quality. Newly coined words are out-of-vocabulary words that appear untranslated in the output sentence, which lowers translation quality. Compound nouns also complicate lexical and syntactic analysis, and they are difficult to translate accurately because the transfer dictionary lacks information about them. To improve the quality of English-Korean machine translation, we must continuously expand the English-Korean transfer dictionary by collecting out-of-vocabulary words and frequently used compound nouns. This paper proposes a method for expanding the transfer dictionary that consists of constructing a corpus from internet newspapers, extracting words that are not in the existing dictionary as well as frequently used compound nouns, attaching meanings to the extracted words, and integrating them into the transfer dictionary. We also develop a tool that supports the expansion of the transfer dictionary. Expanding dictionary information is critical to improving a machine translation system but requires considerable human effort. The developed tool can be useful for continuously expanding the transfer dictionary and is therefore expected to contribute to enhancing translation quality.
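The candidate-extraction step described above could look roughly like the following minimal Python sketch. It assumes the corpus has already been tokenized and POS-tagged with Penn Treebank-style tags by some external tagger, treats two adjacent nouns as a compound-noun candidate, and uses a hypothetical `min_freq` cutoff; the paper's actual extraction rules, and its meaning-attachment and integration steps, are not shown.

```python
from collections import Counter

# Assumption: Penn Treebank-style noun tags produced by an external tagger.
NOUN_TAGS = {"NN", "NNS"}


def frequent(counter, min_freq):
    """Return (term, count) pairs at or above min_freq, most frequent first."""
    return [(term, n) for term, n in counter.most_common() if n >= min_freq]


def expansion_candidates(tagged_sentences, dictionary, min_freq=2):
    """Collect out-of-vocabulary words and noun-noun compound candidates.

    tagged_sentences: list of sentences, each a list of (word, tag) pairs.
    dictionary: set of lowercase entries already in the transfer dictionary.
    """
    oov = Counter()
    compounds = Counter()
    for sentence in tagged_sentences:
        for word, _tag in sentence:
            w = word.lower()
            if w.isalpha() and w not in dictionary:
                oov[w] += 1
        # Adjacent noun pairs are treated as compound-noun candidates.
        for (w1, t1), (w2, t2) in zip(sentence, sentence[1:]):
            if t1 in NOUN_TAGS and t2 in NOUN_TAGS:
                compound = f"{w1.lower()} {w2.lower()}"
                if compound not in dictionary:
                    compounds[compound] += 1
    return frequent(oov, min_freq), frequent(compounds, min_freq)


# Hypothetical toy corpus and dictionary.
corpus = [
    [("The", "DT"), ("chatbot", "NN"), ("uses", "VBZ"), ("cloud", "NN"), ("storage", "NN")],
    [("A", "DT"), ("chatbot", "NN"), ("needs", "VBZ"), ("cloud", "NN"), ("storage", "NN")],
]
known = {"the", "a", "uses", "needs", "cloud", "storage"}
print(expansion_candidates(corpus, known))  # → ([('chatbot', 2)], [('cloud storage', 2)])
```

Here "chatbot" surfaces as an unknown word, while "cloud storage" surfaces as a compound candidate even though both of its parts are already in the dictionary.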

A Study on the Improvement of Security Terminology (경호・경비 용어의 개선방안)

  • Kim, Hong Seong
    • Korean Security Journal / no.57 / pp.231-252 / 2018
  • We have long used foreign words as security terminology even though equivalent terms exist in our own language. Foreign terms carry a strong foreign connotation and also weaken the true meaning of security. No independent Korean terms have been established, and we have neglected to look for them, relying on foreign-language terms instead. As a result, we have left ourselves with little choice of proper security terms in our own language. Currently, English expressions such as "security guard" remain in wide use even though Korean has an appropriate word that corresponds to their meaning. This is because English is used for convenience regardless of whether a term is appropriate, and because English has great influence amid the trend of globalization, so it is easy to adopt English terminology without reflection. Ultimately, however, this is due to a lack of awareness of the Korean language. For these reasons, we must find a pure Korean term for security guarding. Until now, "guard", "security" and "protect" have been used as terms for security and protection. The Korean term "Jikim" means being vigilant and on guard; it refers to the action of maintaining the current safe state. Jikim is already used in many places, as in school jikimi, children's safety jikimi and environment jikimi. Therefore, the term "guard" should be replaced with an appropriate Korean term, and "Jikim" is considered the most appropriate across various sections. "Jikim" is thus the appropriate Korean term corresponding to the meaning of security guarding, and the guardian is called a "Jikimi", a combination of Jikim and the Korean suffix "-i", which denotes a person.

Development and Evaluation of a Document Summarization System using Features and a Text Component Identification Method (텍스트 구성요소 판별 기법과 자질을 이용한 문서 요약 시스템의 개발 및 평가)

  • Jang, Dong-Hyun;Myaeng, Sung-Hyon
    • Journal of KIISE:Software and Applications / v.27 no.6 / pp.678-689 / 2000
  • This paper describes an automatic summarization approach that constructs a summary by extracting sentences that are likely to represent the main theme of a document. As a way of selecting summary sentences, the system uses a model that takes into account lexical and statistical information obtained from a document corpus. As such, the system consists of two parts: the training part and the summarization part. The former processes sentences that have been manually tagged for summary sentences and extracts necessary statistical information of various kinds, and the latter uses the information to calculate the likelihood that a given sentence is to be included in the summary. There are at least three unique aspects of this research. First of all, the system uses a text component identification model to categorize sentences into one of the text components. This allows us to eliminate parts of text that are not likely to contain summary sentences. Second, although our statistically-based model stems from an existing one developed for English texts, it applies the framework to individual features separately and computes the final score for each sentence by combining the pieces of evidence using the Dempster-Shafer combination rule. Third, not only were new features introduced but also all the features were tested for their effectiveness in the summarization framework.
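The Dempster-Shafer combination rule mentioned above can be illustrated with a minimal Python sketch. Each feature is assumed to yield a mass function over the frame {summary, non-summary}, with some mass left on the whole frame to express uncertainty; the feature names and mass values below are made-up placeholders, and how the paper derives masses from its features is not reproduced here.

```python
from functools import reduce
from itertools import product

S = frozenset({"summary"})
N = frozenset({"non-summary"})
THETA = S | N  # the whole frame: "don't know"


def combine(m1, m2):
    """Dempster's rule of combination for two mass functions.

    Mass functions map frozenset hypotheses to masses summing to 1.
    """
    combined = {}
    conflict = 0.0
    for (a, x), (b, y) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            combined[intersection] = combined.get(intersection, 0.0) + x * y
        else:
            conflict += x * y  # mass assigned to contradictory evidence
    if conflict >= 1.0:
        raise ValueError("evidence is totally conflicting")
    # Renormalize so the non-conflicting mass sums to 1.
    return {h: v / (1.0 - conflict) for h, v in combined.items()}


def sentence_score(feature_masses):
    """Combine per-feature evidence; return belief that the sentence is a summary sentence."""
    combined = reduce(combine, feature_masses)
    # S is a singleton, so its belief equals its combined mass.
    return combined.get(S, 0.0)


# Hypothetical evidence from two features (e.g. sentence position, keyword overlap).
position = {S: 0.6, THETA: 0.4}
keywords = {S: 0.3, N: 0.5, THETA: 0.2}
print(round(sentence_score([position, keywords]), 3))  # → 0.6
```

Because each feature is scored separately and only its mass function is combined, adding a feature means adding one more mass function to the list rather than re-estimating a joint model.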

Addressing Low-Resource Problems in Statistical Machine Translation of Manual Signals in Sign Language (말뭉치 자원 희소성에 따른 통계적 수지 신호 번역 문제의 해결)

  • Park, Hancheol;Kim, Jung-Ho;Park, Jong C.
    • Journal of KIISE / v.44 no.2 / pp.163-170 / 2017
  • Despite the growing number of studies on spoken-to-sign language translation, the low-resource problem of sign language corpora has rarely been addressed. As a first step towards translating spoken language into sign language, we addressed the problems arising from resource scarcity when translating spoken language into manual signals using statistical machine translation techniques. More specifically, we proposed three preprocessing methods: 1) paraphrase generation, which increases the size of the corpora; 2) lemmatization, which increases the frequency of each word in the corpora and the translatability of new input words in spoken language; and 3) elimination of function words that are not glossed into manual signals, which makes the constituents of each bilingual sentence pair correspond more closely. In our experiments, we used different types of English-American Sign Language parallel corpora. The experimental results showed that each method, as well as their combination, improved the quality of manual signal translation regardless of the type of corpus.
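Preprocessing steps 2) and 3) above can be sketched in Python as follows. The function-word list and lemma table are tiny hypothetical stand-ins; a real system would use a proper lemmatizer and a function-word list derived from the glossing conventions of the sign language corpus. Paraphrase generation (step 1) is not shown.

```python
import re

# Hypothetical function words assumed not to be glossed into manual signals.
FUNCTION_WORDS = {"a", "an", "the", "to", "of", "is", "are", "was"}

# Hypothetical toy lemma table standing in for a real lemmatizer.
LEMMAS = {"dogs": "dog", "went": "go", "going": "go", "parks": "park"}


def preprocess(sentence):
    """Lowercase, drop function words, then map each remaining token to its lemma."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    content = [t for t in tokens if t not in FUNCTION_WORDS]
    return [LEMMAS.get(t, t) for t in content]


print(preprocess("The dogs went to the park."))   # → ['dog', 'go', 'park']
print(preprocess("A dog is going to parks."))     # → ['dog', 'go', 'park']
```

Both inputs reduce to the same sequence, which illustrates how lemmatization raises the frequency of each word in a small corpus and how dropping unglossed function words brings the spoken side closer to the sign gloss side.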

A Corpus-driven Approach to Korean and English Newspaper Obituaries (빈도 분석을 활용한 한·영 사망기사 특징 비교)

  • Shin, Hyejung
    • The Journal of the Korea Contents Association / v.14 no.11 / pp.592-601 / 2014
  • This study examines newspaper obituaries in Korean and English media. Initially, 100 Korean obituaries spanning more than three years, from May 2011 to August 2014, were collected from the JoongAng Ilbo. Another 50 Korean obituaries published over the same period were then gathered from the DongA Ilbo. For English newspapers, obituaries from the New York Times and the Guardian were included in the corpus for comparison. First, the structure and composition of obituaries in each language (Korean and English) are compared; Korean obituaries show a pattern that combines a death notice and an obituary. Second, distinctive features of each newspaper are discussed. The JoongAng Ilbo's obituary section is titled "Life and Memories", and the DongA Ilbo's obituaries appear under the heading "Rest in Peace." Obituaries in the New York Times appear in print on different pages of the paper according to the deceased's field of interest. Following a discussion of the formal structure and characteristics of each newspaper, Korean and English obituaries are compared in terms of content and cultural context.

The Effect of Strong Syllables on Lexical Segmentation in English Continuous Speech by Korean Speakers (강음절이 한국어 화자의 영어 연속 음성의 어휘 분절에 미치는 영향)

  • Kim, Sunmi;Nam, Kichun
    • Phonetics and Speech Sciences / v.5 no.2 / pp.43-51 / 2013
  • English native listeners tend to treat strong syllables in a speech stream as potential initial syllables of new words, since most lexical words in English carry word-initial stress. The current study investigates whether Korean (L1)-English (L2) late bilinguals, like English native listeners, perceive strong syllables in English continuous speech as word onsets. In Experiment 1, word-spotting was slower when the word-initial syllable was strong, indicating that Korean listeners do not perceive strong syllables as word onsets. Experiment 2 was conducted to rule out the possibility that the results of Experiment 1 arose because the strong-initial targets themselves were slower to recognize than the weak-initial targets. We employed the gating paradigm in Experiment 2 and measured the Isolation Point (IP, the point at which participants correctly identify a word without subsequently changing their minds) and the Recognition Point (RP, the point at which participants correctly identify the target with 85% or greater confidence) for the targets excised from the non-words in the two conditions of Experiment 1. Both the mean IPs and the mean RPs were significantly earlier for the strong-initial targets, which means that the results of Experiment 1 reflect the difficulty of segmentation when the word-initial syllable is strong. These results are consistent with Kim & Nam (2011), indicating that Korean listeners do not perceive strong syllables as word onsets and that strong syllables interfere with lexical segmentation in English running speech.
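The two gating measures defined above can be computed from a participant's per-gate responses roughly as in this Python sketch. It assumes each gate has a duration in milliseconds, a word response and a confidence rating in percent, and it takes the RP as the first gate at or after the IP whose confidence is 85% or more; the exact scoring conventions used in the study may differ, and the data are hypothetical.

```python
def isolation_index(responses, target):
    """Index of the first gate from which every later response is the target, else None."""
    index = None
    # Walk backwards until the first non-target response is found.
    for i in range(len(responses) - 1, -1, -1):
        if responses[i] != target:
            break
        index = i
    return index


def ip_and_rp(responses, confidences, target, gate_ms, threshold=85):
    """Return (IP, RP) in milliseconds; either may be None if never reached."""
    ip = isolation_index(responses, target)
    if ip is None:
        return None, None
    # RP: first gate at or after the IP where confidence meets the threshold.
    rp = next((i for i in range(ip, len(responses)) if confidences[i] >= threshold), None)
    return gate_ms[ip], (gate_ms[rp] if rp is not None else None)


# Hypothetical data for one participant and one target word.
gates = [100, 150, 200, 250, 300]
answers = ["can", "candy", "candle", "candle", "candle"]
confidence = [20, 40, 60, 90, 95]
print(ip_and_rp(answers, confidence, "candle", gates))  # → (200, 250)
```

Mean IPs and RPs per condition would then be averages of these values across participants and targets.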

A Korean Emotion Features Extraction Method and Their Availability Evaluation for Sentiment Classification (감정 분류를 위한 한국어 감정 자질 추출 기법과 감정 자질의 유용성 평가)

  • Hwang, Jae-Won;Ko, Young-Joong
    • Korean Journal of Cognitive Science / v.19 no.4 / pp.499-517 / 2008
  • In this paper, we propose an effective emotion feature extraction method for Korean and evaluate the usefulness of the extracted features in sentiment classification. Korean emotion features are expanded from several representative emotion words, and they play an important role in building an effective sentiment classification system. First, synonym information from an English thesaurus is used to extract effective emotion features, and the extracted English emotion features are then translated into Korean. To evaluate the extracted Korean emotion features, we represent each document using the extracted features and classify it with an SVM (Support Vector Machine). In the experiments, the sentiment classification system using the extracted Korean emotion features outperformed, by 14.1%, a system using content-word features, which are commonly used in text classification systems.
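The expansion-then-translation pipeline described above can be sketched in Python as below. The thesaurus, the English-Korean lexicon and the seed words are hypothetical toy dictionaries, the expansion `depth` is an assumed parameter, and Korean morphological analysis and the final SVM training step are omitted; the sketch only shows how seed words grow into a feature set and how a document becomes a binary feature vector.

```python
# Hypothetical toy resources standing in for a real thesaurus and bilingual lexicon.
THESAURUS = {"happy": ["glad", "joyful"], "glad": ["pleased"], "sad": ["unhappy"]}
EN_KO = {"happy": ["행복한"], "glad": ["기쁜"], "sad": ["슬픈"]}


def expand_features(seeds, thesaurus, depth=1):
    """Grow a set of English emotion words by following synonym links `depth` times."""
    features = set(seeds)
    frontier = set(seeds)
    for _ in range(depth):
        new_words = set()
        for word in frontier:
            new_words.update(thesaurus.get(word, ()))
        new_words -= features  # only keep words not seen before
        features |= new_words
        frontier = new_words
    return features


def translate(features, en_ko):
    """Map English features to Korean; words without a translation are dropped."""
    return {ko for word in features for ko in en_ko.get(word, ())}


def vectorize(doc_tokens, feature_list):
    """Binary bag-of-features vector for one (already tokenized) document."""
    present = set(doc_tokens)
    return [1 if f in present else 0 for f in feature_list]


korean_features = sorted(translate(expand_features({"happy", "sad"}, THESAURUS), EN_KO))
print(korean_features)                                    # → ['기쁜', '슬픈', '행복한']
print(vectorize(["오늘", "기쁜", "날"], korean_features))  # → [1, 0, 0]
```

Vectors of this kind would then be the input to the SVM classifier.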

A User Sentiment Classification Using Instagram image and text Analysis (인스타그램 이미지와 텍스트 분석을 통한 사용자 감정 분류)

  • Hong, Taekeun;Kim, Jeongin;Shin, Juhyun
    • Smart Media Journal / v.5 no.1 / pp.61-68 / 2016
  • With the recent growth in SNS users and the spread of smart devices such as smartphones and tablet PCs, techniques for classifying user emotions from social network information are being actively researched. User emotion classification means identifying a user's emotion from the text and images posted on his or her SNS. This paper suggests a method for classifying user emotions by extracting a representative adjective from text and representative figures from images, using the Canny algorithm on images and a trigonometric function to obtain values for the figures. The representative adjective is selected from the most frequent adjectives in the text, and its positive-negative value is measured with SentiWordNet. The figures extracted from images are classified as representative shapes (triangle, quadrangle and circle), and user emotions are classified by measuring pleasure-displeasure values according to figure type and inclination. Finally, these values are redefined on an x-y graph representing pleasure-displeasure and positive-negative values, mapped onto Plutchik's wheel of emotions. We also anticipate applying this method to user-customized services by classifying user emotions on Plutchik's wheel of emotions, redefined in terms of the representative adjectives and figures.

IMPORTANCE OF PHONETICS IN LINGUISTIC STUDIES (언어학에 있어서 음성학의 중요성)

  • Robins, R. H.
    • MALSORI / no.3 / pp.34-39 / 1981
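  • Question from Professor Yu Man-geun: I would like to ask about phonetics. I would like to know how large a place phonetics occupies in the linguistics curriculum of British universities. Answer from Professor Robins: Yes, to that question I can, in a word, give an affirmative answer. Phonetics occupies a very important position in linguistics, and not only at the University of London but at any university in Britain, phonetics is introduced at the very start of linguistics. The question you have just asked is a very important one, so let me explain in a little more detail. Like my colleagues in London, I regard phonetics not merely as a part of linguistics but as a truly essential field of it. Within the University of London there are no fewer than two departments named "Department of Phonetics and Linguistics", and this name does not mean the same thing as a name like "Department of French and Italian". There are historical reasons behind the name. Phonetics was the earliest branch of linguistics to develop, and in Britain a chair of phonetics was created 30 years before a chair of linguistics. It was Daniel Jones who first held a chair of phonetics, serving in it for 30 years. For this reason the department's name still survives unchanged. Professor Henderson, who visited Korea ten years ago, is also a professor of phonetics, but that does not mean she has no interest in linguistics. Nowadays no one can specialize in every field of linguistics, so linguistics naturally has specialists in particular fields such as historical linguistics, applied linguistics and phonetics. However, just as one cannot do linguistics without doing syntax, one cannot do linguistics without doing phonetics. Of course, within phonetics itself, specializations can be further divided according to whether one works on general phonetics or on the phonetics of an individual language, but the answer to the question "Is phonetics part of the linguistics curriculum at British universities?" should be self-evident. Going a step further, if we briefly consider the history of human language, we will see that phonetics is indispensable to linguistic research. After all, linguistics is the study of the history and workings of human language, and human language has come down over thousands of years, yet until writing appeared it existed purely as spoken language, that is, as speech. Whether Korean, English or Latin, every language consists of sounds that can be produced by the organs of speech and heard by the ear, and even when we deal with a dead language, we are ultimately studying fossilized speech. In short, language is speech. People learn to speak before they learn to write, and even highly educated people like us speak and listen far more than we write and read. Since language thus operates as speech, and phonetics is precisely the study of speech, phonetics is fundamental and essential to linguistics. If it could not be said that phonetics forms an important foundation of British linguistics, British linguistics would be that much the poorer. Korea has a phonetics society as well as the Hangeul Society, but this does not mean that Korean phonetics is something separate from the Korean language, or that one can study Korean without studying Korean phonetics. Just as medicine has specialists in each field, linguistics has become a complex and wide-ranging discipline, so specialists have simply emerged in each field. Therefore, a linguist who said, "I am interested in syntax, so I have no interest in speech," would be seriously mistaken. Likewise, a phonetician who said, "I am interested only in sounds, so I have no interest in syntax," would be equally wrong. What use are differences in speech sounds that bear no relation to the construction of sentences and to lexical elements, and how could one study syntax without studying the speech sounds that express and convey syntactic structure? To sum up, language is essentially speech, and its characteristics, use and acquisition all appear in spoken form. Accordingly, British universities recognize the great importance of phonetics and, like many other universities around the world that treat linguistics properly, organize their curricula on that basis.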

Pivot Discrimination Approach for Paraphrase Extraction from Bilingual Corpus (이중 언어 기반 패러프레이즈 추출을 위한 피봇 차별화 방법)

  • Park, Esther;Lee, Hyoung-Gyu;Kim, Min-Jeong;Rim, Hae-Chang
    • Korean Journal of Cognitive Science / v.22 no.1 / pp.57-78 / 2011
  • Paraphrasing is the act of rewriting a text in other words without altering its meaning. Paraphrases can be used in many fields of natural language processing; in particular, they can be incorporated into machine translation to improve translation coverage and quality. Recent approaches to paraphrase extraction use bilingual parallel corpora consisting of aligned sentence pairs. In these approaches, paraphrases are identified from the word alignment result through pivot phrases: phrases in one language to which two or more phrases in the other language are aligned. However, word alignment is itself a very difficult task and produces many errors, and these alignment errors can lead to the selection of incorrect pivot phrases. In this study, we propose a paraphrase extraction method that discriminates good pivot phrases from bad ones. Each pivot phrase is weighted according to its reliability, which is scored using lexical and part-of-speech information. The experimental results show that the proposed method achieves higher precision and recall in paraphrase extraction than the baseline. We also show that the extracted paraphrases can increase the coverage of Korean-English machine translation.
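Pivot-based extraction of the kind described above is commonly formulated, following Bannard and Callison-Burch (2005), as p(e2|e1) = Σ_f p(f|e1) · p(e2|f), summing over foreign pivot phrases f. The Python sketch below scales each pivot's contribution by a reliability weight. The weights here are hand-set placeholders, since the paper's scoring of reliability from lexical and part-of-speech information is not reproduced, and the phrase tables are toy values.

```python
from collections import defaultdict


def paraphrase_scores(e1, p_f_given_e, p_e_given_f, pivot_weight):
    """Score paraphrases of e1 via weighted pivot phrases.

    score(e2) = sum over pivots f of w(f) * p(f|e1) * p(e2|f);
    pivots missing from pivot_weight get weight 1.0 (the unweighted baseline).
    """
    scores = defaultdict(float)
    for f, p_f in p_f_given_e.get(e1, {}).items():
        w = pivot_weight.get(f, 1.0)
        for e2, p_e2 in p_e_given_f.get(f, {}).items():
            if e2 != e1:  # a phrase is not its own paraphrase
                scores[e2] += w * p_f * p_e2
    return dict(scores)


# Toy phrase tables: "차" is an ambiguous pivot (car / tea), so it gets a low weight.
P_F_GIVEN_E = {"car": {"자동차": 0.7, "차": 0.3}}
P_E_GIVEN_F = {"자동차": {"car": 0.6, "automobile": 0.4}, "차": {"car": 0.5, "tea": 0.5}}
WEIGHTS = {"자동차": 1.0, "차": 0.2}

print(paraphrase_scores("car", P_F_GIVEN_E, P_E_GIVEN_F, WEIGHTS))
print(paraphrase_scores("car", P_F_GIVEN_E, P_E_GIVEN_F, {}))
```

With the low weight on the ambiguous pivot, the spurious paraphrase "tea" drops from 0.15 (unweighted) to 0.03, while "automobile" keeps its score of 0.28.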
