• Title/Summary/Keyword: phrase-based


Phrase-Pattern-based Korean-to-English Machine Translation System using Two Level Word Selection (두단계 대역어선택 방식을 이용한 구단위 패턴기반 한영 기계번역 시스템)

  • Kim, Jung-Jae;Park, Jun-Sik;Choi, Key-Sun
    • Annual Conference on Human and Language Technology / 1999.10e / pp.209-214 / 1999
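  • Pattern-based machine translation performs syntactic analysis and transfer using pairs of source-language patterns and their corresponding target-language patterns. Because it extends patterns up to the sentence level to resolve the ambiguities that arise in translation, it suffers from a rapid increase in the number of patterns. This paper proposes a thesaurus-based two-level word selection method to resolve the ambiguity that arises when the pattern unit is restricted to the phrase level, and thereby presents a model that shortens patterns while effectively reducing ambiguity. When a source-language pattern has several possible target-language patterns, the first level selects the few target patterns that satisfy constraints within the source language, and the second level selects the single target pattern that best fits constraints within the target language. The paper also discusses whether the number of patterns in such a model can converge as the corpus grows.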
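The two-level selection described in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions: the thesaurus entries, candidate patterns, co-occurrence counts, and all names (`THESAURUS`, `CANDIDATES`, `TARGET_COOCCURRENCE`, `select`) are invented here, and the abstract does not say how the paper scores target-language constraints.

```python
from typing import Dict, List, Set, Tuple

# All data below is invented for illustration; none of it comes from the paper.
# Hypothetical thesaurus: Korean noun -> semantic classes.
THESAURUS: Dict[str, Set[str]] = {"밥": {"food"}, "약": {"medicine"}}

# Hypothetical noun translations.
NOUN_TRANSLATION: Dict[str, str] = {"밥": "rice", "약": "medicine"}

# Hypothetical target verbs for the source phrase pattern "N을 먹다",
# each with the semantic classes it accepts for the noun slot.
CANDIDATES: List[Tuple[str, Set[str]]] = [
    ("eat", {"food"}),
    ("have", {"food"}),
    ("take", {"medicine"}),
]

# Hypothetical target-language co-occurrence counts (verb, object noun).
TARGET_COOCCURRENCE: Dict[Tuple[str, str], int] = {
    ("eat", "rice"): 8,
    ("have", "rice"): 3,
    ("take", "medicine"): 9,
}


def stage_one(noun: str) -> List[str]:
    """Level 1: keep target verbs whose source-side constraint matches the noun."""
    classes = THESAURUS.get(noun, set())
    return [verb for verb, allowed in CANDIDATES if allowed & classes]


def stage_two(noun: str, verbs: List[str]) -> str:
    """Level 2: pick the survivor that fits best within the target language."""
    target_noun = NOUN_TRANSLATION[noun]
    return max(verbs, key=lambda v: TARGET_COOCCURRENCE.get((v, target_noun), 0))


def select(noun: str) -> str:
    survivors = stage_one(noun)
    if not survivors:
        raise ValueError(f"no candidate pattern satisfies the constraints for {noun!r}")
    return stage_two(noun, survivors)
```

In this toy setup, `select("밥")` narrows the candidates to `eat` and `have` at the first level and then chooses between them using the target-side counts.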


A Neural Network Based Korean Segmental Duration Modeling Using Tonal Information of Phonemes (음소별 성조 정보를 이용한 신경망 기반의 한국어 음소 지속시간 모델링)

  • 김은경;이상호;오영환
    • The Journal of the Acoustical Society of Korea / v.18 no.6 / pp.84-88 / 1999
  • Accurate estimation of segmental duration is crucial for natural-sounding text-to-speech synthesis. To predict Korean segmental durations, conventional methods have used phonemic context, part-of-speech context, and positional information within the prosodic phrase. In this paper, the tonal information of phonemes is employed for more accurate prediction. After defining two non-boundary tones and six boundary tones, we annotated each syllable of 400 sentences with a tonal label. To predict segmental duration using tonal information, we constructed neural networks with a real-valued output node that predicts phonemic duration and trained them with the backpropagation algorithm. Experimental results showed that the proposed features are effective for predicting Korean segmental durations, yielding a correlation coefficient of 0.863 between observed and predicted durations.
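The evaluation figure reported in this abstract is a correlation coefficient between observed and predicted durations. Assuming this means the usual Pearson coefficient (the abstract does not name it), a minimal sketch of the computation looks like this; the function name `pearson` is chosen here for illustration.

```python
import math
from typing import Sequence


def pearson(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Pearson correlation between two equally long sequences of durations."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_o = sum(observed) / len(observed)
    mean_p = sum(predicted) / len(predicted)
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_o == 0 or var_p == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_o * var_p)
```

A value near 1 means predicted durations rise and fall with the observed ones, regardless of any constant offset or scale difference between them.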


An Order Promising Procedure for Simulation-Based Scheduling Systems (시뮬레이션에 기초한 일정계획 시스템에서의 납기산정 절차)

  • 박문원;최성훈;이근철;김영대
    • Proceedings of the Korean Operations and Management Science Society Conference / 2002.05a / pp.103-108 / 2002
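  • This study deals with production systems, such as application-specific semiconductor (ASIC) fabs and printed circuit board plants, that make a wide variety of products to order and have very long and complex manufacturing processes. Owing to these characteristics, simulation may be the only viable option for the scheduling module of an APS (Advanced Planning and Scheduling) system. Because simulation can describe most complex situations, it can generate realistic and feasible schedules, but its run time is considerably long. To stay competitive, a company must quickly and accurately tell the customer the feasible due date (the feasible production completion time) for a requested order. Accordingly, APS also takes "commit now, deliver on time" as its catch phrase. However, while simulation may be able to compute a due date that allows on-time delivery, it cannot commit immediately. This study therefore presents a methodology for fast and accurate order promising in an APS system whose scheduling module is based on simulation. The methodology differs from the ATP (available to promise) and CTP (capable to promise) techniques used in existing MRP II and ERP systems, and it computes the production start time and manufacturing lead time of a requested order rationally and quickly.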


A Study on the Knowledge Structure of Cancer Survivors based on Social Network Analysis (네트워크 분석을 통한 암 생존자 지식구조 연구)

  • Kwon, Sun Young;Bae, Ka Ryeong
    • Journal of Korean Academy of Nursing / v.46 no.1 / pp.50-58 / 2016
  • Purpose: The purpose of this study was to identify the knowledge structure of cancer survivors. Methods: For data, 1099 articles were collected, and 365 keywords in the form of noun phrases were extracted from the articles and standardized for analysis. A co-occurrence matrix was generated via a cosine similarity measure, and network analysis and visualization using PFNet and NodeXL were then applied to visualize intellectual interchanges among keywords. Results: According to the content analysis and the cluster analysis of author keywords from cancer survivor articles, keywords such as 'quality of life', 'breast neoplasms', 'cancer survivors', 'neoplasms', and 'exercise' had high degree centrality. The 9 most important research topics concerning cancer survivors were 'cancer-related symptoms and nursing', 'cancer treatment-related issues', 'late effects', 'psychosocial issues', 'healthy living managements', 'social supports', 'palliative cares', 'research methodology', and 'research participants'. Conclusion: Through this study, the knowledge structure of cancer survivors was identified. The 9 topics identified in this study can provide useful research direction for the development of nursing in cancer survivor research areas. The network analysis used in this study will be useful for identifying the knowledge structure, general views, and current trends in cancer survivor research.
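As a rough illustration of the Methods step in this abstract, the sketch below builds keyword occurrence vectors over articles, links keywords by cosine similarity, and counts each keyword's neighbours. The sample articles, the threshold, and all function names are assumptions made here; the PFNet pruning and NodeXL visualization used in the study are not reproduced.

```python
import math
from itertools import combinations
from typing import Dict, List, Set, Tuple

# Invented sample data: the author keywords of four hypothetical articles.
ARTICLES: List[Set[str]] = [
    {"quality of life", "exercise"},
    {"quality of life", "breast neoplasms"},
    {"quality of life", "exercise", "breast neoplasms"},
    {"late effects"},
]


def occurrence_vectors(articles: List[Set[str]]) -> Dict[str, List[int]]:
    """One 0/1 vector per keyword, with one component per article."""
    keywords = sorted(set().union(*articles))
    return {kw: [1 if kw in art else 0 for art in articles] for kw in keywords}


def cosine(u: List[int], v: List[int]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similarity_edges(articles: List[Set[str]], threshold: float = 0.0) -> Dict[Tuple[str, str], float]:
    """Keep keyword pairs whose cosine similarity exceeds the threshold."""
    vectors = occurrence_vectors(articles)
    edges = {}
    for a, b in combinations(vectors, 2):
        score = cosine(vectors[a], vectors[b])
        if score > threshold:
            edges[(a, b)] = score
    return edges


def degree(edges: Dict[Tuple[str, str], float]) -> Dict[str, int]:
    """Number of neighbours per keyword (unnormalized degree)."""
    counts: Dict[str, int] = {}
    for a, b in edges:
        counts[a] = counts.get(a, 0) + 1
        counts[b] = counts.get(b, 0) + 1
    return counts
```

Keywords that never co-occur with any other keyword, such as 'late effects' in the sample data, get no edges and therefore do not appear in the degree table.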

Improve Performance of Phrase-based Statistical Machine Translation through Standardizing Korean Allomorph (한국어의 이형태 표준화를 통한 구 기반 통계적 기계 번역 성능 향상)

  • Lee, Won-Kee;Kim, Young-Gil;Lee, Eui-Hyun;Kwon, Hong-Seok;Jo, Seung-U;Cho, Hyung-Mi;Lee, Jong-Hyeok
    • Annual Conference on Human and Language Technology / 2016.10a / pp.285-290 / 2016
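  • Korean is described here as morphologically an inflectional language, in which the form of a word carries its grammatical function in the sentence; because it is morphologically rich, function words such as particles and endings combine with content words in many different ways. These characteristics aggravate the data sparseness problem in phrase-based statistical machine translation systems for Korean. However, some Korean particles and endings exist as allomorphs: two forms with the same meaning, chosen according to the content word they attach to. This paper therefore standardizes such allomorphs into a single form to alleviate the data sparseness problem, and shows that performance improves in Vietnamese-Korean statistical machine translation.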
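The standardization idea in this abstract can be illustrated with a few well-known Korean particle pairs (은/는, 이/가, 을/를, 과/와, 으로/로). The sketch assumes the input is already split into (morpheme, tag) pairs and uses a hypothetical tag `"J"` for particles; the paper's actual allomorph list, tag set, and choice of canonical form are not given in the abstract.

```python
from typing import Dict, List, Tuple

# Map one allomorph of each pair to an arbitrarily chosen canonical form.
# The choice of which form is canonical is an assumption of this sketch.
CANONICAL: Dict[str, str] = {
    "은": "는",    # topic particle
    "이": "가",    # subject particle
    "을": "를",    # object particle
    "과": "와",    # comitative particle
    "으로": "로",  # instrumental/directional particle
}

Morpheme = Tuple[str, str]  # (surface form, tag); "J" is a hypothetical particle tag


def standardize(morphemes: List[Morpheme]) -> List[Morpheme]:
    """Replace particle allomorphs with one canonical form; leave other tokens alone."""
    return [
        (CANONICAL.get(form, form) if tag == "J" else form, tag)
        for form, tag in morphemes
    ]
```

Restricting the rewrite to tokens tagged as particles keeps homographs intact: a noun such as 이 ("tooth") is left unchanged, while the subject particle 이 becomes 가.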


Extension and Management of Verb Phrase Patterns based on Lexicon Reconstruction and Target Word Information (사전 재구성과 대역어 정보를 통한 동사구 패턴의 확장 및 관리)

  • Hong, Mun-Pyo;Kim, Young-Kil;Ryu, Chul;Choi, Sung-Kwon;Park, Sang-Kyu
    • Annual Conference on Human and Language Technology / 2002.10e / pp.103-107 / 2002
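  • The success of data-driven machine translation can be said to depend on methods for building large amounts of data in a short time and on methods for managing the data effectively once it is built. In example-based and pattern-based machine translation, the representative data-driven approaches, research has focused on building data with minimal or no learning, whereas the problem of data management has received little attention. Yet efficient data management is no less important than data expansion in developing a data-driven machine translation system. This paper shows that reconstructing the lexicon using causative/passive links and similar devices improves the consistency and manageability of the data and, on the theoretical side, reduces redundancy in the description of information. Based on this information, it also presents a methodology for generating new patterns from already-built verb phrase patterns using target word information.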


Processing Scrambled Wh-Constructions in Head-Final Languages: Dependency Resolution and Feature Checking

  • Hahn, Hye-ryeong;Hong, Seungjin
    • Language and Information / v.18 no.2 / pp.59-79 / 2014
  • This paper explores the processing mechanism of filler-gap dependency resolution and feature checking in Korean wh-constructions. Based on their findings on Japanese sentence processing, Aoshima et al. (2004) have argued that the parser posits a gap in the embedded clause in head-final languages, unlike in head-initial languages, where the parser posits a gap in the matrix clause. In order to verify their findings in the Korean context, and to further explore the mechanisms involved in processing Korean wh-constructions, the present study replicated the study done by Aoshima et al., with some modifications of problematic areas in their original design. Sixty-four Korean native speakers were presented with Korean sentences containing a wh-phrase in four conditions, with word order and complementizer type as the two main factors. The participants read sentences segment-by-segment, and the reading times at each segment were measured. The reading time analysis showed no slowdown at the embedded verb in the scrambled conditions of the kind observed in Aoshima et al. Instead, there was a clear indication of the wh-feature checking process in the form of a major slowdown at the relevant region.


Examining Line-breaks in Korean Language Textbooks: the Promotion of Word Spacing and Reading Skills (한국어 교재의 행 바꾸기 -띄어쓰기와 읽기 능력의 계발 -)

  • Cho, In Jung;Kim, Danbee
    • Journal of Korean language education / v.23 no.1 / pp.77-100 / 2012
  • This study investigates issues in relation to text segmenting, in particular, line breaks in Korean language textbooks. Research on L1 and L2 reading has shown that readers process texts by chunking (grouping words into phrases or meaningful syntactic units) and, therefore, phrase-cued texts are helpful for readers whose syntactic knowledge has not yet been fully developed. In other words, it would be important for language textbooks to avoid awkward syntactic divisions at the end of a line, in particular, those textbooks for beginners and intermediate level learners. According to our analysis of a number of major Korean language textbooks for beginner-level learners, however, many textbooks were found to display line-breaks of awkward syntactic division. Moreover, some textbooks displayed frequent instances where a single word (or eojeol in the case of Korean) is split between different lines. This can hamper not only learners' learning of the rules of spaces between eojeols in Korean, but also learners' development in automatic word recognition, which is an essential part of reading processes. Based on the findings of our textbook analysis and of existing research on reading, this study suggests ways to overcome awkward line-breaks in Korean language textbooks.

Myanmar Articulation, Resonation, Nasal Emission, and Nasal Turbulence Test: A Preliminary Study

  • Kalyanee Makarabhirom;Benjamas Prathanee;Ampika Rattanapitak
    • Archives of Plastic Surgery / v.50 no.5 / pp.468-477 / 2023
  • Background: This article describes the development of the Myanmar Articulation, Resonation, Nasal Emission, and Nasal Turbulence test for children with cleft lip and palate (CLP), and the evaluation of its validity and reliability. Methods: The test was created by three Thai researchers and a Burmese research assistant based on Burmese phonology. The content validity was evaluated by six Burmese language experts. All test items were divided into three groups: high-pressure oral consonants, low-pressure oral consonants, and nasal consonants. Results: All items (58 words and 32 phrases/sentences) showed an excellent level of expert agreement (item-level content validity indexes = 1.00). The target items were illustrated as color pictures. Each picture was clearly drawn and easy to identify. As a pilot study of face validity, all pictures were administered to 10 typically developing children. The actual testing was carried out with 10 children with CLP, and the developed test was analyzed in consultation with the Burmese teachers and interpreters from a speech camp. Test scores for the total and for the three groups of target items showed acceptable internal consistency reliability (ranging from 0.4 to 0.88). Conclusion: The constructed test is valid in terms of its content.
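For readers unfamiliar with the item-level content validity index mentioned in this abstract, the sketch below uses the common definition: the share of experts who rate an item as relevant. The 4-point scale and the cut-off of 3 are assumptions taken from that common convention; the abstract does not state which rating scale the authors used.

```python
from typing import Sequence


def item_cvi(ratings: Sequence[int], relevant_min: int = 3) -> float:
    """Share of experts whose rating is at or above the relevance cut-off.

    Assumes a 4-point relevance scale where 3 and 4 count as "relevant".
    """
    if not ratings:
        raise ValueError("at least one expert rating is required")
    agreeing = sum(1 for rating in ratings if rating >= relevant_min)
    return agreeing / len(ratings)
```

With six experts, an index of 1.00 means that all six rated the item as relevant; a single dissenting rating would lower it to 5/6.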

Feature Generation of Dictionary for Named-Entity Recognition based on Machine Learning (기계학습 기반 개체명 인식을 위한 사전 자질 생성)

  • Kim, Jae-Hoon;Kim, Hyung-Chul;Choi, Yun-Soo
    • Journal of Information Management / v.41 no.2 / pp.31-46 / 2010
  • Named-entity recognition (NER), as a part of information extraction, is now used in information retrieval as well as in question-answering systems. Unlike ordinary words, named entities (NEs) are steadily created and changed in documents on the Web, in newspapers, and so on. The creation of new NEs causes an unknown word problem and creates difficulties for many application systems that rely on NER. To alleviate this problem, this paper proposes a new feature generation method for machine learning-based NER. In general, features in machine learning-based NER are related to words, whereas entries in named-entity dictionaries are phrases, so the entries cannot be used directly as features of NER systems. This paper proposes an encoding scheme, used as a feature generation method, that converts phrase entities into word-unit features. Furthermore, with this scheme, entities with semantic information in WordNet can be converted into features of NER systems. Our experiments show that performance increases by about 6% in F1 score and that errors are reduced by about 38%.
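One way to turn phrase-level dictionary entries into word-unit features, as this abstract describes, is a begin/inside (B/I) encoding with longest-match lookup. The abstract does not specify the paper's encoding scheme, so the sketch below is a stand-in using that common convention; the function name and the sample dictionary are invented for illustration.

```python
from typing import Dict, List, Tuple


def dictionary_features(tokens: List[str], dictionary: Dict[Tuple[str, ...], str]) -> List[str]:
    """Give each token a B-/I-/O feature from longest-match dictionary lookup.

    `dictionary` maps a phrase (tuple of tokens) to an entity type.
    """
    features = ["O"] * len(tokens)
    max_len = max((len(phrase) for phrase in dictionary), default=0)
    i = 0
    while i < len(tokens):
        matched = 0
        # Try the longest possible phrase first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = tuple(tokens[i:i + n])
            if phrase in dictionary:
                label = dictionary[phrase]
                features[i] = "B-" + label
                for j in range(i + 1, i + n):
                    features[j] = "I-" + label
                matched = n
                break
        # Skip past the matched phrase, or advance by one token if nothing matched.
        i += matched or 1
    return features
```

Each token thus carries its own feature value, which a word-based learner can consume directly even though the dictionary entry spans several tokens.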