• Title/Abstract/Keywords: unknown word

70 search results (processing time: 0.027 s)

Hybrid POS Tagging with Generalized Unknown Word Handling and Post Error-Correction Rules

  • 차정원;이원일;이근배;이종혁
    • 한국정보과학회 언어공학연구회:학술대회논문집(한글 및 한국어 정보처리) / 한국정보과학회언어공학연구회 1997년도 제9회 한글 및 한국어 정보처리 학술대회 / pp. 88-93 / 1997
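  • In this paper, we experimentally compare several statistical models for part-of-speech tagging and build a statistical model based on the results. We develop a general method for handling unknown words that is independent of their position and number, using a morpheme pattern dictionary, and combine it with error-correction rules that compensate for the weaknesses of the statistical model to build a hybrid part-of-speech tagging system, $POSTAG^{i}$. The morpheme pattern dictionary used to estimate unknown words is constructed from Korean syllable information and information on irregular conjugation of predicates, and a multi-word dictionary is used to handle collocations that span several eojeols effectively, improving overall tagging accuracy. The error-correction rules are acquired automatically through the learning method proposed by Brill. Several heuristics are applied during automatic rule extraction so that better and more general rules can be extracted. Trained on a POS-tagged corpus of 100,000 morphemes and tested on about 25,000 morphemes not used in training, the system achieved an accuracy of 97.28%.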
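
How post-hoc error-correction rules of the kind Brill proposed revise the output of a statistical tagger can be sketched as follows. This is only an illustrative sketch, not the POSTAG system: the rule template ("change tag A to B when the previous tag is C"), the tag names, and the function `apply_rules` are assumptions made for the example, and the rules are written by hand here rather than learned.

```python
from typing import List, NamedTuple


class Rule(NamedTuple):
    """Hypothetical rule template: rewrite from_tag to to_tag after prev_tag."""
    from_tag: str
    to_tag: str
    prev_tag: str


def apply_rules(tags: List[str], rules: List[Rule]) -> List[str]:
    """Apply each rule in order, scanning the tag sequence left to right."""
    corrected = list(tags)  # do not mutate the caller's list
    for rule in rules:
        for i in range(1, len(corrected)):
            if corrected[i] == rule.from_tag and corrected[i - 1] == rule.prev_tag:
                corrected[i] = rule.to_tag
    return corrected


# Toy tagger output and one hand-written correction rule (both made up).
initial_tags = ["DET", "VERB", "NOUN"]
rules = [Rule(from_tag="VERB", to_tag="NOUN", prev_tag="DET")]
print(apply_rules(initial_tags, rules))
```

In a learned setting, the ordered rule list would come from training on a tagged corpus; only the application step is shown here.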

An Experiment of Reading Aloud Meeting in English

  • Arimitsu, Yutaka;Yagi, Hidetsugu;Lee, Jae-Hoon;Wu, Zhiqiang
    • 공학교육연구 / Vol. 15, No. 4 / pp. 26-30 / 2012
  • Nowadays, fewer Japanese university students than Korean or Taiwanese students are earning PhD degrees in the U.S.A. The language barrier is considered one of the major reasons for this, and this lack of international education is in turn cited as one reason why Japan has been falling behind in industrial globalization. Reading aloud is good practice for learning a foreign language, since many areas of the brain are activated simultaneously. Furthermore, reading aloud in front of others in a meeting gives students a chance to overcome the psychological barrier. The authors started a voluntary meeting (unrelated to the official classes of the curriculum) in which English articles are read aloud. Topics for the meeting are selected from articles on web sites, so that (1) textbooks are not needed, (2) voice data can be listened to, and (3) the meaning and pronunciation of an unknown word can be checked with web tools. Once the methodology has been mastered, volunteer students can manage the meeting. The authors report on their experiments conducted at the Department of Mechanical Engineering, Ehime University.

A Reverse Segmentation Algorithm of Compound Nouns Using Affix Information and Preference Pattern

  • 류방;백현철;김상복
    • 한국멀티미디어학회논문지 / Vol. 7, No. 3 / pp. 418-426 / 2004
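  • This paper proposes a reverse segmentation algorithm for Korean compound nouns that uses mutual information between syllables. Most Korean compound nouns are derived from Sino-Korean words, and preferred syllables exist between adjacent syllables, so this information is used together with affix information as segmentation rules for compound nouns. To evaluate performance, the proposed algorithm was applied to 36,061 compound nouns and achieved a segmentation accuracy of 99.3%. It also outperformed existing algorithms in the experimental comparison, and in particular produced correct segmentations for most 4-syllable and 5-syllable compound nouns.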

A Literature Study of Kampo Drug Treatment for Children in Japan

  • 지현우;송창은;성현경
    • 대한한방소아과학회지 / Vol. 29, No. 3 / pp. 32-53 / 2015
  • Objectives: This research aimed to analyze studies on pediatric diseases treated with kampo drugs, the kinds of kampo drugs used in children, the treatment period, and the results of kampo drug treatment for children in Japan. Methods: We retrieved 263 search results from J-stage using the search terms 'kampo medicine' and '小兒', 'children', '乳兒'. Among them, we selected 34 articles related to the objective of the research and analyzed them by type of pediatric disease treated with kampo drugs, kinds of kampo drugs used for each disease, treatment period, and result of kampo drug treatment for children. We considered the frequency of kampo drug use, the pediatric diseases treated with kampo drugs, and the significance of the research. Results: According to the analysis, respiratory diseases are the diseases most frequently treated with kampo drugs, followed by skin diseases. Among the kinds of kampo drugs used for pediatric disease, Goreisan, Shosaikoto (柴胡桂枝湯), and Shokenchuto (小建中湯) are used frequently. Various diseases treated with kampo drugs also improved. Conclusions: Japanese doctors consider kampo drugs safe and as having many merits compared with modern medication, especially for symptoms of unknown origin and for immune diseases such as upper respiratory tract infections. Referring to clinical cases of kampo drug use in Japan, we will use kampo drugs for various pediatric diseases in the future.

A Study on Korean 4-connected Digit Recognition Using Demi-syllable Context-dependent Models

  • 이기영;최성호;이호영;배명진
    • 한국음향학회지 / Vol. 22, No. 3 / pp. 175-181 / 2003
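  • Because Korean digits are monosyllabic and are affected by liaison between connected digits, demi-syllable-based models have been proposed for recognizing Korean connected digits. Methods using the previously proposed demi-syllable or demi-syllable + demi-syllable recognition models, however, have not yet shown good recognition performance. This paper proposes a recognition method for Korean 4-connected digits that uses extended context-dependent demi-syllable models. In the experiments, the connected digits were taken from the SiTEC 4-connected digit database, and the C-HMM of HTK 3.0 was used for training and recognition. Compared with the recognition rates of existing methods, the proposed method showed a relatively good recognition performance of 92%.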

A Meaning of Death through Emotional Expression about Death after Nursing and Medical Students' End-of-Life Care Practice

  • 조계화
    • 성인간호학회지 / Vol. 22, No. 3 / pp. 329-341 / 2010
  • Purpose: The purpose of this study was to understand the meaning of death experienced by medical and nursing students through end-of-life care practice. Methods: Data were collected by in-depth interviews with twelve (six nursing and six medical) students. Conventional qualitative content analysis was used to analyze the data. Results: The findings were analyzed in three areas: 'feeling from the word of death', 'color association of death', and 'relation between life and death'. The analysis yielded three major themes and sixteen categories. The three major themes are 'reality of uncertain death', 'have to leave', and 'new perception about death'. The sixteen categories are 'being well', 'fear', 'unknown', 'boundless', 'being with', 'out of sight', 'new start', 'go back to', 'place going by itself', 'place to meet with', 'being transformed', 'a sense of futility', 'the same point', 'a different point', 'continuous line', and 'a crossroad'. Conclusion: The findings suggest a number of themes reported by nursing and medical students about end-of-life experiences that could be explored as a way of improving end-of-life care.

Speaker Adaptation Using i-Vector Based Clustering

  • Kim, Minsoo;Jang, Gil-Jin;Kim, Ji-Hwan;Lee, Minho
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 14, No. 7 / pp. 2785-2799 / 2020
  • We propose a novel speaker adaptation method using acoustic model clustering. The similarity of different speakers is defined by the cosine distance between their i-vectors (intermediate vectors), and various efficient clustering algorithms are applied to obtain a number of speaker subsets with different characteristics. The speaker-independent model is then retrained with the training data of the individual speaker subsets grouped by the clustering results, and an unknown utterance is recognized by the retrained model of the closest cluster. The proposed method is applied to a large-scale speech recognition system implemented with a hybrid hidden Markov model and deep neural network framework. An experiment was conducted to evaluate the word error rates using the Resource Management database. When the proposed speaker adaptation method using i-vector based clustering was applied, the performance relative to the conventional speaker-independent speech recognition model improved by as much as 12.2% for the conventional fully connected neural network and by as much as 10.5% for the bidirectional long short-term memory.
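
The routing step described in this abstract, choosing the closest speaker cluster by cosine distance between i-vectors, can be sketched roughly as below. This is a minimal sketch under stated assumptions: the three-dimensional vectors, the cluster names, and the helper `closest_cluster` are made up for illustration, and representing each cluster by a single centroid vector is an assumption rather than something the abstract specifies.

```python
import math
from typing import Dict, Sequence


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Return 1 - cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def closest_cluster(ivector: Sequence[float],
                    centroids: Dict[str, Sequence[float]]) -> str:
    """Pick the cluster whose (hypothetical) centroid is nearest in cosine distance."""
    return min(centroids, key=lambda name: cosine_distance(ivector, centroids[name]))


# Toy centroids; real i-vectors typically have hundreds of dimensions.
centroids = {
    "cluster_a": [1.0, 0.0, 0.2],
    "cluster_b": [0.0, 1.0, 0.1],
}
print(closest_cluster([0.9, 0.1, 0.3], centroids))
```

The utterance would then be decoded with the acoustic model retrained on the selected cluster's speakers.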

Risk analysis of offshore terminals in the Caspian Sea

  • Mokhtari, Kambiz;Amanee, Jamshid
    • Ocean Systems Engineering / Vol. 9, No. 3 / pp. 261-285 / 2019
  • In the offshore industry there are emerging hazards with vague properties, such as acts of terrorism, acts of war, and unforeseen natural disasters such as tsunamis. Industry professionals such as offshore energy insurers, safety engineers, and risk managers therefore need an appropriate method to determine failure rates and frequencies for potential hazards for which no data are available. Furthermore, in conventional risk-based analysis models, such as fault tree analysis, hazards with vague properties are normally waived and ignored. In other words, in previous situations only a traditional probability-based fault tree analysis could be implemented. To overcome this shortcoming, fuzzy set theory is applied to fault tree analysis to combine the known and unknown data, so that the pre-combined result is determined under a fuzzy environment. This is achieved by integrating a generic bow-tie based risk analysis model into the risk assessment phase of the Risk Management (RM) cycle as the backbone of that phase. For this reason, Fault Tree Analysis (FTA) and Event Tree Analysis (ETA) are used to analyse one of the significant risk factors associated with offshore terminals. This process will eventually help insurers and risk managers in the marine and offshore industries to investigate potential hazards in more detail where there is vagueness. For this purpose, a case study of an offshore terminal consistent with the nature of the Caspian Sea is examined.

Cerebellar Activation Related to Various Tasks Using fMRI

  • 황승배;곽효성;이상용;진공용;한영민;김영곤;정경호
    • Investigative Magnetic Resonance Imaging / Vol. 13, No. 1 / pp. 47-53 / 2009
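  • Purpose: To evaluate cerebellar activation using functional magnetic resonance imaging under motor, sensory, word generation, listening comprehension, and memory stimuli. Materials and Methods: Functional MR images of the whole brain were acquired in the axial plane from 11 healthy right-handed volunteers (male:female, 6:5; mean age, 27.4 years) using the BOLD technique on a 1.5T MR scanner, with a paradigm of five repeated cycles of stimulation and rest. Complex movement of the left fingers, sensory stimulation, word generation, listening comprehension, and memory tasks were used as activation stimuli. With a threshold of p = 0.001, the location and degree of activation within the cerebellum were evaluated using SPM 5. Results: Cerebellar activation was observed during the motor, word generation, and memory tasks. The motor task activated 949 regions with a mean response magnitude of 0.68, the word generation task activated 319 regions with a mean response magnitude of 0.15, and the memory task activated 330 regions with a mean response magnitude of 0.26. Conclusion: The cerebellum is involved in a variety of functional stimuli, including motor, word generation, and memory tasks, and is most strongly associated with motor function. Functional magnetic resonance imaging can be used as a method for studying cerebellar function.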

Part-of-speech Tagging for Hindi Corpus in Poor Resource Scenario

  • Modi, Deepa;Nain, Neeta;Nehra, Maninder
    • Journal of Multimedia Information System / Vol. 5, No. 3 / pp. 147-154 / 2018
  • Natural language processing (NLP) is an emerging research area that studies how machines can be used to perceive and alter text written in natural languages. Different tasks can be performed on natural languages by analyzing them through annotation tasks such as parsing, chunking, part-of-speech tagging, and lexical analysis. These annotation tasks depend on the morphological structure of the particular natural language. The focus of this work is part-of-speech tagging (POS tagging) for the Hindi language. Part-of-speech tagging, also known as grammatical tagging, is the process of assigning a grammatical category to each word of a given text. These grammatical categories can be noun, verb, time, date, number, etc. Hindi is the most widely used and official language of India. It is also among the top five most spoken languages of the world. For English and other languages a diverse range of POS taggers is available, but these taggers cannot be applied to Hindi, as Hindi is one of the most morphologically rich languages. Furthermore, there is a significant difference between the morphological structures of these languages. Thus, in this work, a POS tagger system is presented for the Hindi language. For Hindi POS tagging, a hybrid approach is presented that combines probability-based and rule-based approaches. For tagging known words a unigram probability model is used, whereas for tagging unknown words various lexical and contextual features are used. Various finite-state automata are constructed to represent the different rules, and regular expressions are then used to implement these rules. A tagset is also prepared for this task, which contains 29 standard part-of-speech tags. The tagset also includes two unique tags, i.e., a date tag and a time tag. These date and time tags support all possible formats. Regular expressions are used to implement all pattern-based tags such as time, date, number, and special symbols. The aim of the presented approach is to increase the correctness of automatic Hindi POS tagging while limiting the need for a large human-made corpus. This hybrid approach uses a probability-based model to increase automatic tagging and a rule-based model to limit the need for an already trained corpus. The approach is based on a very small labeled training set (around 9,000 words) and yields a best precision of 96.54% and an average precision of 95.08%. The approach also yields a best accuracy of 91.39% and an average accuracy of 88.15%.
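
The hybrid scheme summarized in this abstract, a unigram lookup for known words with regular expressions for pattern-based tags, might look roughly like the sketch below. It is not the authors' tagger: the tag names, the three regular expressions, the transliterated toy words, and the fallback tag `UNK` are all assumptions chosen for illustration, and the lexical and contextual features the paper uses for unknown words are not modeled.

```python
import re
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical pattern-based tags; the paper's actual patterns are not given.
PATTERN_TAGS = [
    (re.compile(r"^\d{1,2}:\d{2}(:\d{2})?$"), "TIME"),
    (re.compile(r"^\d{1,2}[/-]\d{1,2}[/-]\d{2,4}$"), "DATE"),
    (re.compile(r"^\d+(\.\d+)?$"), "NUM"),
]


def train_unigram(tagged_words: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Map each training word to its most frequent tag."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for word, tag in tagged_words:
        counts[word][tag] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}


def tag_tokens(tokens: List[str], model: Dict[str, str],
               default: str = "UNK") -> List[Tuple[str, str]]:
    """Use the unigram model for known words, regex patterns otherwise."""
    tagged = []
    for token in tokens:
        if token in model:
            tagged.append((token, model[token]))
            continue
        for pattern, pattern_tag in PATTERN_TAGS:
            if pattern.match(token):
                tagged.append((token, pattern_tag))
                break
        else:  # no pattern matched
            tagged.append((token, default))
    return tagged


# Toy training data with transliterated words (made up for the example).
training = [("ghar", "NOUN"), ("ghar", "NOUN"), ("ghar", "VERB"), ("jata", "VERB")]
model = train_unigram(training)
print(tag_tokens(["ghar", "jata", "10:30", "12/05/2018", "42", "xyz"], model))
```

Checking the unigram model before the patterns is a design choice made here so that a known word keeps its learned tag; the opposite order would be equally easy to write.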