• Title/Summary/Keyword: 단어 중의성 (word ambiguity)

Lexical Expansion of Sentence Parsers (구문분석기의 어휘확장)

  • Kim, Min-Chan;Kim, Gon;J. Bae, Jae-Hak
    • Proceedings of the Korea Information Processing Society Conference / 2005.05a / pp.755-758 / 2005
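  • This paper aims to raise the success rate of syntactic parsing by expanding the parser's lexicon. Parsing is the task of identifying the syntactic relations among the constituents of a sentence. One of the most frequent causes of parsing failure is the appearance of unregistered words. Solving this missing-vocabulary problem is key to raising the parsing success rate and to making text understanding systems more robust. To this end, we handle words that are absent from the lexicon of the parser LGPI+ by using another lexical resource, WordNet. Specifically, we (1) look up the unregistered word in WordNet, (2) identify its synonym information, and (3) add the word to the LGPI+ lexicon. Experiments confirmed that this approach resolves parsing failures and raises both accuracy and success rate.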
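
The three steps in the abstract above can be sketched roughly as follows. This is only an illustration: the lexicon entries and synonym sets are made-up toy data standing in for the LGPI+ lexicon and WordNet, and `expand_lexicon` is a hypothetical helper, not the authors' implementation.

```python
# Toy stand-ins: neither table reflects real LGPI+ or WordNet data.
parser_lexicon = {
    "car": "noun-entry",
    "buy": "verb-entry",
}

# Hypothetical synonym sets, playing the role of WordNet synsets.
toy_synsets = [
    {"car", "automobile", "auto"},
    {"buy", "purchase"},
]


def expand_lexicon(word, lexicon, synsets):
    """Register `word` by copying the entry of a synonym the parser knows.

    Returns True when the word is (or becomes) registered, False otherwise.
    """
    if word in lexicon:
        return True
    # (1) find the unregistered word in the synonym resource
    for synset in synsets:
        if word in synset:
            # (2) look for a synonym that is already in the lexicon
            for synonym in sorted(synset):
                if synonym in lexicon:
                    # (3) add the new word with the synonym's entry
                    lexicon[word] = lexicon[synonym]
                    return True
    return False


print(expand_lexicon("automobile", parser_lexicon, toy_synsets))
print(parser_lexicon.get("automobile"))
```

Copying the synonym's entry wholesale is the simplest possible choice here; the abstract does not say how the new entry is built.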

Analysis of Music Mood Class using Folksonomy Tags (폭소노미 분위기 태그를 이용한 음악의 분위기 유형 분석)

  • Moon, Chang Bae;Kim, HyunSoo;Kim, Byeong Man
    • Science of Emotion and Sensibility / v.16 no.3 / pp.363-372 / 2013
  • When retrieving music with folksonomy tags, using numeric tags internally (AV tags, which consist of Arousal and Valence values) instead of word tags can partially solve the problem posed by synonyms. However, two prerequisite tasks must be done correctly: the first is to map word tags to their numeric tags; the second is to obtain numeric tags for the music pieces to be retrieved. The first task was verified in our prior study, so this paper examines the second. To this end, we propose a music mapping table that defines the relation between AV values and music, and analyze it with ANOVA tests. The results show that the arousal and valence values of music have different distributions across 12 mood tags, with or without synonymy, with type I error values of p < 0.001. We conclude that the distribution of AV values differs according to music mood.
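
As a rough illustration of the kind of ANOVA test mentioned above, the sketch below computes a one-way ANOVA F statistic for arousal values grouped by mood tag. The tag names and values are invented for the example and are not taken from the paper; the paper's actual test setup is not described in the abstract.

```python
def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of numeric groups."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    k = len(groups)          # number of groups (mood tags)
    n = len(all_values)      # total number of observations
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Variation of each observation around its own group mean.
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical arousal values of music pieces under three mood tags.
arousal_by_tag = {
    "calm": [0.10, 0.20, 0.15, 0.25],
    "happy": [0.60, 0.70, 0.65, 0.75],
    "angry": [0.85, 0.90, 0.80, 0.95],
}

f_stat = one_way_anova_f(list(arousal_by_tag.values()))
print(f"F = {f_stat:.2f}")
```

A large F value suggests the group means differ; turning it into a p-value requires the F distribution, which is left out of this sketch.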

Decision Tree based Disambiguation of Semantic Roles for Korean Adverbial Postpositions in Korean-English Machine Translation (한영 기계번역에서 결정 트리 학습에 의한 한국어 부사격 조사의 의미 중의성 해소)

  • Park, Seong-Bae;Zhang, Byoung-Tak;Kim, Yung-Taek
    • Journal of KIISE:Software and Applications / v.27 no.6 / pp.668-677 / 2000
  • In Korean, case postpositions determine the syntactic roles of phrases, and a single postposition may have more than one meaning. Adverbial postpositions in particular make translation from Korean to English difficult because they can carry various meanings. In this paper, we describe a method for resolving such semantic ambiguities of Korean adverbial postpositions using decision trees. The training examples for decision tree induction are extracted from a corpus of 0.5 million words, and the semantic roles of adverbial postpositions are classified into 25 classes. The shortage of training examples is overcome by clustering words into classes with a greedy clustering algorithm. Cross-validation results show that the method achieves an average precision of 76.2%, a 26.0% improvement over the baseline that assigns each adverbial postposition its most frequently occurring role.
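
The comparison baseline mentioned above (always choosing the most frequent role of a postposition) can be sketched as follows. The training pairs and role names are hypothetical; the paper's 25 role classes are not listed in the abstract, and this is not the decision tree method itself.

```python
from collections import Counter, defaultdict

# Hypothetical (postposition, semantic role) training pairs.
training = [
    ("에", "location"), ("에", "location"), ("에", "time"),
    ("로", "direction"), ("로", "instrument"), ("로", "direction"),
]


def train_most_frequent_role(pairs):
    """Map each postposition to the role it takes most often in `pairs`."""
    counts = defaultdict(Counter)
    for postposition, role in pairs:
        counts[postposition][role] += 1
    return {p: c.most_common(1)[0][0] for p, c in counts.items()}


def precision(model, test_pairs):
    """Fraction of test pairs whose role the model predicts correctly."""
    correct = sum(1 for p, role in test_pairs if model.get(p) == role)
    return correct / len(test_pairs)


model = train_most_frequent_role(training)
print(model)
print(precision(model, [("에", "location"), ("로", "instrument")]))
```

Such a baseline ignores context entirely, which is why a context-sensitive classifier such as a decision tree can improve on it.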

A Visual Study of the Quality of English Pronunciation Using the Praat Program (Praat을 활용한 영어발음특성의 시각적 연구)

  • Park, Heesuk
    • Journal of Digital Contents Society / v.14 no.3 / pp.323-331 / 2013
  • This study investigates and compares the pronunciation of diphthongs, words, and sentences by two groups of Korean high school students using the Praat program. English words and sentences were uttered by twenty Korean subjects, ten in each group, and recorded. All the subjects are female, and their grades range from freshman to sophomore. Acoustic features were measured from sound spectrograms with the Praat software and analyzed statistically. The results showed that the lengths of diphthongs and words differed between the two groups, but not significantly. In sentence utterance length, however, the group of 5th to 6th grade students (in the current grading system) produced longer utterances than the group of 1st to 2nd grade students. The difference was significant especially in the pronunciation of the first two sentences with more than five words. Taken over all words for the two subject groups, the differences in the lengths of words containing diphthongs were not significant, but the differences in the lengths of sentences with more than five words were. Comparing the words coat and code, the diphthong in coat was shorter than that in code.

Double Meaning Inherent in the Film : focused on the Movie "Perfume" (영화 속에 내재된 이중적 의미 : 영화 "향수"를 중심으로)

  • Kim, Seong-Hoon
    • The Journal of the Korea Contents Association / v.11 no.3 / pp.147-156 / 2011
  • Double meaning inherent in film is interpreted in the same way as what literature calls ambiguity. Ambiguity means that one word or one sentence can be interpreted in two or more ways. In a movie, a single action of a character, a prop, or a costume confronts the audience with two or more meanings. The famous French director François Ozon said, "The director has always made a movie contrary to his/her latest movie." This means that film should pursue renewal, and it expresses his philosophy of film. Indeed, breaking taboos is fundamental to film: film has always challenged taboos and led progressivism. The taboos addressed in Western films are more intense than the moral and ethical standards of our country. Those taboos concern denying the sacredness and legitimacy of Christianity. In particular, because many people discuss films that deny the divinity of Jesus Christ, such films are sufficient to provoke arguments for and against. This study takes director Tom Tykwer's movie "Perfume" as its text, examines the highly elaborate and strategic double meaning in the movie, and analyzes the Western taboos with which it skillfully deceives the audience.

Target Word Selection Disambiguation using Untagged Text Data in English-Korean Machine Translation (영한 기계 번역에서 미가공 텍스트 데이터를 이용한 대역어 선택 중의성 해소)

  • Kim Yu-Seop;Chang Jeong-Ho
    • The KIPS Transactions:PartB / v.11B no.6 / pp.749-758 / 2004
  • In this paper, we propose a new method for disambiguating target word selection in English-Korean machine translation that uses only a raw corpus, with no additional human effort. We use two data-driven techniques: Latent Semantic Analysis (LSA) and Probabilistic Latent Semantic Analysis (PLSA). Both can represent complex semantic structures in given contexts such as text passages. We construct linguistic semantic knowledge with the two techniques and use it for target word selection in English-Korean machine translation. For target word selection, we use grammatical relationships stored in a dictionary. To address the data sparseness problem in target word selection, we use the k-nearest neighbor learning algorithm and estimate the distance between instances based on these models. In the experiments, we use TREC AP news data to construct the latent semantic space and the Wall Street Journal corpus to evaluate target word selection. With the latent semantic analysis methods, the accuracy of target word selection improved by over 10%, and PLSA showed better accuracy than LSA. Finally, using correlation calculation, we show how accuracy relates to two important factors: the dimensionality of the latent space and the k value of k-NN learning.
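
The k-nearest neighbor step described above can be pictured with the following minimal sketch. It assumes that context vectors in a latent semantic space are already available; the two-dimensional vectors, the example word "bank" with its Korean target words, and the use of cosine similarity are all assumptions made for illustration, not details confirmed by the abstract.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def knn_select(query, instances, k=3):
    """Pick a target word by majority vote among the k most similar instances.

    `instances` is a list of (latent_vector, target_word) pairs.
    """
    ranked = sorted(instances, key=lambda item: cosine(query, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical latent vectors for contexts of English "bank".
instances = [
    ([0.9, 0.1], "은행"),  # financial institution
    ([0.8, 0.2], "은행"),
    ([0.1, 0.9], "둑"),    # river bank
    ([0.2, 0.8], "둑"),
]

print(knn_select([0.7, 0.3], instances, k=3))
```

Voting over several neighbors, rather than trusting a single closest example, is what lets k-NN tolerate sparse training data.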

A Basis of Database Semantics: from Feature Structures to Tables (데이터베이스 의미론의 기초: 자질 구조에서 테이블로)

  • Lee, Ki-Yong
    • Annual Conference on Human and Language Technology / 1999.10e / pp.297-303 / 1999
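  • Today, large amounts of diverse linguistic information are exchanged in everyday language over computer networks. A language information processing system that can handle this volume of information efficiently is therefore needed. Hausser (1999) and Lee Ki-Yong (1999) proposed database semantics as such a system. Its distinguishing feature is that it uses commercial database management systems to build information processing systems for natural language. One of the problems this raises is that of representation, because the representation methods of linguistics differ from those of database management systems. In particular, a relational database management system (RDBMS) presents all kinds of information in table form. The main focus of this paper is therefore to explore ways of replacing the representations commonly used in linguistics, namely trees that show the syntactic structure of a sentence, logical forms that show semantic structure, and feature structures that describe the properties of words or phrases, with tables. Moreover, an RDBMS supports various operations on tables, in particular linking two tables, and these operations can integrate or filter information. This makes it easy to represent related information in a single table, or to store data in distributed form while maintaining the integrity of the data. The aim of this paper is to explain these points with examples that are as simple as possible.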
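
One way to picture the idea of representing a feature structure as tables and linking them is the sketch below, which uses Python's built-in `sqlite3` module. The table layout (`word`, `feature`) and the feature structure for "dogs" are hypothetical choices for this example; the paper's own table design is not given in the abstract.

```python
import sqlite3

# Hypothetical feature structure for the word "dogs".
feature_structure = {"form": "dogs", "cat": "noun", "num": "plural"}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE word (id INTEGER PRIMARY KEY, form TEXT)")
conn.execute("CREATE TABLE feature (word_id INTEGER, attr TEXT, value TEXT)")

# Store the word itself in one table ...
cur = conn.execute("INSERT INTO word (form) VALUES (?)", (feature_structure["form"],))
word_id = cur.lastrowid

# ... and each attribute-value pair as a row in another.
for attr, value in feature_structure.items():
    if attr != "form":
        conn.execute("INSERT INTO feature VALUES (?, ?, ?)", (word_id, attr, value))

# Linking (joining) the two tables recovers the full feature structure.
rows = conn.execute(
    "SELECT w.form, f.attr, f.value FROM word w "
    "JOIN feature f ON f.word_id = w.id ORDER BY f.attr"
).fetchall()
print(rows)
```

Keeping attributes in a separate table means new features can be added without changing the schema, at the cost of a join when the whole structure is needed.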

Design and Implementation of a Swearing Remover Program on Web board (웹 게시판 비속어 처리 프로그램의 설계 및 구현)

  • 조아영
    • Journal of the Korea Computer Industry Society / v.2 no.10 / pp.1317-1328 / 2001
  • Existing swearing remover programs could not block even slightly transformed swear words because they work by blocking input. To overcome this defect, this paper implements a supervising program that analyzes and removes or replaces swear words on web boards. To this end, we first classified the patterns of swear words found on web boards and then implemented a tokenizer that can analyze those patterns. The module that tokenizes and removes or replaces swear words on each web board was implemented as a thread so that the boards can be handled in parallel. Running the program on several web boards, we found that it detected most swear words, with a recall of 91.9%, but it did not sufficiently meet our goal for morphologically transformed swear words and swear words that depend on context. Further study will therefore address morphologically ambiguous words, semantically ambiguous words, and swear words in context by extracting this program's manual mode. We expect the program to encourage users toward proper word usage and to replace the manual work of web board managers in schools, public bodies, broadcasting stations, and so on.
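
To give a feel for why a normalizing tokenizer catches transformed words that exact matching misses, here is a small sketch. The blocklist, the placeholder word "darn", and the two normalization rules are assumptions for illustration; the pattern classes actually used in the paper are not described in the abstract.

```python
import re

# Placeholder blocklist; a real deployment would load its own list.
BLOCKED = {"darn"}


def normalize(token):
    """Drop non-letters and collapse repeated letters.

    For example, 'd.a.r.n' and 'daaarn' both become 'darn'.
    """
    letters = re.sub(r"[^a-z]", "", token.lower())
    return re.sub(r"(.)\1+", r"\1", letters)


def censor(text, replacement="***"):
    """Replace every whitespace-separated token whose normalized form is blocked."""
    tokens = text.split()
    return " ".join(replacement if normalize(t) in BLOCKED else t for t in tokens)


print(censor("well d.a.r.n it"))
print(censor("well DAAARN! it"))
```

The normalized form is used only for matching, so ordinary words are left untouched in the output. Blocklist entries that themselves contain doubled letters would need to be stored in collapsed form for this scheme to match them.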

Linking Korean Predicates to Knowledge Base Properties (한국어 서술어와 지식베이스 프로퍼티 연결)

  • Won, Yousung;Woo, Jongseong;Kim, Jiseong;Hahm, YoungGyun;Choi, Key-Sun
    • Journal of KIISE / v.42 no.12 / pp.1568-1574 / 2015
  • Relation extraction plays a role in transforming a sentence into a form suitable for a knowledge base. In this paper, we focus on the predicates in a sentence and aim to identify the knowledge base properties needed to describe the relationship between entities, which enables a computer to understand the meaning of a sentence more clearly. Distant supervision is a well-known approach to relation extraction; it performs lexicalization of knowledge base properties by automatically generating a large amount of labeled data. In other words, the predicate in a sentence is linked, or mapped, to the candidate properties defined by ontologies in the knowledge base. This lexical and ontological linking provides a way to generate structured information and a basis for enriching the knowledge base.
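
The automatic labeling idea behind distant supervision can be sketched as follows. The triples, property names, and sentences are toy examples invented here, and simple substring matching stands in for real entity linking; this is not the authors' pipeline.

```python
# Hypothetical knowledge base triples: (subject, property, object).
kb_triples = [
    ("Seoul", "capitalOf", "South Korea"),
    ("Hangul", "creator", "Sejong"),
]

sentences = [
    "Seoul is the capital of South Korea.",
    "Hangul was created by Sejong in the 15th century.",
    "Seoul hosted the 1988 Olympics.",
]


def distant_label(sentences, triples):
    """Label a sentence with a property when it mentions both entities of a triple."""
    labeled = []
    for sentence in sentences:
        for subj, prop, obj in triples:
            if subj in sentence and obj in sentence:
                labeled.append((sentence, prop))
    return labeled


for sentence, prop in distant_label(sentences, kb_triples):
    print(prop, "<-", sentence)
```

From labeled sentences like these, the predicate text between the two entities (for example "is the capital of") can then be collected as a candidate lexicalization of the property.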

Structural Disambiguation using Mutual Information and the Measure of Confidence (상호 정보를 이용한 구조적 모호성 해소와 결과에 대한 확신도 측정)

  • 심광섭
    • Korean Journal of Cognitive Science / v.4 no.1 / pp.153-176 / 1993
  • Structural ambiguity is one of the problems that arise in the analysis of natural language sentences. It has been considered very difficult to solve. Structural ambiguity must nevertheless be resolved, however difficult that may be; otherwise natural language processing would be virtually impossible. A statistical approach to structural disambiguation is proposed in this dissertation. The information-theoretic concept of mutual information is employed to resolve structural ambiguity. Mutual information can be acquired automatically from text corpora. If a structural disambiguation subsystem could evaluate for itself whether its results are correct, it would be possible to develop a more intelligent natural language processing system. In this paper, the concept of a confidence measure is also proposed to endow the disambiguation subsystem with such intelligence. The confidence measure is a numeric value calculated after structural disambiguation. Experiments were performed to show the validity of the approach. Mutual information was automatically acquired from a corpus of 1.6 million words collected from scientific abstracts. The accuracy of structural disambiguation was 80% on 1,639 test sentences, with no manual tuning in advance. Detecting and correcting errors in structural disambiguation can be performed very effectively if the confidence measure is employed in the process.
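
A minimal sketch of how mutual information might drive an attachment decision is given below. It uses pointwise mutual information computed from co-occurrence counts; the counts are invented, and defining confidence as the margin between the two best scores is purely an assumption for this example, since the abstract does not define the paper's confidence measure.

```python
import math


def mutual_information(pair_count, x_count, y_count, total):
    """Pointwise mutual information log2(P(x, y) / (P(x) P(y))) from raw counts."""
    return math.log2((pair_count * total) / (x_count * y_count))


def choose_attachment(candidates):
    """Pick the attachment with the highest score.

    `candidates` maps an attachment name to its mutual information score and
    must contain at least two entries. Returns (best_name, margin), where the
    margin over the runner-up serves as a hypothetical confidence value.
    """
    ranked = sorted(candidates.items(), key=lambda item: item[1], reverse=True)
    best, runner_up = ranked[0], ranked[1]
    return best[0], best[1] - runner_up[1]


# Hypothetical counts for "saw the man with a telescope":
# does "with" attach to the verb "saw" or to the noun "man"?
scores = {
    "verb": mutual_information(pair_count=8, x_count=100, y_count=200, total=10000),
    "noun": mutual_information(pair_count=2, x_count=100, y_count=200, total=10000),
}

print(choose_attachment(scores))
```

A small margin would flag the decision as one worth re-checking, which is the role the abstract assigns to a confidence value.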