• Title/Summary/Keyword: 범주 (category)

Implementation of Case Phenomena in the Korean TCCG System (유형상속 결합범주문법에서의 격현상 구현)

  • Lee, Wha Yun;Lee, Yong-Hun
    • Annual Conference on Human and Language Technology / 2010.10a / pp.118-122 / 2010
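  • Case phenomena are among the most important parts of analyzing the various grammatical phenomena of Korean and implementing them computationally. Most previous studies using Combinatory Categorial Grammar (CCG) assign separate syntactic categories to case markers and auxiliary particles, and treat these particles as combining with nouns to form a noun phrase. However, these approaches have problems not only in theory but also in computational implementation. This paper introduces a method that resolves these problems while implementing Korean case phenomena effectively. The grammar engineering system used in this paper is a Type-inherited Combinatory Categorial Grammar (TCCG) for Korean. In this system, the case markers and auxiliary particles of noun phrases are not assigned separate syntactic categories; instead, they combine with nouns through the nouns' inflectional rules. As a result, the basic Korean case markers can be implemented efficiently, and auxiliary particles and case-marker drop can also be analyzed and implemented effectively.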
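To illustrate the idea that particles attach through a noun's inflectional rule, rather than as words with their own syntactic categories, here is a toy Python sketch. It is not the TCCG grammar itself: the marker table, the `NP` record, and `inflect` are hypothetical names, and only the nominative, accusative, and topic particles are covered. The allomorph choice (이/가, 을/를, 은/는 after a consonant-final or vowel-final noun) follows standard Korean phonology and is detected from the Unicode encoding of precomposed Hangul syllables.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical marker table: (form after a consonant-final noun, form after a vowel-final noun).
MARKERS = {
    "nom": ("이", "가"),  # nominative case marker
    "acc": ("을", "를"),  # accusative case marker
    "top": ("은", "는"),  # topic (auxiliary/delimiter) particle
}


def has_final_consonant(syllable: str) -> bool:
    """True if a precomposed Hangul syllable (U+AC00..U+D7A3) has a final consonant."""
    offset = ord(syllable) - 0xAC00
    if not 0 <= offset < 11172:
        raise ValueError(f"not a precomposed Hangul syllable: {syllable!r}")
    return offset % 28 != 0  # 28 final slots per initial+medial pair; slot 0 = no final


@dataclass(frozen=True)
class NP:
    form: str  # surface form: noun plus any particle
    case: str  # case feature carried by the NP itself, not by a separate particle category


def inflect(noun: str, marker: Optional[str]) -> NP:
    """Toy inflectional rule: noun + particle -> a single NP with a case feature.

    marker=None models case-marker drop: the bare noun still projects an NP.
    """
    if marker is None:
        return NP(noun, "unspecified")
    after_consonant, after_vowel = MARKERS[marker]
    particle = after_consonant if has_final_consonant(noun[-1]) else after_vowel
    return NP(noun + particle, marker)
```

For example, `inflect("책", "acc")` yields `NP(form='책을', case='acc')`, and `inflect("사과", None)` models a dropped case marker. Because the particle never becomes a separate lexical item, the grammar only ever combines NPs that already carry their case feature.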

Dynamic Sampling Scheduler for Unbalanced Data Classification (불균형 범주 분류를 위한 동적 샘플링 스케줄러)

  • Seong, Su-Jin;Park, Won-Joo;Lee, Yong-Tae;Cha, Jeong-Won
    • Annual Conference on Human and Language Technology / 2021.10a / pp.221-226 / 2021
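  • To address the class-imbalanced classification problem, we propose a scheduling method that switches between class-size-based batch sampling methods during training. For the per-class sampling probability, we ran separate experiments with the reciprocal of the class size (LWRS-Reciprocal) and a value derived from the class ratio (LWRS-Ratio; "범주 비율의 반수" in the original), and found LWRS-Reciprocal more effective at improving F1. In addition, to mitigate a further bias that fixed sampling probabilities can introduce, we designed a scheduling method that switches the sampling method during training. Switching the sampling method according to whether validation performance improved gave the best results: on the Naver Shopping dataset and KLUE-TC, the sum of F1 score and accuracy exceeded the baseline by 0.7% and 0.8%, respectively.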
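A minimal sketch of reciprocal class-size sampling combined with a validation-driven switch, under several assumptions: the reciprocal is applied as a per-example weight 1/n_c (standard weighted random sampling, so each class is drawn equally often in expectation), the alternative mode is plain uniform sampling, and the scheduler moves to the next mode whenever the validation score fails to improve. The abstract does not specify these details, and `example_weights`, `sample_batch`, and `SamplingScheduler` are hypothetical names. LWRS-Ratio is not shown because its exact definition is not recoverable from the abstract.

```python
import random
from collections import Counter


def example_weights(labels):
    """Weight each example by 1 / (size of its class).

    Summed over a class the weights equal 1, so every class is drawn
    with equal probability regardless of its size.
    """
    counts = Counter(labels)
    return [1.0 / counts[y] for y in labels]


def sample_batch(labels, batch_size, mode, rng):
    """Return example indices for one training batch.

    "reciprocal": weighted sampling with replacement using example_weights.
    "uniform": ordinary sampling without replacement (assumed alternative mode).
    """
    indices = range(len(labels))
    if mode == "reciprocal":
        return rng.choices(indices, weights=example_weights(labels), k=batch_size)
    if mode == "uniform":
        return rng.sample(indices, batch_size)
    raise ValueError(f"unknown sampling mode: {mode}")


class SamplingScheduler:
    """Hypothetical scheduler: keep the current mode while the validation score
    improves; otherwise switch to the next mode in the cycle."""

    def __init__(self, modes=("reciprocal", "uniform")):
        self.modes = modes
        self.current = 0
        self.best = float("-inf")

    def step(self, val_score):
        if val_score > self.best:
            self.best = val_score
        else:
            self.current = (self.current + 1) % len(self.modes)
        return self.modes[self.current]
```

In a training loop, one would call `scheduler.step(val_score)` after each epoch and pass the returned mode to `sample_batch` for the next epoch, e.g. `sample_batch(labels, 32, mode, random.Random(0))`.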

Detecting Misclassification in Categorical Data through Repeated Surveys (반복조사를 통한 범주형 자료의 오분류 탐색)

  • 고봉성
    • Communications for Statistical Applications and Methods / v.4 no.1 / pp.75-90 / 1997
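  • This study addresses misclassification in categorical data. It proposes a method for detecting misclassification in $2 \times 2$ contingency-table data by combining, through a new time variable, data from an original survey suspected of containing misclassification with new categorical data classified correctly through a repeated survey.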

Syntactic Category Prediction for Improving Parsing Accuracy in English-Korean Machine Translation (영한 기계번역에서 구문 분석 정확성 향상을 위한 구문 범주 예측)

  • Kim Sung-Dong
    • The KIPS Transactions:PartB / v.13B no.3 s.106 / pp.345-352 / 2006
  • A practical English-Korean machine translation system should be able to translate long sentences quickly and accurately. Intra-sentence segmentation has been proposed and has contributed to faster syntactic analysis. This paper proposes a syntactic category prediction method using decision trees to obtain accurate parsing results. In parsing with segmentation, each segment is parsed separately and the results are combined to generate the sentence structure. Syntactic category prediction helps select more accurate analysis structures after partial parsing, and can thus improve parsing accuracy. We construct features for predicting syntactic categories from the parsed Wall Street Journal corpus and generate decision trees. In the experiments, we compare the performance with predictions made by hand-built rules, trigram probabilities, and neural networks. We also show how much category prediction contributes to improving translation quality.

A Study on a Conceptual Taxonomy of Author Keywords of Humanities, Social Sciences, and Art and Sport in the Korea Citation Index (KCI) by Analysis of its Meaning and Lexical Morpheme (한국학술지인용색인(KCI)의 인문학, 사회과학, 예술체육 분야 저자키워드의 의미적, 형태적 분석에 의한 개념범주 텍사노미 연구)

  • Ko, Young Man;Kim, Bee-Yeon;Min, Hye-Ryoung
    • Journal of the Korean Society for Library and Information Science / v.48 no.4 / pp.297-322 / 2014
  • The purpose of this study is to analyze the meaning and lexical morphemes of author keywords in the humanities, social sciences, and arts and sports in the Korea Citation Index (KCI) and to propose a conceptual taxonomy of those keywords. The four top-level concept categories 'Substance, Abstraction, General/Common, and Object' are replaced by seven more concrete categories: 'object, action/function, property, theory/method, format/framework, general/common, and instance'. In the middle- and lower-level concept categories, the hierarchical structure is simplified and the imbalance in term distribution is reduced by creating, subdividing, integrating, eliminating, and moving categories. Tests based on STNet show that the revised taxonomy of concept categories makes term allocation more balanced and term properties more detailed.

A Stochastic Word-Spacing System Based on Word Category-Pattern (어절 내의 형태소 범주 패턴에 기반한 통계적 자동 띄어쓰기 시스템)

  • Kang, Mi-Young;Jung, Sung-Won;Kwon, Hyuk-Chul
    • Journal of KIISE:Software and Applications / v.33 no.11 / pp.965-978 / 2006
  • This paper implements an automatic Korean word-spacing system based on word recognition using morpheme unigrams and the pattern formed by the categories of those morphemes within a candidate word. Although previous Korean word-spacing models offer easy construction and time efficiency, problems such as data sparseness and large memory requirements remain, arising from the morpho-typological characteristics of Korean. To cope with both problems, our implementation uses the statistical information of morpheme unigrams and their category patterns instead of word unigrams. A word's probability in a sentence is computed from its morphemes' probabilities and the weight of each morpheme's category within the category pattern of the candidate word. The category weights are trained to minimize the mean error between the observed probabilities of words and those estimated from their individual morphemes' probabilities, weighted according to each category's power in the word's category pattern.
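The abstract does not give the estimation formula. The sketch below assumes one plausible reading of "weighted according to each category's power": each category weight acts as an exponent on its morpheme's probability, so log P(word) is a weighted sum of morpheme log-probabilities, and the weights of one category pattern are fit by gradient descent on squared error in log space. `estimate_word_prob` and `fit_pattern_weights` are hypothetical names, not the authors' code.

```python
import math


def estimate_word_prob(morph_probs, weights):
    """Assumed log-linear form: P(word) ~= prod_i P(m_i) ** w_i (one weight per category slot)."""
    return math.exp(sum(w * math.log(p) for p, w in zip(morph_probs, weights)))


def fit_pattern_weights(examples, n_slots, lr=0.01, epochs=5000):
    """Fit the weights of one category pattern by gradient descent.

    examples: list of (morph_probs, observed_word_prob) pairs sharing that pattern.
    Loss: mean squared error between predicted and observed log-probabilities.
    """
    weights = [1.0] * n_slots
    n = len(examples)
    for _ in range(epochs):
        grads = [0.0] * n_slots
        for morph_probs, observed in examples:
            logs = [math.log(p) for p in morph_probs]
            err = sum(w * l for w, l in zip(weights, logs)) - math.log(observed)
            for i, l in enumerate(logs):
                grads[i] += 2.0 * err * l / n
        weights = [w - lr * g for w, g in zip(weights, grads)]
    return weights
```

Working in log space turns the weighted product into a linear model, which keeps the fit a simple least-squares problem. Very small morpheme probabilities produce large log magnitudes and may require a smaller `lr` for the updates to stay stable.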

Category Reorganization for Ontology Reuse (온톨러지 재사용을 위한 범주 재분류)

  • Yang Jae-Gun;Lee Jong-Hyeok;Bae Jae-Hak J.
    • The KIPS Transactions:PartB / v.12B no.1 s.97 / pp.69-80 / 2005
  • This paper introduces a methodology for transforming an existing ontology into one that suits a given application. The transformation consists of simplifying and realizing word category information, based on category headings and base categories. This paper also describes a method for identifying relationships between category sets. Through the transformation, (1) Roget's thesaurus is reorganized into 7 categories that serve as the base of 'Ontology for Narrative' [32], (2) the 22 immersion factors of multimedia games can be subdivided into 207 factors [35], and (3) the relationships between 10 mental factors and the 22 immersion factors of multimedia games are identified [36].

The effect of orientation on recognizing object representation (규범적 표상의 방향성 효과)

  • Jung, Hyo-Sun;Lee, Seung-Bok;Jung, Woo-Hyun
    • Science of Emotion and Sensibility / v.11 no.4 / pp.501-510 / 2008
  • The purpose of this study was to investigate whether the orientation of the head position across different categories affects the reaction time and accuracy of object recognition. Fifty-four right-handed undergraduate students participated in the experiment. Participants performed word-picture matching tasks that differed in the head direction of the object (i.e., left-headed or right-headed) and in object category (i.e., natural: animal, or artificial: tool). Participants were asked to decide whether each picture matched the word that preceded it. For accuracy, no statistically significant difference was found for either animal or tool pictures, owing to a ceiling effect. The interaction between category and orientation was statistically significant, whereas among the main effects only that of category was significant. In the animal condition, reaction times were faster for left-to-right than for right-to-left presentation, while no statistically significant difference was found in the tool condition. The orientation of the object's canonical representation thus differed across categories. The faster RT in the animal condition implies that the canonical representation for animals is left-headed. This could be due to the orientation of the face.

Personnel Dosimetry Performance Test (개인방사선 피폭선량판독 성능시험)

  • Na, Seong-Ho;Han, Seung-Jae;Lee, Dew-Hey;Cho, Dae-Hyung
    • Journal of Radiation Protection and Research / v.21 no.2 / pp.131-138 / 1996
  • This paper describes the methods and results of the personnel dosimetry performance tests that were implemented for the first time in Korea in 1995. Seven categories prescribed in ANSI N13.11-1993, excluding the neutron category, were adopted in the test. Fourteen dosimeter processing institutes participated with fifteen types of dosimeters. For each dosimeter type, 129 dosimeters were tested: 15 for each of the seven categories and 24 controls. In total, 144 category tests were run (7 categories for each of the 15 dosimeter types, plus 39 categories in the retests), and 2,560 dosimeters (including 400 controls) were submitted. The performance index in each category, defined as the sum of the absolute value of the bias and the standard deviation of the performance quotient, was estimated from the delivered and processed dose equivalents according to the standard procedure. Performance in a given category was judged acceptable for the deep and shallow dose equivalents (or the absorbed dose) if the performance index was less than 0.5. The results showed that 54% of the processors passed in the first test, 33% in the retest, and 13% in the second retest.
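The performance index described above can be sketched in a few lines. This assumes the usual ANSI N13.11 definition of the performance quotient, P = (H' - H)/H, where H is the delivered and H' the reported (processed) dose equivalent, and takes S as the sample standard deviation of the quotients; the 0.5 acceptance limit comes from the abstract. The function names are illustrative only.

```python
import statistics


def performance_quotients(delivered, reported):
    """P_i = (H'_i - H_i) / H_i for each dosimeter (assumed ANSI N13.11 form)."""
    return [(r - d) / d for d, r in zip(delivered, reported)]


def performance_index(delivered, reported):
    """|B| + S: B is the mean quotient (bias), S its sample standard deviation."""
    p = performance_quotients(delivered, reported)
    bias = statistics.mean(p)
    spread = statistics.stdev(p)  # n - 1 denominator; needs at least two dosimeters
    return abs(bias) + spread


def passes(delivered, reported, limit=0.5):
    """A category is acceptable when the performance index is below the limit."""
    return performance_index(delivered, reported) < limit
```

For instance, four dosimeters irradiated to 1.0 and read as 1.1, 0.9, 1.1, and 0.9 have zero bias and S of about 0.115, so they pass; three dosimeters that all over-read by 60% have no spread but a bias of 0.6, so they fail.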

The Effect of the Quality of Pre-Assigned Subject Categories on the Text Categorization Performance (학습문헌집합에 기 부여된 범주의 정확성과 문헌 범주화 성능)

  • Shim, Kyung;Chung, Young-Mee
    • Journal of the Korean Society for Information Management / v.23 no.2 / pp.265-285 / 2006
  • Text categorization assumes a certain level of correctness in the labels assigned to training documents, without solid knowledge of the actual label quality in real-world collections. Our research explores the quality of pre-assigned subject categories in a real-world collection and identifies the relationship between the quality of category assignment in the training set and text categorization performance. In particular, we examine to what extent performance can be improved by enhancing the quality (i.e., correctness) of category assignment in the training documents. A collection of 1,150 computer science abstracts is re-classified by an expert group and divided into 907 training documents and 227 test documents (15 duplicates are removed). The performance of the sets before and after re-classification, called the Initial set and the Recat-1/Recat-2 sets respectively, is compared using a kNN classifier. The average correctness of subject categories in the Initial set is 16%, and categorization with the Initial set yields an $F_1$ value of 17%. By contrast, the Recat-1 set scores an $F_1$ value of 61%, about 3.6 times that of the Initial set.
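The abstract does not say which kNN variant was used. As a rough illustration only, the sketch below classifies a document by cosine similarity over raw term counts and lets the k nearest training documents vote with their similarity scores, a common kNN scheme for text. Whitespace tokenization, the function names, and the toy data in the test are assumptions.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term-count vectors stored as Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def knn_classify(train, text, k=3):
    """train: list of (document_text, label) pairs.

    Returns the label with the highest summed similarity among the k
    training documents most similar to `text`.
    """
    query = Counter(text.lower().split())
    scored = sorted(
        ((cosine(query, Counter(doc.lower().split())), label) for doc, label in train),
        reverse=True,
    )[:k]
    votes = Counter()
    for sim, label in scored:
        votes[label] += sim
    return votes.most_common(1)[0][0]
```

Because the classifier copies labels from neighboring training documents, mislabeled training data propagate directly into its predictions, which is why a comparison of this kind is sensitive to the correctness of pre-assigned categories.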