• Title/Summary/Keyword: 응용 범주 (application category)

Search Result 243, Processing Time 0.022 seconds

Reconsideration of the Linguistic Category of Mediation in Language: a Comparative Approach between French and Korean (언어의 '매개작용' 범주 고찰: 프랑스어와 한국어 비교 연구)

  • Suh, Jungyeon
    • Cross-Cultural Studies
    • /
    • v.46
    • /
    • pp.297-325
    • /
    • 2017
  • In this paper, I reconsider the evidential category (or mediation category) across languages with language-specific values, focusing on Korean and French evidentials. I analyze how evidentials are represented in both languages, including their linguistic markers (grammatical, lexical, or discursive) and their semantic meanings. Building on previous studies in general linguistics, I reexamine the semantic meanings of grammatical markers in both languages, namely the so-called Korean retrospective marker '-te-' and French conditionals, within the framework of the enunciative operation theory suggested by Desclés & Guentchéva (2000), which proposes classifying discourse types with language-independent descriptive tools developed from the enunciation theory of Bally (1965), Benveniste (1956), and Culioli (1973). Through this approach, I aim to help establish a linguistic basis both for general linguistic research that seeks the invariant meaning of linguistic evidentials and their system and for applications of linguistics to the language engineering field.

Definition and Extraction of Causal Relations for Question-Answering on Fault-Diagnosis of Electronic Devices (전자장비 고장진단 질의응답을 위한 인과관계 정의 및 추출)

  • Lee, Sheen-Mok;Shin, Ji-Ae
    • Journal of KIISE:Software and Applications
    • /
    • v.35 no.5
    • /
    • pp.335-346
    • /
    • 2008
  • Causal relations in an ontology should be defined based on the inference types needed to solve problems specific to the application as well as the domain. In this paper, we present a model for defining and extracting causal relations for an application ontology for Question-Answering (QA) on fault diagnosis of electronic devices. Causal categories are defined by analyzing generic patterns of the QA application; relations between concepts in the corpus that belong to these causal categories are defined as causal relations. Instances of causal relations are extracted using lexical patterns in the domain's concept definitions and extended incrementally with information from a thesaurus. In an evaluation by domain specialists, our model shows a precision of 92.3% in classifying relations and a precision of 80.7% in identifying causal relations at the extraction phase.

A Search-Result Clustering Method based on Word Clustering for Effective Browsing of the Paper Retrieval Results (논문 검색 결과의 효과적인 브라우징을 위한 단어 군집화 기반의 결과 내 군집화 기법)

  • Bae, Kyoung-Man;Hwang, Jae-Won;Ko, Young-Joong;Kim, Jong-Hoon
    • Journal of KIISE:Software and Applications
    • /
    • v.37 no.3
    • /
    • pp.214-221
    • /
    • 2010
  • The search-results clustering problem is defined as the automatic and on-line grouping of similar documents in search results returned from a search engine. In this paper, we propose a new search-results clustering algorithm specialized for a paper search service. Our system consists of two algorithmic phases: Category Hierarchy Generation System (CHGS) and Paper Clustering System (PCS). In CHGS, we first build up the category hierarchy, called the Field Thesaurus, for each research field using an existing research category hierarchy (KOSEF's research category hierarchy) and the keyword expansion of the field thesaurus by a word clustering method using the K-means algorithm. Then, in PCS, the proposed algorithm determines the category of each paper using top-down and bottom-up methods. The proposed system can be used in the application areas for retrieval services in a specialized field such as a paper search service.

Automatic Document Classification Based on k-NN Classifier and Object-Based Thesaurus (k-NN 분류 알고리즘과 객체 기반 시소러스를 이용한 자동 문서 분류)

  • Bang Sun-Iee;Yang Jae-Dong;Yang Hyung-Jeong
    • Journal of KIISE:Software and Applications
    • /
    • v.31 no.9
    • /
    • pp.1204-1217
    • /
    • 2004
  • Numerous statistical and machine learning techniques have been studied for automatic text classification. However, because they train classifiers using only the feature vectors of documents, ambiguity between two possible categories significantly degrades classification precision. To remedy this drawback, we propose a new method that incorporates relationship information among categories into existing classifiers. In this paper, we first perform document classification using the k-NN classifier, which is generally known for relatively good performance in spite of its simplicity. We then employ relationship information from an object-based thesaurus to reduce the ambiguity. By referencing the various relationships in the thesaurus that correspond to the structured categories, the precision of k-NN classification is drastically improved, removing the ambiguity. Experimental results show that this method improves precision by up to 13.86% over k-NN classification while preserving its recall.
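The baseline k-NN step mentioned in this abstract can be illustrated with a minimal sketch. It assumes bag-of-words vectors, cosine similarity, and majority voting, none of which the abstract specifies. The thesaurus-based disambiguation is not reproduced, and the function names and toy training set are hypothetical.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_classify(doc: str, training: list[tuple[str, str]], k: int = 3) -> str:
    """Label `doc` by majority vote among its k most similar training documents."""
    query = Counter(doc.lower().split())
    scored = sorted(
        ((cosine(query, Counter(text.lower().split())), label)
         for text, label in training),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy training set (assumption, not data from the paper).
TRAINING = [
    ("stock market shares fall", "finance"),
    ("bank raises interest rate", "finance"),
    ("market interest in shares grows", "finance"),
    ("team wins football match", "sports"),
    ("player scores in final match", "sports"),
]

if __name__ == "__main__":
    print(knn_classify("shares and interest rate", TRAINING))
```

A document whose nearest neighbours split between two categories is exactly the ambiguous case the abstract describes; the sketch resolves it by vote count alone.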

Comparing Accuracy of Imputation Methods for Categorical Incomplete Data (범주형 자료의 결측치 추정방법 성능 비교)

  • 신형원;손소영
    • The Korean Journal of Applied Statistics
    • /
    • v.15 no.1
    • /
    • pp.33-43
    • /
    • 2002
  • Various estimation methods have been developed for imputing missing categorical data, including the modal category method, logistic regression, and association rules. In this study, we propose two fusion algorithms, one based on a neural network and one on a voting scheme, that combine the results of individual imputation methods. A Monte Carlo simulation is used to compare the performance of these methods. Five factors are used to simulate the missing data pattern: (1) input-output function, (2) data size, (3) noise of the input-output function, (4) proportion of missing data, and (5) pattern of missing data. The experimental results indicate the following: when the data size is small and the proportion of missing data is large, the modal category method, association rules, and neural network based fusion perform better than the other methods. However, when the data size is small and the correlation between the input and the missing output is strong, logistic regression and the neural network based fusion algorithm appear better than the others. When the data size is large with a low proportion of missing data, large noise, and strong correlation between the input and the missing output, the neural network based fusion algorithm turns out to be the best choice.
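Two of the simpler ideas named in this abstract, modal category imputation and a voting scheme over several imputers, can be sketched as follows. This is an illustration under assumptions rather than the authors' procedure: missing values are represented as `None`, ties go to the first value seen, and both function names are hypothetical.

```python
from collections import Counter


def impute_modal(values: list) -> list:
    """Replace each None with the most frequent observed category."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to impute from")
    mode = Counter(observed).most_common(1)[0][0]
    return [mode if v is None else v for v in values]


def vote_fusion(predictions: list):
    """Combine the imputed values proposed by several methods for one
    missing cell by simple majority vote (ties go to the first seen)."""
    return Counter(predictions).most_common(1)[0][0]


if __name__ == "__main__":
    print(impute_modal(["a", "b", None, "a", None]))
    print(vote_fusion(["a", "b", "a"]))
```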

Dynamic Classification of Categories in Web Search Environment (웹 검색 환경에서 범주의 동적인 분류)

  • Choi Bum-Ghi;Lee Ju-Hong;Park Sun
    • Journal of KIISE:Software and Applications
    • /
    • v.33 no.7
    • /
    • pp.646-654
    • /
    • 2006
  • Directory searching and index searching are the two main methods used in web search engines. Most well-known Internet search engines offer both, so that users can switch to the other method if they are not satisfied with the results of one. Index searching tends to return too many search results, while directory searching makes it difficult to select proper categories and frequently misleads users to wrong ones. In this paper, we propose a novel method in which a category hierarchy is dynamically constructed. To do this, a category is regarded as a fuzzy set that includes keywords. Similarly, extensible subcategories of a category can be found using fuzzy relational products. The merit of this method is that it enhances the recall rate of directory search by expanding subcategories on the basis of similarity.

A Study on the Cultural Characteristics of Korean Society: Discovering Its Categories Using the Cultural Consensus Model (한국사회의 문화적 특성에 관한 연구: 문화합의이론을 통한 범주의 발견)

  • Minbong You;Hyungin Shim
    • Korean Journal of Culture and Social Issue
    • /
    • v.19 no.3
    • /
    • pp.457-485
    • /
    • 2013
  • This study attempted to discover the dimensions of Korean culture, on the presumption that cross-cultural studies (Hofstede, 1980, 1997; Schwartz, 1992, 1994; Trompenaars and Hampden-Turner, 1997; House et al., 2004) are limited in explaining non-Western cultures, including Korean culture. Although there are some studies of Korean culture, they used heuristic approaches that rely on the authors' experiences and intuitions. This study applied Cultural Consensus Theory to overcome the shortcomings of previous studies and to discover dimensions that can be empirically supported by data. Specifically, this study conducted in-depth interviews, content analysis, and frequency analysis, and applied the pile-sort technique, multidimensional scaling, and network analysis. As a result, this study obtained five categories: public self-consciousness, group-focused orientation, affective human relations, hierarchical culture, and result orientation. These dimensions are expected to serve as important variables for explaining Korean social phenomena.

Improving minority prediction performance of support vector machine for imbalanced text data via feature selection and SMOTE (단어선택과 SMOTE 알고리즘을 이용한 불균형 텍스트 데이터의 소수 범주 예측성능 향상 기법)

  • Jongchan Kim;Seong Jun Chang;Won Son
    • The Korean Journal of Applied Statistics
    • /
    • v.37 no.4
    • /
    • pp.395-410
    • /
    • 2024
  • Text data is usually made up of a wide variety of unique words. Even in standard text data, it is common to find tens of thousands of different words. In text data analysis, usually, each unique word is treated as a variable. Thus, text data can be regarded as a dataset with a large number of variables. On the other hand, in text data classification, we often encounter class label imbalance problems. In the cases of substantial imbalances, the performance of conventional classification models can be severely degraded. To improve the classification performance of support vector machines (SVM) for imbalanced data, algorithms such as the Synthetic Minority Over-sampling Technique (SMOTE) can be used. The SMOTE algorithm synthetically generates new observations for the minority class based on the k-Nearest Neighbors (kNN) algorithm. However, in datasets with a large number of variables, such as text data, errors may accumulate. This can potentially impact the performance of the kNN algorithm. In this study, we propose a method for enhancing prediction performance for the minority class of imbalanced text data. Our approach involves employing variable selection to generate new synthetic observations in a reduced space, thereby improving the overall classification performance of SVM.
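The interpolation step of SMOTE described in this abstract can be sketched as below. This is a minimal sketch under assumptions: it uses Euclidean distance on small numeric tuples, the function name and parameters are hypothetical, and the variable selection step proposed in the paper is not included. In the paper's setting, the same interpolation would run on vectors restricted to the selected words.

```python
import math
import random


def smote(minority: list[tuple[float, ...]], n_new: int, k: int = 2,
          seed: int = 0) -> list[tuple[float, ...]]:
    """Generate n_new synthetic minority points by interpolating between a
    randomly chosen minority point and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


if __name__ == "__main__":
    points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
    for p in smote(points, n_new=3):
        print(p)
```

Because each synthetic point lies on a segment between two real minority points, the quality of the result depends on the nearest-neighbour search, which is the part the abstract says may suffer when there are many variables.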

A Question Type Classifier based on a Support Vector Machine for a Korean Question-Answering System (한국어 질의응답시스템을 위한 지지 벡터기계 기반의 질의유형분류기)

  • 김학수;안영훈;서정연
    • Journal of KIISE:Software and Applications
    • /
    • v.30 no.5_6
    • /
    • pp.466-475
    • /
    • 2003
  • To build an efficient Question-Answering (QA) system, a question type classifier is needed. It classifies users' queries into predefined categories regardless of the surface form of a question. In this paper, we propose a question type classifier using a Support Vector Machine (SVM). The question type classifier first extracts features such as lexical forms, parts of speech, and semantic markers from a user's question. The system uses the $\chi^2$ statistic to select important features. The selected features are represented as a vector. Finally, an SVM categorizes questions into predefined categories according to the extracted features. In the experiment, the proposed system achieved 86.4% accuracy. The system precisely classifies question types without using any rules such as lexico-syntactic patterns. Therefore, the system is robust and easily portable to other domains.
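The $\chi^2$ feature selection step can be illustrated with the standard 2x2 contingency form of the statistic, which is widely used for text features. Whether the paper uses exactly this form is an assumption, and the function name and cell naming (a, b, c, d) are hypothetical.

```python
def chi_square(a: int, b: int, c: int, d: int) -> float:
    """Chi-square score of a feature for one category from a 2x2 table.

    a: questions in the category that contain the feature
    b: questions outside the category that contain the feature
    c: questions in the category that lack the feature
    d: questions outside the category that lack the feature
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0  # a degenerate table carries no evidence either way
    return n * (a * d - c * b) ** 2 / denom


if __name__ == "__main__":
    # A feature that perfectly separates the category scores high;
    # a feature spread evenly across both groups scores zero.
    print(chi_square(10, 0, 0, 10))
    print(chi_square(5, 5, 5, 5))
```

Features would then be ranked by score, with the highest-scoring ones kept as vector dimensions for the SVM.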