• Title/Summary/Keyword: 특징 집합 선택 (feature set selection)

Comparison Between Optimal Features of Korean and Chinese for Text Classification (한중 자동 문서분류를 위한 최적 자질어 비교)

  • Ren, Mei-Ying;Kang, Sinjae
    • Journal of the Korean Institute of Intelligent Systems / v.25 no.4 / pp.386-391 / 2015
  • This paper proposes the optimal features for text classification based on the linguistic characteristics of Korean and Chinese. The experiments were designed to determine which performs best among n-grams, which are known to be language-independent; morphemes, which are language-dependent; and several combined feature sets built from n-grams and morphemes. An SVM classifier and Internet news articles were used for text classification. As a result, the bi-gram was the best feature for Korean text categorization, with the highest F1-measure of 87.07%, while for Chinese document classification the combined feature set 'uni-gram+noun+verb+adjective+idiom' performed best, with the highest F1-measure of 82.79%.
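The feature types compared above can be made concrete with a small extractor. The Python sketch below counts character n-grams and builds a combined feature set. It assumes that the morpheme tokens (nouns, verbs, adjectives, idioms) come from an external morphological analyzer that is not shown, and the helper names are hypothetical. The resulting counts would be the input vectors for a classifier such as the SVM used in the paper.

```python
from collections import Counter

def char_ngrams(text: str, n: int) -> Counter:
    """Count the character n-grams of length n in text."""
    grams = (text[i:i + n] for i in range(len(text) - n + 1))
    # Grams made only of whitespace carry no information and are dropped;
    # grams that merely contain a space still encode a word boundary and are kept.
    return Counter(g for g in grams if not g.isspace())

def combined_features(text: str, morphemes: list) -> Counter:
    """Hypothetical combined feature set: character uni-grams plus morpheme
    tokens that are assumed to come from an external POS tagger."""
    feats = Counter({("uni", g): c for g, c in char_ngrams(text, 1).items()})
    feats.update(("morph", m) for m in morphemes)
    return feats

# Example: Korean bi-gram features for a short headline.
bigrams = char_ngrams("자동 문서분류", 2)
```

Tagging each feature with its type ("uni" or "morph") keeps an n-gram and an identical-looking morpheme from collapsing into one dimension when the sets are combined.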

Improving Naïve Bayes Text Classifiers with Incremental Feature Weighting (점진적 특징 가중치 기법을 이용한 나이브 베이즈 문서분류기의 성능 개선)

  • Kim, Han-Joon;Chang, Jae-Young
    • The KIPS Transactions:PartB / v.15B no.5 / pp.457-464 / 2008
  • In real-world operational environments, most text classification systems suffer from insufficient training documents and a lack of prior knowledge about the feature space. In this regard, Naïve Bayes is known to be an appropriate algorithm for operational text classification, since its classification model can easily evolve by incrementally updating the pre-learned model and feature space. This paper proposes a technique for improving the Naïve Bayes classifier through a feature weighting strategy. The basic idea is that parameter estimation in Naïve Bayes considers the degree of feature importance as well as the feature distribution. By incorporating feature weights into the Naïve Bayes learning algorithm, rather than learning from a reduced feature set, we can build a more accurate classification model. In addition, we extend a conventional feature update algorithm to support incremental feature weighting in a dynamic operational environment. To evaluate the proposed method, we perform experiments on various document collections and show that the traditional Naïve Bayes classifier can be significantly improved by the proposed technique.
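A minimal sketch of the idea that feature weights enter parameter estimation, assuming a multinomial Naïve Bayes model in which each word occurrence contributes its feature weight, instead of 1, to the class counts. The weight dictionary, the Laplace smoothing constant, and the class name `WeightedNB` are illustrative assumptions, not the authors' exact formulation. The `update` method shows why incremental learning is cheap: it only adds to running sums.

```python
import math
from collections import defaultdict

class WeightedNB:
    """Multinomial Naive Bayes in which each word occurrence adds its feature
    weight (default 1.0) to the class counts. Weights are assumed to be given."""

    def __init__(self, weights=None, alpha=1.0):
        self.weights = weights or {}
        self.alpha = alpha                                   # Laplace smoothing
        self.doc_counts = defaultdict(int)                   # documents per class
        self.word_mass = defaultdict(lambda: defaultdict(float))  # weighted word counts
        self.class_mass = defaultdict(float)                 # weighted total per class
        self.vocab = set()

    def update(self, tokens, label):
        """Add one training document incrementally; earlier sums are reused."""
        self.doc_counts[label] += 1
        for t in tokens:
            w = self.weights.get(t, 1.0)
            self.word_mass[label][t] += w
            self.class_mass[label] += w
            self.vocab.add(t)

    def predict(self, tokens):
        total_docs = sum(self.doc_counts.values())
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for c, n_docs in self.doc_counts.items():
            score = math.log(n_docs / total_docs)            # log prior
            denom = self.class_mass[c] + self.alpha * v
            for t in tokens:
                if t in self.vocab:                          # skip unseen features
                    score += math.log((self.word_mass[c].get(t, 0.0) + self.alpha) / denom)
            if score > best_score:
                best, best_score = c, score
        return best
```

Every feature stays in the model; a low weight only shrinks its influence, which matches the abstract's contrast with learning from a reduced feature set.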

Enhancing Document Clustering using Important Term of Cluster and Wikipedia (군집의 중요 용어와 위키피디아를 이용한 문서군집 향상)

  • Park, Sun;Lee, Yeon-Woo;Jeong, Min-A;Lee, Seong-Ro
    • Journal of the Institute of Electronics Engineers of Korea SP / v.49 no.2 / pp.45-52 / 2012
  • This paper proposes a new method for enhancing document clustering using the important terms of clusters and Wikipedia. The proposed method represents the concepts of cluster topics well by selecting the important terms of each cluster from the semantic features obtained by NMF (non-negative matrix factorization). It addresses the "bag of words" problem, in which meaningful relationships between documents and clusters are not considered, by expanding the important terms of each cluster with synonyms from Wikipedia. It also improves the quality of document clustering by re-clustering, using the expanded important terms to refine the initial clusters. The experimental results demonstrate that the proposed method achieves better performance than other document clustering methods.
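The NMF step can be sketched as follows, assuming a small non-negative term-by-document matrix and the standard Lee-Seung multiplicative update rules for the Frobenius-norm objective. Each column of W is treated as a semantic feature, and its highest-weighted terms serve as the important terms of a cluster. The Wikipedia synonym expansion and the re-clustering step are not shown, and the function names are hypothetical.

```python
import random

def nmf(A, k, iters=300, seed=0, eps=1e-9):
    """Factor a non-negative m x n term-by-document matrix A (list of lists)
    as W (m x k) times H (k x n) using Lee-Seung multiplicative updates."""
    m, n = len(A), len(A[0])
    rng = random.Random(seed)
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(m)]
    H = [[rng.random() + 0.1 for _ in range(n)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T A) / (W^T W H)
        WtA = [[sum(W[i][a] * A[i][j] for i in range(m)) for j in range(n)] for a in range(k)]
        WtW = [[sum(W[i][a] * W[i][b] for i in range(m)) for b in range(k)] for a in range(k)]
        WtWH = [[sum(WtW[a][b] * H[b][j] for b in range(k)) for j in range(n)] for a in range(k)]
        H = [[H[a][j] * WtA[a][j] / (WtWH[a][j] + eps) for j in range(n)] for a in range(k)]
        # W <- W * (A H^T) / (W H H^T)
        AHt = [[sum(A[i][j] * H[a][j] for j in range(n)) for a in range(k)] for i in range(m)]
        HHt = [[sum(H[a][j] * H[b][j] for j in range(n)) for b in range(k)] for a in range(k)]
        WHHt = [[sum(W[i][b] * HHt[b][a] for b in range(k)) for a in range(k)] for i in range(m)]
        W = [[W[i][a] * AHt[i][a] / (WHHt[i][a] + eps) for a in range(k)] for i in range(m)]
    return W, H

def important_terms(W, terms, top=2):
    """For each semantic feature (column of W), list its top-weighted terms."""
    return [[terms[i] for i in sorted(range(len(terms)), key=lambda i: -W[i][a])[:top]]
            for a in range(len(W[0]))]
```

The small `eps` in each denominator only guards against division by zero; it does not change the update direction.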

Extraction of a Central Object in a Color Image Based on Significant Colors (특이 칼라에 기반한 칼라 영상에서의 중심 객체 추출)

  • SungYoung Kim;Eunkyung Lim;MinHwan Kim
    • Journal of Korea Multimedia Society / v.7 no.5 / pp.648-657 / 2004
  • This paper proposes a method for extracting central objects from color images without any prior knowledge, based primarily on the distribution of significant colors. A central object in an image is defined as a set of regions that lie around the center of the image and whose color distribution is significant relative to the surrounding (background) regions. Significant colors are first defined as colors that are distributed more densely around the center of the image than near its borders. Core object regions (CORs) are then selected as regions in which many pixels have significant colors. Finally, regions adjacent to the CORs are iteratively merged if their color distribution is similar to that of the CORs but not to that of the background regions. The merged result is taken as the central object, which may include regions with different color characteristics and/or two or more objects of interest. The usefulness of significant colors for extracting central objects was verified through experiments on several kinds of test images. We expect central objects to be useful in image retrieval applications.
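The definition of significant colors can be sketched as a comparison of color frequencies between a central window and a border band. This sketch assumes the image has already been quantized into discrete color labels; the window size, border width, and `ratio` threshold are illustrative parameters, not values from the paper. Selecting core object regions and merging adjacent regions are not shown.

```python
from collections import Counter

def significant_colors(img, center_frac=0.5, border=1, ratio=1.0):
    """Return the colors whose relative frequency in the central window exceeds
    `ratio` times their relative frequency in the border band.
    `img` is a 2-D list of quantized color labels."""
    h, w = len(img), len(img[0])
    ch, cw = int(h * center_frac), int(w * center_frac)
    top, left = (h - ch) // 2, (w - cw) // 2
    # Color histogram of the central window.
    center = Counter(img[y][x] for y in range(top, top + ch) for x in range(left, left + cw))
    # Color histogram of a band `border` pixels wide along the image edges.
    edge = Counter(img[y][x] for y in range(h) for x in range(w)
                   if y < border or y >= h - border or x < border or x >= w - border)
    n_c, n_e = sum(center.values()), sum(edge.values())
    return {c for c, k in center.items() if k / n_c > ratio * edge.get(c, 0) / n_e}
```

Comparing relative frequencies rather than raw counts keeps the test fair even though the central window and the border band contain different numbers of pixels.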

A Hybrid Clustering Technique for Processing Large Data (대용량 데이터 처리를 위한 하이브리드형 클러스터링 기법)

  • Kim, Man-Sun;Lee, Sang-Yong
    • The KIPS Transactions:PartB / v.10B no.1 / pp.33-40 / 2003
  • Data mining plays an important role in the knowledge discovery process, and various data mining algorithms can be selected for a specific purpose. Most traditional hierarchical clustering methods are suited to small data sets and have difficulty handling large ones because of limited resources and insufficient efficiency. In this study we propose a hybrid neural-network clustering technique, called PPC (Pre-Post Clustering), that can be applied to large data sets to find unknown patterns. PPC combines an artificial intelligence method, SOM (self-organizing map), with a statistical method, hierarchical clustering, and clusters data in two stages. In the pre-clustering stage, PPC digests large data sets using SOM. In the post-clustering stage, PPC measures similarity values based on cohesive distances, which reflect the internal features of clusters, and adjacent distances, which reflect the external distances between clusters. Finally, PPC clusters the large data set using these similarity values. Experiments with UCI repository data showed that PPC achieved better cohesion values than the other clustering techniques.
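The two-stage structure of PPC can be sketched with a 1-D SOM for pre-clustering followed by agglomerative merging of the SOM prototypes. Because the abstract does not define its cohesive and adjacent distances, this sketch substitutes plain average-linkage distance between prototypes; that substitution, the learning-rate and radius schedules, and all function names are assumptions.

```python
import math
import random

def train_som(data, n_units, epochs=50, lr0=0.5, seed=0):
    """Pre-clustering: a 1-D self-organizing map returning n_units prototypes."""
    rng = random.Random(seed)
    protos = [list(rng.choice(data)) for _ in range(n_units)]
    radius0 = max(1.0, n_units / 2)
    for e in range(epochs):
        frac = e / epochs
        lr = lr0 * (1 - frac)                       # linearly decaying learning rate
        radius = max(0.5, radius0 * (1 - frac))     # shrinking neighbourhood
        for x in rng.sample(data, len(data)):       # shuffled pass over the data
            bmu = min(range(n_units), key=lambda u: math.dist(protos[u], x))
            for u in range(n_units):
                h = math.exp(-((u - bmu) ** 2) / (2 * radius ** 2))
                protos[u] = [p + lr * h * (xi - p) for p, xi in zip(protos[u], x)]
    return protos

def average_linkage(points, k):
    """Post-clustering: merge points agglomeratively until k clusters remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = sum(math.dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b]) / (len(clusters[a]) * len(clusters[b]))
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] += clusters.pop(b)              # b > a, so index a stays valid
    return clusters

def ppc(data, n_units=4, k=2, seed=0):
    """Label each sample by the final cluster of its best-matching SOM unit."""
    protos = train_som(data, n_units, seed=seed)
    unit_label = {u: g for g, members in enumerate(average_linkage(protos, k)) for u in members}
    return [unit_label[min(range(n_units), key=lambda u: math.dist(protos[u], x))] for x in data]
```

The quadratic-cost hierarchical step runs only on the few SOM prototypes, not on every sample, which is the efficiency argument the abstract makes for large data sets.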

Coarse-to-fine Classifier Ensemble Selection using Clustering and Genetic Algorithms (군집화와 유전 알고리즘을 이용한 거친-섬세한 분류기 앙상블 선택)

  • Kim, Young-Won;Oh, Il-Seok
    • Journal of KIISE:Software and Applications / v.34 no.9 / pp.857-868 / 2007
  • A good classifier ensemble should have high complementarity among its classifiers to achieve a high recognition rate, and should be small to be efficient. This paper proposes a classifier ensemble selection algorithm with coarse-to-fine stages. For the algorithm to succeed, the original classifier pool must be sufficiently diverse, so this paper produces a large classifier pool by combining several different classification algorithms with many feature subsets. The aim of the coarse selection is to reduce the size of the classifier pool with little sacrifice in recognition performance. The fine selection finds a near-optimal ensemble using genetic algorithms, and a hybrid genetic algorithm with improved search capability is also proposed. The experiments use worldwide handwritten numeral databases, and the results show that the proposed algorithm outperforms conventional ones.
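The fine selection stage can be sketched as a genetic algorithm over bit masks, where bit c says whether classifier c joins the ensemble. The fitness used here (majority-vote accuracy on a validation set minus a small penalty per member) and the elitist truncation scheme are assumptions for illustration; the paper's coarse clustering stage and its hybrid GA are not reproduced.

```python
import random
from collections import Counter

def vote_accuracy(mask, preds, labels):
    """Majority-vote accuracy of the classifiers switched on in `mask`.
    preds[c][i] is classifier c's prediction for validation sample i."""
    members = [c for c, on in enumerate(mask) if on]
    if not members:
        return 0.0
    correct = 0
    for i, y in enumerate(labels):
        # Ties go to the label seen first, i.e. the lowest-indexed member.
        vote = Counter(preds[c][i] for c in members).most_common(1)[0][0]
        correct += vote == y
    return correct / len(labels)

def ga_select(preds, labels, pop=20, gens=40, p_mut=0.1, seed=0):
    """Search ensemble masks with one-point crossover, bit-flip mutation, elitism."""
    rng = random.Random(seed)
    n = len(preds)                                   # needs at least 2 classifiers
    def fitness(m):
        return vote_accuracy(m, preds, labels) - 0.001 * sum(m)   # favour small ensembles
    population = [[rng.random() < 0.5 for _ in range(n)] for _ in range(pop)]
    for _ in range(gens):
        population.sort(key=fitness, reverse=True)
        elite = population[: pop // 2]               # best half survives unchanged
        children = []
        while len(elite) + len(children) < pop:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n)
            child = a[:cut] + b[cut:]
            children.append([(not g) if rng.random() < p_mut else g for g in child])
        population = elite + children
    return max(population, key=fitness)
```

Because the best mask is always carried into the next generation, the returned fitness never decreases across generations.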

Feature Selection for Case-Based Reasoning using the Order of Selection and Elimination Effects of Individual Features (개별 속성의 선택 및 제거효과 순위를 이용한 사례기반 추론의 속성 선정)

  • 이재식;이혁희
    • Journal of Intelligence and Information Systems / v.8 no.2 / pp.117-137 / 2002
  • A CBR (case-based reasoning) system solves new problems by adapting the solutions that were used to solve old problems. Past cases are retained in the case base, each in a specific form determined by its features, which are selected to represent the case as well as possible. Similar cases are retrieved by comparing feature values and calculating similarity scores; therefore, the performance of CBR depends on the selected feature subset. In this research, we measured the Selection Effect and the Elimination Effect of each feature. The Selection Effect is measured by performing CBR with only that one feature, and the Elimination Effect is measured by performing CBR with all features except that one. Feature subsets are then selected based on the rankings of these effects. The resulting CBR system showed better accuracy and efficiency than CBR using all features.
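Both effects can be sketched with a minimal CBR that retrieves the single most similar case (1-nearest neighbour, Euclidean distance) and scores it by leave-one-out accuracy. The retrieval rule, the accuracy measure, and defining the Elimination Effect as the accuracy drop when one feature is removed are assumptions; the paper's similarity function and its exact subset selection rule are not given in the abstract.

```python
import math

def loo_accuracy(X, y, features):
    """Leave-one-out accuracy of 1-nearest-neighbour retrieval (a minimal CBR)
    using only the feature indices in `features`."""
    if not features:
        return 0.0
    correct = 0
    for i, (xi, yi) in enumerate(zip(X, y)):
        nearest = min((j for j in range(len(X)) if j != i),
                      key=lambda j: math.dist([xi[f] for f in features],
                                              [X[j][f] for f in features]))
        correct += y[nearest] == yi
    return correct / len(X)

def effects(X, y):
    """Return (selection, elimination) dicts keyed by feature index."""
    all_feats = list(range(len(X[0])))
    base = loo_accuracy(X, y, all_feats)
    # Selection effect: accuracy when the feature is used alone.
    selection = {f: loo_accuracy(X, y, [f]) for f in all_feats}
    # Elimination effect: accuracy lost when only this feature is removed
    # (negative means the case base works better without it).
    elimination = {f: base - loo_accuracy(X, y, [g for g in all_feats if g != f])
                   for f in all_feats}
    return selection, elimination
```

Sorting features by either dictionary gives the two rankings from which a subset would be chosen.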

Analysis for Jamming Accident on Emergency Escape through the Bottleneck under High Density Condition (과밀상태하의 병목구간에서 피난 시 보행자 압사사고 해석)

  • Song, Gyeong-Won;Park, Jun-Young
    • Proceedings of the Korea Institute of Fire Science and Engineering Conference / 2011.11a / pp.490-493 / 2011
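  • With the growth of performance and sports culture, large numbers of unspecified people frequently gather in confined areas, and crowd-crush accidents involving them have become correspondingly frequent. Such accidents occur around the world, in India, Japan, Germany and elsewhere; in Korea, where they also occur frequently, they are dismissed simply as a matter of safety insensitivity, and no particular scientific analysis has been carried out. These crush accidents are characterized by being driven by pedestrian psychology and by forces arising from physical collisions. Therefore, this study analyzes pedestrian evacuation flow through a bottleneck under high-density conditions using the discrete element method (DEM) combined with collective behavior psychology. The exit width, the exit angle, and a Panic Factor representing the degree of pedestrian confusion were chosen as the study's variables.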
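As an illustration of how a panic parameter and physical contact forces can enter a particle-based crowd model, the sketch below borrows the panic-blended desired direction from Helbing et al.'s social force model and pairs it with a linear-spring normal contact force of the kind typical in DEM. Whether the authors' Panic Factor is defined this way is an assumption, and the stiffness `k` is an illustrative value, not one from the paper.

```python
import math

def desired_direction(own, neighbours, panic):
    """Blend a pedestrian's own preferred direction with the mean direction of
    nearby pedestrians (herding). panic=0: individual; panic=1: pure herding.
    Assumed form, following the social-force panic parameter."""
    mx = sum(d[0] for d in neighbours) / len(neighbours) if neighbours else own[0]
    my = sum(d[1] for d in neighbours) / len(neighbours) if neighbours else own[1]
    x = (1 - panic) * own[0] + panic * mx
    y = (1 - panic) * own[1] + panic * my
    norm = math.hypot(x, y)
    return (x / norm, y / norm) if norm > 0 else (0.0, 0.0)

def contact_force(p1, r1, p2, r2, k=1.2e5):
    """Linear-spring normal force on disc 1 (centre p1, radius r1) from disc 2.
    Zero when the discs do not overlap; k is an illustrative stiffness."""
    dx, dy = p1[0] - p2[0], p1[1] - p2[1]
    d = math.hypot(dx, dy)
    overlap = r1 + r2 - d
    if overlap <= 0 or d == 0:
        return (0.0, 0.0)
    return (k * overlap * dx / d, k * overlap * dy / d)
```

In a full simulation these two pieces would feed each pedestrian's equation of motion at every time step, together with the walls of the bottleneck treated as fixed contact partners.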

Face Recognition by Using Principal Component Analysis of Unsupervised Learning (자율학습의 PCA를 이용한 얼굴인식)

  • Cho Yong-Hyun;Cha Joo-Hee
    • Proceedings of the Korea Information Processing Society Conference / 2004.11a / pp.583-586 / 2004
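  • In this paper, we propose a face recognition technique using principal component analysis (PCA), which has the property of unsupervised learning. It exploits PCA's useful property of transforming large volumes of input data into a set of statistically independent features, thereby removing redundant signals. Simulations on 20 images of 320×243 pixels selected from the Yale face database confirmed good compression performance as a function of the number of principal components, and good classification performance in recognition under the city-block, Euclidean, and negative angle (cosine) distance measures.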
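The recognition pipeline described above can be sketched in pure Python: PCA by power iteration with deflation, projection onto the components, and nearest-neighbour matching under the three distance measures. The sketch builds the full d×d covariance matrix, so it only suits short vectors; for 320×243 images one would normally use the n×n Gram-matrix trick or a numerical library. Function names and the toy gallery are hypothetical.

```python
import math
import random

def pca(X, k, iters=200, seed=0):
    """Top-k principal components of the rows of X via power iteration + deflation."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    mean = [sum(r[j] for r in X) / n for j in range(d)]
    Xc = [[r[j] - mean[j] for j in range(d)] for r in X]
    C = [[sum(r[a] * r[b] for r in Xc) / n for b in range(d)] for a in range(d)]
    comps = []
    for _ in range(k):
        v = [rng.random() - 0.5 for _ in range(d)]          # random start vector
        for _ in range(iters):
            w = [sum(C[a][b] * v[b] for b in range(d)) for a in range(d)]
            norm = math.sqrt(sum(x * x for x in w)) or 1.0
            v = [x / norm for x in w]
        lam = sum(v[a] * C[a][b] * v[b] for a in range(d) for b in range(d))
        C = [[C[a][b] - lam * v[a] * v[b] for b in range(d)] for a in range(d)]  # deflate
        comps.append(v)
    return mean, comps

def project(x, mean, comps):
    """Coefficients of the mean-centred vector x on each component."""
    return [sum((xi - mi) * ci for xi, mi, ci in zip(x, mean, c)) for c in comps]

DISTANCES = {
    "city-block": lambda a, b: sum(abs(x - y) for x, y in zip(a, b)),
    "euclidean": lambda a, b: math.dist(a, b),
    # Negative cosine: smaller means more similar, like the other two.
    "negative-angle": lambda a, b: -sum(x * y for x, y in zip(a, b))
                                   / (math.hypot(*a) * math.hypot(*b) or 1.0),
}

def identify(probe, gallery, mean, comps, metric="euclidean"):
    """Return the name of the gallery entry closest to probe in PCA space."""
    q = project(probe, mean, comps)
    dist = DISTANCES[metric]
    return min(gallery, key=lambda item: dist(q, project(item[1], mean, comps)))[0]
```

Keeping fewer components than the data dimension is what provides the compression the abstract measures; the number of components trades reconstruction quality against storage.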

Faults Current Discrimination Using FCM (FCM을 이용한 고장전류의 판별에 관한 연구)

  • Jeong, Jong-Won;Ji, Suk-Joon;Lee, Joon-Tark;Kim, Kwang-Back
    • Proceedings of the KIPE Conference / 2007.07a / pp.458-460 / 2007
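  • The hidden layer of an RBF network is a clustering layer that partitions a given data set into clusters of similar data. Here, similarity is decided by measuring the distance between vectors in the feature vector space of the input data: a vector lying within a fixed radius of a cluster is assigned to that cluster, and one lying outside it is assigned to a different cluster. However, clustering with a fixed radius has the drawback that it can select the wrong cluster, so the way the hidden layer is determined strongly affects the overall efficiency of the RBF network. In this paper, we therefore use the fuzzy C-means (FCM) clustering algorithm to determine the hidden layer efficiently. To analyze the characteristics of fault currents and determine and classify their causes, we classified line current data obtained from power-system fault recorders using FCM, and were able to discriminate various fault modes.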
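A minimal fuzzy C-means sketch, assuming numeric feature vectors have already been extracted from the current records (the abstract does not say which features were used). It alternates the standard FCM updates: cluster centres as membership-weighted means, and memberships u_ij = 1 / Σ_l (d_ij / d_il)^(2/(m-1)) with fuzzifier m. The toy data in the usage note are illustrative, not fault-current measurements.

```python
import math
import random

def fcm(data, c, m=2.0, iters=100, tol=1e-6, seed=0):
    """Fuzzy C-means. Returns (centres, U) where U[i][j] is the degree to which
    sample i belongs to cluster j; each row of U sums to 1."""
    rng = random.Random(seed)
    n, d = len(data), len(data[0])
    # Random initial memberships, each row normalized to sum to 1.
    U = []
    for _ in range(n):
        row = [rng.random() for _ in range(c)]
        s = sum(row)
        U.append([u / s for u in row])
    for _ in range(iters):
        # Centres: means weighted by membership raised to the fuzzifier m.
        centres = []
        for j in range(c):
            w = [U[i][j] ** m for i in range(n)]
            sw = sum(w)
            centres.append([sum(w[i] * data[i][k] for i in range(n)) / sw for k in range(d)])
        # Memberships: inverse relative distance to every centre.
        new_U = []
        for i in range(n):
            dists = [max(math.dist(data[i], centres[j]), 1e-12) for j in range(c)]
            new_U.append([1.0 / sum((dists[j] / dists[l]) ** (2 / (m - 1)) for l in range(c))
                          for j in range(c)])
        shift = max(abs(a - b) for r1, r2 in zip(U, new_U) for a, b in zip(r1, r2))
        U = new_U
        if shift < tol:                              # memberships have settled
            break
    return centres, U
```

For example, `fcm([(0.0,), (0.2,), (0.1,), (5.0,), (5.1,), (4.9,)], 2)` separates the two groups; unlike the fixed-radius rule criticized above, every sample keeps a graded membership in every cluster, and a hard label is simply the cluster with the largest membership.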