• Title/Summary/Keyword: Supervised learning

Korean Word Sense Disambiguation using Dictionary and Corpus (사전과 말뭉치를 이용한 한국어 단어 중의성 해소)

  • Jeong, Hanjo;Park, Byeonghwa
    • Journal of Intelligence and Information Systems / v.21 no.1 / pp.1-13 / 2015
  • As opinion mining in big data applications has gained attention, a great deal of research on unstructured data has been conducted. Social media on the Internet generate unstructured or semi-structured data every second, usually written in the natural languages we use in daily life. Many words in human languages have multiple meanings or senses. As a result, it is very difficult for computers to extract useful information from these datasets. Traditional web search engines are usually based on keyword search, which can return results far from users' intentions. Even though much progress has been made in recent years in improving search engines so that they return appropriate results, there is still much room for improvement. Word sense disambiguation can play a very important role in natural language processing and is considered one of the most difficult problems in this area. Major approaches to word sense disambiguation can be classified as knowledge-based, supervised corpus-based, and unsupervised corpus-based approaches. This paper presents a method that automatically generates a corpus for word sense disambiguation by taking advantage of examples in existing dictionaries, avoiding expensive sense-tagging processes. It evaluates the effectiveness of the method with a Naïve Bayes model, a supervised learning algorithm, using the Korean standard unabridged dictionary and the Sejong Corpus. The Korean standard unabridged dictionary has approximately 57,000 sentences. The Sejong Corpus has about 790,000 sentences tagged with both part-of-speech and senses. In the experiments, the Korean standard unabridged dictionary and the Sejong Corpus were evaluated both in combination and as separate entities using cross validation. Only nouns, the target subjects in word sense disambiguation, were selected.
93,522 word senses among 265,655 nouns and 56,914 sentences from related proverbs and examples were additionally combined in the corpus. The Sejong Corpus was easily merged with the Korean standard unabridged dictionary because the Sejong Corpus was tagged based on sense indices defined by that dictionary. Sense vectors were formed after the merged corpus was created. Terms used in creating sense vectors were added to the named entity dictionary of a Korean morphological analyzer. Using the extended named entity dictionary, terms were extracted from the input sentences and term vectors for the sentences were created. Given the extracted term vector and the sense vector model built during the pre-processing stage, the sense-tagged terms were determined by vector space model based word sense disambiguation. In addition, this study shows the effectiveness of the corpus merged from examples in the Korean standard unabridged dictionary and the Sejong Corpus. The experiments show that better precision and recall are obtained with the merged corpus. This study suggests that the method can practically enhance the performance of Internet search engines and help determine the meaning of a sentence more accurately in natural language processing tasks pertinent to search engines, opinion mining, and text mining. The Naïve Bayes classifier used in this study is a supervised learning algorithm based on Bayes' theorem. The Naïve Bayes classifier assumes that all senses are independent. Even though this assumption is not realistic and ignores the correlation between attributes, the Naïve Bayes classifier is widely used because of its simplicity, and in practice it is known to be very effective in many applications such as text classification and medical diagnosis. However, further research needs to be carried out to consider all possible combinations and/or partial combinations of all senses in a sentence.
Also, the effectiveness of word sense disambiguation may be improved if rhetorical structures or morphological dependencies between words are analyzed through syntactic analysis.
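
To make the idea of training a Naïve Bayes sense classifier from dictionary example sentences more concrete, the following is a minimal sketch, not the authors' implementation. The toy English examples, the sense labels, the function names `train` and `disambiguate`, and the add-one (Laplace) smoothing are all assumptions introduced for illustration; the abstract does not describe the paper's features or smoothing.

```python
import math
from collections import Counter, defaultdict


def train(examples):
    """Count senses and context words from (sense, context_words) pairs."""
    sense_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for sense, words in examples:
        sense_counts[sense] += 1
        word_counts[sense].update(words)
        vocab.update(words)
    return sense_counts, word_counts, vocab


def disambiguate(context, model):
    """Return the sense with the highest Naive Bayes log-score for the context."""
    sense_counts, word_counts, vocab = model
    total = sum(sense_counts.values())
    best_sense, best_score = None, float("-inf")
    for sense, n in sense_counts.items():
        score = math.log(n / total)  # log prior of the sense
        # Add-one smoothing (an assumption) keeps unseen words from zeroing the score.
        denom = sum(word_counts[sense].values()) + len(vocab)
        for w in context:
            score += math.log((word_counts[sense][w] + 1) / denom)
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense


# Hypothetical sense-tagged example sentences, standing in for dictionary examples.
EXAMPLES = [
    ("bank/finance", ["deposit", "money", "account"]),
    ("bank/finance", ["loan", "interest", "money"]),
    ("bank/river", ["river", "water", "fishing"]),
    ("bank/river", ["river", "mud", "shore"]),
]
MODEL = train(EXAMPLES)
```

Calling `disambiguate(["money", "loan"], MODEL)` picks the finance sense because those context words were only counted under that label. A real system would take its context words from a morphological analyzer rather than from pre-split lists.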

SVM-Based EEG Signal for Hand Gesture Classification (서포트 벡터 머신 기반 손동작 뇌전도 구분에 대한 연구)

  • Hong, Seok-min;Min, Chang-gi;Oh, Ha-Ryoung;Seong, Yeong-Rak;Park, Jun-Seok
    • The Journal of Korean Institute of Electromagnetic Engineering and Science / v.29 no.7 / pp.508-514 / 2018
  • An electroencephalogram (EEG) measures the electrical activity generated by interactions between brain cells during brain activity, and can therefore capture the brain activity caused by hand movement. In this study, a 16-channel EEG was used to record the signals generated before and after hand movement. The measured data can be classified using a supervised learning model, a support vector machine (SVM). To shorten the learning time of the SVM, we propose feature extraction and vector dimension reduction by filtering, which minimizes the loss of motion-related information while compressing the EEG information. The classification results showed an average accuracy of 72.7% in distinguishing the sitting position from hand movement at the frontal lobe electrodes.

Feature Subset Selection in the Induction Algorithm using Sensitivity Analysis of Neural Networks (신경망의 민감도 분석을 이용한 귀납적 학습기법의 변수 부분집합 선정)

  • 강부식;박상찬
    • Journal of Intelligence and Information Systems / v.7 no.2 / pp.51-63 / 2001
  • In supervised machine learning, an induction algorithm, which extracts rules from data through learning, provides a useful tool for data mining. Practical induction algorithms are known to degrade in prediction accuracy and to generate unnecessarily complex rules when trained on data containing superfluous features. Feature subset selection is therefore needed to improve their performance. In feature subset selection for an induction algorithm, the wrapper method repeatedly runs the algorithm on the dataset using various feature subsets. However, it is impractical to search the whole space exhaustively unless the number of features is small. This study proposes a heuristic method that applies sensitivity analysis of neural networks to the wrapper method in order to generate rules with the highest possible accuracy. First, it prioritizes all features using sensitivity analysis of neural networks. Then it uses the wrapper method to search the ordered feature space. In experiments on three datasets, we show that the suggested method can select a feature subset that improves the performance of the induction algorithm within a certain number of iterations.
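
The two-step procedure (rank features, then run a wrapper over the ordered list) can be sketched as below. This is one plausible reading under stated assumptions, not the paper's algorithm: the greedy accept-if-better rule, the `max_iterations` cap, and the names `ordered_wrapper_search` and `toy_evaluate` are hypothetical. The ranking is assumed to come from a separate sensitivity analysis, and `evaluate` stands in for training and scoring the induction algorithm on a feature subset.

```python
def ordered_wrapper_search(ranked_features, evaluate, max_iterations=None):
    """Walk features in priority order, keeping each one only if accuracy improves.

    ranked_features: features sorted from most to least sensitive (assumed given).
    evaluate: callable mapping a list of features to an accuracy-like score.
    """
    selected = []
    best_score = evaluate(selected)
    for i, feature in enumerate(ranked_features):
        if max_iterations is not None and i >= max_iterations:
            break  # stop once the iteration budget is used up
        candidate = selected + [feature]
        score = evaluate(candidate)
        if score > best_score:
            selected, best_score = candidate, score
    return selected, best_score


def toy_evaluate(subset):
    """Hypothetical scorer: 'a' and 'b' help, any other feature hurts slightly."""
    useful = {"a": 0.2, "b": 0.1}
    return 0.5 + sum(useful.get(f, -0.05) for f in subset)
```

With the ranking `["a", "noise", "b"]`, the search keeps `a`, rejects `noise`, and keeps `b`, using one evaluation per feature instead of one per subset.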

Hierarchical Ann Classification Model Combined with the Adaptive Searching Strategy (적응적 탐색 전략을 갖춘 계층적 ART2 분류 모델)

  • 김도현;차의영
    • Journal of KIISE:Software and Applications / v.30 no.7_8 / pp.649-658 / 2003
  • We propose a hierarchical architecture of ART2 networks for improved performance, and a fast pattern classification model using fitness selection. This hierarchical network creates coarse clusters in the first ART2 network layer by unsupervised learning, then creates fine clusters for each first-layer cluster in the second network layer by supervised learning. First, it compares the input pattern with each cluster of the first layer and selects candidate clusters by a fitness measure. We design an optimized fitness function for pruning clusters by measuring the relative distance ratio between an input pattern and the clusters. This makes it possible to improve speed and accuracy. Next, it compares the input pattern with each cluster connected to the selected clusters and finds the winner cluster. Finally, it classifies the pattern by the label of the winner cluster. Results of our experiments show that the proposed method is faster and more accurate than other approaches.
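
The classification steps can be illustrated with a small sketch. The abstract does not give the fitness function, so the ratio used here (nearest coarse distance divided by each cluster's distance), the `ratio_threshold` parameter, and the flat lists standing in for trained ART2 layers are all assumptions made for illustration.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_candidates(pattern, coarse_centers, ratio_threshold=0.5):
    """Keep first-layer clusters whose relative distance ratio passes the threshold."""
    distances = [euclidean(pattern, c) for c in coarse_centers]
    nearest = min(distances)
    candidates = []
    for index, d in enumerate(distances):
        # Assumed fitness: 1.0 for the nearest cluster, smaller for farther ones.
        fitness = 1.0 if d == 0 else nearest / d
        if fitness >= ratio_threshold:
            candidates.append(index)
    return candidates


def classify(pattern, coarse_centers, fine_clusters, ratio_threshold=0.5):
    """Search only the fine clusters under the candidates; return the winner's label."""
    best_label, best_distance = None, float("inf")
    for index in select_candidates(pattern, coarse_centers, ratio_threshold):
        for center, label in fine_clusters[index]:
            d = euclidean(pattern, center)
            if d < best_distance:
                best_label, best_distance = label, d
    return best_label


# Hypothetical two-layer structure: one list of labeled fine clusters per coarse cluster.
COARSE = [(0.0, 0.0), (10.0, 10.0), (1.0, 1.0)]
FINE = [
    [((0.0, 0.0), "A")],
    [((10.0, 10.0), "B")],
    [((1.0, 1.5), "C")],
]
```

For the pattern `(0.8, 1.2)` with `ratio_threshold=0.15`, the distant cluster at `(10, 10)` is pruned before any second-layer comparison, which is where the speed gain would come from.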

Anomaly Detection of Generative Adversarial Networks considering Quality and Distortion of Images (이미지의 질과 왜곡을 고려한 적대적 생성 신경망과 이를 이용한 비정상 검출)

  • Seo, Tae-Moon;Kang, Min-Guk;Kang, Dong-Joong
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.20 no.3 / pp.171-179 / 2020
  • Recently, studies have shown that convolutional neural networks achieve the best performance in image classification, object detection, and image generation. Vision-based defect inspection, which is more economical than other defect inspection methods, is very important for factory automation. Although supervised anomaly detection algorithms have far exceeded the performance of traditional machine learning based methods, they are inefficient in real industrial settings because of the tedious annotation work they require. In this paper, we propose ADGAN, an unsupervised anomaly detection architecture using the variational autoencoder and the generative adversarial network, which give great results in image generation tasks, and examine whether the proposed network architecture identifies anomalous images well on the MNIST benchmark dataset as well as on our own welding defect dataset.

Vision-Based Vehicle Detection and Tracking Using Online Learning (온라인 학습을 이용한 비전 기반의 차량 검출 및 추적)

  • Gil, Sung-Ho;Kim, Gyeong-Hwan
    • The Journal of Korean Institute of Communications and Information Sciences / v.39A no.1 / pp.1-11 / 2014
  • In this paper we propose a system for vehicle detection and tracking that learns, online, the appearance changes of the vehicles being tracked. The proposed system uses a feature-based tracking method to estimate the motion of newly detected vehicles between consecutive frames rapidly and robustly. Simultaneously, the system trains an online vehicle detector for the tracked vehicles. If the tracker fails, it is re-initialized from the detections of the online vehicle detector. An improved update rule for the vehicle appearance model is presented to increase the tracking performance and speed of the proposed system. The performance of the proposed system is evaluated on a dataset acquired in various driving environments. In particular, the experimental results show that vehicle tracking performance is significantly improved under adverse conditions such as entering a tunnel and driving through rain.

An Emerging Technology Trend Identifier Based on the Citation and the Change of Academic and Industrial Popularity (학계와 산업계의 정보 대중성 변동과 인용 정보에 기반한 최신 기술 동향 식별 시스템)

  • Kim, Seonho;Lee, Junkyu;Rasheed, Waqas;Yeo, Woondong
    • Journal of Korea Technology Innovation Society / v.14 no.spc / pp.1171-1186 / 2011
  • Identifying emerging technology trends is crucial for decision makers in nations and organizations so that limited resources, such as time and money, can be used efficiently. Many researchers have proposed emerging trend detection systems based on a popularity analysis of documents, but these systems still need improvement. In this paper, an emerging trend detection classifier is proposed that uses both academic and industrial data, namely SCOPUS and PATSTAT. Unlike most previous research, our emerging technology trend classifier utilizes supervised, semi-automatic machine learning techniques to improve the precision of the results. In addition, the citation information in the SCOPUS data is analyzed to identify the early signals of emerging technology trends.

Optimization of Structure-Adaptive Self-Organizing Map Using Genetic Algorithm (유전자 알고리즘을 사용한 구조적응 자기구성 지도의 최적화)

  • 김현돈;조성배
    • Journal of the Korean Institute of Intelligent Systems / v.11 no.3 / pp.223-230 / 2001
  • Since the self-organizing map (SOM) preserves the topological ordering of the input space and trains itself with an unsupervised algorithm, it is used in many areas. However, SOM has a shortcoming: its structure cannot be easily determined without much trial and error. The structure-adaptive self-organizing map (SASOM), which can adapt its structure as well as its weights, overcomes this shortcoming: SASOM uses its structure adaptation capability to place the nodes of prototype vectors accurately in the pattern space so as to make the decision boundaries as close to the class boundaries as possible. In this scheme, the initialization of the weights of newly adapted nodes is important. This paper proposes a method that optimizes SASOM with a genetic algorithm (GA) to determine the weight vector of a newly split node. The learning algorithm is a hybrid of an unsupervised learning method and a supervised learning method using the LVQ algorithm. The proposed method not only shows higher performance than SASOM in terms of recognition rate and variation, but also preserves the topological order of input patterns well. Experiments with 2D pattern space data and a handwritten digit database show that the proposed method is promising.
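
The supervised part of the hybrid learning relies on LVQ. As a reminder of what that step does, here is a minimal sketch of the basic LVQ1 update; the abstract does not say which LVQ variant or learning rate the paper uses, so the function name `lvq1_step` and the `lr` parameter are assumptions for illustration.

```python
def lvq1_step(prototypes, labels, x, y, lr=0.1):
    """Apply one LVQ1 update in place and return the index of the winning prototype.

    The nearest prototype moves toward x if its label matches y, and away otherwise.
    """
    winner = min(
        range(len(prototypes)),
        key=lambda k: sum((p - xi) ** 2 for p, xi in zip(prototypes[k], x)),
    )
    sign = 1.0 if labels[winner] == y else -1.0
    prototypes[winner] = [
        p + sign * lr * (xi - p) for p, xi in zip(prototypes[winner], x)
    ]
    return winner
```

For example, with prototypes `[[0.0, 0.0], [1.0, 1.0]]` labeled `["a", "b"]`, presenting `x = [0.2, 0.0]` with label `"a"` and `lr=0.5` pulls the first prototype halfway toward the sample.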

An Incremental Method Using Sample Split Points for Global Discretization (전역적 범주화를 위한 샘플 분할 포인트를 이용한 점진적 기법)

  • 한경식;이수원
    • Journal of KIISE:Software and Applications / v.31 no.7 / pp.849-858 / 2004
  • Most supervised learning algorithms can be applied only after continuous variables are transformed into categorical ones at the preprocessing stage, in order to avoid the difficulty of processing continuous variables. This preprocessing stage, called global discretization, uses class distribution lists called bins. However, when the data are large and the range of the variable to be discretized is very wide, many sorting and merging operations must be performed to produce a single bin, because most global discretization methods need a single bin. Also, if new data are added, existing methods have to perform discretization from scratch to construct the categories influenced by the data, because they perform discretization in batch mode. This paper proposes a method that extracts sample points and performs discretization from these sample points in order to solve these problems. Because the approach does not require merging to produce a single bin, it is efficient when large data need to be discretized. In this study, experiments using real and synthetic datasets were conducted to compare the proposed method with an existing one.

A Study on the Minimum Error Entropy - related Criteria for Blind Equalization (블라인드 등화를 위한 최소 에러 엔트로피 성능기준들에 관한 연구)

  • Kim, Namyong;Kwon, Kihyun
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.2 no.3 / pp.87-95 / 2009
  • As information theoretic learning techniques, the error entropy minimization criterion (MEE) and the maximum cross correntropy criterion (MCC) have been studied in depth for supervised learning. The MEE criterion leads to maximization of the information potential, and the MCC criterion leads to maximization of the cross correlation between output and input random processes. A weighted combination of these two criteria, namely minimization of error entropy with fiducial points (MEEF), has been introduced and developed by many researchers. As an approach to unsupervised, blind channel equalization, we investigate the possibility of applying the constant modulus error (CME) to the MEE criterion, along with some problems of the method. We also study the application of CME to MEEF for blind equalization and find that MEE-CME loses the information of the constant modulus. This causes MEE-CME and MEEF-CME either not to converge or to converge more slowly than other algorithms that depend on the constant modulus.
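
The criteria named in this abstract are usually written in the following kernel-based forms. This is a sketch in common information theoretic learning notation, not taken from the paper: the error samples $e_i$, the Gaussian kernel $G_\sigma$ with width $\sigma$, the sample count $N$, the weight $\lambda$, the equalizer output $y_i$, and the modulus constant $R_2$ are all assumed symbols, and the paper's exact definitions may differ.

```latex
% Assumed notation: e_i = error sample, G_\sigma = Gaussian kernel of width \sigma,
% N = number of samples, y_i = equalizer output, R_2 = constant modulus.
\begin{align}
V_{\mathrm{MEE}} &= \frac{1}{N^2}\sum_{i=1}^{N}\sum_{j=1}^{N} G_\sigma(e_i - e_j)
  && \text{(information potential, maximized under MEE)} \\
V_{\mathrm{MCC}} &= \frac{1}{N}\sum_{i=1}^{N} G_\sigma(e_i)
  && \text{(correntropy of the error, maximized under MCC)} \\
J_{\mathrm{MEEF}} &= \lambda\, V_{\mathrm{MCC}} + (1-\lambda)\, V_{\mathrm{MEE}},
  \qquad 0 \le \lambda \le 1 \\
e_i^{\mathrm{CME}} &= |y_i|^2 - R_2 \\
e_i^{\mathrm{CME}} - e_j^{\mathrm{CME}} &= |y_i|^2 - |y_j|^2
\end{align}
```

Under these assumed definitions, the last line shows why MEE-CME loses the constant modulus: the information potential depends only on pairwise error differences, and $R_2$ cancels in every such difference.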
