• Title/Summary/Keyword: Machine Learning

Oversampling-Based Ensemble Learning Methods for Imbalanced Data (불균형 데이터 처리를 위한 과표본화 기반 앙상블 학습 기법)

  • Kim, Kyung-Min;Jang, Ha-Young;Zhang, Byoung-Tak
    • KIISE Transactions on Computing Practices, v.20 no.10, pp.549-554, 2014
  • Handwritten character recognition data is usually imbalanced because it is collected from natural-language sentences written by different writers. Imbalanced data can seriously degrade the performance of most machine learning algorithms. However, this problem is typically ignored in handwritten character recognition, because most of the difficulty is attributed to the high variance in the data set and the similar shapes of characters. We propose oversampling-based ensemble learning methods to solve the imbalanced data problem in handwritten character recognition and to improve recognition accuracy. We also show empirically that the proposed method improves the recognition accuracy of minor classes as well as the overall recognition accuracy.
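The abstract does not say which oversampling scheme or ensemble construction the authors use. As a rough illustration only, the sketch below uses plain random oversampling (duplicating minority-class samples until every class matches the largest one) and draws one independently oversampled training set per ensemble member. The function names, the `seed` parameter, and `n_members` are assumptions made for this example.

```python
import random
from collections import defaultdict


def random_oversample(samples, labels, seed=0):
    """Duplicate minority-class samples at random until every class
    has as many samples as the largest class (assumed scheme)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)
    target = max(len(xs) for xs in by_class.values())

    out_x, out_y = [], []
    for y, xs in by_class.items():
        # Draw the missing samples with replacement from the same class.
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        for x in xs + extra:
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y


def balanced_training_sets(samples, labels, n_members=3):
    """Build one independently oversampled training set per ensemble member."""
    return [random_oversample(samples, labels, seed=i) for i in range(n_members)]
```

Each balanced set would then be used to train one classifier, with the ensemble combining their predictions, for example by majority vote.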

Forecasting of Short Term Photovoltaic Generation by Various Input Model in Supervised Learning (지도학습에서 다양한 입력 모델에 의한 초단기 태양광 발전 예측)

  • Jang, Jin-Hyuk;Shin, Dong-Ha;Kim, Chang-Bok
    • Journal of Advanced Navigation Technology, v.22 no.5, pp.478-484, 2018
  • This study predicts sunshine, solar radiation, and solar power generation using hourly weather data such as temperature, precipitation, wind direction, wind speed, humidity, cloudiness, sunshine, and solar radiation. The input/output pattern is the most important factor in supervised-learning prediction, but because it must be chosen by a human, it has to be determined through repeated experiments. This study proposes four input/output patterns for sunshine and solar radiation prediction. In addition, we predict solar power generation using the predicted sunshine and solar radiation data together with power generation data from the Youngam solar power plant in Jeollanamdo. In the experiments, model 4 gave the best results for sunshine and solar radiation prediction: its sunshine RMSE was 1.5 times lower and its solar radiation RMSE 3 times lower than those of model 1. Model 4 also gave the best results for solar power generation prediction, with an RMSE 2.7 times lower than that of model 1.
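For reference, the RMSE comparison between models can be computed as in the minimal sketch below. It reads "N times less" as the ratio of the baseline RMSE to the candidate RMSE, which is an interpretation rather than something the abstract states. The helper name `improvement_factor` is made up for this example.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def improvement_factor(baseline_rmse, candidate_rmse):
    """How many times smaller the candidate's RMSE is than the baseline's."""
    return baseline_rmse / candidate_rmse
```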

Word Sense Disambiguation of Predicate using Semi-supervised Learning and Sejong Electronic Dictionary (세종 전자사전과 준지도식 학습 방법을 이용한 용언의 어의 중의성 해소)

  • Kang, Sangwook;Kim, Minho;Kwon, Hyuk-chul;Oh, Jyhyun
    • KIISE Transactions on Computing Practices, v.22 no.2, pp.107-112, 2016
  • The Sejong Electronic (machine-readable) Dictionary, developed by the 21st century Sejong Plan, contains systematically organized information on Korean words. It helps to solve problems encountered in the electronic formatting of the still commonly used hard-copy dictionary. The Sejong Electronic Dictionary, however, has limitations related to sentence structure and selection-restricted nouns. This paper discusses the limitations of word-sense disambiguation (WSD) that uses subcategorization information suggested by the Sejong Electronic Dictionary and generalized selection-restricted nouns from the Korean Lexico-semantic network. An alternative method that uses semi-supervised learning, the chi-square test, and some other means to make WSD decisions is presented herein.

Semantic Document-Retrieval Based on Markov Logic (마코프 논리 기반의 시맨틱 문서 검색)

  • Hwang, Kyu-Baek;Bong, Seong-Yong;Ku, Hyeon-Seo;Paek, Eun-Ok
    • Journal of KIISE:Computing Practices and Letters, v.16 no.6, pp.663-667, 2010
  • A simple approach to semantic document-retrieval is to measure document similarity based on the bag-of-words representation, e.g., cosine similarity between two document vectors. However, such a syntactic method hardly considers the semantic similarity between documents, often producing semantically unsound search results. We circumvent this problem by combining supervised machine learning techniques with ontology information based on Markov logic. Specifically, Markov logic networks are learned from similarity-tagged documents with an ontology representing the diverse relationships among words. The learned Markov logic networks, the ontology, and the training documents are applied to the semantic document-retrieval task by inferring similarities between a query document and the training documents. Through experimental evaluation on real-world question-answering data, the proposed method has been shown to outperform the simple cosine similarity-based approach in terms of retrieval accuracy.
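The baseline mentioned in this abstract, cosine similarity over bag-of-words vectors, can be sketched as follows. Whitespace tokenization and lowercasing are simplifying assumptions made here, and the sketch does not attempt the Markov logic approach itself.

```python
import math
from collections import Counter


def cosine_similarity(doc_a, doc_b):
    """Cosine similarity between the word-count vectors of two documents."""
    a = Counter(doc_a.lower().split())
    b = Counter(doc_b.lower().split())
    # Only words present in both documents contribute to the dot product.
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Two documents that share no surface words, such as "car" and "automobile", score zero under this measure even though they are near-synonyms. This is the kind of semantic blind spot the abstract describes.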

A Fusion Method of Co-training and Label Propagation for Prediction of Bank Telemarketing (은행 텔레마케팅 예측을 위한 레이블 전파와 협동 학습의 결합 방법)

  • Kim, Aleum;Cho, Sung-Bae
    • Journal of KIISE, v.44 no.7, pp.686-691, 2017
  • Telemarketing has become central to marketing in the information society. Recently, machine learning has been applied in many areas, especially financial prediction. Financial data is mostly unlabeled, and labeling it by hand is difficult. In this paper, we propose a fusion method of semi-supervised learning that automatically labels unlabeled data for telemarketing prediction. Specifically, we integrate the labeling results of label propagation and co-training with a decision tree. Data with low reliability are removed, and only the data given the same label by both labeling methods are extracted. After these are added to the training set, a decision tree is trained on the combined set. To confirm the usefulness of the proposed method, we conduct experiments with a real telemarketing dataset from a Portuguese bank. The accuracy of the proposed method is 83.39%, which is 1.82% higher than that of the conventional method, and its precision is 19.37%, which is 2.67% higher than that of the conventional method. A t-test confirms that the proposed method performs better.
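The filtering step described above (drop low-reliability items, keep only items on which both labelers agree) might look like the sketch below. The `(label, confidence)` input format and the `min_conf` threshold are assumptions made for illustration, since the abstract does not specify how reliability is measured.

```python
def select_consistent(pred_a, pred_b, min_conf=0.8):
    """Return (index, label) pairs for unlabeled items that both labelers
    assign the same label with confidence of at least min_conf.

    pred_a, pred_b: lists of (label, confidence) tuples for the same items,
    e.g. from label propagation and from co-training (assumed format).
    """
    selected = []
    for idx, ((label_a, conf_a), (label_b, conf_b)) in enumerate(zip(pred_a, pred_b)):
        if label_a == label_b and min(conf_a, conf_b) >= min_conf:
            selected.append((idx, label_a))
    return selected
```

The selected items would then be added to the labeled training set before the final decision tree is trained.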

Event Sentence Extraction for Online Trend Analysis (온라인 동향 분석을 위한 이벤트 문장 추출 방안)

  • Yun, Bo-Hyun
    • The Journal of the Korea Contents Association, v.12 no.9, pp.9-15, 2012
  • Conventional event sentence extraction research does not learn the 3W features in the learning step and applies a rule on whether the 3W features exist in the extraction step. This paper presents a sentence-weight-based event sentence extraction method that calculates the weights of the 3W features in the learning step and applies those weights in the extraction step. The experimental results show that keeping the top 30% of features by $TF \times IDF$ weight works well for feature filtering. In the real estate domain of public issues, the performance of the sentence-weight-based method is improved by the who and when features of 3W. Moreover, in the same domain, the sentence-weight-based method outperforms the other machine-learning-based extraction methods.
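Feature filtering by TF×IDF weight can be illustrated as below. The exact weighting formula used in the paper is not given in the abstract. This sketch assumes corpus-wide term frequency multiplied by `log(N / df)`, whitespace tokenization, and alphabetical tie-breaking, all of which are choices made only for this example.

```python
import math
from collections import Counter


def top_tfidf_features(docs, keep_ratio=0.3):
    """Rank terms by (corpus term frequency) x log(N / document frequency)
    and keep the top keep_ratio fraction (at least one term)."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    # Document frequency counts each term once per document.
    df = Counter(w for toks in tokenized for w in set(toks))
    tf = Counter(w for toks in tokenized for w in toks)
    scores = {w: tf[w] * math.log(n_docs / df[w]) for w in tf}
    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    n_keep = max(1, math.ceil(len(ranked) * keep_ratio))
    return ranked[:n_keep]
```

A term that appears in every document gets an IDF of zero and falls to the bottom of the ranking, so it is filtered out first.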

Fast and All-Purpose Area-Based Imagery Registration Using ConvNets (ConvNet을 활용한 영역기반 신속/범용 영상정합 기술)

  • Baek, Seung-Cheol
    • Journal of KIISE, v.43 no.9, pp.1034-1042, 2016
  • Together with machine-learning frameworks, area-based imagery registration techniques can be easily applied to diverse types of image pairs without predefined features and feature descriptors. However, feature detectors are often used to quickly identify candidate image patch pairs, limiting the applicability of these registration techniques. In this paper, we propose a ConvNet (Convolutional Network), "Dart", that provides not only the matching metric between patches but also information about their distance, which helps reduce the search space of corresponding patch pairs. In addition, we propose a ConvNet, "Fad", that identifies the patches that are difficult for Dart, in order to improve registration accuracy. These two networks were implemented using deep learning with a number of training instances generated from a few registered image pairs, and were successfully applied to a simple image registration problem, suggesting that this line of research is promising.

Simulated Annealing for Two-Agent Scheduling Problem with Exponential Job-Dependent Position-Based Learning Effects (작업별 위치기반 지수학습 효과를 갖는 2-에이전트 스케줄링 문제를 위한 시뮬레이티드 어닐링)

  • Choi, Jin Young
    • Journal of the Korea Society for Simulation, v.24 no.4, pp.77-88, 2015
  • In this paper, we consider a two-agent single-machine scheduling problem with exponential job-dependent position-based learning effects. The objective is to minimize the total weighted completion time of one agent with the restriction that the makespan of the other agent cannot exceed an upper bound. First, we propose a branch-and-bound algorithm that finds an optimal solution, developing some dominance/feasibility properties and a lower bound. Second, we design an efficient simulated annealing (SA) algorithm to search for a near-optimal solution, considering six different SAs to generate initial solutions. We show the superior performance of the suggested SA in a numerical experiment. Specifically, using the paired t-test, we verify that there is no significant difference in percentage errors between the different SAs considered. Furthermore, we confirm that the random generation method is better than the others for agent A, whereas the initial solution method for agent B did not affect the percentage errors.
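To make the SA idea concrete, here is a heavily simplified sketch rather than the authors' algorithm. It handles only one agent's objective (total weighted completion time) and omits the upper bound on the other agent's makespan. It assumes a learning model in which a job in 0-based position `r` takes `proc[j] * rate[j] ** r` time units, uses a random initial solution with a pairwise-swap neighborhood, and applies geometric cooling. All names and parameter values are illustrative assumptions.

```python
import math
import random


def total_weighted_completion(order, proc, weight, rate):
    """Sum of weight[j] * completion_time[j] over the sequence `order`.

    Assumed learning model: a job at 0-based position r takes
    proc[j] * rate[j] ** r time units, with 0 < rate[j] <= 1.
    """
    clock, total = 0.0, 0.0
    for pos, j in enumerate(order):
        clock += proc[j] * rate[j] ** pos
        total += weight[j] * clock
    return total


def anneal(proc, weight, rate, temp=10.0, cooling=0.95, steps=500, seed=0):
    """Simulated annealing over job permutations (needs at least two jobs)."""
    rng = random.Random(seed)
    current = list(range(len(proc)))
    rng.shuffle(current)  # random initial solution
    cur_cost = total_weighted_completion(current, proc, weight, rate)
    best, best_cost = current[:], cur_cost
    for _ in range(steps):
        # Neighbor: swap two randomly chosen positions.
        i, j = rng.sample(range(len(current)), 2)
        cand = current[:]
        cand[i], cand[j] = cand[j], cand[i]
        cand_cost = total_weighted_completion(cand, proc, weight, rate)
        delta = cand_cost - cur_cost
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, cur_cost = cand, cand_cost
            if cur_cost < best_cost:
                best, best_cost = current[:], cur_cost
        temp = max(temp * cooling, 1e-9)  # geometric cooling with a floor
    return best, best_cost
```

Handling the second agent would require either rejecting neighbors that violate its makespan bound or adding a penalty term to the cost.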

Review of Author Name Disambiguation Techniques for Citation Analysis (인용분석에서의 모호한 저자명 식별을 위한 방법들에 관한 고찰)

  • Kim, Hyun-Jung
    • Journal of the Korean BIBLIA Society for library and Information Science, v.23 no.3, pp.5-17, 2012
  • In citation analysis, author names are often used as the unit of analysis, and some authors are indexed under the same name in the bibliographic databases from which citation counts are obtained. There are many techniques for author name disambiguation, using supervised, unsupervised, or semi-supervised learning algorithms. Unsupervised approaches use machine learning algorithms to extract the necessary bibliographic information from large-scale databases and digital libraries, while supervised approaches use manually built training datasets to cluster author groups and combine them with learning algorithms for author name disambiguation. The study examines various techniques for author name disambiguation in the hope of finding an aid to improve the precision of citation counts in citation analysis, as well as to obtain better results in information retrieval.

An Efficient Multi-Attribute Negotiation System using Learning Agents for Reciprocity (상호 이익을 위한 학습 에이전트 기반의 효율적인 다중 속성 협상 시스템)

  • Park, Sang-Hyun;Yang, Sung-Bong
    • The KIPS Transactions:PartD, v.11D no.3, pp.731-740, 2004
  • In this paper, we propose a fast negotiation agent system that guarantees the reciprocity of the participants in a bilateral negotiation in e-commerce. The proposed negotiation agent system exploits an incremental learning method based on an artificial neural network to generate a counter-offer, and is trained on the previous offer that was rejected by the other party. During a negotiation, the software agents acting on behalf of a buyer and a seller negotiate with each other by considering multiple attributes of a product. The experimental results show that the proposed negotiation system achieves better agreements than other negotiation agent systems operated under a realistic and practical environment. Furthermore, the proposed system carries out negotiations about twenty times faster on average than the previous negotiation systems.