• Title/Abstract/Keyword: Business Classification Systems

341 search results (processing time 0.095 seconds)

Medical Diagnosis Problem Solving Based on the Combination of Genetic Algorithms and Local Adaptive Operations (유전자 알고리즘 및 국소 적응 오퍼레이션 기반의 의료 진단 문제 자동화 기법 연구)

  • Lee, Ki-Kwang;Han, Chang-Hee
    • Journal of Intelligence and Information Systems / Vol. 14, No. 2 / pp. 193-206 / 2008
  • Medical diagnosis can be considered a classification task that assigns disease types from patient condition data represented by a set of pre-defined attributes. This study proposes a hybrid genetic algorithm based classification method for multidimensional pattern classification problems related to medical decision making. The classification problem is solved by identifying separation boundaries that distinguish the various classes in the data pattern. The proposed method fits a finite number of regional agents to the data pattern by combining genetic algorithms and local adaptive operations. The local adaptive operations of an agent include expansion, avoidance, and relocation, one of which is performed according to the agent's fitness value. The classifier system was tested on well-known medical data sets from the UCI machine learning database, showing performance superior to other methods such as nearest neighbor, decision trees, and neural networks.
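
The fitness-driven choice among the three local operations can be sketched in Python. The region shape (an axis-aligned box), the fitness thresholds, and the exact update rules below are assumptions for illustration, not the paper's actual operators:

```python
import random

# Hypothetical "regional agent" covering an axis-aligned box in attribute
# space; the local operation applied each step depends on its fitness.
class RegionAgent:
    def __init__(self, center, radius, label):
        self.center = list(center)   # midpoint of the covered region
        self.radius = radius         # half-width per dimension
        self.label = label           # class the region predicts

    def covers(self, x):
        return all(abs(xi - ci) <= self.radius for xi, ci in zip(x, self.center))

    def fitness(self, points):
        """Fraction of covered points whose class matches this agent's label."""
        covered = [y for x, y in points if self.covers(x)]
        return sum(y == self.label for y in covered) / len(covered) if covered else 0.0

    def adapt(self, points, rng=random.Random(0)):
        """Pick one local operation by fitness (thresholds are assumptions)."""
        f = self.fitness(points)
        if f >= 0.9:                       # mostly correct: expand the region
            self.radius *= 1.1
        elif f >= 0.5:                     # mixed: shrink to avoid wrong points
            self.radius *= 0.9
        else:                              # mostly wrong: relocate at random
            self.center = [c + rng.uniform(-1, 1) for c in self.center]
        return f
```

In a full hybrid scheme, a genetic algorithm would evolve a population of such agents while each generation also applies these local operations.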


Terms Based Sentiment Classification for Online Review Using Support Vector Machine (Support Vector Machine을 이용한 온라인 리뷰의 용어기반 감성분류모형)

  • Lee, Taewon;Hong, Taeho
    • Information Systems Review / Vol. 17, No. 1 / pp. 49-64 / 2015
  • Customer reviews, which contain subjective opinions about products or services in online stores, are generated rapidly, and their influence on customers has become immense due to the widespread use of SNS. A number of studies have focused on opinion mining to analyze positive and negative opinions and obtain better solutions for customer support and sales. Selecting the key terms that reflect customer sentiment in reviews is very important for opinion mining. We propose a document-level, terms-based sentiment classification model that selects optimal terms using part-of-speech tags. SVMs (support vector machines) are used to build a predictor for opinion mining, and the combination of POS tags and four term extraction methods is used for the feature selection of the SVM. To validate the proposed opinion mining model, we applied it to customer reviews on Amazon. After crawling 80,000 reviews, we eliminated meaningless terms (stopwords) and extracted useful terms using a part-of-speech tagging approach. The extracted terms were ranked by document frequency, TF-IDF, information gain, and the chi-squared statistic, and the top 20 ranked terms were used as features of the SVM model. Our experimental results show that the SVM model using the four POS tags outperforms the benchmark model, which was built by extracting only adjective terms. In addition, the SVM model based on the chi-squared statistic shows the best performance among the SVM models built with the four term extraction methods. The proposed opinion mining model is expected to improve customer service and help online stores gain competitive advantage.
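
The chi-squared term ranking used above for feature selection can be sketched as follows. The 2x2 contingency formulation is standard; the toy representation of a review as a set of terms is an assumption for illustration:

```python
def chi2_term_score(n11, n10, n01, n00):
    """Chi-squared statistic for a 2x2 term/class contingency table.
    n11: positive docs containing the term, n10: negative docs containing it,
    n01: positive docs without it, n00: negative docs without it."""
    n = n11 + n10 + n01 + n00
    num = n * (n11 * n00 - n10 * n01) ** 2
    den = (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
    return num / den if den else 0.0

def rank_terms(docs, top_k=20):
    """Rank vocabulary terms by chi-squared; docs is [(set_of_terms, label)]."""
    vocab = set().union(*(terms for terms, _ in docs))
    pos = [t for t, y in docs if y == 1]
    neg = [t for t, y in docs if y == 0]
    scores = {}
    for w in vocab:
        n11 = sum(w in d for d in pos)
        n10 = sum(w in d for d in neg)
        scores[w] = chi2_term_score(n11, n10, len(pos) - n11, len(neg) - n10)
    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```

Terms that occur evenly across both classes score near zero, while class-discriminating terms rise to the top of the ranking.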

Feasibility of Deep Learning Algorithms for Binary Classification Problems (이진 분류문제에서의 딥러닝 알고리즘의 활용 가능성 평가)

  • Kim, Kitae;Lee, Bomi;Kim, Jong Woo
    • Journal of Intelligence and Information Systems / Vol. 23, No. 1 / pp. 95-108 / 2017
  • Recently, AlphaGo, the Baduk (Go) artificial intelligence program by Google DeepMind, won a landmark victory over Lee Sedol. Many people thought machines would never defeat a human at Go because, unlike chess, the number of possible move sequences exceeds the number of atoms in the universe, but the result was the opposite of what people predicted. After the match, artificial intelligence came into focus as a core technology of the fourth industrial revolution and attracted attention from various application domains. In particular, deep learning, the core artificial intelligence technique used in the AlphaGo algorithm, drew wide interest. Deep learning is already being applied to many problems and shows especially good performance in image recognition. It also performs well on high-dimensional data such as voice, images, and natural language, where it was difficult to obtain good performance with existing machine learning techniques. In contrast, deep learning research on traditional business data and structured data analysis is hard to find. In this study, we investigate whether deep learning techniques can be used not only for recognizing high-dimensional data but also for binary classification problems in traditional business data analysis, such as customer churn analysis, marketing response prediction, and default prediction, and we compare their performance with that of traditional artificial neural network models. The experimental data are the telemarketing response data of a bank in Portugal, with input variables such as age, occupation, loan status, and the number of previous telemarketing contacts, and a binary target variable recording whether the customer intends to open an account.
To evaluate deep learning algorithms and techniques on a binary classification problem, we compared models using the CNN and LSTM algorithms and the dropout technique, which are widely used in deep learning, against MLP models, a traditional artificial neural network. Because not all network design alternatives can be tested, the experiment was conducted with restricted settings for the number of hidden layers, the number of neurons per hidden layer, the number of output channels (filters), and the application of dropout. The F1 score was used to evaluate the models, since it shows how well a model classifies the class of interest rather than overall accuracy. The details of applying each deep learning technique are as follows. The CNN algorithm reads adjacent values around a specific value and extracts features, but adjacency carries little meaning in business data, where fields are usually independent. We therefore set the CNN filter size to the number of fields so that the whole record is learned at once, and added a hidden layer to make decisions based on the additional features. In the model with two LSTM layers, the second layer reads the input in the reverse direction of the first, to reduce the influence of each field's position. For dropout, each hidden-layer neuron was dropped with probability 0.5. The experimental results show that the model with the highest F1 score was the CNN model with dropout, followed by the MLP model with two hidden layers and dropout.
The experiments yielded several findings. First, models using dropout make slightly more conservative predictions than those without it and generally classify better. Second, CNN models outperform MLP models; this is interesting because CNNs performed well not only in fields where their effectiveness is proven but also in binary classification problems, to which they have rarely been applied. Third, the LSTM algorithm appears unsuitable for binary classification problems because its training time is too long relative to the performance improvement. From these results, we confirm that some deep learning algorithms can be applied to solve business binary classification problems.
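
The F1 score used as the evaluation metric above is the harmonic mean of precision and recall for the class of interest; a minimal sketch:

```python
def f1_score(y_true, y_pred, positive=1):
    """F1 = harmonic mean of precision and recall for the positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0
```

On imbalanced data such as telemarketing responses, a model that predicts the majority class everywhere can score high accuracy yet earn an F1 of zero, which is why the study prefers F1 over overall accuracy.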

Towards Improving Causality Mining using BERT with Multi-level Feature Networks

  • Ali, Wajid;Zuo, Wanli;Ali, Rahman;Rahman, Gohar;Zuo, Xianglin;Ullah, Inam
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 16, No. 10 / pp. 3230-3255 / 2022
  • Causality mining is a significant area of interest in NLP that benefits many everyday applications, including decision making, business risk management, question answering, future event prediction, scenario generation, and information retrieval. Mining causalities was a challenging, open problem for prior statistical and non-statistical techniques over web sources, which required hand-crafted linguistic patterns for feature engineering, depended on domain knowledge, and demanded much human effort. Those studies focused on explicit causality and overlooked implicit, ambiguous, and heterogeneous causality. In contrast, we present Bidirectional Encoder Representations from Transformers (BERT) integrated with Multi-level Feature Networks (MFN), called BERT+MFN, for causality recognition in noisy and informal web datasets without human-designed features. In our model, MFN consists of a three-column knowledge-oriented network (TC-KN), a bi-LSTM, and a Relation Network (RN) that mine causality information at the segment level, while BERT captures semantic features at the word level. We perform experiments on Alternative Lexicalization (AltLexes) datasets, and the experimental outcomes show that our model outperforms baseline causality and text mining techniques.

Anomaly Detection Model Based on Semi-Supervised Learning Using LIME: Focusing on Semiconductor Process (LIME을 활용한 준지도 학습 기반 이상 탐지 모델: 반도체 공정을 중심으로)

  • Kang-Min An;Ju-Eun Shin;Dong Hyun Baek
    • Journal of Korean Society of Industrial and Systems Engineering / Vol. 45, No. 4 / pp. 86-98 / 2022
  • Recently, many studies have applied machine learning models to semiconductor manufacturing process data to improve quality. However, in the semiconductor manufacturing process the proportion of good products is much higher than that of defective products, so data imbalance is a serious problem for machine learning. In addition, since the data have a very large number of features, it is important to extract only the important features in order to increase accuracy and utility. This study proposes an anomaly detection methodology that learns well despite the data imbalance and high dimensionality of semiconductor process data: the LIME algorithm is applied after the SMOTE and RFECV methods. The methodology analyzes the classification results of the anomaly classification model, detects the cause of each anomaly, and identifies the semiconductor process steps requiring action. Its applicability and feasibility were confirmed through case applications.
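
The SMOTE step of the pipeline above oversamples the minority (defective) class by interpolating between neighbouring minority points. A minimal sketch, where the neighbour count, sample count, and seed are illustrative choices rather than the paper's settings:

```python
import random

def smote(minority, k=2, n_new=4, rng=random.Random(42)):
    """Minimal SMOTE sketch: each synthetic sample is an interpolation
    between a minority point and one of its k nearest minority neighbours."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)                       # random minority point
        neighbours = sorted((p for p in minority if p is not base),
                            key=lambda p: dist(base, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()                                # interpolation factor in [0, 1)
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, nb)))
    return synthetic
```

Because every synthetic point lies on a segment between two real minority points, the oversampled set stays inside the minority region rather than duplicating instances verbatim.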

Exploring Potential Application Industry for Fintech Technology by Expanding its Terminology: Network Analysis and Topic Modelling Approach (용어 확장을 통한 핀테크 기술 적용가능 산업의 탐색 :네트워크 분석 및 토픽 모델링 접근)

  • Park, Mingyu;Jeon, Byeongmin;Kim, Jongwoo;Geum, Youngjung
    • The Journal of Society for e-Business Studies / Vol. 26, No. 1 / pp. 1-28 / 2021
  • FinTech has been discussed as an important business area for technology-driven financial innovation. The term fintech, a combination of finance and technology, refers to ICT applied across all areas of finance. The popularity of the fintech industry has increased significantly over time, with substantial investment in and support for numerous startups, so both academia and practice have tried to analyze trends in the fintech area. However, previous research has limitations in collecting relevant databases for fintech and identifying proper application areas. In response, this study proposes a new method for analyzing trends in fintech fields by expanding fintech terminology and using network analysis and topic modeling. A new fintech terminology list was created, and a total of 18,341 patents were collected from the USPTO over a 10-year period. Co-classification analysis and network analysis were conducted to identify technological trends across patent classifications, and topic modeling was conducted to analyze the content of fintech. This study is expected to help both managers and investors who want to be involved in technology-driven financial services seize new fintech technology opportunities.
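
The co-classification analysis above builds a network whose edge weights count how often two patent classification codes appear on the same patent. A minimal sketch; the CPC-style codes in the test are made up for illustration:

```python
from itertools import combinations
from collections import Counter

def co_classification_edges(patents):
    """Count how often each pair of classification codes co-occurs on a patent;
    the counts become edge weights of a co-classification network.
    `patents` is an iterable of code lists, one list per patent."""
    edges = Counter()
    for codes in patents:
        # sort so each unordered pair maps to one canonical key
        for a, b in combinations(sorted(set(codes)), 2):
            edges[(a, b)] += 1
    return edges
```

The resulting weighted edge list can be loaded into any network analysis tool to compute centrality or detect clusters of related technology classes.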

A Hybrid Feature Selection Method using Univariate Analysis and LVF Algorithm (단변량 분석과 LVF 알고리즘을 결합한 하이브리드 속성선정 방법)

  • Lee, Jae-Sik;Jeong, Mi-Kyoung
    • Journal of Intelligence and Information Systems / Vol. 14, No. 4 / pp. 179-200 / 2008
  • We develop a feature selection method that can improve both the efficiency and the effectiveness of a classification technique; in this research, case-based reasoning is employed as the classification technique. This research integrates two existing feature selection methods, univariate analysis and the LVF algorithm. First, we sift predictive features from the whole feature set using univariate analysis. Then we generate all possible subsets of these predictive features and measure the inconsistency rate of each subset using the LVF algorithm. Finally, the subset with the lowest inconsistency rate is selected as the best feature subset. We measure the performance of our method on data from the UCI Machine Learning Repository and compare it with that of existing methods. Both the number of selected features and the resulting accuracy are satisfactory, demonstrating improvements in both efficiency and effectiveness.


A Model for Effective Customer Classification Using LTV and Churn Probability : Application of Holistic Profit Method (고객의 이탈 가능성과 LTV를 이용한 고객등급화 모형개발에 관한 연구)

  • Lee, HoonYoung;Yang, JooHwan;Ryu, Chi Hun
    • Journal of Intelligence and Information Systems / Vol. 12, No. 4 / pp. 109-126 / 2006
  • Effective customer classification is essential for successful customer relationship management. Customers are typically rated by proportionally allocating them into classes according to their lifetime values. However, since this method does not accurately reflect homogeneity within a class or heterogeneity between classes, misclassification can cause many problems. This paper suggests a new method of rating customers using the Holistic profit technique and validates it using customer data provided by an insurance company. Holistic profit is one of the methods used to decide the cutoff score when screening loan applications. By rating customers with the proposed technique, insurance companies can effectively perform customer relationship management and diverse marketing activities.


Analysis of the Recall Demand Pattern of Imported Cars and Application of ARIMA Demand Forecasting Model (수입자동차 리콜 수요패턴 분석과 ARIMA 수요 예측모형의 적용)

  • Jeong, Sangcheon;Park, Sohyun;Kim, Seungchul
    • Journal of Korean Society of Industrial and Systems Engineering / Vol. 43, No. 4 / pp. 93-106 / 2020
  • This research explores how imported automobile companies can develop strategies to improve the outcome of their recalls. The researchers analyzed patterns of recall demand, classified recall types based on those patterns, and examined response strategies, considering how to procure parts and induce customers to visit workshops, as well as recall execution capacity and costs. As a result, recalls are classified into four types: U-type, reverse U-type, L-type, and reverse L-type. The determinants of these types are further categorized into four types and 12 sub-types of recalls: the height of maximum demand, which indicates the volatility of recall demand; the number of peaks, which reflects the pattern of demand variation; and the tail length of the demand curve, which indicates the speed of recalls. The classification showed that L-type, or customer-driven recall, is the most common, accounting for 25 of the 36 cases, followed by five U-type, four reverse L-type, and two reverse U-type cases. Prior studies show that recall types are determined by factors influencing recall execution rates: severity, the number of cars to be recalled, recall execution rate, government policies, time since model launch, recall costs, and so on. As a component demand forecast model for automobile recalls, this study estimated three ARIMA models: ARIMA(1,0,0), ARIMA(0,0,1), and ARIMA(0,0,0). All three models are significant for all recall patterns, indicating that ARIMA is a valid predictive model for car recall patterns. Based on the classification of recall types, strategic implications for recall response are drawn for each type. The conclusion suggests implications for several aspects: how to improve the recall outcome (execution rate), customer satisfaction, brand image, recall costs, and response to the regulatory authority.
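
Of the models above, ARIMA(1,0,0) reduces to an AR(1) recurrence, y_t = c + phi * y_{t-1} + error. A minimal least-squares sketch for illustration (the paper's actual estimation procedure, typically maximum likelihood in a statistics package, is not specified here):

```python
def fit_ar1(series):
    """Least-squares fit of y_t = c + phi * y_{t-1}, i.e. ARIMA(1,0,0)
    without differencing or moving-average terms."""
    x, y = series[:-1], series[1:]          # lagged pairs (y_{t-1}, y_t)
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    phi = sxy / sxx if sxx else 0.0
    c = my - phi * mx
    return c, phi

def forecast(series, steps, c, phi):
    """Iterate the fitted recurrence forward from the last observation."""
    preds, last = [], series[-1]
    for _ in range(steps):
        last = c + phi * last
        preds.append(last)
    return preds
```

For an L-type recall, phi near zero with small c would reproduce the long low tail after the initial peak, while a larger phi captures slowly decaying demand.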

Improving the Utilization and Efficiency of B2B Online Store using DEA (DEA를 이용한 B2B 온라인 쇼핑몰 상품관리 효율성 증대 방안)

  • Gu, Seung-Hwan;Park, Hyun-Ki;Jang, Seong Yong
    • Journal of the Korea Academia-Industrial cooperation Society / Vol. 15, No. 7 / pp. 4237-4245 / 2014
  • In this study, products in a B2B online shopping mall were classified efficiently using DEA, and an operational process is presented. Using the data of company M, the workload for each category was calculated, and the product-management workload was distributed evenly using DEA. In addition, classification A, which accounts for the highest net income, was designated for central management by the company, while the workload for the lower-priority classifications B and C was reduced. Efficient operation is therefore possible when the method is applied to an actual business.
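
Full DEA solves one linear program per decision-making unit to compare multiple inputs and outputs; in the degenerate single-input, single-output case it reduces to a normalized output/input ratio, sketched below (the unit data are made up for illustration):

```python
def efficiency_scores(units):
    """Single-input, single-output efficiency: each unit's output/input ratio,
    scaled so the best unit scores 1.0. (Full DEA with multiple inputs and
    outputs requires a linear program per unit; this is the 1-D special case.)
    units: {name: (input, output)}."""
    ratios = {name: out / inp for name, (inp, out) in units.items()}
    best = max(ratios.values())
    return {name: r / best for name, r in ratios.items()}
```

Units scoring 1.0 lie on the efficiency frontier; scores below 1.0 indicate how much a category's management workload could shrink for the value it produces.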