• Title/Summary/Keyword: Decision trees

Decision Tree Induction with Imbalanced Data Set: A Case of Health Insurance Bill Audit in a General Hospital (불균형 데이터 집합에서의 의사결정나무 추론: 종합 병원의 건강 보험료 청구 심사 사례)

  • Hur, Joon;Kim, Jong-Woo
    • Information Systems Review
    • /
    • v.9 no.1
    • /
    • pp.45-65
    • /
    • 2007
  • In the medical industry, health insurance bill audit is a unique and essential process in general hospitals. The process is very important, not only for a hospital's profit but also for its reputation. Particularly at large general hospitals, many workers, including analysts and nurses, are engaged in the health insurance bill audit process. This paper introduces a case of health insurance bill audit for finding reducible bill cases using decision tree induction techniques at a large general hospital in Korea. When supervised learning methods were applied, one of the major problems was the data imbalance in the audit data: there were many normal (passing) cases and a relatively small number of reduction cases in the dataset. To resolve the problem, well-known methods for imbalanced data sets, including oversampling of rare cases, undersampling of major cases, and adjusting the misclassification cost, are combined in several ways to find decision trees that satisfy the conditions required in the health insurance bill audit situation.
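The oversampling of rare cases mentioned in the abstract can be sketched in plain Python; the `label` field names and the `ratio` parameter here are illustrative assumptions, not taken from the paper:

```python
import random

def rebalance(cases, minority_label, ratio=1.0, seed=0):
    """Oversample the rare (reduction) class by random duplication until it
    reaches `ratio` times the size of the majority (passing) class."""
    rng = random.Random(seed)
    minority = [c for c in cases if c["label"] == minority_label]
    majority = [c for c in cases if c["label"] != minority_label]
    target = int(len(majority) * ratio)
    extra = [rng.choice(minority) for _ in range(max(0, target - len(minority)))]
    return majority + minority + extra

# 95 passing cases vs. 5 reduction cases, balanced to a 1:1 ratio
data = [{"label": "pass"}] * 95 + [{"label": "reduce"}] * 5
balanced = rebalance(data, "reduce")
```

Undersampling works symmetrically (randomly dropping majority cases), and the two are often combined with a class-dependent misclassification cost, as the paper does.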

Refining Rules of Decision Tree Using Extended Data Expression (확장형 데이터 표현을 이용하는 이진트리의 룰 개선)

  • Jeon, Hae Sook;Lee, Won Don
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.18 no.6
    • /
    • pp.1283-1293
    • /
    • 2014
  • In a ubiquitous environment, data change rapidly and new data arrive as time passes, and sometimes all past data are lost if there is not sufficient space in memory. Therefore, there is a need to make rules and combine them with new data, both to avoid losing all the past data and to deal with large amounts of data. In making decision trees and extracting rules, the weight of each rule is generally determined by the number of instances of each class at the leaf. The computational problem of finding a minimum finite-state acceptor compatible with given data is NP-hard, so we assume that the extracted rules are not exact and may lose some information. Because of this precondition, this paper presents a new approach for refining rules that controls the weights of rules from previous knowledge or data. In solving rule refinement, this paper tries to make a variety of rules with a pruning method based on majority and minority properties, to control the weight of each rule, and to observe the resulting change in performance. The decision tree classifier with extended data expression having static weight is used for this study. Experiments show that performance with the new rule-refining policy may improve.
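The leaf-count weighting that the abstract describes as the usual baseline can be sketched as follows; this is a minimal illustration of the conventional scheme, not the paper's extended data expression:

```python
def rule_weight(leaf_counts):
    """Weight of an extracted rule, taken from its leaf's class counts:
    the fraction of leaf instances belonging to the majority class."""
    total = sum(leaf_counts.values())
    return max(leaf_counts.values()) / total

# A leaf reached by 8 positive and 2 negative training instances
w = rule_weight({"yes": 8, "no": 2})
```

The paper's refinement then adjusts these static weights as new data arrive, rather than keeping them fixed.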

Smart monitoring system with multi-criteria decision using a feature based computer vision technique

  • Lin, Chih-Wei;Hsu, Wen-Ko;Chiou, Dung-Jiang;Chen, Cheng-Wu;Chiang, Wei-Ling
    • Smart Structures and Systems
    • /
    • v.15 no.6
    • /
    • pp.1583-1600
    • /
    • 2015
  • When natural disasters occur, including earthquakes, tsunamis, and debris flows, they are often accompanied by various types of damage such as collapsed buildings, broken bridges and roads, and the destruction of natural scenery. Natural disaster detection and warning is an important issue, which could help to reduce the incidence of serious damage to life and property as well as provide information for search and rescue afterwards. In this study, we propose a novel feature-based computer vision technique for debris flow detection that can be used to construct a debris flow event warning system. The landscape is composed of various elements, including trees, rocks, and buildings, which are characterized by their features, shapes, positions, and colors. Unlike traditional methods, our analysis relies on changes in the natural scenery, which influence changes to the features. The "background module" and "monitoring module" procedures are designed and used to detect debris flows and construct an event warning system. The multi-criteria decision-making method used to construct the event warning system includes gradient information and the percentage of variation of the features. To prove the feasibility of the proposed method, some real cases of debris flows are analyzed. The natural environment is simulated and an event warning system is constructed to warn of debris flows. Debris flows are successfully detected using these two procedures by analyzing the variation in the detected and matched features, and the feasibility of the event warning system is proven using the simulation method. The feature-based method is therefore found to be useful for detecting debris flows, and the event warning system is triggered when debris flows occur.
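The "percentage of variation of the features" criterion can be illustrated with a set-based sketch; the feature identifiers are assumed to be hashable descriptors, and this is an illustrative simplification of the paper's matching step, not its exact criterion:

```python
def variation_ratio(baseline_features, current_features):
    """Fraction of background-module features that are no longer matched
    by the monitoring module in the current frame; a large jump in this
    ratio suggests the scenery has changed (e.g., a debris flow)."""
    matched = len(baseline_features & current_features)
    return 1.0 - matched / len(baseline_features)

# 2 of 4 baseline features still match the current frame
r = variation_ratio({"f1", "f2", "f3", "f4"}, {"f1", "f2"})
```

An event warning would then combine this ratio with the gradient information mentioned in the abstract before triggering an alarm.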

Performances analysis of football matches (축구경기의 경기력분석)

  • Min, Dae Kee;Lee, Young-Soo;Kim, Yong-Rae
    • Journal of the Korean Data and Information Science Society
    • /
    • v.26 no.1
    • /
    • pp.187-196
    • /
    • 2015
  • The teams' performances were analyzed by evaluating the scores gained by their offense and the scores allowed by their defense. To evaluate each team's attacking and defending abilities, we also considered the factors that contributed to the team's points or the opposing team's points. To analyze the outcome of the games, three prediction models were used: decision trees, logistic regression, and discriminant analysis. As a result, the factors associated with defense showed a decisive influence in determining the game results. Analyzing offense and defense as the response variables showed that the major factors predicting offense were non-stop passes and attack speed, and the major factors predicting defense were the distance between the right and left players and the distance between the front-line attackers and the rearmost defenders during the game.

Spam-Filtering by Identifying Automatically Generated Email Accounts (자동 생성 메일계정 인식을 통한 스팸 필터링)

  • Lee Sangho
    • Journal of KIISE:Software and Applications
    • /
    • v.32 no.5
    • /
    • pp.378-384
    • /
    • 2005
  • In this paper, we describe a novel spam-filtering method that improves the performance of conventional spam-filtering systems. Conventional systems filter emails by investigating the word distribution in email headers or bodies. Nowadays, spammers create email accounts at web-based email service sites and send emails as if they were not spam. Investigating the email accounts of those spams, we noticed a large difference between automatically generated accounts and ordinary ones. Based on that difference, incoming emails are classified into spam/non-spam classes. To classify emails from account strings alone, we used decision trees, which have generally been used for conventional pattern classification problems. We collected about 2.15 million account strings from email service sites, and our account checker achieved an accuracy of 96.3%. A conventional filter system combined with the checker yielded improved filtering performance.
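Before a decision tree can classify account strings, each string must be turned into numeric features. The features below (length, digit ratio, character diversity) are plausible illustrations of the kind of difference between generated and ordinary accounts, not the paper's actual feature set:

```python
def account_features(name):
    """Simple numeric features of an email account string; automatically
    generated accounts tend to have more digits and higher character
    diversity than human-chosen ones."""
    digits = sum(ch.isdigit() for ch in name)
    return {
        "length": len(name),
        "digit_ratio": digits / len(name),
        "distinct_ratio": len(set(name)) / len(name),
    }

generated = account_features("qx7k29zp1")   # looks machine-generated
ordinary = account_features("john.smith")   # looks human-chosen
```

A decision tree trained on such feature vectors would then learn thresholds (e.g., on `digit_ratio`) that separate the two populations.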

Efficient DRG Fraud Candidate Detection Method Using Data Mining Techniques (데이터마이닝 기법을 이용한 효율적인 DRG 확인심사대상건 검색방법)

  • Lee, Jung-Kyu;Jo, Min-Woo;Park, Ki-Dong;Lee, Moo-Song;Lee, Sang-Il;Kim, Chang-Yup;Kim, Yong-Ik;Hong, Du-Ho
    • Journal of Preventive Medicine and Public Health
    • /
    • v.36 no.2
    • /
    • pp.147-152
    • /
    • 2003
  • Objectives: To develop a Diagnosis-Related Group (DRG) fraud candidate detection method using data mining techniques, and to examine the efficiency of the developed method. Methods: The study included 79,790 DRGs and their related claims from 8 disease groups (lens procedures with or without vitrectomy, tonsillectomy and/or adenoidectomy only, appendectomy, Cesarean section, vaginal delivery, anal and/or perianal procedures, inguinal and/or femoral hernia procedures, uterine and/or adnexa procedures for nonmalignancy), which were examined manually during a 32-month period. To construct an optimal prediction model, 38 variables were applied, and the correction rates and lift values of 3 models (decision tree, logistic regression, neural network) were compared. The analyses were performed separately by disease group. Results: The correction rates of the developed method were 15.4 to 81.9% according to disease group, with an overall correction rate of 60.7%. The lift values were 1.9 to 7.3 according to disease group, with an overall lift value of 4.1. Conclusions: These findings suggest that applying data mining techniques is necessary to improve the efficiency of DRG fraud candidate detection.
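The lift value reported in the abstract compares the fraud rate among flagged claims to the base fraud rate. A minimal sketch, assuming the standard confusion-matrix definition (the counts below are made up for illustration, not the study's data):

```python
def lift(tp, fp, fn, tn):
    """Lift of a fraud-candidate detector: the precision among flagged
    claims divided by the base rate of fraud across all claims."""
    flagged_precision = tp / (tp + fp)
    base_rate = (tp + fn) / (tp + fp + fn + tn)
    return flagged_precision / base_rate

# Hypothetical counts: 100 flagged claims, 60 of them true frauds,
# out of 1,000 claims containing 100 frauds in total
value = lift(tp=60, fp=40, fn=40, tn=860)  # 0.6 / 0.1 = 6.0
```

A lift of 4.1, as in the study, means the flagged subset is about four times richer in fraud candidates than a random sample of claims.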

The detection of cavitation in hydraulic machines by use of ultrasonic signal analysis

  • Gruber, P.;Farhat, M.;Odermatt, P.;Etterlin, M.;Lerch, T.;Frei, M.
    • International Journal of Fluid Machinery and Systems
    • /
    • v.8 no.4
    • /
    • pp.264-273
    • /
    • 2015
  • This paper describes an experimental approach to the detection of cavitation in hydraulic machines using ultrasonic signal analysis. Instead of using the high-frequency pulses (typically 1 MHz) only for transit-time measurement, various other signal characteristics are extracted from the individual signals and their correlation functions with reference signals in order to gain knowledge of the water conditions. As the pulse repetition rate is high (typically 100 Hz), statistical parameters can be extracted from the signals. The idea is to find patterns in the parameters with a classifier that can distinguish between the different water states. This classification scheme has been applied to different cavitation sections: a sphere in a water flow in a circular tube at the HSLU in Lucerne, a NACA profile in a cavitation tunnel, and two Francis model test turbines, both at LMH in Lausanne. From the raw signal data, several statistical parameters in the time and frequency domains, as well as from the correlation function with reference signals, have been determined. Two classifiers were used: neural feed-forward networks and decision trees. For both classification methods, realizations with the lowest possible complexity are of special interest. It is shown that two to three signal characteristics, two from the signal itself and one from the correlation function, are in many cases sufficient for detection. The final goal is to combine these results with operating-point, vibration, acoustic emission, and dynamic pressure information so that a distinction between dangerous and non-dangerous cavitation is possible.
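Per-pulse statistical parameters in the time domain, of the kind fed to the classifiers, might look like the following sketch; the specific features chosen here (RMS, standard deviation, peak) are common choices and an assumption on our part, not the paper's exact list:

```python
import statistics

def pulse_features(signal):
    """Time-domain statistical features of one ultrasonic pulse, usable as
    classifier input for distinguishing water states."""
    n = len(signal)
    return {
        "rms": (sum(x * x for x in signal) / n) ** 0.5,
        "std": statistics.pstdev(signal),
        "peak": max(abs(x) for x in signal),
    }

features = pulse_features([0.1, -0.4, 0.9, -0.2, 0.05])
```

With a 100 Hz repetition rate, such features accumulate quickly per operating point, so even a low-complexity decision tree on two or three of them can be trained and evaluated cheaply.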

Real-time Estimation on Service Completion Time of Logistics Process for Container Vessels (선박 물류 프로세스의 실시간 서비스 완료시간 예측에 대한 연구)

  • Yun, Shin-Hwi;Ha, Byung-Hyun
    • The Journal of Society for e-Business Studies
    • /
    • v.17 no.2
    • /
    • pp.149-163
    • /
    • 2012
  • Logistics systems provide their service to customers by coordinating resources of limited capacity throughout interrelated underlying processes. To maintain a high level of service under such complicated conditions, it is essential to carry out real-time monitoring and continuous management of the logistics processes. In this study, we propose a method for estimating the completion time of key processes based on process-state information collected in real time. We first identify the factors that influence process completion time by modeling and analyzing an influence diagram, and then suggest algorithms for quantifying those factors. We consider container terminal logistics and the process of discharging and loading containers for a vessel. The remaining service time of a vessel is estimated using a decision tree learned from historical data. We validated the estimation model using a container terminal simulation. The proposed model is expected to improve the competitiveness of logistics systems by forecasting service completion in real time, as well as to prevent the waste of resources.
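The basic step a decision tree repeats when estimating a numeric target such as remaining service time can be sketched as a one-level regression stump; the feature (remaining container moves) and the numbers are illustrative assumptions, not the study's model:

```python
def stump_estimate(history, x, threshold):
    """One-level regression tree: the estimated remaining service time (hours)
    is the mean completion time of past vessels whose feature value falls on
    the same side of the split as the current vessel's value `x`."""
    side = [t for f, t in history if (f <= threshold) == (x <= threshold)]
    return sum(side) / len(side)

# (remaining container moves, observed remaining service time in hours)
history = [(10, 2.0), (12, 2.5), (40, 8.0), (50, 9.0)]
estimate = stump_estimate(history, x=45, threshold=20)  # mean of 8.0 and 9.0
```

A full decision tree recursively picks the split that best reduces prediction error and applies this averaging at each leaf, which is what the learned tree in the study does over many process-state factors.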

How different is a web site that many people visit?-focused on the Plastic Surgery Websites in Korea (많은 사람이 방문하는 웹 사이트는 무엇이 다를까? - 2011년 성형외과 웹 사이트의 경우 -)

  • Cho, Yeong-Bin;Kim, Chae-Bogk
    • Management & Information Systems Review
    • /
    • v.32 no.1
    • /
    • pp.43-62
    • /
    • 2013
  • In order to identify the characteristics of web sites that many people visit, 37 high-visit plastic surgery websites were compared to 69 benchmark sites in the same industry. We selected 36 website attributes that can be measured objectively from existing studies and composed a data set of 36 attributes by 106 websites. For the analysis, Multiple Discriminant Analysis (MDA) and the decision tree technique were applied to find the attributes that clearly divide the two groups. The results show that the dividing attributes fall into 3 categories: 'Community', 'Mobile', and 'Up to date'. Thus, we conclude that high-visit plastic surgery web sites are community-centric rather than content-centric, respond rapidly to changes in the mobile environment, and are kept tightly up to date. The methodology employed in this study provides an efficient way of improving the satisfaction of plastic surgery website visitors.

Extracting Specific Information in Web Pages Using Machine Learning (머신러닝을 이용한 웹페이지 내의 특정 정보 추출)

  • Lee, Joung-Yun;Kim, Jae-Gon
    • Journal of Korean Society of Industrial and Systems Engineering
    • /
    • v.41 no.4
    • /
    • pp.189-195
    • /
    • 2018
  • With the advent of the digital age, the production and distribution of web pages has been exploding, and Internet users frequently need to extract specific information they want from these vast web pages. However, it takes a lot of time and effort for users to find specific information in many web pages. While commonly used search engines provide users with web pages containing the information they are looking for, additional time and effort are required to find the specific information among extensive search results. Therefore, it is necessary to develop algorithms that can automatically extract specific information from web pages. Every year, thousands of international conferences are held all over the world. Each international conference has a website that provides general information such as the date of the event, the venue, a greeting, the abstract submission deadline, the registration date, etc. It is not easy for researchers to catch the abstract submission deadline quickly because it is displayed in various formats from conference to conference and is frequently updated. This study focuses on extracting abstract submission deadlines from international conference websites. We use three machine learning models, SVM, decision trees, and artificial neural networks, to develop algorithms that extract the abstract submission deadline from an international conference website. The performance of the suggested algorithms is evaluated using 2,200 conference websites.
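A pipeline like the one described typically first gathers candidate date strings from page text before a trained classifier (SVM, decision tree, or ANN) scores which candidate is the abstract deadline. The regex below is a hedged sketch of that candidate-generation step under the assumption of two common English date formats; it is not the paper's actual extraction method:

```python
import re

# Matches "15 March 2018" and "March 15, 2018" style date strings
DATE_PAT = re.compile(r"(\d{1,2}\s+[A-Za-z]+\s+\d{4}|[A-Za-z]+\s+\d{1,2},\s*\d{4})")

def date_candidates(text):
    """Candidate date strings in page text; each candidate would then be
    scored by a trained classifier using its surrounding context
    (e.g., proximity to the word 'deadline')."""
    return [m.group(0) for m in DATE_PAT.finditer(text)]

candidates = date_candidates("Abstract submission deadline: March 15, 2018")
```

Feeding context features of each candidate (nearby keywords, position on the page, markup) to the three classifiers then reduces the problem to ordinary supervised classification.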