• Title/Summary/Keyword: Classification of Information System

Response Modeling for the Marketing Promotion with Weighted Case Based Reasoning Under Imbalanced Data Distribution (불균형 데이터 환경에서 변수가중치를 적용한 사례기반추론 기반의 고객반응 예측)

  • Kim, Eunmi;Hong, Taeho
    • Journal of Intelligence and Information Systems / v.21 no.1 / pp.29-45 / 2015
  • Response modeling is a well-known research issue for researchers seeking better performance in predicting customers' responses to marketing promotions. A customer response model can reduce marketing costs by identifying prospective customers in a very large customer database and predicting the purchase intention of the selected customers, whereas a promotion derived from an undifferentiated marketing strategy incurs unnecessary costs. In addition, the big data environment has accelerated the development of response models with data mining techniques such as CBR, neural networks, and support vector machines. CBR is one of the most widely used tools in business because it is known to be simple and robust when applied to response modeling. CBR remains an attractive technique for data mining applications in business even though it has not shown high performance compared with other machine learning techniques. Thus, many studies have tried to improve CBR for business data mining, either with enhanced algorithms or with the support of other techniques such as genetic algorithms, decision trees, and AHP (Analytic Hierarchy Process). Ahn and Kim (2008) used logit, neural networks, and CBR to predict which customers would purchase the items promoted by the marketing department, and tried to optimize the number k of nearest neighbors with a genetic algorithm to improve the performance of the integrated model. Hong and Park (2009) noted that integrating logit, neural networks, and Support Vector Machine (SVM) with CBR predicted customers' responses to marketing promotions better than each individual model (logit, neural networks, or SVM). This paper presents an approach to predicting customers' responses to marketing promotions with Case Based Reasoning. The proposed model was developed by applying different weights to each feature. 
We fitted a logit model to a database containing promotion and purchase data for bath soap, and then used its coefficients to assign different feature weights in CBR. We empirically compared the performance of the proposed weighted CBR model with neural networks and a pure CBR model, and found that the proposed weighted CBR model outperformed the pure CBR model. Imbalanced data is a common problem when building data mining classification models on real data, for example in bankruptcy prediction, intrusion detection, fraud detection, churn management, and response modeling. Imbalanced data means that the number of instances in one class is remarkably small or large compared with the number of instances in the other classes. Classification models such as response models have difficulty recognizing patterns in the data during learning, because they tend to ignore the minority class while classifying the majority class correctly. Sampling is one of the most representative approaches to resolving problems caused by imbalanced data distributions, and sampling methods can be categorized into undersampling and oversampling. However, CBR is not sensitive to the data distribution because, unlike typical machine learning algorithms, it does not learn a model from the data. In this study, we investigated the robustness of our proposed model while changing the ratio of responding to nonresponding customers, because in the real world the customers who respond to a promotion are always a small fraction compared with those who do not. We simulated the proposed model 100 times with different ratios of responding to nonresponding customers to validate its robustness under imbalanced data distributions. Finally, we found that our proposed CBR model outperformed the compared models on the imbalanced data sets. 
Our study is expected to improve the performance of CBR-based response models for promotion programs under the imbalanced data distributions found in the real world.
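The weighting idea can be sketched as a weighted k-nearest-neighbor retrieval. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes each feature weight is the absolute value of its logit coefficient (the abstract says only that the coefficients were used to give different weights), a weighted Euclidean distance, and majority voting among k = 3 retrieved cases. The coefficients, features, and customer records are hypothetical.

```python
import math
from collections import Counter


def weighted_distance(a, b, weights):
    """Weighted Euclidean distance: each squared feature gap is scaled by its weight."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))


def weighted_cbr_predict(cases, labels, query, weights, k=3):
    """Retrieve the k most similar past cases and return their majority label."""
    ranked = sorted(range(len(cases)),
                    key=lambda i: weighted_distance(cases[i], query, weights))
    votes = Counter(labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical logit coefficients for three customer features.
logit_coefs = [1.8, -0.2, 0.9]
# Assumption: weight = |coefficient|, so only the magnitude of each effect matters.
weights = [abs(c) for c in logit_coefs]

# Hypothetical case base; features assumed pre-scaled to [0, 1]. 1 = responded.
cases = [[0.9, 0.1, 0.8], [0.8, 0.9, 0.7], [0.1, 0.2, 0.1],
         [0.2, 0.8, 0.3], [0.15, 0.5, 0.2]]
labels = [1, 1, 0, 0, 0]

print(weighted_cbr_predict(cases, labels, [0.85, 0.3, 0.75], weights, k=3))  # prints 1
```

Because the second feature has a small coefficient, a large gap on it barely moves the distance, which is how the logit model steers retrieval toward the features that matter most for response.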

The Risk Assessment of the Fire Occurrence According to Urban Facilities in Jinju-si (진주시 도시시설물별 화재발생 위험도 평가)

  • Bae, Gyu Han;Won, Tae Hong;Yoo, Hwan Hee
    • Journal of Korean Society for Geospatial Information Science / v.24 no.1 / pp.43-50 / 2016
  • Urbanization in Korea has increased significantly, and various facilities have rapidly concentrated in urban areas along with the growing urban population. Accordingly, damage from a variety of disasters has occurred. In particular, among social disasters, fire has caused the most severe damage in urban areas, along with traffic accidents. In 2015, 44,432 fires occurred in Korea, killing 253 people and causing property damage of 4,50 billion won. Despite efforts to reduce various kinds of damage, the danger of fire still remains high. This study therefore collected data on fires that occurred from 2007 to 2014 from the Jinju Fire Department and the National Fire Data System (NFDS), and calculated fire risk by analyzing the clustering of fire cases and facilities in Jinju-si, based on the current facility DB provided by the Ministry of Government Administration and Home Affairs. The risk of fire occurrence was classified into four grades under the standards of the US Society of Fire Protection Engineers (SFPE). Business facilities, entertainment facilities, and automobile facilities were classified in the highest grade, A; detached houses, apartment houses, education facilities, sales facilities, accommodation, set of facilities, medical facilities, industrial facilities, and life service facilities were classified as grade U; and other facilities were classified as grade EU. Finally, hazardous production facilities were classified as BEU, the lowest grade. In addition, when loss of life was used as the criterion, hazardous production facilities had the highest risk, whereas when property damage was used as the criterion, set of facilities and industrial facilities showed the highest risk. 
By classifying the city's fires by facility type and calculating risk grades for occurrence frequency, casualties, and property damage, this study is expected to be useful in establishing fire-reduction measures for facilities distributed across urban space.
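The frequency side of such a grading can be sketched as fires per facility per year, bucketed into the four grade names used above. This is a hypothetical sketch: the abstract does not give the band thresholds, so the order-of-magnitude bands below are assumptions, the facility names are placeholders, and the counts are invented. It does not cover the casualty and property-damage criteria.

```python
# Assumed order-of-magnitude frequency bands (fires per facility per year),
# highest first. The thresholds the study applied are not stated in the abstract.
BANDS = [(1e-2, "A"), (1e-4, "U"), (1e-6, "EU")]


def annual_frequency(fires, facilities, years):
    """Average number of fires per facility per year for one facility type."""
    return fires / (facilities * years)


def frequency_grade(freq):
    """Return the first band whose lower bound the frequency reaches; else BEU."""
    for lower_bound, grade in BANDS:
        if freq >= lower_bound:
            return grade
    return "BEU"


# Hypothetical counts over an 8-year window (e.g. 2007-2014): (fires, facilities).
counts = {"facility_x": (400, 2000), "facility_y": (300, 40000), "facility_z": (5, 10000)}
for name, (fires, n) in counts.items():
    f = annual_frequency(fires, n, years=8)
    print(f"{name}: {f:.2e} fires/facility/year -> grade {frequency_grade(f)}")
```

Normalizing by the number of facilities matters: a facility type with many fires can still rank low if it is very common.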

Understanding the Legal Structure of German Human Gene Testing Act (GenDG) (독일 유전자검사법의 규율 구조 이해 - 의료 목적 유전자검사의 문제를 중심으로 -)

  • Kim, Na-Kyoung
    • The Korean Society of Law and Medicine / v.17 no.2 / pp.85-124 / 2016
  • The Human Gene Testing Act (GenDG) in Germany starts from the characteristic features of genetic testing, i.e., its dualistic structure consisting of analysis on the one side and interpretation on the other. The act's linguistic distinction among 'testing', 'analysis', and 'judgment' is a fine example. Another important basis of the regulation is the law's ideological purpose, namely informational autonomy. These normative texts and this founding principle are the basis for classifying the types of testing. In particular, genetic testing for medical purposes is classified into testing for diagnostic purposes and testing for predictive purposes. However, the two types are not always clearly distinguishable, because predictive value is common to both. In the legal regulation of genetic testing, it is therefore important to manage the uncertainty and subjectivity inherent in genetic analysis and judgment. The GenDG sets up a system for ensuring the quality of analysis, and the GEKO (Gendiagnostik-Kommission, the Genetic Diagnostics Commission), based on Section 23 of the GenDG, specifies the validity criteria through guidelines. For genetic testing for medical purposes, it is also very important to set up a system that ensures the procedural rationality of the interpretation. The interpretation of analysis results varies widely because of continuing technological development on the one hand and the different understandings of the different parties who perform genetic tests on the other. The process should therefore include communication with patients so that they can understand the meaning of the genetic test and make life plans. In the GenDG, the provisions on genetic counselling and the GEKO guidelines specify this regulation very precisely. This regulatory approach of the GenDG seems highly suggestive for Korean legal policy on genetic testing.

Development of a water quality prediction model for mineral springs in the metropolitan area using machine learning (머신러닝을 활용한 수도권 약수터 수질 예측 모델 개발)

  • Yeong-Woo Lim;Ji-Yeon Eom;Kee-Young Kwahk
    • Journal of Intelligence and Information Systems / v.29 no.1 / pp.307-325 / 2023
  • During the prolonged COVID-19 pandemic, the number of people who, weary of staying indoors, visit nearby mountains and national parks to relieve depression and lethargy has surged. One place where thousands of these visitors stop to rest and catch their breath is the mineral spring. In the metropolitan area there are about 600 mineral springs, found not only in mountains and national parks but also in neighborhood parks and along trails. However, because water quality tests are irregular and manual, people drink the spring water without knowing the test results in real time. Therefore, in this study, we develop a model that can predict the quality of spring water in real time by exploring the factors affecting it and collecting data scattered across various sources. Owing to limitations in data collection, we restricted the study area to Seoul and Gyeonggi-do and obtained water quality test data from 2015 to 2020 for about 300 mineral springs in 18 cities where the data are well managed. A total of 10 factors were finally selected, after two rounds of review, from among the various factors considered to affect the suitability of mineral spring water quality. Using AutoML, an automated machine learning technology that has recently attracted attention, we derived the top 5 models by prediction performance from about 20 machine learning methods. Among them, the CatBoost model performed best, with a classification accuracy of 75.26%. In addition, SHAP analysis of the absolute influence of each variable on the predictions showed that the most important factor was whether the water had been judged nonconforming in the previous water quality test. 
The temperature on the day of inspection and the altitude of the mineral spring were also confirmed to influence whether the water quality was judged unsuitable.
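Since the strongest predictor is the outcome of the previous test, the feature has to be built per spring in date order. The following is a minimal sketch, assuming hypothetical record fields `spring_id`, `test_date` (ISO date strings), and `nonconforming`; the study's actual data layout is not described in the abstract.

```python
from itertools import groupby
from operator import itemgetter


def add_previous_result(records):
    """Attach each test's preceding result from the same spring as `prev_nonconforming`.

    records: dicts with 'spring_id', 'test_date' (ISO 'YYYY-MM-DD' string, so it
    sorts chronologically) and 'nonconforming' (bool). The first test of each
    spring gets None because there is no earlier result to look back on.
    """
    out = []
    ordered = sorted(records, key=itemgetter("spring_id", "test_date"))
    for _, tests in groupby(ordered, key=itemgetter("spring_id")):
        prev = None
        for rec in tests:
            out.append({**rec, "prev_nonconforming": prev})
            prev = rec["nonconforming"]
    return out


# Hypothetical test history for two springs.
history = [
    {"spring_id": "S2", "test_date": "2019-05-02", "nonconforming": False},
    {"spring_id": "S1", "test_date": "2019-08-10", "nonconforming": False},
    {"spring_id": "S1", "test_date": "2019-05-03", "nonconforming": True},
]
for row in add_previous_result(history):
    print(row["spring_id"], row["test_date"], row["prev_nonconforming"])
```

Looking back only at strictly earlier tests keeps the feature available at prediction time and avoids leaking the current result into the input.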

Bankruptcy prediction using an improved bagging ensemble (개선된 배깅 앙상블을 활용한 기업부도예측)

  • Min, Sung-Hwan
    • Journal of Intelligence and Information Systems / v.20 no.4 / pp.121-139 / 2014
  • Predicting corporate failure has been an important topic in accounting and finance. The costs associated with bankruptcy are high, so the accuracy of bankruptcy prediction is very important for financial institutions. Many researchers have studied bankruptcy prediction over the past three decades. This research uses ensemble models to improve the performance of bankruptcy prediction. Ensemble classification combines individually trained classifiers to obtain more accurate predictions than the individual models. Ensemble techniques have been shown to be very useful for improving a classifier's generalization ability. Bagging is one of the most commonly used methods for constructing ensemble classifiers. In bagging, different training subsets are drawn randomly with replacement from the original training dataset, and base classifiers are trained on these different bootstrap samples. Instance selection selects critical instances while removing irrelevant and harmful instances from the original set. Instance selection and bagging are both well known in data mining; however, few studies have dealt with integrating them. This study proposes an improved bagging ensemble based on instance selection using genetic algorithms (GA) to improve the performance of SVM. GA is an efficient optimization procedure based on the theory of natural selection and evolution. GA applies the idea of survival of the fittest by progressively accepting better solutions to a problem. Rather than making incremental changes to a single solution, GA searches by maintaining a population of solutions from which better solutions are created. The initial population is generated randomly and evolves into the next generation through genetic operators such as selection, crossover, and mutation. The solutions, coded as strings, are evaluated by the fitness function. 
The proposed model consists of two phases: GA-based instance selection and instance-based bagging. In the first phase, GA is used to select the optimal instance subset, which is used as the input data of the bagging model. In this study, each chromosome is encoded as a binary string representing the instance subset. In this phase, the population size was set to 100 and the maximum number of generations to 150. We set the crossover rate and mutation rate to 0.7 and 0.1, respectively. The model's prediction accuracy was used as the GA fitness function: an SVM model is trained on the training data set using the selected instance subset, and its prediction accuracy on the test data set is used as the fitness value in order to avoid overfitting. In the second phase, we used the optimal instance subset selected in the first phase as the input data of the bagging model, with SVM as the base classifier and majority voting as the combining method. This study applies the proposed model to bankruptcy prediction using a real data set of Korean companies. The research data contain 1,832 externally non-audited firms: 916 that filed for bankruptcy and 916 that did not. Financial ratios categorized as stability, profitability, growth, activity, and cash flow were investigated through a literature review and basic statistical methods, and 8 financial ratios were selected as the final input variables. The whole data set was split into three subsets: training, test, and validation. We compared the proposed model with several comparative models, including a simple individual SVM model, a simple bagging model, and an instance-selection-based SVM model. McNemar tests were used to examine whether the proposed model significantly outperforms the other models. The experimental results show that the proposed model outperforms the other models.
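The two phases can be sketched end to end in plain Python. This is an illustrative sketch, not the author's code. To stay dependency-free it substitutes a nearest-centroid classifier for the SVM base learner. Binary tournament selection, single-point crossover, per-gene bit-flip mutation, and elitism are assumed operator choices the abstract does not specify. The defaults mirror the stated population size (100), generations (150), crossover rate (0.7), and mutation rate (0.1), while the demo uses smaller values and hypothetical data so that it runs quickly.

```python
import random
from collections import Counter


# Stand-in base learner: nearest centroid (the study used SVM).
def centroid_fit(X, y):
    sums, counts = {}, Counter(y)
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, v in enumerate(row):
            acc[j] += v
    return {c: [v / counts[c] for v in acc] for c, acc in sums.items()}


def centroid_predict(model, row):
    return min(model, key=lambda c: sum((a - b) ** 2 for a, b in zip(model[c], row)))


def fitness(mask, X, y, X_test, y_test):
    """Phase 1 fitness: test-set accuracy of a model trained on the selected instances."""
    Xs = [r for r, keep in zip(X, mask) if keep]
    ys = [t for t, keep in zip(y, mask) if keep]
    if not ys:
        return 0.0
    model = centroid_fit(Xs, ys)
    return sum(centroid_predict(model, r) == t for r, t in zip(X_test, y_test)) / len(y_test)


def ga_select(X, y, X_test, y_test, pop_size=100, generations=150,
              p_cross=0.7, p_mut=0.1, rng=random):
    """Phase 1: evolve binary masks (True = keep instance) and return the fittest."""
    def score(mask):
        return fitness(mask, X, y, X_test, y_test)

    def tournament():                    # binary tournament selection (assumption)
        a, b = rng.sample(pop, 2)
        return a if score(a) >= score(b) else b

    pop = [[rng.random() < 0.5 for _ in X] for _ in range(pop_size)]
    for _ in range(generations):
        nxt = [max(pop, key=score)]      # elitism: carry the best mask over (assumption)
        while len(nxt) < pop_size:
            child = tournament()
            if rng.random() < p_cross:   # single-point crossover (assumption)
                cut = rng.randrange(1, len(X))
                child = child[:cut] + tournament()[cut:]
            # per-gene bit-flip mutation (assumed reading of "mutation rate")
            nxt.append([(not g) if rng.random() < p_mut else g for g in child])
        pop = nxt
    return max(pop, key=score)


def bagging_predict(X, y, row, n_estimators=11, rng=random):
    """Phase 2: train base learners on bootstrap samples, combine by majority vote."""
    votes = Counter()
    for _ in range(n_estimators):
        idx = [rng.randrange(len(X)) for _ in X]   # sampling with replacement
        model = centroid_fit([X[i] for i in idx], [y[i] for i in idx])
        votes[centroid_predict(model, row)] += 1
    return votes.most_common(1)[0][0]


rng = random.Random(0)
# Hypothetical data; the last two training rows are deliberately mislabeled noise.
X = [[0.1, 0.2], [0.2, 0.1], [0.15, 0.25], [0.9, 0.8], [0.8, 0.9], [0.85, 0.75],
     [0.88, 0.82], [0.12, 0.18]]
y = [0, 0, 0, 1, 1, 1, 0, 1]
X_test, y_test = [[0.1, 0.1], [0.2, 0.2], [0.9, 0.9], [0.8, 0.8]], [0, 0, 1, 1]
mask = ga_select(X, y, X_test, y_test, pop_size=20, generations=10, rng=rng)
X_sel = [r for r, keep in zip(X, mask) if keep]
y_sel = [t for t, keep in zip(y, mask) if keep]
print(mask, bagging_predict(X_sel, y_sel, [0.05, 0.1], rng=rng))
```

As in the abstract, the fitness is measured on a separate test set rather than on the training instances, so the GA is not rewarded for keeping noisy points that a model could simply memorize.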

ICT Medical Service Provider's Knowledge and level of recognizing how to cope with fire fighting safety (ICT 의료시설 기반에서 종사자의 소방안전 지식과 대처방법 인식수준)

  • Kim, Ja-Sook;Kim, Ja-Ok;Ahn, Young-Joon
    • The Journal of the Korea institute of electronic communication sciences / v.9 no.1 / pp.51-60 / 2014
  • This study investigated ICT medical service providers' level of fire-safety knowledge and their awareness of how to cope with fires in the Gwangju and Jeonnam regions of Korea, in order to determine the factors affecting these levels and to provide basic information for manuals that teach how to cope with fires in medical facilities. The data were analyzed using SPSS Win 14.0. The ICT medical service providers' fire-safety knowledge score was 7.06 (on a 10-point scale), and their score for awareness of how to cope with fires was 6.61 (on an 11-point scale). Awareness of how to cope with fires differed significantly by gender (t=4.12, p<.001), age (χ²=17.24, p<.001), length of career (χ²=22.76, p<.001), experience with fire-safety education (t=6.10, p<.001), and level of subjective fire-safety knowledge (χ²=53.83, p<.001). To enhance ICT medical service providers' understanding of fire safety and of how to cope with fires, the study finds that the following are required: self-directed learning instead of education that only conveys knowledge through lectures; learning tailored to individuals; fire-safety education focused on experiencing actual work, supported by diverse content that emphasizes cooperative learning; moving patients according to classification systems, practiced through simulations; and the development of an education manual for coping with fires at medical facilities through a multi-learning approach, drawing on studies of digital anti-fire monitoring systems with a multipoint communication protocol, infrared-laser smoke detection for fire detection in wide spaces, and video-based fire detection algorithms using Gaussian mixture models.

Research Trend and Futuristic Guideline of Platform-Based Business in Korea (플랫폼 기반 비즈니스에 대한 국내 연구동향 및 미래를 위한 가이드라인)

  • Namn, Su Hyeon
    • Management & Information Systems Review / v.39 no.1 / pp.93-114 / 2020
  • Platforms are considered an alternative strategy to the traditional linear, pipeline-based business. Moreover, in the period of the 4th industrial revolution, the efficiency-driven pipeline business model needs to shift to a platform business model. Apple, Google, Amazon, and Uber are well-known platform success stories. However, smaller corporations find it difficult to work out a transformation strategy. The essence of platform business is leveraging network effects in management, so platform-based management can be rephrased as network management across business functions. Research on platform business is popular and covers diverse facets, but few scholars have examined the research trends of the domain. The main purpose of this paper is to identify the research trends on platform business in Korea. To do so, we first propose an analytical model of platform architecture whose components are consumers, suppliers, artifacts, and the IT platform system. We conjecture that mapping research on platforms to the components of the model will help us understand the hidden domains of platform research. We propose three hypotheses regarding the characteristics of the research and one proposition on the transition path from the pipeline to the platform business model. The mapping is based on research articles filtered from the Korea Citation Index by keyword search: papers were retrieved by searching the author-provided keywords for the word "platform". The filtered articles are summarized by attributes such as the major platform component considered, platform type, main research purpose, and research method. Using the filtered data, we test the hypotheses in an exploratory way. The contributions of our research are as follows. First, based on the findings, scholars can identify areas of research in the domain: areas where research has matured and territory where future research is actively sought. 
Second, the proposition can give business practitioners guidelines for changing their strategy from pipeline-oriented to platform-oriented. This research should be considered exploratory rather than inferential, since subjective judgments are involved in the collection, classification, and interpretation of the research articles.
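The filtering and mapping step can be sketched as a keyword filter followed by a tally over platform components. This is a toy sketch: the article records and their component labels are hypothetical, and the case-insensitive substring match on author keywords is an assumed reading of "using the word 'platform'".

```python
from collections import Counter

# Hypothetical article records; fields mirror attributes named in the abstract.
articles = [
    {"title": "A", "keywords": ["Platform", "Network effect"], "component": "consumers"},
    {"title": "B", "keywords": ["Open innovation"], "component": "suppliers"},
    {"title": "C", "keywords": ["Digital platform", "Ecosystem"], "component": "IT platform system"},
    {"title": "D", "keywords": ["platform strategy"], "component": "consumers"},
]


def mentions_platform(article):
    """True if any author-supplied keyword contains 'platform' (case-insensitive)."""
    return any("platform" in kw.lower() for kw in article["keywords"])


selected = [a for a in articles if mentions_platform(a)]
by_component = Counter(a["component"] for a in selected)
print(by_component.most_common())
```

In the study the component label is assigned by a human coder, which is the subjective judgment the abstract cites as the reason the analysis is exploratory.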

The Effect of Data Size on the k-NN Predictability: Application to Samsung Electronics Stock Market Prediction (데이터 크기에 따른 k-NN의 예측력 연구: 삼성전자주가를 사례로)

  • Chun, Se-Hak
    • Journal of Intelligence and Information Systems / v.25 no.3 / pp.239-251 / 2019
  • Statistical methods such as moving averages, Kalman filtering, exponential smoothing, regression analysis, and ARIMA (autoregressive integrated moving average) have been used for stock market prediction. However, these statistical methods have not produced superior performance. In recent years, machine learning techniques, including artificial neural networks, SVM, and genetic algorithms, have been widely used for stock market prediction. In particular, a case-based reasoning method known as k-nearest neighbor is also widely used for stock price prediction. When a new problem occurs, case-based reasoning retrieves several similar cases from previous cases and combines their class labels to create a classification for the new problem. However, case-based reasoning has some problems. First, it tends to search for a fixed number of neighbors in the observation space, always selecting the same number of neighbors rather than the most similar neighbors for the target case. As a result, it may have to take more cases into account even when fewer cases are applicable, depending on the subject. Second, it may select neighbors that are far away from the target case. Thus, case-based reasoning does not guarantee an optimal pseudo-neighborhood for various target cases, and predictability can be degraded by deviation from the desired similar neighbors. This paper examines how the size of the learning data affects stock price predictability with k-nearest neighbor, and compares the predictability of k-nearest neighbor with that of the random walk model according to the size of the learning data and the number of neighbors. In this study, Samsung Electronics stock prices were predicted using two different learning datasets. To predict the next day's closing price, we used four variables: opening price, daily high, daily low, and daily close. 
In the first experiment, data from January 1, 2000 to December 31, 2017 were used for learning. In the second experiment, data from January 1, 2015 to December 31, 2017 were used for learning. The test data covered January 1, 2018 to August 31, 2018 in both experiments. We compared the performance of k-NN with that of the random walk model using the two learning datasets. The mean absolute percentage error (MAPE) was 1.3497 for the random walk model and 1.3570 for k-NN in the first experiment, when the learning data was small. However, in the second experiment, when the learning data was large, the MAPE was 1.3497 for the random walk model and 1.2928 for k-NN. These results show that prediction power is higher when more learning data are used than when less learning data are used. This paper also shows that k-NN generally produces better predictive power than the random walk model for larger learning datasets, but not when the learning dataset is relatively small. Future studies should consider macroeconomic variables related to stock price forecasting in addition to the opening, low, high, and closing prices. Also, to produce better results, it is recommended that k-nearest neighbor find nearest neighbors using a second-step filtering method that considers fundamental economic variables, together with a sufficient amount of learning data.
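The comparison can be sketched as a k-NN forecaster over daily (open, high, low, close) vectors scored by MAPE against a random walk that predicts tomorrow's close as today's close. This is a minimal sketch under assumptions the abstract does not confirm: raw price vectors as features, unweighted Euclidean distance, averaging of the k neighbors' next-day closes, and a learning set fixed before the test period. The price series is hypothetical.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, expressed in percent."""
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def knn_forecast(history, query, k):
    """Average next-day close of the k past days whose (open, high, low, close)
    vectors are closest to `query` in squared Euclidean distance.
    history: list of (ohlc_tuple, next_day_close) pairs."""
    nearest = sorted(history,
                     key=lambda h: sum((x - q) ** 2 for x, q in zip(h[0], query)))[:k]
    return sum(nxt for _, nxt in nearest) / len(nearest)


def compare(ohlc, n_train, k):
    """Forecast each day after the learning period with k-NN (trained only on the
    first n_train days) and with a random walk; return (knn_mape, rw_mape)."""
    train = [(ohlc[t], ohlc[t + 1][3]) for t in range(n_train - 1)]
    actual, knn_pred, rw_pred = [], [], []
    for t in range(n_train, len(ohlc) - 1):
        actual.append(ohlc[t + 1][3])
        knn_pred.append(knn_forecast(train, ohlc[t], k))
        rw_pred.append(ohlc[t][3])        # random walk: tomorrow's close = today's close
    return mape(actual, knn_pred), mape(actual, rw_pred)


# Hypothetical daily (open, high, low, close) prices.
ohlc = [(100, 102, 99, 101), (101, 103, 100, 102), (102, 104, 101, 103),
        (103, 103, 100, 101), (101, 102, 99, 100), (100, 101, 98, 99),
        (99, 101, 98, 100), (100, 103, 99, 102)]
print(compare(ohlc, n_train=5, k=2))
```

Varying `n_train` and `k` reproduces the shape of the paper's experiment: the random-walk error does not depend on the learning data, so only the k-NN side changes as the learning set grows.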

Study of Geological Log Database for Public Wells, Jeju Island (제주도 공공 관정 지질주상도 DB 구축 소개)

  • Pak, Song-Hyon;Koh, Giwon;Park, Junbeom;Moon, Dukchul;Yoon, Woo Seok
    • Economic and Environmental Geology / v.48 no.6 / pp.509-523 / 2015
  • This study introduces a newly implemented geological well-log database for Jeju public water wells, built for a research project focusing on an integrated hydrogeology database of Jeju Island. A detailed analysis of the existing geological logs of about 1,200 public wells developed on Jeju Island since 1970 revealed six major issues to be addressed before they could be used in constructing the Jeju geological log DB: (1) lack of uniformity in rock name classification, (2) poor definitions of pyroclastic deposits and sand and gravel layers, (3) lack of well borehole aquifer information, (4) lack of information on well screen installation in many water wells, (5) differences between individuals in geological logging descriptions. A new Jeju geological log DB with standardized input and output formats has been implemented to address these issues by reestablishing the names of Jeju volcanic and sedimentary rocks and by using a commercial geological logging program with a database-based input structure. The newly designed database structure in the logging program enables users to store large amounts of geology, well drilling, and test data in a standardized DB input structure. In addition, well borehole groundwater and aquifer test data can easily be added without modifying the existing database structure. Thus, the newly implemented geological log DB could serve as a standardized DB for the large number of existing Jeju public wells and for new wells to be developed on Jeju Island in the future. The new geological log DB will also be a basis for the ongoing project 'Developing GIS-based integrated interpretation system for Jeju Island hydrogeology'.
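The two properties emphasized above, standardized rock names and test data that can be added without changing the structure, can be illustrated with a small relational schema. This is a hypothetical sketch using Python's built-in `sqlite3`; it is not the schema of the commercial logging program the study used, and the table names, rock codes, and values are invented. Rock names come from a controlled-vocabulary table enforced by a foreign key, and tests are stored in long format (one row per measured value).

```python
import sqlite3

SCHEMA = """
CREATE TABLE rock_name (code TEXT PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE well (well_id TEXT PRIMARY KEY, drilled_year INTEGER);
CREATE TABLE lithology (
    well_id TEXT NOT NULL REFERENCES well,
    top_m REAL, bottom_m REAL,
    rock_code TEXT NOT NULL REFERENCES rock_name   -- only standardized names allowed
);
CREATE TABLE well_test (                           -- long format: one row per value
    well_id TEXT NOT NULL REFERENCES well,
    test_type TEXT, parameter TEXT, value REAL, unit TEXT
);
"""


def open_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")   # SQLite enforces REFERENCES only when enabled
    conn.executescript(SCHEMA)
    return conn


conn = open_db()
conn.executemany("INSERT INTO rock_name VALUES (?, ?)",
                 [("BAS", "basalt"), ("SCO", "scoria"), ("SGR", "sand and gravel")])
conn.execute("INSERT INTO well VALUES ('W-001', 1995)")
conn.execute("INSERT INTO lithology VALUES ('W-001', 0.0, 12.5, 'BAS')")
# A new kind of aquifer test is just more rows; no ALTER TABLE is needed.
conn.execute("INSERT INTO well_test VALUES "
             "('W-001', 'pumping', 'specific_capacity', 350.0, 'm3/day/m')")
try:
    conn.execute("INSERT INTO lithology VALUES ('W-001', 12.5, 20.0, 'lava?')")
except sqlite3.IntegrityError as err:
    print("rejected non-standard rock name:", err)
```

The foreign key addresses the uniformity problem in item (1) at input time, while the long-format test table is one way to let new test types arrive without schema changes.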

Effectiveness Enhancement Measures for Local Government Environmental Impact Assessment (EIA) by Improving Small-scale EIA Institution (소규모 환경영향평가 제도개선을 통한 지자체 환경영향평가 효과성 증진방안)

  • Jongook Lee;Kyeong Doo Cho
    • Journal of Environmental Impact Assessment / v.32 no.1 / pp.15-28 / 2023
  • In the Republic of Korea, the target project scope of small-scale EIA is stipulated as a plan area exceeding roughly 5,000 to 60,000 m², depending on the type of project and the land-use classification. However, the lower thresholds of the corresponding local government EIA projects generally lie above the small-scale EIA thresholds, so the two ranges overlap. This overlap widened after road construction and district unit planning were added as target projects for small-scale EIA in the "Enforcement Decree of the Environmental Impact Assessment Act", partially revised in November 2016. The current consultation system therefore needs to be reviewed, because small-scale EIA can be conducted without gathering review opinions at the local level. Projects subject to local government EIA but consulted as small-scale EIAs may seem insignificant given their small total number; however, it is worth noting that a local government may be unable to add a target project because of small-scale EIA. This study suggests three policy measures for improving small-scale EIA to enhance the effectiveness of local government EIA: (1) supplementing the institutional arrangements so that review opinions from the local region are incorporated into small-scale EIA, (2) giving local EIA priority for projects in the overlapping ranges through partial amendments to the EIA law regarding exceptions to local government EIA, and (3) adding to local ordinances small projects that are not small-scale EIA targets but are deemed to require local government EIA. Although the positive function of small-scale EIA has been confirmed, efforts should be made to improve the situation in which many projects within local governments are consulted without review from the region.