• Title/Summary/Keyword: decision-making algorithm (의사결정 알고리즘)

Search Result 583, Processing Time 0.025 seconds

Illegal Cash Accommodation Detection Modeling Using Ensemble Size Reduction (신용카드 불법현금융통 적발을 위한 축소된 앙상블 모형)

  • Lee, Hwa-Kyung;Han, Sang-Bum;Jhee, Won-Chul
    • Journal of Intelligence and Information Systems / v.16 no.1 / pp.93-116 / 2010
  • An ensemble approach is applied to the detection modeling of illegal cash accommodation (ICA), a well-known type of fraudulent credit card usage in Far East nations that has not been addressed in the academic literature. The performance of a fraud detection model (FDM) suffers from the imbalanced data problem, which can be remedied to some extent by using an ensemble of many classifiers. It is generally accepted that ensembles of classifiers produce better accuracy than a single classifier, provided there is diversity in the ensemble. Furthermore, recent research reveals that it may be better to ensemble some selected classifiers instead of all of the classifiers at hand. For the effective detection of ICA, we adopt an ensemble size reduction technique that prunes the ensemble of all classifiers using accuracy and diversity measures. The diversity in an ensemble manifests itself as disagreement or ambiguity among members. The data imbalance intrinsic to FDM affects our approach to ICA detection in two ways. First, we suggest a training procedure with over-sampling methods to obtain diverse training data sets. Second, we use some variants of accuracy and diversity measures that focus on the fraud class. We also calculate the diversity measure dynamically in Forward Addition and Backward Elimination. In our experiments, Neural Networks, Decision Trees and Logit Regressions are the base models used as ensemble members, and the performance of homogeneous ensembles is compared with that of heterogeneous ensembles. The experimental results show that the reduced-size ensemble is, on average over the data sets tested, as accurate as the non-pruned version, which provides benefits in terms of application efficiency and reduced ensemble complexity.
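
The pruning idea in this abstract can be illustrated with a minimal sketch of greedy forward addition. This is not the authors' procedure: it scores candidates by plain majority-vote accuracy only (the paper also uses diversity measures focused on the fraud class), and the function names, tie-breaking rule, and toy predictions are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence


def majority_vote(preds: Sequence[Sequence[int]]) -> List[int]:
    """Combine 0/1 predictions by majority vote; ties go to class 1 (assumed fraud)."""
    n_members = len(preds)
    n_samples = len(preds[0])
    return [
        1 if 2 * sum(p[i] for p in preds) >= n_members else 0
        for i in range(n_samples)
    ]


def accuracy(pred: Sequence[int], truth: Sequence[int]) -> float:
    """Fraction of samples where the prediction matches the label."""
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)


def forward_addition(member_preds: Dict[str, List[int]], truth: List[int]) -> List[str]:
    """Greedily add the member that gives the best ensemble accuracy.

    Stops as soon as no remaining member strictly improves the score.
    """
    selected: List[str] = []
    best_score = -1.0
    remaining = sorted(member_preds)  # sorted for deterministic tie-breaking
    while remaining:
        scored = [
            (accuracy(majority_vote([member_preds[m] for m in selected + [name]]), truth), name)
            for name in remaining
        ]
        score, name = max(scored, key=lambda pair: pair[0])
        if score <= best_score:
            break
        selected.append(name)
        remaining.remove(name)
        best_score = score
    return selected


# Hypothetical validation-set predictions from three base models.
truth = [1, 1, 0, 0, 1, 0]
member_preds = {
    "neural_net": [1, 0, 0, 0, 1, 0],
    "tree":       [0, 1, 0, 0, 0, 0],
    "logit":      [1, 1, 1, 1, 1, 1],
}
kept = forward_addition(member_preds, truth)
print(kept)
```

Backward elimination would run the same loop in reverse, starting from the full ensemble and removing the member whose removal helps most.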

A Study on the Development of GIS Based Mitigation Scenario Support System Using QUAL2E Model for TMDL (TMDL 지원을 위한 QUAL2E 모델을 이용한 GIS기반의 삭감시나리오 작성 지원시스템 개발에 관한 연구)

  • Lee, Chol-Young;Kim, Kye-Hyun;Lee, Hyuk;Ryu, Kwang-Hyun
    • Journal of Korean Society of Environmental Engineers / v.34 no.3 / pp.177-188 / 2012
  • This study focused on developing a GIS-based decision support system that makes it easy to create mitigation scenarios and simulate water quality for TMDL. The study area was the 31 km section of the upper Sapgyo stream in the Geum river basin, and the QUAL2E model was adopted. A GIS DB was built by collecting data including point/non-point source attributes and various thematic maps. The discharged loads of BOD, T-N and T-P from each unit watershed were estimated. Finally, a system was developed that can run water quality simulations by simply modifying these values. Three hypothetical mitigation scenarios were applied, and the most efficient one could be chosen by comparing the GIS-based results. The developed system is therefore expected to help decision makers select the best alternative through analysis of the available BMPs. It can also be used to develop new scenarios using different methods and algorithms. Further study is needed to enhance its applicability for developing mitigation scenarios through the management of individual pollutant sources and for extending the study area.

A Study on Condition Analysis of Revised Project Level of Gravity Port facility using Big Data (빅데이터 분석을 통한 중력식 항만시설 수정프로젝트 레벨의 상태변화 특성 분석)

  • Na, Yong Hyoun;Park, Mi Yeon;Jang, Shinwoo
    • Journal of the Society of Disaster Information / v.17 no.2 / pp.254-265 / 2021
  • Purpose: Inspection and diagnosis of the performance and safety of domestic port facilities have been conducted for over 20 years. However, long-term development strategies and directions for facility renewal and performance improvement based on the diagnosis history and results are not working in practice. In particular, port structures with a long service life face many safety and functionality problems due to the increasing size of ships, the increasing frequency of port use, and the effects of natural disasters caused by climate change. Method: In this study, element-level maintenance history data for gravity-type quays were collected and defined as big data, and a predictive approximation model was derived from the data to estimate the pattern of deterioration and aging of the facility at the project level. In particular, we compared and proposed models suitable for big data by examining the validity of the state-based deterioration pattern and the deterioration approximation model generated through the GP and SGP machine learning techniques. Result: In reviewing the suitability of the proposed techniques, the RMSE and R2 were 0.9854 and 0.0721 for the GP technique and 0.7246 and 0.2518 for the SGP technique. Conclusion: This research using machine learning techniques is expected to play an important role in future decision-making on investment in port facilities if port facility data continue to be collected.
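
For readers unfamiliar with the two fit measures reported here, the sketch below shows the standard definitions of RMSE and the coefficient of determination (R2). It assumes the abstract uses these usual definitions, which the source does not state; the function names are illustrative.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error: sqrt of the mean squared residual."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Undefined (division by zero) when all observed values are identical.
    """
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

A lower RMSE and an R2 closer to 1 both indicate a closer fit, so the two measures are normally read together.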

A Study on the Classification of Unstructured Data through Morpheme Analysis

  • Kim, SungJin;Choi, NakJin;Lee, JunDong
    • Journal of the Korea Society of Computer and Information / v.26 no.4 / pp.105-112 / 2021
  • In the era of big data, interest in data is exploding. In particular, the development of the Internet and social media has led to the creation of new data, enabling the era of big data and artificial intelligence and opening a new chapter in convergence technology. There is also considerable demand for analyzing data that programs could not handle in the past. In this paper, an analysis model was designed and verified for the classification of unstructured data, which is often required in the era of big data. Thesis summaries, main words, and sub-keywords were crawled from DBPia, a database was created using KoNLP's data dictionary, and words were tokenized through morpheme analysis. In addition, nouns were extracted using KAIST's 9 part-of-speech classification system, TF-IDF values were generated, and an analysis dataset was created by combining training data and Y values. Finally, the adequacy of classification was measured by applying three analysis algorithms (random forest, SVM, decision tree) to the generated analysis dataset. The classification model technique proposed in this paper can be useful in various fields, such as civil complaint classification and text-related analysis, in addition to thesis classification.
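
The TF-IDF step mentioned in this abstract can be sketched as follows. This is a minimal illustration on already-tokenized nouns, assuming the plain variant tf = count / document length and idf = log(N / document frequency); the paper does not say which weighting variant it used, and the function name and toy tokens are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def tf_idf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Return one {term: weight} mapping per tokenized document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical noun tokens standing in for morpheme-analysis output.
docs = [["data", "model"], ["data", "tree"]]
w = tf_idf(docs)
print(w)
```

A term that appears in every document gets weight zero under this variant, which is why the resulting vectors emphasize terms that distinguish one document from another before they are passed to a classifier.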

Guidelines for big data projects in artificial intelligence mathematics education (인공지능 수학 교육을 위한 빅데이터 프로젝트 과제 가이드라인)

  • Lee, Junghwa;Han, Chaereen;Lim, Woong
    • The Mathematical Education / v.62 no.2 / pp.289-302 / 2023
  • In today's digital information society, student knowledge and skills to analyze big data and make informed decisions have become an important goal of school mathematics. Integrating big data statistical projects with digital technologies in high school <Artificial Intelligence> mathematics courses has the potential to provide students with a learning experience of high impact that can develop these essential skills. This paper proposes a set of guidelines for designing effective big data statistical project-based tasks and evaluates the tasks in the artificial intelligence mathematics textbook against these criteria. The proposed guidelines recommend that projects should: (1) align knowledge and skills with the national school mathematics curriculum; (2) use preprocessed massive datasets; (3) employ data scientists' problem-solving methods; (4) encourage decision-making; (5) leverage technological tools; and (6) promote collaborative learning. The findings indicate that few textbooks fully align with these guidelines, with most failing to incorporate elements corresponding to Guideline 2 in their project tasks. In addition, most tasks in the textbooks overlook or omit data preprocessing, either by using smaller datasets or by using big data without any form of preprocessing. This can potentially result in misconceptions among students regarding the nature of big data. Furthermore, this paper discusses the relevant mathematical knowledge and skills necessary for artificial intelligence, as well as the potential benefits and pedagogical considerations associated with integrating technology into big data tasks. This research sheds light on teaching mathematical concepts with machine learning algorithms and the effective use of technology tools in big data education.

Data analysis by Integrating statistics and visualization: Visual verification for the prediction model (통계와 시각화를 결합한 데이터 분석: 예측모형 대한 시각화 검증)

  • Mun, Seong Min;Lee, Kyung Won
    • Design Convergence Study / v.15 no.6 / pp.195-214 / 2016
  • Predictive analysis is based on probabilistic learning algorithms known as pattern recognition or machine learning. Users who want to extract more information from the data therefore need substantial statistical knowledge. In addition, it is difficult to identify patterns and characteristics in the data. This study conducted statistical and visual data analyses to compensate for these weaknesses of predictive analysis, and found some implications not reported in previous studies. First, we could find data patterns by adjusting the data selection according to the splitting criteria of the decision tree method. Second, we could find what types of data were included in the final prediction model. In the statistical analysis, we found relationships among multiple variables and derived a prediction model for high box office performance. In the visualization analysis, we proposed a visual analysis method with various interactive functions. Finally, through this study we verified the final prediction model and suggested an analysis method that extracts a variety of information from the data.

Developing Library Tour Course Recommendation Model based on a Traveler Persona: Focused on facilities and routes for library trips in J City (여행자 페르소나 기반 도서관 여행 코스 추천 모델 개발 - J시 도서관 여행을 위한 시설 및 동선 중심으로 -)

  • Suhyeon Lee;Hyunsoo Kim;Jiwon Baek;Hyo-Jung Oh
    • Journal of Korean Library and Information Science Society / v.54 no.2 / pp.23-42 / 2023
  • The library tour program is a new type of cultural program first introduced and operated by J City, in which library tourists visit specialized libraries in the city along a set course and take part in various activities. This study aims to build a customized course recommendation model that considers the characteristics of individual participants, in addition to the existing fixed group travel format, so that more users can participate in library tours. To this end, the characteristics of library travelers were categorized to establish traveler personas, and library evaluation items and criteria were established accordingly. We selected 22 libraries targeted by the library travel program and measured library data through actual visits. Based on the collected data, we derived the characteristics of suitable libraries and developed a persona-based library tour course recommendation model using a decision tree algorithm. To demonstrate the feasibility of the proposed recommendation model, we built a mobile application mockup and conducted user evaluations with actual library users to identify satisfaction with, and improvements to, the developed model.

Forecasting the Busan Container Volume Using XGBoost Approach based on Machine Learning Model (기계 학습 모델을 통해 XGBoost 기법을 활용한 부산 컨테이너 물동량 예측)

  • Nguyen Thi Phuong Thanh;Gyu Sung Cho
    • Journal of Internet of Things and Convergence / v.10 no.1 / pp.39-45 / 2024
  • Container volume is a very important factor in the accurate evaluation of port performance, and accurate prediction is essential for effective port development and operation strategies. However, it is difficult to improve the accuracy of container volume prediction because of rapid changes in the marine industry. To solve this problem, it is necessary to analyze the impact on port performance using the Internet of Things (IoT) and apply it to improve the competitiveness and efficiency of Busan Port. This study therefore aims to develop a model for predicting the future container volume of Busan Port, with a focus on improving port productivity and the decision-making of port management agencies. To predict port container volume, this study introduced the Extreme Gradient Boosting (XGBoost) technique, a machine learning model. XGBoost stands out for its higher accuracy and faster learning and prediction compared with other algorithms, its prevention of overfitting, and its provision of feature importance. In particular, XGBoost can be used directly for regression predictive modelling, which helps improve the accuracy of the volume prediction models presented in previous studies. With the proposed method, this study predicts container volume with a MAPE (mean absolute percentage error) of 4.3%, highlighting its high forecasting accuracy. It is believed that the accuracy of Busan container volume forecasts can be increased through the methodology presented in this study.
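
The MAPE figure quoted in this abstract follows the standard definition, sketched below. The function name and the sample volumes are hypothetical and are not data from the study.

```python
from typing import Sequence


def mape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent.

    Every actual value must be non-zero, since each error is divided by it.
    """
    if not actual or len(actual) != len(forecast):
        raise ValueError("actual and forecast must be non-empty and of equal length")
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


# Hypothetical monthly volumes (arbitrary units), for illustration only.
error_pct = mape([100.0, 200.0], [110.0, 190.0])
print(f"MAPE: {error_pct:.2f}%")
```

Because each error is scaled by the actual value, MAPE is comparable across ports or periods with very different volumes, which is one reason it is common in throughput forecasting.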

A Study on Market Size Estimation Method by Product Group Using Word2Vec Algorithm (Word2Vec을 활용한 제품군별 시장규모 추정 방법에 관한 연구)

  • Jung, Ye Lim;Kim, Ji Hui;Yoo, Hyoung Sun
    • Journal of Intelligence and Information Systems / v.26 no.1 / pp.1-21 / 2020
  • With the rapid development of artificial intelligence technology, various techniques have been developed to extract meaningful information from unstructured text data, which constitutes a large portion of big data. Over the past decades, text mining technologies have been utilized in various industries for practical applications. In the field of business intelligence, text mining has been employed to discover new market and/or technology opportunities and to support rational decision making by business participants. Market information such as market size, market growth rate, and market share is essential for setting companies' business strategies. There has been continuous demand in various fields for market information at the specific product level. However, such information has generally been provided at the industry level or in broad categories based on classification standards, making it difficult to obtain specific and appropriate information. In this regard, we propose a new methodology that can estimate the market sizes of product groups at more detailed levels than previously offered. We applied the Word2Vec algorithm, a neural network based semantic word embedding model, to enable automatic market size estimation from individual companies' product information in a bottom-up manner. The overall process is as follows. First, data related to product information is collected, refined, and restructured into a form suitable for applying the Word2Vec model. Next, the preprocessed data is embedded into a vector space by Word2Vec, and product groups are derived by extracting similar product names based on cosine similarity. Finally, the sales data on the extracted products is summed to estimate the market size of each product group. As experimental data, product-name text from Statistics Korea's microdata (345,103 cases) was mapped into a multidimensional vector space by Word2Vec training. 
We performed parameter optimization for training and then applied a vector dimension of 300 and a window size of 15 as the optimized parameters for further experiments. We employed the index words of the Korean Standard Industry Classification (KSIC) as a product name dataset to cluster product groups more efficiently. Product names similar to KSIC indexes were extracted based on cosine similarity. The market size of the extracted products, treated as one product category, was calculated from individual companies' sales data. The market sizes of 11,654 specific product lines were automatically estimated by the proposed model. For performance verification, the results were compared with the actual market sizes of some items; the Pearson correlation coefficient was 0.513. Our approach has several advantages over previous studies. First, text mining and machine learning techniques were applied for the first time to market size estimation, overcoming the limitations of traditional methods that rely on sampling or require multiple assumptions. In addition, the level of market category can be adjusted easily and efficiently according to the purpose of information use by changing the cosine similarity threshold. Furthermore, it has high potential for practical application since it can address unmet needs for detailed market size information in the public and private sectors. Specifically, it can be utilized in technology evaluation and technology commercialization support programs conducted by governmental institutions, as well as in business strategy consulting and market analysis report publishing by private firms. The limitation of our study is that the presented model needs to be improved in terms of accuracy and reliability. The semantic-based word embedding module can be advanced by imposing a proper order on the preprocessed dataset or by combining another algorithm such as Jaccard similarity with Word2Vec. 
Also, the method of product group clustering can be changed to other types of unsupervised machine learning algorithms. Our group is currently working on subsequent studies, and we expect that they can further improve the performance of the basic model conceptually proposed in this study.
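
The last two steps of the pipeline described in this abstract (grouping product names by cosine similarity to an index word, then summing their sales) can be sketched as below. The two-dimensional vectors, product names, sales figures, and threshold are made up for illustration; in the study the vectors came from a trained Word2Vec model with 300 dimensions, and the function names here are assumptions.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def estimate_market_size(
    index_vec: Sequence[float],
    product_vecs: Dict[str, Sequence[float]],
    sales: Dict[str, float],
    threshold: float,
) -> Tuple[List[str], float]:
    """Group products whose similarity to the index word meets the threshold,
    then sum their sales as the group's estimated market size."""
    group = [
        name for name, vec in product_vecs.items()
        if cosine_similarity(index_vec, vec) >= threshold
    ]
    return group, sum(sales[name] for name in group)


# Toy embeddings and sales (hypothetical values, arbitrary units).
product_vecs = {
    "kimchi": [1.0, 0.0],
    "cabbage kimchi": [0.9, 0.1],
    "steel pipe": [0.0, 1.0],
}
sales = {"kimchi": 100, "cabbage kimchi": 50, "steel pipe": 70}
group, size = estimate_market_size([1.0, 0.0], product_vecs, sales, threshold=0.8)
print(group, size)
```

Raising the threshold narrows the group to closer matches and lowering it broadens the category, which corresponds to the adjustable market-category level the abstract describes.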

Individual Thinking Style leads its Emotional Perception: Development of Web-style Design Evaluation Model and Recommendation Algorithm Depending on Consumer Regulatory Focus (사고가 시각을 바꾼다: 조절 초점에 따른 소비자 감성 기반 웹 스타일 평가 모형 및 추천 알고리즘 개발)

  • Kim, Keon-Woo;Park, Do-Hyung
    • Journal of Intelligence and Information Systems / v.24 no.4 / pp.171-196 / 2018
  • With the development of the web, two-way communication and evaluation became possible and marketing paradigms shifted. In order to meet the needs of consumers, web design trends are continuously responding to consumer feedback. As the web becomes more and more important, both academics and businesses are studying consumer emotions and satisfaction on the web. However, some consumer characteristics are not well considered. Demographic characteristics such as age and sex have been studied extensively, but few studies consider psychological characteristics such as regulatory focus (i.e., emotional regulation). In this study, we analyze the effect of web style on consumer emotion. Many studies analyze the relationship between the web and regulatory focus, but most concentrate on the purpose of web use, particularly motivation and information search, rather than on web style and design. The web communicates with users through visual elements. Because the human brain is influenced by all five senses, both design factors and emotional responses are important in the web environment. Therefore, in this study, we examine the relationship between consumer emotion and satisfaction and web style and design. Previous studies have considered the effects of web layout, structure, and color on emotions. In this study, however, we excluded these web components, in contrast to earlier studies, and analyzed the relationship between consumer satisfaction and emotional indexes of web-style only. To perform this analysis, we collected consumer surveys presenting 40 web style themes to 204 consumers. Each consumer evaluated four themes. The emotional adjectives evaluated by consumers were composed of 18 contrast pairs, and the upper emotional indexes were extracted through factor analysis. The emotional indexes were 'softness,' 'modernity,' 'clearness,' and 'jam.' 
Hypotheses were established based on the assumption that emotional indexes have different effects on consumer satisfaction. After the analysis, hypotheses 1, 2, and 3 were accepted and hypothesis 4 was rejected. Hypothesis 4 was rejected because the effect on consumer satisfaction was negative rather than positive. This means that emotional indexes such as 'softness,' 'modernity,' and 'clearness' have a positive effect on consumer satisfaction. In other words, consumers prefer emotions that are soft, emotional, natural, rounded, dynamic, modern, elaborate, unique, bright, pure, and clear. 'Jam' has a negative effect on consumer satisfaction, which means that consumers prefer emotions that are empty, plain, and simple. Regulatory focus shows differences in motivation and propensity in various domains. Regulatory focus tendency is important to consider in organizational behavior and decision making, and it affects not only political, cultural, and ethical judgments and behavior but also broad psychological problems. Emotional responses also differ according to regulatory focus. Promotion focus responds more strongly to positive emotions, whereas prevention focus responds more strongly to negative emotions. Web style is a type of service, and consumer satisfaction is affected not only by cognitive evaluation but also by emotion. This emotional response depends on whether the consumer expects to benefit or to be harmed. Therefore, it is necessary to confirm how consumers' emotional responses to web style differ according to regulatory focus, which is one of the consumer's characteristics and viewpoints. According to the MMR analysis results, hypothesis 5.3 was accepted and hypothesis 5.4 was rejected; however, hypothesis 5.4 was supported in the direction opposite to that hypothesized. After validation, we confirmed the mechanism of emotional response according to the tendency of regulatory focus. 
Using the results, we developed the structure of a web-style recommendation system and recommendation methods based on regulatory focus. We classified regulatory focus into three groups: promotion, grey, and prevention. We then suggest a web-style recommendation method for each group. If this study is developed further, we expect that the existing regulatory focus theory can be extended not only to the motivational part but also to the emotional and behavioral responses according to the regulatory focus tendency. Moreover, we believe that it is possible to recommend the web style that consumers most prefer according to their regulatory focus and emotional desire.