• Title/Summary/Keyword: data imbalance problem (데이터 불균형 문제)

Search Result 224, Processing Time 0.023 seconds

Identifying Travel Satisfaction in Mega Commuting Trip Using Rasch Modelling (Rasch 모형을 적용한 광역교통서비스의 서비스 수준 평가 분석)

  • On, Seojun;Kim, Suji;Jang, Kitae;Kim, Junghwa
    • KSCE Journal of Civil and Environmental Engineering Research / v.43 no.5 / pp.639-650 / 2023
  • Economic development has concentrated population and industry in metropolitan areas. The Republic of Korea is also experiencing this phenomenon, with more than half of its population living in the Seoul capital area. To alleviate this concentration, the Korean government implemented the new town development policy. Unfortunately, this has increased the commuting population and caused an imbalance in transportation services due to financial and policy differences among regions. This paper analyzes user satisfaction with mega commuting in three aspects: mobility, accessibility, and connectivity. To objectively assess user satisfaction, which is qualitative data, the Rasch model is used to analyze the collinearity of user data. The results indicate that user satisfaction differs by region, and that satisfaction with mobility is lower than satisfaction with accessibility and connectivity. Therefore, metropolitan transportation infrastructure needs to be developed before new town policies are introduced.
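
The abstract above does not say which variant of the Rasch model was applied to the satisfaction data. As a minimal sketch only, assuming the simplest dichotomous form, the probability of a positive response depends on the difference between a person parameter and an item parameter. The function name and the example values are hypothetical and are not taken from the paper.

```python
import math

def rasch_probability(theta: float, b: float) -> float:
    """Dichotomous Rasch model: P(positive response) for person level
    `theta` and item difficulty `b`, both on the same logit scale."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# Hypothetical values: a respondent at 0.0 logits rating three service items.
items = {"mobility": 0.8, "accessibility": -0.2, "connectivity": -0.5}
probabilities = {name: rasch_probability(0.0, b) for name, b in items.items()}
```

In this form, an item with a higher difficulty is one that respondents are less likely to endorse, which is how a lower satisfaction level for one aspect would show up on the common scale.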

Ensemble Learning-Based Prediction of Good Sellers in Overseas Sales of Domestic Books and Keyword Analysis of Reviews of the Good Sellers (앙상블 학습 기반 국내 도서의 해외 판매 굿셀러 예측 및 굿셀러 리뷰 키워드 분석)

  • Do Young Kim;Na Yeon Kim;Hyon Hee Kim
    • KIPS Transactions on Software and Data Engineering / v.12 no.4 / pp.173-178 / 2023
  • As Korean literature spreads around the world, its position in the overseas publishing market has become important. As demand in the overseas publishing market continues to grow, it is essential to predict future book sales and to analyze the characteristics of books that have been highly favored by overseas readers. In this study, we propose an ensemble-learning-based prediction model and analyze the characteristics of good sellers, defined as books published overseas over the past 5 years with cumulative sales of more than 5,000 copies. We applied five ensemble learning models (XGBoost, Gradient Boosting, AdaBoost, LightGBM, and Random Forest) and compared them with other machine learning algorithms (Support Vector Machine, Logistic Regression, and Deep Learning). Our experimental results showed that the ensemble algorithms outperform the other approaches in handling imbalanced data. In particular, the LightGBM model obtained an AUC of 99.86%, the best prediction performance. Among the features used for prediction, the most important is the author's number of overseas publications, and the second most important is publication in the countries with the largest publishing markets. The number of evaluation participants is also an important feature. In addition, text mining was performed on reviews of the four best-selling books among the good sellers. Many reviews focused on stories, characters, and writers, and the keyword "translation" appeared frequently in low-rated reviews, which suggests that translation support is needed.
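
The abstract above reports AUC as its headline metric on imbalanced data. As a minimal sketch of why AUC is more informative than plain accuracy in that setting, the following compares the two for a trivial model on hypothetical labels. The data, the 5% positive rate, and the function names are illustrative assumptions, not the authors' data or code.

```python
def accuracy(labels, predictions):
    """Fraction of predictions that match the labels."""
    return sum(int(y == p) for y, p in zip(labels, predictions)) / len(labels)

def roc_auc(labels, scores):
    """AUC as the probability that a random positive is scored above
    a random negative; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical imbalanced set: 5 good sellers among 100 books.
labels = [1] * 5 + [0] * 95

# A model that always predicts "not a good seller" with a constant score.
baseline_accuracy = accuracy(labels, [0] * 100)
baseline_auc = roc_auc(labels, [0.0] * 100)
```

The trivial model reaches high accuracy simply because negatives dominate, while its AUC stays at chance level, so AUC does not reward ignoring the minority class.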

Energy-Efficient Routing Protocol based on Interference Awareness for Transmission of Delay-Sensitive Data in Multi-Hop RF Energy Harvesting Networks (다중 홉 RF 에너지 하베스팅 네트워크에서 지연에 민감한 데이터 전송을 위한 간섭 인지 기반 에너지 효율적인 라우팅 프로토콜)

  • Kim, Hyun-Tae;Ra, In-Ho
    • The Journal of the Korea Contents Association / v.18 no.3 / pp.611-625 / 2018
  • With innovative advances in wireless communication technology, much research on maximizing network lifetime through energy harvesting has been actively conducted in areas such as network resource optimization, QoS-guaranteed transmission, and energy-intelligent routing. As is well known, it is very hard to guarantee end-to-end network delay in multi-hop RF (radio frequency) energy harvesting wireless networks because the amount of harvested energy is uncertain. To minimize end-to-end delay in such networks, this paper proposes an interference-aware, energy-efficient routing metric and protocol that account for the various delays caused by co-channel interference, energy harvesting time, and queuing in a relay node. The proposed method maximizes end-to-end throughput by avoiding packet congestion that causes load imbalance, reducing waiting time due to energy exhaustion, and restraining delay from co-channel interference. Finally, simulation results using the ns-3 simulator show that the proposed method outperforms existing methods in terms of throughput, end-to-end delay, and energy consumption.

A Customized Healthy Menu Recommendation Method Using Content-Based and Food Substitution Table (내용 기반 및 식품 교환 표를 이용한 맞춤형 건강식단 추천 기법)

  • Oh, Yoori;Kim, Yoonhee
    • KIPS Transactions on Software and Data Engineering / v.6 no.3 / pp.161-166 / 2017
  • In recent times, many people suffer from nutritional imbalance, that is, deficient or excessive intake of a specific nutrient, despite the variety of available foods. Accordingly, interest in health and diet has increased, leading to the emergence of various mobile applications. However, most mobile applications only record the user's diet history, show simple statistics, and provide general information about healthy eating. Users interested in healthy eating need recommendation services that reflect their food preferences and provide customized information. Hence, we propose a menu recommendation method that calculates the recommended calorie amount from the user's physical and activity profile and assigns a substitution unit to each food group. The method also analyzes the user's food preferences from food intake history, and thus satisfies the recommended intake unit for each food group by exchanging in the user's preferred foods. The effectiveness of the proposed algorithm is demonstrated by computing precision, recall, a health index, and the harmonic mean of these three measures. We compare it with another method that considers the user's interest and the recommended substitution unit. The proposed method provides menu recommendations that reflect the user's interests and personal health status, helping the user improve and maintain healthy dietary habits.
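
The abstract above evaluates the method with the harmonic mean of precision, recall, and a health index. A minimal sketch of that combination follows, assuming each measure lies in [0, 1]; the function name and the example scores are hypothetical, and how the paper defines its health index is not stated here.

```python
def harmonic_mean(values):
    """Harmonic mean of non-negative scores; 0.0 if any score is zero."""
    values = list(values)
    if not values:
        raise ValueError("at least one value is required")
    if any(v < 0 for v in values):
        raise ValueError("scores must be non-negative")
    if any(v == 0 for v in values):
        return 0.0
    return len(values) / sum(1.0 / v for v in values)

# Hypothetical scores for one recommended menu.
precision, recall, health_index = 0.8, 0.6, 0.9
combined = harmonic_mean([precision, recall, health_index])
```

Like the two-measure F1 score, this combination is pulled toward the lowest of the three values, so a menu cannot score well by excelling on one measure alone.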

Improvement Issues of Personal Information Protection Laws through Meta-Analysis (메타분석을 통한 개인정보보호법의 개선과제)

  • Cho, Myunggeun;Lee, Hwansoo
    • Journal of Digital Convergence / v.15 no.9 / pp.1-14 / 2017
  • As we enter the era of big data, the value of personal information is becoming ever more important. However, personal information protection laws in Korea have several issues, and existing research is limited in its ability to provide a comprehensive understanding of how to improve them. Accordingly, this study analyzes the improvements needed in the current personal information protection laws based on existing research. A total of 39 research articles discussing problems of the personal information protection law were selected and analyzed using meta-analysis. The results confirmed various issues in the current personal information protection laws, such as the meaning and scope of personal information, the roles and obligations of relevant parties, the provision of personal information to third parties, and redundant and imbalanced regulations in the special acts of each field. In practice, this study contributes to resolving inconsistencies between information protection laws and the related special laws in each field. Academically, it contributes to understanding the problems of the law from a macro perspective and to suggesting integrated ways of improving it.

Design and Implementation of an Intelligent Medical Expert System for TMA(Tissue Mineral Analysis) (TMA 분석을 위한 지능적 의학 전문가 시스템의 설계 및 구현)

  • 조영임;한근식
    • Journal of KIISE:Software and Applications / v.31 no.2 / pp.137-152 / 2004
  • Assessment of 30 nutritional minerals and 8 toxic elements in hair is very important not only for determining adequacy, deficiency, and imbalance, but also for assessing their relative relationships in the body. A test known as tissue mineral analysis (TMA) has been developed that serves this purpose exceedingly well. TMA is a very popular hair mineral analysis method among health care professionals at medical centers in over 46 countries. However, there are some problems. First, there is no database suitable for analyzing Koreans. Second, because the TMA results from TEI-USA consist of English documents and graphic files that are not permitted to be opened, their usability is very low. Third, some centers have only a low-level TMA database, so hair samples are sent to TEI-USA for analysis and medical services, which causes a severe outflow of dollars. Finally, because TMA results are based on a database of American health and mineral standards, they may be misleading with respect to Korean mineral standards. The purpose of this research is to develop the first Intelligent Medical Expert System (IMES) for TMA in Korea, which resolves the problems mentioned above. IMES can analyze tissue mineral data with a multiple-stage decision tree classifier. It is also built with multiple fuzzy rule bases and hence analyzes complex data from the Korean database by fuzzy inference methods. A pilot test of the system showed improvements in business efficiency and business satisfaction of 86% and 92%, respectively.

Development of Index of Park Derivation to Promote Inclusive Living SOC Policy (포용적 생활 SOC 정책 추진을 위한 공원결핍지수 개발 연구)

  • Kim, Yong-Gook
    • Journal of the Korean Institute of Landscape Architecture / v.47 no.5 / pp.28-40 / 2019
  • Discussions on inclusive city policies are expanding in order to resolve imbalances in the supply of living SOC according to socio-economic status, location, and population group. The purpose of this study is to propose an Index of Park Deprivation (IPD) as an alternative indicator for promoting an inclusive urban park policy, one that can be applied in the 7 major metropolitan cities to identify regions with relatively high park needs. The main research results are as follows. First, the concept of an inclusive urban park policy is defined as "a policy to supply and manage high-quality park services with priority given to areas and population groups with low socio-economic and environmental status, such as areas with many elderly people, children, and low-income families and areas vulnerable to disasters such as heat and fine dust." Second, we developed the Index of Park Deprivation (IPD), which combines 17 variables covering park service level, demographic characteristics, economic and educational level, health level, and environmental vulnerability. The variables that constitute the IPD can also be applied to SOC policies beyond parks, such as those for sports facilities, daycare centers, kindergartens, and public libraries. Third, applying the IPD to 1,148 Eup/Myeon/Dong areas of the 7 metropolitan cities identified areas with relatively high park service needs. This study offers central and local governments an alternative index for promoting an inclusive urban park policy, based on statistical and geographical information and data that can be easily accessed and utilized.

Bike Insurance Fraud Detection Model Using Balanced Randomforest Algorithm (균형 랜덤 포레스트를 이용한 이륜차 보험사기 적발 모형 개발)

  • Kim, Seunghoon;Lee, Soo Il;Kim, Tae ho
    • Journal of Digital Convergence / v.20 no.2 / pp.241-250 / 2022
  • Due to the COVID-19 pandemic, with the growth of 'untact' services and unstable household economies, bike insurance fraud is expected to surge. Moreover, fraud methods are becoming more complicated. However, no fraud detection model for bike insurance exists. We address the issue of skewed class distribution and reflect the criteria of fraud detection experts. We use a balanced random-forest algorithm to develop an efficient bike insurance fraud detection model. As a result, the predictive performance of the balanced random-forest model is superior to that of the non-balanced model, while there is no significant difference between the variables used by the experts and those of the confirmatory models. The important variables for detecting fraud turned out to be the age and gender of the driver, the correspondence between the insured and the driver, the amount of the self-repair claim, and the amount of bodily injury liability.
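
The abstract above names a balanced random forest but does not describe its implementation. As a minimal sketch of the sampling idea usually associated with that name, each tree is trained on a bootstrap sample in which every class is drawn with replacement down to the minority-class size. The function name, the 1% fraud rate, and the seed are hypothetical, and tree training itself is omitted.

```python
import random

def balanced_bootstrap(labels, rng):
    """Return row indices for one tree: each class is sampled with
    replacement, and every class contributes as many rows as the
    smallest class has."""
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    n_minority = min(len(indices) for indices in by_class.values())
    sample = []
    for indices in by_class.values():
        sample.extend(rng.choices(indices, k=n_minority))
    return sample

# Hypothetical claims data: 10 fraudulent (1) among 1,000 claims.
labels = [1] * 10 + [0] * 990
rng = random.Random(0)  # fixed seed so the sketch is repeatable
sample = balanced_bootstrap(labels, rng)
counts = {y: sum(1 for i in sample if labels[i] == y) for y in set(labels)}
```

Repeating this draw for each tree lets the forest see the rare fraud cases as often as the legitimate ones, which is what counters the skewed class distribution.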

Influencing Factors Analysis for the Number of Participants in Public Contracts Using Big Data (빅데이터를 활용한 공공계약의 입찰참가자수 영향요인 분석)

  • Choi, Tae-Hong;Lee, Kyung-Hee;Cho, Wan-Sup
    • The Journal of Bigdata / v.3 no.2 / pp.87-99 / 2018
  • This study analyzes the factors affecting the number of bidders in public contracts, using contract data on goods purchases, services, and facility construction collected through KONEPS. The number of bidders matters in public contracts because it can serve as a minimum criterion for judging whether a rational contract was concluded through fair competition, and because it is closely related to budget savings for the ordering organization and to the profitability of the bidders. The purpose of this study is to analyze the factors that determine bidder participation in public contracts and to present the problems and policy implications of that participation. Unlike existing sampling-based research, this study analyzes 4.35 million purchasing, service, and facility construction contracts that were ordered by 50,000 public institutions through the national marketplace and in which 300,000 individual companies and corporations participated. In the research model, the number of announcement days, budget amount, contract method, and winning bid method are used as independent variables, and the number of bidders is used as the dependent variable. Big data and multidimensional analysis techniques are used for the analysis. The conclusions are as follows. First, the larger the budget of a public works project, the smaller the number of participants. Second, regarding the contract method, restricted competition has more participants than general competition. Third, the duration of the bidding notice did not significantly affect the number of bidders. Fourth, regarding the winning bid method, the qualification examination bidding system has more bidders than the lowest bidding system.

Sorghum Field Segmentation with U-Net from UAV RGB (무인기 기반 RGB 영상 활용 U-Net을 이용한 수수 재배지 분할)

  • Kisu Park;Chanseok Ryu;Yeseong Kang;Eunri Kim;Jongchan Jeong;Jinki Park
    • Korean Journal of Remote Sensing / v.39 no.5_1 / pp.521-535 / 2023
  • When rice paddies are converted into fields, sorghum (Sorghum bicolor L. Moench) can be produced stably along with soybeans thanks to its excellent moisture resistance. It is therefore a crop that is expected to improve the self-sufficiency rate of domestic food crops and to help solve the rice supply-demand imbalance. However, fundamental statistics, such as the cultivation area required for estimating yields, are lacking because the traditional survey method takes a long time even with a large workforce. In this study, U-Net was applied to RGB images acquired by an unmanned aerial vehicle to confirm the feasibility of non-destructive segmentation of sorghum cultivation fields. RGB images were acquired on July 28, August 13, and August 25, 2022. For each acquisition date, the data were divided into 6,000 training images and 1,000 validation images with a size of 512 × 512. Classification models were developed for three classes, consisting of sorghum fields (sorghum), rice and soybean fields (others), and non-agricultural fields (background), and for two classes, consisting of sorghum and non-sorghum (others + background). The classification accuracy for sorghum cultivation fields was higher than 0.91 in the three-class models on all acquisition dates, but confusion occurred in learning the other classes on the August datasets. In contrast, the two-class models showed an accuracy of 0.95 or better in all classes, with stable learning on the August datasets. As a result, two-class models built on August imagery will be advantageous for calculating the cultivation area of sorghum.