• Title/Summary/Keyword: machine learning

Development of prediction model identifying high-risk older persons in need of long-term care (장기요양 필요 발생의 고위험 대상자 발굴을 위한 예측모형 개발)

  • Song, Mi Kyung;Park, Yeongwoo;Han, Eun-Jeong
    • The Korean Journal of Applied Statistics, v.35 no.4, pp.457-468, 2022
  • In an aged society, it is important to prevent older people from developing disabilities that require long-term care. The purpose of this study is to develop a prediction model that identifies high-risk groups likely to become beneficiaries of Long-Term Care Insurance. This retrospective study uses the study subjects' past records in the National Health Insurance Service (NHIS) database. The study subjects are 7,724,101 people aged 65 and over who are registered for medical insurance. To develop the prediction model, we used logistic regression, decision tree, random forest, and a multi-layer perceptron neural network. Random forest was finally selected as the prediction model based on the performance obtained in internal and external validation. Random forest could identify about 90% of the older people in need of long-term care using the DB alone, without any information from the long-term care eligibility assessment. The findings may be useful for evidence-based health management in preventive services and can help preemptively identify older people who need preventive services.
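The claim that random forest "could predict about 90%" of people needing care reads like a sensitivity (recall) figure, and the model was chosen by comparing validation performance. The sketch below shows how sensitivity and specificity are computed from binary predictions and how a model could be picked by its mean score over internal and external validation. It is a minimal illustration under assumptions: the label convention, the function names, and every number are hypothetical, and the abstract does not say which metric the authors used for selection.

```python
# Minimal sketch; labels and scores are hypothetical, not the paper's data.
# Assumed label convention: 1 = later became a Long-Term Care Insurance
# beneficiary, 0 = did not.


def confusion_counts(y_true, y_pred):
    """Return (TP, FP, TN, FN) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def sensitivity(y_true, y_pred):
    """Share of people who needed long-term care that the model flagged."""
    tp, _, _, fn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if tp + fn else 0.0


def specificity(y_true, y_pred):
    """Share of people who did not need long-term care that the model cleared."""
    _, fp, tn, _ = confusion_counts(y_true, y_pred)
    return tn / (tn + fp) if tn + fp else 0.0


def select_model(validation_scores):
    """Pick the model whose mean score over all validation sets is highest."""
    return max(validation_scores,
               key=lambda name: sum(validation_scores[name]) / len(validation_scores[name]))


y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 0, 0, 0, 1, 0, 0]
print(sensitivity(y_true, y_pred))  # 0.8
print(specificity(y_true, y_pred))  # 0.8

# Illustrative (internal, external) validation scores, not reported values.
scores = {
    "logistic_regression": [0.80, 0.78],
    "decision_tree": [0.76, 0.74],
    "random_forest": [0.88, 0.86],
    "mlp": [0.84, 0.81],
}
print(select_model(scores))  # random_forest
```

Averaging internal and external scores is only one possible selection rule; a study could equally weight external validation more heavily.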

Spatial Gap-filling of GK-2A/AMI Hourly AOD Products Using Meteorological Data and Machine Learning (기상모델자료와 기계학습을 이용한 GK-2A/AMI Hourly AOD 산출물의 결측화소 복원)

  • Youn, Youjeong;Kang, Jonggu;Kim, Geunah;Park, Ganghyun;Choi, Soyeon;Lee, Yangwon
    • Korean Journal of Remote Sensing, v.38 no.5_3, pp.953-966, 2022
  • Since aerosols adversely affect human health, for example by deteriorating air quality, quantitative observation of their distribution and characteristics is essential. Satellite-based Aerosol Optical Depth (AOD) data have recently been used in various studies as a means of acquiring periodic, quantitative information on a global scale, but AOD images from optical satellite sensors have missing pixels in cloudy areas. In this study, we produced gap-free GeoKompsat 2A (GK-2A) Advanced Meteorological Imager (AMI) hourly AOD images by building a Random Forest based gap-filling model with gridded meteorological and geographic variables as inputs. The model achieves a Mean Bias Error (MBE) of -0.002 and a Root Mean Square Error (RMSE) of 0.145, which exceeds the target accuracy of the original data, and a Correlation Coefficient (CC) of 0.714, which indicates sufficient explanatory power given that the target is an atmospheric variable. Because the high temporal resolution of geostationary satellites suits the observation of diurnal variation, the model is important for other research, such as providing input for atmospheric correction, estimating ground-level PM, and analyzing small fires or pollutants.
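The three accuracy figures quoted above (MBE, RMSE, CC) are standard validation statistics. A minimal sketch of how they are computed for gap-filled values against held-out observations follows; the sign convention for bias (prediction minus observation) is an assumption, and the AOD values are hypothetical.

```python
import math

# Minimal sketch of the validation statistics named in the abstract.
# Assumed sign convention: bias = prediction - observation.


def mbe(pred, obs):
    """Mean Bias Error: average signed difference."""
    return sum(p - o for p, o in zip(pred, obs)) / len(obs)


def rmse(pred, obs):
    """Root Mean Square Error."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))


def pearson_cc(pred, obs):
    """Pearson correlation coefficient between predictions and observations."""
    n = len(obs)
    mp, mo = sum(pred) / n, sum(obs) / n
    cov = sum((p - mp) * (o - mo) for p, o in zip(pred, obs))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    so = math.sqrt(sum((o - mo) ** 2 for o in obs))
    return cov / (sp * so)


# Hypothetical AOD values: gap-filled estimates vs. held-out retrievals.
observed = [0.20, 0.35, 0.50, 0.65]
filled = [0.25, 0.30, 0.55, 0.70]
print(mbe(filled, observed), rmse(filled, observed), pearson_cc(filled, observed))
```

Here MBE is about 0.025 and RMSE about 0.05; an MBE near zero (as in the abstract's -0.002) only says errors cancel on average, which is why RMSE and CC are reported alongside it.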

Analysis of public opinion in the 20th presidential election using YouTube data (유튜브 데이터를 활용한 20대 대선 여론분석)

  • Kang, Eunkyung;Yang, Seonuk;Kwon, Jiyoon;Yang, Sung-Byung
    • Journal of Intelligence and Information Systems, v.28 no.3, pp.161-183, 2022
  • Opinion polls have become a powerful tool in election campaigns and one of the most important subjects in the media, because they predict actual election results and influence people's voting behavior. However, the more active the polls, the more often they fail to properly reflect voters' minds when measuring the effectiveness of election campaigns, for instance by repeatedly polling the likelihood of winning or levels of support rather than examining candidates' pledges and policies. Even though the polls' poor predictions of election results have undermined the authority of the press, people cannot easily let go of their interest in polls, because there is no clear alternative for answering the instinctive question of which candidate will ultimately win. In this regard, we attempt to retrospectively grasp public opinion on the 20th presidential election by applying the 'YouTube Analysis' function of Sometrend, which provides an environment for discovering insights from online big data. This study confirms that results close to actual public opinion (or opinion poll results) can be easily derived from simple YouTube data, and that a high-performance public opinion prediction model can be built.

Association Rules Analysis Between the Types and Causes of Disputes in Construction Projects (연관규칙 분석을 통한 건설공사 분쟁유형과 분쟁원인의 연관성 분석에 관한 연구)

  • Jang, Se Rim;Kim, Han Soo
    • Korean Journal of Construction Engineering and Management, v.23 no.5, pp.3-14, 2022
  • Construction projects carry a high potential for claims among a variety of stakeholders. Claims are not disputes in themselves, but they are likely to lead to disputes if the parties fail to reach agreement because of conflicting opinions. Construction disputes between clients and contractors can harm both parties, and the client's role is more significant in minimizing or proactively managing them. The objective of this study is to analyze the level of association between the types and causes of disputes in construction projects using association rule analysis, and to identify and discuss key characteristics and implications from the client's perspective. The study analyzes the associations between dispute types and causes and identifies those with a high level of association, presenting a more systematic analysis than descriptive statistics based solely on frequencies. Through the analysis of the data cases, the study proposes directions for resolving the causes of disputes from the client's perspective. It can help improve understanding of the relationships between dispute types and causes and support the proactive management of disputes in construction projects.
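Association rule analysis scores rules such as "cause X -> dispute type Y" by support, confidence, and lift, which is what lets the study go beyond raw frequencies. The sketch below treats each dispute case as a set of tagged items and scores every single-cause to single-type rule. It is a simplified stand-in for a full Apriori search, and the item labels, thresholds, and cases are all hypothetical; the abstract does not list the study's categories or cut-offs.

```python
from itertools import product

# Minimal sketch: each dispute case is a set of items tagged "cause:" or "type:".
# Labels and thresholds are hypothetical.


def support(cases, items):
    """Fraction of cases that contain every item in `items`."""
    return sum(1 for case in cases if items <= case) / len(cases)


def rule_metrics(cases, antecedent, consequent):
    """Support, confidence and lift of the rule antecedent -> consequent."""
    s_ab = support(cases, antecedent | consequent)
    s_a = support(cases, antecedent)
    s_b = support(cases, consequent)
    confidence = s_ab / s_a if s_a else 0.0
    lift = confidence / s_b if s_b else 0.0
    return s_ab, confidence, lift


def mine_cause_type_rules(cases, min_support=0.2, min_confidence=0.5):
    """Score every single-cause -> single-type rule and keep the strong ones."""
    causes = sorted({i for case in cases for i in case if i.startswith("cause:")})
    types = sorted({i for case in cases for i in case if i.startswith("type:")})
    rules = []
    for cause, dispute_type in product(causes, types):
        s, conf, lift = rule_metrics(cases, {cause}, {dispute_type})
        if s >= min_support and conf >= min_confidence:
            rules.append((cause, dispute_type, s, conf, lift))
    # Highest lift first: lift > 1 means the pair co-occurs more than by chance.
    return sorted(rules, key=lambda r: r[4], reverse=True)


cases = [
    {"cause:design_change", "type:cost_claim"},
    {"cause:design_change", "type:cost_claim"},
    {"cause:design_change", "type:delay_claim"},
    {"cause:site_condition", "type:delay_claim"},
]
for rule in mine_cause_type_rules(cases):
    print(rule)
```

On these toy cases the rule site_condition -> delay_claim has the highest lift (2.0) even though design_change -> cost_claim is more frequent, which illustrates why lift adds information beyond frequency counts.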

A Ship-Wake Joint Detection Using Sentinel-2 Imagery

  • Jeon, Woojin;Jin, Donghyun;Seong, Noh-hun;Jung, Daeseong;Sim, Suyoung;Woo, Jongho;Byeon, Yugyeong;Kim, Nayeon;Han, Kyung-Soo
    • Korean Journal of Remote Sensing, v.39 no.1, pp.77-86, 2023
  • Ship detection is widely used in areas such as maritime security, maritime traffic, fisheries management, illegal fishing, and border control, and it is important for rapid response and damage minimization as ship accident rates rise with the recent growth of international maritime traffic. Under a number of global and national regulations, ships must be equipped with an automatic identification system (AIS), which periodically reports information such as the ship's location and speed at regular intervals. However, most small vessels (less than 300 tons) are not obliged to install the transponder, and AIS signals may not be transmitted, whether intentionally or accidentally. There are even cases of misuse of ships' location information. Therefore, in this study, ship detection was performed using high-resolution optical satellite images, which can periodically observe a wide area remotely and detect small ships. However, optical images can produce false alarms due to noise on the sea surface, such as waves, or due to features with ship-like brightness, such as clouds and wakes, so removing these factors is important for improving the accuracy of ship detection. In this study, false alarms were reduced and the accuracy of ship detection was improved by removing wakes. Ships were detected using machine learning-based random forest (RF) and convolutional neural networks (CNN), a technique widely used in recent object detection work, and the detection results of the two models were compared and analyzed. In addition, the RF and CNN results were combined to mitigate ships being detected as disconnected fragments and being under-detected. The ship detection results of this study are significant in that they overcome the limitations of each model while maintaining accuracy.
In addition, if satellite images with improved spatial resolution are used in the future, simultaneous detection of ships and wakes with higher accuracy is expected.
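The abstract says the RF and CNN results were combined to fix fragmented and missed detections but does not give the fusion rule. One simple possibility, assumed here purely for illustration, is a pixel-wise union (OR) of the two binary ship masks followed by counting connected pixel groups as ships. The grids are hypothetical toy scenes.

```python
from collections import deque

# Minimal sketch, assuming both models output binary ship masks on the same grid
# and that they are fused by pixel-wise union (OR). The actual fusion rule used
# in the study is not stated in the abstract.


def union_masks(mask_a, mask_b):
    """Pixel-wise OR of two equally sized 0/1 masks."""
    return [[int(a or b) for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(mask_a, mask_b)]


def count_objects(mask):
    """Count 4-connected groups of 1-pixels (each group = one detected ship)."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and not seen[r][c]:
                count += 1
                seen[r][c] = True
                queue = deque([(r, c)])
                while queue:  # breadth-first flood fill of this group
                    y, x = queue.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
    return count


# Toy scene: one long ship (row 1) and one small ship (row 2, last column).
# RF splits the long ship into two pieces; the CNN bridges the gap but misses the small ship.
rf = [[0, 0, 0, 0, 0, 0],
      [1, 1, 0, 1, 1, 0],
      [0, 0, 0, 0, 0, 1]]
cnn = [[0, 0, 0, 0, 0, 0],
       [0, 1, 1, 1, 0, 0],
       [0, 0, 0, 0, 0, 0]]
print(count_objects(rf), count_objects(cnn), count_objects(union_masks(rf, cnn)))  # 3 1 2
```

A plain union also merges each model's false alarms, so a real pipeline would likely need an extra filtering step; that trade-off is outside what the abstract describes.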

Development of technology to predict the impact of urban inundation due to climate change on urban transportation networks (기후변화에 따른 도시침수가 도시교통네트워크에 미치는 영향 예측 기술 개발)

  • Jeung, Se Jin;Hur, Dasom;Kim, Byung Sik
    • Journal of Korea Water Resources Association, v.55 no.12, pp.1091-1104, 2022
  • Climate change is predicted to increase the frequency and intensity of rainfall worldwide, and patterns of inundation damage in urban areas are changing due to rapid urbanization and industrialization. Accordingly, climate change impact assessment is regarded as a very important factor in urban planning, and the World Meteorological Organization (WMO) emphasizes the need for impact forecasts that consider the social and economic impacts that may arise from meteorological phenomena. For traffic in particular, the degradation of transport systems due to urban flooding is the most detrimental factor to society and is estimated to cost around £100k per hour per major road affected. In Korea, however, even though accurate forecasts and special warnings of meteorological disasters are currently provided, their effects are not properly conveyed. Therefore, this study judged it necessary to reflect high-resolution analysis and the hydrological factors of each area in order to estimate urban flood depths and cope with the damage that may affect vehicles, and it investigated the degree of flooding caused by rainfall and its effect on vehicle operation. To this end, a rainfall-inundation depth-vehicle speed formula is derived using various machine learning techniques rather than simple linear regression. In addition, by applying climate change scenarios to this formula, the study predicts the flooding of urban rivers during heavy rain and evaluates possible traffic network disturbances due to road inundation under future climate change, with the aim of developing technology for use in traffic flow planning.
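The motivation for using machine learning "rather than simple linear regression" is that the depth-speed relation is unlikely to be a straight line. The sketch below contrasts an ordinary least-squares line with k-nearest-neighbour regression on one stage of the chain (inundation depth to vehicle speed). The abstract does not name the learners used, so k-NN is a swapped-in example of a non-linear learner, and the depth/speed pairs are hypothetical.

```python
# Minimal sketch with hypothetical depth/speed pairs (not data from the study).
# k-nearest-neighbour regression stands in for "a machine learning technique";
# the abstract does not say which learners were used.


def fit_linear(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def knn_predict(xs, ys, x, k=2):
    """Average the targets of the k training points closest to x."""
    nearest = sorted(zip(xs, ys), key=lambda pair: abs(pair[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)


depth_cm = [0, 5, 10, 15, 20, 25, 30]    # hypothetical inundation depth
speed_kmh = [50, 48, 42, 32, 18, 6, 0]   # hypothetical vehicle speed

slope, intercept = fit_linear(depth_cm, speed_kmh)
query = 12  # depth (cm) between two training points
print(slope * query + intercept)                   # straight-line estimate
print(knn_predict(depth_cm, speed_kmh, query, k=2))  # 37.0
```

On this toy curve the straight line (about 33.5 km/h at 12 cm) falls well below the neighbouring observations (42 and 32 km/h), while the k-NN estimate follows the local shape; a rainfall-to-depth stage could be chained in front in the same way.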

Imputation of Missing SST Observation Data Using Multivariate Bidirectional RNN (다변수 Bidirectional RNN을 이용한 표층수온 결측 데이터 보간)

  • Shin, YongTak;Kim, Dong-Hoon;Kim, Hyeon-Jae;Lim, Chaewook;Woo, Seung-Buhm
    • Journal of Korean Society of Coastal and Ocean Engineers, v.34 no.4, pp.109-118, 2022
  • Missing sections of fixed-station sea surface temperature (SST) observation data were imputed using a Bidirectional Recurrent Neural Network (BiRNN). Among artificial intelligence techniques, Recurrent Neural Networks (RNNs), which are commonly used for time series data, estimate the missing position only in the forward or only in the reverse direction of time, so their estimation performance is poor for long missing sections. In this study, by contrast, estimation performance can be improved even for long missing sections by estimating in both directions, from before and after the missing section. In addition, by using all available data around the observation point (sea surface temperature, air temperature, wind field, atmospheric pressure, and humidity) and estimating the imputed values from their correlations, the imputation performance was further improved. For performance verification, the model was compared with a statistical model, Multivariate Imputation by Chained Equations (MICE), a machine learning-based Random Forest model, and an RNN model using Long Short-Term Memory (LSTM). For 7-day missing sections, the average accuracy of the BiRNN and statistical models was 70.8% and 61.2%, respectively, and the average error was 0.28 and 0.44 degrees, respectively, so the BiRNN model performs better than the other models. Because it applies a temporal decay factor representing the missing pattern, the BiRNN technique is judged to outperform the existing methods by a wider margin as the missing section becomes longer.
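The abstract credits a "temporal decay factor representing the missing pattern" for the BiRNN's advantage on long gaps but does not give its formula. A common formulation, used in GRU-D and BRITS and assumed here, tracks the time since the last observation and maps it to a weight in (0, 1]; the backward pass measures the same gap from the future side. The weights `w` and `b` would normally be learned; fixed values are used only for illustration, and the timestamps and mask are hypothetical.

```python
import math

# Minimal sketch of a GRU-D/BRITS-style temporal decay factor (an assumption:
# the paper's exact formula is not given in the abstract).
# mask[t] = 1 if SST was observed at timestamps[t], 0 if missing.


def time_gaps(mask, timestamps):
    """delta[t]: time since the last observed value before step t."""
    deltas = [0.0]
    for t in range(1, len(timestamps)):
        gap = timestamps[t] - timestamps[t - 1]
        deltas.append(gap if mask[t - 1] else gap + deltas[-1])
    return deltas


def backward_gaps(mask, timestamps):
    """Same quantity measured from the future, for the backward RNN pass."""
    reversed_deltas = time_gaps(mask[::-1], [-t for t in timestamps[::-1]])
    return reversed_deltas[::-1]


def decay(deltas, w=0.5, b=0.0):
    """gamma = exp(-max(0, w * delta + b)); w and b would be learned in practice."""
    return [math.exp(-max(0.0, w * d + b)) for d in deltas]


timestamps = [0, 1, 2, 3, 4]   # hours (hypothetical)
mask = [1, 0, 0, 1, 1]         # values missing at t = 1 and t = 2
forward = time_gaps(mask, timestamps)       # [0.0, 1, 2, 3, 1]
backward = backward_gaps(mask, timestamps)  # [3, 2, 1, 1, 0.0]
print(forward, backward)
print(decay(forward))
```

At t = 1 the forward gap is 1 hour but the backward gap is 2 hours, so the forward estimate gets the larger weight there; the longer a gap grows, the more the decayed hidden state falls back toward what the other direction and the other variables supply.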

Development of Block-based Code Generation and Recommendation Model Using Natural Language Processing Model (자연어 처리 모델을 활용한 블록 코드 생성 및 추천 모델 개발)

  • Jeon, In-seong;Song, Ki-Sang
    • Journal of The Korean Association of Information Education, v.26 no.3, pp.197-207, 2022
  • In this paper, we develop a machine learning-based block code generation and recommendation model to reduce learners' cognitive load during coding education. The model learns the blocks a learner has built in a block programming environment using a fine-tuned natural language processing model, and then generates and recommends selectable blocks for the next step. To develop the model, the training dataset was produced by pre-processing 50 block codes from 'Entry', a popular block programming language website. After dividing the pre-processed blocks into training, validation, and test datasets, we developed models that generate block codes based on LSTM, Seq2Seq, and GPT-2. In the performance evaluation, GPT-2 outperformed the LSTM and Seq2Seq models on BLEU and ROUGE scores, which measure sentence similarity. The data generated by the GPT-2 model showed relatively similar BLEU and ROUGE performance across block counts, except when the number of blocks was 1 or 17.
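BLEU, one of the two similarity scores used to compare the generators, is the geometric mean of clipped n-gram precisions times a brevity penalty. The sketch below treats a generated block sequence as a token list and scores it against one reference sequence. It is a minimal, unsmoothed sentence-level version; the paper's exact BLEU variant (smoothing, corpus-level averaging) is not stated, and the block names are hypothetical rather than Entry's real identifiers. ROUGE is computed in a similar spirit but emphasizes recall of the reference.

```python
import math
from collections import Counter

# Minimal sketch of unsmoothed sentence-level BLEU (single reference, up to
# 4-grams, brevity penalty). Block names below are hypothetical, not Entry's.


def ngram_counts(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(reference, candidate, max_n=4):
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand = ngram_counts(candidate, n)
        ref = ngram_counts(reference, n)
        # Clipped matches: a candidate n-gram counts at most as often as in the reference.
        matches = sum(min(count, ref[gram]) for gram, count in cand.items())
        total = sum(cand.values())
        if matches == 0 or total == 0:
            return 0.0  # no smoothing: any zero precision zeroes the score
        log_precision_sum += math.log(matches / total)
    # Brevity penalty discourages candidates shorter than the reference.
    bp = 1.0 if len(candidate) > len(reference) else math.exp(1 - len(reference) / len(candidate))
    return bp * math.exp(log_precision_sum / max_n)


reference = ["when_start", "repeat", "move", "turn", "end_repeat"]
near_miss = ["when_start", "repeat", "move", "turn", "move"]
too_short = ["when_start", "move", "turn"]
print(sentence_bleu(reference, reference))  # 1.0
print(sentence_bleu(reference, near_miss))
print(sentence_bleu(reference, too_short))  # 0.0
```

Because unsmoothed BLEU needs at least one matching 4-gram, very short sequences score 0.0 regardless of quality, which may be one reason scores behave differently at extreme block counts such as 1.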

Wafer bin map failure pattern recognition using hierarchical clustering (계층적 군집분석을 이용한 반도체 웨이퍼의 불량 및 불량 패턴 탐지)

  • Jeong, Joowon;Jung, Yoonsuh
    • The Korean Journal of Applied Statistics, v.35 no.3, pp.407-419, 2022
  • The semiconductor fabrication process is complex and time-consuming. Errors sometimes occur in the process, resulting in defective dies on the wafer bin map (WBM). A faulty WBM can be detected by finding patterns formed by the defective dies. Searching for failures on WBMs manually takes a long time because of the enormous number of WBMs. In this paper, we suggest a two-step approach to discovering probable patterns on WBMs. The first step separates normal WBMs from defective WBMs. We adapt hierarchical clustering for de-noising, which performs this task well when the minimum number of points and the cutting height are tuned carefully. A WBM declared faulty moves on to the next step. In the second step, we classify the patterns among the defective WBMs: we extract features from each WBM, and a machine learning algorithm then classifies the pattern. We use a real WBM data set (WM-811K) released by Taiwan Semiconductor Manufacturing Company.
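The first step de-noises a wafer map with hierarchical clustering tuned by a minimum number of points and a cutting height. Assuming single linkage (the abstract does not name the linkage), cutting the dendrogram at height h is equivalent to grouping defective dies that are chained together by gaps of at most h, so a union-find over close pairs gives the same clusters. Clusters smaller than `min_pts` are treated as random noise. The `is_faulty` rule, the parameter values, and the die coordinates are hypothetical.

```python
import math

# Minimal sketch, assuming single-linkage clustering: cutting its dendrogram at
# height h groups dies chained together by gaps of at most h.
# Clusters with fewer than min_pts dies are treated as random noise.
# Coordinates are hypothetical (row, col) positions of failed dies.


def denoise_defects(points, cut_height, min_pts):
    parent = list(range(len(points)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= cut_height:
                parent[find(i)] = find(j)

    clusters = {}
    for i, p in enumerate(points):
        clusters.setdefault(find(i), []).append(p)
    return [c for c in clusters.values() if len(c) >= min_pts]


def is_faulty(points, cut_height=1.5, min_pts=3):
    """Hypothetical step-1 rule: flag the WBM if any pattern survives de-noising."""
    return bool(denoise_defects(points, cut_height, min_pts))


scratch = [(5, 1), (5, 2), (5, 3), (5, 4)]  # a short line of failed dies
noise = [(0, 9), (9, 0)]                     # isolated random failures
print(denoise_defects(scratch + noise, cut_height=1.5, min_pts=3))
print(is_faulty(noise))  # False
```

The pairwise loop is O(n²), which is fine for the few hundred failed dies on a typical map; the surviving clusters are what the second step would turn into features for pattern classification.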

Real-time flood prediction applying random forest regression model in urban areas (랜덤포레스트 회귀모형을 적용한 도시지역에서의 실시간 침수 예측)

  • Kim, Hyun Il;Lee, Yeon Su;Kim, Byunghyun
    • Journal of Korea Water Resources Association, v.54 no.spc1, pp.1119-1130, 2021
  • Urban flooding caused by localized heavy rainfall under an unstable climate occurs constantly, but a system that can predict spatial flood information from weather forecasts has not yet been established. The worst flood situations in urban areas can occur when structural measures, such as river levees, the discharge capacity of urban sewers, stormwater storage basins, and pump facilities, reach their limits. Identifying spatial flood information in advance, however, can have a decisive effect on minimizing flood damage. Therefore, this study presents a methodology for predicting urban flood maps in real time using rainfall data from the Korea Meteorological Administration (KMA), the results of a two-dimensional flood analysis, and a random forest (RF) regression model. The Ujeong district in Ulsan Metropolitan City, where floods occur frequently, was selected as the study area. The RF regression model predicted flood maps for rainfall events of 50 mm, 80 mm, and 110 mm with a 6-hour duration, and the predicted results showed 63%, 80%, and 67% goodness of fit, respectively, compared with the results of the two-dimensional flood analysis model. The results of this study are expected to be usable as basic data for evacuation and response to sudden urban flooding.
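The abstract reports "goodness of fit" between RF-predicted and 2D-model flood maps without defining it. One common way to compare two flood extents, assumed here for illustration, is the ratio of cells wet in both maps to cells wet in either map. The wet-depth threshold and the depth grids are hypothetical.

```python
# Minimal sketch of one common flood-extent agreement score:
# F = |cells wet in both| / |cells wet in either|.
# The abstract does not define its "goodness of fit", so this metric, the wet
# threshold and the grids below are assumptions for illustration.


def wet_cells(depth_grid, threshold):
    """Set of (row, col) cells whose depth exceeds the wet threshold."""
    return {(r, c) for r, row in enumerate(depth_grid)
            for c, depth in enumerate(row) if depth > threshold}


def flood_fit(pred_depth, ref_depth, threshold=0.0):
    pred = wet_cells(pred_depth, threshold)
    ref = wet_cells(ref_depth, threshold)
    either = pred | ref
    # Two completely dry maps agree perfectly.
    return len(pred & ref) / len(either) if either else 1.0


rf_map = [[0.3, 0.2, 0.0],
          [0.0, 0.1, 0.0]]     # hypothetical RF-predicted depths (m)
model_2d = [[0.4, 0.0, 0.0],
            [0.0, 0.2, 0.1]]   # hypothetical 2D flood-model depths (m)
print(flood_fit(rf_map, model_2d))  # 0.5
```

This score penalizes both over- and under-prediction of the flooded area but ignores depth errors inside cells that both maps call wet; a depth-based measure such as RMSE would complement it.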