• Title/Summary/Keyword: Machine Learning


Classification Modeling for Predicting Medical Subjects using Patients' Subjective Symptom Text (환자의 주관적 증상 텍스트에 대한 진료과목 분류 모델 구축)

  • Lee, Seohee; Kang, Juyoung
    • The Journal of Bigdata / v.6 no.1 / pp.51-62 / 2021
  • In medical artificial intelligence, much research has addressed disease prediction and classification algorithms that support doctors' judgment, but relatively little attention has been paid to artificial intelligence that helps medical consumers acquire and evaluate information. More than 150,000 questions about which hospital to visit were asked on the NAVER portal over the past year, which attests to the need for medical information suited to consumers. In this study, we therefore built a model that classifies symptom text written directly by patients, collected from the NAVER portal, into 8 medical subjects, to help consumers choose the appropriate medical subject for their symptoms. To ensure the validity of data based on patients' subjective descriptions, we measured the similarity between objective symptom text (typical symptoms by medical subject, organized by the Seoul Emergency Medical Information Center) and subjective symptom text (NAVER data). The similarity measurements showed that two texts describing symptoms of the same medical subject were relatively more similar than symptom texts from different medical subjects. Following this procedure, a classification model was built with a ridge regression model on the validated subjective symptom text, yielding an accuracy of 0.73.
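
The similarity check described in this abstract can be illustrated with a minimal Python sketch. The abstract does not name the similarity measure, so cosine similarity over whitespace-tokenized word counts is an assumption here, and the example texts are hypothetical rather than taken from the study's data.

```python
import math
from collections import Counter


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity between bag-of-words count vectors (assumed measure)."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    # Dot product over the tokens the two texts share.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction to compare
    return dot / (norm_a * norm_b)


# Hypothetical texts: one "objective" symptom list and two patient descriptions.
objective = "chest pain shortness of breath palpitations"
same_subject = "my chest hurts and i feel shortness of breath"
other_subject = "itchy red rash on my arm"

sim_same = cosine_similarity(objective, same_subject)
sim_other = cosine_similarity(objective, other_subject)
print(sim_same > sim_other)
```

For Korean text, a morphological tokenizer would likely replace the whitespace split used here; that choice is outside what the abstract states.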

A Study on the Forecasting Trend of Apartment Prices: Focusing on Government Policy, Economy, Supply and Demand Characteristics (아파트 매매가 추이 예측에 관한 연구: 정부 정책, 경제, 수요·공급 속성을 중심으로)

  • Lee, Jung-Mok; Choi, Su An; Yu, Su-Han; Kim, Seonghun; Kim, Tae-Jun; Yu, Jong-Pil
    • The Journal of Bigdata / v.6 no.1 / pp.91-113 / 2021
  • Despite the influence of real estate in the Korean asset market, its trends are not easy to predict. Apartments are particularly hard to predict because they are both residential spaces and investment assets. The factors affecting apartment prices vary, and regional characteristics must also be considered. This study compared the factors and characteristics that affect apartment prices in Seoul as a whole, in the 3 Gangnam districts, and in the Nowon, Dobong, Gangbuk, Geumcheon, Gwanak, and Guro districts, and assessed the possibility of price prediction on that basis. The analysis used machine learning algorithms such as neural networks, CHAID, linear regression, and random forests. The most important factor affecting the average selling price of all apartments in Seoul was government policy, and easing policies, such as relaxed transaction regulations and relaxed financial regulations, were highly influential. In the three Gangnam districts, the influence of policy was low, and in Gangnam-gu, housing supply was the most important factor. In contrast, in the 6 mid-to-lower-level districts, government policies acted as important variables, and these districts were commonly influenced by financial regulatory policies.

A Study on the Estimation of the Threshold Rainfall in Standard Watershed Units (표준유역단위 한계강우량 산정에 관한 연구)

  • Choo, Kyung-Su; Kang, Dong-Ho; Kim, Byung-Sik
    • Journal of Korean Society of Disaster and Security / v.14 no.2 / pp.1-11 / 2021
  • In Korea, the risk of meteorological disasters has recently been increasing due to climate change, and damage caused by rainfall is receiving continued attention. Although current weather forecasts provide quantitative rainfall, predicting the extent of damage remains difficult. To understand the impact of damage, the threshold rainfall for each watershed is therefore required. Rainfall damage differs by region, and analyses that consider the characteristic factors of each watershed have limitations. In addition, running a rainfall-runoff analysis through a hydrological model for every rainfall event takes a lot of time, so analysis often relies on simple rainfall data alone. This study used GIS data and coupled two hydrologic models to calculate the threshold rainfall from the threshold runoff that causes flooding. The results were verified against actual cases, and the analysis showed that damage generally occurred in the areas identified as dangerous. This study should make it possible to prepare for flood risk areas in advance, and accuracy is expected to increase if machine learning analysis methods are added.

Variation of Seasonal Groundwater Recharge Analyzed Using Landsat-8 OLI Data and a CART Algorithm (CART알고리즘과 Landsat-8 위성영상 분석을 통한 계절별 지하수함양량 변화)

  • Park, Seunghyuk; Jeong, Gyo-Cheol
    • The Journal of Engineering Geology / v.31 no.3 / pp.395-432 / 2021
  • Groundwater recharge rates vary widely by location and with time. They are difficult to measure directly and are thus often estimated using simulations. This study employed frequency and regression analysis and a classification and regression tree (CART) algorithm, a machine learning method, to estimate groundwater recharge. The CART algorithm considered the distribution of precipitation by subbasin (PCP), geomorphological data, indices of the relationship between vegetation and land use, and soil type. The geomorphological data were the digital elevation model (DEM), surface slope (SLOP), and surface aspect (ASPT), and the indices were the perpendicular vegetation index (PVI), normalized difference vegetation index (NDVI), normalized difference tillage index (NDTI), and normalized difference residue index (NDRI). The spatio-temporal distribution of groundwater recharge from the SWAT-MOD-FLOW program was classified into four groups. In R, the data were randomly sampled and a CART model was trained to predict groundwater recharge from modified PVI, NDVI, NDTI, NDRI, PCP, and geomorphological data. To assess inter-rater reliability for the four groundwater recharge groups, the Kappa coefficient, overall accuracy, and confusion matrix were calculated using K-fold cross-validation. The model obtained a Kappa coefficient of 0.3-0.6 and an overall accuracy of 0.5-0.7, indicating that the proposed model for estimating groundwater recharge with respect to soil type and vegetation cover is quite reliable.
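
To make the two reliability measures reported in this abstract concrete, here is a minimal Python sketch that computes overall accuracy and Cohen's Kappa from a confusion matrix. The matrices are hypothetical and are not the study's results; the study itself ran its analysis in R.

```python
def overall_accuracy(cm):
    """Share of samples on the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / total


def cohen_kappa(cm):
    """Cohen's Kappa: agreement corrected for chance agreement."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    p_observed = sum(cm[i][i] for i in range(n)) / total
    row_totals = [sum(cm[i]) for i in range(n)]
    col_totals = [sum(cm[i][j] for i in range(n)) for j in range(n)]
    # Agreement expected by chance, from the row and column marginals.
    p_expected = sum(row_totals[k] * col_totals[k] for k in range(n)) / total**2
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical two-class confusion matrix (rows: actual, columns: predicted).
cm_example = [[20, 5],
              [10, 15]]
print(overall_accuracy(cm_example), cohen_kappa(cm_example))

# Hypothetical four-group matrix with perfect agreement.
cm_perfect = [[5, 0, 0, 0],
              [0, 5, 0, 0],
              [0, 0, 5, 0],
              [0, 0, 0, 5]]
print(cohen_kappa(cm_perfect))
```

In the first matrix the observed agreement is 0.7 and the chance agreement is 0.5, so Kappa is 0.4. The sketch does not handle the degenerate case where chance agreement equals 1.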

Analysis and Prediction of Trends for Future Education Reform Centering on the Keyword Extraction from the Research for the Last Two Decades (미래교육 혁신을 위한 트렌드 분석과 예측: 20년간의 문헌 연구 데이터를 기반으로 한 키워드 추출 분석을 중심으로)

  • Jho, Hunkoog
    • Journal of Science Education / v.45 no.2 / pp.156-171 / 2021
  • This study investigates how trends in future education have changed over time through a literature review, and examines the accuracy of the framework for forecasting future education proposed in previous studies by comparing the outcomes of the literature review with those of media articles. To this end, the study collected articles on future education from the Web of Science and categorized them into four periods of the new millennium. News articles from the media were selected to capture the present state of education, so that the appropriateness of the proposed framework for predicting the future of education could be assessed. The findings reveal that no gradual tendencies in topics could be found except for teacher education, and that topics ranged from the characteristics of agents (students and teachers) to the curriculum and pedagogical strategies. In contrast, the analysis of media articles focused more on government-launched projects and immediate responses to COVID-19, as well as educational technologies related to big data and artificial intelligence. Surprisingly, only a few keywords from the literature review appear in the latest articles, and many keywords in those articles had not been discussed before. This indicates that the predictive framework is not effective for establishing long-term plans for education because of the uncertainty of the educational environment, and the study therefore offers some implications for developing a model to forecast the future of education.

Vulnerability Assessment for Fine Particulate Matter (PM2.5) in the Schools of the Seoul Metropolitan Area, Korea: Part I - Predicting Daily PM2.5 Concentrations (인공지능을 이용한 수도권 학교 미세먼지 취약성 평가: Part I - 미세먼지 예측 모델링)

  • Son, Sanghun; Kim, Jinsoo
    • Korean Journal of Remote Sensing / v.37 no.6_2 / pp.1881-1890 / 2021
  • Particulate matter (PM) affects humans, ecosystems, and weather. Motorized vehicles and combustion generate fine particulate matter (PM2.5), which can contain toxic substances and therefore requires systematic management. Consequently, it is important to monitor and predict PM2.5 concentrations, especially in large cities with dense populations and infrastructure. This study aimed to predict PM2.5 concentrations in large cities using meteorological and chemical variables as well as satellite-based aerosol optical depth. To predict PM2.5 concentrations, we selected a random forest (RF) model, which has shown excellent performance in PM concentration prediction among machine learning models. The model achieved training values of 0.97, 3.09, 2.18, and 13.31 and testing values of 0.82, 6.03, 4.36, and 25.79 for the performance indicators R2, RMSE, MAE, and MAPE, respectively. The variables used in this study showed high correlation with PM2.5 concentrations. Therefore, we conclude that these variables can be used in a random forest model to generate reliable PM2.5 concentration predictions, which can then be used to assess the vulnerability of schools to PM2.5.
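
The four performance indicators named in this abstract can be computed as in the following Python sketch. It assumes that R2 is the coefficient of determination and that MAPE is expressed in percent; the abstract does not spell out either convention, and the sample values are hypothetical.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R2, RMSE, MAE, and MAPE (percent) for paired observations."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)                  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total sum of squares
    return {
        "R2": 1 - ss_res / ss_tot,
        "RMSE": math.sqrt(ss_res / n),
        "MAE": sum(abs(e) for e in errors) / n,
        # MAPE is undefined when an observed value is zero.
        "MAPE": 100 * sum(abs(e) / abs(t) for e, t in zip(errors, y_true)) / n,
    }


# Hypothetical observed and predicted daily PM2.5 concentrations.
observed = [10, 20, 30, 40]
predicted = [12, 18, 33, 37]
metrics = regression_metrics(observed, predicted)
print(metrics)
```

RMSE and MAE carry the units of the target variable, while R2 is unitless and MAPE is a percentage, which is why the four reported numbers sit on such different scales.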

K-means clustering analysis and differential protection policy according to 3D NAND flash memory error rate to improve SSD reliability

  • Son, Seung-Woo; Kim, Jae-Ho
    • Journal of the Korea Society of Computer and Information / v.26 no.11 / pp.1-9 / 2021
  • 3D NAND flash memory provides high capacity per unit area by stacking 2D NAND cells that have a planar structure. However, due to the nature of the lamination process, the frequency of errors may vary with the layer or the physical cell location. This phenomenon becomes more pronounced as the number of write/erase (P/E) operations on the flash memory increases. Most flash-based storage devices, such as SSDs, use ECC for error correction. Because this method provides a fixed strength of data protection for all flash memory pages, it has limitations in 3D NAND flash memory, where the error rate varies with physical location. In this paper, pages and layers with different error rates are therefore classified into clusters by the K-means machine learning algorithm, and a differentiated data protection strength is applied to each cluster. We classify pages and layers by the number of errors measured after an endurance test, at which point the error rate varies significantly across pages and layers, and add parity data to stripes in error-prone areas to provide differentiated data protection strength. We show that this differentiated data protection policy can potentially improve the reliability and lifespan of 3D NAND flash memory compared with protection techniques that use a RAID-like scheme or ECC alone.
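
As an illustration of the clustering step described in this abstract, the following Python sketch groups pages by error count with a one-dimensional K-means (Lloyd's algorithm). The error counts, the choice of three clusters, and the deterministic centroid initialization are all assumptions made for the example; the abstract does not give these details.

```python
def kmeans_1d(values, k, max_iters=100):
    """Cluster scalar values with Lloyd's algorithm; returns (centroids, labels)."""
    ordered = sorted(values)
    # Deterministic initialization (an assumption): evenly spaced sorted samples.
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]

    def nearest(value):
        return min(range(k), key=lambda c: abs(value - centroids[c]))

    for _ in range(max_iters):
        clusters = [[] for _ in range(k)]
        for value in values:
            clusters[nearest(value)].append(value)
        # Recompute each centroid as the mean of its members; keep empty ones.
        updated = [sum(c) / len(c) if c else centroids[i]
                   for i, c in enumerate(clusters)]
        if updated == centroids:
            break  # assignments are stable
        centroids = updated

    return centroids, [nearest(value) for value in values]


# Hypothetical per-page error counts measured after an endurance test.
error_counts = [3, 4, 5, 4, 40, 42, 38, 90, 95]
centroids, labels = kmeans_1d(error_counts, k=3)
print(centroids)
print(labels)
```

A protection policy could then assign more parity to the cluster with the highest centroid, which is the direction the abstract describes, though the exact mapping from cluster to protection strength is not stated there.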

Character Motion Control by Using Limited Sensors and Animation Data (제한된 모션 센서와 애니메이션 데이터를 이용한 캐릭터 동작 제어)

  • Bae, Tae Sung; Lee, Eun Ji; Kim, Ha Eun; Park, Minji; Choi, Myung Geol
    • Journal of the Korea Computer Graphics Society / v.25 no.3 / pp.85-92 / 2019
  • A 3D virtual character that plays a role in digital storytelling has a unique style in its appearance and motion. Because the style reflects the character's personality, it is very important to preserve the style and keep it consistent. However, when the character's motion is directly controlled by the motion of a user wearing motion sensors, the unique style can be lost. We present a novel character motion control method that preserves the style of the character's motion using only a small amount of animation data created specifically for the character. Instead of machine learning approaches that require a large amount of training data, we propose a search-based method that directly searches the animation data for the character pose most similar to the user's current pose. To show the usability of our method, we conducted experiments with a character model and its animation data created by an expert designer for a virtual reality game. To show that our method preserves the character's original motion style well, we compared our result with the result obtained using general human motion capture data. In addition, to show the scalability of our method, we present experimental results with different numbers of motion sensors.
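
The search-based idea in this abstract can be sketched in Python as a nearest-neighbor lookup over stored animation poses. The pose representation (a flat list of sensor coordinates) and the squared Euclidean distance are assumptions for illustration; the abstract does not specify how pose similarity is measured.

```python
def squared_distance(pose_a, pose_b):
    """Squared Euclidean distance between two equally sized pose vectors."""
    return sum((a - b) ** 2 for a, b in zip(pose_a, pose_b))


def find_most_similar_pose(user_pose, animation_poses):
    """Return the index of the stored pose closest to the user's pose."""
    return min(range(len(animation_poses)),
               key=lambda i: squared_distance(user_pose, animation_poses[i]))


# Hypothetical data: each pose holds (x, y, z) for two sensors, flattened.
animation_poses = [
    [0.0, 1.0, 0.0, 0.0, 1.5, 0.0],   # standing
    [0.3, 0.8, 0.1, 0.4, 1.2, 0.2],   # leaning
    [0.0, 0.5, 0.0, 0.0, 0.9, 0.0],   # crouching
]
user_pose = [0.05, 0.55, 0.0, 0.0, 0.95, 0.05]

best_index = find_most_similar_pose(user_pose, animation_poses)
print(best_index)
```

Because the search only reads the sensor coordinates that are present, the same lookup works with fewer or more sensors as long as the stored poses use the same layout.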

Improvement of precipitation forecasting skill of ECMWF data using multi-layer perceptron technique (다층퍼셉트론 기법을 이용한 ECMWF 예측자료의 강수예측 정확도 향상)

  • Lee, Seungsoo; Kim, Gayoung; Yoon, Soonjo; An, Hyunuk
    • Journal of Korea Water Resources Association / v.52 no.7 / pp.475-482 / 2019
  • Subseasonal-to-Seasonal (S2S) prediction information, with lead times of 2 weeks to 2 months, is expected to be used across many industries, but its usability has not met expectations because its predictability is lower than that of weather forecasts and mid- and long-term forecasts. In this study, we used a multi-layer perceptron (MLP), a machine learning technique, built for regression to improve the predictability of S2S precipitation data for South Korea through post-processing. ECMWF hindcast information was used for MLP training, and the original data were compared with the trained outputs using the dichotomous forecast technique. As a result, the Bias score, accuracy, and Critical Success Index (CSI) of the trained outputs improved on average by 59.7%, 124.3%, and 88.5%, respectively. The Probability of Detection (POD) score decreased on average by 9.5%, which was attributed to the ECMWF model predicting too many precipitation days. We confirmed that the predictability of ECMWF's S2S information can be improved by post-processing with an MLP even when the predictability of the original data is low. The results can be used to increase the usefulness of S2S information in the water resource and agricultural fields.
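
The dichotomous (yes/no) verification scores named in this abstract follow from a 2x2 contingency table of forecast versus observed precipitation days. The Python sketch below uses the standard definitions of these scores; the counts are hypothetical and are not the study's data.

```python
def dichotomous_scores(hits, misses, false_alarms, correct_negatives):
    """Bias score, accuracy, CSI, and POD from a 2x2 contingency table."""
    total = hits + misses + false_alarms + correct_negatives
    observed_yes = hits + misses
    return {
        # Ratio of forecast "yes" events to observed "yes" events.
        "bias": (hits + false_alarms) / observed_yes,
        # Fraction of all forecasts that were correct.
        "accuracy": (hits + correct_negatives) / total,
        # Hits relative to all events that were forecast or observed.
        "csi": hits / (hits + misses + false_alarms),
        # Fraction of observed events that were forecast.
        "pod": hits / observed_yes,
    }


# Hypothetical counts of precipitation-day forecasts.
scores = dichotomous_scores(hits=30, misses=10, false_alarms=20,
                            correct_negatives=40)
print(scores)
```

A bias score above 1 means precipitation was forecast on more days than it was observed. Forecasting rain too often tends to raise POD while lowering accuracy and CSI, which is consistent with the abstract's explanation for why POD fell after post-processing.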

Prediction Model of User Physical Activity using Data Characteristics-based Long Short-term Memory Recurrent Neural Networks

  • Kim, Joo-Chang; Chung, Kyungyong
    • KSII Transactions on Internet and Information Systems (TIIS) / v.13 no.4 / pp.2060-2077 / 2019
  • Recently, mobile healthcare services have attracted significant attention because of the emerging development and supply of diverse wearable devices. Smartwatches and health bands are the most common type of mobile-based wearable devices and their market size is increasing considerably. However, simple value comparisons based on accumulated data have revealed certain problems, such as the standardized nature of health management and the lack of personalized health management service models. The convergence of information technology (IT) and biotechnology (BT) has shifted the medical paradigm from continuous health management and disease prevention to the development of a system that can be used to provide ground-based medical services regardless of the user's location. Moreover, the IT-BT convergence has necessitated the development of lifestyle improvement models and services that utilize big data analysis and machine learning to provide mobile healthcare-based personal health management and disease prevention information. Users' health data, which are specific as they change over time, are collected by different means according to the users' lifestyle and surrounding circumstances. In this paper, we propose a prediction model of user physical activity that uses data characteristics-based long short-term memory (DC-LSTM) recurrent neural networks (RNNs). To provide personalized services, the characteristics and surrounding circumstances of data collectable from mobile host devices were considered in the selection of variables for the model. The data characteristics considered were ease of collection, which represents whether or not variables are collectable, and frequency of occurrence, which represents whether or not changes made to input values constitute significant variables in terms of activity. 
The variables selected for providing personalized services were activity, weather, temperature, mean daily temperature, humidity, UV, fine dust, asthma and lung disease probability index, skin disease probability index, cadence, travel distance, mean heart rate, and sleep hours. The selected variables were classified according to the data characteristics. To predict activity, an LSTM RNN was built that uses the classified variables as input data and learns the dynamic characteristics of time series data. LSTM RNNs resolve the vanishing gradient problem that occurs in existing RNNs. They are classified into three different types according to data characteristics and constructed through connections among the LSTMs. The constructed neural network learns training data and predicts user activity. To evaluate the proposed model, the root mean square error (RMSE) was used in the performance evaluation of the user physical activity prediction method for which an autoregressive integrated moving average (ARIMA) model, a convolutional neural network (CNN), and an RNN were used. The results show that the proposed DC-LSTM RNN method yields an excellent mean RMSE value of 0.616. The proposed method is used for predicting significant activity considering the surrounding circumstances and user status utilizing the existing standardized activity prediction services. It can also be used to predict user physical activity and provide personalized healthcare based on the data collectable from mobile host devices.