• Title/Summary/Keyword: model predictive

Predicting the Progression of Chronic Renal Failure using Serum Creatinine factored for Height (소아 만성신부전의 진행 예측에 관한 연구)

  • Kim, Kyo-Sun;We, Harmon
    • Childhood Kidney Diseases / v.4 no.2 / pp.144-153 / 2000
  • Purpose: Efforts to predict the progression of chronic renal failure (CRF) in children, using mathematical models based on transformations of serum creatinine (Scr) concentration, have failed. Error may be introduced by age-related variations in creatinine production rate. Height (Ht) is a reliable reference for creatinine production in children. Thus, Scr factored for Ht could provide a more accurate predictive model. We examined this hypothesis. Methods: The progression of CRF was detected in 63 children who proceeded to end-stage renal disease. Derivatives of Scr, including 1/Scr, log Scr and Ht/Scr, were defined for the period when Scr was between 2 and 5 mg/dl. Regression equations were used to predict the time, in months, to Scr > 10 mg/dl. The prediction error (PE) was defined as the predicted time minus the actual time for each Scr transformation. Results: The PE for Ht/Scr was lower than the PE for either 1/Scr or log Scr (median: -0.01, -2.0 and +10.6 months, respectively; P<0.0001). For children with congenital renal diseases, the PE for Ht/Scr was also lower than for the other two transformations (median: -1.2, -3.2 and +8.2 months, respectively; P<0.0001). However, the PEs for children with glomerular diseases were not as clearly different (median: +0.9, +0.5 and +9.9, respectively). In children < 13 yrs, the PE for Ht/Scr was the lowest, while in older children 1/Scr provided the lowest PE, although it was not significantly different from that for Ht/Scr. The logarithmic transformation tended to predict a slower progression of CRF than actually occurred. Conclusion: Scr, factored for Ht, appears to be a useful model to predict the rate of progression of CRF, particularly in the prepubertal child with congenital renal disease.
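The prediction step in the Methods above can be illustrated with a minimal sketch. It assumes (the abstract does not say so) that Ht/Scr falls roughly linearly with time, that height stays constant over the observation window, and that an ordinary least-squares line is extrapolated to the Ht/Scr value corresponding to Scr = 10 mg/dl. The function names and the patient data are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def predicted_months_to_threshold(months, scr, height_cm, scr_threshold=10.0):
    """Extrapolate the Ht/Scr trend to the month at which Scr reaches scr_threshold.

    Assumption: height is treated as constant, which is a simplification.
    """
    ratio = [height_cm / s for s in scr]
    a, b = fit_line(months, ratio)
    target = height_cm / scr_threshold
    return (target - a) / b


def prediction_error(predicted_months, actual_months):
    """PE as defined in the abstract: predicted time minus actual time."""
    return predicted_months - actual_months


# Hypothetical patient: Scr observed between 2 and 5 mg/dl over 18 months.
months = [0, 6, 12, 18]
scr = [2.0, 2.5, 10.0 / 3.0, 5.0]
predicted = predicted_months_to_threshold(months, scr, height_cm=120.0)
print(round(predicted, 2), round(prediction_error(predicted, 26.0), 2))
```

A negative PE means the model predicted that Scr > 10 mg/dl would be reached earlier than it actually was.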

Tissue Distribution of HuR Protein in Crohn's Disease and IBD Experimental Model (염증성 장질환 모델 및 크론병 환자에서의 점막상피 HuR 단백질의 변화 분석)

  • Choi, Hye Jin;Park, Jae-Hong;Park, Jiyeon;Kim, Juil;Park, Seong-Hwan;Oh, Chang Gyu;Do, Kee Hun;Song, Bo Gyoung;Lee, Seung Joon;Moon, Yuseok
    • Journal of Life Science / v.24 no.12 / pp.1339-1344 / 2014
  • Inflammatory bowel disease is an immune disorder associated with chronic mucosal inflammation and severe ulceration in the gastrointestinal tract. Antibodies against proinflammatory cytokines, including TNF${\alpha}$, are currently used as promising therapeutic agents against the disease. Stabilization of the transcript is a crucial post-transcriptional process in the expression of proinflammatory cytokines. In the present study, we assessed the expression and histological distribution of the HuR protein, an important transcript stabilizer, in tissues from experimental animals and patients with Crohn's disease. The total and cytosolic levels of the HuR protein were enhanced in the intestinal epithelia from dextran sodium sulfate (DSS)-treated mice compared to those in control tissues from normal mice. Moreover, the expression of HuR was very high only in the mucosal and glandular epithelium, and the relative localization of the protein was sequestered in the lower parts of the villus during the DSS insult. The expression of HuR was significantly higher in mucosal lesions than in normal-looking areas. Consistent with the data from the animal model, the expression of HuR was confined to the mucosal and glandular epithelium. These results suggest that HuR may contribute to the post-transcriptional regulation of proinflammatory genes during early mucosal insults. More mechanistic investigations are warranted to determine the potential use of HuR as a predictive biomarker or a promising target against IBD.

Impact of Sulfur Dioxide Impurity on Process Design of $CO_2$ Offshore Geological Storage: Evaluation of Physical Property Models and Optimization of Binary Parameter (이산화황 불순물이 이산화탄소 해양 지중저장 공정설계에 미치는 영향 평가: 상태량 모델의 비교 분석 및 이성분 매개변수 최적화)

  • Huh, Cheol;Kang, Seong-Gil;Cho, Mang-Ik
    • Journal of the Korean Society for Marine Environment & Energy / v.13 no.3 / pp.187-197 / 2010
  • Carbon dioxide Capture and Storage (CCS) is regarded as one of the most promising options for responding to climate change. CCS is a three-stage process consisting of the capture of carbon dioxide ($CO_2$), the transport of $CO_2$ to a storage location, and the long-term isolation of $CO_2$ from the atmosphere for the purpose of carbon emission mitigation. Up to now, process design for $CO_2$ marine geological storage has been carried out mainly for pure $CO_2$. Unfortunately, the $CO_2$ mixture captured from power plants and steel-making plants contains many impurities such as $N_2$, $O_2$, Ar, $H_2O$, $SO_2$ and $H_2S$. A small amount of impurities can change the thermodynamic properties and thereby significantly affect the compression, purification, transport and injection processes. In order to design a reliable $CO_2$ marine geological storage system, it is necessary to analyze the impact of these impurities on the whole CCS process at the initial design stage. The purpose of the present paper is to compare and analyse the relevant physical property models, including the BWRS, PR, PRBM, RKS and SRK equations of state and the NRTL-RK model, which are crucial tools for numerical process simulation. To evaluate the predictive accuracy of the equations of state for the $CO_2-SO_2$ mixture, we compared numerical calculation results with reference experimental data. In addition, an optimum binary parameter accounting for the interaction of $CO_2$ and $SO_2$ molecules was suggested based on the mean absolute percent error. In conclusion, we suggest the most reliable physical property model, with an optimized binary parameter, for designing the $CO_2-SO_2$ mixture marine geological storage process.
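Selecting a binary parameter by mean absolute percent error can be sketched as follows. This is only an illustration of the selection criterion: `toy_model` is a hypothetical stand-in, not any of the equations of state named in the abstract, and the candidate values and reference data are invented.

```python
def mape(predicted, reference):
    """Mean absolute percent error between predictions and reference data."""
    total = sum(abs((p - r) / r) for p, r in zip(predicted, reference))
    return 100.0 * total / len(reference)


def best_binary_parameter(candidates, model, inputs, reference):
    """Return the candidate parameter whose predictions minimize MAPE."""
    return min(
        candidates,
        key=lambda k: mape([model(x, k) for x in inputs], reference),
    )


def toy_model(x, k):
    """Hypothetical property model; NOT an equation of state."""
    return 10.0 * (1.0 - k * x)


# Invented "experimental" data, generated here with k = 0.05 for illustration.
inputs = [0.1, 0.2, 0.3]
reference = [toy_model(x, 0.05) for x in inputs]

best = best_binary_parameter([0.0, 0.05, 0.1], toy_model, inputs, reference)
print(best)
```

In practice the candidate set would be a finer grid or the output of a numerical optimizer, and `model` would wrap the property calculation of the chosen equation of state.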

Scalable Collaborative Filtering Technique based on Adaptive Clustering (적응형 군집화 기반 확장 용이한 협업 필터링 기법)

  • Lee, O-Joun;Hong, Min-Sung;Lee, Won-Jin;Lee, Jae-Dong
    • Journal of Intelligence and Information Systems / v.20 no.2 / pp.73-92 / 2014
  • An Adaptive Clustering-based Collaborative Filtering Technique is proposed to solve the fundamental problems of collaborative filtering, such as the cold-start, scalability and data sparsity problems. Previous collaborative filtering techniques make recommendations based on the predicted preference of a user for a particular item, using a similar item subset and a similar user subset composed from the preferences of users for items. For this reason, if the density of the user preference matrix is low, the reliability of the recommendation system decreases rapidly, and creating the similar item subset and similar user subset becomes more difficult. In addition, as the scale of service increases, the time needed to create a similar item subset and similar user subset increases geometrically, and the response time of the recommendation system increases accordingly. To solve these problems, this paper suggests a collaborative filtering technique that adapts a condition actively to the model and adopts the concepts of a context-based filtering technique. This technique consists of four major methodologies. First, the items and the users are clustered according to their feature vectors, and an inter-cluster preference between each item cluster and user cluster is then assumed. According to this method, the run-time for creating a similar item subset or user subset can be reduced, the reliability of the recommendation system can be made higher than when only the user preference information is used for creating a similar item subset or similar user subset, and the cold-start problem can be partially solved. Second, recommendations are made using the previously composed item and user clusters and the inter-cluster preference between each item cluster and user cluster.
In this phase, a list of items is made for users by examining the item clusters in the order of the size of the inter-cluster preference of the user cluster, in which the user belongs, and selecting and ranking the items according to the predicted or recorded user preference information. Using this method, the creation of a recommendation model phase bears the highest load of the recommendation system, and it minimizes the load of the recommendation system in run-time. Therefore, the scalability problem and large scale recommendation system can be performed with collaborative filtering, which is highly reliable. Third, the missing user preference information is predicted using the item and user clusters. Using this method, the problem caused by the low density of the user preference matrix can be mitigated. Existing studies on this used an item-based prediction or user-based prediction. In this paper, Hao Ji's idea, which uses both an item-based prediction and user-based prediction, was improved. The reliability of the recommendation service can be improved by combining the predictive values of both techniques by applying the condition of the recommendation model. By predicting the user preference based on the item or user clusters, the time required to predict the user preference can be reduced, and missing user preference in run-time can be predicted. Fourth, the item and user feature vector can be made to learn the following input of the user feedback. This phase applied normalized user feedback to the item and user feature vector. This method can mitigate the problems caused by the use of the concepts of context-based filtering, such as the item and user feature vector based on the user profile and item properties. The problems with using the item and user feature vector are due to the limitation of quantifying the qualitative features of the items and users. 
Therefore, the elements of the user and item feature vectors are made to match one to one, and if user feedback on a particular item is obtained, it will be applied to the feature vector using the opposite one. The method was verified by comparing its performance with existing hybrid filtering techniques. Two measures were used for verification: MAE (Mean Absolute Error) and response time. Using MAE, this technique was confirmed to improve the reliability of the recommendation system. Using the response time, this technique was found to be suitable for a large-scale recommendation system. This paper suggested an Adaptive Clustering-based Collaborative Filtering Technique with high reliability and low time complexity, but it had some limitations. This technique focused on reducing the time complexity. Hence, an improvement in reliability was not expected. The next topic will be to improve this technique by rule-based filtering.
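The inter-cluster preference and the MAE check described in this abstract can be sketched roughly as below. The abstract does not define how the inter-cluster preference is computed, so this sketch assumes it is the mean recorded rating between a user cluster and an item cluster; the cluster assignments, ratings and function names are all hypothetical.

```python
from collections import defaultdict


def inter_cluster_preference(ratings, user_cluster, item_cluster):
    """Mean recorded rating for each (user cluster, item cluster) pair.

    Assumption: a simple mean; the paper may define this differently.
    """
    totals = defaultdict(lambda: [0.0, 0])
    for (user, item), rating in ratings.items():
        key = (user_cluster[user], item_cluster[item])
        totals[key][0] += rating
        totals[key][1] += 1
    return {key: s / n for key, (s, n) in totals.items()}


def predict(user, item, ratings, prefs, user_cluster, item_cluster):
    """Recorded rating if present, else the inter-cluster preference (or None)."""
    if (user, item) in ratings:
        return ratings[(user, item)]
    return prefs.get((user_cluster[user], item_cluster[item]))


def mean_absolute_error(predicted, actual):
    """MAE, the accuracy measure used for verification in the abstract."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical data: two users in cluster A, two items in cluster X.
ratings = {("u1", "i1"): 4.0, ("u2", "i1"): 2.0, ("u1", "i2"): 3.0}
user_cluster = {"u1": "A", "u2": "A"}
item_cluster = {"i1": "X", "i2": "X"}

prefs = inter_cluster_preference(ratings, user_cluster, item_cluster)
print(predict("u2", "i2", ratings, prefs, user_cluster, item_cluster))
```

Because the cluster-level table is built once, filling in a missing preference at run time is a dictionary lookup, which is the kind of run-time saving the abstract attributes to clustering.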

A Study on People Counting in Public Metro Service using Hybrid CNN-LSTM Algorithm (Hybrid CNN-LSTM 알고리즘을 활용한 도시철도 내 피플 카운팅 연구)

  • Choi, Ji-Hye;Kim, Min-Seung;Lee, Chan-Ho;Choi, Jung-Hwan;Lee, Jeong-Hee;Sung, Tae-Eung
    • Journal of Intelligence and Information Systems / v.26 no.2 / pp.131-145 / 2020
  • In line with the trend of industrial innovation, IoT technology, utilized in a variety of fields, is emerging as a key element in the creation of new business models and the provision of user-friendly services in combination with big data. Data accumulated from Internet-of-Things (IoT) devices are being used in many ways to build convenience-oriented smart systems, as they enable customized intelligent systems through analysis of user environments and patterns. Recently, IoT has been applied to innovation in the public domain, including smart cities and smart transportation, for example in solving traffic and crime problems using CCTV. In particular, when planning underground services or establishing a passenger-flow control information system to enhance the convenience of citizens and commuters under congestion on public transportation such as subways and urban railways, it is necessary to comprehensively consider both the ease of securing real-time service data and the stability of security. However, previous studies that utilize image data are limited by privacy issues and by degraded object-detection performance under abnormal conditions. The IoT device-based sensor data used in this study are free from privacy issues because they do not require the identification of individuals, and they can be effectively utilized to build intelligent public services for unspecified people. We utilize IoT-based infrared sensor devices for an intelligent pedestrian tracking system in the metro service, which many people use on a daily basis, and the temperature data measured by the sensors are transmitted in real time.
The experimental environment for collecting data detected in real time from sensors was established for the equally-spaced midpoints of 4×4 upper parts in the ceiling of subway entrances where the actual movement amount of passengers is high, and it measured the temperature change for objects entering and leaving the detection spots. The measured data have gone through a preprocessing in which the reference values for 16 different areas are set and the difference values between the temperatures in 16 distinct areas and their reference values per unit of time are calculated. This corresponds to the methodology that maximizes movement within the detection area. In addition, the size of the data was increased by 10 times in order to more sensitively reflect the difference in temperature by area. For example, if the temperature data collected from the sensor at a given time were 28.5℃, the data analysis was conducted by changing the value to 285. As above, the data collected from sensors have the characteristics of time series data and image data with 4×4 resolution. Reflecting the characteristics of the measured, preprocessed data, we finally propose a hybrid algorithm that combines CNN in superior performance for image classification and LSTM, especially suitable for analyzing time series data, as referred to CNN-LSTM (Convolutional Neural Network-Long Short Term Memory). In the study, the CNN-LSTM algorithm is used to predict the number of passing persons in one of 4×4 detection areas. We verified the validation of the proposed model by taking performance comparison with other artificial intelligence algorithms such as Multi-Layer Perceptron (MLP), Long Short Term Memory (LSTM) and RNN-LSTM (Recurrent Neural Network-Long Short Term Memory). As a result of the experiment, proposed CNN-LSTM hybrid model compared to MLP, LSTM and RNN-LSTM has the best predictive performance. 
By utilizing the proposed devices and models, it is expected that various metro services, such as real-time monitoring of public transport facilities and congestion-based emergency response services, can be provided without legal issues concerning personal information. However, the data were collected from only one side of the entrances, and only data collected over a short period of time were applied to the prediction. A limitation therefore remains that the applicability of the model in other environments needs to be verified. In the future, the proposed model is expected to become more reliable if experimental data are sufficiently collected in various environments or if the training data are extended with measurements from other sensors.
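The preprocessing described in this abstract (a 4×4 grid of temperatures, a reference value per area, and a tenfold scaling such as 28.5℃ becoming 285) can be sketched as below. The function names and sample values are hypothetical, and rounding the scaled values to integers is an assumption made here for readability.

```python
def scale_temperature(temp_c, scale=10):
    """Scale a temperature reading, e.g. 28.5 -> 285 as in the abstract."""
    return round(temp_c * scale)


def preprocess_frame(frame, reference, scale=10):
    """Difference between each of the 16 areas and its reference, then scaled.

    frame, reference: 4x4 nested lists of temperatures in degrees Celsius.
    """
    return [
        [round((t - ref) * scale) for t, ref in zip(row, ref_row)]
        for row, ref_row in zip(frame, reference)
    ]


# Hypothetical reference grid (empty scene) and one frame with a warm object
# under the top-left sensor area.
reference = [[27.0] * 4 for _ in range(4)]
frame = [row[:] for row in reference]
frame[0][0] = 28.5

print(scale_temperature(28.5))
print(preprocess_frame(frame, reference))
```

A sequence of such 4×4 difference frames is what gives the data both an image-like and a time-series character, which is the stated motivation for combining CNN and LSTM.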

Long-term Prognostic Value of Dipyridamole Stress Myocardial SPECT (디피리다몰 부하 심근관류 SPECT의 장기예후 예측능)

  • Lee, Dong-Soo;Cheon, Gi-Jeong;Jang, Myung-Jin;Kang, Won-Jun;Chung, June-Key;Lee, Myoung-Mook;Lee, Myung-Chul;Kang, Wee-Chang;Lee, Young-Jo
    • The Korean Journal of Nuclear Medicine / v.34 no.1 / pp.39-54 / 2000
  • Purpose: Dipyridamole stress myocardial perfusion SPECT can predict prognosis; however, long-term follow-up showed a change in the hazard ratio in patients with suspected coronary artery disease. We investigated how long a normal SPECT could predict a benign prognosis on long-term follow-up. Materials and Methods: We followed up 1169 patients and divided them into those in whom coronary angiography was performed and those in whom it was not. Total cardiac event rate and hard event rate were predicted using clinical, angiographic and SPECT findings. Predictive values of normal and abnormal SPECT were examined using survival analysis with the Mantel-Haenszel method, multivariate Cox proportional hazard model analysis and a newly developed statistical method to test the time-invariance of the hazard rate and the changing point of this rate. Results: Reversible perfusion decrease on myocardial perfusion SPECT predicted a higher total cardiac event rate independently of, and incrementally to, angiographic findings. However, myocardial SPECT showed independent but not incremental prognostic value for the hard event rate. The hazard ratio of normal perfusion SPECT changed significantly (p<0.001), and the changing point of the hazard rate was at 4.4 years of follow-up. However, the ratio of abnormal SPECT did not change. Conclusion: Dipyridamole stress myocardial perfusion SPECT provided independent prognostic information in patients with known and suspected coronary artery disease. Normal perfusion SPECT predicted the lowest event rate for 4.4 years.

Effects of Nutritional Status, Activities Daily Living, Instruments Activities Daily Living, and Social Network on the Life Satisfaction of the Elderly in Home (재가노인의 영양상태, 일상생활 수행능력, 도구적 일상생활 수행능력 및 사회적 연결망이 삶의 만족도에 미치는 영향)

  • Yang, Kyoung Mi
    • Journal of the Korean Applied Science and Technology / v.36 no.4 / pp.1472-1484 / 2019
  • This study aimed to verify the effects of nutritional status, K-ADL, K-IADL, and social network on the life satisfaction of the elderly living at home. A total of 213 subjects participated in this study, and their average age was 71.38±5.59. Using SPSS 21.0, this study examined the differences between variables according to the general characteristics, and then verified the correlations between the independent variables of nutritional status, K-ADL, K-IADL and social network (family networks, friends networks) and life satisfaction. To verify the factors affecting the life satisfaction of the elderly living at home, stepwise multiple regression analysis was conducted. Regarding the general characteristics, life satisfaction showed statistically significant differences according to education (F=5.280, p=.002), economic condition (F=22.407, p<.001), monthly income (F=3.181, p=.015), and subjective health status (F=14.933, p<.001). In the correlation analysis, life satisfaction showed positive correlations with family networks (r=.268, p<.001) and friends networks (r=.286, p<.001), while nutritional status (r=-.222, p=.001), K-IADL (r=-.235, p=.001), and interdependent social support (r=-.283, p<.001) showed negative correlations. The predictive factors for the life satisfaction of the elderly living at home included economic condition (β=.358, p<.001), subjective health status (β=.245, p<.001), interdependent social support (β=-.158, p=.009), and K-IADL (β=-.153, p=.012), and the explanatory power was 30.1%. The regression model was statistically significant (F=23.778, p<.001).
Based on such results of this study, it would be necessary to develop programs that could maintain and improve the health of the elderly, and also provide financial support to the elderly suffering from economic hardship, in order to improve the life satisfaction of the elderly in home. Moreover, there should be the concrete measures for vitalizing the community-connected activities for interdependent social support.

Serum Tumor Marker Levels might have Little Significance in Evaluating Neoadjuvant Treatment Response in Locally Advanced Breast Cancer

  • Wang, Yu-Jie;Huang, Xiao-Yan;Mo, Miao;Li, Jian-Wei;Jia, Xiao-Qing;Shao, Zhi-Min;Shen, Zhen-Zhou;Wu, Jiong;Liu, Guang-Yu
    • Asian Pacific Journal of Cancer Prevention / v.16 no.11 / pp.4603-4608 / 2015
  • Background: To determine the potential value of serum tumor markers in predicting pCR (pathological complete response) during neoadjuvant chemotherapy. Materials and Methods: We retrospectively monitored the pre-, mid-, and post-neoadjuvant treatment serum tumor marker concentrations in patients with locally advanced breast cancer (stage II-III) who accepted pre-surgical chemotherapy or chemotherapy in combination with targeted therapy at Fudan University Shanghai Cancer Center between September 2011 and January 2014 and investigated the association of serum tumor marker levels with therapeutic effect. Core needle biopsy samples were assessed using immunohistochemistry (IHC) prior to neoadjuvant treatment to determine hormone receptor, human epidermal growth factor receptor 2 (HER2), and proliferation index Ki67 values. In our study, therapeutic response was evaluated by pCR, defined as the disappearance of all invasive cancer cells from excised tissue (including primary lesion and axillary lymph nodes) after completion of chemotherapy. Analysis of variance of repeated measures and receiver operating characteristic (ROC) curves were employed for statistical analysis of the data. Results: A total of 348 patients were recruited in our study after excluding patients with incomplete clinical information. Of these, 106 patients were observed to have acquired pCR status after treatment completion, accounting for approximately 30.5% of study individuals. In addition, 147 patients were determined to be Her-2 positive, among whom the pCR rate was 45.6% (69 patients). General linear model analysis (repeated measures analysis of variance) showed that the concentration of cancer antigen (CA) 15-3 increased after neoadjuvant chemotherapy in both pCR and non-pCR groups, and that there were significant differences between the two groups (P=0.008).
The areas under the ROC curves (AUCs) of pre-, mid-, and post-treatment CA15-3 concentrations demonstrated low-level predictive value (AUC=0.594, 0.644, 0.621, respectively). No significant differences in carcinoembryonic antigen (CEA) or CA12-5 serum levels were observed between the pCR and non-pCR groups (P=0.196 and 0.693, respectively). No efficient AUC of CEA or CA12-5 concentrations were observed to predict patient response toward neoadjuvant treatment (both less than 0.7), nor were differences between the two groups observed at different time points. We then analyzed the Her-2 positive subset of our cohort. Significant differences in CEA concentrations were identified between the pCR and non-pCR groups (P=0.039), but not in CA15-3 or CA12-5 levels (p=0.092 and 0.89, respectively). None of the ROC curves showed underlying prognostic value, as the AUCs of these three markers were less than 0.7. The ROC-AUCs for the CA12-5 concentrations of inter-and post-neoadjuvant chemotherapy in the estrogen receptor negative HER2 positive subgroup were 0.735 and 0.767, respectively. However, the specificity and sensitivity values were at odds with each other which meant that improving either the sensitivity or specificity would impair the efficiency of the other. Conclusions: Serum tumor markers CA15-3, CA12-5, and CEA might have little clinical significance in predicting neoadjuvant treatment response in locally advanced breast cancer.
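For readers unfamiliar with the AUC values quoted in this abstract, the following minimal sketch computes the area under the ROC curve from raw marker values as the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, counting ties as one half. The marker values are invented, and treating a higher marker level as indicating the positive class is an assumption of this sketch, not a statement about the study.

```python
def roc_auc(scores_positive, scores_negative):
    """AUC via pairwise comparison (equivalent to the Mann-Whitney U statistic)."""
    wins = 0.0
    for p in scores_positive:
        for n in scores_negative:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_positive) * len(scores_negative))


# Hypothetical marker concentrations for two response groups.
group_a = [12.0, 18.5, 22.0, 30.1]
group_b = [10.0, 15.0, 19.0, 25.0]
print(roc_auc(group_a, group_b))
```

An AUC of 0.5 corresponds to no discrimination between the groups and 1.0 to perfect separation, which is why the abstract treats values below 0.7 as low.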

A Study to Validate the Pretest Probability of Malignancy in Solitary Pulmonary Nodule (사전검사를 통한 고립성 폐결절 환자에서의 악성 확률 타당성에 대한 연구)

  • Jang, Joo Hyun;Park, Sung Hoon;Choi, Jeong Hee;Lee, Chang Youl;Hwang, Yong Il;Shin, Tae Rim;Park, Yong Bum;Lee, Jae Young;Jang, Seung Hun;Kim, Cheol Hong;Park, Sang Myeon;Kim, Dong Gyu;Lee, Myung Goo;Hyun, In Gyu;Jung, Ki Suck
    • Tuberculosis and Respiratory Diseases / v.67 no.2 / pp.105-112 / 2009
  • Background: Solitary pulmonary nodules (SPN) are encountered incidentally in 0.2% of patients who undergo chest X-ray or chest CT. Although SPN has malignant potential, surgical biopsy cannot be performed in all patients. The first stage is to determine whether patients with SPN require periodic observation, biopsy or resection. An important early step in the management of patients with SPN is to estimate the clinical pretest probability of malignancy. In every patient with SPN, it is recommended that clinicians estimate the pretest probability of malignancy either qualitatively using clinical judgment or quantitatively using a validated model. This study examined whether Bayesian analysis or multiple logistic regression analysis is more predictive of the probability of malignancy in SPN. Methods: From January 2005 to December 2008, this study enrolled 63 participants with SPN at the Kangnam Sacred Hospital. The accuracy of Bayesian analysis, Bayesian analysis with an FDG-PET scan, and multiple logistic regression analysis was compared retrospectively. The estimated probability of malignancy was compared against the chest CT and pathology findings of patients with an SPN of <30 mm found incidentally on CXR. Results: Of the study participants, 27 (42.9%) were classified as having a malignancy and 36 as benign. The result of the malignancy estimation by Bayesian analysis was 0.779 (95% confidence interval [CI], 0.657 to 0.874). Using multiple logistic regression analysis, the result was 0.684 (95% CI, 0.555 to 0.796). This suggests that Bayesian analysis provides a more accurate estimate than multiple logistic regression analysis. Conclusion: Bayesian analysis was better than multiple logistic regression analysis in predicting the probability of malignancy in solitary pulmonary nodules, but the difference was not statistically significant.
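The general idea behind a Bayesian estimate of malignancy probability can be sketched as an odds update with likelihood ratios. The abstract does not give the study's actual procedure or values, so the prior and the likelihood ratios below are purely hypothetical, and the sketch assumes the findings are conditionally independent.

```python
def posterior_probability(prior, likelihood_ratios):
    """Update a prior probability with likelihood ratios (odds form of Bayes' rule).

    Assumption: findings are conditionally independent, so their
    likelihood ratios can simply be multiplied.
    """
    odds = prior / (1.0 - prior)
    for lr in likelihood_ratios:
        odds *= lr
    return odds / (1.0 + odds)


# Hypothetical example: prior prevalence 0.5 and two findings with
# likelihood ratios 2.0 and 3.0 (illustrative values only).
print(round(posterior_probability(0.5, [2.0, 3.0]), 3))
```

A multiple logistic regression model, by contrast, estimates all coefficients jointly from the data rather than combining per-finding ratios.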

Online news-based stock price forecasting considering homogeneity in the industrial sector (산업군 내 동질성을 고려한 온라인 뉴스 기반 주가예측)

  • Seong, Nohyoon;Nam, Kihwan
    • Journal of Intelligence and Information Systems / v.24 no.2 / pp.1-19 / 2018
  • Since forecasting stock movements is an important issue both academically and practically, studies related to stock price prediction have been actively conducted. Stock price forecasting research is classified by whether it uses structured or unstructured data, and is further divided into technical analysis, fundamental analysis and media effect analysis. In the big data era, research on stock price prediction that incorporates big data is actively underway. Because it draws on large amounts of data, stock prediction research mainly focuses on machine learning techniques. In particular, research methods that incorporate the effects of media have recently been attracting attention, and studies that analyze online news and use it to forecast stock prices are becoming mainstream. Previous studies predicting stock prices through online news mostly perform sentiment analysis of news, building a different corpus for each company and a dictionary that predicts stock prices by recording responses according to past stock prices. Therefore, existing studies have examined the impact of online news on individual companies. For example, stock movements of Samsung Electronics are predicted with only online news of Samsung Electronics. In addition, a method of considering influences among highly relevant companies has also been studied recently. For example, stock movements of Samsung Electronics are predicted with news of Samsung Electronics and a highly related company like LG Electronics. These previous studies examine the effects of news from a homogeneous industrial sector on the individual company. In the previous studies, homogeneous industries are classified according to the Global Industrial Classification Standard. In other words, the existing studies were conducted under the assumption that industries grouped by the Global Industrial Classification Standard are homogeneous.
However, existing studies have limitations in that they do not take into account influential companies with high relevance or reflect the existence of heterogeneity within the same Global Industrial Classification Standard sectors. As a result of our examining the various sectors, it can be seen that there are sectors that show the industrial sectors are not a homogeneous group. To overcome these limitations of existing studies that do not reflect heterogeneity, our study suggests a methodology that reflects the heterogeneous effects of the industrial sector that affect the stock price by applying k-means clustering. Multiple Kernel Learning is mainly used to integrate data with various characteristics. Multiple Kernel Learning has several kernels, each of which receives and predicts different data. To incorporate effects of target firm and its relevant firms simultaneously, we used Multiple Kernel Learning. Each kernel was assigned to predict stock prices with variables of financial news of the industrial group divided by the target firm, K-means cluster analysis. In order to prove that the suggested methodology is appropriate, experiments were conducted through three years of online news and stock prices. The results of this study are as follows. (1) We confirmed that the information of the industrial sectors related to target company also contains meaningful information to predict stock movements of target company and confirmed that machine learning algorithm has better predictive power when considering the news of the relevant companies and target company's news together. (2) It is important to predict stock movements with varying number of clusters according to the level of homogeneity in the industrial sector. In other words, when stock prices are homogeneous in industrial sectors, it is important to use relational effect at the level of industry group without analyzing clusters or to use it in small number of clusters. 
When the stock price is heterogeneous within an industry group, it is important to cluster the firms into groups. This study contributes by showing that firms classified under the Global Industrial Classification Standard are heterogeneous and by suggesting that relevance should be defined through machine learning and statistical analysis rather than simply by the Global Industrial Classification Standard. It also contributes by demonstrating the efficiency of a prediction model that reflects heterogeneity.
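The clustering step mentioned in this abstract can be illustrated with a minimal k-means (Lloyd's algorithm) sketch. The abstract does not say which features were clustered, so the one-dimensional feature per firm used here is hypothetical, and the initial centroids are supplied by the caller to keep the example deterministic.

```python
def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(point, centroids[i])),
    )


def kmeans(points, centroids, iterations=20):
    """Lloyd's algorithm; points and centroids are lists of equal-length tuples."""
    centroids = list(centroids)
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # Recompute each centroid as the mean of its members;
        # an empty cluster keeps its previous centroid.
        updated = [
            tuple(sum(col) / len(members) for col in zip(*members))
            if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    labels = [nearest(p, centroids) for p in points]
    return labels, centroids


# Hypothetical one-dimensional feature per firm within one sector.
firms = [(0.0,), (0.2,), (5.0,), (5.2,)]
labels, centers = kmeans(firms, [(0.0,), (5.0,)])
print(labels)
```

Firms that receive the same label would then be treated as one relevant group when assembling news features for a target firm.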