• Title/Summary/Keyword: Review score prediction


A Study on Dementia Prediction Models and Commercial Utilization Strategies Using Machine Learning Techniques: Based on Sleep and Activity Data from Wearable Devices (머신러닝 기법을 활용한 치매 예측 모델과 상업적 활용 전략: 웨어러블 기기의 수면 및 활동 데이터를 기반으로)

  • Youngeun Jo;Jongpil Yu;Joongan Kim
    • Information Systems Review / v.26 no.2 / pp.137-153 / 2024
  • This study aimed to propose early diagnosis and management of dementia, which is increasing in aging societies, and suggest commercial utilization strategies by leveraging digital healthcare technologies, particularly lifelog data collected from wearable devices. By introducing new approaches to dementia prevention and management, this study sought to contribute to the field of dementia prediction and prevention. The research utilized 12,184 pieces of lifelog information (sleep and activity data) and dementia diagnosis data collected from 174 individuals aged between 60 and 80, based on medical pathological diagnoses. During the research process, a multidimensional dataset including sleep and activity data was standardized, and various machine learning algorithms were analyzed, with the random forest model showing the highest ROC-AUC score, indicating superior performance. Furthermore, an ablation test was conducted to evaluate the impact of excluding variables related to sleep and activity on the model's predictive power, confirming that regular sleep and activity have a significant influence on dementia prevention. Lastly, by exploring the potential for commercial utilization strategies of the developed model, the study proposed new directions for the commercial spread of dementia prevention systems.
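The ablation test mentioned in this abstract can be sketched generically: score the model with all feature groups, then re-score it with each group removed and record the drop. This is a minimal sketch, not the authors' procedure. The `toy_evaluate` function and its numbers are hypothetical stand-ins for retraining a model and measuring its ROC-AUC.

```python
from typing import Callable, Dict, FrozenSet, Iterable


def ablation_drops(
    groups: Iterable[str],
    evaluate: Callable[[FrozenSet[str]], float],
) -> Dict[str, float]:
    """Return, for each feature group, the score lost when that group is removed."""
    all_groups = frozenset(groups)
    baseline = evaluate(all_groups)
    return {g: baseline - evaluate(all_groups - {g}) for g in sorted(all_groups)}


# Hypothetical contributions, invented for illustration only.
TOY_GAIN = {"sleep": 0.12, "activity": 0.08}


def toy_evaluate(kept: FrozenSet[str]) -> float:
    """Stand-in evaluator: a real one would retrain on `kept` and return ROC-AUC."""
    return 0.5 + sum(TOY_GAIN[g] for g in kept)


drops = ablation_drops(TOY_GAIN, toy_evaluate)
```

A larger drop means the model depended more heavily on that feature group.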

Sentiment analysis on movie review through building modified sentiment dictionary by movie genre (영역별 맞춤형 감성사전 구축을 통한 영화리뷰 감성분석)

  • Lee, Sang Hoon;Cui, Jing;Kim, Jong Woo
    • Journal of Intelligence and Information Systems / v.22 no.2 / pp.97-113 / 2016
  • Due to the growth of internet data and the rapid development of internet technology, "big data" analysis is actively conducted to analyze enormous volumes of data for various purposes. In recent years especially, a number of studies have applied text mining techniques to overcome the limitations of existing structured data analysis. Sentiment analysis, one branch of text mining, is actively studied as a way to score opinions based on the distribution of word polarity in documents. Sentiment analysis usually relies on a sentiment dictionary that records the positivity and negativity of vocabulary items. As part of this line of work, this study constructs a sentiment dictionary customized to a specific data domain. Using a common sentiment dictionary without considering the characteristics of the data domain cannot reflect contextual expressions used only in that domain, so a sentiment dictionary customized to the domain can be expected to improve sentiment analysis. This study therefore suggests a way to construct a customized dictionary that reflects the characteristics of the data domain. Specifically, movie review data are divided by genre, a genre-customized dictionary is constructed for each genre, and its performance in sentiment analysis is compared with that of a common sentiment dictionary. IMDb data are chosen as the subject of analysis, and movie reviews are categorized by genre. Six IMDb genres are selected: 'action', 'animation', 'comedy', 'drama', 'horror', and 'sci-fi'. The five highest-ranking and five lowest-ranking movies per genre are selected as the training data set, and two years of movie data, from September 2012 to June 2014, are collected as the test data set.
Using the SO-PMI (Semantic Orientation from Point-wise Mutual Information) technique, we build a customized sentiment dictionary per genre and compare prediction accuracy on review ratings. The analysis shows that prediction using customized dictionaries improves accuracy: the overall improvement is 2.82% and is statistically significant. The customized dictionary for 'sci-fi' yields the highest accuracy improvement among the six genres. Although this study shows the usefulness of customized dictionaries in sentiment analysis, further studies are required to generalize the results. We consider only adjectives as additional terms in the customized sentiment dictionary; other parts of speech, such as verbs and adverbs, could be considered to improve sentiment analysis performance. The customized sentiment dictionary also needs to be applied to other domains, such as product reviews.
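As a rough illustration of SO-PMI, the sketch below scores a candidate word by summing its pointwise mutual information with positive seed words and subtracting the same sum for negative seed words. It assumes document-level co-occurrence and a small additive smoothing term, neither of which is stated in the abstract. The toy reviews and seed words are invented.

```python
import math
from typing import List, Sequence, Set


def so_pmi(
    word: str,
    docs: List[Set[str]],
    pos_seeds: Sequence[str],
    neg_seeds: Sequence[str],
    smooth: float = 0.01,
) -> float:
    """Semantic orientation of `word`; positive values lean positive."""
    n = len(docs)

    def count(*terms: str) -> float:
        # Number of documents containing every term, plus smoothing to avoid log(0).
        return sum(1 for d in docs if all(t in d for t in terms)) + smooth

    def pmi(a: str, b: str) -> float:
        return math.log2(count(a, b) * n / (count(a) * count(b)))

    return sum(pmi(word, s) for s in pos_seeds) - sum(pmi(word, s) for s in neg_seeds)


# Invented toy reviews, each reduced to a set of adjectives.
reviews = [
    {"gripping", "good"},
    {"gripping", "excellent"},
    {"dull", "bad"},
    {"dull", "poor"},
    {"gripping", "good"},
]
gripping_score = so_pmi("gripping", reviews, ["good", "excellent"], ["bad", "poor"])
dull_score = so_pmi("dull", reviews, ["good", "excellent"], ["bad", "poor"])
```

Words whose score clears a chosen threshold could then be added to the genre dictionary with the corresponding polarity.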

Comparing the Performance of Three Severity Scoring Systems for ICU Patients: APACHE III, SAPS II, MPM II (중환자 중증도 평가도구의 타당도 평가 - APACHE III, SAPS II, MPM II)

  • Kwon, Young-Dae;Hwang, Jeong-Hae;Kim, Eun-Kyung
    • Journal of Preventive Medicine and Public Health / v.38 no.3 / pp.276-282 / 2005
  • Objectives: To evaluate the predictive validity of three scoring systems in critically ill patients: the acute physiology and chronic health evaluation (APACHE) III, simplified acute physiology score (SAPS) II, and mortality probability model (MPM) II systems. Methods: A concurrent and retrospective study was conducted by collecting data on consecutive patients admitted to the intensive care unit (ICU), including the surgical, medical, and coronary care units, between January 1, 2004, and March 31, 2004. Data were collected on 348 patients consecutively admitted to the ICU (aged 16 years or older, no transfer, ICU stay of at least 8 hours). The three models were analyzed using logistic regression. Discrimination was assessed using receiver operating characteristic (ROC) curves, sensitivity, specificity, and correct classification rate. Calibration was assessed using the Lemeshow-Hosmer goodness-of-fit H-statistic. Results: For the APACHE III, SAPS II, and MPM II systems, the areas under the ROC curves were 0.981, 0.978, and 0.941, respectively. With a predicted risk of 0.5, the sensitivities for the APACHE III, SAPS II, and MPM II systems were 81.1, 79.2, and 71.7%, the specificities 98.3, 98.6, and 98.3%, and the correct classification rates 95.7, 95.7, and 94.3%, respectively. The SAPS II and APACHE III systems showed good calibration (chi-squared H=2.5838, p=0.9577 for SAPS II; chi-squared H=4.3761, p=0.8217 for APACHE III). Conclusions: The APACHE III and SAPS II systems have excellent powers of mortality prediction and calibration, and can be useful tools for the quality assessment of intensive care units (ICUs).
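The discrimination measures named in this abstract can be computed from predicted mortality risks and observed outcomes as sketched below. The sketch assumes that a risk at or above the cutoff counts as a predicted death. The patient data are invented and unrelated to the study's 348 patients.

```python
from typing import Dict, Sequence


def discrimination_at_cutoff(
    risks: Sequence[float], died: Sequence[int], cutoff: float = 0.5
) -> Dict[str, float]:
    """Sensitivity, specificity, and correct classification rate at one cutoff."""
    tp = fp = tn = fn = 0
    for risk, outcome in zip(risks, died):
        predicted_death = risk >= cutoff  # assumption: ties count as predicted death
        if predicted_death and outcome:
            tp += 1
        elif predicted_death:
            fp += 1
        elif outcome:
            fn += 1
        else:
            tn += 1
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "correct_classification_rate": (tp + tn) / len(risks),
    }


def roc_auc(risks: Sequence[float], died: Sequence[int]) -> float:
    """Area under the ROC curve as the share of (death, survivor) pairs ranked correctly."""
    pos = [r for r, y in zip(risks, died) if y]
    neg = [r for r, y in zip(risks, died) if not y]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Invented example data: predicted risk and observed death (1) or survival (0).
toy_risks = [0.9, 0.7, 0.4, 0.6, 0.2, 0.1, 0.3, 0.05]
toy_died = [1, 1, 1, 0, 0, 0, 0, 0]
summary = discrimination_at_cutoff(toy_risks, toy_died)
auc = roc_auc(toy_risks, toy_died)
```

The pairwise AUC is quadratic in the number of patients, which is acceptable at the scale of a single ICU cohort.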

Definition, End-of-life Criterion and Prediction of Service Life for Bridge Maintenance (교량의 유지관리를 위한 사용수명 정의, 종료 기준, 추정)

  • Jeong, Yo-Seok;Kim, Woo-Seok;Lee, Il-Keun;Lee, Jae-Ha;Kim, Jin-Kwang
    • Journal of the Korea institute for structural maintenance and inspection / v.20 no.4 / pp.68-76 / 2016
  • The present study proposes a definition of service life and an end-of-life criterion for bridge maintenance. Bridges begin to deteriorate as soon as they are put into service. Effective bridge maintenance requires a sound understanding of the deterioration mechanism as well as the expected service life. To determine the expected service life of a bridge, it is necessary to have a clear definition of service life and end-of-life. However, the literature shows that service life can be viewed from several perspectives, and the end of a bridge's life can likewise be defined by more than one perspective or performance measure. This study presents a definition of service life that can be used for bridge maintenance and an end-of-life criterion based on a performance measure such as a damage score. A regression model can then predict the average service life of bridges using the proposed end-of-life criterion.
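One way to read the last step is sketched below: fit a regression of damage score on bridge age, then solve for the age at which the fitted score reaches the end-of-life threshold. The linear form, the assumption that damage score rises with age, the threshold, and all data points are hypothetical; the abstract does not specify the regression used.

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return slope, mean_y - slope * mean_x


# Hypothetical inspection records: bridge age in years and observed damage score.
ages = [0, 10, 20, 30, 40]
damage_scores = [0.0, 0.1, 0.2, 0.3, 0.4]

END_OF_LIFE_SCORE = 0.6  # hypothetical end-of-life criterion

slope, intercept = fit_line(ages, damage_scores)
# Age at which the fitted damage score reaches the end-of-life threshold.
service_life = (END_OF_LIFE_SCORE - intercept) / slope
```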

Quality Prediction Model for Manufacturing Process of Free-Machining 303-series Stainless Steel Small Rolling Wire Rods (쾌삭 303계 스테인리스강 소형 압연 선재 제조 공정의 생산품질 예측 모형)

  • Seo, Seokjun;Kim, Heungseob
    • Journal of Korean Society of Industrial and Systems Engineering / v.44 no.4 / pp.12-22 / 2021
  • This article proposes a machine learning model, i.e., a classifier, for predicting the production quality of free-machining 303-series stainless steel (STS303) small rolling wire rods according to the operating conditions of the manufacturing process. To develop the classifier, manufacturing data for 37 operating variables were collected from the manufacturing execution system (MES) of Company S, and 12 types of derived variables were generated based on a literature review and interviews with field experts. The research proceeded through data preprocessing, exploratory data analysis, feature selection, machine learning modeling, and the evaluation of alternative models. In the preprocessing stage, missing values and outliers were removed, and oversampling with SMOTE (synthetic minority oversampling technique) was applied to resolve data imbalance. Features were selected by the variable importance of LASSO (least absolute shrinkage and selection operator) regression, extreme gradient boosting (XGBoost), and random forest models. Finally, logistic regression, support vector machine (SVM), random forest, and XGBoost were developed as classifiers to predict whether products will be adequate or defective under new operating conditions. The optimal hyper-parameters for each model were investigated by grid search and random search based on k-fold cross-validation. In the experiment, XGBoost showed relatively high predictive performance compared to the other models, with an accuracy of 0.9929, specificity of 0.9372, F1-score of 0.9963, and logarithmic loss of 0.0209. The classifier developed in this study is expected to improve productivity by enabling effective management of the manufacturing process for STS303 small rolling wire rods.
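The core idea of SMOTE can be sketched in a few lines: pick a minority-class sample, pick one of its nearest minority-class neighbours, and create a synthetic sample at a random point on the segment between them. This is a simplified illustration, not the study's preprocessing code. The sample values, `k`, and the seed are arbitrary assumptions.

```python
import random
from typing import List, Sequence

Point = Sequence[float]


def smote_sketch(
    minority: List[Point], n_new: int, k: int = 3, seed: int = 0
) -> List[List[float]]:
    """Generate `n_new` synthetic minority samples by neighbour interpolation."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of `base`, by squared Euclidean distance.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, other)])
    return synthetic


# Invented defective-product samples with two operating variables each.
defective = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
new_samples = smote_sketch(defective, n_new=5)
```

Because every synthetic point is an interpolation, it stays inside the region already spanned by the minority class.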

Experimental Comparison of Network Intrusion Detection Models Solving Imbalanced Data Problem (데이터의 불균형성을 제거한 네트워크 침입 탐지 모델 비교 분석)

  • Lee, Jong-Hwa;Bang, Jiwon;Kim, Jong-Wouk;Choi, Mi-Jung
    • KNOM Review / v.23 no.2 / pp.18-28 / 2020
  • With the development of the virtual community, the benefits that IT provides to people in fields such as healthcare, industry, communication, and culture are increasing, and the quality of life is improving. At the same time, various malicious attacks target this advanced network environment. Firewalls and intrusion detection systems exist to detect these attacks in advance, but they are limited in detecting malicious attacks that evolve day by day. To solve this problem, intrusion detection research using machine learning is being actively conducted, but false positives and false negatives occur because of imbalance in the training dataset. In this paper, a Random Oversampling method is used to solve the imbalance problem of the UNSW-NB15 dataset used for network intrusion detection. Through experiments, we compare and analyze the accuracy, precision, recall, F1-score, training and prediction time, and hardware resource consumption of the models. Building on this study, future work will pursue more efficient network intrusion detection models using other methods and high-performance models that can solve the imbalanced data problem.
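Random oversampling, as used in this abstract, duplicates randomly chosen minority-class records until every class matches the size of the largest class. The sketch below is a minimal stdlib version. The record names and labels are invented and do not come from UNSW-NB15.

```python
import random
from collections import Counter
from typing import Hashable, List, Sequence, Tuple


def random_oversample(
    samples: Sequence[object], labels: Sequence[Hashable], seed: int = 0
) -> Tuple[List[object], List[Hashable]]:
    """Duplicate random minority-class samples until all classes are equally sized."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_samples, out_labels = list(samples), list(labels)
    for cls, n in counts.items():
        pool = [x for x, y in zip(samples, labels) if y == cls]
        for _ in range(target - n):
            out_samples.append(rng.choice(pool))
            out_labels.append(cls)
    return out_samples, out_labels


# Invented traffic records: six normal flows and two attack flows.
flows = ["n1", "n2", "n3", "n4", "n5", "n6", "a1", "a2"]
flow_labels = ["normal"] * 6 + ["attack"] * 2
balanced_flows, balanced_labels = random_oversample(flows, flow_labels)
balanced_counts = Counter(balanced_labels)
```

Oversampling is normally applied to the training split only, so that duplicated records do not leak into the evaluation data.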

Roles of Perceived Use Control consisting of Perceived Ease of Use and Perceived Controllability in IT acceptance (정보기술 수용에서 사용용이성과 통제가능성을 하위 차원으로 하는 지각된 사용통제의 역할)

  • Lee, Woong-Kyu
    • Asia pacific journal of information systems / v.18 no.2 / pp.1-14 / 2008
  • According to the technology acceptance model (TAM), one of the most important research models for explaining IT users' behavior, the intention to use IT is determined by its usefulness and ease of use. However, while TAM has been considered a very good model for predicting intention, it does not explain the performance of using IT. Many people are not confident in their performance in using IT until they can control it at will, even if they think it useful and easy to use. In other words, in addition to usefulness and ease of use as in TAM, controllability should also be a factor determining acceptance of IT. In particular, there is a very close relationship between controllability and ease of use, both of which explain different sides of control over the performance of using IT, the so-called perceived behavioral control (PBC) in social psychology. The objective of this study is to identify the relationship between ease of use and controllability, and to analyze the effects of both beliefs on performance and intention in using IT. For this purpose, we review the issues related to PBC in information systems studies as well as in social psychology. Based on this review of PBC, we suggest a research model that includes the relationship between control and performance in using IT, and verify its validity empirically. Since PBC was introduced as a variable for explaining volitional control over actions in the theory of planned behavior (TPB), there has been confusion about the concept of PBC in spite of its important role in predicting many kinds of actions. Some studies define PBC as self-efficacy, meaning the actor's perception of the difficulty or ease of actions, while others define it as controllability. However, this confusion does not imply a conceptual contradiction but a double-faced feature of PBC, since the performance of actions is related to both self-efficacy and controllability.
In other words, these two concepts are distinct but correlated with each other. Therefore, PBC should be considered a composite concept consisting of self-efficacy and controllability. Use of IT has also been one of the important areas for prediction by PBC. Most such studies compare the predictive power of TAM and TPB, or modify TAM by including PBC as another belief alongside usefulness and ease of use. Interestingly, unlike other applications in social psychology, it is hard to find such confusion about the concept of PBC in studies of IT use. In most studies, controllability is adopted as PBC, since the concept of self-efficacy is explicitly included in ease of use. Based on these discussions, we suggest perceived use control (PUC), which is defined as the perception of control over the performance of using IT and is composed of controllability and ease of use as sub-concepts. We suggest a research model explaining acceptance of IT that includes the relationships of PUC with attitude and performance of using IT. For an empirical test of our research model, two user groups were selected for a questionnaire survey. The first group consists of freshmen taking a basic course on Microsoft Excel, and the second group consists of senior students taking a course on the analysis of management information using Excel. Most measurements are adapted from ones validated in other studies, while performance is the actual mid-term score in each class. As a result, four hypotheses related to PUC are statistically supported at very low significance levels. The main contribution of this study is the suggestion of PUC through a theoretical review of PBC. Specifically, a hierarchical model of PUC is derived from rigorous studies of the relationship between self-efficacy and controllability from the viewpoint of PBC in social psychology. The relationship between PUC and performance is another main contribution.

Bayesian Network Analysis for the Dynamic Prediction of Financial Performance Using Corporate Social Responsibility Activities (베이지안 네트워크를 이용한 기업의 사회적 책임활동과 재무성과)

  • Sun, Eun-Jung
    • Management & Information Systems Review / v.34 no.5 / pp.71-92 / 2015
  • This study analyzes the impact of Corporate Social Responsibility (CSR) activities on financial performance using a Bayesian Network. The research tries to overcome the uniform assumption of a linear function between financial performance and CSR activities in the multiple regression analysis widely used in previous studies. It is necessary to infer the causal relationships among the CSR activities that have an impact on financial performance. Identifying these relationships would enable firms to improve their financial performance by informing decision makers about the different CSR activities that influence it. This research proposes a General Bayesian Network (GBN) and presents the Markov Blanket induced from the GBN. All the proposals presented in this study are empirically shown to be statistically significant, using the results of research conducted by the Korean Economic Justice Institute (KEJI) under the Citizen's Coalition for Economic Justice (CCEJ), which investigated approximately 200 companies in Korea based on the Korean Economic Justice Institute Index (KEJI index) from 2005 to 2011. The Bayesian Network effectively infers the properties affecting financial performance through probabilistic causal relationships. Moreover, I found causal relationships among the CSR activity variables: Environment protection is related to Customer protection, Employee satisfaction, and firm size, and Soundness is related to Total CSR Evaluation Score and Debt-Assets Ratio. Through what-if analysis, I identify the sensitive factors among the explanatory variables.
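For readers unfamiliar with the term, the Markov blanket of a node in a Bayesian network consists of its parents, its children, and the other parents of its children. The sketch below extracts it from a directed acyclic graph stored as a parent map. The graph uses placeholder node names and does not reproduce the network learned in the study.

```python
from typing import Dict, Set


def markov_blanket(node: str, parents: Dict[str, Set[str]]) -> Set[str]:
    """Parents, children, and children's other parents of `node` in a DAG."""
    children = {child for child, ps in parents.items() if node in ps}
    co_parents = {p for child in children for p in parents[child]}
    return (set(parents.get(node, ())) | children | co_parents) - {node}


# Placeholder DAG: each key maps to the set of its parent nodes.
TOY_PARENTS = {
    "A": set(),
    "B": set(),
    "C": {"A", "B"},
    "D": {"C"},
    "E": {"D", "F"},
    "F": set(),
}
blanket_of_c = markov_blanket("C", TOY_PARENTS)
blanket_of_d = markov_blanket("D", TOY_PARENTS)
```

Given its Markov blanket, a node is conditionally independent of every other node in the network, which is why the blanket is a natural way to select the variables relevant to a target such as financial performance.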


Identification of Characteristics and Risk Factors Associated with Mortality in Hydrops Fetalis (태아수종의 특성 및 사망률과 연관된 위험인자)

  • Ko, Hoon;Lee, Byong-Sop;Kim, Ki-Soo;Won, Hye-Sung;Lee, Pil-Ryang;Shim, Jae-Yoon;Kim, Ahm;Kim, Ai-Rhan
    • Neonatal Medicine / v.18 no.2 / pp.221-227 / 2011
  • Purpose: The objectives were to identify the characteristics of neonates with hydrops fetalis and to identify the risk factors associated with mortality. Methods: A retrospective review of the AMC (Asan Medical Center) dataset from January 1990 to June 2009 was performed. The characteristics of 71 patients with hydrops fetalis were investigated, and the patients were divided into two groups: those who survived and those who died. Various perinatal and neonatal factors were compared between the two groups to find risk factors associated with mortality, based on univariate analysis followed by multiple regression analyses (SPSS version 18.0). Results: Of the 71 neonates (average gestational age: 33 weeks, birth weight: 2.6 kg), 38 survived and 33 died, resulting in an overall mortality rate of 46.5%. The most common etiology was idiopathic, followed by chylothorax, cardiac anomalies, twin-to-twin transfusion, meconium peritonitis, cardiac arrhythmias, and congenital infections. Factors independently associated with mortality in logistic regression analyses were a low 5-minute Apgar score, hyaline membrane disease, and a delay of 10 days in achieving the 50th percentile ideal body weight for appropriate gestational age. Conclusion: In this study, the 5-minute Apgar score, hyaline membrane disease, and a delay of 10 days in achieving the 50th percentile ideal body weight for appropriate gestational age were significant risk factors associated with mortality in hydrops fetalis. Therefore, the risk of death among neonates with hydrops fetalis depends on the illness immediately after birth and the severity of hydrops fetalis. Information from this study may prove useful in predicting the prognosis of neonates with hydrops fetalis.

Predictive Clustering-based Collaborative Filtering Technique for Performance-Stability of Recommendation System (추천 시스템의 성능 안정성을 위한 예측적 군집화 기반 협업 필터링 기법)

  • Lee, O-Joun;You, Eun-Soon
    • Journal of Intelligence and Information Systems / v.21 no.1 / pp.119-142 / 2015
  • With the explosive growth in the volume of information, Internet users are experiencing considerable difficulties in obtaining necessary information online. Against this backdrop, ever-greater importance is being placed on a recommender system that provides information catered to user preferences and tastes in an attempt to address issues associated with information overload. To this end, a number of techniques have been proposed, including content-based filtering (CBF), demographic filtering (DF) and collaborative filtering (CF). Among them, CBF and DF require external information and thus cannot be applied to a variety of domains. CF, on the other hand, is widely used since it is relatively free from the domain constraint. The CF technique is broadly classified into memory-based CF, model-based CF and hybrid CF. Model-based CF addresses the drawbacks of CF by considering the Bayesian model, clustering model or dependency network model. This filtering technique not only improves the sparsity and scalability issues but also boosts predictive performance. However, it involves expensive model-building and results in a tradeoff between performance and scalability. Such tradeoff is attributed to reduced coverage, which is a type of sparsity issues. In addition, expensive model-building may lead to performance instability since changes in the domain environment cannot be immediately incorporated into the model due to high costs involved. Cumulative changes in the domain environment that have failed to be reflected eventually undermine system performance. This study incorporates the Markov model of transition probabilities and the concept of fuzzy clustering with CBCF to propose predictive clustering-based CF (PCCF) that solves the issues of reduced coverage and of unstable performance. The method improves performance instability by tracking the changes in user preferences and bridging the gap between the static model and dynamic users. 
Furthermore, the issue of reduced coverage also improves by expanding the coverage based on transition probabilities and clustering probabilities. The proposed method consists of four processes. First, user preferences are normalized in preference clustering. Second, changes in user preferences are detected from review score entries during preference transition detection. Third, user propensities are normalized using patterns of changes (propensities) in user preferences in propensity clustering. Lastly, the preference prediction model is developed to predict user preferences for items during preference prediction. The proposed method has been validated by testing the robustness of performance instability and scalability-performance tradeoff. The initial test compared and analyzed the performance of individual recommender systems each enabled by IBCF, CBCF, ICFEC and PCCF under an environment where data sparsity had been minimized. The following test adjusted the optimal number of clusters in CBCF, ICFEC and PCCF for a comparative analysis of subsequent changes in the system performance. The test results revealed that the suggested method produced insignificant improvement in performance in comparison with the existing techniques. In addition, it failed to achieve significant improvement in the standard deviation that indicates the degree of data fluctuation. Notwithstanding, it resulted in marked improvement over the existing techniques in terms of range that indicates the level of performance fluctuation. The level of performance fluctuation before and after the model generation improved by 51.31% in the initial test. Then in the following test, there has been 36.05% improvement in the level of performance fluctuation driven by the changes in the number of clusters. This signifies that the proposed method, despite the slight performance improvement, clearly offers better performance stability compared to the existing techniques. 
Further research on this study will be directed toward enhancing the recommendation performance that failed to demonstrate significant improvement over the existing techniques. The future research will consider the introduction of a high-dimensional parameter-free clustering algorithm or deep learning-based model in order to improve performance in recommendations.
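The preference transition step described in this abstract relies on Markov transition probabilities. As a rough sketch, the code below estimates first-order transition probabilities between preference clusters from each user's sequence of cluster labels. It assumes hard cluster assignments and invented cluster names, whereas the paper combines the transitions with fuzzy clustering, which is not reproduced here.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence


def transition_probabilities(
    histories: Sequence[Sequence[str]],
) -> Dict[str, Dict[str, float]]:
    """Estimate P(next cluster | current cluster) from per-user cluster sequences."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for sequence in histories:
        for current, following in zip(sequence, sequence[1:]):
            counts[current][following] += 1
    return {
        state: {nxt: c / sum(row.values()) for nxt, c in row.items()}
        for state, row in counts.items()
    }


# Invented per-user histories of preference-cluster labels, in time order.
user_histories: List[List[str]] = [
    ["drama", "drama", "action"],
    ["drama", "action", "action"],
    ["action", "action"],
]
transitions = transition_probabilities(user_histories)
```

A user currently in the `drama` cluster would then be expected to move to `action` with the estimated probability, which is the kind of signal the method uses to extend coverage beyond a user's current cluster.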