• 제목/요약/키워드: K-fold cross validation

검색결과 150건 처리시간 0.024초

Analysis of Feature Variables for Breast Cancer Diagnosis

  • Jung, Yong Gyu;Kim, Jang Il;Sihn, Sung Chul;Heo, Jun
    • International journal of advanced smart convergence
    • /
    • 제2권2호
    • /
    • pp.36-39
    • /
    • 2013
  • It is becoming more important as the growing of health information and increasing in cancer patients diagnose over the time gradually. Among the various types of cancer, we focuses on breast cancer diagnosis. The accuracy of breast cancer diagnosis is increasing when the diagnosis is based on evidence and statistics. To do this we use the weka data mining tools and analysis algorithms significantly associated with the decision tree uses rules. In addition, the data pre-processing and cross-validation are used to increase the reliability of the results. The number and cause of the disease becomes important to increase evidence-based medical doctors. As the evidence-based medical, the data obtained from patients in the past through the disease by calculating the probability for future patients to diagnose and predict disease and treatment plan. It can be found by improving the survival rate plays an important role.

A Comparison Study of Runoff Projections for Yongdam Dam Watershed Using SWAT (SWAT모형을 이용한 용담댐 유역의 유량 전망 결과 비교 연구)

  • Jung, Cha Mi;Shin, Mun-Ju;Kim, Young-Oh
    • Journal of Korea Water Resources Association
    • /
    • 제48권6호
    • /
    • pp.439-449
    • /
    • 2015
  • In this study, reliable future runoff projections based on RCPs for Yongdam dam watershed was performed using SWAT model, which was validated by k-fold cross validation method, and investigated the factors that cause the differences with respect to runoff projections between this study and previous studies. As a result, annual average runoff compared to baseline runoff would increase 17.7% and 26.1% in 2040s and 2080s respectively under RCP8.5 scenario, and 21.9% and 44.6% in 2040s and 2080s respectively under RCP4.5 scenario. Comparing the results to previous studies, minimum and maximum differences between runoff projections over different studies were 10.3% and 53.2%, even though runoff was projected by the same rainfall-runoff model. SWAT model has 27 parameters and physically based complex structure, so it tends to make different results by the model users' setting. In the future, it is necessary to reduce the cause of difference to generate standard runoff scenarios.

Motion Recognition for Kinect Sensor Data Using Machine Learning Algorithm with PNF Patterns of Upper Extremities

  • Kim, Sangbin;Kim, Giwon;Kim, Junesun
    • The Journal of Korean Physical Therapy
    • /
    • 제27권4호
    • /
    • pp.214-220
    • /
    • 2015
  • Purpose: The purpose of this study was to investigate the availability of software for rehabilitation with the Kinect sensor by presenting an efficient algorithm based on machine learning when classifying the motion data of the PNF pattern if the subjects were wearing a patient gown. Methods: The motion data of the PNF pattern for upper extremities were collected by Kinect sensor. The data were obtained from 8 normal university students without the limitation of upper extremities. The subjects, wearing a T-shirt, performed the PNF patterns, D1 and D2 flexion, extensions, 30 times; the same protocol was repeated while wearing a patient gown to compare the classification performance of algorithms. For comparison of performance, we chose four algorithms, Naive Bayes Classifier, C4.5, Multilayer Perceptron, and Hidden Markov Model. The motion data for wearing a T-shirt were used for the training set, and 10 fold cross-validation test was performed. The motion data for wearing a gown were used for the test set. Results: The results showed that all of the algorithms performed well with 10 fold cross-validation test. However, when classifying the data with a hospital gown, Hidden Markov model (HMM) was the best algorithm for classifying the motion of PNF. Conclusion: We showed that HMM is the most efficient algorithm that could handle the sequence data related to time. Thus, we suggested that the algorithm which considered the sequence of motion, such as HMM, would be selected when developing software for rehabilitation which required determining the correctness of the motion.

Prediction of box office using data mining (데이터마이닝을 이용한 박스오피스 예측)

  • Jeon, Seonghyeon;Son, Young Sook
    • The Korean Journal of Applied Statistics
    • /
    • 제29권7호
    • /
    • pp.1257-1270
    • /
    • 2016
  • This study deals with the prediction of the total number of movie audiences as a measure for the box office. Prediction is performed by classification techniques of data mining such as decision tree, multilayer perceptron(MLP) neural network model, multinomial logit model, and support vector machine over time such as before movie release, release day, after release one week, and after release two weeks. Predictors used are: online word-of-mouth(OWOM) variables such as the portal movie rating, the number of the portal movie rater, and blog; in addition, other variables include showing the inherent properties of the film (such as nationality, grade, release month, release season, directors, actors, distributors, the number of audiences, and screens). When using 10-fold cross validation technique, the accuracy of the neural network model showed more than 90 % higher predictability before movie release. In addition, it can be seen that the accuracy of the prediction increases by adding estimates of the final OWOM variables as predictors.

Prediction of movie audience numbers using hybrid model combining GLS and Bass models (GLS와 Bass 모형을 결합한 하이브리드 모형을 이용한 영화 관객 수 예측)

  • Kim, Bokyung;Lim, Changwon
    • The Korean Journal of Applied Statistics
    • /
    • 제31권4호
    • /
    • pp.447-461
    • /
    • 2018
  • Domestic film industry sales are increasing every year. Theaters are the primary sales channels for movies and the number of audiences using the theater affects additional selling rights. Therefore, the number of audiences using the theater is an important factor directly linked to movie industry sales. In this paper we consider a hybrid model that combines a multiple linear regression model and the Bass model to predict the audience numbers for a specific day. By combining the two models, the predictive value of the regression analysis was corrected to that of the Bass model. In the analysis, three films with different release dates were used. All subset regression method is used to generate all possible combinations and 5-fold cross validation to estimate the model 5 times. In this case, the predicted value is obtained from the model with the smallest root mean square error and then combined with the predicted value of the Bass model to obtain the final predicted value. With the existence of past data, it was confirmed that the weight of the Bass model increases and the compensation is added to the predicted value.

Follower classification system based on the similarity of Twitter node information (트위터 사용자정보의 유사성을 기반으로 한 팔로어 분류시스템)

  • Kye, Yong-Sun;Yoon, Youngmi
    • Journal of the Korea Society of Computer and Information
    • /
    • 제19권1호
    • /
    • pp.111-118
    • /
    • 2014
  • Current friend recommendation system on Twitter primarily recommends the most influential twitter. However, this way of recommendation has drawbacks where it does not recommend the users of which attributes of interests are similar to theirs. Since users want other users of which attributes are similar, this study implements follower recommendation system based on the similarity of twitter node informations. The data in this study is from SNAP(Stanford Network Analysis Platform), and it consists of twitter node information of which number of followers is over 10,000 and twitter link information. We used the SNAP data as a training data, and generated a classifier which recommends and predicts the relation between followers. We evaluated the classifier by 10-Fold Cross validation. Once two twitter node informations are given, our model can recommend the relationship of the two twitters as one of following such as: FoFo(Follower Follower), FoFr(Follower Friend), NC(Not Connected).

New Automatic Taxonomy Generation Algorithm for the Audio Genre Classification (음악 장르 분류를 위한 새로운 자동 Taxonomy 구축 알고리즘)

  • Choi, Tack-Sung;Moon, Sun-Kook;Park, Young-Cheol;Youn, Dae-Hee;Lee, Seok-Pil
    • The Journal of the Acoustical Society of Korea
    • /
    • 제27권3호
    • /
    • pp.111-118
    • /
    • 2008
  • In this paper, we propose a new automatic taxonomy generation algorithm for the audio genre classification. The proposed algorithm automatically generates hierarchical taxonomy based on the estimated classification accuracy at all possible nodes. The estimation of classification accuracy in the proposed algorithm is conducted by applying the training data to classifier using k-fold cross validation. Subsequent classification accuracy is then to be tested at every node which consists of two clusters by applying one-versus-one support vector machine. In order to assess the performance of the proposed algorithm, we extracted various features which represent characteristics such as timbre, rhythm, pitch and so on. Then, we investigated classification performance using the proposed algorithm and previous flat classifiers. The classification accuracy reaches to 89 percent with proposed scheme, which is 5 to 25 percent higher than the previous flat classification methods. Using low-dimensional feature vectors, in particular, it is 10 to 25 percent higher than previous algorithms for classification experiments.

Implementation on the evolutionary machine learning approaches for streamflow forecasting: case study in the Seybous River, Algeria (유출예측을 위한 진화적 기계학습 접근법의 구현: 알제리 세이보스 하천의 사례연구)

  • Zakhrouf, Mousaab;Bouchelkia, Hamid;Stamboul, Madani;Kim, Sungwon;Singh, Vijay P.
    • Journal of Korea Water Resources Association
    • /
    • 제53권6호
    • /
    • pp.395-408
    • /
    • 2020
  • This paper aims to develop and apply three different machine learning approaches (i.e., artificial neural networks (ANN), adaptive neuro-fuzzy inference systems (ANFIS), and wavelet-based neural networks (WNN)) combined with an evolutionary optimization algorithm and the k-fold cross validation for multi-step (days) streamflow forecasting at the catchment located in Algeria, North Africa. The ANN and ANFIS models yielded similar performances, based on four different statistical indices (i.e., root mean squared error (RMSE), Nash-Sutcliffe efficiency (NSE), correlation coefficient (R), and peak flow criteria (PFC)) for training and testing phases. The values of RMSE and PFC for the WNN model (e.g., RMSE = 8.590 ㎥/sec, PFC = 0.252 for (t+1) day, testing phase) were lower than those of ANN (e.g., RMSE = 19.120 ㎥/sec, PFC = 0.446 for (t+1) day, testing phase) and ANFIS (e.g., RMSE = 18.520 ㎥/sec, PFC = 0.444 for (t+1) day, testing phase) models, while the values of NSE and R for WNN model were higher than those of ANNs and ANFIS models. Therefore, the new approach can be a robust tool for multi-step (days) streamflow forecasting in the Seybous River, Algeria.

An Intelligent Gold Price Prediction Based on Automated Machine and k-fold Cross Validation Learning

  • Baguda, Yakubu S.;Al-Jahdali, Hani Meateg
    • International Journal of Computer Science & Network Security
    • /
    • 제21권4호
    • /
    • pp.65-74
    • /
    • 2021
  • The rapid change in gold price is an issue of concern in the global economy and financial markets. Gold has been used as a means for trading and transaction around the world for long period of time and it plays an integral role in monetary, business, commercial and financial activities. More importantly, it is used as economic measure for the global economy and will continue to play an important economic vital role - both locally and globally. There has been an explosive growth in demand for efficient and effective scheme to predict gold price due its volatility and fluctuation. Hence, there is need for the development of gold price prediction scheme to assist and support investors, marketers, and financial institutions in making effective economic and monetary decisions. This paper primarily proposed an intelligent based system for predicting and characterizing the gold market trend. The simulation result shows that the proposed intelligent gold price scheme has been able to predict the gold price with high accuracy and precision, and ultimately it has significantly reduced the prediction error when compared to baseline neural network (NN).

Prediction of Protein-Protein Interaction Sites Based on 3D Surface Patches Using SVM (SVM 모델을 이용한 3차원 패치 기반 단백질 상호작용 사이트 예측기법)

  • Park, Sung-Hee;Hansen, Bjorn
    • The KIPS Transactions:PartD
    • /
    • 제19D권1호
    • /
    • pp.21-28
    • /
    • 2012
  • Predication of protein interaction sites for monomer structures can reduce the search space for protein docking and has been regarded as very significant for predicting unknown functions of proteins from their interacting proteins whose functions are known. In the other hand, the prediction of interaction sites has been limited in crystallizing weakly interacting complexes which are transient and do not form the complexes stable enough for obtaining experimental structures by crystallization or even NMR for the most important protein-protein interactions. This work reports the calculation of 3D surface patches of complex structures and their properties and a machine learning approach to build a predictive model for the 3D surface patches in interaction and non-interaction sites using support vector machine. To overcome classification problems for class imbalanced data, we employed an under-sampling technique. 9 properties of the patches were calculated from amino acid compositions and secondary structure elements. With 10 fold cross validation, the predictive model built from SVM achieved an accuracy of 92.7% for classification of 3D patches in interaction and non-interaction sites from 147 complexes.