• Title/Summary/Keyword: Training Sample

Product Recommender Systems using Multi-Model Ensemble Techniques (다중모형조합기법을 이용한 상품추천시스템)

  • Lee, Yeonjeong;Kim, Kyoung-Jae
    • Journal of Intelligence and Information Systems / v.19 no.2 / pp.39-54 / 2013
  • The recent explosive growth of electronic commerce provides customers with many advantageous purchase opportunities. In this situation, customers who do not have enough knowledge about their purchases may accept product recommendations. Product recommender systems automatically reflect users' preferences and provide recommendation lists to users. Thus, product recommender systems in online shopping stores are known as one of the most popular tools for one-to-one marketing. However, recommender systems that do not properly reflect users' preferences cause disappointment and waste users' time. In this study, we propose a novel recommender system that uses data mining and multi-model ensemble techniques to enhance recommendation performance by reflecting users' preferences precisely. The research data were collected from a real-world online shopping store that sells products from famous art galleries and museums in Korea. The data initially contained 5,759 transactions, of which 3,167 remained after records with null values were deleted. We transform the categorical variables into dummy variables and exclude outliers. The proposed model consists of two steps. The first step predicts which customers have a high likelihood of purchasing products in the online shopping store. In this step, we first use logistic regression, decision trees, and artificial neural networks to predict customers with a high likelihood of purchasing products in each product group. We apply these data mining techniques using SAS E-Miner software. We partition the data into two sets, modeling and validation, for the logistic regression and decision trees, and into three sets, training, test, and validation, for the artificial neural network model. The same validation dataset is used in all experiments.
Then we combine the results of the individual predictors using multi-model ensemble techniques such as bagging and bumping. Bagging, short for "Bootstrap Aggregation," combines the outputs of several machine learning models to improve the performance and stability of prediction or classification; it is a special form of averaging. Bumping, short for "Bootstrap Umbrella of Model Parameters," considers only the model with the lowest error. The results show that bumping outperforms bagging and the other predictors except for the "Poster" product group, for which the artificial neural network model performs better than the other models. In the second step, we use market basket analysis to extract association rules for co-purchased products. We extracted thirty-one association rules according to their Lift, Support, and Confidence values, setting the minimum support (the minimum transaction frequency required for an association) to 5%, the maximum number of items in an association to 4, and the minimum confidence for rule generation to 10%. We also excluded rules with a lift value below 1. After removing duplicate rules, fifteen association rules remained. Of these, eleven rules contain associations between products in the "Office Supplies" product group, one rule contains an association between the "Office Supplies" and "Fashion" product groups, and the other three rules contain associations between the "Office Supplies" and "Home Decoration" product groups. Finally, the proposed product recommender system provides lists of recommendations to the appropriate customers. We tested the usability of the proposed system with a prototype and real-world transaction and profile data. To this end, we built the prototype system using ASP, JavaScript, and Microsoft Access.
In addition, we surveyed user satisfaction with the product lists recommended by the proposed system and with randomly selected product lists. The survey participants were 173 users of MSN Messenger, Daum Café, and P2P services. We measured user satisfaction on a five-point Likert scale and performed a paired-sample t-test on the survey results. The results show that the proposed model outperforms the random selection model at the 1% significance level, meaning that users were significantly more satisfied with the recommended product lists. The results also suggest that the proposed system may be useful in real-world online shopping stores.
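The difference between the two ensemble rules used in the first step can be illustrated with a minimal sketch. It assumes one-feature binary data and uses a decision stump as the base learner, a stand-in chosen for brevity rather than the logistic regression, decision tree, and neural network predictors combined in the study; the function names (`fit_stump`, `bagging`, `bumping`) are hypothetical. Bagging keeps every bootstrap model and votes; bumping fits models on bootstrap samples (plus the original sample) and keeps only the one with the lowest error on the original training data.

```python
import random
from typing import Callable, List, Tuple

Point = Tuple[float, int]   # (feature value, class label 0/1)
Stump = Tuple[float, int]   # (threshold, polarity)


def predict_stump(stump: Stump, x: float) -> int:
    # polarity 1 predicts class 1 above the threshold; polarity 0 the reverse
    t, pol = stump
    above = 1 if x > t else 0
    return above if pol == 1 else 1 - above


def fit_stump(data: List[Point]) -> Stump:
    """Try every data point as a threshold and keep the lowest-error stump."""
    best, best_err = (0.0, 1), len(data) + 1
    for t, _ in data:
        for pol in (0, 1):
            err = sum(predict_stump((t, pol), x) != y for x, y in data)
            if err < best_err:
                best, best_err = (t, pol), err
    return best


def error_rate(stump: Stump, data: List[Point]) -> float:
    return sum(predict_stump(stump, x) != y for x, y in data) / len(data)


def bootstrap(rng: random.Random, data: List[Point]) -> List[Point]:
    # Sample len(data) points with replacement
    return [rng.choice(data) for _ in data]


def bagging(data: List[Point], n_models: int = 25, seed: int = 0) -> Callable[[float], int]:
    """Bagging: one model per bootstrap sample, combined by majority vote."""
    rng = random.Random(seed)
    stumps = [fit_stump(bootstrap(rng, data)) for _ in range(n_models)]

    def predict(x: float) -> int:
        votes = sum(predict_stump(s, x) for s in stumps)
        return 1 if 2 * votes > len(stumps) else 0
    return predict


def bumping(data: List[Point], n_models: int = 25, seed: int = 0) -> Stump:
    """Bumping: fit on the original data and on bootstrap samples, then keep
    only the single model with the lowest error on the original data."""
    rng = random.Random(seed)
    candidates = [fit_stump(data)]
    candidates += [fit_stump(bootstrap(rng, data)) for _ in range(n_models)]
    return min(candidates, key=lambda s: error_rate(s, data))
```

Because bumping always includes the model fitted to the full sample, its training error can never exceed that model's; bagging instead gives up a single best model in exchange for the variance reduction of averaging.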
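The rule-filtering step of the market basket analysis can be sketched in plain Python as follows. This is a hypothetical, brute-force Apriori-style illustration (the study itself used SAS E-Miner) that applies the thresholds reported above: minimum support 5%, at most 4 items per association, minimum confidence 10%, and rules with lift below 1 dropped. The product names in the example are invented.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple

# (antecedent, consequent, support, confidence, lift)
Rule = Tuple[FrozenSet[str], FrozenSet[str], float, float, float]


def mine_rules(transactions: List[Set[str]], min_support: float = 0.05,
               max_items: int = 4, min_confidence: float = 0.10,
               min_lift: float = 1.0) -> List[Rule]:
    n = len(transactions)

    def support(itemset: FrozenSet[str]) -> float:
        # Fraction of transactions containing every item of the itemset
        return sum(itemset <= t for t in transactions) / n

    # Level-wise search: a k-itemset can only be frequent if its (k-1)-subsets are
    frequent: Dict[FrozenSet[str], float] = {}
    level = [frozenset([i]) for i in sorted({i for t in transactions for i in t})]
    size = 1
    while level and size <= max_items:
        kept = {s: support(s) for s in level}
        kept = {s: v for s, v in kept.items() if v >= min_support}
        frequent.update(kept)
        size += 1
        level = list({a | b for a in kept for b in kept if len(a | b) == size})

    rules: List[Rule] = []
    for itemset, sup in frequent.items():
        for k in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(sorted(itemset), k)):
                rhs = itemset - lhs
                confidence = sup / frequent[lhs]
                lift = confidence / frequent[rhs]
                # Keep rules meeting the confidence floor with lift of at least 1
                if confidence >= min_confidence and lift >= min_lift:
                    rules.append((lhs, rhs, sup, confidence, lift))
    return rules


if __name__ == "__main__":
    baskets = [{"pen", "notebook"}, {"pen", "notebook"}, {"pen"},
               {"notebook", "mug"}, {"mug"}]
    for lhs, rhs, s, c, l in mine_rules(baskets):
        print(sorted(lhs), "->", sorted(rhs), f"support={s:.2f} conf={c:.2f} lift={l:.2f}")
```

In the example, `{"pen"} -> {"notebook"}` has support 0.4, confidence 0.4/0.6 ≈ 0.67, and lift ≈ 0.67/0.6 ≈ 1.11, so it is kept, while `{"notebook"} -> {"mug"}` has lift ≈ 0.83 and is dropped.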

Key Methodologies to Effective Site-specific Assessment in Contaminated Soils: A Review (오염토양의 효과적 현장조사에 대한 주요 방법론의 검토)

  • Chung, Doug-Young
    • Korean Journal of Soil Science and Fertilizer / v.32 no.4 / pp.383-397 / 1999
  • For sites under investigation, the results of the investigation can be used to determine goals for cleanup, quantify risks, determine acceptable and unacceptable risk, and develop cleanup plans that do not cause unnecessary delays in the redevelopment and reuse of the property. To do this, an appropriately detailed study of the site is essential to identify the cause, nature, and extent of contamination and the possible threats to the environment or to people living or working nearby, through the analysis of samples of soil and soil gas, groundwater, surface water, and sediment. The migration pathways of contaminants are also examined during this phase. A key aspect of cost-effective site assessment, which helps standardize and accelerate the evaluation of contaminated soils, is a simple step-by-step methodology that environmental science and engineering professionals can use to calculate risk-based, site-specific levels for contaminants in soil. Its use may significantly reduce the time needed to complete soil investigations and cleanup actions at some sites, and may improve the consistency of these actions across the nation. Effective site assessment also requires criteria for choosing the type of standard and setting its magnitude; these criteria come from different sources depending on many factors, including the nature of the contamination. A general scheme for site-specific assessment consists of sequential Phases I, II, and III, defined by the workplan and soil screening levels. Phase I assessments are conducted to identify and confirm a site's recognized environmental conditions resulting from past actions. If a Phase I assessment identifies potentially hazardous substances, a Phase II assessment is usually conducted to confirm the absence, or the presence and extent, of contamination. Phase II involves the collection and analysis of samples.
Phase III then remediates the contaminated soils identified in Phases I and II. Important factors in determining whether an assessment standard is site-specific and suitable are (1) the spatial extent of the sampling and the size of the sample area; (2) the number of samples taken; (3) the sampling strategy; and (4) the way the data are analyzed. Although selected methods are recommended, the application of quantitative methods is intended for users with prior training or experience in the dynamic site investigation process.

Survey of Current Status of Casting Industry in Korea (국내 주조산업 현황조사)

  • Cho, Minsu;Lee, Jisuk;Lee, Sanghwan;Lee, Sangmok
    • Journal of Korea Foundry Society / v.41 no.2 / pp.144-152 / 2021
  • Based on an analysis of the current state of the world's foundry industry, we examined the international competitiveness of Korea's foundry industry over the past 20 years. Korea's total foundry production is 2.52 million tons, and its production per company (so-called productivity) is 2,831 tons. Korea ranks eighth in the world; compared with three years ago, it has dropped one position in total foundry production, while its productivity ranking is unchanged. Korea is the only one of the top 10 foundry countries to see a decline in production. Similar to the global situation, Korean products consist of 38% grey cast iron, 31% ductile cast iron, 15% aluminum, and 9% cast steel. To obtain statistics on Korea's foundry industry, a survey project was conducted for approximately nine months starting in April 2020. Various aspects of the domestic casting industry were evaluated through statistical surveys and in-depth sample surveys organized by the Korean Standard Industrial Classification. We also examined the number of companies, their regional distribution, the number of workers and the percentage of foreigners, the distribution of job types, and R&D investment by enterprise size. Sales, exports, and various profit ratios were analyzed to measure the earning power of the foundry industry. In addition, companies were grouped according to the process they mainly use, and sales, exports, and yield were investigated for each process group. Based on these data, we present proposals on the following issues for the sustainable growth of the domestic foundry industry: global ranking, restructuring of marginal companies, training of domestic technical personnel, and support policies differentiated by company size and process.

The Relationship between Using Both Hands Keyboard Input and Hand Function Among the Lifestyles of University Student (대학생의 라이프스타일 중 양손사용 스마트폰 자판 입력과 손 기능과의 관계)

  • Bae, Seong-Hwan;Kang, Woo-Jin;Kim, Na-Yeong;Kim, Ji-Hyeon;Jo, June-Hyeok;Baek, Ji-Young
    • Journal of Korea Entertainment Industry Association / v.15 no.1 / pp.221-228 / 2021
  • This study aims to provide basic data for developing hand function training programs using a keyboard by examining whether two-handed smartphone keyboard input speed is related to hand dexterity and eye-hand coordination ability. Smartphone keyboard input speed, the Purdue Pegboard Test, the Grooved Pegboard Test, and the Korean Developmental Test of Visual Perception-Adolescent (K-DTVP-A) were evaluated for 40 university students Province. An independent-samples t-test and one-way ANOVA were conducted to identify differences in two-handed smartphone keyboard input speed, dexterity, eye-hand coordination ability, and visual-motor ability according to the general characteristics of the subjects. Pearson correlation analysis was also conducted to examine the relationships among two-handed smartphone keyboard input speed, hand dexterity, eye-hand coordination ability, and visual-motor ability. Two-handed smartphone keyboard input speed was correlated with dominant-hand performance in the Purdue Pegboard Test (r=-.313, p<.05). Input speed was also correlated with the K-DTVP-A items Copying (r=-.333, p<.05), Visual-Motor Search (r=.455, p<.01), Visual-Motor Speed (r=-.453, p<.01), and Form Constancy (r=-.341, p<.05). These findings may help in developing treatment programs that use smartphones, and further experimental studies are expected to establish the effectiveness of such programs.
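The correlation analysis above relies on Pearson's r. A minimal, dependency-free sketch of the coefficient, together with the usual t statistic for testing r against zero with n - 2 degrees of freedom, is shown below; the function names are illustrative, and on Python 3.10+ `statistics.correlation` computes the same coefficient.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation: covariance divided by the product of standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r: float, n: int) -> float:
    """t = r * sqrt((n - 2) / (1 - r^2)), compared with a t distribution on n - 2 df."""
    return r * math.sqrt((n - 2) / (1 - r * r))
```

With n = 40, the dominant-hand Purdue Pegboard correlation of r = -.313 gives t ≈ -2.03, just beyond the two-sided 5% critical value of about 2.02 for 38 degrees of freedom, which is consistent with the reported p<.05.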

Effect of K University Dental Hygiene Department students' participation in overseas clinical practice on satisfaction with practice, major, and intention to work abroad (K 대학교 치위생학과 학생의 해외임상실습참여가 임상실습만족도, 전공만족도 및 해외취업의사에 미치는 영향)

  • Min-Sun Lee;Ma I Choi
    • Journal of Korean Dental Hygiene Science / v.6 no.2 / pp.151-160 / 2023
  • Background: This study analyzed differences in practice satisfaction, major satisfaction, and willingness to work abroad among dental hygiene students at K University in Gangwon-do according to their participation in international practicums. Methods: A survey was conducted on 215 dental hygiene students recruited through convenience sampling, and 214 responses were included in the final analysis. General characteristics were examined. Major satisfaction and grades were measured on a 5-point Likert scale, and satisfaction with practice, intention to participate in international practicums, and employment intentions were investigated with paper questionnaires. Descriptive statistics for general characteristics were computed using SPSS software (version 26.0). Because of the convenience sample, the nonparametric Mann-Whitney U and Kruskal-Wallis tests were used to compare satisfaction with practice and major according to general characteristics. An independent-samples t-test was conducted to compare practice satisfaction and major satisfaction by participation in international practicums, and Fisher's exact test was conducted for practice satisfaction, willingness to participate in overseas internships, and employment. Results: Regarding future participation in international clinical practicums, 66.7% of students who had previously participated in overseas training expressed willingness to participate again, whereas 40.9% of those who had not participated showed no intention of participating; this difference was significant (p<0.05). Additionally, 76.2% of the participants expressed interest in overseas employment, and this difference was also statistically significant (p<0.05).
Conclusion: It was confirmed that students' satisfaction with practice and major increased through participation in international practicums, and that they had a positive intention to work abroad and participate in overseas internship programs in the future.
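Fisher's exact test compares proportions in a 2x2 table without relying on large-sample approximations, which suits small cell counts such as prior participation versus willingness to participate again. The sketch below computes the two-sided p-value by summing the hypergeometric probabilities of every table with the observed margins that is no more likely than the observed table. It is a generic illustration with hypothetical counts, not a reanalysis of the study's data; SciPy users would normally call `scipy.stats.fisher_exact` instead.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    r1, r2, c1 = a + b, c + d, a + c
    total = comb(r1 + r2, c1)  # number of tables with these margins (Vandermonde)

    def weight(x: int) -> int:
        # Proportional to the hypergeometric probability of top-left cell = x
        return comb(r1, x) * comb(r2, c1 - x)

    observed = weight(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    # Integer weights make the "no more likely than observed" comparison exact
    extreme = sum(weight(x) for x in range(lo, hi + 1) if weight(x) <= observed)
    return extreme / total
```

For example, the table [[3, 1], [1, 3]] gives p = 34/70 ≈ 0.486, so such a split in a very small sample would not be significant.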

Accelerometer-based Gesture Recognition for Robot Interface (로봇 인터페이스 활용을 위한 가속도 센서 기반 제스처 인식)

  • Jang, Min-Su;Cho, Yong-Suk;Kim, Jae-Hong;Sohn, Joo-Chan
    • Journal of Intelligence and Information Systems / v.17 no.1 / pp.53-69 / 2011
  • Vision- and voice-based technologies are commonly used for human-robot interaction. However, it is widely recognized that the performance of vision- and voice-based interaction systems deteriorates considerably in real-world situations because of environmental and user variations. Human users need to be very cooperative to obtain reasonable performance, which significantly limits the usability of vision- and voice-based human-robot interaction technologies. As a result, touch screens are still the main medium of human-robot interaction in real-world applications. To improve the usability of robots for various services, alternative interaction technologies should be developed to address the limitations of vision- and voice-based technologies. In this paper, we propose an accelerometer-based gesture interface as one such alternative, because accelerometers are effective in detecting human body movements, while their performance is not limited by environmental conditions such as lighting or a camera's field of view. Moreover, accelerometers are now widely available in many mobile devices. We tackle the problem of classifying the acceleration signal patterns of the 26 letters of the English alphabet, which is one of the essential repertoires for realizing robot-based education services. Recognizing 26 English handwriting patterns from accelerometer data is very difficult because of the large number of pattern classes and the complexity of each pattern. The most difficult comparable problem tackled previously was recognizing the acceleration signal patterns of 10 handwritten digits. Most previous studies dealt with sets of 8~10 simple, easily distinguishable gestures that are useful for controlling home appliances, computer applications, robots, etc. Good features are essential for successful pattern recognition.
To improve discriminative power on complex English alphabet patterns, we extracted 'motion trajectories' from the input acceleration signal and used them as the main feature. Investigative experiments showed that classifiers based on trajectories performed 3%~5% better than those using raw features, e.g., the acceleration signal itself or statistical figures. To minimize the distortion of trajectories, we applied a simple but effective set of smoothing filters and band-pass filters. It is well known that acceleration patterns for the same gesture differ greatly among performers. To tackle this problem, we apply online incremental learning so that the system adapts to each user's distinctive motion properties. Our system is based on instance-based learning (IBL), in which each training sample is memorized as a reference pattern. Brute-force incremental learning in IBL continuously accumulates reference patterns, which is a problem because it not only slows down classification but also degrades recall performance. Regarding the latter phenomenon, we observed that as the number of reference patterns grows, some reference patterns contribute more to false positive classifications. We therefore devised an algorithm that optimizes the reference pattern set based on the positive and negative contributions of each reference pattern. The algorithm is run periodically to remove reference patterns that have a very low positive contribution or a high negative contribution. Experiments were performed on 6,500 gesture patterns collected from 50 adults aged 30~50. Each participant performed each letter 5 times using a Nintendo® Wii™ Remote. The acceleration signal was sampled at 100 Hz on 3 axes. The mean recall rate over all letters was 95.48%. Some letters had very low recall rates and very high pairwise confusion rates. The major confusion pairs were D (88%) and P (74%), I (81%) and U (75%), and N (88%) and W (100%).
Although W was recalled perfectly, it contributed substantially to the false positive classification of N. Compared with major previous results from VTT (96% for 8 control gestures), CMU (97% for 10 control gestures), and Samsung Electronics (97% for 10 digits and a control gesture), our system is superior with respect to the number of pattern classes and the complexity of the patterns. Using our gesture interaction system, we conducted 2 case studies of robot-based edutainment services. The services were implemented on various robot platforms and mobile devices, including the iPhone™. The participating children showed improved concentration and responded actively to the service with our gesture interface. To demonstrate the effectiveness of our gesture interface, the children took a test after experiencing an English teaching service. Children who played with the gesture interface-based robot content scored 10% higher than those taught conventionally. We conclude that the accelerometer-based gesture interface is a promising technology for real-world robot-based services and content, complementing the limitations of today's conventional interfaces such as touch screens, vision, and voice.
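The trajectory feature can be illustrated for a single axis with the following minimal sketch, which smooths a 100 Hz acceleration trace with a moving average, subtracts its mean, and integrates twice to obtain displacement. The window length and the use of mean removal in place of the high-pass half of a band-pass filter are assumptions; the paper does not specify its filter set, and practical systems usually need more careful drift correction.

```python
from typing import List


def moving_average(signal: List[float], window: int = 5) -> List[float]:
    """Causal moving-average smoothing (a simple low-pass filter)."""
    out, acc = [], 0.0
    for i, v in enumerate(signal):
        acc += v
        if i >= window:
            acc -= signal[i - window]  # drop the sample that left the window
        out.append(acc / min(i + 1, window))
    return out


def trajectory_1d(accel: List[float], fs: float = 100.0, window: int = 5) -> List[float]:
    """Integrate one smoothed acceleration axis twice into a displacement trace.

    Assumption: subtracting the mean stands in for the high-pass part of a
    band-pass filter (it removes gravity and any static offset)."""
    dt = 1.0 / fs
    smoothed = moving_average(accel, window)
    mean = sum(smoothed) / len(smoothed)
    velocity = position = 0.0
    trace = []
    for a in smoothed:
        velocity += (a - mean) * dt   # first integration: acceleration -> velocity
        position += velocity * dt     # second integration: velocity -> position
        trace.append(position)
    return trace
```

A constant reading such as gravity alone yields a flat trajectory, while a push followed by an equal deceleration yields a net positive displacement, which is the kind of shape information that raw acceleration samples only encode indirectly.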
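The reference-set optimization can be sketched as a 1-nearest-neighbour learner that keeps positive and negative counters for each reference pattern. The paper's exact contribution measure and thresholds are not given, so the bookkeeping below is a hypothetical illustration: each labelled sample credits or blames its current nearest reference, and pruning removes references whose negative count exceeds their positive count by a margin, or that have never helped after a minimum number of updates. In practice the feature vectors would be trajectory features; plain lists of floats are used here.

```python
import math
from typing import List, Sequence, Tuple


class IncrementalIBL:
    """1-NN classifier with online learning and periodic contribution-based pruning.

    `pos[i]`: times reference i was nearest to a new sample with the same label.
    `neg[i]`: times it was nearest to a sample with a different label (an error).
    The pruning rule and both thresholds are hypothetical choices."""

    def __init__(self, margin: int = 2, min_age: int = 50):
        self.margin = margin    # drop if neg - pos >= margin (high negative contribution)
        self.min_age = min_age  # drop if pos == 0 after min_age updates (low positive contribution)
        self.step = 0
        self.refs: List[Tuple[Sequence[float], str]] = []
        self.pos: List[int] = []
        self.neg: List[int] = []
        self.born: List[int] = []

    def _nearest(self, x: Sequence[float]) -> int:
        return min(range(len(self.refs)), key=lambda i: math.dist(x, self.refs[i][0]))

    def predict(self, x: Sequence[float]) -> str:
        return self.refs[self._nearest(x)][1]

    def learn(self, x: Sequence[float], label: str) -> None:
        """Credit or blame the current nearest reference, then memorise x."""
        if self.refs:
            i = self._nearest(x)
            if self.refs[i][1] == label:
                self.pos[i] += 1
            else:
                self.neg[i] += 1
        self.refs.append((x, label))
        self.pos.append(0)
        self.neg.append(0)
        self.born.append(self.step)
        self.step += 1

    def prune(self) -> int:
        """Remove low-value references and return how many were removed."""
        keep = [i for i in range(len(self.refs))
                if self.neg[i] - self.pos[i] < self.margin
                and not (self.step - self.born[i] >= self.min_age and self.pos[i] == 0)]
        removed = len(self.refs) - len(keep)
        self.refs = [self.refs[i] for i in keep]
        self.pos = [self.pos[i] for i in keep]
        self.neg = [self.neg[i] for i in keep]
        self.born = [self.born[i] for i in keep]
        return removed
```

Running `prune` periodically also keeps the reference set, and with it the cost of each nearest-neighbour search, from growing without bound.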

Optimization of Multiclass Support Vector Machine using Genetic Algorithm: Application to the Prediction of Corporate Credit Rating (유전자 알고리즘을 이용한 다분류 SVM의 최적화: 기업신용등급 예측에의 응용)

  • Ahn, Hyunchul
    • Information Systems Review / v.16 no.3 / pp.161-177 / 2014
  • Corporate credit rating assessment consists of complicated processes in which various factors describing a company are taken into consideration. Such assessment is known to be very expensive because domain experts must be employed to assess the ratings. As a result, data-driven corporate credit rating prediction using statistical and artificial intelligence (AI) techniques has received considerable attention from researchers and practitioners. In particular, statistical methods such as multiple discriminant analysis (MDA) and multinomial logistic regression analysis (MLOGIT), and AI methods including case-based reasoning (CBR), artificial neural networks (ANN), and multiclass support vector machines (MSVM) have been applied to corporate credit rating. Among them, MSVM has recently become popular because of its robustness and high prediction accuracy. In this study, we propose a novel optimized MSVM model and apply it to corporate credit rating prediction in order to enhance accuracy. Our model, named 'GAMSVM (Genetic Algorithm-optimized Multiclass Support Vector Machine),' is designed to optimize the kernel parameters and the feature subset selection simultaneously. Prior studies such as Lorena and de Carvalho (2008) and Chatterjee (2013) show that proper kernel parameters may improve the performance of MSVMs. Also, the results of studies such as Shieh and Yang (2008) and Chatterjee (2013) imply that appropriate feature selection may lead to higher prediction accuracy. Based on these prior studies, we propose to apply GAMSVM to corporate credit rating prediction. As the tool for optimizing the kernel parameters and the feature subset selection, we suggest the genetic algorithm (GA). GA is known as an efficient and effective search method that simulates biological evolution. By applying genetic operations such as selection, crossover, and mutation, it gradually improves the search results.
In particular, the mutation operator helps prevent GA from becoming trapped in local optima, so a globally optimal or near-optimal solution can be found. GA has been widely applied to search for optimal parameters or feature subsets of AI techniques, including MSVM. For these reasons, we also adopt GA as the optimization tool. To empirically validate the usefulness of GAMSVM, we applied it to a real-world credit rating case in Korea. Our application is bond rating, the most frequently studied area of credit rating for specific debt issues or other financial obligations. The experimental dataset was collected from a large credit rating company in South Korea. It contained 39 financial ratios of 1,295 companies in the manufacturing industry, together with their credit ratings. Using various statistical methods, including one-way ANOVA and stepwise MDA, we selected 14 financial ratios as the candidate independent variables. The dependent variable, i.e., credit rating, was labeled with four classes: 1 (A1), 2 (A2), 3 (A3), and 4 (B and C). For each class, 80 percent of the data was used for training and the remaining 20 percent for validation. To overcome the small sample size, we also applied five-fold cross-validation to our dataset. To examine the competitiveness of the proposed model, we also tested several comparative models, including MDA, MLOGIT, CBR, ANN, and MSVM. For MSVM, we adopted the One-Against-One (OAO) and DAGSVM (Directed Acyclic Graph SVM) approaches because they are known to be the most accurate among various MSVM approaches. GAMSVM was implemented using LIBSVM, an open-source library, and Evolver 5.5, a commercial GA software package. The other comparative models were tested using various statistical and AI packages such as SPSS for Windows, Neuroshell, and Microsoft Excel VBA (Visual Basic for Applications). The experimental results showed that the proposed model, GAMSVM, outperformed all the competing models.
In addition, the model was found to use fewer independent variables while showing higher accuracy. In our experiments, five variables, X7 (total debt), X9 (sales per employee), X13 (years since founding), X15 (accumulated earnings to total assets), and X39 (an index related to cash flows from operating activities), were found to be the most important factors in predicting corporate credit ratings. However, the values of the finally selected kernel parameters were almost the same across the data subsets. To examine whether the predictive performance of GAMSVM was significantly better than that of the other models, we used the McNemar test. GAMSVM was found to be better than MDA, MLOGIT, CBR, and ANN at the 1% significance level, and better than OAO and DAGSVM at the 5% significance level.
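The GA search described above encodes the feature subset and the kernel parameters in one chromosome. The sketch below shows such an encoding, assuming an RBF kernel whose C and gamma are searched on a log2 scale, with tournament selection, uniform crossover, mutation, and elitism in plain Python. It does not train an SVM: `toy_fitness` is a hypothetical placeholder for what would, in a real setup, be the cross-validated accuracy of an MSVM (for example one built with LIBSVM) on the selected features, and the parameter ranges and GA settings are assumptions rather than the paper's Evolver configuration.

```python
import random
from typing import Callable, List, Tuple

# Chromosome = (feature mask, log2 C, log2 gamma)
Chromosome = Tuple[List[int], float, float]


def random_chromosome(rng: random.Random, n_features: int) -> Chromosome:
    mask = [rng.randint(0, 1) for _ in range(n_features)]
    return mask, rng.uniform(-5.0, 15.0), rng.uniform(-15.0, 3.0)


def crossover(rng: random.Random, a: Chromosome, b: Chromosome) -> Chromosome:
    # Uniform crossover on the mask; each kernel parameter comes from one parent
    mask = [x if rng.random() < 0.5 else y for x, y in zip(a[0], b[0])]
    return (mask, a[1] if rng.random() < 0.5 else b[1],
            a[2] if rng.random() < 0.5 else b[2])


def mutate(rng: random.Random, ch: Chromosome, p_bit: float = 0.05,
           sigma: float = 0.5) -> Chromosome:
    # Bit flips on the mask and Gaussian jitter on the parameters keep the
    # population diverse, which helps the search leave local optima
    mask = [1 - x if rng.random() < p_bit else x for x in ch[0]]
    return mask, ch[1] + rng.gauss(0.0, sigma), ch[2] + rng.gauss(0.0, sigma)


def tournament(rng: random.Random, pop: List[Chromosome], scores: List[float],
               k: int = 3) -> Chromosome:
    picks = rng.sample(range(len(pop)), k)
    return pop[max(picks, key=lambda i: scores[i])]


def run_ga(fitness: Callable[[Chromosome], float], n_features: int,
           pop_size: int = 30, generations: int = 40, seed: int = 0):
    rng = random.Random(seed)
    pop = [random_chromosome(rng, n_features) for _ in range(pop_size)]
    history: List[float] = []  # best fitness in each generation
    for gen in range(generations + 1):
        scores = [fitness(ch) for ch in pop]
        best = max(range(pop_size), key=lambda i: scores[i])
        history.append(scores[best])
        if gen == generations:
            return pop[best], history
        # Elitism: the best chromosome survives unchanged, so history never decreases
        children = [pop[best]]
        while len(children) < pop_size:
            a, b = tournament(rng, pop, scores), tournament(rng, pop, scores)
            children.append(mutate(rng, crossover(rng, a, b)))
        pop = children


def toy_fitness(ch: Chromosome) -> float:
    """Hypothetical stand-in for cross-validated MSVM accuracy: rewards using
    only features 0-4 and (log2 C, log2 gamma) near (3, -5)."""
    mask, log_c, log_g = ch
    return sum(mask[:5]) - 0.5 * sum(mask[5:]) - 0.05 * ((log_c - 3) ** 2 + (log_g + 5) ** 2)
```

Calling `run_ga(toy_fitness, n_features=14)` (14 matching the number of candidate financial ratios) returns the best chromosome and the per-generation best fitness; replacing `toy_fitness` with a real cross-validation routine is where the actual cost of GAMSVM-style tuning lies.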
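The McNemar test compares two classifiers evaluated on the same validation cases using only the discordant pairs: b cases that the first model classifies correctly and the second does not, and c cases the other way round. A minimal sketch with the common continuity-corrected statistic, $\chi^2 = (|b - c| - 1)^2 / (b + c)$ on 1 degree of freedom, follows; the counts in the example are hypothetical.

```python
import math
from typing import Tuple


def mcnemar(b: int, c: int, continuity: bool = True) -> Tuple[float, float]:
    """McNemar test on discordant counts b and c; returns (chi-square, p-value)."""
    if b + c == 0:
        return 0.0, 1.0
    diff = abs(b - c) - (1 if continuity else 0)
    chi2 = max(diff, 0) ** 2 / (b + c)
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2))
    return chi2, math.erfc(math.sqrt(chi2 / 2))
```

For instance, b = 10 and c = 2 give χ² = 49/12 ≈ 4.08 and p ≈ 0.043, significant at the 5% level but not at the 1% level.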

Ensemble of Nested Dichotomies for Activity Recognition Using Accelerometer Data on Smartphone (Ensemble of Nested Dichotomies 기법을 이용한 스마트폰 가속도 센서 데이터 기반의 동작 인지)

  • Ha, Eu Tteum;Kim, Jeongmin;Ryu, Kwang Ryel
    • Journal of Intelligence and Information Systems / v.19 no.4 / pp.123-132 / 2013
  • As smartphones are equipped with various sensors such as accelerometers, GPS, gravity sensors, gyroscopes, ambient light sensors, and proximity sensors, there has been much research on using these sensors to create valuable applications. Human activity recognition is one such application, motivated by various welfare applications such as support for the elderly, measurement of calorie consumption, analysis of lifestyles, and analysis of exercise patterns. One challenge in using smartphone sensors for activity recognition is that the number of sensors used should be minimized to save battery power. When the number of sensors is restricted, it is difficult to build a highly accurate activity recognizer, or classifier, because subtly different activities are hard to distinguish with limited information. The difficulty becomes especially severe when the number of activity classes to be distinguished is very large. In this paper, we show that a fairly accurate classifier that distinguishes ten different activities can be built using data from only a single sensor, the smartphone accelerometer. Our approach to this ten-class problem is the ensemble of nested dichotomies (END) method, which transforms a multi-class problem into multiple two-class problems. END builds a committee of binary classifiers in a nested fashion using a binary tree. At the root of the tree, the set of all classes is split into two subsets by a binary classifier. At a child node, a subset of classes is again split into two smaller subsets by another binary classifier. Continuing in this way, we obtain a binary tree in which each leaf node contains a single class. This binary tree can be viewed as a nested dichotomy that can make multi-class predictions.
Depending on how the set of classes is split into two subsets at each node, the final tree can differ. Since some classes may be correlated, a particular tree may perform better than others. However, the best tree can hardly be identified without deep domain knowledge. The END method copes with this problem by building multiple dichotomy trees randomly during learning and then combining the predictions of all trees during classification. The END method is generally known to perform well even when the base learner cannot model complex decision boundaries. As the base classifier at each node of the dichotomy, we used another ensemble classifier, the random forest. A random forest is built by repeatedly generating decision trees, each from a bootstrap sample with a different random subset of features. By combining bagging with random feature subset selection, a random forest has more diverse ensemble members than simple bagging. Overall, our ensemble of nested dichotomies can be seen as a committee of committees of decision trees that can handle a multi-class problem with high accuracy. The ten activity classes distinguished in this paper are 'Sitting', 'Standing', 'Walking', 'Running', 'Walking Uphill', 'Walking Downhill', 'Running Uphill', 'Running Downhill', 'Falling', and 'Hobbling'. The features used to classify these activities include not only the magnitude of the acceleration vector at each time point but also the maximum, minimum, and standard deviation of the vector magnitude within a time window covering the last 2 seconds, among others. To compare the performance of END with that of other methods, accelerometer data were collected every 0.1 seconds for 2 minutes per activity from each of 5 volunteers.
Of the 5,900 samples ($5 \times (60 \times 2 - 2) / 0.1 = 5{,}900$) collected for each activity (the data for the first 2 seconds were discarded because no full time window is available for them), 4,700 were used for training and the rest for testing. Although 'Walking Uphill' is often confused with other similar activities, END classified all ten activities with a fairly high accuracy of 98.4%. By comparison, a decision tree, a k-nearest neighbor classifier, and a one-versus-rest support vector machine achieved accuracies of 97.6%, 96.5%, and 97.6%, respectively.
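The way an ensemble of nested dichotomies turns binary decisions into class probabilities can be sketched as follows. Each tree splits the class set at random; a class's probability is the product of the binary probabilities along its root-to-leaf path, and the ensemble averages these over trees. The base learner here is a deliberately simple nearest-distance rule standing in for the random forest used in the paper, and the class names and 2-D points in any example are hypothetical.

```python
import math
import random
from typing import Dict, FrozenSet, List, Sequence, Tuple

Sample = Tuple[Sequence[float], str]


class NearestDistanceDichotomy:
    """Stand-in binary learner (the paper uses a random forest at each node).
    P(left group | x) = d_right / (d_left + d_right), where d_* is the distance
    from x to the nearest training point whose class is in that group."""

    def __init__(self, data: List[Sample], left: FrozenSet[str]):
        self.left_pts = [x for x, y in data if y in left]
        self.right_pts = [x for x, y in data if y not in left]

    def p_left(self, x: Sequence[float]) -> float:
        dl = min(math.dist(x, p) for p in self.left_pts)
        dr = min(math.dist(x, p) for p in self.right_pts)
        return 0.5 if dl + dr == 0 else dr / (dl + dr)


class NestedDichotomy:
    """One randomly built binary tree over the classes; each leaf holds one class."""

    def __init__(self, data: List[Sample], classes: FrozenSet[str], rng: random.Random):
        self.classes = classes
        if len(classes) == 1:
            return
        ordered = sorted(classes)
        rng.shuffle(ordered)
        cut = rng.randint(1, len(ordered) - 1)  # random split into two non-empty groups
        left = frozenset(ordered[:cut])
        node_data = [(x, y) for x, y in data if y in classes]
        self.model = NearestDistanceDichotomy(node_data, left)
        self.left = NestedDichotomy(node_data, left, rng)
        self.right = NestedDichotomy(node_data, classes - left, rng)

    def class_probs(self, x: Sequence[float]) -> Dict[str, float]:
        if len(self.classes) == 1:
            return {next(iter(self.classes)): 1.0}
        p = self.model.p_left(x)
        # Multiply the branch probability into every class below each branch
        probs = {c: p * q for c, q in self.left.class_probs(x).items()}
        probs.update({c: (1 - p) * q for c, q in self.right.class_probs(x).items()})
        return probs


def end_predict(trees: List[NestedDichotomy], x: Sequence[float]) -> str:
    """Average class probabilities over all random trees; return the arg-max class."""
    totals: Dict[str, float] = {}
    for tree in trees:
        for c, p in tree.class_probs(x).items():
            totals[c] = totals.get(c, 0.0) + p / len(trees)
    return max(totals, key=totals.get)
```

Because each tree's class probabilities sum to 1 by construction, averaging them over trees with different random splits smooths out the effect of any single poorly chosen dichotomy, which is the motivation for END given above.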