Application of Support Vector Regression for Improving the Performance of the Emotion Prediction Model (감정예측모형의 성과개선을 위한 Support Vector Regression 응용)
Journal of Intelligence and Information Systems, v.18 no.3, pp.185-202, 2012
Since the value of information has come to be recognized in the information society, the collection and use of information have become important. Just as an artistic painting can be described in thousands of words, a facial expression carries a great deal of information. Following this idea, there have recently been a number of attempts to provide customers and companies with intelligent services that perceive human emotions through facial expressions. For example, MIT Media Lab, the leading organization in this research area, has developed a human emotion prediction model and has applied its studies to commercial business. In academia, conventional methods such as Multiple Regression Analysis (MRA) and Artificial Neural Networks (ANN) have been applied to predict human emotion in prior studies. However, MRA is generally criticized for its low prediction accuracy. This is inevitable, since MRA can only explain a linear relationship between the dependent variable and the independent variables. To mitigate the limitations of MRA, some studies such as Jung and Kim (2012) have used ANN as an alternative, and they reported that ANN generated more accurate predictions than statistical methods like MRA. However, ANN has also been criticized for overfitting and for the difficulty of network design (e.g., setting the number of layers and the number of nodes in the hidden layers). Against this background, we propose a novel model using Support Vector Regression (SVR) in order to increase prediction accuracy. SVR is an extension of the Support Vector Machine (SVM) designed to solve regression problems. The model produced by SVR depends only on a subset of the training data, because the cost function for building the model ignores any training data that is close (within a threshold
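The cost function described above, which ignores training points lying close to the prediction, is commonly known as the epsilon-insensitive loss. The following is a minimal sketch of that idea only, not the authors' implementation: the function name `insensitive_loss`, the parameter name `tolerance`, and all numbers are hypothetical and chosen for illustration.

```python
def insensitive_loss(y_true, y_pred, tolerance):
    """Per-sample loss: zero when the error is within the tolerance, linear beyond it."""
    return [max(0.0, abs(t - p) - tolerance) for t, p in zip(y_true, y_pred)]


# Hypothetical targets and predictions (not data from the study).
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [3.2, 4.0, 7.05, 11.5]

losses = insensitive_loss(y_true, y_pred, tolerance=0.5)

# Only samples with a non-zero loss influence the fitted model;
# the others fall inside the tolerance band and are ignored.
contributing = [i for i, loss in enumerate(losses) if loss > 0]

print(losses)
print(contributing)
```

Samples whose error stays inside the tolerance band contribute nothing to the cost, which is why the fitted model ends up depending only on a subset of the training data.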
In line with the trend of industrial innovation, IoT technology, which is used in a variety of fields, is emerging as a key element in creating new business models and providing user-friendly services in combination with big data. Data accumulated from Internet-of-Things (IoT) devices are being used in many ways to build convenience-oriented smart systems, since they enable customized intelligent systems through analysis of user environments and patterns. Recently, IoT has been applied to innovation in the public domain, in smart cities and smart transportation, for example to address traffic and crime problems using CCTV. In particular, when planning underground services or establishing a passenger flow control information system to enhance the convenience of citizens and commuters in congested public transportation such as subways and urban railways, it is necessary to consider comprehensively both the ease of securing real-time service data and the stability of security. However, previous studies that utilize image data are limited by privacy issues and by degraded object detection performance under abnormal conditions. The IoT device-based sensor data used in this study are free from privacy issues because they do not require the identification of individuals, and they can be effectively utilized to build intelligent public services for the general public. We use IoT-based infrared sensor devices for an intelligent pedestrian tracking system in the metro service, which many people use on a daily basis, and the temperature data measured by the sensors are transmitted in real time.
The experimental environment for collecting data in real time from the sensors was set up at the equally spaced midpoints of a 4×4 grid on the ceiling of subway entrances where passenger traffic is high, and it measured the temperature change caused by objects entering and leaving the detection spots. The measured data were preprocessed by setting reference values for the 16 areas and calculating, per unit of time, the difference between the temperature in each of the 16 areas and its reference value. This corresponds to a methodology that makes movement within the detection area stand out as much as possible. In addition, the data values were multiplied by 10 in order to reflect the temperature difference between areas more sensitively. For example, if the temperature collected from the sensor at a given time was 28.5℃, the analysis was conducted with the value changed to 285. The data collected from the sensors thus have the characteristics of both time series data and image data with 4×4 resolution. Reflecting these characteristics of the measured and preprocessed data, we propose a hybrid algorithm, referred to as CNN-LSTM (Convolutional Neural Network-Long Short Term Memory), that combines CNN, which performs well in image classification, with LSTM, which is especially suitable for analyzing time series data. In this study, the CNN-LSTM algorithm is used to predict the number of people passing through one of the 4×4 detection areas. We validated the proposed model by comparing its performance with other artificial intelligence algorithms such as Multi-Layer Perceptron (MLP), Long Short Term Memory (LSTM), and RNN-LSTM (Recurrent Neural Network-Long Short Term Memory). In the experiment, the proposed CNN-LSTM hybrid model showed the best predictive performance compared with MLP, LSTM, and RNN-LSTM.
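The preprocessing described above (per-area reference values, differencing, and scaling by 10) can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: how the reference values were chosen is not given in the abstract, so the `reference` grid, the sample readings, and the function name `preprocess_frame` are all hypothetical. Because the operations are linear, scaling before or after differencing gives the same result.

```python
SCALE = 10  # the abstract multiplies temperatures by 10 (e.g. 28.5 becomes 285)


def preprocess_frame(frame, reference):
    """Subtract each area's reference temperature from its reading and scale by 10."""
    return [
        [(temp - ref) * SCALE for temp, ref in zip(frame_row, ref_row)]
        for frame_row, ref_row in zip(frame, reference)
    ]


# Hypothetical 4x4 reference grid: an empty entrance at 26.0 degrees everywhere.
reference = [[26.0] * 4 for _ in range(4)]

# Hypothetical reading at one time step: one area warms up as a person passes.
frame = [row[:] for row in reference]
frame[1][2] = 28.5

processed = preprocess_frame(frame, reference)

for row in processed:
    print(row)
```

A sequence of such 4×4 frames over time is the kind of input that has both image-like and time-series structure.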
By utilizing the proposed devices and models, it is expected that various metro services, such as real-time monitoring of public transport facilities and congestion-based emergency response services, can be provided without legal issues concerning personal information. However, the data were collected from only one side of the entrances, and data collected over a short period of time were used for the prediction. The study is therefore limited in that its applicability to other environments still needs to be verified. In the future, the proposed model is expected to become more reliable if sufficient experimental data are collected in various environments or if the training data are extended with measurements from other sensors.
Soybeans (Glycine max), one of the major upland crops, require precise management of environmental conditions such as temperature, water, and soil during cultivation, since they are sensitive to environmental changes. Spectral technologies that remotely measure the physiological state of crops have great potential for improving soybean quality and productivity by estimating yields, physiological stresses, and diseases. In this study, we developed and validated a soybean growth prediction model using multispectral imagery. We conducted a linear regression analysis between vegetation indices and soybean growth data (fresh weight and LAI) obtained at Miryang fields. The linear regression model was validated at Goesan fields. The model based on the green ratio vegetation index (GRVI) performed best in predicting fresh weight at the calibration stage (R²=0.74, RMSE=246 g/m², RE=34.2%). In the validation stage, the RMSE and RE of the model were 392 g/m² and 32%, respectively. The errors of the model differed by cropping system. For example, the RMSE and RE of the model in single crop fields were 315 g/m² and 26%, respectively, whereas the model had greater RMSE (381 g/m²) and RE (31%) in double crop fields. When separate fresh weight prediction models were developed for the two years with similar accumulated temperature (AT) among the three years (2018 and 2020) and for the single year with a different AT (2019), the single-year model performed better than the two-year model. Consequently, comparing the models divided by AT with the three-year model, the RMSE in single crop fields improved by about 29.1%. However, the corresponding figure for double crop fields decreased by about 19.6%. When environmental factors are used along with spectral data, reliable soybean growth prediction can be achieved under various environmental conditions.
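The calibration and error metrics mentioned above can be sketched as a simple least-squares fit followed by RMSE and RE. The abstract does not define RE; this sketch assumes it is RMSE divided by the mean observed value, expressed in percent. The index values and fresh weights below are invented for illustration and are not data from the study.

```python
import math


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def rmse(observed, predicted):
    """Root mean square error, in the units of the observations."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def relative_error(observed, predicted):
    """Assumed definition: RMSE as a percentage of the mean observed value."""
    return 100.0 * rmse(observed, predicted) / (sum(observed) / len(observed))


# Hypothetical vegetation index values and fresh weights (g/m^2).
index_values = [1.0, 2.0, 3.0, 4.0]
fresh_weight = [200.0, 420.0, 580.0, 800.0]

slope, intercept = fit_line(index_values, fresh_weight)
predicted = [slope * v + intercept for v in index_values]
error = rmse(fresh_weight, predicted)
re_percent = relative_error(fresh_weight, predicted)

print(slope, intercept, error, re_percent)
```

In a validation setting, the fitted slope and intercept would be applied to index values from a different site, and the same two metrics would be computed against that site's observations.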
Document classification based on emotional polarity has become a welcome emerging task owing to the explosion of data on the Web. In the big data age, there are too many information sources to refer to when making decisions. For example, when considering travel to a city, a person may look up reviews through a search engine such as Google or through social networking services (SNSs) such as blogs, Twitter, and Facebook. The emotional polarity of positive and negative reviews helps a user decide whether or not to make the trip. Sentiment analysis of customer reviews has become an important research topic as data mining technology has become widely accepted for text mining of the Web. Sentiment analysis has been used to classify documents through machine learning techniques such as decision trees, neural networks, and support vector machines (SVMs). It is used to determine the attitude, position, and sensibility of people who write articles about various topics published on the Web. Regardless of their polarity, emotional reviews are very helpful material for analyzing customers' opinions. Sentiment analysis helps in quickly understanding what customers really want through automated text mining techniques. It applies text mining techniques to text on the Web to extract the subjective information it contains. In this study, we developed a model that selects hot topics from user posts on China's online stock forum by using the k-means algorithm and the self-organizing map (SOM). In addition, we developed a detection model to predict hot topics by using machine learning techniques such as logit, decision trees, and SVM.
We employed sentiment analysis to develop our model for the selection and detection of hot topics from China's online stock forum. The sentiment analysis calculates a sentiment value for a document based on contrast and classification according to a polarity sentiment dictionary (positive or negative). The online stock forum was an attractive site because of its information about stock investment. Users post numerous texts about stock movements, analyzing the market according to government policy announcements, market reports, reports from economic research institutes, and even rumors. We divided the online forum's topics into 21 categories for sentiment analysis. One hundred forty-four topics were selected from the 21 categories of the online stock forums. The posts were crawled to build a positive and negative text database. After preprocessing the text from March 2013 to February 2015, we ultimately obtained 21,141 posts on 88 topics. An interest index was defined to select the hot topics, and the k-means algorithm and SOM produced equivalent results on this data. We developed decision tree models to detect hot topics with three algorithms: CHAID, CART, and C4.5. The results of CHAID were subpar compared to the others. We also employed SVM to detect the hot topics from negative data. The SVM models were trained with the radial basis function (RBF) kernel, with parameters selected by a grid search. The detection of hot topics by using sentiment analysis provides investors with the latest trends and hot topics in the stock forum so that they no longer need to search the vast amounts of information on the Web. Our proposed model also helps to rapidly determine customers' signals or attitudes towards government policy and firms' products and services.
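The dictionary-based scoring described above can be illustrated with a very small sketch. The abstract does not give the scoring formula or the dictionary, so everything here is an assumption: the word lists are hypothetical English stand-ins, and the score is taken to be the difference between positive and negative hits divided by their total.

```python
# Hypothetical polarity dictionary; the study's actual dictionary is not given.
POSITIVE = {"gain", "rise", "strong"}
NEGATIVE = {"loss", "fall", "weak"}


def sentiment_value(text):
    """Score in [-1, 1]: +1 if only positive words occur, -1 if only negative ones."""
    tokens = text.lower().split()
    positive_hits = sum(token in POSITIVE for token in tokens)
    negative_hits = sum(token in NEGATIVE for token in tokens)
    total = positive_hits + negative_hits
    if total == 0:
        return 0.0  # no polarity words found, treat the post as neutral
    return (positive_hits - negative_hits) / total


# Hypothetical forum posts.
posts = [
    "strong earnings drive a rise",
    "shares fall after weak report and loss",
    "rise then fall",
    "market opens",
]

values = [sentiment_value(post) for post in posts]
print(values)
```

Per-post scores of this kind could then be aggregated by topic before any clustering or classification step.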
Field experiments were conducted in 101 tobacco fields (51 fields in 1985 and 50 fields in 1986) in the chief tobacco-producing counties of Chungbuk province (Jincheon, Eumseong, Goesan, and Joongweon counties), Chungnam province (Cheonweon county), and Kyongbuk province (Cheongdo, Seongju, and Andong counties) for two years from 1985 to 1986 in order to evaluate soil fertility using chemical properties and a soil map database. Pot experiments were also conducted on the same soils, and the results were compared with those of the field experiments. The yield of tobacco in the unfertilized plots was taken as the basic factor representing soil fertility and was evaluated by nineteen independent variables, namely 9 chemical properties and 10 soil map database items. These independent variables were classified into two groups, 11 quantitative indexes and 9 qualitative indexes, and were analyzed by multiple linear regression (MLR) using the REG and GLM procedures of SAS. The yield of tobacco in the unfertilized plots varied widely; e.g., the maximum yield was about 5.0-5.5 times the minimum in the pot experiment and 8.2-14.9 times in the field experiment. Soil chemical indexes closely linked to tobacco yield were selected, but they did not match well across years or between the pot and field experiments. Also, the standardized partial regression coefficients of the quantitative indexes for field yield were less than 1.0, suggesting that it is difficult to develop a usable single index for the evaluation of soil fertility. Evaluation of field soil fertility by MLR was better than that by simple regression, and it gradually improved as chemical properties, quantitative indexes, and qualitative indexes of the soil map were added. For example, the coefficient of determination (
This study aimed to investigate the health seeking behaviors of patients. To analyze the research theme, we divided the study into two phases. First, the types of patients' health seeking behavior were categorized according to which medical care resources were utilized in the patients' coping process. Second, from the patients' first to third visits to medical resources, we analyzed variations in the factors noted as crucial elements constituting the patients' sickness career. To grasp generalized characteristics from complicated empirical data, we limited the scope of our analysis to the third stage of health seeking. A total of 121 persons who had been suffering from chronic diseases for more than 3 months were sampled among the residents of Banwol-Eup, the target area of the Korea University Health Project. The findings are as follows: 1) In the course of visiting medical care resources, 34 different types of health seeking behavior were found. From this result we inferred that patients in Banwol-Eup did not have any stable norms for coping with their pain. Clinics, hospitals, pharmacies, herb doctors, and folkways (self-treatment) were accessed by patients, in that order. However, more than half of the patients who had utilized clinics or hospitals from their first to third visits changed to other medical care resources, for example herb doctors or folkways, which had fundamentally different treatment models. From these two facts, the diversified types and capricious patterns in the health seeking behavior of Banwol patients, we observed a typical shopping-around phenomenon. 2) The factors that influenced patients' sickness careers changed over the course of health seeking, from the first to third visits, as follows:
Agriculture is a primary industry that is influenced by weather and meteorological factors more than other industries. Global warming, worldwide climate change, and unusual weather phenomena can be devastating for the agricultural industry and human life. Therefore, many previous studies have sought to find the relationship between weather and agricultural productivity. Meteorological factors also influence the distribution of agricultural products. For example, the price of an agricultural product is determined in the market and is also influenced by the weather at the market. However, only a few studies have examined this link. The objective of this study is to investigate the effects of meteorological factors on the distribution of agricultural products, focusing on Chinese cabbages. Chinese cabbage is the main ingredient of Kimchi and an essential vegetable on the Korean dinner table. However, the production of Chinese cabbages is influenced by weather and fluctuates greatly, so its price is unstable. As a result, both consumers and farmers are uneasy about the unstable price of Chinese cabbages. In this study, we analyze real transaction data for Chinese cabbage in wholesale markets together with meteorological factors, by variety and geography. We collect and analyze data on meteorological factors such as temperature, humidity, cloudiness, rainfall, snowfall, wind speed, insolation, and sunshine duration in the producing and consuming regions of Chinese cabbages. The results show that meteorological factors such as temperature and humidity significantly influence the volume and price of Chinese cabbage transactions in the wholesale market. In particular, the weather of the consuming region is more strongly correlated with transactions than that of the producing region for all types of Chinese cabbages.
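The kind of correlation between a weather variable and wholesale transactions discussed above can be illustrated with a Pearson coefficient. The abstract does not state which correlation measure was used, so Pearson is an assumption here, and the temperature and volume figures are hypothetical, not data from the study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical daily temperature (degrees C) in a consuming region
# and wholesale transaction volume (tons) on the same days.
temperature = [10.0, 15.0, 20.0, 25.0, 30.0]
volume = [50.0, 45.0, 42.0, 38.0, 30.0]

r = pearson_r(temperature, volume)
print(round(r, 3))
```

Computing the same coefficient separately for producing-region and consuming-region weather would allow the two to be compared.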
Across the whole agricultural lifecycle of Chinese cabbages, 'seeding - harvest - shipment - wholesale', meteorological factors such as temperature and rainfall in the shipment and wholesale periods are significantly correlated with the transaction volume and price of the crop. Based on the results of the correlation analysis, we conduct a regression analysis to verify the effects of meteorological factors on the volume and price of Chinese cabbage transactions in the wholesale market. The results of the stepwise regression analysis are shown in