• Title/Summary/Keyword: business process performance

The Effect of Data Size on the k-NN Predictability: Application to Samsung Electronics Stock Market Prediction (데이터 크기에 따른 k-NN의 예측력 연구: 삼성전자주가를 사례로)

  • Chun, Se-Hak
• Journal of Intelligence and Information Systems, v.25 no.3, pp.239-251, 2019
  • Statistical methods such as moving averages, Kalman filtering, exponential smoothing, regression analysis, and ARIMA (autoregressive integrated moving average) have been used for stock market prediction. However, these statistical methods have not produced superior performance. In recent years, machine learning techniques such as artificial neural networks, SVM, and genetic algorithms have been widely used for stock market prediction. In particular, a case-based reasoning method known as k-nearest neighbor (k-NN) is also widely used for stock price prediction. Case-based reasoning retrieves several similar cases from previous cases when a new problem occurs and combines the class labels of the similar cases to create a classification for the new problem. However, case-based reasoning has some problems. First, it tends to search for a fixed number of neighbors in the observation space and always selects the same number of neighbors rather than the best similar neighbors for the target case, so it may have to take more cases into account even when fewer cases are applicable to the subject. Second, it may select neighbors that are far away from the target case. Thus, case-based reasoning does not guarantee an optimal pseudo-neighborhood for various target cases, and predictability can be degraded by deviation from the desired similar neighbors. This paper examines how the size of the learning data affects stock price predictability with k-NN and compares the predictability of k-NN with that of the random walk model according to the size of the learning data and the number of neighbors. In this study, Samsung Electronics stock prices were predicted using two learning datasets. For the prediction of the next day's closing price, we used four variables: the opening value, daily high, daily low, and daily close. In the first experiment, data from January 1, 2000 to December 31, 2017 were used for the learning process; in the second experiment, data from January 1, 2015 to December 31, 2017 were used. The test data for both experiments run from January 1, 2018 to August 31, 2018. We compared the performance of k-NN with the random walk model on the two learning datasets. In the second experiment, with the smaller learning dataset, the mean absolute percentage error (MAPE) was 1.3497 for the random walk model and 1.3570 for k-NN. In the first experiment, with the larger learning dataset, the MAPE was 1.3497 for the random walk model and 1.2928 for k-NN. These results show that predictive power is higher when more learning data are used. This paper also shows that k-NN generally produces better predictive power than the random walk model for larger learning datasets but does not when the learning dataset is relatively small. Future studies need to consider macroeconomic variables related to stock price forecasting in addition to the opening, low, high, and closing prices. Also, to produce better results, it is recommended that k-NN find nearest neighbors using a second-step filtering method that considers fundamental economic variables, as well as a sufficient amount of learning data.
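
The experimental comparison above is easy to reproduce in outline. Below is a minimal sketch, assuming a pandas DataFrame of daily prices with hypothetical column names open/high/low/close; the paper's fixed date ranges are replaced by a simple chronological split.

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import KNeighborsRegressor

def mape(actual, pred):
    """Mean absolute percentage error, in percent."""
    return np.mean(np.abs((actual - pred) / actual)) * 100

def knn_vs_random_walk(prices: pd.DataFrame, k: int = 5):
    # Features: today's open/high/low/close; target: the next day's close.
    X = prices[["open", "high", "low", "close"]].to_numpy()[:-1]
    y = prices["close"].to_numpy()[1:]

    # Chronological split: train on the earlier history, test on the rest.
    split = int(len(X) * 0.9)
    knn = KNeighborsRegressor(n_neighbors=k).fit(X[:split], y[:split])

    knn_mape = mape(y[split:], knn.predict(X[split:]))
    rw_mape = mape(y[split:], X[split:, 3])  # random walk: tomorrow = today
    return knn_mape, rw_mape
```

Varying the start of the training window while holding the test period fixed reproduces the paper's small-versus-large learning data comparison.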

Simultaneous Optimization of a KNN Ensemble Model for Bankruptcy Prediction (부도예측을 위한 KNN 앙상블 모형의 동시 최적화)

  • Min, Sung-Hwan
• Journal of Intelligence and Information Systems, v.22 no.1, pp.139-157, 2016
  • Bankruptcy involves considerable costs, so it can have significant effects on a country's economy, which makes bankruptcy prediction an important issue. Over the past several decades, many researchers have addressed topics associated with bankruptcy prediction. Early research employed conventional statistical methods such as univariate analysis, discriminant analysis, multiple regression, and logistic regression. Later, many studies began utilizing artificial intelligence techniques such as inductive learning, neural networks, and case-based reasoning. Currently, ensemble models are being utilized to enhance the accuracy of bankruptcy prediction. Ensemble classification involves combining multiple classifiers to obtain more accurate predictions than those of individual models, and ensemble learning techniques are known to be very useful for improving the generalization ability of a classifier. The base classifiers in an ensemble must be as accurate and diverse as possible in order to enhance the ensemble's generalization ability. Commonly used methods for constructing ensemble classifiers include bagging, boosting, and the random subspace method. The random subspace method selects a random feature subset for each classifier from the original feature space to diversify the base classifiers: each ensemble member is trained on a randomly chosen feature subspace of the original feature set, and the predictions of the members are combined by an aggregation method. The k-nearest neighbors (KNN) classifier is robust to variations in the dataset but very sensitive to changes in the feature space, which makes KNN a good base classifier for the random subspace method. The KNN random subspace ensemble model has been shown to be very effective at improving on an individual KNN model. The k parameter of the KNN base classifiers and the feature subsets selected for them play an important role in determining the performance of the KNN ensemble model; however, few studies have focused on optimizing the k parameter and feature subsets of the base classifiers. This study proposed a new ensemble method that improves upon the performance of the KNN ensemble model by optimizing both the k parameters and the feature subsets of the base classifiers, using a genetic algorithm to optimize the ensemble and improve its prediction accuracy. The proposed model was applied to a bankruptcy prediction problem using a real dataset from Korean companies. The research data included 1,800 externally non-audited firms that filed for bankruptcy (900 cases) or non-bankruptcy (900 cases). Initially, the dataset consisted of 134 financial ratios. Prior to the experiments, 75 financial ratios were selected based on an independent-sample t-test of each financial ratio as an input variable against bankruptcy or non-bankruptcy as the output variable. Of these, 24 financial ratios were selected using a logistic regression backward feature selection method. The complete dataset was separated into two parts, training and validation. The training dataset was further divided into two portions: one for training the model and the other for guarding against overfitting; prediction accuracy on the latter portion was used as the fitness value in order to avoid overfitting. The validation dataset was used to evaluate the effectiveness of the final model.
A 10-fold cross-validation was implemented to compare the performances of the proposed model and other models. To evaluate the effectiveness of the proposed model, the classification accuracy of the proposed model was compared with that of other models. The Q-statistic values and average classification accuracies of base classifiers were investigated. The experimental results showed that the proposed model outperformed other models, such as the single model and random subspace ensemble model.
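
As a concrete illustration of the random subspace idea, here is a minimal sketch of a KNN subspace ensemble, assuming numpy arrays X (financial ratios) and y (0/1 bankruptcy labels). The paper optimizes each member's k and feature subset jointly with a genetic algorithm; in this sketch they are simply drawn at random.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

class RandomSubspaceKNN:
    def __init__(self, n_members=30, subspace_size=8,
                 k_choices=(1, 3, 5, 7), seed=0):
        self.n_members = n_members
        self.subspace_size = subspace_size
        self.k_choices = k_choices
        self.rng = np.random.default_rng(seed)

    def fit(self, X, y):
        self.members_ = []
        for _ in range(self.n_members):
            # Each member sees a random feature subset and a random k.
            feats = self.rng.choice(X.shape[1], self.subspace_size,
                                    replace=False)
            k = int(self.rng.choice(self.k_choices))
            clf = KNeighborsClassifier(n_neighbors=k).fit(X[:, feats], y)
            self.members_.append((feats, clf))
        return self

    def predict(self, X):
        # Majority vote over members (assumes 0/1 class labels).
        votes = np.array([clf.predict(X[:, feats])
                          for feats, clf in self.members_])
        return (votes.mean(axis=0) >= 0.5).astype(int)
```

A GA wrapper would replace the random draws with chromosome-encoded k values and feature masks, using accuracy on a held-out portion of the training data as the fitness value, as described in the abstract.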

SysML-Based System Modeling for Design of BIPV Electric Power Generation (건물일체형 태양광 시스템의 전력발전부 설계를 위한 SysML기반 시스템 모델링)

  • Lee, Seung-Joon;Lee, Jae-Chon
• Journal of the Korea Academia-Industrial cooperation Society, v.19 no.10, pp.578-589, 2018
  • A building-integrated photovoltaic (BIPV) system is a typical integrated system that simultaneously performs both a building function and a solar power generation function. To maximize its potential advantage, however, the photovoltaic power generation function must be integrated from the early conceptual design stage, and the design must target maximum power generation. To meet these requirements, preliminary research on a BIPV design process based on architectural design models, along with computer simulation results for improving solar power generation performance, has been published. However, the requirements of the BIPV system have not been clearly identified and systematically reflected in the subsequent design, and no model has verified the power generation design. To solve these problems, we systematically model the requirements of a BIPV system and study the power generation design based on the system requirements model. Throughout the study, we consistently use the standard system modeling language SysML. Specifically, stakeholder requirements were first identified from the stakeholders and the related BIPV standards. Then, based on the domain model, the design requirements of the BIPV system were derived at the system level, and the functional and physical architectures of the target system were created from the system requirements. Finally, the power generation performance of the BIPV system was evaluated through simulation of the SysML model (parametric diagram). If the SysML system model developed herein is reinforced by reflecting the conditions resulting from the building design, it will open an opportunity to study and optimize power generation in BIPV systems in an integrated fashion.
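
For readers unfamiliar with parametric diagrams, the kind of constraint they encode for power generation can be illustrated with a generic PV output equation; the formula and values below are textbook assumptions, not the paper's model.

```python
def pv_power_kw(irradiance_kw_m2: float, area_m2: float,
                module_efficiency: float, performance_ratio: float) -> float:
    """Instantaneous array output: P = G * A * eta * PR (kW)."""
    return irradiance_kw_m2 * area_m2 * module_efficiency * performance_ratio

# e.g. 1.0 kW/m^2 on 120 m^2 of facade modules at 18% efficiency, PR 0.8:
print(pv_power_kw(1.0, 120.0, 0.18, 0.8))  # -> 17.28 kW
```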

An Intelligent Intrusion Detection Model Based on Support Vector Machines and the Classification Threshold Optimization for Considering the Asymmetric Error Cost (비대칭 오류비용을 고려한 분류기준값 최적화와 SVM에 기반한 지능형 침입탐지모형)

  • Lee, Hyeon-Uk;Ahn, Hyun-Chul
• Journal of Intelligence and Information Systems, v.17 no.4, pp.157-173, 2011
  • As Internet use has exploded in recent years, malicious attacks and hacking against networked systems occur frequently, and such intrusions can cause fatal damage to government agencies, public offices, and companies operating various systems. For these reasons, there is growing interest in and demand for intrusion detection systems (IDS): security systems for detecting, identifying, and responding appropriately to unauthorized or abnormal activities. The intrusion detection models applied in conventional IDS are generally designed by modeling experts' implicit knowledge of network intrusions or hackers' abnormal behaviors. These models perform well under normal conditions but show poor performance against new or unknown patterns of network attack. For this reason, several recent studies have tried to adopt artificial intelligence techniques that can respond proactively to unknown threats. Artificial neural networks (ANNs) in particular have been popular in prior studies because of their superior prediction accuracy. However, ANNs have intrinsic limitations such as the risk of overfitting, the requirement of a large sample size, and the opacity of the prediction process (the black-box problem). As a result, the most recent studies on IDS have started to adopt the support vector machine (SVM), a classification technique that is more stable and powerful than ANNs and is known for relatively high predictive power and generalization capability. Against this background, this study proposes a novel intelligent intrusion detection model that uses SVM as the classification model in order to improve the predictive ability of IDS. Our model is also designed to account for asymmetric error costs by optimizing the classification threshold. Generally, there are two common forms of error in intrusion detection. The first is the false-positive error (FPE), in which a wrong judgment may result in unnecessary remediation. The second is the false-negative error (FNE), which mainly misjudges malicious activity as normal. Compared to FPE, FNE is more fatal; thus, when considering the total cost of misclassification in IDS, it is more reasonable to assign heavier weights to FNE than to FPE. Therefore, we designed our intrusion detection model to optimize the classification threshold so as to minimize the total misclassification cost. A conventional SVM cannot be applied directly here because it is designed to generate a discrete output (i.e., a class). To resolve this problem, we used the revised SVM technique proposed by Platt (2000), which is able to generate probability estimates. To validate the practical applicability of our model, we applied it to a real-world dataset for network intrusion detection. The experimental dataset was collected from the IDS sensor of an official institution in Korea from January to June 2010. We collected 15,000 log records in total and selected 1,000 samples from them by random sampling. In addition, the SVM model was compared with logistic regression (LOGIT), decision trees (DT), and an ANN to confirm the superiority of the proposed model. LOGIT and DT were run using PASW Statistics v18.0, and the ANN using Neuroshell 4.0. For SVM, LIBSVM v2.90, a freeware package for training SVM classifiers, was used.
Empirical results showed that our proposed model based on SVM outperformed all the other comparative models in detecting network intrusions from the accuracy perspective. They also showed that our model reduced the total misclassification cost compared to the ANN-based intrusion detection model. As a result, it is expected that the intrusion detection model proposed in this paper would not only enhance the performance of IDS, but also lead to better management of FNE.
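
The threshold-optimization step can be sketched as follows, assuming training and validation arrays with label 1 = intrusion and illustrative cost weights (the true FNE/FPE cost ratio is domain-specific). scikit-learn's probability=True option provides Platt-style probability estimates in place of the SVM's discrete output.

```python
import numpy as np
from sklearn.svm import SVC

def fit_with_asymmetric_threshold(X_train, y_train, X_val, y_val,
                                  fn_cost=5.0, fp_cost=1.0):
    """Train an SVM, then pick the probability threshold that minimizes
    the asymmetric misclassification cost on a validation set."""
    svm = SVC(kernel="rbf", probability=True).fit(X_train, y_train)
    p = svm.predict_proba(X_val)[:, 1]  # estimated P(intrusion)

    best_t, best_cost = 0.5, np.inf
    for t in np.linspace(0.05, 0.95, 19):
        pred = (p >= t).astype(int)
        fn = np.sum((pred == 0) & (y_val == 1))  # missed intrusions (FNE)
        fp = np.sum((pred == 1) & (y_val == 0))  # false alarms (FPE)
        cost = fn_cost * fn + fp_cost * fp       # weighted total cost
        if cost < best_cost:
            best_t, best_cost = t, cost
    return svm, best_t
```

Because FNE carries the heavier weight, the selected threshold typically falls below 0.5, flagging borderline activity as intrusion rather than letting it pass.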

Value of Information Technology Outsourcing: An Empirical Analysis of Korean Industries (IT 아웃소싱의 가치에 관한 연구: 한국 산업에 대한 실증분석)

  • Han, Kun-Soo;Lee, Kang-Bae
• Asia Pacific Journal of Information Systems, v.20 no.3, pp.115-137, 2010
  • Information technology (IT) outsourcing, the use of a third-party vendor to provide IT services, started in the late 1980s and early 1990s in Korea and has increased rapidly since 2000. Recently, firms have increased their efforts to capture greater value from IT outsourcing. To date, there have been a large number of studies on IT outsourcing, but most have focused on outsourcing practices and decisions, and little attention has been paid to objectively measuring the value of IT outsourcing. In addition, studies that examined the performance of IT outsourcing have mainly relied on anecdotal evidence or practitioners' perceptions. Our study examines the contribution of IT outsourcing to economic growth in Korean industries over the 1990 to 2007 period, using a production function framework and a panel data set for 54 industries constructed from input-output tables, fixed-capital formation tables, and employment tables. Based on the framework and estimation procedures that Han, Kauffman and Nault (2010) used to examine the economic impact of IT outsourcing in U.S. industries, we evaluate the impact of IT outsourcing on output and productivity in Korean industries. Because IT outsourcing started to grow at a significantly more rapid pace in 2000, we compare its impact in the pre- and post-2000 periods. Our industry-level panel data cover a large proportion of the Korean economy (54 out of 58 industries), giving us greater opportunity to assess the impact of IT outsourcing on objective performance measures such as output and productivity. Using IT outsourcing and IT capital as our primary independent variables, we employ an extended Cobb-Douglas production function in which both variables are treated as factor inputs. We also derive and estimate a labor productivity equation to assess the impact of our IT variables on labor productivity. We use data from the seven years (1990, 1993, 2000, 2003, 2005, 2006, and 2007) for which both input-output tables and fixed-capital formation tables are available; combining the two sets of tables resulted in 54 industries. IT outsourcing is measured as the value of computer-related services purchased by each industry in a given year. All variables have been converted to 2000 Korean won using GDP deflators. To calculate labor hours, we use the average work hours for each sector provided by the OECD. To control effectively for the heteroskedasticity and autocorrelation present in our dataset, we use feasible generalized least squares (FGLS) procedures. Because the AR(1) process may be industry-specific (i.e., panel-specific), we consider both common AR(1) and panel-specific AR(1) (PSAR1) processes in our estimations. We also include year dummies to control for year-specific effects common across industries, and sector dummies (as defined in the GDP deflator) to control for time-invariant sector-specific effects. Based on the full sample of 378 observations, we find that a 1% increase in IT outsourcing is associated with a 0.012~0.014% increase in gross output, and a 1% increase in IT capital is associated with a 0.024~0.027% increase in gross output. To compare the contribution of IT outsourcing relative to that of IT capital, we examined the gross marginal product (GMP). The average GMP of IT outsourcing was 6.423, substantially greater than that of IT capital at 2.093.
This indicates that, on average, if an industry invests KRW 1 million in IT outsourcing, it can increase its output by KRW 6.4 million. In terms of the contribution to labor productivity, we find that a 1% increase in IT outsourcing is associated with a 0.009~0.01% increase in labor productivity, while a 1% increase in IT capital is associated with a 0.024~0.025% increase. Overall, our results indicate that IT outsourcing has made positive and economically meaningful contributions to output and productivity in Korean industries over the 1990 to 2007 period. The average GMP of IT outsourcing we report for Korean industries is 1.44 times greater than that reported for U.S. industries in Han et al. (2010). Further, we find that the contribution of IT outsourcing was significantly greater in the 2000~2007 period, during which the growth of IT outsourcing accelerated. Our study provides implications for policymakers and managers. First, our results suggest that Korean industries can capture further benefits by increasing investments in IT outsourcing. Second, our analyses and results provide a basis for managers to assess the impact of investments in IT outsourcing and IT capital in an objective and quantitative manner. Building on our study, future research should examine the impact of IT outsourcing at a more detailed industry level and at the firm level.
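
The specification implied by the abstract can be written out as follows; this is a hedged reconstruction (variable grouping and dummy structure assumed), not the paper's exact equation.

```latex
% Extended Cobb-Douglas in log form, industry i in year t:
%   Y: gross output, K: non-IT capital, L: labor hours,
%   ITK: IT capital, ITO: IT outsourcing,
%   gamma_t: year dummies, delta_s: sector dummies.
\ln Y_{it} = \alpha + \beta_1 \ln K_{it} + \beta_2 \ln L_{it}
           + \beta_3 \ln ITK_{it} + \beta_4 \ln ITO_{it}
           + \gamma_t + \delta_s + \varepsilon_{it}
% The reported output elasticities are beta_4 = 0.012--0.014 (IT outsourcing)
% and beta_3 = 0.024--0.027 (IT capital). The gross marginal product is the
% elasticity scaled by the output-to-input ratio, e.g.
% GMP_{ITO} = beta_4 \cdot (Y / ITO),
% which is how a small elasticity can yield a GMP of 6.423.
```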

Position and function of dance education in arts and cultural education (문화예술교육에서 무용교육의 위치와 기능)

  • Hwang, Jeong-ok
• (The) Research of the Performance Art and Culture, no.36, pp.531-551, 2018
  • At a time when the ethical tasks of life are in question, the educational trait that arts and cultural education and dance education strive for is the experience of insight into life, and an awareness of the times, entrusted with the depth of artistic and aesthetic experience, must be given substance through policy and systems. In the policy territory, broad perception and strategy are combined and practiced to produce new meaning. Therefore, on the basis of the characteristics and spectrum revealed as arts and cultural education and dance education expand broadly, this study examined the role of dance education within arts and cultural education, with the following results. The value that culture and arts education and dance education strive for is to structure the form of life through artistic experience, with art as the ultimate description of life. This is because the artistic trait structured by self-understanding and self-expression contains a directivity of life that is recorded and depicted in the process of living. Within the school system, dance education in culture and arts education views the world through the structure of dance as a comprehensive study, as other subjects and art genres do, under the prevailing awareness of the times and the categories of the education system; outside school, it engages diverse social issues connected with the frame of social growth and advancement. Looking at the practical characteristics (methods) of dance in arts and cultural education projects, practice strategies are facilitated through dance, in dance, about dance, and between dance, together with artists. Here, the approachability of dance is deployed in programs based on diverse forms of artistry, including technique, expression, understanding, and symbolism, and draws on participants' enjoyment and preferences. In the policy projects of culture and arts education, dance education functions as an educational project offering an alternative model to the education system, and at times also functions as a means of social improvement and development, promoting community awareness and cultural transformation through involvement in and intervention into social issues.

Information Privacy Concern in Context-Aware Personalized Services: Results of a Delphi Study

  • Lee, Yon-Nim;Kwon, Oh-Byung
• Asia Pacific Journal of Information Systems, v.20 no.2, pp.63-86, 2010
  • Personalized services directly and indirectly acquire personal data, in part, to provide customers with higher-value services that are specifically context-relevant (such as place and time). Information technologies continue to mature and develop, providing greatly improved performance. Sensor networks and intelligent software can now obtain context data, and that is the cornerstone for providing personalized, context-specific services. Yet the danger of overflowing personal information is increasing, because the data retrieved by the sensors usually contain private information. Various technical characteristics of context-aware applications have troubling implications for information privacy. In parallel with the increasing use of context for service personalization, information privacy concerns, such as over the unrestricted availability of context information, have also increased. Those privacy concerns are consistently regarded as a critical issue facing the success of context-aware personalized services. The entire field of information privacy is growing as an important area of research, with many new definitions and terminologies, because of the need for a better understanding of information privacy concepts. In particular, the factors of information privacy need to be revised according to the characteristics of new technologies. However, previous information privacy factors for context-aware applications have at least two shortcomings. First, there has been little overview of the technological characteristics of context-aware computing: existing studies have focused on only a small subset of these characteristics, so there has not been a mutually exclusive set of factors that uniquely and completely describes information privacy in context-aware applications. Second, user surveys have been widely used to identify factors of information privacy in most studies, despite the limitations of users' knowledge of and experience with context-aware computing technology. To date, since context-aware services have not yet been widely deployed on a commercial scale, only very few people have prior experience with context-aware personalized services, and it is difficult to build users' knowledge about context-aware technology even by increasing their understanding in various ways: scenarios, pictures, flash animations, etc. A survey that assumes the participants have sufficient experience with or understanding of the technologies it presents may therefore not be valid. Moreover, some surveys are based on simplifying and hence unrealistic assumptions (e.g., they consider only location information as context data). A better understanding of information privacy concern in context-aware personalized services is thus highly needed. Hence, the purpose of this paper is to identify a generic set of factors for elemental information privacy concern in context-aware personalized services and to develop a rank-ordered list of information privacy concern factors. We consider overall technology characteristics to establish a mutually exclusive set of factors. A Delphi survey, a rigorous data collection method, was deployed to obtain reliable opinions from experts and to produce a rank-ordered list; it therefore lends itself well to obtaining a set of universal factors of information privacy concern and their priority.
An international panel of researchers and practitioners with expertise in privacy and context-aware systems was involved in our research. The Delphi rounds faithfully followed the procedure for a Delphi study proposed by Okoli and Pawlowski, involving three general rounds: (1) brainstorming for important factors; (2) narrowing down the original list to the most important ones; and (3) ranking the list of important factors. For the first round only, experts were treated as individuals, not panels. Adapting Okoli and Pawlowski, we outlined the process of administering the study and performed three rounds. In the first and second rounds of the Delphi questionnaire, we gathered a set of mutually exclusive factors for information privacy concern in context-aware personalized services. In the first round, the respondents were asked to provide at least five main factors for the most appropriate understanding of information privacy concern; to assist them, some of the main factors found in the literature were presented. The second round of the questionnaire discussed the main factors provided in the first round, fleshed out with relevant sub-factors found in the literature survey. Respondents were then requested to evaluate each sub-factor's suitability against the corresponding main factor to determine the final sub-factors from the candidates; the final factors were those selected by over 50% of the experts. In the third round, a list of factors with corresponding questions was provided, and the respondents were requested to assess the importance of each main factor and its corresponding sub-factors. Finally, we calculated the mean rank of each item to produce the final result. While analyzing the data, we focused on group consensus rather than individual insistence; to do so, a concordance analysis, which measures the consistency of the experts' responses over successive rounds of the Delphi, was adopted during the survey process. As a result, the experts rated context data collection and the highly identifiable level of identical data as the most important main factor and sub-factor, respectively. Additional important sub-factors included the diverse types of context data collected, tracking and recording functionality, and embedded and disappearing sensor devices. The average score of each factor is very useful for future context-aware personalized service development from the viewpoint of information privacy. The final factors differ from those proposed in other studies in the following ways. First, the concern factors differ from those of existing studies, which are based on privacy issues that may occur during the lifecycle of acquired user information; our study helped to clarify these sometimes vague issues by determining which privacy concern issues are viable based on the specific technical characteristics of context-aware personalized services. Since a context-aware service differs in its technical characteristics from other services, we selected specific characteristics with a higher potential to increase users' privacy concerns. Second, this study considered privacy issues in terms of service delivery and display, which were almost overlooked in existing studies, by introducing IPOS as the factor division. Lastly, for each factor, the level of importance was correlated with professionals' opinions on the extent to which users have privacy concerns.
The traditional questionnaire method was not selected because users were considered to lack understanding of and experience with the new technology underlying context-aware personalized services. Regarding users' privacy concerns, the professionals in the Delphi questionnaire process selected context data collection, tracking and recording, and the sensor network as the most important technological characteristics of context-aware personalized services. For the creation of context-aware personalized services, this study demonstrates the importance and relevance of determining an optimal methodology: which technologies are needed, in what sequence, to acquire which types of users' context information. Most studies focus on which services and systems should be provided and developed by utilizing context information, presupposing the continued development of context-aware technology; the results of this study show, however, that in terms of users' privacy it is necessary to pay greater attention to the activities that acquire context information. Following up on the sub-factor evaluation results, additional studies will be necessary on approaches to reducing users' privacy concerns about technological characteristics such as the highly identifiable level of identical data, the diverse types of context data collected, tracking and recording functionality, and embedded and disappearing sensor devices. The factor ranked next in importance after input was context-aware service delivery, which is related to output. The results show that delivery and display, which present services to users in context-aware personalized services built around the anywhere-anytime-any-device concept, are regarded as even more important than in previous computing environments. Considering these concern factors when developing context-aware personalized services will help increase the service success rate and, hopefully, user acceptance. Our future work will be to adopt these factors for qualifying context-aware service development projects, such as u-city development projects, in terms of service quality and hence user acceptance.
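
The concordance analysis mentioned above is commonly computed as Kendall's coefficient of concordance W; here is a minimal sketch with hypothetical rank data, where W near 1 indicates that the experts' rankings have converged.

```python
import numpy as np

def kendalls_w(ranks: np.ndarray) -> float:
    """Kendall's W for an (m judges x n items) matrix of ranks, no ties.
    W = 12 * S / (m^2 * (n^3 - n)), where S is the sum of squared
    deviations of the column rank sums from their mean."""
    m, n = ranks.shape
    rank_sums = ranks.sum(axis=0)
    s = np.sum((rank_sums - rank_sums.mean()) ** 2)
    return 12.0 * s / (m ** 2 * (n ** 3 - n))

# Example: 3 experts ranking 4 privacy-concern factors (1 = most important).
ranks = np.array([[1, 2, 3, 4],
                  [1, 3, 2, 4],
                  [2, 1, 3, 4]])
print(round(kendalls_w(ranks), 3))  # -> 0.778, fairly strong agreement
```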

Korean Sentence Generation Using Phoneme-Level LSTM Language Model (한국어 음소 단위 LSTM 언어모델을 이용한 문장 생성)

  • Ahn, SungMahn;Chung, Yeojin;Lee, Jaejoon;Yang, Jiheon
• Journal of Intelligence and Information Systems, v.23 no.2, pp.71-88, 2017
  • Language models were originally developed for speech recognition and language processing. Using a set of example sentences, a language model predicts the next word or character based on sequential input data. N-gram models have been widely used, but they cannot model the correlation between input units efficiently, since they are probabilistic models based on the frequency of each unit in the training set. Recently, as deep learning algorithms have developed, recurrent neural network (RNN) models and long short-term memory (LSTM) models have been widely used for neural language models (Ahn, 2016; Kim et al., 2016; Lee et al., 2016). These models can reflect dependencies between the objects that are entered sequentially into the model (Gers and Schmidhuber, 2001; Mikolov et al., 2010; Sundermeyer et al., 2012). In order to train a neural language model, texts need to be decomposed into words or morphemes. However, since a training set of sentences generally includes a huge number of words or morphemes, the dictionary becomes very large, which increases model complexity. In addition, word-level or morpheme-level models can generate only the vocabulary contained in the training set. Furthermore, for highly morphological languages such as Turkish, Hungarian, Russian, Finnish, or Korean, morpheme analyzers are more likely to cause errors in the decomposition process (Lankinen et al., 2016). Therefore, this paper proposes a phoneme-level language model for Korean based on LSTM models. A phoneme, such as a vowel or a consonant, is the smallest unit that composes Korean text. We construct the language model using three or four LSTM layers. Each model was trained using the stochastic gradient algorithm and more advanced optimization algorithms such as Adagrad, RMSprop, Adadelta, Adam, Adamax, and Nadam. A simulation study was done on Old Testament texts using the deep learning package Keras with a Theano backend. After preprocessing the texts, the dataset included 74 unique characters, including vowels, consonants, and punctuation marks. We then constructed input vectors of 20 consecutive characters and outputs consisting of the following (21st) character. In total, 1,023,411 input-output pairs were included in the dataset, which we divided into training, validation, and test sets in a 70:15:15 ratio. All simulations were conducted on a system equipped with an Intel Xeon CPU (16 cores) and an NVIDIA GeForce GTX 1080 GPU. We compared the loss function evaluated on the validation set, the perplexity evaluated on the test set, and the training time of each model. As a result, all the optimization algorithms except the stochastic gradient algorithm showed similar validation loss and perplexity, clearly superior to those of the stochastic gradient algorithm. The stochastic gradient algorithm took the longest training time for both the 3- and 4-layer LSTM models. On average, the 4-LSTM-layer model took 69% longer to train than the 3-LSTM-layer model, yet its validation loss and perplexity were not significantly improved, and even became worse under specific conditions. On the other hand, when comparing the automatically generated sentences, the 4-LSTM-layer model tended to generate sentences closer to natural language than the 3-LSTM model.
Although there were slight differences in the completeness of the generated sentences between the models, sentence generation performance was quite satisfactory under all simulation conditions: the models generated only legitimate Korean letters, and the use of postpositions and the conjugation of verbs were almost grammatically perfect. The results of this study are expected to be widely used for the processing of the Korean language in language processing and speech recognition, which are the basis of artificial intelligence systems.
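
The 3-layer architecture described above can be sketched in a few lines of Keras: 20 one-hot encoded phoneme symbols in, a softmax over the 74-symbol vocabulary out. The layer count, sequence length, and vocabulary size follow the abstract; the hidden size and the choice of Adam here are assumptions (the paper compares several optimizers).

```python
import keras

VOCAB, SEQ_LEN, HIDDEN = 74, 20, 256  # HIDDEN size is an assumed value

model = keras.Sequential([
    keras.Input(shape=(SEQ_LEN, VOCAB)),             # 20 one-hot phonemes in
    keras.layers.LSTM(HIDDEN, return_sequences=True),
    keras.layers.LSTM(HIDDEN, return_sequences=True),
    keras.layers.LSTM(HIDDEN),                       # final state only
    keras.layers.Dense(VOCAB, activation="softmax"), # next-phoneme distribution
])
model.compile(loss="categorical_crossentropy", optimizer="adam")
model.summary()
# Training would use X of shape (N, 20, 74) and one-hot y of shape (N, 74);
# sampling repeatedly from the softmax output generates text phoneme by phoneme.
```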

Development of Yóukè Mining System with Yóukè's Travel Demand and Insight Based on Web Search Traffic Information (웹검색 트래픽 정보를 활용한 유커 인바운드 여행 수요 예측 모형 및 유커마이닝 시스템 개발)

  • Choi, Youji;Park, Do-Hyung
• Journal of Intelligence and Information Systems, v.23 no.3, pp.155-175, 2017
  • As social data come into the spotlight, mainstream web search engines provide data indicating how many people searched for a specific keyword: web search traffic data. Web search traffic is an aggregate over the crowd searching for a specific keyword. In various areas, web search traffic can be used as a useful variable representing the attention of ordinary users to specific interests, and many studies use it to nowcast or forecast social phenomena such as epidemics, consumer patterns, product life cycles, and financial investment models. Web search traffic data have also begun to be applied to predicting inbound tourism. Proper demand prediction is needed because tourism is a high value-added industry that increases employment and foreign exchange earnings. Among inbound tourists, Chinese tourists (Youke) are growing continuously: Youke have been the largest inbound group in Korean tourism for many years, with the highest tourism profit per tourist as well. Research into proper demand prediction approaches for Youke is therefore important in both the public and private sectors, since accurate tourism demand prediction enables efficient decision making under limited resources. This study suggests an improved model that reflects the latest issues of society through the attention of groups of individuals. A trip abroad is generally a high-involvement activity, so potential tourists are likely to search deeply for information about their own trip; web search traffic data present tourists' attention during trip preparation in an instantaneous and dynamic way. This study therefore attempted to select the keywords that potential Chinese tourists are likely to search for on the Internet. Baidu, China's biggest web search engine with a share of over 80%, provides users with access to web search traffic data. Qualitative interviews with potential tourists helped us understand information search behavior before a trip and identify the keywords for this study. The selected web search traffic keywords were categorized into three levels by how directly they relate to "Korean tourism"; classifying the categories helps identify which keywords can explain Youke inbound demand, from the closest category to the farthest. Web search traffic data for each keyword were gathered by a web crawler developed to crawl the Baidu Index. Using the automatically gathered variables, a linear model was designed by multiple regression analysis, which is suitable for operational application in decision and policy making because the effects of the variables are easy to explain. After the regression models were composed, a model using traditional variables was compared, in terms of significance and R-squared, with a model that adds the web search traffic variables to the traditional model, and the final model was composed from this comparison. The final regression model has improved explanatory power and the advantages of real-time immediacy and convenience over the traditional model. Furthermore, this study demonstrates an intuitively visualized system for general use: the Youke Mining solution, which offers several functions for tourism decision making, including the embedded final regression model, algorithms based on data science, and a well-designed, simple interface. In the end, this research suggests significant contributions in three aspects: theoretical, practical, and political.
Theoretically, the Youke Mining system and the model in this research are the first step toward Youke inbound prediction using an interactive and instantaneous variable: web search traffic information, which represents tourists' attention while they prepare their trips. Practically, since Baidu holds more than 80% of the web search engine market, Baidu data can represent the attention of potential tourists preparing their own tours in real time. Finally, politically, the Chinese tourist demand prediction model based on web search traffic can be used in tourism decision making for efficient management of resources and for optimizing the opportunity for successful policy.
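
The model comparison step can be sketched briefly; the sketch below assumes a pandas DataFrame of monthly observations, with hypothetical column names for the dependent variable, the traditional predictors, and the Baidu Index series (the paper's actual variables and keywords differ).

```python
import pandas as pd
import statsmodels.api as sm

def fit_linear(df: pd.DataFrame, features: list):
    """Fit an OLS regression of Youke arrivals on the given features."""
    X = sm.add_constant(df[features])
    return sm.OLS(df["youke_arrivals"], X).fit()

def compare_models(df: pd.DataFrame):
    traditional = ["exchange_rate", "seasonality_index"]      # assumed names
    search = ["baidu_korea_tourism", "baidu_seoul_shopping"]  # assumed names

    base = fit_linear(df, traditional)
    extended = fit_linear(df, traditional + search)

    # Keep the search-traffic terms if they are significant and raise R^2,
    # mirroring the significance / R-squared comparison in the abstract.
    return base.rsquared, extended.rsquared, extended.pvalues[search]
```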

A Study on Relationship of Salesperson's, Relationship Beliefs, Negative Emotion Regulation Strategies, and Prosocial Behavior to Customer (판매원의 관계신념, 부정적 감정 조절전략, 그리고 친소비자행동의 관계에 관한 연구)

  • Kim, Sang-Hee
• Management & Information Systems Review, v.34 no.5, pp.191-212, 2015
  • Unlike existing research on salespersons, this study focuses on salespersons' psychological characteristics as an element affecting their selling behavior, because employees' psychological characteristics are very likely to affect their devotion and commitment to customer relationships and a company's long-term performance. In particular, salespersons are likely to feel fatigue or loss, respond cynically or coldly to customers because of frequent interaction with them, and show emotional indifference in an attempt to keep their distance from customers. But this likelihood can vary depending on a salesperson's own psychological characteristics; in particular, the occurrence of these phenomena is very likely to vary significantly with relationship beliefs in interpersonal relations. In the field of psychology, research is under way on the personal psychological characteristics that improve the quality of interpersonal relations, maximize personal performance, and enhance situational adaptability; personal relationship belief has recently been identified as such a characteristic. For salespersons who interact frequently with customers, relationship beliefs can be a very important element in forming relations with customers. This study therefore aims to determine how salespersons' relationship beliefs affect their negative emotion regulation strategies and prosocial behavior toward customers. As a result, salespersons' relationship beliefs were found to affect their negative emotion regulation strategies and prosocial behavior toward customers, and negative emotion regulation strategies were found to affect prosocial behavior. Salespersons with intimate relationship beliefs tend to use active and support-seeking regulation, while salespersons with controlling relationship beliefs tend to use avoidant/distractive regulation. Intimate relationship beliefs were associated with more prosocial behavior toward customers, and controlling relationship beliefs with less. Active and support-seeking regulation influence prosocial behavior toward customers positively, while avoidant/distractive regulation influences it negatively.
