• Title/Abstract/Keywords: Direction Vector


Sentiment Analysis of Movie Review Using Integrated CNN-LSTM Model (CNN-LSTM 조합모델을 이용한 영화리뷰 감성분석)

  • Park, Ho-yeon; Kim, Kyoung-jae
    • Journal of Intelligence and Information Systems / v.25 no.4 / pp.141-154 / 2019
  • Internet technology and social media are growing rapidly. Data mining technology has evolved to enable unstructured document representations in a variety of applications. Sentiment analysis is an important technology that can distinguish poor from high-quality content using text data about products, and it has proliferated within text mining. Sentiment analysis mainly analyzes people's opinions in text data by assigning predefined categories such as positive and negative. It has been studied in various directions in terms of accuracy, from simple rule-based approaches to dictionary-based approaches using predefined labels. In fact, sentiment analysis is one of the most active research areas in natural language processing and is widely studied in text mining. Because real online reviews are openly available to others, the information is easy to collect, and it also affects business. In marketing, real-world information from customers is gathered on websites rather than through surveys. Whether the posts on a website are positive or negative, the customer response is reflected in sales, so companies try to identify this information. However, reviews on a website are not always good, and they are difficult to identify. Earlier studies in this research area used review data from the Amazon.com shopping mall, whereas recent studies use data on stock market trends, blogs, news articles, weather forecasts, IMDB, Facebook, and so on. A lack of accuracy is nevertheless recognized, because sentiment calculations change according to the subject, paragraph, sentiment lexicon direction, and sentence strength. This study aims to classify polarity in sentiment analysis into positive and negative categories and to increase the prediction accuracy of polarity analysis using the pretrained IMDB review data set. 
First, popular machine learning algorithms for text classification in sentiment analysis, namely NB (naive Bayes), SVM (support vector machines), XGBoost, RF (random forests), and Gradient Boost, are adopted as comparative models. Second, deep learning has demonstrated the ability to extract complex, discriminative features from data. Representative algorithms are CNN (convolutional neural networks), RNN (recurrent neural networks), and LSTM (long short-term memory). CNN can be used similarly to BoW when processing a sentence in vector format, but it does not consider the sequential attributes of the data. RNN handles ordered data well because it takes the time information of the data into account, but it suffers from the long-term dependency problem. LSTM is used to solve this long-term dependency problem. For the comparison, CNN and LSTM were chosen as simple deep learning models. In addition to the classical machine learning algorithms, CNN, LSTM, and the integrated models were analyzed. Although the algorithms have many parameters, we examined the relationship between parameter values and precision to find the optimal combination. We also tried to understand how well, and in what way, the models work for sentiment analysis. This study proposes an integrated CNN and LSTM algorithm to extract the positive and negative features of text. The reasons for combining these two algorithms are as follows. CNN can extract features for classification automatically by applying convolution layers and massively parallel processing. LSTM is not capable of highly parallel processing. Like faucets, the input, output, and forget gates of the LSTM can be opened and closed at a desired time. These gates have the advantage of placing memory blocks on hidden nodes. The memory block of the LSTM may not store all the data, but it can address the long-term dependency problem that CNN does not handle. 
Furthermore, when LSTM is applied after CNN's pooling layer, the model has an end-to-end structure, so that spatial and temporal features can be learned simultaneously. The combined CNN-LSTM model achieved 90.33% accuracy. It is slower than CNN but faster than LSTM. The presented model was more accurate than the other models. In addition, each word embedding layer can be improved when the kernel is trained step by step. CNN-LSTM can compensate for the weaknesses of each model, and the end-to-end structure of the LSTM has the advantage of improving learning layer by layer. For these reasons, this study tries to enhance the classification accuracy of movie reviews using the integrated CNN-LSTM model.
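
The data flow described in this abstract (convolution, pooling, then an LSTM with input, forget, and output gates) can be sketched in a few lines. This is only a scalar toy illustration under stated assumptions, not the authors' model: the function names, the ReLU activation, the single-channel kernel, and the weight dictionary `w` are all hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def conv1d_relu(seq, kernel):
    """Slide a 1-D kernel over the sequence and apply ReLU (assumed activation)."""
    k = len(kernel)
    return [max(0.0, sum(kernel[j] * seq[t + j] for j in range(k)))
            for t in range(len(seq) - k + 1)]

def max_pool(seq, size):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(seq[t:t + size]) for t in range(0, len(seq) - size + 1, size)]

def lstm_step(x, h_prev, c_prev, w):
    """One scalar LSTM step. `w` maps a gate name to (w_x, w_h, bias)."""
    def pre(name):
        w_x, w_h, b = w[name]
        return w_x * x + w_h * h_prev + b
    i = sigmoid(pre("input"))       # how much new information to write
    f = sigmoid(pre("forget"))      # how much of the old cell state to keep
    o = sigmoid(pre("output"))      # how much of the cell state to expose
    g = math.tanh(pre("candidate")) # candidate value for the cell state
    c = f * c_prev + i * g
    h = o * math.tanh(c)
    return h, c

def cnn_lstm_score(seq, kernel, pool_size, w):
    """Feed pooled convolution features through the LSTM, then squash to (0, 1)."""
    h, c = 0.0, 0.0
    for x in max_pool(conv1d_relu(seq, kernel), pool_size):
        h, c = lstm_step(x, h, c, w)
    return sigmoid(h)
```

In a real model each of these pieces would operate on word-embedding vectors with learned weights; the sketch only shows the order of operations and the role of the three gates.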

Development of an anisotropic spatial interpolation method for velocity in meandering river channel (비등방성을 고려한 사행하천의 유속 공간보간기법 개발)

  • You, Hojun; Kim, Dongsu
    • Journal of Korea Water Resources Association / v.50 no.7 / pp.455-465 / 2017
  • Understanding the two-dimensional velocity field is crucial for analyzing various hydrodynamic and fluvial processes in riverine environments. Until recently, numerical models played the major role in providing such velocity fields instead of in-situ flow measurements, because instruments and methodologies suitable for measuring efficiently over broad river reaches were limited. In recent decades, however, modern instrumentation has started to revolutionize flow measurement. Among others, acoustic Doppler current profilers (ADCPs) have become very promising, especially for accurately assessing streamflow discharge, and they can also provide detailed velocity fields very efficiently. It has thus become possible to capture the velocity field with field observations alone. Since most ADCP measurements have been conducted along cross-sectional lines despite these capabilities, appropriate interpolation methods are still required to obtain a velocity field as dense as the results of numerical simulations. However, the anisotropic nature of a meandering river channel makes it difficult to apply simple spatial interpolation methods to the flow velocity vector, since the flow direction changes continuously with the curvature of the channel. Conventional interpolation methods such as IDW and Kriging, which do not consider the anisotropy caused by meandering, can therefore lead to erroneous results when applied to velocity vectors in a meandering channel. This study therefore developed an anisotropic spatial interpolation method for velocity, A-VIM, based on consecutive ADCP cross-sectional measurements in a meandering river channel. For this purpose, the geographic coordinates of the measured ADCP velocities were converted from conventional Cartesian coordinates (x, y) to curvilinear coordinates (s, n). 
Applying A-VIM improved accuracy significantly, by as much as 41.5% in RMSE.
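
The conversion from Cartesian (x, y) to curvilinear (s, n) coordinates can be illustrated with a small sketch. The abstract does not say how the conversion was implemented, so the following assumes the channel centerline is given as a polyline with distinct vertices; the function name `to_curvilinear` and the sign convention (n positive to the left of the downstream direction) are assumptions for illustration only.

```python
import math

def to_curvilinear(point, centerline):
    """Map (x, y) to (s, n): s is the distance along the centerline polyline,
    n is the signed offset from it (positive to the left of the travel direction)."""
    px, py = point
    best = None          # (distance, s, n) of the closest projection found so far
    s_start = 0.0        # cumulative length up to the start of the current segment
    for (x0, y0), (x1, y1) in zip(centerline, centerline[1:]):
        dx, dy = x1 - x0, y1 - y0
        seg_len = math.hypot(dx, dy)
        # Fraction along the segment of the closest point, clamped to [0, 1].
        t = ((px - x0) * dx + (py - y0) * dy) / (seg_len ** 2)
        t = min(1.0, max(0.0, t))
        cx, cy = x0 + t * dx, y0 + t * dy
        dist = math.hypot(px - cx, py - cy)
        # The sign of the 2-D cross product tells which side of the segment we are on.
        cross = dx * (py - y0) - dy * (px - x0)
        n = math.copysign(dist, cross) if dist else 0.0
        if best is None or dist < best[0]:
            best = (dist, s_start + t * seg_len, n)
        s_start += seg_len
    return best[1], best[2]
```

Once every measurement is expressed in (s, n), distances along the channel and across it can be weighted differently in an interpolation scheme, which is the kind of anisotropy that the Cartesian frame cannot express in a curved reach.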

Analysis of Trading Performance on Intelligent Trading System for Directional Trading (방향성매매를 위한 지능형 매매시스템의 투자성과분석)

  • Choi, Heung-Sik; Kim, Sun-Woong; Park, Sung-Cheol
    • Journal of Intelligence and Information Systems / v.17 no.3 / pp.187-201 / 2011
  • The KOSPI200 index is the Korean stock price index consisting of 200 actively traded stocks in the Korean stock market. Its base value of 100 was set on January 3, 1990. The Korea Exchange (KRX) developed derivatives markets on the KOSPI200 index. The KOSPI200 index futures market, introduced in 1996, has become one of the most actively traded index markets in the world. Traders can profit by entering a long position on the KOSPI200 index futures contract if the KOSPI200 index rises in the future. Likewise, they can profit by entering a short position if the index declines. KOSPI200 index futures trading is basically a short-term zero-sum game, and most futures traders therefore use technical indicators. Advanced traders make stable profits by using system trading, also known as algorithmic trading. Algorithmic trading uses computer programs to receive real-time stock market data, analyze stock price movements with various technical indicators, and automatically enter trading orders, deciding the timing, price, and quantity of each order without any human intervention. Recent studies have shown the usefulness of artificial intelligence systems in forecasting stock prices or investment risk. KOSPI200 index data is numerical time-series data, that is, a sequence of data points measured at successive uniform time intervals such as a minute, day, week, or month. KOSPI200 index futures traders use technical analysis to find patterns on the time-series chart. Although there are many technical indicators, their results all indicate one of three market states: bull, bear, or flat. Most strategies based on technical analysis are divided into trend-following and non-trend-following strategies. Both kinds of strategy decide the market state based on the patterns of the KOSPI200 index time-series data. This fits well with the Markov model (MM). 
Everybody knows that the next price will be higher than, lower than, or similar to the last price, and that the next price is influenced by the last price. However, nobody knows exactly whether the next price will go up, go down, or stay flat. The hidden Markov model (HMM) is therefore a better fit than the MM. HMMs are divided into discrete HMMs (DHMM) and continuous HMMs (CHMM). The only difference between DHMM and CHMM is in their representation of state probabilities: DHMM uses a discrete probability density function, and CHMM uses a continuous probability density function such as a Gaussian mixture model. KOSPI200 index values are real numbers that follow a continuous probability density function, so CHMM is more suitable than DHMM for the KOSPI200 index. In this paper, we present an artificial intelligence trading system based on CHMM for KOSPI200 index futures system traders. Traders have practiced technical trading in the KOSPI200 index futures market ever since the market was introduced. They have applied many strategies to make a profit in trading KOSPI200 index futures. Some strategies are based on technical indicators such as moving averages or stochastics, and others are based on candlestick patterns such as three outside up, three outside down, harami, or doji star. We present a trading system that applies a moving average cross strategy based on CHMM, and we compare it with a traditional algorithmic trading system. We set the parameter values of the moving averages at common values used by market practitioners. Empirical results compare the simulation performance with that of the traditional algorithmic trading system using long-term daily KOSPI200 index data covering more than 20 years. Our suggested trading system shows higher trading performance than naive system trading.
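
For orientation, the plain moving average cross rule that serves as the traditional baseline can be sketched as follows. This is not the CHMM-based system from the paper: the function names and the window lengths of 5 and 20 days are assumptions (the abstract only says that common practitioner values were used).

```python
def moving_average(prices, window):
    """Simple moving average; the first value corresponds to prices[window - 1]."""
    return [sum(prices[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(prices))]

def crossover_positions(prices, short=5, long=20):
    """Directional position per day, aligned with prices[long - 1:].

    +1 (long) when the short average is above the long average,
    -1 (short) when it is below, 0 when they are equal.
    """
    short_ma = moving_average(prices, short)[long - short:]
    long_ma = moving_average(prices, long)
    return [(s > l) - (s < l) for s, l in zip(short_ma, long_ma)]
```

With a rising and then falling series such as `[1, 2, 3, 4, 5, 4, 3, 2, 1]` and windows of 2 and 3, the rule stays long while the short average leads and flips to short once it drops below the long average.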

A Time Series Graph based Convolutional Neural Network Model for Effective Input Variable Pattern Learning: Application to the Prediction of Stock Market (효과적인 입력변수 패턴 학습을 위한 시계열 그래프 기반 합성곱 신경망 모형: 주식시장 예측에의 응용)

  • Lee, Mo-Se; Ahn, Hyunchul
    • Journal of Intelligence and Information Systems / v.24 no.1 / pp.167-181 / 2018
  • Over the past decade, deep learning has been in the spotlight among machine learning algorithms. In particular, CNN (Convolutional Neural Network), which is known as an effective solution for recognizing and classifying images or voices, has been widely applied to classification and prediction problems. In this study, we investigate how to apply CNN to business problem solving. Specifically, this study proposes applying CNN to stock market prediction, one of the most challenging tasks in machine learning research. As mentioned, CNN is strong at interpreting images. Thus, the model proposed in this study adopts CNN as a binary classifier that predicts stock market direction (upward or downward) using time series graphs as its inputs. That is, our proposal is to build a machine learning algorithm that mimics the experts called 'technical analysts', who examine graphs of past price movements and predict future financial price movements. Our proposed model, named 'CNN-FG (Convolutional Neural Network using Fluctuation Graph)', consists of five steps. In the first step, it divides the dataset into intervals of 5 days. In step 2, it creates time series graphs for the divided dataset. The size of the image in which each graph is drawn is $40 \times 40$ pixels, and the graph of each independent variable is drawn in a different color. In step 3, the model converts the images into matrices. Each image is converted into a combination of three matrices in order to express the color values on the R (red), G (green), and B (blue) scales. In the next step, it splits the dataset of graph images into training and validation datasets. We used 80% of the total dataset as the training dataset and the remaining 20% as the validation dataset. In the final step, CNN classifiers are trained using the images of the training dataset. 
Regarding the parameters of CNN-FG, we adopted two convolution filters ($5 \times 5 \times 6$ and $5 \times 5 \times 9$) in the convolution layer. In the pooling layer, a $2 \times 2$ max pooling filter was used. The numbers of nodes in the two hidden layers were set to 900 and 32, respectively, and the number of nodes in the output layer was set to 2 (one for the prediction of an upward trend, the other for a downward trend). The activation functions for the convolution layer and the hidden layers were set to ReLU (Rectified Linear Unit), and the one for the output layer was set to the Softmax function. To validate CNN-FG, we applied it to the prediction of KOSPI200 for 2,026 days in eight years (from 2009 to 2016). To match the proportions of the two groups in the dependent variable (i.e. tomorrow's stock market movement), we selected 1,950 samples by random sampling. Finally, we built the training dataset using 80% of the total dataset (1,560 samples) and the validation dataset using 20% (390 samples). The independent variables of the experimental dataset included twelve technical indicators popularly used in previous studies. They include Stochastic %K, Stochastic %D, Momentum, ROC (rate of change), LW %R (Larry Williams' %R), A/D oscillator (accumulation/distribution oscillator), OSCP (price oscillator), CCI (commodity channel index), and so on. To confirm the superiority of CNN-FG, we compared its prediction accuracy with that of other classification models. Experimental results showed that CNN-FG outperforms LOGIT (logistic regression), ANN (artificial neural network), and SVM (support vector machine) with statistical significance. These empirical results imply that converting time series business data into graphs and building CNN-based classification models on these graphs can be effective from the perspective of prediction accuracy. 
Thus, this paper sheds light on how to apply deep learning techniques to the domain of business problem solving.
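
The first steps of the pipeline (cutting the series into 5-day intervals and turning each interval into a 40 x 40 image matrix) can be sketched roughly as below. This is a simplified illustration under assumptions, not the CNN-FG implementation: it handles one indicator and one color channel, marks only the data points rather than drawing connected lines, and the function names are hypothetical.

```python
def split_windows(series, size=5):
    """Cut the series into consecutive, non-overlapping windows of `size` days."""
    return [series[i:i + size] for i in range(0, len(series) - size + 1, size)]

def rasterize(window, side=40):
    """Plot one window (at least two values) into a side x side matrix of 0/1 pixels.

    Row 0 is the top of the image, so the largest value lands in row 0.
    """
    lo, hi = min(window), max(window)
    span = (hi - lo) or 1.0          # avoid division by zero for a flat window
    image = [[0] * side for _ in range(side)]
    for k, value in enumerate(window):
        col = round(k * (side - 1) / (len(window) - 1))
        row = (side - 1) - round((value - lo) * (side - 1) / span)
        image[row][col] = 1
    return image
```

In the setup the abstract describes, each indicator would be drawn in its own color and the image stored as three such matrices (R, G, B) before being passed to the CNN; an 80/20 split of the 1,950 sampled images then gives 1,560 training and 390 validation images.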

An accuracy analysis of Cyberknife tumor tracking radiotherapy according to unpredictable change of respiration (예측 불가능한 호흡 변화에 따른 사이버나이프 종양 추적 방사선 치료의 정확도 분석)

  • Seo, jung min; Lee, chang yeol; Huh, hyun do; Kim, wan sun
    • The Journal of Korean Society for Radiation Therapy / v.27 no.2 / pp.157-166 / 2015
  • Purpose: The CyberKnife tumor tracking system builds a correlation between the position of a tumor that moves with respiration and the real-time respiratory signal obtained from LED markers attached to the outside of the patient, predicts the tumor location in advance, and synchronizes the treatment device with the tumor motion so that the tumor is tracked in real time during treatment. This study evaluates the accuracy of the tumor tracking radiotherapy system when the breathing pattern changes suddenly and unpredictably, for example because of coughing or sleep. Materials and Methods: The breathing log files used in the study were reconstructed, as far as possible, from the breathing log files of patients who received respiratory-gated radiotherapy and CyberKnife tracking radiosurgery, in the form of a sinusoidal pattern and a sudden-change pattern. To reproduce respiratory motion by feeding the reconstructed log files into the CyberKnife dynamic chest phantom, a driving apparatus was additionally built for the existing dynamic chest phantom, and a program was developed that applies the breathing pattern to the phantom. The target inside the phantom (ball cube target) was driven in the superior-inferior direction with three displacement sizes according to the size of respiration: 5 mm, 10 mm, and 20 mm. Two EBT3 films were inserted crosswise into the target inside the phantom, and the End-to-End (E2E) test provided by the CyberKnife manufacturer was carried out five times for each target motion size and breathing pattern. The accuracy of the tumor tracking system was expressed as the target error obtained by analyzing the inserted films, and the correlation error measured while the E2E test was in progress was analyzed additionally. 
Results: For the sinusoidal breathing pattern, the mean target error for target motion sizes of 5 mm, 10 mm, and 20 mm was $1.14 \pm 0.13$ mm, $1.05 \pm 0.20$ mm, and $2.37 \pm 0.17$ mm, respectively; for the sudden-change breathing pattern, it was $1.87 \pm 0.19$ mm, $2.15 \pm 0.21$ mm, and $2.44 \pm 0.26$ mm. The correlation error, defined as the length of the displacement vector in target tracking, averaged $0.84 \pm 0.01$ mm, $0.70 \pm 0.13$ mm, and $1.63 \pm 0.10$ mm for target motion sizes of 5 mm, 10 mm, and 20 mm with the sinusoidal breathing pattern, and $0.97 \pm 0.06$ mm, $1.44 \pm 0.11$ mm, and $1.98 \pm 0.10$ mm with the sudden-change breathing pattern. For both breathing patterns, the larger the correlation error, the larger the target error. For the sinusoidal breathing pattern with a target motion size of 20 mm or more, both error values were measured at 1.5 mm or more, the value recommended by the CyberKnife manufacturer. Conclusion: Both the correlation error and the target error tend to increase as the target motion becomes larger, and the errors are larger for sudden changes in breathing than for sinusoidal breathing. Even for a regular sinusoidal breathing pattern, the larger the motion, the lower the target accuracy of the tumor tracking system is judged to be. When the CyberKnife tumor tracking algorithm is used and a sudden, unpredictable change in respiration occurs during treatment, for example because the patient coughs, the treatment should be stopped, the internal target validation process carried out again, and the breathing pattern readjusted. 
Treatment accuracy is expected to improve if patients under treatment wear goggle monitors that show them their own breathing pattern, which encourages regular breathing.
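
Since the correlation error is defined as the length of a displacement vector and the results are reported as mean plus or minus standard deviation, the underlying arithmetic can be sketched briefly. The function names are hypothetical, and the use of the sample standard deviation is an assumption; the abstract does not say which estimator was used.

```python
import math
import statistics

def vector_length(dx, dy, dz):
    """Euclidean length of a 3-D displacement vector (same unit as the inputs, e.g. mm)."""
    return math.sqrt(dx * dx + dy * dy + dz * dz)

def summarize(errors):
    """Mean and sample standard deviation of repeated error measurements."""
    return statistics.mean(errors), statistics.stdev(errors)
```

Each of the five repeated E2E runs per condition would contribute one length, and `summarize` would then give the pair of numbers reported for that condition.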
