• Title/Summary/Keyword: Data-driven

An Intelligence Support System Research on KTX Rolling Stock Failure Using Case-based Reasoning and Text Mining (사례기반추론과 텍스트마이닝 기법을 활용한 KTX 차량고장 지능형 조치지원시스템 연구)

  • Lee, Hyung Il;Kim, Jong Woo
    • Journal of Intelligence and Information Systems / v.26 no.1 / pp.47-73 / 2020
  • KTX rolling stock is a system consisting of many machines, electrical devices, and components, and its maintenance requires considerable expertise and experience. When a rolling stock failure occurs, the maintainer's knowledge and experience determine how quickly and how well the problem is solved, and therefore the resulting availability of the vehicle. Although problem solving is generally based on fault manuals, experienced and skilled professionals can quickly diagnose a failure and take action by applying personal know-how. Because this knowledge exists in tacit form, it is difficult to pass on completely to a successor, and earlier studies have developed case-based rolling stock expert systems to turn it into data-driven knowledge. Nonetheless, research on KTX rolling stock, the type most commonly used on the main line, and on systems that extract the meaning of text and search for similar cases is still lacking. Therefore, this study proposes an intelligent support system that provides an action guide for new failures by using the know-how of rolling stock maintenance experts as problem-solving cases. For this purpose, a case base was constructed from rolling stock failure data generated from 2015 to 2017, and an integrated dictionary covering the essential terminology and failure codes was built separately from the case base to reflect the specialized nature of the railway rolling stock sector. Given a new failure, the case base was searched for past cases, and the three most similar failure cases were extracted so that the actions actually taken in those cases could be proposed as a diagnostic guide. 
In this study, several dimensionality reduction techniques were applied so that similarity could be calculated from the semantic relationships among failure details, compensating for the limitation of keyword-matching case search in earlier case-based rolling stock expert systems, and their usefulness was verified through experiments. Three algorithms, Non-negative Matrix Factorization (NMF), Latent Semantic Analysis (LSA), and Doc2Vec, were applied to extract the characteristics of each failure, and similar cases were retrieved by measuring the cosine distance between the resulting vectors. Precision, recall, and F-measure were used to assess the performance of the proposed actions. To compare the dimensionality reduction techniques, they were evaluated alongside two baselines, an algorithm that randomly extracts failure cases with identical failure codes and an algorithm that applies cosine similarity directly to words, and an analysis of variance confirmed that the performance differences among the five algorithms were statistically significant. In addition, the optimal technique for practical application was derived by verifying how performance varies with the number of dimensions used for dimensionality reduction. The analysis showed that applying cosine similarity directly to words performed better than dimensionality reduction with NMF or LSA, and that the algorithm using Doc2Vec performed best. Furthermore, for the dimensionality reduction techniques, performance improved as the number of dimensions increased, up to an appropriate level. 
Through this study, we confirmed the usefulness of methods for extracting the characteristics of data and converting unstructured data when case-based reasoning is applied in the specialized field of KTX rolling stock, where most attributes are recorded as text. Text mining is being studied for use in many areas, but studies using text data are still lacking in environments such as the one in this study, where there are many specialized terms and access to data is limited. In this regard, it is significant that this study is the first to present an intelligent diagnostic system that suggests actions by retrieving cases through text mining techniques that extract the characteristics of a failure, complementing keyword-based case search. It is expected that this will serve as a basic study for developing diagnostic systems that can be used immediately on site.
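
The retrieval and evaluation steps described in this abstract can be illustrated with a minimal sketch. It assumes each failure case has already been turned into a numeric vector (by NMF, LSA, Doc2Vec, or plain word counts); the function names, case identifiers, and vectors are hypothetical and are not taken from the paper.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat an all-zero vector as dissimilar to everything
    return dot / (norm_u * norm_v)

def top_k_cases(query_vec, case_base, k=3):
    """Return the k (case_id, similarity) pairs most similar to the query."""
    scored = [(cid, cosine_similarity(query_vec, vec))
              for cid, vec in case_base.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

def precision_recall_f1(recommended, relevant):
    """Precision, recall, and F-measure of recommended actions vs. relevant ones."""
    recommended, relevant = set(recommended), set(relevant)
    hits = len(recommended & relevant)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

# Hypothetical case base: case id -> feature vector of the failure description.
case_base = {
    "A": [1.0, 0.0, 0.0],
    "B": [0.9, 0.1, 0.0],
    "C": [0.0, 1.0, 0.0],
    "D": [0.5, 0.5, 0.0],
}
top3 = top_k_cases([1.0, 0.0, 0.0], case_base, k=3)
```

Keeping the vectorizer separate from the retrieval step makes it straightforward to swap in a different representation and compare the resulting precision, recall, and F-measure.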

An Evaluation on the Operating of Fisheries Extension Services (어촌지도사업의 평가)

  • 최정윤
    • The Journal of Fisheries Business Administration / v.17 no.2 / pp.65-106 / 1986
  • 1. The Purpose of Study This study evaluates the operation of the Fisheries Extension Services (FES) of Korea, which perform activities such as providing guidance on fisheries techniques and offering industrial information to fishermen in fishing villages. Through these activities, the FES can bring about the continued growth of fisheries, the social and economic development of fishing villages, and an increase in income by raising the knowledge level of fishermen. In carrying out fisheries policy, this activity plays a major role in research and development activity, and it has been in practical operation in Korea since 1976. To respond promptly to fisheries technical innovation and the rapid environmental changes surrounding fisheries, fishermen must not only enhance their scientific and comprehensive capacity in fisheries techniques but also obtain various kinds of effective information. In general, because most fishermen have a weak managerial structure and are scattered across fishing villages, they have little opportunity to come into contact with information. As a result, it is necessary for the FES to carry out its work through extension service officials who have received special training and acquired fisheries know-how in these fields. And yet the FES remains inadequately provided with the manpower, technical know-how, equipment, and service system required to promote the social and economic development of fishing villages and to meet fishermen's demand for advanced techniques. This study of the fisheries extension services was undertaken against this background. 2. 
Research Method The data needed for this study were collected through a questionnaire survey on current extension service activity, addressed to the subject of the extension services (the agency driving the work and its officials), the object (fishermen), and third-party observers of the extension services (the authorities concerned). The research sample consisted of a total of 1,774 people drawn from these three groups. The research was carried out from August 1986 to October 1986, with support from the Fisheries Extension Office (FEO) located in the field. Using the data collected through the questionnaire, the level of extension operation was determined and estimated in terms of the extension service method, the morale of extension service officials, the extension service system, and so on. Based on this result, the essential conditions of the extension services were identified, and we tried to present the various activity plans needed to promote the operation of the extension services. The questionnaire data were processed at the computer center of the National Fisheries University of Pusan; the totals were subjected to one-dimensional analysis and to two-dimensional analysis to identify relationships between questionnaire items, and statistical significance was tested with the $\chi^2$ test at a significance level of 1~5%. 3. Contents of Study This study consists of 7 chapters and the contents are as follows: Chapter I: The object and method of the study; Chapter II: The assessment and analysis of the extension services; Chapter III: The contents and method of the extension services; Chapter IV: Analysis of the essential conditions for the extension services; Chapter V: The evaluation of activities of extension services; Chapter VI: Conclusion. 4. 
Results and Recommendation The results of this study, derived through logical process and analysis, are as follows: 1) Most Korean fishing villages and coastal fishermen showed great concern about fisheries techniques and social changes, and many of them were confronted with new problems of how to adapt to and meet these changes. 2) The majority of fishermen regarded the FEO as an organization of specialized technology covering everything concerning fisheries techniques in general. The fishermen therefore wanted to use the FEO as a means of adapting to modern fisheries techniques as well as to environmental changes. 3) In contrast with the rapid change in fisheries techniques, the complexity and variety of the technical system, and the wide spread of fishing villages and fishermen, the necessary resources of the FEO located in the field, such as facilities, manpower, budget, and the level of applied techniques, were found to be highly insufficient. Accordingly, guiding efficiency was low and the extension services did not fully meet the various requests from fishermen. 4) The activation factors for the extension services can be classified into two broad dimensions: a personal dimension relating to guidance officials and a work dimension relating to the organization. The activation level of the work dimension was found to be far lower than that of the personal dimension, so activation should be pursued first in the work dimension to promote the activation of the extension services. 5) Extension service officials are now generally demoralized, so to raise their morale it is necessary to take realities into consideration: activity expenses, an adequate scope of activity, reasonable operation of position classes, and so on. However, to activate the FES, the system, which has remained unsettled until now, must first be established. 
There must also be a change in the government's understanding, namely that the fisheries extension services are an essential policy subject for building the base of fisheries growth and modernizing fisheries management. They should be driven forward positively, with recognition that they are a "lasting project".
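
The two-dimensional (cross-tabulated) analysis with a $\chi^2$ test mentioned in the research method can be illustrated as follows. This is only a sketch of Pearson's chi-square statistic for a contingency table; the counts are invented for illustration and do not come from the survey.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table.

    `table` is a list of rows of observed counts. Every expected count is
    assumed to be non-zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof

# Hypothetical cross-tabulation: respondent group (rows) x answer (columns).
observed = [[10, 20],
            [20, 10]]
stat, dof = chi_square_statistic(observed)
# With 1 degree of freedom, the 5% critical value is about 3.841, so a larger
# statistic suggests that the two questionnaire items are related.
```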

Effects of the Exercise Training on Aging Heart in Rat I. Long Term Endurance Exercise (운동훈련이 흰쥐 노화심근에 미치는 영향 I. 장기간 지구력 운동 훈련)

  • 박원학;이상선;이용덕
    • Biomedical Science Letters / v.2 no.1 / pp.71-90 / 1996
  • There is considerable current interest in the effect of regular vigorous exercise, and in particular endurance running, as a possible measure for improving myocardial function. Some data indicate that the aging heart may actually suffer from vigorous endurance exercise. Other data, on the contrary, indicate that appropriate exercise in aged animals improves myocardial function and aerobic energy metabolism. So far there are relatively few data to indicate whether endurance exercise is in fact beneficial to myocardial function or damaging to the heart of aged animals. The present investigation aimed to study the possible effect of a long-term treadmill training program on the heart in aging rats. Male rats aged 3, 10, and 20 months were divided at random into a control (sedentary) group and an exercise group. The training group was exercised 5 days a week on an automated treadmill for 20 minutes at 18 m/min over a period of 5 months. The exercise regimen of our experiments did not cause any significant tissue or ultrastructural changes as compared with sedentary age-matched controls. In the trained group aged 8 months, the tissues and ultrastructure of myocardial cells were intact and well organized, as in the sedentary control group. Age-associated tissue and ultrastructural changes in the trained group aged 15 months included an increase in transformed mitochondria, vacuoles, lysosomes, lipid droplets and early lipofuscin, but the trained heart did not differ significantly in tissue and ultrastructural properties from sedentary controls. The endurance-trained group aged 25 months showed significant qualitative tissue and ultrastructural differences as compared with age-matched controls. In addition to the changes found in the 25-month control group, focal necrosis, myofibril fraying, hypercontraction bands, separation of intercalated discs, degenerating nuclei and infiltration of collagenous fibers into myocytes were noted in the trained 25-month group. 
The stereological examination of the micrographs disclosed no significant difference between the trained and control groups aged 8 and 15 months in the volume density of myofibrils, mitochondria, sarcotubules and interstitium, the surface density of mitochondrial cristae, or the numerical density of mitochondria. In the trained 25-month group, a significant increase in the volume density of interstitium and lipofuscin granules was shown as compared with untrained age-matched controls. On the other hand, a significant decrease in mitochondrion volume density was shown. The myofibril volume density did not differ between the trained and control groups, although the trained group showed a slight increase. From these data, a reduced mitochondria/myofibrils ratio was found in the trained rat heart aged 25 months, and there was no difference between trained and control rats aged 15 months. A slight but not significant increase was found in the trained group aged 8 months as compared with the same-age control group. Such an increase in the ratio in young animals is considered to be of great importance to cardiac pumping and adaptability, whereas such adaptations do not seem to occur in aged heart muscle. This study suggests that repeated endurance exercise does not cause any significant qualitative or quantitative ultrastructural change in the heart muscle of young (3 months) and adult (10 months) rats, indicating that the heart is able to adapt to the exercise. On the contrary, repeated endurance exercise stress may actually induce degenerative changes in aged heart muscle (20 months).

Designing an Intelligent Advertising Business Model in Seoul's Metro Network (서울지하철의 지능형 광고 비즈니스모델 설계)

  • Musyoka, Kavoya Job;Lim, Gyoo Gun
    • Journal of Intelligence and Information Systems / v.23 no.4 / pp.1-31 / 2017
  • Modern businesses are adopting new technologies to serve their markets better as well as to improve efficiency and productivity. The advertising industry has continuously experienced disruption, moving from the traditional channels (radio, television and print media) to new, complex ones including internet, social media and mobile-based advertising. This case study proposes an intelligent advertising business model for Seoul's metro network. Seoul has one of the world's busiest metro networks and transports a huge number of travelers on a daily basis. The high number of travelers, coupled with a well-planned metro network, creates a platform where marketers can initiate engagement and interact with both customers and potential customers. In the current advertising model, advertising appears on illuminated and framed posters in the stations, on non-illuminated posters in the cars, and on digital screens that show scheduled arrivals and departures of trains. Some stations have digital screens that show adverts, but they do not have location capability. Most of the current advertising media have one key limitation: space. For posters, whether illuminated or not, one space can host only one advert at a time. The empirical literature shows that there is room to improve this advertising model and eliminate the space limitation by replacing poster adverts with a digital advertising platform. This new model will not only be digital but will also provide an intelligent advertising platform that is driven by data. The digital platform will incorporate location sensing, e-commerce, and a mobile platform to create new value for all stakeholders. Travel cards used in the metro will be registered, and the card scanners will be able to capture travelers' data when travelers tap their cards. Once analyzed, this data will make it possible to identify different customer groups. 
Advertisers and marketers will then be able to target specific customer groups, customize adverts for the targeted consumer group, and offer a wide variety of advertising formats. Formats include video, cinemagraphs, moving pictures, and animation. Different advert formats create different emotions in the customer's mind, and the goal should be to use the format or combination of formats that arouses the expected emotion and leads to an engagement. A combination of different formats will be more effective, and this can only work on a digital platform. Adverts will be location based, ensuring that they are shown more frequently when the train is near the premises of an advertiser. The advertising platform will automatically detect the next station, and screens inside the train will prioritize adverts for the station where the train will be stopping. On the mobile platform, customers who opt to receive notifications will receive them when they approach the business premises of an advertiser. The mobile platform will have indoor navigation for the underground shopping malls, allowing customers to search for facilities within the mall, products they may want to buy, and deals going on in the underground mall. To create an end-to-end solution, the mobile solution will allow customers to purchase products through their phones, get coupons for deals, and review products and the shops where they have bought a product. The indoor navigation will host intelligent mobile-based advertisements and a recommendation system. The indoor navigation will carry adverts such that when a customer is searching for information, the recommendation system shows adverts that are near the place the traveler is searching for or in the direction that the traveler is moving. These adverts will be linked to the e-commerce platform such that if a customer clicks on an advert, it leads them to the product description page. 
The whole system will have multi-language as well as text-to-speech capability such that both locals and tourists have no language barrier. The implications of implementing this model are varied including support for small and medium businesses operating in the underground malls, improved customer experience, new job opportunities, additional revenue to business model operator, and flexibility in advertising. The new value created will benefit all the stakeholders.
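
The location-based prioritization described above (screens favouring adverts for the station the train is about to reach) could be sketched as follows. The `Advert` class, the station indices, and the distance rule are all hypothetical; the abstract does not specify how the ranking is computed.

```python
from dataclasses import dataclass

@dataclass
class Advert:
    name: str
    station_index: int  # position on the line of the advertiser's nearest station (hypothetical)

def prioritize_adverts(adverts, next_station_index):
    """Order adverts so that those closest to the next station come first.

    Distance is measured in number of stops along a single line, which is a
    simplifying assumption made for this sketch.
    """
    return sorted(adverts, key=lambda ad: abs(ad.station_index - next_station_index))

adverts = [Advert("cafe", 5), Advert("bookshop", 2), Advert("gym", 3)]
ranked = prioritize_adverts(adverts, next_station_index=3)
ranked_names = [ad.name for ad in ranked]
```

A real platform would presumably combine this ordering with customer-group targeting from the travel card data, which the sketch leaves out.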

Ore Minerals, Fluid Inclusions, and Isotopic(S.C.O) Compositions in the Diatreme-Hosted Nokdong As-Zn Deposit, Southeastern Korea: The Character and Evolution of the Hydrothermal Fluids (다이아튜림 내에 부존한 녹동 비소-아연광상의 광석광물, 유체포유물, 유황-탄소-산소 동위원소 : 광화용액의 특성과 진화)

  • Park, Ki-Hwa;Park, Hee-In;Eastoe, Christopher J.;Choi, Suck-Won
    • Economic and Environmental Geology / v.24 no.2 / pp.131-150 / 1991
  • The Weolseong diatreme was temporally and spatially related to the intrusion of the Gadaeri granite, and was mineralized by meteoric aqueous fluids. In the Nokdong As-Zn deposit, pyrite, arsenopyrite and sphalerite are the most abundant sulfide minerals. They are associated with minor amounts of magnetite, pyrrhotite, chalcopyrite and cassiterite, and trace amounts of Pb-Sb-Bi-Ag sulphosalts. The As-Zn ore probably formed at about $350^{\circ}C$, according to fluid inclusion data and compositional data estimated from the arsenic content of arsenopyrite and the iron content of sphalerite intergrown with pyrrhotite + chalcopyrite + cubanite. Heating studies of fluid inclusions in quartz indicate a temperature range between 180 and $360^{\circ}C$, and freezing data indicate a salinity range from 0.8 to 4.1 eq. wt % NaCl. The coexisting assemblage pyrite + pyrrhotite + arsenopyrite suggests that $H_2S$ was the dominant reduced sulfur species, and defines the fluid parameters thus: $10^{-34.5} < \alpha_{S_2} < 10^{-33}$, $10^{-11} < f_{S_2} < 10^{-8}$, $-2.4 < \alpha_{S_2} < -1.6$ atm and pH = 5.2 (sericite stable) at $300^{\circ}C$. The sulfur isotope values range from 1.8 to 5.5‰ and indicate that the sulfur in the sulfides is of magmatic origin. The carbon isotope values range from -7.8 to -11.6‰, and the oxygen isotope values of the carbonates in mineralized wall rock range from 2 to 11.4‰. The oxygen isotope compositions of water coexisting with calcite require an input of meteoric water. The geochemical data indicate that the ore-forming fluid was probably generated by a variety of mechanisms, including deep circulation of meteoric water driven by magmatic heat, with possible input of magmatic water and ore components.

Designing Mobile Framework for Intelligent Personalized Marketing Service in Interactive Exhibition Space (인터랙티브 전시 환경에서 개인화 마케팅 서비스를 위한 모바일 프레임워크 설계)

  • Bae, Jong-Hwan;Sho, Su-Hwan;Choi, Lee-Kwon
    • Journal of Intelligence and Information Systems / v.18 no.1 / pp.59-69 / 2012
  • The exhibition industry, one of the government's 17 new growth engines, is related to other industries such as tourism, transportation and finance, so it has a significant ripple effect on them. Exhibition is a knowledge-intensive, eco-friendly and high value-added industry. Over 13,000 exhibitions are held around the world every year, which contributes to earning foreign currency. The exhibition industry is closely related to culture and tourism, can be used as a local and national development strategy, and can improve the national brand image as well. Many countries make various efforts to invigorate the exhibition industry by arranging related laws and support systems. In Korea, more than 200 exhibitions are held every year, but only 2~3 of them host over 400 exhibitors, and apart from these most exhibitions have few foreign exhibitors. The main reason for the weakness of domestic trade shows is that there are no agencies managing exhibition-related statistics and there is no specific and reliable evaluation. This can make it impossible to provide buyers or sellers with reliable data and can hold back the growth of exhibitions in terms of quality, so the service quality of trade shows cannot be improved. Attracting many visitors (public, buyers, exhibitors) is crucial to the development of the domestic exhibition industry. To attract many visitors, the service quality of exhibitions and visitor satisfaction should be enhanced. For this purpose, a variety of real-time customized services through digital media, along with services for creating new customers and retaining existing customers, should be provided. In addition, personalized information services would let visitors manage their time and space efficiently and avoid the complexity of the exhibition space. 
The exhibition industry can gain competitiveness and an industrial foundation by building up exhibition-related statistics, creating new information and enhancing research ability. Therefore, this paper deals with customized service delivered through the visitor's smartphone in the exhibition space and with the design of a mobile framework that enables exhibition devices to interact with other devices. The mobile server framework is composed of three systems: the multiple interaction server, the client, and the display device. By building a knowledge pool of the exhibition environment, the data accumulated for each visitor can be provided as a personalized service. In addition, information is turned into customized information based on the reactions of visitors, so a cyclic chain structure is designed. The multiple interaction server is designed to provide event handling, interaction processing between the exhibition device and the visitor's smartphone, and data management. The client is an application that runs on the visitor's smartphone and can be run on a variety of platforms. The client functions as the interface presenting customized services to individual visitors and as the input and output of events for simultaneous participation. The exhibition device consists of a display system that shows contents and information to visitors, an interaction input-output system that receives events from visitors and turns input into action, and a control system that connects these two systems. The mobile framework proposed in this paper provides individual visitors with customized and active services using their information profiles and advanced knowledge. In addition, a user participation service is suggested that uses the interaction connection system between server, client, and exhibition devices. The suggested mobile framework is a technology that could be applied to culture industries such as performances, shows and exhibitions. 
Thus, this builds up the foundation to improve visitor's participation in exhibition and bring about development of exhibition industry by raising visitor's interest.

Development of High-frequency Data-based Inflow Water Temperature Prediction Model and Prediction of Changes in Stratification Strength of Daecheong Reservoir Due to Climate Change (고빈도 자료기반 유입 수온 예측모델 개발 및 기후변화에 따른 대청호 성층강도 변화 예측)

  • Han, Jongsu;Kim, Sungjin;Kim, Dongmin;Lee, Sawoo;Hwang, Sangchul;Kim, Jiwon;Chung, Sewoong
    • Journal of Environmental Impact Assessment / v.30 no.5 / pp.271-296 / 2021
  • Since thermal stratification in a reservoir inhibits the vertical mixing of the upper and lower layers and causes the formation of a hypoxic layer and enhanced release of nutrients from the sediment, changes in the stratification structure of the reservoir under future climate change are very important for water quality and aquatic ecology management. This study aimed to develop a data-driven inflow water temperature prediction model for Daecheong Reservoir (DR), and to predict future inflow water temperature and the stratification structure of DR under the future climate scenarios of the Representative Concentration Pathways (RCP). The random forest (RF) regression model (NSE 0.97, RMSE 1.86℃, MAPE 9.45%) developed to predict the inflow temperature of DR adequately reproduced the statistics and variability of the observed water temperature. Future meteorological data for each RCP scenario, predicted by the regional climate model (HadGEM3-RA), were input into the RF model to predict the inflow water temperature, and a three-dimensional hydrodynamic model (AEM3D) was used to predict the change in the future (2018~2037, 2038~2057, 2058~2077, 2078~2097) stratification structure of DR due to climate change. As a result, the rates of increase in air temperature and inflow water temperature were 0.14~0.48℃/10 years and 0.21~0.41℃/10 years, respectively. In the seasonal analysis, the increase in inflow water temperature was statistically significant in all scenarios except spring and winter under RCP 2.6, and the rate of increase was higher where the carbon reduction effort was weaker. The rate of increase of the reservoir's surface water temperature was in the range of 0.04~0.38℃/10 years, and the stratification period gradually lengthened in all scenarios. In particular, under the RCP 8.5 scenario, the number of stratification days is expected to increase by about 24 days. 
These results are consistent with previous studies finding that climate change strengthens the stratification intensity of lakes and reservoirs and prolongs the stratification period, and they suggest that prolonged water temperature stratification could cause changes in the aquatic ecosystem, such as spatial expansion of the low-oxygen layer, an increase in sediment nutrient release, and changes in the dominant species of algae in the water body.
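
The three goodness-of-fit measures quoted for the random forest model (NSE, RMSE, MAPE) follow standard definitions, which the sketch below computes from paired observed and simulated values. The sample temperatures are made up for illustration and are not data from the study.

```python
import math

def rmse(observed, simulated):
    """Root mean square error."""
    n = len(observed)
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(observed, simulated)) / n)

def mape(observed, simulated):
    """Mean absolute percentage error, in percent (observed values must be non-zero)."""
    n = len(observed)
    return 100.0 * sum(abs((o - s) / o) for o, s in zip(observed, simulated)) / n

def nse(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 is no better than the observed mean."""
    mean_obs = sum(observed) / len(observed)
    residual = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    variance = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - residual / variance

# Hypothetical inflow water temperatures in degrees Celsius.
observed = [10.0, 12.0, 14.0, 16.0]
simulated = [11.0, 12.0, 13.0, 16.0]
scores = (nse(observed, simulated), rmse(observed, simulated), mape(observed, simulated))
```

Note that MAPE is undefined when an observed value is zero, which matters for water temperatures close to 0℃.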

A Thermal Time-Driven Dormancy Index as a Complementary Criterion for Grape Vine Freeze Risk Evaluation (포도 동해위험 판정기준으로서 온도시간 기반의 휴면심도 이용)

  • Kwon, Eun-Young;Jung, Jea-Eun;Chung, U-Ran;Lee, Seung-Jong;Song, Gi-Cheol;Choi, Dong-Geun;Yun, Jin-I.
    • Korean Journal of Agricultural and Forest Meteorology / v.8 no.1 / pp.1-9 / 2006
  • Despite the warmer winters recently observed in Korea, more freeze injuries and associated economic losses are being reported in the fruit industry than ever before. Existing freeze-frost forecasting systems use only daily minimum temperature to judge the potential damage to dormant flowering buds and cannot accommodate biological responses such as short-term acclimation of plants to severe weather episodes or annual variation in climate. We introduce 'dormancy depth', in addition to daily minimum temperature, as a complementary criterion for judging the potential damage of freezing temperatures to dormant flowering buds of grape vines. Dormancy depth can be estimated by a phenology model driven by daily maximum and minimum temperature and is expected to be a reasonable proxy for the physiological tolerance of buds to low temperature. Dormancy depth at a selected site was estimated for a climatological normal year by this model, and we found a close similarity between the time course of the estimated dormancy depth and the known cold tolerance of fruit trees. Inter-annual and spatial variation in dormancy depth were identified by this method, showing the feasibility of using dormancy depth as a proxy indicator for tolerance to low temperature during the winter season. The model was applied to 10 vineyards that were recently damaged by a cold spell, and the relationship among temperature, dormancy depth and freeze injury was formulated as an exponential-saturation model that can be used to judge freeze risk for a given temperature and dormancy depth. Based on this model and the expected lowest temperature with a 10-year recurrence interval, a freeze risk probability map was produced for Hwaseong County, Korea. The results seemed to explain why the vineyards in the warmer part of Hwaseong County have been hit by more freeze damage than those in the cooler part of the county. 
A dormancy depth-minimum temperature dual engine freeze warning system was designed for vineyards in major production counties in Korea by combining the site-specific dormancy depth and minimum temperature forecasts with the freeze risk model. In this system, daily accumulation of thermal time since last fall leads to the dormancy state (depth) for today. The regional minimum temperature forecast for tomorrow by the Korea Meteorological Administration is converted to the site specific forecast at a 30m resolution. These data are input to the freeze risk model and the percent damage probability is calculated for each grid cell and mapped for the entire county. Similar approaches may be used to develop freeze warning systems for other deciduous fruit trees.
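
The idea of accumulating thermal time day by day from maximum and minimum temperature can be illustrated with a generic degree-day running sum. This is a stand-in, not the authors' dormancy model: the base temperature, the use of the daily mean, and the sample temperatures are all assumptions made for this sketch, and the abstract does not give the actual formulation.

```python
def daily_thermal_units(tmax, tmin, base_temp):
    """Deviation of the daily mean temperature from a base temperature.

    Negative values mean the day was cooler than the base temperature.
    """
    return (tmax + tmin) / 2.0 - base_temp

def accumulate_thermal_time(daily_temps, base_temp=5.0):
    """Running sum of daily thermal units for a list of (tmax, tmin) pairs.

    base_temp=5.0 is a hypothetical value chosen only for illustration.
    """
    total = 0.0
    series = []
    for tmax, tmin in daily_temps:
        total += daily_thermal_units(tmax, tmin, base_temp)
        series.append(total)
    return series

# Three hypothetical days of (maximum, minimum) temperature in degrees Celsius.
running_sum = accumulate_thermal_time([(10.0, 0.0), (6.0, -2.0), (12.0, 4.0)])
```

In a warning system of the kind described, the running value for today would be paired with tomorrow's site-specific minimum temperature forecast and passed to the freeze risk model.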

Stock Price Prediction by Utilizing Category Neutral Terms: Text Mining Approach (카테고리 중립 단어 활용을 통한 주가 예측 방안: 텍스트 마이닝 활용)

  • Lee, Minsik;Lee, Hong Joo
    • Journal of Intelligence and Information Systems / v.23 no.2 / pp.123-138 / 2017
  • Since the stock market is driven by the expectations of traders, studies have been conducted to predict stock price movements through analysis of various sources of text data. To predict stock price movements, research has examined not only the relationship between text data and fluctuations in stock prices, but also stock trading based on news articles and social media responses. Studies that predict stock price movements have also applied classification algorithms to a term-document matrix, in the same way as other text mining approaches. Because documents contain a large number of words, it is better to select the words that contribute more when building a term-document matrix. Based on word frequency, words with too low a frequency or importance are removed. Words are also selected according to their contribution, by measuring the degree to which a word contributes to correctly classifying a document. The basic idea of constructing a term-document matrix has been to collect all the documents to be analyzed and to select and use the words that influence the classification. In this study, we analyze the documents for each individual stock and select the words that are irrelevant to all categories as neutral words. We extract the words around each selected neutral word and use them to generate the term-document matrix. The approach starts from the idea that stock movement is less related to the presence of the neutral words themselves, and that the words surrounding a neutral word are more likely to affect stock price movements. The generated term-document matrix is then applied to an algorithm that classifies stock price fluctuations. In this study, we first removed stop words and selected neutral words for each stock. From the selected words, we then excluded those that also appear in news articles for other stocks. 
Through an online news portal, we collected four months of news articles on the top 10 stocks by market capitalization. We used the first three months of articles as training data and applied the remaining month to the model to predict the next day's stock price movements. We built the models with SVM, Boosting, and Random Forest. The stock market was open for a total of 80 days over the four months (2016/02/01 ~ 2016/05/31); the first 60 days served as the training set and the remaining 20 days as the test set. The word-based algorithm proposed in this study showed better classification performance than word selection based on sparsity. In summary, this study predicted stock price movements by collecting and analyzing news articles on the top 10 stocks by market capitalization. We used a classification model based on the term-document matrix to estimate stock price fluctuations and compared the existing sparsity-based word extraction method with the suggested method of removing words from the term-document matrix. The suggested method differs in that it uses not only the news articles for the stock in question but also news about other stocks to determine which words to extract. In other words, it removes not only the words that appear in both the rising and falling classes but also the words that commonly appear in news about other stocks. When prediction accuracy was compared, the suggested method was more accurate. A limitation of this study is that the prediction was set up only to classify rises and falls, and the experiment covered only the top ten stocks, which do not represent the entire stock market. In addition, investment performance is difficult to show because stock price fluctuations and rates of return may differ. 
Further research should therefore cover more stocks and evaluate return prediction through trading simulation.
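
The neutral-word idea described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say how "irrelevant for all categories" is measured, so the sketch assumes a word is neutral when its document frequency is nearly the same in rising and falling documents. The function names, the tolerance `tol`, the context `window`, and the pre-tokenized input are all assumptions made for the example.

```python
from collections import Counter


def doc_frequency(docs):
    """Fraction of documents in which each word occurs at least once."""
    counts = Counter()
    for doc in docs:
        counts.update(set(doc))
    return {w: c / len(docs) for w, c in counts.items()}


def neutral_words(up_docs, down_docs, other_stock_vocab, tol=0.1):
    """Assumed criterion: words about equally frequent in 'up' and 'down'
    documents, minus words that also occur in news about other stocks."""
    up_df, down_df = doc_frequency(up_docs), doc_frequency(down_docs)
    shared = set(up_df) & set(down_df)
    return {
        w for w in shared
        if abs(up_df[w] - down_df[w]) <= tol and w not in other_stock_vocab
    }


def context_terms(doc, neutral, window=2):
    """Words within `window` positions of a neutral word; the neutral
    words themselves are dropped."""
    kept = []
    for i, w in enumerate(doc):
        if w in neutral:
            lo, hi = max(0, i - window), min(len(doc), i + window + 1)
            kept.extend(t for t in doc[lo:hi] if t not in neutral)
    return kept


def term_document_matrix(docs, neutral, window=2):
    """Return (vocabulary, count matrix) built only from context words."""
    rows = [Counter(context_terms(d, neutral, window)) for d in docs]
    vocab = sorted(set().union(*rows))
    return vocab, [[r[t] for t in vocab] for r in rows]
```

The resulting matrix rows could then be fed to any classifier (the abstract mentions SVM, Boosting, and Random Forest), with the first 60 trading days used for training and the last 20 for testing.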

KNU Korean Sentiment Lexicon: Bi-LSTM-based Method for Building a Korean Sentiment Lexicon (Bi-LSTM 기반의 한국어 감성사전 구축 방안)

  • Park, Sang-Min;Na, Chul-Won;Choi, Min-Seong;Lee, Da-Hee;On, Byung-Won
    • Journal of Intelligence and Information Systems
    • /
    • v.24 no.4
    • /
    • pp.219-240
    • /
    • 2018
  • Sentiment analysis, one of the text mining techniques, is a method for extracting subjective content embedded in text documents. Sentiment analysis methods have recently come into wide use in many fields. For example, data-driven surveys analyze the subjectivity of text posted by users, and market research analyzes users' review posts to quantify the reputation of a target product. The basic method of sentiment analysis uses a sentiment dictionary (or lexicon), a list of sentiment words with positive, neutral, or negative semantics. In general, the meaning of many sentiment words is likely to differ across domains. For example, the sentiment word 'sad' is negative in many domains but not in movies. To perform accurate sentiment analysis, we need to build a sentiment dictionary for the given domain. However, building such a lexicon is time-consuming, and many sentiment words are missed unless a general-purpose sentiment lexicon is used. To address this problem, several studies have constructed domain-specific sentiment lexicons based on 'OPEN HANGUL' and 'SentiWordNet', which are general-purpose sentiment lexicons. However, OPEN HANGUL is no longer in service, and SentiWordNet does not work well because of language differences that arise when Korean words are converted into English words. These restrictions limit the use of such general-purpose sentiment lexicons as seed data for building a domain-specific sentiment lexicon. In this article, we construct the 'KNU Korean Sentiment Lexicon (KNU-KSL)', a new general-purpose Korean sentiment dictionary that is more advanced than existing general-purpose lexicons. 
The proposed dictionary, a list of domain-independent sentiment words such as 'thank you', 'worthy', and 'impressed', is built so that a sentiment dictionary for a target domain can be constructed quickly. Specifically, it builds its sentiment vocabulary by analyzing the glosses contained in the Standard Korean Language Dictionary (SKLD) through the following procedure: First, we propose a sentiment classification model based on Bidirectional Long Short-Term Memory (Bi-LSTM). Second, the proposed deep learning model automatically classifies each gloss as having either positive or negative meaning. Third, positive words and phrases are extracted from the glosses classified as positive, while negative words and phrases are extracted from the glosses classified as negative. Our experimental results show that the average accuracy of the proposed sentiment classification model is up to 89.45%. In addition, the sentiment dictionary is further extended using various external sources, including SentiWordNet, SenticNet, Emotional Verbs, and Sentiment Lexicon 0603. Furthermore, we add sentiment information about frequently used coined words and emoticons that appear mainly on the Web. The KNU-KSL contains a total of 14,843 sentiment entries, each of which is a 1-gram, a 2-gram, a phrase, or a sentence pattern. Unlike existing sentiment dictionaries, it is composed of words that are not affected by particular domains. The recent trend in sentiment analysis is to use deep learning techniques without sentiment dictionaries, and the importance of developing sentiment dictionaries has gradually declined. However, one recent study shows that the words in a sentiment dictionary can be used as features of deep learning models, resulting in sentiment analysis with higher accuracy (Teng, Z., 2016). This result indicates that a sentiment dictionary is useful not only for sentiment analysis itself but also as a source of features that improve the accuracy of deep learning models. 
The proposed dictionary can be used as base data for constructing the sentiment lexicon of a particular domain and as features of deep learning models. It is also useful for automatically and quickly building large training sets for deep learning models.
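
As a rough illustration of how a lexicon containing 1-grams and 2-grams might be turned into features, the sketch below counts positive and negative matches in a tokenized text. It is only a sketch under stated assumptions: `TOY_LEXICON`, its entries and polarity values, and the function `lexicon_features` are hypothetical and are not taken from KNU-KSL, and the longest-match-first rule is a design choice made for this example.

```python
# Toy lexicon keyed by token tuples; entries and polarities are illustrative only.
TOY_LEXICON = {
    ("thank", "you"): 1,
    ("worthy",): 1,
    ("impressed",): 1,
    ("waste",): -1,
    ("not", "worthy"): -1,
}


def lexicon_features(tokens, lexicon, max_n=2):
    """Count positive and negative lexicon hits, trying longer n-grams first
    so that ('not', 'worthy') wins over ('worthy',)."""
    pos = neg = 0
    i = 0
    while i < len(tokens):
        step = 1  # advance by one token when nothing matches
        for n in range(max_n, 0, -1):
            gram = tuple(tokens[i:i + n])
            if len(gram) == n and gram in lexicon:
                if lexicon[gram] > 0:
                    pos += 1
                elif lexicon[gram] < 0:
                    neg += 1
                step = n  # skip past the matched n-gram
                break
        i += step
    return {"pos": pos, "neg": neg, "score": pos - neg}
```

The returned counts can serve as a simple polarity score on their own or be appended to the input features of a learned model, which is the kind of use the abstract points to.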