• Title/Summary/Keyword: flow mining

Predicting stock movements based on financial news with systematic group identification (시스템적인 군집 확인과 뉴스를 이용한 주가 예측)

  • Seong, NohYoon;Nam, Kihwan
    • Journal of Intelligence and Information Systems / v.25 no.3 / pp.1-17 / 2019
  • Because stock price forecasting matters both academically and practically, it has been studied actively. Forecasting studies fall into two groups: those using structured data and those using unstructured data. With structured data such as historical stock prices and financial statements, past studies usually relied on technical analysis and fundamental analysis. In the big data era, the amount of information has increased rapidly, and artificial intelligence methods that extract meaning by quantifying text, an unstructured form of data that accounts for a large share of that information, have developed quickly. As a result, many studies now apply text mining to online news in order to predict stock prices. Most of these papers forecast a stock price using only news about the target company. However, previous research shows that a company's stock price is affected not only by its own news but also by news about related companies. Finding highly relevant companies is not easy, though, because of market-wide effects and random signals. Existing studies have therefore identified relevant companies mainly through predetermined international industry classification standards. According to recent research, however, homogeneity within the sectors of the Global Industry Classification Standard varies, so forecasting with all companies in a sector taken together, rather than only the relevant ones, can hurt predictive performance. To overcome this limitation, we are the first to use random matrix theory together with text mining for stock prediction. When the dimension of the data is large, the classical limit theorems are no longer suitable because statistical efficiency is reduced.
Therefore, a simple correlation analysis in the financial market does not reveal the true correlation. To address this, we adopt random matrix theory, which is mainly used in econophysics, to remove market-wide effects and random signals and find the true correlation between companies. Using the true correlation, we perform cluster analysis to find relevant companies. Based on the clusters, we then use a multiple kernel learning algorithm, an ensemble of support vector machines, to incorporate the effects of the target firm and its relevant firms simultaneously. Each kernel was assigned to predict stock prices with features of financial news of the target firm and its relevant firms. The results of this study are as follows. (1) Following the existing research flow, we confirmed that using news from relevant companies is an effective way to forecast stock prices. (2) Identifying relevant companies in the wrong way can lower AI prediction performance. (3) The proposed approach with random matrix theory performs better than previous studies when cluster analysis is based on the true correlation obtained by removing market-wide effects and random signals. The contributions of this study are as follows. First, this study shows that random matrix theory, which is used mainly in econophysics, can be combined with artificial intelligence to produce good methodologies. This suggests that it is important not only to develop AI algorithms but also to adopt theory from physics. It extends existing research that integrated artificial intelligence with complex system theory through transfer entropy. Second, this study stresses that finding the right companies in the stock market is an important issue. This suggests that it is important to study not only artificial intelligence algorithms but also how to adjust the input values on theoretical grounds. Third, we confirmed that firms classified in the same Global Industry Classification Standard (GICS) sector might have low relevance, and suggested that relevance should be defined theoretically rather than simply taken from the GICS.
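
The abstract does not spell out how random matrix theory is used to remove market-wide effects and random signals. A common recipe in the econophysics literature, assumed here rather than taken from the paper, is to discard correlation eigenvalues below the Marchenko-Pastur upper bound as noise and to drop the largest eigenvalue as the market mode. The function names and the `(T, N)` layout of `returns` are illustrative choices, and NumPy is assumed to be available.

```python
import numpy as np

def marchenko_pastur_upper(n_assets, n_obs):
    """Largest eigenvalue expected from a purely random correlation matrix."""
    q = n_assets / n_obs
    return (1.0 + np.sqrt(q)) ** 2

def filter_correlation(returns):
    """Rebuild the correlation matrix from its non-noise, non-market modes.

    returns: array of shape (T, N), T observations of N stocks (assumed layout).
    """
    n_obs, n_assets = returns.shape
    corr = np.corrcoef(returns, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)      # eigenvalues in ascending order
    lam_max = marchenko_pastur_upper(n_assets, n_obs)
    keep = eigvals > lam_max                     # modes above the noise band
    keep[-1] = False                             # assumption: largest mode = market-wide effect
    vecs = eigvecs[:, keep]
    filtered = (vecs * eigvals[keep]) @ vecs.T   # sum of lambda_i * v_i v_i^T over kept modes
    return filtered, eigvals, lam_max
```

The filtered matrix could then be passed to any clustering routine to group related companies; which clustering method the authors used is not stated in the abstract.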

Effects of Geological Conditions on the Geomorphological Development of the Southwestern Coastal Regions of Korea (서남해안지역(西南海岸地域)의 지형발달(地形發達)에 미친 지질조건(地質條件))

  • Kim, Suh Woon
    • Economic and Environmental Geology / v.4 no.1 / pp.11-18 / 1971
  • The geotectonics and geomorphic structure of Korea resulted from the Song-rim Disturbance and the Daebo orogenic movements. Afterward, this mountainous peninsula underwent several small-scale geological changes, and it has been claimed that the steady rising of the elevated peneplain of the eastern coast and the submergence of the southwestern coastal area are largely due to tilted block movement. These views have been generally accepted in several respects, but they are limited in range or lack theoretical integration. The present writer investigated the geology of Mt. Chi-ri-san and the Honam coal mining area for a geological map in 1965, respectively. The results of these studies convinced the present writer that the conventional views, which were based on a theory of lateral pressure, should be reconsidered in many respects, and more recent studies made it clear that the morphological development of the southwestern area can be better explained by orogenic movement and rock control. The measurement of the submergence rate of the western coastal area (Pak, Y. A., 1969) and a new account of the geology and tectonics of the mid-central region of South Korea (Kim, O. J., 1970) encourage a new explanation. The present writer's research on the extreme southwestern portion of the peninsula shows that the steady submergence of this area cannot be attributed to a simple downthrown-block phenomenon caused by block movement. It is no more than the result of differential uplift between the eastern and western coastal areas, together with the post-glacial rise in sea level. This can be readily explained by comparing the rate of sea-level rise and the amount of heat flow in Korea with those of other areas of the world. The existence of erosional planes in the Sobaik-San ranges also provides evidence of upheaval in the western coastal area.
Although the Sobaik-San ranges largely follow the direction of the Sinian system, they consist of numerous branches whose trends differ more or less from the main trend because of disharmonic folding and which converge on Mt. Sobaik-San and Chupungryung. The undulation of the land is not wholly caused by orogenic movements; rather, the present writer confirmed that the diversity of morphological development directly reflects geological conditions such as the rocks and processes that constitute the basic elements of geomorphic structure. An east-west mountain range, which could be named the Hansan mountain range, is claimed to be oriented by joint control. Geological conditions such as the distinctive erosion and weathering of agglomerate and breccia tuff usually produce pothole-like submarine features, which cause the whirling phenomenon in the southwestern coastal channel.

Finding the time sensitive frequent itemsets based on data mining technique in data streams (데이터 스트림에서 데이터 마이닝 기법 기반의 시간을 고려한 상대적인 빈발항목 탐색)

  • Park, Tae-Su;Chun, Seok-Ju;Lee, Ju-Hong;Kang, Yun-Hee;Choi, Bum-Ghi
    • Journal of The Korean Association of Information Education / v.9 no.3 / pp.453-462 / 2005
  • Recently, owing to technical improvements in storage devices and networks, the amount of data has been increasing rapidly. In addition, knowledge embedded in a data stream must be found as fast as possible. A data stream continuously generates huge amounts of data that change quickly. Various algorithms for finding frequent itemsets in a data stream have been actively proposed. However, current research offers no appropriate method for finding frequent itemsets that reflect the flow of time; it provides only frequent items based on total aggregate values. In this paper, we propose a novel algorithm for finding relative frequent itemsets according to time in a data stream. We also propose a method that stores frequent and sub-frequent items to account for limited memory, and a method for updating time-variant frequent items. The performance of the proposed method is analyzed through a series of experiments. The proposed method can find both frequent itemsets and relative frequent itemsets using only the action patterns of students at each time slot. Thus, our method can enhance the effectiveness of learning and support the best plan for individual learning.
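
The abstract gives no algorithmic detail, so the following is only a minimal sketch of one way to make frequent-itemset counts sensitive to time: older counts are multiplied by a decay factor at each new time slot, and itemsets whose decayed support drops below a lower threshold are discarded to bound memory. The decay scheme, the function names, and the parameters `decay`, `max_size`, and `keep_support` are assumptions for illustration, not the authors' method.

```python
from collections import Counter
from itertools import combinations

def fold_slot(counts, total, transactions, decay=0.5, max_size=2, keep_support=0.1):
    """Fold one time slot of transactions into exponentially decayed counts."""
    new_total = total * decay + len(transactions)
    if new_total == 0:
        return Counter(), 0.0
    # Age the existing counts, then add the new slot at full weight.
    new_counts = Counter({k: v * decay for k, v in counts.items()})
    for transaction in transactions:
        items = sorted(set(transaction))
        for size in range(1, max_size + 1):
            for itemset in combinations(items, size):
                new_counts[itemset] += 1.0
    # Keep frequent and "sub-frequent" itemsets only, to limit memory use.
    kept = Counter({k: v for k, v in new_counts.items()
                    if v / new_total >= keep_support})
    return kept, new_total

def frequent_itemsets(counts, total, min_support):
    """Return itemsets whose decayed relative support reaches min_support."""
    return {k: v / total for k, v in counts.items() if v / total >= min_support}
```

Starting from `Counter()` and `0.0`, calling `fold_slot` once per time slot keeps recent behavior weighted more heavily than old behavior, which a plain running total cannot do.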

A Study on the Soil Contamination(Maps) Using the Handheld XRF and GIS in Abandoned Mining Areas (휴대용 XRF와 GIS를 이용한 폐광산 지역의 토양오염에 관한 연구)

  • Lee, Hyeon-Gyu;Choi, Yo-Soon
    • Journal of the Korean Association of Geographic Information Studies / v.17 no.3 / pp.195-206 / 2014
  • In this study, soil contamination maps for Cu and Pb were created at the Busan abandoned mine in Korea using a handheld X-ray fluorescence (XRF) analyzer and Geographic Information Systems (GIS). Hydrological analysis was performed using the Digital Elevation Model (DEM) of the study area to identify the flow directions of surface runoff along which pollutants can disperse from the soil contamination sources. Twenty-four locations for measuring Cu and Pb contamination were selected based on the result of the hydrological analysis. The handheld XRF measurements at the 24 locations showed that the highest Cu concentration was 8,255 ppm and the highest Pb concentration was 2,146 ppm. The field investigation data were entered into ArcGIS software, and soil contamination maps for Cu and Pb with a 5 m grid spacing were created by spatial interpolation using the ordinary kriging method. The maps showed that high concentrations of Cu and Pb occur at the waste and tailings dumps around the abandoned mine openings. This study also showed that handheld XRF and GIS can be used to create soil contamination maps for Cu and Pb in the field.

Design and Analysis of Efficient Operation Sequencing in FMC Robot Using Simulation and Sequential Patterns (시뮬레이션과 순차 패턴을 이용한 FMC 로봇의 효율적 작업 순서 설계 및 분석)

  • Kim, Sun-Gil;Kim, Youn-Jin;Lee, Hong-Chul
    • Journal of the Korea Academia-Industrial cooperation Society / v.11 no.6 / pp.2021-2029 / 2010
  • This paper proposes a method for designing and analyzing the dispatching rule of an FMC robot using simulation and sequential patterns. First, we built a simulation model of the FMC, extracted the signals with which facilities call the robot, and saved them as a log. Second, we derived the robot's optimal path using sequential pattern mining, based on the analysis of the log and the relationship between machine and robot actions. Finally, we applied the method to the manufacturing line of A Corp. to verify its performance. With the new dispatching rule applied in the FMC, total throughput and total flow time decreased because material loss time decreased and robot utilization increased. Furthermore, because the method can be applied to any manufacturing plant by means of simulation, it can also help improve overall FMC efficiency.
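
To illustrate the log-mining step, the sketch below counts how many logged call sequences contain each contiguous run of machine calls. This is a deliberately simplified stand-in for sequential pattern mining, which normally also allows gaps between events; the function name and the log layout (one list of machine identifiers per simulation run) are hypothetical.

```python
from collections import Counter

def frequent_sequences(logs, length, min_support):
    """Find contiguous call sequences that occur in at least min_support logs."""
    support = Counter()
    for log in logs:
        # Use a set so a pattern is counted at most once per log.
        seen = {tuple(log[i:i + length]) for i in range(len(log) - length + 1)}
        support.update(seen)
    return {pattern: n for pattern, n in support.items() if n >= min_support}
```

Patterns with high support indicate call orders that recur across runs, which is the kind of regularity a dispatching rule could exploit.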

Relationship between Diurnal Patterns of Transit Ridership and Land Use in the Metropolitan Seoul Area (서울 대도시권 하루 시간대별 지하철 통행흐름 패턴과 토지이용과의 관계)

  • Lee, Keum-Sook;Song, Ye-Na;Park, Jong-Soo;Anderson, William P.
    • Journal of the Economic Geographical Society of Korea / v.15 no.1 / pp.26-41 / 2012
  • This study investigates the time-space characteristics of intra-urban passenger flows in the Metropolitan Seoul area. In particular, we analyze the relationships between transit ridership and land use through the use of the subway passenger flow data obtained from the transit transaction databases. For this purpose, the strength of each subway station, i.e., the number of total in-coming and out-going passengers at each station, in the morning, afternoon, and evening, is calculated and visualized, which reflects urban land use patterns. Then the subway stations are classified into four groups via a hierarchical analysis of the in-coming and out-going passenger flows at 353 stations. Each group appears to have characteristic properties according to the region, e.g., residential areas and central business districts. This has been confirmed by the analysis which probes explicitly the relationship between the local socio-economic variables and station groups. This analysis, disclosing the inter-relationship between the subway network and urban land use, may be useful at various stages in urban as well as transportation planning, and provides analytical tools for a wide spectrum of applications ranging from impact evaluation to decision-making and planning support.

Analysis of Research Trends in Tax Compliance using Topic Modeling (토픽모델링을 활용한 조세순응 연구 동향 분석)

  • Kang, Min-Jo;Baek, Pyoung-Gu
    • The Journal of the Korea Contents Association / v.22 no.1 / pp.99-115 / 2022
  • In this study, domestic academic journal papers on tax compliance, tax consciousness, and faithful tax payment (hereinafter referred to as "tax compliance") were comprehensively analyzed from an interdisciplinary perspective, as a representative research topic in the field of tax science. To this end, topic modeling, a text mining technique, was applied. Following the flow of data collection, keyword preprocessing, and topic model analysis, potential research topics were derived from the tax compliance keywords registered by the researchers in a total of 347 papers. The results can be summarized as follows. First, in the keyword analysis, keywords such as tax investigation, tax avoidance, and honest tax reporting system ranked in the top 5 by simple term frequency and also in the top 5 by TF-IDF value, which considers the relative importance of keywords. By contrast, the keyword tax evasion ranked among the top keywords by TF-IDF value but was not prominent by simple term frequency. Second, eight potential research topics were derived through topic modeling: (1) tax fairness and suppression of tax offenses, (2) the ideology of the tax law and the validity of tax policies, (3) the principle of substance over form and guarantee of tax receivables, (4) tax compliance costs and tax administration services, (5) the tax return self-assessment system and tax experts, (6) tax climate and strategic tax behavior, (7) multifaceted tax behavior and differential compliance intentions, and (8) tax information systems and tax resource management. By examining the various perspectives on tax compliance from an interdisciplinary standpoint, the study captures past research trends on tax compliance and suggests directions for future research.
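
The contrast between simple term frequency and TF-IDF can be sketched as follows. The abstract does not say which TF-IDF variant was used, so this assumes the plain form, corpus-wide count times `log(N / df)`. The function name is illustrative, and any keyword lists passed in are the caller's own data, not the paper's.

```python
import math
from collections import Counter

def keyword_scores(docs):
    """Return (term frequency, TF-IDF) per keyword for a list of keyword lists."""
    n_docs = len(docs)
    tf = Counter()   # total occurrences across all papers
    df = Counter()   # number of papers containing the keyword
    for keywords in docs:
        tf.update(keywords)
        df.update(set(keywords))
    # Keywords that appear in every paper get idf = log(1) = 0.
    tfidf = {k: tf[k] * math.log(n_docs / df[k]) for k in tf}
    return tf, tfidf
```

Under this weighting, a keyword spread thinly over many papers is discounted, while one concentrated in a few papers can rank higher than its raw count suggests, which is the kind of divergence the abstract reports for "tax evasion".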

Experimental Design of Column Flotation for Recovery of High Grade Molybdenite (고품위 몰리브덴 회수를 위한 컬럼부선 요인설계)

  • Hyun Soo Kim;Purev Oyunbileg;Chul-Hyun Park
    • Resources Recycling / v.32 no.6 / pp.34-44 / 2023
  • In this work, column flotation with a factorial design was performed to recover high-grade molybdenite concentrate. First, the flotation concentrate from Samyang Mining Plant was reground to mean sizes of 165, 116, 46.7, and 38.4 µm to increase the degree of liberation. Tests were carried out on the variables affecting column flotation, and concentrates with a molybdenite grade of 98.3% and a recovery of 95.28% were obtained. Regression was then performed with a statistical analysis program (SPSS 25), using the factorial design and the experimental data on particle size, wash-water flow velocity, and depressant, which affect the grade. From the results, a model equation was derived to predict the molybdenite grade (MG) and recovery (MR) from the column flotation variables. The combined factors depressant concentration + wash-water velocity and particle size + depressant concentration + wash-water velocity had p-values below the significance level (0.05) and thus significantly affected the dependent variable, grade, whereas in the recovery model only particle size and wash-water velocity affected the dependent variable, recovery.

Seasonal Variation and Natural Attenuation of Trace Elements in the Stream Water Affected by Mine Drainage from the Abandoned Indae Mine Areas (인대광산 지역 광산배수에 영향을 받은 하천에서 미량원소의 계절적인 수질변화와 자연저감)

  • Kang, Min-Ju;Lee, Pyeong-Koo;Choi, Sang-Hoon
    • Economic and Environmental Geology / v.40 no.3 s.184 / pp.277-293 / 2007
  • Seasonal and spatial variations in the concentrations of trace elements, pH, and Eh were found in a creek watershed affected by mine drainage and leachate from several waste rock dumps within the As-Pb-rich Indae mine site. Because of mining activity dating back about 40 years and the rupture of the waste rock dumps, this creek was heavily contaminated. Owing to the influx of leachate and mine drainage, the water quality of the upstream reach of this creek was characterized by the largest seasonal and spatial variations in the concentrations of Zn (up to $5.830 mg/{\ell}$), Cu (up to $1.333 mg/{\ell}$), Cd (up to $0.031 mg/{\ell}$), and $SO_4^{2-}$ (up to $173 mg/{\ell}$), by relatively acidic pH values (3.8-5.1), and by highly oxidized conditions. The most abundant metals in the leachate samples were, in order, Zn ($0.045-13.909 mg/{\ell}$), Fe ($0.017-8.730 mg/{\ell}$), Cu ($0.010-4.154 mg/{\ell}$), and Cd ($n.d.-0.077 mg/{\ell}$), with low pH (3.1-6.1) and high $SO_4^{2-}$ (up to $310 mg/{\ell}$). The mine drainage also contained high concentrations of Zn, Cu, Cd, and $SO_4^{2-}$ and remained at near-neutral pH values (6.5-7.0) throughout the year. While the leachate and mine drainage might not affect short-term fluctuations in flow, they may significantly influence the concentrations of chemicals in the stream. The abundance and chemistry of Fe-(oxy)hydroxide within this creek indicated that Fe-(oxy)hydroxide formation could be responsible for some removal of trace elements from the creek waters. Spatial and seasonal variations along the downstream reach of this creek were caused largely by the influx of water from uncontaminated tributaries. In addition, after hydrologic mixing with a tributary, the trace metal concentrations in this creek decreased nearly to the background level within a short distance of the discharge points without any artificial treatment. The nonconservative (i.e., precipitation, adsorption, oxidation, dissolution, etc.) and conservative (hydrologic mixing) reactions constituted an efficient mechanism of natural attenuation, which considerably reduces the transfer of trace elements to rivers.
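
The conservative (hydrologic mixing) part of this attenuation can be written as a simple mass balance. The relation below is the textbook two-stream mixing equation, not one quoted from the paper, and the symbols are introduced here for illustration: $Q_s$ and $C_s$ are the discharge and concentration of the contaminated stream, and $Q_t$ and $C_t$ are those of the uncontaminated tributary.

$$C_{mix} = \frac{Q_s C_s + Q_t C_t}{Q_s + Q_t}$$

If the concentration measured below the confluence is lower than $C_{mix}$, the difference points to nonconservative removal such as precipitation or adsorption.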

Bankruptcy prediction using an improved bagging ensemble (개선된 배깅 앙상블을 활용한 기업부도예측)

  • Min, Sung-Hwan
    • Journal of Intelligence and Information Systems / v.20 no.4 / pp.121-139 / 2014
  • Predicting corporate failure has long been an important topic in accounting and finance. Because the costs associated with bankruptcy are high, the accuracy of bankruptcy prediction is very important for financial institutions. Many researchers have studied bankruptcy prediction over the past three decades. The current research uses ensemble models to improve the performance of bankruptcy prediction. Ensemble classification combines individually trained classifiers to obtain more accurate predictions than individual models. Ensemble techniques have been shown to be very useful for improving the generalization ability of a classifier. Bagging is one of the most commonly used methods for constructing ensemble classifiers. In bagging, different training data subsets are randomly drawn with replacement from the original training dataset, and base classifiers are trained on the resulting bootstrap samples. Instance selection chooses critical instances while removing irrelevant and harmful instances from the original set. Instance selection and bagging are both well known in data mining. However, few studies have dealt with the integration of the two. This study proposes an improved bagging ensemble based on instance selection using genetic algorithms (GA) to improve the performance of SVM. GA is an efficient optimization procedure based on the theory of natural selection and evolution. GA uses the idea of survival of the fittest by progressively accepting better solutions to the problem. GA searches by maintaining a population of solutions from which better solutions are created, rather than by making incremental changes to a single solution. The initial solution population is generated randomly and evolves into the next generation through genetic operators such as selection, crossover, and mutation. The solutions, coded as strings, are evaluated by the fitness function.
The proposed model consists of two phases: GA-based instance selection and instance-based bagging. In the first phase, GA is used to select the optimal instance subset, which is used as input data for the bagging model. In this study, the chromosome is encoded as a binary string representing the instance subset. In this phase, the population size was set to 100 and the maximum number of generations to 150. We set the crossover rate and mutation rate to 0.7 and 0.1, respectively. We used the prediction accuracy of the model as the fitness function of the GA. The SVM model is trained on the training data set using the selected instance subset. The prediction accuracy of the SVM model on the test data set is used as the fitness value in order to avoid overfitting. In the second phase, we used the optimal instance subset selected in the first phase as input data for the bagging model. We used the SVM model as the base classifier of the bagging ensemble, and majority voting was used as the combining method. This study applies the proposed model to the bankruptcy prediction problem using a real data set from Korean companies. The research data contain 1832 externally non-audited firms, comprising firms that filed for bankruptcy (916 cases) and non-bankrupt firms (916 cases). Financial ratios categorized as stability, profitability, growth, activity, and cash flow were investigated through a literature review and basic statistical methods, and we selected 8 financial ratios as the final input variables. We separated the whole data set into three subsets: training, test, and validation data sets. We compared the proposed model with several comparative models, including the simple individual SVM model, the simple bagging model, and the instance selection based SVM model. McNemar tests were used to examine whether the proposed model significantly outperforms the other models. The experimental results show that the proposed model outperforms the other models.
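
The bagging phase described above (bootstrap samples, one base classifier per sample, majority voting) can be sketched in a few lines. To keep the example free of dependencies, a nearest-centroid classifier stands in for the SVM base learner used in the paper, and the GA-based instance selection phase is omitted; all function names, `n_models`, and `seed` are illustrative assumptions.

```python
import random
from collections import Counter

def train_centroid(samples):
    """Stand-in base learner (NOT an SVM): mean feature vector per class."""
    sums, counts = {}, Counter()
    for features, label in samples:
        counts[label] += 1
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def predict_centroid(model, features):
    """Predict the class whose centroid is closest in squared distance."""
    return min(model, key=lambda label: sum((a - b) ** 2
                                            for a, b in zip(model[label], features)))

def bagging_fit(samples, n_models=11, seed=0):
    """Train one base learner per bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        bootstrap = [rng.choice(samples) for _ in samples]
        models.append(train_centroid(bootstrap))
    return models

def bagging_predict(models, features):
    """Combine the base learners by majority vote."""
    votes = Counter(predict_centroid(m, features) for m in models)
    return votes.most_common(1)[0][0]
```

In the paper's setting, `samples` would be the GA-selected instance subset and each base learner an SVM; an odd `n_models` avoids tied votes in a two-class problem.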