• Title/Summary/Keyword: Time Series Data Analysis


A Study on the Application of Outlier Analysis for Fraud Detection: Focused on Transactions of Auction Exception Agricultural Products (부정 탐지를 위한 이상치 분석 활용방안 연구 : 농수산 상장예외품목 거래를 대상으로)

  • Kim, Dongsung;Kim, Kitae;Kim, Jongwoo;Park, Steve
    • Journal of Intelligence and Information Systems / v.20 no.3 / pp.93-108 / 2014
  • Interest in analyzing transaction data from multiple perspectives to support business decision making is increasing. Such efforts are not limited to customer management or marketing; they also extend to monitoring and detecting fraudulent transactions. Fraudulent transactions are evolving into varied patterns by taking advantage of information technology, and much work on fraud detection methods and advanced application systems aims to improve the accuracy and ease of detection. As a case study, this work aims to provide effective fraud detection methods for auction exception agricultural products in the largest Korean agricultural wholesale market. The auction exception products policy exists to complement auction-based trade in agricultural wholesale markets. That is, most agricultural products are traded by auction; however, specific products are designated as auction exception products when their total volume is relatively small, the number of wholesalers is small, or wholesalers have difficulty purchasing the products. This policy, however, raises several problems concerning the fairness and transparency of transactions, which call for fraud detection. In this study, to generate fraud detection rules, we analyze real, large-scale agricultural product transaction data from the market for 2008 to 2010, amounting to more than 1 million transactions and 1 billion US dollars in transaction volume. Agricultural transaction data have unique characteristics such as frequent changes in supply volume and turbulent, time-dependent changes in price. Since this was the first attempt to identify fraudulent transactions in this domain, no training data set was available for supervised learning, so fraud detection rules are generated using an outlier detection approach.
We assume that outlier transactions are more likely to be fraudulent than normal transactions. Outlier transactions are identified by comparison with the daily, weekly, and quarterly average unit prices of product items. The quarterly average unit prices of product items for specific wholesalers are also used to identify outlier transactions. The reliability of the generated fraud detection rules was confirmed by domain experts. To determine whether a transaction is fraudulent, the normal distribution and the normalized Z-value concept are applied. That is, the unit price of a transaction is transformed into a Z-value to calculate its occurrence probability under the approximation that unit prices are normally distributed. A modified Z-value of the unit price is used rather than the original Z-value, because for auction exception agricultural products the number of wholesalers is small, so Z-values are influenced by the outlier fraudulent transactions themselves. The modified Z-values are called Self-Eliminated Z-scores because they are calculated excluding the unit price of the specific transaction being checked for fraud. To show the usefulness of the proposed approach, a prototype fraud transaction detection system was developed using Delphi. The system consists of five main menus and related submenus. The first function of the system is to import transaction databases. The next important functions set up the fraud detection parameters; by changing these parameters, system users can control the number of potential fraud transactions. Execution functions provide the fraud detection results found under the chosen parameters. The potential fraud transactions can be viewed on screen or exported as files. This study is an initial attempt to identify fraudulent transactions in auction exception agricultural products.
Many research topics remain. First, the scope of the analyzed data was limited by data availability; more data on transactions, wholesalers, and producers are needed to detect fraudulent transactions more accurately. Next, the scope of fraud transaction detection needs to be extended to fishery products. Different data mining techniques could also be applied to fraud detection; for example, a time series approach is a potential technique for this problem. Finally, although outlier transactions are detected here based on the unit prices of transactions, it is also possible to derive fraud detection rules based on transaction volumes.
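The Self-Eliminated Z-score described in this abstract can be illustrated with a short sketch. This is only one reading of the abstract, not the authors' Delphi implementation: the function names, the sample standard deviation, the threshold of 3.0, and the sample prices are all assumptions made for illustration.

```python
import math
from statistics import mean, stdev


def self_eliminated_z(prices, index):
    """Z-score of prices[index] against the mean and sample standard
    deviation of all *other* prices in the same comparison group."""
    others = prices[:index] + prices[index + 1:]
    if len(others) < 2:
        raise ValueError("need at least two other prices in the group")
    mu, sigma = mean(others), stdev(others)
    diff = prices[index] - mu
    if sigma == 0:
        # All other prices are identical: any deviation is unbounded.
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / sigma


def ordinary_z(prices, index):
    """Conventional z-score, where the checked price also shapes the
    mean and the spread it is compared against."""
    return (prices[index] - mean(prices)) / stdev(prices)


def flag_outliers(prices, threshold=3.0):
    """Indices whose |self-eliminated z| exceeds the threshold
    (the default of 3.0 is a hypothetical detection parameter)."""
    return [i for i in range(len(prices))
            if abs(self_eliminated_z(prices, i)) > threshold]


# Hypothetical unit prices for one product item in one period.
unit_prices = [10.0, 10.0, 11.0, 9.0, 10.0, 30.0]
print(ordinary_z(unit_prices, 5))         # about 2.0: the outlier masks itself
print(self_eliminated_z(unit_prices, 5))  # about 28.3: the outlier stands out
print(flag_outliers(unit_prices))         # [5]
```

With only a handful of prices in a group, the suspicious price inflates both the mean and the standard deviation of the ordinary Z-value, which is the masking effect the abstract attributes to the small number of wholesalers.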

Predicting the Direction of the Stock Index by Using a Domain-Specific Sentiment Dictionary (주가지수 방향성 예측을 위한 주제지향 감성사전 구축 방안)

  • Yu, Eunji;Kim, Yoosin;Kim, Namgyu;Jeong, Seung Ryul
    • Journal of Intelligence and Information Systems / v.19 no.1 / pp.95-110 / 2013
  • Recently, the amount of unstructured data generated through social media has been increasing rapidly, resulting in a growing need to collect, store, search, analyze, and visualize such data. Because of its vast volume and unstructured nature, this kind of data cannot be handled appropriately by the traditional methodologies used for analyzing structured data. Consequently, many attempts are being made to analyze unstructured data such as text files and log files with various commercial and noncommercial analytical tools. Among the contemporary issues in the literature on unstructured text analysis, the concepts and techniques of opinion mining have attracted much attention from pioneering researchers and business practitioners. Opinion mining, or sentiment analysis, refers to a series of processes that analyze participants' opinions, sentiments, evaluations, attitudes, and emotions about selected products, services, organizations, social issues, and so on. In other words, opinion mining techniques are being applied to complicated issues that could not be resolved by traditional approaches. One of the most representative attempts is recent research proposing an intelligent model for predicting the direction of the stock index. This model works mainly on the basis of opinions extracted from an overwhelming number of economic news reports. News content published in various media is a typical example of unstructured text data. Every day, a large volume of new content is created, digitized, and distributed via online or offline channels. Many studies have shown that we make better decisions on political, economic, and social issues by analyzing news and other related information.
In this sense, we expect to predict the fluctuation of stock markets partly by analyzing the relationship between economic news reports and the pattern of stock prices. So far, most studies in the opinion mining literature, including ours, have used a sentiment dictionary to elicit sentiment polarity or sentiment value from a large number of documents. A sentiment dictionary consists of pairs of selected words and their sentiment values. Sentiment classifiers refer to the dictionary to determine the sentiment polarity of words, of sentences in a document, and of the whole document. However, most traditional approaches share a limitation: they do not consider the flexibility of sentiment polarity. That is, the sentiment polarity or sentiment value of a word is fixed and cannot be changed in a traditional sentiment dictionary. In the real world, however, the sentiment polarity of a word can vary depending on the time, situation, and purpose of the analysis, and can even be contradictory. This flexibility of sentiment polarity motivated the present study. We argue that sentiment polarity should be assigned not merely on the basis of the inherent meaning of a word but on the basis of its ad hoc meaning within a particular context. To implement this idea, we present an intelligent investment decision-support model based on opinion mining that scrapes and parses massive volumes of economic news on the web, tags sentiment words, classifies the sentiment polarity of the news, and finally predicts the direction of the next day's stock index. In addition, we apply a domain-specific sentiment dictionary instead of a general-purpose one to classify each piece of news as either positive or negative. For performance evaluation, we conducted intensive experiments and investigated the prediction accuracy of our model.
For the experiments to predict the direction of the stock index, we gathered and analyzed 1,072 articles about stock markets published by "M" and "E" media between July 2011 and September 2011.
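The pipeline in this abstract (tag sentiment words, classify each article, predict the next day's direction) can be sketched in a few lines. Everything specific here is an assumption for illustration: the dictionary entries and values, the whitespace tokenizer, the extra "neutral" label for articles with no net sentiment, and the majority-share rule for the index direction. The abstract does not give the authors' actual dictionary or decision rule.

```python
# Hypothetical domain-specific dictionary: word -> sentiment value.
DOMAIN_DICTIONARY = {
    "surge": 1.0, "rally": 1.0, "rebound": 0.5,
    "plunge": -1.0, "default": -1.0, "slump": -0.5,
}


def score_article(text, dictionary):
    """Sum the sentiment values of all dictionary words in the text."""
    tokens = (tok.strip(".,!?\"'") for tok in text.lower().split())
    return sum(dictionary.get(tok, 0.0) for tok in tokens)


def classify_article(text, dictionary):
    """Label an article by the sign of its total sentiment score."""
    score = score_article(text, dictionary)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"  # added for illustration; not part of the abstract


def predict_direction(articles, dictionary, threshold=0.5):
    """Predict 'up' when the share of positive articles among the
    non-neutral ones exceeds the threshold, otherwise 'down'."""
    labels = [classify_article(a, dictionary) for a in articles]
    decided = [label for label in labels if label != "neutral"]
    if not decided:
        return None  # no sentiment evidence for this day
    positive_share = decided.count("positive") / len(decided)
    return "up" if positive_share > threshold else "down"


todays_news = [
    "Stocks rally as exports surge.",
    "Builder faces default after slump",
    "Markets rebound",
]
print(predict_direction(todays_news, DOMAIN_DICTIONARY))  # up
```

Swapping `DOMAIN_DICTIONARY` for a general-purpose dictionary changes only the lookup table, which is why the choice of dictionary can be evaluated independently of the rest of the pipeline.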

Wave Analysis and Spectrum Estimation for the Optimal Design of the Wave Energy Converter in the Hupo Coastal Sea (파력발전장치 설계를 위한후포 연안의 파랑 분석 및 스펙트럼 추정)

  • Kweon, Hyuck-Min;Cho, Hongyeon;Jeong, Weon-Mu
    • Journal of Korean Society of Coastal and Ocean Engineers / v.25 no.3 / pp.147-153 / 2013
  • Various types of WEC (Wave Energy Converter) exist, and among them the point absorber is the most widely investigated. However, it is difficult to find examples anywhere in the world of systematic analysis of measured data for the design of a point absorber type power buoy. This study investigates the wave load acting on the point absorber type resonance power buoy wave energy extraction system proposed by Kweon et al. (2010). It analyzes the time series spectra of three years of wave data (2002.05.01~2005.03.29) measured with a pressure type wave gage on the seaward side of the north breakwater of Hupo harbor, located on the east coast of the Korean peninsula. The analysis shows that monthly variations in wave period and wave height are apparent and that monthly wave power is unevenly distributed over the year. The average wave steepness of the usual wave was 0.01, lower than the wind wave range of 0.02-0.04. The mode of the average wave period is 5.31 sec, while the mode of the wave height for the applicable period is 0.29 m. The occurrence probability of the peak period is bi-modal, with mode values of 4.47 sec and 6.78 sec. The design wave period can be selected from the above four values of 0.01, 5.31, 4.47, 6.78. About 95% of measured wave heights are below 1 m. This study found that a resonance power buoy system is necessary in coastal areas with low wave energy and that an optimal design for overcoming the uneven monthly distribution of wave power is a major task in the development of a WEF (Wave Energy Farm). Because the average spectrum of the usual wave could not be expressed with the standard spectrum equation, this study proposes a new spectrum equation with three parameters, which can supply basic data for predicting the power production of a wave power buoy and for the fatigue analysis of the system.
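As a rough aid to reading the steepness and wave power figures above, the following sketch applies standard linear wave theory for deep water. The deep-water assumption, the seawater density, and the use of the abstract's mode values as inputs are assumptions for illustration only; the study's own spectral analysis and its three-parameter spectrum are not reproduced here, and the measurement site may not satisfy deep-water conditions.

```python
import math

G = 9.81       # gravitational acceleration, m/s^2
RHO = 1025.0   # typical seawater density, kg/m^3 (assumed)


def deep_water_wavelength(period_s):
    """Wavelength from the deep-water dispersion relation L = g T^2 / (2 pi)."""
    return G * period_s ** 2 / (2.0 * math.pi)


def wave_steepness(height_m, period_s):
    """Steepness H / L using the deep-water wavelength."""
    return height_m / deep_water_wavelength(period_s)


def wave_power_per_metre(height_m, period_s):
    """Energy flux of a regular deep-water wave per metre of crest, in W/m:
    P = rho g^2 H^2 T / (32 pi)."""
    return RHO * G ** 2 * height_m ** 2 * period_s / (32.0 * math.pi)


# Mode values quoted in the abstract, used here only as sample inputs.
height, period = 0.29, 5.31
print(f"wavelength: {deep_water_wavelength(period):.1f} m")
print(f"steepness:  {wave_steepness(height, period):.4f}")
print(f"power:      {wave_power_per_metre(height, period):.0f} W/m")
```

Because power scales with the square of wave height and only linearly with period, a site where about 95% of wave heights are below 1 m yields little energy per wave, which is consistent with the abstract's case for a resonance-type buoy.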

A Study on Retrieval of Storage Heat Flux in Urban Area (우리나라 도심지에서의 저장열 산출에 관한 연구)

  • Lee, Darae;Kim, Honghee;Lee, Sang-Hyun;Lee, Doo-Il;Hong, Jinkyu;Hong, Je-Woo;Lee, Keunmin;Lee, Kyeong-sang;Seo, Minji;Han, Kyung-Soo
    • Korean Journal of Remote Sensing / v.34 no.2_1 / pp.301-306 / 2018
  • Urbanization causes urban floods and urban heat islands in summer, so it is necessary to understand changes in the thermal environment through urban climate and energy balance. In urban areas, unlike the typical energy balance, the storage heat flux stored in buildings and artificial land cover must be considered. Because the environment of each city differs, it is difficult to apply the storage heat flux retrieval methods of previous research. In particular, most previous studies focus on cities outside Korea, so a storage heat retrieval suited to the varied land cover and building characteristics of Korean urban areas is needed. The objective of this study is therefore to derive a regression formula that can quantitatively retrieve storage heat using data from an area where various surface types exist. To this end, nonlinear regression analysis was performed with net radiation and surface temperature data as independent variables and flux tower based storage heat estimates as the dependent variable. The retrieved regression coefficients were applied to each independent variable to derive the storage heat retrieval regression formula. In a time series comparison with the flux tower based storage heat estimates, the formula simulated the high daytime peak and the nighttime values well. Moreover, the storage heat retrieved in this study could be obtained more continuously than the flux tower based estimates. In the scatter plot analysis, the accuracy of the retrieved storage heat was found to be $50.14Wm^{-2}$, with a bias of $-0.94Wm^{-2}$.
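The abstract reports an error magnitude and a bias but does not name the error metric. Assuming the first figure is a root-mean-square error (RMSE), the two statistics can be computed as in the sketch below. The sample values are invented for illustration and are not data from the study.

```python
import math


def bias(estimates, references):
    """Mean of (estimate - reference); negative means underestimation."""
    diffs = [e - r for e, r in zip(estimates, references)]
    return sum(diffs) / len(diffs)


def rmse(estimates, references):
    """Root-mean-square error between estimates and references."""
    squared = [(e - r) ** 2 for e, r in zip(estimates, references)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical storage heat values in W m^-2 (retrieved vs. flux tower based).
retrieved = [110.0, 95.0, -20.0, 60.0]
tower = [100.0, 100.0, -30.0, 50.0]
print(bias(retrieved, tower))  # 6.25
print(rmse(retrieved, tower))  # about 9.01
```

A bias near zero alongside a much larger RMSE, as in the abstract, indicates errors that largely cancel on average while individual estimates still scatter widely.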

A Comparative Study on Failure Prediction Models for Small and Medium Manufacturing Company (중소제조기업의 부실예측모형 비교연구)

  • Hwangbo, Yun;Moon, Jong Geon
    • Asia-Pacific Journal of Business Venturing and Entrepreneurship / v.11 no.3 / pp.1-15 / 2016
  • This study analyzed the prediction capabilities of a multivariate model, a logistic regression model, and an artificial neural network model based on the financial information of small and medium-sized companies listed on KOSDAQ. The sample comprised 83 companies delisted from 2009 to 2012 and 83 normal companies, 166 firms in total. The models were trained on 100 companies, 50 delisted and 50 normal, chosen at random from the 166. The remaining 66 companies were used to verify the accuracy of the models. Each model was designed by carrying out T-tests on 79 financial ratios for the last 5 years and identifying 9 significant variables. The T-tests showed that financial profitability variables were the major variables for predicting financial risk at an early stage, while financial stability variables and financial cash flow variables were identified as additional significant variables at a later stage of insolvency. When the prediction capabilities of the models were compared, the logistic regression model exhibited the highest accuracy on the training data, while the artificial neural network model provided the most accurate results on the test data. This study differs from previous research as follows. First, it considers a time-series aspect, in light of the fact that failure proceeds gradually. Second, while previous studies constructed a multivariate discriminant model ignoring normality, this study reviewed the normality of the independent variables and performed comparisons with the other models. The policy implication of this study is that the reliability of disclosure documents is important, because according to this paper the symptoms of a firm's failure appear in its financial statements. Therefore, institutional arrangements for restraining moral laxity on the part of accounting firms or their employees should be strengthened.


On Securing Continuity of Long-Term Observational Eddy Flux Data: Field Intercomparison between Open- and Enclosed-Path Gas Analyzers (장기 관측 에디 플럭스 자료의 연속성 확보에 대하여: 개회로 및 봉폐회로 기체분석기의 야외 상호 비교)

  • Kang, Minseok;Kim, Joon;Yang, Hyunyoung;Lim, Jong-Hwan;Chun, Jung-Hwa;Moon, Minkyu
    • Korean Journal of Agricultural and Forest Meteorology / v.21 no.3 / pp.135-145 / 2019
  • Analyzing a long cycle or trend in time series data from long-term observation requires comparability between data observed in the past and in the present. In this study, we propose an approach to ensure compatibility among the instruments used for long-term observation, which secures the continuity of the data. An open-path gas analyzer (Model LI-7500, LI-COR, Inc., USA) has been used for eddy covariance flux measurement in the Gwangneung deciduous forest for more than 10 years. The open-path gas analyzer was replaced by an enclosed-path gas analyzer (Model EC155, Campbell Scientific, Inc., USA) in July 2015. Before the gas analyzer was completely replaced, the carbon dioxide ($CO_2$) and latent heat fluxes were collected using both gas analyzers simultaneously during a five-month period from August to December in 2015. The $CO_2$ fluxes were not significantly different between the gas analyzers when the daily mean temperature was higher than $0^{\circ}C$. However, when the daily mean temperature was lower than $0^{\circ}C$, the $CO_2$ flux measured by the open-path gas analyzer was negatively biased (from positive sign, i.e., carbon source, to 0 or negative sign, i.e., carbon neutral or sink) due to instrument surface heating. Despite applying the frequency response correction associated with tube attenuation of water vapor, the latent heat flux measured by the enclosed-path gas analyzer was on average 9% smaller than that measured by the open-path gas analyzer, which resulted in a >20% difference in the sums over the study period. These results indicate that an additional air density correction is needed to account for instrument heating, and that analysis of long-term observational flux data would be facilitated by understanding the tendency of an enclosed-path gas analyzer to underestimate latent heat flux.

A Topic Modeling-based Recommender System Considering Changes in User Preferences (고객 선호 변화를 고려한 토픽 모델링 기반 추천 시스템)

  • Kang, So Young;Kim, Jae Kyeong;Choi, Il Young;Kang, Chang Dong
    • Journal of Intelligence and Information Systems / v.26 no.2 / pp.43-56 / 2020
  • Recommender systems help users make the best choice among various options. They play an especially important role on internet sites, where countless pieces of digital information are generated every second. Many studies on recommender systems have focused on accurate recommendation. However, some problems must be overcome for a recommender system to be commercially successful. First, recommender systems lack transparency: users cannot know why products are recommended. Second, recommender systems cannot immediately reflect changes in user preferences: although a user's product preferences change over time, the system must rebuild its model to reflect them. Therefore, in this study, we propose a recommendation methodology that uses topic modeling and sequential association rule mining on review data to solve these problems. Product reviews provide useful information for recommendation because they include not only product ratings but also content such as user experiences and emotional states; reviews thus imply user preferences for products. Topic modeling is therefore useful for explaining why items are recommended to users, and sequential association rule mining is useful for identifying changes in user preferences. The proposed methodology is divided into two phases. The first phase creates a user profile based on topic modeling: after topics are extracted from user reviews of products, a user profile over topics is created. The second phase recommends products using sequential rules that appear in users' buying behaviors as time passes. The buying behaviors are derived from changes in each user's topics.
A collaborative filtering-based recommendation system was developed as a benchmark, and we compared the performance of the proposed methodology with that of the benchmark using Amazon's review dataset. Accuracy, recall, precision, and F1 were used as evaluation metrics. For topic modeling, collapsed Gibbs sampling was conducted, and 15 topics were extracted. Among the main topics, topic 1, topic 3, topic 4, topic 7, topic 9, topic 13, and topic 14 are related to "comedy shows", "high-teen drama series", "crime investigation drama", "horror theme", "British drama", "medical drama", and "science fiction drama", respectively. In the comparative analysis, the proposed methodology outperformed the collaborative filtering-based recommendation system. From the results, we found that the time just prior to the recommendation was very important for inferring changes in user preference. Therefore, the proposed methodology can not only secure the transparency of the recommender system but also reflect user preferences that change over time. However, the proposed methodology has some limitations. It cannot make fine-grained product recommendations if the number of products included in a topic is large. In addition, the number of sequential patterns is small because the number of topics is too small. Future research needs to address these limitations.
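Three of the evaluation metrics named above (precision, recall, and F1) can be computed for a single user's recommendation list as in the following sketch. The set-based definition and the sample product IDs are assumptions for illustration; the abstract does not state how the authors aggregated the metrics across users.

```python
def precision_recall_f1(recommended, purchased):
    """Precision, recall, and F1 of a recommendation list against
    the items the user actually bought."""
    recommended, purchased = set(recommended), set(purchased)
    hits = len(recommended & purchased)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(purchased) if purchased else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical product IDs for one user.
recommended_items = ["a", "b", "c", "d"]
purchased_items = ["b", "d", "e"]
p, r, f1 = precision_recall_f1(recommended_items, purchased_items)
print(p, r, f1)  # precision 0.5, recall 2/3, F1 4/7
```

Running the same function on the output of the proposed method and of the collaborative filtering benchmark gives directly comparable figures for each user.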

Assessment of Ecosystem Productivity and Efficiency using Flux Measurement over Haenam Farmland Site in Korea (HFK) (플럭스 관측 기반의 생태계 생산성과 효율성 평가: 해남 농경지 연구 사례)

  • Indrawati, Yohana Maria;Kim, Joon;Kang, Minseok
    • Korean Journal of Agricultural and Forest Meteorology / v.20 no.1 / pp.57-72 / 2018
  • Time series analysis of tower flux measurements can be used to build quantitative evidence for the achievement of climate-smart agriculture (CSA). In this study, we assessed the first objective of CSA (regarding ecosystem productivity and efficiency) for rice paddy-dominated heterogeneous farmland. A set of quantitative indicators was evaluated by analysing the time series data of carbon, water, and energy fluxes over the Haenam farmland site in Korea (HFK) during the rice growing seasons from 2003 to 2015. Four different varieties of rice were cultivated during the study period, in chronological order: Dongjin No. 1 (2003-2008), Nampyung (2009), Onnuri (2010-2011), and Saenuri (2012-2015). Overall at HFK, gross primary productivity (GPP) ranged from 800 to $944g\;C\;m^{-2}$, water use efficiency (WUE) ranged from 1.91 to $2.80g\;C\;kg\;H_2O^{-1}$, carbon uptake efficiency (CUE) ranged from 1.06 to 1.34, and light use efficiency (LUE) ranged from 0.99 to $1.55g\;C\;MJ^{-1}$. Among the four rice varieties, Dongjin No. 1-dominated HFK showed the highest productivity, with higher WUE and LUE but comparable CUE. Considering the heterogeneous vegetation cover at HFK, a rule-of-thumb comparison suggested that the productivity of Dongjin No. 1-dominated HFK was comparable to that of monoculture rice paddies in Asia, whereas HFK was more efficient in water use and less efficient in carbon uptake. Saenuri-dominated HFK also showed high productivity, but with a longer growing season than Dongjin No. 1. Although Dongjin No. 1 showed better traits for CSA, farmers cultivate Saenuri because of its higher pest resistance (associated with adaptability and resilience). This emphasizes the need to evaluate the other two objectives of CSA (i.e. system resilience and greenhouse gas mitigation) for a complete assessment at HFK, which is currently in progress.

A Study for establishment of soil moisture station in mountain terrain (1): the representative analysis of soil moisture for construction of Cosmic-ray verification system (산악 지형에서의 토양수분 관측소 구축을 위한 연구(1): Cosmic-ray 검증시스템 구축을 위한 토양수분량 대표성 분석 연구)

  • Kim, Kiyoung;Jung, Sungwon;Lee, Yeongil
    • Journal of Korea Water Resources Association / v.52 no.1 / pp.51-60 / 2019
  • The major purpose of this study is to construct an in-situ soil moisture verification network employing Frequency Domain Reflectometry (FDR) sensors, both for the operation of a Cosmic-ray soil moisture observation system and for long-term field-scale soil moisture monitoring. The test bed of the Cosmic-ray and FDR verification network system was established at the Sulma Catchment, in connection with the existing instrumentation, for integrated provision of data on various hydrologic variables. This test bed includes one Cosmic-ray Neutron Probe (CRNP) and ten FDR stations with four different measurement depths (10 cm, 20 cm, 30 cm, and 40 cm) at each station, and has been operating since July 2018. Furthermore, to assess the reliability of the in-situ verification network, the volumetric water content measured by the FDR sensors was compared with that calculated through the core sampling method. The evaluation of FDR sensor-measured soil moisture against the sampling method during the study period indicated reasonable agreement, with average values of bias $= -0.03m^3/m^3$ and RMSE $= 0.03m^3/m^3$, revealing that this FDR network is adequate to provide reliable long-term field-scale soil moisture monitoring at the Sulmacheon basin. In addition, the soil moisture time series observed at all FDR stations during the study period generally responded well to rainfall events, and at some locations the characteristics of rainfall intercepted by the canopy were also identified. Temporal Stability Analysis (TSA) was performed at each measurement depth for all FDR stations located within the CRNP footprint to determine the representative locations for field-average soil moisture at different soil profiles of the verification network.
The TSA results showed that the best performance was obtained at FDR 5 for the 10 cm depth, FDR 8 for the 20 cm depth, FDR 2 for the 30 cm depth, and FDR 1 for the 40 cm depth, demonstrating that these stations can be regarded as temporally stable locations that represent field mean soil moisture at their corresponding measurement depths. Although the study duration was limited, the results provide useful knowledge on soil moisture variability and stability at the test bed and support the use of the Cosmic-ray observation system for long-term field-scale soil moisture monitoring.
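Temporal stability analysis is commonly carried out with the relative-difference statistics of Vachaud et al. (1985); the sketch below follows that common formulation, which may differ in detail from what the authors used. The station names, the sample readings, and the choice of the combined index sqrt(MRD^2 + SDRD^2) as the ranking criterion are assumptions for illustration.

```python
import math
from statistics import mean, stdev


def temporal_stability(series_by_station):
    """For each station return (MRD, SDRD, index), where the relative
    difference at time t is (theta_station - field_mean_t) / field_mean_t,
    MRD is its mean over time, SDRD its standard deviation, and
    index = sqrt(MRD^2 + SDRD^2)."""
    stations = list(series_by_station)
    n_times = len(series_by_station[stations[0]])
    field_mean = [mean([series_by_station[s][t] for s in stations])
                  for t in range(n_times)]
    stats = {}
    for s in stations:
        rel = [(series_by_station[s][t] - field_mean[t]) / field_mean[t]
               for t in range(n_times)]
        mrd, sdrd = mean(rel), stdev(rel)
        stats[s] = (mrd, sdrd, math.hypot(mrd, sdrd))
    return stats


def most_representative(series_by_station):
    """Station whose readings track the field mean most closely and stably."""
    stats = temporal_stability(series_by_station)
    return min(stats, key=lambda s: stats[s][2])


# Hypothetical volumetric water content (m^3/m^3) at one depth, four dates.
observations = {
    "FDR_A": [0.20, 0.25, 0.30, 0.22],  # close to the field mean
    "FDR_B": [0.30, 0.35, 0.40, 0.32],  # persistently wetter
    "FDR_C": [0.10, 0.15, 0.20, 0.12],  # persistently drier
}
print(most_representative(observations))  # FDR_A
```

Repeating the calculation separately for each measurement depth would yield one representative station per depth, which matches the form of the result reported in the abstract.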

Estimation of Groundwater Recharge by Considering Runoff Process and Groundwater Level Variation in Watershed (유역 유출과정과 지하수위 변동을 고려한 분포형 지하수 함양량 산정방안)

  • Chung, Il-Moon;Kim, Nam-Won;Lee, Jeong-Woo
    • Journal of Soil and Groundwater Environment / v.12 no.5 / pp.19-32 / 2007
  • In Korea, various methods have been used to estimate groundwater recharge, and they can generally be divided into three types: the baseflow separation method using the groundwater recession curve, water budget analysis based on a lumped conceptual watershed model, and the water table fluctuation (WTF) method using data from groundwater monitoring wells. However, the groundwater recharge rate shows spatial-temporal variability due to climatic conditions, land use, and hydrogeological heterogeneity, and these methods are limited in dealing with such characteristics. To overcome these limitations, we present a new method of estimating recharge based on water balance components from SWAT-MODFLOW, an integrated surface water-groundwater model. Groundwater levels in the area of interest close to the stream have dynamics similar to stream flow, whereas levels further upslope respond to precipitation with a delay. Because these behaviours are related to the physical process of recharge, the time delay in aquifer recharge after water exits the soil profile must be accounted for to represent these features. In SWAT, a single linear reservoir storage module with an exponential decay weighting function is used to compute the recharge from soil to aquifer on a given day. However, this module has limitations in expressing recharge variation when the delay time is too long and the transient recharge trend does not match the groundwater table time series. Therefore, a multi-reservoir storage routing module, which represents a more realistic time delay through the vadose zone, is newly suggested in this study. In this module, the parameter related to the delay time should be optimized by checking the correlation between simulated recharge and observed groundwater levels. The final step of the procedure is to compare the simulated groundwater table with the observed one and the simulated watershed runoff with the observed one.
This method is applied to the Mihocheon watershed in Korea to test the procedure for properly estimating the spatio-temporal distribution of groundwater recharge. Because the newly suggested method combines the effectiveness of a watershed model with the accuracy of the WTF method, the estimated daily recharge rate is an advanced quantity that reflects the heterogeneity of hydrogeology, climatic conditions, and land use as well as the physical behaviour of water in soil layers and aquifers.
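The single linear reservoir with exponential decay weighting that the abstract attributes to SWAT can be sketched as follows, using the recharge equation given in the SWAT theoretical documentation: recharge on day i equals (1 - exp(-1/delay)) times the day's seepage plus exp(-1/delay) times the previous day's recharge. The function name, the seepage series, and the delay values are illustrative assumptions. The multi-reservoir routing module proposed in the study is not reproduced here, since the abstract gives no equations for it.

```python
import math


def delayed_recharge(seepage, delay_days, initial_recharge=0.0):
    """Daily aquifer recharge from daily seepage out of the soil profile:
    recharge[i] = (1 - w) * seepage[i] + w * recharge[i - 1],
    with w = exp(-1 / delay_days)."""
    w = math.exp(-1.0 / delay_days)
    recharge, previous = [], initial_recharge
    for s in seepage:
        previous = (1.0 - w) * s + w * previous
        recharge.append(previous)
    return recharge


# Hypothetical input: a single 10 mm seepage pulse followed by dry days.
pulse = [10.0] + [0.0] * 199
short_delay = delayed_recharge(pulse, delay_days=5.0)
long_delay = delayed_recharge(pulse, delay_days=30.0)
print(round(short_delay[0], 3), round(long_delay[0], 3))
print(round(sum(short_delay), 3))  # close to 10: the pulse is delayed, not lost
```

With a long delay the response to a pulse becomes a low, slowly decaying curve with its maximum still on the first day, which may illustrate why a single reservoir struggles to reproduce a lagged peak in observed groundwater levels.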