• Title/Summary/Keyword: Stream Data Mining

ADA: Advanced data analytics methods for abnormal frequent episodes in the baseline data of ISD

  • Biswajit Biswal;Andrew Duncan;Zaijing Sun
    • Nuclear Engineering and Technology / v.54 no.11 / pp.3996-4004 / 2022
  • The data collected by the In-Situ Decommissioning (ISD) sensors are time-specific, age-specific, and developmental stage-specific. In recent years, research on the stream data collected by the ISD testbed has sought both frequent episodes and abnormal frequent episodes. Frequent episodes in the data stream have confirmed the daily cycle of the sensor responses and established sequences of different types of sensors, which was verified by the experimental setup of the ISD Sensor Network Test Bed. However, discovering abnormal frequent episodes has remained a challenge because they are very small signals and may be buried in the background noise of voltage and current changes. In this work, we propose Advanced Data Analytics (ADA) methods that are applied to the baseline data to identify frequent episodes, and we extend the approach by adding more features extracted from the baseline data to discover abnormal frequent episodes, which may serve as early indicators of ISD system failures. We evaluate the approach using the baseline data, and the results show that it is able to discover both frequent episodes and abnormal frequent episodes.

Novel Push-Front Fibonacci Windows Model for Finding Emerging Patterns with Better Completeness and Accuracy

  • Akhriza, Tubagus Mohammad;Ma, Yinghua;Li, Jianhua
    • ETRI Journal / v.40 no.1 / pp.111-121 / 2018
  • To find emerging patterns (EPs) in streaming transaction data, the stream is first divided into time windows, each containing a number of transactions. Itemsets are generated from the transactions in each window, and the emergence of itemsets is then evaluated between two windows. The tilted-time windows model (TTWM) assumes that people need support data with finer accuracy from the most recent windows, while accepting coarser accuracy from older windows. Therefore, an array with a limited number of elements is used to maintain all support data, condensing old windows by merging them into a single element. The number of windows each element can hold is modeled using a particular number sequence. However, as new data arrives in a stream, the current array-updating mechanisms leave many null elements in the array and cause data incompleteness and inaccuracy. Two models derived from TTWM, the logarithmic TTWM and the Fibonacci windows model, inherit the same problems. This article proposes a novel push-front Fibonacci windows model as a solution, and experiments demonstrate its superiority in finding more EPs than the other models.
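
The window-to-window evaluation described in this abstract can be illustrated with a minimal Python sketch. It is not the push-front Fibonacci windows model itself: it assumes the common growth-rate definition of an emerging pattern (support in the newer window divided by support in the older one), and the function names, the `min_growth` threshold, and the sample transactions are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def itemset_supports(window, max_size=2):
    """Relative support of every itemset with up to max_size items in one window."""
    if not window:
        return {}
    counts = Counter()
    for transaction in window:
        items = sorted(set(transaction))
        for k in range(1, max_size + 1):
            for itemset in combinations(items, k):
                counts[itemset] += 1
    return {itemset: c / len(window) for itemset, c in counts.items()}


def emerging_patterns(old_window, new_window, min_growth=2.0, max_size=2):
    """Itemsets whose support grew by at least min_growth from old to new window.

    Assumption: growth rate = new support / old support, and an itemset that
    is absent from the old window has infinite growth.
    """
    old = itemset_supports(old_window, max_size)
    new = itemset_supports(new_window, max_size)
    result = {}
    for itemset, s_new in new.items():
        s_old = old.get(itemset, 0.0)
        growth = float("inf") if s_old == 0 else s_new / s_old
        if growth >= min_growth:
            result[itemset] = growth
    return result


# Hypothetical transactions in two consecutive windows.
old_window = [["a", "b"], ["a", "c"], ["b", "c"], ["a", "b"]]
new_window = [["a", "c"], ["a", "c"], ["c", "d"], ["a", "c"]]
eps = emerging_patterns(old_window, new_window)
```

In this toy data the itemset `("a", "c")` rises from support 0.25 to 0.75, so it is reported with growth rate 3.0, while `("a",)` keeps the same support and is not reported. A tilted-time model would additionally decide how supports from older windows are merged and stored, which this sketch leaves out.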

The Comparison Among Prediction Methods of Water Demand And Analysis of Data on Water Services Using Data Mining Techniques (데이터마이닝 기법을 활용한 상수 이용현황 분석 및 단기 물 수요예측 방법 비교)

  • Ahn, Jihoon;Kim, Jinhwa
    • The Journal of Bigdata / v.1 no.1 / pp.9-17 / 2016
  • This study identifies major features of water supply and introduces important factors in water services, based on data mining analysis of water quantity and water pressure measured by sensors. It also suggests more accurate methods for short-term prediction of water demand in water services, using multiple regression analysis and a neural network. A small block of a county was selected for data collection and tests. This area also has water demand from businesses and facilities such as public offices and hospitals. Real stream data from sensors in this area were collected. Among the 2,728 data sets collected, 2,632 sets were used for modelling and 96 sets were used for testing. The result shows that the neural network outperforms multiple regression analysis in prediction performance.

An Optimization Technique for Smart-Walk Systems Using Big Stream Log Data (Smart-Walk 시스템에서 스트림 빅데이터 분석을 통한 최적화 기법)

  • Cho, Wan-Sup;Yang, Kyung-Eun;Lee, Joong-Yeub
    • Journal of Korea Society of Industrial Information Systems / v.17 no.3 / pp.105-114 / 2012
  • Various RFID-based smart-walk systems have been developed for guiding people with disabilities. Such a system sends an appropriate message whenever a user arrives at a specific point. We propose a universal design concept and optimization techniques for smart-walk systems. The universal design concept allows a single system to support various kinds of users, such as a blind person, a hearing-impaired person, or a foreigner. It is supported by storing appropriate message sets in the message database table for each kind of user. System optimization can be done by analyzing the operational log (stream) data accumulated in the system, from which useful information can be extracted by analysis or mining. We show various analysis results from the operational log data.

Mining Frequent Service Patterns using Graph (그래프를 이용한 빈발 서비스 탐사)

  • Hwang, Jeong-Hee
    • Journal of Digital Contents Society / v.19 no.3 / pp.471-477 / 2018
  • As time passes, users' interests change. In this paper, we propose a method to provide suitable services for users by dynamically weighting service interests according to age, timing, and seasonal changes in a ubiquitous environment. Based on the history of services presented to users according to age or season, we also offer useful services by continuously adding the most recent service rules to reflect changes in service interest. To do this, a set of services is considered a transaction and each service is considered an item in a transaction. We also represent the association of services in a graph and extract frequent service items that reflect the latest information services for users.
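
The transaction-and-graph representation described in this abstract can be sketched roughly as follows. This is only an illustration under simple assumptions, not the paper's algorithm: edge weights are plain co-occurrence counts, the dynamic weighting by age, timing, and season is omitted, and the service names, function names, and `min_count` threshold are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_service_graph(transactions):
    """Build an undirected graph whose edge weight counts how often two
    services appear together in the same transaction."""
    graph = defaultdict(int)
    for services in transactions:
        # Sort so that each pair is always stored in the same order.
        for u, v in combinations(sorted(set(services)), 2):
            graph[(u, v)] += 1
    return graph


def frequent_edges(graph, min_count):
    """Keep only the service pairs that co-occur at least min_count times."""
    return {edge: w for edge, w in graph.items() if w >= min_count}


# Hypothetical service history: each inner list is one transaction.
history = [
    ["news", "weather"],
    ["news", "weather", "traffic"],
    ["music", "news"],
    ["traffic", "weather"],
]
graph = build_service_graph(history)
frequent = frequent_edges(graph, min_count=2)
```

With this history, the pairs `("news", "weather")` and `("traffic", "weather")` each co-occur twice and are kept, while the other pairs occur once and are dropped. Favoring recent service rules, as the abstract describes, could be approximated by decaying older edge weights before thresholding.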

EXTENDED ONLINE DIVISIVE AGGLOMERATIVE CLUSTERING

  • Musa, Ibrahim Musa Ishag;Lee, Dong-Gyu;Ryu, Keun-Ho
    • Proceedings of the KSRS Conference / 2008.10a / pp.406-409 / 2008
  • Clustering data streams is important in many applications, such as sensor networks. Existing hierarchical methods follow a semi-fuzzy clustering that yields duplicate clusters. To solve this problem, we propose an extended online divisive agglomerative clustering method for data streams. It builds a tree-like, top-down hierarchy of clusters that evolves with the data stream, using a geometric time frame for snapshots. It enhances Online Divisive Agglomerative Clustering (ODAC) with a pruning strategy that avoids duplicate clusters. Its main feature is that update time and memory usage are independent of the number of examples in the data stream. It can be used for clustering sensor data and for network monitoring, as well as for web click streams.

The Big Data Analytics Regarding the Cadastral Resurvey News Articles

  • Joo, Yong-Jin;Kim, Duck-Ho
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography / v.32 no.6 / pp.651-659 / 2014
  • With the popularization of the big data environment, big data have been highlighted as a key information strategy for establishing a national spatial data infrastructure that supports scientific land policy and the extension of the creative economy. Of particular interest here is that cadastral information is a core national information source: it forms the basis of spatial information that affects people's daily lives, including the production and consumption of information related to real estate. The purpose of this paper is to suggest a scheme of big data analytics for news articles on the cadastral resurvey project, in order to approach cadastral information in terms of spatial data integration. As the research method, the TM (Text Mining) package from R was used to read news reports in various formats as text, and nouns were extracted using the KoNLP package. That is, we searched for the main keywords regarding the cadastral resurvey by extracting compound nouns and performing data mining analysis, and we visualized the results. News reports related to the cadastral resurvey published between 2012 and 2014 were searched in newspapers, and nouns were extracted from them for the data mining analysis of cadastral information. Furthermore, the approval rating, reliability, and improvement of rules were presented through correlation analyses among the extracted compound nouns. The correlation analysis among the most frequently used extracted nouns generated five groups of data consisting of 133 keywords. The most frequent words were "cadastral resurvey," "civil complaint," "dispute," "cadastral survey," "lawsuit," "settlement," "mediation," "discrepant land," and "parcel." In conclusion, the cadastral resurvey performed in some local governments has been proceeding smoothly, with positive results. On the other hand, disputes with landowners over parcel surveying for the cadastral resurvey have provoked a stream of complaints. Through such keyword analysis, the range of public opinion and the types of civil complaints related to the cadastral resurvey project can be identified, so that they can be prevented through pre-emptive responses via a direct call centre for cadastral surveying, electronic civil services, and customer counseling, and so that high-quality cadastral information services can be provided. This study therefore provides a stepping stone toward big data analytics that can comprehensively examine and visualize a variety of news reports and opinions on the promotion of the cadastral resurvey project. It will contribute to establishing the foundation for an information utilization framework that enables fast and accurate scientific decision making.

Model Development for Specific Degradation Using Data Mining and Geospatial Analysis of Erosion and Sedimentation Features

  • Kang, Woochul;Kang, Joongu;Jang, Eunkyung;Julien, Pierre Y.
    • Proceedings of the Korea Water Resources Association Conference / 2020.06a / pp.85-85 / 2020
  • South Korea experiences few large-scale erosion and sedimentation problems; however, there are numerous local sedimentation problems. A reliable and consistent approach to modelling and managing sediment processes is desirable in the country. In this study, field measurements of sediment concentration from 34 alluvial river basins in South Korea were used with the Modified Einstein Procedure (MEP) to determine the total sediment load at the sampling locations. The Flow Duration-Sediment Rating Curve (FD-SRC) method was then used to estimate the specific degradation for all gauging stations. The specific degradation of most rivers was found to be typically 50-300 tons/㎢·yr. A model tree data mining technique was applied to develop a model for the specific degradation based on various characteristics of each watershed obtained from GIS analysis. The meaningful parameters are: 1) elevation at the middle relative area of the hypsometric curve [m], 2) percentage of wetland and water [%], 3) percentage of urbanized area [%], and 4) main stream length [km]. The Root Mean Square Error (RMSE) of existing models is in excess of 1,250 tons/㎢·yr, and the RMSE of the proposed model with 6 additional validations decreased to 65 tons/㎢·yr. Erosion loss maps from the Revised Universal Soil Loss Equation (RUSLE), satellite images, and aerial photographs were used to delineate the geospatial features affecting erosion and sedimentation. The results of the geospatial analysis clearly show the high-risk erosion areas (hill slopes and construction sites in urbanized areas) and sedimentation features (wetlands and agricultural reservoirs). The results of the physiographical analysis also indicate that watershed morphometric characteristics explain sediment transport well. Sustainable management with data mining methodologies and geospatial analysis could help solve various erosion and sedimentation problems under different conditions.
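
For readers unfamiliar with the error metric quoted in this abstract, the following minimal Python sketch shows how a Root Mean Square Error is computed from observed and predicted values. The numbers are hypothetical and are not taken from the study; they are assumed to be in tons/㎢·yr only to match the abstract's units.

```python
import math


def rmse(observed, predicted):
    """Root Mean Square Error between two equal-length sequences."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical specific degradation values (tons/km^2/yr), not from the paper.
observed = [100.0, 200.0, 300.0]
predicted = [110.0, 190.0, 320.0]
error = rmse(observed, predicted)
```

Because the errors are squared before averaging, a single station with a large misprediction raises the RMSE much more than several small ones, which is worth keeping in mind when comparing the two RMSE figures reported above.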

P-wave velocity structure in Southern Korea by using Velest program (Velest를 이용한 남한 지역의 P파 속도구조 분석)

  • 전정수
    • Proceedings of the Earthquake Engineering Society of Korea Conference / 2000.04a / pp.49-54 / 2000
  • The Korea Institute of Geology, Mining and Materials (KIGAM) has been operating the Korean Earthquake Monitoring System (KEMS) to archive the real-time data stream and to determine event parameters (epicenter, origin time, and magnitude) through automatic processing and analyst review. To do this, KEMS uses the Vindel Hue's velocity model, which was derived from Wonju KSRS data. Because KIGAM now receives real-time data from many stations, including Wonju KSRS, the Cholwon seismo-acoustic array, Uljin, Wolsung, Youngkwang, Taejon, Seoul, Kimcheon, Taegu, etc., a proper velocity model should be established for the area around the Korean peninsula. In this study, the P-wave velocity structure was derived with the VELEST program using 69 events, each recorded by at least 5 stations, among the 835 events determined by KEMS in 1999. The general trend of the velocity structure was similar to Sang Jo Kim's model, but the velocity values were lower in the crust and higher in the upper mantle. Because the inversion results are sensitive to the initial input model, artificial shot and blast data may need to be added.

In-memory Compression Scheme Based on Incremental Frequent Patterns for Graph Streams (그래프 스트림 처리를 위한 점진적 빈발 패턴 기반 인-메모리 압축 기법)

  • Lee, Hyeon-Byeong;Shin, Bo-Kyoung;Bok, Kyoung-Soo;Yoo, Jae-Soo
    • The Journal of the Korea Contents Association / v.22 no.1 / pp.35-46 / 2022
  • With the recent development of network technologies and the active use of IoT and social network service applications, a large amount of graph stream data is being generated. In this paper, we propose an incremental frequent-pattern-based compression scheme for graph streams. It takes the stream graph environment into account by applying graph mining to existing compression techniques, which have focused on compression rate and runtime. Since the proposed scheme keeps only the latest reference patterns, it increases storage utilization and improves query processing time. To show the superiority of the proposed scheme, various performance evaluations are performed in terms of compression rate and processing time, in comparison with the existing method. The proposed scheme is faster than the existing similar scheme when the amount of duplicated data is large.