• Title/Summary/Keyword: Data Paper

Verification of Improving a Clustering Algorithm for Microarray Data with Missing Values

  • Kim, Su-Young
    • The Korean Journal of Applied Statistics
    • /
    • v.24 no.2
    • /
    • pp.315-321
    • /
    • 2011
  • Gene expression microarray data often include multiple missing values. Most gene expression analyses (including gene clustering analysis), however, require a complete data matrix as input. In ordinary clustering methods, a single missing value forces one to discard all the data for a gene even if the rest of the data for that gene are intact. The quality of the analysis may decrease seriously as the missing rate increases. On the other hand, imputing missing values may introduce artifacts that reduce the reliability of the analysis. To clarify this contradiction in microarray clustering analysis, this paper compared the accuracy of clustering with and without imputation over several microarray data sets having different missing rates. This paper also tested the clustering efficiency of several imputation methods, including our proposed algorithm. The results showed that it is worthwhile to check the clustering result in this alternative way, without any imputed data, for imperfect microarray data.
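One way to cluster without imputation is to compare two expression profiles only at the positions where both were measured. The sketch below illustrates that idea. It is not the algorithm from the paper, whose details the abstract does not give. The function name, the use of `None` for a missing value, and the rescaling by the share of jointly observed positions are all assumptions made for illustration.

```python
import math


def pairwise_complete_distance(a, b):
    """Euclidean distance over positions observed in both profiles.

    Missing values are represented as None (an assumption of this sketch).
    The squared sum is rescaled by len(a) / (number of shared positions) so
    that pairs with many missing values are not favoured with artificially
    small distances. Returns None when no position is shared.
    """
    if len(a) != len(b):
        raise ValueError("profiles must have the same length")
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return None
    squared_sum = sum((x - y) ** 2 for x, y in shared)
    return math.sqrt(squared_sum * len(a) / len(shared))


# Hypothetical profiles for two genes, each with one missing measurement.
gene_a = [1.0, None, 3.0, 4.0]
gene_b = [1.0, 2.0, None, 6.0]
distance = pairwise_complete_distance(gene_a, gene_b)
```

A distance of this kind can be passed to any clustering routine that accepts a precomputed distance matrix, so genes with a few missing values do not have to be discarded.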

Can Big Data Help Predict Financial Market Dynamics?: Evidence from the Korean Stock Market

  • Pyo, Dong-Jin
    • East Asian Economic Review
    • /
    • v.21 no.2
    • /
    • pp.147-165
    • /
    • 2017
  • This study quantifies the dynamic interrelationship between the KOSPI index return and search query data derived from the Naver DataLab. The empirical estimation using a bivariate GARCH model reveals that negative contemporaneous correlations between the stock return and the search frequency prevail during the sample period. Meanwhile, the search frequency has a negative association with the one-week-ahead stock return but not vice versa. In addition to identifying dynamic correlations, the paper also aims to serve as a test bed in which the existence of profitable trading strategies based on big data is explored. Specifically, the strategy interpreting the heightened investor attention as a negative signal for future returns appears to have been superior to the benchmark strategy in terms of the expected utility over wealth. This paper also demonstrates that the big data-based option trading strategy might be able to beat the market under certain conditions. These results highlight the possibility of big data as a potential source (which has been left largely untapped) for establishing profitable trading strategies as well as developing insights on stock market dynamics.

Design of a Consistency Algorithm for VOD Streaming Data (VOD 스트리밍 데이터를 위한 Consistency 알고리즘 설계)

  • Jang Seung-Ju
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.10 no.8
    • /
    • pp.1414-1421
    • /
    • 2006
  • This paper proposes a consistency algorithm that can serve streaming data efficiently in a VOD system. The media data is striped into several pieces by the round-robin method so that it can be served. In this paper, the barrier mechanism is changed to use the minimum data factor (SH. GOP). The shared memory is allocated at one host with a size of one fragment. Data is combined into the RTP packet transmission data format using the barrier mechanism. The suggested algorithm is implemented and tested on the VOD system.
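Round-robin striping of media data can be sketched as follows. This shows only the general striping idea, under assumptions: the function names, the byte-string input, and the fixed fragment size are hypothetical, and the paper's barrier mechanism and RTP packaging are not modelled.

```python
def stripe_round_robin(data: bytes, num_nodes: int, fragment_size: int):
    """Split data into fixed-size fragments and deal them out to nodes in turn."""
    if num_nodes <= 0 or fragment_size <= 0:
        raise ValueError("num_nodes and fragment_size must be positive")
    stripes = [[] for _ in range(num_nodes)]
    for index, offset in enumerate(range(0, len(data), fragment_size)):
        # Fragment k goes to node k mod num_nodes.
        stripes[index % num_nodes].append(data[offset:offset + fragment_size])
    return stripes


def reassemble(stripes):
    """Rebuild the original byte order by reading one fragment per node per round."""
    rounds = max((len(node) for node in stripes), default=0)
    fragments = []
    for round_index in range(rounds):
        for node in stripes:
            if round_index < len(node):
                fragments.append(node[round_index])
    return b"".join(fragments)


# Hypothetical usage: ten bytes striped over three nodes, two bytes per fragment.
stripes = stripe_round_robin(b"abcdefghij", num_nodes=3, fragment_size=2)
restored = reassemble(stripes)
```

Reading the nodes back in the same order they were written restores the stream. A consistency mechanism of the kind the abstract describes would have to guarantee that order when the nodes deliver fragments concurrently.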

Design of an effective real-time data acquisition system (효율적인 실시간 데이터 수집시스템의 설계)

  • 김동욱;염재명;김대원;박용식
    • 제어로봇시스템학회:학술대회논문집
    • /
    • 1996.10b
    • /
    • pp.1036-1039
    • /
    • 1996
  • The performance of real-time systems depends on how well tasks are scheduled within a cycle time and how quickly the system responds to an external event. This paper presents the design of an effective real-time data acquisition system for gathering data from an automobile engine. It investigates a method for estimating and restricting execution for aperiodic data, and presents the problem of guaranteeing real-time constraints for periodic data. Through experiments, the hard real-time guarantee problem for periodic data is studied, and the damage to periodic data caused by an increase in aperiodic tasks is analyzed.

Evolvable Neural Networks for Time Series Prediction with Adaptive Learning Interval

  • Lee, Dong-Wook;Kong, Seong-G;Sim, Kwee-Bo
    • 제어로봇시스템학회:학술대회논문집
    • /
    • 2005.06a
    • /
    • pp.920-924
    • /
    • 2005
  • This paper presents adaptive learning data of evolvable neural networks (ENNs) for time series prediction of nonlinear dynamic systems. ENNs are a special class of neural networks that adopt the concept of biological evolution as a mechanism of adaptation or learning. ENNs can adapt to an environment as well as to changes in the environment. The ENNs used in this paper are based on L-systems and DNA coding. They evolve the network architecture and the weights simultaneously using indirect encoding. In general, only the immediately preceding data are used to train a predictor that predicts future data. However, the characteristics of the data and the appropriate size of the learning data are usually unknown. Therefore, we propose an adaptive change of the learning data size to predict future data effectively. To verify the effectiveness of our scheme, we apply it to chaotic time series prediction of Mackey-Glass data.

Genetic Algorithm Application to Machine Learning

  • Han, Myung-mook;Lee, Yill-byung
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.11 no.7
    • /
    • pp.633-640
    • /
    • 2001
  • In this paper, we examine the machine learning issues raised by the domain of Intrusion Detection Systems (IDS), which have difficulty successfully classifying intruders. These systems also require a significant amount of computational overhead, making it difficult to create a robust real-time IDS. Machine learning techniques can reduce the human effort required to build these systems and can improve their performance. Genetic algorithms are used to improve the performance of search problems, while data mining has been used for data analysis. Data mining is the exploration and analysis of large quantities of data to discover meaningful patterns and rules. Among the tasks for data mining, we concentrate on the classification task. Since classification is a basic element of the human way of thinking, it is a well-studied problem in a wide variety of applications. In this paper, we propose a classifier system based on a genetic algorithm, and the proposed system is evaluated by applying it to an IDS problem related to the classification task in data mining. We report our experiments in using these methods on KDD audit data.

Data Mining for Strategy focused CRM Structure (전략중심의 CRM구조의 데이터마이닝)

  • Yoon Yong W.
    • Proceedings of the Korean Operations and Management Science Society Conference
    • /
    • 2004.10a
    • /
    • pp.399-405
    • /
    • 2004
  • With the explosive growth of information sources available under various information technology and business environments, it has become increasingly necessary to determine effective marketing strategies and to optimize the logical structure of the CRM data mining system. In this paper, we present an overview of data mining for a strategy-focused CRM structure. This includes preprocessing, transaction identification, and data integration components. We devote the main part of this paper to a discussion of the processes and problems that characterize the mining tools and techniques, identify the CRM data mining, and provide a general architecture of a system for focused CRM data mining, an area that requires further research and development.

Concept Drift Based on CNN Probability Vector in Data Stream Environment

  • Kim, Tae Yeun;Bae, Sang Hyun
    • Journal of Integrative Natural Science
    • /
    • v.13 no.4
    • /
    • pp.147-151
    • /
    • 2020
  • In this paper, we propose a method to detect concept drift by applying a Convolutional Neural Network (CNN) in a data stream environment. The conventional method compares only the final output value of the CNN and reports concept drift whenever that value differs. As a result, it reacts sensitively to the actual input values of the data stream and incorrectly reports concept drift even when there is no significant difference. Therefore, to reduce such errors, this paper uses not only the output value of the CNN but also the probability vector. First, the data entering the data stream is patterned and learned by the neural network model. Then, the output value and probability vector of the current data are compared with those of the historical data of the learned neural network model to detect concept drift. The proposed method was confirmed to reduce detection errors compared with detecting concept drift using only the CNN output values.
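The difference between comparing only the output label and also comparing the probability vector can be sketched as below. This is an illustration under assumptions, not the paper's procedure: the function names, the L1 distance, and the threshold value of 0.5 are all hypothetical, and the probability vectors are taken as given rather than produced by a CNN.

```python
def argmax(vector):
    """Index of the largest entry, i.e. the predicted class."""
    return max(range(len(vector)), key=vector.__getitem__)


def label_only_drift(reference, current):
    """Baseline: report drift whenever the predicted class changes."""
    return argmax(reference) != argmax(current)


def probability_vector_drift(reference, current, threshold=0.5):
    """Report drift only if the class changes AND the vectors differ strongly.

    The L1 distance and the default threshold are assumptions of this sketch.
    """
    if len(reference) != len(current):
        raise ValueError("probability vectors must have the same length")
    l1_distance = sum(abs(r - c) for r, c in zip(reference, current))
    return label_only_drift(reference, current) and l1_distance > threshold


# Hypothetical vectors: the predicted class flips, but the probabilities barely move.
historical = [0.48, 0.52]
incoming = [0.52, 0.48]
baseline_flag = label_only_drift(historical, incoming)
vector_flag = probability_vector_drift(historical, incoming)
```

In this example the label-only check reports drift because the predicted class flips, while the check that also looks at the probability vector does not, since the two distributions are nearly identical.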

Query Optimization on Large Scale Nested Data with Service Tree and Frequent Trajectory

  • Wang, Li;Wang, Guodong
    • Journal of Information Processing Systems
    • /
    • v.17 no.1
    • /
    • pp.37-50
    • /
    • 2021
  • Query applications based on nested data, the most commonly used form of data representation on the web, are becoming more extensively used, especially for precise queries. MapReduce, a distributed architecture with parallel computing power, provides a good solution for big data processing. However, in practical applications, query requests are usually concurrent, which causes bottlenecks in server processing. To solve this problem, this paper first combines a column storage structure and an inverted index to build an index for nested data on MapReduce. On this basis, this paper puts forward an optimization strategy that combines a query execution service tree and frequent sub-query trajectories to reduce the response time of frequent queries and further improve the efficiency of multi-user concurrent queries on large-scale nested data. Experiments show that this method greatly improves the efficiency of nested data queries.
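An inverted index over nested data can be sketched by flattening each record into column-like paths and mapping every (path, value) pair to the records that contain it. This is a single-machine illustration only. The function names and the dotted path format are assumptions, and the MapReduce distribution, service tree, and frequent-trajectory optimization from the abstract are not modelled.

```python
from collections import defaultdict


def flatten(record, prefix=""):
    """Yield (path, value) pairs for every leaf of a nested dict or list."""
    if isinstance(record, dict):
        for key, value in record.items():
            yield from flatten(value, f"{prefix}.{key}" if prefix else key)
    elif isinstance(record, list):
        # List items share the path of the list itself, like a repeated column.
        for item in record:
            yield from flatten(item, prefix)
    else:
        yield prefix, record


def build_inverted_index(records):
    """Map each (path, value) pair to the set of record ids containing it."""
    index = defaultdict(set)
    for record_id, record in enumerate(records):
        for path, value in flatten(record):
            index[(path, value)].add(record_id)
    return index


# Hypothetical nested records.
records = [
    {"user": {"name": "a", "tags": ["x", "y"]}},
    {"user": {"name": "b", "tags": ["y"]}},
]
index = build_inverted_index(records)
matches = index[("user.tags", "y")]
```

A precise query such as "records whose `user.tags` contains `y`" then becomes a single dictionary lookup instead of a scan over every nested record.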

Sequence Anomaly Detection based on Diffusion Model (확산 모델 기반 시퀀스 이상 탐지)

  • Zhiyuan Zhang;Inwhee, Joe
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2023.05a
    • /
    • pp.2-4
    • /
    • 2023
  • Sequence data plays an important role in the field of intelligence, especially for industrial control, traffic control and other aspects. Finding abnormal parts in sequence data has long been an application field of AI technology. In this paper, we propose an anomaly detection method for sequence data using a diffusion model. The diffusion model has two major advantages: interpretability derived from rigorous mathematical derivation and unrestricted selection of backbone models. This method uses the diffusion model to predict and reconstruct the sequence data, and then detects the abnormal part by comparing with the real data. This paper successfully verifies the feasibility of the diffusion model in the field of anomaly detection. We use the combination of MLP and diffusion model to generate data and compare the generated data with real data to detect anomalous points.
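The final comparison step, detecting anomalous points by comparing generated data with real data, can be sketched as below. The reconstruction is taken as a given input here. In the paper it would come from the MLP and diffusion model, which this sketch does not implement. The function name, the absolute-error criterion, and the threshold are assumptions.

```python
def detect_anomalies(real, reconstructed, threshold):
    """Return the indices where the real value deviates from its reconstruction.

    A point is flagged when the absolute error exceeds the threshold
    (an assumed criterion; the abstract does not specify one).
    """
    if len(real) != len(reconstructed):
        raise ValueError("sequences must have the same length")
    return [
        index
        for index, (observed, expected) in enumerate(zip(real, reconstructed))
        if abs(observed - expected) > threshold
    ]


# Hypothetical sequences: the third observation departs from the reconstruction.
real_sequence = [1.0, 1.1, 5.0, 0.9]
reconstructed_sequence = [1.0, 1.0, 1.0, 1.0]
anomalous_indices = detect_anomalies(real_sequence, reconstructed_sequence, threshold=0.5)
```

In practice the threshold would have to be chosen from the distribution of reconstruction errors on normal data, which the abstract does not discuss.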