• Title/Summary/Keyword: data-set


A Real-Time Stock Market Prediction Using Knowledge Accumulation (지식 누적을 이용한 실시간 주식시장 예측)

  • Kim, Jin-Hwa;Hong, Kwang-Hun;Min, Jin-Young
    • Journal of Intelligence and Information Systems / v.17 no.4 / pp.109-130 / 2011
  • One of the major problems in data mining is data volume, as most data sets today are huge. Transactions on the internet, on mobile devices, and in ubiquitous computing environments continuously produce streams of data, which are normally accumulated into data storage or databases. Some data sets lie unused in large data stores simply because of their size; others are lost as soon as they are created because, for various reasons, they are never saved. How to use such large data, and how to use streaming data efficiently, are challenging questions in data mining research. Stream data is a data set that is continuously accumulated into storage from a data source, and in many cases it grows steadily over time. Mining information from this massive data consumes substantial resources such as storage, money, and time, and these characteristics make it difficult and expensive to store all the stream data accumulated over time. On the other hand, if only recent or partial data is used to mine information or patterns, valuable and potentially useful information may be lost. To avoid these problems, this study proposes a method that efficiently accumulates information or patterns over time in the form of a rule set. A rule set is mined from each data set in the stream and accumulated into a master rule set, which also serves as the model for real-time decision making. One main advantage of this method is that it requires much less storage than the traditional approach of saving the entire data set. Another is that the accumulated rule set can be used directly as a prediction model: because the rule set is always ready for decision making, users' requests can be answered promptly at any time. This real-time decision-making capability is the greatest advantage of the method.
Ensemble theory suggests that combining many different models can produce a better-performing prediction model. The consolidated rule set in effect covers the entire data set, whereas a traditional sampling approach covers only part of it. This study uses stock market data, which is heterogeneous because its characteristics vary over time: stock market indexes fluctuate whenever an event influences the market, so the variance of each variable is large compared with that of a homogeneous data set. Prediction on heterogeneous data is naturally much harder than on homogeneous data, because the situations it reflects are less predictable. This study tests two general mining approaches and compares their prediction performance with that of the proposed method. The first approach induces a rule set from the most recent data set to predict the new data set. The second induces a rule set from all the data accumulated since the beginning, each time a new data set must be predicted. Neither performed as well as the accumulated rule set. The study also compares two prediction models built from the accumulated rules: the first uses only the more important rule sets, and the second uses all the rule sets, weighting each rule by its performance. The second approach performed better than the first. The experiments also show that the proposed method can be an efficient approach for mining information and patterns from stream data. A limitation is that the method has been applied only to stock market data; applying it to more dynamic real-time stream data sets is desirable.
Another problem remains: as the number of rules grows over time, special rules such as redundant or conflicting rules must be managed efficiently.
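To make the accumulate-then-vote idea concrete, here is a minimal sketch. It assumes a deliberately simple, hypothetical rule miner (one rule per `attribute=value` pair predicting that pair's majority label) and weights each stored rule by its Laplace-smoothed accuracy on later batches. The names `Rule`, `mine_rules`, and `MasterRuleSet` are illustrative, not the authors' implementation, and the paper's actual rule induction and weighting may differ.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass
class Rule:
    attr: str
    value: object
    label: object
    hits: int = 0    # times the rule was correct on later batches
    trials: int = 0  # times the rule fired on later batches

    def matches(self, row):
        return row.get(self.attr) == self.value

    @property
    def weight(self):
        # Laplace-smoothed accuracy: an unscored rule starts at 0.5
        return (self.hits + 1) / (self.trials + 2)


def mine_rules(batch, target, min_support=2):
    """Hypothetical miner: for each attribute=value pair, predict its majority label."""
    counts = defaultdict(Counter)
    for row in batch:
        for attr, value in row.items():
            if attr != target:
                counts[(attr, value)][row[target]] += 1
    rules = []
    for (attr, value), labels in counts.items():
        label, support = labels.most_common(1)[0]
        if support >= min_support:
            rules.append(Rule(attr, value, label))
    return rules


class MasterRuleSet:
    def __init__(self, target):
        self.target = target
        self.rules = {}  # (attr, value, label) -> Rule; exact duplicates are merged

    def accumulate(self, batch):
        # 1) score the rules already stored against the new batch
        for row in batch:
            for rule in self.rules.values():
                if rule.matches(row):
                    rule.trials += 1
                    rule.hits += int(rule.label == row[self.target])
        # 2) mine the new batch and add its rules; the raw batch can then be discarded
        for rule in mine_rules(batch, self.target):
            self.rules.setdefault((rule.attr, rule.value, rule.label), rule)

    def predict(self, row):
        # performance-weighted vote over all matching rules
        votes = Counter()
        for rule in self.rules.values():
            if rule.matches(row):
                votes[rule.label] += rule.weight
        return votes.most_common(1)[0][0] if votes else None


model = MasterRuleSet(target="next")
model.accumulate([{"trend": "up", "volume": "high", "next": "up"}] * 3
                 + [{"trend": "down", "volume": "low", "next": "down"}] * 3)
print(model.predict({"trend": "up", "volume": "high"}))  # prints up
```

Only the rule dictionary is kept between batches, which mirrors the storage argument above. Note that conflicting rules (same condition, different label) are both retained here, which is exactly the management problem the abstract raises.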

Deep Learning for Pet Image Classification (애완동물 분류를 위한 딥러닝)

  • Shin, Kwang-Seong;Shin, Seong-Yoon
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2019.05a / pp.151-152 / 2019
  • In this paper, we propose an improved learning method for animal image classification based on a small data set. First, a CNN training model is built for the small data set, and the training set is expanded through data augmentation. Second, bottleneck features of the small data set are extracted using a network pre-trained on a large data set, such as VGG16, and stored in two NumPy files as a new training set and a new test set. Finally, a fully connected network is trained on this new data set.
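The second and third steps (cache bottleneck features as NumPy files, then train only a dense classifier on them) can be sketched with NumPy alone. This is an illustration under loud assumptions: the "images" are synthetic Gaussian blobs, the frozen extractor is a fixed random projection plus ReLU standing in for a real pre-trained base such as VGG16, and the dense head is a single sigmoid layer trained by gradient descent. The file names are hypothetical; data augmentation (step one) is not shown.

```python
import os
import tempfile

import numpy as np

rng = np.random.default_rng(0)


def make_images(n, label, dim=8):
    """Hypothetical stand-in for images: Gaussian blobs centred at +3 or -3."""
    centre = 3.0 if label == 1 else -3.0
    return rng.normal(centre, 1.0, size=(n, dim))


# Frozen "pre-trained" extractor: fixed random projection + ReLU.
# Placeholder only; a real pipeline would use a convolutional base such as VGG16.
W_frozen = rng.normal(size=(8, 32))


def bottleneck(x):
    return np.maximum(x @ W_frozen, 0.0)


def build_split(n_per_class):
    x = np.vstack([make_images(n_per_class, 0), make_images(n_per_class, 1)])
    y = np.repeat([0, 1], n_per_class)
    return bottleneck(x), y


def train_dense(feats, y, lr=0.1, epochs=500):
    """Train one dense sigmoid layer (logistic regression) on cached features."""
    mu, sd = feats.mean(axis=0), feats.std(axis=0) + 1e-8
    z = (feats - mu) / sd
    w, b = np.zeros(z.shape[1]), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(z @ w + b)))
        grad = p - y
        w -= lr * z.T @ grad / len(y)
        b -= lr * grad.mean()

    def predict(f):
        # sigmoid > 0.5 is equivalent to a positive logit
        return (((f - mu) / sd) @ w + b > 0.0).astype(int)

    return predict


with tempfile.TemporaryDirectory() as tmp:
    train_path = os.path.join(tmp, "bottleneck_train.npy")
    test_path = os.path.join(tmp, "bottleneck_test.npy")
    train_x, train_y = build_split(100)
    test_x, test_y = build_split(50)
    np.save(train_path, train_x)  # step 2: cache features as NumPy files
    np.save(test_path, test_x)
    classifier = train_dense(np.load(train_path), train_y)  # step 3: dense head only
    accuracy = (classifier(np.load(test_path)) == test_y).mean()

print(f"test accuracy: {accuracy:.2f}")
```

Caching the features means the expensive frozen network runs once per image, and only the small dense head is trained repeatedly.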


Improving the Performance of Threshold Bootstrap for Simulation Output Analysis (시뮬레이션 출력분석을 위한 임계값 부트스트랩의 성능개선)

  • Kim, Yun-Bae
    • Journal of Korean Institute of Industrial Engineers / v.23 no.4 / pp.755-767 / 1997
  • Analyzing autocorrelated data sets is still an open problem. Developing an easy and efficient method for severely positively correlated data sets, which are common in simulation output, is vital for the simulation community. The bootstrap is an easy and powerful tool for constructing non-parametric inferential procedures in modern statistical data analysis, but the conventional bootstrap algorithm requires the original data to be iid. The proper choice of resampling units for generating replicates depends heavily on the structure of the original data set, that is, whether it is iid or autocorrelated. In this paper, a new bootstrap resampling scheme for autocorrelated data sets, the Threshold Bootstrap (TB), is proposed. A thorough literature review of bootstrap methods for autocorrelated data is also provided. The theoretical foundations of the Threshold Bootstrap are studied and compared with other leading bootstrap sampling techniques for autocorrelated data. The performance of TB is reported using an M/M/1 queueing model, along with a comparison against other resampling techniques on ARMA data sets.
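The core idea of a threshold bootstrap can be sketched as follows: cut the series into cycles wherever it crosses a threshold (here assumed to be the sample mean), so that each cycle is one run above the threshold plus one run below it, then resample whole cycles with replacement until the original length is reached. This is a minimal sketch of that general idea only, not the paper's improved variant; the AR(1) test series and the helper names are assumptions for illustration. An ordinary iid bootstrap is included for comparison.

```python
import random
import statistics


def split_cycles(series, threshold):
    """Cut the series into cycles; every second threshold crossing closes a cycle."""
    above = [x >= threshold for x in series]
    cycles, start, crossings = [], 0, 0
    for t in range(1, len(series)):
        if above[t] != above[t - 1]:
            crossings += 1
            if crossings % 2 == 0:  # one run above + one run below = one cycle
                cycles.append(series[start:t])
                start = t
    cycles.append(series[start:])  # trailing, possibly partial, cycle
    return cycles


def threshold_bootstrap(series, stat, replicates=500, seed=0):
    rng = random.Random(seed)
    cycles = split_cycles(series, statistics.fmean(series))
    n = len(series)
    results = []
    for _ in range(replicates):
        sample = []
        while len(sample) < n:  # concatenate random cycles, then truncate to n
            sample.extend(rng.choice(cycles))
        results.append(stat(sample[:n]))
    return results


def iid_bootstrap(series, stat, replicates=500, seed=0):
    rng = random.Random(seed)
    n = len(series)
    return [stat(rng.choices(series, k=n)) for _ in range(replicates)]


def ar1(n, phi, seed=1):
    """Strongly positively autocorrelated test series."""
    rng = random.Random(seed)
    x, out = 0.0, []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, 1.0)
        out.append(x)
    return out


data = ar1(2000, phi=0.9)
tb_se = statistics.stdev(threshold_bootstrap(data, statistics.fmean))
iid_se = statistics.stdev(iid_bootstrap(data, statistics.fmean))
print(f"threshold bootstrap SE: {tb_se:.3f}, iid bootstrap SE: {iid_se:.3f}")
```

Because whole cycles are resampled, the within-cycle correlation survives, so the standard error of the mean is not shrunk the way the iid bootstrap shrinks it on positively correlated data.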


Selection of data set with fuzzy entropy function (퍼지 엔트로피 함수를 이용한 데이터추출)

  • Lee, Sang-Hyuk;Cheon, Seong-Pyo;Kim, Sung-Shin
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 2004.04a / pp.349-352 / 2004
  • In this paper, a data set is selected from the universal set using a fuzzy entropy function. Based on the definition of fuzzy entropy, we propose a fuzzy entropy function and prove that it satisfies that definition. The proposed function calculates the certainty or uncertainty of a data set, so data sets that satisfy a given bound or reference can be chosen; reliable data sets can therefore be obtained with it. A simple example verifies that the proposed fuzzy entropy function selects reliable data sets.
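The selection rule (compute an uncertainty score for each candidate data set and keep those within a bound) can be sketched as below. The paper's own fuzzy entropy function is not given here, so this sketch substitutes the classic, normalized De Luca-Termini fuzzy entropy as a stand-in; the candidate memberships and the bound are hypothetical.

```python
import math


def fuzzy_entropy(memberships):
    """Normalized De Luca-Termini entropy: 0 for a crisp set, 1 when every membership is 0.5."""
    def h(mu):
        if mu <= 0.0 or mu >= 1.0:
            return 0.0  # convention: 0 * log 0 = 0
        return -(mu * math.log2(mu) + (1.0 - mu) * math.log2(1.0 - mu))
    return sum(h(mu) for mu in memberships) / len(memberships)


def select_reliable(candidates, bound):
    """Keep the names of candidate data sets whose fuzzy entropy is at most `bound`."""
    selected = []
    for name, memberships in candidates.items():
        if fuzzy_entropy(memberships) <= bound:
            selected.append(name)
    return selected


candidates = {
    "A": [0.95, 0.9, 1.0],  # memberships close to crisp: low uncertainty
    "B": [0.5, 0.6, 0.4],   # memberships near 0.5: high uncertainty
}
print(select_reliable(candidates, bound=0.5))  # prints ['A']
```

Low entropy means the memberships are close to 0 or 1, which is the sense in which the selected data set is more certain, or reliable.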


Selection of data set with fuzzy entropy function

  • Lee, Sang-Hyuk;Cheon, Seong-Pyo;Kim, Sung-Shin
    • Journal of the Korean Institute of Intelligent Systems / v.14 no.5 / pp.655-659 / 2004
  • In this paper, a data set is selected from the universal set using a fuzzy entropy function. Based on the definition of fuzzy entropy, a fuzzy entropy function is proposed and proved to satisfy that definition. The proposed function calculates the certainty or uncertainty of a data set, so data sets that satisfy a given bound or reference can be chosen; reliable data sets can therefore be obtained with it. A simple example verifies that the proposed fuzzy entropy function selects reliable data sets.

The Nursing Minimum Data Set (NMDS) and Its Relationship with the Nursing Management Minimum Data Set (NMMDS): significance, development, and future of nursing profession (Nursing Minimum Data Set (NMDS)과 Nursing Management Minimum Data Set(NMMDS) 과의 관계)

  • Lee, Eunjoo
    • Journal of Korean Academy of Nursing / v.31 no.3 / pp.401-416 / 2001
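  • In the current health care system, everything is changing rapidly, and concrete data are in demand. With the spread of computers, nursing too must develop standardized large-scale databases to respond actively to these changes. The Nursing Minimum Data Set (NMDS) is the first standardized large-scale database developed in the nursing field, and it contains the core nursing elements that must be collected in every setting where nursing care takes place. This paper therefore discusses the historical background, purposes, and elements of the NMDS; the direction the NMDS should take in light of worldwide trends in nursing; and standardized classification systems as a problem that must be solved before the NMDS can be completed. Because several countries besides the United States are developing or already collecting the NMDS or similar databases, a comparison and analysis of these is also presented. The paper then introduces the Nursing Management Minimum Data Set (NMMDS), a more recently developed database designed mainly for administrative purposes. Whereas the NMDS focuses on collecting clinical data, the NMMDS includes the elements essential for effective nursing management, and it was designed so that nurse administrators can collect the information on financial, environmental, and nursing resources they need for decision making. These databases should be linked to each other as relational databases and used in conjunction with other disciplines. If such large-scale databases were developed and used in Korea as well, more cost-effective management of patient care would become possible. Finally, the paper stresses the urgent need to develop large-scale databases such as the NMDS and NMMDS in Korea.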


Managing Data Set in Administrative Information Systems as Records (행정정보 데이터세트의 기록관리 방안)

  • Oh, Seh-La;Rieh, Hae-young
    • Journal of Korean Society of Archives and Records Management / v.19 no.2 / pp.51-76 / 2019
  • Records management professionals and scholars have emphasized the need to manage data sets in administrative information systems as records, but this has not been put into practice in the field. Applying paper-based records management standards and guidelines to data sets has proved difficult because of their technology-dependent characteristics, vast scale, and diverse operating environments. Data sets therefore require a management system that accommodates the inherent characteristics of records and can be applied in practice. Based on an analysis of the data sets in administrative information systems operated by public institutions, this study develops and presents methods and procedures for managing them.

A Review of Minimum Data Sets and Standardized Nursing Classifications (보건의료정보 자료 세트의 비교 및 간호정보 표준화에 대한 고찰)

  • Yom, Young-Hee;Lee, Ji-Soon;Kim, Hee-Kyung;Chang, Hae-Kyung;Oh, Won-Ok;Choi, Bo-Kyung;Park, Chang-Sung;Chun, Sook-Hee;Lee, Jung-Ae
    • The Journal of Korean Academic Society of Nursing Education / v.5 no.1 / pp.72-85 / 1999
  • This paper reviews three data sets (the Uniform Hospital Discharge Data Set, the Nursing Minimum Data Set, and the Nursing Management Minimum Data Set) and six major nursing classifications (the North American Nursing Diagnosis Association Taxonomy I, the Omaha System, the Nursing Interventions Classification, the Nursing Intervention Lexicon and Taxonomy, the Nursing Outcomes Classification, and the Classification of Patient Outcome). The reviewed data sets and nursing classifications differed from one another in purpose, structure, and users. The Nursing Interventions Classification and the Nursing Outcomes Classification were linked to the North American Nursing Diagnosis Association taxonomy, but the others were not. The data sets and nursing classifications need to be linked to other data sets and classifications.


A Novel Reversible Data Hiding Scheme for VQ-Compressed Images Using Index Set Construction Strategy

  • Qin, Chuan;Chang, Chin-Chen;Chen, Yen-Chang
    • KSII Transactions on Internet and Information Systems (TIIS) / v.7 no.8 / pp.2027-2041 / 2013
  • In this paper, we propose a novel reversible data hiding scheme for the index tables of vector quantization (VQ) compressed images, based on an index set construction strategy. On the sender side, three index sets are constructed: the first and second sets contain the indices with higher and lower occurrence frequencies in the given VQ index table, respectively. Index values in the table that belong to the second set are given prefixes from the third set to avoid collisions with the two mapping sets derived from the first set, and adding these prefixes also provides additional data hiding capacity. The main embedding procedure is achieved simply by mapping index values in the first set to the corresponding values in the two derived mapping sets. Because the same three index sets are reconstructed on the receiver side, the secret data can be extracted correctly and the index table recovered losslessly. Experimental results demonstrate the effectiveness of the proposed scheme.
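A toy model of the three-set structure may help. In this sketch, which is an assumption-laden simplification rather than the paper's actual codeword construction, set 1 holds the `k` most frequent indices, each set-1 index is replaced by an even or odd code (the two derived mapping sets) according to the hidden bit, and every other index is preceded by a prefix from a third set `{2k, 2k+1}` whose choice also carries a bit. The helper names are hypothetical, `s1` is assumed to be shared with the receiver as side information, and compression cost is ignored entirely.

```python
from collections import Counter


def frequent_indices(index_table, k):
    """Set 1: the k indices that occur most often in the VQ index table."""
    return [idx for idx, _ in Counter(index_table).most_common(k)]


def embed(index_table, bits, s1):
    """Hide one bit per index; `s1` is assumed to be shared with the receiver."""
    if len(bits) > len(index_table):
        raise ValueError("message longer than capacity (one bit per index)")
    rank = {idx: r for r, idx in enumerate(s1)}
    k = len(s1)
    padded = list(bits) + [0] * (len(index_table) - len(bits))
    stream = []
    for idx, bit in zip(index_table, padded):
        if idx in rank:
            # set-1 index -> even code (bit 0) or odd code (bit 1): two mapping sets
            stream.append(2 * rank[idx] + bit)
        else:
            # other index -> prefix from the third set {2k, 2k+1}, which also carries a bit
            stream.extend([2 * k + bit, idx])
    return stream


def extract(stream, s1):
    """Recover the original index table and the (zero-padded) hidden bits."""
    k = len(s1)
    table, bits, pos = [], [], 0
    while pos < len(stream):
        code = stream[pos]
        if code < 2 * k:
            table.append(s1[code // 2])
            bits.append(code % 2)
            pos += 1
        else:
            bits.append(code - 2 * k)
            table.append(stream[pos + 1])
            pos += 2
    return table, bits


table = [5, 5, 5, 7, 7, 3, 5, 9]
s1 = frequent_indices(table, 2)  # [5, 7]
stream = embed(table, [1, 0, 1, 1, 0, 1], s1)
print(stream)  # [1, 0, 1, 3, 2, 5, 3, 0, 4, 9]
```

Calling `extract(stream, s1)` returns the original table unchanged plus the hidden bits, which is the reversibility property the abstract describes.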

Gene Set and Pathway Analysis of Microarray Data (마이크로어레이 데이터의 유전자 집합 및 대사 경로 분석)

  • Kim, Seon-Young
    • KOGO NEWS / v.6 no.1 / pp.29-33 / 2006
  • Gene set analysis is a new concept and method for analyzing and interpreting microarray gene expression data; it tries to extract biological meaning from expression data at the level of gene sets rather than individual genes. Compared with methods that select a few tens or hundreds of genes before gene ontology and pathway analysis, gene set analysis identifies important gene ontology terms and pathways more consistently, and it performs well even on data sets with minimal or moderate expression changes. Gene set analysis is also useful for comparing multiple gene expression data sets that address similar biological questions. This review briefly summarizes the rationale behind gene set analysis and introduces several algorithms and tools now available for it.
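The difference from select-then-annotate methods is that a gene set is scored from the statistics of all its member genes, with no cutoff applied first. A minimal sketch of one simple variant follows: the set score is the mean per-gene statistic, and a one-sided p-value (testing for an upward shift only) comes from comparing it with random gene sets of the same size. This is not any specific tool reviewed in the article; the per-gene statistics and gene names are hypothetical.

```python
import random
import statistics


def gene_set_pvalue(gene_stats, gene_set, permutations=1000, seed=0):
    """Score a gene set by the mean per-gene statistic of its members and
    compare it with random gene sets of the same size (gene resampling)."""
    members = [g for g in gene_set if g in gene_stats]
    observed = statistics.fmean(gene_stats[g] for g in members)
    all_stats = list(gene_stats.values())
    rng = random.Random(seed)
    as_extreme = sum(
        statistics.fmean(rng.sample(all_stats, len(members))) >= observed
        for _ in range(permutations)
    )
    # add-one correction keeps the permutation p-value strictly positive
    return observed, (as_extreme + 1) / (permutations + 1)


# Hypothetical per-gene statistics (e.g. t-statistics): 10 shifted genes, 90 unchanged
stats = {f"g{i}": 0.0 for i in range(90)}
stats.update({f"s{i}": 2.0 for i in range(10)})
print(gene_set_pvalue(stats, [f"s{i}" for i in range(10)]))
```

Because every member contributes to the score, a set whose genes each shift only modestly can still stand out, which is why such methods can work on data with minimal or moderate expression changes.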
