• Title/Summary/Keyword: incomplete data

Cluster Analysis of Incomplete Microarray Data with Fuzzy Clustering

  • Kim, Dae-Won
    • Journal of the Korean Institute of Intelligent Systems / v.17 no.3 / pp.397-402 / 2007
  • In this paper, we present a method for clustering incomplete microarray data using alternating optimization, in which no prior imputation step is required. To reduce the influence of imputation in preprocessing, we take an alternating optimization approach that finds better estimates during the iterative clustering process. In each iteration, the method improves the estimates of missing values by exploiting cluster information, such as the cluster centroids, together with all available non-missing values. The clustering results of the proposed method are significantly more relevant to the biological gene annotations than those of other methods, indicating its effectiveness and potential for clustering incomplete gene expression data.
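
The idea of re-estimating missing values inside the clustering loop can be sketched as follows. This is not the paper's algorithm, whose details the abstract does not give. It is a minimal illustration that assumes fuzzy c-means with the well-known "optimal completion" update, in which each missing entry is replaced by a membership-weighted average of the centroid coordinates. The function name `fcm_with_missing`, the helper `sq_dist`, the use of `None` to mark missing entries, the farthest-point initialisation, and the sample data are all assumptions made for the example. Every column is assumed to have at least one observed value.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def fcm_with_missing(data, n_clusters, m=2.0, n_iter=100):
    """Fuzzy c-means on rows that may contain None (missing) entries.

    Missing entries are treated as extra unknowns and re-estimated in
    every iteration from the current memberships and centroids.
    Returns (memberships, centroids, completed_data).
    """
    n, d = len(data), len(data[0])

    # Start each missing entry at the mean of the observed values in its column.
    col_mean = []
    for j in range(d):
        seen = [row[j] for row in data if row[j] is not None]
        col_mean.append(sum(seen) / len(seen))
    x = [[col_mean[j] if row[j] is None else row[j] for j in range(d)]
         for row in data]

    # Deterministic initialisation: first row, then repeatedly the point
    # farthest from the centroids chosen so far.
    centroids = [list(x[0])]
    while len(centroids) < n_clusters:
        far = max(x, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))

    u = [[0.0] * n_clusters for _ in range(n)]
    for _ in range(n_iter):
        # Step 1: update memberships from distances to the centroids.
        for k in range(n):
            dist = [sq_dist(x[k], c) for c in centroids]
            if min(dist) == 0.0:
                hit = dist.index(0.0)
                u[k] = [1.0 if i == hit else 0.0 for i in range(n_clusters)]
            else:
                inv = [dk ** (-1.0 / (m - 1.0)) for dk in dist]
                total = sum(inv)
                u[k] = [v / total for v in inv]

        # Step 2: update centroids as membership-weighted means.
        for i in range(n_clusters):
            w = [u[k][i] ** m for k in range(n)]
            total = sum(w)
            if total > 0.0:
                centroids[i] = [sum(w[k] * x[k][j] for k in range(n)) / total
                                for j in range(d)]

        # Step 3: re-estimate only the entries that were missing in the input.
        for k in range(n):
            w = [u[k][i] ** m for i in range(n_clusters)]
            total = sum(w)
            for j in range(d):
                if data[k][j] is None:
                    x[k][j] = sum(w[i] * centroids[i][j]
                                  for i in range(n_clusters)) / total
    return u, centroids, x


# Two well-separated groups; the third row has one missing coordinate.
sample = [[1.0, 1.0], [1.2, 0.9], [0.9, None],
          [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]]
memberships, centers, completed = fcm_with_missing(sample, n_clusters=2)
print(completed[2])
```

Observed values are never overwritten. Only the entries marked `None` change between iterations, so the estimate for a missing value is pulled toward the cluster its row belongs to rather than toward a global column mean.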

Online Learning of Bayesian Network Parameters for Incomplete Data of Real World (현실 세계의 불완전한 데이타를 위한 베이지안 네트워크 파라메터의 온라인 학습)

  • Lim, Sung-Soo;Cho, Sung-Bae
    • Journal of KIISE:Computer Systems and Theory / v.33 no.12 / pp.885-893 / 2006
  • The Bayesian network (BN) has emerged in recent years as a powerful technique for handling uncertainty in complex domains. Parameter learning, which finds the most suitable network from a given data set, has been investigated as a way to reduce the time and effort needed to design a BN. Off-line learning requires considerable time and effort to gather enough data, and because the real world is uncertain, complete data are hard to obtain. In this paper, we propose an online learning method for Bayesian network parameters from incomplete data. It offers greater flexibility by learning from incomplete data and greater adaptability to the environment through online learning. A comparison with the Voting EM algorithm proposed by Cohen et al. confirms that the proposed method performs equally well on complete data sets and better on incomplete data sets.

Effects of Additional Constraints on Performance of Portfolio Selection Models with Incomplete Information : Case Study of Group Stocks in the Korean Stock Market (불완전 정보 하에서 추가적인 제약조건들이 포트폴리오 선정 모형의 성과에 미치는 영향 : 한국 주식시장의 그룹주 사례들을 중심으로)

  • Park, Kyungchan;Jung, Jongbin;Kim, Seongmoon
    • Korean Management Science Review / v.32 no.1 / pp.15-33 / 2015
  • Under complete information, introducing additional constraints to a portfolio has a negative impact on performance. However, real-life investments inevitably rely on error-prone estimates, such as expected stock returns. In addition to the reality of incomplete data, the investments of most Korean domestic equity funds are regulated externally by the government as well as internally, which limits the maximum allocation to single stocks and risk-free assets. This paper presents an investment framework that takes such real-life situations into account, based on a newly developed portfolio selection model that considers realistic constraints under incomplete information. We also examine the effects of additional constraints on portfolio performance under incomplete information, taking the well-known Samsung and SK group stocks as performance benchmarks from the launch of each commercial fund, in 2005 and 2007 respectively, up to 2013. The empirical study shows that an investment model built under incomplete information with additional constraints outperformed both a model built without any constraints and the benchmarks in terms of rate of return, standard deviation of returns, and Sharpe ratio.

Nonparametric Inference for the Recurrent Event Data with Incomplete Observation Gaps

  • Kim, Jin-Heum;Nam, Chung-Mo;Kim, Yang-Jin
    • The Korean Journal of Applied Statistics / v.25 no.4 / pp.621-632 / 2012
  • Recurrent event data are common in longitudinal studies such as clinical trials, reliability studies, and the social sciences. However, some subjects temporarily drop out of observation during follow-up and then reappear without notice, as in the Young Traffic Offenders Program (YTOP) data collected by Farmer et al. (2000). In this article we focus on inference for the cumulative mean function of recurrent event data with such incomplete observation gaps. Defining the corresponding risk set is straightforward if the exact intervals of the observation gaps are known. However, when the gaps are incomplete (their starting times are known but their terminating times are unknown), we need to estimate a distribution function for the terminating times of the observation gaps. To do this, we treat the terminating times as interval-censored and estimate their distribution using the EM algorithm proposed by Turnbull (1976). We propose a nonparametric estimator for the cumulative mean function and a nonparametric test to compare the cumulative mean functions of two groups. Through simulation we investigate the finite-sample performance of the proposed estimator and test. Finally, we apply the proposed methods to the YTOP data.
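
The simpler case mentioned in the abstract, where the observation gaps are fully known, can be sketched with a Nelson-Aalen-type estimator: at each event time, the number of observed events is divided by the number of subjects under observation at that time. This is only an illustration of how gaps shrink the risk set. It does not implement the paper's estimator for gaps with unknown terminating times or the Turnbull EM step. The function name `cumulative_mean_function`, the dictionary layout, and the sample data are assumptions made for the example. Event lists are assumed to contain only observed events.

```python
def cumulative_mean_function(subjects):
    """Estimate the cumulative mean function when observation gaps are known.

    Each subject is a dict with
      'events': list of observed event times,
      'end':    end of follow-up,
      'gaps':   list of (start, stop) intervals with no observation.
    Returns a list of (time, estimate) pairs, one per distinct event time.
    """
    def at_risk(subject, t):
        # A subject contributes to the risk set at time t only while
        # still in follow-up and not inside one of its gaps.
        if t > subject['end']:
            return False
        return not any(start < t <= stop for start, stop in subject['gaps'])

    times = sorted({t for s in subjects for t in s['events']})
    estimate, curve = 0.0, []
    for t in times:
        n_events = sum(s['events'].count(t) for s in subjects)
        n_risk = sum(at_risk(s, t) for s in subjects)
        if n_risk > 0:
            estimate += n_events / n_risk
        curve.append((t, estimate))
    return curve


# Two subjects; the second is unobserved on the interval (2.5, 4.0].
sample = [
    {'events': [1.0, 3.0], 'end': 5.0, 'gaps': []},
    {'events': [2.0], 'end': 5.0, 'gaps': [(2.5, 4.0)]},
]
print(cumulative_mean_function(sample))
```

In the sample, the event at time 3.0 occurs while the second subject is inside a gap, so it is divided by a risk set of one rather than two.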

The Effect of the Incomplete Lactation Records for Genetic Evaluations with Random Regression Test-Day Models (RRTDM) in Holstein Cattle (불완전 검정일 기록이 RRTDM을 이용한 홀스타인 젖소의 유전평가에 미치는 영향)

  • Cho, J.H.;Cho, K.H.;Lee, K.J.
    • Journal of Animal Science and Technology / v.47 no.2 / pp.147-158 / 2005
  • The purpose of this study was to determine how daughters' incomplete lactation records affect sires' breeding values in genetic evaluation using a random regression test-day model (RRTDM). First, we estimated genetic parameters and breeding values with the RRTDM for sires whose daughters had complete lactation records. Second, we converted the complete lactation records of specific sires into incomplete records by various methods. Third, we compared the breeding values obtained from complete and incomplete records. Finally, the study aimed to find methods that minimize the estimation errors in young bulls' breeding values. The data were collected from the dairy herd improvement program and comprised a total of 97,562 records from 10,929 first-parity cows with both parents known, recorded since 1999. Breeding values for the daughters of randomly chosen sires were calculated and compared across incomplete records truncated at 90, 150, and 200 days. For milk yield, the sires' breeding-value ranks based on complete lactation records differed greatly from the ranks based on incomplete lactation records (Rank_90 cut, 150 cut, 200 cut). Differences in breeding values for persistency were also found between complete lactation records (per305_full) and incomplete lactation records (per_90 cut, 150 cut, 200 cut). In particular, the differences between per_90 cut and per305_full were very large (from 1.8 kg to 145 kg).

New Wald Test Compared with Chen and Fienberg's for Testing Independence in Incomplete Contingency Tables

  • Kang, Shin-Soo
    • Journal of the Korean Data and Information Science Society / v.16 no.1 / pp.137-144 / 2005
  • In $I \times J$ incomplete contingency tables, the test of independence proposed by Chen and Fienberg (1974) uses $I \times J - 1$ instead of $(I-1)(J-1)$ degrees of freedom without providing much of an increase in the value of the test statistic. For these reasons, the Chen and Fienberg test is expected to have less power. A new Wald test statistic, related to part of the Chen and Fienberg test statistic, is proposed using the delta method. The two tests are compared through Monte Carlo studies.
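
As a worked illustration of the gap between the two degrees-of-freedom counts (the table size here is an assumed example, not taken from the paper), consider a $2 \times 3$ table:

$$
(I-1)(J-1) = (2-1)(3-1) = 2, \qquad I \times J - 1 = 2 \times 3 - 1 = 5 .
$$

A statistic of similar size is therefore referred to a chi-square distribution with 5 rather than 2 degrees of freedom, which raises the critical value and is consistent with the expected loss of power.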

Neural Network-based Decision Class Analysis with Incomplete Information

  • 김재경;이재광;박경삼
    • Proceedings of the Korea Intelligent Information System Society Conference / 1999.03a / pp.281-287 / 1999
  • Decision class analysis (DCA) is viewed as a classification problem in which a set of input data (situation-specific knowledge) and output data (a topological leveled influence diagram (ID)) is given. Situation-specific knowledge is usually provided by a decision maker (DM) with the help of domain experts. However, it is not easy for the DM to know the situation-specific knowledge of a decision problem exactly. This paper presents a methodology for sensitivity analysis of DCA under incomplete information. The purpose of sensitivity analysis in DCA is to identify the effects of incomplete situation-specific frames whose uncertainty affects the importance of each variable in the resulting model. To this end, the suggested methodology consists of two procedures: a generative procedure and an adaptive procedure. An interactive procedure based on the sensitivity analysis is also suggested for building a well-formed ID. These procedures are formally explained and illustrated with a raw material purchasing problem.

Detecting Influential Observations in Multivariate Statistical Analysis of Incomplete Data by PCA (주성분분석에 의한 결손 자료의 영향값 검출에 대한 연구)

  • 김현정;문승호;신재경
    • The Korean Journal of Applied Statistics / v.13 no.2 / pp.383-392 / 2000
  • Since the late 1970s, methods of influence or sensitivity analysis for detecting influential observations have been studied not only in regression and related methods but also in various multivariate methods. When the results of a multivariate analysis depend heavily on a small number of observations, conclusions should be drawn with great care. Similar phenomena may also occur with incomplete data. In this research we study such influential observations in the multivariate statistical analysis of incomplete data. The case of principal component analysis is studied with a numerical example.

Estimation of Product Reliability with Incomplete Field Warranty Data (불완전한 사용현장 보증 데이터를 이용한 제품 신뢰도 추정)

  • Lim, Tae-Jin
    • Journal of Korean Institute of Industrial Engineers / v.28 no.4 / pp.368-378 / 2002
  • As more companies are equipped with data acquisition systems for their products, a huge amount of field warranty data has accumulated. We focus on the case in which the field data for a given product consist of the number of sales and the number of first failures in each period. The number of censored items and their ages are assumed to be given. This type of data is incomplete in the sense that the age of a failed item is unknown. We construct a model for this type of data and propose an algorithm for nonparametric maximum likelihood estimation of the product reliability. Unlike the nonhomogeneous Poisson process (NHPP) model, our method can handle data with censored items as well as data from small populations. A few examples are investigated to characterize the model, and a real field warranty data set is analyzed by the method.

Effects of Elastic Resistance Exercise Using Proprioceptive Neuromuscular Facilitation on Activities of Daily Living of Patient with Incomplete Spinal Cord Injury -Single Subject Design- (PNF에 기초한 탄력저항운동이 불완전 척수 손상 환자의 일상생활동작에 미치는 효과 -단일사례연구-)

  • Kim, Jwa-Jun;Kim, Min-Soo
    • PNF and Movement / v.14 no.3 / pp.245-254 / 2016
  • Purpose: This study investigates the influence of elastic resistance exercise using proprioceptive neuromuscular facilitation (PNF) on the daily activities of a patient with incomplete spinal cord injury. The results are offered as background data for effective intervention in patients with incomplete spinal cord injury. Methods: The subject was a patient with incomplete spinal cord injury at the cervical level (C6). Elastic resistance exercise based on PNF was performed for 30 min daily, five times a week, for eight weeks. The ASIA motor scale was applied to test the muscular strength of the upper limb, and the spinal cord independence measure II (SCIM II) was used to evaluate the capacity for daily activity. Results: With elastic resistance exercise based on PNF, the muscular strength of the upper limb increased and the performance of daily activity improved. Conclusion: Because elastic resistance exercise based on PNF positively influences the ASIA motor scale and SCIM II scores of a patient with incomplete spinal cord injury, it can be used in training programs to improve the patient's capacity for daily activity.