• Title/Abstract/Keywords: random sets

A Comparison of Ensemble Methods Combining Resampling Techniques for Class Imbalanced Data

  • 이희재;이성임
    • 응용통계연구 / Vol. 27, No. 3 / pp.357-371 / 2014
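  • The class imbalance problem of the target variable in data-mining classification has recently received a great deal of attention. To address it, earlier studies preprocessed the original data. Preprocessing methods include undersampling, which reduces the majority class to match the proportion of the minority class; oversampling, which resamples the minority class with replacement to match the proportion of the majority class; and hybrid methods, which first oversample the minority class using techniques such as K-nearest neighbors and then undersample the majority class. Ensemble methods are also known to improve classification performance on imbalanced data, so this paper compares and evaluates the classification performance, on imbalanced data, of several models that combine data preprocessing with ensemble methods.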
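
The two simplest preprocessing schemes described above can be sketched as follows. This is a minimal illustration, not the paper's code: the function names are hypothetical, the random seed is an arbitrary choice, and neither the K-nearest-neighbor hybrid method nor the ensemble models are reproduced.

```python
import random
from collections import Counter


def random_undersample(X, y, seed=0):
    """Randomly drop rows so every class keeps only as many rows as the minority class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = min(counts.values())
    keep = []
    for label in counts:
        idx = [i for i, lab in enumerate(y) if lab == label]
        keep.extend(rng.sample(idx, target))  # sampling WITHOUT replacement
    keep.sort()
    return [X[i] for i in keep], [y[i] for i in keep]


def random_oversample(X, y, seed=0):
    """Resample smaller classes WITH replacement up to the majority-class count."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        idx = [i for i, lab in enumerate(y) if lab == label]
        extra = [rng.choice(idx) for _ in range(target - n)]  # duplicates of existing rows
        X_out.extend(X[i] for i in extra)
        y_out.extend(y[i] for i in extra)
    return X_out, y_out
```

With 8 majority and 2 minority rows, undersampling returns 2 + 2 rows and oversampling returns 8 + 8 rows. Oversampling only duplicates existing minority rows, which is the weakness that K-nearest-neighbor-based hybrids try to address.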

On Some Weak Positive Dependence Notions

  • Kim, Tae-Sung
    • Journal of the Korean Statistical Society / Vol. 23, No. 2 / pp.223-238 / 1994
  • A random vector $\mathbf{X} = (X_1,\cdots,X_n)$ is weakly associated if and only if, for every pair of partitions $\mathbf{X}_1 = (X_{\pi(1)},\cdots,X_{\pi(k)})$, $\mathbf{X}_2 = (X_{\pi(k+1)},\cdots,X_{\pi(n)})$ of $\mathbf{X}$, $P(\mathbf{X}_1 \in A, \mathbf{X}_2 \in B) \geq P(\mathbf{X}_1 \in A)P(\mathbf{X}_2 \in B)$ whenever $A$ and $B$ are open upper sets and $\pi$ is a permutation of $\{1,\cdots,n\}$. In this paper, we develop notions of weak positive dependence that are weaker than a positive version of negative association (weak association) but stronger than positive orthant dependence, using arguments similar to those of Shaked. We also illustrate some concepts of particular interest and derive various properties and interrelationships.
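
For $n = 2$ the definition becomes concrete. The only partition is $(X_1)$, $(X_2)$, and the nonempty proper open upper sets of $\mathbb{R}$ are half-lines $(a, \infty)$, so the condition reduces to $P(X_1 > a, X_2 > b) \geq P(X_1 > a)P(X_2 > b)$ for all $a, b$. The sketch below checks this reduced inequality exactly for a finite joint pmf. The function name and the dict-based pmf format are illustrative assumptions, and the sketch does not cover $n > 2$.

```python
from fractions import Fraction
from itertools import product


def is_weakly_associated_2d(pmf):
    """pmf: dict mapping (x1, x2) -> probability (Fractions give exact arithmetic).

    Checks P(X1>a, X2>b) >= P(X1>a) P(X2>b) for all thresholds a, b.
    For a discrete pmf it suffices to take a and b among the support points:
    the events {X1 > a} and {X2 > b} only change there, and thresholds below
    the support make one side of the inequality trivially an equality.
    """
    xs = sorted({x1 for x1, _ in pmf})
    ys = sorted({x2 for _, x2 in pmf})
    for a, b in product(xs, ys):
        p_joint = sum(p for (x1, x2), p in pmf.items() if x1 > a and x2 > b)
        p_a = sum(p for (x1, _), p in pmf.items() if x1 > a)
        p_b = sum(p for (_, x2), p in pmf.items() if x2 > b)
        if p_joint < p_a * p_b:
            return False
    return True
```

A comonotone pair such as `{(0, 0): Fraction(1, 2), (1, 1): Fraction(1, 2)}` passes, and independent variables pass with equality. The countermonotone pair `{(0, 1): Fraction(1, 2), (1, 0): Fraction(1, 2)}` fails at $a = b = 0$.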

Application of Intelligent Technique for the Efficient Operation of the Flexible Manufacturing System

    • 한국경영과학회지 / Vol. 24, No. 2 / pp.1-15 / 1999
  • This research involves the development and evaluation of a work flow control model for a type of flexible manufacturing system (FMS) called a flexible flow line (FFL). The control model can be considered a hybrid intelligent model in that it uses both computer simulation and a neural network technique. Training data sets were obtained by computer simulation of typical FFL states and were then used to train the neural network model. The model can easily incorporate particular aspects of a specific FFL, such as limited buffer capacity and the dispatching rules used. It also adapts dynamically to system uncertainty caused by factors such as machine breakdowns. The control model is shown to outperform the random releasing method and the Minimal Part Set (MPS) heuristic in terms of machine utilization and work-in-process inventory level.

Numerical measures of Indicating Placement of Posets on Scale from Chains to Antichains

  • Bae, Kyoung-Yul
    • 정보기술과데이타베이스저널 / Vol. 3, No. 1 / pp.97-108 / 1996
  • In this paper we obtain several functions defined on finite partially ordered sets (posets) that may indicate constraints of comparability on sets of teams (tasks, etc.) and that are computationally simple to evaluate, a relatively rare property among graph-based algorithms. Using these functions, we determine a set of numerical coefficients and associated distributions from a computer simulation of certain families of random graphs. From this information, the actual linearity of complicated posets can be estimated. These ideas apply to every area where rankings must be obtained rationally from partial information, e.g., team, scaling, and scheduling theory, as well as theoretical computer science. A theoretical treatment of special and desirable properties of the various functions is also provided, allowing judgments about the sensitivity of these functions to changes in the parameters describing (finite) posets.
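
The abstract does not give the paper's specific functions, so the sketch below uses a stand-in for illustration: the fraction of comparable pairs, which equals 1 for a chain and 0 for an antichain. It is evaluated on "random graph orders", which are assumed here as one example of a random-graph family: each edge $i \to j$ with $i < j$ is kept with probability $p$, and the transitive closure is taken. Both the measure and the family are hypothetical choices, not the author's definitions.

```python
import random


def random_graph_order(n, p, seed=0):
    """Random graph order on {0..n-1}: keep edge i -> j (i < j) with prob. p,
    then take the transitive closure. Returns the set of comparable pairs (i, j), i < j."""
    rng = random.Random(seed)
    below = [set() for _ in range(n)]  # below[j] = all elements strictly below j
    for j in range(n):
        for i in range(j):
            if rng.random() < p:
                below[j].add(i)
                below[j] |= below[i]  # below[i] is already final because i < j
    return {(i, j) for j in range(n) for i in below[j]}


def comparability_fraction(n, comparable_pairs):
    """Hypothetical placement measure: 1.0 for a chain, 0.0 for an antichain."""
    total = n * (n - 1) // 2
    return len(comparable_pairs) / total if total else 1.0
```

Averaging `comparability_fraction(30, random_graph_order(30, p, seed=s))` over many seeds gives an empirical distribution of the coefficient for each `p`. This mirrors, in spirit only, how simulated distributions can place an observed poset on the chain-to-antichain scale.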

A Study on Dynamic Modeling of Photovoltaic Power Generator Systems using Probability and Statistics Theories

  • 조현철
    • 전기학회논문지 / Vol. 61, No. 7 / pp.1007-1013 / 2012
  • Modeling photovoltaic power systems is important for analytically predicting their dynamics in practical applications. This paper presents a novel modeling algorithm for such systems based on probability and statistics theory. We first establish a linear model, composed essentially of Fourier parameter sets, that maps the input variables of a photovoltaic system to its output. The model takes the solar irradiation and the ambient temperature of the photovoltaic modules as its input vector and sequentially estimates the inverter power output. We treat these measurements as random variables and derive a statistical parameter-learning algorithm for the model. The learning algorithm requires computing expectations and joint expectations with respect to solar irradiation and ambient temperature, which are obtained analytically by integral calculus. To test the proposed modeling algorithm, we use real measurement data sets obtained from the Seokwang Solar power plant in Youngcheon, Korea. We demonstrate the reliability and superiority of the proposed photovoltaic system model by examining the error signals between the actual system output and its estimate.
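
A linear-in-parameters model whose learning rule needs only expectations can be sketched as follows. The sketch assumes that the rule reduces to the moment (normal) equations $E[\phi\phi^\top]\theta = E[\phi\, y]$. Several choices are illustrative assumptions rather than the paper's: the Fourier-type feature map $\phi$, the scaling constants `G0` and `T0`, and the use of sample averages instead of the analytically integrated expectations. The fit is also done in one batch, not sequentially.

```python
import math

G0, T0 = 1000.0, 50.0  # hypothetical scaling constants for irradiation and temperature


def features(irradiation, temperature):
    """Hypothetical Fourier-type feature vector phi(u) for u = (irradiation, temperature)."""
    a, b = math.pi * irradiation / G0, math.pi * temperature / T0
    return [1.0, math.cos(a), math.sin(a), math.cos(b), math.sin(b)]


def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [list(row) + [rhs[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(M[i][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def fit_parameters(samples):
    """samples: list of (irradiation, temperature, power_output).

    Estimates E[phi phi^T] and E[phi * y] by sample averages (an assumption;
    the paper evaluates its expectations analytically) and solves for theta."""
    k, m = len(features(0.0, 0.0)), len(samples)
    R = [[0.0] * k for _ in range(k)]
    r = [0.0] * k
    for g, t, y in samples:
        phi = features(g, t)
        for i in range(k):
            r[i] += phi[i] * y / m
            for j in range(k):
                R[i][j] += phi[i] * phi[j] / m
    return solve(R, r)


def predict(theta, irradiation, temperature):
    """Model output theta^T phi(u)."""
    return sum(w * f for w, f in zip(theta, features(irradiation, temperature)))
```

On noise-free synthetic data generated from a known parameter vector, `fit_parameters` recovers that vector. On measured data it returns the least-squares fit within this hypothetical feature family.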

Change-Point Problems in a Sequence of Binomial Variables

  • Jeong, Kwang-Mo
    • Communications for Statistical Applications and Methods / Vol. 3, No. 2 / pp.175-185 / 1996
  • For the change-point problem in a sequence of binomial variables, we consider the maximum likelihood estimator (MLE) of the unknown change-point. Results on its asymptotic distribution are quite limited when the binomial variables have a different number of trials at each time point; Hinkley and Hinkley (1970) give an asymptotic distribution of the MLE for a sequence of Bernoulli random variables. To find the asymptotic distribution, a numerical method such as the bootstrap can be used. We are also interested in inference on the change-point, and we derive confidence sets based on the likelihood ratio test (LRT). We find approximate confidence sets from the bootstrap distribution and compare the two results through an example.
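
The MLE of a single change-point maximizes the profile log-likelihood over all split positions. The sketch below assumes $x_t \sim \mathrm{Bin}(n_t, p_1)$ for the first $\tau$ observations and $\mathrm{Bin}(n_t, p_2)$ afterwards, which allows unequal trial counts $n_t$. It also forms an LRT-style confidence set $\{\tau : 2(\ell_{\max} - \ell(\tau)) \le c\}$. The cutoff `c` is left to the caller; in the abstract's approach it would come from a bootstrap distribution, which is not implemented here. Function names are hypothetical.

```python
import math


def _loglik(successes, trials):
    """Binomial log-likelihood at p_hat = successes / trials, dropping the
    binomial coefficients (they do not depend on the change-point)."""
    if trials == 0:
        return 0.0
    p = successes / trials
    ll = 0.0
    if successes > 0:
        ll += successes * math.log(p)
    if trials - successes > 0:
        ll += (trials - successes) * math.log(1 - p)
    return ll


def profile_loglik(x, n):
    """tau = number of observations before the change, tau in 1..len(x)-1."""
    return {
        tau: _loglik(sum(x[:tau]), sum(n[:tau])) + _loglik(sum(x[tau:]), sum(n[tau:]))
        for tau in range(1, len(x))
    }


def change_point_mle(x, n):
    prof = profile_loglik(x, n)
    return max(prof, key=prof.get)


def lr_confidence_set(x, n, cutoff):
    """All tau whose LR statistic 2 * (max loglik - loglik(tau)) is <= cutoff."""
    prof = profile_loglik(x, n)
    top = max(prof.values())
    return sorted(tau for tau, ll in prof.items() if 2 * (top - ll) <= cutoff)
```

For `x = [1, 2, 1, 2, 8, 9, 8, 9]` with 10 trials each, the estimate is `tau = 4`, i.e., the success probability changes after the fourth observation.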

A computational note on maximum likelihood estimation in random effects panel probit model

  • Lee, Seung-Chun
    • Communications for Statistical Applications and Methods / Vol. 26, No. 3 / pp.315-323 / 2019
  • Panel data sets have recently been developed in various areas, and many recent studies have analyzed panel, or longitudinal, data. A dichotomous dependent variable often occurs in survival analysis and in biomedical and epidemiological studies, and such data are analyzed with a generalized linear mixed effects model (GLMM). The most common estimation method for binary panel data is probably maximum likelihood (ML). Many statistical packages provide ML estimates; however, the estimates are computed from a numerically approximated likelihood function. For instance, the R package pglm (Croissant, 2017) approximates the likelihood function by Gauss-Hermite quadrature, while Rchoice (Sarrias, Journal of Statistical Software, 74, 1-31, 2016) uses a Monte Carlo integration method for the approximation. As a result, different packages can give different results because they use different numerical computation methods. In this note, we discuss the pros and cons of numerical methods compared with the exact computation method.
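
The two approximations named above can be contrasted on a single individual's likelihood. The sketch assumes a random-intercept probit with one covariate and no fixed intercept: $L_i = \int \prod_t \Phi\big(q_t(\beta x_t + \sigma u)\big)\,\phi(u)\,du$ with $q_t = 2y_t - 1$. That model form is a simplification chosen for illustration. This is not the code of pglm or Rchoice. The Gauss-Hermite nodes are found here by bisection on normalized Hermite polynomials rather than by any library routine.

```python
import math
import random


def _hermite_norm(n, z):
    """(p_n(z), p_{n-1}(z)) for normalized Hermite polynomials (Numerical Recipes recurrence)."""
    p1, p2 = math.pi ** -0.25, 0.0
    for j in range(1, n + 1):
        p3, p2 = p2, p1
        p1 = z * math.sqrt(2.0 / j) * p2 - math.sqrt((j - 1) / j) * p3
    return p1, p2


def gauss_hermite(n, grid=4001):
    """Nodes/weights for integral f(z) exp(-z^2) dz: bracket roots of p_n on a grid,
    bisect, then use w = 2 / (2n * p_{n-1}(z)^2)."""
    L = math.sqrt(2 * n + 1) + 1.0  # every root lies inside (-L, L)
    zs = [-L + 2 * L * k / grid for k in range(grid + 1)]  # odd grid avoids z = 0 exactly
    nodes = []
    for a, b in zip(zs, zs[1:]):
        fa, fb = _hermite_norm(n, a)[0], _hermite_norm(n, b)[0]
        if fa * fb < 0:
            for _ in range(100):
                m = 0.5 * (a + b)
                fm = _hermite_norm(n, m)[0]
                if fa * fm <= 0:
                    b = m
                else:
                    a, fa = m, fm
            nodes.append(0.5 * (a + b))
    weights = [2.0 / (2 * n * _hermite_norm(n, z)[1] ** 2) for z in nodes]
    return nodes, weights


def _Phi(v):
    return 0.5 * (1.0 + math.erf(v / math.sqrt(2.0)))


def individual_integrand(u, y, x, beta, sigma):
    """prod_t Phi(q_t * (beta * x_t + sigma * u)) with q_t = 2 y_t - 1."""
    out = 1.0
    for yt, xt in zip(y, x):
        out *= _Phi((2 * yt - 1) * (beta * xt + sigma * u))
    return out


def lik_gauss_hermite(y, x, beta, sigma, n_nodes=20):
    # With u ~ N(0,1): E[g(u)] = pi^(-1/2) * integral g(sqrt(2) z) exp(-z^2) dz
    nodes, weights = gauss_hermite(n_nodes)
    total = sum(w * individual_integrand(math.sqrt(2) * z, y, x, beta, sigma)
                for z, w in zip(nodes, weights))
    return total / math.sqrt(math.pi)


def lik_monte_carlo(y, x, beta, sigma, draws=20000, seed=0):
    rng = random.Random(seed)
    return sum(individual_integrand(rng.gauss(0.0, 1.0), y, x, beta, sigma)
               for _ in range(draws)) / draws
```

The two functions give close but not identical values. The quadrature error depends on `n_nodes`, and the Monte Carlo error depends on `draws` and `seed`. This is the kind of package-to-package discrepancy the note discusses.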

Comparative Sensitivity of PCR Primer Sets for Detection of Cryptosporidium parvum

  • Yu, Jae-Ran;Lee, Soo-Ung;Park, Woo-Yoon
    • Parasites, Hosts and Diseases / Vol. 47, No. 3 / pp.293-297 / 2009
  • Improved methods for detecting Cryptosporidium oocysts in environmental and clinical samples are urgently needed to improve the detection of cryptosporidiosis. We compared the sensitivity of 7 PCR primer sets for the detection of Cryptosporidium parvum. Each target gene was amplified by PCR or nested PCR using serially diluted DNA extracted from purified C. parvum oocysts. The target genes included the Cryptosporidium oocyst wall protein (COWP) gene, the small subunit ribosomal RNA (SSU rRNA) gene, and random amplified polymorphic DNA. The detection limit of the PCR method ranged from $10^3$ to $10^4$ oocysts, whereas the nested PCR method was able to detect $10^0$ to $10^2$ oocysts. A second-round amplification of the target genes showed that the nested primer set specific for the COWP gene was the most sensitive of the primer sets tested in this study and would therefore be useful for the detection of C. parvum.

Quantitative Evaluation of Setup Error for Whole Body Stereotactic Radiosurgery by Image Registration Technique

  • Kim, Young-Seok;Yi, Byong-Yong;Kim, Jong-Hoon;Ahn, Seung-Do;Lee, Sang-wook;Im, Ki-Chun;Park, Eun-Kyung
    • 한국의학물리학회:학술대회논문집 / 한국의학물리학회 2002 Proceedings / pp.103-105 / 2002
  • The whole body stereotactic radiosurgery (WBSRS) technique is believed to be useful for metastatic lesions as well as for relatively small primary tumors in the trunk. Unlike stereotactic radiosurgery for intracranial lesions, WBSRS is inherently limited in how well the whole body can be immobilized, which makes reliable setup reproducibility difficult to achieve. It is therefore essential to develop an objective and quantitative method of evaluating setup error for WBSRS, and an evaluation technique using image registration has been developed for this purpose. Point-pair image registrations using WBSRS frame coordinates were performed between two sets of CT images acquired before each treatment. Positional displacements could then be determined by a volumetric comparison of the planning target volume (PTV) between the reference and registered image sets. Twenty-eight sets of CT images from 19 WBSRS patients treated at Asan Medical Center were analyzed with this method to determine the random setup error of each treatment. Quantitative analysis of setup error by image registration with WBSRS frame coordinates is objective and clinically useful.
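
The abstract does not state how a displacement is extracted from the PTV comparison. The sketch below therefore assumes a common convention. Each fraction's displacement is the shift of the PTV centroid between the reference and registered image sets. Over fractions, the per-axis mean is treated as the systematic component and the per-axis sample standard deviation as the random component. Function names and the voxel-list input format are hypothetical.

```python
import math


def centroid(points):
    """Mean (x, y, z) of a list of PTV voxel coordinates (e.g., in mm)."""
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(3))


def displacement(reference_ptv, registered_ptv):
    """Assumed displacement: centroid of registered PTV minus centroid of reference PTV."""
    c_ref, c_reg = centroid(reference_ptv), centroid(registered_ptv)
    return tuple(b - a for a, b in zip(c_ref, c_reg))


def setup_error_summary(shifts):
    """Per-axis mean (systematic part) and sample SD (random part) over fractions."""
    n = len(shifts)
    means = tuple(sum(s[k] for s in shifts) / n for k in range(3))
    sds = tuple(
        math.sqrt(sum((s[k] - means[k]) ** 2 for s in shifts) / (n - 1)) for k in range(3)
    )
    return means, sds
```

A pure translation of the PTV by (1, -2, 0.5) mm is recovered as that displacement. The random component needs at least two fractions per patient to be defined.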

Selection of measurement sets in static structural identification of bridges using observability trees

  • Lozano-Galant, Jose Antonio;Nogal, Maria;Turmo, Jose;Castillo, Enrique
    • Computers and Concrete / Vol. 15, No. 5 / pp.771-794 / 2015
  • This paper proposes an innovative method for selecting measurement sets in the static parameter identification of concrete or steel bridges. The method is shown to be a systematic tool for the first step of Structural System Identification (SSI) by observability techniques: the selection of adequate measurement sets. Observability trees show graphically how the unknown estimates are successively calculated throughout the recursive process of the observability analysis. They are shown to be an intuitive and powerful tool for measurement selection in beam bridges that can also be applied to complex structures such as cable-stayed bridges. In such structures, however, the strong coupling among structural parameters makes it advisable to adopt a set of simplifications to keep the trees intuitive. In addition, a set of guidelines is provided to facilitate the representation of observability trees for this kind of structure. These guidelines are applied to bridges of increasing complexity to explain how the geometric characteristics of the structure (e.g., deck inclination, type of pylon-deck connection, or the presence of stay cables) affect the observability trees. The importance of observability trees is justified by a statistical analysis of randomly selected measurement sets, which shows that, for the analyzed structure, the probability of randomly selecting an adequate measurement set with the minimum number of measurements is practically negligible. Furthermore, even larger measurement sets might not allow adequate SSI of the unknown parameters. Finally, to show the potential of observability trees, a large-scale concrete cable-stayed bridge is also analyzed. Comparison with the number of measurements required in the literature again shows the advantages of the proposed method.
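
The notion of "the probability that a randomly selected measurement set is adequate" can be made concrete with a deliberately simplified model. This is not the paper's observability-tree method, which works with the structure's stiffness equations. The sketch assumes each candidate measurement gives one linear equation in the unknown parameters. A set is then called adequate when its equations have full column rank, so every unknown is identifiable. Enumerating all subsets of size `k` gives the exact probability that a uniformly random choice is adequate. The candidate rows are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def rank(rows):
    """Rank of a list of rational row vectors by exact Gauss-Jordan elimination."""
    m = [[Fraction(v) for v in r] for r in rows]
    rk, ncols = 0, (len(m[0]) if m else 0)
    for c in range(ncols):
        piv = next((i for i in range(rk, len(m)) if m[i][c] != 0), None)
        if piv is None:
            continue  # no pivot in this column: that direction is not determined
        m[rk], m[piv] = m[piv], m[rk]
        for i in range(len(m)):
            if i != rk and m[i][c] != 0:
                f = m[i][c] / m[rk][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[rk])]
        rk += 1
    return rk


def fraction_adequate(candidates, k):
    """Share of all k-element measurement subsets that identify every unknown
    (= probability that a uniformly random k-subset is adequate)."""
    n_unknowns = len(candidates[0])
    subsets = list(combinations(range(len(candidates)), k))
    good = sum(1 for s in subsets if rank([candidates[i] for i in s]) == n_unknowns)
    return good / len(subsets)
```

With the hypothetical candidates `[[1, 0, 0], [0, 1, 0], [1, 1, 0], [0, 0, 1], [2, 2, 0]]` and three unknowns, only half of the minimum-size (k = 3) subsets are adequate. Redundant rows such as `[1, 1, 0]` and `[2, 2, 0]` waste a measurement without adding information.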