• Title/Summary/Keyword: statistical representation

An Empirical Central Limit Theorem for the Kaplan-Meier Integral Process on [0,$\infty$)

  • Bae, Jong-Sig
    • Journal of the Korean Statistical Society / v.26 no.2 / pp.231-243 / 1997
  • In this paper we investigate weak convergence of integral processes whose index set is the non-compact infinite time interval. Our first goal is to develop the empirical central limit theorem, as random elements of D[0, $\infty$), for an integral process constructed from iid variables. In developing the weak convergence as random elements of D[0, $\infty$), we use a result of Ossiander(4) whose proof depends heavily on the total boundedness of the index set. Our next goal is to establish the empirical central limit theorem for the Kaplan-Meier integral process as random elements of D[0, $\infty$). To achieve this goal, we use the above iid result, a representation of Stute(6) for the Kaplan-Meier integral, and a lemma on the uniform order of convergence. The first result, in some sense, generalizes the empirical central limit theorem of Pollard(5), where the process is regarded as random elements of D[$-\infty$, $\infty$] and the sample paths of the limiting Gaussian process may jump. The second result generalizes the first result to the random censorship model. The latter also generalizes the one-dimensional central limit theorem of Stute(6) to a process version. These results may be used in nonparametric statistical inference.

Data-based On-line Diagnosis Using Multivariate Statistical Techniques (다변량 통계기법을 활용한 데이터기반 실시간 진단)

  • Cho, Hyun-Woo
    • Journal of the Korea Academia-Industrial cooperation Society / v.17 no.1 / pp.538-543 / 2016
  • For good product quality and plant safety, it is necessary to implement on-line monitoring and diagnosis schemes for industrial processes. Combined with monitoring systems, reliable diagnosis schemes seek to find assignable causes of the process variables responsible for faults or special events in processes. This study deals with the real-time diagnosis of complicated industrial processes through the intelligent use of multivariate statistical techniques. The presented diagnosis scheme consists of a classification-based diagnosis using nonlinear representation and filtering of process data. A case study based on simulation data was conducted, and the diagnosis results were obtained using different diagnosis schemes. In addition, the choice of future estimation methods was evaluated. The results showed that the presented scheme outperformed the other schemes.

Classification of algae in watersheds using elastic shape

  • Tae-Young Heo;Jaehoon Kim;Min Ho Cho
    • Communications for Statistical Applications and Methods / v.31 no.3 / pp.309-322 / 2024
  • Identifying algae in water is important for managing algal blooms, which have a great impact on drinking water supply systems. Various microscopic approaches have been developed for algae classification, many of them based on the morphological features of algae. However, there have seldom been mathematical frameworks for comparing the shape of algae, represented as a planar continuous curve obtained from an image. In this work, we describe a recent framework for computing the shape distance between two different algae based on the elastic metric and a novel functional representation called the square root velocity function (SRVF). We further introduce statistical procedures for multiple shapes of algae, including computing the sample mean and the sample covariance and performing principal component analysis (PCA). Based on the shape distance, we classify six algal species in watersheds experiencing algal blooms, including three cyanobacteria (Microcystis, Oscillatoria, and Anabaena), two diatoms (Fragilaria and Synedra), and one green alga (Pediastrum). We provide and compare the classification performance of various distance-based and model-based methods. We additionally compare the elastic shape distance to a non-elastic distance using nearest neighbor classifiers.
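
As a rough illustration of the SRVF mentioned in this abstract, the sketch below computes a discrete version of q(t) = f'(t) / sqrt(|f'(t)|) for a planar curve sampled at equally spaced parameter values on [0, 1]. The function names, the finite-difference scheme, and the sample curve are assumptions made for this example and are not taken from the paper. The distance shown is only the plain L2 distance between two SRVFs; the full elastic shape distance would additionally optimize over rotations and reparametrizations, which is omitted here.

```python
import math

def srvf(points):
    """Discrete SRVF of a planar curve given as a list of (x, y) samples.

    Assumes the samples are equally spaced in the parameter t on [0, 1].
    Returns one SRVF value per segment.
    """
    n = len(points) - 1              # number of segments
    dt = 1.0 / n
    q = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        vx, vy = (x1 - x0) / dt, (y1 - y0) / dt   # finite-difference velocity
        speed = math.hypot(vx, vy)
        if speed == 0.0:
            q.append((0.0, 0.0))     # SRVF is defined as 0 where the velocity vanishes
        else:
            s = math.sqrt(speed)
            q.append((vx / s, vy / s))
    return q

def l2_distance(q1, q2):
    """L2 distance between two SRVFs with the same number of segments (no alignment)."""
    dt = 1.0 / len(q1)
    total = sum(((a - c) ** 2 + (b - d) ** 2) * dt
                for (a, b), (c, d) in zip(q1, q2))
    return math.sqrt(total)

# Hypothetical example: a straight segment from (0, 0) to (3, 4) and a translated copy.
line = [(0.0, 0.0), (0.75, 1.0), (1.5, 2.0), (2.25, 3.0), (3.0, 4.0)]
shifted = [(x + 10.0, y - 2.0) for x, y in line]

q_line = srvf(line)
q_shifted = srvf(shifted)

# The squared L2 norm of the SRVF equals the length of the curve.
energy = sum(a * a + b * b for a, b in q_line) / len(q_line)
```

Because the SRVF depends only on the velocity, translating a curve leaves it unchanged, so `l2_distance(q_line, q_shifted)` is zero for this pair.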

Theoretical Background for Data-driven Integration of Raster-based Geological Information (격자형 지질정보의 자료유도 통합을 위한 이론적 배경)

  • Lee, Ki-Won;Chi, Kwang-Hoon
    • Journal of Korean Society for Geospatial Information Science / v.3 no.1 s.5 / pp.115-121 / 1995
  • Recently, spatial integration for mineral exploration has been regarded as an important task in various geological applications of GIS. Therefore, the theoretical bases of data representation and reasoning associated with Dempster-Shafer theory and fuzzy theory were systematically reviewed as data-driven integration methodologies for raster-based geoinformation; they are distinguished from target-driven methodologies based on a statistical background. In previous applications to mineral exploration, these methods have been shown to provide useful information related to hidden target mineral deposits, and some suggestions in this study are expected to help further real applications, including the representation, reasoning, and interpretation stages, in order to obtain a decision-supporting layer.
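
For readers unfamiliar with Dempster-Shafer reasoning, the sketch below shows Dempster's rule of combination for two mass functions over a small frame of discernment. The frame ("deposit" versus "barren"), the two evidence layers, and all mass values are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
from itertools import product

def combine(m1, m2):
    """Dempster's rule of combination.

    m1 and m2 map focal elements (frozensets) to masses that each sum to 1.
    Mass assigned to empty intersections is treated as conflict and
    removed by renormalization.
    """
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict; the rule is undefined")
    return {k: v / (1.0 - conflict) for k, v in combined.items()}

# Hypothetical frame of discernment for one raster cell.
DEPOSIT = frozenset({"deposit"})
BARREN = frozenset({"barren"})
THETA = DEPOSIT | BARREN             # total ignorance

# Hypothetical evidence from two map layers.
layer_a = {DEPOSIT: 0.6, THETA: 0.4}
layer_b = {DEPOSIT: 0.5, BARREN: 0.2, THETA: 0.3}

fused = combine(layer_a, layer_b)
belief_deposit = fused[DEPOSIT]      # belief in {deposit} after fusion
```

In a raster setting, a combination of this kind would be applied cell by cell to the layers being integrated.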

Vehicle Recognition using NMF in Urban Scene (도심 영상에서의 비음수행렬분해를 이용한 차량 인식)

  • Ban, Jae-Min;Lee, Byeong-Rae;Kang, Hyun-Chul
    • The Journal of Korean Institute of Communications and Information Sciences / v.37 no.7C / pp.554-564 / 2012
  • Vehicle recognition consists of two steps: the vehicle region detection step and the vehicle identification step, which is based on features extracted from the detected region. Features obtained by linear transformations reduce dimensionality, represent statistical characteristics, and are robust to translation and rotation of objects. Among the linear transformations, NMF (Non-negative Matrix Factorization) is a part-based representation. Therefore, we can extract sparse NMF features and improve the vehicle recognition rate by representing local features of a car as basis vectors. In this paper, we propose a feature extraction method using NMF that is suitable for vehicle recognition, and verify its recognition rate. We also compare the vehicle recognition rate for occluded areas using SNMF (sparse NMF), which has basis vectors with constraints, and an LVQ2 neural network. We show that the features obtained through the proposed NMF are robust in urban scenes where occlusions frequently occur.
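
To make the NMF idea concrete, the following is a minimal sketch of the standard multiplicative-update algorithm (Lee and Seung) for the Frobenius-norm objective, factoring a non-negative matrix V into W (basis vectors) and H (coefficients). It is a generic textbook variant, not the feature extraction procedure proposed in the paper; the helper names, the toy matrix, the rank, and the iteration count are assumptions for illustration.

```python
import random

def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def frob_error(v, w, h):
    """Squared Frobenius norm of V - W H."""
    wh = matmul(w, h)
    return sum((x - y) ** 2 for rv, rw in zip(v, wh) for x, y in zip(rv, rw))

def nmf(v, rank, iters=200, seed=0, eps=1e-9):
    """Factor a non-negative n x m matrix v into w (n x rank) and h (rank x m)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    # Strictly positive random initialization keeps every update well defined.
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), elementwise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T), elementwise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n)]
    return w, h

# Toy non-negative data matrix (rows could stand for pixels, columns for images).
V = [[1.0, 2.0, 3.0],
     [2.0, 4.0, 6.0],
     [1.0, 0.0, 1.0],
     [3.0, 2.0, 5.0]]

w0, h0 = nmf(V, rank=2, iters=0)     # the random starting point
W, H = nmf(V, rank=2)                # after 200 multiplicative updates
```

Because the updates only multiply by non-negative ratios, W and H stay non-negative throughout, which is what gives NMF its additive, part-based character.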

Statistical Modeling of Learning Curves with Binary Response Data (이항 반응 자료에 대한 학습곡선의 모형화)

  • Lee, Seul-Ji;Park, Man-Sik
    • Communications for Statistical Applications and Methods / v.19 no.3 / pp.433-450 / 2012
  • As a worker performs a certain operation repeatedly, he tends to become familiar with the job and to complete it in a shorter time. That is, efficiency improves due to his accumulated knowledge, experience, and skill with regard to the operation. The time invested per output is reduced by repeating the operation. This phenomenon is referred to as the learning curve effect. A learning curve is a graphical representation of the changing rate of learning. According to previous literature, learning curve effects are determined by subjective pre-assigned factors. In this study, we propose a new statistical model to clarify the learning curve effect by means of a basic cumulative distribution function. This work mainly focuses on the statistical modeling of binary data. We employ the Newton-Raphson method for estimation and the Delta method for the construction of confidence intervals. We also perform a real data analysis.
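
The abstract does not specify which cumulative distribution function the model uses. Purely as an illustration of Newton-Raphson estimation with binary responses, the sketch below assumes a logistic CDF, P(success at trial t) = 1 / (1 + exp(-(a + b t))), and made-up trial data; the function name and the data are hypothetical and do not reproduce the authors' model. The Delta-method confidence intervals are not covered.

```python
import math

def fit_logistic(t, y, iters=25):
    """Newton-Raphson maximum likelihood for P(y = 1 | t) = logistic(a + b * t)."""
    a = b = 0.0
    for _ in range(iters):
        g0 = g1 = 0.0                # score vector (gradient of the log-likelihood)
        h00 = h01 = h11 = 0.0        # information matrix (negative Hessian)
        for ti, yi in zip(t, y):
            p = 1.0 / (1.0 + math.exp(-(a + b * ti)))
            r = yi - p
            wgt = p * (1.0 - p)
            g0 += r
            g1 += r * ti
            h00 += wgt
            h01 += wgt * ti
            h11 += wgt * ti * ti
        det = h00 * h11 - h01 * h01
        # Newton step: theta <- theta + H^{-1} g, using the explicit 2 x 2 inverse.
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b

# Hypothetical data: trial number and whether the task was completed successfully.
trials = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
success = [0, 0, 1, 0, 0, 1, 1, 0, 1, 1]

a_hat, b_hat = fit_logistic(trials, success)

# At the maximum likelihood estimate both score components should vanish.
probs = [1.0 / (1.0 + math.exp(-(a_hat + b_hat * ti))) for ti in trials]
score_a = sum(yi - pi for yi, pi in zip(success, probs))
score_b = sum((yi - pi) * ti for yi, pi, ti in zip(success, probs, trials))
```

A positive `b_hat` corresponds to a success probability that rises with repetition, which is the qualitative shape a learning curve is meant to capture.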

A Semantic Text Model with Wikipedia-based Concept Space (위키피디어 기반 개념 공간을 가지는 시멘틱 텍스트 모델)

  • Kim, Han-Joon;Chang, Jae-Young
    • The Journal of Society for e-Business Studies / v.19 no.3 / pp.107-123 / 2014
  • Current text mining techniques suffer from the problem that conventional text representation models cannot express the semantic or conceptual information of textual documents written in natural languages. The conventional text models, which include the vector space model, Boolean model, statistical model, and tensor space model, represent textual documents as bags of words. These models express documents only with the term literals used for indexing and the frequency-based weights of the corresponding terms; that is, they ignore the semantic information, sequential order information, and structural information of terms. Most text mining techniques have been developed on the assumption that the given documents are represented with 'bag-of-words' based text models. However, in the current big data era, a new paradigm of text representation model is required that can analyse huge amounts of textual documents more precisely. Our text model regards the 'concept' as an independent space on a par with the 'term' and 'document' spaces used in the vector space model, and it expresses the relatedness among the three spaces. To develop the concept space, we use Wikipedia data, each entry of which defines a single concept. Consequently, a document collection is represented as a third-order tensor with semantic information, and the proposed model is called the text cuboid model in our paper. Through experiments using the popular 20NewsGroup document corpus, we demonstrate the superiority of the proposed text model in terms of document clustering and concept clustering.

On-line Process Data-driven Diagnostics Using Statistical Techniques (실시간 공정 데이터와 통계적 방법에 기반한 이상진단)

  • Cho, Hyun-Woo
    • Journal of the Korea Academia-Industrial cooperation Society / v.19 no.3 / pp.40-45 / 2018
  • Intelligent monitoring and diagnosis of production processes based on multivariate statistical methods has been one of the important tasks for safety and quality, because faults and unexpected events may have serious impacts on the operation of processes. This study proposes a diagnostic scheme based on an effective representation of process measurement data and evaluates it using simulation process data. The effects of utilizing a preprocessing step and nonlinear statistical methods are also tested using fifteen faults of the simulation process. Results show that the proposed scheme produced more reliable results and outperformed the other tested schemes, which used neither the filtering step nor nonlinear methods. The proposed scheme is expected to be robust to process noise and easy to develop because it requires neither rigorous mathematical process models nor expert knowledge.

A Study on Knowledge for the Teaching of Variability and Reasoning about Variation (변이성과 변이 추론의 지도를 위한 지식)

  • Ko, Eun-Sung;Lee, Kyeong-Hwa
    • Journal of Educational Research in Mathematics / v.20 no.4 / pp.493-509 / 2010
  • Researchers have suggested that educators should focus their attention on variability and reasoning about variation as a means of developing students' statistical thinking in school mathematics. This paper investigated knowledge for the teaching of variability and reasoning about variation: what the sources of variability are, how to cope with variability, what the types of variability are, how to recognize variability, and the relationship between statistical problem solving and variability. The results are as follows: discussion of the sources of variability and of how to cope with it promotes students' awareness of different types of variability and their motivation in the subsequent steps of the statistical activity; an emphasis on reasoning about variation when teaching the representation of data accords with the objectives of statistics education; and the curriculum for statistics education, which has a content-oriented arrangement, needs to be reexamined.

Middle School Students' Critical Thinking Based on Measurement and Scales for the Selection and Interpretation of Data and Graphical Presentations (중학생들의 자료와 그래프의 선택과 해석에서 측정과 척도에 근거한 비판적 사고 연구)

  • Yun, Hyung-Ju;Ko, Eun-Sung;Yoo, Yun-Joo
    • Journal of Educational Research in Mathematics / v.22 no.2 / pp.137-162 / 2012
  • Learning graphical representations for statistical data requires an understanding of the context related to measurement in statistical investigation, since the choice of representation and the features of the graph selected to represent the data are determined by the purpose and context of data collection and the types of data collected. This study investigated whether middle school students can think critically about measurement and scales by integrating contextual knowledge and statistical knowledge. According to our results, the students lacked critical thinking related to the measurement process of data and the scales of graphical representations. In particular, the students tended not to question the information provided by data and graphs. They also lacked the competence to critique data and graphs and to make flexible judgements in light of the context, including the statistical purpose.
