• Title/Summary/Keyword: statistical techniques

1,664 search results

A Study on Detection of Influential Observations on A Subset of Regression Parameters in Multiple Regression

  • Park, Sung Hyun;Oh, Jin Ho
    • Communications for Statistical Applications and Methods
    • /
    • v.9 no.2
    • /
    • pp.521-531
    • /
    • 2002
  • Various diagnostic techniques for identifying influential observations are mostly based on the deletion of a single observation. While such techniques can satisfactorily identify influential observations in many cases, they will not always succeed because of masking effects. It is therefore necessary to develop techniques that examine the potentially influential effects of a subset of observations. Partial regression plots can be used to examine an influential observation for a single parameter in multiple linear regression, but it is often desirable to detect influential observations for a subset of regression parameters when interest centers on a selected subset of independent variables. In this paper, we therefore propose a measure M that can be used effectively to detect influential observations on a subset of regression parameters in multiple linear regression. An illustrative example shows how the new measure M identifies such observations.
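
The measure M itself is not given in the abstract, but the single-deletion diagnostics it builds on are standard. As an illustration (synthetic data, not the paper's measure), the following sketch computes DFBETAS, the scaled change in each regression coefficient when one observation is deleted:

```python
import numpy as np

def dfbetas(X, y):
    """DFBETAS: scaled change in each coefficient when one observation is deleted."""
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)      # leverages (hat-matrix diagonal)
    beta = XtX_inv @ X.T @ y
    e = y - X @ beta                                  # ordinary residuals
    s2 = e @ e / (n - p)
    # residual variance estimate with observation i removed
    s2_del = ((n - p) * s2 - e**2 / (1 - h)) / (n - p - 1)
    # beta - beta_(i): one row per deleted observation
    dbeta = (X @ XtX_inv) * (e / (1 - h))[:, None]
    return dbeta / np.sqrt(s2_del[:, None] * np.diag(XtX_inv)[None, :])

rng = np.random.default_rng(0)
x = rng.normal(size=30)
X = np.c_[np.ones(30), x]
y = 2.0 + 3.0 * x + rng.normal(0, 0.5, 30)
y[5] += 10.0                                          # plant one influential point
flagged = int(np.abs(dfbetas(X, y)).max(axis=1).argmax())
```

Observations whose |DFBETAS| exceed a cutoff such as 2/√n would be flagged; a subset-oriented measure like the paper's M generalizes this idea beyond single deletions, where masking can hide joint influence.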

A Comprehensive Overview of RNA Deconvolution Methods and Their Application

  • Yebin Im;Yongsoo Kim
    • Molecules and Cells
    • /
    • v.46 no.2
    • /
    • pp.99-105
    • /
    • 2023
  • Tumors are surrounded by a variety of tumor-microenvironmental cells. Profiling the individual cells within tumor tissues is crucial for characterizing the tumor microenvironment and its therapeutic implications. Since single-cell technologies are still not cost-effective, scientists have developed many statistical deconvolution methods to delineate cellular characteristics from bulk transcriptome data. Here, we present an overview of 20 deconvolution techniques, including cutting-edge techniques established recently. We categorized the deconvolution techniques by three primary criteria: the characteristics of the methodology, the use of prior knowledge of cell types, and the outcome of the methods. We highlight the advantages of recent deconvolution tools that are based on probabilistic models. Moreover, we illustrate two common scenarios for applying deconvolution methods to the study of tumor microenvironments. This comprehensive review will serve as a guideline for researchers selecting the appropriate method for their application of deconvolution.
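
As a rough sketch of what reference-based deconvolution does (the signature matrix and bulk profile here are simulated, and real tools add normalization, marker-gene selection, or probabilistic modeling), non-negative least squares can recover cell-type mixing fractions from a bulk expression profile:

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(1)
n_genes, n_types = 200, 3
# signature matrix: expected expression of each gene in each cell type (simulated)
S = rng.gamma(2.0, 1.0, size=(n_genes, n_types))
true_frac = np.array([0.6, 0.3, 0.1])
# bulk sample = mixture of the cell-type signatures plus measurement noise
bulk = S @ true_frac + rng.normal(0, 0.01, n_genes)

coef, _ = nnls(S, bulk)          # non-negative mixing weights
frac = coef / coef.sum()         # normalize to cell-type fractions
```

The non-negativity constraint is what keeps the estimated fractions interpretable as cell-type proportions; probabilistic methods replace the least-squares criterion with an explicit noise model.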

Characterization of the Smoothest Density with Given Moments

  • Hong, Changkon
    • Journal of the Korean Statistical Society
    • /
    • v.30 no.3
    • /
    • pp.367-385
    • /
    • 2001
  • In this paper, we characterize the smoothest density with prescribed moments. Hong and Kim (1995) proved the existence and uniqueness of such a density. We introduce the general optimal control problem and prove some theorems characterizing the minimizer using optimal control techniques.


Assessment for Efficiency of Two-Stage Randomized Response Technique

  • Park, Kyung-Ho
    • Communications for Statistical Applications and Methods
    • /
    • v.7 no.2
    • /
    • pp.427-433
    • /
    • 2000
  • In this paper, we review several two-stage randomized response techniques for gathering self-report data when respondents are asked a sensitive question. We also compare the efficiency and privacy protection of the two-stage randomized response procedures, and finally we find optimal parameter conditions.
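
For concreteness, here is a simulation of one well-known two-stage design of the Mangat–Singh type (the paper's exact procedures are not given in the abstract, and the parameter values below are arbitrary): with probability T a respondent answers the sensitive question truthfully; otherwise a Warner device poses the sensitive question with probability P and its negation with probability 1-P.

```python
import numpy as np

def two_stage_rrt(pi, T, P, n, rng):
    """Simulate a Mangat-Singh style two-stage randomized response survey.
    pi: true proportion with the sensitive attribute
    T:  probability of a truthful direct answer in stage 1
    P:  probability the stage-2 Warner device poses the sensitive question"""
    sensitive = rng.random(n) < pi
    direct = rng.random(n) < T               # stage 1: answer truthfully?
    warner_q = rng.random(n) < P             # stage 2: which question the device picks
    yes = np.where(direct, sensitive, np.where(warner_q, sensitive, ~sensitive))
    lam_hat = yes.mean()                     # observed "yes" proportion
    # P(yes) = pi*[T + (1-T)(2P-1)] + (1-T)(1-P), solved for pi:
    return (lam_hat - (1 - T) * (1 - P)) / (T + (1 - T) * (2 * P - 1))

rng = np.random.default_rng(2)
est = np.mean([two_stage_rrt(0.2, 0.5, 0.7, 2000, rng) for _ in range(200)])
```

The efficiency comparisons in the paper amount to studying how the variance of this estimator, and the respondent's privacy, change with T and P.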


Algorithm for the Constrained Chebyshev Estimation in Linear Regression

  • Kim, Bu-yong
    • Communications for Statistical Applications and Methods
    • /
    • v.7 no.1
    • /
    • pp.47-54
    • /
    • 2000
  • This article is concerned with an algorithm for Chebyshev estimation with or without linear equality and/or inequality constraints. The algorithm employs a linear scaling transformation scheme to reduce the computational burden incurred when the data set is quite large. The convergence of the proposed algorithm is proved, and updating and orthogonal decomposition techniques are considered to improve its computational efficiency and numerical stability.
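
Unconstrained Chebyshev estimation can itself be posed as a linear program, which is the standard starting point for such algorithms; a minimal sketch on synthetic data (without the paper's scaling, updating, or decomposition schemes) follows:

```python
import numpy as np
from scipy.optimize import linprog

def chebyshev_fit(X, y):
    """Chebyshev (minimax / L-infinity) regression: find b minimizing
    max_i |y_i - x_i'b|, posed as a linear program in (b, t)."""
    n, p = X.shape
    c = np.r_[np.zeros(p), 1.0]                     # minimize the bound t
    A = np.block([[X, -np.ones((n, 1))],            #  x_i'b - t <= y_i
                  [-X, -np.ones((n, 1))]])          # -x_i'b - t <= -y_i
    res = linprog(c, A_ub=A, b_ub=np.r_[y, -y],
                  bounds=[(None, None)] * p + [(0, None)], method="highs")
    return res.x[:p], res.x[p]

rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, 40)
X = np.c_[np.ones(40), x]
y = 1.0 + 2.0 * x + rng.uniform(-0.3, 0.3, 40)      # bounded noise
beta, t = chebyshev_fit(X, y)                        # t = minimax residual
```

Linear equality or inequality constraints on b would simply be added as extra rows of the LP, which is why the constrained problem fits the same framework.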


Data-Driven Smooth Goodness of Fit Test by Nonparametric Function Estimation

  • Kim, Jongtae
    • Communications for Statistical Applications and Methods
    • /
    • v.7 no.3
    • /
    • pp.811-816
    • /
    • 2000
  • The purpose of this paper is to study the data-driven smooth goodness-of-fit test when the hypothesis is complete. A smooth goodness-of-fit test statistic based on nonparametric function estimation techniques is proposed. Simulation studies of the power show that the proposed test statistic compares well with others.
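
The abstract does not specify the construction, but a well-known data-driven smooth test of this kind is the Kallenberg–Ledwina one: Neyman smooth components in an orthonormal Legendre basis, with a Schwarz (BIC-type) rule selecting the number of components. A sketch on simulated data, testing uniformity after the probability integral transform:

```python
import numpy as np
from numpy.polynomial.legendre import Legendre

def smooth_stat(u, k_max=5):
    """Data-driven Neyman smooth statistic for H0: U ~ Uniform(0,1),
    with a Schwarz rule choosing the number of components."""
    n = len(u)
    # orthonormal Legendre system on [0,1]: phi_j(u) = sqrt(2j+1) * P_j(2u-1)
    comp = np.array([np.sqrt(2 * j + 1) * Legendre.basis(j)(2 * u - 1).sum()
                     for j in range(1, k_max + 1)]) / np.sqrt(n)
    N = np.cumsum(comp ** 2)                         # N_k for k = 1..k_max
    K = int(np.argmax(N - np.arange(1, k_max + 1) * np.log(n))) + 1
    return N[K - 1], K

rng = np.random.default_rng(2)
stat_null, _ = smooth_stat(rng.random(500))          # data satisfying H0
stat_alt, _ = smooth_stat(rng.beta(5, 1, 500))       # clearly non-uniform data
```

Under the null the selected statistic behaves asymptotically like a chi-squared variable with one degree of freedom, so large values reject.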


A Spatial Analysis of Seismic Vulnerability of Buildings Using Statistical and Machine Learning Techniques Comparative Analysis (통계분석 기법과 머신러닝 기법의 비교분석을 통한 건물의 지진취약도 공간분석)

  • Seong H. Kim;Sang-Bin Kim;Dae-Hyeon Kim
    • Journal of Industrial Convergence
    • /
    • v.21 no.1
    • /
    • pp.159-165
    • /
    • 2023
  • While the frequency of seismic occurrence has been increasing recently, the domestic seismic response system remains weak, so the objective of this research is to compare and analyze the seismic vulnerability of buildings using statistical analysis and machine learning techniques. Using the statistical technique, the model developed through the optimal scaling method showed a prediction accuracy of about 87%. Among the four machine learning methods analyzed, the Random Forest method showed the highest accuracy, 94% on the training set and 76.7% on the test set, and was therefore chosen as the final machine learning technique. Accordingly, the statistical analysis technique showed the higher accuracy of about 87%, whereas the machine learning technique showed an accuracy of about 76.7%. Of the 22,296 buildings analyzed, 1,627 (0.1%) are expected to be more dangerous under the statistical analysis technique, 10,146 (49%) received the same rating from both techniques, and the remaining 10,523 (50%) are expected to be more dangerous under the machine learning technique. By comparing advanced machine learning techniques with the existing statistical analysis techniques for spatial analysis decisions, it is hoped that these results will help prepare more reliable seismic countermeasures.
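
The gap between the 94% training accuracy and the 76.7% test accuracy reported above is the usual overfitting pattern, and it is why the held-out figure is the one to trust. A toy illustration (simulated two-class data and a 1-nearest-neighbour classifier, not the paper's Random Forest or building data) makes the pattern explicit:

```python
import numpy as np

rng = np.random.default_rng(3)
# two overlapping classes in 2-D: a stand-in for building feature vectors
X = np.r_[rng.normal(0.0, 1, (200, 2)), rng.normal(1.0, 1, (200, 2))]
y = np.r_[np.zeros(200), np.ones(200)]
idx = rng.permutation(400)
train, test = idx[:300], idx[300:]            # hold out 100 points

def one_nn(Xtr, ytr, Xq):
    """Predict each query's label from its nearest training point."""
    d = ((Xq[:, None, :] - Xtr[None, :, :]) ** 2).sum(-1)
    return ytr[d.argmin(axis=1)]

train_acc = (one_nn(X[train], y[train], X[train]) == y[train]).mean()
test_acc = (one_nn(X[train], y[train], X[test]) == y[test]).mean()
```

Because each training point is its own nearest neighbour, the training accuracy is a perfect 1.0 regardless of how noisy the classes are, while held-out accuracy reflects the real overlap; a memorizing model inflates the first number only.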

Statistical Evaluation of Smoke Analysis Technique through Asia Collaborative Study V.

  • Ra, Do-Young;Rhee, Moon-Soo;Kim, Yoon-Dong;Hwang, Keon-Joong
    • Journal of the Korean Society of Tobacco Science
    • /
    • v.20 no.1
    • /
    • pp.108-114
    • /
    • 1998
  • This study was conducted to evaluate techniques for analyzing tobacco smoke through a statistical treatment of the analytical data from Asia Collaborative Study V. In addition to the analysis of five smoke components (TPM, water, nicotine, NFDPM, and puff count) for four cigarette samples, statistical parameters such as the mean, standard deviation, box-and-whisker plots, h plots, k plots, regression coefficients, reproducibility (R), and repeatability (r) were calculated. Analysis of the water content of cigarette smoke was the most difficult task, whereas puff count analysis was the easiest, as recognized by all laboratories. Nicotine and puff count analyses accounted for the lowest and the highest variation among the four parameters. The water coefficients indicated more randomness, or variation, in the slopes, and the NFDPM data exhibited both types of deviation from linearity. The water content of sample D showed the largest difference both between two single results and between two interlaboratory test results. As a whole, KGTRI ranked high in analytical technique in the statistical evaluation of results when compared with the practices of the 28 other laboratories.
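
Repeatability (r) and reproducibility (R) in collaborative studies are conventionally computed from a lab-by-replicate layout as in ISO 5725, with r = 2.8·s_r (within-laboratory) and R = 2.8·s_R (within- plus between-laboratory). A sketch on simulated collaborative-study data (the lab counts and variance components below are arbitrary, not this study's):

```python
import numpy as np

rng = np.random.default_rng(4)
labs, reps = 8, 2
true_mean, s_between, s_within = 10.0, 0.5, 0.2
# each lab has its own bias plus replicate-to-replicate noise
data = (true_mean + rng.normal(0, s_between, (labs, 1))
        + rng.normal(0, s_within, (labs, reps)))

cell_means = data.mean(axis=1)
s_r2 = data.var(axis=1, ddof=1).mean()                 # pooled within-lab variance
s_L2 = max(cell_means.var(ddof=1) - s_r2 / reps, 0.0)  # between-lab component
s_R2 = s_r2 + s_L2
r = 2.8 * np.sqrt(s_r2)    # repeatability limit: same lab, two single results
R = 2.8 * np.sqrt(s_R2)    # reproducibility limit: two interlaboratory results
```

The abstract's statement that sample D's water content showed the largest differences under both limits corresponds to that component having the largest s_r and s_R.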


Single Image Based HDR Algorithm Using Statistical Differencing and Histogram Manipulation (통계적 편차와 히스토그램 변형을 이용한 단일영상기반 고품질 영상 생성기법)

  • Song, Jin-Sun;Han, Kyu-Phil;Park, Yang-Woo
    • Journal of Korea Multimedia Society
    • /
    • v.21 no.7
    • /
    • pp.764-771
    • /
    • 2018
  • In this paper, we propose an algorithm that acquires a high-quality image, commonly referred to as an HDR image, from only a single input image. To acquire an HDR image, conventional methods need many images of the same scene taken with different exposure values and must delicately adjust the color values for bit expansion or exposure fusion, so they require considerable computation and complex structures. The proposed algorithm therefore takes a completely new approach, using one image for high-quality image acquisition by applying statistical differencing and histogram manipulation (histogram specification) techniques. These techniques steer the statistical distribution of the input image's pixel values toward the desired one through local and global modifications, respectively. As a result, the quality of the proposed algorithm is better than that of conventional methods implemented in commercial image-editing software.
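
Histogram specification itself is a standard operation: remap values so that their empirical distribution matches a chosen reference distribution. A minimal rank-based sketch on synthetic 1-D data (the paper additionally combines this global step with local statistical differencing, which is not shown here):

```python
import numpy as np

def match_histogram(src, ref):
    """Remap src values so their empirical distribution matches ref's
    (histogram specification via rank / quantile mapping)."""
    order = np.argsort(src)
    ranks = np.empty_like(order)
    ranks[order] = np.arange(len(src))          # rank of each src value
    # each src value is replaced by the ref value at the same quantile
    return np.sort(ref)[ranks * len(ref) // len(src)]

rng = np.random.default_rng(5)
src = rng.normal(0.0, 1.0, 1000)                # e.g. flat, low-contrast values
ref = rng.exponential(2.0, 1000)                # target distribution
out = match_histogram(src, ref)
```

Because the mapping is monotone in rank, relative ordering of pixel values is preserved while the overall distribution is reshaped, which is exactly what a global contrast/tone adjustment needs.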

Towards Improving Causality Mining using BERT with Multi-level Feature Networks

  • Ali, Wajid;Zuo, Wanli;Ali, Rahman;Rahman, Gohar;Zuo, Xianglin;Ullah, Inam
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.16 no.10
    • /
    • pp.3230-3255
    • /
    • 2022
  • Causality mining in NLP is a significant area of interest that benefits many everyday applications, including decision making, business risk management, question answering, future event prediction, scenario generation, and information retrieval. Mining such causalities was a challenging, open problem for prior non-statistical and statistical techniques using web sources, which required hand-crafted linguistic patterns for feature engineering, were subject to domain knowledge, and demanded much human effort. Those studies also overlooked implicit, ambiguous, and heterogeneous causality, focusing on explicit causality mining. In contrast to statistical and non-statistical approaches, we present Bidirectional Encoder Representations from Transformers (BERT) integrated with Multi-level Feature Networks (MFN), called BERT+MFN, for causality recognition in noisy and informal web datasets without human-designed features. In our model, MFN consists of a three-column knowledge-oriented network (TC-KN), a bi-LSTM, and a Relation Network (RN) that mine causality information at the segment level, while BERT captures semantic features at the word level. We perform experiments on Alternative Lexicalization (AltLexes) datasets, and the experimental outcomes show that our model outperforms baseline causality and text mining techniques.