• 제목/요약/키워드: random data analysis

검색결과 1,679건 처리시간 0.031초

Rank Tests for Multivariate Linear Models in the Presence of Missing Data

  • Lee, Jae-Won;David M. Reboussin
    • Journal of the Korean Statistical Society
    • /
    • 제26권3호
    • /
    • pp.319-332
    • /
    • 1997
  • The application of multivariate linear rank statistics to data with item nonresponse is considered. Only a modest extension of the complete data techniques is required when the missing data may be thought of as a random sample, and an appropriate modification of the covariances is derived. A proof of the asymptotic multivariate normality is given. A review of some related results in the literature is presented and applications including longitudinal and repeated measures designs are discussed.

  • PDF

강건 실험계획법을 이용한 열화자료의 분석 (Analysis of Degradation Data Using Robust Experimental Design)

  • 서순근;하천수
    • 품질경영학회지
    • /
    • 제32권1호
    • /
    • pp.113-129
    • /
    • 2004
  • The reliability of the product can be improved by making the product less sensitive to noises. Especially, it Is important to make products robust against various noise factors encountered in production and field environments. In this paper, the phenomenon of degradation assumes a simple random coefficient degradation model to present analysis procedures of degradation data for robust experimental design. To alleviate weak points of previous studies, such as Taguchi's, Wasserman's, and pseudo failure time methods, novel techniques for analysis of degradation data using the cross array that regards amount of degradation as a dynamic characteristic for time are proposed. Analysis approach for degradation data using robust experimental design are classified by assumptions on parametric or nonparametric degradation rate(or slope). Also, a simulation study demonstrates the superiority of proposed methods over some previous works.

Prediction of spatio-temporal AQI data

  • KyeongEun Kim;MiRu Ma;KyeongWon Lee
    • Communications for Statistical Applications and Methods
    • /
    • 제30권2호
    • /
    • pp.119-133
    • /
    • 2023
  • With the rapid growth of the economy and fossil fuel consumption, the concentration of air pollutants has increased significantly and the air pollution problem is no longer limited to small areas. We conduct statistical analysis with the actual data related to air quality that covers the entire of South Korea using R and Python. Some factors such as SO2, CO, O3, NO2, PM10, precipitation, wind speed, wind direction, vapor pressure, local pressure, sea level pressure, temperature, humidity, and others are used as covariates. The main goal of this paper is to predict air quality index (AQI) spatio-temporal data. The observations of spatio-temporal big datasets like AQI data are correlated both spatially and temporally, and computation of the prediction or forecasting with dependence structure is often infeasible. As such, the likelihood function based on the spatio-temporal model may be complicated and some special modelings are useful for statistically reliable predictions. In this paper, we propose several methods for this big spatio-temporal AQI data. First, random effects with spatio-temporal basis functions model, a classical statistical analysis, is proposed. Next, neural networks model, a deep learning method based on artificial neural networks, is applied. Finally, random forest model, a machine learning method that is closer to computational science, will be introduced. Then we compare the forecasting performance of each other in terms of predictive diagnostics. As a result of the analysis, all three methods predicted the normal level of PM2.5 well, but the performance seems to be poor at the extreme value.

분할구자료의 사영분석 (Projection analysis for split-plot data)

  • 최재성
    • 응용통계연구
    • /
    • 제30권3호
    • /
    • pp.335-344
    • /
    • 2017
  • 본 논문은 분할구실험으로 부터 주어진 자료분석을 위해 사영을 이용하는 방법을 다루고 있다. 분할구 실험의 특성으로 서로 다른 크기의 실험단위를 나타내는 오차항과 처리에 포함된 확률효과가 존재할 때 이들 분산성분의 추정에 사영을 이용하여 구하는 방법을 제시하고 있다. 분산성분 추정을 위해 잔차벡터에 대한 확률모형의 구축을 다루고 있다. 고정효과를 제외한 확률효과에 따른 제곱합의 계산을 위해 상수적합법이 적용되고 있다. 적률법에 의한 분산성분 추정을 위해 변동량의 기댓값 계산에 합성법을 이용한다. 고정효과들의 선형함수로 주어지는 추정가능함수에 관한 추정을 다루고 있다.

Applying the Nash Equilibrium to Constructing Covert Channel in IoT

  • Ho, Jun-Won
    • International Journal of Internet, Broadcasting and Communication
    • /
    • 제13권1호
    • /
    • pp.243-248
    • /
    • 2021
  • Although many different types of covert channels have been suggested in the literature, there are little work in directly applying game theory to building up covert channel. This is because researchers have mainly focused on tailoring game theory for covert channel analysis, identification, and covert channel problem solving. Unlike typical adaptation of game theory to covert channel, we show that game theory can be utilized to establish a new type of covert channel in IoT devices. More specifically, we propose a covert channel that can be constructed by utilizing the Nash Equilibrium with sensor data collected from IoT devices. For covert channel construction, we set random seed to the value of sensor data and make payoff from random number created by running pseudo random number generator with the configured random seed. We generate I × J (I ≥ 2, J ≥ 2) matrix game with these generated payoffs and attempt to obtain the Nash Equilibrium. Covert channel construction method is distinctly determined in accordance with whether or not to acquire the Nash Equilibrium.

A Clustering Approach for Feature Selection in Microarray Data Classification Using Random Forest

  • Aydadenta, Husna;Adiwijaya, Adiwijaya
    • Journal of Information Processing Systems
    • /
    • 제14권5호
    • /
    • pp.1167-1175
    • /
    • 2018
  • Microarray data plays an essential role in diagnosing and detecting cancer. Microarray analysis allows the examination of levels of gene expression in specific cell samples, where thousands of genes can be analyzed simultaneously. However, microarray data have very little sample data and high data dimensionality. Therefore, to classify microarray data, a dimensional reduction process is required. Dimensional reduction can eliminate redundancy of data; thus, features used in classification are features that only have a high correlation with their class. There are two types of dimensional reduction, namely feature selection and feature extraction. In this paper, we used k-means algorithm as the clustering approach for feature selection. The proposed approach can be used to categorize features that have the same characteristics in one cluster, so that redundancy in microarray data is removed. The result of clustering is ranked using the Relief algorithm such that the best scoring element for each cluster is obtained. All best elements of each cluster are selected and used as features in the classification process. Next, the Random Forest algorithm is used. Based on the simulation, the accuracy of the proposed approach for each dataset, namely Colon, Lung Cancer, and Prostate Tumor, achieved 85.87%, 98.9%, and 89% accuracy, respectively. The accuracy of the proposed approach is therefore higher than the approach using Random Forest without clustering.

Efficient hardware implementation and analysis of true random-number generator based on beta source

  • Park, Seongmo;Choi, Byoung Gun;Kang, Taewook;Park, Kyunghwan;Kwon, Youngsu;Kim, Jongbum
    • ETRI Journal
    • /
    • 제42권4호
    • /
    • pp.518-526
    • /
    • 2020
  • This paper presents an efficient hardware random-number generator based on a beta source. The proposed generator counts the values of "0" and "1" and provides a method to distinguish between pseudo-random and true random numbers by comparing them using simple cumulative operations. The random-number generator produces labeled data indicating whether the count value is a pseudo- or true random number according to its bit value based on the generated labeling data. The proposed method is verified using a system based on Verilog RTL coding and LabVIEW for hardware implementation. The generated random numbers were tested according to the NIST SP 800-22 and SP 800-90B standards, and they satisfied the test items specified in the standard. Furthermore, the hardware is efficient and can be used for security, artificial intelligence, and Internet of Things applications in real time.

감성분석과 Word2vec을 이용한 비정형 품질 데이터 분석 (Informal Quality Data Analysis via Sentimental analysis and Word2vec method)

  • 이진욱;유국현;문병민;배석주
    • 품질경영학회지
    • /
    • 제45권1호
    • /
    • pp.117-128
    • /
    • 2017
  • Purpose: This study analyzes automobile quality review data to develop alternative analytical method of informal data. Existing methods to analyze informal data are based mainly on the frequency of informal data, however, this research tries to use correlation information of each informal data. Method: After sentimental analysis to acquire the user information for automobile products, three classification methods, that is, $na{\ddot{i}}ve$ Bayes, random forest, and support vector machine, were employed to accurately classify the informal user opinions with respect to automobile qualities. Additionally, Word2vec was applied to discover correlated information about informal data. Result: As applicative results of three classification methods, random forest method shows most effective results compared to the other classification methods. Word2vec method manages to discover closest relevant data with automobile components. Conclusion: The proposed method shows its effectiveness in terms of accuracy and sensitivity on the analysis of informal quality data, however, only two sentiments (positive or negative) can be categorized due to human errors. Further studies are required to derive more sentiments to accurately classify informal quality data. Word2vec method also shows comparative results to discover the relevance of components precisely.

Restricted maximum likelihood estimation of a censored random effects panel regression model

  • Lee, Minah;Lee, Seung-Chun
    • Communications for Statistical Applications and Methods
    • /
    • 제26권4호
    • /
    • pp.371-383
    • /
    • 2019
  • Panel data sets have been developed in various areas, and many recent studies have analyzed panel, or longitudinal data sets. Maximum likelihood (ML) may be the most common statistical method for analyzing panel data models; however, the inference based on the ML estimate will have an inflated Type I error because the ML method tends to give a downwardly biased estimate of variance components when the sample size is small. The under estimation could be severe when data is incomplete. This paper proposes the restricted maximum likelihood (REML) method for a random effects panel data model with a censored dependent variable. Note that the likelihood function of the model is complex in that it includes a multidimensional integral. Many authors proposed to use integral approximation methods for the computation of likelihood function; however, it is well known that integral approximation methods are inadequate for high dimensional integrals in practice. This paper introduces to use the moments of truncated multivariate normal random vector for the calculation of multidimensional integral. In addition, a proper asymptotic standard error of REML estimate is given.

블록 암호 연산 모드 RBF(Random Block Feedback)의 알려진/선택 평문 공격에 대한 안전성 비교 분석 (Safety Comparison Analysis Against Known/Chosen Plaintext Attack of RBF (Random Block Feedback) Mode to Other Block Cipher Modes of Operation)

  • 김윤정;이강
    • 한국통신학회논문지
    • /
    • 제39B권5호
    • /
    • pp.317-322
    • /
    • 2014
  • 데이타 보안과 무결성은 유무선 통신 환경에서 데이터 전송 시에 중요한 요소이다. 대량의 데이터는 전송 전에, 통상 암호 연산 모드를 이용한 블록 암호 알고리즘에 의하여 암호화된다. ECB, CBC 등의 기존 연산 모드 외에 블록 암호 연산 모드로 RBF 모드가 제안된 바 있다. 본 논문에서는, 알려진 평문 공격 (known plaintext attack) 및 선택 평문 공격 (chosen plaintext attack)에 대한, RBF 모드의 안전성을 기존 모드들과 비교 분석한 내용을 소개한다. 분석 결과, 기존의 연산 모드들이 알려진/선택 평문 공격에 취약한데 반하여, RBF 모드는 이들 공격에 안전함을 알 수 있었다.