• Title/Abstract/Keywords: Sensitive data

STATISTICALLY PREPROCESSED DATA BASED PARAMETRIC COST MODEL FOR BUILDING PROJECTS

  • Sae-Hyun Ji;Moonseo Park;Hyun-Soo Lee
    • Proceedings of the International Conference
    • /
    • The 3rd International Conference on Construction Engineering and Project Management
    • /
    • pp.417-424
    • /
    • 2009
  • For a construction project to progress smoothly, effective cost estimation is vital, particularly in the conceptual and schematic design stages. In these early phases, initial estimates are highly sensitive to changes in project scope, yet owners require accurate forecasts that reflect the information they supply. Cost estimators therefore need effective estimation strategies. In practice, parametric cost estimating, which uses historical cost data, is the most common method in these initial phases (Karshenas 1984, Kirkham 2007). Compiling historical data on the parameters that govern cost variance is therefore a prime requirement. Before compilation, however, a data mining (data preprocessing) step is needed to remove internal errors and abnormal values. To address this issue, this research proposes a statistical methodology for data preprocessing and verifies that preprocessing improves estimate accuracy and stability. Moreover, Statistically Preprocessed data Based Parametric (SPBP) cost models are developed from multiple regression equations, and their effectiveness is verified against conventional cost models.

Local Influence of the Quasi-likelihood Estimators in Generalized Linear Models

  • Jung, Kang-Mo
    • Communications for Statistical Applications and Methods
    • /
    • Vol. 14, No. 1
    • /
    • pp.229-239
    • /
    • 2007
  • We present a diagnostic method for quasi-likelihood estimators in generalized linear models. Because these estimators are usually obtained by iteratively reweighted least squares, which is well known to be very sensitive to unusual data, a diagnostic step is indispensable to data analysis. We extend the local influence approach from the maximum likelihood function to the quasi-likelihood function. Local influence diagnostics are derived under several perturbation schemes. An illustrative example is given, and we compare the results of local influence with those of deletion.

The Three-Stage Cluster Unrelated Question Model

  • Ahn, Seung-Chul;Lee, Gi-Sung
    • Journal of the Korean Data and Information Science Society
    • /
    • Vol. 14, No. 1
    • /
    • pp.55-65
    • /
    • 2003
  • In this study, we establish the theoretical validity of applying the unrelated question model to three-stage cluster sampling and derive the estimator of the sensitive parameter and its variance. We derive the minimum variance form under the optimal subsample sizes when the cost is fixed. For a given precision, we obtain the optimal subsample sizes and use them to derive the minimum cost form.
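
For orientation, the single-stage unrelated question model under simple random sampling can be written as below. This is the textbook form, given here as an assumption for illustration; it is not the three-stage cluster estimator derived in the paper. The symbols are introduced here: $p$ is the probability that a respondent is directed to the sensitive question, $\pi_A$ is the sensitive proportion, $\pi_Y$ is the known proportion answering "yes" to the unrelated question, $\lambda$ is the overall probability of a "yes" answer, and $n$ is the sample size.

```latex
\lambda = p\,\pi_A + (1-p)\,\pi_Y, \qquad
\hat{\pi}_A = \frac{\hat{\lambda} - (1-p)\,\pi_Y}{p}, \qquad
\operatorname{Var}(\hat{\pi}_A) = \frac{\lambda(1-\lambda)}{n\,p^{2}}
```

Here $\hat{\lambda}$ is the observed proportion of "yes" answers. The variance shrinks as $p$ approaches 1, at the price of weaker privacy protection for respondents.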

A Combined Procedure of Direct Question Method and Modified Randomized Response Technique for Estimating Population Proportion

  • Kim, Hyuk-Joo
    • Journal of the Korean Data and Information Science Society
    • /
    • Vol. 14, No. 4
    • /
    • pp.877-887
    • /
    • 2003
  • A two-stage procedure is proposed to estimate the population proportion of a sensitive group. The proposed procedure is obtained by combining the direct question method and a modified randomized response technique. It is verified that the proposed procedure is more efficient than existing methods under some mild conditions.
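
As background, the following is a minimal sketch of Warner's classic randomized response estimator, which the modified technique builds on. It is not the paper's two-stage procedure or its modified technique. The function names, the design probability `p`, and the simulated data are all assumptions made for illustration.

```python
import random


def warner_estimate(yes_count, n, p):
    """Estimate the sensitive proportion under Warner's model.

    Each respondent answers the sensitive statement with probability p and
    its complement with probability 1 - p, so P(yes) = p*pi + (1-p)*(1-pi).
    """
    if p == 0.5:
        raise ValueError("p must differ from 0.5, otherwise pi is not identifiable")
    lam_hat = yes_count / n  # observed proportion of "yes" answers
    return (lam_hat - (1 - p)) / (2 * p - 1)


def simulate_yes_count(true_pi, n, p, seed=0):
    """Simulate n randomized responses (synthetic data, illustration only)."""
    rng = random.Random(seed)
    yes = 0
    for _ in range(n):
        member = rng.random() < true_pi      # belongs to the sensitive group?
        asked_sensitive = rng.random() < p   # which statement the device selects
        answer_yes = member if asked_sensitive else (not member)
        yes += answer_yes
    return yes


n, p = 20000, 0.7
estimate = warner_estimate(simulate_yes_count(0.2, n, p), n, p)
```

The interviewer never learns which statement a respondent answered, yet the population proportion remains estimable because `p` is known by design.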

An Improved K-means Document Clustering using Concept Vectors

  • Shin, Yang-Kyu
    • Journal of the Korean Data and Information Science Society
    • /
    • Vol. 14, No. 4
    • /
    • pp.853-861
    • /
    • 2003
  • An improved K-means document clustering method is presented, in which a concept vector is maintained for each cluster on the basis of the cosine similarity of text documents. The concept vectors are unit vectors normalized on the n-dimensional sphere. Because the standard K-means method is sensitive to its initial starting condition, our improvement focuses on the starting condition for estimating the modes of a distribution. The improved K-means clustering algorithm was applied to a set of text documents, called Classic3, to test the efficiency and correctness of the clustering result, and showed a 7% improvement in its worst case.
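
The concept-vector idea can be sketched as a plain spherical K-means loop: documents are assigned to the concept vector with the highest cosine similarity, and each concept vector is the normalized sum of its members. This sketch uses random initialization, so it does not include the paper's improved starting condition. The helper names and the toy term-count vectors are assumptions for illustration.

```python
import math
import random


def normalize(v):
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def assign(docs, concepts):
    """Assign each unit-length document to its most similar concept vector."""
    return [max(range(len(concepts)), key=lambda j: dot(d, concepts[j]))
            for d in docs]


def spherical_kmeans(docs, k, n_iter=20, seed=0):
    rng = random.Random(seed)
    docs = [normalize(d) for d in docs]
    # Random initialization: pick k documents as the initial concept vectors.
    concepts = [list(d) for d in rng.sample(docs, k)]
    for _ in range(n_iter):
        labels = assign(docs, concepts)
        for j in range(k):
            members = [d for d, lab in zip(docs, labels) if lab == j]
            if members:
                # Concept vector = normalized sum of the cluster's documents.
                total = [sum(col) for col in zip(*members)]
                concepts[j] = normalize(total)
    return assign(docs, concepts), concepts


# Toy term-count vectors (hypothetical data, four terms per document).
docs = [[3, 1, 0, 0], [2, 2, 0, 0], [0, 0, 1, 4], [0, 1, 2, 3]]
labels, concepts = spherical_kmeans(docs, k=2)
```

Because both documents and concept vectors have unit length, the dot product equals the cosine similarity, which is why no explicit division by norms appears in the assignment step.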

Robust Singular Value Decomposition Based on Weighted Least Absolute Deviation Regression

  • Jung, Kang-Mo
    • Communications for Statistical Applications and Methods
    • /
    • Vol. 17, No. 6
    • /
    • pp.803-810
    • /
    • 2010
  • The singular value decomposition of a rectangular matrix is a basic tool for understanding the structure of the data, particularly the relationship between row and column factors. However, the conventional singular value decomposition uses the least squares method and is not robust to outliers. We propose a simple robust singular value decomposition algorithm based on weighted least absolute deviation, which is not sensitive to leverage points. It is easy to implement, and its computation time is reasonably low. Numerical results reveal the data structure and the outlying information.

Image Classification using Class-Balanced Loss

  • 박지희;황원준
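    • The Korean Institute of Broadcast and Media Engineers: Conference Proceedings
    • /
    • The Korean Institute of Broadcast and Media Engineers 2022 Fall Conference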
    • /
    • pp.164-166
    • /
    • 2022
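  • The long-tail problem refers to the degradation in performance caused by differences in the number of samples per class. In this paper, we attempt to address the long-tail problem by improving performance with Class-Balanced Loss, a form of cost-sensitive learning. We first examine the performance gap between a balanced data set and an imbalanced data set. We then apply Class-Balanced Loss in three versions and measure and analyze the resulting performance.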
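
A minimal sketch of the weighting behind Class-Balanced Loss follows. It assumes the commonly used "effective number of samples" form, in which each class weight is proportional to (1 - beta) / (1 - beta^n). The abstract does not specify the three versions the paper tests, so only a weighted cross-entropy is shown. The function names, `beta`, and the class counts are hypothetical.

```python
import math


def class_balanced_weights(samples_per_class, beta=0.999):
    """Per-class weights proportional to the inverse effective number of samples.

    Effective number for a class with n >= 1 samples: (1 - beta**n) / (1 - beta).
    Weights are rescaled so that they sum to the number of classes.
    """
    raw = [(1.0 - beta) / (1.0 - beta ** n) for n in samples_per_class]
    scale = len(raw) / sum(raw)
    return [w * scale for w in raw]


def cb_cross_entropy(probs, label, weights):
    """Cross-entropy for one sample, scaled by the weight of its true class."""
    return -weights[label] * math.log(probs[label])


# Hypothetical long-tailed class counts: head, middle, and tail class.
counts = [5000, 500, 50]
weights = class_balanced_weights(counts)

# The same predicted probability costs more when the true class is rare.
loss_head = cb_cross_entropy([0.6, 0.3, 0.1], 0, weights)
loss_tail = cb_cross_entropy([0.1, 0.3, 0.6], 2, weights)
```

As `beta` approaches 0 all weights become equal (no re-weighting), and as it approaches 1 the weights approach inverse class frequency, so `beta` controls how strongly the tail classes are emphasized.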

Statistical Analysis of Water Quality in the Downstream of the Han River

  • 백경원;정용태;한건연;송재우
    • Water for Future
    • /
    • Vol. 29, No. 2
    • /
    • pp.179-190
    • /
    • 1996
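  • Through a statistical analysis of water quality in the downstream reach of the Han River, we examined the basic statistical characteristics of the water quality time series and their variability by site and by season, and analyzed the correlation between discharge and water quality factors. Statistical characteristics and appropriate distribution types were estimated and presented for six major sites on the main stream and for three tributaries, and time dependence and seasonality were examined. In addition, by examining the correlations among water quality items, we presented correlation equations between highly correlated water quality items and between sites. The applicability of a stochastic simulation model was confirmed, and the DO item showed high correlation among all sites. In the examination of correlation with discharge, DO and SS were more sensitive to water temperature than to discharge, while BOD and COD were sensitive to discharge during the dry season, when discharge is low. In addition to DO, which is closely affected by water temperature, BOD and COD also showed seasonal periodicity, and cross-correlation analysis revealed periodicity inherent in water quality items other than DO, BOD, and COD as well.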

Vehicle black box system with LINK blockchain

  • 안규황;원태연;박상민;장경배;서화정
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • Vol. 23, No. 8
    • /
    • pp.1018-1023
    • /
    • 2019
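  • Although vehicle black boxes have been widely adopted since 2010, victims continue to suffer when no recording of the accident scene exists or when the perpetrator deliberately deletes the video data. The greatest advantage of blockchain is that distributed storage makes data modification and deletion impossible; its greatest disadvantage is that sensitive data is also stored in a distributed manner. This paper exploits that advantage by introducing blockchain into the black box so that accidents can be proven with shared video data. It also links the blockchain with a private server so that sensitive information that would otherwise be stored on the blockchain is stored on the private server, with the aim of solving the personal information leakage problem that is blockchain's disadvantage. The code linking the LINK blockchain with the private server (GitHub) and a demo video (YouTube) are attached to the paper.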

LAS-Derived Determination of Surface-Layer Sensible Heat Flux over a Heterogeneous Urban Area

  • 이상현
    • Atmosphere
    • /
    • Vol. 25, No. 2
    • /
    • pp.193-203
    • /
    • 2015
  • A large aperture scintillometer (LAS) was deployed with an optical path length of 2.1 km to estimate turbulent sensible heat flux ($Q_H$) over a highly heterogeneous urban area. Scintillation measurements were conducted during the cold season in November and December 2013, and the daytime data of 14 days were used in the analysis after quality control processes. The LAS-derived $Q_H$ shows reasonable temporal variation, ranging from 20 to 160 $\mathrm{W\,m^{-2}}$ in unstable atmospheric conditions, and compares well with the measured net radiation. The LAS footprint analysis suggests that $Q_H$ can be relatively high when the newly built-up urban area has a high source contribution to the turbulent flux in the study area ('northwesterly winds'). Sensitivity tests show that the LAS-derived $Q_H$ is highly sensitive to the non-dimensional similarity function for the temperature structure function parameter, but relatively less sensitive to surface aerodynamic parameters and meteorological variables (temperature and wind speed). A lower Bowen ratio also has a significant influence on the flux estimation. The overall uncertainty of the estimated daytime $Q_H$ is expected to be within about 20% at an upper limit for the analysis data. It is also found that stable atmospheric conditions can be poorly determined when the scintillometry technique is applied over the highly heterogeneous urban area.