• Title/Summary/Keyword: Data cleansing

Developing dirty data cleansing service between SOA-based services (SOA 기반 서비스 사이의 오류 데이터 정제 서비스 개발)

  • Ji, Eun-Mi;Choi, Byoung-Ju;Lee, Jung-Won
    • The KIPS Transactions:PartD / v.14D no.7 / pp.829-840 / 2007
  • Dirty data cleansing techniques have so far aimed to integrate large amounts of data from various sources and to manage the quality of data residing in databases so that meaningful information can be extracted. Surviving in a rapidly changing business environment and an age of limitless competition requires a prompt response to change. As system requirements have recently become more complex, Service Oriented Architecture (SOA) has proliferated as a means of integrating and implementing massive distributed systems. SOA therefore needs data exchange among services that is supported by data cleansing techniques. In this paper, we manage the quality of XML data transmitted through events between services while they are integrated into a single system. As a result, we developed an SOA-based Dirty Data Cleansing Service that focuses on cleansing data between interacting services rather than on detecting data errors in an already integrated database.

A Study on Data Cleansing Techniques for Word Cloud Analysis of Text Data (텍스트 데이터 워드클라우드 분석을 위한 데이터 정제기법에 관한 연구)

  • Lee, Won-Jo
    • The Journal of the Convergence on Culture Technology / v.7 no.4 / pp.745-750 / 2021
  • In big data visualization analysis of unstructured text data, the raw data is mostly large in volume, and analysis techniques cannot be applied to it in its unstructured form without cleansing. Therefore, unnecessary data is removed from the collected raw data through a first, heuristic cleansing process, and stopwords are removed through a second, machine cleansing process. The word frequencies are then calculated and visualized using the word cloud technique, key issues are extracted and turned into information, and the results are analyzed. In this study, we propose a new stopword cleansing technique that uses an external stopword set (DB) with a Python word cloud, and derive the problems and effectiveness of this technique through practical case analysis. Based on this verification, we present the practical utility of word cloud analysis that applies the proposed cleansing technique.
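The two-stage cleansing described in this abstract can be illustrated with a minimal Python sketch. It is not the author's implementation: the function names, the regular expressions, the sample sentence, and the tiny `EXTERNAL_STOPWORDS` set are all assumptions made for illustration, and the heuristic pass assumes English text.

```python
import re
from collections import Counter

def heuristic_clean(text: str) -> str:
    """First pass: strip URLs and every non-letter character, then lowercase."""
    text = re.sub(r"https?://\S+", " ", text)
    text = re.sub(r"[^A-Za-z\s]", " ", text)
    return text.lower()

def remove_stopwords(tokens, stopwords):
    """Second pass: drop every token found in the external stopword set."""
    return [t for t in tokens if t not in stopwords]

def word_frequencies(raw_text, stopwords):
    """Run both passes and count how often each remaining word occurs."""
    tokens = heuristic_clean(raw_text).split()
    return Counter(remove_stopwords(tokens, stopwords))

# Hypothetical external stopword set; in practice it would be loaded from a file or DB.
EXTERNAL_STOPWORDS = {"the", "is", "of", "and", "a", "see"}

sample = "The quality of data is the key: see https://example.org and the data of 2021."
freq = word_frequencies(sample, EXTERNAL_STOPWORDS)
print(freq.most_common(3))
```

The resulting frequency table is the input a word cloud renderer would need; keeping the stopword set outside the code means it can be edited without touching the pipeline.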

A Study on the Use of Stopword Corpus for Cleansing Unstructured Text Data (비정형 텍스트 데이터 정제를 위한 불용어 코퍼스의 활용에 관한 연구)

  • Lee, Won-Jo
    • The Journal of the Convergence on Culture Technology / v.8 no.6 / pp.891-897 / 2022
  • In big data analysis, raw text data mostly exists in various unstructured forms, so it becomes structured, analyzable data only after heuristic pre-processing and computer post-processing cleansing. In this study, therefore, unnecessary elements are removed from the collected raw data in pre-processing so that the wordcloud of the R program, one of the text data analysis techniques, can be applied, and stopwords are removed in post-processing. A case study of wordcloud analysis was then conducted, which calculates word frequencies and presents high-frequency words as key issues. To address the problems of the "nested stopword source code" method, the existing stopword processing method in R's word cloud technique, we propose using a "general stopword corpus" and a "user-defined stopword corpus" and conduct a case analysis. The advantages and disadvantages of the proposed "unstructured data cleansing process model" are comparatively verified, and the practical application of word cloud visualization analysis using the proposed "external corpus cleansing technique" is presented.
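The idea of keeping stopwords in external corpora rather than nested in the source code can be sketched as follows. The paper works in R; this sketch uses Python instead, and the corpus contents, the one-word-per-line format, and the `load_corpus` helper are assumptions for illustration only.

```python
import io

def load_corpus(source):
    """Read one stopword per line, ignoring blank lines and '#' comments."""
    words = set()
    for line in source:
        word = line.strip().lower()
        if word and not word.startswith("#"):
            words.add(word)
    return words

# Hypothetical corpora; io.StringIO stands in for files kept outside the code.
general_corpus = load_corpus(io.StringIO("the\nand\nof\n"))
user_corpus = load_corpus(io.StringIO("# domain-specific noise\nstudy\npaper\n"))

# The effective stopword list is the union of both corpora.
stopwords = general_corpus | user_corpus

tokens = ["the", "study", "of", "stopword", "corpus", "and", "cleansing"]
kept = [t for t in tokens if t not in stopwords]
print(kept)
```

Separating the general corpus from the user-defined one lets the domain-specific list change per analysis while the general list is reused as-is.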

Influences of Perceived Behavior Control and Self-efficacy on Proper Hand Cleansing and Hand Washing Practices among Pre-practicum Nursing Students (임상실습 전 간호대학생의 올바른 손씻기와 실천에 대한 지각된 통제행위와 자기효능감의 영향)

  • Park, Kyung-Yeon
    • Journal of Korean Academy of Fundamentals of Nursing / v.19 no.3 / pp.313-321 / 2012
  • Purpose: The purpose of the study was to investigate hand washing practice and proper hand cleansing among first and second year nursing students, who are prone to exposure to nosocomial infections, and to identify the influence of perceived behavior control and self-efficacy on hand washing practice and proper hand cleansing. Method: Data for 91 students were collected from a nursing college in a metropolitan city in Korea. Data were analyzed using descriptive statistics, t-test, one-way ANOVA, Pearson correlation coefficient, and multiple regression with SPSS/WIN 19.0. Result: The mean score for hand washing practice was 38.35 out of a possible score of 48, and the mean score for proper hand cleansing was 18.63 out of a possible score of 28. The significant factors affecting student hand washing practice were 'residential type' (p=.016), 'perceived behavior control' (p=.021), and 'self-efficacy' (p=.033), which explained 19.9% of the variance. The significant factors affecting proper hand cleansing by the students were 'perceived behavior control' (p<.001) and 'regular exercise' (p=.026), which explained 29.8% of the variance. Conclusion: These results indicate a need for education programs on hand washing that include strategies to improve perceived behavior control and self-efficacy in order to promote more effective hand washing practices.

An Automatic Setting Method of Data Constraints for Cleansing Data Errors between Business Services (비즈니스 서비스간의 오류 정제를 위한 데이터 제약조건 자동 설정 기법)

  • Lee, Jung-Won
    • Journal of the Korea Society of Computer and Information / v.14 no.3 / pp.161-171 / 2009
  • In this paper, we propose an automatic method for setting the data constraints of a data cleansing service. The service manages the quality of data exchanged between composite services based on SOA (Service-Oriented Architecture), and the method minimizes human intervention in that process. Because it is impossible to deal with all kinds of real-world data, we focus on business data (e.g., customer orders, order processing) that are frequently used in services such as CRM (Customer Relationship Management) and ERP (Enterprise Resource Planning). We first generate an extended-element vector by extending the semantics of data exchanged between composite services, and then build a rule-based system that sets data constraints automatically using the decision tree learning algorithm. We applied this rule-based system to the data cleansing service and achieved an automation rate of over 41% by learning from the data of multiple registered services in the business domain.

Development of a Smooth Colon Surface Restoration Method for Electronic Colon Cleansing (전자적 장세척을 위한 부드러운 장표면 복원 방법 개발)

  • Kim, Seung-Hwan;Kim, Dong-Sung
    • Journal of Biomedical Engineering Research / v.32 no.3 / pp.251-256 / 2011
  • Virtual colonoscopy is favored over conventional colonoscopy because its non-invasive procedure avoids complications that may occur with the conventional approach and because it can cleanse the colon electronically instead of requiring uncomfortable conventional colon cleansing. Electronic Colon Cleansing (ECC) has to deal not only with removing tagged fecal material but also with the Partial Volume Effect (PVE) caused by tagging material. This paper proposes an ECC method that restores the inherent natural PVE, whereas previous approaches focused only on reducing the PVE due to tagged fecal material. The proposed method reduces PVE using 3-dimensional adaptive density correction and then replaces tagged fecal material with air. Next, it generates natural PVE for the replaced air adjacent to soft tissue, and finally it produces a smooth transition of gray values for soft tissue adjacent to the replaced air. The proposed method was applied to data from eleven patients and showed promising results.

Detection and Correction Method of Erroneous Data Using Quantile Pattern and LSTM

  • Hwang, Chulhyun;Kim, Hosung;Jung, Hoekyung
    • Journal of information and communication convergence engineering / v.16 no.4 / pp.242-247 / 2018
  • The data of K-Water waterworks is collected from various sensors and used as basic data for the operation and analysis of various devices. The sensor data is therefore very important, but it contains misleading values because the sensors operate in an external environment. However, existing cleansing methods concentrate on predicting missing data, so research on detecting and correcting erroneous data remains scarce. This study detects wrong data by converting the collected data into quintiles and patterning them. The accuracy of detecting false data intentionally generated from real data is confirmed to be higher than that of the conventional method in all cases. In future research, we will demonstrate the proposed system's efficiency and accuracy in various environments.
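The quintile conversion step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the reference range, the sample readings, and the neighbour-comparison rule in `flag_suspects` are hypothetical stand-ins for the paper's pattern analysis, and the LSTM-based correction step is not shown.

```python
from bisect import bisect_right
from statistics import quantiles

def to_quintiles(values, reference):
    """Map each value to a quintile label 0..4 using cut points from reference data."""
    cuts = quantiles(reference, n=5)  # four cut points split the data into five bins
    return [bisect_right(cuts, v) for v in values]

def flag_suspects(labels, max_jump=1):
    """Assumed rule: a reading is suspect when its label differs from
    both neighbours by more than max_jump quintiles."""
    return [
        i for i in range(1, len(labels) - 1)
        if abs(labels[i] - labels[i - 1]) > max_jump
        and abs(labels[i] - labels[i + 1]) > max_jump
    ]

# Hypothetical historic sensor range and a short run of new readings.
reference = list(range(1, 101))
readings = [45, 47, 46, 99, 48, 44, 3, 46]

labels = to_quintiles(readings, reference)
print(labels)                 # quintile label per reading
print(flag_suspects(labels))  # indices of readings that break the local pattern
```

Working on quintile labels rather than raw values makes the rule insensitive to small fluctuations inside a bin, which is one plausible reason for discretizing before looking for patterns.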

Developing the SOA-based Dirty Data Cleansing Service (SOA에서의 오류 데이터 정제를 위한 서비스 개발)

  • Ji, Eun-Mi;Choi, Byoung-Ju;Lee, Jung-Won
    • Proceedings of the Korea Information Processing Society Conference / 2007.05a / pp.649-652 / 2007
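  • Recently, distributed software integration technology based on the principles of Service Oriented Architecture (SOA) has spread widely as a concept for integrating e-Business applications. Reliable data exchange between services through data cleansing techniques has therefore become essential. In this paper, we implement a Dirty Data Cleansing Service (DDCS) that detects and cleanses errors in the data exchanged when systems interact. It consists of a transformation process that binds the user's data constraints, a detection process that detects errors, and a cleansing process that cleanses the detected errors and displays the related information. Using DDCS, we develop a service that guarantees the cleansing of dirty data exchanged between systems integrated on an SOA-based ESB.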
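The three processes named in this abstract (transformation, detection, cleansing) can be pictured with a small Python sketch over an XML message. It is not the DDCS implementation: the `CONSTRAINTS` table, the element names, and the choice to replace a dirty value with a default are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical user-defined constraints: element tag -> (validator, default value).
CONSTRAINTS = {
    "quantity": (lambda s: s.isdigit() and int(s) > 0, "1"),
    "email": (lambda s: "@" in s, ""),
}

def transform(xml_text, constraints):
    """Transformation: parse the message and pair each constrained element with its rule."""
    root = ET.fromstring(xml_text)
    bound = [(el, constraints[el.tag]) for el in root.iter() if el.tag in constraints]
    return root, bound

def detect(bound):
    """Detection: return the elements whose text violates the attached constraint."""
    return [
        (el, default)
        for el, (is_valid, default) in bound
        if not is_valid((el.text or "").strip())
    ]

def cleanse(dirty):
    """Cleansing: replace each dirty value with its default and report what changed."""
    report = []
    for el, default in dirty:
        report.append((el.tag, el.text, default))
        el.text = default
    return report

message = "<order><quantity>-3</quantity><email>kim@example.org</email></order>"
root, bound = transform(message, CONSTRAINTS)
report = cleanse(detect(bound))
print(report)
print(ET.tostring(root, encoding="unicode"))
```

Keeping the three stages as separate functions mirrors the structure in the abstract and would let such a service sit between two other services, cleansing each message before it is forwarded.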

An Electronic Colon Cleansing Method using a Patient Colon CT Profile (환자 대장 CT 프로파일을 이용한 전자적 장세척 방법)

  • Kim, Han-Byul;Kim, Dong-Sung
    • Journal of KIISE:Software and Applications / v.35 no.8 / pp.493-500 / 2008
  • This paper proposes an electronic colon cleansing method that uses a patient CT profile for virtual colonoscopy. The proposed method extracts the colon using cubic seeded region growing and removes tagged materials adjacent to the colon. Residuals produced by the partial volume effect at the boundary between air and tagged material are deleted, and soft tissue pixels removed because of the partial volume effect at the boundary between tagged material and soft tissue are recovered using the patient CT profile. The proposed method was applied to 16 virtual colonoscopy patient data sets and produced promising results in a subjective evaluation by a radiologist and in a quantitative evaluation by a computer-aided diagnosis system.

Optimization of Surfactant Mixture Composition for Cleansing Using Mixture Experiment Design (혼합물 실험 계획법을 활용한 세정용 계면활성제 혼합물 조성의 최적화)

  • Song, Maria;Jin, Byung Suk
    • Applied Chemistry for Engineering / v.32 no.5 / pp.574-580 / 2021
  • The main goal of this study was to find an optimal surfactant mixture composition for developing the best-performing cleansing products. Three surfactants, sodium cocoyl alaninate (SCoA), cocamidopropyl betaine (CPB), and decyl glucoside (DG), were selected because they showed excellent detergency, foaming height, and contamination rate in preliminary experiments. Experiments on the surfactant mixtures were performed according to a simplex centroid design matrix, and regression analysis was conducted on the experimental data. Response surface model equations that were statistically significant (p < 0.05) were obtained. The optimal composition of the surfactant mixture was determined to be SCoA (0.22), CPB (0.78), and DG (0.00) by simultaneous optimization of the three response variables.
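For readers unfamiliar with the simplex centroid design mentioned here, the following Python sketch enumerates its blends for the three surfactants. It shows the standard unaugmented design with 2^q - 1 runs; whether the authors added replicate or check points is not stated in the abstract, and the function name is an assumption.

```python
from itertools import combinations

def simplex_centroid(components):
    """Return the 2**q - 1 blends of a simplex-centroid design as dicts of proportions."""
    design = []
    for size in range(1, len(components) + 1):
        # Every non-empty subset of components is mixed in equal proportions.
        for subset in combinations(components, size):
            design.append({c: (1 / size if c in subset else 0.0) for c in components})
    return design

design = simplex_centroid(["SCoA", "CPB", "DG"])
for run in design:
    print(run)
```

For three components this gives seven runs: three pure surfactants, three 50:50 binary blends, and one equal three-way blend. A regression model fitted to responses measured at these points is what yields an optimum such as the one reported above.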