• Title/Summary/Keyword: Cleansing Algorithm

Automatic Electronic Cleansing in Computed Tomography Colonography Images using Domain Knowledge

  • Manjunath, KN;Siddalingaswamy, PC;Prabhu, GK
    • Asian Pacific Journal of Cancer Prevention / v.16 no.18 / pp.8351-8358 / 2016
  • Electronic cleansing is an image post-processing technique in which tagged colonic content is subtracted from the colon in CTC images. It is prone to post-processing artifacts such as: 1) soft tissue degradation; 2) incomplete cleansing; 3) misclassification of polyps due to pseudo-enhanced voxels; and 4) pseudo soft tissue structures. The objective of the study was to subtract the tagged colonic content without losing the soft tissue structures. This paper proposes a novel adaptive method that solves the first three problems using a multi-step algorithm. It uses a new edge model-based method which involves segmenting the colon, using a priori information about the Hounsfield units (HU) of different colonic contents at specific tube voltages, subtracting the tagging materials, restoring the soft tissue structures based on selective HU, removing the air-contrast boundary, and applying a filter to clean minute particles that appear as noise due to improperly tagged endoluminal fluids. The main finding of the study was that submerged soft tissue structures were fully preserved and the pseudo-enhanced intensities were corrected without any artifact. The method was implemented with multithreading for parallel processing on a high-performance computer. The technique was applied to a fecal-tagged dataset (30 patients) in which the tagging agent was not completely removed from the colon. The results were then qualitatively validated by radiologists for any image processing artifacts.
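
The subtraction step described in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: it covers only the thresholding idea, and the function name `subtract_tagged`, the nested-list slice layout, and the 200 HU cutoff are assumptions made for illustration (the abstract says HU ranges depend on tube voltage but gives no values). Air at roughly -1000 HU follows from the definition of the Hounsfield scale.

```python
AIR_HU = -1000          # air is about -1000 HU on the Hounsfield scale
TAG_THRESHOLD_HU = 200  # ASSUMPTION: illustrative cutoff, not taken from the paper


def subtract_tagged(slice_hu, colon_mask, tag_threshold=TAG_THRESHOLD_HU):
    """Return a copy of slice_hu where tagged voxels inside the colon become air.

    slice_hu   -- 2D list of HU values for one CT slice
    colon_mask -- 2D list of booleans, True where the voxel lies inside the
                  segmented colon (segmentation itself is not shown here)
    """
    cleansed = []
    for row_hu, row_mask in zip(slice_hu, colon_mask):
        cleansed.append([
            AIR_HU if inside and hu >= tag_threshold else hu
            for hu, inside in zip(row_hu, row_mask)
        ])
    return cleansed


# Toy 2x3 slice: bright voxels outside the mask (e.g. bone) must stay untouched.
slice_hu = [
    [40, 350, 420],
    [-950, 60, 500],
]
colon_mask = [
    [False, True, True],
    [True, True, False],
]
cleansed = subtract_tagged(slice_hu, colon_mask)
```

Restricting the replacement to the colon mask is what keeps bright structures outside the lumen intact; the soft tissue restoration, boundary removal, and noise filtering steps named in the abstract would follow this step and are not sketched here.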

Data Cleansing Algorithm for reducing Outlier (데이터 오·결측 저감 정제 알고리즘)

  • Lee, Jongwon;Kim, Hosung;Hwang, Chulhyun;Kang, Inshik;Jung, Hoekyung
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2018.10a / pp.342-344 / 2018
  • This paper shows that the proposed algorithm can substitute for statistical methods such as mean imputation, correlation coefficient analysis, and graph correlation analysis, and can take the place of a statistician in processing the various abnormal data measured in the water treatment process. In addition, this study aims to model a data-filtering system based on a recent fractile pattern and a deep learning-based LSTM algorithm in order to improve the reliability and validation of the algorithm, using open-source libraries such as Keras, Theano, and TensorFlow.

An Automatic Setting Method of Data Constraints for Cleansing Data Errors between Business Services (비즈니스 서비스간의 오류 정제를 위한 데이터 제약조건 자동 설정 기법)

  • Lee, Jung-Won
    • Journal of the Korea Society of Computer and Information / v.14 no.3 / pp.161-171 / 2009
  • In this paper, we propose an automatic method for setting the data constraints of a data cleansing service. The service manages the quality of data exchanged between composite services based on SOA (Service-Oriented Architecture), and the method minimizes human intervention during the process. Because it is impossible to deal with all kinds of real-world data, we focus on business data (e.g. customer orders, order processing), which are frequently used in services such as CRM (Customer Relationship Management) and ERP (Enterprise Resource Planning). We first generate an extended-element vector by extending the semantics of the data exchanged between composite services, and then build a rule-based system that sets data constraints automatically using the decision tree learning algorithm. We applied this rule-based system to the data cleansing service and achieved an automation rate of over 41% by learning from the data of multiple registered services in the business domain.

Symbolizing Numbers to Improve Neural Machine Translation (숫자 기호화를 통한 신경기계번역 성능 향상)

  • Kang, Cheongwoong;Ro, Youngheon;Kim, Jisu;Choi, Heeyoul
    • Journal of Digital Contents Society / v.19 no.6 / pp.1161-1167 / 2018
  • The development of machine learning has enabled machines to perform delicate tasks that only humans could do, and thus many companies have introduced machine learning-based translators. Existing translators perform well but have problems translating numbers. They often mistranslate numbers when the input sentence includes a large number. Furthermore, the structure of the output sentence can change completely even if only one number in the input sentence changes. In this paper, we first optimized a neural machine translation model architecture that uses a bidirectional RNN, LSTM, and the attention mechanism, through data cleansing and by changing the dictionary size. Then, we implemented a number-processing algorithm specialized in number translation and applied it to the neural machine translation model to solve the problems above. The paper includes the data cleansing method, an optimal dictionary size, and the number-processing algorithm, as well as experimental results for translation performance based on the BLEU score.
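
The abstract above does not spell out its number-processing algorithm, so the following is only a sketch of one common way to symbolize numbers, under stated assumptions: each number is swapped for a placeholder token before translation and restored afterwards. The `<NUMk>` token format, the helper names, and the regular expression are all hypothetical.

```python
import re

# ASSUMPTION: a "number" is a run of digits with optional , or . groups,
# e.g. 3, 1,250,000 or 3.14. The paper's own definition may differ.
NUMBER_RE = re.compile(r"\d+(?:[.,]\d+)*")


def symbolize_numbers(sentence):
    """Replace each number with an indexed placeholder; return (text, numbers)."""
    numbers = []

    def _swap(match):
        numbers.append(match.group(0))
        return f"<NUM{len(numbers) - 1}>"

    return NUMBER_RE.sub(_swap, sentence), numbers


def restore_numbers(sentence, numbers):
    """Put the original numbers back in place of their placeholders."""
    return re.sub(r"<NUM(\d+)>", lambda m: numbers[int(m.group(1))], sentence)


source = "The order of 1,250,000 units ships in 3 days."
masked, numbers = symbolize_numbers(source)
# `masked` would be passed to the translation model; here we restore it directly.
restored = restore_numbers(masked, numbers)
```

Because the placeholders are indexed, the translated sentence may reorder them and each number still returns to the right slot, which is the property that keeps the sentence structure independent of the specific digits.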

A Study on the cleansing of water data using LSTM algorithm (LSTM 알고리즘을 이용한 수도데이터 정제기법)

  • Yoo, Gi Hyun;Kim, Jong Rib;Shin, Gang Wook
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2017.10a / pp.501-503 / 2017
  • In the water sector, various data such as flow rate, pressure, water quality, and water level are collected throughout the water purification plant and piping system. The collected data are stored in each water treatment plant's DB, combined in the regional DB, and finally stored in the database server at the head office of the Korea Water Resources Corporation. Abnormal data can arise when a measuring instrument takes a reading or when data are transmitted between these stages, and they can be classified into missing data and wrong data. Each kind of abnormal data has a different cause, so the methods for detecting wrong data and missing data differ, but the method for cleansing them is the same. This study investigates a program that automatically cleanses missing or wrong data by applying the deep learning LSTM (Long Short-Term Memory) algorithm.
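
The distinction drawn in the abstract above, separate detection for missing and wrong data but one shared cleansing step, can be sketched as follows. This is not the LSTM-based program the study describes: as a plainly named stand-in, the replacement value is the mean of the nearest valid neighbours. The function names and the plausible range of 0 to 100 are assumptions for illustration.

```python
def detect_abnormal(readings, low, high):
    """Return (missing, wrong) index lists.

    Missing readings are None; wrong readings are present but fall outside
    the plausible range [low, high]. The range is an ASSUMPTION per sensor.
    """
    missing = [i for i, v in enumerate(readings) if v is None]
    wrong = [i for i, v in enumerate(readings)
             if v is not None and not (low <= v <= high)]
    return missing, wrong


def cleanse(readings, abnormal_indices):
    """Replace every abnormal reading with the mean of its nearest valid neighbours.

    Stand-in for a learned predictor such as an LSTM. A reading with no valid
    neighbour on either side is left unchanged.
    """
    bad = set(abnormal_indices)
    cleaned = list(readings)
    for i in sorted(bad):
        left = next((readings[j] for j in range(i - 1, -1, -1) if j not in bad), None)
        right = next((readings[j] for j in range(i + 1, len(readings)) if j not in bad), None)
        neighbours = [v for v in (left, right) if v is not None]
        if neighbours:
            cleaned[i] = sum(neighbours) / len(neighbours)
    return cleaned


# Toy flow-rate series with one missing value and one implausible spike.
flow = [10.0, 10.5, None, 11.5, 900.0, 12.5]
missing, wrong = detect_abnormal(flow, low=0.0, high=100.0)
cleaned = cleanse(flow, missing + wrong)
```

Both kinds of abnormal index feed the same `cleanse` call, mirroring the abstract's point that only detection differs between missing and wrong data.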

Implementation of a data collection system for big data analysis and learning based on infant body temperature data (영유아 체온 데이터 기반 빅데이터 분석 및 학습을 위한 데이터 수집 시스템 구현)

  • Lee, Hyoun-Sup;Heo, Gyeongyong
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2021.05a / pp.577-578 / 2021
  • Artificial intelligence systems are now used in various fields. The accuracy of an AI decision algorithm is greatly affected by the amount and the accuracy of its training data. A large amount of data is required because the amount of training has a decisive effect on AI performance. In this paper, we propose a data collection system for building a system that analyzes future conditions and changes in condition based on the body temperature data of infants and toddlers. The proposed system collects and transmits data, and it is believed that it can minimize the server resource consumption of existing big data analysis and training data construction.

The Importance of Manpower in Major Education as an Example of Artificial Intelligence Development in Construction (건설 인공지능 개발사례로 보는 전공교육 인력의 중요성)

  • Heo, Seokjae;Lee, Sanghyun;Lee, Seungwon;Kim, Myunghun;Chung, Lan
    • Proceedings of the Korean Institute of Building Construction Conference / 2021.11a / pp.223-224 / 2021
  • The process before the model learning stage in AI R&D can be subdivided into data collection/cleansing, data purification, and data labeling. After that, depending on the purpose of development, the model is trained using the algorithm of the artificial intelligence model and then verified. Several studies treat the learning stage as the important part of AI research and try to increase accuracy by changing the structure and layers of the AI model. However, if the refinement and labeling of the learning data are tailored only to the model format and not to the purpose of development, the desired AI model cannot be obtained. The latest research reveals that most AI research failures stem from the learning data rather than from the structure of the AI model.

Application of Social Big Data Analysis for CosMedical Cosmetics Marketing : H Company Case Study (기능성 화장품 마케팅의 소셜 빅데이터 분석 활용 : H사 사례를 중심으로)

  • Hwang, Sin-Hae;Ku, Dong-Young;Kim, Jeoung-Kun
    • Journal of Digital Convergence / v.17 no.7 / pp.35-41 / 2019
  • This study aims to analyze the cosmedical cosmetics market and the nature of its customers through social big data analysis. More than 80,000 posts were analyzed using the R program. After data cleansing, keyword frequency analysis and association analysis were performed to understand customer needs and competitor positioning, and several implications for sophisticating and implementing marketing strategy were formulated. The analysis results show that "prevention" is a new and essential attribute for appealing to target customers. Expanding the product line for the gift market is also suggested. A high correlation was found between products that can complement each other. In addition to traditional marketing techniques, evidence-based social big data analysis was useful in deriving characteristics of the customers and the market that had not been identified before. Word2vec algorithm will be beneficial to find additional.

A Dynamic Orchestration Framework for Supporting Sustainable Services in IT Ecosystem (IT 생태계의 지속적인 운영을 위한 동적 오케스트레이션 프레임워크)

  • Park, Soo Jin
    • KIPS Transactions on Software and Data Engineering / v.6 no.12 / pp.549-564 / 2017
  • With the development of the Internet of Things and autonomous software, the services provided by a single system have diversified, and new services that were not possible before are now provided through collaboration between systems. From a biological viewpoint, collaboration between autonomous systems resembles the configuration of an ecosystem; it is therefore called the IT Ecosystem, a concept that has emerged in recent years. The IT Ecosystem refers to a set of heterogeneous systems, rather than a single system, in which each system accomplishes its own mission using its own autonomy while the objectives of the overall system are achieved at the same time in order to meet a single common goal. In our previous study, we proposed an elementary-level architecture as well as several basic meta-models to implement the IT Ecosystem. This paper proposes a comprehensive reference architecture framework to implement the IT Ecosystem by cleansing the previous study. Within the framework, a utility function based on a cost-benefit model is proposed to solve the dynamic re-configuration problem of system components. Furthermore, a genetic algorithm is proposed as a way to reduce the dynamic re-configuration overhead, which increases exponentially as the number of component entities in the IT Ecosystem grows. Finally, the usefulness of the proposed orchestration framework is verified quantitatively through plausible case studies on an IT Ecosystem for unmanned forestry management.

Hate Speech Detection Using Modified Principal Component Analysis and Enhanced Convolution Neural Network on Twitter Dataset

  • Majed, Alowaidi
    • International Journal of Computer Science & Network Security / v.23 no.1 / pp.112-119 / 2023
  • Traditionally used for networking computers and communications, the Internet has been evolving from the beginning. The Internet is the backbone for many things on the web, including social media. The concept of social networking, which started in the early 1990s, has also been growing with the Internet. Social Networking Sites (SNSs) sprang up and have remained an important element of Internet usage, mainly due to the services or provisions they allow on the web. Twitter and Facebook have become the primary means by which most individuals keep in touch with others and carry on substantive conversations. These sites allow the posting of photos and videos and support audio and video storage, which can be shared amongst users. Although attractive, these provisions have also led to problems for these sites, such as the posting of offensive material. Though not always, users of SNSs play a part in promoting hate through their words or speech, which is difficult to curtail once uploaded. Hence, this article outlines a process for extracting user reviews from the Twitter corpus in order to identify instances of hate speech. Through the use of MPCA (Modified Principal Component Analysis) and ECNN (Enhanced Convolutional Neural Network), we are able to identify instances of hate speech in the text. With the use of natural language processing (NLP), a fully autonomous system for assessing syntax and meaning can be established. There is a strong emphasis on pre-processing, feature extraction, and classification. Normalization cleanses the text by removing extra spaces, punctuation, and stop words. The pre-processed text is then used for feature extraction, during which the MPCA algorithm is applied. It takes a set of related features and pulls out the ones that tell us the most about the given dataset. The proposed classification method is then put forward as a means of detecting instances of hate speech or abusive language. It is argued that ECNN is superior to other methods for identifying hateful content online. It can take in massive amounts of data and quickly return accurate results, especially for larger datasets. As a result, the proposed MPCA+ECNN algorithm improves not only the F-measure values, but also the accuracy, precision, and recall.
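
The normalization step named in the abstract above (removing extra spaces, punctuation, and stop words) can be sketched with the Python standard library alone. This is an illustration, not the paper's pipeline: the tiny stop-word set and the `normalize` helper are assumptions, and lowercasing is an extra step added here so that stop-word matching is case-insensitive.

```python
import string

# ASSUMPTION: a deliberately tiny stop-word set for illustration only.
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "in"}


def normalize(text, stop_words=STOP_WORDS):
    """Strip punctuation, collapse whitespace, lowercase, and drop stop words."""
    # Delete every ASCII punctuation character (this also removes # and @).
    no_punct = text.translate(str.maketrans("", "", string.punctuation))
    # split() with no argument collapses any run of whitespace.
    tokens = no_punct.lower().split()
    return [t for t in tokens if t not in stop_words]


tokens = normalize("  This   is the WORST   take, ever!!  ")
```

The resulting token list is the kind of input a feature extraction stage would consume. For tweets, stripping `#` and `@` merges hashtags and mentions into ordinary words, which may or may not be desirable depending on the features used.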