• Title/Summary/Keyword: Input preprocessing

Analyses of the Non-Examinees' Characteristics for the Effective Health Screening Management (효율적 건강검진관리를 위한 미수검자의 특성 분석 - 건강보험 지역 가입자 중심으로 -)

  • Lee, Ae-Kyung;Lee, Sun-Mi;Park, Il-Su
    • Health Policy and Management / v.16 no.1 / pp.54-72 / 2006
  • This study was conducted as the primary work to develop a customer relationship management (CRM) system to improve the performance of health screening programs. The specific aims of the study were to identify and classify the characteristics of people who did not receive their health screening, using decision trees, and to propose management strategies according to the characteristics identified. Data on a total of 5,102,761 health screening subjects, provided by the National Health Insurance Program for 2002, were used. The target variable was whether a subject underwent health screening, and a total of 27 input variables were used. SAS version 9.1 was used for data preprocessing and statistical analyses, and SAS Enterprise Miner was used to develop the decision tree model. The decision trees identified the factors that most strongly affect health screening. In the non-disease group, the subgroup with the highest rate of non-examinees was characterized by no experience of receiving a health screening, household's age, a non-insured episode in the past year, and patients' age. In the disease group, the subgroup with the highest rate of non-examinees was characterized by no experience of receiving a health screening, no visit to a public health center or midwife clinic in the past year, and examinees' age. Developing CRM systems for health screening management that take individual characteristics into account would help considerably to increase the health screening rate.

Utilization of Syllabic Nuclei Location in Korean Speech Segmentation into Phonemic Units (음절핵의 위치정보를 이용한 우리말의 음소경계 추출)

  • 신옥근
    • The Journal of the Acoustical Society of Korea / v.19 no.5 / pp.13-19 / 2000
  • The blind segmentation method, which segments input speech data into recognition units without any prior knowledge, plays an important role in continuous speech recognition systems and corpus generation. Because no prior knowledge is required, the method is rather simple to implement, but in general it performs worse than knowledge-based segmentation. In this paper, we introduce a method that improves the performance of blind segmentation of Korean continuous speech by postprocessing the segment boundaries obtained from the blind segmentation. In the preprocessing stage, candidate boundaries are extracted by a clustering technique based on the generalized likelihood ratio (GLR) distance measure. In the postprocessing stage, the final phoneme boundaries are selected from the candidates using simple a priori knowledge of the syllabic structure of Korean, namely that the maximum number of phonemes between any two consecutive nuclei is limited. The experimental result was rather promising: the proposed method yields a 25% reduction in insertion error rate compared with that of blind segmentation alone.
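
The postprocessing idea above can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the function name `select_boundaries`, the `(time, score)` candidate format, and the rule of keeping the highest-scoring candidates are all assumptions made for illustration. The only element taken from the abstract is that the number of boundaries kept between two consecutive syllabic nuclei is capped.

```python
def select_boundaries(candidates, nuclei, max_between):
    """Keep at most `max_between` candidate boundaries between consecutive nuclei.

    candidates  : list of (time, score) pairs; a higher score is assumed to mean
                  a more reliable boundary (for example a larger GLR distance).
    nuclei      : sorted list of syllabic nucleus times.
    max_between : hypothetical cap derived from the syllable structure.

    Candidates before the first nucleus or after the last one are ignored
    in this sketch.
    """
    selected = []
    for left, right in zip(nuclei, nuclei[1:]):
        # Candidates strictly between this pair of nuclei.
        between = [c for c in candidates if left < c[0] < right]
        # Keep the strongest ones, then restore time order.
        best = sorted(between, key=lambda c: c[1], reverse=True)[:max_between]
        selected.extend(sorted(time for time, _ in best))
    return selected


candidates = [(0.10, 5.0), (0.12, 1.0), (0.15, 4.0), (0.18, 3.0), (0.30, 2.0)]
nuclei = [0.05, 0.20, 0.40]
final_boundaries = select_boundaries(candidates, nuclei, max_between=2)
```

With these made-up values, four candidates fall between the first two nuclei and only the two strongest survive, which is how a cap of this kind would reduce insertion errors.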

Classification of ratings in online reviews (온라인 리뷰에서 평점의 분류)

  • Choi, Dongjun;Choi, Hosik;Park, Changyi
    • Journal of the Korean Data and Information Science Society / v.27 no.4 / pp.845-854 / 2016
  • Sentiment analysis, or opinion mining, is a text mining technique used to identify subjective information or individual opinions in documents such as blogs, reviews, articles, or social network posts. In the literature, only the problem of binary classification of ratings based on online review texts has been considered. However, because reviews can be neutral as well as positive or negative, multi-class classification is more appropriate than binary classification. To this end, we consider the multi-class classification of ratings based on review texts. In the preprocessing stage, we extract words related to ratings using the chi-square statistic. The extracted words are then used as input variables to multi-class classifiers such as support vector machines and the proportional odds model, and their predictive performances are compared.
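
As a rough illustration of the chi-square preprocessing step, the Python sketch below scores each word by the Pearson chi-square statistic of a word-presence by rating-class contingency table and keeps the top-scoring words. The function names, the toy reviews, and the top-k selection rule are assumptions; the abstract does not say how the authors tokenized the text or chose a cutoff.

```python
from collections import Counter


def chi_square_scores(docs, labels):
    """Return {word: chi-square statistic} for word presence vs. rating class."""
    n = len(docs)
    class_counts = Counter(labels)
    # present[word][label] = number of documents of that class containing the word
    present = {}
    for tokens, label in zip(docs, labels):
        for word in set(tokens):
            present.setdefault(word, Counter())[label] += 1

    scores = {}
    for word, by_class in present.items():
        n_with = sum(by_class.values())
        n_without = n - n_with
        stat = 0.0
        for label, n_class in class_counts.items():
            observed_with = by_class[label]
            observed_without = n_class - observed_with
            expected_with = n_with * n_class / n
            expected_without = n_without * n_class / n
            if expected_with > 0:
                stat += (observed_with - expected_with) ** 2 / expected_with
            if expected_without > 0:
                stat += (observed_without - expected_without) ** 2 / expected_without
        scores[word] = stat
    return scores


def select_words(docs, labels, k):
    """Return the k words with the largest statistic (ties broken alphabetically)."""
    scores = chi_square_scores(docs, labels)
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]


# Toy tokenized reviews with three rating classes (made-up data).
docs = [["great", "phone"], ["great", "battery"], ["bad", "phone"],
        ["bad", "battery"], ["okay", "phone"], ["okay", "battery"]]
labels = ["pos", "pos", "neg", "neg", "neu", "neu"]
scores = chi_square_scores(docs, labels)
selected = select_words(docs, labels, k=3)
```

In this toy data, words that appear equally often in every class ("phone", "battery") score zero, while words tied to a single class score highest, so only the latter would be passed on as input variables.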

For Improving Security Log Big Data Analysis Efficiency, A Firewall Log Data Standard Format Proposed (보안로그 빅데이터 분석 효율성 향상을 위한 방화벽 로그 데이터 표준 포맷 제안)

  • Bae, Chun-sock;Goh, Sung-cheol
    • Journal of the Korea Institute of Information Security & Cryptology / v.30 no.1 / pp.157-167 / 2020
  • Big data and artificial intelligence technologies, which have provided the foundation for the recent fourth industrial revolution, have become a major driving force of business innovation across industries. In the field of information security, efforts are under way to develop and improve intelligent security systems by applying these techniques to large-scale log data, for which effective uses were previously hard to find. The quality of security log big data, the basis for training information security AI, is an important input factor that determines the performance of an intelligent security system. However, because log data differ from product to product and are complex, preprocessing big data of poor quality requires excessive time and effort. In this study, we survey and analyze cases of log data collection from various firewalls. By proposing a standard format for firewall log data collection, we hope to contribute to the development of intelligent security systems based on security log big data.

An Area-efficient Design of SHA-256 Hash Processor for IoT Security (IoT 보안을 위한 SHA-256 해시 프로세서의 면적 효율적인 설계)

  • Lee, Sang-Hyun;Shin, Kyung-Wook
    • Journal of the Korea Institute of Information and Communication Engineering / v.22 no.1 / pp.109-116 / 2018
  • This paper describes an area-efficient design of the SHA-256 hash function, which is widely used in security protocols for digital signatures, authentication codes, and key generation. The SHA-256 hash processor includes a padder block that pads and parses the input message, so it can operate without software preprocessing. The round function was designed with a 16-bit data path that processes the 64 round computations in 128 clock cycles, resulting in optimized area per throughput (APT) performance as well as a small-area implementation. The SHA-256 hash processor was verified by FPGA implementation on a Virtex5 device, and the throughput was estimated at 337 Mbps at a maximum clock frequency of 116 MHz. Synthesis for ASIC implementation with a 0.18-$\mu$m CMOS cell library shows that the design has 13,251 gate equivalents (GEs) and can operate at a clock frequency of up to 200 MHz.
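
For readers unfamiliar with what the padder block has to do, here is a small software sketch in Python of the standard SHA-256 padding and parsing rules (append a 1 bit, zero-fill to 448 bits modulo 512, append the 64-bit big-endian message length, then split into 32-bit words). It illustrates the preprocessing that the hardware takes over; it says nothing about how the paper's padder is actually built, and the function names are chosen here for illustration.

```python
import struct

BLOCK_BYTES = 64  # SHA-256 processes the message in 512-bit blocks


def sha256_pad(message: bytes) -> bytes:
    """Pad a message to a multiple of 512 bits following the SHA-256 rules."""
    bit_length = len(message) * 8
    padded = message + b"\x80"  # a single 1 bit followed by seven 0 bits
    # Zero-fill so that exactly 8 bytes remain in the last block.
    zero_count = (56 - len(padded) % BLOCK_BYTES) % BLOCK_BYTES
    padded += b"\x00" * zero_count
    padded += struct.pack(">Q", bit_length)  # 64-bit big-endian length
    return padded


def parse_blocks(padded: bytes):
    """Split a padded message into blocks of sixteen 32-bit big-endian words."""
    return [list(struct.unpack(">16I", padded[i:i + BLOCK_BYTES]))
            for i in range(0, len(padded), BLOCK_BYTES)]


blocks = parse_blocks(sha256_pad(b"abc"))
```

A 55-byte message still fits in one block, while a 56-byte message spills into a second block because the length field no longer fits.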

Face detection using fuzzy color classifier and convex-hull (Fuzzy Color Classifier 와 Convex-hull을 사용한 얼굴 검출)

  • Park, Min-Sik;Park, Chang-U;Kim, Won-Ha;Park, Min-Yong
    • Journal of the Institute of Electronics Engineers of Korea SP / v.39 no.2 / pp.69-78 / 2002
  • This paper presents a method to automatically detect a person's face in a given image that contains the person's hair and face against a complex background scene. Our method uses a detection algorithm that exploits the spatial distribution characteristics of human skin color via an adaptive fuzzy color classifier (AFCC). The universal skin-color map is derived from the chrominance components of human skin color in Cb and Cr and their corresponding luminance. The fuzzy system is applied to separate skin-color regions from non-skin regions. We use the RGB model to extract hair-color regions, because hair regions often show low brightness and chromaticity estimation of low-brightness colors is not stable. After some preprocessing, we apply a convex hull to each region. The face is then detected from the relationship between the face's convex hull and the head's convex hull. The algorithm using the convex hull performs better than the algorithm using the pattern method. Experimental results show that the proposed algorithm detects faces in color images successfully and efficiently without constrained input conditions.

Efficient Real-time Lane Detection Algorithm Using V-ROI (V-ROI를 이용한 고효율 실시간 차선 인식 알고리즘)

  • Dajun, Ding;Lee, Chanho
    • Journal of IKEEE / v.16 no.4 / pp.349-355 / 2012
  • Information technology improves the convenience, safety, and performance of automobiles. Recently, many algorithms have been studied to provide safety and environment information for driving, and lane detection is one of them. In this paper, we propose a lane detection algorithm that reduces the amount of calculation by shrinking the region of interest (ROI) after preprocessing. The proposed algorithm substantially reduces the ROI area by designating the candidate regions near lane boundaries as the V-ROI. In addition, because lane detection does not require high resolution, the input images are compressed so that the amount of calculation stays almost the same regardless of their resolution. The proposed algorithm is implemented in C++ with the OpenCV library and is verified to run at 30 fps for real-time operation.

A Sliding Window-based Multivariate Stream Data Classification (슬라이딩 윈도우 기반 다변량 스트림 데이타 분류 기법)

  • Seo, Sung-Bo;Kang, Jae-Woo;Nam, Kwang-Woo;Ryu, Keun-Ho
    • Journal of KIISE:Databases / v.33 no.2 / pp.163-174 / 2006
  • In a distributed wireless sensor network, it is difficult to transmit and analyze the entire data stream because of limited network, power, and processor resources. It is therefore appropriate to classify the continuous stream data first and then apply alternative stream data processing. We propose a classification framework for continuous multivariate stream data. The proposed approach works in two steps. In the preprocessing step, it takes a sliding window of multivariate stream data as input and discretizes the data in the window into a string of symbols that characterize the signal changes. In the classification step, it uses a standard text classification algorithm to classify the discretized data in the window. We evaluated both supervised and unsupervised classification algorithms. For supervised classification we tested a Bayesian classifier and SVM, and for unsupervised classification we tested Jaccard, TFIDF, Jaro, and Jaro-Winkler. In our experiments, SVM and TFIDF outperformed the other classification methods. In particular, we observed that classification accuracy improves when the correlation of attributes is considered along with the n-gram tokens of symbols.
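
The preprocessing step described above can be sketched in Python as follows. The symbol alphabet (`U`, `D`, `S` for up, down, steady), the threshold, and all function names are assumptions made for illustration; the abstract does not specify the discretization scheme the authors used. The sketch only shows the general shape: slide a window over the stream, turn each attribute's changes into a symbol string, and emit n-gram tokens that a text classifier could consume.

```python
def sliding_windows(stream, size, step=1):
    """Yield windows over a multivariate stream given as {attribute: [readings]}."""
    length = min(len(series) for series in stream.values())
    for start in range(0, length - size + 1, step):
        yield {name: series[start:start + size] for name, series in stream.items()}


def discretize(series, threshold=0.5):
    """Encode successive changes as U (up), D (down), or S (steady)."""
    symbols = []
    for previous, current in zip(series, series[1:]):
        delta = current - previous
        if delta > threshold:
            symbols.append("U")
        elif delta < -threshold:
            symbols.append("D")
        else:
            symbols.append("S")
    return "".join(symbols)


def ngrams(text, n):
    """Return all contiguous substrings of length n."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def window_tokens(window, n=2, threshold=0.5):
    """Turn one window into n-gram tokens, each prefixed with its attribute name."""
    tokens = []
    for name, series in window.items():
        for gram in ngrams(discretize(series, threshold), n):
            tokens.append(f"{name}:{gram}")
    return tokens


# Made-up sensor readings for two attributes.
stream = {"temp": [1.0, 2.0, 2.0, 0.0, 0.0], "hum": [5.0, 5.0, 4.0, 4.0, 6.0]}
documents = [window_tokens(w) for w in sliding_windows(stream, size=4)]
```

Each entry of `documents` is a bag of tokens for one window, which is the form a standard text classifier expects.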

The Verification of Image Merging for Lumber Scanning System (제재목 화상입력시스템의 화상병합 성능 검증)

  • Kim, Byung Nam;Kim, Kwang Mo;Shim, Kug-Bo;Lee, Hyoung Woo;Shim, Sang-Ro
    • Journal of the Korean Wood Science and Technology / v.37 no.6 / pp.556-565 / 2009
  • An automated visual grading system for lumber needs an accurate input image. To create an accurate image of 3.6 m long domestic red pine lumber fed on a conveyor, partial images were captured with an area sensor and a template matching algorithm was applied to merge them. Two kinds of template matching algorithms and six template sizes were tested. The feature-extraction method showed better image merging performance than the fixed-template method. Length errors were attributed to reduced similarity caused by local brightness differences within a partial image, specific patterns, and template size. Mismatches occurred repeatedly in regions of long grain. The best template size for image merging was $100 \times 100$ pixels. Further study is needed on selecting an exact template size and on preprocessing that reduces brightness differences before merging, in order to improve image merging.

Design of Optimized Radial Basis Function Neural Networks Classifier with the Aid of Principal Component Analysis and Linear Discriminant Analysis (주성분 분석법과 선형판별 분석법을 이용한 최적화된 방사형 기저 함수 신경회로망 분류기의 설계)

  • Kim, Wook-Dong;Oh, Sung-Kwun
    • Journal of the Korean Institute of Intelligent Systems / v.22 no.6 / pp.735-740 / 2012
  • In this paper, we introduce design methodologies for a polynomial radial basis function neural network (RBFNN) classifier with the aid of Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA). Feature data are obtained through PCA and LDA preprocessing, which minimizes the information loss of the given data, and are then used as input data for the RBFNNs. The hidden layer of the RBFNNs is built with the Fuzzy C-Means (FCM) clustering algorithm instead of receptive fields, and linear polynomial functions are used as connection weights between the hidden and output layers. To design an optimized classifier, structural and parametric values such as the number of eigenvectors of PCA and LDA and the fuzzification coefficient of the FCM algorithm are optimized by the Artificial Bee Colony (ABC) optimization algorithm. The proposed classifier is applied to several machine learning datasets, and its results are compared with those of other classifiers.
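
To make the role of the fuzzification coefficient concrete, the Python sketch below computes standard FCM membership degrees of one input point with respect to a set of cluster centers, u_i = 1 / sum_k (d_i / d_k)^(2/(m-1)). Using these memberships as hidden-layer activations is an assumption about how such a network would be wired; the abstract does not give the authors' exact formulation, and the function name and sample values are hypothetical.

```python
import math


def fcm_memberships(point, centers, m=2.0):
    """Return the FCM membership of `point` in each cluster.

    m is the fuzzification coefficient (m > 1); larger values give softer,
    more evenly spread memberships.
    """
    if m <= 1.0:
        raise ValueError("fuzzification coefficient m must be greater than 1")
    distances = [math.dist(point, center) for center in centers]
    # A point lying exactly on one or more centers belongs fully to them.
    zero = [i for i, d in enumerate(distances) if d == 0.0]
    if zero:
        return [1.0 / len(zero) if i in zero else 0.0
                for i in range(len(centers))]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** exponent for d_k in distances)
            for d_i in distances]


# Hypothetical 2-D feature vector (e.g. after PCA/LDA) and two cluster centers.
activations = fcm_memberships((0.0, 0.0), [(1.0, 0.0), (-2.0, 0.0)], m=2.0)
```

The memberships always sum to 1, so the point above, which is twice as far from the second center as from the first, gets memberships of 0.8 and 0.2 when m = 2.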