• Title/Summary/Keyword: Preprocessing Process


Development and Application of a Big Data Platform for Education Longitudinal Study Analysis (교육종단연구 분석을 위한 빅데이터 플랫폼 개발 및 적용)

  • Park, Jung;Cho, Wan-Sup
    • The Journal of Bigdata / v.5 no.1 / pp.11-27 / 2020
  • In this paper, we developed a big data platform to store, process, and analyze education longitudinal study data effectively, and applied it to the Seoul Education Longitudinal Study (SELS) to confirm its usefulness. The platform consists of a data preprocessing unit and a data analysis unit. The data preprocessing unit performs 1) masking, 2) conversion of each item into a factor, 3) normalization and dummy-variable creation, 4) data derivation, and 5) data warehousing. The data analysis unit consists of OLAP and data mining (DM). In the multidimensional analysis, OLAP is performed after selecting a measure and designing a schema. The DM process involves variable selection, research model selection, data modification, parameter tuning, model training, model evaluation, and interpretation of the results. The data warehouse created by preprocessing on this platform can be shared among researchers, and the continuous accumulation of data sets makes further analysis easier for subsequent researchers. In addition, policy-makers can access the SELS data warehouse directly and analyze it online through multidimensional analysis, enabling scientific decision making. To demonstrate the usefulness of the platform, we built the SELS data on it, performed OLAP and DM with mathematics academic achievement as the measure, and analyzed the factors affecting that measure using DM techniques. This enabled us to quickly and effectively derive implications for data-based education policies.
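
Three of the preprocessing steps listed above (masking, normalization, and dummy-variable creation) can be sketched in plain Python. This is only an illustration under stated assumptions: the record fields, the salted-hash masking, and min-max scaling are hypothetical choices, not details taken from the paper.

```python
import hashlib


def mask_id(student_id: str, salt: str = "demo-salt") -> str:
    """Masking (assumed variant): replace an identifier with a salted SHA-256 digest."""
    return hashlib.sha256((salt + student_id).encode("utf-8")).hexdigest()[:12]


def min_max_normalize(values):
    """Normalization (assumed variant): rescale numeric values to [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def dummy_encode(values):
    """Dummy variables: one 0/1 column per distinct level, in sorted level order."""
    levels = sorted(set(values))
    return levels, [[1 if v == level else 0 for level in levels] for v in values]


# Hypothetical records; the field names are illustrative only.
records = [
    {"id": "S001", "math": 60.0, "school_type": "public"},
    {"id": "S002", "math": 75.0, "school_type": "private"},
    {"id": "S003", "math": 90.0, "school_type": "public"},
]

masked_ids = [mask_id(r["id"]) for r in records]
math_scaled = min_max_normalize([r["math"] for r in records])
levels, school_dummies = dummy_encode([r["school_type"] for r in records])
```

Here `math_scaled` holds the rescaled scores and `school_dummies` holds one indicator row per record, which is the shape a warehouse loader or a DM model would typically consume.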

A New Algorithm of Reducing Candidate Haplotypes for Haplotype Inference (일배체형 추론을 위한 후보군 간소화 알고리즘)

  • Choi, Mun-Ho;Kang, Seung-Ho;Lim, Hyeong-Seok
    • Journal of the Korea Institute of Information and Communication Engineering / v.17 no.7 / pp.1732-1739 / 2013
  • The identification of haplotypes, which encode SNPs in a single chromosome, makes it possible to perform haplotype-based association tests with diseases. Given a set of genotypes from a population, the process of recovering the haplotypes that explain the genotypes is called haplotype inference. We propose a new preprocessing algorithm for haplotype inference by pure parsimony (HIPP). The proposed algorithm excludes a large number of redundant candidate haplotypes by detecting groups of haplotypes that are dispensable for optimal solutions. On well-known synthetic and biological data, the experimental results show that our method runs much faster than other preprocessing methods. After our preprocessing is applied, the numbers of haplotypes found by HIPP solvers are equal to or slightly larger than those of the optimal solutions.
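
To make the notion of candidate haplotypes concrete, the sketch below enumerates every haplotype pair that explains a single genotype. It is not the paper's reduction algorithm; it only shows the candidate set such a preprocessing step would prune. The 0/1/2 site encoding (0 and 1 for homozygous sites, 2 for heterozygous sites) is an assumed convention, and `explaining_pairs` is a hypothetical helper name.

```python
from itertools import product


def explaining_pairs(genotype):
    """Return all unordered haplotype pairs that explain one genotype.

    Assumed encoding: 0 or 1 marks a homozygous site (both haplotypes carry
    that allele); 2 marks a heterozygous site (the haplotypes differ there).
    """
    het_sites = [i for i, g in enumerate(genotype) if g == 2]
    pairs = set()
    for bits in product((0, 1), repeat=len(het_sites)):
        h1 = list(genotype)
        h2 = list(genotype)
        for site, bit in zip(het_sites, bits):
            h1[site] = bit
            h2[site] = 1 - bit
        # Sort within the pair so (h1, h2) and (h2, h1) count once.
        pairs.add(tuple(sorted((tuple(h1), tuple(h2)))))
    return sorted(pairs)


candidates = explaining_pairs((0, 2, 2))
```

A genotype with k heterozygous sites (k at least 1) has 2^(k-1) explaining pairs, so the candidate set grows exponentially with k, which is why reducing it before running a HIPP solver matters.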

Energy-Aware Data-Preprocessing Scheme for Efficient Audio Deep Learning in Solar-Powered IoT Edge Computing Environments (태양 에너지 수집형 IoT 엣지 컴퓨팅 환경에서 효율적인 오디오 딥러닝을 위한 에너지 적응형 데이터 전처리 기법)

  • Yeontae Yoo;Dong Kun Noh
    • IEMEK Journal of Embedded Systems and Applications / v.18 no.4 / pp.159-164 / 2023
  • Because solar energy is recharged periodically, solar energy harvesting IoT devices prioritize maximizing the utilization of collected energy rather than minimizing energy consumption. Meanwhile, research on edge AI, which performs machine learning near the data source instead of in the cloud, is being actively conducted for reasons such as data confidentiality and privacy, response time, and cost. One such research area involves running audio AI applications on audio data collected from multiple IoT devices in an IoT edge computing environment. However, in most studies, IoT devices only transmit sensed data to the edge server, and all processing, including data preprocessing, is performed on the edge server. This not only overloads the edge server but also causes network congestion, because data unnecessary for learning is transmitted. On the other hand, if data preprocessing is delegated to each IoT device to address this issue, it leads to another problem: increased blackout time due to energy shortages in the devices. In this paper, we aim to alleviate the increase in device blackout time while mitigating the issues of server-centric edge AI environments by determining where the data is preprocessed based on the energy state of each IoT device. In the proposed method, an IoT device performs the preprocessing, which includes sound discrimination and noise removal, and transmits the result to the server only if it has more energy available than the threshold required for its basic operation.
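
The energy-based placement decision described above can be sketched as a single comparison. The function name, the joule units, and the fallback of sending raw audio to the server when energy is short are assumptions made for illustration; the abstract does not give the actual threshold logic.

```python
def choose_preprocessing_site(available_energy_j: float,
                              basic_operation_threshold_j: float) -> str:
    """Decide where audio preprocessing runs for one IoT device.

    Assumption: the device preprocesses locally only when its available
    energy strictly exceeds the threshold needed for basic operation;
    otherwise it sends raw data and the edge server does the preprocessing.
    """
    if available_energy_j > basic_operation_threshold_j:
        return "device"
    return "server"


# Hypothetical energy readings in joules.
site_when_charged = choose_preprocessing_site(5.0, 2.0)
site_when_low = choose_preprocessing_site(1.0, 2.0)
```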

AutoFe-Sel: A Meta-learning based methodology for Recommending Feature Subset Selection Algorithms

  • Irfan Khan;Xianchao Zhang;Ramesh Kumar Ayyasam;Rahman Ali
    • KSII Transactions on Internet and Information Systems (TIIS) / v.17 no.7 / pp.1773-1793 / 2023
  • Automated machine learning, often referred to as "AutoML," is the process of automating the time-consuming and iterative procedures associated with building machine learning models. There have been significant contributions in this area across several stages of a data-mining task, including model selection, hyper-parameter optimization, and preprocessing method selection. Among them, preprocessing method selection is a relatively new and fast-growing research area. The current work focuses on the recommendation of one kind of preprocessing method, namely feature subset selection (FSS) algorithms. One limitation of existing studies on FSS algorithm recommendation is the use of a single learner for meta-modeling, which restricts the capability of the meta-model. Moreover, the meta-modeling in existing studies is typically based on a single group of data characterization measures (DCMs). However, there are a number of complementary DCM groups, and combining them leverages their diversity, resulting in improved meta-modeling. This study addresses these limitations by proposing AutoFE-Sel, an architecture for preprocessing method selection that uses ensemble learning for meta-modeling. To evaluate the proposed method, we performed an extensive experimental evaluation involving 8 FSS algorithms, 3 groups of DCMs, and 125 datasets. The results show that the proposed method achieves better performance than three baseline methods. The proposed architecture can also be easily extended to other preprocessing method selections, e.g., noise-filter selection and imbalance handling method selection.

A Predictive Bearing Anomaly Detection Model Using the SWT-SVD Preprocessing Algorithm (SWT-SVD 전처리 알고리즘을 적용한 예측적 베어링 이상탐지 모델)

  • So-hyang Bak;Kwanghoon Pio Kim
    • Journal of Internet Computing and Services / v.25 no.1 / pp.109-121 / 2024
  • In manufacturing processes such as textiles and automobiles, equipment breakdowns or stoppages halt production and cause time and financial losses for the company. It is therefore important to detect equipment abnormalities in advance so that failures can be predicted and repaired before they occur. Most equipment failures are caused by failures of bearings, which are essential parts of equipment, and detecting bearing anomalies is central to PHM (Prognostics and Health Management) research. In this paper, we propose a preprocessing algorithm called SWT-SVD, which analyzes vibration signals from bearings, and apply it to an anomaly transformer, one of the time series anomaly detection model networks, to implement a bearing anomaly detection model. Vibration signals from the bearing manufacturing process contain noise due to the real-time generation of sensor values. To reduce noise in the vibration signals, we use the Stationary Wavelet Transform to extract frequency components and then extract meaningful features through the Singular Value Decomposition algorithm. To validate the proposed SWT-SVD preprocessing method in the bearing anomaly detection model, we use the PHM-2012-Challenge dataset provided by the IEEE PHM Conference. The experimental results show an accuracy of 0.98 and an F1-score of 0.97. In addition, a comparative analysis with previous studies confirms that the proposed preprocessing method outperforms previous preprocessing methods.

Automatic Tagging and Tag Recommendation Techniques Using Tag Ontology (태그 온톨로지를 이용한 자동 태깅 및 태그 추천 기법)

  • Kim, Jae-Seung;Mun, Hyeon-Jeong;Woo, Tae-Yong
    • The Journal of Society for e-Business Studies / v.14 no.4 / pp.167-179 / 2009
  • This paper introduces techniques for recommending standardized tags using a tag ontology. Tag recommendation consists of TWCIDF and TWCITC; the former automatically tags large groups of existing documents, and the latter recommends tags for new documents. Tag groups are created through several processes, including preprocessing, standardization using the tag ontology, automatic tagging, and ranking for recommendation. In the preprocessing step, words are combined into basic word groups in order to find semantic compound nouns. In the standardization step, typographical errors and similar words are processed. Experiments based on the techniques presented in this paper show that real-time automatic tagging and tag recommendation are possible while maintaining the accuracy of tag recommendation.


Method of Generating Shape Feature Vector Using Infrared Video for Night Pedestrian Recognition (야간 보행자인식을 위한 적외선 동영상의 형상특징벡터 생성기법)

  • Song, Byeong Tak;Kim, Tai Suk
    • Journal of Korea Multimedia Society / v.21 no.7 / pp.755-763 / 2018
  • In this paper, we propose and test a new method, distinct from existing feature vectors, for recognizing pedestrians at night in infrared video. The new approach focuses on a shape feature vector that describes the structure and shape of the pedestrian image, divided according to the seven-part split ratio of the human body. The pedestrian images are divided into 7 square blocks from the still image produced by the preprocessing process. To reduce the dimension, each square block is converted into a mosaic block. The scalar and direction of the shape feature vector are calculated from the brightness and position of the elements in the mosaic. To make the infrared video system practical, the proposed method reduces the amount of data during preprocessing so that the entire system can batch process continuously in real time. Through experiments, we verified the validity of the proposed shape feature vector. In contrast to the existing method, we propose a new shape feature vector generation method as the feature vector for night pedestrian recognition.

The Design of Multi-FNN Model Using HCM Clustering and Genetic Algorithms and Its Applications to Nonlinear Process (HCM 클러스터링과 유전자 알고리즘을 이용한 다중 FNN 모델 설계와 비선형 공정으로의 응용)

  • 박호성;오성권;김현기
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 2000.05a / pp.47-50 / 2000
  • In this paper, an optimal identification method using a Multi-FNN (Fuzzy-Neural Network) is proposed for modeling nonlinear complex systems. To control nonlinear processes with complex and uncertain data, the proposed model uses an HCM clustering algorithm, which carries out the input-output data preprocessing function, and a genetic algorithm, which carries out optimization of the model. The proposed Multi-FNN is based on Yamakawa's FNN; it uses simplified inference as the fuzzy inference method and the error back-propagation algorithm as the learning rule. The HCM clustering method, which carries out the data preprocessing function for system modeling, is used to determine the structure of the Multi-FNN by dividing the input-output space. The parameters of the Multi-FNN model, such as the apexes of the membership functions, learning rates, and momentum coefficients, are adjusted using genetic algorithms. In addition, a performance index with a weighting factor is presented to achieve a sound balance between the approximation and generalization abilities of the model. To evaluate the performance of the proposed model, we use time series data for a gas furnace and numerical data of a nonlinear function.


Recognition of Car Manufacturers using Faster R-CNN and Perspective Transformation

  • Ansari, Israfil;Lee, Yeunghak;Jeong, Yunju;Shim, Jaechang
    • Journal of Korea Multimedia Society / v.21 no.8 / pp.888-896 / 2018
  • In this paper, we report the detection and recognition of vehicle logos in images captured by street CCTV. The image data includes both front and rear views of the vehicles. The proposed method is a two-step process that combines image preprocessing with a faster region-based convolutional neural network (Faster R-CNN) for logo recognition. Without preprocessing, Faster R-CNN accuracy is high only if the image quality is good. The proposed system focuses on street CCTV cameras, whose image quality differs from that of a front-facing camera. Using perspective transformation, the top-view images are transformed into front-view images. With this system, detection and accuracy are much higher than with the existing algorithm. In the experiments, the detection and recognition rate improved by 2% on daytime data, and the detection rate improved by 14% on nighttime data.
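
A perspective transformation of the kind mentioned above maps image points through a 3x3 homography matrix followed by division by the homogeneous coordinate. The sketch below shows that mapping for a single point; the matrix values are hypothetical, and the paper's actual calibration and warping pipeline is not described in the abstract.

```python
def apply_homography(H, x, y):
    """Map the point (x, y) through a 3x3 homography H given as nested lists."""
    xh = H[0][0] * x + H[0][1] * y + H[0][2]
    yh = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if w == 0:
        raise ValueError("point maps to infinity under this homography")
    # Divide by the homogeneous coordinate to return to image coordinates.
    return xh / w, yh / w


# Hypothetical matrix chosen only to make the arithmetic easy to follow.
H_example = [
    [2.0, 0.0, 1.0],
    [0.0, 2.0, 0.0],
    [0.0, 0.0, 2.0],
]
mapped_point = apply_homography(H_example, 3.0, 4.0)
```

In practice the matrix would be estimated from four point correspondences between the top view and the desired front view, and the mapping would be applied to every pixel rather than to one point.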

A Novel Preprocessing Algorithm for Fingerprint

  • Nam, Jin-Moon
    • Journal of information and communication convergence engineering / v.7 no.4 / pp.442-448 / 2009
  • This paper proposes a fingerprint image processing algorithm that accurately extracts minutiae in the process of fingerprint recognition. We improved the matching accuracy of low-quality fingerprint images by using an effective ridge vector and ridge probability. The proposed algorithm improves the clarity of ridge structures and reduces undesired noise. We collected thumbprint images from 10 individuals, 5 separate times each, for a total of 50 thumbprints. For each individual, we registered one of the five thumbprint images and matched it against the other four, then alternated the registered image. This gave 20 matches per individual and 200 matches in total across the 10 individuals. The verification accuracy and reliability improved compared to conventional methods.
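
The match counts in the evaluation protocol above can be checked by enumerating the trials. This sketch assumes only same-individual matches are made, as the abstract describes; the index-based naming of individuals and images is illustrative.

```python
NUM_INDIVIDUALS = 10
IMAGES_PER_INDIVIDUAL = 5

# Each trial registers one image and matches it against the individual's other four.
trials = [
    (person, registered, probe)
    for person in range(NUM_INDIVIDUALS)
    for registered in range(IMAGES_PER_INDIVIDUAL)
    for probe in range(IMAGES_PER_INDIVIDUAL)
    if probe != registered
]

matches_per_individual = len([t for t in trials if t[0] == 0])
total_matches = len(trials)
```

Each individual contributes 5 registrations times 4 probes, which is 20 matches, and 10 individuals give the 200 matches reported.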