• Title/Summary/Keyword: Data Pre-processing

An Outlier Detection Using Autoencoder for Ocean Observation Data (해양 이상 자료 탐지를 위한 오토인코더 활용 기법 최적화 연구)

  • Kim, Hyeon-Jae;Kim, Dong-Hoon;Lim, Chaewook;Shin, Yongtak;Lee, Sang-Chul;Choi, Youngjin;Woo, Seung-Buhm
    • Journal of Korean Society of Coastal and Ocean Engineers
    • /
    • v.33 no.6
    • /
    • pp.265-274
    • /
    • 2021
  • Outlier detection in ocean data has traditionally relied on statistical and distance-based machine learning algorithms. Recently, AI-based methods have attracted considerable attention, and supervised learning methods, which require classification information for the data, are mainly used. Supervised learning is costly and time-consuming because classification information (labels) must be assigned manually to all training data. In this study, an autoencoder based on unsupervised learning was applied to outlier detection to overcome this problem. Two experiments were designed: univariate learning, which used only SST data from the observation data of Deokjeok Island, and multivariate learning, which used SST, air temperature, wind direction, wind speed, air pressure, and humidity. The data cover 25 years, from 1996 to 2020, and pre-processing that accounts for the characteristics of ocean data was applied. We tried to detect outliers in real SST data using the trained univariate and multivariate autoencoders. To compare model performance, various outlier detection methods were applied to synthetic data with artificially inserted errors. In a quantitative evaluation, the accuracy of the multivariate and univariate autoencoders was about 96% and 91%, respectively, indicating that the multivariate autoencoder detected outliers better. Because it can reduce subjective classification errors and the cost and time required for data labeling, outlier detection using an unsupervised autoencoder is expected to find a variety of uses.
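
The reconstruction-error idea behind autoencoder-based outlier detection can be sketched as follows. This is only an illustration, not the authors' model: it substitutes a linear autoencoder fitted by SVD (equivalent to PCA) for a trained neural network, and the function names, the latent size, and the 99% quantile threshold are all assumptions made for the example.

```python
import numpy as np


def fit_linear_autoencoder(X, n_latent):
    """Fit a linear encoder/decoder pair via SVD (a stand-in for a trained autoencoder)."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    Z = (X - mean) / std                      # standardize each variable
    _, _, vt = np.linalg.svd(Z, full_matrices=False)
    W = vt[:n_latent].T                       # shape: (n_features, n_latent)
    return mean, std, W


def reconstruction_error(X, model):
    """Mean squared reconstruction error for each sample (row) of X."""
    mean, std, W = model
    Z = (X - mean) / std
    Z_hat = (Z @ W) @ W.T                     # encode, then decode
    return np.mean((Z - Z_hat) ** 2, axis=1)


def flag_outliers(errors, train_errors, quantile=0.99):
    """Flag samples whose error exceeds a quantile of the training errors.

    The 0.99 quantile is an assumed threshold, not a value from the paper.
    """
    threshold = np.quantile(train_errors, quantile)
    return errors > threshold
```

In a multivariate setting, a sample whose variables are individually plausible but mutually inconsistent reconstructs poorly and is flagged, which is one plausible reason additional variables can help.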

Development of Market Growth Pattern Map Based on Growth Model and Self-organizing Map Algorithm: Focusing on ICT products (자기조직화 지도를 활용한 성장모형 기반의 시장 성장패턴 지도 구축: ICT제품을 중심으로)

  • Park, Do-Hyung;Chung, Jaekwon;Chung, Yeo Jin;Lee, Dongwon
    • Journal of Intelligence and Information Systems
    • /
    • v.20 no.4
    • /
    • pp.1-23
    • /
    • 2014
  • Market forecasting aims to estimate the sales volume of a product or service sold to consumers over a specific selling period. For enterprises, accurate market forecasting helps determine the timing of new product introductions and product design, and helps establish production plans and marketing strategies, enabling more efficient decision-making. Accurate market forecasting also enables governments to organize the national budget efficiently. This study aims to generate market growth curves for ICT (information and communication technology) goods from past time series data; categorize products that show similar growth patterns; understand the markets in the industry; and forecast the future outlook of such products. It proposes a process (methodology) for identifying market growth patterns with quantitative growth models and a data mining algorithm. In the first stage, past time series data are collected for the target products or services of the categorized industry. Data such as sales volume and domestic consumption for a specific product or service are collected from the relevant government ministry, the National Statistical Office, and other relevant government organizations. Collected data that cannot be analyzed as-is, because past data are lacking or code names have changed, must be pre-processed. In the second stage, an optimal model for market forecasting is selected. The model can vary with the characteristics of each categorized industry. Because this study focuses on the ICT industry, where new technologies appear more frequently and change the market structure, the Logistic, Gompertz, and Bass models are selected. A hybrid model that combines different models can also be considered. The hybrid model considered in this study estimates the size of the market potential with the Logistic and Gompertz models and then uses those figures in the Bass model. The third stage evaluates which model most accurately explains the data. To do this, the parameters are estimated from the collected time series data, the models' predicted values are generated, and the root-mean-squared error (RMSE) is calculated. The model with the lowest average RMSE across all product types is considered the best model. In the fourth stage, a market growth pattern map is constructed with the self-organizing map algorithm from the parameter values estimated by the best model. The self-organizing map is trained with the market pattern parameters of all products or services as input data, and the products or services are organized onto an $N{\times}N$ map. The number of clusters is increased from 2 to M, depending on the characteristics of the nodes on the map. The clusters are divided into zones, and the clusters that provide the most meaningful explanation are selected. Based on the final selection of clusters, the boundaries between the nodes are drawn, and the market growth pattern map is completed. The last step is to determine the final characteristics of the clusters and their market growth curves. The average of the market growth pattern parameters in each cluster is taken as its representative figure. Using this figure, a growth curve is drawn for each cluster and its characteristics are analyzed. The characteristics of each cluster can also be described qualitatively by considering the product types it contains. We expect that the process and system suggested in this paper can be used as a tool for forecasting demand in the ICT and other industries.
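
The model-selection step (the third stage) can be illustrated with a short sketch. It uses the standard textbook forms of the Logistic, Gompertz, and Bass cumulative curves, and it assumes the parameters have already been estimated. The parameter names (`m` for market potential, `a`, `b`, `p`, `q`) and the helper `best_model` are hypothetical, and the paper's own estimation procedure is not reproduced here.

```python
import math


def logistic(t, m, a, b):
    """Logistic cumulative curve approaching market potential m."""
    return m / (1.0 + a * math.exp(-b * t))


def gompertz(t, m, a, b):
    """Gompertz cumulative curve approaching market potential m."""
    return m * math.exp(-a * math.exp(-b * t))


def bass(t, m, p, q):
    """Bass cumulative adoption with innovation p and imitation q."""
    decay = math.exp(-(p + q) * t)
    return m * (1.0 - decay) / (1.0 + (q / p) * decay)


def rmse(observed, predicted):
    """Root-mean-squared error between two equal-length sequences."""
    squared = [(o - f) ** 2 for o, f in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def best_model(observed, candidates):
    """Return the candidate name with the lowest RMSE, plus all scores.

    candidates maps a name to (curve_function, already-estimated parameters);
    time is assumed to run over t = 0, 1, 2, ... for this illustration.
    """
    scores = {}
    for name, (curve, params) in candidates.items():
        predicted = [curve(t, *params) for t in range(len(observed))]
        scores[name] = rmse(observed, predicted)
    return min(scores, key=scores.get), scores
```

This sketch scores a single series; averaging the RMSE over every product type, as the abstract describes, would call `best_model` per product and average each model's score.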

Evaluation of Benzoic Acid Level of Fermented Dairy Products during Fermentation (발효과정에서 생성되는 발효유제품의 안식향산 함량 수준 평가)

  • Lim, Sang-Dong;Park, Mi-Sun;Kim, Kee-Sung;Yoo, Mi-Young
    • Food Science of Animal Resources
    • /
    • v.33 no.5
    • /
    • pp.640-645
    • /
    • 2013
  • The purpose of this study was to provide basic data on benzoic acid in animal products, which is not addressed in the quality standard of the National Veterinary Research and Quarantine Service (NVRQS), in order to help resolve conflicts in international trade and administration. The Sep-Pak method listed in the NVRQS quality standard, which is faster than the auto distillation method with the same recovery, was selected as the pre-treatment for the determination of benzoic acid. The regression curve of benzoic acid with the Sep-Pak method was linear, with an $R^2$ value of 0.999, and the limit of detection (LOD) and limit of quantitation (LOQ) were 0.058 mg/kg and 0.176 mg/kg, respectively. In fermented milk, benzoic acid was detected after the fermentation stage following the addition of starter culture, at a level of 2.28~10.48 mg/kg; in commercial fermented milk products the level was 0~16.5 mg/kg, and no benzoic acid was detected as a result of adding syrup. In cheese products, the benzoic acid level was influenced by curd formation (Camembert cheese) and by the quality of the natural cheese (processed cheese); the benzoic acid level was 0~4.2 mg/kg in commercial natural cheese and 0~20.8 mg/kg in processed cheese. These results may serve as basic data for the systematic control of natural benzoic acid levels in the raw materials, processing, and final products of animal origin.

Key Bit-dependent Attack on Side-Channel Analysis-Resistant Hardware Binary Scalar Multiplication Algorithm using a Single-Trace (부채널 분석에 안전한 하드웨어 이진 스칼라 곱셈 알고리즘에 대한 단일 파형 비밀 키 비트 종속 공격)

  • Sim, Bo-Yeon;Kang, Junki;Han, Dong-Guk
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.28 no.5
    • /
    • pp.1079-1087
    • /
    • 2018
  • Binary scalar multiplication, the main operation of elliptic curve cryptography, is vulnerable to side-channel analysis, especially analysis that uses power consumption and electromagnetic emission patterns. Various countermeasures have therefore been studied. However, they have focused on eliminating patterns of data-dependent branches, statistical characteristics of intermediate values, or interrelationships between data. No countermeasure has addressed the secure design of the key bit check phase, although the secret scalar bits are loaded directly during that phase. In this paper, we demonstrate that secret scalar bits can be extracted with a 100% success rate from a single power or electromagnetic trace by performing a key bit-dependent attack on a hardware implementation of the binary scalar multiplication algorithm. The experiments focus on the Montgomery-López-Dahab ladder algorithm protected by scalar randomization. Our attack does not require sophisticated pre-processing and can defeat existing countermeasures using a single trace. Finally, we propose a countermeasure and suggest that it be applied.

Development of an Informetric Analysis System KnowledgeMatrix (계량정보분석시스템 KnowledgeMatrix 개발)

  • Lee, Bangrae;Yeo, Woon Dong;Lee, June Young;Lee, Chang-Hoan;Kwon, Oh-Jin;Moon, Yeong-ho
    • Proceedings of the Korea Contents Association Conference
    • /
    • 2007.11a
    • /
    • pp.167-171
    • /
    • 2007
  • Application areas of Knowledge Discovery in Databases (KDD) have expanded into many R&D management processes, including technology trend analysis, forecasting, and evaluation. Established research fields such as informetrics (or scientometrics) have recently made full use of KDD techniques and methods. Various systems have been developed by a few researchers and institutions to support the analysis of large-scale R&D-related databases such as patent or bibliographic databases. However, existing systems pose problems for Korean users: they are not cheap, they do not support Korean language processing, and they do not reflect users' demands. To solve these problems, the Korea Institute of Science and Technology Information (KISTI) developed a stand-alone information analysis system named KnowledgeMatrix. KnowledgeMatrix offers various functions for analyzing data sets retrieved from databases. Its main operation units are user-defined lists and matrix generation, cluster analysis, visualization, and data pre-processing. KnowledgeMatrix shows better performance and offers more varied functions than existing systems.

Design and Implementation of ASTERIX Parsing Module Based on Pattern Matching for Air Traffic Control Display System (항공관제용 현시시스템을 위한 패턴매칭 기반의 ASTERIX 파싱 모듈 설계 및 구현)

  • Kim, Kanghee;Kim, Hojoong;Yin, Run Dong;Choi, SangBang
    • Journal of the Institute of Electronics and Information Engineers
    • /
    • v.51 no.3
    • /
    • pp.89-101
    • /
    • 2014
  • As domestic air traffic has increased dramatically, the need for ATC (air traffic control) systems has grown to ensure safe and efficient ATM (air traffic management). For smooth ATC, it is especially important to guarantee the performance of the display system, which must show the entire air traffic situation in the FIR (Flight Information Region) without additional latency. In this paper, we design an ASTERIX (All purpose STructured Eurocontrol suRveillance Information eXchange) parsing module that promotes stable ATC by minimizing system load, which it does by reducing the overhead incurred when parsing ASTERIX messages. Our pattern matching-based ASTERIX parsing module creates patterns by analyzing received ASTERIX data and handles subsequently received ASTERIX data with pre-defined procedures selected through those patterns. Unlike existing parsing modules, which contain unnecessary parsing procedures, this module minimizes display errors by rapidly extracting only the information needed for display. The designed module therefore enables controllers to conduct ATC stably. A comparison with an existing general bit-level ASTERIX parsing module shows that the pattern matching-based module has shorter processing delay, higher throughput, and lower CPU usage.
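
One way to picture the contrast between bit-level parsing and pattern reuse is the sketch below. It relies on the ASTERIX record layout in which each record starts with a field specification (FSPEC): every FSPEC octet carries seven presence bits, and the lowest bit (FX) signals that another FSPEC octet follows. The cache keyed on the raw FSPEC bytes, and the names `PatternParser` and `fields_present`, are assumptions made for illustration and are not the design described in the paper.

```python
def fspec_length(record):
    """Number of FSPEC octets: keep reading while the FX (lowest) bit is set."""
    n = 0
    while record[n] & 0x01:
        n += 1
    return n + 1


def decode_fspec(fspec):
    """Bit-level decode: list the field reference numbers (FRNs) that are present."""
    frns = []
    for index, octet in enumerate(fspec):
        for bit in range(7):                  # 7 presence bits per octet, MSB first
            if octet & (0x80 >> bit):
                frns.append(index * 7 + bit + 1)
    return frns


class PatternParser:
    """Hypothetical pattern cache: decode each distinct FSPEC only once."""

    def __init__(self):
        self.patterns = {}                    # raw FSPEC bytes -> list of FRNs
        self.misses = 0                       # how many bit-level decodes ran

    def fields_present(self, record):
        key = bytes(record[:fspec_length(record)])
        if key not in self.patterns:
            self.misses += 1
            self.patterns[key] = decode_fspec(key)
        return self.patterns[key]
```

Because a given sensor tends to send records with the same few FSPEC values, most records would hit the cache and skip the per-bit loop under this assumption.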

Implementation of High-Throughput SHA-1 Hash Algorithm using Multiple Unfolding Technique (다중 언폴딩 기법을 이용한 SHA-1 해쉬 알고리즘 고속 구현)

  • Lee, Eun-Hee;Lee, Je-Hoon;Jang, Young-Jo;Cho, Kyoung-Rok
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.47 no.4
    • /
    • pp.41-49
    • /
    • 2010
  • This paper proposes a new high-speed SHA-1 architecture using multiple unfolding and pre-computation techniques. We unfold the iterative hash operations into 2 continuous hash stages and reschedule the computation timing. Part of the critical path is then computed in the previous hash operation round, and the rest is performed in the present round. These techniques reduce the critical path from 3 additions to 2 additions. The design achieves a maximum clock frequency of 118 MHz, which provides a throughput of 5.9 Gbps. The proposed architecture shows 26% higher throughput with a 32% smaller hardware size compared to its counterparts. This paper also introduces an analytical model of a multiple SHA-1 architecture at the system level that maps large input data onto SHA-1 blocks in parallel. The model gives the number of SHA-1 blocks required for processing large multimedia data, which helps in deciding the hardware configuration. The high-speed SHA-1 is useful for generating a condensed message and may strengthen the security of mobile communication and internet services.

A Basic Study on the Differential Diagnostic System of Laryngeal Diseases using Hierarchical Neural Networks (다단계 신경회로망을 이용한 후두질환 감별진단 시스템의 개발)

  • 전계록;김기련;권순복;예수영;이승진;왕수건
    • Journal of Biomedical Engineering Research
    • /
    • v.23 no.3
    • /
    • pp.197-205
    • /
    • 2002
  • The objective of this paper is to implement a classifier for the differential diagnosis of laryngeal diseases from acoustic signals acquired in a noisy room. For this purpose, voice signals of the vowel /a/ were collected from patients in a soundproof chamber and mixed with noise. The acoustic parameters were then analyzed, and hierarchical neural networks were applied to classify the data. The classifier had a five-step hierarchical neural network structure. The first neural network classified the group into normal and benign cases or malignant laryngeal disease cases. The second network classified the group into normal or benign laryngeal disease cases. The third network distinguished polyp, nodule, and palsy among the benign laryngeal cases. Glottic cancer cases were discriminated into T1, T2, T3, and T4 by the fourth and fifth networks. All the neural networks were based on the multilayer perceptron model, which classifies non-linear patterns effectively, and were trained with the error back-propagation algorithm. We chose the acoustic parameters for classification by investigating the distribution of laryngeal diseases and pilot classification results for the parameters derived from MDVP. The classifier was tested with the chosen parameters to find the optimum ones. The networks were then improved by including pre-processing steps such as linear and z-score transformation. The results showed that 90% of T1 and 100% of T2-4 cases were correctly distinguished. In addition, 88.23% of vocal polyps and 100% of normal cases, vocal nodules, and vocal cord paralysis were classified correctly from the data collected in a noisy room.
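
The two pre-processing transformations named in the abstract can be sketched as below. The z-score form is standard. Reading "linear transformation" as min-max scaling to the range [0, 1] is an assumption, since the abstract does not define it, and the function names are hypothetical.

```python
import statistics


def z_score(values):
    """Standardize values to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]


def min_max(values):
    """Linearly rescale values to [0, 1] (assumed meaning of 'linear transformation')."""
    low, high = min(values), max(values)
    return [(v - low) / (high - low) for v in values]
```

Either transformation puts acoustic parameters with very different units on a comparable scale before they reach the network inputs.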

Park Golf Participation of Physically Disabled Impact on Psychological Well-being and Subjective Happiness (파크골프 참여가 지체장애인의 심리적 웰빙과 주관적 행복감에 미치는 영향)

  • Kim, Dong Won
    • 재활복지
    • /
    • v.18 no.4
    • /
    • pp.187-205
    • /
    • 2014
  • The purpose of this study was to identify how participation in a 12-week park golf program affects the psychological well-being and subjective happiness of people with physical disabilities. The subjects were 24 men with physical disabilities aged 40 or older, assigned to an experimental group of 12 (amputation: 7; physical function impairment: 5) and a control group of 12 (amputation: 6; physical function impairment: 4; joint disorder: 2). The program was carried out 3 times a week (Mon, Wed, Fri) for 50 minutes per session over the 12-week experimental period, and the test site was River Park Golf Course A. Means and standard deviations of the pre- and post-test data were calculated with the SPSS Statistics 21.0 program, and a two-way repeated-measures ANOVA (2-way RM ANOVA) was performed to analyze the effects of park golf participation on psychological well-being and subjective happiness. First, for psychological well-being, in the between-group comparison before and after exercise participation only enjoyment and immersion were not significantly different, while within each group all of the sub-factors (enjoyment, competence, self-realization, and immersion) showed significant differences. Second, for subjective happiness, there was a significant difference between the groups before and after exercise participation, but no significant difference within the groups. Taken together, these results indicate that park golf participation has positive effects on the psychological well-being and subjective happiness of people with physical disabilities.

Convergence Study on Effects of Underwater Rehabilitation Exercise on Physical Fitness and Blood Lipids in Middle Aged Women (중년여성의 수중재활운동이 신체적성과 혈중지질에 미치는 융합연구)

  • Beak, Soon-Gi;Kim, Do-Jin
    • Journal of Convergence for Information Technology
    • /
    • v.9 no.8
    • /
    • pp.260-267
    • /
    • 2019
  • The purpose of this study is to determine how 10 weeks of underwater rehabilitation exercise affects physical fitness and blood lipids, and to provide basic data to help prevent cardiovascular disease in middle-aged women. The subjects were middle-aged women living in Seoul, Korea. The underwater rehabilitation exercise was performed 3 times per week for 10 weeks, and each session lasted 60 minutes, including the warm-up, the main exercise, and the cool-down. The exercise intensity was set at 60-70% of the heart rate reserve calculated from the pre-exercise test. The measured variables were physical fitness and blood lipids. For data processing, descriptive statistics were presented for each measurement item, and a 2-way RGRM ANOVA was conducted to examine the interaction effects between groups. The results showed significant interaction effects in physical fitness (flexibility, cardiorespiratory endurance, muscular endurance) and blood lipids (TG, TC, HDL-C, LDL-C). This study found that the 10-week underwater rehabilitation exercise program increased the physical fitness level of middle-aged women and both decreased and increased blood lipid measures, suggesting that it could be an effective, convergent program for preventing and reducing cardiovascular disease.
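
An intensity expressed as a percentage of heart rate reserve is commonly converted to a target heart rate with the Karvonen formula, target = resting HR + intensity × (maximum HR − resting HR). The sketch below assumes that formula. The abstract does not say how the reserve was computed, and the function names and example heart rates are hypothetical.

```python
def target_heart_rate(hr_rest, hr_max, intensity):
    """Karvonen formula: resting HR plus a fraction of the heart rate reserve."""
    reserve = hr_max - hr_rest                # heart rate reserve (beats/min)
    return hr_rest + intensity * reserve


def target_zone(hr_rest, hr_max, low=0.60, high=0.70):
    """Lower and upper target heart rates for an intensity band (default 60-70%)."""
    return (
        target_heart_rate(hr_rest, hr_max, low),
        target_heart_rate(hr_rest, hr_max, high),
    )
```

For a hypothetical participant with a resting heart rate of 60 and a maximum of 180 beats per minute, the 60-70% band works out to roughly 132 to 144 beats per minute.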