• Title/Summary/Keyword: Statistical Pattern Recognition

The extension of the largest generalized-eigenvalue-based distance metric $D_{ij}^{(1)}$ in arbitrary feature spaces to classify composite data points

  • Daoud, Mosaab
    • Genomics & Informatics / v.17 no.4 / pp.39.1-39.20 / 2019
  • Analyzing patterns in data points embedded in linear and non-linear feature spaces is a common research problem in many areas, for example data mining, machine learning, pattern recognition, and multivariate analysis. In this paper, data points are heterogeneous sets of biosequences (composite data points). A composite data point is a set of ordinary data points (e.g., a set of feature vectors). We theoretically extend the derivation of the largest generalized-eigenvalue-based distance metric $D_{ij}^{(1)}$ to any linear or non-linear feature space. We prove that $D_{ij}^{(1)}$ is a metric under any linear or non-linear feature transformation function. We show the sufficiency and efficiency of the decision rule $\bar{\delta}_{\Xi_i}$ (i.e., the mean of $D_{ij}^{(1)}$) for classifying heterogeneous sets of biosequences, compared with the decision rules $\min_{\Xi_i}$ and $\mathrm{median}_{\Xi_i}$. We analyze the impact of linear and non-linear transformation functions on classifying and clustering collections of heterogeneous sets of biosequences. We also show empirically how the length of the sequences in a simulated heterogeneous sequence set affects the classification and clustering results in linear and non-linear feature spaces. We propose a new concept, the limiting dispersion map of the existing clusters in heterogeneous sets of biosequences embedded in linear and non-linear feature spaces, which is based on the limiting distribution of nucleotide compositions estimated from real data sets. Finally, empirical conclusions and scientific evidence deduced from the experiments support the theoretical results stated in this paper.
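The abstract contrasts three decision rules for assigning a composite data point to a class $\Xi_i$: the mean of its distances to the class members ($\bar{\delta}_{\Xi_i}$), their minimum, and their median. The Python sketch below illustrates only that aggregation step. Because the abstract does not give the formula for $D_{ij}^{(1)}$, it assumes a hypothetical `set_distance` (Euclidean distance between set centroids) as a placeholder; this is not the paper's metric, and the toy data are made up.

```python
from math import dist
from statistics import mean, median


def centroid(points):
    """Coordinate-wise mean of a set of feature vectors."""
    dim = len(points[0])
    return [sum(p[k] for p in points) / len(points) for k in range(dim)]


def set_distance(a, b):
    """HYPOTHETICAL stand-in for D_ij^(1): Euclidean distance between the
    centroids of two composite data points. Replace with the real metric."""
    return dist(centroid(a), centroid(b))


RULES = {"mean": mean, "min": min, "median": median}


def classify(query, classes, rule="mean"):
    """Assign a composite data point to the class whose aggregated distance
    (mean, min or median over the class members) is smallest."""
    aggregate = RULES[rule]
    scores = {
        label: aggregate([set_distance(query, member) for member in members])
        for label, members in classes.items()
    }
    return min(scores, key=scores.get)


# Toy composite data points (sets of 2-D feature vectors), made up for illustration.
classes = {
    "A": [[(0.0, 0.0), (1.0, 0.0)], [(0.0, 1.0), (1.0, 1.0)]],
    "B": [[(5.0, 5.0), (6.0, 5.0)], [(5.0, 6.0), (6.0, 6.0)]],
}
query = [(0.5, 0.2), (0.7, 0.9)]
for rule in RULES:
    print(rule, classify(query, classes, rule))  # prints "mean A", "min A", "median A"
```

Swapping the aggregate is the only difference between the three rules, which makes it easy to compare them on the same distance matrix.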

An Effective Steel Plate Detection Using Eigenvalue Analysis (고유값 분석을 이용한 효과적인 후판 인식)

  • Park, Sang-Hyun
    • The Journal of the Korea institute of electronic communication sciences / v.7 no.5 / pp.1033-1039 / 2012
  • In this paper, a simple and robust algorithm is proposed for detecting each steel plate in an image that contains several steel plates. Steel plates are characterized by line edges, so line detection is a fundamental task in analyzing and understanding steel plate images. To detect the line edges, the proposed algorithm uses small-eigenvalue analysis. It scans an input edge image from the top-left corner to the bottom-right corner with a moving mask. A covariance matrix of the set of edge pixels in a connected region within the mask is computed, and the statistical and geometrical properties of the matrix's small eigenvalue are then used to detect straight lines. Using the detected line edges, each plate is determined from the directional and distance information of the line edges. The experimental results show that the proposed algorithm detects each steel plate in an image effectively.
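A minimal sketch of the small-eigenvalue test, assuming the edge pixels of one connected region inside the mask are already available as (x, y) coordinates. If the pixels lie on a straight line, the 2x2 covariance matrix has a small eigenvalue near zero, and the eigenvector of the large eigenvalue gives the line direction. The `threshold` value and the function names are hypothetical; mask scanning, connected-region extraction and the plate-determination step are not shown.

```python
import math


def covariance_2x2(pixels):
    """Return (sxx, sxy, syy), the covariance entries of the (x, y) coordinates."""
    n = len(pixels)
    mx = sum(x for x, _ in pixels) / n
    my = sum(y for _, y in pixels) / n
    sxx = sum((x - mx) ** 2 for x, _ in pixels) / n
    syy = sum((y - my) ** 2 for _, y in pixels) / n
    sxy = sum((x - mx) * (y - my) for x, y in pixels) / n
    return sxx, sxy, syy


def eigenvalues_2x2(sxx, sxy, syy):
    """Closed-form eigenvalues (small, large) of a symmetric 2x2 matrix."""
    half_trace = (sxx + syy) / 2
    radius = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    return half_trace - radius, half_trace + radius


def is_line(pixels, threshold=0.1):
    """Small-eigenvalue test: the spread perpendicular to the principal axis
    (the small eigenvalue) is near zero for collinear pixels.
    The default threshold is a hypothetical value, not taken from the paper."""
    if len(pixels) < 3:
        return False
    small, _ = eigenvalues_2x2(*covariance_2x2(pixels))
    return small < threshold


def line_angle_degrees(pixels):
    """Orientation of the principal axis (eigenvector of the large eigenvalue)."""
    sxx, sxy, syy = covariance_2x2(pixels)
    return math.degrees(0.5 * math.atan2(2 * sxy, sxx - syy))


diagonal = [(i, i) for i in range(10)]
blob = [(x, y) for x in range(3) for y in range(3)]
print(is_line(diagonal), round(line_angle_degrees(diagonal)))  # True 45
print(is_line(blob))  # False
```

The angle is measured in the image's own coordinate frame, so with y pointing down a "45 degree" line runs toward the lower right.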

Effective Line Detection of Steel Plates Using Eigenvalue Analysis (고유값 분석을 이용한 효과적인 후판의 직선 검출)

  • Park, Sang-Hyun;Kim, Jong-Ho;Kang, Eui-Sung
    • Journal of the Korea Institute of Information and Communication Engineering / v.15 no.7 / pp.1479-1486 / 2011
  • In this paper, a simple and robust algorithm is proposed for detecting straight line segments in a steel plate image. Line detection is a fundamental task in analyzing and understanding steel plate images. The proposed algorithm is based on small-eigenvalue analysis. It scans an input edge image from the top-left corner to the bottom-right corner with a moving mask. A covariance matrix of the set of edge pixels in a connected region within the mask is computed, and the statistical and geometrical properties of the matrix's small eigenvalue are then used to detect straight lines. To increase detection accuracy, each line segment is separated from the edge image where several line segments overlap before the eigenvalue is calculated. Additionally, unnecessary line segments are eliminated based on their pixel counts and the directional information of the detected line edges. The experimental results show that the proposed algorithm outperforms an existing algorithm based on small-eigenvalue analysis.
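The pruning step can be sketched under stated assumptions: each segment is a list of (x, y) edge pixels, segments with too few pixels are discarded, and only segments close to a set of expected orientations are kept. `MIN_PIXELS`, `ALLOWED_ANGLES` and `ANGLE_TOLERANCE` are hypothetical values (the abstract gives none), and the assumption that plate edges are roughly horizontal or vertical is ours.

```python
import math

# Hypothetical thresholds; the abstract does not specify any values.
MIN_PIXELS = 20
ALLOWED_ANGLES = (0.0, 90.0)  # assumed: plate edges are near-horizontal/vertical
ANGLE_TOLERANCE = 5.0         # degrees


def principal_angle(pixels):
    """Principal-axis orientation in degrees, folded into [0, 180)."""
    n = len(pixels)
    mx = sum(x for x, _ in pixels) / n
    my = sum(y for _, y in pixels) / n
    sxx = sum((x - mx) ** 2 for x, _ in pixels)
    syy = sum((y - my) ** 2 for _, y in pixels)
    sxy = sum((x - mx) * (y - my) for x, y in pixels)
    return math.degrees(0.5 * math.atan2(2 * sxy, sxx - syy)) % 180.0


def angular_gap(a, b):
    """Smallest difference between two undirected line orientations."""
    d = abs(a - b) % 180.0
    return min(d, 180.0 - d)


def keep_segment(pixels):
    """Reject short segments and segments far from every allowed orientation."""
    if len(pixels) < MIN_PIXELS:
        return False
    angle = principal_angle(pixels)
    return any(angular_gap(angle, ref) <= ANGLE_TOLERANCE for ref in ALLOWED_ANGLES)


def filter_segments(segments):
    return [seg for seg in segments if keep_segment(seg)]


horizontal = [(x, 5) for x in range(30)]
diagonal = [(x, x) for x in range(30)]
short = [(x, 0) for x in range(5)]
print(len(filter_segments([horizontal, diagonal, short])))  # 1
```

Folding angles into [0, 180) treats a segment and its reverse as the same line, which is what a direction test on undirected edges needs.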

$^{1}$H NMR-Based Metabolomic Approach for Understanding the Fermentation Behaviors of Wine Yeast Strains

  • Son, Hong-Seok;Hwang, Geum-Sook;Kim, Ki-Myong;Kim, Eun-Young;Berg, Frans van den;Park, Won-Mok;Lee, Cherl-Ho;Hong, Young-Shick
    • Proceedings of the Microbiological Society of Korea Conference / 2009.05a / pp.78-78 / 2009
  • $^{1}$H NMR spectroscopy coupled with multivariate statistical analysis was used for the first time to investigate metabolic changes in musts during alcoholic fermentation and in wines during ageing. Three Saccharomyces cerevisiae yeast strains (RC-212, KIV-1116 and KUBY-501) were also evaluated for their impact on these metabolic changes in must and wine. Pattern recognition (PR) methods, including PCA, PLS-DA and OPLS-DA scores plots, showed clear metabolic differences among musts or wines at each fermentation stage up to 6 months. The metabolites responsible for the differentiation were identified as valine, 2,3-butanediol (2,3-BD), pyruvate, succinate, proline, citrate, glycerol, malate, tartrate, glucose, N-methylnicotinic acid (NMNA), and polyphenol compounds. PCA scores plots showed continuous movement from day 1 to day 8 in all musts for all yeast strains, indicating continuous and active fermentation. During alcoholic fermentation, the highest levels of 2,3-BD, succinate and glycerol were found in musts fermented with the KIV-1116 strain, which showed the fastest fermentation, i.e., the highest fermentative activity of the three strains, whereas the KUBY-501 strain showed the slowest fermentative activity. This study highlights the applicability of NMR-based metabolomics for monitoring wine fermentation and evaluating the fermentative characteristics of yeast strains.
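As an illustration of the simplest of these PR methods, the sketch below computes PCA scores from a samples-by-bins matrix in pure Python, using power iteration with deflation. It assumes the spectra have already been binned and normalized; the toy matrix is made up, PLS-DA and OPLS-DA are not shown, and the sign of each component is arbitrary.

```python
import math
import random


def center_columns(rows):
    """Subtract each column (spectral bin) mean, as PCA requires."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [[r[j] - means[j] for j in range(p)] for r in rows]


def covariance(xc):
    """Sample covariance matrix of already-centred data."""
    n, p = len(xc), len(xc[0])
    return [[sum(r[i] * r[j] for r in xc) / (n - 1) for j in range(p)]
            for i in range(p)]


def power_iteration(mat, iters=500, seed=0):
    """Dominant eigenpair of a symmetric matrix by repeated multiplication."""
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in mat]
    for _ in range(iters):
        w = [sum(a * b for a, b in zip(row, v)) for row in mat]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # no variance left to extract
            return 0.0, [0.0] * len(mat)
        v = [x / norm for x in w]
    eigval = sum(vi * sum(m * vj for m, vj in zip(row, v)) for vi, row in zip(v, mat))
    return eigval, v


def pca_scores(rows, n_components=2):
    """Project centred samples onto the leading principal components."""
    xc = center_columns(rows)
    cov = covariance(xc)
    components = []
    for _ in range(n_components):
        lam, v = power_iteration(cov)
        components.append(v)
        # Deflation: remove this component so the next pass finds the next one.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(len(v))]
               for i in range(len(v))]
    return [[sum(a * b for a, b in zip(r, v)) for v in components] for r in xc]


# Toy "binned spectra" (rows = samples, columns = bins), made up for illustration.
spectra = [
    [1.0, 2.0, 0.5],  # hypothetical strain X, replicate 1
    [1.1, 2.1, 0.4],  # hypothetical strain X, replicate 2
    [3.0, 0.5, 1.5],  # hypothetical strain Y, replicate 1
    [3.2, 0.4, 1.6],  # hypothetical strain Y, replicate 2
]
for row in pca_scores(spectra):
    print([round(s, 3) for s in row])
```

On this toy matrix the two groups fall on opposite sides of zero along PC1, which is the kind of separation a scores plot makes visible.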

Principal Components Self-Organizing Map PC-SOM (주성분 자기조직화 지도 PC-SOM)

  • 허명회
    • The Korean Journal of Applied Statistics / v.16 no.2 / pp.321-333 / 2003
  • The self-organizing map (SOM), an unsupervised learning neural network, has been developed by T. Kohonen since the 1980s. Its main application areas have been pattern recognition and text retrieval, so it did not spread among statisticians until late. Lately, SOMs have frequently been used in data mining. Kohonen's SOM, however, needs improvements before it can become a standard statistical tool. First, there should be a good guideline for the size of the map. Second, an enhanced visualization mode is needed. In this study, the principal components self-organizing map (PC-SOM), a modification of Kohonen's SOM, is proposed to meet these needs. In the first stage, PC-SOM performs a one-dimensional SOM to decompose the input units into node weights and residuals. In the second stage, another one-dimensional SOM is applied to the residuals from the first stage. Finally, combining the two stages yields a two-dimensional SOM. This procedure can easily be extended to construct maps of three or more dimensions. The number of grid lines along the second axis is determined automatically once the data analyst specifies the number along the first axis. Furthermore, PC-SOM provides easily interpretable map axes. These merits of PC-SOM are demonstrated with Fisher's well-known iris data and a simulated data set.
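A rough sketch of the two-stage construction, assuming a plain 1-D Kohonen SOM at each stage: the first map assigns each input to a node, the residual (input minus node weight) is passed to a second 1-D map, and the pair of node indices gives a cell in a 2-D grid. The learning-rate and neighbourhood schedules are our own illustrative choices, and `n_second` is passed in by hand, whereas the abstract says PC-SOM determines it automatically by a rule not given here.

```python
import math
import random


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def best_node(weights, x):
    """Index of the node whose weight vector is closest to x."""
    return min(range(len(weights)), key=lambda k: sq_dist(weights[k], x))


def train_som_1d(data, n_nodes, epochs=100, seed=0):
    """Plain 1-D Kohonen SOM with a shrinking Gaussian neighbourhood.
    The learning-rate and radius schedules are illustrative assumptions."""
    rng = random.Random(seed)
    weights = [list(rng.choice(data)) for _ in range(n_nodes)]
    total, step = epochs * len(data), 0
    for _ in range(epochs):
        order = list(data)
        rng.shuffle(order)
        for x in order:
            frac = step / total
            lr = 0.5 * (1.0 - frac) + 0.01
            sigma = max(0.5 * n_nodes * (1.0 - frac), 0.5)
            bmu = best_node(weights, x)
            for k in range(n_nodes):
                h = math.exp(-((k - bmu) ** 2) / (2.0 * sigma ** 2))
                weights[k] = [w + lr * h * (xi - w) for w, xi in zip(weights[k], x)]
            step += 1
    return weights


def pc_som(data, n_first, n_second):
    """Two-stage map: 1-D SOM on the data, then 1-D SOM on the residuals."""
    first = train_som_1d(data, n_first)
    idx1 = [best_node(first, x) for x in data]
    # Stage-1 decomposition: input = node weight + residual.
    residuals = [[xi - wi for xi, wi in zip(x, first[i])] for x, i in zip(data, idx1)]
    second = train_som_1d(residuals, n_second, seed=1)
    idx2 = [best_node(second, r) for r in residuals]
    # Each input lands in cell (i, j) of an n_first x n_second grid.
    return list(zip(idx1, idx2))


# Toy data elongated along x, made up for illustration.
data = [(float(x), 0.3 * y) for x in range(10) for y in range(3)]
cells = pc_som(data, n_first=5, n_second=3)
print(cells[:3])
```

Because the first map absorbs the dominant direction of variation, the second map mostly sees the remaining spread, which is what gives the two grid axes a principal-component-like reading.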

A Study of Evaluation of the Feature from Cooccurrence Matrix and Appropriate Applicable Resolution (공기행렬의 질감특성치들에 대한 평가와 적정 적용해상도에 관한 연구)

  • Kwon, Oh-Hyoung;Kim, Yong-Il;Eo, Yang-Dam
    • Journal of Korean Society for Geospatial Information Science / v.8 no.1 s.15 / pp.105-110 / 2000
  • Since the advent of high-resolution satellite images, the possibilities of applying various human interpretation mechanisms to these images have increased, and many studies of these possibilities have been carried out in fields such as computer vision, pattern recognition, artificial intelligence and remote sensing. In these studies, texture is defined as a quantity related to the spatial distribution of brightness and tone, and it plays an important role in image interpretation. In particular, methods of obtaining texture from statistical models have been studied intensively. Among them, texture measurement based on the cooccurrence matrix is highly regarded because its texture features are easier to calculate than those of other methods, and it yields high classification accuracy when applied to satellite images and aerial photos. However, existing studies using the cooccurrence matrix have chosen features arbitrarily without considering feature variation, and few studies have addressed selecting an appropriate resolution at which the cooccurrence matrix can extract texture. Therefore, this study reviews the concept of the cooccurrence matrix as a texture measurement method, evaluates the usefulness of several features obtained from it, and proposes an appropriate resolution by investigating the variance trends of these features.
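For reference, a cooccurrence matrix (GLCM) counts how often gray level i occurs next to gray level j at a fixed pixel offset, and texture features are weighted sums over its normalized entries. The sketch below assumes a small integer-valued image and a single offset; the four features shown (contrast, energy, homogeneity, entropy) are common Haralick-style choices, not necessarily the set evaluated in the paper, and the resolution-selection analysis is not reproduced.

```python
import math
from collections import Counter


def glcm(image, dx=1, dy=0, levels=None, symmetric=True):
    """Normalized gray-level cooccurrence matrix for pixel offset (dx, dy)."""
    if levels is None:
        levels = max(max(row) for row in image) + 1
    counts = Counter()
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[(i, j)] += 1
                if symmetric:  # also count the pair in the reverse direction
                    counts[(j, i)] += 1
    total = sum(counts.values())
    return [[counts[(i, j)] / total for j in range(levels)] for i in range(levels)]


def texture_features(p):
    """A few common texture features computed from a normalized GLCM."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    return {
        "contrast": sum((i - j) ** 2 * v for i, j, v in cells),
        "energy": sum(v * v for _, _, v in cells),  # angular second moment
        "homogeneity": sum(v / (1 + abs(i - j)) for i, j, v in cells),
        "entropy": -sum(v * math.log(v) for _, _, v in cells if v > 0),
    }


image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
features = texture_features(glcm(image))
print({name: round(value, 4) for name, value in features.items()})
```

Repeating the computation on the same scene resampled to different resolutions, and tracking how each feature varies, is one way to reproduce the kind of variance-trend study the abstract describes.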

Multi-target Classification Method Based on Adaboost and Radial Basis Function (아이다부스트(Adaboost)와 원형기반함수를 이용한 다중표적 분류 기법)

  • Kim, Jae-Hyup;Jang, Kyung-Hyun;Lee, Jun-Haeng;Moon, Young-Shik
    • Journal of the Institute of Electronics Engineers of Korea CI / v.47 no.3 / pp.22-28 / 2010
  • Adaboost is well known as a representative learner among the kernel methods. Adaboost, which is based on statistical learning theory, shows good generalization performance and has been applied to various pattern recognition problems. However, Adaboost fundamentally handles two-class classification, so a multi-class problem cannot be solved with it directly. One-vs-All and Pair-Wise decomposition have been applied to the multi-class classification problem. Both are output coding methods, a general approach to solving multi-class problems with multiple binary classifiers: a complex multi-class problem is decomposed into a set of binary problems, and the outputs of the binary classifiers for each problem are then combined. However, neither method performs well. In this paper, we propose a method for solving the multi-target classification problem by using a radial basis function in the Adaboost weak classifier.
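To make the One-vs-All decomposition concrete, here is a minimal sketch using discrete AdaBoost with decision stumps as the weak learner: one binary classifier per target class (that class against the rest), with the prediction taken from the classifier that gives the highest score. This is the baseline the abstract criticizes, not the proposed method; the paper's radial-basis-function weak classifier is not reproduced, and the toy data are made up.

```python
import math


def train_stump(X, y, w):
    """Decision stump minimizing weighted error; labels y are in {-1, +1}.
    Returns (error, feature, threshold, sign)."""
    best = None
    for f in range(len(X[0])):
        for thr in sorted({x[f] for x in X}):
            for sign in (1, -1):
                err = sum(wi for wi, xi, yi in zip(w, X, y)
                          if (sign if xi[f] >= thr else -sign) != yi)
                if best is None or err < best[0]:
                    best = (err, f, thr, sign)
    return best


def stump_predict(stump, x):
    _, f, thr, sign = stump
    return sign if x[f] >= thr else -sign


def adaboost(X, y, rounds=10):
    """Discrete AdaBoost with decision stumps as weak learners."""
    w = [1.0 / len(X)] * len(X)
    ensemble = []
    for _ in range(rounds):
        stump = train_stump(X, y, w)
        err = max(stump[0], 1e-10)  # avoid log(1/0) on a perfect stump
        if err >= 0.5:
            break
        alpha = 0.5 * math.log((1.0 - err) / err)
        ensemble.append((alpha, stump))
        if stump[0] == 0:
            break  # training data already separated
        # Misclassified samples gain weight, correctly classified ones lose it.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, xi))
             for wi, xi, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def score(ensemble, x):
    return sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)


def one_vs_all(X, labels, rounds=10):
    """One binary AdaBoost per class: that class (+1) versus the rest (-1)."""
    return {c: adaboost(X, [1 if lab == c else -1 for lab in labels], rounds)
            for c in sorted(set(labels))}


def predict(models, x):
    """Pick the class whose binary classifier gives the largest score."""
    return max(models, key=lambda c: score(models[c], x))


# Toy 2-D points for three targets, made up for illustration.
X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5), (0, 5), (0, 6), (1, 5)]
labels = ["a"] * 3 + ["b"] * 3 + ["c"] * 3
models = one_vs_all(X, labels)
print(predict(models, (0.5, 0.5)), predict(models, (5.5, 5.5)), predict(models, (0.5, 5.5)))  # a b c
```

Comparing raw ensemble scores across independently trained binary classifiers is the reconstruction step of output coding; the scores are not calibrated against each other, which is one reason such decompositions can underperform.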

Damage Detection of CFRP-Laminated Concrete based on a Continuous Self-Sensing Technology (셀프센싱 상시계측 기반 CFRP보강 콘크리트 구조물의 손상검색)

  • Kim, Young-Jin;Park, Seung-Hee;Jin, Kyu-Nam;Lee, Chang-Gil
    • Land and Housing Review / v.2 no.4 / pp.407-413 / 2011
  • This paper reports a novel structural health monitoring (SHM) technique for detecting de-bonding between a concrete beam and a CFRP (Carbon Fiber Reinforced Polymer) sheet attached to the concrete surface. To achieve this, a multi-scale actuated sensing system with a self-sensing circuit using piezoelectric active sensors is applied to the CFRP-laminated concrete beam. In this self-sensing-based multi-scale actuated sensing, one scale provides a wide-frequency-band structural response from self-sensed impedance measurements, and the other provides a structural wavelet response induced at a specific frequency from self-sensed guided wave measurements. To quantify the de-bonding levels, supervised-learning-based statistical pattern recognition was performed on a two-dimensional (2D) plane composed of the damage indices extracted from the impedance and guided wave features.
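The final step maps each measurement to a point in a 2D damage-index plane and classifies it with a supervised learner. The abstract names neither the damage indices nor the classifier, so this sketch assumes a root-mean-square deviation (RMSD) index, a common choice in impedance-based SHM, and a nearest-centroid classifier. The training points are given directly as invented index pairs with hypothetical damage labels.

```python
import math


def rmsd_index(baseline, current):
    """Root-mean-square deviation (%) between a baseline and a current signal.
    Assumed damage index for illustration; not confirmed by the abstract."""
    num = sum((c - b) ** 2 for b, c in zip(baseline, current))
    den = sum(b ** 2 for b in baseline)
    return 100.0 * math.sqrt(num / den)


def fit_centroids(points, labels):
    """Mean position of each labelled class in the 2D damage-index plane."""
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    return {lab: (sum(p[0] for p in ps) / len(ps), sum(p[1] for p in ps) / len(ps))
            for lab, ps in groups.items()}


def classify_damage(centroids, point):
    """Nearest-centroid decision in the (impedance index, guided-wave index) plane."""
    return min(centroids, key=lambda lab: math.dist(centroids[lab], point))


# Invented (impedance index, guided-wave index) pairs with hypothetical labels.
training = [(0.5, 0.4), (0.7, 0.6), (4.0, 3.5), (4.5, 4.2), (9.0, 8.0), (10.0, 9.5)]
labels = ["intact", "intact", "partial", "partial", "severe", "severe"]
centroids = fit_centroids(training, labels)

print(round(rmsd_index([1.0, 1.0], [1.1, 0.9]), 6))  # 10.0
print(classify_damage(centroids, (4.2, 3.9)))        # partial
```

Combining two independent indices into one plane lets a simple classifier separate damage levels that either index alone might confuse.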

Visualizing Unstructured Data using a Big Data Analytical Tool R Language (빅데이터 분석 도구 R 언어를 이용한 비정형 데이터 시각화)

  • Nam, Soo-Tai;Chen, Jinhui;Shin, Seong-Yoon;Jin, Chan-Yong
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2021.05a / pp.151-154 / 2021
  • Big data analysis is the process of discovering meaningful new correlations, patterns, and trends in large volumes of data stored in data stores and creating new value from them. Accordingly, big data analysis methods include data mining, machine learning, natural language processing, and pattern recognition, as used in existing statistics and computer science. In addition, with the R language, a big data tool, analysis results can be expressed through various visualization functions after the text data has been preprocessed. The data analyzed in this study were 21 papers published in March 2021 in the journals of the Korea Institute of Information and Communication Engineering. In the final analysis results, the most frequently mentioned keyword was "Data", which ranked first with 305 occurrences. Based on these results, the limitations of the study and its theoretical implications are suggested.
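The study used R; the preprocessing and keyword-frequency step behind a result such as "Data" ranking first can be sketched language-neutrally, here in Python. This assumes English text, a simple regular-expression tokenizer and a small hypothetical stop-word list; Korean text would need a morphological analyzer instead, and the example documents are made up.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real analysis would use a fuller one.
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to", "for", "is", "are", "on", "with", "by"}


def preprocess(text):
    """Lower-case, keep alphabetic tokens, drop stop words and 1-letter tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS and len(t) > 1]


def keyword_frequencies(documents, top_n=10):
    """Most frequent keywords across all documents, as (word, count) pairs."""
    counts = Counter()
    for doc in documents:
        counts.update(preprocess(doc))
    return counts.most_common(top_n)


docs = [
    "Data mining of sensor data.",
    "A study of data visualization and research data.",
]
print(keyword_frequencies(docs, top_n=1))  # [('data', 4)]
```

The resulting (word, count) pairs are the usual input to a bar chart or word cloud.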

Visualizing Article Material using a Big Data Analytical Tool R Language (빅데이터 분석 도구 R 언어를 이용한 논문 데이터 시각화)

  • Nam, Soo-Tai;Shin, Seong-Yoon;Jin, Chan-Yong
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2021.05a / pp.326-327 / 2021
  • Recently, big data utilization has attracted wide interest across a variety of industrial fields. Big data analysis is the process of discovering meaningful new correlations, patterns, and trends in large volumes of data stored in data stores and creating new value from them. Accordingly, big data analysis methods include data mining, machine learning, natural language processing, and pattern recognition, as used in existing statistics and computer science. In addition, with the R language, a big data tool, analysis results can be expressed through various visualization functions after the text data has been preprocessed. The data analyzed in this study were 29 papers from a specific journal. In the final analysis results, the most frequently mentioned keyword was "Research", which ranked first with 743 occurrences. Based on these results, the limitations of the study and its theoretical implications are suggested.
