• Title/Summary/Keyword: Similarity Metrics


Signal Peptide Cleavage Site Prediction Using a String Kernel with Real Exponent Metric (실수 지수 메트릭으로 구성된 스트링 커널을 이용한 신호펩티드의 절단위치 예측)

  • Chi, Sang-Mun
    • Journal of KIISE:Software and Applications / v.36 no.10 / pp.786-792 / 2009
  • A kernel in support vector machines can be described as a similarity measure between data, and this measure is used to find an optimal hyperplane that classifies patterns. It is therefore important to incorporate the characteristics of the data effectively into the similarity measure. To find an optimal similarity between amino acid sequences, we propose a real-exponent exponential form of two metrics, which are derived from the evolutionary relationships and the hydrophobicity of amino acids. We prove that the proposed form satisfies the conditions to be a metric, and we relate it to the metrics in the string kernels that are widely used for processing amino acid and DNA sequences. In prediction experiments on the cleavage site of the signal peptide, the optimal metric is found among the proposed metrics.

Pattern and process in MAEUL, a traditional Korean rural landscape

  • Kim, Jae-Eun;Hong, Sun-Kee
    • Journal of Ecology and Environment / v.34 no.2 / pp.237-249 / 2011
  • Land-use changes due to the socio-economic environment influence landscape patterns and processes, which affect habitats and biodiversity. This study considers the effects of such land-use changes, particularly on the traditional rural "Maeul" forested landscape, by analyzing landscape structure and vegetation changes. Three study areas were examined that have seen their populations decrease and age over the last few decades. Five types of plant life-forms (Raunkiaer life-forms) were distinguished to investigate ecosystem function. Principal component analysis was used to understand vegetation dynamics and community characteristics based on a vegetation similarity index. Ordination analysis of transformed species-coverage data was introduced to clarify vegetation dynamics. Landscape indices, such as area metrics, edge metrics, and shape metrics, showed that spatial heterogeneity has increased over time in all areas. Pinus densiflora was the main land-use plant type in all study areas but decreased over time, whereas Quercus spp. increased. Over a decade, P. densiflora communities shifted to deciduous oak and plantation. These findings indicate that the impact of human activities on the Maeul landscape is twofold. While forestry activities caused heavy disturbances, the abandonment of traditional human activities has led to natural succession. Furthermore, it can be concluded that the type and intensity of these human impacts on landscape heterogeneity relate differently to vegetation succession. This reflects the cause and consequence of patch dynamics. We discuss an approach for sustainable landscape planning and management of the Maeul landscape based on traditional management.

Parametric and Non Parametric Measures for Text Similarity (텍스트 유사성을 위한 파라미터 및 비 파라미터 측정)

  • Mlyahilu, John;Kim, Jong-Nam
    • Journal of the Institute of Convergence Signal Processing / v.20 no.4 / pp.193-198 / 2019
  • The wide spread of genuine and fake information on the internet has led to various studies on text analysis. Copying and pasting others' work without acknowledgement and manipulating research results without proof have been trending for a while in the era of data science. Various tools have been developed to reduce, combat and possibly eradicate plagiarism in various research fields. Text similarity can be measured manually using both parametric and non-parametric methods; this study implements cosine similarity and Pearson correlation as parametric methods and Spearman correlation as a non-parametric method. The cosine similarity and Pearson correlation metrics achieved the highest similarity coefficients, while Spearman correlation showed low similarity coefficients. We recommend the use of non-parametric methods for measuring text similarity because they do not assume normality, as opposed to the parametric methods, which rely on normality assumptions and are subject to bias.
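The three measures named in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes documents are compared as whitespace-tokenized term-frequency vectors, and the helper names (`term_vectors`, `ranks`) are hypothetical.

```python
import math
from collections import Counter

def term_vectors(doc_a, doc_b):
    """Build aligned term-frequency vectors over the joint vocabulary (assumed representation)."""
    ca, cb = Counter(doc_a.lower().split()), Counter(doc_b.lower().split())
    vocab = sorted(set(ca) | set(cb))
    return [ca[w] for w in vocab], [cb[w] for w in vocab]

def cosine(x, y):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0

def pearson(x, y):
    """Pearson correlation coefficient; 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))
```

Because Spearman correlation works on ranks, the many tied zero counts in sparse term vectors can pull its value away from the cosine and Pearson scores, which is one possible reading of the lower coefficients reported above.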

Using Genre Rating Information for Similarity Estimation in Collaborative Filtering

  • Lee, Soojung
    • Journal of the Korea Society of Computer and Information / v.24 no.12 / pp.93-100 / 2019
  • Similarity computation is crucial to the performance of memory-based collaborative filtering systems. These systems make use of user ratings to recommend products to customers on online commercial sites. For better recommendations, the users most similar to the active user need to be selected as references. Numerous similarity measures have been developed in the literature, most of which suffer from data sparsity or cold-start problems. This paper aims to extract as much preference information as possible from user ratings in order to compute a more reliable similarity than previous measures, even under sparse data conditions. We propose a new similarity measure which relies not only on user ratings but also on the movie genre information provided by the dataset. Experiments are conducted to compare the performance of the proposed measure with that of previous relevant measures. The results show that the proposed measure yields better or comparable results in terms of major performance metrics.

Development of a Process Planning Expert System: Application to Shipbuilding Block Division (공정계획 전문가시스템의 개발-조선 블럭분할에의 응용)

  • 박병태;이재원
    • Proceedings of the Korean Society of Precision Engineering Conference / 1993.04b / pp.370-374 / 1993
  • This paper describes a study on expert-system-based process planning for the block division process in shipbuilding. The prototype system developed determines the block division line of the midship of a crude-oil tanker. A case-based reasoning (CBR) approach, which relies on previous similar cases to solve the problem, is applied instead of rule-based reasoning (RBR). Similar cases are retrieved from the case base according to similarity metrics between the input problem and the stored cases. The retrieved case with the highest priority is then adapted to fit the input problem by adaptation rules. The adapted solution is proposed as the division line for the input problem.

Generative probabilistic model with Dirichlet prior distribution for similarity analysis of research topic

  • Milyahilu, John;Kim, Jong Nam
    • Journal of Korea Multimedia Society / v.23 no.4 / pp.595-602 / 2020
  • We propose a generative probabilistic model with a Dirichlet prior distribution for topic modeling and text similarity analysis. It assigns topics and calculates the text correlation between documents within a corpus. It also provides the posterior probabilities assigned to each topic of a document based on the prior distribution in the corpus. We then present a Gibbs sampling algorithm for inference about the posterior distribution and compute the text correlation among 50 abstracts from papers published by IEEE. We also conduct supervised learning to set a benchmark that justifies the performance of LDA (Latent Dirichlet Allocation). The experiments show that the accuracy of topic assignment to a given document is 76% for LDA. The results for supervised learning show an accuracy of 61%, a precision of 93% and an F1-score of 96%. The discussion of the experimental results provides a justification based on probabilities, distributions, evaluation metrics and correlation coefficients with respect to topic assignment.

Precise segmentation of fetal head in ultrasound images using improved U-Net model

  • Vimala Nagabotu;Anupama Namburu
    • ETRI Journal / v.46 no.3 / pp.526-537 / 2024
  • Monitoring fetal growth in utero is crucial to anomaly diagnosis. However, current computer-vision models struggle to accurately assess the key metrics (i.e., head circumference and occipitofrontal and biparietal diameters) from ultrasound images, largely owing to a lack of training data. Mitigation usually entails image augmentation (e.g., flipping, rotating, scaling, and translating). Nevertheless, the accuracy of our task remains insufficient. Hence, we offer a U-Net fetal head measurement tool that leverages a hybrid Dice and binary cross-entropy loss to compute the similarity between actual and predicted segmented regions. Ellipse-fitted two-dimensional ultrasound images acquired from the HC18 dataset are input, and their lower feature layers are reused for efficiency. During regression, a novel region of interest pooling layer extracts elliptical feature maps, and during segmentation, feature pyramids fuse field-layer data with a new scale attention method to reduce noise. Performance is measured by Dice similarity, mean pixel accuracy, and mean intersection-over-union, giving 97.90%, 99.18%, and 97.81% scores, respectively, which match or outperform the best U-Net models.
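The overlap measures and the hybrid loss mentioned in this abstract can be sketched on flattened masks. This is an illustrative sketch only: the equal weighting of the Dice and cross-entropy terms, the `eps` smoothing constant, and the function names are assumptions, not details taken from the paper.

```python
import math

def dice(pred, target):
    """Dice similarity between two binary masks given as flat lists of 0/1."""
    inter = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    return 2 * inter / total if total else 1.0

def iou(pred, target):
    """Intersection-over-union between two binary masks."""
    inter = sum(p * t for p, t in zip(pred, target))
    union = sum(pred) + sum(target) - inter
    return inter / union if union else 1.0

def dice_bce_loss(probs, target, eps=1e-7):
    """Hybrid loss: (1 - soft Dice) + mean binary cross-entropy.

    probs are predicted foreground probabilities in [0, 1].
    The 1:1 weighting of the two terms is an assumption for illustration.
    """
    inter = sum(p * t for p, t in zip(probs, target))
    soft_dice = (2 * inter + eps) / (sum(probs) + sum(target) + eps)
    bce = 0.0
    for p, t in zip(probs, target):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        bce -= t * math.log(p) + (1 - t) * math.log(1 - p)
    bce /= len(target)
    return (1 - soft_dice) + bce
```

For example, a prediction `[1, 1, 0, 0]` against a ground truth `[1, 0, 0, 0]` shares one foreground pixel, giving a Dice score of 2/3 and an IoU of 1/2. The Dice term rewards region overlap directly, while the cross-entropy term penalizes each pixel independently.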

An approach for improving the performance of the Content-Based Image Retrieval (CBIR)

  • Jeong, Inseong
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography / v.30 no.6_2 / pp.665-672 / 2012
  • Amid the rapidly increasing number and volume of image inputs in remote sensing imagery databases, Content-Based Image Retrieval (CBIR) is an effective tool for searching for an image feature or image content that a user wants to retrieve. It seeks to capture salient features from a 'query' image and then to locate other image regions with similar features elsewhere in the image database. For a CBIR approach that uses texture as its primary feature primitive, designing a texture descriptor that better represents image contents is key to improving CBIR results. For this purpose, an extended feature vector combining the Gabor filter and the co-occurrence histogram method is suggested and evaluated against quantitative and qualitative retrieval performance criteria. For better CBIR performance, assessing the similarity between high-dimensional feature vectors is also a challenging issue. Therefore, several distance metrics (i.e. the L1 and L2 norms) are tried to measure the closeness between two feature vectors, and their impact on the retrieval results is analyzed. In this paper, experimental results are presented with several CBIR samples. The current results show that 1) the overall retrieval quantity and quality are improved by combining two types of feature vectors, 2) some features are better retrieved by a specific feature vector, and 3) retrieval result quality (i.e. the ranking of retrieved image tiles) is sensitive to the adopted similarity metric when the extended feature vector is employed.
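The sensitivity of retrieval ranking to the chosen distance can be seen in a small sketch. The feature vectors and tile names below are made-up toy values, not data from the paper, and `rank_tiles` is a hypothetical helper.

```python
import math

def l1_distance(x, y):
    """L1 (Manhattan) distance: sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(x, y))

def l2_distance(x, y):
    """L2 (Euclidean) distance: square root of summed squared differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def rank_tiles(query, tiles, distance):
    """Return tile names ordered from closest to farthest from the query."""
    return sorted(tiles, key=lambda name: distance(query, tiles[name]))

# Toy 2-D feature vectors (hypothetical values chosen to show a rank flip).
query = (0.0, 0.0)
tiles = {"tile_a": (3.0, 3.0), "tile_b": (0.0, 5.0)}

print(rank_tiles(query, tiles, l1_distance))  # tile_b first: L1 is 5 vs 6
print(rank_tiles(query, tiles, l2_distance))  # tile_a first: L2 is about 4.24 vs 5
```

The L1 norm penalizes a difference spread over several dimensions more heavily than the L2 norm does, so the same pair of tiles can swap places in the ranking depending on the metric.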

Evaluation of Geo-based Image Fusion on Mobile Cloud Environment using Histogram Similarity Analysis

  • Lee, Kiwon;Kang, Sanggoo
    • Korean Journal of Remote Sensing / v.31 no.1 / pp.1-9 / 2015
  • Mobility and cloud platforms have become the dominant paradigms for developing web services that deal with huge and diverse digital content for scientific solutions or engineering applications. These two trends are technically combined in the mobile cloud computing environment, which takes the benefits of each. The aim of this study is to design and implement a mobile cloud application for remotely sensed image fusion, with a view to practical geo-based mobile services. In this implementation, the system architecture consists of two parts: a mobile web client and a cloud application server. The mobile web client provides the user interface for image fusion processing and image visualization, and a mobile web service for data listing and browsing. The cloud application server runs on OpenStack, an open-source cloud platform. In this part, three server instances are generated: a web server instance, a tiling server instance, and a fusion server instance. With metadata browsing of the data to be processed, image fusion by a Bayesian approach is performed using functions within Orfeo Toolbox (OTB), an open-source remote sensing library. In addition, the similarity of the fused images with respect to the input image set is estimated by histogram distance metrics. This result can be used as a reference criterion for the user's choice of parameters in Bayesian image fusion. The implementation strategy for a mobile cloud application based fully on open-source software is considered useful for mobile services that support specific remote sensing functions beyond image fusion, in response to user demand for expanding remote sensing application fields.

Steganography Software Analysis -Focusing on Performance Comparison (스테가노그래피 소프트웨어 분석 연구 - 성능 비교 중심으로)

  • Lee, Hyo-joo;Park, Yongsuk
    • Journal of the Korea Institute of Information and Communication Engineering / v.25 no.10 / pp.1359-1368 / 2021
  • Steganography is the science of embedding secret data into innocent-looking carrier data, and its goal is to conceal the existence of the embedded data. Much research on steganography has proposed various hiding and detection techniques based on different algorithms. On the other hand, very few studies have analyzed the performance of individual steganography software. This paper describes five different steganography software packages, each with its own algorithms, and analyzes the differences in their inherent features. The image quality metrics Peak Signal to Noise Ratio (PSNR) and Structural SIMilarity (SSIM) are used to define the performance of each steganography software package. We extracted PSNR and SSIM results for a number of embedded output images from the five packages. The results show the optimal steganography software based on the evaluation metrics and ultimately contribute to forensics.
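Of the two image quality metrics named in this abstract, PSNR is simple enough to sketch directly. The example below uses the standard definition, PSNR = 10 log10(MAX^2 / MSE), and assumes 8-bit images flattened into lists of pixel values; it is not the evaluation code used in the paper, and SSIM is omitted because it needs windowed local statistics.

```python
import math

def mse(cover, stego):
    """Mean squared error between two equally sized flat pixel lists."""
    if len(cover) != len(stego):
        raise ValueError("images must have the same number of pixels")
    return sum((c - s) ** 2 for c, s in zip(cover, stego)) / len(cover)

def psnr(cover, stego, max_value=255):
    """Peak signal-to-noise ratio in dB; infinite when the images are identical."""
    error = mse(cover, stego)
    if error == 0:
        return math.inf
    return 10 * math.log10(max_value ** 2 / error)
```

A higher PSNR means the stego image is closer to the cover image, so embedding that changes fewer or less significant bits scores higher. Changing a single pixel by one gray level in a four-pixel image, for instance, gives an MSE of 0.25 and a PSNR of roughly 54 dB.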