• Title/Summary/Keyword: similarity weight

PMCN: Combining PDF-modified Similarity and Complex Network in Multi-document Summarization

  • Tu, Yi-Ning;Hsu, Wei-Tse
    • International Journal of Knowledge Content Development & Technology / v.9 no.3 / pp.23-41 / 2019
  • This study combines the concept of degree centrality from complex networks with the Term Frequency * Proportional Document Frequency (TF*PDF) algorithm. The combined method, called PMCN (PDF-Modified similarity and Complex Network), constructs relationship networks among sentences in order to write news summaries. PMCN extends to multi-document summarization the ideas of Bun and Ishizuka (2002), who first published the TF*PDF algorithm for detecting hot topics. In their TF*PDF algorithm, Bun and Ishizuka defined the publisher of a news item as its channel; a term whose PDF weight is higher than the weights of other terms is considered hotter than those terms. This study, however, aims to develop summaries of news items. Because the TF*PDF algorithm summarizes daily news, PMCN replaces the concept of "channel" with "the date of the news event" and uses the resulting chronological ordering in a multi-document summarization algorithm. Its F-measure scores were 0.042 and 0.051 higher than those of LexRank on the d30001t and d30003t tasks, respectively.
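The two ingredients named above can be sketched in Python. This is a minimal sketch, not the authors' implementation: it assumes the usual TF*PDF form from Bun and Ishizuka, W_j = Σ_c (F_jc / sqrt(Σ_k F_kc²)) · exp(n_jc / N_c), with dates as channels; the TF*PDF-weighted cosine stands in for PMCN's "PDF-modified similarity", and the edge threshold of 0.1 is a hypothetical choice.

```python
import math
from collections import Counter, defaultdict


def tf_pdf_weights(channels):
    """TF*PDF term weights:
    W_j = sum over channels c of (F_jc / sqrt(sum_k F_kc^2)) * exp(n_jc / N_c).
    `channels` maps a channel id (for PMCN, a news date) to a list of
    documents, each document a list of tokens."""
    weights = defaultdict(float)
    for docs in channels.values():
        n_docs = len(docs)                                       # N_c
        freq = Counter(t for doc in docs for t in doc)           # F_jc
        doc_freq = Counter(t for doc in docs for t in set(doc))  # n_jc
        norm = math.sqrt(sum(f * f for f in freq.values()))
        for term, f in freq.items():
            weights[term] += (f / norm) * math.exp(doc_freq[term] / n_docs)
    return dict(weights)


def weighted_cosine(a, b, w):
    """Cosine similarity of two token lists with each term count scaled by its
    TF*PDF weight (an assumed stand-in for the paper's modified similarity)."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] * w.get(t, 0.0) ** 2 for t in ca.keys() & cb.keys())
    na = math.sqrt(sum((c * w.get(t, 0.0)) ** 2 for t, c in ca.items()))
    nb = math.sqrt(sum((c * w.get(t, 0.0)) ** 2 for t, c in cb.items()))
    return dot / (na * nb) if na and nb else 0.0


def rank_by_degree(sentences, w, threshold=0.1):
    """Link sentence pairs whose similarity exceeds a hypothetical threshold
    and return sentence indices ordered by degree centrality, highest first."""
    n = len(sentences)
    degree = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            if weighted_cosine(sentences[i], sentences[j], w) > threshold:
                degree[i] += 1
                degree[j] += 1
    return sorted(range(n), key=lambda i: degree[i], reverse=True)
```

A summary could then be assembled from the top-ranked sentences; how PMCN actually selects and orders them is not described in the abstract.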

Relevancy contemplation in medical data analytics and ranking of feature selection algorithms

  • P. Antony Seba;J. V. Bibal Benifa
    • ETRI Journal / v.45 no.3 / pp.448-461 / 2023
  • This article performs a detailed scrutiny of a chronic kidney disease (CKD) dataset to select efficient instances and relevant features. Data relevancy is investigated through feature extraction, hybrid outlier detection, and handling of missing values. Data instances that do not influence the target are removed using data envelopment analysis, reducing the number of rows. Column reduction is achieved by ranking the attributes with feature selection methods, namely the extra-trees classifier, recursive feature elimination, the chi-squared test, analysis of variance, and mutual information. These methods are ranked with the Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS) using weight optimization, to identify the optimal features for model building from the CKD dataset and enable better prediction when diagnosing disease severity. An efficient hybrid ensemble classifier and novel similarity-based classifiers are built on the pruned dataset, and the results are compared with random forest, AdaBoost, naive Bayes, k-nearest neighbors, and support vector machines. The hybrid ensemble classifier yields a higher prediction accuracy, 98.31%, with the features selected by the extra-trees classifier (ETC), which TOPSIS ranks as the best method.
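The TOPSIS ranking step can be illustrated with the textbook procedure: vector-normalize, weight, find the ideal and anti-ideal points, and score each alternative by relative closeness. A sketch only: the paper's weight-optimization step is not reproduced, and any matrix or weights passed in (for example, feature-selection methods as rows and evaluation metrics as columns) are hypothetical.

```python
import math


def topsis(matrix, weights, benefit):
    """Textbook TOPSIS. Rows of `matrix` are alternatives (e.g. feature
    selection methods), columns are criteria; benefit[j] is True when a larger
    value of criterion j is better. Returns closeness scores in [0, 1]."""
    n_alt, n_crit = len(matrix), len(matrix[0])
    # 1. Vector-normalize each column, then scale it by the criterion weight.
    v = [[0.0] * n_crit for _ in range(n_alt)]
    for j in range(n_crit):
        col_norm = math.sqrt(sum(row[j] ** 2 for row in matrix))
        for i in range(n_alt):
            v[i][j] = weights[j] * matrix[i][j] / col_norm if col_norm else 0.0
    # 2. Ideal (best) and anti-ideal (worst) value for every criterion.
    cols = list(zip(*v))
    best = [max(c) if benefit[j] else min(c) for j, c in enumerate(cols)]
    worst = [min(c) if benefit[j] else max(c) for j, c in enumerate(cols)]
    # 3. Relative closeness: distance to the worst point over total distance.
    scores = []
    for row in v:
        d_best, d_worst = math.dist(row, best), math.dist(row, worst)
        total = d_best + d_worst
        scores.append(d_worst / total if total else 0.0)
    return scores
```

The alternative with the highest score is ranked first; an alternative that is best on every criterion scores exactly 1.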

Molecular Identification and Expression of Myosin Light Chain in Shortspine Spurdog (Squalus mitsukurii)

  • Kim, Soo Cheol;Sumi, Kanij Rukshana;Sharker, Md Rajib;Kho, Kang Hee
    • Journal of Marine Life Science / v.3 no.1 / pp.1-8 / 2018
  • Myosin is considered a vital motor protein in vertebrates and invertebrates. The present study was conducted to investigate the occurrence of myosin in the dogfish Squalus mitsukurii. We isolated one clone containing a 979-bp cDNA sequence, which consisted of a complete 453-bp coding sequence and an open reading frame encoding a deduced protein of 150 amino acids, whose molecular weight, isoelectric point, and aliphatic index are 16.72 kDa, 4.49, and 78.00, respectively. The clone contained a 428-bp 3' UTR with a single potential polyadenylation signal (AATAAA). Predicted EF-hand Ca2+-binding domains were identified at residues 6-41, 83-118, and 133-150. A BLAST search indicates that this protein is highly similar to whale shark (Rhincodon typus) MLC3 (91% identical) and to house mouse (Mus musculus) MLC isoform 3f (81% identical). Phylogenetic analysis revealed that it is an MLC3 isoform-like protein. It also shares highly conserved regions with other myosin proteins. Homology modeling of the S. mitsukurii protein was performed using the crystal structure of Gallus gallus skeletal muscle myosin II as a template, chosen for its high similarity. Reverse transcription PCR and quantitative PCR showed that dogfish myosin is highly expressed in muscle tissue.
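Two of the reported numbers follow from simple formulas. The 453-bp coding sequence matches 150 codons plus a stop codon (150 × 3 + 3 = 453), and the aliphatic index is conventionally computed as in ExPASy ProtParam: X(Ala) + 2.9·X(Val) + 3.9·(X(Ile) + X(Leu)), with X in mole percent. A minimal sketch; the deduced sequence is not given here, so any input sequence is illustrative.

```python
def aliphatic_index(seq):
    """Aliphatic index as used by ProtParam:
    X(Ala) + 2.9 * X(Val) + 3.9 * (X(Ile) + X(Leu)),
    where X is the mole percent of each residue in the sequence."""
    seq = seq.upper()

    def mole_percent(aa):
        return 100.0 * seq.count(aa) / len(seq)

    return (mole_percent("A") + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


def coding_length_bp(n_residues, include_stop=True):
    """Length in bp of an ORF encoding n_residues amino acids."""
    return 3 * (n_residues + (1 if include_stop else 0))
```

For example, `coding_length_bp(150)` gives 453, consistent with the clone description.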

The Weight Decision of Multi-dimensional Features using Fuzzy Similarity Relations and Emotion-Based Music Retrieval (퍼지 유사관계를 이용한 다차원 특징들의 가중치 결정과 감성기반 음악검색)

  • Lim, Jee-Hye;Lee, Joon-Whoan
    • Journal of the Korean Institute of Intelligent Systems / v.21 no.5 / pp.637-644 / 2011
  • Because music is now digitized, it can easily be purchased and delivered to users. However, it is still difficult to find music that suits a listener's taste using traditional music information search based on musician, genre, title, album title, and so on. To reduce this difficulty, content-based and emotion-based music retrieval have been proposed and developed. In this paper, we propose a new method for determining the importance of MPEG-7 low-level audio descriptors, which are multi-dimensional vectors, for emotion-based music retrieval. For each multi-dimensional descriptor, we measured the mutual similarities among music pieces representing a pair of emotions with opposite meanings. Rough approximation and the inter- and intra-similarity ratio, both derived from the similarity relation, are then each used to determine the importance of a descriptor. The set of weights based on this importance defines an aggregated similarity measure, with which emotion-based music retrieval can be achieved. In experiments on emotion-based retrieval built on content-based search, the proposed method outperformed the previous method in terms of the average number of satisfactory music pieces.
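The weighting-and-aggregation idea can be sketched as follows. This is an assumed reading, not the paper's exact formulation: each descriptor's importance is taken as its mean intra-emotion similarity divided by its mean inter-emotion similarity (a descriptor that groups same-emotion pieces and separates opposite-emotion pieces scores high), weights are normalized importances, and the aggregated similarity is their weighted sum. The rough-approximation variant is not shown, and descriptor names are hypothetical.

```python
from itertools import combinations
from statistics import mean


def descriptor_weights(sim_matrices, labels):
    """sim_matrices: descriptor name -> square matrix of pairwise similarities
    between music pieces; labels: emotion label of each piece (two opposite
    emotions). Importance = mean intra-emotion similarity / mean inter-emotion
    similarity (an assumption); weights are importances normalized to sum to 1."""
    pairs = list(combinations(range(len(labels)), 2))
    importance = {}
    for name, s in sim_matrices.items():
        intra = mean(s[i][j] for i, j in pairs if labels[i] == labels[j])
        inter = mean(s[i][j] for i, j in pairs if labels[i] != labels[j])
        importance[name] = intra / max(inter, 1e-12)  # guard against division by zero
    total = sum(importance.values())
    return {name: imp / total for name, imp in importance.items()}


def aggregated_similarity(per_descriptor_sim, weights):
    """Weighted sum of one pair's per-descriptor similarities."""
    return sum(weights[name] * sim for name, sim in per_descriptor_sim.items())
```

A descriptor that cannot tell the two emotions apart gets a ratio near 1 and therefore a small share of the total weight.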

Weight Lightening of HUMS Housing for Small Aircraft by Using FEM and Taguchi Method (유한요소법 및 다구찌 기법에 의한 소형항공기용 HUMS 하우징 경량화)

  • Kim, Jin-Su;Yoon, Dae-Won;Park, Tae-Sang;Jeong, Jae-Eun;Oh, Jae-Eung
    • Transactions of the Korean Society for Noise and Vibration Engineering / v.23 no.12 / pp.1045-1055 / 2013
  • Safety inspection systems for domestic aircraft currently depend heavily on imports, and localization of HUMS for small aircraft is now required. In this study, design factors for reducing the weight of a HUMS for small aircraft were selected using the design tool Pro-Engineer and Abaqus. Nine models were built from an experimental plan based on the Taguchi method, and the model for weight reduction was selected through vibration and shock analyses under operating conditions with the experiment profile values. After the HUMS was fabricated, experiments with the same profile values as the analysis confirmed that the experimental values were similar to the analyzed values. As a result of the weight reduction, which is the purpose of the study, the electronic performance required for small aircraft is assured, and a design that reduces weight by 15% relative to the target weight was derived. The lightweight model was also verified to satisfy the maximum allowable displacement of the PCB (printed circuit board) and, accordingly, the electronic properties of the HUMS. In this study, product reliability was confirmed through ground experiments. If the reliability of the HUMS is verified through flight tests in the future, this work is expected to contribute substantially to the localization of aerospace electronic equipment.
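Nine Taguchi runs are the size of the standard L9(3^4) orthogonal array, and weight reduction calls for the "smaller-the-better" signal-to-noise ratio, S/N = -10·log10(mean(y²)). A sketch under those assumptions: the abstract does not state the array, the design factors, or their levels, so all of these are hypothetical, and the vibration and shock constraints are not modeled.

```python
import math

# Standard L9(3^4) orthogonal array: 9 runs, up to 4 factors at 3 levels each.
L9 = [
    (1, 1, 1, 1), (1, 2, 2, 2), (1, 3, 3, 3),
    (2, 1, 2, 3), (2, 2, 3, 1), (2, 3, 1, 2),
    (3, 1, 3, 2), (3, 2, 1, 3), (3, 3, 2, 1),
]


def sn_smaller_is_better(values):
    """Taguchi smaller-the-better S/N ratio in dB: -10 * log10(mean(y^2))."""
    return -10.0 * math.log10(sum(y * y for y in values) / len(values))


def main_effects(responses, n_factors=4):
    """responses[r]: measurements (e.g. housing weight) for L9 run r.
    Returns, for each factor, the mean S/N ratio at each of its levels."""
    sn = [sn_smaller_is_better(r) for r in responses]
    effects = []
    for f in range(n_factors):
        level_means = {}
        for level in (1, 2, 3):
            vals = [sn[r] for r in range(9) if L9[r][f] == level]
            level_means[level] = sum(vals) / len(vals)
        effects.append(level_means)
    return effects


def best_levels(effects):
    """Level with the highest mean S/N ratio for each factor."""
    return [max(e, key=e.get) for e in effects]
```

Because a larger S/N ratio means a smaller response, the level with the highest mean S/N ratio is the lighter choice for each factor.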

Evaluation of Classifiers Performance for Areal Features Matching (면 객체 매칭을 위한 판별모델의 성능 평가)

  • Kim, Jiyoung;Kim, Jung Ok;Yu, Kiyun;Huh, Yong
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography / v.31 no.1 / pp.49-55 / 2013
  • In this paper, we propose a suitable classifier for matching different spatial data sets by applying classifier performance evaluation methods from data mining and biometrics. To this end, we calculated the distances between a pair of candidate features for each matching criterion and normalized the distances using the Min-Max method and the Tanh (TH) method. We then defined classifiers in which shape similarity is obtained by fusing these similarities with the CRiteria Importance Through Intercriteria Correlation (CRITIC) method, the Matcher Weighting method, or the Simple Sum (SS) method. Evaluating classifier performance with the Precision-Recall (PR) curve and the area under the PR curve (AUC-PR), we found that the classifier using TH normalization and the SS method achieved the highest AUC-PR, 0.893. Therefore, for matching different spatial data sets, we consider it appropriate to use a classifier that normalizes the distances of the matching criteria with the TH method and calculates shape similarity with the SS method.
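The best-performing pipeline (TH normalization, Simple Sum fusion, AUC-PR evaluation) can be sketched as below. The assumptions are stated here and in the comments. The tanh-estimator normally uses robust Hampel estimates of location and scale, and this sketch substitutes the plain mean and standard deviation. Similarity is taken as 1 minus the normalized distance. AUC-PR is estimated by average precision, which may differ from how the paper integrated the curve.

```python
import math
from statistics import mean, pstdev


def min_max(dists):
    """Min-Max normalization of distances to [0, 1]."""
    lo, hi = min(dists), max(dists)
    return [(d - lo) / (hi - lo) if hi > lo else 0.0 for d in dists]


def tanh_norm(dists):
    """Simplified tanh-estimator: mean/std stand in for Hampel estimates."""
    mu, sigma = mean(dists), pstdev(dists)
    return [0.5 * (math.tanh(0.01 * (d - mu) / sigma) + 1.0) if sigma else 0.5
            for d in dists]


def simple_sum_similarity(normalized_per_criterion):
    """Simple Sum fusion: add (1 - normalized distance) over all criteria.
    Each inner list holds one criterion's values for every candidate pair."""
    n = len(normalized_per_criterion[0])
    return [sum(1.0 - crit[k] for crit in normalized_per_criterion) for k in range(n)]


def average_precision(scores, labels):
    """Average precision (an AUC-PR estimate); labels: 1 = true match."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp, ap, n_pos = 0, 0.0, sum(labels)
    for rank, i in enumerate(order, start=1):
        if labels[i]:
            tp += 1
            ap += tp / rank
    return ap / n_pos if n_pos else 0.0
```

Swapping `tanh_norm` for `min_max`, or Simple Sum for a weighted fusion, gives the other classifier variants that the study compares.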

In silico characterisation, homology modelling and structure-based functional annotation of blunt snout bream (Megalobrama amblycephala) Hsp70 and Hsc70 proteins

  • Tran, Ngoc Tuan;Jakovlic, Ivan;Wang, Wei-Min
    • Journal of Animal Science and Technology / v.57 no.12 / pp.44.1-44.9 / 2015
  • Background: Heat shock proteins play an important role in protection from stress stimuli and metabolic insults in almost all organisms. Methods: In this study, computational tools were used to analyse in depth the physicochemical characteristics of the blunt snout bream (Ma-) Hsp70 and Hsc70 proteins and, using homology modelling, to reliably predict their tertiary structure. The derived three-dimensional models were then used to predict the function of the proteins. Results: Previously published predictions regarding protein length, molecular weight, theoretical isoelectric point, and total number of positive and negative residues were corroborated. The new findings include the extinction coefficient (33725/33350 and 35090/34840 for Ma-Hsp70/Ma-Hsc70, respectively), instability index (33.68/35.56, both stable), aliphatic index (83.44/80.23, both very stable), half-life estimates (both relatively stable), grand average of hydropathicity (-0.431/-0.473, both hydrophilic), and amino acid composition (alanine, lysine, and glycine were the most abundant in Ma-Hsp70, and glycine, lysine, and aspartic acid in Ma-Hsc70; there were no disulphide bonds; the N-terminal residue of both proteins is methionine). Homology modelling was performed with the SWISS-MODEL program, and the proposed model was evaluated as highly reliable based on PROCHECK's Ramachandran plot, ERRAT, PROVE, Verify 3D, ProQ, and ProSA analyses. Conclusions: The research revealed a high structural similarity to Hsp70 and Hsc70 proteins from several taxonomically distant animal species, corroborating a remarkably high level of evolutionary conservation among the members of this protein family. Functional annotation based on structural similarity provides reliable additional indirect evidence for a high level of functional conservation of these two genes/proteins in blunt snout bream, but it is not sensitive enough to distinguish the two isoforms functionally.
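Two of the reported parameters have simple closed forms, sketched below as ProtParam computes them. GRAVY is the mean Kyte-Doolittle hydropathy per residue, where negative values indicate a hydrophilic protein. The 280 nm extinction coefficient counts 5500 per Trp, 1490 per Tyr, and 125 per cystine; ProtParam reports one value assuming all Cys pairs form cystines and one assuming all Cys are reduced. The reported pairs differ by multiples of 125, which is consistent with that convention. The Ma-Hsp70/Hsc70 sequences are not reproduced here, so inputs are illustrative.

```python
# Kyte-Doolittle hydropathy values, the scale used for GRAVY.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq):
    """Grand average of hydropathicity: mean Kyte-Doolittle value per residue."""
    seq = seq.upper()
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def extinction_coefficients(seq):
    """Molar extinction coefficients at 280 nm (M^-1 cm^-1), ProtParam-style:
    5500 per Trp, 1490 per Tyr, 125 per cystine (a pair of Cys).
    Returns (all Cys paired as cystines, all Cys reduced)."""
    seq = seq.upper()
    reduced = 5500 * seq.count("W") + 1490 * seq.count("Y")
    return reduced + 125 * (seq.count("C") // 2), reduced
```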

Molecular Characterization of a Defensin-like Peptide from Larvae of a Beetle, Protaetia brevitarsis

  • Hwang, Jae-Sam;Kang, Bo-Ram;Kim, Seong-Ryul;Yun, Eun-Young;Park, Kwan-Ho;Jeon, Jae-Pil;Nam, Sung-Hee;Suh, Hwa-Jin;Hong, Mee-Yeon;Kim, Ik-Soo
    • International Journal of Industrial Entomology and Biomaterials / v.17 no.1 / pp.131-135 / 2008
  • A cDNA encoding a defensin-like peptide (Protaetiamycine) was cloned from larvae of the beetle Protaetia brevitarsis. The cDNA encodes a deduced propeptide of 79 amino acid residues with a predicted molecular weight of 8.4 kDa and a pI of 8.24. The overall amino acid sequence of this protein shows 39% similarity to Rhodnius prolixus defensin, 43% similarity to Acalolepta luxuriosa defensin, and 72% similarity to Oryctes rhinoceros defensin, suggesting that this gene encodes an insect defensin. To apply the antibacterial peptide to the development of therapeutic agents, a 12-mer peptide amidated at its C-terminus, ACAAHCLAIGRG-$NH_2$ (Ala55-Lys66-$NH_2$, 12Pbn), was synthesized. This peptide showed some antifungal activity against Candida albicans. To increase antifungal activity, six 9-mer peptides were synthesized by modifying the amino acid sequence of the 12Pbn fragment. Among these peptides, 9Pbm3-9Pbm6 exhibited strong activity compared with cecropin B and melittin.

Classification Protein Subcellular Locations Using n-Gram Features (단백질 서열의 n-Gram 자질을 이용한 세포내 위치 예측)

  • Kim, Jinsuk
    • Proceedings of the Korea Contents Association Conference / 2007.11a / pp.12-16 / 2007
  • The function of a protein is closely correlated with its subcellular location(s). Given a protein sequence, determining its subcellular location is therefore a vitally important problem. We have developed a new method for predicting protein subcellular location(s) based on n-gram feature extraction and the k-nearest neighbor (kNN) classification algorithm. It assigns a protein sequence to one or more subcellular compartments based on the locations of the top k sequences with the highest similarity weights against the input sequence. The similarity weight is a similarity measure determined by comparing the n-gram features of two sequences. Our method currently extracts penta-grams as features of protein sequences, computes scores for the potential localization site(s) using the kNN algorithm, and finally presents the locations with their associated scores. We constructed a large-scale data set of protein sequences with known subcellular locations from the SWISS-PROT database; it contains 51,885 entries with one or more known subcellular locations. Our method shows a high prediction precision of about 93% on this data set and, compared with another method, also showed improved prediction on a test collection used in previous work.
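The pipeline described above (penta-gram extraction, top-k neighbors by similarity weight, per-location scores) can be sketched in a few lines. The exact similarity weight is not defined in the abstract, so Jaccard overlap of n-gram sets is used here as a hypothetical stand-in. Summing neighbor weights per location is likewise an assumed scoring rule, and the toy sequences are illustrative.

```python
from collections import Counter


def ngrams(seq, n=5):
    """Set of overlapping n-grams (penta-grams by default) of a sequence."""
    return {seq[i:i + n] for i in range(len(seq) - n + 1)}


def similarity_weight(a, b):
    """Hypothetical similarity weight: Jaccard overlap of two n-gram sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def predict_locations(query, database, k=3, n=5):
    """database: list of (sequence, set_of_locations) pairs with known
    locations. Scores each location by summing the similarity weights of the
    top-k most similar database sequences annotated with it."""
    q = ngrams(query, n)
    ranked = sorted(
        ((similarity_weight(q, ngrams(seq, n)), locs) for seq, locs in database),
        key=lambda pair: pair[0],
        reverse=True,
    )
    scores = Counter()
    for weight, locs in ranked[:k]:
        for loc in locs:
            scores[loc] += weight
    return scores.most_common()  # [(location, score), ...], best first
```

Because a neighbor can carry several locations, the output naturally supports multi-location predictions; a real system would also need a score cutoff to decide how many locations to report.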

A Design of HPPS(Hybrid Preference Prediction System) for Customer-Tailored Service (고객 맞춤 서비스를 위한 HPPS(Hybrid Preference Prediction System) 설계)

  • Jeong, Eun-Hee;Lee, Byung-Kwan
    • Journal of Korea Multimedia Society / v.14 no.11 / pp.1467-1477 / 2011
  • This paper proposes the design of HPPS (Hybrid Preference Prediction System), which analyzes user profiles and the similarity among users to predict preferences precisely for customer-tailored service. Unlike the existing NBCFA (Neighborhood Based Collaborative Filtering Algorithm), HPPS is designed according to the following rules. First, if a neighbor has no rating for a commodity in the preference prediction formula, the formula uses the commodity's average rating instead. Second, the formula reflects a weighting value obtained by analyzing the user's characteristics. Finally, when the nearest neighbors are selected, the similarity, the commodity rating, and the rating frequency are all considered. As a result, the first and second preference prediction formulas improved HPPS precision by 97.24%, and the nearest-neighbor selection method improved it by 75%, compared with the existing NBCFA.
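The first HPPS rule can be illustrated on top of a standard neighborhood-based prediction: the user's mean rating plus the similarity-weighted, mean-centred ratings of the neighbors. This is a sketch under assumptions. Pearson correlation over co-rated items is assumed as the similarity, and a neighbor's missing rating is replaced by the item's average rating, which is one reading of the rule. The user-characteristic weighting (second rule) and the neighbor-selection criteria (third rule) are not reproduced.

```python
import math


def pearson(ru, rv):
    """Pearson correlation of two users over the items both have rated."""
    common = ru.keys() & rv.keys()
    if len(common) < 2:
        return 0.0
    mu = sum(ru[i] for i in common) / len(common)
    mv = sum(rv[i] for i in common) / len(common)
    num = sum((ru[i] - mu) * (rv[i] - mv) for i in common)
    den = math.sqrt(sum((ru[i] - mu) ** 2 for i in common)
                    * sum((rv[i] - mv) ** 2 for i in common))
    return num / den if den else 0.0


def predict(ratings, user, item, neighbors):
    """ratings: user -> {item: rating}. Neighborhood-based prediction; when a
    neighbor has not rated the item, the item's average rating is used
    instead (an assumed reading of HPPS's first rule)."""
    item_ratings = [r[item] for r in ratings.values() if item in r]
    item_avg = sum(item_ratings) / len(item_ratings) if item_ratings else None
    mean_u = sum(ratings[user].values()) / len(ratings[user])
    num = den = 0.0
    for v in neighbors:
        r_vi = ratings[v].get(item, item_avg)
        if r_vi is None:  # nobody has rated the item yet
            continue
        sim = pearson(ratings[user], ratings[v])
        mean_v = sum(ratings[v].values()) / len(ratings[v])
        num += sim * (r_vi - mean_v)
        den += abs(sim)
    return mean_u + num / den if den else mean_u
```

Without the fallback, a neighbor who has not rated the item would simply be skipped; with it, every selected neighbor contributes. Predictions can fall outside the rating scale, so a deployed system would typically clip them.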