http://dx.doi.org/10.23087/jkicsp.2022.23.3.005

Similarity Measurement Between Titles and Abstracts Using Bijection Mapping and Phi-Correlation Coefficient  

John N. Mlyahilu (Division of Computer Engineering and AI, Pukyong National University)
Jong-Nam Kim (Division of Computer Engineering and AI, Pukyong National University)
Publication Information
Journal of the Institute of Convergence Signal Processing, v.23, no.3, 2022, pp. 143-149
Abstract
This paper presents a quantitative measure of the relationship between a research title and its abstract, using articles from various journals indexed in the Korean Citation Index (KCI) database. We propose a machine learning-based similarity metric that does not assume the data are normally distributed and that accounts for the imbalanced-dataset and zero-variance problems that affect most rule-based algorithms. The advantage of this algorithm is that it removes the limitations of the Pearson correlation coefficient (r) and additionally handles the imbalanced-dataset problem. A total of 107 journal articles collected from the database were used to build a corpus recording the authors, year of publication, title, and abstract of each article. In the experiments, the proposed algorithm achieved higher correlation coefficient values than cosine similarity, the Euclidean measure, and the Pearson correlation coefficient, reaching a maximum correlation of 1, whereas the other measures returned not-a-number (NaN) values in some experiments. From these results, we found that an effective title must have a high correlation coefficient with its abstract.
Keywords
Text mining; Machine learning; Zero-variance; Document-term-matrix; Corpora;
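
The zero-variance problem mentioned in the abstract can be illustrated with a minimal sketch. It is not the proposed algorithm: the abstract does not describe the bijection mapping or how the metric avoids undefined values, so neither is reproduced here. The sketch assumes a simple whitespace tokenizer and presence/absence vectors over the combined vocabulary of the two texts, and uses the textbook phi coefficient from a 2x2 contingency table. All function names are hypothetical.

```python
import math

def binary_vectors(title, abstract):
    """Presence/absence vectors over the combined vocabulary (assumed encoding)."""
    t = set(title.lower().split())
    a = set(abstract.lower().split())
    vocab = sorted(t | a)
    return [int(w in t) for w in vocab], [int(w in a) for w in vocab]

def pearson(x, y):
    """Pearson r; returns NaN when either vector has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    denom = math.sqrt(vx * vy)
    return cov / denom if denom > 0 else float("nan")

def phi(x, y):
    """Textbook phi coefficient for two binary vectors (2x2 contingency table)."""
    n11 = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    n10 = sum(1 for a, b in zip(x, y) if a == 1 and b == 0)
    n01 = sum(1 for a, b in zip(x, y) if a == 0 and b == 1)
    n00 = sum(1 for a, b in zip(x, y) if a == 0 and b == 0)
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    return (n11 * n00 - n10 * n01) / denom if denom > 0 else float("nan")

# Every title word also occurs in the abstract, so the abstract vector is all
# ones: its variance is zero and the plain formulas are undefined (NaN).
tv, av = binary_vectors("text similarity", "we measure text similarity")
zero_variance_r = pearson(tv, av)
zero_variance_phi = phi(tv, av)

# On binary vectors with non-zero variance, phi and Pearson r coincide.
x, y = [1, 1, 0, 0], [1, 0, 0, 0]
r_xy, phi_xy = pearson(x, y), phi(x, y)
```

With this encoding, a title whose words all appear in the abstract produces a constant abstract vector, which is one way the not-a-number values reported for the baseline measures can arise. The last two lines also show that the unmodified phi coefficient equals Pearson's r on binary data, so any robustness to zero variance has to come from how the method constructs or maps its inputs.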