Similarity Measurement Between Titles and Abstracts Using Bijection Mapping and Phi-Correlation Coefficient

  • John N. Mlyahilu (Division of Computer Engineering and AI, Pukyong National University)
  • Jong-Nam Kim (Division of Computer Engineering and AI, Pukyong National University)
  • Received : 2022.09.13
  • Accepted : 2022.09.30
  • Published : 2022.09.30

Abstract

This paper presents a quantitative measure of the relationship between a research title and its abstract, using journal articles indexed in the Korean Citation Index (KCI) database and published in various journals. We propose a machine learning-based similarity metric that does not assume normality of the dataset and that addresses the imbalanced dataset problem and the zero-variance problem, which affect most rule-based algorithms. The advantage of this algorithm is that it eliminates the limitations of the Pearson correlation coefficient (r) and also handles imbalanced datasets. A total of 107 journal articles collected from the database were used to build a corpus containing the authors, year of publication, title, and abstract of each article. In the experiments, the proposed algorithm achieved higher correlation coefficient values than the cosine similarity, Euclidean, and Pearson correlation coefficients, reaching a maximum correlation of 1, whereas the other measures returned not-a-number (NaN) values in some experiments. From these results, we found that an effective title must have a high correlation coefficient with its abstract.
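
To make the quantities in the abstract concrete, the following is a minimal Python sketch of the standard phi coefficient computed from a 2x2 contingency table of two binary word-presence vectors. It is not the authors' bijection mapping method, which the abstract does not specify. The vocabulary, the example sentences, and the helper names `binary_vector` and `phi_coefficient` are all hypothetical. For binary data the textbook phi coefficient equals Pearson's r, so it is likewise undefined when one vector has zero variance; the sketch returns NaN in that case to show the problem the abstract refers to.

```python
import math

def binary_vector(text, vocabulary):
    """Mark each vocabulary word as present (1) or absent (0) in the text."""
    words = set(text.lower().split())
    return [1 if w in words else 0 for w in vocabulary]

def phi_coefficient(x, y):
    """Standard phi coefficient of two equal-length binary vectors."""
    pairs = list(zip(x, y))
    n11 = sum(1 for a, b in pairs if a == 1 and b == 1)
    n10 = sum(1 for a, b in pairs if a == 1 and b == 0)
    n01 = sum(1 for a, b in pairs if a == 0 and b == 1)
    n00 = sum(1 for a, b in pairs if a == 0 and b == 0)
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    if denom == 0:
        # One vector is constant (zero variance): phi is undefined.
        return float("nan")
    return (n11 * n00 - n10 * n01) / denom

# Hypothetical vocabulary and texts, chosen only for illustration.
vocabulary = ["similarity", "title", "abstract", "correlation", "image", "network"]
title = "similarity of title and abstract"
abstract = "we measure similarity between a title and its abstract using correlation"

title_vec = binary_vector(title, vocabulary)
abstract_vec = binary_vector(abstract, vocabulary)
phi = phi_coefficient(title_vec, abstract_vec)
print(round(phi, 4))

# Zero-variance case: the first vector is all ones, so the result is NaN.
print(phi_coefficient([1, 1, 1, 1], [1, 0, 1, 0]))
```

Identical binary vectors that contain both values give a phi of 1, which matches the maximum correlation mentioned in the abstract.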

Acknowledgement

This work was supported by the National Research Foundation of Korea (NRF) with funding from the Korean government (Ministry of Education) in 2017 (NRF-2017S1A6A3A01079869).
