• Title/Summary/Keyword: entity normalization


Named entity normalization for traditional herbal formula mentions

  • Ho Jang
    • Journal of the Korea Society of Computer and Information
    • /
    • v.29 no.10
    • /
    • pp.105-111
    • /
    • 2024
  • In this paper, we propose methods for the named entity normalization of traditional herbal formulas found in medical texts. Specifically, we developed methodologies to determine whether mentions, such as full names of herbal formulas and their abbreviations, refer to the same concept. Two different approaches were attempted. First, we built a supervised classification model that uses BERT-based contextual vectors and character-similarity features of herbal formula mentions in medical texts to determine whether two mentions are identical. Second, we applied a prompt-based querying method using GPT-4o mini and GPT-4o to perform the same task. Both methods achieved over 0.9 in precision, recall, and F1-score, with the GPT-4o-based approach demonstrating the highest precision and F1-score. These results demonstrate the effectiveness of machine learning-based approaches for named entity normalization in traditional medicine texts, with the GPT-4o-based method showing superior performance, and suggest its potential as a valuable foundation for developing intelligent information extraction systems in the traditional medicine domain.
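
As a rough illustration of the first approach in this entry, the sketch below combines a BERT contextual vector for a mention pair with a character-similarity feature and feeds both to a supervised classifier. The model name (`bert-base-multilingual-cased`), the `difflib` similarity measure, and the toy mention pairs are assumptions for illustration, not the authors' exact configuration.

```python
# Sketch: classify whether two herbal-formula mentions refer to the same
# concept, combining a BERT [CLS] vector of the mention pair with a
# character-similarity feature. Model and feature choices are illustrative.
import difflib
import numpy as np
import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.linear_model import LogisticRegression

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
encoder = AutoModel.from_pretrained("bert-base-multilingual-cased")

def pair_features(mention_a: str, mention_b: str) -> np.ndarray:
    # Contextual vector: encode the two mentions as a sentence pair and
    # take the [CLS] embedding.
    inputs = tokenizer(mention_a, mention_b, return_tensors="pt", truncation=True)
    with torch.no_grad():
        cls_vec = encoder(**inputs).last_hidden_state[0, 0].numpy()
    # Character similarity between the surface forms
    # (e.g., full name vs. abbreviation).
    char_sim = difflib.SequenceMatcher(None, mention_a, mention_b).ratio()
    return np.concatenate([cls_vec, [char_sim]])

# Toy training pairs: (mention_a, mention_b, same_concept?); invented data.
pairs = [("ojeoksan", "ojeok-san", 1), ("ojeoksan", "pyeongwisan", 0)]
X = np.stack([pair_features(a, b) for a, b, _ in pairs])
y = np.array([label for _, _, label in pairs])
clf = LogisticRegression(max_iter=1000).fit(X, y)
print(clf.predict(X))
```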

Improving the CONTES method for normalizing biomedical text entities with concepts from an ontology with (almost) no training data at BLAH5

  • Ferre, Arnaud;Ba, Mouhamadou;Bossy, Robert
    • Genomics & Informatics
    • /
    • v.17 no.2
    • /
    • pp.20.1-20.5
    • /
    • 2019
  • Entity normalization, or entity linking in the general domain, is an information extraction task that aims to annotate/bind multiple words/expressions in raw text with semantic references, such as concepts of an ontology. An ontology consists minimally of a formally organized vocabulary or hierarchy of terms, which captures the knowledge of a domain. Presently, machine-learning methods, often coupled with distributional representations, achieve good performance. However, they require large training datasets, which are not always available, especially for tasks in specialized domains. CONTES (CONcept-TErm System) is a supervised method that addresses entity normalization with ontology concepts using small training datasets. CONTES has some limitations: it does not scale well to very large ontologies, it tends to overgeneralize predictions, and it lacks valid representations for out-of-vocabulary words. Here, we propose to assess different methods for reducing the dimensionality of the ontology representation. We also propose to calibrate parameters in order to make the predictions more accurate, and to address the problem of out-of-vocabulary words with a specific method.
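
CONTES learns a supervised linear mapping from a term's embedding to a concept vector space built from the ontology hierarchy, and this paper assesses, among other things, dimensionality reduction over that concept space. The sketch below is a minimal reconstruction of that idea with toy vectors and an invented four-concept ontology; it is not the CONTES codebase.

```python
# Sketch of a CONTES-style setup: learn a linear map from term embeddings
# to an ontology-concept space, then reduce the concept-space
# dimensionality with truncated SVD. All vectors and concepts are toy
# placeholders, not the actual CONTES data or code.
import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.linear_model import Ridge
from sklearn.metrics.pairwise import cosine_similarity

rng = np.random.default_rng(0)

# Concept vectors: one row per ontology concept; 1s mark the concept and
# its ancestors, so the hierarchy is encoded in the representation.
concepts = ["root", "plant", "herb", "flower"]
ancestors = {"root": {"root"}, "plant": {"root", "plant"},
             "herb": {"root", "plant", "herb"},
             "flower": {"root", "plant", "flower"}}
C = np.array([[1.0 if c in ancestors[row] else 0.0 for c in concepts]
              for row in concepts])

# Dimensionality reduction over the concept space (one of the assessed ideas).
svd = TruncatedSVD(n_components=2, random_state=0)
C_red = svd.fit_transform(C)

# Toy term embeddings for a small training set of (term -> concept) pairs.
E = rng.normal(size=(3, 10))          # embeddings of "rose", "basil", "tree"
targets = C_red[[3, 2, 1]]            # gold concepts: flower, herb, plant

# Supervised linear projection from embedding space to (reduced) concept space.
proj = Ridge(alpha=1.0).fit(E, targets)

# Normalize a new term: project its embedding, pick the nearest concept.
new_term = rng.normal(size=(1, 10))
scores = cosine_similarity(proj.predict(new_term), C_red)
print(concepts[int(scores.argmax())])
```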

Negative Side Effects of Denormalization-Oriented Data Modeling in Enterprise-Wide Database Design (기업 전사 자료 설계에서 역정규화 중심 데이터 모델링의 부작용)

  • Rhee, Hae-Kyung
    • Journal of the Institute of Electronics Engineers of Korea CI
    • /
    • v.43 no.6 s.312
    • /
    • pp.17-25
    • /
    • 2006
  • As information systems to be computerized have grown significantly in scale, data modeling issues are once again considered crucial, as they were in the early 1980s, under the terms data governance, data architecture, and data quality. Unfortunately, resorting to heuristics-based field approaches with little firm theoretical foundation regarding the criteria of data design quite often leads to major failures in the efficacy of data modeling. In this paper, we have compared a normalization-centric data modeling approach, well known in the literature as the Non-Stop (NS) Data Modeling methodology, with Information Engineering (IE), in which de-normalization is on many occasions supported and even recommended as a mandatory part of the modeling process. Quantitative analyses have revealed that the NS methodology outperforms the IE methodology in terms of efficiency indices such as the adequacy of entity judgement, the degree to which data circulation paths exist (which confirms the balance of the data design), and the ratio of unnecessary data attribute replication.
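
To make the "unnecessary data attribute replication" index concrete, the sketch below contrasts a denormalized table, which repeats a customer attribute on every row, with a normalized design. The table names, data, and ratio formula are invented for illustration and are not the paper's exact metrics.

```python
# Sketch of the replication problem the paper measures: a denormalized
# order table repeats customer attributes on every row, while the
# normalized design stores them once. Names and data are invented.
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Denormalized: customer_name is replicated per order (update anomalies).
cur.execute("CREATE TABLE orders_denorm (order_id INTEGER, customer_id INTEGER,"
            " customer_name TEXT, amount REAL)")
cur.executemany("INSERT INTO orders_denorm VALUES (?, ?, ?, ?)",
                [(1, 10, "Kim", 5.0), (2, 10, "Kim", 7.5), (3, 11, "Lee", 2.0)])

# Normalized: the replicated attribute moves to its own entity.
cur.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY,"
            " customer_name TEXT)")
cur.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY,"
            " customer_id INTEGER REFERENCES customers, amount REAL)")
cur.executemany("INSERT INTO customers VALUES (?, ?)", [(10, "Kim"), (11, "Lee")])
cur.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                [(1, 10, 5.0), (2, 10, 7.5), (3, 11, 2.0)])

# A crude replication ratio in the spirit of the paper's efficiency
# indices: redundant name cells / total name cells.
total = cur.execute("SELECT COUNT(customer_name) FROM orders_denorm").fetchone()[0]
distinct = cur.execute("SELECT COUNT(DISTINCT customer_name)"
                       " FROM orders_denorm").fetchone()[0]
print(f"replication ratio: {(total - distinct) / total:.2f}")  # 1 of 3 cells redundant -> 0.33
```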

The e-Business Component Construction based on Distributed Component Specification (분산 컴포넌트 명세를 통한 e-비즈니스 컴포넌트 구축)

  • Kim, Haeng-Gon;Choe, Ha-Jeong;Han, Eun-Ju
    • The KIPS Transactions:PartD
    • /
    • v.8D no.6
    • /
    • pp.705-714
    • /
    • 2001
  • Today's computing systems have expanded business trade and distributed business processes over the Internet. More and more systems are developed from components, with an emphasis on reusability, independence, and portability. Component-based development (CBD) focuses on these advanced concepts rather than on the passive manipulation of source code in class libraries. Component construction is the primary activity in CBD; however, it can incur additional cost for reconstructing new components under a CBD model. It is also difficult to serve component information rapidly and accurately when no normalization model has been established, and frequent user logins on the Web cause overload. Many difficult issues and aspects of component-based development have to be investigated to develop good component-based products, and there is no established normalization model that guarantees a proper treatment of components. This paper elaborates on some of those aspects of web applications so that user requirements can be met accurately and rapidly. The distributed components in this paper are used at the smallest practical granularity on the network, and we suggest a network-addressable interface based on the business domain. We also discuss internal and external specifications for grasping the internal and external relations of components derived from the analyzed user requirements. The specifications are stored behind Servlets after dividing the information between session and entity beans as EJBs (Enterprise JavaBeans), which are reusable units sized to the business domain. These reusable units are obtained as business components through queries. As a major contribution, we propose a system model for registering, auto-arranging, searching, testing, and downloading components, which covers component reusability and component customization.
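
As a loose illustration of the proposed registry for registering and searching components by business domain, the sketch below models component specifications in plain Python. All class names and fields are invented; the paper's actual design stores specifications as EJB session and entity beans behind Servlets rather than in memory.

```python
# Sketch of the registration/search idea only: components carry a
# network-addressable interface and business-domain metadata, and are
# found by querying the domain. Names and fields are invented.
from dataclasses import dataclass

@dataclass
class ComponentSpec:
    name: str
    domain: str        # business domain the component serves
    endpoint: str      # network-addressable interface (URL-like)
    kind: str          # "session" (workflow) or "entity" (persistent data)

class ComponentRegistry:
    def __init__(self):
        self._specs: list[ComponentSpec] = []

    def register(self, spec: ComponentSpec) -> None:
        self._specs.append(spec)

    def search(self, domain: str) -> list[ComponentSpec]:
        # Query reusable units by business domain.
        return [s for s in self._specs if s.domain == domain]

registry = ComponentRegistry()
registry.register(ComponentSpec("OrderEntry", "sales", "rmi://host/OrderEntry", "session"))
registry.register(ComponentSpec("Customer", "sales", "rmi://host/Customer", "entity"))
registry.register(ComponentSpec("Ledger", "accounting", "rmi://host/Ledger", "entity"))
print([s.name for s in registry.search("sales")])  # ['OrderEntry', 'Customer']
```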

Directions for Developing Database Schema of Records in Archives Management Systems (영구기록물관리를 위한 기록물 데이터베이스 스키마 개발 방향)

  • Yim, Jin-Hee;Lee, Dae-Wook;Kim, Eun-Sil;Kim, Ik-Han
    • The Korean Journal of Archival Studies
    • /
    • no.34
    • /
    • pp.57-105
    • /
    • 2012
  • The CAMS (Central Archives Management System) of the NAK (National Archives of Korea) is an important system that has received and managed large amounts of electronic records annually since 2015. From the point of view of database design, this paper analyzes the database schema of the CAMS and discusses directions for its overall improvement. First, this research analyzes the tables for records and folders in the CAMS database, which are the core tables for electronic records management. As a result, we find it difficult to trust the quality of the records in the CAMS, because the two core tables are not normalized at all and have many columns whose roles are unknown. Second, this study suggests the following directions for normalizing the records and folders tables in the CAMS database: redistributing columns into proper tables to reduce duplication; separating the columns about the classification scheme into separate tables; separating the columns about record types and sorts into separate tables; and separating metadata related to acquisition, takeover, and preservation into separate tables. Third, this paper suggests considerations for designing and managing the database schema in each phase of archival management. In the ingest phase, the system should be able to process a large amount of records as batch jobs in time each year. In the preservation phase, the system should keep management histories in the CAMS as audit trails, including the reclassification, revaluation, and preservation activities related to the records. In the access phase, the descriptive metadata sets for access should be selected and confirmed in various ways. Lastly, this research also presents a prototype conceptual database schema for the CAMS that fulfills the metadata standards for records.
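
The second suggested step, separating classification-scheme columns into their own table, can be sketched as follows. Every table and column name here is invented, since the abstract does not disclose the real CAMS schema.

```python
# Sketch of one suggested normalization step: pull the classification-
# scheme columns out of the flat records table into a separate table
# referenced by key. All names are invented placeholders.
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Before: classification columns repeated on every record row.
cur.execute("""CREATE TABLE records_flat (
    record_id INTEGER PRIMARY KEY,
    title TEXT,
    class_code TEXT, class_name TEXT, retention_period TEXT)""")

# After: the classification scheme becomes its own entity ...
cur.execute("""CREATE TABLE classification (
    class_code TEXT PRIMARY KEY,
    class_name TEXT,
    retention_period TEXT)""")
# ... and records reference it, so a scheme change is one update, not many.
cur.execute("""CREATE TABLE records (
    record_id INTEGER PRIMARY KEY,
    title TEXT,
    class_code TEXT REFERENCES classification(class_code))""")

cur.execute("INSERT INTO classification VALUES ('A-01', 'General affairs', '30y')")
cur.execute("INSERT INTO records VALUES (1, 'Annual report 2015', 'A-01')")
print(cur.execute("""SELECT r.title, c.class_name
                     FROM records r JOIN classification c USING (class_code)""").fetchall())
```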