• Title/Summary/Keyword: NER


Integrated Char-Word Embedding on Chinese NER using Transformer (트랜스포머를 이용한 중국어 NER 관련 문자와 단어 통합 임배딩)

  • Jin, ChunGuang;Joe, Inwhee
    • Proceedings of the Korea Information Processing Society Conference / 2021.05a / pp.415-417 / 2021
  • Because words in Chinese sentences are written continuously, without delimiters, and the vocabulary is very large, Chinese NER (Named Entity Recognition) has usually been based on character representations. In recent years, many studies have reconsidered how to integrate word information into Chinese NER models. However, traditional sequence models have complex structures and slow inference, and they require additional dictionary information, which makes them difficult to deploy in industry. The approach in this paper achieves state-of-the-art results and is parallelizable; it integrates char-word embeddings so that the model learns word information. The proposed model is easy to implement, outperforms traditional models in speed and efficiency, and improves the F1-score on two datasets.

Chinese-clinical-record Named Entity Recognition using IDCNN-BiLSTM-Highway Network

  • Tinglong Tang;Yunqiao Guo;Qixin Li;Mate Zhou;Wei Huang;Yirong Wu
    • KSII Transactions on Internet and Information Systems (TIIS) / v.17 no.7 / pp.1759-1772 / 2023
  • Chinese named entity recognition (NER) is a challenging task that seeks to find, recognize and classify various types of information elements in unstructured text. Because Chinese text has no natural word boundaries like the spaces in English text, Chinese named entity identification is much more difficult. At present, most deep learning-based NER models are built on a bidirectional long short-term memory network (BiLSTM), yet their performance still has room for improvement. To further improve performance in Chinese NER tasks, we propose a new NER model, IDCNN-BiLSTM-Highway, which combines the BiLSTM, the iterated dilated convolutional neural network (IDCNN) and the highway network. In our model, IDCNN is used to achieve multiscale context aggregation from a long sequence of words. The highway network is used to connect different network layers effectively, allowing information to pass through the layers smoothly without attenuation. Finally, the globally optimal tag sequence is obtained by introducing a conditional random field (CRF). The experimental results show that, compared with other popular deep learning-based NER models, our model achieves superior performance on two Chinese NER data sets, Resume and Yidu-S4k, with F1-scores of 94.98 and 77.59, respectively.
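The abstract does not give the layer equations, so the following is only a minimal sketch of the standard highway formulation, y = T * H + (1 - T) * x, where T is a sigmoid gate. The function names `sigmoid` and `highway` are illustrative, and the transformed features `h` and gate logits `t_logits` are passed in precomputed here; in a real layer both would come from learned transforms of `x`.

```python
import math

def sigmoid(z):
    """Logistic function mapping a real number to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def highway(x, h, t_logits):
    """Mix transformed features h with the untouched input x, element by element.

    gate = sigmoid(t_logit); output = gate * h + (1 - gate) * x.
    A gate near 0 carries the input through unchanged, which is how
    information can pass across layers without attenuation.
    """
    out = []
    for xi, hi, ti in zip(x, h, t_logits):
        gate = sigmoid(ti)
        out.append(gate * hi + (1.0 - gate) * xi)
    return out

# With zero logits the gate is exactly 0.5, so the output is the midpoint.
print(highway([1.0, 2.0], [5.0, 6.0], [0.0, 0.0]))  # [3.0, 4.0]
```

Strongly negative gate logits make the layer behave like an identity connection, while strongly positive ones make it behave like an ordinary transformed layer.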

Encoding Dictionary Feature for Deep Learning-based Named Entity Recognition

  • Ronran, Chirawan;Unankard, Sayan;Lee, Seungwoo
    • International Journal of Contents / v.17 no.4 / pp.1-15 / 2021
  • Named entity recognition (NER) is a crucial task for NLP, which aims to extract information from texts. To build NER systems, deep learning (DL) models are learned with dictionary features by mapping each word in the dataset to dictionary features and generating a unique index. However, this technique might generate noisy labels, which pose significant challenges for the NER task. In this paper, we proposed DL-dictionary features, and evaluated them on two datasets, including the OntoNotes 5.0 dataset and our new infectious disease outbreak dataset named GFID. We used (1) a Bidirectional Long Short-Term Memory (BiLSTM) character and (2) pre-trained embedding to concatenate with (3) our proposed features, named the Convolutional Neural Network (CNN), BiLSTM, and self-attention dictionaries, respectively. The combined features (1-3) were fed through BiLSTM - Conditional Random Field (CRF) to predict named entity classes as outputs. We compared these outputs with other predictions of the BiLSTM character, pre-trained embedding, and dictionary features from previous research, which used the exact matching and partial matching dictionary technique. The findings showed that the model employing our dictionary features outperformed other models that used existing dictionary features. We also computed the F1 score with the GFID dataset to apply this technique to extract medical or healthcare information.
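As a rough illustration of the exact-matching dictionary technique mentioned above, and of how it can produce noisy labels, here is a minimal sketch. It is not the paper's method: the function name `exact_match_features`, the toy dictionary, the longest-match-first policy, and the `max_len` limit are all assumptions made for the example.

```python
def exact_match_features(tokens, dictionary, max_len=3):
    """Label each token with the type of the longest dictionary phrase covering it.

    Tokens not covered by any dictionary phrase get the label 'O'.
    Matching is case-insensitive and tries longer phrases first.
    """
    feats = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = False
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n]).lower()
            if phrase in dictionary:
                feats[i:i + n] = [dictionary[phrase]] * n
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return feats

# Toy dictionary (hypothetical): phrase -> entity type.
toy_dictionary = {"new york": "LOC", "washington": "LOC"}
tokens = ["Washington", "visited", "New", "York"]
feats = exact_match_features(tokens, toy_dictionary)

# Map each feature label to a unique integer index, as input to an embedding layer.
feature_index = {"O": 0, "LOC": 1}
print(feats)                              # ['LOC', 'O', 'LOC', 'LOC']
print([feature_index[f] for f in feats])  # [1, 0, 1, 1]
```

In this sentence "Washington" is a person, yet the dictionary marks it as a location. That is the kind of noisy label that context-free matching can introduce.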

Recognition of DNA Damage in Mammals

  • Lee, Suk-Hee
    • BMB Reports / v.34 no.6 / pp.489-495 / 2001
  • DNA damage caused by UV and environmental agents is the major cause of genomic instability and must be repaired; otherwise it gives rise to cancer. Accordingly, mammalian cells operate several DNA repair pathways that are responsible not only for identifying various types of DNA damage but also for removing it. In mammals, the nucleotide excision repair (NER) machinery is responsible for repairing most, if not all, of the bulky adducts caused by UV and chemical agents. Although most of the proteins involved in the NER pathway have been identified, only recently have we begun to gain some insight into the mechanism by which proteins recognize damaged DNA. Binding of the Xeroderma pigmentosum group C protein (XPC)-hHR23B complex to damaged DNA is the initial damage recognition step in NER, which leads to the recruitment of XPA and RPA to form a damage recognition complex. Formation of the damage recognition complex not only stabilizes the low-affinity binding of XPA to the damaged DNA, but also induces structural distortion, both of which are likely necessary for the recruitment of TFIIH and two structure-specific endonucleases for dual incision.


Characterization of Hrq1-Rad14 Interaction in Saccharomyces cerevisiae (효모에서 Hrq1과 Rad14의 상호작용에 대한 연구)

  • Min, Moon-Hee;Kim, Min-Ji;Choi, You-Jin;You, Min-Ju;Kim, Uy-Ra;An, Hyo-Bin;Kim, Chae-Hyun;Kwon, Chae-Yeon;Bae, Sung-Ho
    • Korean Journal of Microbiology / v.50 no.2 / pp.95-100 / 2014
  • Hrq1 is a novel member of the RecQ helicase family, found in fungal genomes by bioinformatics analyses. It is most homologous to human RECQL4, and recent genetic and biochemical studies suggest that it may play roles in the maintenance of genome stability. In this study, we investigated yeast two-hybrid interactions between Hrq1 and the yeast genes homologous to the human genes that are known to interact with RECQL4. Among the 11 genes tested, Rad14, a nucleotide excision repair (NER) factor, was found to interact with Hrq1. In addition, a pull-down assay with the purified proteins revealed a direct protein-protein interaction between Hrq1 and Rad14. The yeast two-hybrid interaction was enhanced by DNA damage induced by 4-nitroquinoline-1-oxide, and this enhancement depended on the presence of Rad4, a key NER factor. These results suggest that Hrq1 may function in NER through interaction with Rad14.

Binding Pattern Elucidation of NNK and NNAL Cigarette Smoke Carcinogens with NER Pathway Enzymes: an Onco-Informatics Study

  • Jamal, Qazi Mohammad Sajid;Dhasmana, Anupam;Lohani, Mohtashim;Firdaus, Sumbul;Ansari, Md Yousuf;Sahoo, Ganesh Chandra;Haque, Shafiul
    • Asian Pacific Journal of Cancer Prevention / v.16 no.13 / pp.5311-5317 / 2015
  • Cigarette smoke derivatives like NNK (4-(Methylnitrosamino)-1-(3-pyridyl)-1-butanone) and NNAL (4-(methylnitrosamino)-1-(3-pyridyl)-1-butan-1-ol) are well-known carcinogens. We analyzed the interaction of enzymes involved in the NER (nucleotide excision repair) pathway with the ligands NNK and NNAL. Binding was characterized for the enzymes showing equivalent or better interaction compared with the positive control. The highest docking energies obtained between NNK and the enzymes RAD23A, CCNH, CDK7, and CETN2 were -7.13 kcal/mol, -7.27 kcal/mol, -8.05 kcal/mol and -7.58 kcal/mol, respectively. Similarly, the highest docking energies obtained between NNAL and the enzymes RAD23A, CCNH, CDK7, and CETN2 were -7.46 kcal/mol, -7.94 kcal/mol, -7.83 kcal/mol and -7.67 kcal/mol, respectively. To determine the effect of NNK and NNAL on enzymes involved in the NER pathway, we applied protein-protein interaction and protein-complex (i.e. enzymes docked with NNK/NNAL) interaction analysis. The carcinogens were found to be capable of reducing the normal functioning of genes like RAD23A (HR23A), CCNH, CDK7 and CETN2. In silico analysis indicated loss of function of these genes and their corresponding enzymes, which might cause alteration of DNA repair pathways, leading to damage buildup and finally contributing to cancer formation.

Performance Comparison Analysis on Named Entity Recognition system with Bi-LSTM based Multi-task Learning (다중작업학습 기법을 적용한 Bi-LSTM 개체명 인식 시스템 성능 비교 분석)

  • Kim, GyeongMin;Han, Seunggnyu;Oh, Dongsuk;Lim, HeuiSeok
    • Journal of Digital Convergence / v.17 no.12 / pp.243-248 / 2019
  • Multi-Task Learning (MTL) is a training method that trains a single neural network on multiple tasks that influence each other. In this paper, we compare the performance of an MTL named entity recognition (NER) model trained on a Korean traditional culture corpus with that of other NER models. During training, the Bi-LSTM layers for part-of-speech tagging (POS-tagging) and NER each take their input from a preceding Bi-LSTM layer, and their outputs are used to obtain the joint loss. As a result, the MTL-based Bi-LSTM model shows a 1.1% to 4.6% performance improvement compared with single Bi-LSTM models.
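The abstract does not say how the joint loss is formed. A common choice in multi-task learning is a (possibly weighted) sum of the per-task losses, and the sketch below assumes that. The function names, the use of mean token-level cross-entropy, and the `pos_weight` and `ner_weight` parameters are illustrative assumptions, not details from the paper.

```python
import math

def mean_cross_entropy(probs, gold):
    """Average negative log-likelihood of the gold label at each token.

    probs: one probability distribution (list of floats) per token.
    gold:  one gold label index per token.
    """
    return sum(-math.log(p[g]) for p, g in zip(probs, gold)) / len(gold)

def joint_loss(pos_probs, pos_gold, ner_probs, ner_gold,
               pos_weight=1.0, ner_weight=1.0):
    """Weighted sum of the POS-tagging loss and the NER loss (assumed form)."""
    pos_loss = mean_cross_entropy(pos_probs, pos_gold)
    ner_loss = mean_cross_entropy(ner_probs, ner_gold)
    return pos_weight * pos_loss + ner_weight * ner_loss

# One token, two labels per task, uniform predictions: each task loss is ln 2.
loss = joint_loss([[0.5, 0.5]], [0], [[0.5, 0.5]], [1])
print(loss)
```

Because both task losses feed one scalar, gradients from POS tagging and NER both reach any layers the two tasks share, which is how the tasks influence each other.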

Deep recurrent neural networks with word embeddings for Urdu named entity recognition

  • Khan, Wahab;Daud, Ali;Alotaibi, Fahd;Aljohani, Naif;Arafat, Sachi
    • ETRI Journal / v.42 no.1 / pp.90-100 / 2020
  • Named entity recognition (NER) continues to be an important task in natural language processing because it is featured as a subtask and/or subproblem in information extraction and machine translation. In Urdu language processing, it is a very difficult task. This paper proposes various deep recurrent neural network (DRNN) learning models with word embedding. Experimental results demonstrate that they improve upon current state-of-the-art NER approaches for Urdu. The DRNN models evaluated include forward and bidirectional extensions of the long short-term memory and backpropagation through time approaches. The proposed models consider both language-dependent features, such as part-of-speech tags, and language-independent features, such as the "context windows" of words. The effectiveness of the DRNN models with word embedding for NER in Urdu is demonstrated using three datasets. The results reveal that the proposed approach significantly outperforms previous conditional random field and artificial neural network approaches. The best f-measure values achieved on the three benchmark datasets using the proposed deep learning approaches are 81.1%, 79.94%, and 63.21%, respectively.

Growth of abalone (Haliotis discus hannai) in cages using epibiont control measures

  • Han, Jido;Jeon, Mi Ae;Kim, Da Woon;Park, Hon;Kim, Byong Hak;Lee, Deok Chan
    • Fisheries and Aquatic Sciences / v.24 no.12 / pp.400-405 / 2021
  • In this study, the relationship between the growth of abalone and the presence of epibionts was investigated in abalone cultured in Goheung, Jeollanam-do, where high water temperatures and epibiont attachment are severe problems. The experiment was conducted for eight months (May-December 2020), and 40 abalone were collected every month. Water temperature ranged from 13.5℃ to 26.6℃ and was highest in August, while dissolved oxygen ranged from 4.0 to 10.2 ㎍/L and was lowest in August. The shell height (mm) of abalone grew to 117.7% (81.8 ± 1.9 mm) in cultures where epibionts were removed (ER) and 111% (77.4 ± 3.3 mm) where they were not (non-epibionts, NER). Total weight (TW) and body weight increased significantly and steadily with ER, whereas the TW increased sharply after August with NER. No significant difference in the condition index was observed between ER and NER. The monthly proportion of epibionts increased significantly in July and reached 69.9% in December.

Korean Entity Recognition System using Bi-directional LSTM-CNN-CRF (Bi-directional LSTM-CNN-CRF를 이용한 한국어 개체명 인식 시스템)

  • Lee, Dong-Yub;Lim, Heui-Seok
    • Annual Conference on Human and Language Technology / 2017.10a / pp.327-329 / 2017
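  • A named entity recognition (NER) system recognizes words or phrases in a document that carry a named entity, such as a person name (PS), location name (LC), or organization name (OG), and labels them with the corresponding entity type. To develop an NER system, we propose a feature construction method that combines deep learning-based word embedding features with features based on the morphological characteristics of the sentence and on a pre-built lexicon, and we propose a method for learning the constructed features using models such as bi-directional LSTM, CNN, and CRF. The experimental data were the 2016klpNER data provided by the 2017 Korean Language Information System Competition. Of the 4,258 sentences in total, 3,406 were used as training data, 426 as validation data, and 426 as test data. In the experiments, the proposed model achieved a test accuracy of 98.9% and an f1-score of 89.4% when evaluated at the entity-chunk level with BIO tagging.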
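Entity-chunk-level evaluation with BIO tags can be sketched as follows. This is a minimal illustration that assumes strict span matching (a predicted chunk counts only if its boundaries and type both match the gold chunk); the paper's exact scorer is not described, and the function names are hypothetical. The PS, LC and OG labels are the entity types named in the abstract.

```python
def bio_to_spans(tags):
    """Collect (start, end, type) chunks from a BIO tag sequence; end is exclusive."""
    spans = set()
    start, etype = None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes a trailing chunk
        continues = tag.startswith("I-") and etype == tag[2:]
        if not continues:
            if etype is not None:
                spans.add((start, i, etype))
            start, etype = None, None
            if tag.startswith("B-"):
                start, etype = i, tag[2:]
            # An I- tag with no matching open chunk is ignored in this sketch.
    return spans

def chunk_f1(gold_tags, pred_tags):
    """Precision, recall and F1 over exactly matching chunks."""
    gold, pred = bio_to_spans(gold_tags), bio_to_spans(pred_tags)
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1

def token_accuracy(gold_tags, pred_tags):
    """Fraction of tokens whose predicted tag equals the gold tag."""
    return sum(g == p for g, p in zip(gold_tags, pred_tags)) / len(gold_tags)

gold = ["B-PS", "I-PS", "O", "B-LC", "O"]
pred = ["B-PS", "I-PS", "O", "B-OG", "O"]
print(chunk_f1(gold, pred))        # (0.5, 0.5, 0.5)
print(token_accuracy(gold, pred))  # 0.8
```

A single mistyped chunk halves the chunk-level F1 here while token accuracy stays at 0.8. This is one reason token-level accuracy can sit well above chunk-level F1.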
