• Title/Summary/Keyword: Patent Corpus

Named Entity Recognition for Patent Documents Based on Conditional Random Fields (조건부 랜덤 필드를 이용한 특허 문서의 개체명 인식)

  • Lee, Tae Seok;Shin, Su Mi;Kang, Seung Shik
    • KIPS Transactions on Software and Data Engineering / v.5 no.9 / pp.419-424 / 2016
  • Named entity recognition is needed to improve retrieval accuracy when searching patent documents, or finding similar patents, through claims and patent descriptions. In this paper, we propose an automatic named entity recognizer for patents based on conditional random fields, which are regarded as one of the best methods in machine learning research. The system was trained on a tagged corpus of 660,000 words and evaluated on a test set of 70,000 words. In the experiment, accuracy was 93.6% and the Kappa coefficient between manual tagging and the automatic tagging system was 0.67. This exceeds the Kappa coefficient of 0.6 obtained for manually tagged results, which suggests that automatic named entity tagging can serve as a practical replacement for manual tagging of patent documents.
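
The Kappa coefficient reported in this abstract measures agreement beyond chance between two taggings. A minimal sketch of Cohen's kappa is given here, assuming that this is the variant the authors used (the abstract does not say). The function name, the tag names, and the toy sequences are hypothetical and are not taken from the paper.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length label sequences."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(labels_a)
    # Observed agreement: share of positions where both taggings match.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, from each tagging's marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both taggings use one identical label everywhere
    return (p_o - p_e) / (1 - p_e)

# Hypothetical toy data: "O" marks tokens outside any entity.
manual = ["O", "O", "ORG", "O", "TERM", "O", "O", "ORG"]
auto = ["O", "O", "ORG", "O", "O", "O", "O", "ORG"]
kappa = cohen_kappa(manual, auto)
print(f"kappa = {kappa:.3f}")
```

Because most tokens in a tagged corpus usually carry the "outside" label, chance agreement is high, which is one plausible reason a tagger can reach high accuracy while its kappa stays well below 1.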

KorPatELECTRA : A Pre-trained Language Model for Korean Patent Literature to improve performance in the field of natural language processing(Korean Patent ELECTRA)

  • Jang, Ji-Mo;Min, Jae-Ok;Noh, Han-Sung
    • Journal of the Korea Society of Computer and Information / v.27 no.2 / pp.15-23 / 2022
  • NLP (Natural Language Processing) is challenging in the patent field because of the linguistic specificity of patent literature, so there is an urgent need for research on a language model optimized for Korean patent literature. Recently, NLP research has seen continuous attempts to build pre-trained language models for specific domains in order to improve performance on various tasks in those fields. Among such models, ELECTRA is a pre-trained language model from Google that followed BERT and uses a new method called RTD (Replaced Token Detection) to increase training efficiency. This paper proposes KorPatELECTRA, which is pre-trained on a large amount of Korean patent literature. Pre-training was optimized by preprocessing the training corpus according to the characteristics of patent literature and by applying a patent vocabulary and tokenizer. To confirm its performance, KorPatELECTRA was tested on NER (Named Entity Recognition), MRC (Machine Reading Comprehension), and patent classification tasks using actual patent data, and it achieved the best performance on all three tasks compared with general-purpose language models.
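
As a rough illustration of RTD as used by ELECTRA: a small generator replaces some input tokens, and the discriminator is trained to label every token as original or replaced. The sketch below covers only how those per-token targets are built. The generator and discriminator networks are omitted, and the function name and example sentence are hypothetical rather than taken from KorPatELECTRA.

```python
def rtd_labels(original_tokens, corrupted_tokens):
    """Per-token RTD targets: 1 if the token was replaced, 0 if it is original."""
    if len(original_tokens) != len(corrupted_tokens):
        raise ValueError("sequences must have the same length")
    # A sampled token that happens to equal the original counts as original.
    return [int(o != c) for o, c in zip(original_tokens, corrupted_tokens)]

# Hypothetical example: the generator swapped one token.
original = ["the", "device", "comprises", "a", "sensor"]
corrupted = ["the", "device", "includes", "a", "sensor"]
labels = rtd_labels(original, corrupted)
print(labels)
```

Since every position yields a training signal, not only the masked ones, this objective is generally described as more sample-efficient than masked language modeling.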

Korean Machine Reading Comprehension for Patent Consultation Using BERT (BERT를 이용한 한국어 특허상담 기계독해)

  • Min, Jae-Ok;Park, Jin-Woo;Jo, Yu-Jeong;Lee, Bong-Gun
    • KIPS Transactions on Software and Data Engineering / v.9 no.4 / pp.145-152 / 2020
  • MRC (machine reading comprehension) is an AI NLP task that predicts the answer to a user's query by understanding the relevant document, and it can be used in automated consultation services such as chatbots. The BERT (Pre-training of Deep Bidirectional Transformers for Language Understanding) model, which has recently shown high performance in various fields of natural language processing, has two phases. The first phase is pre-training on large-scale data from each domain, and the second phase is fine-tuning the model to solve each NLP task as a prediction problem. In this paper, we build a Patent MRC dataset and show how to construct patent consultation training data for the MRC task. We also propose a method for improving MRC performance that uses a Patent-BERT model pre-trained on the patent consultation corpus together with a language processing algorithm suited to machine learning on patent counseling data. Experimental results show that the proposed method improves performance in answering patent counseling queries.

A case of partial trisomy 3p syndrome with rare clinical manifestations

  • Han, Dong-Hoon;Chang, Ji-Young;Lee, Woo-In;Bae, Chong-Woo
    • Clinical and Experimental Pediatrics / v.55 no.3 / pp.107-110 / 2012
  • Partial trisomy 3p results from either unbalanced translocation or de novo duplication. Common clinical features consist of dysmorphic facial features, congenital heart defects, psychomotor and mental retardation, abnormal muscle tone, and hypoplastic genitalia. In this paper, we report a case of partial trisomy 3p with rare clinical manifestations. A full-term, female newborn was transferred to our clinic. She had cleft lip-palate, dysgenesis of the corpus callosum, patent ductus arteriosus, pulmonary hypertension, and severe right-sided hydronephrosis associated with ureteropelvic junction obstruction. Cytogenetic investigation revealed partial trisomy 3p; 46,XX,der(4)t(3;4)(p21.1;p16). The karyotype of her father showed a balanced translocation, t(3;4)(p21.1;p16). Therefore, the size of duplication can be an important factor.

Automatic Extraction of Alternative Words using Parallel Corpus (병렬말뭉치를 이용한 대체어 자동 추출 방법)

  • Baik, Jong-Bum;Lee, Soo-Won
    • Journal of KIISE:Computing Practices and Letters / v.16 no.12 / pp.1254-1258 / 2010
  • In information retrieval, different surface forms of the same object can degrade system performance. In this paper, we propose a method for extracting alternative words that uses translation words as the features of each word extracted from a parallel corpus of Korean/English title pairs from patent information. We also propose an association word filtering method that removes association words from an alternative word list. Evaluation results show that the proposed method outperforms other alternative word extraction methods.

A Study on the Performance Analysis of Entity Name Recognition Techniques Using Korean Patent Literature

  • Gim, Jangwon
    • Journal of Advanced Information Technology and Convergence / v.10 no.2 / pp.139-151 / 2020
  • Entity name recognition is a part of information extraction that extracts entity names from documents and classifies their types. Entity name recognition technologies are widely used in natural language processing applications such as information retrieval, machine translation, and query response systems. Various deep learning-based models exist to improve entity name recognition performance, but few studies have compared and analyzed these models on Korean data. In this paper, we compare and analyze the performance of CRF, LSTM-CRF, BiLSTM-CRF, and BERT, which are actively used to identify entity names, on Korean data. We also evaluate whether the embedding models widely used in recent natural language processing tasks can improve the performance of entity name recognition models. Experiments on patent data and a Korean corpus confirmed that BiLSTM-CRF with the FastText method showed the highest performance.

Effects of Crataegii fructus on the Contractile Response of Rabbit Corpus Cavernosum (산사(山査)가 토끼 음경해면체의 수축에 미치는 영향)

  • Lee, Han Seok;Park, Sun Young
    • Journal of Physiology & Pathology in Korean Medicine / v.27 no.5 / pp.602-610 / 2013
  • This study aimed to evaluate the cavernosal relaxation effect of Crataegii fructus (CF) on rabbit penile corpus cavernosum contracted by agonists. To study the effect of CF on vasoconstriction, isolated rabbit penile corpus cavernosum tissues were tested in organ baths containing Krebs solution. To investigate cavernosal relaxation, CF extract at 0.01 to 3.0 mg/mL was added after the tissues were contracted with 1 μM norepinephrine (NE). To analyze the mechanism of CF's vasorelaxation, CF extract was infused into NE-contracted penile tissues after pretreatment with indomethacin (IM), $N^{\omega}$-nitro-L-arginine (L-NNA), methylene blue (MB), or tetraethylammonium chloride (TEA). To study the effect of CF on the influx of extracellular calcium ($Ca^{2+}$), 1 mM $Ca^{2+}$ was infused, in $Ca^{2+}$-free Krebs solution, into NE-contracted penile tissues after pretreatment with CF. The cytotoxic activity of CF on human umbilical vein endothelial cells (HUVEC) was measured by MTT assay, and nitric oxide (NO) production was measured with Griess reagent. CF relaxed NE-contracted cavernosal strips with endothelium, but in strips without endothelium, CF-induced relaxation was significantly inhibited. Pretreatment with L-NNA, MB, or TEA significantly decreased cavernosal relaxation compared with no pretreatment, whereas pretreatment with IM had no significant effect. In $Ca^{2+}$-free Krebs solution, pretreatment with CF inhibited the contraction induced by adding $Ca^{2+}$ to NE-contracted penile tissues. CF treatment did not increase NO production in HUVEC. These findings show that CF is effective in relaxing rabbit penile corpus cavernosum, and we suggest that CF relaxes rabbit corpus cavernosal smooth muscle through multiple mechanisms, including increased release of nitric oxide from the corporal sinusoidal endothelium, inhibition of $Ca^{2+}$ mobilization into the cytosol from the extracellular fluid, and possibly a hyperpolarizing action.

Successful management of absent sternum in an infant using porcine acellular dermal matrix

  • Semlacher, Roy Alfred;Nuri, Muhammand A.K.
    • Archives of Plastic Surgery / v.46 no.5 / pp.470-474 / 2019
  • Congenital absent sternum is a rare birth defect that requires early intervention for optimal long-term outcomes. Descriptions of the repair of absent sternum are limited to case reports, and no preferred method for management has been described. Herein, we describe the use of porcine acellular dermal matrix to reconstruct the sternum of an infant with sternal infection following attempted repair using synthetic mesh. The patient was a full-term male with trisomy 21, agenesis of corpus callosum, ventricular septal defect, patent ductus arteriosus, right-sided aortic arch, and congenital absence of sternum with no sternal bars. Following removal of the infected synthetic mesh, negative pressure wound therapy with instillation was used to manage the open wound and provide direct antibiotic therapy. When blood C-reactive protein levels declined to $\leq 2$ mg/L, the sternum was reconstructed using porcine acellular dermal matrix. At 21 months postoperative, the patient demonstrated no respiratory issues. Physical examination and computed tomography imaging identified good approximation of the clavicular heads and sternal cleft and forward curvature of the ribs. This case illustrates the benefits of negative pressure wound therapy and acellular dermal matrix for the reconstruction of absent sternum in the context of infected sternal surgical site previously repaired with synthetic mesh.

Clustering-based Statistical Machine Translation Using Syntactic Structure and Word Similarity (문장구조 유사도와 단어 유사도를 이용한 클러스터링 기반의 통계기계번역)

  • Kim, Han-Kyong;Na, Hwi-Dong;Li, Jin-Ji;Lee, Jong-Hyeok
    • Journal of KIISE:Software and Applications / v.37 no.4 / pp.297-304 / 2010
  • Clustering based on sentence type or document genre is a technique used to improve the translation quality of SMT (statistical machine translation) through domain-specific translation. However, no previous research has used sentence type and document genre information simultaneously. In this paper, we propose an integrated clustering method that classifies sentence type by syntactic structure similarity and document genre by word similarity. To improve the translation quality of the SMT system, we interpolate the domain-specific models built from the clusters with general models. A kernel function and a cosine measure are applied to calculate structural similarity and word similarity. Using these similarities, we cluster with a machine learning algorithm similar to K-means. On a Japanese-English patent translation corpus, we obtained a 2.5%-point relative improvement in translation quality in the best case.
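
Two of the building blocks named in this abstract, a cosine measure for word similarity and interpolation of a domain-specific model with a general one, can be sketched as follows. This is only an illustration under stated assumptions: bag-of-words counts stand in for whatever word features the authors used, linear interpolation with a fixed weight is assumed, and the function names and the default weight `lam` are hypothetical.

```python
import math
from collections import Counter

def cosine_similarity(tokens_a, tokens_b):
    """Cosine of the angle between two bag-of-words count vectors."""
    va, vb = Counter(tokens_a), Counter(tokens_b)
    dot = sum(va[t] * vb[t] for t in va)
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)

def interpolate(p_domain, p_general, lam=0.7):
    """Linear mix of a domain-specific and a general model probability."""
    return lam * p_domain + (1 - lam) * p_general

sim = cosine_similarity(["optical", "sensor", "device"],
                        ["optical", "device", "housing"])
mixed = interpolate(0.2, 0.1, lam=0.5)
print(sim, mixed)
```

In a clustering loop similar to K-means, a similarity like this would decide which cluster a sentence joins, and the interpolation weight would normally be tuned on held-out data.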

Domain Adaptation Method for LHMM-based English Part-of-Speech Tagger (LHMM기반 영어 형태소 품사 태거의 도메인 적응 방법)

  • Kwon, Oh-Woog;Kim, Young-Gil
    • Journal of KIISE:Computing Practices and Letters / v.16 no.10 / pp.1000-1004 / 2010
  • Many current language processing systems use a part-of-speech tagger for preprocessing, and most of them require a tagger with the highest possible accuracy. In particular, exploiting domain-specific advantages has become a hot issue in the machine translation community as a way to improve translation quality. This paper presents a method for customizing an HMM- or LHMM-based English tagger from the general domain to a specific domain. The proposed method semi-automatically customizes the output and transition probabilities of the HMM or LHMM using a domain-specific raw corpus. In experiments on customization to the patent domain, our LHMM tagger adapted by the proposed method achieved a word tagging accuracy of 98.87% and a sentence tagging accuracy of 78.5%. Compared with the general tagger, it improved word tagging accuracy by 2.24% (ERR: 66.4%) and sentence tagging accuracy by 41.0% (ERR: 65.6%).
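
The ERR figures in this abstract can be checked with a short calculation, reading ERR as error reduction rate, that is, the fraction of the baseline tagger's errors removed by adaptation. That reading is an assumption, as is treating the reported gains as percentage points so that the baseline accuracy equals the adapted accuracy minus the gain. The function name is hypothetical.

```python
def error_reduction_rate(baseline_acc, adapted_acc):
    """Fraction of the baseline error removed; accuracies are in percent."""
    baseline_error = 100.0 - baseline_acc
    if baseline_error <= 0:
        raise ValueError("baseline has no error to reduce")
    return (adapted_acc - baseline_acc) / baseline_error

# Baselines inferred from the abstract: adapted accuracy minus reported gain.
word_err = error_reduction_rate(98.87 - 2.24, 98.87)
sent_err = error_reduction_rate(78.5 - 41.0, 78.5)
print(f"word ERR = {word_err:.1%}, sentence ERR = {sent_err:.1%}")
```

Under these assumptions both values land close to the reported ones, which supports reading the gains as percentage points rather than relative improvements.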