• Title/Summary/Keyword: Term Extraction

An analysis of terminological definitions (전문용어의 정의문 분석)

  • Lee Hae-Yun
    • Koreanische Zeitschrift für Deutsche Sprachwissenschaft / v.7 / pp.145-163 / 2003
  • In this paper, we examined various accounts of the terminological definition with a view to extracting terminological information from corpora. After reviewing research in lexicography and terminology, we introduced the qualia structure of the Generative Lexicon (Pustejovsky 1995) to analyze terminological definitions. Using the qualia structure, we analyzed the definitions presented in terminological dictionaries. As a result, we confirmed that terminological definitions can be decomposed into four subtypes of qualia structure. Based on this examination, we analyzed terminological definitions in newspaper articles and showed the usefulness of the qualia structure for extracting terminological definitions from corpora.

Tobacco Retail License Recognition Based on Dual Attention Mechanism

  • Shan, Yuxiang;Ren, Qin;Wang, Cheng;Wang, Xiuhui
    • Journal of Information Processing Systems / v.18 no.4 / pp.480-488 / 2022
  • Images of tobacco retail licenses have complex, unstructured characteristics, which poses an urgent technical problem for robotic process automation in tobacco marketing. In this paper, a novel recognition approach using a dual attention mechanism is presented to realize automatic recognition and information extraction from such images. First, we used a DenseNet network to extract the license information from the input tobacco retail license data. Second, bi-directional long short-term memory was used for encoding and decoding, with a continuous decoder integrating dual attention, to recognize and extract information from tobacco retail license images without segmentation. Finally, several performance experiments were conducted using a large-scale dataset of tobacco retail licenses. The experimental results show that the proposed approach achieves a correction accuracy of 98.36% on the ZY-LQ dataset, outperforming most existing methods.

Multi-document Summarization Based on Cluster using Term Co-occurrence (단어의 공기정보를 이용한 클러스터 기반 다중문서 요약)

  • Lee, Il-Joo;Kim, Min-Koo
    • Journal of KIISE:Software and Applications / v.33 no.2 / pp.243-251 / 2006
  • In multi-document summarization by means of salient sentence extraction, it is important to remove redundant information. In the removal process, the similarities and differences between sentences are considered. In this paper, we propose a method for multi-document summarization that extracts salient sentences without redundancy by means of a cohesive term clustering method that uses co-occurrence information. In the cohesive term clustering method, we assume that terms do not exist independently but are related to one another in meaning. To find the relations between terms, we cluster sentences according to topic and use the co-occurrence information of terms within the same topic. We conducted experiments with the DUC (Document Understanding Conferences) data. In these tests, our method achieves better summarization performance than other summarization methods that use term co-occurrence information based on term cohesion at the document or sentence level, and than methods based on simple statistical information.
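
As a rough illustration of the co-occurrence information this abstract refers to, the sketch below counts how often pairs of terms appear in the same sentence within one topic cluster. The whitespace tokenization, the function name `cooccurrence_counts`, and the sample sentences are assumptions made for illustration; the abstract does not specify how the authors compute or use these counts.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(sentences):
    """Count unordered term pairs that appear together in a sentence."""
    counts = Counter()
    for sentence in sentences:
        # Assumed tokenization: lowercase, whitespace split, unique terms.
        terms = sorted(set(sentence.lower().split()))
        for pair in combinations(terms, 2):
            counts[pair] += 1
    return counts


# Hypothetical sentences that were clustered under the same topic.
cluster = [
    "storm damages coastal town",
    "coastal town braces for storm",
    "town council meets",
]
pair_counts = cooccurrence_counts(cluster)
print(pair_counts[("storm", "town")])
```

Pairs are stored in alphabetical order, so a lookup such as `pair_counts[("storm", "town")]` must use the same ordering.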

Feature-selection algorithm based on genetic algorithms using unstructured data for attack mail identification (공격 메일 식별을 위한 비정형 데이터를 사용한 유전자 알고리즘 기반의 특징선택 알고리즘)

  • Hong, Sung-Sam;Kim, Dong-Wook;Han, Myung-Mook
    • Journal of Internet Computing and Services / v.20 no.1 / pp.1-10 / 2019
  • Since big-data text mining extracts many features and much data, clustering and classification can result in high computational complexity and low reliability of the analysis results. In particular, a term-document matrix obtained through text mining represents term-document features but is sparse. We designed an advanced genetic algorithm (GA) to extract features in text mining for a detection model. Term frequency-inverse document frequency (TF-IDF) is used to reflect the document-term relationships in feature extraction. Through a repetitive process, a predetermined number of features are selected. We also used the sparsity score to improve the performance of the detection model. If a spam mail data set has high sparsity, the detection model performs poorly and an optimal detection model is difficult to find. In addition, we find a low-sparsity model that also has a high TF-IDF score by using s(F) in the numerator of the fitness function. We also verified the performance of the proposed algorithm by applying it to text classification. As a result, we found that our algorithm shows higher performance (speed and accuracy) in attack mail classification.
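
To make the two quantities named in this abstract concrete, the following minimal sketch builds a TF-IDF term-document matrix and measures its sparsity. It is not the authors' algorithm: the abstract defines neither the sparsity score nor s(F), so sparsity is assumed here to be the fraction of zero entries, and the function names and sample mails are hypothetical. Documents are assumed to be non-empty.

```python
import math


def tf_idf_matrix(docs):
    """Return (vocabulary, matrix) with one TF-IDF row per document."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({term for tokens in tokenized for term in tokens})
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = {t: sum(1 for tokens in tokenized if t in tokens) for t in vocab}
    matrix = []
    for tokens in tokenized:
        row = []
        for t in vocab:
            tf = tokens.count(t) / len(tokens)
            idf = math.log(n_docs / df[t])
            row.append(tf * idf)
        matrix.append(row)
    return vocab, matrix


def sparsity(matrix):
    """Fraction of entries that are exactly zero (assumed definition)."""
    total = sum(len(row) for row in matrix)
    zeros = sum(1 for row in matrix for value in row if value == 0)
    return zeros / total


# Hypothetical mail bodies, used only to exercise the functions.
mails = ["win free prize now", "free prize inside", "meeting agenda attached"]
vocab, matrix = tf_idf_matrix(mails)
print(len(vocab), round(sparsity(matrix), 3))
```

A feature-selection step such as the GA described above would then choose a subset of the columns; how candidate subsets are scored is left open here because the fitness function is not given.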

Development of a Cell-based Long-term Hydrologic Model Using Geographic Information System(II) - Pre and Post Processor Development - (지리정보시스템을 이용한 장기유출모형의 개발(II) -전.후처리 시스템 개발-)

  • 최진용;정하우;김대식
    • Magazine of the Korean Society of Agricultural Engineers / v.39 no.2 / pp.103-112 / 1997
  • CELTHYM (CEll-based Long-term HYdrologic Model), together with a pre-processor and a post-processor that can be integrated with a geographic information system (GIS), was developed to predict the stream flow of a small agricultural watershed. The pre-processor comprises three routines: a watershed boundary extraction routine (WBER), a curve number calculation routine (CNR), and a maximum available soil moisture calculation routine (MASR). It interfaces with both CELTHYM and the GIS. The post-processor comprises two routines, a grapher and a map composer, which adapt CELTHYM output to chart making and GIS map making. The developed pre- and post-processors were useful for GIS integration and for spatial comprehension of the CELTHYM output.

Processing of Corbicula elatior Beverage (재첩을 이용한 음료 가공)

  • 강동수;최옥수
    • Journal of Life Science / v.11 no.2 / pp.138-143 / 2001
  • Marsh clam (Corbicula elatior) has a short storage life when raw and a low rate of utilization, which has increased the need to develop new marsh clam products suited to temporary mass treatment and long-term distribution. Therefore, the processing conditions for a marsh clam beverage using proteolytic enzyme hydrolysis were investigated. Partial hydrolysis at $60^{\circ}C$ for 1 hour after adding 3% Alcalase was more effective than hot water extraction for developing taste compounds from the marsh clam. The omission test showed that nucleotides and their related compounds contributed more to the taste of the marsh clam hydrolysates than free amino acids did. The taste of the hydrolysates was produced by the combination of these compounds rather than by any single compound, as the hydrolysates tasted different from the control when one of these compounds was omitted. The hydrolysates were fractionated to a molecular weight below 500 daltons to eliminate the bitter taste and improve the flavor. Bay leaf at 0.05% was more effective at improving the odor than other herbs.

Effect of Long-term Corona-discharge on RTV Silicone Rubber (장기 코로나 처리에 따른 RTV 실리콘 절연재료의 특성변화)

  • Youn, Bok-Hee;Ahn, Jong-Sik;Huh, Chang-Su
    • Proceedings of the Korean Institute of Electrical and Electronic Material Engineers Conference / 2001.11b / pp.266-269 / 2001
  • This paper investigated the aging characteristics under long-term corona discharge of room temperature vulcanized (RTV) silicone rubber, which has been used as a protective coating material to solve the contamination problem. The applied electric field was 10 kV/cm AC, and corona discharge treatment was applied to an RTV silicone rubber sheet for a maximum of 250 hours. With increasing duration of corona discharge, the diffusible low molecular weight (LMW) species increased, as determined by the n-hexane extraction method. In addition, a contaminant layer was formed on the treated surface, and the contact angle was then measured. We investigated the relation between the contact angle and the diffusible LMW species. It was found that scission of the PDMS main chain and side chains $(CH_3)$ and the generation of LMW species were caused by corona discharge. The improvement in the hydrophobicity rate is thought to be due to the increase in diffusible LMW species.

Discrete Wavelet Transform for Watermarking Three-Dimensional Triangular Meshes from a Kinect Sensor

  • Wibowo, Suryo Adhi;Kim, Eun Kyeong;Kim, Sungshin
    • International Journal of Fuzzy Logic and Intelligent Systems / v.14 no.4 / pp.249-255 / 2014
  • We present a simple method to watermark three-dimensional (3D) triangular meshes that have been generated from the depth data of the Kinect sensor. In contrast to previous methods, which maintain the shape of 3D triangular meshes and decide the embedding location by calculations over vertices and their neighbors, our method is based on selecting one of the coordinate axes. To maintain shape, we use the discrete wavelet transform and constant regularization. A watermarking system needs information to embed; we used text to provide that information. We used geometric attacks such as rotation, scaling, and translation to test the performance of this watermarking system. Performance parameters in this paper include the vertices error rate (VER) and bit error rate (BER). The VER and BER results indicate that using a correction term before the extraction process makes our system robust to geometric attacks.
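
The bit error rate mentioned in this abstract can be illustrated with a short sketch that compares the embedded watermark bits with the bits recovered after an attack. The UTF-8 bit encoding of the text, the function names, and the sample string are assumptions for illustration; the abstract does not describe how the text is converted to bits, and the VER is not sketched because its definition is not given.

```python
def text_to_bits(text):
    """Encode text as a list of bits (UTF-8, most significant bit first)."""
    bits = []
    for byte in text.encode("utf-8"):
        for shift in range(7, -1, -1):
            bits.append((byte >> shift) & 1)
    return bits


def bit_error_rate(embedded, extracted):
    """Fraction of bit positions where the two sequences differ."""
    if len(embedded) != len(extracted):
        raise ValueError("bit sequences must have equal length")
    if not embedded:
        raise ValueError("bit sequences must be non-empty")
    errors = sum(1 for a, b in zip(embedded, extracted) if a != b)
    return errors / len(embedded)


# Hypothetical watermark text; one bit is flipped to simulate an
# extraction error after an attack.
original = text_to_bits("ab")
corrupted = list(original)
corrupted[0] ^= 1
print(bit_error_rate(original, corrupted))
```

A BER of 0 means the watermark was recovered exactly; larger values mean more of the embedded bits were lost.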

Effect of Long-term Corona-discharge on RTV Silicone Rubber (장기 코로나 처리에 따른 RTV 실리콘 절연재료의 특성변화)

  • 연복희;안종식;허창수
    • Proceedings of the Korean Institute of Electrical and Electronic Material Engineers Conference / 2001.11a / pp.266-269 / 2001
  • This paper investigated the aging characteristics under long-term corona discharge of room temperature vulcanized (RTV) silicone rubber, which has been used as a protective coating material to solve the contamination problem. The applied electric field was 10 kV/cm AC, and corona discharge treatment was applied to an RTV silicone rubber sheet for a maximum of 250 hours. With increasing duration of corona discharge, the diffusible low molecular weight (LMW) species increased, as determined by the n-hexane extraction method. In addition, a contaminant layer was formed on the treated surface, and the contact angle was then measured. We investigated the relation between the contact angle and the diffusible LMW species. It was found that scission of the PDMS main chain and side chains (CH$_3$) and the generation of LMW species were caused by corona discharge. The improvement in the hydrophobicity rate is thought to be due to the increase in diffusible LMW species.

DG-based SPO tuple recognition using self-attention M-Bi-LSTM

  • Jung, Joon-young
    • ETRI Journal / v.44 no.3 / pp.438-449 / 2022
  • This study proposes a dependency grammar-based self-attention multilayered bidirectional long short-term memory (DG-M-Bi-LSTM) model for subject-predicate-object (SPO) tuple recognition from natural language (NL) sentences. To add recent knowledge to a knowledge base autonomously, it is essential to extract knowledge from large amounts of NL data. Therefore, this study proposes a high-accuracy SPO tuple recognition model that requires only a small amount of learning data to extract knowledge from NL sentences. To evaluate its effectiveness, the accuracy of SPO tuple recognition using DG-M-Bi-LSTM is compared with that of NL-based self-attention multilayered bidirectional LSTM, DG-based bidirectional encoder representations from transformers (BERT), and NL-based BERT. The DG-M-Bi-LSTM model achieves the best recognition accuracy for extracting SPO tuples from NL sentences even though it has fewer deep neural network (DNN) parameters than BERT. In particular, its accuracy is better than that of BERT when the learning data are limited. Additionally, its pretrained DNN parameters can be applied to other domains because it learns the structural relations in NL sentences.