• Title/Abstract/Keywords: Term extraction

336 search results (processing time: 0.025 s)

전문용어의 정의문 분석 (An analysis of terminological definitions)

  • 이해윤
    • 한국독어학회지:독어학 / Vol. 7 / pp. 145-163 / 2003
  • In this paper, we examined various definitions of the terminological definition with a view to extracting terminological information from corpora. After reviewing research in lexicography and terminology, we introduced the qualia structure of the Generative Lexicon (Pustejovsky 1995) for analyzing terminological definitions. Using the qualia structure, we analyzed the definitions presented in terminological dictionaries. As a result, we confirmed that terminological definitions can be decomposed into four subtypes of qualia structure. Based on this examination, we analyzed terminological definitions in newspaper articles and showed the usefulness of the qualia structure for extracting terminological definitions from corpora.


Tobacco Retail License Recognition Based on Dual Attention Mechanism

  • Shan, Yuxiang;Ren, Qin;Wang, Cheng;Wang, Xiuhui
    • Journal of Information Processing Systems / Vol. 18, No. 4 / pp. 480-488 / 2022
  • Images of tobacco retail licenses have complex unstructured characteristics, which poses an urgent technical problem for robotic process automation in tobacco marketing. In this paper, a novel recognition approach using a dual attention mechanism is presented to realize automatic recognition and information extraction from such images. First, we used a DenseNet network to extract the license information from the input tobacco retail license data. Second, bi-directional long short-term memory was used for encoding and decoding, with a continuous decoder integrating dual attention, to realize recognition and information extraction from tobacco retail license images without segmentation. Finally, several performance experiments were conducted using a large-scale dataset of tobacco retail licenses. The experimental results show that the proposed approach achieves a recognition accuracy of 98.36% on the ZY-LQ dataset, outperforming most existing methods.

단어의 공기정보를 이용한 클러스터 기반 다중문서 요약 (Multi-document Summarization Based on Cluster using Term Co-occurrence)

  • 이일주;김민구
    • 한국정보과학회논문지:소프트웨어및응용 / Vol. 33, No. 2 / pp. 243-251 / 2006
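  • Multi-document summarization by representative-sentence extraction faces an information redundancy problem, in which similar information appears repeatedly across several documents, and it needs an efficient method that resolves this by taking into account the similarities and differences between sentences. This paper proposes a multi-document summarization method that removes sentence redundancy and extracts important sentences using a related-term clustering technique based on term co-occurrence information. The related-term clustering technique assumes that words do not exist independently of one another but are semantically related, and it uses word cohesion at the level of topic-based sentence clusters. In experiments on DUC (Document Understanding Conferences) evaluation data, the proposed method, which uses term co-occurrence at the sentence-cluster level, produced better results than multi-document summarization based on simple statistical information, document-level term co-occurrence, or sentence-level term co-occurrence.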
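
As a rough illustration of the term co-occurrence information this abstract refers to, the sketch below counts how often two words appear in the same sentence. It is not the authors' clustering method: the function name `cooccurrence_counts`, the whitespace tokenization, and the sample sentences are all assumptions made for this example.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(sentences):
    """Count how often each unordered pair of distinct words shares a sentence."""
    counts = Counter()
    for sentence in sentences:
        # Deduplicate and sort so each pair is stored once, in alphabetical order.
        words = sorted(set(sentence.lower().split()))
        counts.update(combinations(words, 2))
    return counts


# Hypothetical sample sentences, not taken from the DUC data.
sentences = [
    "storm damages coastal town",
    "coastal town recovers after storm",
    "election results announced",
]
pairs = cooccurrence_counts(sentences)
print(pairs[("coastal", "storm")])
```

Pairs with high counts could serve as evidence that two words belong to the same topic; the abstract's method computes such statistics per sentence cluster rather than per sentence or per document.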

공격 메일 식별을 위한 비정형 데이터를 사용한 유전자 알고리즘 기반의 특징선택 알고리즘 (Feature-selection algorithm based on genetic algorithms using unstructured data for attack mail identification)

  • 홍성삼;김동욱;한명묵
    • 인터넷정보학회논문지 / Vol. 20, No. 1 / pp. 1-10 / 2019
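  • In big data, text mining extracts many features from a large amount of data, so the computational complexity of clustering and classification is high and the reliability of the analysis results can suffer. In particular, the term-document matrix obtained through text mining represents the features relating terms and documents, but it takes the form of a sparse matrix. In this paper, we designed a feature extraction method for a detection model that uses an improved GA (genetic algorithm) in text mining. TF-IDF is used in feature extraction to reflect the relationship between documents and terms. Through an iterative process, a predetermined number of features is selected. We also used a sparsity score to improve the performance of the detection model. When the sparsity of a spam mail set is high, the performance of the detection model drops and it becomes difficult to find an optimized detection model. We used s(F) in the fitness function to find a detection model with low sparsity and a high TF-IDF score. We also applied the proposed algorithm to text classification experiments to verify its performance. As a result, the proposed algorithm showed good performance (speed and accuracy) in attack mail classification.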
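
The abstract does not define s(F) or the fitness function, so the following is only a minimal sketch of the ingredients it names: a TF-IDF weighted term-document matrix and a sparsity measure over a selected feature subset. The `fitness` function, the tokenization, and the sample mails are hypothetical and are not the authors' formulation.

```python
import math


def tfidf_matrix(docs):
    """Return (vocab, rows); rows[d][t] is the TF-IDF weight of term t in document d."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({w for toks in tokenized for w in toks})
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = {t: sum(1 for toks in tokenized if t in toks) for t in vocab}
    rows = []
    for toks in tokenized:
        row = []
        for t in vocab:
            tf = toks.count(t) / len(toks)
            idf = math.log(n_docs / df[t])
            row.append(tf * idf)
        rows.append(row)
    return vocab, rows


def sparsity(rows, feature_idx):
    """Fraction of zero cells in the sub-matrix restricted to the selected columns."""
    cells = [row[j] for row in rows for j in feature_idx]
    return sum(1 for v in cells if v == 0.0) / len(cells)


def fitness(rows, feature_idx):
    """Hypothetical score: mean TF-IDF weight of the subset minus its sparsity."""
    cells = [row[j] for row in rows for j in feature_idx]
    return sum(cells) / len(cells) - sparsity(rows, feature_idx)


# Hypothetical sample mails, not from the paper's data set.
docs = [
    "win free prize now",
    "free prize claim now",
    "meeting agenda for monday",
]
vocab, rows = tfidf_matrix(docs)
free_only = [vocab.index("free")]
win_only = [vocab.index("win")]
print(sparsity(rows, free_only), sparsity(rows, win_only))
```

In a genetic algorithm, each candidate would be a set of column indices like `free_only`, and a score of this kind would rank candidates so that dense, highly weighted feature subsets survive selection.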

지리정보시스템을 이용한 장기유출모형의 개발(II) -전.후처리 시스템 개발- (Development of a Cell-based Long-term Hydrologic Model Using Geographic Information System(II) - Pre and Post Processor Development -)

  • 최진용;정하우;김대식
    • 한국농공학회지 / Vol. 39, No. 2 / pp. 103-112 / 1997
  • CELTHYM (CEll-based Long-term HYdrologic Model), together with a pre-processor and a post-processor that can be integrated with a geographic information system (GIS), was developed to predict the stream flow of a small agricultural watershed. The pre-processor consists of three routines: a watershed boundary extraction routine (WBER), a curve number calculation routine (CNR), and a maximum available soil moisture calculation routine (MASR). It interfaces CELTHYM with the GIS. The post-processor consists of two routines, a grapher and a map composer, which turn CELTHYM output into charts and GIS maps. The developed pre- and post-processors were useful for GIS integration and for spatial comprehension of the CELTHYM output.


재첩을 이용한 음료 가공 (Processing of Corbicula elatior Beverage)

  • 강동수;최옥수
    • 생명과학회지 / Vol. 11, No. 2 / pp. 138-143 / 2001
  • Marsh clam (Corbicula elatior) can be stored raw only for a short time and has a low rate of utilization, so there is a growing need for new processed marsh clam products that allow temporary mass treatment and long-term distribution. Therefore, the processing conditions for a marsh clam beverage using proteolytic enzyme hydrolysis were investigated. Partial hydrolysis at 60°C for 1 hour after adding 3% Alcalase was more effective than hot water extraction at developing taste compounds from the marsh clam. The omission test showed that nucleotides and their related compounds contributed more to the taste of the marsh clam hydrolysates than free amino acids did. The taste of the hydrolysates arose from the combination of these compounds rather than from any single compound, as the hydrolysates tasted different from the control when one of these compounds was omitted. The hydrolysates were fractionated to a molecular weight below 500 daltons to eliminate bitter taste and improve flavor, and 0.05% bay leaf was more effective at improving the odor than other herbs.


장기 코로나 처리에 따른 RTV 실리콘 절연재료의 특성변화 (Effect of Long-term Corona-discharge on RTV Silicone Rubber)

  • 연복희;안종식;허창수
    • 한국전기전자재료학회:학술대회논문집 / Proceedings of the 2001 Autumn Conference of 한국전기전자재료학회, Vol. 14, No. 1 / pp. 266-269 / 2001
  • This paper investigates the aging characteristics under long-term corona discharge of room temperature vulcanized (RTV) silicone rubber, which has been used as a protective coating material to solve the contamination problem. The applied electric field was 10 kV/cm AC, and corona discharge treatment was applied to RTV silicone rubber sheets for up to 250 hours. As the duration of corona discharge increased, the amount of diffusible low molecular weight (LMW) species increased, as determined by n-hexane extraction. In addition, a contaminant layer formed on the treated surface, and the contact angle was then measured. We investigated the relation between the contact angle and the diffusible LMW species. It was found that corona discharge caused scission of the PDMS main chain and the side chains $(CH_3)$ and generated LMW species. The improvement in the hydrophobicity rate is thought to be due to the increase in diffusible LMW species.


Discrete Wavelet Transform for Watermarking Three-Dimensional Triangular Meshes from a Kinect Sensor

  • Wibowo, Suryo Adhi;Kim, Eun Kyeong;Kim, Sungshin
    • International Journal of Fuzzy Logic and Intelligent Systems / Vol. 14, No. 4 / pp. 249-255 / 2014
  • We present a simple method to watermark three-dimensional (3D) triangular meshes that have been generated from the depth data of the Kinect sensor. In contrast to previous methods, which maintain the shape of 3D triangular meshes and decide the embedding place by calculations on vertices and their neighbors, our method is based on selecting one of the coordinate axes. To maintain shape, we use the discrete wavelet transform and constant regularization. Because a watermarking system needs information to embed, we used text to provide that information. We used geometry attacks such as rotation, scaling, and translation to test the performance of this watermarking system. Performance parameters in this paper include the vertices error rate (VER) and bit error rate (BER). The VER and BER results indicate that using a correction term before the extraction process makes our system robust to geometry attacks.
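
The bit error rate (BER) named in this abstract can be sketched as follows. This shows only the metric, not the authors' embedding or extraction procedure. The helper names, the UTF-8 bit encoding, and the sample watermark text are assumptions made for this example.

```python
def text_to_bits(text):
    """Encode text as a list of bits, 8 per UTF-8 byte, most significant bit first."""
    return [
        (byte >> shift) & 1
        for byte in text.encode("utf-8")
        for shift in range(7, -1, -1)
    ]


def bit_error_rate(embedded, extracted):
    """Fraction of positions where the extracted bit differs from the embedded bit."""
    if len(embedded) != len(extracted):
        raise ValueError("bit sequences must have the same length")
    errors = sum(a != b for a, b in zip(embedded, extracted))
    return errors / len(embedded)


# Hypothetical watermark text; two bits are flipped to imitate an attack.
original = text_to_bits("Kinect")
damaged = original.copy()
damaged[0] ^= 1
damaged[5] ^= 1
ber = bit_error_rate(original, damaged)
print(ber)
```

A BER of 0 means the watermark text was recovered exactly; values near 0.5 mean the extracted bits are no better than chance.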

장기 코로나 처리에 따른 RTV 실리콘 절연재료의 특성변화 (Effect of Long-term Corona-discharge on RTV Silicone Rubber)

  • 연복희;안종식;허창수
    • 한국전기전자재료학회:학술대회논문집 / Proceedings of the 2001 Autumn Conference of 한국전기전자재료학회 / pp. 266-269 / 2001
  • This paper investigates the aging characteristics under long-term corona discharge of room temperature vulcanized (RTV) silicone rubber, which has been used as a protective coating material to solve the contamination problem. The applied electric field was 10 kV/cm AC, and corona discharge treatment was applied to RTV silicone rubber sheets for up to 250 hours. As the duration of corona discharge increased, the amount of diffusible low molecular weight (LMW) species increased, as determined by n-hexane extraction. In addition, a contaminant layer formed on the treated surface, and the contact angle was then measured. We investigated the relation between the contact angle and the diffusible LMW species. It was found that corona discharge caused scission of the PDMS main chain and the side chains (CH$_3$) and generated LMW species. The improvement in the hydrophobicity rate is thought to be due to the increase in diffusible LMW species.


DG-based SPO tuple recognition using self-attention M-Bi-LSTM

  • Jung, Joon-young
    • ETRI Journal / Vol. 44, No. 3 / pp. 438-449 / 2022
  • This study proposes a dependency grammar-based self-attention multilayered bidirectional long short-term memory (DG-M-Bi-LSTM) model for subject-predicate-object (SPO) tuple recognition from natural language (NL) sentences. To add recent knowledge to the knowledge base autonomously, it is essential to extract knowledge from numerous NL data. Therefore, this study proposes a high-accuracy SPO tuple recognition model that requires a small amount of learning data to extract knowledge from NL sentences. The accuracy of SPO tuple recognition using DG-M-Bi-LSTM is compared with that using NL-based self-attention multilayered bidirectional LSTM, DG-based bidirectional encoder representations from transformers (BERT), and NL-based BERT to evaluate its effectiveness. The DG-M-Bi-LSTM model achieves the best results in terms of recognition accuracy for extracting SPO tuples from NL sentences even if it has fewer deep neural network (DNN) parameters than BERT. In particular, its accuracy is better than that of BERT when the learning data are limited. Additionally, its pretrained DNN parameters can be applied to other domains because it learns the structural relations in NL sentences.