• Title/Summary/Keyword: search term extraction


An Enhanced Method for Unsupervised Word Sense Disambiguation using Korean WordNet (한국어 어휘의미망을 이용한 비감독 어의 중의성 해소 방법의 성능 향상)

  • Kwon, Soonho;Kim, Minho;Kwon, Hyuk-Chul
    • Proceedings of the Korea Information Processing Society Conference / 2010.11a / pp.693-696 / 2010
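  • In natural language processing, word sense disambiguation is the task of accurately determining the meaning of a word, and it plays an important role in applications such as machine translation and information retrieval. This paper proposes an unsupervised word sense disambiguation method that uses the Korean lexical semantic network (Korlex). By combining statistics extracted from a sense-untagged corpus with the related-word information in Korlex, the method alleviates the data sparseness problem. In addition, it applies a distance weight between the ambiguous word and its co-occurring words and a per-sense usage weight, which take linguistic features into account and improve performance over the PNUWSD system on which this work is based. The proposed method was evaluated on the SENSEVAL-2 Korean data. Using the chi-square statistic between the related words of each sense of the ambiguous word and the co-occurring words in the local context gave an accuracy of 68.1%. Adding the distance weight and the per-sense usage weight gave an accuracy of 76.9%, an improvement over the existing method.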
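
The chi-square scoring with a distance weight described in this abstract can be illustrated roughly as follows. This is a minimal sketch, not the authors' implementation: the abstract does not give the weighting formula, so the 1/distance weight, the function names `chi_square_2x2` and `score_sense`, and the layout of the `tables` dictionary are all assumptions made for illustration. The per-sense usage weight is omitted.

```python
from typing import Dict, List, Tuple


def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square for a 2x2 co-occurrence table.

    a: contexts containing both words, b: only the first word,
    c: only the second word, d: neither word.
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def score_sense(
    related_words: List[str],
    context: List[Tuple[str, int]],
    tables: Dict[Tuple[str, str], Tuple[int, int, int, int]],
) -> float:
    """Sum distance-weighted chi-square scores for one candidate sense.

    context holds (co-occurring word, distance from the ambiguous word).
    The 1/distance weight is an assumption, not taken from the paper.
    """
    total = 0.0
    for related in related_words:
        for word, distance in context:
            counts = tables.get((related, word))
            if counts is None:
                continue  # no corpus statistics for this word pair
            total += chi_square_2x2(*counts) / max(distance, 1)
    return total
```

Under this sketch, the sense with the highest `score_sense` value would be chosen; co-occurring words closer to the ambiguous word contribute more to the score.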

Research to establish a road map for the standardization in military and commercial terminology (민·군규격용어 표준화를 위한 로드맵 구축 연구)

  • Park, Jeong-ho;Choi, Young-ho;Im, Ik-soon;Jang, Hyo-jun
    • Proceedings of the Korea Contents Association Conference / 2015.05a / pp.251-252 / 2015
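  • This study presents a mapping structure that extracts technical terms, misused terms, and vocabulary that does not comply with orthographic rules or refined-term guidelines from defense specifications, and refines them into definitions or refined terms. It builds an information-work roadmap for standardizing civil and military specification terminology, and investigates a support framework that can maintain compatibility and consistency with civilian terminology. The target specification terms were selected using a reliability evaluation based on KS terminology standard principles and text mining frequency analysis, and a thesaurus structure was incorporated to show how the approach extends to concept-based services. The resulting specification term DB can be used for search and term explanation in terminology standard management information systems in civilian and defense-related fields.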


Efficient Synonym Detection Method through Keyword Extension (키워드 확장을 통한 효율적인 유의어 검출 방법)

  • Ji, Ki Yong;Park, JiSu;Shon, Jin Gon
    • Proceedings of the Korea Information Processing Society Conference / 2018.10a / pp.767-770 / 2018
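  • With advances in artificial intelligence, question answering systems that exchange information through natural-language sentences are attracting attention. Such a system must accurately identify the intent of a user's natural-language query. Search that relies only on the keywords of a query does not account for word ambiguity, so it has trouble identifying the intent of the query. To address this problem, methods that expand synonyms using relatedness based on the meaning and context of the query are being studied. This paper proposes a method that expands the keywords extracted from a query using word similarity generated through word embedding.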
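
Keyword expansion by word similarity, as described in this abstract, can be sketched with cosine similarity over embedding vectors. This is a hedged illustration only: the abstract does not name the similarity measure, the embedding model, or any cutoff, so cosine similarity, the function name `expand_keyword`, the `top_k` and `min_similarity` parameters, and the toy vectors in the usage note are all assumptions.

```python
import math
from typing import Dict, List


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def expand_keyword(
    keyword: str,
    embeddings: Dict[str, List[float]],
    top_k: int = 2,
    min_similarity: float = 0.5,
) -> List[str]:
    """Return up to top_k vocabulary words most similar to keyword.

    top_k and min_similarity are hypothetical tuning knobs.
    """
    if keyword not in embeddings:
        return []
    target = embeddings[keyword]
    scored = [
        (cosine(target, vec), word)
        for word, vec in embeddings.items()
        if word != keyword
    ]
    # Keep only sufficiently similar words, best first; ties break by word.
    scored = [(s, w) for s, w in scored if s >= min_similarity]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [word for _, word in scored[:top_k]]
```

For example, with toy vectors `{"car": [1.0, 0.0], "automobile": [0.9, 0.1], "banana": [0.0, 1.0]}`, calling `expand_keyword("car", ...)` keeps "automobile" and drops "banana", whose similarity to "car" falls below the cutoff.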

A WordNet-based Open Market Category Search System for Efficient Goods Registration (효율적인 상품등록을 위한 워드넷 기반의 오픈마켓 카테고리 검색 시스템)

  • Hong, Myung-Duk;Kim, Jang-Woo;Jo, Geun-Sik
    • Journal of the Korea Society of Computer and Information / v.17 no.9 / pp.17-27 / 2012
  • The open market is one of the key factors in accelerating profit, and retailers usually sell items in several open markets. One challenge for retailers is assigning item categories across open markets that use different classification systems. In this research, we propose an item category recommendation method to support registering products under appropriate categories. Our recommendations are based on the semantic relation between an existing open market's categorization and that of any other open market. To analyze correlations between categories, we use morpheme analysis, Korean Wiki Dictionary, WordNet, and the Google Translation API. The proposed method measures semantic similarity and recommends the category most similar to a guide word. The experimental results show that our system improves category search accuracy and that retailers can easily select appropriate categories with the proposed method.

Social Big Data Analysis for Franchise Stores

  • Kim, Hyeon Gyu
    • Journal of the Korea Society of Computer and Information / v.26 no.8 / pp.39-46 / 2021
  • When conducting social big data analysis for franchise stores, reviews of multiple branches of a franchise can be collected together, which can significantly distort the analysis results. To improve accuracy, reviews of branches that are not subject to the analysis must be filtered out properly. This paper presents a method for social big data analysis that reflects the characteristics of franchise stores. The proposed method consists of search key configuration and review filtering. For the former, the open data provided by the Small Business Promotion Agency is used to extract region names so that reviews can be collected more accurately. For the latter, open search APIs provided by Naver or Kakao are used to obtain franchise branch information, which is used to filter out reviews of branches that are not subject to analysis. To verify the performance of the proposed method, experiments were conducted on real social reviews collected online, and the results showed that the accuracy of the proposed review filtering was 93.6% on average.

A study on the Stochastic Model for Sentence Speech Understanding (문장음성 이해를 위한 확률모델에 관한 연구)

  • Roh, Yong-Wan;Hong, Kwang-Seok
    • The KIPS Transactions: Part B / v.10B no.7 / pp.829-836 / 2003
  • In this paper, we propose a stochastic model for sentence speech understanding that uses a dictionary and a thesaurus. The proposed model extracts words from a sentence given as input speech or text. The computer selects a category from the dictionary database, compares it with the words extracted from the input sentence, and uses the stochastic model to compute a probability value from the comparison results. It then reads upper-level dictionary information through an upper dictionary search, compares it with the words extracted from the input sentence, and again computes a probability value from the comparison results. The first and second probability values, from the dictionary search and the upper dictionary search, are added and compared with a threshold probability to measure the sentence understanding rate. We evaluated the performance of the sentence speech understanding system by applying it to the twenty questions game. In the experiments, we obtained a sentence speech understanding accuracy of 79.8%, with the probability ($\alpha$) of a high-level word set to 0.9 and the threshold probability ($\beta$) set to 0.38.

Extraction Association Rule between Attribute Values Using Hash Table (해시테이블을 이용한 속성값 간의 연관관계 추출)

  • Yang, Jong-Won;Lee, Sang-Hee;Lee, Dong-Joo;Yang, Jung-Yun;Lee, Sang-Goo
    • Proceedings of the Korean Information Science Society Conference / 2005.11b / pp.220-222 / 2005
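  • The growth of e-commerce has inevitably been accompanied by the building of product databases. Extracting associations among the attribute values of the products in such a database can be used for search, synonym extraction, or clustering. This paper proposes a tree-shaped data structure based on a hash table for extracting associations among product attribute values. Using this data structure, we present an algorithm that uses a threshold to extract the associations among attribute values in the product database in linear time. Finally, we present an optimization technique that uses Support to reduce the search space of the tree.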
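
The idea of counting attribute-value associations in a hash table and keeping those above a support threshold can be sketched as below. This is a simplified stand-in, not the paper's method: it uses a flat Python dictionary rather than the tree-shaped structure the abstract proposes, and the function name `associated_values`, the "attribute=value" key format, and the restriction to pairs are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def associated_values(
    products: List[Dict[str, str]], min_support: float
) -> Dict[Pair, float]:
    """Return attribute-value pairs whose support is at least min_support.

    Support here is the fraction of products that contain both values.
    """
    counts: Dict[Pair, int] = defaultdict(int)
    for product in products:
        # Encode each attribute value as "attribute=value" and sort so that
        # every pair has a single canonical key in the hash table.
        values = sorted(f"{attr}={val}" for attr, val in product.items())
        for pair in combinations(values, 2):
            counts[pair] += 1
    total = len(products)
    if total == 0:
        return {}
    return {
        pair: count / total
        for pair, count in counts.items()
        if count / total >= min_support
    }
```

The sketch makes a single pass over the products, so for a fixed number of attributes per product its running time grows linearly with the number of products.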


Improvement of a Product Recommendation Model using Customers' Search Patterns and Product Details

  • Lee, Yunju;Lee, Jaejun;Ahn, Hyunchul
    • Journal of the Korea Society of Computer and Information / v.26 no.1 / pp.265-274 / 2021
  • In this paper, we propose a novel recommendation model based on Doc2vec that uses search keywords and product details. Many prior studies on recommender systems have proposed collaborative filtering (CF) as the main recommendation algorithm, using only structured input data such as customers' purchase history or ratings. However, using unstructured data such as online customer reviews in CF may lead to better recommendations. Against this background, we propose using search keyword data and product detail information, which are seldom used in previous studies, for product recommendation. The proposed model makes recommendations using CF that simultaneously considers ratings, search keywords, and detailed information on the products purchased by customers. Doc2vec is applied to extract quantitative patterns from these unstructured data. In the experiment, the proposed model outperformed the conventional recommendation model, and search keywords and product details were confirmed to have a significant effect on recommendation. This study has academic significance in that it applies customers' online behavior information to the recommendation system and mitigates the cold start problem, one of the critical limitations of CF.

An Automatic LOINC Mapping Framework for Standardization of Laboratory Codes in Medical Informatics (의료 정보 검사코드 표준화를 위한 LOINC 자동 매핑 프레임웍)

  • Ahn, Hoo-Young;Park, Young-Ho
    • Journal of Korea Multimedia Society / v.12 no.8 / pp.1172-1181 / 2009
  • An electronic medical record (EMR) is a medical system in which all tests are recorded as text data. However, domestic EMR systems store medical records in various forms. Many related works aim to standardize laboratory codes using LOINC (Logical Observation Identifiers Names and Codes), but existing research resolves the problem manually, and a manual process does not work when the data are enormous. This paper proposes a novel automatic LOINC mapping algorithm that uses indexing techniques and semantic similarity analysis of medical information. Existing approaches use a file system, which is not suitable for enormous medical data. Whereas existing research only proposes algorithms, we designed and implemented a mapping algorithm for standardizing laboratory codes in medical informatics. The framework makes automatic creation of search words possible. Moreover, we implemented a medical search framework based on a database system that takes large-scale medical data into account.


Design and Implementation of MPEG-2 Compressed Video Information Management System (MPEG-2 압축 동영상 정보 관리 시스템의 설계 및 구현)

  • Heo, Jin-Yong;Kim, In-Hong;Bae, Jong-Min;Kang, Hyun-Syug
    • The Transactions of the Korea Information Processing Society / v.5 no.6 / pp.1431-1440 / 1998
  • Video data are retrieved and stored in various compressed forms according to their characteristics. In this paper, we present a generic data model that captures the structure of a video document and provides a means for indexing a video stream. Using this model, we design and implement CVIMS (the MPEG-2 Compressed Video Information Management System) to store and retrieve video documents. CVIMS extracts I-frames from MPEG-2 files, selects key-frames from the I-frames, and stores in a database index information such as thumbnails, captions, and picture descriptors of the key-frames. CVIMS also retrieves MPEG-2 video data using the thumbnails of key-frames and various labels of queries.
