• Title/Abstract/Keywords: Semantic Similarity

Search results: 280 items, processing time 0.035 seconds

The Need for Paradigm Shift in Semantic Similarity and Semantic Relatedness: From Cognitive Semantics Perspective

  • 최영석;박진수
    • 지능정보연구
    • /
    • Vol. 19, No. 1
    • /
    • pp.111-123
    • /
    • 2013
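  • Research on measuring semantic similarity and relatedness between concepts has been applied in a wide range of areas: database integration and system integration in classical studies, and tag and keyword extraction and related-word recommendation in more recent work. The field has a long history and has attracted considerable interest across management information systems, computer science, and computational linguistics. Until now, however, relatedness between concepts has mostly been computed using pre-built dictionaries or other reference semantic networks. Such approaches generally do not consider the possibility that semantic relations between concepts may change. Yet advances in information technology and rapid social change are in fact altering the semantic relations between concepts. Social events and cultural shifts naturally change these relations, and such changes are shared quickly with the help of information and communication technology. Although semantic relations between concepts can thus change rapidly with time or context, existing research on semantic similarity and relatedness has not offered answers to this new problem of 'changing semantic relations'. This study therefore argues, from the perspective of cognitive semantics, for the necessity and legitimacy of a shift from the existing 'static semantic relatedness paradigm' to a 'dynamic semantic relatedness paradigm'. By seeking in cognitive semantics the theoretical grounds for why the semantic relations humans perceive between concepts can change, it sets out a concrete direction for the paradigm shift. It also proposes concrete directions for future research on semantic similarity and relatedness in line with this shift, offering guidelines for researchers in the field.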

Korean Semantic Similarity Measures for the Vector Space Models

  • Lee, Young-In;Lee, Hyun-jung;Koo, Myoung-Wan;Cho, Sook Whan
    • 말소리와 음성과학
    • /
    • Vol. 7, No. 4
    • /
    • pp.49-55
    • /
    • 2015
  • It is argued in this paper that, in determining semantic similarity, Korean words should be recategorized with a focus on the semantic relation to ontology in light of cross-linguistic morphological variations. It is proposed, in particular, that Korean semantic similarity should be measured on three tracks: a human judgements track, a relatedness track, and a cross-part-of-speech relations track. As demonstrated in Yang et al. (2015), GloVe, an unsupervised learning model for semantic similarity, is applicable to Korean, with its performance being compared with human judgement results. Based on this compatibility, it was further thought that the model's performance would most likely vary with different kinds of specific relations in different languages. An attempt was made to analyze them in terms of two major Korean-specific categories involved in their lexical and cross-POS relations. It is concluded that languages must be analyzed by varying methods so that semantic components across languages may allow varying semantic distance in the vector space models.
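The comparison between model similarities and human judgements described in this abstract is commonly reported as a rank correlation. The following is a minimal sketch of that kind of evaluation, not the authors' procedure: the three-dimensional vectors stand in for trained GloVe embeddings, and the word pairs, the scores in `human_scores`, and all function names are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical toy embeddings (assumption: real use would load trained vectors).
vectors = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.2, 0.1],
    "car": [0.1, 0.9, 0.2],
    "bus": [0.3, 0.7, 0.4],
}
pairs = [("cat", "dog"), ("car", "bus"), ("cat", "car"), ("dog", "bus")]
human_scores = [9.0, 8.5, 2.0, 2.5]  # hypothetical human similarity ratings

model_scores = [cosine(vectors[a], vectors[b]) for a, b in pairs]
rho = spearman(model_scores, human_scores)
print(f"Spearman correlation: {rho:.3f}")
```

A rank correlation is used here because model cosines and human ratings live on different scales; only the ordering of the pairs is compared.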

A Text Similarity Measurement Method Based on Singular Value Decomposition and Semantic Relevance

  • Li, Xu;Yao, Chunlong;Fan, Fenglong;Yu, Xiaoqiang
    • Journal of Information Processing Systems
    • /
    • Vol. 13, No. 4
    • /
    • pp.863-875
    • /
    • 2017
  • Traditional text similarity measures based on word frequency vectors ignore the semantic relationships between words, which, together with the high dimensionality and sparsity of document vectors, has become an obstacle to text similarity calculation. To address these problems, an improved singular value decomposition is used to reduce the dimensionality and remove noise from the text representation model. The optimal number of singular values is analyzed, and the semantic relevance between words can be calculated in the constructed semantic space. An inverted index construction algorithm and similarity definitions between vectors are proposed to calculate the similarity between two documents at the semantic level. Experimental results on a benchmark corpus demonstrate that the proposed method improves the F-measure.

A Method of Service Refinement for Network-Centric Operational Environment

  • Lee, Haejin;Kang, Dongsu
    • 한국컴퓨터정보학회논문지
    • /
    • Vol. 21, No. 12
    • /
    • pp.97-105
    • /
    • 2016
  • Network-Centric Operational Environment (NCOE) services are becoming critical in today's military network environment, because service reusability and interaction are increasingly important in business processes as well. However, the refinement of services by semantic similarity and functional similarity at the business process level has not yet been detailed. To enhance the accuracy of business service refinement, the authors introduce a method for refining services by semantic similarity and functional similarity in a BPMN model. The business processes are designed in a BPMN model. In this model, candidate services are refined by binding related activities according to the analysis of semantic similarity based on WordNet and of functional similarity based on the property specifications between activities. The services are then identified by refining the candidate services. The proposed method is expected to enhance the accuracy and modularity of service identification. It may also accelerate the development of more standardized service refinement.

A Comparison between Factor Structure and Semantic Representation of Personality Test Items Using Latent Semantic Analysis

  • 박성준;박희영;김청택
    • 인지과학
    • /
    • Vol. 30, No. 3
    • /
    • pp.133-156
    • /
    • 2019
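  • This study explored the semantic representation of test items in order to investigate how test takers understand them. Using latent semantic analysis, we proposed a semantic similarity matrix representing the similarity between the semantic representations of personality test items and personality factors, and compared it with the results of conventional exploratory factor analysis. To this end, in a preliminary study we collected texts from 154 university students describing each of the Big Five personality factors in a restricted context, and on this basis constructed a semantic space reduced to five dimensions. Study 1 compared the factor loading matrix of the short-form Korean BFI with the semantic similarity matrix generated from the semantic space built in the preliminary study, and showed that the two matrices are highly positively correlated. In Study 2, personality test items were generated based on semantic similarity, test takers' responses were collected, and a factor structure was derived through exploratory factor analysis, showing that the two matrices are similar. In conclusion, this study proposed a method for inferring construct validity by analyzing the semantic representation of test items without test takers' responses, and showed that the factor structure of a personality test can be interpreted as the similarity between the semantic representations of test items and personality factors. These results may be of practical help in developing personality tests.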

Development of the Recommender System of Arabic Books Based on the Content Similarity

  • Alotaibi, Shaykhah Hajed;Khan, Muhammad Badruddin
    • International Journal of Computer Science & Network Security
    • /
    • Vol. 22, No. 8
    • /
    • pp.175-186
    • /
    • 2022
  • This research article develops a recommendation system for Arabic books based on content similarity, which helps users search for the right book and predicts suitable books that match their literary style. The system directs its users toward books that can meet their needs from a large dataset of information. It makes its predictions from a set of data gathered from different books, which it converts to vectors using TF-IDF. The recommendation algorithms, namely cosine similarity, sequence matcher similarity, and semantic similarity, then aggregate the data to produce an efficient and effective recommendation. This approach is advantageous in recommending previously unrated books to users with unique interests. The results show that the cosine similarity of the full content of the books, the sequence matcher similarity of the Arabic titles, and the semantic similarity of the English titles give the best results, and are extremely close to the average human-assigned (annotated) similarity. A Flask web application with a simple interface was developed to show the recommended Arabic books using the cosine similarity, sequence matcher similarity, and semantic similarity algorithms, together with all the experiments conducted.
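Two of the measures named in this abstract can be illustrated with the Python standard library alone. The sketch below is not the authors' implementation: the smoothed IDF formula is one common variant chosen for illustration, "sequence matcher similarity" is assumed to mean `difflib.SequenceMatcher`, and the sample `books` texts and all function names are hypothetical.

```python
import math
from collections import Counter
from difflib import SequenceMatcher

def tf_idf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            # Smoothed IDF (an assumed variant), always positive.
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors

def cosine_similarity(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def title_similarity(a, b):
    """Character-level similarity of two titles, in [0, 1]."""
    return SequenceMatcher(None, a, b).ratio()

# Hypothetical book descriptions, for illustration only.
books = [
    "a history of arabic poetry and classical verse",
    "modern arabic poetry and its classical roots",
    "introduction to organic chemistry",
]
vecs = tf_idf_vectors(books)
sim_poetry = cosine_similarity(vecs[0], vecs[1])
sim_unrelated = cosine_similarity(vecs[0], vecs[2])
print(f"poetry vs poetry:    {sim_poetry:.3f}")
print(f"poetry vs chemistry: {sim_unrelated:.3f}")
print(f"title similarity:    {title_similarity('Arabic Poetry', 'Arabic Poems'):.3f}")
```

Cosine similarity over TF-IDF vectors rewards shared vocabulary, so documents with no terms in common score zero, whereas the sequence matcher compares character sequences and suits short strings such as titles.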

Assessment of performance of machine learning based similarities calculated for different English translations of Holy Quran

  • Al Ghamdi, Norah Mohammad;Khan, Muhammad Badruddin
    • International Journal of Computer Science & Network Security
    • /
    • Vol. 22, No. 4
    • /
    • pp.111-118
    • /
    • 2022
  • This research article presents work on applying different machine learning based similarity techniques to religious text in order to identify similarities and differences among its various translations. The dataset includes 10 different English translations of the verses (Arabic: Ayah) of two Surahs (chapters), namely Al-Humazah and An-Nasr. Quantitative similarity values for different translations of the same verse were calculated using cosine similarity and semantic similarity. The corpus went through two series of experiments: before pre-processing and after pre-processing. To determine the performance of the machine learning based similarities, human annotated similarities between translations of the two Surahs were recorded to construct the ground truth. The average difference between the human annotated similarity and the cosine similarity for Surah Al-Humazah was found to be 1.38 per verse (Ayah) per pair of translations. After pre-processing, the average difference increased to 2.24. The average difference between the human annotated similarity and the semantic similarity for Surah Al-Humazah was found to be 0.09 per verse (Ayah) per pair of translations. After pre-processing, it increased to 0.78. For Surah An-Nasr, before pre-processing, the average difference between the human annotated similarity and the cosine similarity was found to be 1.93 per verse (Ayah) per pair of translations. After pre-processing, the average difference further increased to 2.47. The average difference between the human annotated similarity and the semantic similarity for Surah An-Nasr was found to be 0.93 before pre-processing and was reduced to 0.87 per verse (Ayah) per pair of translations after pre-processing. The results showed that, as expected, semantic similarity proved to be the better indicator for measuring word meaning.
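The "average difference per verse per pair of translations" reported above can be read as a mean over all (verse, translation pair) combinations. The sketch below shows that arithmetic under stated assumptions: the abstract does not say whether the difference is absolute or signed, so the absolute value is an assumption, both scores are assumed to be on the same scale, and the keys and scores are hypothetical rather than taken from the paper.

```python
def mean_abs_difference(human, machine):
    """Mean absolute gap between two score tables.

    Both arguments map (verse, translation_a, translation_b) to a score.
    Only keys present in both tables are compared.
    """
    shared = sorted(human.keys() & machine.keys())
    if not shared:
        raise ValueError("no overlapping (verse, pair) keys to compare")
    return sum(abs(human[key] - machine[key]) for key in shared) / len(shared)

# Hypothetical scores for one verse and three pairs of translations.
human_scores = {
    (1, "A", "B"): 4.0,
    (1, "A", "C"): 3.0,
    (1, "B", "C"): 5.0,
}
machine_scores = {
    (1, "A", "B"): 3.5,
    (1, "A", "C"): 4.0,
    (1, "B", "C"): 5.0,
}

average_gap = mean_abs_difference(human_scores, machine_scores)
print(f"average difference: {average_gap:.2f}")
```

A smaller value means the machine similarity tracks the human annotation more closely, which is how the abstract compares the cosine and semantic measures.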

A Multi-Agent Improved Semantic Similarity Matching Algorithm Based on Ontology Tree

  • ;조영임
    • 제어로봇시스템학회논문지
    • /
    • Vol. 18, No. 11
    • /
    • pp.1027-1033
    • /
    • 2012
  • Semantic-based information retrieval techniques understand the meanings of the concepts that users specify in their queries, but traditional semantic matching methods based on the ontology tree have three weaknesses that may lead to many false matches and thus lower precision. To improve the matching precision and recall of information retrieval, this paper proposes a multi-agent improved semantic similarity matching algorithm based on the ontology tree, which can avoid considerable computational redundancy and mismatching during the entire matching process. Experiments on the algorithm show improvements in precision and recall compared with information retrieval techniques based on traditional semantic similarity matching methods.
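For context on what a traditional ontology-tree similarity looks like, the sketch below implements the classic Wu-Palmer measure, which scores two concepts by the depth of their lowest common ancestor. This is a well-known baseline chosen for illustration, not the multi-agent algorithm the paper proposes; the small `parent` tree and all function names are hypothetical.

```python
# Hypothetical ontology tree, stored as child -> parent ("entity" is the root).
parent = {
    "animal": "entity",
    "vehicle": "entity",
    "mammal": "animal",
    "bird": "animal",
    "dog": "mammal",
    "cat": "mammal",
    "car": "vehicle",
}

def path_to_root(concept):
    """List of nodes from the concept up to the root, inclusive."""
    path = [concept]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path

def depth(concept):
    """Depth counted in nodes, so the root has depth 1."""
    return len(path_to_root(concept))

def lowest_common_ancestor(a, b):
    """Deepest node that is an ancestor of (or equal to) both concepts."""
    ancestors_a = set(path_to_root(a))
    for node in path_to_root(b):
        if node in ancestors_a:
            return node
    raise ValueError(f"{a!r} and {b!r} share no ancestor")

def wu_palmer(a, b):
    """Wu-Palmer similarity: 2 * depth(lca) / (depth(a) + depth(b))."""
    lca = lowest_common_ancestor(a, b)
    return 2 * depth(lca) / (depth(a) + depth(b))

print(wu_palmer("dog", "cat"))  # siblings under "mammal"
print(wu_palmer("dog", "car"))  # related only through the root
```

Concepts that meet deep in the tree score close to 1, while concepts that meet only at the root score low, which is the intuition tree-based matching methods build on.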

KNN-based Image Annotation by Collectively Mining Visual and Semantic Similarities

  • Ji, Qian;Zhang, Liyan;Li, Zechao
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • Vol. 11, No. 9
    • /
    • pp.4476-4490
    • /
    • 2017
  • The aim of image annotation is to determine labels that can accurately describe the semantic information of images. Many approaches have been proposed to automate the image annotation task while achieving good performance. However, in most cases, the semantic similarities of images are ignored. To address this, we propose a novel Visual-Semantic Nearest Neighbor (VS-KNN) method that collectively explores visual and semantic similarities for image annotation. First, for each label, visual nearest neighbors of a given test image are constructed from training images associated with this label. Second, each neighboring subset is determined by mining the semantic similarity and the visual similarity. Finally, the relevance between the images and labels is determined based on maximum a posteriori estimation. Extensive experiments were conducted using three widely used image datasets. The experimental results show the effectiveness of the proposed method in comparison with state-of-the-art methods.

GORank: Semantic Similarity Search for Gene Products using Gene Ontology

  • 김기성;유상원;김형주
    • 한국정보과학회논문지:데이타베이스
    • /
    • Vol. 33, No. 7
    • /
    • pp.682-692
    • /
    • 2006
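  • Searching for gene products with similar biological characteristics is an essential technique in bioinformatics research. Most biological databases currently describe the biological characteristics of gene products using Gene Ontology terms. This paper proposes a method for retrieving semantically similar gene products using this annotation information. To this end, we first define a semantic similarity between gene products based on information theory, and then propose a semantic similarity search algorithm that uses it. To process semantic similarity searches, we use a variant of Fagin's threshold algorithm, modified as follows. First, because the similarity function used is not monotonically increasing, we redefine the threshold to suit it. We also propose a cluster-skipping technique, which uses the structure of the inverted index lists to skip intermediate lookups, and an access order for the inverted index lists. Performance was evaluated using the actual GO and annotation data, and the proposed algorithm was shown to be efficient.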
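As background for the modification described in this abstract, the sketch below shows the classic form of Fagin's threshold algorithm for top-k retrieval over several score lists. It assumes a monotone aggregate (a plain sum of non-negative scores), which is exactly the property the paper says its similarity function lacks, so this is the unmodified baseline and not GORank itself. The gene product identifiers, scores, and function name are hypothetical.

```python
import heapq

def threshold_top_k(score_lists, k):
    """Classic threshold algorithm with sum as the aggregate function.

    score_lists: non-empty list of dicts mapping object -> non-negative score.
    An object missing from a list is treated as scoring 0 there.
    Returns the k best (object, total_score) pairs, best first.
    """
    # Sorted access: each list ordered by descending score.
    sorted_lists = [
        sorted(scores.items(), key=lambda item: item[1], reverse=True)
        for scores in score_lists
    ]
    totals = {}
    max_depth = max(len(lst) for lst in sorted_lists)
    for position in range(max_depth):
        threshold = 0.0
        for lst in sorted_lists:
            if position < len(lst):
                obj, score = lst[position]
                threshold += score
                if obj not in totals:
                    # Random access: fetch this object's score from every list.
                    totals[obj] = sum(s.get(obj, 0.0) for s in score_lists)
        top = heapq.nlargest(k, totals.items(), key=lambda item: item[1])
        # No unseen object can beat the threshold, so stop once the
        # current k-th best total has reached it.
        if len(top) == k and top[-1][1] >= threshold:
            return top
    return heapq.nlargest(k, totals.items(), key=lambda item: item[1])

# Hypothetical per-term scores for three gene products.
term_a = {"g1": 0.9, "g2": 0.8, "g3": 0.1}
term_b = {"g2": 0.7, "g3": 0.6, "g1": 0.2}

best = threshold_top_k([term_a, term_b], k=1)
print(best)
```

In this example the search stops after reading two entries from each list, because the best total seen so far already meets the threshold formed from the scores at the current position. The early stop is only sound when the aggregate is monotone, which is why a non-monotone similarity needs a redefined threshold.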