• Title/Summary/Keyword: Semantic Similarity


Improving The Performance of Triple Generation Based on Distant Supervision By Using Semantic Similarity (의미 유사도를 활용한 Distant Supervision 기반의 트리플 생성 성능 향상)

  • Yoon, Hee-Geun;Choi, Su Jeong;Park, Seong-Bae
    • Journal of KIISE
    • /
    • v.43 no.6
    • /
    • pp.653-661
    • /
    • 2016
  • Existing pattern-based triple generation systems built on distant supervision can be undermined by the distant supervision assumption itself. To compensate for this overly strong assumption, previous studies have commonly used statistical information to measure the confidence of patterns. In this study, we propose a more accurate confidence measure based on the semantic similarity between patterns and properties. Word embeddings, an unsupervised learning method, and WordNet-based similarity measures were adopted to learn the meanings of words and to measure semantic similarity. To resolve the language mismatch between patterns and properties, we adopted CCA to align bilingual word embedding models and a translation-based approach for the WordNet-based measure. Our experiments showed that the accuracy of triples filtered by the semantic similarity-based confidence measure was 16% higher than that of the statistics-based approach. These results suggest that the semantic similarity-based confidence measure is more effective than the statistics-based approach for generating high-quality triples.
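As a rough illustration of a similarity-based confidence filter of the kind this abstract describes, the sketch below scores each pattern by the cosine similarity between its embedding and a property embedding, then keeps the patterns above a threshold. The abstract does not give the actual measure, so the function names, the threshold, and the toy two-dimensional vectors are all assumptions. The vectors are also assumed to already live in one shared space (for example after a CCA alignment step, which is not shown).

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def filter_patterns(pattern_vecs, property_vec, threshold=0.5):
    """Keep patterns whose embedding is similar enough to the property embedding.

    pattern_vecs: dict mapping a pattern string to its (hypothetical) embedding.
    threshold: illustrative cut-off, not a value taken from the paper.
    """
    return [
        pattern
        for pattern, vec in pattern_vecs.items()
        if cosine(vec, property_vec) >= threshold
    ]


# Toy, hand-made vectors for illustration only.
birth_place = [1.0, 0.0]
patterns = {
    "was born in": [0.9, 0.1],
    "visited": [0.1, 0.9],
}
kept = filter_patterns(patterns, birth_place)
print(kept)
```

With these toy vectors only the first pattern passes the threshold; real embeddings would have hundreds of dimensions and a tuned cut-off.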

Do Words in Central Bank Press Releases Affect Thailand's Financial Markets?

  • CHATCHAWAN, Sapphasak
    • The Journal of Asian Finance, Economics and Business
    • /
    • v.8 no.4
    • /
    • pp.113-124
    • /
    • 2021
  • The study investigates how financial markets respond to a shock to the tone and semantic similarity of Bank of Thailand press releases. Natural language processing techniques are employed to quantify the tone and the semantic similarity of 69 press releases from 2010 to 2018. The corpus of press releases is accessible to the general public. Stock market returns and bond yields are measured by the logged return on the SET50 and by short-term and long-term government bonds, respectively. The data are daily, from January 4, 2010, to August 8, 2019. The study uses the Structural Vector Auto Regressive model (SVAR) to analyze the effects of unanticipated and temporary shocks to the tone and the semantic similarity on bond yields and stock market returns. Impulse response functions are also constructed for the analysis. The results show that 1-month, 3-month, 6-month and 1-year bond yields significantly increase in response to a positive shock to the tone of press releases, and that 1-month, 3-month, 6-month, 1-year and 25-year bond yields significantly increase in response to a positive shock to the semantic similarity. Interestingly, stock market returns obtained from the SET50 index do not respond significantly to shocks to the tone or the semantic similarity of the press releases.

Semantic Conceptual Relational Similarity Based Web Document Clustering for Efficient Information Retrieval Using Semantic Ontology

  • Selvalakshmi, B;Subramaniam, M;Sathiyasekar, K
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.15 no.9
    • /
    • pp.3102-3119
    • /
    • 2021
  • In the rapidly growing web era, web publication is largely about accessing web resources. Because of the increasing size of the web, search engines face many challenges both in indexing web pages and in producing results for user queries. The web document clustering methods discussed in the literature struggle to achieve high clustering accuracy. The proposed scheme, a Semantic Conceptual Relational Similarity (SCRS) based clustering algorithm, mitigates this problem by considering the relationship of a document in two ways when measuring similarity. The first is the number of semantic relations of a document class that are covered by the input document, and the second is the number of conceptual relations that the input document covers towards a document class. Given a data set Ds, the method estimates the SCRS measure of each document Di towards each available class of documents. The class with the maximum SCRS is then identified, and the document is indexed under the selected class. The SCRS measure is computed according to the semantic relevancy of the input document towards each document of a class. Similarly, a Query Relational Semantic Score (QRSS) is measured for the input query towards each class of documents. Based on the QRSS value, the document class is identified, and its documents are retrieved and ranked by QRSS to produce the final population. In both cases, the semantic measures are estimated based on the concepts available in a semantic ontology. The proposed method produces efficient indexing results and also improves search efficiency.

Similarity measure for P2P processing of semantic data (시맨틱웹 데이터의 P2P 처리를 위한 유사도 측정)

  • Kim, Byung Gon;Kim, Youn Hee
    • Journal of Korea Society of Digital Industry and Information Management
    • /
    • v.6 no.4
    • /
    • pp.11-20
    • /
    • 2010
  • Ontologies play an important role in the semantic web for constructing and querying semantic data. Because of the dynamic nature of ontologies, a P2P environment is considered for ontology processing on the web. For efficient ontology processing in a P2P environment, clustering of peers should be considered. When a new peer is added to the network, allocating it to a cluster is important for system efficiency. To cluster peers with similar characteristics, a method is needed for measuring the similarity between the ontology of the added peer and the ontologies in the existing clusters. In this paper, we propose similarity measure techniques for ontologies for the clustering of peers. The similarity measure method in this paper considers an ontology's structural characteristics, such as schema, class, and property. Experimental results show that ontologies with similar topics, classes, and properties can be allocated to the same cluster.

Research on Comparing System with Syntactic-Semantic Tree in Subjective-type Grading (주관식 문제 채점에서의 구문의미트리 비교 시스템에 대한 연구)

  • Kang, WonSeog
    • The Journal of Korean Association of Computer Education
    • /
    • v.20 no.5
    • /
    • pp.79-88
    • /
    • 2017
  • To improve the grading of subjective questions, we need syntactic-semantic analysis to analyze the syntactic-semantic relations between the words in an answer. However, since a syntactic-semantic tree encodes structural and semantic relations between words, we cannot apply methods that calculate the similarity between vectors. This paper proposes a system for comparing syntactic-semantic trees, which have structural and semantic relations between words. We suggest similarity calculation principles for comparing the trees and verify the principles through experiments. This system will help grade subjective questions by comparing the trees and can also be used to identify similar documents.

Semantic Trajectory Based Behavior Generation for Groups Identification

  • Cao, Yang;Cai, Zhi;Xue, Fei;Li, Tong;Ding, Zhiming
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.12 no.12
    • /
    • pp.5782-5799
    • /
    • 2018
  • With the development of GPS and the popularity of mobile devices with positioning capability, collecting massive amounts of trajectory data has become feasible and easy. The daily trajectories of moving objects convey a concise overview of their behaviors. Different social roles have different trajectory patterns. Therefore, we can identify users or groups based on similar trajectory patterns by mining implicit life patterns. However, most existing studies on mining daily trajectories focus on the spatial and temporal analysis of raw trajectory data and miss the essential semantic information or behaviors. In this paper, we propose a novel trajectory semantics calculation method to identify groups that have similar behaviors. In our model, we first propose a fast and efficient approach for extracting stay regions from daily trajectories, then generate semantic trajectories by enriching the stay regions with semantic labels. To measure the similarity between semantic trajectories, we design a semantic similarity measure model based on spatial and temporal similarity factors. Furthermore, a pruning strategy is proposed to reduce tedious calculations and comparisons. We have conducted extensive experiments on the real trajectory dataset of the Geolife project, and the experimental results show that our proposed method is both effective and efficient.
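The abstract does not describe its stay-region extraction algorithm, so the following is only a minimal sketch of a common stay-point detection heuristic, not the authors' method. A stay is reported when consecutive points remain within a distance threshold of an anchor point for at least a minimum duration. The function name, the `(t, x, y)` point layout, the planar coordinates, and both thresholds are assumptions made for illustration.

```python
import math


def stay_points(points, dist_thresh, time_thresh):
    """Detect stays in a trajectory.

    points: list of (t, x, y) tuples sorted by time t, in planar coordinates.
    Returns a list of (mean_x, mean_y, t_arrive, t_leave) tuples.
    """
    stays = []
    i, n = 0, len(points)
    while i < n:
        # Extend j while points stay within dist_thresh of the anchor points[i].
        j = i + 1
        while j < n and math.dist(points[i][1:], points[j][1:]) <= dist_thresh:
            j += 1
        duration = points[j - 1][0] - points[i][0]
        if duration >= time_thresh:
            cluster = points[i:j]
            mean_x = sum(p[1] for p in cluster) / len(cluster)
            mean_y = sum(p[2] for p in cluster) / len(cluster)
            stays.append((mean_x, mean_y, points[i][0], points[j - 1][0]))
            i = j  # continue after the detected stay
        else:
            i += 1  # no stay anchored here; try the next point
    return stays


# Toy trajectory: a stay near (0, 0), a transit point, then a stay near (100, 100).
track = [
    (0, 0, 0), (10, 1, 0), (20, 0, 1),
    (30, 50, 50),
    (40, 100, 100), (50, 100, 101), (60, 101, 100),
]
stays = stay_points(track, dist_thresh=5, time_thresh=15)
print(stays)
```

Each detected stay could then be enriched with a semantic label (for example the nearest point of interest) to form a semantic trajectory, which is the step the abstract describes next.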

Semantic Process Retrieval with Similarity Algorithms (유사도 알고리즘을 활용한 시맨틱 프로세스 검색방안)

  • Lee, Hong-Joo;Klein, Mark
    • Asia pacific journal of information systems
    • /
    • v.18 no.1
    • /
    • pp.79-96
    • /
    • 2008
  • One of the roles of the Semantic Web services is to execute dynamic intra-organizational services including the integration and interoperation of business processes. Since different organizations design their processes differently, the retrieval of similar semantic business processes is necessary in order to support inter-organizational collaborations. Most approaches for finding services that have certain features and support certain business processes have relied on some type of logical reasoning and exact matching. This paper presents our approach of using imprecise matching to expand the results from an exact matching engine when querying the OWL (Web Ontology Language) MIT Process Handbook. The MIT Process Handbook is an electronic repository of best-practice business processes. The Handbook is intended to help people (1) redesign organizational processes, (2) invent new processes, and (3) share ideas about organizational practices. In order to use the MIT Process Handbook for process retrieval experiments, we had to export it into an OWL-based format. We model the Process Handbook meta-model in OWL and export the processes in the Handbook as instances of the meta-model. Next, we need to find a sizable number of queries and their corresponding correct answers in the Process Handbook. Many previous studies devised artificial datasets composed of randomly generated numbers without real meaning and used subjective ratings for correct answers and similarity values between processes. To generate a semantics-preserving test data set, we create 20 variants for each target process that are syntactically different but semantically equivalent, using mutation operators. These variants represent the correct answers for the target process. We devise diverse similarity algorithms based on the values of process attributes and the structures of business processes.
We use simple similarity algorithms for text retrieval, such as TF-IDF and Levenshtein edit distance, to devise our approaches, and we use a tree edit distance measure because semantic processes appear to have a graph structure. We also design similarity algorithms that consider the similarity of process structure, such as part process, goal, and exception. Since we can identify relationships between a semantic process and its subcomponents, this information can be used to calculate similarities between processes. Dice's coefficient and Jaccard similarity measures are used to calculate the proportion of overlap between processes in diverse ways. We perform retrieval experiments to compare the performance of the devised similarity algorithms. We measure retrieval performance in terms of precision, recall, and F-measure, the harmonic mean of precision and recall. The tree edit distance shows the poorest performance on all measures. TF-IDF and the method combining the TF-IDF measure with Levenshtein edit distance perform better than the other devised methods. These two measures focus on the similarity between the names and descriptions of processes. In addition, we calculate a rank correlation coefficient, Kendall's tau b, between the number of process mutations and the ranking of similarity values among the mutation sets. In this experiment, similarity measures based on process structure, such as Dice's, Jaccard, and derivatives of these measures, show greater coefficients than measures based on the values of process attributes. However, the Lev-TFIDF-JaccardAll measure, which considers process structure and attribute values together, performs reasonably well in both experiments. For retrieving semantic processes, it therefore appears better to consider diverse aspects of process similarity, such as process structure and the values of process attributes.
We generate semantic process data and a dataset for retrieval experiments from the MIT Process Handbook repository. We suggest imprecise query algorithms that expand the retrieval results of an exact matching engine such as SPARQL, and we compare the retrieval performance of the similarity algorithms. As for limitations and future work, we need to perform experiments with datasets from other domains. In addition, since there are many similarity values from diverse measures, we may find better ways to identify relevant processes by applying these values simultaneously.
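Several of the measures named in this abstract have standard textbook definitions, sketched below for reference: Jaccard similarity, Dice's coefficient, Levenshtein edit distance, and the F-measure as the harmonic mean of precision and recall. This is not the authors' implementation. The function names and the toy inputs (sets of sub-process names, process identifiers) are assumptions, and the composite Lev-TFIDF-JaccardAll measure is not reproduced because the abstract does not define it.

```python
def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| (1.0 for two empty sets)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def dice(a, b):
    """Dice's coefficient 2|A ∩ B| / (|A| + |B|) (1.0 for two empty sets)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


def levenshtein(s, t):
    """Edit distance with unit-cost insert, delete, and substitute."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # delete from s
                            curr[j - 1] + 1,      # insert into s
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def f_measure(retrieved, relevant):
    """Harmonic mean of precision and recall (0.0 when nothing relevant is retrieved)."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    if hits == 0:
        return 0.0
    precision = hits / len(retrieved)
    recall = hits / len(relevant)
    return 2 * precision * recall / (precision + recall)


# Toy inputs: sub-process names of two processes, and a retrieval result.
parts_a = {"receive order", "check stock", "ship"}
parts_b = {"check stock", "ship", "invoice"}
print(jaccard(parts_a, parts_b), dice(parts_a, parts_b))
print(levenshtein("kitten", "sitting"))
print(f_measure(["p1", "p2", "p3", "p4"], ["p1", "p2", "p5"]))
```

Jaccard and Dice rank pairs identically but Dice gives higher values for partial overlap, which matters only when scores from different measures are combined or thresholded.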

An Artificial Intelligence Approach for Word Semantic Similarity Measure of Hindi Language

  • Younas, Farah;Nadir, Jumana;Usman, Muhammad;Khan, Muhammad Attique;Khan, Sajid Ali;Kadry, Seifedine;Nam, Yunyoung
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.15 no.6
    • /
    • pp.2049-2068
    • /
    • 2021
  • AI combined with NLP techniques has promoted the use of Virtual Assistants and has made people rely on them for many diverse uses. Conversational Agents are the most promising technique for assisting computer users in their operations. An important challenge in developing Conversational Agents globally is transferring the groundbreaking expertise obtained in English to other languages. AI is making it possible to transfer this learning. There is a dire need to develop systems that understand secular languages. One such difficult language is Hindi, which is the fourth most spoken language in the world. Semantic similarity is an important part of Natural Language Processing and is involved in applications such as ontology learning and information extraction that are used in developing conversational agents. Most research concentrates on English and other European languages. This paper presents a corpus-based word semantic similarity measure for Hindi. An experiment involving the translation of an English benchmark dataset into Hindi is performed, investigating the incorporation of the corpus with human and machine similarity ratings. The correlation between human intuition and the algorithm's ratings, which was found to be significant, is used to analyze the accuracy of the proposed similarity measures. The method can be adapted to various applications of word semantic similarity or used as a module for other languages.
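The abstract does not say which correlation statistic was used to compare human and algorithm ratings. As one plausible choice, the sketch below computes the Pearson correlation coefficient between two lists of ratings. The function name and the rating values are assumptions for illustration and are not data from the paper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length rating lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two ratings")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for constant ratings")
    return cov / (spread_x * spread_y)


# Made-up ratings for five word pairs (illustrative only).
human_ratings = [3.9, 3.1, 2.0, 0.8, 0.2]
algorithm_ratings = [0.92, 0.70, 0.55, 0.30, 0.05]
r = pearson(human_ratings, algorithm_ratings)
print(round(r, 3))
```

A rank-based statistic such as Spearman's coefficient is a common alternative when only the ordering of word pairs matters.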

Semantic Similarity Calculation based on Siamese TRAT (트랜스포머 인코더와 시암넷 결합한 시맨틱 유사도 알고리즘)

  • Lu, Xing-Cen;Joe, Inwhee
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2021.05a
    • /
    • pp.397-400
    • /
    • 2021
  • To address the problem that existing computing methods cannot adequately represent the semantic features of sentences, we propose Siamese TRAT, a semantic feature extraction model based on the Transformer encoder. The Transformer model is used to fully extract the semantic information within sentences and to carry out deep semantic encoding of sentences. In addition, an interactive attention mechanism is introduced to extract the similar features that associate two sentences, which makes the model better at capturing the important semantic information inside a sentence. As a result, it improves the semantic understanding and generalization ability of the model. The experimental results show that the proposed model significantly improves accuracy on the semantic similarity calculation task for English and Chinese and is more effective than existing methods.

A Semantic Aspect-Based Vector Space Model to Identify the Event Evolution Relationship within Topics

  • Xi, Yaoyi;Li, Bicheng;Liu, Yang
    • Journal of Computing Science and Engineering
    • /
    • v.9 no.2
    • /
    • pp.73-82
    • /
    • 2015
  • Understanding how a topic evolves is an important and challenging task. A topic usually consists of multiple related events, and the accurate identification of event evolution relationships plays an important role in topic evolution analysis. Existing research has used the traditional vector space model to represent events, which cannot accurately compute the semantic similarity between events. This has led to poor performance in identifying event evolution relationships. This paper suggests constructing a semantic aspect-based vector space model to represent events: first, use a hierarchical Dirichlet process to mine the semantic aspects; then, construct a semantic aspect-based vector space model according to these aspects; finally, represent each event as a point and measure the semantic relatedness between events in the space. According to our evaluation experiments, the performance of the proposed technique is promising and significantly better than that of the baseline methods.