• Title/Summary/Keyword: Semantic matching

Search Result 148, Processing Time 0.027 seconds

Question Similarity Measurement of Chinese Crop Diseases and Insect Pests Based on Mixed Information Extraction

  • Zhou, Han;Guo, Xuchao;Liu, Chengqi;Tang, Zhan;Lu, Shuhan;Li, Lin
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.15 no.11
    • /
    • pp.3991-4010
    • /
    • 2021
  • The Question Similarity Measurement of Chinese Crop Diseases and Insect Pests (QSM-CCD&IP) aims to judge the user's tendency to ask questions regarding input problems. The measurement is the basis of the Agricultural Knowledge Question and Answering (Q & A) system, information retrieval, and other tasks. However, the corpus and measurement methods available in this field have some deficiencies. In addition, error propagation may occur when the word boundary features and local context information are ignored when the general method embeds sentences. Hence, these factors make the task challenging. To solve the above problems and tackle the Question Similarity Measurement task in this work, a corpus on Chinese crop diseases and insect pests(CCDIP), which contains 13 categories, was established. Then, taking the CCDIP as the research object, this study proposes a Chinese agricultural text similarity matching model, namely, the AgrCQS. This model is based on mixed information extraction. Specifically, the hybrid embedding layer can enrich character information and improve the recognition ability of the model on the word boundary. The multi-scale local information can be extracted by multi-core convolutional neural network based on multi-weight (MM-CNN). The self-attention mechanism can enhance the fusion ability of the model on global information. In this research, the performance of the AgrCQS on the CCDIP is verified, and three benchmark datasets, namely, AFQMC, LCQMC, and BQ, are used. The accuracy rates are 93.92%, 74.42%, 86.35%, and 83.05%, respectively, which are higher than that of baseline systems without using any external knowledge. Additionally, the proposed method module can be extracted separately and applied to other models, thus providing reference for related research.

Development of Deep Learning-based Automatic Classification of Architectural Objects in Point Clouds for BIM Application in Renovating Aging Buildings (딥러닝 기반 노후 건축물 리모델링 시 BIM 적용을 위한 포인트 클라우드의 건축 객체 자동 분류 기술 개발)

  • Kim, Tae-Hoon;Gu, Hyeong-Mo;Hong, Soon-Min;Choo, Seoung-Yeon
    • Journal of KIBIM
    • /
    • v.13 no.4
    • /
    • pp.96-105
    • /
    • 2023
  • This study focuses on developing a building object recognition technology for efficient use in the remodeling of buildings constructed without drawings. In the era of the 4th industrial revolution, smart technologies are being developed. This research contributes to the architectural field by introducing a deep learning-based method for automatic object classification and recognition, utilizing point cloud data. We use a TD3D network with voxels, optimizing its performance through adjustments in voxel size and number of blocks. This technology enables the classification of building objects such as walls, floors, and roofs from 3D scanning data, labeling them in polygonal forms to minimize boundary ambiguities. However, challenges in object boundary classifications were observed. The model facilitates the automatic classification of non-building objects, thereby reducing manual effort in data matching processes. It also distinguishes between elements to be demolished or retained during remodeling. The study minimized data set loss space by labeling using the extremities of the x, y, and z coordinates. The research aims to enhance the efficiency of building object classification and improve the quality of architectural plans by reducing manpower and time during remodeling. The study aligns with its goal of developing an efficient classification technology. Future work can extend to creating classified objects using parametric tools with polygon-labeled datasets, offering meaningful numerical analysis for remodeling processes. Continued research in this direction is anticipated to significantly advance the efficiency of building remodeling techniques.

A Method to Solve the Entity Linking Ambiguity and NIL Entity Recognition for efficient Entity Linking based on Wikipedia (위키피디아 기반의 효과적인 개체 링킹을 위한 NIL 개체 인식과 개체 연결 중의성 해소 방법)

  • Lee, Hokyung;An, Jaehyun;Yoon, Jeongmin;Bae, Kyoungman;Ko, Youngjoong
    • Journal of KIISE
    • /
    • v.44 no.8
    • /
    • pp.813-821
    • /
    • 2017
  • Entity Linking find the meaning of an entity mention, which indicate the entity using different expressions, in a user's query by linking the entity mention and the entity in the knowledge base. This task has four challenges, including the difficult knowledge base construction problem, multiple presentation of the entity mention, ambiguity of entity linking, and NIL entity recognition. In this paper, we first construct the entity name dictionary based on Wikipedia to build a knowledge base and solve the multiple presentation problem. We then propose various methods for NIL entity recognition and solve the ambiguity of entity linking by training the support vector machine based on several features, including the similarity of the context, semantic relevance, clue word score, named entity type similarity of the mansion, entity name matching score, and object popularity score. We sequentially use the proposed two methods based on the constructed knowledge base, to obtain the good performance in the entity linking. In the result of the experiment, our system achieved 83.66% and 90.81% F1 score, which is the performance of the NIL entity recognition to solve the ambiguity of the entity linking.

Automatic Merging of Distributed Topic Maps based on T-MERGE Operator (T-MERGE 연산자에 기반한 분산 토픽맵의 자동 통합)

  • Kim Jung-Min;Shin Hyo-Pil;Kim Hyoung-Joo
    • Journal of KIISE:Software and Applications
    • /
    • v.33 no.9
    • /
    • pp.787-801
    • /
    • 2006
  • Ontology merging describes the process of integrating two ontologies into a new ontology. How this is done best is a subject of ongoing research in the Semantic Web, Data Integration, Knowledge Management System, and other ontology-related application systems. Earlier research on ontology merging, however, has studied for developing effective ontology matching approaches but missed analyzing and solving methods of problems of merging two ontologies given correspondences between them. In this paper, we propose a specific ontology merging process and a generic operator, T-MERGE, for integrating two source ontologies into a new ontology. Also, we define a taxonomy of merging conflicts which is derived from differing representations between input ontologies and a method for detecting and resolving them. Our T-MERGE operator encapsulates the process of detection and resolution of conflicts and merging two entities based on given correspondences between them. We define a data structure, MergeLog, for logging the execution of T-MERGE operator. MergeLog is used to inform detailed results of execution of merging to users or recover errors. For our experiments, we used oriental philosophy ontologies, western philosophy ontologies, Yahoo western philosophy dictionary, and Naver philosophy dictionary as input ontologies. Our experiments show that the automatic merging module compared with manual merging by a expert has advantages in terms of time and effort.

Metamorphic Malware Detection using Subgraph Matching (행위 그래프 기반의 변종 악성코드 탐지)

  • Kwon, Jong-Hoon;Lee, Je-Hyun;Jeong, Hyun-Cheol;Lee, Hee-Jo
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.21 no.2
    • /
    • pp.37-47
    • /
    • 2011
  • In the recent years, malicious codes called malware are having shown significant increase due to the code obfuscation to evade detection mechanisms. When the code obfuscation technique is applied to malwares, they can change their instruction sequence and also even their signature. These malwares which have same functionality and different appearance are able to evade signature-based AV products. Thus, AV venders paid large amount of cost to analyze and classify malware for generating the new signature. In this paper, we propose a novel approach for detecting metamorphic malwares. The proposed mechanism first converts malware's API call sequences to call graph through dynamic analysis. After that, the callgraph is converted to semantic signature using 128 abstract nodes. Finally, we extract all subgraphs and analyze how similar two malware's behaviors are through subgraph similarity. To validate proposed mechanism, we use 273 real-world malwares include obfuscated malware and analyze 10,100 comparison results. In the evaluation, all metamorphic malwares are classified correctly, and similar module behaviors among different malwares are also discovered.

An Intelligence Support System Research on KTX Rolling Stock Failure Using Case-based Reasoning and Text Mining (사례기반추론과 텍스트마이닝 기법을 활용한 KTX 차량고장 지능형 조치지원시스템 연구)

  • Lee, Hyung Il;Kim, Jong Woo
    • Journal of Intelligence and Information Systems
    • /
    • v.26 no.1
    • /
    • pp.47-73
    • /
    • 2020
  • KTX rolling stocks are a system consisting of several machines, electrical devices, and components. The maintenance of the rolling stocks requires considerable expertise and experience of maintenance workers. In the event of a rolling stock failure, the knowledge and experience of the maintainer will result in a difference in the quality of the time and work to solve the problem. So, the resulting availability of the vehicle will vary. Although problem solving is generally based on fault manuals, experienced and skilled professionals can quickly diagnose and take actions by applying personal know-how. Since this knowledge exists in a tacit form, it is difficult to pass it on completely to a successor, and there have been studies that have developed a case-based rolling stock expert system to turn it into a data-driven one. Nonetheless, research on the most commonly used KTX rolling stock on the main-line or the development of a system that extracts text meanings and searches for similar cases is still lacking. Therefore, this study proposes an intelligence supporting system that provides an action guide for emerging failures by using the know-how of these rolling stocks maintenance experts as an example of problem solving. For this purpose, the case base was constructed by collecting the rolling stocks failure data generated from 2015 to 2017, and the integrated dictionary was constructed separately through the case base to include the essential terminology and failure codes in consideration of the specialty of the railway rolling stock sector. Based on a deployed case base, a new failure was retrieved from past cases and the top three most similar failure cases were extracted to propose the actual actions of these cases as a diagnostic guide. In this study, various dimensionality reduction measures were applied to calculate similarity by taking into account the meaningful relationship of failure details in order to compensate for the limitations of the method of searching cases by keyword matching in rolling stock failure expert system studies using case-based reasoning in the precedent case-based expert system studies, and their usefulness was verified through experiments. Among the various dimensionality reduction techniques, similar cases were retrieved by applying three algorithms: Non-negative Matrix Factorization(NMF), Latent Semantic Analysis(LSA), and Doc2Vec to extract the characteristics of the failure and measure the cosine distance between the vectors. The precision, recall, and F-measure methods were used to assess the performance of the proposed actions. To compare the performance of dimensionality reduction techniques, the analysis of variance confirmed that the performance differences of the five algorithms were statistically significant, with a comparison between the algorithm that randomly extracts failure cases with identical failure codes and the algorithm that applies cosine similarity directly based on words. In addition, optimal techniques were derived for practical application by verifying differences in performance depending on the number of dimensions for dimensionality reduction. The analysis showed that the performance of the cosine similarity was higher than that of the dimension using Non-negative Matrix Factorization(NMF) and Latent Semantic Analysis(LSA) and the performance of algorithm using Doc2Vec was the highest. Furthermore, in terms of dimensionality reduction techniques, the larger the number of dimensions at the appropriate level, the better the performance was found. Through this study, we confirmed the usefulness of effective methods of extracting characteristics of data and converting unstructured data when applying case-based reasoning based on which most of the attributes are texted in the special field of KTX rolling stock. Text mining is a trend where studies are being conducted for use in many areas, but studies using such text data are still lacking in an environment where there are a number of specialized terms and limited access to data, such as the one we want to use in this study. In this regard, it is significant that the study first presented an intelligent diagnostic system that suggested action by searching for a case by applying text mining techniques to extract the characteristics of the failure to complement keyword-based case searches. It is expected that this will provide implications as basic study for developing diagnostic systems that can be used immediately on the site.

Evaluation of Web Service Similarity Assessment Methods (웹서비스 유사성 평가 방법들의 실험적 평가)

  • Hwang, You-Sub
    • Journal of Intelligence and Information Systems
    • /
    • v.15 no.4
    • /
    • pp.1-22
    • /
    • 2009
  • The World Wide Web is transitioning from being a mere collection of documents that contain useful information toward providing a collection of services that perform useful tasks. The emerging Web service technology has been envisioned as the next technological wave and is expected to play an important role in this recent transformation of the Web. By providing interoperable interface standards for application-to-application communication, Web services can be combined with component based software development to promote application interaction and integration both within and across enterprises. To make Web services for service-oriented computing operational, it is important that Web service repositories not only be well-structured but also provide efficient tools for developers to find reusable Web service components that meet their needs. As the potential of Web services for service-oriented computing is being widely recognized, the demand for effective Web service discovery mechanisms is concomitantly growing. A number of techniques for Web service discovery have been proposed, but the discovery challenge has not been satisfactorily addressed. Unfortunately, most existing solutions are either too rudimentary to be useful or too domain dependent to be generalizable. In this paper, we propose a Web service organizing framework that combines clustering techniques with string matching and leverages the semantics of the XML-based service specification in WSDL documents. We believe that this is one of the first attempts at applying data mining techniques in the Web service discovery domain. Our proposed approach has several appealing features : (1) It minimizes the requirement of prior knowledge from both service consumers and publishers; (2) It avoids exploiting domain dependent ontologies; and (3) It is able to visualize the semantic relationships among Web services. We have developed a prototype system based on the proposed framework using an unsupervised artificial neural network and empirically evaluated the proposed approach and tool using real Web service descriptions drawn from operational Web service registries. We report on some preliminary results demonstrating the efficacy of the proposed approach.

  • PDF

A Study on Knowledge Entity Extraction Method for Individual Stocks Based on Neural Tensor Network (뉴럴 텐서 네트워크 기반 주식 개별종목 지식개체명 추출 방법에 관한 연구)

  • Yang, Yunseok;Lee, Hyun Jun;Oh, Kyong Joo
    • Journal of Intelligence and Information Systems
    • /
    • v.25 no.2
    • /
    • pp.25-38
    • /
    • 2019
  • Selecting high-quality information that meets the interests and needs of users among the overflowing contents is becoming more important as the generation continues. In the flood of information, efforts to reflect the intention of the user in the search result better are being tried, rather than recognizing the information request as a simple string. Also, large IT companies such as Google and Microsoft focus on developing knowledge-based technologies including search engines which provide users with satisfaction and convenience. Especially, the finance is one of the fields expected to have the usefulness and potential of text data analysis because it's constantly generating new information, and the earlier the information is, the more valuable it is. Automatic knowledge extraction can be effective in areas where information flow is vast, such as financial sector, and new information continues to emerge. However, there are several practical difficulties faced by automatic knowledge extraction. First, there are difficulties in making corpus from different fields with same algorithm, and it is difficult to extract good quality triple. Second, it becomes more difficult to produce labeled text data by people if the extent and scope of knowledge increases and patterns are constantly updated. Third, performance evaluation is difficult due to the characteristics of unsupervised learning. Finally, problem definition for automatic knowledge extraction is not easy because of ambiguous conceptual characteristics of knowledge. So, in order to overcome limits described above and improve the semantic performance of stock-related information searching, this study attempts to extract the knowledge entity by using neural tensor network and evaluate the performance of them. Different from other references, the purpose of this study is to extract knowledge entity which is related to individual stock items. Various but relatively simple data processing methods are applied in the presented model to solve the problems of previous researches and to enhance the effectiveness of the model. From these processes, this study has the following three significances. First, A practical and simple automatic knowledge extraction method that can be applied. Second, the possibility of performance evaluation is presented through simple problem definition. Finally, the expressiveness of the knowledge increased by generating input data on a sentence basis without complex morphological analysis. The results of the empirical analysis and objective performance evaluation method are also presented. The empirical study to confirm the usefulness of the presented model, experts' reports about individual 30 stocks which are top 30 items based on frequency of publication from May 30, 2017 to May 21, 2018 are used. the total number of reports are 5,600, and 3,074 reports, which accounts about 55% of the total, is designated as a training set, and other 45% of reports are designated as a testing set. Before constructing the model, all reports of a training set are classified by stocks, and their entities are extracted using named entity recognition tool which is the KKMA. for each stocks, top 100 entities based on appearance frequency are selected, and become vectorized using one-hot encoding. After that, by using neural tensor network, the same number of score functions as stocks are trained. Thus, if a new entity from a testing set appears, we can try to calculate the score by putting it into every single score function, and the stock of the function with the highest score is predicted as the related item with the entity. To evaluate presented models, we confirm prediction power and determining whether the score functions are well constructed by calculating hit ratio for all reports of testing set. As a result of the empirical study, the presented model shows 69.3% hit accuracy for testing set which consists of 2,526 reports. this hit ratio is meaningfully high despite of some constraints for conducting research. Looking at the prediction performance of the model for each stocks, only 3 stocks, which are LG ELECTRONICS, KiaMtr, and Mando, show extremely low performance than average. this result maybe due to the interference effect with other similar items and generation of new knowledge. In this paper, we propose a methodology to find out key entities or their combinations which are necessary to search related information in accordance with the user's investment intention. Graph data is generated by using only the named entity recognition tool and applied to the neural tensor network without learning corpus or word vectors for the field. From the empirical test, we confirm the effectiveness of the presented model as described above. However, there also exist some limits and things to complement. Representatively, the phenomenon that the model performance is especially bad for only some stocks shows the need for further researches. Finally, through the empirical study, we confirmed that the learning method presented in this study can be used for the purpose of matching the new text information semantically with the related stocks.