A Study on the Retrieval Effectiveness of KoreaMed using MeSH Search Filter and Word-Proximity Search (검색용 MeSH 필터와 단어인접탐색 기법을 활용한 KoreaMed 검색 효율성 향상 연구)

  • Jeong, So-Na; Jeong, Ji-Na
    • Journal of the Korea Academia-Industrial cooperation Society, v.18 no.5, pp.596-607, 2017
  • This study examined a method for adding terms related to "stomach neoplasms" as filters to the Medical Subject Headings (MeSH) used for search, as well as a method for improving search efficiency through word-proximity search, which measures the distance between co-occurring terms. The titles of 8,625 articles published between 2007 and 2016 with "stomach neoplasms" as a major topic term were downloaded from PubMed, and the vocabulary to be added to MeSH for search was analyzed. Search efficiency was verified using 277 KoreaMed articles indexed with the MEDLINE MeSH term "Stomach Neoplasms". As a result, 973 terms were selected as candidate vocabulary. "Gastric Cancer" (2,780 occurrences) was the most frequent term, and 7,376 compound words (88.51%) combined "stomach" with a histological term for "neoplasm", such as "gastric adenocarcinoma" and "gastric MALT lymphoma". A total of 5,234 compound words (70.95%) had a co-occurrence distance of two words. The MEDLINE MeSH and the KoreaMed MeSH Indexer matched on 209 articles (75.5%). Search efficiency improved to 263 articles (94.9%) when the search filters were added, and to 268 articles (96.7%) when the 13 word-proximity searches of the co-occurring terms were applied. This study showed that using a thesaurus to improve search efficiency in natural-language search can preserve the advantages of a controlled vocabulary, and that search accuracy can be improved by using word-proximity search instead of Boolean search.
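The abstract contrasts word-proximity search with Boolean search. A minimal Python sketch of the difference follows; the tokenizer, the definition of distance as the difference between token positions, and the example titles are illustrative assumptions, not the authors' implementation or KoreaMed's query syntax.

```python
import re


def tokenize(text):
    # Lowercase and split on non-alphanumeric characters (a simplifying assumption).
    return re.findall(r"[a-z0-9]+", text.lower())


def boolean_and(text, term_a, term_b):
    # Plain Boolean AND: both terms appear anywhere, regardless of distance.
    tokens = set(tokenize(text))
    return term_a in tokens and term_b in tokens


def within_distance(text, term_a, term_b, max_dist):
    # Proximity match: some occurrence of term_a and some occurrence of term_b
    # are at most max_dist token positions apart, in either order.
    tokens = tokenize(text)
    pos_a = [i for i, t in enumerate(tokens) if t == term_a]
    pos_b = [i for i, t in enumerate(tokens) if t == term_b]
    return any(abs(i - j) <= max_dist for i in pos_a for j in pos_b)


# Hypothetical article titles.
t1 = "Survival after surgery for advanced gastric cancer"
t2 = "Gastric emptying in patients with colorectal cancer"

for title in (t1, t2):
    print(boolean_and(title, "gastric", "cancer"),
          within_distance(title, "gastric", "cancer", max_dist=2))
```

Both titles satisfy the Boolean AND query, but only the first places the two terms within two words of each other, which is the kind of false match that proximity search is meant to filter out.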

A Study on the Intelligent Service Selection Reasoning for Enhanced User Satisfaction : Appliance to Cloud Computing Service (사용자 만족도 향상을 위한 지능형 서비스 선정 방안에 관한 연구 : 클라우드 컴퓨팅 서비스에의 적용)

  • Shin, Dong Cheon
    • Journal of Intelligence and Information Systems, v.18 no.3, pp.35-51, 2012
  • Cloud computing is Internet-based computing in which computing resources are offered over the Internet as scalable, on-demand services. In particular, as numerous and varied cloud services emerge with the development of Internet and mobile technology, selecting and providing services that satisfy users is an important issue. Most previous works achieve only limited user satisfaction because they rely on so-called concept similarity to user requirements or do not accommodate diverse user preferences. This paper presents cloud service selection reasoning that can be applied to general cloud service environments, including a variety of computing resource services, not only web services. In these environments there are two kinds of services: atomic services and composite services. An atomic service consists of service attributes that represent its characteristics, such as functionality, performance, or specification. A composite service is created by composing atomic services and other composite services; therefore, a composite service inherits the attributes of its component services. The main participants in providing cloud services are service users, service suppliers, and service operators. Service suppliers can register services autonomously or through strategic collaboration with service operators. Service users submit request queries, including a service name and requirements, to the service management system. The service management system consists of a query processor for processing user queries, a registration manager for service registration, and a selection engine for service selection reasoning.
To enhance user satisfaction, our reasoning is based on the degree to which service attributes conform to user requirements in terms of functionality, performance, and specification, rather than on concept similarity as in ontology-based reasoning. To this end, we introduce a service attribute graph (SAG), generated by considering the inclusion relationships among instances of a service attribute from several perspectives, such as functionality, performance, and specification. The SAG is thus a directed graph that shows the inclusion relationships among attribute instances. Since the degree of conformance closely corresponds to the inclusion relationship, the acceptability of a service depends on the closeness of the inclusion relationships among the corresponding attribute instances: higher closeness implies higher acceptability, because closeness reflects the degree of conformance between attribute instances. Closeness is determined by the path length between two vertices in the SAG: a shorter path means a closer inclusion relationship than a longer one, and thus a higher degree of conformance. In addition to acceptability, this paper reflects other user preferences, such as attribute priorities and mandatory options, to accommodate diverse user requirements. Supporting various attribute types, such as character, numeric, and Boolean, also helps accommodate diverse requirements. Finally, cloud services are rated by their value relative to price and recommended to users. A key contribution of this paper is that, unlike other works, it is the first attempt to present graph-based selection reasoning while considering various user preferences regarding service attributes.
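The atomic/composite service model described in this abstract, in which a composite service inherits the attributes of its components, can be sketched in Python as follows. The class names, attribute names, and the rule that a later component overrides an earlier one on a name clash are hypothetical choices for illustration; the paper's actual data model is not given in the abstract.

```python
from dataclasses import dataclass, field


@dataclass
class AtomicService:
    name: str
    # Attribute name -> value, e.g. {"storage_gb": 100} (hypothetical attributes).
    attributes: dict

    def all_attributes(self):
        return dict(self.attributes)


@dataclass
class CompositeService:
    name: str
    # Components may be atomic services or other composite services.
    components: list = field(default_factory=list)

    def all_attributes(self):
        # A composite inherits the attributes of every component, recursively.
        # Assumption: on a name clash, the later component's value wins.
        merged = {}
        for comp in self.components:
            merged.update(comp.all_attributes())
        return merged


storage = AtomicService("storage", {"storage_gb": 100, "encrypted": True})
compute = AtomicService("compute", {"vcpus": 4})
backup = CompositeService("backup", [storage])
platform = CompositeService("platform", [compute, backup])
print(platform.all_attributes())
```

Recursion through `all_attributes` lets a composite built from other composites collect attributes at any depth without a separate flattening step.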
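The SAG-based reasoning in this abstract can be illustrated with a small directed graph and a breadth-first search for the shortest inclusion path. This is a sketch under stated assumptions: the edge direction (offered instance to the instance it includes), the closeness function 1/(1 + path length), the weighted sum over prioritized attributes, rejection on an unmet mandatory attribute, and the score-per-price ranking are all hypothetical stand-ins for the paper's formulas, which the abstract does not specify.

```python
from collections import deque

# Hypothetical SAG for one attribute: edge u -> v means instance u includes
# (i.e. can satisfy a request for) instance v.
SAG = {"16GB": ["8GB"], "8GB": ["4GB"], "4GB": []}


def path_length(graph, src, dst):
    # Breadth-first search: fewest edges from src to dst, or None if unreachable.
    if src == dst:
        return 0
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt == dst:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def closeness(graph, offered, requested):
    # Hypothetical: 1.0 for an exact match, smaller for longer paths, 0.0 if none.
    d = path_length(graph, offered, requested)
    return 0.0 if d is None else 1.0 / (1 + d)


def value_per_price(service, request, graphs):
    # request: attribute -> (requested instance, priority weight, mandatory flag)
    score = 0.0
    for attr, (wanted, weight, mandatory) in request.items():
        c = closeness(graphs[attr], service["attrs"].get(attr), wanted)
        if mandatory and c == 0.0:
            return 0.0  # an unmet mandatory requirement rejects the service
        score += weight * c
    return score / service["price"]


graphs = {"memory": SAG}
request = {"memory": ("4GB", 1.0, True)}
services = [
    {"name": "A", "attrs": {"memory": "4GB"}, "price": 10.0},
    {"name": "B", "attrs": {"memory": "16GB"}, "price": 4.0},
    {"name": "C", "attrs": {"memory": "8GB"}, "price": 4.0},
    {"name": "D", "attrs": {"memory": "2GB"}, "price": 1.0},
]
ranked = sorted(services, key=lambda s: value_per_price(s, request, graphs), reverse=True)
print([s["name"] for s in ranked])
```

Service C ranks first: its 8GB offer is one inclusion step from the 4GB request, and it is cheaper than the exact-match service A. Service D is rejected because no SAG path leads from 2GB to the mandatory 4GB requirement.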

Label Embedding for Improving Classification Accuracy Using AutoEncoder with Skip-Connections (다중 레이블 분류의 정확도 향상을 위한 스킵 연결 오토인코더 기반 레이블 임베딩 방법론)

  • Kim, Museong; Kim, Namgyu
    • Journal of Intelligence and Information Systems, v.27 no.3, pp.175-197, 2021
  • Recently, with the development of deep learning technology, research on unstructured data analysis has been actively conducted, showing remarkable results in various fields such as classification, summarization, and generation. Among text analysis tasks, text classification is the most widely used in academia and industry. Text classification includes binary classification, which assigns one of two classes; multi-class classification, which assigns one label from several classes; and multi-label classification, which assigns multiple labels from several classes. Multi-label classification in particular requires a different training method from binary and multi-class classification because each instance can have multiple labels. Moreover, as the number of labels and classes grows, so does the number of labels to be predicted, which makes prediction harder and limits performance improvement. To overcome these limitations, label embedding has been actively researched: (i) the given high-dimensional label space is compressed into a low-dimensional latent label space, (ii) a model is trained to predict the compressed labels, and (iii) the predicted labels are restored to the original high-dimensional label space. Typical label embedding techniques include Principal Label Space Transformation (PLST), Multi-Label Classification via Boolean Matrix Decomposition (MLC-BMaD), and Bayesian Multi-Label Compressed Sensing (BML-CS). However, because these techniques consider only linear relationships between labels or compress labels by random transformation, they have difficulty capturing non-linear relationships between labels and therefore cannot create a latent label space that sufficiently preserves the information of the original labels.
Recently, there have been increasing attempts to improve performance by applying deep learning to label embedding, most notably label embedding with an autoencoder, a deep learning model that is effective for data compression and restoration. However, traditional autoencoder-based label embedding suffers from substantial information loss when compressing a high-dimensional label space with a myriad of classes into a low-dimensional latent label space. This is associated with the vanishing-gradient problem that occurs during backpropagation. Skip connections were devised to solve this problem: by adding a layer's input to its output, they prevent gradients from vanishing during backpropagation, enabling efficient learning even in deep networks. Skip connections are mainly used for image feature extraction in convolutional neural networks, but studies applying them to autoencoders or label embedding are still scarce. Therefore, in this study, we propose an autoencoder-based label embedding methodology in which skip connections are added to both the encoder and the decoder, forming a low-dimensional latent label space that well reflects the information of the high-dimensional label space. We applied the proposed methodology to actual paper keywords to derive a high-dimensional keyword label space and a low-dimensional latent label space. Using these, we conducted an experiment that predicts the compressed keyword vector in the latent label space from a paper's abstract and evaluates multi-label classification by restoring the predicted vector to the original label space. As a result, multi-label classification based on the proposed methodology clearly outperformed traditional multi-label classification methods on all the performance indicators used: accuracy, precision, recall, and F1 score.
This indicates that the low-dimensional latent label space derived through the proposed methodology reflects the information of the high-dimensional label space well, which ultimately improves the performance of multi-label classification itself. In addition, the utility of the proposed methodology was confirmed by comparing its performance across domain characteristics and across different dimensionalities of the latent label space.
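Steps (i) and (iii) of linear label embedding described in this abstract can be sketched in the spirit of PLST: center the label matrix, keep its top-k principal directions, encode by projection, and decode by projecting back and rounding at 0.5. This pure-Python version uses power iteration with deflation instead of a library SVD and omits step (ii), training a regressor from document features to the latent codes; the toy label matrix is invented for illustration.

```python
import random


def mat_vec(m, v):
    # Matrix-vector product for lists of lists.
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def top_eigvecs(cov, k, iters=200, seed=0):
    # Power iteration with deflation: top-k eigenvectors of a symmetric PSD matrix.
    rng = random.Random(seed)
    n = len(cov)
    c = [row[:] for row in cov]
    vecs = []
    for _ in range(k):
        v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        for _ in range(iters):
            w = mat_vec(c, v)
            norm = sum(x * x for x in w) ** 0.5
            if norm < 1e-12:
                return vecs  # no variance left to explain
            v = [x / norm for x in w]
        lam = sum(a * b for a, b in zip(v, mat_vec(c, v)))
        vecs.append(v)
        # Deflate: remove the component just found before searching for the next.
        c = [[c[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return vecs


def plst_fit(Y, k):
    # Shift labels by their mean and find the top-k principal label directions.
    n, L = len(Y), len(Y[0])
    mean = [sum(row[j] for row in Y) / n for j in range(L)]
    centered = [[row[j] - mean[j] for j in range(L)] for row in Y]
    cov = [[sum(r[i] * r[j] for r in centered) for j in range(L)] for i in range(L)]
    return mean, top_eigvecs(cov, k)


def encode(y, mean, basis):
    # Step (i): compress an L-dimensional 0/1 label vector to len(basis) dimensions.
    return [sum((y[j] - mean[j]) * v[j] for j in range(len(y))) for v in basis]


def decode(z, mean, basis):
    # Step (iii): map a latent code back to label space and round at 0.5.
    recon = [mean[j] + sum(zi * v[j] for zi, v in zip(z, basis)) for j in range(len(mean))]
    return [1 if x >= 0.5 else 0 for x in recon]


# Toy labels: labels 0/1 always co-occur, as do labels 2/3.
Y = [[1, 1, 0, 0], [0, 0, 1, 1], [1, 1, 1, 1], [0, 0, 0, 0]]
mean, basis = plst_fit(Y, k=2)
print([decode(encode(y, mean, basis), mean, basis) for y in Y])
```

Because the label pairs always co-occur in the toy data, two latent dimensions suffice to restore all four labels. A linear projection like this cannot exploit non-linear label dependencies, which is the limitation the abstract attributes to PLST-type methods.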
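The skip-connection idea in this abstract can be shown with a forward pass of a small label autoencoder in plain Python. Adding a block's input to its output gives the block a Jacobian of I + J_f, so backpropagated gradients keep an identity path. The layer sizes, the placement of one skip block in the encoder and one in the decoder, and the initialization are hypothetical; the abstract does not specify the architecture, and training is omitted.

```python
import math
import random


def init_layer(n_out, n_in, rng):
    # Hypothetical initialization: small Gaussian weights, zero bias.
    weights = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(x, layer):
    # Fully connected layer: W x + b.
    weights, bias = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(v):
    return [max(0.0, a) for a in v]


def residual(x, layer):
    # Skip connection: ReLU(W x + b) + x. Input and output widths must match.
    return [a + b for a, b in zip(relu(dense(x, layer)), x)]


def make_autoencoder(n_labels, n_latent, seed=0):
    rng = random.Random(seed)
    return {
        "enc_skip": init_layer(n_labels, n_labels, rng),  # L -> L with skip
        "enc_proj": init_layer(n_latent, n_labels, rng),  # L -> k (compression)
        "dec_proj": init_layer(n_labels, n_latent, rng),  # k -> L (restoration)
        "dec_skip": init_layer(n_labels, n_labels, rng),  # L -> L with skip
    }


def encode(y, params):
    return dense(residual(y, params["enc_skip"]), params["enc_proj"])


def decode(z, params):
    h = residual(dense(z, params["dec_proj"]), params["dec_skip"])
    return [1.0 / (1.0 + math.exp(-a)) for a in h]  # per-label probabilities


params = make_autoencoder(n_labels=6, n_latent=2)
z = encode([1, 0, 1, 0, 0, 1], params)
print(z, decode(z, params))
```

With all weights of a skip block set to zero, the block reduces to the identity, which is one way to see why stacking such blocks does not by itself destroy the signal.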
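The abstract reports accuracy, precision, recall, and F1 score for multi-label classification but does not state how they are averaged. The sketch below assumes example-based (Jaccard) accuracy and micro-averaged precision, recall, and F1; the paper may use different definitions, and the keyword sets are invented.

```python
def multilabel_metrics(y_true, y_pred):
    # y_true, y_pred: lists of label sets, one set per example.
    tp = fp = fn = 0
    jaccard_sum = 0.0
    for t, p in zip(y_true, y_pred):
        tp += len(t & p)
        fp += len(p - t)
        fn += len(t - p)
        union = t | p
        # Example-based accuracy: |true ∩ pred| / |true ∪ pred| (1.0 if both empty).
        jaccard_sum += len(t & p) / len(union) if union else 1.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": jaccard_sum / len(y_true), "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical keyword labels for two papers.
y_true = [{"cnn", "nlp"}, {"svm"}]
y_pred = [{"cnn"}, {"svm", "nlp"}]
print(multilabel_metrics(y_true, y_pred))
```

Micro-averaging pools true and false positives over all labels, so frequent keywords dominate the scores; macro-averaging per label would weight rare keywords more heavily.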