• Title/Abstract/Keyword: search word

Search Results: 381 (processing time: 0.025 seconds)

Resolving Ambiguity in search query by using the WordNet (워드넷을 이용한 검색 질의어의 모호성 해결)

  • 김형일;김준태
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2000.10b
    • /
    • pp.75-77
    • /
    • 2000
  • It is very difficult to find exactly the information one wants on the vast Web. Most existing search engines rely on content-based methods and therefore cannot properly handle the ambiguity of query terms. In other words, the query terms that ordinary users enter are frequently polysemous, but a search engine by itself cannot determine the exact sense the user intends. In particular, when a query is issued with a rarely used sense of a word, the engine returns only web pages that merely share the word's surface form and relate to its more common sense. To address this, this paper resolves query ambiguity by combining WordNet with a user interface that accepts explicit user feedback. (An illustrative code sketch follows this entry.)

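As a hedged illustration of the general approach described in this entry (not the authors' implementation), the sketch below uses NLTK's WordNet interface to list candidate senses of a query term so the user can pick one explicitly, and then expands the query with synonyms of the chosen sense; the function names and the NLTK dependency are assumptions made for the example.

```python
# Hypothetical sketch of WordNet-based query-sense disambiguation (assumes NLTK with
# the WordNet corpus installed); an illustration only, not the paper's system.
from nltk.corpus import wordnet as wn

def candidate_senses(term):
    # Present every WordNet sense of the query term so the user can choose explicitly.
    return [(s.name(), s.definition()) for s in wn.synsets(term)]

def expand_query(term, chosen_sense):
    # Expand the query with lemmas of the sense the user selected.
    synset = wn.synset(chosen_sense)
    synonyms = {lemma.name().replace("_", " ") for lemma in synset.lemmas()}
    return {term} | synonyms

# Example: let the user pick a sense of the ambiguous word "bank".
for name, gloss in candidate_senses("bank"):
    print(name, "-", gloss)
print(expand_query("bank", "bank.n.01"))
```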

Development of a System to Detect the Risk Factors of Trade based on Network Search Technology (네트워크 탐색 기술을 기반으로 한 무역 거래 위험 요소 적발 시스템 개발)

  • Seo, Dongmin;Kim, Jaesoo;Song, Jeong a;Park, Moon il
    • Proceedings of the Korea Contents Association Conference
    • /
    • 2018.05a
    • /
    • pp.11-12
    • /
    • 2018
  • The source data used in big data analysis take the form of networks, and network analysis has recently demonstrated its potential to deliver valuable information to society, for example in effective product advertising through social network analysis, discovery of key genes, and drug repositioning, so its importance is growing. In addition, as the trade environment changes rapidly under globalization and the rapid advance of information and communication technology, the demand for fast and accurate safety management of trade transactions keeps increasing. This paper therefore presents a technique for detecting risk factors in trade transactions based on network search technology.


Trend Analysis of Research Topics in Ecological Research

  • Suntae Kim
    • Proceedings of the National Institute of Ecology of the Republic of Korea
    • /
    • v.4 no.1
    • /
    • pp.43-48
    • /
    • 2023
  • This study analyzed research trends in the field of ecological research. Data were collected through keyword searches of the SCI, SSCI, and A&HCI databases from January 2002 to September 2022. The search keywords (biodiversity, ecology, ecotourism, species, climate change, ecosystem, restoration, and wildlife) were recommended by ecological research experts. Word clouds were created for each of the searched keywords, and topic map analysis was performed. Topic map analysis using biodiversity, climate change, ecology, ecosystem, and restoration each generated 10 topics; topic map analysis using the ecotourism keyword generated 5 topics; and topic map analysis using the wildlife keyword generated 4 topics. Each topic contained six keywords. (An illustrative code sketch follows this entry.)
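
As a loose illustration of the kind of topic analysis described above, the following sketch fits an LDA topic model over abstracts retrieved for one keyword and prints six top words per topic; the scikit-learn pipeline, parameter values, and function name are assumptions and do not reproduce the study's actual topic map analysis.

```python
# Illustrative topic-map-style analysis (assumed libraries and parameters).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

def topics_for_keyword(abstracts, n_topics=10, n_top_words=6):
    # abstracts: list of document texts retrieved for one search keyword.
    vectorizer = CountVectorizer(stop_words="english", max_features=5000)
    X = vectorizer.fit_transform(abstracts)
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=0)
    lda.fit(X)
    vocab = vectorizer.get_feature_names_out()
    for i, weights in enumerate(lda.components_):
        top = [vocab[j] for j in weights.argsort()[::-1][:n_top_words]]
        print(f"topic {i + 1}: {', '.join(top)}")
```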

Selective Word Embedding for Sentence Classification by Considering Information Gain and Word Similarity (문장 분류를 위한 정보 이득 및 유사도에 따른 단어 제거와 선택적 단어 임베딩 방안)

  • Lee, Min Seok;Yang, Seok Woo;Lee, Hong Joo
    • Journal of Intelligence and Information Systems
    • /
    • v.25 no.4
    • /
    • pp.105-122
    • /
    • 2019
  • Dimensionality reduction is one way to handle big data in text mining. Dimensionality reduction must account for the density of the data, which strongly influences sentence classification performance: high-dimensional data require heavy computation and can lead to high computational cost and overfitting. A dimension reduction step is therefore necessary to improve model performance. Diverse methods have been proposed, ranging from simply reducing noise such as misspellings or informal text to incorporating semantic and syntactic information. In addition, how text features are represented and selected affects classifier performance in sentence classification, a subfield of Natural Language Processing. The common goal of dimension reduction is to find a latent space that represents the raw data in the observation space. Existing approaches use various algorithms, such as feature extraction and feature selection, and also use word embeddings, low-dimensional vector representations of words that capture semantic and syntactic information. To improve performance, recent studies have proposed modifying the word dictionary according to positive and negative scores of pre-defined words. The basic idea of this study is that similar words have similar vector representations: once a feature selection algorithm identifies unimportant words, words similar to them are assumed to have little impact on sentence classification either. This study proposes two ways to achieve more accurate classification, both of which eliminate words under specific rules and construct word embeddings based on Word2Vec. To identify low-importance words, information gain is used to measure importance and cosine similarity to find similar words. First, words with comparatively low information gain are removed from the raw text before building the word embedding. Second, words similar to those low-information-gain words are additionally removed before building the embedding. The filtered text and word embeddings are then fed to two deep learning models, a Convolutional Neural Network and an Attention-Based Bidirectional LSTM. The datasets are customer reviews of Kindle products on Amazon.com, IMDB, and Yelp, each classified with the deep learning models. Reviews that received more than five helpful votes and whose ratio of helpful votes was over 70% were classified as helpful reviews. Yelp shows only the number of helpful votes; from 750,000 reviews, 100,000 reviews that received more than five helpful votes were extracted by random sampling. Minimal preprocessing, such as removing numbers and special characters, was applied to each dataset. To evaluate the proposed methods, they were compared against Word2Vec and GloVe embeddings built from all words, and one of the proposed methods outperformed the all-word embeddings: removing unimportant words improved performance, although removing too many words lowered it.
For future research, diverse preprocessing options and in-depth analysis of word co-occurrence for measuring similarity between words should be considered. Also, the proposed method was only applied with Word2Vec; other embedding methods such as GloVe, fastText, and ELMo could be combined with the proposed elimination methods, making it possible to explore combinations of embedding and elimination methods. (An illustrative code sketch of the word-elimination step follows this entry.)
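
The sketch below is a hedged approximation of the word-elimination idea summarized above, not the authors' code: it scores vocabulary words with mutual information (used here as a stand-in for information gain), marks the lowest-scoring words for removal, and additionally marks words that a Word2Vec model finds highly similar to them; the thresholds and the scikit-learn/gensim choices are assumptions.

```python
# Hypothetical sketch of information-gain plus similarity-based word elimination
# (assumed parameters and libraries; not the paper's implementation).
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import mutual_info_classif
from gensim.models import Word2Vec

def words_to_eliminate(texts, labels, low_quantile=0.1, sim_threshold=0.8):
    # Score each vocabulary word by mutual information with the class label
    # (used here as a proxy for information gain).
    vectorizer = CountVectorizer()
    X = vectorizer.fit_transform(texts)
    scores = mutual_info_classif(X, labels, discrete_features=True)
    vocab = np.array(vectorizer.get_feature_names_out())
    low_ig = set(vocab[scores <= np.quantile(scores, low_quantile)])

    # Train Word2Vec on the corpus and also mark words highly similar to the low-IG words.
    w2v = Word2Vec([t.split() for t in texts], vector_size=100, min_count=2)
    similar = set()
    for word in low_ig:
        if word in w2v.wv:
            similar |= {w for w, sim in w2v.wv.most_similar(word, topn=10)
                        if sim >= sim_threshold}
    return low_ig | similar

def filter_text(text, removed):
    # Drop the selected words before building embeddings and training the classifier.
    return " ".join(w for w in text.split() if w not in removed)
```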

Binary Visual Word Generation Techniques for A Fast Image Search (고속 이미지 검색을 위한 2진 시각 단어 생성 기법)

  • Lee, Suwon
    • Journal of KIISE
    • /
    • v.44 no.12
    • /
    • pp.1313-1318
    • /
    • 2017
  • Aggregating local features into a single vector is a fundamental problem in image search. In this process, the search can be sped up if binary features, which are extracted almost two orders of magnitude faster than gradient-based features, are utilized. However, to use binary features in an image search, techniques for clustering binary features into binary visual words must be studied, because traditional clustering techniques for gradient-based features are not compatible with binary features. To this end, this paper studies techniques for clustering binary features to generate binary visual words. Through experiments, we analyze the trade-off between the accuracy and computational efficiency of an image search using binary features and compare the proposed techniques. This research is expected to be applied to mobile, real-time, and web-scale applications that require a fast image search. (An illustrative clustering sketch follows this entry.)
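
One common way to cluster binary descriptors (such as ORB) into binary visual words is k-majority clustering under Hamming distance; the NumPy sketch below illustrates that general idea under assumed parameters and is not necessarily the specific technique proposed in the paper.

```python
# Hedged illustration of building binary visual words by k-majority clustering of
# packed binary descriptors under Hamming distance; parameters are assumptions.
import numpy as np

def hamming_to_centroids(desc, centroids):
    # desc: (d,) packed uint8 descriptor; centroids: (k, d) packed uint8 centroids.
    return np.unpackbits(np.bitwise_xor(desc, centroids), axis=1).sum(axis=1)

def k_majority(descriptors, k, iterations=10, seed=0):
    # descriptors: (n, d) uint8 array of packed binary descriptors (e.g., ORB).
    rng = np.random.default_rng(seed)
    centroids = descriptors[rng.choice(len(descriptors), size=k, replace=False)].copy()
    assignments = np.zeros(len(descriptors), dtype=int)
    for _ in range(iterations):
        assignments = np.array([np.argmin(hamming_to_centroids(d, centroids))
                                for d in descriptors])
        for c in range(k):
            members = descriptors[assignments == c]
            if len(members):
                # Majority vote per bit keeps each centroid itself a binary descriptor.
                bits = np.unpackbits(members, axis=1).mean(axis=0) >= 0.5
                centroids[c] = np.packbits(bits)
    return centroids, assignments
```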

Personalized Search Technique using Users' Personal Profiles (사용자 개인 프로파일을 이용한 개인화 검색 기법)

  • Yoon, Sung-Hee
    • The Journal of the Korea institute of electronic communication sciences
    • /
    • v.14 no.3
    • /
    • pp.587-594
    • /
    • 2019
  • This paper proposes a personalized web search technique that produces ranked results reflecting users' query intents and individual interests. The performance of personalized search relies on an effective user-profiling strategy that accurately captures interests and preferences. A user profile is a data set of words and customized weights based on recent user queries and the topic words of web documents in the user's click history. The personal profile is used to expand a user query into a personalized query before the web search. To determine the exact meaning of ambiguous queries and topic words, the strategy uses WordNet to calculate semantic similarities to words in the user's personal profile. Experimental results with query expansion and re-ranking modules installed on general search systems show that this personalized search technique enhances performance in terms of precision and recall. (An illustrative code sketch follows this entry.)
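
A minimal, hedged sketch of profile-based query expansion in the spirit of this entry follows; the profile format, weighting, thresholds, and the use of NLTK's WordNet path similarity are assumptions, not the paper's implementation.

```python
# Hypothetical sketch of profile-based query expansion using WordNet similarity
# (assumed data structures and NLTK dependency).
from nltk.corpus import wordnet as wn

def wordnet_similarity(word_a, word_b):
    # Maximum path similarity over all sense pairs of the two words.
    scores = [s1.path_similarity(s2) or 0.0
              for s1 in wn.synsets(word_a) for s2 in wn.synsets(word_b)]
    return max(scores, default=0.0)

def personalize_query(query_terms, profile, per_term=2, threshold=0.3):
    # profile: {word: weight} built from recent queries and clicked-document topic words.
    expanded = list(query_terms)
    for term in query_terms:
        ranked = sorted(((wordnet_similarity(term, word) * weight, word)
                         for word, weight in profile.items()), reverse=True)
        expanded.extend(word for score, word in ranked[:per_term] if score >= threshold)
    return expanded

# Example (hypothetical profile):
# personalize_query(["java"], {"programming": 0.9, "coffee": 0.2, "island": 0.1})
```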

Correlation Analysis between Key Word Search Frequencies Related to Food Safety Issue and Foodborne Illness Outbreaks (식중독 사고 발생과 식품 안전 관련 검색어 빈도와의 상관성 분석 연구)

  • Lee, Heeyoung;Jo, Heekoung;Kim, Kyungmi;Youn, Hyewon;Yoon, Yohan
    • Journal of Food Hygiene and Safety
    • /
    • v.32 no.2
    • /
    • pp.96-100
    • /
    • 2017
  • Through the increasing use of the internet and smart devices, consumers can search for the information they want, and this information has accumulated into big data. Analyzing big data on key words associated with foods and foodborne pathogens could be a method for predicting foodborne illness outbreaks, especially in school food services. Therefore, the objective of this study was to elucidate the correlations between key words associated with foods and food safety issues. Search frequencies of key words for foodborne pathogens and food safety issues were collected from an internet portal site from January 1, 2012 to December 31, 2014. In addition, foodborne outbreak data were collected from the Ministry of Food and Drug Safety for the same period. There was a correlation between the time at which key word frequencies for foods and foodborne pathogens peaked and the time at which foodborne illness outbreaks occurred. In addition, search frequencies for foods and foodborne pathogens generally increased right after foodborne outbreaks occurred. In some cases, however, foodborne outbreaks occurred after the search frequencies for certain seasonal foods increased. These results could be useful in food safety management for reducing foodborne illness and in food safety communication. (An illustrative correlation sketch follows this entry.)
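
As a small illustration of correlating search frequencies with outbreak counts, the sketch below computes lagged Pearson correlations with pandas; the weekly aggregation, series names, and lag range are assumptions rather than the study's actual analysis.

```python
# Hedged sketch of lagged correlation between weekly search frequency and outbreak counts
# (assumed data layout; not the study's analysis code).
import pandas as pd

def lagged_correlations(search_freq: pd.Series, outbreaks: pd.Series, max_lag_weeks: int = 4):
    # Both series share a weekly DatetimeIndex. A positive lag compares search activity
    # from `lag` weeks earlier against this week's outbreak count (search leading outbreaks);
    # a negative lag tests whether searches follow outbreaks.
    return {lag: search_freq.shift(lag).corr(outbreaks)
            for lag in range(-max_lag_weeks, max_lag_weeks + 1)}

# Example: lagged_correlations(weekly["norovirus_searches"], weekly["outbreaks"])
```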

Optimizing the Chien Search Machine without using Divider (나눗셈회로가 필요없는 치엔머신의 최적설계)

  • An, Hyeong-Keon
    • Journal of the Institute of Electronics Engineers of Korea TC
    • /
    • v.49 no.5
    • /
    • pp.15-20
    • /
    • 2012
  • In this paper, we show a new method to find the error locations of a received Reed-Solomon code word. The new design is much faster and has a much simpler logic circuit than the former design method. This optimization was made possible by a greatly simplified square/X^4 calculating circuit, parallel processing, and avoiding the very complex divider. A Reed-Solomon decoder using this new Chien machine can be applied to data protection in almost all digital communication and consumer electronic devices. (An illustrative software sketch of Chien search follows this entry.)
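
For readers unfamiliar with Chien search, the following is a plain software reference that evaluates the error-locator polynomial at successive powers of alpha over GF(2^8) using log/antilog tables, which already avoids division; it illustrates the general algorithm only, not the paper's optimized square/X^4 and parallel hardware design, and the primitive polynomial 0x11D is an assumption.

```python
# Software reference for Chien search over GF(2^8) with primitive polynomial 0x11D;
# an illustration of the general algorithm, not the paper's hardware architecture.
EXP = [0] * 255
LOG = [0] * 256
x = 1
for i in range(255):
    EXP[i] = x
    LOG[x] = i
    x <<= 1
    if x & 0x100:
        x ^= 0x11D

def chien_search(locator, n=255):
    # locator: error-locator coefficients [sigma_0, sigma_1, ...] with sigma_0 = 1.
    # x = alpha^(-i) is a root of the locator exactly when position i is in error.
    positions = []
    for i in range(n):
        value = 0
        for j, coeff in enumerate(locator):
            if coeff:
                # coeff * alpha^(-i*j), computed with log/antilog tables (no divider needed).
                value ^= EXP[(LOG[coeff] + (255 - i) * j) % 255]
        if value == 0:
            positions.append(i)
    return positions
```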

A Study on the Implementation of Connected-Digit Recognition System and Changes of its Performance (연결 숫자음 인식 시스템의 구현과 성능 변화)

  • Yun Young-Sun;Park Yoon-Sang;Chae Yi-Geun
    • MALSORI
    • /
    • no.45
    • /
    • pp.47-61
    • /
    • 2003
  • In this paper, we consider the implementation of a connected digit recognition system and several approaches to improving its performance. To efficiently implement a fixed- or variable-length digit recognition system, a finite state network (FSN) is required. For fast search, we merge the word network algorithm that implements the FSN with the one-pass dynamic programming search algorithm used in general speech recognition systems. To find an efficient model for the digit recognition system, we perform experiments under various conditions that affect performance and summarize the results. (An illustrative decoding sketch follows this entry.)

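To make the FSN-plus-one-pass-DP idea concrete, the sketch below is a heavily simplified, hedged illustration of decoding over a digit-loop network: each digit model is a left-to-right chain of states whose final state can transition back to the first state of any digit; transition penalties and backtracking of the recognized digit string are omitted, and the data layout is an assumption.

```python
# Simplified one-pass DP over a digit-loop finite state network (illustrative only).
import numpy as np

def one_pass_score(frame_scores):
    # frame_scores: dict digit -> (T, S) array of per-frame, per-state log scores for a
    # left-to-right model with S states. The final state of every digit connects back to
    # the initial state of every digit, so connected-digit strings of any length are scored
    # in a single pass. Backtracking to recover the digit string is omitted for brevity.
    digits = list(frame_scores)
    T, S = next(iter(frame_scores.values())).shape
    NEG = -np.inf
    dp = {d: np.full(S, NEG) for d in digits}       # best score ending in (digit, state)
    for d in digits:
        dp[d][0] = frame_scores[d][0, 0]            # any digit may start the utterance
    for t in range(1, T):
        best_exit = max(dp[d][S - 1] for d in digits)   # best completed digit so far
        new = {}
        for d in digits:
            nd = np.full(S, NEG)
            for s in range(S):
                cands = [dp[d][s]]                  # stay in the same state
                if s > 0:
                    cands.append(dp[d][s - 1])      # advance within the digit model
                else:
                    cands.append(best_exit)         # word-boundary (loop) transition
                nd[s] = max(cands) + frame_scores[d][t, s]
            new[d] = nd
        dp = new
    return max(dp[d][S - 1] for d in digits)        # best connected-digit sequence score
```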

Literal expression of nausea in medical classics written until Tang dynasty (당대 이전의 오심 증상 표현)

  • Ko, Bok-Young;Chang, Jae-Soon;Kim, Ki-Wang
    • Journal of Korean Medical classics
    • /
    • v.26 no.1
    • /
    • pp.79-83
    • /
    • 2013
  • Objective : Osim (惡心) stands for nausea, which usually precedes vomiting (嘔吐). Although it is a very common symptom, the word osim cannot be found in some ancient classics, so we tried to find when it first appeared and what its substitutes had been in earlier medical classics. Material and Methods : The digitized text of Zhonghuayidian (中華醫典) was used for the text search, which was performed chronologically. Results : We found yokto (欲吐), yokgu (欲嘔), geongu (乾嘔), beon (煩), beonsim (煩心), simbeon (心煩), min (悶), and ongi (溫氣) as precedent expressions of osim (惡心), which first appeared in Jebyungwonhuron (諸病源候論, 610). Conclusion : Until the Tang dynasty, several alternative expressions corresponding to osim (nausea) were in use.