• Title/Summary/Keyword: retrieving system

Search Result 246, Processing Time 0.026 seconds

Recommending Core and Connecting Keywords of Research Area Using Social Network and Data Mining Techniques (소셜 네트워크와 데이터 마이닝 기법을 활용한 학문 분야 중심 및 융합 키워드 추천 서비스)

  • Cho, In-Dong;Kim, Nam-Gyu
    • Journal of Intelligence and Information Systems
    • /
    • v.17 no.1
    • /
    • pp.127-138
    • /
    • 2011
  • The core service of most research portal sites is providing relevant research papers to various researchers that match their research interests. This kind of service may only be effective and easy to use when a user can provide correct and concrete information about a paper such as the title, authors, and keywords. However, unfortunately, most users of this service are not acquainted with concrete bibliographic information. It implies that most users inevitably experience repeated trial and error attempts of keyword-based search. Especially, retrieving a relevant research paper is more difficult when a user is novice in the research domain and does not know appropriate keywords. In this case, a user should perform iterative searches as follows : i) perform an initial search with an arbitrary keyword, ii) acquire related keywords from the retrieved papers, and iii) perform another search again with the acquired keywords. This usage pattern implies that the level of service quality and user satisfaction of a portal site are strongly affected by the level of keyword management and searching mechanism. To overcome this kind of inefficiency, some leading research portal sites adopt the association rule mining-based keyword recommendation service that is similar to the product recommendation of online shopping malls. However, keyword recommendation only based on association analysis has limitation that it can show only a simple and direct relationship between two keywords. In other words, the association analysis itself is unable to present the complex relationships among many keywords in some adjacent research areas. To overcome this limitation, we propose the hybrid approach for establishing association network among keywords used in research papers. The keyword association network can be established by the following phases : i) a set of keywords specified in a certain paper are regarded as co-purchased items, ii) perform association analysis for the keywords and extract frequent patterns of keywords that satisfy predefined thresholds of confidence, support, and lift, and iii) schematize the frequent keyword patterns as a network to show the core keywords of each research area and connecting keywords among two or more research areas. To estimate the practical application of our approach, we performed a simple experiment with 600 keywords. The keywords are extracted from 131 research papers published in five prominent Korean journals in 2009. In the experiment, we used the SAS Enterprise Miner for association analysis and the R software for social network analysis. As the final outcome, we presented a network diagram and a cluster dendrogram for the keyword association network. We summarized the results in Section 4 of this paper. The main contribution of our proposed approach can be found in the following aspects : i) the keyword network can provide an initial roadmap of a research area to researchers who are novice in the domain, ii) a researcher can grasp the distribution of many keywords neighboring to a certain keyword, and iii) researchers can get some idea for converging different research areas by observing connecting keywords in the keyword association network. Further studies should include the following. First, the current version of our approach does not implement a standard meta-dictionary. For practical use, homonyms, synonyms, and multilingual problems should be resolved with a standard meta-dictionary. Additionally, more clear guidelines for clustering research areas and defining core and connecting keywords should be provided. Finally, intensive experiments not only on Korean research papers but also on international papers should be performed in further studies.

RGB Channel Selection Technique for Efficient Image Segmentation (효율적인 이미지 분할을 위한 RGB 채널 선택 기법)

  • 김현종;박영배
    • Journal of KIISE:Software and Applications
    • /
    • v.31 no.10
    • /
    • pp.1332-1344
    • /
    • 2004
  • Upon development of information super-highway and multimedia-related technoiogies in recent years, more efficient technologies to transmit, store and retrieve the multimedia data are required. Among such technologies, firstly, it is common that the semantic-based image retrieval is annotated separately in order to give certain meanings to the image data and the low-level property information that include information about color, texture, and shape Despite the fact that the semantic-based information retrieval has been made by utilizing such vocabulary dictionary as the key words that given, however it brings about a problem that has not yet freed from the limit of the existing keyword-based text information retrieval. The second problem is that it reveals a decreased retrieval performance in the content-based image retrieval system, and is difficult to separate the object from the image that has complex background, and also is difficult to extract an area due to excessive division of those regions. Further, it is difficult to separate the objects from the image that possesses multiple objects in complex scene. To solve the problems, in this paper, I established a content-based retrieval system that can be processed in 5 different steps. The most critical process of those 5 steps is that among RGB images, the one that has the largest and the smallest background are to be extracted. Particularly. I propose the method that extracts the subject as well as the background by using an Image, which has the largest background. Also, to solve the second problem, I propose the method in which multiple objects are separated using RGB channel selection techniques having optimized the excessive division of area by utilizing Watermerge's threshold value with the object separation using the method of RGB channels separation. The tests proved that the methods proposed by me were superior to the existing methods in terms of retrieval performances insomuch as to replace those methods that developed for the purpose of retrieving those complex objects that used to be difficult to retrieve up until now.

Text Mining-Based Emerging Trend Analysis for the Aviation Industry (항공산업 미래유망분야 선정을 위한 텍스트 마이닝 기반의 트렌드 분석)

  • Kim, Hyun-Jung;Jo, Nam-Ok;Shin, Kyung-Shik
    • Journal of Intelligence and Information Systems
    • /
    • v.21 no.1
    • /
    • pp.65-82
    • /
    • 2015
  • Recently, there has been a surge of interest in finding core issues and analyzing emerging trends for the future. This represents efforts to devise national strategies and policies based on the selection of promising areas that can create economic and social added value. The existing studies, including those dedicated to the discovery of future promising fields, have mostly been dependent on qualitative research methods such as literature review and expert judgement. Deriving results from large amounts of information under this approach is both costly and time consuming. Efforts have been made to make up for the weaknesses of the conventional qualitative analysis approach designed to select key promising areas through discovery of future core issues and emerging trend analysis in various areas of academic research. There needs to be a paradigm shift in toward implementing qualitative research methods along with quantitative research methods like text mining in a mutually complementary manner. The change is to ensure objective and practical emerging trend analysis results based on large amounts of data. However, even such studies have had shortcoming related to their dependence on simple keywords for analysis, which makes it difficult to derive meaning from data. Besides, no study has been carried out so far to develop core issues and analyze emerging trends in special domains like the aviation industry. The change used to implement recent studies is being witnessed in various areas such as the steel industry, the information and communications technology industry, the construction industry in architectural engineering and so on. This study focused on retrieving aviation-related core issues and emerging trends from overall research papers pertaining to aviation through text mining, which is one of the big data analysis techniques. In this manner, the promising future areas for the air transport industry are selected based on objective data from aviation-related research papers. In order to compensate for the difficulties in grasping the meaning of single words in emerging trend analysis at keyword levels, this study will adopt topic analysis, which is a technique used to find out general themes latent in text document sets. The analysis will lead to the extraction of topics, which represent keyword sets, thereby discovering core issues and conducting emerging trend analysis. Based on the issues, it identified aviation-related research trends and selected the promising areas for the future. Research on core issue retrieval and emerging trend analysis for the aviation industry based on big data analysis is still in its incipient stages. So, the analysis targets for this study are restricted to data from aviation-related research papers. However, it has significance in that it prepared a quantitative analysis model for continuously monitoring the derived core issues and presenting directions regarding the areas with good prospects for the future. In the future, the scope is slated to expand to cover relevant domestic or international news articles and bidding information as well, thus increasing the reliability of analysis results. On the basis of the topic analysis results, core issues for the aviation industry will be determined. Then, emerging trend analysis for the issues will be implemented by year in order to identify the changes they undergo in time series. Through these procedures, this study aims to prepare a system for developing key promising areas for the future aviation industry as well as for ensuring rapid response. Additionally, the promising areas selected based on the aforementioned results and the analysis of pertinent policy research reports will be compared with the areas in which the actual government investments are made. The results from this comparative analysis are expected to make useful reference materials for future policy development and budget establishment.

The Brassica rapa Tissue-specific EST Database (배추의 조직 특이적 발현유전자 데이터베이스)

  • Yu, Hee-Ju;Park, Sin-Gi;Oh, Mi-Jin;Hwang, Hyun-Ju;Kim, Nam-Shin;Chung, Hee;Sohn, Seong-Han;Park, Beom-Seok;Mun, Jeong-Hwan
    • Horticultural Science & Technology
    • /
    • v.29 no.6
    • /
    • pp.633-640
    • /
    • 2011
  • Brassica rapa is an A genome model species for Brassica crop genetics, genomics, and breeding. With the completion of sequencing the B. rapa genome, functional analysis of the genome is forthcoming issue. The expressed sequence tags are fundamental resources supporting annotation and functional analysis of the genome including identification of tissue-specific genes and promoters. As of July 2011, 147,217 ESTs from 39 cDNA libraries of B. rapa are reported in the public database. However, little information can be retrieved from the sequences due to lack of organized databases. To leverage the sequence information and to maximize the use of publicly-available EST collections, the Brassica rapa tissue-specific EST database (BrTED) is developed. BrTED includes sequence information of 23,962 unigenes assembled by StackPack program. The unigene set is used as a query unit for various analyses such as BLAST against TAIR gene model, functional annotation using MIPS and UniProt, gene ontology analysis, and prediction of tissue-specific unigene sets based on statistics test. The database is composed of two main units, EST sequence processing and information retrieving unit and tissue-specific expression profile analysis unit. Information and data in both units are tightly inter-connected to each other using a web based browsing system. RT-PCR evaluation of 29 selected unigene sets successfully amplified amplicons from the target tissues of B. rapa. BrTED provided here allows the user to identify and analyze the expression of genes of interest and aid efforts to interpret the B. rapa genome through functional genomics. In addition, it can be used as a public resource in providing reference information to study the genus Brassica and other closely related crop crucifer plants.

A Study on Forecasting Accuracy Improvement of Case Based Reasoning Approach Using Fuzzy Relation (퍼지 관계를 활용한 사례기반추론 예측 정확성 향상에 관한 연구)

  • Lee, In-Ho;Shin, Kyung-Shik
    • Journal of Intelligence and Information Systems
    • /
    • v.16 no.4
    • /
    • pp.67-84
    • /
    • 2010
  • In terms of business, forecasting is a work of what is expected to happen in the future to make managerial decisions and plans. Therefore, the accurate forecasting is very important for major managerial decision making and is the basis for making various strategies of business. But it is very difficult to make an unbiased and consistent estimate because of uncertainty and complexity in the future business environment. That is why we should use scientific forecasting model to support business decision making, and make an effort to minimize the model's forecasting error which is difference between observation and estimator. Nevertheless, minimizing the error is not an easy task. Case-based reasoning is a problem solving method that utilizes the past similar case to solve the current problem. To build the successful case-based reasoning models, retrieving the case not only the most similar case but also the most relevant case is very important. To retrieve the similar and relevant case from past cases, the measurement of similarities between cases is an important key factor. Especially, if the cases contain symbolic data, it is more difficult to measure the distances. The purpose of this study is to improve the forecasting accuracy of case-based reasoning approach using fuzzy relation and composition. Especially, two methods are adopted to measure the similarity between cases containing symbolic data. One is to deduct the similarity matrix following binary logic(the judgment of sameness between two symbolic data), the other is to deduct the similarity matrix following fuzzy relation and composition. This study is conducted in the following order; data gathering and preprocessing, model building and analysis, validation analysis, conclusion. First, in the progress of data gathering and preprocessing we collect data set including categorical dependent variables. Also, the data set gathered is cross-section data and independent variables of the data set include several qualitative variables expressed symbolic data. The research data consists of many financial ratios and the corresponding bond ratings of Korean companies. The ratings we employ in this study cover all bonds rated by one of the bond rating agencies in Korea. Our total sample includes 1,816 companies whose commercial papers have been rated in the period 1997~2000. Credit grades are defined as outputs and classified into 5 rating categories(A1, A2, A3, B, C) according to credit levels. Second, in the progress of model building and analysis we deduct the similarity matrix following binary logic and fuzzy composition to measure the similarity between cases containing symbolic data. In this process, the used types of fuzzy composition are max-min, max-product, max-average. And then, the analysis is carried out by case-based reasoning approach with the deducted similarity matrix. Third, in the progress of validation analysis we verify the validation of model through McNemar test based on hit ratio. Finally, we draw a conclusion from the study. As a result, the similarity measuring method using fuzzy relation and composition shows good forecasting performance compared to the similarity measuring method using binary logic for similarity measurement between two symbolic data. But the results of the analysis are not statistically significant in forecasting performance among the types of fuzzy composition. The contributions of this study are as follows. We propose another methodology that fuzzy relation and fuzzy composition could be applied for the similarity measurement between two symbolic data. That is the most important factor to build case-based reasoning model.

A Study on the Retrieval of River Turbidity Based on KOMPSAT-3/3A Images (KOMPSAT-3/3A 영상 기반 하천의 탁도 산출 연구)

  • Kim, Dahui;Won, You Jun;Han, Sangmyung;Han, Hyangsun
    • Korean Journal of Remote Sensing
    • /
    • v.38 no.6_1
    • /
    • pp.1285-1300
    • /
    • 2022
  • Turbidity, the measure of the cloudiness of water, is used as an important index for water quality management. The turbidity can vary greatly in small river systems, which affects water quality in national rivers. Therefore, the generation of high-resolution spatial information on turbidity is very important. In this study, a turbidity retrieval model using the Korea Multi-Purpose Satellite-3 and -3A (KOMPSAT-3/3A) images was developed for high-resolution turbidity mapping of Han River system based on eXtreme Gradient Boosting (XGBoost) algorithm. To this end, the top of atmosphere (TOA) spectral reflectance was calculated from a total of 24 KOMPSAT-3/3A images and 150 Landsat-8 images. The Landsat-8 TOA spectral reflectance was cross-calibrated to the KOMPSAT-3/3A bands. The turbidity measured by the National Water Quality Monitoring Network was used as a reference dataset, and as input variables, the TOA spectral reflectance at the locations of in situ turbidity measurement, the spectral indices (the normalized difference vegetation index, normalized difference water index, and normalized difference turbidity index), and the Moderate Resolution Imaging Spectroradiometer (MODIS)-derived atmospheric products(the atmospheric optical thickness, water vapor, and ozone) were used. Furthermore, by analyzing the KOMPSAT-3/3A TOA spectral reflectance of different turbidities, a new spectral index, new normalized difference turbidity index (nNDTI), was proposed, and it was added as an input variable to the turbidity retrieval model. The XGBoost model showed excellent performance for the retrieval of turbidity with a root mean square error (RMSE) of 2.70 NTU and a normalized RMSE (NRMSE) of 14.70% compared to in situ turbidity, in which the nNDTI proposed in this study was used as the most important variable. The developed turbidity retrieval model was applied to the KOMPSAT-3/3A images to map high-resolution river turbidity, and it was possible to analyze the spatiotemporal variations of turbidity. Through this study, we could confirm that the KOMPSAT-3/3A images are very useful for retrieving high-resolution and accurate spatial information on the river turbidity.