• Title/Summary/Keyword: Document searching


Development of an Image Retrieval Model Based on Image2Vec Using GAN (Generative Adversarial Network를 활용한 Image2Vec기반 이미지 검색 모델 개발)

  • Jo, Jaechoon;Lee, Chanhee;Lee, Dongyub;Lim, Heuiseok
    • Journal of Digital Convergence / v.16 no.12 / pp.301-307 / 2018
  • Most IR research focuses on methods for searching documents, so keyword-based IR systems cannot reflect the feature information of images. To overcome this limitation, we developed a system that searches for similar images based on the vector information of images, and that can also retrieve similar images from sketches. The proposed system uses a GAN to upsample a sketch to the image level, converts the image to a vector through a CNN, and then retrieves similar images using the vector space model. The model was trained on fashion images, and an image retrieval system was developed on top of it. As a result, the system showed meaningful performance.
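The retrieval stage described above, comparing a query vector against stored image vectors, can be sketched with plain cosine similarity. The index entries and feature values below are hypothetical; in the paper, the vectors come from a GAN-upsampled sketch passed through a CNN.

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query_vec, index, top_k=3):
    # Rank indexed image vectors by similarity to the query vector.
    ranked = sorted(index.items(),
                    key=lambda kv: cosine(query_vec, kv[1]),
                    reverse=True)
    return [name for name, _ in ranked[:top_k]]

# Toy index: image names -> hypothetical CNN feature vectors.
index = {
    "dress_01": [0.9, 0.1, 0.0],
    "shirt_07": [0.1, 0.8, 0.2],
    "skirt_03": [0.85, 0.2, 0.1],
}
print(retrieve([1.0, 0.0, 0.0], index, top_k=2))  # → ['dress_01', 'skirt_03']
```

Real systems replace the toy dictionary with an approximate nearest-neighbour index, but the ranking criterion is the same.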

A Study on Search Query Topics and Types using Topic Modeling and Principal Components Analysis (토픽모델링 및 주성분 분석 기반 검색 질의 유형 분류 연구)

  • Kang, Hyun-Ah;Lim, Heui-Seok
    • KIPS Transactions on Software and Data Engineering / v.10 no.6 / pp.223-234 / 2021
  • Recent advances of the 4th Industrial Revolution have accelerated the shift in shopping behavior from offline to online. In online shopping, search queries express customers' information needs most directly. However, there has not been much research on search queries, and most prior work has covered limited topics and data sets based on researchers' qualitative judgment. To address this, this study defines search query types with a data-driven, quantitative methodology: applying machine learning to search queries and clicked-document information, we conduct topic modeling to define 15 topics of search queries. Furthermore, we present a new classification system of search query types that represents search behavior characteristics, obtained by extracting and analyzing key variables through principal component analysis. The results of this study are expected to contribute to the establishment of effective search services and the development of search systems.
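As a sketch of the principal-component step only (the topic modeling itself is out of scope here), the snippet below extracts the first principal component from two hypothetical topic-proportion features per query, using the closed-form eigenvector of the 2x2 covariance matrix.

```python
import math

def pca_first_component(rows):
    # First principal component of 2-D data via the closed-form
    # largest eigenvector of the 2x2 covariance matrix.
    n = len(rows)
    mx = sum(r[0] for r in rows) / n
    my = sum(r[1] for r in rows) / n
    sxx = sum((r[0] - mx) ** 2 for r in rows) / n
    syy = sum((r[1] - my) ** 2 for r in rows) / n
    sxy = sum((r[0] - mx) * (r[1] - my) for r in rows) / n
    # Largest eigenvalue of [[sxx, sxy], [sxy, syy]].
    tr, det = sxx + syy, sxx * syy - sxy * sxy
    lam = tr / 2 + math.sqrt(max(tr * tr / 4 - det, 0.0))
    # Corresponding eigenvector (assumes sxy != 0), normalised.
    vx, vy = sxy, lam - sxx
    norm = math.hypot(vx, vy)
    return (vx / norm, vy / norm)

# Hypothetical per-query features: (informational share, transactional share).
queries = [(0.1, 0.2), (0.4, 0.5), (0.7, 0.8)]
print(pca_first_component(queries))  # perfectly correlated → ~(0.707, 0.707)
```

The actual study works with 15 topic proportions, so a general eigendecomposition is needed there; the 2-D closed form just makes the idea concrete.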

Design and Implementation of Web-based Retrieval System for Massive Image Contents in Green Computing Environment (그린 환경을 위한 웹기반 대용량 이미지 콘텐츠 검색 시스템 설계 및 구현)

  • Na, Moon-Sung;Lee, Jae-Dong
    • Journal of Korea Society of Industrial Information Systems / v.14 no.5 / pp.113-123 / 2009
  • As environmental issues emerge, many efforts are being made globally toward green growth, reducing the waste of energy and resources through low-carbon practices and the replacement of paper documents with digital files and images. On the other hand, it may take users much time and effort to find the proper image files on the web, where enormous numbers of non-standardized digital files proliferate, so power and resource consumption may grow again during search and retrieval. This paper suggests an efficient system design and implementation for fast and precise retrieval of massive image contents, saving energy and resources and ultimately contributing to green growth in the computing environment.

A Comparative Study of the Impacts among Patent Assignees in Pharmaceutical Research based on Bibliometric Analyses (계량서지학적 분석을 통한 약물연구분야 특허출원인 간 영향력 비교)

  • Kim, Heeyoung;Park, Ji-Hong
    • Journal of the Korean Society for Information Management / v.39 no.1 / pp.1-15 / 2022
  • This study analyzes the citation relationships appearing in patent data to understand knowledge transfer and impact between patent documents in the field of pharmaceutical research. Patent data were collected from Google Patents, and the top 25 assignees were selected by searching for patent documents related to pharmaceutical research. We identify the citation relationships between assignees, then calculate and compare the h-index and derived indicators using the number of citations and the rank of each document of each assignee. As a result, in pharmaceutical research, assignees such as Pfizer, MIT, and Abbott show a high impact. Among the five bibliometric indicators, the g-index and hS-index show similar results, and the indicators are most closely related to the rankings of Total Citation Frequency, Cites per Patent, and Maximum Citation Frequency, in that order. In some cases, it is difficult to make an accurate comparison with Cites per Patent alone, which was previously thought to indicate the technological influence of patent assignees.
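The h-index and g-index compared above can be computed directly from a list of per-patent citation counts; the counts below are made up for illustration.

```python
def h_index(citations):
    # h = largest h such that at least h patents have >= h citations each.
    cites = sorted(citations, reverse=True)
    h = 0
    for i, c in enumerate(cites, start=1):
        if c >= i:
            h = i
    return h

def g_index(citations):
    # g = largest g such that the top g patents together
    # have at least g^2 citations.
    cites = sorted(citations, reverse=True)
    total, g = 0, 0
    for i, c in enumerate(cites, start=1):
        total += c
        if total >= i * i:
            g = i
    return g

patents = [10, 8, 5, 4, 3]  # hypothetical per-patent citation counts
print(h_index(patents), g_index(patents))  # → 4 5
```

The g-index always equals or exceeds the h-index, since it lets a few highly cited patents compensate for the rest.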

Consumers' perceptions of dietary supplements before and after the COVID-19 pandemic based on big data

  • Eunjung Lee;Hyo Sun Jung;Jin A Jang
    • Journal of Nutrition and Health / v.56 no.3 / pp.330-347 / 2023
  • Purpose: This study identified words closely associated with the keyword "dietary supplement" (DS) in Korean social media big data and investigated consumer perceptions and trends related to DSs before (2019) and after the coronavirus disease 2019 (COVID-19) pandemic (2021). Methods: A total of 37,313 keywords were found for the 2019 period and 35,336 keywords for the 2021 period using blogs and cafes on Daum and Naver. Results were derived by text mining, semantic network analysis, network visualization analysis, and sentiment analysis. Results: The DS-related keywords that frequently appeared both before and after COVID-19 were "recommend", "vitamin", "health", "children", "multiple", and "lactobacillus". "Calcium", "lutein", "skin", and "immunity" also had high term frequency-inverse document frequency (TF-IDF) values. These keywords imply a keen interest in DSs among Korean consumers. The big data results also reflected social phenomena related to DSs; for example, "baby" and "pregnant woman" had lower TF-IDF values after the pandemic, suggesting lower marriage and birth rates, while "joint" had higher values, indicating reduced physical activity. Semantic network analysis produced a network centered on vitamins and health care in 2019. In 2021, values were highest for "deficiency" and "need", indicating that after the COVID-19 pandemic individuals searched for DSs out of an awareness of nutrient deficiency and the need for adequate nutrient intake. Before the pandemic, DSs and vitamins were associated with healthcare and life cycle-related topics such as pregnancy, but after the pandemic, consumer interest shifted to disease prevention and treatment. Conclusion: This study provides meaningful clues regarding consumer perceptions and trends related to DSs before and after the COVID-19 pandemic, and fundamental data on the effect of the pandemic on consumer interest in dietary supplements.
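TF-IDF, the weighting used in the study above, can be sketched in a few lines; the toy corpus below is hypothetical, with each post reduced to a keyword list.

```python
import math

def tf_idf(term, doc, corpus):
    # Term frequency in this document times log inverse document
    # frequency across the corpus (one common TF-IDF variant).
    tf = doc.count(term) / len(doc)
    df = sum(1 for d in corpus if term in d)
    idf = math.log(len(corpus) / df) if df else 0.0
    return tf * idf

corpus = [
    ["vitamin", "health", "children"],
    ["vitamin", "lactobacillus"],
    ["calcium", "lutein", "immunity"],
]
# "calcium" appears in only one document, so it scores higher than
# "vitamin", which appears in two.
print(tf_idf("calcium", corpus[2], corpus) >
      tf_idf("vitamin", corpus[0], corpus))  # → True
```

This is why terms such as "calcium" and "lutein" can outrank a more frequent term: rarity across documents raises the IDF factor.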

Methods for Integration of Documents using Hierarchical Structure based on the Formal Concept Analysis (FCA 기반 계층적 구조를 이용한 문서 통합 기법)

  • Kim, Tae-Hwan;Jeon, Ho-Cheol;Choi, Joong-Min
    • Journal of Intelligence and Information Systems / v.17 no.3 / pp.63-77 / 2011
  • The World Wide Web is a very large distributed digital information space. From its origins in 1991, the web has grown to encompass diverse information resources such as personal home pages, online digital libraries, and virtual museums. Some estimates suggest that the web currently includes over 500 billion pages in the deep web. The ability to search and retrieve information from the web efficiently and effectively is an enabling technology for realizing its full potential. With powerful workstations and parallel processing technology, efficiency is not a bottleneck; in fact, some existing search tools sift through gigabyte-size precompiled web indexes in a fraction of a second. But retrieval effectiveness is a different matter. Current search tools retrieve too many documents, of which only a small fraction are relevant to the user query. Furthermore, the most relevant documents do not necessarily appear at the top of the query output, and current search tools cannot retrieve the documents related to a retrieved document from a gigantic collection. The most important problem for current search systems is therefore to increase search quality: to provide related documents and to keep the number of unrelated documents in the results as low as possible. For this problem, CiteSeer proposed ACI (Autonomous Citation Indexing) of articles on the World Wide Web. A citation index indexes the links between articles that researchers make when they cite other articles; references contained in academic articles give credit to previous work and provide a link between the "citing" and "cited" articles, so the index links each article with the works it cites. Citation indexes are very useful for a number of purposes, including literature search and analysis of the academic literature. Citation indexes were originally designed mainly for information retrieval: the citation links allow navigating the literature in unique ways, since papers can be located independent of language or of the words in the title, keywords, or body, and a citation index allows navigation backward in time (the list of cited articles) and forward in time (which subsequent articles cite the current article?). However, CiteSeer cannot index links that researchers do not make explicitly, because it only indexes the links created when researchers cite other articles, and for the same reason it does not scale easily. These problems motivate the design of a more effective search system. This paper presents a method that extracts the subject and predicate of each sentence in a document. A document is transformed into a tabular form in which each extracted predicate is checked against its possible subjects and objects. From this table we build a hierarchical graph of the document and then integrate the graphs of multiple documents. Using the graph of the entire collection, we calculate the area of each document relative to the integrated documents and mark the relations among documents by comparing these areas. We also propose a method for structural integration of documents that retrieves documents from the graph, making it easier for users to find information. We compared the performance of the proposed approaches with the Lucene search engine using standard ranking formulas. As a result, the F-measure is about 60%, which is about 15% better.
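The F-measure used in the comparison with Lucene combines precision and recall; a minimal sketch over document ID sets (the IDs here are made up):

```python
def f_measure(retrieved, relevant, beta=1.0):
    # Weighted harmonic mean of precision and recall;
    # beta = 1 gives the balanced F1 score.
    hits = len(set(retrieved) & set(relevant))
    if hits == 0:
        return 0.0
    p = hits / len(retrieved)
    r = hits / len(relevant)
    b2 = beta * beta
    return (1 + b2) * p * r / (b2 * p + r)

retrieved = ["d1", "d2", "d3", "d4"]   # hypothetical search output
relevant = ["d1", "d2", "d5"]          # hypothetical ground truth
score = f_measure(retrieved, relevant)  # p = 0.5, r = 2/3 → ~0.571
```

Because it is a harmonic-style mean, the score is dragged toward whichever of precision or recall is lower, which is why it is the usual single-number summary for retrieval comparisons like the one above.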

XSLT Stylesheet Design for Building Web Presentation Layer (웹 프리젠테이션 레이어 생성을 위한 XSLT 스타일쉬트 설계)

  • 채정화;유철중;장옥배
    • Journal of KIISE: Software and Applications / v.31 no.3 / pp.255-266 / 2004
  • In web-based information systems, separating the business process logic from the data and presentation logic brings a wide range of advantages. However, this separation is not easily achieved; even the data logic may not be separated from the presentation layer. It therefore requires defining a model for business processes and then mapping the model into the user's dynamic interface using a logic-separating strategy. This paper presents a stylesheet method that recognizes the process by extending XSLT (Extensible Stylesheet Language Transformations), in order to achieve this separation. To do so, it provides a specification of the business process and a scheme that extracts business model factors and their interactions using a Petri-net notation, viewing the business model from the process point of view. This is an attempt to separate users' interaction from the business process, that is, the dynamic components of interactive Web documents from the process structure of Web applications. Our architecture consists mainly of an XSLT controller extended by a process control component. The XSLT controller is responsible for receiving user requests and searching for the template rule relevant to each request. This separation of concerns facilitates the development of service-oriented Web sites by making them modular; as a result, such sites are easy to develop, and each module can be changed without affecting the others, so Web applications are easy to develop and maintain independently.
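XSLT's template-rule dispatch (match an element, apply its rule) is the mechanism the controller above relies on. Python's standard library has no XSLT engine, so the sketch below only imitates the controller idea, with hypothetical element names and hand-written rules standing in for `<xsl:template match="...">`.

```python
import xml.etree.ElementTree as ET

TEMPLATES = {}

def template(match):
    # Register a rendering rule for elements whose tag equals `match`,
    # loosely analogous to an XSLT template rule.
    def register(fn):
        TEMPLATES[match] = fn
        return fn
    return register

@template("order")
def render_order(node):
    return f"<div class='order'>{node.findtext('item')}</div>"

@template("customer")
def render_customer(node):
    return f"<span>{node.get('name')}</span>"

def apply_templates(root):
    # Dispatch each child element to its matching template rule,
    # like xsl:apply-templates over the children of the context node.
    return "".join(TEMPLATES[child.tag](child)
                   for child in root if child.tag in TEMPLATES)

doc = ET.fromstring(
    "<root><order><item>chart</item></order>"
    "<customer name='Lee'/></root>")
html = apply_templates(doc)
```

A real deployment would use an actual XSLT processor; the point here is only the rule lookup per request that the paper's extended controller performs.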

Using Web as CAI in the Classroom of Information Age (정보화시대를 대비한 CAI로서의 Web 활용)

  • Lee, Kwang-Hi
    • Journal of The Korean Association of Information Education / v.1 no.1 / pp.38-48 / 1997
  • This study is an attempt to present a usage of the Web as CAI in the classroom and to give a direction to future education in the face of the information age. The characteristics of the information society, the current curriculum, and teacher education are first analyzed in this article. The features of the Internet and the Web are then summarized to present the benefits of classroom usage as a CAI tool. The literature shows several characteristics of the information society: a technological computer society, the provision and sharing of information, a multi-functional society, participative democracy, autonomy, and the value of time. Problem solving and the 4 Cs (e.g., cooperation, coping, communication, creativity) are newly needed in this learning environment. The Internet is a large collection of networks tied together so that users can share their vast resources and wealth of information, and it gives a key to successful, efficient individual study across time and space. The Web increases academic achievement, creativity, problem solving, cognitive thinking, and learner motivation through easy access to documents available on the Internet, files containing programs, pictures, movies, and sounds from FTP sites, Usenet newsgroups, WAIS searches, computers accessible through telnet, hypertext documents, Java applets and other multimedia browser enhancements, and much more. The Web browser will be our primary tool in searching for information on the Internet in this information age.


Application of Advertisement Filtering Model and Method for its Performance Improvement (광고 글 필터링 모델 적용 및 성능 향상 방안)

  • Park, Raegeun;Yun, Hyeok-Jin;Shin, Ui-Cheol;Ahn, Young-Jin;Jeong, Seungdo
    • Journal of the Korea Academia-Industrial cooperation Society / v.21 no.11 / pp.1-8 / 2020
  • In recent years, the exponential increase in Internet data has driven progress in many fields such as deep learning, but side effects such as viral marketing in commercial advertisements have also appeared. These not only damage the essence of the Internet as a medium for sharing high-quality information, but also increase the time users spend searching for that information. In this study, we define an advertisement as "a text that obscures the essence of information transmission" and propose a model for filtering information according to that definition. The proposed model consists of advertisement filtering and advertisement-filtering performance improvement, and is designed to improve performance continuously. We collected data for filtering advertisements and trained a document classifier using KorBERT. Experiments were conducted to verify the performance of this model: for data combining five topics, accuracy and precision were 89.2% and 84.3%, respectively. High performance was confirmed even when the atypical characteristics of advertisements were considered. This approach is expected to reduce the time and fatigue wasted in searching for information, because the model effectively delivers high-quality information to users by determining and filtering advertisement paragraphs.
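The two figures reported above, accuracy and precision, follow from a label comparison like the one below; the KorBERT classifier itself is not reproduced, and the labels are invented for illustration.

```python
def accuracy_precision(y_true, y_pred, positive="ad"):
    # Accuracy over all predictions; precision for the positive class
    # (here "ad": of everything flagged as an ad, how much really was).
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    tp = sum(t == p == positive for t, p in zip(y_true, y_pred))
    flagged = sum(p == positive for p in y_pred)
    acc = correct / len(y_true)
    prec = tp / flagged if flagged else 0.0
    return acc, prec

y_true = ["ad", "ad", "info", "info", "ad"]   # hypothetical gold labels
y_pred = ["ad", "info", "info", "ad", "ad"]   # hypothetical model output
acc, prec = accuracy_precision(y_true, y_pred)  # 0.6, 2/3
```

Precision matters most for this task: a false positive hides legitimate content from the user, whereas a false negative merely lets one advertisement through.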

A Study on Implementation of SVG for ENC Applications (전자해도 활용을 위한 SVG 변환 연구)

  • Oh, Se-Woong;Park, Jong-Min;Suh, Sang-Hyun
    • Proceedings of the Korean Institute of Navigation and Port Research Conference / v.1 / pp.133-138 / 2006
  • Electronic Navigational Charts (ENCs) are official nautical charts equivalent to paper charts, with supplementary information. Although their main purpose is the safe navigation of ships, they also contain much information on coasts and seas that may interest ordinary people. However, there is no easy way to access them because of their specialized data format, access method, and visualization. This paper proposes an implementation of SVG for the access and service of ENCs. SVG (Scalable Vector Graphics) makes it possible to use vector graphics for serving maps in a basic Internet browsing environment. The implementation of SVG for ENC applications in this research requires no special server-side GIS mapping system and no extra client-side technology. It can be summarized as follows. Firstly, SVG carries the spatial information needed for a search engine to render the SVG map. Secondly, SVG can provide high-quality vector map graphics and interactive facilities without a special Internet GIS system, which makes services possible at very low cost. Thirdly, an SVG information service targeting maritime transportation can be used as a template, and thus reused dynamically for other purposes such as traffic management and vessel monitoring. The good on-screen mapping characteristics of SVG and the reusability of SVG documents open a new era in the visualization of marine geographic information.
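Generating SVG server-side needs no GIS stack at all, which is the low-cost property claimed above. A sketch like the one below (with made-up coastline coordinates; real ENC data would first need projection to screen space) emits a standalone vector map fragment.

```python
def coast_to_svg(points, width=200, height=200):
    # Render a list of (x, y) screen coordinates as an SVG polyline,
    # producing a self-contained vector graphic string.
    pts = " ".join(f"{x},{y}" for x, y in points)
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" '
        f'width="{width}" height="{height}">'
        f'<polyline points="{pts}" fill="none" stroke="navy"/>'
        f"</svg>"
    )

# Hypothetical coastline vertices already projected to pixel space.
svg = coast_to_svg([(10, 150), (60, 120), (120, 140), (190, 90)])
```

Any browser can display the resulting string directly, and the same document can be restyled or scaled without re-requesting data, which is what makes SVG attractive for chart reuse.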
