• Title/Summary/Keyword: Automatic Query Generation

A Brief Survey into the Field of Automatic Image Dataset Generation through Web Scraping and Query Expansion

  • Bart Dikmans;Dongwann Kang
    • Journal of Information Processing Systems / v.19 no.5 / pp.602-613 / 2023
  • High-quality image datasets are in high demand for various applications. Although many online sources provide manually collected datasets, fully automating the dataset collection process remains a challenge. In this study, we survey the field of automatic image dataset generation by analyzing a collection of existing studies. We also examine closely related topics, such as query expansion, web scraping, and dataset quality. We assess how both noise and regional search engine differences can be addressed using automated search query expansion focused on hypernyms, while still allowing for user-specific manual query expansion. Combining these aspects provides an outline of how a modern web scraping application can produce large-scale image datasets.
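
As a rough illustration of the hypernym-focused query expansion mentioned in the abstract above, the following sketch appends a broader term to each search query and also accepts manually supplied terms. The `HYPERNYMS` table, the `expand_query` function, and the example words are assumptions made for illustration; the surveyed studies may obtain hypernyms differently (for example, from a lexical database).

```python
# Hypothetical hypernym table; a real system would likely use a lexical resource.
HYPERNYMS = {
    "jaguar": ["big cat"],
    "beagle": ["dog"],
}


def expand_query(query, hypernyms=HYPERNYMS, user_terms=()):
    """Return the original query plus variants with a hypernym or a user term appended."""
    variants = [query]
    for word in query.lower().split():
        for hyper in hypernyms.get(word, []):
            # Appending the broader term narrows ambiguous queries ("jaguar" the car).
            variants.append(f"{query} {hyper}")
    # User-specific manual expansion, as the abstract allows for.
    variants.extend(f"{query} {term}" for term in user_terms)
    # Remove duplicates while preserving order.
    return list(dict.fromkeys(variants))


print(expand_query("jaguar", user_terms=["photo"]))
```

Each variant would then be submitted to the image search engine as a separate query.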

A Theoretical Study of Designing Thesaurus Browser by Clustering Algorithm (클러스터링을 이용한 시소러스 브라우저의 설계에 대한 이론적 연구)

  • Seo, Hwi
    • Journal of Korean Library and Information Science Society / v.30 no.3 / pp.427-456 / 1999
  • This paper deals with the problems of information retrieval in full-text databases, which arise both from searchers' inadequate search strategies or methods and from the difficulties of query representation, generation, extension, etc. To solve these problems, automatic retrieval should replace the manual retrieval used in the past. One way to narrow the gap between the terms used by writers and the queries of searchers is to search with the terms that the writers use. A precondition for solving these problems is therefore that all areas of information retrieval, such as content analysis, information structure, query formation, and query evaluation, be addressed in a coherent way. Although this paper focuses on the design of a thesaurus browser, all areas of automatic information retrieval need to be considered for retrieval to be efficient. The paper therefore presents theoretical analyses of the forms of information retrieval, automatic indexing, clustering techniques, thesaurus construction and representation, and information retrieval techniques. From these analyses it derives a theoretical model, namely a thesaurus browser based on a clustering algorithm. The results can serve as a theoretical basis for new retrieval algorithms.


Automatic Generation of Machine Readable Context Annotations for SPARQL Results

  • Choi, Ji-Woong
    • Journal of the Korea Society of Computer and Information / v.21 no.10 / pp.1-10 / 2016
  • In this paper, we propose an approach for generating machine-readable context annotations for SPARQL results. According to W3C Recommendations, data retrieved from RDF or OWL data sources is represented in tabular form, in which each cell is described only by its type and value. This simple result form is generally useful, but it is not sufficient to explain the semantics of the data in query results. To explain the meaning of the data, appropriate annotations must be added to the query results. We generate the annotations from the basic graph patterns in users' queries, and can also manipulate the original queries to complete the annotations. The generated annotations are represented using RDFa syntax, and RDFa expressions in HTML are machine-understandable. We believe that our work will improve the trustworthiness of query results and contribute to distributing data in line with the vision of the Semantic Web.

AutoCor: A Query Based Automatic Acquisition of Corpora of Closely-related Languages

  • Dimalen, Davis Muhajereen D.;Roxas, Rachel Edita O.
    • Proceedings of the Korean Society for Language and Information Conference / 2007.11a / pp.146-154 / 2007
  • AutoCor is a method for the automatic acquisition and classification of corpora of documents in closely-related languages. It extends and enhances CorpusBuilder, a system that automatically builds specific minority language corpora from a closed corpus, because some Tagalog documents retrieved by CorpusBuilder are actually documents in other closely-related Philippine languages. AutoCor uses the odds ratio query generation method and introduces the concept of common word pruning to differentiate between documents in Tagalog and those in closely-related Philippine languages. The performance of the system with and without pruning is compared, and common word pruning is found to improve the precision of the system.
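
To make the odds ratio idea in the abstract above concrete, the sketch below ranks candidate query terms by how much more likely they are to occur in documents of the target language than in other documents, and optionally skips a set of common words. This is a minimal sketch, not AutoCor's implementation: the function name, the add-one smoothing, the `common_words` parameter, and the placeholder documents are all assumptions, and the abstract does not specify how common word pruning is actually defined.

```python
from collections import Counter


def odds_ratio_terms(relevant_docs, other_docs, k=3, common_words=frozenset()):
    """Return the k terms with the highest odds ratio for the relevant documents."""
    # Document frequencies: in how many documents each term occurs.
    rel_df = Counter(t for d in relevant_docs for t in set(d.lower().split()))
    oth_df = Counter(t for d in other_docs for t in set(d.lower().split()))
    n_rel, n_oth = len(relevant_docs), len(other_docs)
    scores = {}
    for term in rel_df:
        if term in common_words:
            continue  # assumed form of pruning: drop words shared across the languages
        # Add-one smoothing keeps both probabilities strictly between 0 and 1.
        p = (rel_df[term] + 1) / (n_rel + 2)
        q = (oth_df[term] + 1) / (n_oth + 2)
        scores[term] = (p * (1 - q)) / ((1 - p) * q)
    # Highest score first; ties broken alphabetically for a stable result.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Placeholder token strings, not real corpus data.
relevant = ["alpha beta common", "alpha gamma common"]
other = ["delta common", "epsilon common"]
print(odds_ratio_terms(relevant, other, k=2))
```

The selected terms would then be combined into search queries that favor documents in the target language.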


Best Practice on Automatic Toon Image Creation from JSON File of Message Sequence Diagram via Natural Language based Requirement Specifications

  • Hyuntae Kim;Ji Hoon Kong;Hyun Seung Son;R. Young Chul Kim
    • International journal of advanced smart convergence / v.13 no.1 / pp.99-107 / 2024
  • With AI image generation tools, most general users must craft an effective prompt (a query or statement) to elicit the desired response (an image or other result) from the AI model. As software engineers who focus on software processes, we use informal and formal requirement specifications at the early stage of the process, and we adapt the natural language approach to requirement engineering and toon engineering. Most generative AI tools do not produce the same image for the same query, because the same data asset is not used each time. To solve this problem, we use informal requirement engineering and linguistics to create a toon. We propose a sequence diagram and image generation mechanism that analyzes and applies key objects and attributes as an informal natural language requirement analysis. Morphemes and semantic roles are identified by analyzing natural language with linguistic methods; based on the analysis results, a sequence diagram is generated, and an image is generated from the diagram. We expect the proposed mechanism to produce consistent images by reusing the same image element assets.

Automatic Generation of Issue Analysis Report Based on Social Big Data Mining (소셜 빅데이터 마이닝 기반 이슈 분석보고서 자동 생성)

  • Heo, Jeong;Lee, Chung Hee;Oh, Hyo Jung;Yoon, Yeo Chan;Kim, Hyun Ki;Jo, Yo Han;Ock, Cheol Young
    • KIPS Transactions on Software and Data Engineering / v.3 no.12 / pp.553-564 / 2014
  • In this paper, we propose a system for the automatic generation of issue analysis reports based on social big data mining, with the aim of resolving three problems of previous technologies for social media analysis and analytic report generation: the isolation of analysis, the subjectivity of experts, and the restricted access to information caused by high prices. The system comprises natural language query analysis, issue analysis, social big data analysis, social big data correlation analysis, and automatic report generation. To evaluate the usefulness of the reports, we had two big data analysis experts rate them on a Likert scale. The results show that the reports are comparatively useful and reliable. Because of the low cost of report generation, the correlation analysis of social big data, and the objectivity of social big data analysis, we expect the proposed system to help popularize social big data analysis.

Seq2SPARQL: Automatic Generation of Knowledge base Query Language using Neural Machine Translation (Seq2SPARQL: 신경망 기계 번역을 사용한 지식 베이스 질의 언어 자동 생성)

  • Hong, Dong-Gyun;Shen, Hong-Mei;Kim, Kwang-Min
    • Proceedings of the Korea Information Processing Society Conference / 2019.10a / pp.898-900 / 2019
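  • SPARQL (SPARQL Protocol and RDF Query Language) is the standard semantic query language for knowledge bases. In recent artificial intelligence research, knowledge bases are increasingly used in applications such as question answering systems and semantic search. However, using a query language such as SPARQL requires an understanding of its syntax, so its usefulness to general users is inevitably limited. This paper therefore proposes a method that generates SPARQL from natural language questions using neural machine translation. We validated the proposed method on Wikidata, a large public knowledge base. For the experiments, we generated 20,000 pairs of natural language questions and SPARQL queries asking about movie knowledge in Wikidata, and in an experiment comparing several sequence-to-sequence models, the convolutional neural network based model achieved the best result, with a BLEU score of 96.8%.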

A Natural Language Retrieval System for Entertainment Data (엔터테인먼트 데이터를 위한 자연어 검색시스템)

  • Kim, Jung-In
    • Journal of Korea Multimedia Society / v.18 no.1 / pp.52-64 / 2015
  • Recently, as the quality of life has improved, entertainment-related searches have come to represent an increasing share of the total usage of Internet portal sites. Information retrieval in the entertainment area depends mainly on the keywords that users enter, and the results are the contents that contain those keywords. In this paper, we propose a search method that takes natural language input and queries a database of entertainment data. The main components of our study are a simple Korean morphological analyzer that uses case particle information, predicate-oriented token generation, generation of standardized patterns that match the tokens, and automatic generation of the corresponding SQL queries. We also propose an efficient retrieval system that returns the most relevant results from the database for natural language queries, particularly in the restricted domain of music, and show the effectiveness of our system.
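
The last two steps described in the abstract above, matching tokens to a standardized pattern and generating the corresponding SQL query, might look roughly like the following. This is a sketch under stated assumptions: the `PATTERNS` table, the token dictionary format, the `songs` schema, and the sample rows are all hypothetical, and the Korean morphological analysis that would produce the tokens is not modeled.

```python
import sqlite3

# Hypothetical standardized patterns: (predicate, argument role) -> parameterized SQL.
PATTERNS = {
    ("sing", "artist"): "SELECT title FROM songs WHERE artist = ? ORDER BY title",
    ("release", "year"): "SELECT title FROM songs WHERE year = ? ORDER BY title",
}


def to_sql(tokens):
    """Map analyzer output, e.g. {"predicate": "sing", "artist": "Artist A"}, to SQL."""
    for (predicate, role), sql in PATTERNS.items():
        if tokens.get("predicate") == predicate and role in tokens:
            return sql, (tokens[role],)
    raise ValueError(f"no pattern matches {tokens!r}")


def answer(conn, tokens):
    """Run the generated query and return the matching titles."""
    sql, params = to_sql(tokens)
    return [row[0] for row in conn.execute(sql, params)]


# Toy in-memory database with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE songs (title TEXT, artist TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO songs VALUES (?, ?, ?)",
    [("Song One", "Artist A", 2013), ("Song Two", "Artist B", 2014), ("Song Three", "Artist A", 2014)],
)
print(answer(conn, {"predicate": "sing", "artist": "Artist A"}))
```

Keeping the user-supplied values as query parameters rather than splicing them into the SQL string avoids injection problems in the generated queries.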

Implementation of Non-SQL Data Server Framework Applying Web Tier Object Modeling (웹티어 오브젝트 모델링을 통한 non-SQL 데이터 서버 프레임웍 구현)

  • Kwon Ki-Hyeon;Cheon Sang-Ho;Choi Hyung-Jin
    • The Journal of Korean Institute of Communications and Information Sciences / v.31 no.4B / pp.285-290 / 2006
  • Various aspects should be taken into account when developing a distributed architecture based on a multi-tier model or an enterprise architecture. Among these, separating the roles of page designer and page developer and defining the entities used for database connection and transaction processing are especially important. In this paper, we present the DONSL (Data Server of Non SQL query) architecture, which addresses these problems by applying web tier object modelling. The architecture simplifies the coupling between tiers and removes the DAO (Data Access Object) and entities from the programming logic. We concentrate on three parts: how to develop the DAO without being affected by entity modification, an automatic transaction processing technique that includes SQL generation, and how to use AET/MET (Automated/Manual Executed Transaction) effectively.

An Automatic Generation Method of the Initial Query Set for Image Search on the Mobile Internet (모바일 인터넷 기반 이미지 검색을 위한 초기질의 자동생성 기법)

  • Kim, Deok-Hwan;Cho, Yoon-Ho
    • Journal of Intelligence and Information Systems / v.13 no.1 / pp.1-14 / 2007
  • Character images for cell phone background screens are one of the fast-growing sectors of the mobile content market. However, buyers of character images currently have great difficulty finding the images they want because of the awkward image search process. Content-based image retrieval (CBIR), which is widely used for image retrieval, could be a good solution to this problem, but it must overcome a limitation of the mobile Internet environment: an initial query set (IQS) cannot be provided as easily as in a PC-based environment. We propose a new approach, IQS-AutoGen, which automatically generates an initial query set for CBIR on the mobile Internet. The approach applies collaborative filtering (CF), a well-known recommendation technique, to the CBIR process by using users' preference information collected during the relevance feedback process of CBIR. Experimental results from a PC-based prototype system show that the proposed approach successfully satisfies the initial query requirement of CBIR in the mobile Internet environment, thereby outperforming the current image search process on the mobile Internet.
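
One way to picture how collaborative filtering could supply an initial query set, as the abstract above describes, is a user-based scheme: images that similar users marked as relevant, and that the target user has not yet seen, become the first query images. The sketch below assumes cosine similarity over binary preferences; the function names, data layout, and sample preferences are hypothetical and are not taken from IQS-AutoGen itself.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse preference vectors (dicts)."""
    dot = sum(u[i] * v[i] for i in set(u) & set(v))
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def initial_query_set(prefs, target, n=2):
    """Pick n unseen images, weighted by the preferences of similar users."""
    scores = {}
    for other, ratings in prefs.items():
        if other == target:
            continue
        sim = cosine(prefs[target], ratings)
        for image, rating in ratings.items():
            if image not in prefs[target]:
                scores[image] = scores.get(image, 0.0) + sim * rating
    # Highest score first; ties broken by image id for a stable result.
    return sorted(scores, key=lambda i: (-scores[i], i))[:n]


# Made-up preferences: 1 means the user marked the image relevant during feedback.
prefs = {
    "u1": {"img_a": 1, "img_b": 1},
    "u2": {"img_a": 1, "img_b": 1, "img_c": 1},
    "u3": {"img_d": 1, "img_e": 1},
}
print(initial_query_set(prefs, "u1", n=1))
```

The returned images would seed the first CBIR round, after which ordinary relevance feedback can refine the results.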
