Integrated Search | Korea Science

A Brief Survey into the Field of Automatic Image Dataset Generation through Web Scraping and Query Expansion

Bart Dikmans;Dongwann Kang
- Journal of Information Processing Systems
- /
- Vol. 19, No. 5
- /
- pp.602-613
- /
- 2023
High-quality image datasets are in high demand for various applications. Although many online sources provide manually collected datasets, fully automating the dataset collection process remains a persistent challenge. In this study, we survey the field of automatic image dataset generation by analyzing a collection of existing studies. We also examine fields closely related to automated dataset generation, such as query expansion, web scraping, and dataset quality. We assess how both noise and regional search engine differences can be addressed with automated search query expansion focused on hypernyms, while still allowing user-specific manual query expansion. Combining these aspects provides an outline of how a modern web scraping application can produce large-scale image datasets.
https://doi.org/10.3745/JIPS.04.0288
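
The hypernym-focused query expansion mentioned in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the `HYPERNYMS` table, the `expand_query` function, and the `extra_terms` parameter are hypothetical names introduced here, and a real system would likely draw hypernyms from a lexical database rather than a hand-written dictionary.

```python
# Hypothetical hypernym table for illustration only; not taken from the paper.
HYPERNYMS = {
    "beagle": ["hound", "dog"],
    "jaguar": ["big cat", "feline"],
}


def expand_query(term, hypernyms=HYPERNYMS, extra_terms=()):
    """Build search queries from a term, its hypernyms, and user-supplied terms.

    The bare term comes first, followed by "term hypernym" combinations
    (automatic expansion) and "term extra" combinations (manual expansion).
    """
    candidates = [term]
    candidates += [f"{term} {h}" for h in hypernyms.get(term, [])]
    candidates += [f"{term} {e}" for e in extra_terms]

    # Remove duplicates while keeping the original order.
    seen = set()
    queries = []
    for query in candidates:
        if query not in seen:
            seen.add(query)
            queries.append(query)
    return queries


if __name__ == "__main__":
    for q in expand_query("jaguar", extra_terms=["animal"]):
        print(q)
```

Appending a hypernym such as "big cat" to an ambiguous term such as "jaguar" is one plausible way to steer a search engine away from unrelated results, which is the kind of noise the abstract refers to.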

A Theoretical Study of Designing Thesaurus Browser by Clustering Algorithm

Seo, Hwi
- Journal of Korean Library and Information Science Society
- /
- Vol. 30, No. 3
- /
- pp.427-456
- /
- 1999
This paper addresses problems in information retrieval from full-text databases that arise both from searchers' inadequate search strategies or methods and from the difficulties of query representation, generation, and expansion. To solve these problems, automatic retrieval should replace the manual retrieval used in the past. One way to narrow the gap between the terms writers use and the queries searchers pose is to search with the terms the writers themselves use. A precondition for solving these problems is therefore that all areas of information retrieval, such as content analysis, information structure, query formation, and query evaluation, be addressed in one coherent way. Although this paper focuses on the design of a thesaurus browser, efficient retrieval requires dealing with all areas of automatic information retrieval. The paper therefore presents theoretical analyses of the forms of information retrieval, automatic indexing, clustering techniques, thesaurus construction and representation, and information retrieval techniques. From these analyses it derives a theoretical model, namely a thesaurus browser based on a clustering algorithm. The results are intended as a theoretical basis for new retrieval algorithms.

Automatic Generation of Machine Readable Context Annotations for SPARQL Results

Choi, Ji-Woong
- Journal of the Korea Society of Computer and Information
- /
- Vol. 21, No. 10
- /
- pp.1-10
- /
- 2016
In this paper, we propose an approach to generating machine-readable context annotations for SPARQL results. According to W3C Recommendations, data retrieved from RDF or OWL data sources are represented in tabular form, in which each cell's data is described only by type and value. This simple query result form is generally useful, but it is not sufficient to explain the semantics of the data in query results. To explain the meaning of the data, appropriate annotations must be added to the query results. We generate the annotations from the basic graph patterns in users' queries, and can also manipulate the original queries to complete the annotations. The generated annotations are represented using RDFa syntax, and RDFa expressions in HTML are machine-understandable. We believe that our work will improve the trustworthiness of query results and help distribute data in line with the vision of the Semantic Web.
https://doi.org/10.9708/jksci.2016.21.10.001

AutoCor: A Query Based Automatic Acquisition of Corpora of Closely-related Languages

Dimalen, Davis Muhajereen D.;Roxas, Rachel Edita O.
- Korean Society for Language and Information: Conference Proceedings
- /
- Korean Society for Language and Information 2007 Regular Conference
- /
- pp.146-154
- /
- 2007
AutoCor is a method for the automatic acquisition and classification of corpora of documents in closely-related languages. It is an extension and enhancement of CorpusBuilder, a system that automatically builds specific minority language corpora from a closed corpus, since some Tagalog documents retrieved by CorpusBuilder are actually documents in other closely-related Philippine languages. AutoCor uses odds ratio as its query generation method and introduces the concept of common word pruning to differentiate between documents in Tagalog and documents in closely-related Philippine languages. The performance of the system with and without pruning was compared, and common word pruning was found to improve the system's precision.
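
A rough sketch of odds-ratio query term selection with common word pruning follows. It is an illustration under stated assumptions, not the AutoCor code: the function names, the add-one smoothing, and the reading of "common word pruning" as dropping any word that also occurs in the other language's documents are all assumptions made here, and the sample tokens are invented rather than real Philippine-language words.

```python
import math
from collections import Counter


def doc_frequencies(docs):
    """Count, for each word, how many documents contain it."""
    df = Counter()
    for doc in docs:
        df.update(set(doc.lower().split()))
    return df


def odds_ratio_terms(relevant, nonrelevant, k=3, prune_common=False):
    """Rank words of the relevant documents by log odds ratio.

    prune_common=True drops words that also appear in the non-relevant
    documents (an assumed reading of "common word pruning").
    """
    rel_df = doc_frequencies(relevant)
    non_df = doc_frequencies(nonrelevant)
    scores = {}
    for word, count in rel_df.items():
        if prune_common and word in non_df:
            continue
        # Add-one smoothing keeps both probabilities strictly inside (0, 1).
        p = (count + 1) / (len(relevant) + 2)
        q = (non_df[word] + 1) / (len(nonrelevant) + 2)
        scores[word] = math.log(p * (1 - q) / ((1 - p) * q))
    # Highest score first; ties are broken alphabetically.
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]


if __name__ == "__main__":
    # Invented tokens standing in for two closely-related languages.
    target_docs = ["ka lo mi", "ka lo su"]
    other_docs = ["ka ne mi", "ka ne ro"]
    print(odds_ratio_terms(target_docs, other_docs, k=3))
    print(odds_ratio_terms(target_docs, other_docs, k=3, prune_common=True))
```

In this toy setup, pruning removes the shared word `ka` from the candidate query terms, so the remaining terms are ones that occur only in the target documents.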

Best Practice on Automatic Toon Image Creation from JSON File of Message Sequence Diagram via Natural Language based Requirement Specifications

Hyuntae Kim;Ji Hoon Kong;Hyun Seung Son;R. Young Chul Kim
- International journal of advanced smart convergence
- /
- Vol. 13, No. 1
- /
- pp.99-107
- /
- 2024
With AI image generation tools, most general users must craft effective prompts, that is, queries or statements, to elicit the desired response (an image or other result) from the AI model. As software engineers who focus on software processes, we use informal and formal requirement specifications at the early stage of the process, and here we adapt the natural language approach to requirements engineering and toon engineering. Most generative AI tools do not produce the same image for the same query, because the same data asset is not used each time. To solve this problem, we use informal requirements engineering and linguistics to create a toon. We propose a sequence diagram and image generation mechanism that analyzes and applies key objects and attributes as an informal natural language requirement analysis. Morphemes and semantic roles are identified by analyzing natural language with linguistic methods; based on the analysis results, a sequence diagram is generated, and an image is then generated from the diagram. We expect the proposed mechanism to yield consistent image generation by reusing the same image element assets.
https://doi.org/10.7236/IJASC.2024.13.1.99

Automatic Generation of Issue Analysis Report Based on Social Big Data Mining

허정;이충희;오효정;윤여찬;김현기;조요한;옥철영
- KIPS Transactions on Software and Data Engineering
- /
- Vol. 3, No. 12
- /
- pp.553-564
- /
- 2014
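This paper proposes a system that automatically generates issue analysis reports based on social big data mining, in order to resolve three problems of existing social media analysis and analysis report generation. The three problems are the isolation of analysis, the subjectivity of experts, and the closed nature of information caused by high cost. The system consists of natural language query analysis, issue analysis, social big data analysis, social big data correlation analysis, and automatic report generation. To evaluate the usefulness of the generated reports, we used a Likert scale, with two big data analysis experts serving as evaluators. On the Likert-scale evaluation, the quality of the reports was rated as relatively useful and reliable. Because of the low cost of report generation, the correlation analysis of social big data, and the objectivity of social big data analysis, the proposed system is expected to lead the popularization of social big data analysis.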
https://doi.org/10.3745/KTSDE.2014.3.12.553

Seq2SPARQL: Automatic Generation of Knowledge base Query Language using Neural Machine Translation

홍동균;심홍매;김광민
- Korea Information Processing Society: Conference Proceedings
- /
- Korea Information Processing Society 2019 Fall Conference
- /
- pp.898-900
- /
- 2019
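SPARQL (SPARQL Protocol and RDF Query Language) is the standard semantic query language for knowledge bases. In the field of artificial intelligence, knowledge bases have recently seen growing use in question answering systems, semantic search, and similar applications. However, using a query language such as SPARQL requires understanding its grammar, so its usefulness to general users is inevitably limited. This paper therefore proposes a method that generates SPARQL from natural language questions using neural machine translation. We validated the proposed method using Wikidata, a large-scale public knowledge base. For the experiments, we generated 20,000 pairs of natural language questions and SPARQL queries asking about movie knowledge in Wikidata, and in experiments comparing several sequence-to-sequence models, we showed that the convolutional neural network based model achieved the best result, a BLEU score of 96.8%.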
https://doi.org/10.3745/PKIPS.y2019m10a.898

A Natural Language Retrieval System for Entertainment Data

김정인
- Journal of Korea Multimedia Society
- /
- Vol. 18, No. 1
- /
- pp.52-64
- /
- 2015
As the quality of life has improved, entertainment-related searches have come to represent an increasing share of the total usage of Internet portal sites. Information retrieval in the entertainment area depends mainly on the keywords users enter, and the results are the contents that contain those keywords. In this paper, we propose a search method that takes natural language input and queries a database of entertainment data. The main components of our study are a simple Korean morphological analyzer that uses case particle information, predicate-oriented token generation, generation of standardized patterns consistent with the tokens, and automatic generation of the corresponding SQL queries. We also propose an efficient retrieval system that finds the most relevant results in the database for natural language queries, especially in the restricted domain of music, and show the effectiveness of our system.
https://doi.org/10.9717/kmms.2015.18.1.052

Implementation of Non-SQL Data Server Framework Applying Web Tier Object Modeling

권기현;천상호;최형진
- The Journal of Korean Institute of Communications and Information Sciences
- /
- Vol. 31, No. 4B
- /
- pp.285-290
- /
- 2006
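When developing a distributed architecture for enterprise applications, two of the many considerations come first. One is to clearly separate the roles of page authors and software developers in order to raise the cohesion of each tier and lower the coupling between tiers. The other is to define the entity that serves as the unit of business logic, along with its use and role in database connection and transaction processing. This paper presents the DONSL (Data Server of Non SQL query) architecture to address these issues. The architecture uses a web tier object modeling approach, lowers the coupling between tiers, and effectively separates entities from the DAO (Data Access Object) that is always used for database connections. As its core content, the paper presents a way to ease DAO development by removing entity objects from the DAO, a method for automating transaction processing through automatic SQL query generation, and a way to operate AET (Automated Executed Transaction) and MET (Manual Executed Transaction) efficiently during transaction processing, and it implements the system.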

An Automatic Generation Method of the Initial Query Set for Image Search on the Mobile Internet

김덕환;조윤호
- Journal of Intelligence and Information Systems
- /
- Vol. 13, No. 1
- /
- pp.1-14
- /
- 2007
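Although demand for character images for mobile phone wallpapers is growing rapidly in the mobile content market, users have great difficulty finding the images they want because intelligent search tools are lacking. Content-Based Image Retrieval (CBIR), the most widely used approach to image search, could address this problem, but unlike PC-based systems, mobile application software cannot satisfy the initial query requirement, and this constraint must be overcome. This study proposes a new method called IQS-AutoGen, which automatically generates a list of candidate images that can serve as the initial query for CBIR, using a Collaborative Filtering (CF) technique that exploits preference information obtained during relevance feedback. IQS-AutoGen automatically generates the Initial Query Set (IQS) for CBIR through a CF process that uses relevance information on images fed back from CBIR to identify neighbors whose preferences are similar to the target user's and then provides a list of the images those neighbors prefer. A mobile user can therefore select one of the images in the IQS as the query image for a CBIR session. Experiments with a PC-based prototype system show that the proposed method not only successfully satisfies the initial query requirement of CBIR in the mobile Internet environment but also outperforms current search methods.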
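
The neighbor-based collaborative filtering step described in this abstract can be sketched roughly as follows. This is a generic user-based CF illustration, not the IQS-AutoGen algorithm itself: the function names, the cosine similarity measure, the similarity-weighted scoring, and the choice to exclude images the target user has already rated are all assumptions made for the example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two {image_id: relevance} dictionaries."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[i] * v[i] for i in shared)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v)


def initial_query_set(target, feedback, n_neighbors=2, size=3):
    """Suggest candidate query images for `target` from similar users.

    feedback maps each user to the relevance scores that user gave to
    images in earlier feedback sessions (hypothetical data layout).
    """
    # Rank the other users by similarity to the target user.
    neighbors = sorted(
        ((cosine(feedback[target], prefs), user)
         for user, prefs in feedback.items() if user != target),
        reverse=True,
    )[:n_neighbors]

    # Score unseen images by similarity-weighted neighbor preference.
    scores = {}
    for sim, user in neighbors:
        if sim <= 0:
            continue
        for image, relevance in feedback[user].items():
            if image not in feedback[target]:
                scores[image] = scores.get(image, 0.0) + sim * relevance
    return sorted(scores, key=lambda i: (-scores[i], i))[:size]


if __name__ == "__main__":
    feedback = {
        "ann": {"img1": 1, "img2": 1},
        "bob": {"img1": 1, "img2": 1, "img3": 1},
        "cat": {"img4": 1, "img5": 1},
        "dan": {"img1": 1, "img6": 1},
    }
    print(initial_query_set("ann", feedback))
```

The returned list plays the role of the initial query set: the user picks one of the suggested images to start a CBIR session instead of having to supply a query image.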
