• Title/Summary/Keyword: query quality

Keyword Spotting on Hangul Document Images Using Character Feature Models (문자 별 특징 모델을 이용한 한글 문서 영상에서 키워드 검색)

  • Park, Sang-Cheol;Kim, Soo-Hyung;Choi, Deok-Jai
    • The KIPS Transactions:PartB / v.12B no.5 s.101 / pp.521-526 / 2005
  • In this paper, we propose a keyword spotting system as an alternative search method for poor-quality Korean document images and compare the proposed system with an OCR-based document retrieval system. The system is composed of character segmentation, feature extraction for the query keyword, and word-to-word matching. In the character segmentation step, we propose an effective method for removing the connectivity between adjacent characters and a character segmentation method that minimizes the variance of character widths. In the query creation step, a feature vector for the query is constructed by combining character models for each typeface. In the matching step, word-to-word matching is applied based on character-to-character matching. We demonstrate that the proposed keyword spotting system is more efficient than the OCR-based one for searching keywords in Korean document images, especially when the quality of the documents is quite poor and the point size is small.
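
The segmentation criterion in this abstract (choosing cuts so that character widths vary as little as possible) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name, the `candidate_cuts` input (x-coordinates where a cut is plausible), and the brute-force search are all assumptions made for illustration.

```python
from itertools import combinations
from statistics import pvariance


def segment_min_width_variance(word_start, word_end, candidate_cuts, n_chars):
    """Pick n_chars - 1 cut positions that minimise the variance of character widths.

    candidate_cuts is a hypothetical list of x-coordinates where a cut is
    plausible (for example, columns with few ink pixels). Brute force search,
    for illustration only.
    """
    inside = sorted(c for c in candidate_cuts if word_start < c < word_end)
    best_cuts, best_var = None, float("inf")
    for cuts in combinations(inside, n_chars - 1):
        bounds = [word_start, *cuts, word_end]
        widths = [b - a for a, b in zip(bounds, bounds[1:])]
        var = pvariance(widths)
        if var < best_var:
            best_cuts, best_var = list(cuts), var
    return best_cuts, best_var


# A word spanning x = 0..30 that is known to contain three characters.
cuts, variance = segment_min_width_variance(0, 30, [8, 10, 14, 20, 27], n_chars=3)
print(cuts, variance)
```

Exhaustive search is only practical for short words; a dynamic-programming formulation would be the natural replacement for longer ones.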

Building Intelligent User Interface Agent for Semantically Reformulating User Query in Medicine

  • Yang, Jung-Jin;Lim, Chae-Myung;Chu, Sung-Joon;Lee, Dong-Hoon;Park, Duck-Whan;Park, Tae-Yong
    • Journal of Intelligence and Information Systems / v.9 no.2 / pp.101-119 / 2003
  • Realizing the benefits of recent discoveries in the Human Genome Project still requires a way to retrieve and analyze the exponentially expanding body of bio-related information. Research in bio-related fields naturally applies discovered knowledge to the current problem and makes inferences to extract new information, for which shared concepts and the data containing that information need to be defined and used in a coherent way. In such a professional domain, while the need to help users reduce their work and improve search results has emerged, methods for systematic retrieval and adequate exchange of relevant information are still in their infancy. The design of our system aims at improving the quality of information retrieval in a professional domain by utilizing both corpus-based and concept-based ontologies. Meta-rules that help users formulate an adequate query are formed into an ontology in the domain. The integration of this knowledge permits the system to retrieve relevant information in a more semantic and systematic fashion. This work mainly describes the query models, with details of the GUI and the secondary query generation of the system.

Protection of Location Privacy for Spatio-Temporal Query Processing Using R-Trees (R-트리를 활용한 시공간 질의 처리의 위치 개인정보 보호 기법)

  • Kwon, Dong-Seop
    • The Journal of Society for e-Business Studies / v.15 no.3 / pp.85-98 / 2010
  • The prevailing infrastructure of the ubiquitous computing paradigm is, on the one hand, making significant progress in integrating technology into daily life but is, on the other hand, raising concerns about privacy and confidentiality. This research presents a new privacy-preserving spatio-temporal query processing technique in which location-based services (LBS) can be provided without revealing the specific locations of private users. Existing location cloaking techniques are based on grid-based structures such as a Quad-tree or a multi-layered grid. Grid-based approaches can suffer a deterioration in the quality of query results since they rely on a pre-defined grid size that cannot adapt to variations in data distribution. Instead of using a grid, we propose a location-cloaking algorithm that uses the R-tree, a widely adopted spatio-temporal index structure. The proposed algorithm uses the MBRs (minimum bounding rectangles) of leaf nodes as the cloaked locations of users, since each leaf node is guaranteed to hold no fewer than a certain number of objects. Experimental results show the superiority of the proposed method.
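
The cloaking idea in this abstract, reporting a leaf node's bounding rectangle instead of the user's exact position, can be sketched as follows. This is a simplified illustration and not the paper's algorithm: it does not build a real R-tree but packs points into leaf-like buckets by sorting on x, and the names `pack_leaves`, `mbr`, `cloak`, and `min_fill` are hypothetical.

```python
def pack_leaves(points, min_fill):
    """Group points into leaf-like buckets holding at least min_fill points each.

    Stand-in for R-tree leaves: points are sorted by x and cut into runs.
    A real R-tree builds its leaves through insertion and node splitting.
    """
    if len(points) < min_fill:
        raise ValueError("need at least min_fill points")
    ordered = sorted(points)
    leaves = [ordered[i:i + min_fill] for i in range(0, len(ordered), min_fill)]
    if len(leaves) > 1 and len(leaves[-1]) < min_fill:
        leaves[-2].extend(leaves.pop())  # merge an underfull tail into its neighbour
    return leaves


def mbr(leaf):
    """Minimum bounding rectangle (xmin, ymin, xmax, ymax) of a leaf."""
    xs = [x for x, _ in leaf]
    ys = [y for _, y in leaf]
    return (min(xs), min(ys), max(xs), max(ys))


def cloak(user_point, leaves):
    """Return the MBR of the leaf that holds user_point instead of the point itself."""
    for leaf in leaves:
        if user_point in leaf:
            return mbr(leaf)
    raise KeyError("user not indexed")


# Hypothetical user positions.
users = [(1, 1), (2, 5), (3, 2), (8, 8), (9, 6), (7, 9), (10, 10)]
leaves = pack_leaves(users, min_fill=3)
print(cloak((9, 6), leaves))
```

Because every bucket holds at least `min_fill` points, each reported rectangle covers at least that many users, which mirrors the minimum-occupancy guarantee the abstract relies on.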

Semantic-based Keyword Search System over Relational Database (관계형 데이터베이스에서의 시맨틱 기반 키워드 탐색 시스템)

  • Yang, Younghyoo
    • Journal of the Korea Society of Computer and Information / v.18 no.12 / pp.91-101 / 2013
  • One issue with keyword search in general is its ambiguity, which can ultimately reduce the effectiveness of the search in terms of the quality of the search results. This ambiguity is primarily due to the ambiguity of the contextual meaning of each term in the query. In addition to the query ambiguity itself, the relationships between the keywords in the search results are crucial for the proper interpretation of the results by the user and should be clearly presented. We address the keyword search ambiguity issue by adapting some of the existing approaches for mapping query terms to schema terms/instances. The approaches we have adapted for term mapping capture both the syntactic similarity and the semantic similarity between the query keywords and the schema terms; they give better mappings and ultimately raise the accuracy of the results by 50%. Finally, to address the lack of clear relationships among the terms appearing in the search results, our system leverages semantic web technologies to enrich the knowledge base and to discover the relationships between the keywords.
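
Mapping query keywords to schema terms by combining syntactic and semantic similarity, as this abstract describes, might look roughly like the sketch below. The schema terms, the synonym table, the threshold, and the rule of taking the larger of the two scores are all assumptions for illustration; the abstract does not specify how the authors combine the measures.

```python
from difflib import SequenceMatcher

# Hypothetical schema terms and synonym table, for illustration only.
SCHEMA_TERMS = ["author", "title", "publisher", "price"]
SYNONYMS = {"writer": "author", "cost": "price"}


def syntactic_similarity(a, b):
    """String similarity in [0, 1] based on matching character blocks."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def semantic_similarity(keyword, term):
    """1.0 when the synonym table links keyword to term, else 0.0."""
    return 1.0 if SYNONYMS.get(keyword.lower()) == term else 0.0


def map_keyword(keyword, terms=SCHEMA_TERMS, threshold=0.6):
    """Return the best-scoring schema term, or None if nothing clears the threshold."""
    scored = [
        (max(syntactic_similarity(keyword, t), semantic_similarity(keyword, t)), t)
        for t in terms
    ]
    score, term = max(scored)
    return term if score >= threshold else None


print(map_keyword("authr"), map_keyword("cost"), map_keyword("xyz"))
```

A misspelled keyword is caught by the syntactic score, while a keyword with no surface overlap is caught only through the synonym table, which is why both signals are kept.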

A Design of Book Retrieval System for Electronic Commerce in based Web (웹 기반의 전자상거래를 위한 도서검색 시스템 설계)

  • Ha, Chu-Ja;Jeong, Jong-Geun;Park, Jong-Hun;Kim, Chul-Won
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / v.9 no.1 / pp.659-662 / 2005
  • XML is a standard for web documents and is used as a language for document data exchange. As more existing documents are converted to XML and more new documents are written in XML, a system that searches XML documents efficiently is required. This paper describes the design and implementation of a query processing system that translates XML elements and data between XML documents and a relational database; it consists of an XML-to-DB processor, a DB-to-XML processor, and an XML document management processor. On this basis, it describes the design and implementation of an efficient Java-based XML document search system using XQL, which has been proposed as a query language for XML documents.

AN EFFICIENT DENSITY BASED ANT COLONY APPROACH ON WEB DOCUMENT CLUSTERING

  • M. REKA
    • Journal of applied mathematics & informatics / v.41 no.6 / pp.1327-1339 / 2023
  • Use of the World Wide Web (WWW) has been increasing recently as users need more information, and the amount of document information available to end users through the internet keeps growing. The web's document search process is essential for finding documents relevant to user queries. As the number of general web pages increases, it becomes increasingly challenging for users to find records that match their interests. Moreover, existing Document Information Retrieval (DIR) approaches are time-consuming for large document collections. To alleviate the problem, this paper presents Spatial Clustering Ranking Pattern (SCRP) based Density Ant Colony Information Retrieval (DACIR) for DIR driven by user queries. The first stage uses the Term Frequency Weight (TFW) technique to identify frequency-based query weights. Based on the weight scores, the scored items are grouped and ranked using the proposed SCRP technique. Finally, based on the ranking, the DACIR algorithm selects the most relevant information and retrieves the documents. The proposed method outperforms traditional information retrieval methods in the quality of returned objects while also performing significantly better in run time.

Trends in Recent Studies on Post-Harvest Technology

  • Natsuga, Motoyasu
    • Journal of Biosystems Engineering / v.40 no.4 / pp.359-367 / 2015
  • Purpose: This article summarizes the trends in recent research publications in relation to post-harvest technology for drying, storage, and quality, between 2005 and 2015. Methods: As of September 7, 2015, a search query using two keywords, drying and agriculture, on the Web of Science (Registered trademark of Thomson Reuters) resulted in 3749 articles that were published between 2005 and 2015. However, the review was restricted to research articles published in the journals Transactions of the ASABE (American Society of Agricultural and Biological Engineers) and Biosystems Engineering: Journal of European Agricultural Engineering. Results: The total number of articles in the two journals related to drying, storage, and quality was 500, 319, and 885, respectively. The number of articles related to drying, storage, and quality was 250, 177, and 250, respectively, in Transactions of the ASABE. The number of articles related to drying, storage, and quality was 250, 142, and 283, respectively, in Biosystems Engineering. Conclusions: A shift in research focus from drying and storage to quality in Transactions of the ASABE might reflect a shift toward quality-conscious consumers. It seems that ASABE members are more focused on articles related to post-harvest technologies on quality than their European counterparts. Articles were cited based on their abstract content. Readers should read the full articles for more details.

A PageRank based Data Indexing Method for Designing Natural Language Interface to CRM Databases (분석 CRM 실무자의 자연어 질의 처리를 위한 기업 데이터베이스 구성요소 인덱싱 방법론)

  • Park, Sung-Hyuk;Hwang, Kyeong-Seo;Lee, Dong-Won
    • CRM연구 (CRM Research) / v.2 no.2 / pp.53-70 / 2009
  • Understanding consumer behavior through the analysis of customer data is an essential part of analytic CRM. To do this, users need analytic skills for data extraction and data processing. Because a user has various kinds of questions when analyzing consumer data, the user must use a database language such as SQL. However, generating SQL statements is not easy for a firm's users because the accuracy of the query result is strongly influenced by their knowledge of work-site operations and the firm's database. This paper proposes a natural-language-based database search framework that finds relevant database elements. Specifically, we describe how our TableRank method can understand the user's natural-language query and provide the proper relations and attributes of data records to the user. Several experiments support that TableRank provides accurate database elements related to the user's natural-language query. We also show that a close distance among relations in the database represents high data connectivity, which guarantees matching with a search query from a user.
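
Since the title describes the indexing method as PageRank based, a generic PageRank computation over a schema graph may help make the idea concrete. The sketch below is plain power-iteration PageRank, not the authors' TableRank, whose details the abstract does not give; the table names and foreign-key edges are hypothetical.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict {node: [neighbours]}."""
    nodes = list(graph)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new_rank = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for node, neighbours in graph.items():
            if not neighbours:  # dangling node: spread its rank evenly
                for n in nodes:
                    new_rank[n] += damping * rank[node] / len(nodes)
            else:
                for n in neighbours:
                    new_rank[n] += damping * rank[node] / len(neighbours)
        rank = new_rank
    return rank


# Hypothetical CRM schema: an edge links two tables joined by a foreign key.
schema = {
    "customer": ["orders", "contact_log"],
    "orders": ["customer", "order_item"],
    "order_item": ["orders", "product"],
    "product": ["order_item"],
    "contact_log": ["customer"],
}
ranks = pagerank(schema)
print(sorted(ranks, key=ranks.get, reverse=True))
```

Tables that sit on many join paths end up with higher scores than peripheral ones, which is one plausible way to prioritize relations when matching a natural-language query.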

Dynamic Load Management Method for Spatial Data Stream Processing on MapReduce Online Frameworks (맵리듀스 온라인 프레임워크에서 공간 데이터 스트림 처리를 위한 동적 부하 관리 기법)

  • Jeong, Weonil
    • Journal of the Korea Academia-Industrial cooperation Society / v.19 no.8 / pp.535-544 / 2018
  • As the spread of mobile devices equipped with various sensors and high-quality wireless network communications functions expands, the amount of spatio-temporal data generated from mobile devices in various service fields is rapidly increasing. In conventional research into processing a large amount of real-time spatio-temporal streams, it is very difficult to apply a Hadoop-based spatial big data system, designed to be a batch processing platform, to a real-time service for spatio-temporal data streams. This paper extends the MapReduce online framework to support real-time query processing for continuous-input, spatio-temporal data streams, and proposes a load management method to distribute overloads for efficient query processing. The proposed scheme provides a dynamic load balancing method for the nodes, based on the inflow rate and the load factor of the input data under the space partition. Experiments show that it is possible to support efficient query processing by distributing the spatial data stream in the corresponding area to the shared resources when load management in a specific area is required.

Adaptive Buffer Control over Disordered Streams (비순서화된 스트림 처리를 위한 적응적 버퍼 제어 기법)

  • Kim, Hyeon-Gyu;Kim, Cheol-Gi;Lee, Chung-Ho;Kim, Myoung-Ho
    • Journal of KIISE:Databases / v.34 no.5 / pp.379-388 / 2007
  • Disordered streams may cause inaccurate or delayed results in window-based queries. Existing approaches usually leverage buffers to handle the streams. However, most of the approaches estimate the buffer size simply based on the maximum network delay in the streams, which tends to over-estimate the buffer size and result in high latency. In this paper, we propose a probabilistic approach to estimate the buffer size adaptively according to fluctuating network delays. We first assume that intervals of tuple generation follow an exponential distribution and network delays follow a normal distribution. Then, we derive an estimation function from these assumptions. The function takes a drop ratio as an input parameter, which denotes the percentage of tuple drops permissible during query execution. By describing the drop ratio in a query specification, users can control the quality of query results, such as accuracy or latency, according to application requirements. Our experimental results show that the proposed function has better adaptivity than the existing function based on the maximum network delay.
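
The link between a permissible drop ratio and a buffer size can be illustrated with a deliberately simplified sketch. It keeps only the abstract's normal-delay assumption and ignores the exponential tuple-generation intervals, so it is not the authors' estimation function; the function name, the time-based notion of buffer size, and the delay statistics are assumptions.

```python
from statistics import NormalDist


def estimate_buffer_size(mean_delay, stdev_delay, drop_ratio):
    """Smallest wait time b such that P(delay > b) <= drop_ratio.

    Assumes delays ~ Normal(mean_delay, stdev_delay); inter-arrival times
    are ignored in this simplified sketch.
    """
    if not 0.0 < drop_ratio < 1.0:
        raise ValueError("drop_ratio must lie strictly between 0 and 1")
    return NormalDist(mean_delay, stdev_delay).inv_cdf(1.0 - drop_ratio)


# Hypothetical delay statistics in milliseconds.
for ratio in (0.10, 0.01, 0.001):
    print(ratio, round(estimate_buffer_size(100.0, 20.0, ratio), 1))
```

Tolerating a larger drop ratio shrinks the wait time, which is the accuracy-versus-latency trade-off the abstract says users can control through the query specification.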