• Title/Summary/Keyword: Web Document

Techniques for Location Mapping and Querying of Geo-Texts in Web Documents (웹 문서상의 공간 텍스트 위치 맵핑과 질의 기법)

  • Ha, Tae Seok;Nam, Kwang Woo
    • Journal of Korea Society of Industrial Information Systems / v.27 no.3 / pp.1-10 / 2022
  • With the development of web technology, large numbers of web documents are being produced. These documents contain various spatial texts, and converting those texts into spatial information provides the basis for searching text documents with spatial queries. Spatial texts cover a wide range, including postal codes and local phone numbers as well as administrative place names and POI names. This paper presents algorithms that map locations from the spatial text information found in web documents. With these algorithms, documents describing a region can be searched for on a map rather than through a general web search. We demonstrate that the presented algorithms are useful by implementing a web geo-text query system.
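
The abstract above does not give the paper's algorithms, so the following is only a minimal sketch of the general idea of mapping spatial texts to coordinates. The gazetteer, the area-code table, the approximate coordinates, and the function name `map_geo_texts` are all hypothetical and chosen for illustration; only place names and local phone area codes are handled here.

```python
import re

# Hypothetical gazetteer: place name -> approximate (latitude, longitude).
GAZETTEER = {"Seoul": (37.5665, 126.9780), "Busan": (35.1796, 129.0756)}

# Hypothetical lookup from Korean local phone area codes to gazetteer entries.
AREA_CODES = {"02": "Seoul", "051": "Busan"}

# Matches numbers such as 02-123-4567 or 051-1234-5678 and captures the area code.
PHONE_RE = re.compile(r"\b(0\d{1,2})-\d{3,4}-\d{4}\b")


def map_geo_texts(text):
    """Return (matched text, coordinates) pairs for spatial texts found in `text`."""
    hits = []
    # Place names: plain substring lookup against the gazetteer.
    for name, coords in GAZETTEER.items():
        if name in text:
            hits.append((name, coords))
    # Phone numbers: resolve the area code to a region, skip unknown codes.
    for match in PHONE_RE.finditer(text):
        region = AREA_CODES.get(match.group(1))
        if region:
            hits.append((match.group(0), GAZETTEER[region]))
    return hits
```

For example, `map_geo_texts("Visit our Busan office or call 02-123-4567.")` yields one hit for the place name and one for the phone number; a real system would also need postal codes, POI names, and disambiguation.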

Personalized web searching with Reinforcement Learning (강화학습을 사용한 개인화된 웹 검색)

  • 이승준;장병탁
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 2001.12a / pp.259-262 / 2001
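  • This paper deals with the implementation of a personalized web search engine that explores specific web documents according to the user's preferences. The user's preferences are learned through reinforcement learning that uses direct evaluations from the user and indirect evaluations obtained from the user's search process. Web documents are searched through reinforcement learning that uses the relevance between the user's preferences and the current document as the reward.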

Construction PM System (시공 PM 시스템)

  • Choi Sungwoon
    • Proceedings of the Safety Management and Science Conference / 2005.05a / pp.137-146 / 2005
  • This paper describes web-based construction project management and project document information flows within the construction industry. The study considers practices for process improvement, including reengineering, and engineering project practice. Critical chain project management (CCPM) based on the theory of constraints (TOC) is introduced.

A Study on the online of PDF Electronic Documents System (인터넷 원거리출판의 응용과 PDF의 인쇄활용에 관한 연구)

  • 유영수;강영립;김병현;이광수
    • Proceedings of the Korean Printing Society Conference / 2001.06a / pp.63-77 / 2001
  • PDF (Portable Document Format) is a file format that Adobe developed by advancing PostScript technology; it is used for managing document information and for electronic publishing (Internet, CD-ROM, DVD). PDF is a document type devised so that it can be read and printed anywhere, independent of the OS, printer type, resolution, and kind of computer. Because the format includes compression, documents can be transferred as small files over the Internet or an intranet. The format also has various advantages for sharing information and transferring documents in online or offline environments. In this paper, we developed an electronic document system using the PDF format. The system consists of a filter, automatic indexing, a special search system, and a web server. The information used in this paper is a database built with Zwon's DocuCom. The filter recognizes various kinds of document structure and, according to the properties of the document, produces ASCII output. In addition to processing various document formats, the filter can extract keywords from MS Word, Excel, PowerPoint, PDF, CAD, and other documents. The filter uses the structure of the Windows printer driver and can extract text, page, font type, and font size information from the relevant document. The automatic indexing recognizes the format tags of the document from the ASCII text produced by the filter and extracts keywords appropriate to the structure and properties of the document. The PDF electronic document system proposed in this paper can be used over the Internet and PC communication services. Users can choose and read electronic documents in two ways. First, users can choose and read relevant books through the PDF electronic document homepage. Second, users can use the PDF integrated search system, entering a keyword and choosing the reference field and type of data. At present, however, Adobe's PDF products do not support Korean characters. If this problem is resolved, we think that PDF application systems will come into active use. Although functionality is limited when using the Zwon DocuCom adopted in this study, we think that there is no great difficulty in producing electronic documents and building a digital database.

Object Modeling for Mapping from XML Document and Query to UML Class Diagram based on XML-GDM (XML-GDM을 기반으로 한 UML 클래스 다이어그램으로 사상을 위한 XML문서와 질의의 객체 모델링)

  • Park, Dae-Hyun;Kim, Yong-Sung
    • The KIPS Transactions:PartD / v.17D no.2 / pp.129-146 / 2010
  • Nowadays, XML is favored by many companies, internally and externally, as a means of sharing and distributing data. Many studies and systems model and store XML documents with an object-oriented method in order to save and manage web-based multimedia documents more easily. The representative tool for object-oriented modeling of XML documents is UML (Unified Modeling Language). UML was initially used as an integrated methodology for software development, but it is now used more often as a modeling language for various objects. UML currently supports various diagrams for object-oriented analysis and design, such as the class diagram, and is widely used as a tool for creating database schemas and object-oriented code from them. This paper proposes efficient query modeling of XML-GL using the UML class diagram and OCL for searching XML documents, whose scope of application has widened with the increased use of the WWW and their flexible and open nature. To accomplish this, we propose modeling rules and an algorithm that map XML-GL, which provides a modeling function for XML documents and DTDs and a graphical query function over them. To describe the constraints of model components precisely, they are defined in OCL (Object Constraint Language). The proposed technique creates queries for XML documents that hold various properties of the object-oriented model by modeling the XML-GL query from the XML document, XML DTD, and XML query using the UML class diagram. By converting, saving, and managing XML documents visually in an object-oriented graphical data model, users gain a basis for expressing searches and queries on XML documents intuitively and visually. Compared with existing XML-based query languages, the approach has various object-oriented characteristics and uses UML notation, which is widely used as an object modeling tool. Hence, users can construct graphical and intuitive queries on XML-based web documents without learning a new query language. Because the same modeling tool, the UML class diagram, is used for XML document content, query syntax, and semantics, all processes such as searching and saving XML documents from and to an object-oriented database can be performed consistently.

A New Approach to Automatic Keyword Generation Using Inverse Vector Space Model (키워드 자동 생성에 대한 새로운 접근법: 역 벡터공간모델을 이용한 키워드 할당 방법)

  • Cho, Won-Chin;Rho, Sang-Kyu;Yun, Ji-Young Agnes;Park, Jin-Soo
    • Asia pacific journal of information systems / v.21 no.1 / pp.103-122 / 2011
  • Recently, numerous documents have been made available electronically. Internet search engines and digital libraries commonly return query results containing hundreds or even thousands of documents. In this situation, it is virtually impossible for users to examine complete documents to determine whether they might be useful. For this reason, some online documents are accompanied by a list of keywords specified by the authors in an effort to guide users by facilitating the filtering process. A set of keywords is thus often considered a condensed version of the whole document and therefore plays an important role in document retrieval, Web page retrieval, document clustering, summarization, text mining, and so on. Since many academic journals ask authors to provide a list of five or six keywords on the first page of an article, keywords are most familiar in the context of journal articles. However, many other types of documents could also benefit from the use of keywords, including Web pages, email messages, news reports, magazine articles, and business papers. Although the potential benefit is large, implementation is the obstacle: manually assigning keywords to all documents is a daunting, even impractical, task because it is extremely tedious and time-consuming and requires a certain level of domain knowledge. Therefore, it is highly desirable to automate the keyword generation process. There are two main approaches to achieving this aim: the keyword assignment approach and the keyword extraction approach. Both approaches use machine learning methods and require, for training purposes, a set of documents with keywords already attached. In the former approach, there is a given vocabulary, and the aim is to match its terms to the texts. In other words, the keyword assignment approach seeks to select the words from a controlled vocabulary that best describe a document. Although this approach is domain dependent and is not easy to transfer and expand, it can generate implicit keywords that do not appear in a document. In the latter approach, by contrast, the aim is to extract keywords according to their relevance in the text without a prior vocabulary. In this approach, automatic keyword generation is treated as a classification task, and keywords are commonly extracted with supervised learning techniques. Thus, keyword extraction algorithms classify candidate keywords in a document into positive or negative examples. Several systems, such as Extractor and Kea, were developed using the keyword extraction approach. The most indicative words in a document are selected as keywords for that document; as a result, keyword extraction is limited to terms that appear in the document and cannot generate implicit keywords that are not included in it. According to the experimental results of Turney, about 64% to 90% of the keywords assigned by authors can be found in the full text of an article. Conversely, 10% to 36% of the keywords assigned by authors do not appear in the article and cannot be generated by keyword extraction algorithms. Our preliminary experiment also shows that 37% of the keywords assigned by authors are not included in the full text. This is why we decided to adopt the keyword assignment approach. In this paper, we propose a new approach to automatic keyword assignment, namely IVSM (Inverse Vector Space Model). The model is based on the vector space model, a conventional information retrieval model that represents documents and queries as vectors in a multidimensional space. IVSM generates an appropriate keyword set for a specific document by measuring the distance between the document and the keyword sets. The keyword assignment process of IVSM is as follows: (1) calculating the vector length of each keyword set based on each keyword weight; (2) preprocessing and parsing a target document that does not have keywords; (3) calculating the vector length of the target document based on the term frequency; (4) measuring the cosine similarity between each keyword set and the target document; and (5) generating keywords that have high similarity scores. Two keyword generation systems were implemented applying IVSM: an IVSM system for a Web-based community service and a stand-alone IVSM system. The first is implemented in a community service for sharing knowledge and opinions on current trends such as fashion, movies, social problems, and health information. The stand-alone IVSM system is dedicated to generating keywords for academic papers and has been tested on a number of academic papers, including those published by the Korean Association of Shipping and Logistics, the Korea Research Academy of Distribution Information, the Korea Logistics Society, the Korea Logistics Research Association, and the Korea Port Economic Association. We measured the performance of IVSM by the number of matches between the IVSM-generated keywords and the author-assigned keywords. According to our experiment, the precisions of IVSM applied to the Web-based community service and to academic journals were 0.75 and 0.71, respectively. The performance of both systems is much better than that of baseline systems that generate keywords based on simple probability. IVSM also shows performance comparable to Extractor, a representative keyword extraction system developed by Turney. As electronic documents increase, we expect that the IVSM proposed in this paper can be applied to many electronic documents in Web-based communities and digital libraries.
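
The five steps listed in the abstract can be sketched roughly as follows. This is not the authors' implementation: the representation of each candidate keyword as a weighted term profile, the sample `KEYWORD_SETS` data, the whitespace tokenizer, and the function names are all assumptions made for illustration.

```python
import math
from collections import Counter

# Hypothetical keyword sets: candidate keyword -> {term: weight}.
KEYWORD_SETS = {
    "logistics": {"port": 2.0, "shipping": 2.0, "cargo": 1.0},
    "fashion": {"fabric": 2.0, "trend": 1.0, "style": 1.0},
}


def vector_length(weights):
    """Euclidean length of a sparse term-weight vector (steps 1 and 3)."""
    return math.sqrt(sum(w * w for w in weights.values()))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-weight vectors (step 4)."""
    norm = vector_length(a) * vector_length(b)
    if norm == 0:
        return 0.0
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    return dot / norm


def assign_keywords(document, keyword_sets, top_n=1):
    """Return up to `top_n` keywords whose profiles best match the document."""
    # Step 2: naive preprocessing, lowercase and split on whitespace.
    doc_vector = Counter(document.lower().split())
    scored = [
        (cosine_similarity(weights, doc_vector), name)
        for name, weights in keyword_sets.items()
    ]
    scored.sort(reverse=True)
    # Step 5: keep the highest-scoring keywords with a nonzero similarity.
    return [name for score, name in scored[:top_n] if score > 0]
```

With the sample data, `assign_keywords("the port handles shipping cargo daily", KEYWORD_SETS)` selects `"logistics"`, since the document shares no terms with the other profile. Because the keyword comes from the profile name rather than from the text, an assigned keyword need not appear in the document, which is the property the abstract attributes to keyword assignment.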

Evaluation of Mobile Unified Search Contents of Naver and Google Korea (네이버와 구글의 모바일 통합 검색 컨텐츠 평가)

  • Park, So-Yeon
    • Journal of Korean Library and Information Science Society / v.42 no.4 / pp.263-280 / 2011
  • This study investigates the current status of the mobile search services of Korean search portals and analyzes the mobile unified search contents of Naver and Google Korea. In particular, it analyzes characteristics of mobile unified search such as the number of retrieved documents, collection distribution, and yearly distribution. Documents were also evaluated in terms of relevance, credibility, and currency. The study compared the quality of Naver's unified Web best and unified Web documents with Google's best Web documents and Web documents, and analyzed the correlation between a document's ranking and its relevance. The results can be applied to portals' development of effective mobile search services.

Clustering of Web Document Exploiting with the Union of Term frequency and Co-link in Hypertext (단어빈도와 동시링크의 결합을 통한 웹 문서 클러스터링 성능 향상에 관한 연구)

  • Lee, Kyo-Woon;Lee, Won-hee;Park, Heum;Kim, Young-Gi;Kwon, Hyuk-Chul
    • Journal of Korean Library and Information Science Society / v.34 no.3 / pp.211-229 / 2003
  • In this paper, we focus on how the number of words in a web document affects clustering performance. Our experimental results clearly show the relationship between the number of words and clustering performance. We also present an algorithm that supplements the weaker portion through the co-link frequency of web documents. The test bed for this research is 1,449 web documents in the 'Natural science' category of the Naver Directory. We clustered these documents with term-based clustering, link-based clustering, and a hybrid clustering method, and compared the results with the categories originally assigned in the Naver Directory.
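
The abstract does not say how term frequency and co-links are combined, so the sketch below only illustrates one plausible way to build a hybrid document similarity for clustering. The weighted sum, the `alpha` parameter, the Jaccard measure over in-link sets, and all function names are assumptions, not the paper's method.

```python
import math
from collections import Counter


def term_similarity(text_a, text_b):
    """Cosine similarity over raw term frequencies of two documents."""
    a, b = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(a[term] * b[term] for term in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def colink_similarity(inlinks_a, inlinks_b):
    """Jaccard overlap of the sets of pages that link to each document."""
    union = inlinks_a | inlinks_b
    return len(inlinks_a & inlinks_b) / len(union) if union else 0.0


def hybrid_similarity(text_a, text_b, inlinks_a, inlinks_b, alpha=0.5):
    """Weighted mix of term-based and link-based similarity (alpha is assumed)."""
    return alpha * term_similarity(text_a, text_b) + (1 - alpha) * colink_similarity(
        inlinks_a, inlinks_b
    )
```

Two short documents with no words in common but identical in-link sets still receive a hybrid score of 0.5 at the default `alpha`, which is the kind of case where link evidence can compensate for sparse text.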

The WSDL Framework Extension for Business Service Inter-Operation (비즈니스 서비스 상호운용을 위한 WSDL의 확장 체계에 관한 연구)

  • Lee, Jong-Ok;Jung, Min-Ho
    • The Journal of Society for e-Business Studies / v.13 no.4 / pp.17-32 / 2008
  • To support business service interoperability, it is necessary to extend WSDL and develop a Business Service Document (BSD) that contains various kinds of business service information. W3C delegates the extension of WSDL to user groups according to their usages and objectives. This article therefore defines BSD, an extended version of WSDL. It also presents the Business Web Service Framework (BWSF), which supports business service interoperability and uses BSD. BSD Creator was developed to create correct, valid, and well-formed BSD, which is a core component of BWSF. This article is expected to serve as a conceptual basis for industrial adoption of a business service interoperability architecture and to contribute to revitalizing business service interoperability.

Web Site Construction Using Internet Information Extraction (인터넷 정보 추출을 이용한 웹문서 구조화)