• Title/Summary/Keyword: On-Line Document

A Knowledge-based Wrapper Learning Agent for Semi-Structured Information Sources (준구조화된 정보소스에 대한 지식기반의 Wrapper 학습 에이전트)

  • Seo, Hee-Kyoung;Yang, Jae-Young;Choi, Joong-Min
    • Journal of KIISE:Software and Applications / v.29 no.1_2 / pp.42-52 / 2002
  • Information extraction (IE) is the process of recognizing and fetching particular information fragments from a document. In previous work, most IE systems relied on manually written extraction rules, called wrappers. Manual wrapper generation may yield more accurate extraction, but it suffers from problems of flexibility, extensibility, and efficiency. Other studies that generate wrappers automatically have had difficulty acquiring and representing useful domain knowledge and coping with the structural heterogeneity among different information sources; as a result, real-world information sources with complex document structures could not be analyzed correctly. To resolve these problems, this paper presents an agent-based information extraction system named XTROS that exploits domain knowledge to learn from documents in a semi-structured information source. The system automatically generates a wrapper for each information source and performs information extraction and integration by applying the wrapper to the corresponding source. In XTROS, both the domain knowledge and the wrapper are represented as XML documents. The wrapper generation algorithm first uses the domain knowledge to recognize the meaning of each logical line of a sample document, and then finds the most frequent pattern in the sequence of semantic representations of the logical lines. The location and structure of this pattern, represented as an XML document, become the wrapper. Based on tests of XTROS on several real-estate information sites, we claim that it creates correct wrappers for most Web sources and consequently facilitates effective information extraction and integration for heterogeneous and complex information sources.
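
The two steps named in the abstract above (label each logical line, then find the most frequent label pattern) can be illustrated with a minimal Python sketch. This is not the XTROS implementation: the keyword table `DOMAIN_KNOWLEDGE`, the label names, the fixed pattern length, and both function names are assumptions made for illustration, and XTROS itself stores domain knowledge and wrappers as XML rather than as a dictionary.

```python
from collections import Counter

# Hypothetical domain knowledge: semantic label -> trigger keywords.
# (Assumption: XTROS keeps this in XML; a dict is used here for brevity.)
DOMAIN_KNOWLEDGE = {
    "PRICE": ["$"],
    "BED": ["bed"],
    "BATH": ["bath"],
}


def label_line(line):
    """Return the first semantic label whose keyword occurs in the line."""
    text = line.lower()
    for label, keywords in DOMAIN_KNOWLEDGE.items():
        if any(keyword in text for keyword in keywords):
            return label
    return "OTHER"


def most_frequent_pattern(labels, length):
    """Return the most frequent contiguous run of `length` labels and its count."""
    windows = [tuple(labels[i:i + length])
               for i in range(len(labels) - length + 1)]
    pattern, count = Counter(windows).most_common(1)[0]
    return pattern, count


# Made-up logical lines from a real-estate listing page.
lines = ["Listings", "$250,000", "3 bed", "2 bath",
         "$310,000", "4 bed", "3 bath", "Contact us"]
labels = [label_line(line) for line in lines]
pattern, count = most_frequent_pattern(labels, 3)
print(pattern, count)
```

In this toy input the repeated run of price, bedroom, and bathroom lines is the candidate record structure; a real wrapper would also have to record where the pattern occurs and would not assume a fixed pattern length.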

Automatic Indexing of Korean through Syntactic and Semantic Analysis (구문 및 의미 분석을 통한 한국어 자동 색인)

  • 최기선
    • Journal of the Korean Society for Information Management / v.8 no.2 / pp.96-107 / 1991
  • The inherent limitation of conventional approaches to automatic indexing is that they compute the relevancy between index terms and documents only indirectly or relatively. As an alternative, the analysis of document texts seeks a means of establishing the direct relevancy of the terms. More rigorous linguistic analysis improves the chance of relevancy. Various semantic topologies among terms may provide sufficient quality for relevancy. The enhanced and guaranteed relevance will allow high retrieval precision. Along this line, the ongoing project at KAIST pursues a user-oriented retrieval system, which raises still many other issues that are not common in the traditional perspective.

Nexus between Inflation, Inflation Perceptions and Expectations

  • NAM, MINHO;GO, MINJI
    • KDI Journal of Economic Policy / v.40 no.3 / pp.45-68 / 2018
  • We uncover a nexus between actual inflation, inflation perceptions and expectations in Korea through analyzing micro as well as aggregate data from the Consumer Survey. We document two novel findings. First, households' subjective perceptions of inflation exert more impact on expectation formation than actual inflation. Second, inflation perceptions are broadly in line with the trajectory of the inflation trend. This is attributable to the fact that changes in actual inflation have been generated mainly by the consumption items whose price changes are perceived more sensitively as those items are frequently bought or have a larger share in household expenditures. Conducting a cross-country comparison, we find that information rigidity in expectation formation process and the nexus between perceptions and expectations of inflation prove to be stronger in Korea. Additionally, we reconfirm the existing finding that the scope of information utilized for forming inflation expectations is fairly circumscribed.

A Study on the Parallel Control(Change) at the Total Traffic Control (종합열차운행제어의 병행 운전(교체) 방안에 관한 고찰)

  • Kim, Jung-Su;Lee, Jae-Nam;Lee, Gi-Seung;An, Hyun-Jun
    • Proceedings of the KSR Conference / 2006.11b / pp.675-682 / 2006
  • Subway command is intended to adjust and control train operation, and it plays a key role in overall passenger transport and in all affairs related to safe train operation. These affairs can be regarded as those controlled and operated by Total Traffic Control. This technical document covers the technical points that emerged from the replacement construction carried out by Seoul Metro over the entire section of Subway Line 2, which aims to develop an ATO system based on new control technology in place of the conventional ATS equipment. The replacement work for the new ATO system must neither stop nor affect the system in service while the current ATS system is operating. The two systems must not interfere with each other while performing their individual functions, and they must be configured to share important data for parallel operation. This technology must be carried out with a high degree of reliability assured.

Design of System for Prevent Forgery of Digital Document on Off-Line (오프라인상에서의 전자문서 위변조 방지 시스템 설계)

  • 이윤오;유황빈
    • Proceedings of the Korean Information Science Society Conference / 2003.04a / pp.503-505 / 2003
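  • Certificates that guarantee the trustworthiness of the other party over the Internet are now used frequently. Offline electronic documents, however, are exposed to a high risk of forgery and alteration, and the trustworthiness of the other party is hard to guarantee. Electronic documents are also restricted in their offline portability. To solve these problems and to make documents convenient to use both online and offline, this paper proposes converting the document content, the author's certificate, and the digital signature value into a two-dimensional barcode, so that the other party's trustworthiness and the document's integrity are guaranteed on the printed document. In the proposed system, the document content, the author's certificate, and the digital signature value are converted and attached to the printed document as a two-dimensional barcode. The barcode attached to the printed document is then scanned to recover the document content, the author's certificate, and the digital signature value, which are verified to determine whether the document has been forged or altered, thereby confirming the other party's trustworthiness and the document's integrity.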

XML Document Analysis based on Similarity (유사성 기반 XML 문서 분석 기법)

  • Lee, Jung-Won;Lee, Ki-Ho
    • Journal of KIISE:Software and Applications / v.29 no.6 / pp.367-376 / 2002
  • XML allows users to define elements using arbitrary words and organize them in a nested structure. These features of XML offer both challenges and opportunities in information retrieval and document management. In this paper, we propose a new methodology for computing similarity that considers XML semantics - the meanings of the elements and the nested structures of XML documents. We generate extended-element vectors, using a thesaurus to normalize synonyms, compound words, and abbreviations, and build a similarity matrix from them. We then compute the similarity between XML elements. We also discover and minimize XML structures using automata (NFA (Nondeterministic Finite Automata) and DFA (Deterministic Finite Automata)). We compute the similarity between XML structures using the similarity matrix between elements and the minimized XML structures. Our methodology, which considers XML semantics, shows 100% accuracy in identifying the category of real documents from an on-line bookstore.

Combined Feature Set and Hybrid Feature Selection Method for Effective Document Classification (효율적인 문서 분류를 위한 혼합 특징 집합과 하이브리드 특징 선택 기법)

  • In, Joo-Ho;Kim, Jung-Ho;Chae, Soo-Hoan
    • Journal of Internet Computing and Services / v.14 no.5 / pp.49-57 / 2013
  • A novel approach to feature selection, an important preprocessing task in on-line document classification, is proposed. Previous studies selected features using information from a single population. In this paper, a mixed feature set is constructed by selecting features from multiple populations as well as a single population, based on various kinds of information. The mixed feature set consists of two feature sets: the original feature set, made up of the words in the documents, and the transformed feature set, made up of features generated by LSA. A hybrid feature selection method that uses both filter and wrapper methods is applied to obtain the optimal feature set from the mixed feature set. We performed classification experiments using the optimal feature sets obtained. The experiments verify our expectation that the approach improves classification performance, with accuracy over 90%. In particular, the approach achieves over 90% recall and precision, with low deviation between categories.

A Study on Establishing Online Document Communication System by Means of Intranet Web Site (ODCS(Online Document Communication System)인트라넷 웹사이트 구축과정 및 사용자 효과 연구)

  • 양초산
    • Archives of design research / v.17 no.3 / pp.167-178 / 2004
  • The purpose of this treatise is to show the merits and the method of establishing an Online Document Communication System for the design division of Lotte department store, illustrated with examples of an intranet that applies the Internet environment, which is convenient to use because of its openness, to the fundamentals of the organization. The merits and effects found from establishing the enterprise's Design Online Document Communication System were as follows. Firstly, it reduced staff workload through the sharing of various existing resources; it reduced redundant work and enabled work to be handled speedily. Secondly, the parties concerned were able to exchange viewpoints and share information. Thirdly, expediting information exchange and communication among the persons in charge improved work efficiency. Fourthly, the system could be built and operated at relatively low cost on the basis of a web browser: it could be operated without any other significant instrument or equipment, by linking it to the business network and using the existing computer system. Fifthly, sharing on-line the work exclusive to the design room improved professionalism and convenience in data preservation. Through this treatise and the survey and study of the process of establishing the intranet, it was found that work sharing, work efficiency, workload reduction, cost saving, and communication all improved to a significant degree.

A Design on Informal Big Data Topic Extraction System Based on Spark Framework (Spark 프레임워크 기반 비정형 빅데이터 토픽 추출 시스템 설계)

  • Park, Kiejin
    • KIPS Transactions on Software and Data Engineering / v.5 no.11 / pp.521-526 / 2016
  • Because on-line informal text data are massive in volume and unstructured in nature, there are limitations in applying traditional relational data model technologies to data storage and data analysis jobs. Moreover, real-time analysis of social users' reactions is hard to accomplish with dynamically generated massive social data. In this paper, to easily capture the semantics of massive and informal on-line documents with an unsupervised learning mechanism, we design and implement an automatic topic extraction system based on the mass of words that make up a document. The input data set for the proposed system is generated first, using an N-gram algorithm to build multi-word units that capture the meaning of sentences precisely, and Hadoop and Spark (an in-memory distributed computing framework) are adopted to run the topic model. In the experiment phase, TB-level input data are preprocessed and the proposed topic extraction steps are applied. We conclude that the proposed system performs well in extracting meaningful topics in a timely manner, because the intermediate results come directly from main memory instead of being read from an HDD.
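
The N-gram step mentioned in the abstract above can be illustrated with a small sketch in plain Python. It is only an assumption about what building multi-word units might look like: the function name `word_ngrams`, the whitespace tokenizer, and the lowercasing are hypothetical choices, and the paper's own pipeline runs on Hadoop and Spark rather than on a single machine.

```python
def word_ngrams(text, n):
    """Return all contiguous n-word units of the text, lowercased."""
    tokens = text.lower().split()  # naive whitespace tokenizer (assumption)
    return [" ".join(tokens[i:i + n])
            for i in range(len(tokens) - n + 1)]


# Bigrams keep word pairs together as single features for a topic model.
bigrams = word_ngrams("Spark runs topic models", 2)
print(bigrams)
```

Multi-word units of this kind would then serve as the vocabulary passed to the topic model, in place of isolated single words.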

Design and Implementation of Adaptive Interaction-based Video Syllabus (적응적 상호작용기반 동영상 강의계획서 설계 및 구현)

  • Sim, Hyun;Choi, Won-Ho
    • The Journal of the Korea institute of electronic communication sciences / v.12 no.4 / pp.663-670 / 2017
  • The purpose of this study is to define an On-line Video Syllabus Template based on an adaptive mode with interaction. A syllabus is significant as a teaching and learning plan. However, it has been regarded as a mere formal document and has been limited to simple lookup, because it has a fragmentary structure that lacks links to other services and support for reuse. This paper also aims to design a three-dimensional syllabus that can provide students with practical information related to teaching and learning and that can be shared between teachers and students. The proposed technique serves a syllabus centered on teaching and learning, which includes the definition of a hierarchical structure, the application of media content according to the learner's preference, and a real-time variation function. The On-line Video Syllabus provided through an LMS offers availability and credibility for teaching and learning, in that it enables increased utilization by strengthening convenience.