• Title/Summary/Keyword: parsing algorithm (파싱 알고리즘)

Design and Implementation of a Query Processing Algorithm for Distributed Semistructured Documents Retrieval with Metadata Interface (메타데이타 인터페이스를 이용한 분산된 반구조적 문서 검색을 위한 질의처리 알고리즘 설계 및 구현)

  • Choe Cuija;Nam Young-Kwang
    • Journal of KIISE: Software and Applications / v.32 no.6 / pp.554-569 / 2005
  • In semistructured distributed documents, it is very difficult to formalize and implement a query processing system because the data lack structure and rules. To retrieve and process heterogeneous semistructured documents precisely, the system must handle multiple mappings on an element, such as 1:1, 1:N, and N:1, simultaneously and generate the schema from the distributed documents. In this paper, we propose a query processing algorithm for querying heterogeneous semistructured data or documents over distributed systems and implement it with a metadata interface. The algorithm for generating local queries from a global query consists of mapping between global and local nodes, data transformation according to the mapping types, path substitution, and resolving the heterogeneity among nodes in a global input query using metadata information. The mapping, transformation, and path substitution algorithms between the global schema and the local schemas have been implemented in a metadata interface called DBXMI (for Distributed Documents XML Metadata Interface). Nodes that share a node name but differ in mapping or meaning are resolved by automatically extracting node identification information from the local schema. The system uses Quilt as its XML query language. An experiment is reported on three different OEM-model semistructured restaurant documents. The prototype system was developed on Windows with Java and the JavaCC compiler.
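
The path substitution step described above can be pictured with a small sketch. It is not the DBXMI implementation and does not emit Quilt. The mapping table, the path names, and the function `substitute_paths` are hypothetical and only illustrate how 1:1, 1:N, and N:1 mappings might be applied when a global query is rewritten for one local source.

```python
from typing import Dict, List

# Hypothetical metadata for one local source: global path -> local path(s).
# A single target models 1:1, several targets model 1:N, and two global
# paths sharing one target model N:1. All path names are made up.
LOCAL_MAPPING: Dict[str, List[str]] = {
    "restaurant/name": ["eatery/title"],                     # 1:1
    "restaurant/address": ["eatery/street", "eatery/city"],  # 1:N
    "restaurant/phone": ["eatery/contact"],                  # N:1
    "restaurant/fax": ["eatery/contact"],                    # N:1
}


def substitute_paths(global_paths: List[str],
                     mapping: Dict[str, List[str]]) -> List[str]:
    """Rewrite global paths into local paths, dropping duplicates."""
    local_paths: List[str] = []
    for path in global_paths:
        if path not in mapping:
            raise KeyError(f"no local mapping for global path: {path}")
        for target in mapping[path]:
            if target not in local_paths:  # N:1 targets are emitted once
                local_paths.append(target)
    return local_paths
```

A real system would also transform the data values according to the mapping type; this sketch covers only the path rewriting.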

A Program Similarity Evaluation Algorithm (프로그램 유사도 평가 알고리즘)

  • Kim Young-Chul;Hwang Seog-Chan;Choi Jaeyoung
    • Journal of Internet Computing and Services / v.6 no.1 / pp.51-64 / 2005
  • In this paper, we introduce a system that evaluates the similarity of C program source code by comparing syntax trees with each other. The method has two characteristic features compared with other systems. First, it is not sensitive to program style, such as indentation, white space, and comments, or to changes in the order of control structures such as statements, code blocks, and procedures. Second, it can detect syntax errors because it uses a parsing technique. We introduce algorithms for the similarity evaluation method and for a grouping method that reduces the number of comparisons. In the experiment section, we show the results of a program similarity evaluation and the reduction in iterations achieved by the grouping algorithm.
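
The idea of comparing syntax trees rather than source text can be illustrated with a minimal sketch. The paper targets C; as an assumption made purely for illustration, the sketch below uses Python's standard `ast` module on Python source and compares sequences of node types with `difflib.SequenceMatcher`. The function names are hypothetical, and the sketch does not handle reordered statements or the grouping step.

```python
import ast
from difflib import SequenceMatcher
from typing import List


def node_types(source: str) -> List[str]:
    """Parse source and list its AST node type names.

    ast.parse raises SyntaxError for invalid code, so syntax errors
    surface as a side effect of parsing.
    """
    tree = ast.parse(source)
    return [type(node).__name__ for node in ast.walk(tree)]


def similarity(source_a: str, source_b: str) -> float:
    """Return a ratio in [0, 1] based on the two node-type sequences."""
    matcher = SequenceMatcher(None, node_types(source_a), node_types(source_b))
    return matcher.ratio()
```

Because only node types are compared, differences in indentation, comments, and identifier names do not change the score in this sketch.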

Algorithm Embodiment for XQuery2SQL Converter (XQuery2SQL 변환기 위한 알고리즘 구현)

  • 서현호;김영국;김덕만
    • Proceedings of the Korea Contents Association Conference / 2004.05a / pp.335-341 / 2004
  • As Internet use and the volume of information have grown rapidly, HTML, the core presentation language of web technology, has shown its limits for making use of web information. XML, which expresses the meaning and correlation of the data itself, emerged from the W3C as a standard alternative for free document transmission and exchange on the World Wide Web. There have been many efforts to store XML documents in an RDBMS, but because an XML document is structurally tree-shaped data, it does not map perfectly onto SQL, the query language of relational databases. In this paper, for XML information that has been stored in an RDBMS through parsing and DOM tree processing, we implement an XQuery2SQL conversion algorithm that converts XQuery queries into SQL queries through a converter called XQuery2SQL and retrieves the information from the RDBMS.

Document Filtering Algorithm for Efficient Preprocessing of XML Information Retrieval (XML 정보검색의 효율적 전처리를 위한 문서여과 알고리즘)

  • Kong Yong-Hae;Kim Myung-Sook
    • Journal of the Korea Academia-Industrial cooperation Society / v.6 no.1 / pp.1-11 / 2005
  • This paper proposes a preprocessing method for efficiently processing XML queries in information retrieval over a large number of XML documents. Conventional preprocessing methods filter out XML documents by parsing each document for the query keywords or by comparing query signatures with signatures generated from the XML documents. However, these methods depend on the query and are very inefficient for a large number of XML documents. To address this, we generate a universal DTD based on a domain ontology. The universal DTD is applicable to XML documents that contain information from the same domain even when they have different structures and attributes. Using the universal DTD, we then filter out the XML documents that do not belong to the domain. We evaluate the performance of this method through experiments.
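
A query-independent filter of this kind can be sketched as follows. This is a strong simplification and not the paper's method: the universal DTD is approximated by a plain set of allowed element names, and no real DTD validation is performed. The element names and the function `in_domain` are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import FrozenSet

# Hypothetical stand-in for a universal DTD: the element names that the
# domain ontology allows. A real DTD also constrains nesting and attributes.
DOMAIN_ELEMENTS: FrozenSet[str] = frozenset({"book", "title", "author", "price"})


def in_domain(xml_text: str,
              allowed: FrozenSet[str] = DOMAIN_ELEMENTS) -> bool:
    """Keep a document only if it parses and uses only allowed elements."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return False  # malformed documents are filtered out
    return all(element.tag in allowed for element in root.iter())
```

Because the check does not depend on any query, it can run once per document before queries arrive, which is the property the abstract emphasizes.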

Implementation of a Korean Grammar Checker using Partial Sentence Analysis (부분 문장 분석을 이용한 한국어 문법 검사기 구현)

  • Kim, Hyun-Jin;Sim, Chul-Min;Kwon, Hyuk-Chul
    • Annual Conference on Human and Language Technology / 1996.10a / pp.469-475 / 1996
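  • This paper defines dependency relations between morphemes based on the grammar relations (Grammar Relation) between them and, on that basis, presents a grammar checker that verifies semantic errors and style. The method checks not only semantic errors that span multiple word phrases but also style errors that weaken a sentence, such as translationese expressions and expressions that make the meaning hard to convey. The paper also presents the implementation of a knowledge base for verifying these errors and a partial sentence analysis algorithm that uses dependency grammar (Dependency Structure Grammar). The grammar checker presented in this paper is expected to serve as an important resource for sentence analysis tasks such as parsing.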

Probabilistic Dependency Grammar Induction using Internal Dependency Relation in Words (어절 내부 의존관계를 고려한 확률 의존 문법 학습)

  • Choi, Seon-Hwa;Park, Hyuk-Ro
    • Proceedings of the Korea Information Processing Society Conference / 2001.10a / pp.507-510 / 2001
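  • This paper deals with a technique for automatically generating a probabilistic dependency grammar from a corpus. In particular, a probability re-estimation algorithm was adapted to dependency grammar generation and used for learning. To generate an accurate grammar and to address the data sparseness (Data Sparseness) problem, we propose a method that also learns the dependency relations inside constituents, unlike previous studies that learned only the dependency relations among the representative heads of constituents. We attempted to learn a Korean probabilistic dependency grammar using a Tagged Corpus of 25,000 sentences extracted from the 31,086 sentences of the KAIST tree-annotated corpus. As a result, we obtained 2,349 accurate grammar rules, reducing the initial grammar by 10.97% to 23.73%. To test the accuracy of the grammar, 350 test sentences were parsed, yielding a parsing accuracy of 69.61%. This shows that the dependency grammar obtained by learning the dependency relations inside constituents was more accurate and that the data sparseness problem could also be overcome.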

An Efficient Algorithm for HL7 Message Parsing (효율적인 HL7 메시지 파싱 알고리즘)

  • Tran, Tung;Kim, Hyung-Hoi;Cho, Hune;Kim, Hwa-Sun
    • The Transactions of the Korean Institute of Electrical Engineers D / v.55 no.6 / pp.274-278 / 2006
  • An upgraded algorithm that improves the performance of existing interfacing software for parsing HL7 messages is introduced. It incorporates stack operations on objects to guarantee segment order while parsing messages. This object-oriented design greatly facilitates the complicated process of validating, parsing, and creating HL7 messages in the clinical setting. The new interface engine can manage all HL7 messages corresponding to admission and registration, discharge and transfer, laboratory results, clinical images, and clinical reports. The international version of this engine, currently under development, will be tested in Asian countries using a standard character code such as Unicode (ISO 10646).
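
The abstract does not say how the stack is used, so the following is only one plausible reading, sketched under stated assumptions. It splits an HL7 v2 message into segments on carriage returns and into fields on `|`, then pops a stack of expected segment IDs to check that segments arrive in order. The function `parse_hl7`, the expected order, and the rule that every segment is optional and non-repeating are assumptions, not the paper's design.

```python
from typing import List


def parse_hl7(message: str, expected_order: List[str]) -> List[List[str]]:
    """Split a message into segments and fields, checking segment order.

    Assumptions of this sketch: segments end with a carriage return,
    fields are separated by '|', and each expected segment is optional
    and appears at most once.
    """
    segments = [line.split("|") for line in message.split("\r") if line]
    pending = list(reversed(expected_order))  # next expected ID is on top
    for fields in segments:
        segment_id = fields[0]
        # Pop skipped (absent) segments until the current ID is on top.
        while pending and pending[-1] != segment_id:
            pending.pop()
        if not pending:
            raise ValueError(f"segment {segment_id} is unknown or out of order")
        pending.pop()  # consume the matched segment ID
    return segments


# Hypothetical message and segment order for illustration only.
sample = "MSH|^~\\&|LAB|HOSP\rPID|1||12345\rPV1|1|I\r"
parsed = parse_hl7(sample, ["MSH", "EVN", "PID", "PV1"])
```

A production parser would read the separators from the MSH segment instead of hard-coding `|`, and would model required and repeating segments.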

Design and Implementation of Advanced Web Log Preprocess Algorithm for Rule based Web IDS (룰 기반 웹 IDS 시스템을 위한 효율적인 웹 로그 전처리 기법 설계 및 구현)

  • Lee, Hyung-Woo
    • Journal of Internet Computing and Services / v.9 no.5 / pp.23-34 / 2008
  • The number of web service users is increasing steadily as web-based services are offered in various forms. However, web services are vulnerable to attacks such as SQL injection, parameter injection, and DoS. It is therefore necessary to develop a Web IDS and to provide a rule-based intrusion detection and response mechanism against those attacks. Existing Web IDS systems have not responded properly to recent web attack mechanisms because they do not include a suitable preprocessing procedure for huge volumes of web log data. We therefore propose an efficient web log preprocessing mechanism that enhances rule-based detection and improves the performance of a Web IDS-based attack response system. The proposed algorithm provides both field-unit parsing and duplicated-string elimination on web log data, which makes it possible to construct an improved Web IDS.
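
The two preprocessing steps named above, field-unit parsing and duplicated-string elimination, can be sketched as follows. The abstract does not give the log format or the duplicate criterion, so this sketch assumes Common Log Format lines and treats entries with the same method and URL as duplicates. The pattern and the function `preprocess` are hypothetical.

```python
import re
from typing import Dict, Iterable, List

# Assumed input: Common Log Format, e.g.
# 10.0.0.1 - - [10/Oct/2008:13:55:36 +0900] "GET /a.php?id=1 HTTP/1.1" 200 2326
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\S+)'
)


def preprocess(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Parse each line into fields and drop repeated requests."""
    seen = set()
    records: List[Dict[str, str]] = []
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # skip lines that do not fit the assumed format
        record = match.groupdict()
        key = (record["method"], record["url"])  # assumed duplicate criterion
        if key in seen:
            continue
        seen.add(key)
        records.append(record)
    return records
```

The reduced record list is what a rule engine would then scan, so fewer and cleaner records mean fewer rule evaluations.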

The signal processing algorithm of the Missile Flight Test Launch Control System (비행시험 발사통제 시스템의 신호처리 알고리즘)

  • Oh, Jino
    • Journal of the Korea Institute of Information and Communication Engineering / v.19 no.8 / pp.1965-1972 / 2015
  • The Missile Flight Test Launch Control System operates in conjunction with the Fire Control System during flight tests of guided weapons. It is also a system for test control and situation monitoring that depends on the type of guided weapon and the purpose of the test. The message structures, communication protocols, and data types for interworking between the Fire Control System and the Missile Flight Test Launch Control System are defined in the Launch Control ICD (Interface Control Document). The ICD is composed differently for each guided weapon system and each test object. Previously, interworking software was developed to interwork with the Fire Control System, but it had a variety of problems. We therefore developed a new parsing algorithm that recognizes the variety of Launch Control ICDs, and we verified that the algorithm operates normally by transmitting and receiving various messages in conjunction with the Fire Control System.
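
One way to let a single parser handle differing ICDs is to describe each message layout as data and interpret it at run time. The sketch below illustrates that general table-driven idea with Python's `struct` module. It is not the paper's algorithm; the field names, the example layout, and the function `parse_message` are hypothetical.

```python
import struct
from typing import Dict, List, Tuple

# Hypothetical ICD entry: an ordered list of (field name, struct format code).
# "H" is an unsigned 16-bit integer, "B" an unsigned byte, "f" a 32-bit float.
EXAMPLE_ICD: List[Tuple[str, str]] = [
    ("msg_id", "H"),
    ("status", "B"),
    ("altitude", "f"),
]


def parse_message(data: bytes, icd: List[Tuple[str, str]],
                  byte_order: str = ">") -> Dict[str, object]:
    """Decode a binary message according to an ICD field table."""
    fmt = byte_order + "".join(code for _, code in icd)
    expected_size = struct.calcsize(fmt)
    if len(data) != expected_size:
        raise ValueError(
            f"expected {expected_size} bytes, got {len(data)}")
    values = struct.unpack(fmt, data)
    return dict(zip((name for name, _ in icd), values))
```

Supporting a new weapon system or test then means supplying a new field table rather than writing new parsing code.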

Semantic Scenes Classification of Sports News Video for Sports Genre Analysis (스포츠 장르 분석을 위한 스포츠 뉴스 비디오의 의미적 장면 분류)

  • Song, Mi-Young
    • Journal of Korea Multimedia Society / v.10 no.5 / pp.559-568 / 2007
  • Anchor-person scene detection is significant for the semantic parsing of video shots and for extracting indexing clues in content-based news video indexing and retrieval systems. This paper proposes an efficient algorithm that extracts the anchor ranges in sports news video in order to structure sports news into units. To detect anchor-person scenes, candidate anchor-person scenes are first selected using DCT coefficients and motion vector information in the MPEG-4 compressed video. Then, from the candidate anchor scenes, an image processing method is used to classify the news video into anchor-person scenes and non-anchor (sports) scenes. The proposed scheme achieves a mean precision and recall of 98% in the anchor-person scene detection experiment.
