• Title/Summary/Keyword: data parsing algorithm (데이터 파싱 알고리즘)

Search Result 9, Processing Time 0.024 seconds

Automatic Data Augmentation for Korean AMR Sembanking & Parsing (한국어 의미 자원 구축 및 의미 파싱을 위한 Korean AMR 데이터 자동 증강)

  • Choe, Hyonsu;Min, Jinwoo;Na, Seung-Hoon;Kim, Hansaem
    • Annual Conference on Human and Language Technology / 2020.10a / pp.287-291 / 2020
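  • This study proposes an automatic data augmentation method for building Korean meaning representation resources and improving semantic parsing performance, and reports the accuracy of the automatic conversion against manually built results. A supervised AMR parsing model needs a large amount of annotated data to reach meaningful performance. We present an algorithm that converts the output of off-the-shelf language analysis tools, or the annotations of existing corpora, into Semi-AMR data. The automatically converted output reached a Smatch F1 of 0.46 against gold-standard data. Automatically augmented data that reaches a certain level of accuracy can be used to reduce the cost of annotation projects.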
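
The Smatch F1 reported in this abstract is a harmonic mean of precision and recall over matched AMR triples. A minimal Python sketch of that calculation follows. It assumes the variable alignment between the two graphs is already fixed, whereas the real Smatch metric searches for the best alignment, and the triples are made up for illustration.

```python
def triple_f1(predicted, gold):
    """F1 over two collections of (relation, source, target) triples."""
    predicted, gold = set(predicted), set(gold)
    matched = len(predicted & gold)
    if matched == 0:
        return 0.0
    precision = matched / len(predicted)
    recall = matched / len(gold)
    return 2 * precision * recall / (precision + recall)


# Hypothetical triples, not taken from the paper's data
gold = {
    ("instance", "w", "want-01"), ("instance", "b", "boy"),
    ("instance", "g", "go-01"), ("ARG0", "w", "b"),
    ("ARG1", "w", "g"), ("ARG0", "g", "b"),
}
predicted = {
    ("instance", "w", "want-01"), ("instance", "b", "boy"),
    ("ARG0", "w", "b"), ("ARG1", "w", "b"),
}

score = triple_f1(predicted, gold)  # 3 matches: P = 3/4, R = 3/6
```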

Customized Search System using Real-time Contexts of User (사용자의 실시간 상황정보를 이용한 사용자 맞춤 검색 시스템)

  • Kwon, Mi-Rim;Hong, Kwang-Jin;Jung, Kee-Chul
    • Journal of Korea Society of Industrial Information Systems / v.21 no.5 / pp.19-30 / 2016
  • These days, people can easily get information from the Internet. However, there is so much of it that searching becomes disruptive and inefficient. We therefore need a user-customized web search system that provides appropriate information. In this paper, we propose a search system that semi-automatically collects users' conditions, such as weather, location, and time, and provides essential information to them. Using this context data, the proposed system can understand what information users want in specific situations and can provide more useful information than existing systems. The proposed system is based on the 'Production/Sharing Service of Personal Korean Contents with Voluntary Sharing Economy System', to which we add a data parsing algorithm in each of the input, storage, and search parts. In the experiments, we compare and analyze the results of the existing system and the proposed system using some general keywords.

Probabilistic Dependency Grammar Induction using Internal Dependency Relation in Words (어절 내부 의존관계를 고려한 확률 의존 문법 학습)

  • Choi, Seon-Hwa;Park, Hyuk-Ro
    • Proceedings of the Korea Information Processing Society Conference / 2001.10a / pp.507-510 / 2001
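  • This paper addresses the automatic generation of a probabilistic dependency grammar from a corpus. We adapt a probability re-estimation algorithm to dependency grammar induction. To generate an accurate grammar and address the data sparseness problem, we propose learning the dependency relations inside each constituent as well, unlike previous work that learned only the dependency relations between the representative heads of constituents. We trained a Korean probabilistic dependency grammar on a tagged corpus of 25,000 sentences extracted from the 31,086 sentences of the KAIST tree-annotated corpus. As a result, we obtained 2,349 accurate grammar rules, with the initial grammar reduced by 10.97% to 23.73%. To test the accuracy of the grammar, we parsed 350 test sentences and obtained a parsing accuracy of 69.61%. These results show that the dependency grammar obtained by learning constituent-internal dependency relations is more accurate and can also overcome the data sparseness problem.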

Algorithm Embodiment for XQuery2SQL Converter (XQuery2SQL 변환기 위한 알고리즘 구현)

  • 서현호;김영국;김덕만
    • Proceedings of the Korea Contents Association Conference / 2004.05a / pp.335-341 / 2004
  • With the rapid growth of Internet use and of the volume of information, HTML, the central markup language of web technology, has shown its limits for using information on the web. XML, which expresses the meaning and relationships of the data itself, emerged as an alternative: a W3C standard for freely transmitting and exchanging documents on the World Wide Web. There have been many efforts to store XML documents in an RDBMS, but because an XML document is structurally a tree, it does not match the relational model or SQL, the query language of relational databases. In this paper, XML documents are stored in an RDBMS through parsing and DOM tree processing, and we implement an XQuery2SQL conversion algorithm in which a converter called XQuery2SQL turns queries into SQL queries that retrieve the information from the RDBMS.
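
The storage step described in this abstract, flattening a tree-shaped XML document into relational rows, can be sketched in Python as follows. This is an illustration under stated assumptions, not the paper's algorithm: the `node` table, the sample document, and the hand-written SQL for the path `/library/book/title` are all hypothetical, and no XQuery is actually parsed or converted here.

```python
import sqlite3
import xml.etree.ElementTree as ET

SAMPLE = (
    "<library><book><title>A</title></book>"
    "<book><title>B</title></book></library>"
)


def store_xml(conn, xml_text):
    """Flatten an XML tree into a node table, one row per element."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS node ("
        "id INTEGER PRIMARY KEY, parent_id INTEGER, tag TEXT, text TEXT)"
    )
    stack = [(ET.fromstring(xml_text), None)]
    while stack:
        elem, parent_id = stack.pop()
        cur = conn.execute(
            "INSERT INTO node (parent_id, tag, text) VALUES (?, ?, ?)",
            (parent_id, elem.tag, (elem.text or "").strip()),
        )
        # Push children in reverse so they are inserted in document order.
        for child in reversed(list(elem)):
            stack.append((child, cur.lastrowid))
    conn.commit()


def titles_of_books(conn):
    """Hand-written SQL counterpart of the path /library/book/title."""
    rows = conn.execute(
        "SELECT t.text FROM node AS l "
        "JOIN node AS b ON b.parent_id = l.id AND b.tag = 'book' "
        "JOIN node AS t ON t.parent_id = b.id AND t.tag = 'title' "
        "WHERE l.parent_id IS NULL AND l.tag = 'library' "
        "ORDER BY t.id"
    ).fetchall()
    return [row[0] for row in rows]
```

Each step of the path becomes one self-join on `parent_id`, which is the kind of mapping a query converter would have to generate automatically.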

The signal processing algorithm of the Missile Flight Test Launch Control System (비행시험 발사통제 시스템의 신호처리 알고리즘)

  • Oh, Jino
    • Journal of the Korea Institute of Information and Communication Engineering / v.19 no.8 / pp.1965-1972 / 2015
  • The Missile Flight Test Launch Control System operates in conjunction with the Fire Control System during flight tests of guided weapons. It is also a system for test control and situation monitoring, adapted to the type of guided weapon and the purpose of the test. The message structures, communication protocols, and data types for interworking between the Fire Control System and the Missile Flight Test Launch Control System are defined in the Launch Control ICD (Interface Control Document). ICDs differ for each guided weapon system and each test objective. Previously, interworking software was developed to interface with the Fire Control System, and it had a variety of problems. Therefore, we developed a new parsing algorithm that recognizes the variety of Launch Control ICDs, and we verified that the algorithm operates normally by checking the transmission and reception of various messages in conjunction with the Fire Control System.
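
One way to picture a parser that adapts to different ICDs is a table-driven decoder, where the message layout is data rather than code. The Python sketch below assumes a made-up ICD (message IDs, field names, and big-endian framing are all hypothetical) and is not the algorithm from the paper.

```python
import struct

# Hypothetical ICD: message ID -> list of (field name, struct format code)
ICD = {
    0x01: [("missile_id", "H"), ("status", "B"), ("battery_mv", "H")],
    0x02: [("missile_id", "H"), ("countdown_s", "f")],
}


def parse_message(data, icd=ICD):
    """Decode one big-endian message: a 1-byte ID followed by the ICD fields."""
    msg_id = data[0]
    fields = icd[msg_id]  # raises KeyError for an ID the ICD does not define
    fmt = ">" + "".join(code for _, code in fields)
    expected = 1 + struct.calcsize(fmt)
    if len(data) != expected:
        raise ValueError(
            f"message 0x{msg_id:02x}: expected {expected} bytes, got {len(data)}"
        )
    values = struct.unpack(fmt, data[1:])
    decoded = {"id": msg_id}
    decoded.update((name, value) for (name, _), value in zip(fields, values))
    return decoded
```

Under this design, supporting another weapon system means supplying another `icd` table instead of writing a new decoder.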

A Study to Improve Recovery Ratio of Deleted File Using the Parsing Algorithm of the HFS + Journal File (HFS+ 저널 파일 파싱 알고리즘을 이용한 삭제된 파일 복구 기법 향상 방안)

  • Bang, Seung Gyu;Jeon, Sang Jun;Kim, Do Hyun;Lee, Sang Jin
    • KIPS Transactions on Computer and Communication Systems / v.5 no.12 / pp.463-470 / 2016
  • With the growing demand for Mac-based systems, the need for digital forensic techniques for these systems has been increasing. In digital forensic analysis, analysts sometimes have to recover deleted files to prove allegations when a system user has deliberately tried to remove evidence. Research on recovering deleted files from file systems has been ongoing, and HFS+, the file system of Mac-based systems, has also been studied. Carving techniques have primarily been used to recover deleted files from HFS+, because on HFS+ the metadata of another folder or file overwrites the metadata of a deleted file when the file is deleted. However, if the file content is stored in fragments, carving cannot recover the whole file, or even part of it. In this paper, we describe a deleted-file recovery technique that uses the HFS+ journal. The technique, proposed in existing research, recovers deleted files from the metadata retained in the HFS+ journal, but it excludes specific files, and this problem needs to be addressed. We propose an algorithm that analyzes the HFS+ journal in detail, and we demonstrate that deleted files can be recovered from the metadata extracted by this algorithm without excluding those files.
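
Any journal analysis of this kind has to start by locating and decoding the journal header. The Python sketch below assumes the `journal_header` layout and magic values described in Apple's Technical Note TN1150 (HFS Plus Volume Format). It only decodes the header fields and does not implement the recovery algorithm from the paper.

```python
import struct

JOURNAL_HEADER_MAGIC = 0x4A4E4C78  # 'JNLx'
ENDIAN_MAGIC = 0x12345678
FIELDS = ("magic", "endian", "start", "end", "size",
          "blhdr_size", "checksum", "jhdr_size")
LAYOUT = "IIQQQIII"  # 44 bytes when packed without padding


def parse_journal_header(buf):
    """Decode the journal header, detecting byte order from the magic values."""
    for order in ("<", ">"):
        values = struct.unpack_from(order + LAYOUT, buf, 0)
        header = dict(zip(FIELDS, values))
        if (header["magic"] == JOURNAL_HEADER_MAGIC
                and header["endian"] == ENDIAN_MAGIC):
            header["byte_order"] = "little" if order == "<" else "big"
            return header
    raise ValueError("no HFS+ journal header found")
```

The `start` and `end` fields are offsets into the journal buffer that bound the active transactions.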

The Research on Data Concealing and Detection of SQLite Database (SQLite 데이터베이스 파일에 대한 데이터 은닉 및 탐지 기법 연구)

  • Lee, Jae-hyoung;Cho, Jaehyung;Hong, Kiwon;Kim, Jongsung
    • Journal of the Korea Institute of Information Security & Cryptology / v.27 no.6 / pp.1347-1359 / 2017
  • SQLite is a file-based DBMS (Database Management System) that provides transactions, and it is deployed on smartphones because it suits lightweight platforms. As smartphone usage increases, SQLite-related crimes may occur. In this paper, we propose a new concealment method for SQLite database files and a detection method against it. Concealment experiments show that it is possible to intentionally conceal 70 bytes in the DB file header and to conceal original data by inserting artificial pages. However, such concealment can be detected by parsing the 70 bytes according to the SQLite structure or by using the number of records and indexes. Based on this, we propose a detection algorithm for concealed data.
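
Header-based detection of this kind can be illustrated with a small Python check on the 100-byte SQLite file header. This sketch is far narrower than the paper's method: it relies only on the documented SQLite file format (magic string at offset 0, page size at offset 16, 20 reserved bytes at offset 72 that should be zero), and the function names and the demo database are hypothetical.

```python
import os
import sqlite3
import struct
import tempfile

MAGIC = b"SQLite format 3\x00"


def inspect_header(path):
    """Read the 100-byte SQLite header and flag two simple anomalies."""
    with open(path, "rb") as f:
        header = f.read(100)
    if len(header) < 100 or header[:16] != MAGIC:
        raise ValueError("not a SQLite 3 database file")
    (page_size,) = struct.unpack(">H", header[16:18])
    if page_size == 1:  # the format uses 1 to encode a 65536-byte page
        page_size = 65536
    return {
        "page_size": page_size,
        # Bytes 72..91 are reserved for expansion and should be zero.
        "reserved_is_zero": header[72:92] == bytes(20),
        "size_is_whole_pages": os.path.getsize(path) % page_size == 0,
    }


def make_demo_db(path):
    """Create a tiny database file to inspect (illustration only)."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE t (x INTEGER)")
    conn.execute("INSERT INTO t VALUES (1)")
    conn.commit()
    conn.close()
```

Writing a few non-zero bytes at offset 72 of the demo file makes `reserved_is_zero` turn false, which is the simplest form of the header check.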

Three-Phase English Syntactic Analysis for Improving the Parsing Efficiency (영어 구문 분석의 효율 개선을 위한 3단계 구문 분석)

  • Kim, Sung-Dong
    • KIPS Transactions on Software and Data Engineering / v.5 no.1 / pp.21-28 / 2016
  • The performance of an English-Korean machine translation system depends heavily on its English parser. The parser in this paper is part of a rule-based English-Korean MT system that includes many syntactic rules and performs chart-based parsing. Because of the many syntactic rules, the parser generates too many structures, so much time and memory are required. The rule-based parser has difficulty analyzing and translating long sentences containing commas because they cause high parsing complexity. In this paper, we propose a 3-phase parsing method with sentence segmentation to efficiently translate the long sentences that commonly appear. Each phase of the syntactic analysis applies its own independent syntactic rules in order to reduce parsing complexity. For this purpose, we classify the syntactic rules into 3 classes and design the 3-phase parsing algorithm. In particular, the syntactic rules in the 3rd class handle sentence structures composed with commas. We present an automatic method for acquiring 3rd-class rules from the syntactic analysis of a corpus, with which we aim to continuously improve the coverage of the parser. The experimental results show that the proposed 3-phase parsing method is superior to the prior parsing method, which uses only intra-sentence segmentation, in parsing speed and memory efficiency while maintaining translation quality.

Development of Geocoding and Reverse Geocoding Method Implemented for Street-based Addresses in Korea (우리나라 도로명주소를 활용한 지오코딩 및 역 지오코딩 기법 개발)

  • Seok, Sangmuk;Lee, Jiyeong
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography / v.34 no.1 / pp.33-42 / 2016
  • In Korea, the address-point matching technique has been used to provide geocoding services. This technique offers high positional accuracy. However, the quality of the geocoding result can be limited, since it is significantly affected by data quality. Also, it cannot be used for 3D address geocoding or reverse geocoding. To alleviate these issues, this paper implements the proposed geocoding methods for street-based addresses in Korea, based on the street-based address matching technique developed by the US Census Bureau. The proposed methods are twofold: (1) a street address-matching method, which is used not only for 2D addresses representing a single building but also for 3D addresses representing indoor spaces or underground buildings, and (2) a reverse geocoding method, which converts a location point to a readable address. Street-based address geocoding shows an 82.63% match rate, while reverse geocoding shows a 98.5% match rate with an average position error of approximately 1.7 m. From these results, we conclude that the proposed geocoding techniques can support LBS (Location Based Service). In developing the geocoding methods, the study set aside the parsing algorithms for address standardization, as well as several areas with unusual addresses, such as suburban areas or areas subordinate to roads. In future work, we plan to improve the geocoding methods to handle these cases.