• Title/Summary/Keyword: data dictionary

A study on the Character Correction of the Wrongly Recognized Sentence Marks, Japanese, English, and Chinese Character in the Off-line printed Character Recognition (오프라인 인쇄체 문장부호, 일본 문자, 영문자, 한자 인식에서의 오인식 문자 교정에 관한 연구)

  • Lee, Byeong-Hui;Kim, Tae-Gyun
    • The Transactions of the Korea Information Processing Society / v.4 no.1 / pp.184-194 / 1997
  • In recent years, a number of commercial off-line character recognition systems have appeared in the Korean market. This paper describes a "self-organizing" data structure for representing a large dictionary that can be searched in real time and uses a practical amount of memory, and presents a study on character correction for off-line recognition of printed sentence marks and Japanese, English, and Chinese characters. A self-organizing algorithm is particularly appropriate when there is reason to suspect that the access probabilities of individual words will change with time and theme. The wrongly recognized characters generated by OCR systems are collected and analyzed. Error types of English characters are reclassified, and 0.5% of errors are corrected using an English character confusion table with a self-organizing dictionary containing 25,145 English words. Error types of Chinese characters are likewise classified, and 6.1% of errors are corrected using a Chinese character confusion table with a self-organizing dictionary containing 34,593 Chinese words.
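The combination of a self-organizing dictionary and a character confusion table described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the move-to-front heuristic, the class and function names, and the contents of `CONFUSION_TABLE` are all assumptions, since the abstract does not specify them.

```python
from itertools import product


class SelfOrganizingDictionary:
    """Word list that moves each accessed word to the front (move-to-front, an assumed heuristic)."""

    def __init__(self, words):
        self.words = list(words)

    def lookup(self, word):
        """Return True if the word is present, promoting it to the front of the list."""
        try:
            index = self.words.index(word)
        except ValueError:
            return False
        self.words.insert(0, self.words.pop(index))
        return True


# Hypothetical confusion table: recognized character -> plausible intended characters.
CONFUSION_TABLE = {"1": ["l", "i"], "0": ["o"], "5": ["s"]}


def correct(word, dictionary, table=CONFUSION_TABLE):
    """Return a dictionary word reachable by confusion-table substitutions, else the input."""
    if dictionary.lookup(word):
        return word
    # Each position keeps its character or takes one of its confusable alternatives.
    options = [[ch] + table.get(ch, []) for ch in word]
    for letters in product(*options):
        candidate = "".join(letters)
        if candidate != word and dictionary.lookup(candidate):
            return candidate
    return word


dictionary = SelfOrganizingDictionary(["memory", "table", "dictionary", "single"])
print(correct("tab1e", dictionary))  # table
print(dictionary.words[0])           # table
```

Frequently requested words drift toward the front of the list, so later lookups for them scan fewer entries. The exhaustive candidate generation grows quickly with word length and is only suitable for a sketch.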


Korean Compound Noun Decomposition and Semantic Tagging System using User-Word Intelligent Network (U-WIN을 이용한 한국어 복합명사 분해 및 의미태깅 시스템)

  • Lee, Yong-Hoon;Ock, Cheol-Young;Lee, Eung-Bong
    • The KIPS Transactions:PartB / v.19B no.1 / pp.63-76 / 2012
  • We propose a Korean compound noun semantic tagging system that uses statistical compound noun decomposition and semantic relation information extracted from a lexical semantic network (U-WIN) and dictionary definitions. The system consists of three phases: compound noun decomposition, semantic constraint, and semantic tagging. The decomposition phase selects the best candidates using noun location frequencies extracted from the Sejong corpus, re-decomposes nouns for the semantic constraint phase, and restores foreign nouns. The semantic constraint phase finds possible semantic combinations using origin information in the dictionary and a Naive Bayes classifier, in order to decrease computation time and increase the accuracy of semantic tagging. The semantic tagging phase calculates the semantic similarity between decomposed nouns and decides the semantic tags. We constructed an experimental data set of 40,717 semantically tagged compound nouns of more than 3 characters from the Standard Korean Language Dictionary. In the experiments, the accuracy of compound noun decomposition was 99.26% and the accuracy of semantic tagging was 95.38%.

Implementation of Augmentative and Alternative Communication System Using Image Dictionary and Verbal based Sentence Generation Rule (이미지 사전과 동사기반 문장 생성 규칙을 활용한 보완대체 의사소통 시스템 구현)

  • Ryu, Je;Han, Kwang-Rok
    • The KIPS Transactions:PartB / v.13B no.5 s.108 / pp.569-578 / 2006
  • The present study implemented an AAC (Augmentative and Alternative Communication) system using images that people with speech impairments can easily understand. In particular, the implementation focused on the portability and mobility of the AAC system as well as on a more flexible form of communication. For mobility and portability, we implemented a system that runs on mobile devices such as PDAs, so that people with speech impairments can communicate as well as other people at any place using the system. Moreover, to overcome the limited storage space available for a large volume of image data, we implemented the AAC system in a client/server structure for the mobile environment. In addition, for more flexible communication, we built an image dictionary by taking verbs as the base and sub-categorizing nouns according to their corresponding verbs, and regularized the types of sentences generated according to the type of verb, centering on verbs because they play the most important role in composing a sentence.

Named Entity Recognition and Dictionary Construction for Korean Title: Books, Movies, Music and TV Programs (한국어 제목 개체명 인식 및 사전 구축: 도서, 영화, 음악, TV프로그램)

  • Park, Yongmin;Lee, Jae Sung
    • KIPS Transactions on Software and Data Engineering / v.3 no.7 / pp.285-292 / 2014
  • Named entity recognition is used to improve the performance of information retrieval systems, question answering systems, machine translation systems, and so on. The targets of named entity recognition are usually PLOs (persons, locations, and organizations). They are usually proper nouns or unregistered words, and traditional named entity recognizers use these characteristics to find named entity candidates. The titles of books, movies, and TV programs have different characteristics from PLO entities. They sometimes consist of multiple phrases, a whole sentence, or special characters, which makes it difficult to find named entity candidates. In this paper we propose a method to quickly extract title named entities from news articles and automatically build a named entity dictionary for the titles. To identify candidates, word phrases enclosed in special symbols in a sentence are first extracted and then verified by an SVM using feature words and their distances. To classify the extracted title candidates, an SVM is used with the mutual information of word contexts.
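The first step described in this abstract, extracting phrases enclosed in special symbols as title candidates, could look roughly like the sketch below. The set of symbol pairs and the function name `extract_title_candidates` are assumptions (the abstract does not list the symbols), and the SVM verification and classification steps are not shown.

```python
import re

# Assumed symbol pairs; the abstract does not say which symbols are used.
SYMBOL_PAIRS = [("《", "》"), ("〈", "〉"), ("『", "』"), ("「", "」"), ('"', '"')]

# One alternative per pair: opener, then text without either symbol, then closer.
PATTERN = re.compile(
    "|".join(
        f"{re.escape(opener)}([^{re.escape(opener)}{re.escape(closer)}]+){re.escape(closer)}"
        for opener, closer in SYMBOL_PAIRS
    )
)


def extract_title_candidates(sentence):
    """Return the phrases enclosed by any of the assumed symbol pairs, in order."""
    candidates = []
    for match in PATTERN.finditer(sentence):
        # Exactly one capture group is set, the one for the pair that matched.
        text = next(group for group in match.groups() if group is not None)
        candidates.append(text.strip())
    return candidates


print(extract_title_candidates('The film 《Parasite》 and the drama "Signal" aired.'))
# ['Parasite', 'Signal']
```

Such candidates are noisy, since quotation marks also enclose ordinary quotes, which is presumably why the abstract follows extraction with a verification step.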

First Order Predicate Logic Representation and Management for Information Resource Dictionary (정보자원사전에 대한 서술논리 표현과 관리)

  • 김창화
    • The Journal of Information Technology and Database / v.5 no.1 / pp.13-37 / 1998
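  • With the development of computer communication networks such as the Internet, a need has emerged to reuse resources by sharing distributed information resources. An IRD (Information Resource Dictionary) is a logically centralized repository of data about all relevant information within an organization. Because the data in an IRD describes other data, it is also called metadata. The dictionary elements of an IRD describe the kinds of information resources, their meaning, their logical structure, their location, and the methods for accessing them. Because the FIPS ANSI IRDS expresses integrity constraints using binary relations, its ability to express constraint rules and general inference rules is limited, and it treats the expression of diverse forms of integrity constraints, as well as the derivation, inference, and management of IRD-related information, as problems specific to IRD applications and does not address them. In addition, FIPS IRDS has the problem that users find it difficult to write queries without expert knowledge of SQL and the IRD. To clarify the distinction among information resource representation, relationships among information resources, and information resource management information in the FIPS IRDS basic model, this paper divides the information resource model into two classes, information resource representation elements and information resource management elements, and extends the basic model by adding elements derived through competency questions for each class to the schema description level and the schema level of the FIPS ANSI IRDS basic model. Then, while retaining the IRD description and management functions that FIPS ANSI IRDS provides, and in order to express the constraints and inference rules identified above as problems, the paper identifies, centering on the extended basic model, the formal semantics of the components of each level, the relationships within a level and among level components, and the constraint expressions and query inference rules, and expresses them in FOPL (First Order Predicate Logic). Furthermore, to implement the predicates and rules expressed in FOPL, this paper presents a theoretical methodology for converting them to Prolog and proposes basic functions for information resource management and a methodology for schema evolution.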


Searchable Encrypted String for Query Support on Different Encrypted Data Types

  • Azizi, Shahrzad;Mohammadpur, Davud
    • KSII Transactions on Internet and Information Systems (TIIS) / v.14 no.10 / pp.4198-4213 / 2020
  • Data encryption, particularly application-level data encryption, is a common solution to protect data confidentiality and deal with security threats. Application-level encryption is a process in which data is encrypted before being sent to the database. However, encryption transforms the data and makes queries difficult to execute. Various studies have sought ways to implement a searchable encrypted database. In this paper, we present ZSDB, a new method for encrypting data and querying encrypted data of different data types. The proposed method is based on secret sharing. ZSDB provides data confidentiality by dividing sensitive data into two parts and using an additional server as a Dictionary Server. In addition, it supports the required operations on various types of data, in particular the LIKE operator on the string data type. ZSDB assigns most of the query execution work to the server, so the data owner only needs to encrypt and decrypt data.
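To illustrate the general idea of dividing a sensitive value into two parts, here is a minimal sketch of generic two-party XOR secret sharing. It is not the ZSDB scheme, whose construction the abstract does not describe; the function names `split` and `combine` are hypothetical, and the sketch does not support searching or the LIKE operator.

```python
import secrets


def split(secret: bytes):
    """Split a byte string into two shares; each share alone looks random."""
    share_a = secrets.token_bytes(len(secret))
    share_b = bytes(x ^ y for x, y in zip(secret, share_a))
    return share_a, share_b


def combine(share_a: bytes, share_b: bytes) -> bytes:
    """Recover the original byte string by XOR-ing the two shares."""
    return bytes(x ^ y for x, y in zip(share_a, share_b))


# Each share could be held by a different server, for example.
share_a, share_b = split("alice@example.com".encode("utf-8"))
print(combine(share_a, share_b).decode("utf-8"))  # alice@example.com
```

Neither server can read the value on its own, which is the property that lets a scheme of this kind store one part with a separate server.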

A Study on Smartwatch review data of SNS and sentiment analytical using opinion mining (스마트워치 SNS 리뷰 데이터와 오피니언 마이닝을 통한 감성 분석 처리에 대한 연구)

  • Shin, Donghyun;Choi, YongLak
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2015.10a / pp.1047-1050 / 2015
  • Wearable devices, along with the IoT (Internet of Things), are considered the core of the coming generation's convergence technology. Companies are competing intensely with one another to gain an early lead in the smartwatch market. Consumers who use smartwatches express their preferences by sharing their opinions through SNS (Social Networking Services). In this study, an emotion dictionary is built that consists of attributes and emotional words related to smartwatches. Based on the emotion dictionary, SNS data is categorized by attribute through an opinion data model. The overall polarity and attribute polarity of the collected data are then determined through natural language parsing, followed by an analysis of smartwatch reviews. This study will help determine which attributes of a smartwatch should be improved in order to raise consumers' interest in individual smartwatches.


Application of Data Dictionary to BIM for Small and Medium Project (중소규모 사업용 BIM을 위한 데이터 사전의 활용)

  • Lee, Hwan Woo;Lee, Kyung Sub;Kim, Kwang Yang
    • Journal of the Computational Structural Engineering Institute of Korea / v.26 no.6 / pp.431-438 / 2013
  • The systemization of construction information over the whole life cycle of facilities is required to improve the productivity of the construction industry. BIM (Building Information Modeling), a technology for managing information based on a 3D information model, has been actively suggested as one of the alternatives. However, BIM is currently concentrated on large projects, while BIM-based small and medium projects receive little attention. In small and medium projects, the loss of information is more serious than in large projects, yet it is hard to introduce BIM to small and medium companies because they lack investment resources. This study was performed to set up a BIM-based information management system that reflects the characteristics of small and medium projects without excessive investment. In this study, pseudo BIM is defined as BIM for small and medium projects, and its concept is proposed. The PLIB of ISO and the construction information classification system of MOLIT in Korea are used to construct a data dictionary for pseudo BIM. A pilot test is performed to verify the effectiveness of pseudo BIM.

A Study on Preservation Metadata for Long Term Preservation of Electronic Records (전자기록의 장기적 보존을 위한 보존메타데이터 요소 분석)

  • Lee, Kyung-Nam
    • The Korean Journal of Archival Studies / no.14 / pp.191-240 / 2006
  • For long-term preservation of electronic records, information on the whole management process, from the time the electronic information is created, should be captured and managed together with the records. Such information is supported by preservation metadata, so implementing preservation metadata is important for preserving electronic records while maintaining their record-ness. Preservation metadata is the information that supports the process of digital preservation and functions to maintain the long-term viability, renderability, understandability, authenticity, and identity of digital resources. Preservation metadata should be developed by applying the international standard Reference Model for an Open Archival Information System (OAIS) so that it has international interoperability for exchange and reuse. Early international preservation metadata schemas were developed by standardizing on the OAIS Reference Model. However, the preservation metadata schema of the Victorian Electronic Records Strategy (VERS) and the recently published Data Dictionary of the PREMIS Working Group were developed in more advanced forms that differ from the existing framework, advancing from conceptual schemas to practical ones. By comparing these two cases, this study proposes the elements of integral preservation metadata for long-term preservation of electronic records. The significance of this thesis is that it suggests a direction for the future development of preservation metadata elements by putting past discussions of preservation metadata in order and proposing integral preservation metadata elements for long-term preservation of electronic records.

A Generation and Matching Method of Normal-Transient Dictionary for Realtime Topic Detection (실시간 이슈 탐지를 위한 일반-급상승 단어사전 생성 및 매칭 기법)

  • Choi, Bongjun;Lee, Hanjoo;Yong, Wooseok;Lee, Wonsuk
    • The Journal of Korean Institute of Next Generation Computing / v.13 no.5 / pp.7-18 / 2017
  • Recently, the number of SNS users has increased rapidly with the development of the smart device industry, and the amount of generated data is increasing exponentially. On Twitter, the text data generated by users is a key research subject because it involves events, accidents, product reputations, and brand images. Twitter has become a channel for users to receive and exchange information, and an important characteristic of Twitter is that it is realtime. Among the various kinds of events, events such as earthquakes, floods, and suicides should be analyzed rapidly so that they can be responded to immediately. To analyze an event, it is necessary to collect the tweets related to it, but it is difficult to find all tweets related to an event using normal keywords. To solve this problem, this paper proposes a generation and matching method of normal-transient dictionaries for realtime topic detection. Normal dictionaries consist of general keywords related to events (event: suicide-death-loop, death, die, hang oneself, etc.), whereas transient dictionaries consist of transient keywords related to events (event: suicide-names and information of celebrities, information of social issues). Experimental results show that the matching method using the two dictionaries finds more tweets related to an event than a simple keyword search.
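The matching idea in this abstract, combining a stable keyword dictionary with a dictionary of currently trending terms, can be sketched as below. This is a simplified illustration under assumptions: the substring matching, the function names, and the example keywords and tweets are all hypothetical, and the paper's dictionary generation step is not shown.

```python
def matches(tweet, keywords):
    """Return True if the tweet contains any keyword (case-insensitive substring match)."""
    text = tweet.lower()
    return any(keyword.lower() in text for keyword in keywords)


def find_event_tweets(tweets, normal_dictionary, transient_dictionary):
    """Return tweets that match a keyword from either dictionary, in input order."""
    keywords = set(normal_dictionary) | set(transient_dictionary)
    return [tweet for tweet in tweets if matches(tweet, keywords)]


# Hypothetical data: "Rivertown" stands in for a place name that is trending during the event.
normal_dictionary = {"earthquake", "aftershock"}
transient_dictionary = {"rivertown"}
tweets = [
    "Strong earthquake felt downtown",
    "Stay safe Rivertown, buildings are shaking",
    "Nice weather today",
]

print(len([t for t in tweets if matches(t, {"earthquake"})]))                   # 1
print(len(find_event_tweets(tweets, normal_dictionary, transient_dictionary)))  # 2
```

The second tweet never mentions the general keyword, so only the transient dictionary catches it.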