• Title/Summary/Keyword: character encoding

Guided Sequence Generation using Trie-based Dictionary for ASR Error Correction (음성 인식 오류 수정을 위한 Trie 기반 사전을 이용한 Guided Sequence Generation)

  • Choi, Junhwi;Ryu, Seonghan;Yu, Hwanjo;Lee, Gary Geunbae
    • Annual Conference on Human and Language Technology / 2016.10a / pp.211-216 / 2016
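  • Although many current speech recognizers achieve generally high accuracy, speech recognition errors still occur frequently. Because these errors cause many malfunctions in the applications that depend on them, they need to be corrected. This paper proposes Guided Sequence Generation using a Trie-based dictionary. The proposed model encodes a target word and its context, and from that encoding generates a word by decoding it character by character. To produce a correct word, generation is guided by a Trie-based dictionary. For the experiments, the model was trained on a corpus created by simply simulating speech recognition errors on an English TV-guide-domain corpus, and it was evaluated on a parallel corpus of spoken sentences and their recognition results from the same domain. Guided Generation reduced errors by about 14.9% compared with Unguided Generation.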
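The trie-guided decoding step described in this abstract can be illustrated with a minimal sketch. It is not the authors' model: the neural decoder is replaced by a hypothetical `score_fn`, the end-of-word symbol `END`, the greedy search, and the toy dictionary are all assumptions made for illustration.

```python
class TrieNode:
    def __init__(self):
        self.children = {}    # character -> TrieNode
        self.is_word = False  # True if a dictionary word ends here


def build_trie(words):
    root = TrieNode()
    for word in words:
        node = root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True
    return root


END = "$"  # hypothetical end-of-word symbol


def guided_decode(root, score_fn, max_len=32):
    """Greedy decoding restricted to characters the trie allows."""
    node, prefix = root, ""
    for _ in range(max_len):
        allowed = list(node.children)
        if node.is_word:
            allowed.append(END)  # stopping is only legal on a full word
        if not allowed:          # empty dictionary
            return prefix
        best = max(allowed, key=lambda ch: score_fn(prefix, ch))
        if best == END:
            return prefix
        prefix += best
        node = node.children[best]
    return prefix  # max_len reached; may not be a complete word


# Toy stand-in for decoder scores: prefer the characters of a noisy hypothesis.
noisy = "star trak"


def toy_score(prefix, ch):
    if ch == END:
        return 1.0 if len(prefix) >= len(noisy) else 0.0
    return 1.0 if len(prefix) < len(noisy) and noisy[len(prefix)] == ch else 0.0


trie = build_trie(["star trek", "star wars"])
corrected = guided_decode(trie, toy_score)
```

Because every step chooses only among the children of the current trie node, the output can never leave the dictionary, which is the property the guided variant relies on.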

Design and Implementation Automatic Character Set Encoding Recognition Method for Document File (문서 파일의 문자 인코딩 자동 인식 기법의 설계 및 구현)

  • Seo, Min-Ji;Kim, Myung-Ho
    • Proceedings of the Korea Information Processing Society Conference / 2015.10a / pp.95-98 / 2015
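  • Character encoding is a method of converting a document into binary form so that it can be stored on a computer or transmitted over a network. Because each character encoding binarizes a document using its own character code table, decoding a document with an encoding other than the one applied to it produces output that differs from the original, and the document becomes unreadable. To read a document, therefore, the character encoding applied to it must be identified. This paper presents a method for automatically determining the character encoding of a document. The proposed method identifies the encoding through several stages: detection using escape characters, detection based on the range of code values that appear in the document, detection based on the characteristics of those code values, and detection using a word database. Because the proposed method classifies documents by language before determining the character encoding, it achieves a high recognition rate.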
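The staged approach in this abstract can be sketched roughly as follows. This is an illustrative simplification, not the paper's implementation: the escape signatures, the candidate list `CANDIDATES`, and the tiny word set `KNOWN_WORDS` are hypothetical stand-ins, and only Python's built-in codecs are used.

```python
# Hypothetical signatures, candidates, and word list, for illustration only.
ESCAPE_SIGNATURES = [
    (b"\x1b$)C", "iso2022_kr"),  # ISO-2022-KR designator sequence
    (b"\x1b$B", "iso2022_jp"),   # ISO-2022-JP switch to JIS X 0208
]
CANDIDATES = ["euc_kr", "shift_jis"]
KNOWN_WORDS = {"문서", "파일", "文書"}


def detect_encoding(data: bytes):
    """Return a Python codec name, or None if no stage can decide."""
    # Stage 1: escape sequences identify the ISO-2022 family.
    for signature, name in ESCAPE_SIGNATURES:
        if signature in data:
            return name
    # Stage 2: code value range. Only 7-bit values means plain ASCII.
    if all(byte < 0x80 for byte in data):
        return "ascii"
    # Stage 3: code value characteristics. Strict UTF-8 has a
    # distinctive lead/continuation byte structure.
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        pass
    # Stage 4: word database. Decode with each candidate and count
    # known words; the candidate with the most hits wins.
    best_name, best_score = None, 0
    for name in CANDIDATES:
        try:
            text = data.decode(name)
        except UnicodeDecodeError:
            continue
        score = sum(text.count(word) for word in KNOWN_WORDS)
        if score > best_score:
            best_name, best_score = name, score
    return best_name
```

The cheap, unambiguous checks run first, so the word lookup is needed only for legacy multi-byte encodings whose byte ranges overlap.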

Korean automatic spacing using pretrained transformer encoder and analysis

  • Hwang, Taewook;Jung, Sangkeun;Roh, Yoon-Hyung
    • ETRI Journal / v.43 no.6 / pp.1049-1057 / 2021
  • Automatic spacing in Korean is used to correct spacing units in a given input sentence. The demand for automatic spacing has been increasing owing to frequent incorrect spacing in recent media, such as the Internet and mobile networks. Therefore, herein, we propose a transformer encoder that reads a sentence bidirectionally and can be pretrained using an out-of-task corpus. Notably, our model exhibited the highest character accuracy (98.42%) among the existing automatic spacing models for Korean. We experimentally validated the effectiveness of bidirectional encoding and pretraining for automatic spacing in Korean. Moreover, we conclude that pretraining is more important than fine-tuning and data size.

Technique for production and encoding of New dot-type Print Watermark Pattern (새로운 도트형 프린트 워터마크 패턴의 생성 및 부호화 기법)

  • Lee, Boo-Hyung
    • Journal of the Korea Academia-Industrial cooperation Society / v.10 no.5 / pp.979-984 / 2009
  • In this paper, a technique for producing and encoding a new dot-type print watermark is proposed. The print watermark has characteristics similar to those of a digital watermark and functions as a link that maps primary information on printed matter (text, symbols, figures, etc.) to the corresponding secondary content (sound, video, characters, etc.). The proposed dot-type print watermark pattern is represented as a $16{\times}16$ matrix in a $0.4\,\mathrm{mm}^2$ area, and dots are printed on only 23 elements of the $16{\times}16$ matrix. Each dot is so small (0.02 mm) that it cannot be seen. Because the position of each printed dot corresponds to the position of a digit in binary notation, the patterns are easy to encode, and there are about 8,000,000 watermark patterns, enough to express the primary information of printed matter. It was shown that the proposed print watermark patterns are recognized without difficulty by a dedicated recognition device.
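The positional binary encoding can be sketched as follows, on the reading that each of the 23 usable elements carries one bit, which gives 2^23 = 8,388,608 patterns and matches the "about 8,000,000" figure. The abstract does not say which 23 elements are used, so the row-major layout in `DATA_CELLS` is purely hypothetical.

```python
GRID = 16       # the pattern is a 16x16 matrix
DATA_BITS = 23  # dots may appear on only 23 elements

# Hypothetical layout: the first 23 cells in row-major order.
DATA_CELLS = [(i // GRID, i % GRID) for i in range(DATA_BITS)]


def encode_pattern(value: int):
    """Return a 16x16 grid of 0/1 where 1 means a printed dot."""
    if not 0 <= value < 2 ** DATA_BITS:
        raise ValueError("value does not fit in 23 bits")
    grid = [[0] * GRID for _ in range(GRID)]
    for bit, (row, col) in enumerate(DATA_CELLS):
        grid[row][col] = (value >> bit) & 1
    return grid


def decode_pattern(grid) -> int:
    """Read the dots back into the integer they represent."""
    return sum(grid[row][col] << bit for bit, (row, col) in enumerate(DATA_CELLS))


pattern_count = 2 ** DATA_BITS
roundtrip = decode_pattern(encode_pattern(5_000_000))
```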

A Novel Scalable and Storage-Efficient Architecture for High Speed Exact String Matching

  • Peiravi, Ali;Rahimzadeh, Mohammad Javad
    • ETRI Journal / v.31 no.5 / pp.545-553 / 2009
  • String matching is a fundamental element of an important category of modern packet processing applications which involve scanning the content flowing through a network for thousands of strings at the line rate. To keep pace with high network speeds, specialized hardware-based solutions are needed which should be efficient enough to maintain scalability in terms of speed and the number of strings. In this paper, a novel architecture based upon a recently proposed data structure called the Bloomier filter is proposed which can successfully support scalability. The Bloomier filter is a compact data structure for encoding arbitrary functions, and it supports approximate evaluation queries. By eliminating the Bloomier filter's false positives in a space-efficient way, a simple yet powerful exact string matching architecture is proposed that can handle several thousand strings at high rates and is amenable to on-chip realization. The proposed scheme is implemented in reconfigurable hardware and we compare it with existing solutions. The results show that the proposed approach achieves better performance than other existing architectures, measured as throughput per logic cell per character.

The proposal of improved secure cookies system based on public-key certificate (인증서 기반의 개선된 보안 쿠키의 설계와 구현)

  • 양종필;이경현
    • The Journal of Korean Institute of Communications and Information Sciences / v.27 no.11C / pp.1090-1096 / 2002
  • HTTP is stateless and therefore does not support continuity of browser-server interaction between successive visits of a user. Cookies were invented to maintain continuity and state on the Web. Because cookies are transmitted in plaintext and contain text-character strings encoding relevant information about the user, an attacker can easily copy and modify them for illicit gain. In this paper, we design a secure cookie scheme based on X.509 public-key certificates to address these security weaknesses of typical web cookies. Our secure cookie scheme provides not only mutual authentication between client and server but also confidentiality and integrity of user information. Additionally, we implement the scheme and compare its performance with that of the SSL (Secure Socket Layer) protocol, which is widely used to secure HTTP environments.

Content-Based Retrieval System Design over the Internet (인터넷에 기반한 내용기반 검색 시스템 설계)

  • Kim Young Ho;Kang Dae-Seong
    • Journal of Institute of Control, Robotics and Systems / v.11 no.5 / pp.471-475 / 2005
  • Recently, with the development of digital technology, multimedia information such as characters, voice, images, and video has come to occupy a large share of information. Within video research, work on video indexing and retrieval is especially active. This paper proposes a novel method for retrieving MPEG video, the international standard for moving picture encoding. To realize the retrieval system, we extract the DCT DC coefficients and then detect shots by applying the MVC (Mean Value Comparative) method to the images constructed from the DC coefficients. We choose the start frame of each shot as its key frame, and we generate a codebook index using features of the DC image and applying PCA (Principal Component Analysis) to the key frame. We then perform retrieval based on similarity after indexing. The method reduces shot detection errors compared with conventional shot detection algorithms. In addition, indexing is faster because PCA is performed in the compressed domain, and the use of statistical features has the advantage of allowing a codebook to be generated. Finally, we realize an efficient retrieval system by applying MVC and PCA to shot detection and indexing, which are important steps in a retrieval system, and we use the retrieval system over the Internet.

A FRINGE CHARACTER ANALYSIS OF FRINGE IMAGE (Fringe 영상의 주파수 특성 분석)

  • Seo Young-Ho;Choi Hyun-Jun;Kim Dong-Wook
    • The Journal of Korean Institute of Communications and Information Sciences / v.30 no.11C / pp.1053-1059 / 2005
  • A computer-generated hologram (CGH) designs and produces the digital information for generating a 3-D (three-dimensional) image using a computer and software, instead of optically sensing a hologram from light interference, and it can synthesize a virtual object that does not physically exist. Because a digital hologram contains a large amount of data, as can be seen in the digitization process, the data representing it must be reduced for storage, transmission, and processing. As efforts to handle holograms as digital information have increased, various methods for compressing digital holograms, also called fringe patterns, have been explored. A suitable approach is to encode the hologram. In this paper, we analyze the properties of a CGH using frequency transform tools, treating the generated CGH as a 2D image and introducing the DWT, which is known to be a better frequency transform tool than the DCT. The compression and reconstruction results obtained from wavelet-based codecs show better reconstruction properties at compression rates up to 2 times higher than those in the previous studies of Yoshikawa[2] and Thomas[3].

PPNC: Privacy Preserving Scheme for Random Linear Network Coding in Smart Grid

  • He, Shiming;Zeng, Weini;Xie, Kun;Yang, Hongming;Lai, Mingyong;Su, Xin
    • KSII Transactions on Internet and Information Systems (TIIS) / v.11 no.3 / pp.1510-1532 / 2017
  • In smart grid, privacy implications to individuals and their families are an important issue because of the fine-grained usage data collection. Wireless communications are utilized by many utility companies to obtain information. Network coding is exploited in smart grids, to enhance network performance in terms of throughput, delay, robustness, and energy consumption. However, random linear network coding introduces a new challenge for privacy preserving due to the encoding of data and updating of coefficients in forwarder nodes. We propose a distributed privacy preserving scheme for random linear network coding in smart grid that considers the converged flows character of the smart grid and exploits a homomorphic encryption function to decrease the complexities in the forwarder node. It offers a data confidentiality privacy preserving feature, which can efficiently thwart traffic analysis. The data of the packet is encrypted and the tag of the packet is encrypted by a homomorphic encryption function. The forwarder node random linearly codes the encrypted data and directly processes the cryptotext tags based on the homomorphism feature. Extensive security analysis and performance evaluations demonstrate the validity and efficiency of the proposed scheme.
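The random linear coding performed at forwarder nodes can be illustrated with a small sketch. It covers only the coding step and omits the homomorphic encryption that the scheme adds. The choice of GF(2) (XOR) as the field, the packet layout, and the sample meter payloads are assumptions for illustration and are not taken from the paper.

```python
import random


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def random_combination(packets, rng):
    """Return a random GF(2) linear combination of (coefficients, payload) packets."""
    coeffs = [0] * len(packets[0][0])
    payload = bytes(len(packets[0][1]))
    for packet_coeffs, packet_payload in packets:
        if rng.random() < 0.5:  # draw coefficient 1 with probability 1/2
            coeffs = [a ^ b for a, b in zip(coeffs, packet_coeffs)]
            payload = xor_bytes(payload, packet_payload)
    return coeffs, payload


def apply_coefficients(coeffs, sources):
    """Recompute the payload a coefficient vector stands for."""
    payload = bytes(len(sources[0]))
    for coeff, source in zip(coeffs, sources):
        if coeff:
            payload = xor_bytes(payload, source)
    return payload


rng = random.Random(0)
sources = [b"meter-A1", b"meter-B2", b"meter-C3"]  # hypothetical equal-length readings
# Source packets carry unit coefficient vectors.
source_packets = [([int(i == j) for j in range(3)], s) for i, s in enumerate(sources)]
coded = [random_combination(source_packets, rng) for _ in range(4)]
# A forwarder recodes what it received without decoding it first.
recoded = random_combination(coded, rng)
```

Each packet's coefficient vector always describes its payload as a combination of the original sources, even after recoding. That linearity is what allows a forwarder to operate on packets it cannot read.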

Genomics Reveals Traces of Fungal Phenylpropanoid-flavonoid Metabolic Pathway in the Filamentous Fungus Aspergillus oryzae

  • Juvvadi Praveen Rao;Seshime Yasuyo;Kitamoto Katsuhiko
    • Journal of Microbiology / v.43 no.6 / pp.475-486 / 2005
  • Fungal secondary metabolites constitute a wide variety of compounds which either play a vital role in agricultural, pharmaceutical and industrial contexts, or have devastating effects on agriculture, animal and human affairs by virtue of their toxigenicity. Owing to their beneficial and deleterious characteristics, these complex compounds and the genes responsible for their synthesis have been the subjects of extensive investigation by microbiologists and pharmacologists. A majority of the fungal secondary metabolic genes are classified as type I polyketide synthases (PKS) which are often clustered with other secondary metabolism related genes. In this review we discuss the significance of our recent discovery of chalcone synthase (CHS) genes belonging to the type III PKS superfamily in an industrially important fungus, Aspergillus oryzae. CHS genes are known to play a vital role in the biosynthesis of flavonoids in plants. A comparative genome analysis revealed the unique character of A. oryzae with four CHS-like genes (csyA, csyB, csyC and csyD) amongst other Aspergilli (Aspergillus nidulans and Aspergillus fumigatus) which contained none of the CHS-like genes. Some other fungi such as Neurospora crassa, Fusarium graminearum, Magnaporthe grisea, Podospora anserina and Phanerochaete chrysosporium also contained putative type III PKSs, with a phylogenic distinction from bacteria and plants. The enzymatically active nature of these newly discovered homologues is expected owing to the conservation in the catalytic residues across the different species of plants and fungi, and also by the fact that a majority of these genes (csyA, csyB and csyD) were expressed in A. oryzae. While this finding brings filamentous fungi closer to plants and bacteria, which until recently were the only organisms considered to possess the type III PKSs, the presence of putative genes encoding other principal enzymes involved in phenylpropanoid and flavonoid biosynthesis (viz., phenylalanine ammonia-lyase, cinnamic acid hydroxylase and p-coumarate CoA ligase) in the A. oryzae genome undoubtedly proves the extent of its metabolic diversity. Since many of these genes have not been identified earlier, knowledge of their corresponding products or activities remains lacking. In the future, it is anticipated that these enzymes may be reasonable targets for metabolic engineering in fungi to produce agriculturally and nutritionally important metabolites.