• Title/Summary/Keyword: Text structure

Development of Retrieval Model Using Structure Information and Term Information (구조적 정보와 색인어 정보를 결합한 검색 모델 개발)

  • 임성신;한기덕;권혁철
    • Proceedings of the Korean Information Science Society Conference / 2004.10a / pp.799-801 / 2004
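  • As the volume of information accumulated on the Internet grows, it has become harder for users to find the information they want. Information retrieval systems that effectively find the desired information among vast numbers of documents have therefore grown in importance, and research on them has been active. The information that can be extracted from Internet documents includes link information, anchor text, title text, and body text, and many retrieval systems and models that use this information have been developed or studied. Building on prior work that studied how to raise performance by refining and processing this commonly used extracted information, this paper presents experimental results and a model that adds site weights.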
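The abstract above does not state how the field information and site weights are combined, so the following is only a minimal sketch of one plausible reading: a weighted sum of per-field term matches, scaled by a per-site weight. The names `FIELD_WEIGHTS`, `field_score`, and `document_score`, the weight values, and the sample document are all assumptions made for illustration; link information is left out of the sketch.

```python
from collections import Counter

# Hypothetical field weights; the abstract gives no actual values.
FIELD_WEIGHTS = {"anchor": 3.0, "title": 2.0, "body": 1.0}


def field_score(query_terms, text):
    """Sum the raw frequencies of the query terms within one field."""
    counts = Counter(text.lower().split())
    return sum(counts[t] for t in query_terms)


def document_score(query, doc, site_weight=1.0):
    """Weighted sum of per-field scores, scaled by an assumed per-site weight."""
    terms = query.lower().split()
    base = sum(
        weight * field_score(terms, doc.get(field, ""))
        for field, weight in FIELD_WEIGHTS.items()
    )
    return site_weight * base


doc = {
    "anchor": "text retrieval",
    "title": "retrieval model",
    "body": "a retrieval model using structure",
}
plain = document_score("retrieval model", doc)
boosted = document_score("retrieval model", doc, site_weight=1.5)
```

With these assumed weights, a match in anchor text counts three times as much as a match in the body, and a higher site weight raises every score from that site by the same factor.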

Convolutional Neural Networks for Character-level Classification

  • Ko, Dae-Gun;Song, Su-Han;Kang, Ki-Min;Han, Seong-Wook
    • IEIE Transactions on Smart Processing and Computing / v.6 no.1 / pp.53-59 / 2017
  • Optical character recognition (OCR) automatically recognizes text in an image. OCR is still a challenging problem in computer vision. A successful solution to OCR has important device applications, such as text-to-speech conversion and automatic document classification. In this work, we analyze character recognition performance using current state-of-the-art deep-learning structures: AlexNet, LeNet, and SPNet. For this, we have built our own dataset that contains digits and upper- and lower-case characters. We experiment in the presence of salt-and-pepper noise or Gaussian noise, and compare performance in terms of recognition error. Experimental results with five-fold cross-validation indicate that the SPNet structure (our approach) outperforms AlexNet and LeNet in recognition error.
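The two noise types named in the abstract above can be illustrated on a small grayscale image. This is a minimal sketch using only the Python standard library; the function names, the `amount` and `sigma` parameters, and the 8x8 test image are assumptions, not the authors' experimental setup.

```python
import random


def salt_and_pepper(image, amount, rng):
    """Return a copy in which roughly `amount` of the pixels become 0 or 255."""
    noisy = [row[:] for row in image]
    for row in noisy:
        for x in range(len(row)):
            if rng.random() < amount:
                row[x] = rng.choice((0, 255))
    return noisy


def gaussian(image, sigma, rng):
    """Return a copy with zero-mean Gaussian noise added, clipped to 0..255."""
    return [
        [min(255, max(0, round(p + rng.gauss(0.0, sigma)))) for p in row]
        for row in image
    ]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
clean = [[128] * 8 for _ in range(8)]
sp = salt_and_pepper(clean, amount=0.1, rng=rng)
gn = gaussian(clean, sigma=10.0, rng=rng)
```

Salt-and-pepper noise replaces isolated pixels with extreme values, while Gaussian noise perturbs every pixel slightly, so the two stress a classifier in different ways.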

The DTD Development through Document Structure Analysis of Journals (학술지 논문기사의 문헌구조 분석을 통한 DTD개발)

  • Yoon, So-Young
    • Journal of Information Management / v.28 no.2 / pp.20-53 / 1997
  • To use SGML, the international standard markup language, for constructing full-text databases in digital libraries, a DTD must be developed first, based on a structural analysis of the documents. This study develops an SGML DTD for Korean documents through a document structure analysis of the Journal of the Korean Society for Information Management.

Crystal Structure and Biochemical Characterization of Xylose Isomerase from Piromyces sp. E2

  • Son, Hyeoncheol Francis;Lee, Sun-Mi;Kim, Kyung-Jin
    • Journal of Microbiology and Biotechnology / v.28 no.4 / pp.571-578 / 2018
  • Biofuel production using lignocellulosic biomass is gaining attention because it can be substituted for fossil fuels without competing with edible resources. However, because Saccharomyces cerevisiae does not have a D-xylose metabolic pathway, oxidoreductase or isomerase pathways must be introduced to utilize D-xylose from lignocellulosic biomass in S. cerevisiae. To elucidate the biochemical properties of xylose isomerase (XI) from Piromyces sp. E2 (PsXI), we determined its crystal structure in complex with the substrate mimic glycerol. An amino-acid sequence comparison with other reported XIs and relative activity measurements using five kinds of divalent metal ions confirmed that PsXI belongs to the class II XIs. Moreover, a kinetic analysis of PsXI was performed using $\mathrm{Mn}^{2+}$, the preferred divalent metal ion for PsXI. In addition, the substrate-binding mode of PsXI could be predicted from the substrate mimic glycerol bound to the active site. These studies may provide structural information to enhance D-xylose utilization for biofuel production.

A Display Method of Image Information and URL Using the Message Structures of Emergency Alert Broadcasts for 5G Cellular Communications (5G 이동통신 용 재난경보 방송의 메시지 구조를 이용한 이미지 정보 및 URL 표출기법)

  • Chang, Sekchin
    • Journal of Broadcast Engineering / v.26 no.5 / pp.592-598 / 2021
  • Current cellular systems rely on a CBS protocol for emergency alert broadcast services. However, the CBS protocol only specifies the delivery of a limited text message. Therefore, foreigners who are unfamiliar with the local script may have difficulty understanding the received CBS text message. The CBS protocol is also restricted in how much information it can deliver because of the limited number of text characters. To overcome these weak points of the current CBS protocol, this paper proposes a method for displaying image information and a URL on the screens of mobile terminals for the received CBS text message. The presented approach effectively utilizes the message structure of CBS for 5G cellular systems.

A Study on the Construction of an Efficient Text-Based User Interface (효율적 문자 기반의 사용자 인터페이스 구축에 관한 연구)

  • 허진석;서장춘
    • Institute of Control, Robotics and Systems: Conference Proceedings / 2000.10a / pp.289-289 / 2000
  • In this paper, a new text-based method is suggested for user-system interaction. A text-based user interface is more efficient in situations where a GUI cannot be adopted because of hardware cost constraints or system performance requirements. The dialogue method using the suggested hierarchical structure is easier to use, and it is more useful because it takes the user's background knowledge and task environment into account. As a practical example, the proposed method for constructing a text-based user interface is applied to a Double-Lift Open Shedding Electronic Jacquard.

Improving Transformer with Dynamic Convolution and Shortcut for Video-Text Retrieval

  • Liu, Zhi;Cai, Jincen;Zhang, Mengmeng
    • KSII Transactions on Internet and Information Systems (TIIS) / v.16 no.7 / pp.2407-2424 / 2022
  • Recently, the Transformer has made great progress in video retrieval tasks due to its high representation capability. In a Transformer, the cascaded self-attention modules are capable of capturing long-distance feature dependencies. However, local feature details are likely to deteriorate. In addition, increasing the depth of the structure is likely to produce learning bias in the learned features. In this paper, an improved Transformer structure named TransDCS (Transformer with Dynamic Convolution and Shortcut) is proposed. A Multi-head Conv-Self-Attention module is introduced to model the local dependencies and improve the efficiency of local feature extraction. Meanwhile, an augmented shortcuts module based on a dual identity matrix is applied to enhance the conduction of input features and mitigate the learning bias. The proposed model is tested on the MSRVTT, LSMDC and Activity-Net benchmarks, and it surpasses all previous solutions for the video-text retrieval task. For example, on the LSMDC benchmark, a gain of about 2.3% MdR and 6.1% MnR is obtained over recently proposed multimodal-based methods.

A Study of Automatic Indexing Technique based on Logical Structure of SGML Hangul Document (SGML 한글문서의 논리적 구조에 근거한 색인기법에 관한 연구)

  • 유석종
    • Journal of the Korean Society for Information Management / v.12 no.2 / pp.85-101 / 1995
  • Conventional indexing systems support only full-text indexing for electronic documents and do not use the logical structure of documents in retrieval. Most electronic documents are stored in different formats depending on the system that produced them, and these formats indicate only the physical style of the document without considering its logical structure. Thus, in an effort to standardize the exchange of documents, ISO developed SGML (Standard Generalized Markup Language), which contains information about the logical structure of documents. In this paper, to resolve the disadvantages of full-text indexing and to use a standard document format, an indexing system for SGML documents is designed and implemented. In this system, the user can assign indexing domains to elements, so the logical structure of the document is reflected in information retrieval. Various retrieval methods can be implemented by using the structural information of the document. In addition, the system supports automatic indexing of SGML Hangul documents.
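The idea of assigning indexing domains to elements can be sketched as an inverted index that records which element each term came from. This is an illustration only, not the system described in the abstract: it parses well-formed XML with Python's `xml.etree.ElementTree` as a stand-in for an SGML parser, splits on whitespace instead of performing Hangul morphological analysis, and the names `SAMPLE`, `build_index`, and `search` are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample document; XML is used here as a stand-in for SGML.
SAMPLE = """<article>
  <title>SGML indexing</title>
  <abstract>Indexing with logical structure</abstract>
  <body>Full-text indexing ignores structure</body>
</article>"""


def build_index(xml_text, indexed_elements):
    """Map each term to the set of element names it occurs in, for chosen elements only."""
    index = defaultdict(set)
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        if elem.tag in indexed_elements and elem.text:
            for term in elem.text.lower().split():
                index[term].add(elem.tag)
    return index


def search(index, term, element=None):
    """Return the element names containing the term, optionally limited to one element."""
    hits = index.get(term.lower(), set())
    return hits if element is None else hits & {element}


# Index only the title and abstract, leaving the body out of the indexing domain.
index = build_index(SAMPLE, {"title", "abstract"})
in_any = search(index, "indexing")
in_title = search(index, "structure", element="title")
```

Because the body element is outside the chosen indexing domain, a term that appears only there is not retrievable, which is the behavior that distinguishes this approach from plain full-text indexing.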

New Text Steganography Technique Based on Part-of-Speech Tagging and Format-Preserving Encryption

  • Mohammed Abdul Majeed;Rossilawati Sulaiman;Zarina Shukur
    • KSII Transactions on Internet and Information Systems (TIIS) / v.18 no.1 / pp.170-191 / 2024
  • The transmission of confidential data using cover media is called steganography. The three requirements of any effective steganography system are high embedding capacity, security, and imperceptibility. A text file's structure, which makes syntax and grammar more visually obvious than in other media, contributes to its poor imperceptibility. Text is regarded as the most challenging carrier for hiding secret data because it has little redundant data compared to other digital objects. Unicode characters, especially non-printing or invisible ones, are employed for hiding data by mapping a specific number of secret data bits to each character and inserting the character into spaces in the cover text. These characters offer only limited room for embedding secret data. Current studies that use Unicode characters in text steganography have focused on increasing the data hiding capacity despite the insufficient redundant data in a text file. A sequential embedding pattern is often selected and applied at all available positions in the cover text. This embedding pattern negatively affects the imperceptibility and security of the text steganography system. Thus, this study attempts to solve these limitations using the part-of-speech (POS) tagging technique combined with randomization in data hiding. Combining these two techniques allows the Unicode characters to be inserted in randomized patterns at specific positions in the cover text, increasing data hiding capacity with minimal effects on imperceptibility and security. Format-preserving encryption (FPE) is also used to encrypt the secret message without changing its size before the embedding process. A comparison of the proposed technique with existing ones demonstrates that it fulfils the cover file's capacity, imperceptibility, and security requirements.
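The basic mechanism the abstract builds on, hiding bits as invisible Unicode characters placed at spaces in the cover text, can be sketched as follows. This is a simplified illustration and not the proposed technique: it omits POS tagging and format-preserving encryption, hides one bit per chosen space, and picks positions with a seeded pseudo-random sample. The two-character mapping and the names `embed` and `extract` are assumptions.

```python
import random

ZERO, ONE = "\u200b", "\u200c"  # zero-width space, zero-width non-joiner


def embed(cover, bits, key):
    """Hide one bit after each space in a key-selected subset of the cover's spaces."""
    spaces = [i for i, ch in enumerate(cover) if ch == " "]
    if len(bits) > len(spaces):
        raise ValueError("cover text has too few spaces for the payload")
    # Seeded sampling spreads the bits non-sequentially; sorting keeps bit order.
    chosen = sorted(random.Random(key).sample(spaces, len(bits)))
    marks = dict(zip(chosen, bits))
    out = []
    for i, ch in enumerate(cover):
        out.append(ch)
        if i in marks:
            out.append(ONE if marks[i] == "1" else ZERO)
    return "".join(out)


def extract(stego):
    """Read the hidden bits back in order of appearance."""
    return "".join("1" if ch == ONE else "0" for ch in stego if ch in (ZERO, ONE))


cover = "the quick brown fox jumps over the lazy dog"
stego = embed(cover, "10110", key=42)
recovered = extract(stego)
```

In this sketch the key only controls where the bits are placed; extraction does not need it because the invisible characters are read in order. A scheme that derives positions from POS tags, as the abstract describes, would tie extraction to the same position-selection rule.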