• Title/Summary/Keyword: document analysis

Search Result 1,202, Processing Time 0.028 seconds

Trends in Genomics & Informatics: a statistical review of publications from 2003 to 2018 focusing on the most-studied genes and document clusters

  • Kim, Ji-Hyeon;Nam, Hee-Jo;Park, Hyun-Seok
    • Genomics & Informatics / v.17 no.3 / pp.25.1-25.6 / 2019
  • Genomics & Informatics (NLM title abbreviation: Genomics Inform) is the official journal of the Korea Genome Organization. Herein, we conduct a statistical analysis of the publications of Genomics & Informatics over the 16 years since its inception, with a particular focus on issues relating to article categories, word clouds, and the most-studied genes, drawing on recent reviews of the use of word frequencies in journal articles. Trends in the studies published in Genomics & Informatics are discussed both individually and collectively.

Document Embedding and Image Content Analysis for Improving News Clustering System (뉴스 클러스터링 개선을 위한 문서 임베딩 및 이미지 분석 자질의 활용)

  • Kim, Siyeon;Kim, Sang-Bum
    • Annual Conference on Human and Language Technology / 2015.10a / pp.104-108 / 2015
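  • As large volumes of news are generated, techniques for organizing it effectively have been actively studied in recent years. Among these, news clustering depends on the performance of a classifier that determines whether two news articles cover the same event, and in most cases BoW (Bag-of-Words)-based vector similarity is used. This paper proposes a method for improving that classifier: in addition to BoW-based vector similarity, it measures the similarity of the photos contained in the two documents and the relatedness of their topics, and adds these as classifier features. Photo similarity and topic relatedness were measured with a deep learning-based CNN and neural network-based document embedding, respectively. In experiments, using the two proposed features improved performance by 3.4% over the classifier based on conventional BoW vector similarity.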

An overview and analysis of commercial document delivery systems (국내외 상업적 문헌제공시스템의 현황파악과 비교분석)

  • 윤희윤
    • Journal of the Korean Society for Information Management / v.15 no.2 / pp.7-28 / 1998
  • The purpose of this study is to overview and analyze commercial document delivery systems. To this end, the study first compared the current systems under three headings: non-collection-based systems (Infotrieve, OCLC, UnCover, BIDS, Swets & Zeitlinger, Kyobobook), collection-based systems (EBSCO, ISI, UMI, BLDSC, CISTI, INIST, NCSI, JICST, KINITI), and specialized collection-based systems (Engineering Information Inc., IEEE/IEE, BIOSIS, CAS, NAL, RSC, TWI, ADONIS). Next, the study analyzed the advantages and disadvantages of each system based on four performance criteria: scope of inventory/journal coverage, turnaround time, delivery cost and payment options, and reliability and satisfaction rate.

Machine Printed and Handwritten Text Discrimination in Korean Document Images

  • Trieu, Son Tung;Lee, Guee Sang
    • Smart Media Journal / v.5 no.3 / pp.30-34 / 2016
  • Many Korean documents need to be classified as either machine-printed or handwritten text. Early identification methods use structural features, which are simple and easy to apply to text in a specific font, but their performance depends on the font type and the characteristics of the text. More recently, the bag-of-words model has been used for identification because it can be invariant to changes in font size, distortions, or modifications to the text. The bag-of-words method consists of three steps: word segmentation using connected component grouping, feature extraction, and classification using an SVM (Support Vector Machine). This paper proposes a bag-of-words method that uses SURF (Speeded Up Robust Features) to discriminate machine-printed from handwritten text in Korean documents. Experiments show that the proposed method outperforms methods based on structural features.

Developing A Document-based Work-flow Modeling Support System: A Case-based Reasoning Approach

  • Kim, Jaeho;Woojong Suh;Lee, Heeseok
    • Proceedings of the Korea Intelligent Information System Society Conference / 2001.06a / pp.445-454 / 2001
  • A workflow model is useful for business process analysis and has often been implemented for office automation through information technology. Accordingly, the results of workflow modeling need to be systematically managed as information assets. To manage the modeling process effectively, it is necessary to enhance the efficiency of their reuse. Therefore, this paper creates a Document-based Workflow Modeling Support System (DWMSS) using a case-based reasoning (CBR) approach. It proposes a system architecture and develops the corresponding modeling process. Furthermore, a repository consisting of a case base and a vocabulary base is built. A case study is presented to demonstrate the usefulness of this system.

Analysis File Format of Seogwang Document Processor 3.0 in North Korea (북한 서광사무처리 3.0 파일 구조 분석)

  • Choi, Junhyeong;Kang, Dongsu
    • Proceedings of the Korea Information Processing Society Conference / 2019.05a / pp.335-338 / 2019
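  • Seogwang Document Processor 3.0, an office program used in North Korea, takes ODF (Open Document Format) files as input to process documents. ODF consists of multiple XML (Extensible Markup Language) files and defines the file structure through child nodes. This paper compares the ODF file structure of Seogwang Document Processor 3.0 in terms of common and variable regions, according to the file extension accepted by each subprogram, and analyzes major ODF and XML vulnerabilities through CVE (Common Vulnerabilities and Exposures).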

A Study on Constructing Order-Production System through Integrated E-Mail and Database (E-mail과 DB연계를 통한 주문-생산시스템 구축연구)

  • 정한욱;이창호
    • Journal of the Korea Safety Management & Science / v.2 no.2 / pp.155-165 / 2000
  • Many enterprises run effective database applications over a VAN (Value Added Network) or WAN (Web Added Network), but doing so is difficult and expensive. We therefore propose a low-cost database system that operates over long distances using personal computers. The system is highly flexible, and its value may be rated highly because programs can be developed without advanced programming skills. It can be used between companies and/or between branches, for example to exchange customer claim information, inventory information, and product orders. The important point is to import not the document itself but the data in the document, so that end users can carry out analysis and decision-making with their own databases. This would enhance productivity in many enterprises.

O Valor Documental dos Balangandãs: Uma Análise Simbólica e Formal

  • Carmo, Sura Souza;Borges, Luiz C.
    • Iberoamérica / v.23 no.1 / pp.79-111 / 2021
  • The purpose of this article is to present the potential of balangandãs as a documentary source for intersectional studies of gender and slavery, based on an analysis of formal and symbolic aspects of the museum objects in the Museu Histórico Nacional (MHN) and the Museu Carlos Costa Pinto (MCCP). Balangandãs are a type of creole jewelry, made in gold or silver and used in Brazil since the 18th century by black women who worked especially in the sale of foodstuffs in large urban centers. They are described in printed sources and engravings, and preserved in some museum institutions. The study examines the meanings attributed to the object over the centuries: jewelry, amulet, peculium, document, travel memory, and heritage. As a result, the article seeks to highlight the objects as a historical and documentary source, verifying similarities between the pieces musealized at the MHN and at the MCCP, and also emphasizing the documental power of the pieces produced today.

Multi-task learning with contextual hierarchical attention for Korean coreference resolution

  • Cheoneum Park
    • ETRI Journal / v.45 no.1 / pp.93-104 / 2023
  • Coreference resolution is a task in discourse analysis that links several headwords used in any document object. We suggest pointer networks-based coreference resolution for Korean using multi-task learning (MTL) with an attention mechanism for a hierarchical structure. As Korean is a head-final language, the head can easily be found. Our model learns the distribution by referring to the same entity position and utilizes a pointer network to conduct coreference resolution depending on the input headword. As the input is a document, the input sequence is very long. Thus, the core idea is to learn the word- and sentence-level distributions in parallel with MTL, while using a shared representation to address the long sequence problem. The suggested technique is used to generate word representations for Korean based on contextual information using pre-trained language models for Korean. In the same experimental conditions, our model performed roughly 1.8% better on CoNLL F1 than previous research without hierarchical structure.

Addressing Emerging Threats: An Analysis of AI Adversarial Attacks and Security Implications

  • HoonJae Lee;ByungGook Lee
    • International journal of advanced smart convergence / v.13 no.2 / pp.69-79 / 2024
  • AI technology is a central focus of the 4th Industrial Revolution. However, compared to some existing non-artificial intelligence technologies, new AI adversarial attacks have become possible in learning data management, input data management, and other areas. These attacks, which exploit weaknesses in AI encryption technology, are not only emerging as social issues but are also expected to have a significant negative impact on existing IT and convergence industries. This paper examines various cases of AI adversarial attacks developed recently, categorizes them into five groups, and provides a foundational document for developing security guidelines to verify their safety. The findings of this study confirm AI adversarial attacks that can be applied to various types of cryptographic modules (such as hardware cryptographic modules, software cryptographic modules, firmware cryptographic modules, hybrid software cryptographic modules, hybrid firmware cryptographic modules, etc.) incorporating AI technology. The aim is to offer a foundational document for the development of standardized protocols, believed to play a crucial role in rejuvenating the information security industry in the future.