• Title/Summary/Keyword: Intelligent document platform

A Study on Intelligent Document Processing Management using Unstructured Data (비정형 데이터를 활용한 지능형 문서 처리 관리에 관한 연구)

  • Kyoung Hoon Park;Kwang-Kyu Seo
    • Journal of the Semiconductor & Display Technology / v.23 no.2 / pp.71-75 / 2024
  • This research focuses on efficiently processing unstructured data containing various formulas, in the processing and management of documents covering the terms and rules of domestic insurance, using text mining techniques. Through parsing and compilation technology, document context, content, constants, and variables are automatically separated, and errors are verified in both document order and logic to improve document accuracy. Through document debugging technology, errors in a document are identified in real time. Furthermore, it is necessary to predict the changes that intelligent document processing will bring to document management work, in particular its impact on documents and tasks that are managed redundantly because of their various formulas, and to prepare the necessary capabilities in advance.

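A minimal sketch of the constant/variable separation and real-time "document debugging" described above. The sample clause, the variable context, and the regular expressions are assumptions for illustration, not the authors' parser.

```python
import re

# Hypothetical insurance-clause formula and variable context (assumptions).
CLAUSE = "payout = base_amount * 0.8 - deductible"
DEFINED_VARIABLES = {"base_amount"}

# Separate the right-hand side into numeric constants and named variables.
rhs = CLAUSE.split("=", 1)[1]
tokens = re.findall(r"[A-Za-z_]\w*|\d+(?:\.\d+)?", rhs)
constants = [t for t in tokens if re.fullmatch(r"\d+(?:\.\d+)?", t)]
variables = [t for t in tokens if t not in constants]

print("constants:", constants)   # ['0.8']
print("variables:", variables)   # ['base_amount', 'deductible']

# "Document debugging": flag undefined variables as the clause is checked.
for v in variables:
    if v not in DEFINED_VARIABLES:
        print(f"error: undefined variable {v!r} in clause")
```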

K-Trade : Data-driven Digital Trade Framework (K-Trade : 데이터 주도형 디지털 무역 프레임워크)

  • Kim, Chaemee;Loh, Woong-Kee
    • Journal of Information Technology Services / v.19 no.6 / pp.177-189 / 2020
  • The OECD has assessed Korea as the third highest in trade facilitation worldwide. Korea's paperless trade is world class, based on uTradeHub, the national e-trade service infrastructure for the trade community. Over 800 trade-related document standards provide interoperability of message exchange and trade process automation among exporters, importers, banks, customs, airlines, shippers, forwarders, and trade authorities. Most one-to-one unit processes are fully paperless and online; from the perspective of process flow, however, end-to-end trade processes spread over many different parties are not yet streamlined. This situation forces the trade community to endure a repetitive, redundant load in handling trade documents, and it has a strong demand for seamless trade flow. To streamline the trade process, processes and data should flow seamlessly among multilateral parties; flowing data over an optimized process is the critical success factor for seamless trade. This study proposes four critical digital trade infrastructures as platform services: (1) data-centric Intelligent Document Recognition (IDR), (2) data-driven Digital Document Flow (DDF), (3) platform-based Digital Collaboration & Communication (DCC), and (4) a new digital Trade Facilitation Index (dTFI) for precise assessment of the K-Trade Digital Trade Framework. The dTFI analyses showed that the redundant reentry load was reduced significantly over the whole trade and logistics process. This study leads to the belief that the framework, if put into real-world application, can provide huge economic gains by building a new global value chain of the K-Trade eco network. A new digital trade framework will be invaluable in promoting national soft power and enhancing the global competitiveness of the trade community, and it could become an advanced reference model of next-generation trade facilitation infrastructure for developing countries.

Application of Standard Terminologies for the Development of a Customized Healthcare Service based on a PHR Platform

  • Jung, Hyun Jung;Park, Hyun Sang;Kim, Hyun Young;Kim, Hwa Sun
    • Journal of Multimedia Information System / v.6 no.4 / pp.303-308 / 2019
  • The personal health record (PHR) platform can store and manage medical records, health-monitoring data such as blood pressure and blood sugar, and life logs generated from various wearable devices. It provides services such as international standard-based medical document management, data pattern analysis and an intelligent inference engine, and disease prediction and domain contents. This study aims to build a foundation for the transmission of international standard-based medical documents by mapping the diagnosis items of general health examinations, special health examinations, life logs, health data, and life habits to international standard terminology systems. The mapping results show a high overall rate of 95.6%: 78.8% to LOINC, 10.3% to SNOMED, and 6.5% to both LOINC and SNOMED.
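
The reported mapping rates can be illustrated with a short coverage calculation. A hedged sketch only: the item-to-terminology assignments below are invented examples, not the study's examination items.

```python
# Each examination item maps to zero or more terminology systems.
# The assignments here are hypothetical, chosen only to show the arithmetic.
mappings = {
    "blood_pressure": {"LOINC"},
    "blood_sugar": {"LOINC"},
    "hearing_test": {"LOINC", "SNOMED"},
    "smoking_habit": {"SNOMED"},
    "unmapped_item": set(),
}

total = len(mappings)
loinc_only = sum(1 for t in mappings.values() if t == {"LOINC"})
snomed_only = sum(1 for t in mappings.values() if t == {"SNOMED"})
both = sum(1 for t in mappings.values() if t == {"LOINC", "SNOMED"})

print(f"LOINC only: {loinc_only / total:.1%}")
print(f"SNOMED only: {snomed_only / total:.1%}")
print(f"both: {both / total:.1%}")
print(f"overall mapping rate: {(loinc_only + snomed_only + both) / total:.1%}")
```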

Trends of Cloud and Virtualization in Broadcast Infra (방송 인프라의 클라우드 및 가상화 동향)

  • Kim, S.C.;Oh, H.J.;Yim, H.J.;Hyun, E.H.;Choi, D.J.
    • Electronics and Telecommunications Trends / v.34 no.3 / pp.23-33 / 2019
  • Broadcasting is evolving into a media service aimed at user customization, personalization, and participation, with high-quality broadcast content (4K/8K/AR/VR). A broadcast infrastructure is needed that can compete in processing large-scale media traffic, in platform performance for adaptive transcoding to diverse receivers, and in intelligent services. Cloud services and virtualization in broadcasting are becoming more valuable as the broadcasting environment changes and new high-level broadcasting services emerge. This document describes examples of cloud and virtualization in the broadcast industry, and surveys prospects for network virtualization of broadcast transmission infrastructure, especially terrestrial and cable networks.

Deep Learning OCR based document processing platform and its application in financial domain (금융 특화 딥러닝 광학문자인식 기반 문서 처리 플랫폼 구축 및 금융권 내 활용)

  • Dongyoung Kim;Doohyung Kim;Myungsung Kwak;Hyunsoo Son;Dongwon Sohn;Mingi Lim;Yeji Shin;Hyeonjung Lee;Chandong Park;Mihyang Kim;Dongwon Choi
    • Journal of Intelligence and Information Systems / v.29 no.1 / pp.143-174 / 2023
  • With the development of deep learning technologies, Artificial Intelligence powered Optical Character Recognition (AI-OCR) has evolved to accurately read multiple languages from various forms of images. For the financial industry, where a large number of diverse documents are processed by manpower, the potential for using AI-OCR is great. In this study, we present the configuration and design of an AI-OCR system for use in the financial industry and discuss the platform construction with application cases. Since the use of financial domain data is prohibited under the Personal Information Protection Act, we developed a deep learning-based data generation approach and used it to train the AI-OCR models. The AI-OCR models are trained for image preprocessing, text recognition, and language processing, and are configured as a microservice-architected platform to process a broad variety of documents. We demonstrated the AI-OCR platform by applying it to the financial domain tasks of document sorting, document verification, and typing assistance. The demonstrations confirm gains in work efficiency and convenience.
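
A minimal sketch of the three-stage pipeline the abstract describes (image preprocessing, text recognition, language processing), arranged so that each stage could run as a separate microservice. The Document type and stage internals are placeholders, not the paper's trained models.

```python
from dataclasses import dataclass

@dataclass
class Document:
    image_bytes: bytes
    text: str = ""
    doc_type: str = ""

def preprocess(doc: Document) -> Document:
    # e.g. deskewing and binarization would happen here; identity in this sketch
    return doc

def recognize_text(doc: Document) -> Document:
    doc.text = "hypothetical OCR output with loan terms"  # a trained model would run here
    return doc

def classify(doc: Document) -> Document:
    # "document sorting": route by recognized content (toy rule)
    doc.doc_type = "loan_application" if "loan" in doc.text else "other"
    return doc

# Each stage maps onto one microservice; here they are chained in-process.
PIPELINE = (preprocess, recognize_text, classify)

def run(doc: Document) -> Document:
    for stage in PIPELINE:
        doc = stage(doc)
    return doc

print(run(Document(image_bytes=b"...")).doc_type)  # -> loan_application
```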

A Study on the Intelligent Document Processing Platform for Document Data Informatization (문서 데이터 정보화를 위한 지능형 문서처리 플랫폼에 관한 연구)

  • Hee-Do Heo;Dong-Koo Kang;Young-Soo Kim;Sam-Hyun Chun
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.24 no.1 / pp.89-95 / 2024
  • Nowadays, the competitiveness of a company depends on the ability of all organizational members to share and utilize the knowledge the organization has accumulated. As if to prove this, the world is now focusing on the ChatGPT service, built with generative AI technology based on LLMs (Large Language Models). However, it is still difficult to apply the ChatGPT service to work because of its many hallucination problems. To solve this, sLLM (lightweight large language model) technology is being proposed as an alternative. Building an sLLM requires corporate data: the organization's ERP data and the office document knowledge preserved by the organization. ERP data can be used by connecting it directly to the sLLM, but office documents are stored in file format and must be converted to data format before they can be connected. In addition, office documents stored in file format face too many technical limitations to serve as organizational knowledge information. This study proposes a method of storing office documents in database format rather than file format, allowing companies to utilize their already accumulated office documents as an organizational knowledge system and to provide office documents in data form to the company's sLLM. By combining this with AI technology, we aim to contribute to improving corporate competitiveness.
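
A minimal sketch of the file-to-database conversion the study proposes: office-document text is split into passages and stored as rows that an sLLM retrieval step could query, instead of remaining locked in file format. The schema, the splitting rule, and the sample document are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE doc_passages (doc_name TEXT, seq INTEGER, passage TEXT)"
)

# Hypothetical office-document text; a real pipeline would extract this
# from .docx/.hwp/.pdf files.
document_text = "Section 1. Travel policy...\n\nSection 2. Approval rules..."
passages = [p.strip() for p in document_text.split("\n\n") if p.strip()]
conn.executemany(
    "INSERT INTO doc_passages VALUES (?, ?, ?)",
    [("travel_policy.docx", i, p) for i, p in enumerate(passages)],
)

# An sLLM front end can now retrieve passages instead of opening files.
for (passage,) in conn.execute(
    "SELECT passage FROM doc_passages WHERE passage LIKE ?", ("%Approval%",)
):
    print(passage)
```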

Twitter Issue Tracking System by Topic Modeling Techniques (토픽 모델링을 이용한 트위터 이슈 트래킹 시스템)

  • Bae, Jung-Hwan;Han, Nam-Gi;Song, Min
    • Journal of Intelligence and Information Systems / v.20 no.2 / pp.109-122 / 2014
  • People nowadays create a tremendous amount of data on Social Network Services (SNS). In particular, the incorporation of SNS into mobile devices has resulted in massive amounts of data generation, thereby greatly influencing society. This is an unmatched phenomenon in history, and we now live in the Age of Big Data. SNS data satisfies the conditions of Big Data: the amount of data (volume), data input and output speeds (velocity), and the variety of data types (variety). If someone intends to discover the trend of an issue in SNS Big Data, this information can be used as an important new source for creating new values, because it covers the whole of society. In this study, a Twitter Issue Tracking System (TITS) is designed and built to meet the needs of analyzing SNS Big Data. TITS extracts issues from Twitter texts and visualizes them on the web. The proposed system provides four functions: (1) provide the topic keyword set that corresponds to the daily ranking; (2) visualize the daily time-series graph of a topic over the duration of a month; (3) present the importance of a topic through a treemap based on a score system and frequency; (4) visualize the daily time-series graph of keywords by keyword search. The present study analyzes the Big Data generated by SNS in real time. SNS Big Data analysis requires various natural language processing techniques, including stop-word removal and noun extraction, to process various unrefined forms of unstructured data. In addition, such analysis requires the latest big data technology to process a large amount of real-time data rapidly, such as the Hadoop distributed system or NoSQL, an alternative to relational databases. We built TITS on Hadoop to optimize big data processing, because Hadoop is designed to scale from single-node computing up to thousands of machines. Furthermore, we use MongoDB, a NoSQL, open-source, document-oriented database that provides high performance, high availability, and automatic scaling. Unlike existing relational databases, MongoDB has no schemas or tables, and its most important goals are data accessibility and data processing performance. In the Age of Big Data, visualization is attractive to the Big Data community because it helps analysts examine data easily and clearly; therefore, TITS uses the d3.js library as its visualization tool. This library is designed for creating Data Driven Documents that bind the document object model (DOM) to data, and it makes managing a real-time data stream with smooth animation easy. In addition, TITS uses Bootstrap, made of pre-configured plug-in style sheets and JavaScript libraries, to build the web system. The TITS Graphical User Interface (GUI) is designed using these libraries and can detect issues on Twitter in an easy and intuitive manner. The proposed work demonstrates the superiority of our issue detection techniques by matching detected issues with corresponding online news articles. The contributions of the present study are threefold. First, we suggest an alternative approach to real-time big data analysis, which has become an extremely important issue. Second, we apply a topic modeling technique used in various research areas, including Library and Information Science (LIS), and confirm the utility of storytelling and time series analysis. Third, we develop a web-based system and make it available for the real-time discovery of topics. The present study conducted experiments with nearly 150 million tweets in Korea during March 2013.
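
A minimal sketch of the daily topic-keyword extraction at the core of TITS, shown here with scikit-learn's LDA on a few invented tweets. This is an illustration under stated assumptions: the real system runs topic modeling over Hadoop with MongoDB storage and d3.js visualization, and the abstract does not specify the exact model configuration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Invented tweets standing in for one day's collection.
tweets = [
    "subway fares rise again this spring",
    "fare increase announced for subway lines",
    "new phone launch draws long lines downtown",
    "crowds line up for the new phone launch",
]

# Stop-word removal and tokenization, then a bag-of-words matrix.
vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(tweets)

# Fit a small topic model and print the top keywords per topic,
# i.e. the "topic keyword set" behind the daily ranking.
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
terms = vectorizer.get_feature_names_out()
for k, weights in enumerate(lda.components_):
    top = [terms[i] for i in weights.argsort()[-3:][::-1]]
    print(f"topic {k} keywords: {top}")
```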

Social Tagging-based Recommendation Platform for Patented Technology Transfer (특허의 기술이전 활성화를 위한 소셜 태깅기반 지적재산권 추천플랫폼)

  • Park, Yoon-Joo
    • Journal of Intelligence and Information Systems / v.21 no.3 / pp.53-77 / 2015
  • Korea has witnessed an increasing number of domestic patent applications, but a majority of them are not utilized to their full potential and end up becoming obsolete. According to the 2012 National Congress' Inspection of Administration, about 73% of patents possessed by universities and publicly funded research institutions failed to create social value and remain latent. One of the main causes is that patent creators, such as individual researchers, universities, or research institutions, lack the ability to commercialize their patents into viable businesses with the enterprises that need them. On the enterprise side, it is likewise hard to find appropriate patents through keyword search alone. This study proposes a patent recommendation system that can identify and recommend intellectual property appropriate to users' fields of interest, among a rapidly accumulating number of patent assets, in an easier and more efficient manner. The proposed system extracts core contents and technology sectors from the existing pool of patents, and combines them with secondary social knowledge derived from tag information created by users, in order to find the best patents to recommend. In an early stage where no tag information has accumulated, the recommendation utilizes content characteristics, identified through an analysis of keywords contained in such patent attributes as 'Title of Invention' and 'Claim'. To do this, the suggested system extracts only nouns from patents and assigns a weight to each noun according to its importance across all patents by performing TF-IDF analysis. It then finds patents whose weights are similar to those of the patents a user prefers; in this paper, this similarity is called the 'Domain Similarity'. Next, the suggested system extracts technology-sector characteristics from patent documents by analyzing the International Patent Classification (IPC) codes. Every patent has one or more IPC codes, and each user can attach one or more tags to the patents they like; thus, each user has a set of IPC codes drawn from their tagged patents. The suggested system uses this IPC set to analyze each user's technology preferences and find well-fitted patents, calculating a 'Technology Similarity' between the user's IPC set and the IPC codes contained in all other patents. When the tag information of multiple users has accumulated, the system expands the recommendations by considering other users' social tag information relating to the patents a user has tagged; the similarity between the tag information of a user's preferred patents and that of other patents is called the 'Social Similarity'. Lastly, a 'Total Similarity' is calculated by adding these three similarities, and the patents with the highest 'Total Similarity' are recommended to each user. The suggested system was applied to a total of 1,638 Korean patents obtained from the Korea Industrial Property Rights Information Service (KIPRIS) run by the Korea Intellectual Property Office. Since this original dataset does not include tag information, we created virtual tag information and used it to construct a semi-virtual dataset. The proposed recommendation algorithm was implemented in Java, and a prototype graphical user interface was also designed for this study. As the proposed system has no dependent variables and uses virtual data, it cannot be verified with a statistical method; the study therefore uses a scenario test to verify the operational feasibility and recommendation effectiveness of the system. The results of this study are expected to improve the possibility of matching promising patents with the best-suited businesses. It is assumed that users' experiential knowledge can be accumulated, managed, and utilized in the as-is patent system, which currently manages only standardized patent information.
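
A minimal sketch of the three-part scoring the abstract describes, with the three similarities summed into a Total Similarity. The abstract does not give the exact formulas, so cosine similarity over TF-IDF weights (Domain), Jaccard over IPC code sets (Technology), and Jaccard over tag sets (Social) are assumptions here, as are the toy patents.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def jaccard(a: set, b: set) -> float:
    # Set-overlap similarity used here for both IPC codes and tags (assumption).
    return len(a & b) / len(a | b) if a | b else 0.0

# Toy patents: claim text, IPC codes, and user-assigned social tags.
patents = {
    "P1": {"text": "battery cell cooling system", "ipc": {"H01M"}, "tags": {"ev"}},
    "P2": {"text": "battery pack thermal control", "ipc": {"H01M", "F28D"}, "tags": {"ev", "cooling"}},
    "P3": {"text": "image sensor pixel readout", "ipc": {"H04N"}, "tags": {"camera"}},
}
preferred = patents["P1"]  # a patent the user has tagged

# Domain Similarity: TF-IDF noun weights compared by cosine similarity.
texts = [p["text"] for p in patents.values()]
tfidf = TfidfVectorizer().fit_transform(texts)
domain = cosine_similarity(tfidf[0], tfidf).ravel()  # P1 vs. all patents

# Total Similarity = Domain + Technology + Social; recommend the top scorers.
for i, (pid, p) in enumerate(patents.items()):
    total = (domain[i]
             + jaccard(preferred["ipc"], p["ipc"])
             + jaccard(preferred["tags"], p["tags"]))
    print(pid, round(total, 3))
```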