• Title/Summary/Keyword: Big data storage

Development of Intelligent OCR Technology to Utilize Document Image Data (문서 이미지 데이터 활용을 위한 지능형 OCR 기술 개발)

  • Kim, Sangjun;Yu, Donghui;Hwang, Soyoung;Kim, Minho
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2022.05a / pp.212-215 / 2022
  • In today's era of digital transformation, the need to build and use big data has grown in many fields. Most data today is produced and stored in forms suited to digital devices and media, but for much of the past, data was produced and stored mainly as printed books. Optical Character Recognition (OCR) technology is therefore needed to turn the vast body of printed books accumulated over time into big data. This study proposes a system that digitizes the structure and content of document objects in scanned book images. The proposed system consists of three main steps: 1) recognizing the region of each document object (table, equation, picture, text body) in a scanned book image; 2) applying OCR to each recognized region with the corresponding text body, table, or formula module; and 3) gathering the processed document information and returning it in JSON format. The proposed model builds on an open-source project, with additional training and improvements. The resulting intelligent OCR system performed at the level of commercial OCR software on the four types of document objects (table, equation, image, text body).
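The three steps in this abstract can be sketched as a small pipeline. This is only an illustration of the data flow, not the authors' system: `detect_regions` and `recognize` are hypothetical stand-ins that return fixed values, and the JSON field names (`page`, `objects`, `kind`, `bbox`, `content`) are assumptions.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Region:
    kind: str      # "table", "equation", "picture", or "text"
    bbox: tuple    # (x, y, width, height) in pixels
    content: str   # recognized content, empty for pictures


def detect_regions(page_id):
    """Hypothetical stand-in for step 1 (layout detection); returns fixed regions."""
    return [
        ("text", (40, 60, 520, 300)),
        ("table", (40, 380, 520, 200)),
        ("picture", (40, 600, 260, 180)),
    ]


def recognize(kind, bbox):
    """Hypothetical stand-in for step 2 (per-region OCR); returns a placeholder."""
    if kind == "picture":
        return ""  # pictures are kept as regions only, with no text
    return f"<{kind} content>"


def page_to_json(page_id):
    """Step 3: gather the processed regions and return them as a JSON string."""
    regions = [
        Region(kind, bbox, recognize(kind, bbox))
        for kind, bbox in detect_regions(page_id)
    ]
    document = {"page": page_id, "objects": [asdict(r) for r in regions]}
    return json.dumps(document, ensure_ascii=False, indent=2)
```

Calling `page_to_json("page-001")` returns one JSON document per page, with each detected object carrying its type, bounding box, and recognized content.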

Model Predictive Control for Distributed Storage Facilities and Sewer Network Systems via PSO (분산형 저류시설-하수관망 네트워크 시스템의 입자군집최적화 기반 모델 예측 제어)

  • Baek, Hyunwook;Ryu, Jaena;Kim, Tea-Hyoung;Oh, Jeill
    • Journal of the Korean Institute of Intelligent Systems / v.22 no.6 / pp.722-728 / 2012
  • Urban sewer systems have limited rainwater storage capacity and can discharge untreated sewage, so storage facilities for preventing sewer flooding and reducing urban non-point pollution are drawing considerable attention. The Korea Ministry of Environment has recently introduced, and is promoting the adoption of, the concept of a "multi-functional storage facility", which is crucial not only for preventive stormwater management but also for dealing with combined sewer overflow and sanitary sewer discharge. However, reserving space for a single large-scale storage facility can be difficult, especially in urban areas. Decentralized construction and operation of small- and medium-sized storage facilities has therefore been introduced as an alternative. In this paper, we propose a model predictive control (MPC) scheme for the optimized operation of distributed storage facilities and sewer networks. To this end, we first describe a mathematical model of each component of the network system, which enables us to analyze its detailed dynamic behavior. Second, overflow locations and volumes are predicted from the developed network model using data on the external inflow at specific locations in the network. The MPC scheme, based on particle swarm optimization (PSO), then produces optimized gate settings for sewer network flow control that minimize sewer flooding and maximize the potential storage capacity. Finally, the operational efficacy of the proposed control scheme is demonstrated in a simulation study with a virtual rainstorm event.
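The optimization step inside such an MPC loop can be illustrated with a basic particle swarm optimizer. The sketch below is a textbook PSO, not the paper's controller: the sewer network model is not reproduced, `toy_overflow_cost` is a made-up quadratic standing in for the simulated flooding cost, and the gate openings are assumed to be normalized to the range 0 to 1.

```python
import random


def pso_minimize(cost, dim, n_particles=30, iters=200, seed=0,
                 w=0.7, c1=1.5, c2=1.5):
    """Minimize cost(x) over x in [0, 1]^dim with a basic particle swarm."""
    rng = random.Random(seed)
    pos = [[rng.random() for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                 # best position seen by each particle
    pbest_cost = [cost(p) for p in pos]
    g = min(range(n_particles), key=pbest_cost.__getitem__)
    gbest, gbest_cost = pbest[g][:], pbest_cost[g]   # best position seen by the swarm

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Clamp to the feasible gate range [0, 1].
                pos[i][d] = min(1.0, max(0.0, pos[i][d] + vel[i][d]))
            c = cost(pos[i])
            if c < pbest_cost[i]:
                pbest[i], pbest_cost[i] = pos[i][:], c
                if c < gbest_cost:
                    gbest, gbest_cost = pos[i][:], c
    return gbest, gbest_cost


def toy_overflow_cost(gates):
    """Made-up cost: squared distance from an arbitrary 'ideal' gate setting."""
    target = (0.3, 0.8)
    return sum((g - t) ** 2 for g, t in zip(gates, target))
```

In a receding-horizon controller, `pso_minimize` would be called at every control step with a cost that runs the network model over the prediction horizon, and only the first gate setting of the result would be applied before re-optimizing.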

A Study on the Development Direction of Medical Image Information System Using Big Data and AI (빅데이터와 AI를 활용한 의료영상 정보 시스템 발전 방향에 대한 연구)

  • Yoo, Se Jong;Han, Seong Soo;Jeon, Mi-Hyang;Han, Man Seok
    • KIPS Transactions on Computer and Communication Systems / v.11 no.9 / pp.317-322 / 2022
  • The rapid development of information technology is bringing many changes to the medical environment. In particular, it is driving rapid change in medical image information systems that use big data and artificial intelligence (AI). The prescription delivery system (OCS), which consists of an electronic medical record (EMR) and a picture archiving and communication system (PACS), has rapidly moved the medical environment from analog to digital. Combined with multiple solutions, PACS points to a new direction for advances in security, interoperability, efficiency, and automation. Among these, its combination with AI that uses big data to improve image quality is progressing actively. In particular, AI PACS, a system that assists in reading medical images using deep learning, was developed through cooperation between universities and industry and is being used in hospitals. In line with these rapid changes in medical image information systems, structural changes in the medical market and changes in medical policy to cope with them are also necessary. Medical image information is based on the Digital Imaging and Communications in Medicine (DICOM) format and is divided, according to how it is generated, into tomographic images, which are volume images, and cross-sectional images, which are two-dimensional images. In addition, many medical institutions are now rushing to introduce next-generation integrated medical information systems as they promote smart hospital services. The next-generation integrated medical information system is built as a solution that integrates EMR, electronic consent, big data, AI, precision medicine, and interworking with external institutions. It also aims to support research. Korea's medical image information systems are at a world-class level thanks to advanced IT and government policies. In particular, PACS is the only area of medical information technology being exported worldwide. In this study, along with an analysis of medical image information systems that use big data, we identify current trends based on the historical background of the introduction of medical image information systems in Korea and predict the direction of future development. In the future, based on DICOM big data accumulated over 20 years, we plan to conduct research that can increase the image read rate by using AI and deep learning algorithms.

Analysis of Assembling Tolerance of Optical Components in NFR System (NFR 시스템 헤드의 광 부품 조립 정밀도 분석)

  • 오형렬;권대갑;이준희;윤형길;김진용;김수경;김영식
    • Proceedings of the Korean Society for Noise and Vibration Engineering Conference / 2001.05a / pp.718-721 / 2001
  • Near-field optics is being actively researched as a promising way to achieve higher recording density in optical data storage. However, the tight assembly tolerance in NFR is one of the major barriers to realizing it. In this paper, the tolerances in assembling the optical components of an NFR system are analyzed. Some of the key tolerances can be loosened by optimizing the objective lens design, but one of them becomes too tight as a result of the optimization and must be controlled by other means. One possible method of controlling this tolerance is discussed.

Implementation and Performance Measuring of Erasure Coding of Distributed File System (분산 파일시스템의 소거 코딩 구현 및 성능 비교)

  • Kim, Cheiyol;Kim, Youngchul;Kim, Dongoh;Kim, Hongyeon;Kim, Youngkyun;Seo, Daewha
    • The Journal of Korean Institute of Communications and Information Sciences / v.41 no.11 / pp.1515-1527 / 2016
  • With the growth of big data, machine learning, and cloud computing, storage that can hold large amounts of unstructured data is becoming increasingly important. Distributed file systems built on commodity hardware, such as MAHA-FS, GlusterFS, and the Ceph file system, have therefore received a lot of attention for their scale-out capability and low cost. For data fault tolerance, most of these file systems initially used replication. But as storage grows to tens or hundreds of petabytes, the low space efficiency of replication has come to be seen as a problem. This paper applies an erasure coding fault tolerance policy to MAHA-FS for high space efficiency and introduces the VDelta technique to solve the data consistency problem. We compare the erasure coding performance of two file systems, MAHA-FS and GlusterFS, which have different I/O processing architectures: the former is server-centric and the latter is client-centric. We found that the erasure coding performance of MAHA-FS is better than that of GlusterFS.
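The space-efficiency argument for erasure coding over replication can be made concrete with a small sketch. The example below uses a single XOR parity block, the simplest erasure code, which tolerates the loss of one block; it is not the code used by MAHA-FS or GlusterFS, which the abstract does not specify, and the VDelta technique is not modeled. All function names are illustrative.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def encode(data_blocks):
    """Return the stripe: the k data blocks followed by one XOR parity block."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def recover(stripe, lost_index):
    """Rebuild the block at lost_index by XOR-ing all surviving blocks."""
    survivors = [b for i, b in enumerate(stripe) if i != lost_index]
    return xor_blocks(survivors)


def storage_overhead(k, m):
    """Raw bytes stored per byte of user data for k data and m parity blocks."""
    return (k + m) / k
```

For comparison, three-way replication stores 3 bytes per byte of user data, while `storage_overhead(4, 2)` gives 1.5 for a code with four data blocks and two parity blocks.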

A Selective Compression Strategy for Performance Improvement of Database Compression (데이터베이스 압축 성능 향상을 위한 선택적 압축 전략)

  • Lee, Ki-Hoon
    • KIPS Transactions on Software and Data Engineering / v.4 no.9 / pp.371-376 / 2015
  • The Internet of Things (IoT) significantly increases the amount of data. Database compression is important for big data because it can reduce storage system costs and save I/O bandwidth. However, it can perform poorly on write-intensive workloads such as OLTP because of updates to compressed pages. In this paper, we present practical guidelines for improving the performance of database compression. In particular, we propose the SELECTIVE strategy, which compresses only tables whose space savings are close to the expected space savings calculated from the compressed page size. Experimental results using the TPC-C benchmark and MySQL show that the strategy can achieve 1.1 times better performance than the uncompressed counterpart with 17.3% space savings.
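The selection rule described in this abstract can be sketched as a simple filter. Everything specific here is an assumption rather than the paper's method: the expected saving is taken to be one minus the ratio of compressed page size to uncompressed page size, "close to" is modeled with a hypothetical `tolerance` parameter, and the measured savings in the usage note are invented for illustration.

```python
def expected_saving(page_kb, compressed_page_kb):
    """Assumed definition: fraction of space saved if every page fits its compressed size."""
    return 1.0 - compressed_page_kb / page_kb


def select_tables(measured_savings, page_kb=16, compressed_page_kb=8, tolerance=0.1):
    """Return the tables whose measured saving is within `tolerance` of the expected saving.

    measured_savings maps a table name to its measured space saving (0.0 to 1.0).
    """
    target = expected_saving(page_kb, compressed_page_kb)
    return [
        name
        for name, saving in measured_savings.items()
        if target - saving <= tolerance
    ]
```

With invented measurements such as `{"order_line": 0.48, "stock": 0.45, "history": 0.12}`, only `order_line` and `stock` would be selected for compression, and `history` would stay uncompressed.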

AI Smart Factory Model for Integrated Management of Packaging Container Production Process

  • Kim, Chigon;Park, Deawoo
    • International Journal of Internet, Broadcasting and Communication / v.13 no.3 / pp.148-154 / 2021
  • In this paper, we propose the AI Smart Factory Model for integrated management of production processes. It is an integrated platform system for the production of food packaging containers, consisting of a platform system for the main producer, one or more production partner platform systems, and one or more raw material partner platform systems. Each of the three subsystems consists of an integrated storage server platform that can be expanded without limit, with flexible systems that can add client PCs and main servers according to scale, and provides integrated management of all raw-material and production-related information. The hardware collects production site information in real time using equipment such as PLCs, on-site PCs, barcode printers, and wireless APs at the production site. MES and e-SCM data are stored in a cloud database server to ensure data security and high availability, and are accumulated as big data. The model was built on a project focused on the dissemination and diffusion of smart factory construction, advancement, and easy maintenance systems, promoted by the Ministry of SMEs and Startups to enhance the competitiveness of the manufacturing sites of small and medium-sized enterprises (SMEs), and we plan to propose the model to state funding projects for SMEs.

Analysis of Current Situation of University Student Loans Based on Bigdata (빅데이터 기반 대학생 학자금 대출 현황 분석)

  • Kim, Jeong-Joon;Jang, Sung-Jun;Lee, Yong-Soo
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.19 no.5 / pp.229-238 / 2019
  • Before the scholarship loan system was implemented at the Korea Scholarship Foundation, the government's role was strengthened by the direct lending of student funds to banks and other financial institutions. However, the low repayment performance of student loans has raised concerns over the future of student loans and the government's financial burden. Moreover, since student loans are repaid even after graduating from college to support low-income families, it is highly unlikely that the repayment rate of student loans will improve unless the employment rate and income level of borrowers improve. In this paper, student loan repayment amounts are presented as a final visualization graph produced through the collection, storage, processing, and analysis phases of a big data-based system. This could serve as a basis for visually checking the amount of student loans and devising ways to reduce the burden on the current student loan system.

A proposal on a proactive crawling approach with analysis of state-of-the-art web crawling algorithms (최신 웹 크롤링 알고리즘 분석 및 선제적인 크롤링 기법 제안)

  • Na, Chul-Won;On, Byung-Won
    • Journal of Internet Computing and Services / v.20 no.3 / pp.43-59 / 2019
  • Today, with the spread of smartphones and the development of social networking services, the volume of stored structured and unstructured big data has grown exponentially. If we analyze it well, we can obtain useful information for predicting the future. To analyze big data, large amounts of data must first be collected, and the web is the repository where most of this data is stored. However, because the volume is so large, there is as much data carrying unneeded information as data carrying useful information. This has made it important to collect data efficiently, filtering out data with unnecessary information and collecting only data with useful information. Web crawlers cannot download all pages because of constraints such as network bandwidth, operational time, and data storage. We should therefore avoid visiting many pages that are irrelevant to what we want and download only important pages as soon as possible. This paper seeks to help resolve these issues. First, we introduce basic web crawling algorithms. For each algorithm, we describe, compare, and analyze its time complexity and its pros and cons. Next, we introduce state-of-the-art web crawling algorithms that address the shortcomings of the basic algorithms. In addition, recent research trends show that web crawling algorithms with special purposes, such as collecting sentiment words, are being actively studied. As a study of special-purpose web crawling algorithms, we introduce a sentiment-aware web crawling technique, which is a proactive web crawling technique. The results showed that the larger the data, the higher the performance and the more space is saved.
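The idea of downloading only important pages first, within a limited budget, can be sketched as a best-first crawl over a priority queue. This is a generic illustration, not one of the algorithms analyzed or proposed in the paper: the link graph is an in-memory dictionary instead of live HTTP fetches, and the `score` function (for example, a relevance or sentiment estimate) is supplied by the caller.

```python
import heapq


def best_first_crawl(graph, score, seeds, budget):
    """Visit at most `budget` pages, always taking the highest-scoring known page next.

    graph:  dict mapping a page to the list of pages it links to
    score:  callable returning a priority for a page (higher is visited sooner)
    seeds:  pages to start from
    """
    # heapq is a min-heap, so scores are negated to pop the highest score first.
    frontier = [(-score(page), page) for page in seeds]
    heapq.heapify(frontier)
    seen = set(seeds)
    visited = []

    while frontier and len(visited) < budget:
        _, page = heapq.heappop(frontier)
        visited.append(page)
        for linked in graph.get(page, []):
            if linked not in seen:
                seen.add(linked)
                heapq.heappush(frontier, (-score(linked), linked))
    return visited
```

The `budget` argument stands in for the bandwidth, time, and storage constraints mentioned in the abstract: low-scoring pages remain in the frontier and are never fetched once the budget is spent.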

The study of Defense Artificial Intelligence and Block-chain Convergence (국방분야 인공지능과 블록체인 융합방안 연구)

  • Kim, Seyong;Kwon, Hyukjin;Choi, Minwoo
    • Journal of Internet Computing and Services / v.21 no.2 / pp.81-90 / 2020
  • The purpose of this study is to examine how block-chain technology can be applied to prevent data forgery and alteration in defense-sector AI (artificial intelligence). AI is a technology that makes predictions from big data by clustering or classifying it with various machine learning methodologies, and military powers including the U.S. have reached the final stage of developing it. If the data behind data-driven AI is forged or altered, it could become the greatest risk factor even if the data processing itself is perfect, and such falsification can be carried out all too easily through hacking. Unexpected attacks could occur if data used by weaponized AI is hacked and manipulated by North Korea. Therefore, a technology that prevents data from being falsified and altered is essential for the use of AI. Block-chain is expected to solve the data forgery problem: because encrypted data is stored in a distributed manner using a hash function, the data cannot be damaged even if a single computer is hacked, unless more than half of the connected computers agree.
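The tamper-evidence property that this abstract relies on can be illustrated with a minimal hash chain. This sketch covers only the hash linking of records on a single machine; the distributed storage and majority agreement among nodes described in the abstract are not modeled, and the block layout and function names are assumptions for illustration.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first block


def block_hash(index, data, prev_hash):
    """SHA-256 over the block's index, payload, and the previous block's hash."""
    payload = json.dumps([index, data, prev_hash]).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def build_chain(records):
    """Link each record to its predecessor through the predecessor's hash."""
    chain, prev = [], GENESIS_HASH
    for index, data in enumerate(records):
        digest = block_hash(index, data, prev)
        chain.append({"index": index, "data": data,
                      "prev_hash": prev, "hash": digest})
        prev = digest
    return chain


def is_valid(chain):
    """Recompute every hash; any altered record breaks the chain from that point on."""
    prev = GENESIS_HASH
    for block in chain:
        recomputed = block_hash(block["index"], block["data"], block["prev_hash"])
        if block["prev_hash"] != prev or block["hash"] != recomputed:
            return False
        prev = block["hash"]
    return True
```

Changing the `data` field of any block after the chain is built makes `is_valid` return `False`, because the stored hash no longer matches the recomputed one.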