• Title/Summary/Keyword: 비정형데이터 (unstructured data)

Search Result 583, Processing Time 0.024 seconds

Proposal of Standardization Plan for Defense Unstructured Datasets based on Unstructured Dataset Standard Format (비정형 데이터셋 표준포맷 기반 국방 비정형 데이터셋 표준화 방안 제안)

  • Yun-Young Hwang;Jiseong Son
    • Journal of Internet Computing and Services / v.25 no.1 / pp.189-198 / 2024
  • AI is regarded, in the defense sector as well as the private sector, as a cutting-edge technology that must be adopted to advance national defense. Artificial intelligence has been selected as a key task in defense science and technology innovation, and the importance of data is growing accordingly. As the defense department shifts from a closed data policy toward data sharing and utilization, efforts are under way to secure the high-quality data needed to advance national defense. In particular, the budgeting system for data acquisition projects is under review so that the related procedures reflect the characteristics of AI and big data and so that research and development can start with data of sufficient quantity and quality. Standardization and quality standards are needed for both structured and unstructured data at the national defense level, yet the defense department has so far proposed them only for structured data, so this gap needs to be filled. In this paper, we propose a standard format for defense unstructured datasets, the type of data most needed for defense artificial intelligence, and, based on that format, a standardization method for defense unstructured datasets.

Unstructured Data Analysis using Equipment Check Ledger: A Case Study in Telecom Domain (장비점검 일지의 비정형 데이터분석을 통한 고장 대응 효율화 사례 연구)

  • Ju, Yeonjin;Kim, Yoosin;Jeong, Seung Ryul
    • Journal of Internet Computing and Services / v.21 no.1 / pp.127-135 / 2020
  • As the use and analysis of big data grow in importance, interest is increasing in natural language processing techniques for unstructured data such as news articles and comments. In particular, as collecting big data has become feasible, data mining techniques for preprocessing and analyzing such data are emerging. In this case study with a telecom company, we propose a methodology for structuring unstructured data using text mining. The domain is equipment failure, and the data consist of about 2.2 million equipment check ledger records; about 800,000 equipment failure records accumulate in the ledger each year. The equipment check ledger contains both structured and unstructured data. Structured data can readily be used for analysis, but unstructured data cannot be analyzed directly. However, unstructured data is likely to hold important information that is not recorded in the structured fields. Therefore, this study develops a digital transformation method for the unstructured data in the equipment check ledger.

Fat Client-Based Abstraction Model of Unstructured Data for Context-Aware Service in Edge Computing Environment (에지 컴퓨팅 환경에서의 상황인지 서비스를 위한 팻 클라이언트 기반 비정형 데이터 추상화 방법)

  • Kim, Do Hyung;Mun, Jong Hyeok;Park, Yoo Sang;Choi, Jong Sun;Choi, Jae Young
    • KIPS Transactions on Computer and Communication Systems / v.10 no.3 / pp.59-70 / 2021
  • With recent advances in the Internet of Things, context-aware systems that provide customized services have become important. Existing context-aware systems analyze data generated around the user and abstract context information that expresses the state of a situation. However, these datasets are mostly unstructured and are difficult to process with simple approaches, so the datasets used for context-aware services need to be handled in a simplified way. Deep learning applications are one example of workloads that consume such unstructured datasets. Their processes are tightly coupled in how they abstract the dataset from the acquisition phase through the analysis phase, which makes them less flexible when the target analysis model or application is modified for functional scalability. Therefore, an abstraction model that separates the phases and processes the unstructured dataset for analysis is proposed. The proposed abstraction uses a description called the Analysis Model Description Language (AMDL) to deploy the analysis phases to fat clients, where a fat client is an instance specifically designed for resource-oriented tasks in edge computing environments, and it shows how different analysis applications and their factors are handled using the AMDL and fat client profiles. The experiment demonstrates functional scalability through example AMDL and fat client profiles targeting a vehicle image recognition model for a vehicle access control notification service, and it monitors each process in the collection, preprocessing, and analysis of unstructured data.

A Study of improving reliability on prediction model by analyzing method Big data (빅데이터 분석방법을 이용한 예측모형의 신뢰도 향상에 관한 연구)

  • Song, Min-Gu;Kim, Sun-Bae
    • Journal of Digital Convergence / v.11 no.6 / pp.103-112 / 2013
  • Prediction models have traditionally been built from structured data stored in databases. However, with the advent of the "smart" era brought about by groundbreaking developments in communication systems, unstructured data has come to dominate, accounting for some 80% of all data. Consequently, the conventional approach of building prediction models from structured data alone is no longer reliable. In other words, making a prediction model credible requires including unstructured data (SNS, image, video) and semi-structured data (log data). In this study, we increase the credibility of a prediction model by adopting a big data method and by comparing the reliability of the conventional measurement against real data.

Mathematical Algorithms for the Automatic Generation of Production Data of Free-Form Concrete Panels (비정형 콘크리트 패널의 생산데이터 자동생성을 위한 수학적 알고리즘)

  • Kim, Doyeong;Kim, Sunkuk;Son, Seunghyun
    • Journal of the Korea Institute of Building Construction / v.22 no.6 / pp.565-575 / 2022
  • Thanks to the latest developments in digital architectural technologies, free-form designs that maximize the creativity of architects have rapidly increased. However, there are a lot of difficulties in forming various free-form curved surfaces. In panelizing to produce free forms, the methods of mesh, developable surface, tessellation and subdivision are applied. The process of applying such panelizing methods when producing free-form panels is complex, time-consuming and requires a vast amount of manpower when extracting production data. Therefore, algorithms are needed to quickly and systematically extract production data that are needed for panel production after a free-form building is designed. In this respect, the purpose of this study is to propose mathematical algorithms for the automatic generation of production data of free-form panels in consideration of the building model, performance of production equipment and pattern information. To accomplish this, mathematical algorithms were suggested upon panelizing, and production data for a CNC machine were extracted by mapping as free-form curved surfaces. The study's findings may contribute to improved productivity and reduced cost by realizing the automatic generation of data for production of free-form concrete panels.

A Design and Implementation for processing Query Links in Virtual Documents (가상문서에서 질의 링크 처리를 위한 설계 및 구현)

  • 강민구;김철수;강지훈
    • Proceedings of the Korean Information Science Society Conference / 2001.10a / pp.169-171 / 2001
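  • Virtual documents based on XML make it possible to share information on the Internet and thereby to create new knowledge. Virtual documents provide information by connecting unstructured data (text, images, multimedia data) and semi-structured data (HTML, XML) through links. Because conventional web documents such as HTML provide information from structured data (databases) through scripts or CGI, virtual documents can naturally link to and use structured data as well. In this paper, to support structured data in a digital library system, we design and implement a database that efficiently processes virtual documents containing query links, and we manage database schema information so that the required database can be looked up to help generate query links.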

Cost Performance Evaluation Framework through Analysis of Unstructured Construction Supervision Documents using Binomial Logistic Regression (비정형 공사감리문서 정보와 이항 로지스틱 회귀분석을 이용한 건축 현장 비용성과 평가 프레임워크 개발)

  • Kim, Chang-Won;Song, Taegeun;Lee, Kiseok;Yoo, Wi Sung
    • Journal of the Korea Institute of Building Construction / v.24 no.1 / pp.121-131 / 2024
  • This research explores the potential of leveraging unstructured data from construction supervision documents, which contain detailed inspection insights from independent third-party monitors of building construction processes. With the evolution of analytical methodologies, such unstructured data has been recognized as a valuable source of information, offering diverse insights. The study introduces a framework designed to assess cost performance by applying advanced analytical methods to the unstructured data found in final construction supervision reports. Specifically, key phrases were identified using text mining and social network analysis techniques, and these phrases were then analyzed through binomial logistic regression to assess cost performance. The study found that predictions of cost performance based on unstructured data from supervision documents achieved an accuracy rate of approximately 73%. The findings of this research are anticipated to serve as a foundational resource for analyzing various forms of unstructured data generated within the construction sector in future projects.
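
The step of relating key phrases to cost performance can be illustrated with a minimal sketch of binomial logistic regression. The phrase list, report texts, and labels below are hypothetical toy values invented for illustration, and the plain gradient-descent fit is an assumption; it is not the authors' data, feature set, or fitting procedure.

```python
import math

def sigmoid(z):
    """Logistic function mapping a real score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_proba(w, b, x):
    """Probability of the positive class (here: a cost overrun)."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Hypothetical key phrases and toy report snippets (not from the paper).
PHRASES = ["design change", "schedule delay"]
reports = [
    "design change requested and schedule delay reported",
    "design change approved for the facade",
    "schedule delay due to weather",
    "inspection completed without issues",
]
overrun = [1, 1, 0, 0]  # hypothetical labels: 1 = cost overrun

# Binary features: does the report mention each key phrase?
X = [[1 if p in text else 0 for p in PHRASES] for text in reports]
w, b = fit_logistic(X, overrun)
```

Calling `predict_proba(w, b, [1, 0])` then gives the fitted probability of an overrun for a report that mentions only the first phrase. In practice the features would come from the text mining and network analysis step rather than a fixed phrase list.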

Design of Streaming based Unstructured-Data Collecting Framework in IoT Environment (IoT 환경에서 스트리밍 기반의 비정형 데이터 수집 프레임워크 설계)

  • Lee, Hoo-Young;Park, Koo-Rack;Kim, Dong-Hyun
    • Proceedings of the Korean Society of Computer Information Conference / 2017.01a / pp.57-58 / 2017
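  • Devices in an Internet of Things environment continuously generate data every second, such as system log data, temperature, humidity, illuminance, and location information. Most of this data is discarded inside the device, and even when it is collected it is used only for limited purposes such as system improvement. In this paper, we propose the design of a framework that transmits the unstructured data generated by each IoT device to a collection server by streaming and loads it into a NoSQL database with a flexible schema structure. Log and sensing data collected from many devices in this way can be used, through big data analysis, to improve productivity at industrial sites and, for public purposes, to help relieve urban traffic problems and predict disasters.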
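
The idea of streaming heterogeneous device records into a schema-flexible store can be sketched as follows. This is a minimal illustration under stated assumptions: the `DocumentStore` class is a hypothetical in-memory stand-in for a NoSQL collection, and the device names and record fields are invented; the abstract does not specify the framework's actual components.

```python
import json
from collections import defaultdict

def device_stream(lines):
    """Yield parsed records one at a time, skipping blank or malformed lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            yield json.loads(line)
        except json.JSONDecodeError:
            continue  # drop records that are not valid JSON

class DocumentStore:
    """Hypothetical in-memory stand-in for a schemaless NoSQL database."""

    def __init__(self):
        self.collections = defaultdict(list)

    def insert(self, collection, document):
        # No schema check: documents with different fields can coexist.
        self.collections[collection].append(document)

def collect(lines, store):
    """Load streamed records into the store, grouped by device name."""
    count = 0
    for record in device_stream(lines):
        store.insert(record.get("device", "unknown"), record)
        count += 1
    return count

# Invented sample records with differing fields per device.
raw = [
    '{"device": "sensor-1", "temperature": 21.5, "humidity": 40}',
    '{"device": "gateway-7", "log": "link up", "level": "INFO"}',
    'not json',
    '{"device": "sensor-1", "lux": 300}',
]
store = DocumentStore()
stored = collect(raw, store)
```

Because records are consumed one at a time from a generator, the same `collect` function would work on a socket or file iterator without loading the whole stream into memory.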

Design and Implementation of Input and Output System for Unstructured Big Data (비정형 대용량 데이터 입력 및 출력 시스템 설계 및 구현)

  • Kim, Chang-Su;Shim, Kyu-Chul;Kang, Byoung-Jun;Kim, Kyung-Hwan;Jung, Hoe-Kyung
    • Journal of the Korea Institute of Information and Communication Engineering / v.18 no.2 / pp.387-393 / 2014
  • In recent years, as computers have become more widespread, efficient processing of unstructured big data has become necessary. In this paper, we propose a system that converts the data entered in office files (HWP, MS-Office) to XML and, once the user has created an XML mapping file, quickly extracts the data typed in the word processor. In addition, we propose a system that looks up the necessary data from a database using a form entered in advance and converts word processor documents to office files through an application program. This makes unstructured big data available for use.

De-identifying Unstructured Medical Text and Attribute-based Utility Measurement (의료 비정형 텍스트 비식별화 및 속성기반 유용도 측정 기법)

  • Ro, Gun;Chun, Jonghoon
    • The Journal of Society for e-Business Studies / v.24 no.1 / pp.121-137 / 2019
  • De-identification removes personal information from a dataset so that the remaining information cannot be linked to a specific individual. It thereby lowers the risk of exposing personal information during the collection, processing, storage, and distribution of information. Although de-identification algorithms, protection models, and related topics have been studied extensively, most of this work is limited to structured data, and the de-identification of unstructured data has received relatively little attention. In the medical field in particular, where unstructured text is used frequently, many practitioners simply remove all personally identifiable information to lower the exposure risk, accepting that data utility drops as a result. This study proposes a new method that de-identifies unstructured medical text by applying the k-anonymity protection model; de-identification is mandatory in this field because privacy protection is more critical than in other fields. The study also proposes a new utility metric so that people can intuitively understand the utility of a de-identified dataset. If the results of this research are applied to the various industries that use unstructured text, we expect the utility of unstructured text containing personal information to increase.
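
The k-anonymity model mentioned above requires that every combination of quasi-identifier values be shared by at least k records. The sketch below checks that property and shows how generalizing a value raises k. It assumes the attributes have already been extracted from the text into records; the field names, the sample values, and the age-banding rule are hypothetical and are not taken from the paper.

```python
from collections import Counter

def k_anonymity_level(records, quasi_identifiers):
    """Return the size of the smallest group sharing the same quasi-identifier values."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values()) if groups else 0

def generalize_age(age, width=10):
    """Replace an exact age with a band such as '30-39' (hypothetical rule)."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"

# Invented attributes, assumed to be already extracted from clinical notes.
records = [
    {"age": 34, "department": "cardiology"},
    {"age": 37, "department": "cardiology"},
    {"age": 52, "department": "oncology"},
    {"age": 58, "department": "oncology"},
]
QUASI_IDENTIFIERS = ["age", "department"]

# Exact ages make every record unique, so the data is only 1-anonymous.
k_raw = k_anonymity_level(records, QUASI_IDENTIFIERS)

# Banding the ages merges records into groups of two, giving 2-anonymity.
generalized = [{**r, "age": generalize_age(r["age"])} for r in records]
k_generalized = k_anonymity_level(generalized, QUASI_IDENTIFIERS)
```

The trade-off the abstract describes is visible here: coarser bands raise k but discard detail, which is why a utility metric is needed alongside the protection model.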