• Title/Summary/Keyword: 정형 빅데이터 (structured big data)


A Quality Evaluation Model for Distributed Processing Systems of Big Data (빅데이터 분산처리시스템의 품질평가모델)

  • Choi, Seung-Jun;Park, Jea-Won;Kim, Jong-Bae;Choi, Jae-Hyun
    • Journal of Digital Contents Society / v.15 no.4 / pp.533-545 / 2014
  • As IT technologies evolve, the amount of data we face is increasing exponentially. Distributed processing systems for big data have emerged as the technique for managing and analyzing this vast data. Quality evaluation of existing distributed processing systems has been carried out for structured data environments. If that approach is applied to distributed processing systems for big data, which must focus on the analysis of unstructured data, a precise quality assessment cannot be made. A quality evaluation model for distributed processing systems that considers the big data analysis environment is therefore needed. In this paper, we propose a new quality evaluation model by deriving quality evaluation elements based on ISO/IEC 9126, the international standard on software quality, and defining metrics for validating those elements.
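
As a rough illustration of how per-characteristic metrics might be rolled up into one score, the sketch below averages scores over the six quality characteristics defined by ISO/IEC 9126. The scores, the equal default weights, and the function name `overall_quality` are assumptions made for illustration; the abstract does not state the paper's actual elements, metrics, or weighting.

```python
# The six quality characteristics defined by ISO/IEC 9126.
CHARACTERISTICS = (
    "functionality", "reliability", "usability",
    "efficiency", "maintainability", "portability",
)

def overall_quality(scores, weights=None):
    """Return the weighted mean of per-characteristic scores in [0, 1]."""
    if weights is None:
        # Assumption: equal weights when none are supplied.
        weights = {name: 1.0 for name in CHARACTERISTICS}
    total_weight = sum(weights[name] for name in CHARACTERISTICS)
    weighted = sum(scores[name] * weights[name] for name in CHARACTERISTICS)
    return weighted / total_weight

# Hypothetical metric results for one system under evaluation.
example_scores = {
    "functionality": 0.8, "reliability": 0.6, "usability": 0.7,
    "efficiency": 0.9, "maintainability": 0.5, "portability": 0.7,
}
quality = overall_quality(example_scores)
print(round(quality, 3))
```

A model aimed at unstructured data analysis would presumably weight some characteristics more heavily than others; passing a `weights` dictionary is one way to express that.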

Visualizing Unstructured Data using a Big Data Analytical Tool R Language (빅데이터 분석 도구 R 언어를 이용한 비정형 데이터 시각화)

  • Nam, Soo-Tai;Chen, Jinhui;Shin, Seong-Yoon;Jin, Chan-Yong
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2021.05a / pp.151-154 / 2021
  • Big data analysis is the process of discovering meaningful new correlations, patterns, and trends in large volumes of data held in data stores and creating new value from them. Most big data analysis techniques draw on methods used in existing statistics and computer science, such as data mining, machine learning, natural language processing, and pattern recognition. Using the R language, a big data tool, analysis results can be expressed through various visualization functions applied to pre-processed text data. The data analyzed in this study consisted of 21 papers published in March 2021 in the journals of the Korea Institute of Information and Communication Engineering. In the final analysis results, the most frequently mentioned keyword was "Data", which ranked first with 305 occurrences. Based on the results of the analysis, the limitations of the study and its theoretical implications are discussed.

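
The keyword-frequency step described in the abstract above can be sketched as follows. The study itself used R; Python is used here only for illustration, and the sample texts, the tokenization rule, and the function name `keyword_frequencies` are assumptions rather than the authors' procedure.

```python
import re
from collections import Counter

def keyword_frequencies(documents, min_length=2):
    """Count how often each lowercase word appears across all documents."""
    counts = Counter()
    for text in documents:
        # Pre-processing: lowercase the text and keep alphabetic tokens only.
        tokens = re.findall(r"[a-z]+", text.lower())
        counts.update(t for t in tokens if len(t) >= min_length)
    return counts

# Hypothetical sample texts, not the 21 papers analyzed in the study.
sample_docs = [
    "Big data analysis finds patterns in data.",
    "Data mining and machine learning process data.",
]
top_keywords = keyword_frequencies(sample_docs).most_common(3)
print(top_keywords)
```

The resulting counts are the kind of input a word cloud or bar chart of top keywords would be drawn from.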

Relationship between Big Data and Analysis Prediction (빅데이터와 분석예측의 관계)

  • Kang, Sun-Kyoung;Lee, Hyun-Chang;Shin, Seong-Yoon
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2017.05a / pp.167-168 / 2017
  • In this paper, we discuss the importance of deciding what to analyze and what to predict using big data. How and where to apply the large amounts of data that accumulate in daily life without our being aware of it is a very important question. There are many kinds of tasks that specify what to predict and how to use these data, and finding the most appropriate one is the way to increase prediction probability. In addition, the data that are analyzed and predicted should be useful in real life in order to be meaningful.


A Meta Analysis of the Edible Insects (식용곤충 연구 메타 분석)

  • Yu, Ok-Kyeong;Jin, Chan-Yong;Nam, Soo-Tai;Lee, Hyun-Chang
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2018.10a / pp.182-183 / 2018
  • Big data analysis is the process of discovering meaningful correlations, patterns, and trends in large data sets stored in existing data warehouse management tools and creating new value from them. It also refers to technology that extracts new value from large volumes of structured and unstructured data and analyzes the results. Most big data analysis techniques draw on methods used in existing statistics and computer science, such as data mining, machine learning, natural language processing, and pattern recognition. Global research institutes have identified big data as the most notable new technology since 2011.


A Study on the Use of Stopword Corpus for Cleansing Unstructured Text Data (비정형 텍스트 데이터 정제를 위한 불용어 코퍼스의 활용에 관한 연구)

  • Lee, Won-Jo
    • The Journal of the Convergence on Culture Technology / v.8 no.6 / pp.891-897 / 2022
  • In big data analysis, raw text data mostly exists in various unstructured forms, so it becomes structured data that can be analyzed only after heuristic pre-processing and computer-based post-processing cleansing. In this study, unnecessary elements are removed from the collected raw data during pre-processing so that the wordcloud technique of the R program, one of the text data analysis techniques, can be applied, and stopwords are removed during post-processing. A case study of wordcloud analysis was then conducted, which calculates the frequency of occurrence of words and presents high-frequency words as key issues. To address the problems of the "nested stopword source code" method, the existing stopword processing method used with the R word cloud technique, we propose the use of a "general stopword corpus" and a "user-defined stopword corpus" and conduct a case analysis. The advantages and disadvantages of the proposed "unstructured data cleansing process model" are comparatively verified and presented, along with a practical application of word cloud visualization analysis using the proposed "external corpus cleansing technique".
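
The idea of keeping stopwords in external corpora, rather than nesting them in the analysis source code, can be sketched roughly as below. The paper works in R; this Python version is only an illustration, and the two stopword sets, the sample sentence, and the function name `clean_tokens` are hypothetical.

```python
import re
from collections import Counter

# Hypothetical stand-ins for external corpus files; in practice these
# would be loaded from text files maintained outside the analysis script.
GENERAL_STOPWORDS = {"the", "of", "and", "in", "is", "a", "to"}
USER_STOPWORDS = {"study", "paper"}

def clean_tokens(text, *stopword_sets):
    """Tokenize text and drop any token found in the given stopword sets."""
    stopwords = set().union(*stopword_sets)
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in stopwords]

raw = "The study of big data and the cleansing of text data in a paper"
tokens = clean_tokens(raw, GENERAL_STOPWORDS, USER_STOPWORDS)
word_freq = Counter(tokens)
print(word_freq.most_common(2))
```

Because the corpora are passed in as data, a domain-specific list can be swapped or extended without touching the cleansing code.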

Design of the Medical Bigdata Processing and Management System (의료 빅데이터 처리 및 관리 시스템 설계)

  • Lee, Seung-Jin;Shin, Young-Rok;Park, Jun-Young;Huh, Eui-Nam
    • Proceedings of the Korea Information Processing Society Conference / 2013.05a / pp.431-434 / 2013
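  • Recently, as networks have evolved and data processing technology has advanced, digital data has become widespread, and big data, meaning data at a scale that existing data processing methods can hardly handle, is being produced every day. Such large-scale data is difficult and time-consuming to analyze and manage, but analyzing it can yield a great deal of new and useful information. Because the information obtained through big data analysis is new and differs from what existing analysis methods provide, interest in big data processing is growing in many industries. In line with this trend, the medical field is also attempting to build systems for processing and managing big data efficiently. Specifically, attempts are being made to derive new information by adding unstructured medical data to the information obtained by analyzing existing structured medical data. However, for multiple hospitals to use mutually compatible medical big data processing and management systems, clear requirements and functional definitions for medical big data processing and management are needed. This paper therefore defines the requirements and functions for medical big data processing and management and sets out to construct the architecture of a medical big data processing and management system.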

Analysis of the Influence Factors of Data Loading Performance Using Apache Sqoop (아파치 스쿱을 사용한 하둡의 데이터 적재 성능 영향 요인 분석)

  • Chen, Liu;Ko, Junghyun;Yeo, Jeongmo
    • KIPS Transactions on Software and Data Engineering / v.4 no.2 / pp.77-82 / 2015
  • Big data technology has attracted much attention for its fast data processing. Research on applying big data technology to process large-scale structured data in relational databases (RDB) much faster is also ongoing. Although there are many studies that measure analysis performance, studies on the loading performance of structured data, the step prior to analysis, are very rare. In this study, we therefore test the performance of loading structured data from an RDB into the distributed processing platform Hadoop using Apache Sqoop. To analyze the factors that influence data loading, we run the tests repeatedly with different data loading options and compare loading performance among RDB-based servers. Although the data loading performance of Apache Sqoop in the test environment was low, much better performance can be expected in a large-scale Hadoop cluster environment because more hardware resources are available. This study is expected to serve as a basis for research on improving data loading performance and on analyzing the performance of every step of structured data processing on the Hadoop platform.
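
A repeated loading test of the kind described above could be driven by a small script that varies one Sqoop option per run. The sketch below only builds the `sqoop import` argument lists; the JDBC URL, user, table, target directories, and the choice of `--num-mappers` and `--fetch-size` as the varied options are assumptions, since the abstract does not say which options the authors changed.

```python
def build_sqoop_import(jdbc_url, username, table, target_dir,
                       num_mappers=4, fetch_size=None):
    """Return the argument list for one `sqoop import` run."""
    cmd = [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--username", username,
        "--table", table,
        "--target-dir", target_dir,
        "--num-mappers", str(num_mappers),
    ]
    if fetch_size is not None:
        cmd += ["--fetch-size", str(fetch_size)]
    return cmd

# Hypothetical connection details; one command per mapper count to compare.
runs = [
    build_sqoop_import(
        "jdbc:mysql://db.example.com/sales", "loader", "orders",
        f"/user/loader/orders_m{m}", num_mappers=m,
    )
    for m in (1, 2, 4, 8)
]
for cmd in runs:
    print(" ".join(cmd))
```

Each list could be passed to `subprocess.run` and timed on a machine where Sqoop is installed; credentials are deliberately left out and would need to be supplied separately, for example through Sqoop's `--password-file` option.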

Big Data using Artificial Intelligence CNN on Unstructured Financial Data (비정형 금융 데이터에 관한 인공지능 CNN 활용 빅데이터 연구)

  • Ko, Young-Bong;Park, Dea-Woo
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2022.05a / pp.232-234 / 2022
  • Big data is widely used in customer relationship management, relationship marketing, financial business improvement, credit information, and risk management. Moreover, as non-face-to-face financial transactions have recently become more active because of COVID-19, demand for financial big data in managing customer relationships has grown. In customer relationships, financial big data has reached a point where an emotional rather than a technical approach is required. In relationship marketing, it has become necessary to emphasize the emotional aspect rather than the cognitive and rational aspects. Traditional financial data was collected and used in the form of text-type customer transaction data, corporate financial information, and questionnaires. In this study, the customer's emotional image data, that is, unstructured data based on the customer's cultural and leisure activities, is acquired through SNS, and the customer's activity images are analyzed with an artificial intelligence CNN algorithm. The activity analysis is then applied to an annotated AI, and an AI big data model is designed to analyze the behavior model shown in the annotations.


Application Method of Big-Data for Improvement for Construction Project Management System (빅 데이터 기반 건설사업정보시스템 기능 개선 방안 연구)

  • Kim, Jin-Uk;Kim, Young-Jin;Ok, Hyun;Yang, Sung-Hoon
    • Proceedings of the Korean Society of Computer Information Conference / 2015.07a / pp.301-303 / 2015
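  • For the Construction Project Information System, developed to make domestic construction administration more transparent and competitive, the government and the operating body have improved system performance by carrying out various function improvement plans and related research. However, because the improvements pursued so far have focused on processing public-sector work, services such as content and functions for the general public remain inadequate. This paper therefore proposes a way to provide permit status information using big data, centered on the permit-status-by-road-occupancy-location function of the construction permit system within the Construction Project Information System. The proposed improvement reanalyzes the already accumulated unstructured data on a big data basis and visualizes it on Google Maps, making it possible not only to process public-sector work data but also to provide content as a service to the public. In addition, by pointing to the possibility of reusing the more than 15 TB of construction-related data accumulated so far, it is expected to help increase system utilization and guide the direction of system redesign.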


Design of Streaming based Unstructured-Data Collecting Framework in IoT Environment (IoT 환경에서 스트리밍 기반의 비정형 데이터 수집 프레임워크 설계)

  • Lee, Hoo-Young;Park, Koo-Rack;Kim, Dong-Hyun
    • Proceedings of the Korean Society of Computer Information Conference / 2017.01a / pp.57-58 / 2017
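  • Devices in an Internet of Things (IoT) environment continuously generate data every second, such as system log data, temperature, humidity, illuminance, and location information. Most of this data is discarded inside the device, and even when it is collected it is used only for limited purposes such as system improvement. This paper proposes the design of a framework that transmits the unstructured data generated by each IoT device to a collection server via streaming and loads it into a NoSQL database with a flexible schema structure. The log and sensing data collected from numerous devices in this way can be used, through big data analysis, to improve productivity at industrial sites and, for public purposes, to relieve urban traffic problems and predict disasters.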

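
The collection flow described in the abstract above, in which streamed messages are loaded into a schema-flexible store, might look roughly like the following. The in-memory `DocumentCollection` class is only a stand-in for a NoSQL document database, and the device names, message fields, and function names are hypothetical; the abstract does not name a specific streaming protocol or database.

```python
import json

def device_stream():
    """Yield JSON messages as hypothetical IoT devices might emit them."""
    yield '{"device": "sensor-1", "temperature": 21.5, "humidity": 40}'
    yield '{"device": "gps-7", "lat": 37.57, "lon": 126.98}'
    yield '{"device": "gateway-2", "log": "link up", "level": "INFO"}'

class DocumentCollection:
    """In-memory stand-in for a schema-flexible NoSQL collection."""

    def __init__(self):
        self.documents = []

    def insert(self, document):
        # No fixed schema: each document keeps whatever fields it arrived with.
        self.documents.append(document)

    def find(self, field):
        """Return the documents that contain the given field."""
        return [d for d in self.documents if field in d]

def collect(stream, collection):
    """Parse each streamed message and load it as it arrives."""
    for message in stream:
        collection.insert(json.loads(message))
    return len(collection.documents)

store = DocumentCollection()
loaded = collect(device_stream(), store)
print(loaded, store.find("temperature"))
```

Because documents are stored as they arrive, a new device type with different fields needs no schema change on the collection side, which is the property the abstract attributes to the NoSQL store.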