• Title/Summary/Keyword: Data Analysis

Search Results: 63,486 (processing time: 0.074 seconds)

An Insight Study on Keyword of IoT Utilizing Big Data Analysis (빅데이터 분석을 활용한 사물인터넷 키워드에 관한 조망)

  • Nam, Soo-Tai;Kim, Do-Goan;Jin, Chan-Yong
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2017.10a / pp.146-147 / 2017
  • Big data analysis is a technique for effectively analyzing unstructured data such as the Internet, social network services, web documents generated in the mobile environment, e-mail, and social data, as well as well-formed structured data in databases. The most common big data analysis techniques are data mining, machine learning, natural language processing, and pattern recognition, which were already used in statistics and computer science. Global research institutes have identified big data analysis as the most noteworthy new technology since 2011, and companies in most industries are making efforts to create new value through its application. In this study, we performed an analysis using Social Metrics, a big data analysis tool of Daum Communications, to examine public perceptions of the keyword "Internet of things" over the one-month period ending October 8, 2017. The results of the big data analysis are as follows. First, the top related search keyword for "Internet of things" was found to be "technology" (995). This study suggests theoretical implications based on these results.


Analysis of latent growth model using repeated measures ANOVA in the data from KYPS (청소년패널자료 분석에서의 반복측정분산분석을 활용한 잠재성장모형)

  • Lee, Hwa-Jung;Kang, Suk-Bok
    • Journal of the Korean Data and Information Science Society / v.24 no.6 / pp.1409-1419 / 2013
  • We analyzed data from the KYPS (Korea Youth Panel Survey) using the latent growth model, which has been widely studied as a method for analyzing longitudinal data. In this study, we applied repeated measures ANOVA to the unconditional model in order to determine the unconditional model of the latent growth model more quickly. We also compared six types of models, including the quadratic model and the model to which repeated measures ANOVA was applied.
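As a minimal illustration of the repeated measures ANOVA step used above, a one-way repeated measures F statistic for the time effect can be computed directly (the panel below is invented toy data, not KYPS):

```python
# Minimal one-way repeated measures ANOVA (pure Python).
# Rows are subjects, columns are repeated measurement waves.

def rm_anova(data):
    n = len(data)          # number of subjects
    k = len(data[0])       # number of time points
    grand = sum(sum(row) for row in data) / (n * k)
    time_means = [sum(row[j] for row in data) / n for j in range(k)]
    subj_means = [sum(row) / k for row in data]

    ss_time = n * sum((m - grand) ** 2 for m in time_means)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_error = ss_total - ss_time - ss_subj   # subject x time residual

    df_time, df_error = k - 1, (n - 1) * (k - 1)
    f_stat = (ss_time / df_time) / (ss_error / df_error)
    return f_stat, df_time, df_error

# Toy longitudinal panel: 3 subjects measured at 3 waves.
panel = [[1, 2, 4],
         [2, 3, 3],
         [3, 4, 5]]
f, df1, df2 = rm_anova(panel)
print(f, df1, df2)  # F(2, 4) for the time effect
```

Partitioning out the between-subject variability is what distinguishes this from an ordinary one-way ANOVA on the same table.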

A Study on the Compression and Major Pattern Extraction Method of Origin-Destination Data with Principal Component Analysis (주성분분석을 이용한 기종점 데이터의 압축 및 주요 패턴 도출에 관한 연구)

  • Kim, Jeongyun;Tak, Sehyun;Yoon, Jinwon;Yeo, Hwasoo
    • The Journal of The Korea Institute of Intelligent Transport Systems / v.19 no.4 / pp.81-99 / 2020
  • Origin-destination (OD) data have been collected and utilized for demand analysis and service design in various fields such as public transportation and traffic operation. As the utilization of big data becomes important, there is a growing need to store raw origin-destination data for big data analysis. However, it is not practical to store and analyze the raw data over a long period, since the data size grows with the square of the number of collection points. To overcome this storage limitation and enable long-period pattern analysis, this study proposes a methodology for compressing origin-destination data and analyzing patterns with the compressed data. The proposed methodology is applied to public transit data of Sejong and Seoul. We first measure the reconstruction error and the data size for each truncated matrix. Then, to determine a range of principal components for removing random variation, we measure the level of regularity based on covariance coefficients of the demand data reconstructed with each range of principal components. Based on the distribution of the covariance coefficients, we found the ranges of principal components that cover the regular demand: 1~60 for Sejong and 1~80 for Seoul.
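To see why truncation compresses, note that a rank-k approximation stores k component vectors instead of the full matrix. A back-of-the-envelope sketch (the matrix dimensions below are hypothetical; only the component range 1~60 comes from the abstract):

```python
# Back-of-the-envelope storage cost of a rank-k truncation of an
# origin-destination (OD) matrix.  A full OD matrix holds n*m demand
# values; keeping only k principal components stores k left vectors
# (length n), k right vectors (length m), and k singular values.

def truncated_storage(n, m, k):
    full = n * m
    truncated = k * (n + m + 1)
    return truncated, truncated / full

# Hypothetical example: a 500 x 500 stop-to-stop matrix, keeping the
# first 60 components (the range kept for Sejong in the abstract).
kept, ratio = truncated_storage(500, 500, 60)
print(kept, ratio)  # 60060 stored values, ~24% of the original 250000
```

The saving grows with matrix size: for fixed k the ratio shrinks roughly as 2k/n, which is what makes long-period storage feasible.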

Fast Visualization Technique and Visual Analytics System for Real-time Analyzing Stream Data (실시간 스트림 데이터 분석을 위한 시각화 가속 기술 및 시각적 분석 시스템)

  • Jeong, Seongmin;Yeon, Hanbyul;Jeong, Daekyo;Yoo, Sangbong;Kim, Seokyeon;Jang, Yun
    • Journal of the Korea Computer Graphics Society / v.22 no.4 / pp.21-30 / 2016
  • A risk management system should support decision making within a short time by analyzing stream data in real time. Many analytical systems rely on CPU computation and disk-based databases, but such systems struggle when analyzing stream data in real time. Stream data are produced at various intervals, from 1 ms to 1 hour or 1 day. A single sensor generates little data, but tens of thousands of sensors generate a huge amount, and if hundreds of thousands of sensors generate 1 GB of data per second, a CPU-based system cannot analyze the data in real time. Analyzing stream data therefore requires fast processing speed and scalability. In this paper, we present a fast visualization technique that combines a hybrid database with GPU computation. To evaluate our technique, we demonstrate a visual analytics system that analyzes pipeline leaks using sensor and tweet data.

Visualizing Unstructured Data using a Big Data Analytical Tool R Language (빅데이터 분석 도구 R 언어를 이용한 비정형 데이터 시각화)

  • Nam, Soo-Tai;Chen, Jinhui;Shin, Seong-Yoon;Jin, Chan-Yong
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2021.05a / pp.151-154 / 2021
  • Big data analysis is the process of discovering meaningful new correlations, patterns, and trends in large volumes of data stored in data stores, thereby creating new value. Most big data analysis methods include data mining, machine learning, natural language processing, and pattern recognition, as used in existing statistics and computer science. Using the R language, a big data analysis tool, we can express analysis results through various visualization functions applied to pre-processed text data. The data used in this study were 21 papers published in March 2021 in the journal of the Korea Institute of Information and Communication Engineering. In the final analysis results, the most frequently mentioned keyword was "Data", which ranked first with 305 occurrences. Based on these results, the limitations of the study and theoretical implications are suggested.
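The paper's counting was done in R; an analogous keyword-frequency step can be sketched in Python with the standard library (the titles below are invented stand-ins, not the 21 analyzed papers):

```python
# Keyword frequency over a small corpus of paper titles, mimicking the
# word-frequency step behind the paper's R-based visualization.
import re
from collections import Counter

titles = [  # invented stand-ins for the analyzed papers
    "Big Data Analysis of Sensor Data",
    "Data Visualization for Stream Data",
    "Machine Learning on Network Data",
]

tokens = []
for t in titles:
    tokens += re.findall(r"[a-z]+", t.lower())  # crude tokenization

freq = Counter(tokens)
print(freq.most_common(3))  # "data" dominates, as in the paper
```

A real pipeline would also strip stopwords and stem or lemmatize before counting, which is what the pre-processing mentioned above refers to.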


An Investigation of Intellectual Structure on Data Papers Published in Data Journals in Web of Science (Web of Science 데이터학술지 게재 데이터논문의 지적구조 규명)

  • Chung, EunKyung
    • Journal of the Korean Society for Information Management / v.37 no.1 / pp.153-177 / 2020
  • In the context of open science, data sharing and reuse are becoming important researcher activities. Among the discussions about data sharing and reuse, data journals and data papers show visible results. Data journals are published in many academic fields, and the number of data papers is increasing. Unlike the data itself, data papers cite and receive citations, thus creating their own intellectual structures. This study analyzed 14 data journals indexed in Web of Science, 6,086 data papers, and 84,908 cited references to examine the intellectual structure of data journals and data papers in the academic community. Along with author details, co-citation analysis and bibliographic coupling analysis were visualized as networks to identify detailed subject areas. The results show that the frequent authors, affiliated institutions, and countries differ from those of traditional journal papers, which can be interpreted mainly as a consequence of authors who can readily produce data publishing data papers. In both the co-citation and bibliographic coupling analyses, analytical tools, databases, and genome composition were the main subtopic areas. The co-citation analysis resulted in nine clusters, with specific subject areas being water quality and climate. The bibliographic coupling analysis consisted of a total of 27 components, and detailed subject areas such as ocean and atmosphere were identified in addition to water quality and climate. Notably, subject areas from the social sciences have also emerged.
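The co-citation counting underlying such an analysis can be sketched with the standard library (the citing papers and references below are invented):

```python
# Count co-citations: two references are co-cited whenever they appear
# together in the same citing paper's reference list.  The resulting
# pair counts are the edge weights of a co-citation network.
from collections import Counter
from itertools import combinations

ref_lists = {           # invented citing papers -> cited references
    "paper1": ["refA", "refB", "refC"],
    "paper2": ["refA", "refB"],
    "paper3": ["refB", "refC"],
}

cocitation = Counter()
for refs in ref_lists.values():
    # sort so each unordered pair has one canonical key
    for pair in combinations(sorted(set(refs)), 2):
        cocitation[pair] += 1

print(cocitation.most_common())
```

Bibliographic coupling is the mirror image: instead of counting papers that cite a pair together, one counts shared references between a pair of citing papers.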

A Study on the Job Duties and Competencies of Data Librarians: Using Job Advertisement Analysis in the United States (데이터사서의 직무와 역량에 관한 연구 - 미국 구인광고 분석을 이용하여 -)

  • Park, Jiin;Park, Ji-Hong
    • Journal of the Korean Biblia Society for Library and Information Science / v.32 no.3 / pp.145-162 / 2021
  • To identify the key job duties and core competencies of data librarians, this study conducted content analyses of 75 U.S. job advertisements for data librarians and statistical analyses of 105 responses from incumbent data librarians. The key job duties identified relate to collaboration; workshops, training, and conferences; data services; research consultation; and research support. The core competencies identified are communication skills; teaching; diversity, inclusion, and equality; data management; and data tools. This study is significant in that it analyzed the key duties and core competencies of data librarians using up-to-date data and opinions collected from incumbents, and it can serve as a basis for future studies such as job satisfaction, user satisfaction, and perception surveys for data librarians.

Determinant Whether the Data Fragment in Unallocated Space is Compressed or Not and Decompressing of Compressed Data Fragment (비할당 영역 데이터 파편의 압축 여부 판단과 압축 해제)

  • Park, Bo-Ra;Lee, Sang-Jin
    • Journal of the Korea Institute of Information Security & Cryptology / v.18 no.4 / pp.175-185 / 2008
  • It is meaningful to investigate data in unallocated space because deleted data can be examined there. However, data in unallocated space is fragmented and in most cases cannot be read by an application, especially when it is compressed or encrypted. If a fragment is encrypted and damaged, it is almost impossible to read; if it is compressed and damaged, it is very difficult to read, but it can sometimes be read and interpreted. Therefore, to investigate data in unallocated space, a computer forensic investigator needs a formal procedure for determining whether the data is encrypted or compressed and for decompressing damaged compressed data. In this paper, we suggest such a method of analyzing data in unallocated space from a computer forensics viewpoint.
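As one concrete instance of this workflow, a zlib-format fragment can be recognized from its header and partially inflated even when its tail is missing. This is a minimal sketch of the general idea, not the paper's actual procedure:

```python
# Detect a zlib-compressed fragment and attempt (partial) decompression.
import zlib

def looks_like_zlib(fragment):
    # A zlib header's first two bytes, read big-endian, are divisible
    # by 31, and the low nibble of the first byte is 8 (deflate).
    if len(fragment) < 2:
        return False
    cmf, flg = fragment[0], fragment[1]
    return (cmf & 0x0F) == 8 and ((cmf << 8) | flg) % 31 == 0

def salvage(fragment):
    # decompressobj tolerates a truncated tail: it returns whatever it
    # managed to inflate instead of raising on missing data.
    d = zlib.decompressobj()
    return d.decompress(fragment)

original = b"The quick brown fox jumps over the lazy dog. " * 50
stream = zlib.compress(original)

assert looks_like_zlib(stream)
print(salvage(stream) == original)        # intact stream: full recovery

damaged = stream[: len(stream) // 2]      # simulate a carved fragment
partial = salvage(damaged)                # yields a prefix of the data
print(original.startswith(partial))
```

Real fragments require more care: the fragment may start mid-stream (no header at all), and other formats such as gzip or ZIP have their own magic bytes to test for.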

An Analysis of Data Science Curriculum in Korea (데이터과학 교육과정에 대한 분석적 연구)

  • Lee, Hyewon;Han, Seunghee
    • Journal of the Korean Society for Library and Information Science / v.54 no.1 / pp.365-385 / 2020
  • In this study, in order to analyze the current status of the data science curriculum in Korea as of October 2019, we reviewed prior studies on curricula in the data science field and on the competencies required of data professionals. The study covered 80 curricula and 2,041 courses, analyzed from the following perspectives: 1) the characteristics of the data science domain, 2) the key competencies in data science, and 3) the content of the course titles. As a result, data science programs in Korea have become research-oriented professional curricula based on an academic approach rather than a technical, vocational, and practical view. In addition, it was confirmed that various courses were established with a focus on statistical analysis competency, and that interdisciplinary characteristics based on information technology, statistics, and business administration were reflected in the curriculum.

Analysis of massive data in astronomy (천문학에서의 대용량 자료 분석)

  • Shin, Min-Su
    • The Korean Journal of Applied Statistics / v.29 no.6 / pp.1107-1116 / 2016
  • Recent astronomical survey observations have produced substantial amounts of data and have completely changed conventional methods of analyzing astronomical data. Both classical statistical inference and modern machine learning methods have been used in every step of data analysis, ranging from data calibration to inference of physical models. Machine learning methods are growing in popularity for classical problems of astronomical data analysis due to low-cost data acquisition using cheap large-scale detectors and fast computer networks that enable us to share large volumes of data. It is now common to consider the effects of inhomogeneous spatial and temporal coverage in the analysis of big astronomical data, and the growing size of the data requires parallel distributed computing environments as well as machine learning algorithms. Distributed data analysis systems, however, have not been widely adopted for the general analysis of massive astronomical data. Gathering adequate training data is observationally expensive, and learning data are generally collected from multiple sources in astronomy; therefore, semi-supervised and ensemble machine learning methods will become important for the analysis of big astronomical data.