• Title/Summary/Keyword: Unstructured data analysis


Crafting a Quality Performance Evaluation Model Leveraging Unstructured Data (비정형데이터를 활용한 건축현장 품질성과 평가 모델 개발)

  • Lee, Kiseok; Song, Taegeun; Yoo, Wi Sung
    • Journal of the Korea Institute of Building Construction / v.24 no.1 / pp.157-168 / 2024
  • The frequent occurrence of structural failures at building construction sites in Korea has underscored the critical role of rigorous oversight in the inspection and management of construction projects. As mandated by prevailing regulations and standards, onsite supervision by designated supervisors encompasses thorough documentation of construction quality, material standards, and the history of any reconstructions, among other factors. These reports, predominantly consisting of unstructured data, constitute approximately 80% of the data amassed at construction sites and serve as a comprehensive repository of quality-related information. This research introduces the SL-QPA model, which employs text mining techniques to preprocess supervision reports and establish a sentiment dictionary, thereby enabling the quantification of quality performance. The study's findings, demonstrating a statistically significant Pearson correlation between the quality performance scores derived from the SL-QPA model and various legally defined indicators, were substantiated through a one-way analysis of variance of the correlation coefficients. The SL-QPA model, as developed in this study, offers a supplementary approach to evaluating the quality performance of building construction projects. It holds the promise of enhancing quality inspection and management practices by harnessing the wealth of unstructured data generated throughout the lifecycle of construction projects.
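The correlation step mentioned in this abstract can be illustrated with a minimal sketch of the Pearson coefficient. The helper name `pearson_r` and the two score lists are invented placeholders for illustration; they are not data or code from the study.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means (unnormalised covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Root sums of squared deviations for each variable.
    sx = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sy = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)

# Placeholder values: text-derived quality scores vs. an official indicator.
model_scores = [0.62, 0.71, 0.55, 0.80, 0.67]
indicator_values = [70.0, 78.0, 64.0, 85.0, 73.0]
r = pearson_r(model_scores, indicator_values)
```

A value of `r` near 1 would indicate that the text-derived scores rise and fall with the indicator; a significance test would still be needed before drawing conclusions.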

A Study on Patent Data Analysis and Competitive Advantage Strategy using TF-IDF and Network Analysis (TF-IDF와 네트워크분석을 이용한 특허 데이터 분석과 경쟁우위 전략수립에 관한 연구)

  • Yun, Seok-Yong; Han, Kyeong-Seok
    • Journal of Digital Contents Society / v.19 no.3 / pp.529-535 / 2018
  • Data volumes are growing explosively, yet many companies still use data analysis only for descriptive or diagnostic purposes rather than for predictive analysis or enterprise technology strategy. In this study, we analyze structured and unstructured patent data, such as IPC codes, inventors, and filing dates, using big data analysis techniques such as network analysis and TF-IDF. Based on this analysis, we propose an analysis process for understanding the core technologies and technology distribution of competitors, and we validate it through data analysis.
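As a rough illustration of the TF-IDF weighting named in this abstract, the sketch below scores terms in a few short texts. It assumes whitespace tokenization, length-normalised term frequency, and idf = log(N / df); the function name `tf_idf` and the sample snippets are hypothetical, and the paper's actual preprocessing and weighting variant are not stated in the abstract.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per document.

    tf  = term count / number of tokens in the document
    idf = log(N / df), where df is the number of documents containing the term
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: count each term once per document.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

# Hypothetical patent-abstract snippets, invented for illustration.
docs = [
    "battery electrode coating method",
    "battery cell cooling structure",
    "image sensor pixel structure",
]
weights = tf_idf(docs)
```

Terms that appear in only one snippet (for example "electrode") receive a higher weight than terms shared across snippets (for example "battery"), which is what makes the measure useful for surfacing distinctive technology keywords.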

Expression and Purification of Unstructured Protein, IMUP-1, using Chaperone Co-expression System for NMR Study

  • Yi, Jong-Jae; Yoo, Jung Ki; Kim, Jin Kyeoung; Son, Woo Sung
    • Journal of the Korean Magnetic Resonance Society / v.17 no.1 / pp.30-39 / 2013
  • Immortalization-upregulated protein-1 (IMUP-1) genes have been cloned and are known to be involved in SV40-mediated immortalization. The IMUP-1 gene is highly expressed in various cancer cell lines and tumors, suggesting that it might be involved in tumorigenicity. Previous attempts to overexpress IMUP-1 in bacterial expression systems encountered several problems, including low solubility and aggregation caused by the protein's unstructured nature. Investigating its structural properties requires large amounts of pure, soluble protein. Accordingly, a co-expression system with the bacterial chaperone proteins GroEL-GroES was used to increase the solubility of IMUP-1. Analysis of the NMR and CD data suggests that the protein adopts typical random-coil properties in solution.

Convergence Analysis on Policy Decision Making Factor of Local Construction Planning Phase by Using Unstructured Data in point of the Technology and Culture (비정형 데이터 분석을 통한 기술과 문화의 융합적 관점의 지역 건설기획단계 정책의사결정 영향요인 분석)

  • Park, Eun Soo; Kim, Ji Eun
    • Korea Science and Art Forum / v.23 / pp.149-162 / 2016
  • This abstract covers the background, method, scope, and main contents of the research. As interest in construction across complex and diverse areas has recently increased, construction has become locally connected to human life, much like the coexistence of technology and culture. To improve local recycling ability, local development should not take the form of fragmentary construction. Local society should be carried forward from a modern cultural perspective through the coexistence of diverse local cultures. Effective decision-making analysis is necessary to build a livable area combined with high-tech industry. For this reason, this paper uses unstructured data analysis to study policy decision making at the planning stage of construction from the viewpoint of the fusion of technology and culture. The conclusions are as follows. The local planning stage of construction carries diverse meanings of tangible and intangible factors as policy factors. Technology factors include various qualitative and quantitative factors in the construction field. Understanding decision making at the planning stage of construction involves not only the visible 'technology factor', such as structure, method, and shape, but also the invisible 'culture factor', such as the spirit of the age, religion, learning, and lifestyle reflected in the process of forming space, as well as intellectual insight into art.

Optimal cluster formation in cluster-based mobile P2P algorithm (클러스터 기반 모바일 P2P 알고리즘의 최적 클러스터 구성)

  • Wu, Hyuk; Lee, Dong-Jun
    • Journal of Advanced Navigation Technology / v.15 no.2 / pp.204-212 / 2011
  • Mobile P2P (peer-to-peer) protocols in MANETs (mobile ad-hoc networks) have gained much attention recently. Existing P2P protocols can be categorized as structured or unstructured. In a MANET, structured P2P protocols generate heavy control traffic because they do not consider the locality of P2P data, while unstructured P2P protocols have a scalability problem with respect to the number of nodes. Hybrid P2P protocols combine the advantages of structured and unstructured P2P protocols, and the cluster-based P2P protocol is one such hybrid. Our study analyzes the cluster-based P2P protocol and derives the optimal cluster formation in a MANET. Under the derived optimal cluster formation, the cluster-based P2P protocol outperforms the Gnutella protocol with respect to control traffic.

Comparison of Neural Network Techniques for Text Data Analysis

  • Kim, Munhee; Kang, Kee-Hoon
    • International Journal of Advanced Culture Technology / v.8 no.2 / pp.231-238 / 2020
  • Generally, sequential data refers to data having continuity. Text data, a representative type of unstructured data, is also sequential in that the meaning of the preceding word must be known in order to understand the meaning of the following word or the context. Many techniques for analyzing sequential data such as text have been proposed. This paper introduces four neural network methods: 1D-CNN, LSTM, BiLSTM, and C-LSTM. Using these methods, IMDb movie review data were classified into two classes to compare the techniques in terms of accuracy and analysis time.

Feature-selection algorithm based on genetic algorithms using unstructured data for attack mail identification (공격 메일 식별을 위한 비정형 데이터를 사용한 유전자 알고리즘 기반의 특징선택 알고리즘)

  • Hong, Sung-Sam; Kim, Dong-Wook; Han, Myung-Mook
    • Journal of Internet Computing and Services / v.20 no.1 / pp.1-10 / 2019
  • Because big-data text mining extracts many features and much data, clustering and classification can suffer from high computational complexity and low reliability of the analysis results. In particular, a term-document matrix obtained through text mining represents term-document features but is sparse. We designed an advanced genetic algorithm (GA) to extract features in text mining for a detection model. Term frequency-inverse document frequency (TF-IDF) is used to reflect the document-term relationships in feature extraction. Through a repetitive process, a predetermined number of features are selected. We also used a sparsity score to improve the performance of the detection model. If a spam mail data set has high sparsity, the detection model performs poorly and an optimal detection model is difficult to find. In addition, we find a low-sparsity model that also has a high TF-IDF score by using s(F), the numerator of the fitness function. We verified the performance of the proposed algorithm by applying it to text classification. As a result, we found that our algorithm shows higher performance (speed and accuracy) in attack mail classification.

Big Data Analysis of the Women Who Score Goal Sports Entertainment Program: Focusing on Text Mining and Semantic Network Analysis.

  • Hyun-Myung, Kim; Kyung-Won, Byun
    • International Journal of Internet, Broadcasting and Communication / v.15 no.1 / pp.222-230 / 2023
  • The purpose of this study is to provide basic data on sports entertainment programs by collecting unstructured data from Naver and Google on the SBS entertainment program 'Women Who Score Goal', which began regular broadcasts in June 2021, and analyzing public perceptions through data mining, semantic matrix, and CONCOR analysis. Data were collected using Textom, yielding 27,911 records accumulated over the 16 months from June 16, 2021 to October 15, 2022. From the collected data, 80 key keywords related to 'Kick a Goal' were derived through simple frequency and TF-IDF analysis. Semantic network analysis was then conducted to examine the relationships among these top 80 keywords. Centrality was derived with UCINET 6.0, and NetDraw, included in UCINET 6.0, was used to understand the characteristics of the network and to visualize the connections between keywords clearly. CONCOR analysis was conducted to derive clusters of words with similar characteristics based on the semantic network. The analysis identified a 'Program' cluster related to the broadcast content of 'Kick a Goal' and a 'Soccer' cluster related to the sport featured in 'Kick a Goal'. Beyond the scenes of the cast's games, it also identified an 'Everyday Life' cluster about training and daily life, and a 'Broadcast Manipulation' cluster about the manipulation of game content that disappointed viewers.

A Content Analysis of the Trends in Vision Research With Focus on Visual Search, Eye Movement, and Eye Track

  • Rhie, Ye Lim; Lim, Ji Hyoun; Yun, Myung Hwan
    • Journal of the Ergonomics Society of Korea / v.33 no.1 / pp.69-76 / 2014
  • Objective: This study aims to present a literature review that gives researchers insight into specific fields of research and highlights the major issues in the research topics. A systematic review is proposed using content analysis of the literature on "visual search", "eye movement", and "eye track". Background: A literature review can be classified as "narrative" or "systematic" depending on how it structures the content of the research. A narrative review is the traditional approach, describing the current state of a field and discussing relevant topics. However, because the literature in a specific area covers a broad range, reviewers inherently give subjective weight to specific issues. In contrast, a systematic review applies an explicit, structured methodology to observe study trends quantitatively. Method: We collected metadata of journal papers using three search keywords: visual search, eye movement, and eye track. The collected information is largely unstructured, consisting of natural-language titles and abstracts, while the keywords of each journal paper are the only structured data. Based on the collected terms, seven categories were evaluated by inductive categorization and by quantitative analysis of the chronological trend of the research area. Results: Unstructured information contains more content in the "stimuli" and "condition" categories than structured information does. Studies on visual search cover a wide range of cognitive areas, whereas studies on eye movement and eye track are closely related to physiological aspects. In addition, experimental studies show an increasing trend compared with theoretical studies. Conclusion: Through systematic review, we could quantitatively identify the characteristics of the research keywords that represent specific research topics. We also found that structured information was more suitable for observing the aim of the research. Chronological analysis of the structured keyword data showed that "physical eye movement" and "cognitive process" were increasingly studied together. Application: While conventional narrative literature reviews depended largely on the authors' instinct, the quantitative approach enabled more objective and macroscopic views. Moreover, the characteristics of each information type were specified by comparing unstructured and structured information. Systematic literature review could also be used to support the authors' instinct in narrative literature reviews.

Personal Sentiment Analysis and Opinion Mining (개인감정분석과 마이닝)

  • Lee, Hyun Chang; Shin, Seong Yoon
    • Proceedings of the Korean Society of Computer Information Conference / 2017.07a / pp.344-345 / 2017
  • Opinion mining and sentiment analysis (OMSA) has emerged as a research discipline over the last 15 years and provides a methodology for computationally processing unstructured data, mainly to extract opinions and identify their sentiments. This relatively new but fast-growing discipline has changed considerably during these years. This paper presents a scientometric analysis of research work done on OMSA during 2007-2016. For the literature analysis, research publications indexed in the Web of Science (WoS) database are used as input data. The publication data are analyzed computationally to identify the year-wise publication pattern, the rate of growth of publications, and research areas.
