• Title/Abstract/Keywords: Data Generation

6,489 search results

Enhanced Regular Expression as a DGL for Generation of Synthetic Big Data

  • Kai, Cheng;Keisuke, Abe
    • Journal of Information Processing Systems / Vol. 19, No. 1 / pp. 1-16 / 2023
  • Synthetic data generation is generally used for performance evaluation and functional testing of data-intensive applications, as well as in various areas of data analytics, such as privacy-preserving data publishing (PPDP) and statistical disclosure limitation/control. A significant amount of research has been conducted on tools and languages for data generation. However, existing tools and languages were developed for specific purposes and are unsuitable for other domains. In this article, we propose a regular expression-based data generation language (DGL) for flexible big data generation. To achieve a general-purpose and powerful DGL, we enhanced standard regular expressions to support data domains, type/format inference, sequence and random generation, probability distributions, and resource references. To implement the proposed language efficiently, we propose caching techniques for both intermediate results and database queries. The proposed improvements were evaluated experimentally.
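
As a rough illustration of the core idea only (regex-driven generation, without the paper's domain, type-inference, sequence, distribution, or resource-reference extensions and without its caching), the Python sketch below generates random strings that match a tiny regular-expression subset: literals, `\d`, bracketed classes with ranges, and `{n}` / `{m,n}` quantifiers. The functions `parse_class`, `tokenize`, and `generate` are hypothetical names, not part of the proposed DGL.

```python
import random
import string
from typing import List, Tuple

# A token is (candidate characters, min repeats, max repeats).
Token = Tuple[List[str], int, int]


def parse_class(body: str) -> List[str]:
    """Expand a character-class body such as 'A-Z0-9_' into its characters."""
    chars: List[str] = []
    i = 0
    while i < len(body):
        if i + 2 < len(body) and body[i + 1] == "-":
            chars.extend(chr(c) for c in range(ord(body[i]), ord(body[i + 2]) + 1))
            i += 3
        else:
            chars.append(body[i])
            i += 1
    return chars


def tokenize(pattern: str) -> List[Token]:
    """Split a tiny regex subset (literals, \\d, [...], {n}, {m,n}) into tokens."""
    tokens: List[Token] = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("\\d", i):
            choices, i = list(string.digits), i + 2
        elif pattern[i] == "[":
            end = pattern.index("]", i)
            choices, i = parse_class(pattern[i + 1:end]), end + 1
        else:
            choices, i = [pattern[i]], i + 1
        lo = hi = 1
        if i < len(pattern) and pattern[i] == "{":
            end = pattern.index("}", i)
            bounds = pattern[i + 1:end].split(",")
            lo, hi = int(bounds[0]), int(bounds[-1])
            i = end + 1
        tokens.append((choices, lo, hi))
    return tokens


def generate(pattern: str, rng: random.Random) -> str:
    """Produce one random string matched by `pattern`."""
    out: List[str] = []
    for choices, lo, hi in tokenize(pattern):
        for _ in range(rng.randint(lo, hi)):  # pick a repeat count, then characters
            out.append(rng.choice(choices))
    return "".join(out)


if __name__ == "__main__":
    rng = random.Random(42)
    for _ in range(3):
        print(generate(r"[A-Z]{2}-\d{3,5}", rng))  # e.g. product-code-like strings
```

Seeding `random.Random` makes each generated data set reproducible, which matters when synthetic data feeds repeatable performance tests.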

Probability-Based Context-Generation Model with Situation Propagation Network

  • 천성표;김성신
    • The Journal of Korea Robotics Society / Vol. 4, No. 1 / pp. 56-61 / 2009
  • Probability-based data generation is a typical context-generation method: it is simple and robust, and its generation conditions are easy to update. However, the probability-based context-generation method has been found to suffer from inherent ambiguity and conflict problems in the generated context data. To compensate for these drawbacks of probabilistic random data generation, this paper proposes a situation propagation network. The situation propagation network is designed to update the parameters of the probability functions included in the probability-based data generation model. The proposed probability-based context-generation model generates two kinds of contexts: independent contexts and conditional contexts. The results of the proposed model are compared with those of the probability-based model in terms of performance and the reduction of ambiguity and conflict.
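
The abstract does not describe the internals of the situation propagation network. As a hedged stand-in, the sketch below only illustrates the distinction between independent and conditional contexts, using plain ancestral sampling over hand-written conditional probability tables. The context names, values, and probabilities are hypothetical.

```python
import random
from typing import Dict, Tuple

# Hypothetical model: one independent context, one conditional context.
INDEPENDENT: Dict[str, Dict[str, float]] = {
    "time": {"night": 0.4, "day": 0.6},
}
# child context -> (parent context, P(child value | parent value))
CONDITIONAL: Dict[str, Tuple[str, Dict[str, Dict[str, float]]]] = {
    "activity": ("time", {
        "night": {"sleeping": 0.8, "reading": 0.2},
        "day": {"jogging": 0.5, "working": 0.5},
    }),
}


def sample(dist: Dict[str, float], rng: random.Random) -> str:
    """Draw one value from a discrete distribution given as {value: probability}."""
    values = list(dist)
    return rng.choices(values, weights=[dist[v] for v in values], k=1)[0]


def generate_context(rng: random.Random) -> Dict[str, str]:
    """Sample every independent context, then each conditional context given its
    parent (parents here are all independent, so they are already sampled)."""
    ctx = {name: sample(dist, rng) for name, dist in INDEPENDENT.items()}
    for name, (parent, table) in CONDITIONAL.items():
        ctx[name] = sample(table[ctx[parent]], rng)
    return ctx


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(3):
        print(generate_context(rng))
```

Because "jogging" has no probability mass under "night", that conflicting combination can never be generated, whereas sampling `activity` independently of `time` could produce it. This is the kind of conflict that conditional contexts are meant to avoid.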

Sociodemographic and Health Related Factors Influencing Problem Drinking of the Echo Generation Using Data of the 2018 Korean National Health and Nutrition Examination Survey

  • Kwak, Minyeong
    • International Journal of Contents / Vol. 17, No. 1 / pp. 54-60 / 2021
  • The aim of this study was to identify factors influencing problem drinking among the Echo Generation in South Korea and to provide basic data for early intervention in and mediation of problem drinking among the Echo Generation. This descriptive study performed a secondary analysis of raw data from the 2018 Korean National Health and Nutrition Examination Survey, using responses to the problem-drinking items from 999 Echo Generation participants born between 1979 and 1992. The study comprehensively investigated sociodemographic and health-related factors influencing problem drinking among the Echo Generation. The SPSS WIN program (version 26.0) was used for data analysis. Gender (β=-.32, p<.001), education level (β=.10, p=.002), white-collar workers out of job (β=-.09, p=.041), and depression (β=.11, p<.001) were identified as factors influencing problem drinking among the Echo Generation. The results suggest that tailored prevention education and intervention programs are needed to prevent problem drinking among the Echo Generation.

Hadoop Based Wavelet Histogram for Big Data in Cloud

  • Kim, Jeong-Joon
    • Journal of Information Processing Systems / Vol. 13, No. 4 / pp. 668-676 / 2017
  • Recently, the importance of big data has been emphasized with the development of smartphones, the web, and SNS. As a result, MapReduce, which can process big data efficiently, is receiving worldwide attention because of its excellent scalability and stability. Because big data is large in volume, generated quickly, and diverse in its properties, it is more efficient to process summary information of big data than the big data itself. The wavelet histogram, a typical technique for generating data summary information, can generate optimal summary information that does not cause loss of information from the original data. Accordingly, systems that apply wavelet histogram generation on top of MapReduce have been actively studied. However, existing approaches are slow because the wavelet histogram is generated through more than one MapReduce job, and the error of the data restored from the wavelet histogram is likely to become large. In contrast, the MapReduce-based wavelet histogram generation system developed in this paper generates the wavelet histogram through a single MapReduce job, which can greatly increase generation speed. In addition, because the wavelet histogram is generated by adjusting a user-specified error bound, the error of the data restored from the wavelet histogram can be controlled. Finally, we verified the efficiency of the developed system through performance evaluation.
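
To make the error-bound idea concrete, here is a minimal single-machine sketch, not the paper's single-job MapReduce algorithm: it computes an unnormalized Haar wavelet decomposition of a frequency vector, then greedily zeroes the smallest coefficients as long as the maximum absolute reconstruction error stays within a user-specified bound. The greedy ordering by raw magnitude is a simple heuristic chosen for illustration, and all function names are hypothetical.

```python
from typing import Dict, List


def haar_decompose(values: List[float]) -> List[float]:
    """Unnormalized Haar transform: [overall average, coarsest details, ..., finest details]."""
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    current = [float(v) for v in values]
    levels: List[List[float]] = []
    while len(current) > 1:
        pairs = list(zip(current[0::2], current[1::2]))
        levels.append([(x - y) / 2 for x, y in pairs])  # detail coefficients
        current = [(x + y) / 2 for x, y in pairs]         # pairwise averages
    coeffs = current
    for details in reversed(levels):
        coeffs.extend(details)
    return coeffs


def haar_reconstruct(coeffs: List[float]) -> List[float]:
    """Invert haar_decompose."""
    current = [coeffs[0]]
    pos = 1
    while pos < len(coeffs):
        details = coeffs[pos:pos + len(current)]
        pos += len(current)
        current = [v for avg, d in zip(current, details) for v in (avg + d, avg - d)]
    return current


def build_synopsis(values: List[float], max_error: float) -> Dict[int, float]:
    """Greedily zero the smallest coefficients while the max absolute
    reconstruction error stays within max_error; return the survivors."""
    coeffs = haar_decompose(values)
    kept = list(coeffs)
    for i in sorted(range(1, len(kept)), key=lambda j: abs(coeffs[j])):
        saved, kept[i] = kept[i], 0.0
        approx = haar_reconstruct(kept)
        if max(abs(a - v) for a, v in zip(approx, values)) > max_error:
            kept[i] = saved  # dropping this coefficient would violate the bound
    return {i: c for i, c in enumerate(kept) if c != 0.0}


def restore(synopsis: Dict[int, float], n: int) -> List[float]:
    """Rebuild an approximate histogram of length n from a sparse synopsis."""
    return haar_reconstruct([synopsis.get(i, 0.0) for i in range(n)])
```

For example, `build_synopsis([2, 2, 0, 2, 3, 5, 4, 4], 1.0)` keeps only a few of the eight coefficients, and `restore` brings the histogram back with every bucket within 1.0 of the original. Storing only the surviving (index, value) pairs is what makes the synopsis smaller than the raw histogram.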

Comparison of Measured and Predicted Photovoltaic Electricity Generation and Input Options of Various Softwares

  • 노상태
    • KIEAE Journal / Vol. 14, No. 6 / pp. 87-92 / 2014
  • The objectives of this study are to investigate the input variables of photovoltaic generation programs and to compare their predictions with the actual generation of the photovoltaic systems at C city hall and the C city sewage treatment plant. We investigated the actual amount of generation, the predicted amount of generation, and solar radiation data, and calculated the relative errors. We simulated the photovoltaic systems of C city hall and the C city sewage treatment plant, located in Chungju, using existing programs such as SAM, RETScreen, HOMER, PVsyst, and Solar Pro. The results of this study are as follows: the relative errors between monthly predicted and actual generation data were large in winter. Outside winter, the actual and predicted amounts of generation showed no large errors.
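
The abstract does not state the exact relative-error formula used. A common definition, assumed here, is (predicted − actual) / actual × 100 %, computed per month; the sketch below applies it to a month-keyed mapping. The function names and any numbers passed to them are hypothetical, not the study's data.

```python
from typing import Dict


def relative_error(predicted: float, actual: float) -> float:
    """Signed relative error in percent; positive means the tool over-predicted."""
    if actual == 0:
        raise ValueError("actual generation must be non-zero")
    return (predicted - actual) / actual * 100.0


def monthly_errors(predicted: Dict[str, float], actual: Dict[str, float]) -> Dict[str, float]:
    """Relative error for every month present in the measured data."""
    return {month: relative_error(predicted[month], actual[month]) for month in actual}


if __name__ == "__main__":
    # Hypothetical kWh values for illustration only.
    print(monthly_errors({"Jan": 9000.0, "Jun": 15500.0},
                         {"Jan": 7500.0, "Jun": 15000.0}))
```

Keeping the sign shows whether a tool tends to over- or under-predict in a given season; taking `abs()` of the result gives the magnitude-only variant.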

A Development of Data Structure and Mesh Generation Algorithm for Global Ship Analysis Modeling System

  • 김인일;최중효;조학종;서흥원
    • Korean Journal of Computational Design and Engineering / Vol. 10, No. 1 / pp. 61-69 / 2005
  • In global ship structure and vibration analysis, a finite element (FE) analysis model is required in the early design stage, before the 3D CAD model is defined. Generating the analysis model is a time-consuming job that takes much more time than the engineering work itself. In particular, a ship structure has many associated structural members, such as stringers, stiffeners, and girders, and these members act as constraints that the analysis model must satisfy. It is therefore necessary to support automatic generation of analysis models that satisfy these constraints. To support global ship analysis modeling effectively, a method is developed that generates the analysis model from initial design information available within the ship design process, namely hull-form offset data and compartment data. To handle initial design information and FE model information easily, a flexible data structure is proposed. An automatic quadrilateral mesh generation algorithm that uses the initial design information to satisfy the constraints imposed by the ship structure is also proposed. The proposed data structure and mesh generation algorithm were applied to various types of vessels for usability testing, and these tests verified the stability and usefulness of the system, including the mesh generation algorithm.
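
The paper's algorithm works on hull-form and compartment data and is not reproduced here. As a deliberately simplified sketch of what "the mesh must satisfy structural-member constraints" can mean, assume a flat rectangular panel whose stiffeners and girders run parallel to its edges: every member position is forced to be a mesh line, and each segment between members is subdivided so no element exceeds a target size. The panel dimensions, member positions, and function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Node = Tuple[float, float]
Quad = Tuple[int, int, int, int]


def mesh_lines(start: float, end: float, constraints: Sequence[float], max_size: float) -> List[float]:
    """Mesh-line coordinates on [start, end] that pass through every constraint
    position and keep every interval no longer than max_size."""
    fixed = sorted({start, end, *(c for c in constraints if start < c < end)})
    lines = [fixed[0]]
    for a, b in zip(fixed, fixed[1:]):
        n = max(1, math.ceil((b - a) / max_size))  # subdivisions of this segment
        step = (b - a) / n
        lines.extend(a + step * k for k in range(1, n))
        lines.append(b)  # land exactly on the member position
    return lines


def quad_mesh(width: float, height: float, stiffener_x: Sequence[float],
              girder_y: Sequence[float], max_size: float) -> Tuple[List[Node], List[Quad]]:
    """Structured quadrilateral mesh of a width x height panel whose mesh
    lines include all stiffener (x) and girder (y) positions."""
    xs = mesh_lines(0.0, width, stiffener_x, max_size)
    ys = mesh_lines(0.0, height, girder_y, max_size)
    nodes = [(x, y) for y in ys for x in xs]  # row-major node list
    nx = len(xs)
    quads: List[Quad] = []
    for j in range(len(ys) - 1):
        for i in range(nx - 1):
            n0 = j * nx + i
            quads.append((n0, n0 + 1, n0 + 1 + nx, n0 + nx))  # counter-clockwise
    return nodes, quads
```

For instance, `quad_mesh(3.0, 2.0, [0.7, 2.0], [1.0], 0.5)` yields a mesh in which x = 0.7 and x = 2.0 and y = 1.0 are all element edges, so beam elements for the members can share nodes with the plate elements. Real hull structures need unstructured or mapped meshing on curved surfaces, which this sketch does not attempt.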

Automated Test Data Generation for Testing Programs with Flag Variables Based on SAT

  • 정인상
    • The KIPS Transactions: Part D / Vol. 16D, No. 3 / pp. 371-380 / 2009
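  • Research on methods for automatically generating test data has recently been very active. However, these methods have been shown to be ineffective when a program contains flag variables. This is a problem, considering that embedded systems such as engine controllers typically make heavy use of flag variables to record device-related state information. This paper introduces a method that can effectively generate test data for programs with flag variables. The method transforms the test data generation problem into a SAT (SATisfiability) problem and generates test data automatically using a SAT solver. To this end, the program is translated into Alloy, a first-order relational logic language, and test data are generated with the Alloy Analyzer.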
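
The paper translates programs into Alloy and relies on the Alloy Analyzer; the sketch below swaps that toolchain for a hand-written CNF encoding and a minimal DPLL solver in Python, purely to show the SAT formulation: the flag's defining condition and the path condition become clauses, and any satisfying assignment is a test input that reaches the flag-guarded branch. The program `program`, its clause encoding, and the solver are illustrative assumptions.

```python
from typing import Dict, List, Optional

Clause = List[int]          # DIMACS-style literals: 3 means x3, -3 means NOT x3
Assignment = Dict[int, bool]


def assign(clauses: List[Clause], lit: int) -> List[Clause]:
    """Make `lit` true: drop satisfied clauses and remove the opposite literal."""
    return [[l for l in c if l != -lit] for c in clauses if lit not in c]


def dpll(clauses: List[Clause], assignment: Assignment) -> Optional[Assignment]:
    """Plain DPLL with unit propagation; returns a satisfying assignment or None."""
    while True:
        if any(not c for c in clauses):
            return None  # empty clause: conflict
        unit = next((c[0] for c in clauses if len(c) == 1), None)
        if unit is None:
            break
        assignment = {**assignment, abs(unit): unit > 0}
        clauses = assign(clauses, unit)
    if not clauses:
        return assignment  # every clause satisfied
    var = abs(clauses[0][0])  # branch on a variable that is still unassigned
    for lit in (var, -var):
        result = dpll(assign(clauses, lit), {**assignment, var: lit > 0})
        if result is not None:
            return result
    return None


# Hypothetical program under test: the target branch is guarded by a flag.
def program(a: bool, b: bool) -> str:
    flag = a and not b
    return "target" if flag else "other"


# Variables: 1 = a, 2 = b, 3 = flag. Tseitin encoding of flag <-> (a AND NOT b),
# plus the path condition "flag is true".
PATH_CONDITION: List[Clause] = [[-3, 1], [-3, -2], [3, -1, 2], [3]]

if __name__ == "__main__":
    model = dpll(PATH_CONDITION, {})
    a, b = model.get(1, False), model.get(2, False)
    print(a, b, program(a, b))
```

Variables left unassigned by the solver are "don't care" inputs, which is why the driver defaults them with `model.get(..., False)`.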

LabVIEW-based User Interface Design for Multi-Integrated Navigation Systems

  • 손재훈;정준우;오상헌;박준민;황동환
    • Journal of Positioning, Navigation, and Timing / Vol. 13, No. 1 / pp. 75-83 / 2024
  • To reduce the time and cost of developing a navigation system, a performance evaluation platform can be used. To evaluate performance effectively, a user interface (UI) is required that sets parameters, displays the generated navigation sensor signals and data, and displays the navigation results. In this paper, a LabVIEW-based UI design method for multi-integrated navigation systems is proposed and implementation results are presented. The UI consists of a signal and data generation part and a signal and data processing part. The signal and data generation part sets the parameters for signal and data generation and displays the generated navigation sensor signals and data. The signal and data processing part sets the parameters for signal and data processing and displays the navigation results. Both parts are designed to satisfy the UI requirements for performance evaluation of the navigation system. To show the usefulness of the proposed UI design method, the signal and data generation and processing parameters are set through the LabVIEW-based UI, and the generated Global Positioning System (GPS) signals and inertial measurement unit data, together with the navigation results of a GPS Software Defined Receiver (SDR) and an inertial navigation system, are confirmed. The implementation results show that the proposed UI design method helps users conduct an effective performance evaluation of navigation systems.

Data Compression using Waveform Comparison

  • 성성만;임성일;이승재;배영준;진용우;김정한;김병진
    • Proceedings of the KIEE Conference / Proceedings of the KIEE Summer Conference 2003, Vol. A / pp. 151-153 / 2003
  • This paper studies the use of protective-relay data for power quality monitoring (PQM). Because protective-relay memory is limited, saving data for PQM analysis in the relay is problematic. This paper therefore proposes a new compression scheme for data obtained from the protective relay. The scheme compares each cycle after applying the discrete Fourier transform (DFT). The scheme is verified through simulations using protective-relay data from a real distribution system.
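
A minimal sketch of the cycle-comparison idea, assuming a fixed number of samples per cycle and a simple rule that is not necessarily the paper's: compute the first few DFT coefficients of each cycle, and if they are within a tolerance of the last stored cycle, store only a "same" marker instead of the raw samples. The function names, the tolerance, and the number of harmonics are hypothetical.

```python
import cmath
import math
from typing import List, Optional, Tuple

Frame = Tuple[str, Optional[List[float]]]


def dft(cycle: List[float], harmonics: int) -> List[complex]:
    """DFT coefficients 0..harmonics of one cycle, normalized by its length."""
    n = len(cycle)
    return [sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(cycle)) / n
            for k in range(harmonics + 1)]


def compress(samples: List[float], samples_per_cycle: int,
             harmonics: int = 3, tol: float = 0.01) -> List[Frame]:
    """Store a cycle raw only when its spectrum differs from the last stored cycle."""
    if len(samples) % samples_per_cycle:
        raise ValueError("samples must contain whole cycles")
    frames: List[Frame] = []
    ref: Optional[List[complex]] = None  # spectrum of the last stored cycle
    for start in range(0, len(samples), samples_per_cycle):
        cycle = samples[start:start + samples_per_cycle]
        spectrum = dft(cycle, harmonics)
        if ref is not None and max(abs(a - b) for a, b in zip(spectrum, ref)) <= tol:
            frames.append(("same", None))
        else:
            frames.append(("raw", cycle))
            ref = spectrum
    return frames


def decompress(frames: List[Frame]) -> List[float]:
    """Replay frames; a 'same' frame repeats the last stored raw cycle."""
    out: List[float] = []
    last: List[float] = []
    for kind, cycle in frames:
        if kind == "raw" and cycle is not None:
            last = cycle
        out.extend(last)
    return out
```

Comparing against the last stored cycle, rather than the immediately preceding one, keeps small differences from accumulating over a long steady-state run. The scheme is lossy: every "same" cycle comes back as a copy of its stored reference, so `tol` trades memory for fidelity.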

Social media big data analysis of Z-generation fashion

  • 성광숙
    • Journal of the Korea Society of Costume Design / Vol. 22, No. 3 / pp. 49-61 / 2020
  • This study analyzed social media accounts and performed a big data analysis of Z-generation fashion using the Textom text mining program and the Ucinet big data analysis program. The results are as follows. First, keyword analysis of 67,646 social media posts on Z-generation fashion over the last five years extracted 220,211 keywords, from which 67 major keywords with a co-occurrence frequency greater than 250 were selected. Among the top keywords appearing more than 1,000 times, 'Z generation' (29,595 times) was overwhelmingly the most frequent, followed by 'millennials' (18,536 times), 'fashion' (17,836 times), 'generation' (13,055 times), 'brand' (8,325 times), and 'trend' (7,310 times). Second, the analysis of network degree centrality among the key keywords for the Z-generation showed that the number of connections to 'Z-generation' (29,595) is overwhelmingly large, followed by 'millennial' (18,536), 'fashion' (17,836), 'generation' (13,055), 'brand' (8,325), and 'trend' (7,310). These keywords are considered important factors in exploring how social media reacts to the Z-generation. Third, CONCOR analysis regrouped the major keywords for Gen Z fashion by structural equivalence, and network semantic visualization yielded four clusters: Group 1 (54 texts), 'Diverse Characteristics of Z-Generation Fashion Consumers'; Group 2 (7 texts), 'Z-Generation's teenagers Fashion Powers'; Group 3 (8 texts), 'Z-Generation's Celebrity Fashions' Interest and Fashion'; and Group 4 (one text), 'Gucci', the most popular luxury fashion brand among the Z-generation.
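
Tools such as Ucinet compute centrality measures internally; the sketch below only illustrates what degree centrality means for a keyword co-occurrence network. It counts keyword pairs that appear in the same post, keeps pairs seen at least `min_count` times (similar in spirit to the study's frequency cut-off, though the study applies its cut-off when selecting keywords), and reports unweighted, normalized degree centrality. The figures quoted above (e.g. 29,595 for 'Z generation') appear to be frequencies or weighted counts, which this unweighted sketch does not reproduce. Function names and sample posts are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable, List, Tuple

Edge = Tuple[str, str]


def cooccurrence(posts: Iterable[List[str]], min_count: int) -> Dict[Edge, int]:
    """Count keyword pairs appearing in the same post; keep pairs seen >= min_count times."""
    pairs: Counter = Counter()
    for keywords in posts:
        for pair in combinations(sorted(set(keywords)), 2):  # each pair once per post
            pairs[pair] += 1
    return {pair: n for pair, n in pairs.items() if n >= min_count}


def degree_centrality(edges: Iterable[Edge]) -> Dict[str, float]:
    """Normalized (unweighted) degree centrality: neighbours / (number of nodes - 1)."""
    degree: Counter = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    denom = len(degree) - 1
    return {node: (d / denom if denom > 0 else 0.0) for node, d in degree.items()}


if __name__ == "__main__":
    posts = [["Z generation", "fashion", "brand"],
             ["Z generation", "fashion"],
             ["Z generation", "trend"]]
    print(degree_centrality(cooccurrence(posts, 1)))
```

Passing the pair-count dictionary straight to `degree_centrality` works because iterating a dict yields its keys, i.e. the edges; a weighted variant would sum the counts instead of adding 1 per edge.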