• Title/Summary/Keyword: Data Generation

Enhanced Regular Expression as a DGL for Generation of Synthetic Big Data

  • Kai, Cheng;Keisuke, Abe
    • Journal of Information Processing Systems, v.19 no.1, pp.1-16, 2023
  • Synthetic data generation is generally used in performance evaluation and functional testing of data-intensive applications, as well as in various areas of data analytics, such as privacy-preserving data publishing (PPDP) and statistical disclosure limitation/control. A significant amount of research has been conducted on tools and languages for data generation. However, existing tools and languages have been developed for specific purposes and are unsuitable for other domains. In this article, we propose a regular expression-based data generation language (DGL) for flexible big data generation. To achieve a general-purpose and powerful DGL, we enhanced standard regular expressions to support data domains, type/format inference, sequential and random generation, probability distributions, and resource references. To implement the proposed language efficiently, we propose caching techniques for both intermediate results and database queries. We experimentally evaluated the proposed improvements.
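The core idea of a regex-based DGL is to read a pattern as a generator rather than a matcher. The sketch below is a minimal illustration under stated assumptions: it supports only literals, `\d`, bracket classes such as `[A-C]`, and `{n}` / `{m,n}` quantifiers, and the names `parse` and `generate` are hypothetical. It does not reproduce the paper's extensions (data domains, type inference, distributions, resource references, caching).

```python
import random
import string
from typing import List, Tuple

# Hypothetical token: (allowed characters, minimum repeats, maximum repeats)
Token = Tuple[str, int, int]


def parse(pattern: str) -> List[Token]:
    """Parse a tiny regex subset: literals, \\d, [a-z0-9] classes, {n} and {m,n}."""
    tokens: List[Token] = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\":
            nxt = pattern[i + 1]
            # \d means any ASCII digit; any other escaped char is a literal
            alphabet = string.digits if nxt == "d" else nxt
            i += 2
        elif ch == "[":
            end = pattern.index("]", i)
            body = pattern[i + 1:end]
            alphabet = ""
            j = 0
            while j < len(body):
                if j + 2 < len(body) and body[j + 1] == "-":
                    # expand a range such as A-C into "ABC"
                    alphabet += "".join(chr(c) for c in range(ord(body[j]), ord(body[j + 2]) + 1))
                    j += 3
                else:
                    alphabet += body[j]
                    j += 1
            i = end + 1
        else:
            alphabet = ch
            i += 1
        lo = hi = 1
        if i < len(pattern) and pattern[i] == "{":
            end = pattern.index("}", i)
            parts = pattern[i + 1:end].split(",")
            lo = int(parts[0])
            hi = int(parts[1]) if len(parts) > 1 else lo
            i = end + 1
        tokens.append((alphabet, lo, hi))
    return tokens


def generate(pattern: str, rng: random.Random) -> str:
    """Produce one random string that the pattern would match."""
    out: List[str] = []
    for alphabet, lo, hi in parse(pattern):
        for _ in range(rng.randint(lo, hi)):
            out.append(rng.choice(alphabet))
    return "".join(out)
```

For example, `generate(r"ID-\d{4}-[A-C]{2,3}", random.Random(0))` yields identifiers such as four digits followed by two or three letters from A to C; a seeded `random.Random` keeps generated datasets reproducible.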

Probability-Based Context-Generation Model with Situation Propagation Network (상황 전파 네트워크를 이용한 확률기반 상황생성 모델)

  • Cheon, Seong-Pyo;Kim, Sung-Shin
    • The Journal of Korea Robotics Society, v.4 no.1, pp.56-61, 2009
  • Probability-based data generation is a typical context-generation method: it is not only simple and robust but also makes it easy to update the generation conditions. However, the probability-based context-generation method has been found to suffer from inherent ambiguity and conflict problems in the generated context data. To compensate for these disadvantages of probabilistic random data generation, this paper proposes a situation propagation network. The situation propagation network is designed to update the parameters of the probability functions included in the probability-based data generation model. The proposed probability-based context-generation model generates two kinds of contexts: independent contexts and conditional contexts. The results of the proposed model are compared with those of the probability-based model with respect to performance, reduction of ambiguity, and reduction of conflict.
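To illustrate the distinction between independent and conditional contexts, here is a minimal sketch. The context names, probability tables, and the exponential update rule are all hypothetical; the update is a simple stand-in for parameter adjustment, not the paper's situation propagation network.

```python
import random
from typing import Dict

Distribution = Dict[str, float]

# Hypothetical independent context: P(weather)
WEATHER_PROBS: Distribution = {"sunny": 0.6, "rainy": 0.4}

# Hypothetical conditional context: P(activity | weather)
ACTIVITY_GIVEN_WEATHER: Dict[str, Distribution] = {
    "sunny": {"outdoor": 0.8, "indoor": 0.2},
    "rainy": {"outdoor": 0.1, "indoor": 0.9},
}


def sample(dist: Distribution, rng: random.Random) -> str:
    """Draw one outcome from a discrete distribution."""
    outcomes = list(dist)
    return rng.choices(outcomes, weights=[dist[o] for o in outcomes], k=1)[0]


def generate_context(rng: random.Random) -> Dict[str, str]:
    """Sample an independent context, then a context conditioned on it."""
    weather = sample(WEATHER_PROBS, rng)
    activity = sample(ACTIVITY_GIVEN_WEATHER[weather], rng)
    return {"weather": weather, "activity": activity}


def update_probabilities(dist: Distribution, observed: str, rate: float) -> Distribution:
    """Shift probability mass toward an observed outcome; the result still sums to 1."""
    return {o: (1 - rate) * p + (rate if o == observed else 0.0) for o, p in dist.items()}
```

Sampling the activity from a table conditioned on the weather makes contradictory combinations (for example, outdoor activity in rain) less likely than drawing both contexts independently, which is one way conditional contexts can reduce conflict.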

Sociodemographic and Health Related Factors Influencing Problem Drinking of the Echo Generation Using Data of the 2018 Korean National Health and Nutrition Examination Survey

  • Kwak, Minyeong
    • International Journal of Contents, v.17 no.1, pp.54-60, 2021
  • The aim of this study was to identify factors influencing problem drinking among the Echo Generation in South Korea and to provide basic data for early intervention and mediation of problem drinking in this group. This descriptive study performed a secondary analysis of raw data from the 2018 Korean National Health and Nutrition Examination Survey, using responses to the problem drinking items from 999 Echo Generation participants born between 1979 and 1992. The study comprehensively investigated sociodemographic and health-related factors influencing problem drinking among the Echo Generation. SPSS WIN (version 26.0) was used for data analysis. Gender (β=-.32, p<.001), education level (β=.10, p=.002), white-collar workers out of job (β=-.09, p=.041), and depression (β=.11, p<.001) were identified as factors that influenced problem drinking among the Echo Generation. The results suggest that preventing problem drinking among the Echo Generation requires individually tailored prevention education and intervention programs.

Hadoop Based Wavelet Histogram for Big Data in Cloud

  • Kim, Jeong-Joon
    • Journal of Information Processing Systems, v.13 no.4, pp.668-676, 2017
  • Recently, the importance of big data has been emphasized with the development of smartphones, the web, and SNS. As a result, MapReduce, which can process big data efficiently, is receiving worldwide attention because of its excellent scalability and stability. Because big data is large in volume, generated rapidly, and diverse in form, it is more efficient to process summary information about big data than the data itself. The wavelet histogram, a typical technique for generating data summaries, can produce optimal summaries without losing the information of the original data. Therefore, systems that generate wavelet histograms with MapReduce have been actively studied. However, existing approaches generate the wavelet histogram through multiple MapReduce jobs, which makes generation slow, and the error of the data restored from the wavelet histogram is likely to be large. In contrast, the MapReduce-based wavelet histogram generation system developed in this paper generates the wavelet histogram in a single MapReduce job, which can greatly increase generation speed. In addition, because the wavelet histogram is generated according to an error bound specified by the user, the error of the data restored from the wavelet histogram can be controlled. Finally, we verified the efficiency of the developed system through a performance evaluation.
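To make the error-bound idea concrete, here is a minimal single-machine sketch. It assumes the unnormalized (averaging) Haar transform over a frequency vector whose length is a power of two, and a greedy rule that zeroes the smallest coefficients while the maximum absolute reconstruction error stays within the bound. The function names are hypothetical, the MapReduce distribution is not reproduced, and this is not the paper's algorithm.

```python
from typing import List


def haar_decompose(values: List[float]) -> List[float]:
    """Unnormalized Haar transform: [overall average, details coarse-to-fine]."""
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    coeffs: List[float] = []
    current = [float(v) for v in values]
    while len(current) > 1:
        pairs = range(0, len(current), 2)
        averages = [(current[i] + current[i + 1]) / 2 for i in pairs]
        details = [(current[i] - current[i + 1]) / 2 for i in pairs]
        coeffs = details + coeffs  # finer details go after coarser ones
        current = averages
    return current + coeffs


def haar_reconstruct(coeffs: List[float]) -> List[float]:
    """Invert haar_decompose level by level."""
    current = [coeffs[0]]
    pos = 1
    while pos < len(coeffs):
        details = coeffs[pos:pos + len(current)]
        nxt: List[float] = []
        for avg, d in zip(current, details):
            nxt.extend([avg + d, avg - d])
        pos += len(current)
        current = nxt
    return current


def error_bounded_synopsis(values: List[float], max_abs_error: float) -> List[float]:
    """Greedily zero small coefficients while every restored value stays within the bound."""
    kept = haar_decompose(values)
    for i in sorted(range(len(kept)), key=lambda k: abs(kept[k])):
        if kept[i] == 0:
            continue
        trial = list(kept)
        trial[i] = 0.0
        restored = haar_reconstruct(trial)
        if max(abs(a - b) for a, b in zip(restored, values)) <= max_abs_error:
            kept = trial
    return kept
```

Sorting by raw magnitude is a simplification; coarse coefficients affect more buckets than fine ones, so a production system would normally weight them by level. The nonzero entries of the returned list form the histogram synopsis.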

Comparison of Measured and Predicted Photovoltaic Electricity Generation and Input Options of Various Softwares (태양광 발전량 예측 도구별 입력 요소 분석 및 실제 발전량 비교에 관한 연구)

  • No, Sang-Tae
    • KIEAE Journal, v.14 no.6, pp.87-92, 2014
  • The objectives of this study are to investigate the input variables of photovoltaic generation prediction programs and to compare their predictions with the actual generation of the photovoltaic systems at the C city hall and the C city sewage treatment plant. We collected actual generation, predicted generation, and solar radiation data and calculated the relative errors. We simulated the photovoltaic systems of the C city hall and the C city sewage treatment plant, located in Chungju, using existing programs such as SAM, RETSCREEN, HOMER, PV SYST, and Solar Pro. The results are as follows: when the monthly predicted and actual generation data were compared, the relative errors were large in winter. Outside winter, the actual and predicted generation showed no large errors.
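The abstract does not state how the relative error is defined. A minimal sketch, assuming the common definition (predicted − actual) / actual × 100, computed per month; the function names and the month-keyed dictionaries are hypothetical.

```python
from typing import Dict


def relative_error_percent(predicted: float, actual: float) -> float:
    """Relative error in percent, assuming (predicted - actual) / actual * 100."""
    if actual == 0:
        raise ValueError("actual generation must be non-zero")
    return (predicted - actual) / actual * 100.0


def monthly_relative_errors(predicted: Dict[str, float],
                            actual: Dict[str, float]) -> Dict[str, float]:
    """Relative error for every month that has a measured value."""
    return {month: relative_error_percent(predicted[month], actual[month])
            for month in actual}
```

A positive value means the tool over-predicted generation for that month; comparing the sign and size across months shows seasonal patterns such as the winter deviations reported above.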

A Development of Data Structure and Mesh Generation Algorithm for Global Ship Analysis Modeling System (선박의 전선해석 모델링 시스템을 위한 자료구조와 요소생성 알고리즘 개발)

  • Kim I.I.;Choi J.H.;Jo H.J.;Suh H.W.
    • Korean Journal of Computational Design and Engineering, v.10 no.1, pp.61-69, 2005
  • In global ship structure and vibration analysis, a finite element (FE) analysis model is required in the early design stage, before the 3D CAD model is defined. Generating the analysis model is time-consuming and takes much longer than the engineering work itself. In particular, a ship structure has many associated structural members, such as stringers, stiffeners, and girders, and these members must be satisfied as constraints during analysis modeling. It is therefore necessary to generate analysis models automatically while satisfying these constraints. To support global ship analysis modeling effectively, we developed a method that generates the analysis model from initial design information available within the ship design process, namely hull form offset data and compartment data. A flexible data structure is proposed to handle initial design information and FE model information easily. An automatic quadrilateral mesh generation algorithm that uses the initial design information to satisfy the constraints imposed on the ship structure is also proposed. The proposed data structure and mesh generation algorithm were applied to various types of vessels in a usability test, through which we verified the stability and usefulness of the system, including the mesh generation algorithm.
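A simple way to see how structural members act as meshing constraints is a structured quadrilateral mesh over a flat rectangular panel in which every member position must coincide with a mesh line. The sketch below assumes that simplified setting (flat panel, members parallel to the axes, a maximum element size); the function names are hypothetical, and it is not the paper's algorithm, which works from hull form offsets and compartment data.

```python
import math
from typing import List, Sequence, Tuple

Node = Tuple[float, float]
Quad = Tuple[int, int, int, int]


def constrained_breakpoints(length: float, members: Sequence[float],
                            max_size: float) -> List[float]:
    """Coordinates along one axis that include 0, length, and every member position,
    subdivided so that no interval exceeds max_size."""
    fixed = sorted({0.0, float(length), *(float(m) for m in members if 0 < m < length)})
    points = [fixed[0]]
    for a, b in zip(fixed, fixed[1:]):
        n = max(1, math.ceil((b - a) / max_size))
        step = (b - a) / n
        points.extend(a + step * k for k in range(1, n))
        points.append(b)  # append the constraint coordinate exactly, avoiding float drift
    return points


def quad_mesh(width: float, height: float, x_members: Sequence[float],
              y_members: Sequence[float], max_size: float) -> Tuple[List[Node], List[Quad]]:
    """Structured quad mesh whose element edges lie on every member line."""
    xs = constrained_breakpoints(width, x_members, max_size)
    ys = constrained_breakpoints(height, y_members, max_size)
    nodes = [(x, y) for y in ys for x in xs]
    nx = len(xs)
    quads: List[Quad] = []
    for j in range(len(ys) - 1):
        for i in range(nx - 1):
            n0 = j * nx + i
            # counter-clockwise node order: bottom-left, bottom-right, top-right, top-left
            quads.append((n0, n0 + 1, n0 + 1 + nx, n0 + nx))
    return nodes, quads
```

For example, `quad_mesh(10, 4, [3.0], [2.0], 2.0)` places a mesh line exactly at x = 3.0 (a girder, say) and y = 2.0 (a stringer), then fills the remaining spans with elements no larger than 2.0.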

Automated Test Data Generation for Testing Programs with Flag Variables Based on SAT (SAT를 기반으로 하는 플래그 변수가 있는 프로그램 테스팅을 위한 테스트 데이터 자동 생성)

  • Chung, In-Sang
    • The KIPS Transactions:PartD, v.16D no.3, pp.371-380, 2009
  • Automated test data generation has recently been an active research area. However, the techniques presented so far have proven ineffective for programs with flag variables. This is a problem for embedded systems, such as engine controllers, that make extensive use of flag variables to record device state information. This paper introduces a technique for effectively generating test data for programs with flag variables. The technique transforms the test data generation problem into a SAT (satisfiability) problem and takes advantage of SAT solvers for automated test data generation (ATDG). To this end, we transform the program under test into Alloy, a language based on first-order relational logic, and then produce test data using the Alloy Analyzer.
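The key step is encoding "the branch guarded by the flag is taken" as a Boolean formula and letting a solver find inputs. The sketch below assumes a hypothetical three-input program and uses a tiny DPLL solver over CNF clauses as a stand-in for the Alloy/Alloy Analyzer pipeline described above. Variables 1 to 3 are the inputs and variable 4 is the flag after the first `if`.

```python
from typing import Dict, List, Optional

Clause = List[int]  # positive literal = variable true, negative = variable false


def program_under_test(x1: bool, x2: bool, x3: bool) -> bool:
    """Hypothetical program; returns True when the flag-guarded target branch is taken."""
    flag = False
    if x1 and not x2:
        flag = True
    if x3:
        flag = False
    return flag


def dpll(clauses: List[Clause], assignment: Optional[Dict[int, bool]] = None) -> Optional[Dict[int, bool]]:
    """Minimal DPLL: unit propagation plus branching; returns a model or None."""
    assignment = dict(assignment or {})
    while True:
        simplified: List[Clause] = []
        unit = None
        for clause in clauses:
            if any(assignment.get(abs(lit)) == (lit > 0) for lit in clause):
                continue  # clause already satisfied
            remaining = [lit for lit in clause if abs(lit) not in assignment]
            if not remaining:
                return None  # every literal is false: conflict
            if len(remaining) == 1 and unit is None:
                unit = remaining[0]
            simplified.append(remaining)
        if unit is None:
            break
        assignment[abs(unit)] = unit > 0  # propagate the forced literal
    if not simplified:
        return assignment
    var = abs(simplified[0][0])
    for value in (True, False):
        model = dpll(clauses, {**assignment, var: value})
        if model is not None:
            return model
    return None


# f <-> (x1 and not x2), plus the target condition: f is true and x3 is false
CLAUSES: List[Clause] = [[-4, 1], [-4, -2], [4, -1, 2], [4], [-3]]
```

Running `dpll(CLAUSES)` yields an assignment with x1 true and x2, x3 false; unassigned inputs can default to false. Feeding those values to `program_under_test` reaches the target branch, which is exactly the flag-driven path that search-based generators tend to miss.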

LabVIEW-based User Interface Design for Multi-Integrated Navigation Systems (다중 통합항법 시스템을 위한 랩뷰 기반의 사용자 인터페이스 설계)

  • Jae Hoon Son;Junwoo Jung;Sang Heon Oh;JunMin Park;Dong-Hwan Hwang
    • Journal of Positioning, Navigation, and Timing, v.13 no.1, pp.75-83, 2024
  • To reduce the time and cost of developing a navigation system, a performance evaluation platform can be used. Effective performance evaluation requires a user interface (UI) that sets parameters, displays the generated navigation sensor signals and data, and displays the navigation results. In this paper, a LabVIEW-based UI design method for multi-integrated navigation systems is proposed, and implementation results are presented. The UI consists of a signal and data generation part and a signal and data processing part. The signal and data generation part sets the parameters for signal and data generation and displays the generated navigation sensor signals and data. The signal and data processing part sets the parameters for signal and data processing and displays the navigation results. Both parts are designed to satisfy the UI requirements for performance evaluation of the navigation system. To show the usefulness of the proposed UI design method, the signal and data generation and processing parameters are set through the LabVIEW-based UI, and the generated Global Positioning System (GPS) signals and inertial measurement unit data, as well as the navigation results of a GPS Software Defined Receiver (SDR) and an inertial navigation system, are confirmed. The implementation results show that the proposed UI design method helps users conduct an effective performance evaluation of navigation systems.

Data Compression using Waveform Comparison (파형 비교를 이용한 데이터 압축 기법)

  • Sung, S.M.;Lim, S.I.;Lee, S.J.;Bae, Y.J.;Jin, Y.W.;Kim, J.H.;Kim, B.J.
    • Proceedings of the KIEE Conference, 2003.07a, pp.151-153, 2003
  • This paper studies the use of protective relay data for power quality monitoring (PQM). Because protective relay memory is limited, saving data for PQM analysis in the relay is a problem. This paper therefore proposes a new method for compressing the data acquired from the protective relay. The scheme compares each cycle after applying the discrete Fourier transform (DFT). The scheme is verified through simulations using protective relay data from a real distribution system.
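The abstract only says that cycles are compared after a DFT, so the details below are assumptions: the signal is split into whole cycles of a fixed sample count, each cycle's spectrum is compared with that of the last stored cycle, and a cycle whose spectrum differs by no more than a tolerance is replaced by a marker. Function names are hypothetical, and the naive DFT is used only to keep the sketch dependency-free.

```python
import cmath
import math
from typing import List, Optional, Sequence


def dft(samples: Sequence[float]) -> List[complex]:
    """Naive O(n^2) discrete Fourier transform of one cycle."""
    n = len(samples)
    return [sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def compress_cycles(samples: Sequence[float], samples_per_cycle: int,
                    tolerance: float) -> List[Optional[List[float]]]:
    """Store a cycle only when its spectrum differs from the last stored cycle."""
    if len(samples) % samples_per_cycle:
        raise ValueError("sample count must be a whole number of cycles")
    encoded: List[Optional[List[float]]] = []
    reference: Optional[List[complex]] = None  # spectrum of the last stored cycle
    for start in range(0, len(samples), samples_per_cycle):
        cycle = list(samples[start:start + samples_per_cycle])
        spectrum = dft(cycle)
        if reference is not None and max(abs(a - b) for a, b in zip(spectrum, reference)) <= tolerance:
            encoded.append(None)  # spectrally unchanged: keep only a marker
        else:
            encoded.append(cycle)
            reference = spectrum
    return encoded


def decompress_cycles(encoded: List[Optional[List[float]]]) -> List[float]:
    """Rebuild the signal; each marker replays the last stored cycle."""
    out: List[float] = []
    last: List[float] = []
    for item in encoded:
        if item is not None:
            last = item
        out.extend(last)
    return out
```

With a non-zero tolerance the scheme is lossy, since a replayed cycle only approximates the one it replaced. Comparing against the last stored cycle, rather than the previous input cycle, keeps small differences from accumulating across long steady-state runs.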

Social media big data analysis of Z-generation fashion (Z세대 패션에 대한 소셜미디어의 빅데이터 분석)

  • Sung, Kwang-Sook
    • Journal of the Korea Fashion and Costume Design Association, v.22 no.3, pp.49-61, 2020
  • This study analyzed social media posts and performed a big data analysis of Z-generation fashion using the Textom text mining program and the Ucinet big data analysis program. The results are as follows. First, keyword analysis of 67,646 Z-generation fashion social media posts from the last 5 years extracted 220,211 keywords, from which 67 major keywords that co-occurred more than 250 times were selected. Among the top keywords, each appearing more than 1,000 times, 'Z generation' (29,595 times) was overwhelmingly the most frequent, followed by 'millennials' (18,536 times), 'fashion' (17,836 times), 'generation' (13,055 times), 'brand' (8,325 times), and 'trend' (7,310 times). Second, the network degree centrality analysis of the key Z-generation keywords showed that 'Z generation' (29,595 times) had by far the largest number of connected nodes, followed by 'millennial' (18,536 times), 'fashion' (17,836 times), 'generation' (13,055 times), 'brand' (8,325 times), and 'trend' (7,310 times). These keywords are considered important for exploring how social media responds to Generation Z. Third, a CONCOR analysis rearranged and clustered the major Gen Z fashion keywords according to their structural equivalence, and four clusters were derived and visualized as a semantic network: Group 1 (54 keywords), 'Diverse Characteristics of Z-Generation Fashion Consumers'; Group 2 (7 keywords), 'Z-Generation's teenagers Fashion Powers'; Group 3 (8 keywords), 'Z-Generation's Celebrity Fashions' Interest and Fashion'; and Group 4 (one keyword), 'Gucci', the most popular luxury fashion brand of the Z-generation.
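The co-occurrence filtering and degree centrality steps can be sketched in a few lines. This assumes each post has already been reduced to a list of keywords (the Textom preprocessing is not reproduced), counts a pair once per post, and computes unweighted degree (number of distinct neighbors) over pairs that meet a minimum co-occurrence count; the function names and sample posts are hypothetical. Tools such as Ucinet can also report weighted or normalized variants.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def cooccurrence(posts: List[List[str]]) -> Dict[Pair, int]:
    """Count, for each keyword pair, the number of posts containing both."""
    pairs: Counter = Counter()
    for keywords in posts:
        for a, b in combinations(sorted(set(keywords)), 2):
            pairs[(a, b)] += 1
    return pairs


def degree_centrality(pairs: Dict[Pair, int], min_count: int = 1) -> Dict[str, int]:
    """Unweighted degree: distinct neighbors linked by at least min_count co-occurrences."""
    degree: Counter = Counter()
    for (a, b), count in pairs.items():
        if count >= min_count:
            degree[a] += 1
            degree[b] += 1
    return degree
```

In the study's setting, `min_count` would play the role of the 250-co-occurrence threshold used to select the major keywords; dividing each degree by the number of other nodes gives the normalized form.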