• Title/Summary/Keyword: Generate Data

Improvement of topic modeling and case analysis through convergence of Bertopic and TextRank (버토픽과 텍스트랭크의 융합을 통한 토픽모델링의 개선 및 사례 분석)

  • Kim, Keun Hyung;Kang Jae Jung
    • The Journal of Information Systems / v.33 no.3 / pp.105-121 / 2024
  • Purpose: This paper develops a method that improves topic representation by incorporating the TextRank technique into BERTopic-based topic modeling, along with an additional indicator for determining the optimal number of topics. Design/methodology/approach: We propose a method that uses TextRank to extract important documents from those assigned to each topic of a topic model, then calculates a secondary diversity index and generates topic representations from the results. First, we integrate the TextRank algorithm into the BERTopic-based topic modeling process to set local secondary labels for each topic; these labels are derived through TextRank-based extractive summarization. Second, we improve the accuracy of selecting the optimal number of topics by calculating the secondary diversity index from the extractive summary of each topic. Third, we improve efficiency by using ChatGPT to derive the label of each topic. Findings: A case analysis and evaluation using the proposed method confirmed that topic representation based on TextRank results generated more accurate topic labels and that the secondary diversity index was a more effective index for determining the optimal number of topics.
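
The extraction step described above can be illustrated with a simplified TextRank. This is a minimal sketch, not the authors' implementation: it ranks the documents of one topic with PageRank over a word-overlap similarity graph, and the function names, the tokenizer, and the parameter defaults are all assumptions made for illustration.

```python
import math
import re


def textrank_scores(docs, damping=0.85, iterations=50):
    """Score documents with a simplified TextRank (PageRank on a similarity graph)."""
    tokens = [set(re.findall(r"\w+", doc.lower())) for doc in docs]
    n = len(docs)

    # Edge weight: shared words normalized by the log of each document's
    # vocabulary size, the overlap similarity commonly used with TextRank.
    def similarity(a, b):
        if len(a) < 2 or len(b) < 2:
            return 0.0
        return len(a & b) / (math.log(len(a)) + math.log(len(b)))

    weights = [[similarity(tokens[i], tokens[j]) if i != j else 0.0
                for j in range(n)] for i in range(n)]
    out_sum = [sum(row) for row in weights]

    scores = [1.0] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) + damping * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n) if out_sum[j] > 0
            )
            for i in range(n)
        ]
    return scores


def top_documents(docs, k=1):
    """Return the k highest-scoring documents as an extractive summary."""
    scores = textrank_scores(docs)
    ranked = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    return [docs[i] for i in ranked[:k]]
```

Documents that share vocabulary with many others in the topic rise to the top, which is why the highest-ranked ones are plausible candidates for a topic label or for the input to a label-generating model.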

Finding Weighted Sequential Patterns over Data Streams via a Gap-based Weighting Approach (발생 간격 기반 가중치 부여 기법을 활용한 데이터 스트림에서 가중치 순차패턴 탐색)

  • Chang, Joong-Hyuk
    • Journal of Intelligence and Information Systems / v.16 no.3 / pp.55-75 / 2010
  • Sequential pattern mining aims to discover interesting sequential patterns in a sequence database, and it is one of the essential data mining tasks, widely used in application fields such as Web access pattern analysis, customer purchase pattern analysis, and DNA sequence analysis. General sequential pattern mining considers only the generation order of data elements in a sequence, so it can easily find simple sequential patterns but is limited in finding the more interesting sequential patterns needed in real-world applications. One essential research topic that addresses this limitation is weighted sequential pattern mining, which considers not only the generation order of data elements but also their weights to obtain more interesting sequential patterns. Recently, data have increasingly taken the form of continuous data streams rather than finite stored data sets in various application fields, and the database research community has begun to focus on processing data streams. A data stream is a massive unbounded sequence of data elements continuously generated at a rapid rate. In data stream processing, each data element should be examined at most once, and memory usage should remain bounded even though new data elements are continuously generated. Moreover, newly generated data elements should be processed as fast as possible to keep the analysis result up to date, so that it can be used instantly upon request. To satisfy these requirements, data stream processing sacrifices some correctness of its analysis result by allowing some error. Considering these changes in the form of data generated in real-world application fields, much research has been performed to find various kinds of knowledge embedded in data streams. This work mainly focuses on efficient mining of frequent itemsets and sequential patterns over data streams, which have proven useful in conventional data mining over finite data sets. In addition, mining algorithms have been proposed that efficiently reflect changes in data streams over time in their mining results. However, they target naively interesting patterns, such as frequent patterns and simple sequential patterns, which are found intuitively, and take no interest in mining novel interesting patterns that better express the characteristics of the target data streams. Therefore, defining novel interesting patterns and developing a mining method to find them, which can be used effectively to analyze recent data streams, is a valuable research topic in data stream mining. This paper proposes a gap-based weighting approach for sequential patterns and a method for mining weighted sequential patterns over sequence data streams using that approach. A gap-based weight of a sequential pattern can be computed from the gaps between data elements in the pattern without any predefined weight information. That is, the approach uses the gaps between data elements in each sequential pattern, as well as their generation order, to obtain the weight of the pattern, which helps find more interesting and useful sequential patterns. Because most computer application fields now generate data as data streams rather than finite data sets, the proposed method mainly focuses on sequence data streams.
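
To make the idea of a weight derived purely from gaps more concrete, here is a small sketch. The abstract does not state the weighting formula, so the one below (the mean of 1/gap over consecutive elements) is a made-up stand-in chosen only so that tightly spaced occurrences score higher; `gap_based_weight` and `weighted_support` are hypothetical names, not the paper's.

```python
def gap_based_weight(positions):
    """Illustrative weight: mean of 1/gap over consecutive pattern elements.

    The formula is an assumption for demonstration, not the paper's definition.
    `positions` are the strictly increasing positions (or timestamps) at which
    the pattern's elements occur in one sequence.
    """
    gaps = [later - earlier for earlier, later in zip(positions, positions[1:])]
    if not gaps:
        return 1.0  # a single-element pattern has no gaps
    if any(gap <= 0 for gap in gaps):
        raise ValueError("positions must be strictly increasing")
    return sum(1.0 / gap for gap in gaps) / len(gaps)


def weighted_support(occurrences):
    """Sum the gap-based weights of every occurrence of one pattern."""
    return sum(gap_based_weight(positions) for positions in occurrences)
```

With this stand-in, an occurrence at positions 1, 2, 3 contributes more than one at positions 1, 3, 5, so no predefined per-item weight table is needed to rank patterns.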

Correction of the Sea Effect in the Magnetotelluric (MT) Data Using an Iterative Tensor Stripping During Inversion (MT 자료 역산과정에서 반복적인 Tensor Stripping을 통한 해양효과 보정)

  • Yang, Jun-Mo;Lee, Chun-Ki;Yoo, Hai-Soo
    • Geophysics and Geophysical Exploration / v.11 no.4 / pp.286-301 / 2008
  • When magnetotelluric (MT) data are obtained in the vicinity of the coast, the sea can distort the observed MT responses, especially those from the deep part of the subsurface. We introduce an iterative method to correct the sea effect, based on a previous topographic correction method that removes distortions due to topographic changes in seafloor MT data. The method first corrects the sea effect in the observed MT impedance and then inverts the corrected responses in a model space without the sea. Because of mutual coupling between the sea and the subsurface structure, the correction and inversion steps are iterated until the changes in each result become negligible. The method is validated for 1-D and 2-D structures using synthetic MT data produced by 3-D forward modeling that includes the surrounding seas. In all cases, the method closely recovers the given structure after a few iterations. To test the applicability of the proposed method to field data, we generate synthetic MT data for Jeju Island, whose 1-D conductivity structure is well known, using 3-D forward modeling. The distortions due to the surrounding sea start to appear at frequencies below about 1 Hz and are relatively severe in the electric field perpendicular to the coastline because of the location of the observation sites. The proposed method successfully eliminates the sea effect after three iterations, and both 1-D and 2-D inversions of the corrected responses closely recover the given subsurface structure of the Jeju Island model.

A Prototype for Real-time Indoor Evacuation Simulation System using Indoor IR Sensor Information (적외선 센서정보기반 실시간 실내 대피시뮬레이션 시스템 프로토타입)

  • Nam, Hyun-Woo;Kwak, Su-Yeong;Jun, Chul-Min
    • Spatial Information Research / v.20 no.2 / pp.155-164 / 2012
  • Indoor fire simulators have been used to analyze building safety in emergency evacuations. These applications primarily simulate evacuation behavior to check for structural problems in buildings under normal conditions rather than in real-time situations. They are therefore limited in handling real-time evacuation events, for the following reasons. First, existing models mostly test artificial situations using randomly generated evacuees, whereas real-world use requires actual data. Second, they take too long to run to generate real-time data. Third, they do not produce optimal results that can be used for rescue or evacuation guidance. To address these limitations, we suggest a method for building an evacuation simulation system that can be used in real-world emergency situations. The system performs numerous simulations in advance for varying distributions of occupants and stores the resulting data in a DBMS. Actual occupant data captured by an infrared sensor network are then compared with the simulation data in the DBMS, and the most closely matching result is retrieved and provided to the user. The developed system is tested on a campus building, and the suggested processes are illustrated.
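
The lookup step, matching a sensed occupant distribution against precomputed simulations, might look like the sketch below. It is only an illustration: the abstract does not say how closeness is measured, so squared Euclidean distance over per-zone counts is an assumption, an in-memory dictionary stands in for the DBMS, and the zone names and counts are invented.

```python
def squared_distance(observed, stored):
    """Squared Euclidean distance between two per-zone occupant counts."""
    zones = set(observed) | set(stored)
    return sum((observed.get(z, 0) - stored.get(z, 0)) ** 2 for z in zones)


def closest_scenario(observed, scenarios):
    """Return the name of the precomputed scenario nearest to the observation."""
    return min(scenarios, key=lambda name: squared_distance(observed, scenarios[name]))


# Hypothetical precomputed occupant distributions (stand-in for DBMS rows).
SCENARIOS = {
    "mostly_lecture_hall": {"lecture_hall": 40, "lab": 5, "corridor": 2},
    "evenly_spread": {"lecture_hall": 15, "lab": 15, "corridor": 15},
    "mostly_lab": {"lecture_hall": 3, "lab": 30, "corridor": 4},
}

# Hypothetical counts derived from the infrared sensor network.
observed = {"lecture_hall": 36, "lab": 8, "corridor": 1}
best = closest_scenario(observed, SCENARIOS)
```

Because the expensive simulations are run ahead of time, the work done during an emergency reduces to this nearest-match query, which is what makes a real-time response feasible.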

Normalized Digital Surface Model Extraction and Slope Parameter Determination through Region Growing of UAV Data (무인항공기 데이터의 영역 확장법 적용을 통한 정규수치표면모델 추출 및 경사도 파라미터 설정)

  • Yeom, Junho;Lee, Wonhee;Kim, Taeheon;Han, Youkyung
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography / v.37 no.6 / pp.499-506 / 2019
  • An NDSM (Normalized Digital Surface Model) is key information for the detailed analysis of remote sensing data. Although an NDSM can be obtained simply by subtracting a DTM (Digital Terrain Model) from a DSM (Digital Surface Model), with UAV (Unmanned Aerial Vehicle) data it is difficult to obtain an accurate DTM because high-resolution UAV data contain a large number of complex ground objects such as vegetation and urban structures. In this study, an RGB-based UAV vegetation index, ExG (Excess Green), was used to extract initial seed points with low ExG values for region growing, so that a DTM can be generated cost-effectively from high-resolution UAV data. Local window analysis was applied in this process to resolve erroneous seed point extraction from local low-ExG points. Using the DSM values of the seed points, region growing was applied to merge neighboring terrain pixels. Slope criteria were adopted for the region growing process, and seed points were determined to be terrain points when the segment size was larger than 0.25 m². Various slope criteria were tested to derive the optimal value for UAV data-based NDSM generation. Finally, the extracted terrain points were evaluated, and interpolation was performed using the terrain points to generate an NDSM. The proposed method was applied to an agricultural area to extract the above-ground heights of crops and to check the feasibility of agricultural monitoring.
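
Two quantities in this abstract have simple closed forms, sketched below. The ExG formula used here is the commonly cited definition on normalized chromatic coordinates (ExG = 2g - r - b); the abstract itself does not spell it out, so treat that as an assumption. The function names and the list-of-lists grid representation are also illustrative.

```python
def excess_green(red, green, blue):
    """Excess Green index on normalized chromatic coordinates: 2g - r - b."""
    total = red + green + blue
    if total == 0:
        return 0.0  # black pixel: chromatic coordinates are undefined
    r, g, b = red / total, green / total, blue / total
    return 2 * g - r - b


def ndsm(dsm, dtm):
    """Per-cell height above ground: NDSM = DSM - DTM (same-shaped grids)."""
    return [[surface - terrain for surface, terrain in zip(dsm_row, dtm_row)]
            for dsm_row, dtm_row in zip(dsm, dtm)]
```

Low ExG values indicate non-vegetated pixels, which is why they serve as candidate bare-ground seed points; once a DTM has been interpolated from the accepted terrain points, the subtraction yields crop or object heights.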

Comparison of Spatio-temporal Fusion Models of Multiple Satellite Images for Vegetation Monitoring (식생 모니터링을 위한 다중 위성영상의 시공간 융합 모델 비교)

  • Kim, Yeseul;Park, No-Wook
    • Korean Journal of Remote Sensing / v.35 no.6_3 / pp.1209-1219 / 2019
  • For consistent vegetation monitoring, it is necessary to generate time-series vegetation index datasets at fine temporal and spatial scales by fusing the complementary temporal and spatial characteristics of multiple satellite data. In this study, we quantitatively and qualitatively analyzed the prediction accuracy of time-series change information extracted from spatio-temporal fusion models of multiple satellite data for vegetation monitoring. We applied two spatio-temporal fusion models that have been widely employed for vegetation monitoring: the Spatial and Temporal Adaptive Reflectance Fusion Model (STARFM) and the Enhanced Spatial and Temporal Adaptive Reflectance Fusion Model (ESTARFM). To quantitatively evaluate the prediction accuracy, we first generated simulated data sets from MODIS data with fine temporal scales and then used them as inputs for the spatio-temporal fusion models. The comparative experiment showed that ESTARFM had better prediction performance than STARFM, but the prediction performance of both models degraded as the difference between the prediction date and the simultaneous acquisition date of the input data increased. This result indicates that multiple data acquired close to the prediction date should be used to improve the prediction accuracy. Considering the limited availability of optical images, it is necessary to develop an advanced spatio-temporal fusion model for vegetation monitoring that reflects the suggestions of this study.

Development of a Prediction Model for Advertising Effects of Celebrity Models using Big data Analysis (빅데이터 분석을 통한 유명인 모델의 광고효과 예측 모형 개발)

  • Kim, Yuna;Han, Sangpil
    • Journal of the Korea Convergence Society / v.11 no.8 / pp.99-106 / 2020
  • The purpose of this study is to determine whether image similarity between celebrities and brands on social network services is a determinant for predicting advertising effectiveness. To this end, an advertising effect prediction model for celebrity-endorsed advertising was created, and its validity was verified through a machine learning method, a big data analysis technique. First, the celebrity-brand image similarity, used as the independent variable, was quantified with social big data based on association network theory. Second, a multiple regression model with data representing advertising effects as the dependent variable was fitted repeatedly to generate the advertising effect prediction model. The accuracy of the prediction model was determined by comparing the prediction results with survey outcomes. As a result, the validity of the predictive modeling of advertising effects was confirmed, since it showed a classification accuracy of 75%, which is the criterion for judging validity. This study suggests a new methodological alternative and direction for big data-based modeling research through a celebrity-brand image similarity structure based on social network theory and effect prediction modeling by machine learning.

Local Wind Field Simulation over Coastal Areas Using Windprofiler Data (윈드프로파일러 자료를 이용한 연안 지역 국지 바람장 모의)

  • Kim, Min-Seong;Kim, Kwang-Ho;Kim, Park-Sa;Kang, Dong-Hwan;Kwon, Byung Hyuk
    • Journal of the Korean Society of Marine Environment & Safety / v.22 no.2 / pp.195-204 / 2016
  • In this paper, the applicability and usefulness of windprofiler input data for generating a three-dimensional wind field were investigated. The diagnostic model CALMET, run with windprofiler data at ten sites and with output from the weather forecasting model WRF, was evaluated by statistical comparison with radiosonde data at eight sites. The horizontal wind speed from CALMET simulated with hourly windprofiler data is in good agreement with radiosonde observations, with a root mean square error within 1.5 m/s, especially for local wind circulations such as the sea breeze over the coastal region. The root mean square error of wind direction, which ranged from $50^{\circ}$ to $70^{\circ}$, is due to wind direction errors from windprofiler data contaminated by ground clutter. Since the wind can be produced quickly and accurately at most altitudes with windprofiler data in CALMET, we expect the method presented in this study to be useful for monitoring the safety of the environment as well as the weather in the coastal zone.
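
The evaluation metric named above is the root mean square error. The sketch below computes it for wind speed and, separately, for wind direction. Wrapping direction differences into the range from -180° to 180° is a standard precaution rather than something the abstract states, and the function names are illustrative.

```python
import math


def rmse(predicted, observed):
    """Root mean square error between paired values (for example, speed in m/s)."""
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


def direction_error(predicted_deg, observed_deg):
    """Signed angular difference in degrees, wrapped into [-180, 180)."""
    return (predicted_deg - observed_deg + 180.0) % 360.0 - 180.0


def direction_rmse(predicted_deg, observed_deg):
    """RMSE of wind direction using wrapped angular differences."""
    errors = [direction_error(p, o) for p, o in zip(predicted_deg, observed_deg)]
    return math.sqrt(sum(e * e for e in errors) / len(errors))
```

Without the wrap, a model direction of 350° against an observed 10° would count as a 340° error instead of 20°, which would inflate the direction RMSE.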

Facial Expression Animation which Applies a Motion Data in the Vector based Caricature (벡터 기반 캐리커처에 모션 데이터를 적용한 얼굴 표정 애니메이션)

  • Kim, Sung-Ho
    • The Journal of the Korea Contents Association / v.10 no.5 / pp.90-98 / 2010
  • This paper describes a methodology that enables a user to generate facial expression animation for a vector-based caricature by applying facial motion data to it. The method was implemented as a plug-in for Illustrator and has its own separate user interface. The data used in the experiment were captured with Facial Tracker by attaching 28 small markers to the important muscle regions of an actor's face and recording a variety of expressions. The caricature was drawn as Bézier curves whose control points are placed at the locations corresponding to the important markers attached to the actor's face during motion capture, so that each control point links to the motion data of the same region. Because the spatial scale of the facial motion data differs in size from that of the caricature, the motion data went through a motion calibration process, which the user can adjust at any time. To connect the caricature to the markers, the user selects the name of each facial region from a menu and then clicks the corresponding region of the caricature. Finally, this paper uses the Illustrator user interface to make it possible to generate caricature facial expression animation that applies facial motion data to a vector-based caricature.
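
The link between markers and curve control points can be pictured with the sketch below. It assumes cubic Bézier segments (the abstract only says Bézier curves) and a single uniform scale factor for calibration; both are assumptions, and `move_control_point` is a hypothetical helper, not part of the plug-in.

```python
def cubic_bezier(p0, p1, p2, p3, t):
    """Evaluate a cubic Bezier curve at parameter t in [0, 1]."""
    u = 1.0 - t
    coefficients = (u ** 3, 3 * u * u * t, 3 * u * t * t, t ** 3)
    points = (p0, p1, p2, p3)
    x = sum(c * p[0] for c, p in zip(coefficients, points))
    y = sum(c * p[1] for c, p in zip(coefficients, points))
    return (x, y)


def move_control_point(control_point, marker_offset, scale):
    """Shift a control point by a marker displacement scaled to caricature size.

    `scale` stands in for the motion calibration step (an assumption).
    """
    return (control_point[0] + scale * marker_offset[0],
            control_point[1] + scale * marker_offset[1])
```

Animating an expression then amounts to moving each linked control point frame by frame and re-evaluating the curves, while the scale factor keeps captured displacements proportionate to the drawing.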

RSP-DS: Real Time Sequential Patterns Analysis in Data Streams (RSP-DS: 데이터 스트림에서의 실시간 순차 패턴 분석)

  • Shin Jae-Jyn;Kim Ho-Seok;Kim Kyoung-Bae;Bae Hae-Young
    • Journal of Korea Multimedia Society / v.9 no.9 / pp.1118-1130 / 2006
  • Existing pattern analysis algorithms for data stream environments have pursued performance improvement and effective memory usage. However, when new data streams arrive, these algorithms must analyze the patterns again and regenerate the pattern tree. This approach requires many calculations in real situations that need real-time pattern analysis. This paper proposes a method that continuously analyzes the patterns of incoming data streams in real time. The method analyzes patterns quickly and thereafter obtains real-time patterns by updating previously analyzed patterns. The incoming data streams are divided into several sequences using a time-based window, and information about the sequences is inserted into a hash table. When the number of sequences exceeds a predefined bound, patterns are analyzed from the hash table. The patterns form a pattern tree, and newly created patterns later update the pattern tree. In this way, real-time patterns are always maintained in the pattern tree. During pattern analysis, a new pattern and an existing pattern in the tree can share the same suffix; in that case, a pointer is created from the new pattern to the existing pattern. This reduces calculation time when analyzing duplicated patterns. Old patterns in the tree are deleted easily using a FIFO method. The advantage of our algorithm is demonstrated by a performance comparison with an existing method, MILE, under conditions in which patterns change continuously. We also examine how performance varies when several variables in the algorithm are changed.
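
The windowing, hash-table, and FIFO steps can be sketched roughly as follows. This is a deliberately reduced illustration, not RSP-DS itself: the pattern tree and suffix pointers are omitted, whole sequences are counted instead of mined sub-patterns, and every name and parameter (`bound`, `min_count`, `capacity`) is an assumption.

```python
from collections import Counter, deque


def split_by_window(events, window):
    """Group (timestamp, item) events into sequences by fixed time windows."""
    sequences = {}
    for timestamp, item in events:
        sequences.setdefault(timestamp // window, []).append(item)
    return [tuple(sequences[key]) for key in sorted(sequences)]


class StreamPatternTable:
    """Buffers sequences in a hash table and keeps recent patterns FIFO-style."""

    def __init__(self, bound, min_count, capacity):
        self.bound = bound          # analyze once more than `bound` sequences arrive
        self.min_count = min_count  # occurrences needed to count as a pattern
        self.table = Counter()      # hash table: sequence -> occurrence count
        self.buffered = 0
        self.patterns = deque(maxlen=capacity)  # oldest pattern is dropped first

    def add(self, sequence):
        self.table[sequence] += 1
        self.buffered += 1
        if self.buffered > self.bound:
            self._analyze()

    def _analyze(self):
        for sequence, count in self.table.items():
            if count >= self.min_count and sequence not in self.patterns:
                self.patterns.append(sequence)
        self.table.clear()
        self.buffered = 0
```

The point of the sketch is the incremental shape of the process: new sequences only touch the hash table, analysis runs in batches, and stale patterns fall out of the bounded store without a full rebuild.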
