• Title/Summary/Keyword: Research dataset


Construction and Effectiveness Evaluation of Multi Camera Dataset Specialized for Autonomous Driving in Domestic Road Environment (국내 도로 환경에 특화된 자율주행을 위한 멀티카메라 데이터 셋 구축 및 유효성 검증)

  • Lee, Jin-Hee;Lee, Jae-Keun;Park, Jaehyeong;Kim, Je-Seok;Kwon, Soon
    • IEMEK Journal of Embedded Systems and Applications / v.17 no.5 / pp.273-280 / 2022
  • Along with the advancement of deep learning technology, securing high-quality datasets for verifying newly developed technology has become an important issue, and many research groups are focusing on developing deep learning models that are robust to the domestic road environment. In particular, unlike expressways and automobile-only roads, complex city driving environments mix various dynamic objects such as motorbikes, electric kickboards, large buses and trucks, freight vehicles, pedestrians, and traffic lights. In this paper, we built a dataset containing these city-road objects through a multi-camera-based pipeline (collection, refinement, and annotation) and estimated its quality and validity using a YOLO-based object detection model. We then evaluated the dataset quantitatively by comparing it with a public dataset, and qualitatively by comparing it with experimental results obtained on an open platform. We generated our 2D dataset following the annotation rules of the KITTI/COCO datasets and compared its performance with the public dataset using the KITTI/COCO evaluation rules. Compared with the public dataset, our dataset showed roughly 3% to 53% higher performance, which validates its effectiveness.
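
KITTI/COCO-style detection evaluation rests on matching predicted boxes to ground-truth boxes by intersection over union (IoU). The Python sketch below shows only that core step, under simplifying assumptions: boxes are (x1, y1, x2, y2) tuples, detections are matched greedily in descending score order, and a single IoU threshold is used (0.5 here, chosen for illustration). The official KITTI and COCO toolkits add per-class thresholds, difficulty or area filters, and AP integration that are omitted here; the function names are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match_detections(detections, ground_truth, threshold=0.5):
    """Greedy matching, highest score first; each ground-truth box is used once.

    detections:   list of (score, box)
    ground_truth: list of boxes
    Returns (true_positives, false_positives, false_negatives).
    """
    used = set()
    tp = fp = 0
    for _score, box in sorted(detections, key=lambda d: d[0], reverse=True):
        best, best_iou = None, threshold
        for k, gt in enumerate(ground_truth):
            if k in used:
                continue
            overlap = iou(box, gt)
            if overlap >= best_iou:  # keep the best unmatched GT above threshold
                best, best_iou = k, overlap
        if best is None:
            fp += 1
        else:
            used.add(best)
            tp += 1
    return tp, fp, len(ground_truth) - len(used)
```

From the returned counts, precision is tp / (tp + fp) and recall is tp / (tp + fn); sweeping the score cutoff traces the precision-recall curve from which AP is computed.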

KMSAV: Korean multi-speaker spontaneous audiovisual dataset

  • Kiyoung Park;Changhan Oh;Sunghee Dong
    • ETRI Journal / v.46 no.1 / pp.71-81 / 2024
  • Recent advances in deep learning for speech and visual recognition have accelerated the development of multimodal speech recognition, yielding many innovative results. We introduce a Korean audiovisual speech recognition corpus. This dataset comprises approximately 150 h of manually transcribed and annotated audiovisual data, supplemented with an additional 2000 h of untranscribed videos collected from YouTube under the Creative Commons License. The dataset is intended to be freely accessible for unrestricted research purposes. Along with the corpus, we propose an open-source framework for automatic speech recognition (ASR) and audiovisual speech recognition (AVSR). We validate the effectiveness of the corpus with evaluations using state-of-the-art ASR and AVSR techniques, capitalizing on both pretrained models and fine-tuning. After fine-tuning, ASR and AVSR achieve character error rates of 11.1% and 18.9%, respectively. This difference in error rates highlights the need for improvement in AVSR techniques. We expect that our corpus will be an instrumental resource for improving AVSR.
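
The character error rates quoted above are, in the usual definition, the total Levenshtein edit distance (substitutions, deletions, insertions) between reference and hypothesis transcripts divided by the total number of reference characters. The Python sketch below computes that corpus-level quantity. It treats each precomposed Hangul syllable as one character; whether the KMSAV evaluation removes spaces or scores at another granularity is not stated in the abstract, so the `ignore_spaces` flag is an illustrative assumption.

```python
def levenshtein(ref, hyp):
    """Minimum number of substitutions, deletions, and insertions turning ref into hyp."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(prev[j] + 1,              # delete r
                           cur[j - 1] + 1,           # insert h
                           prev[j - 1] + (r != h)))  # substitute (or match)
        prev = cur
    return prev[-1]


def cer(references, hypotheses, ignore_spaces=False):
    """Corpus-level character error rate: total edits / total reference characters."""
    if ignore_spaces:  # assumption: optional space removal before scoring
        references = [r.replace(" ", "") for r in references]
        hypotheses = [h.replace(" ", "") for h in hypotheses]
    edits = sum(levenshtein(r, h) for r, h in zip(references, hypotheses))
    return edits / sum(len(r) for r in references)
```

Summing edits and reference lengths over the whole corpus, rather than averaging per-utterance rates, keeps short utterances from dominating the score.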

Towards Texture-Based Visualization of Multivariate Dataset

  • Mehmood, Raja Majid;Lee, Hyo Jong
    • Proceedings of the Korea Information Processing Society Conference / 2014.04a / pp.582-585 / 2014
  • Visualization is a science that makes the invisible visible through the techniques of experimental and computer-aided visualization. This paper presents practical aspects of visualizing multivariate datasets. We briefly discuss previous research and introduce a new visualization technique that will help us design and develop a tool for the experimental visualization of multivariate datasets. The newly developed visualization tool can be used in various domains; here, we chose the software industry as the application domain and used a multivariate dataset of software components computed by VizzMaintenance, a software analysis tool that provides multiple software metrics for open-source Java programs. The main objective of this research is to develop a new visualization tool for large multivariate datasets that is more efficient and easier for viewers to perceive. Because perception is central to this work, we decided to test the perception level of the proposed visualization approach with researchers in our lab.

Development of Korean Medicine Data Center(KDC) Teaching Dataset to Enhance Utilization of KDC (한의임상정보은행 활용도 제고를 위한 교육용 데이터 개발)

  • Baek, Younghwa;Lee, Siwoo
    • Journal of Sasang Constitutional Medicine / v.29 no.3 / pp.242-247 / 2017
  • Objective: The Korean Medicine Data Center (KDC) has established large-scale biological and clinical data based on Korean medicine to demonstrate and validate its theory. The aim of this study was to develop a KDC teaching dataset and user guideline to improve utilization of the KDC. Method: The KDC teaching dataset was selected by stratified random sampling according to Sasang constitution (SC). The dataset includes 72 variables for 500 sampled subjects. The user guideline describes how to conduct eight statistical analysis methods using the teaching dataset. Results: The KDC teaching dataset was sampled as 200 (40%) Taeeumin, 125 (25%) Soeumin, and 175 (35%) Soyangin subjects. It consists of questionnaire data (basic, habit, disease, symptom), physical examination data (body measurements, blood pressure), blood test results, and experts' SC diagnoses. The user guideline provides step-by-step instructions for performing several statistical analyses with the KDC teaching dataset. Conclusion: We hope that our results will contribute to enhancing the utilization and understanding of the KDC.
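
Stratified random sampling, as used for the KDC teaching dataset, groups the population by stratum (here, Sasang constitution) and draws a simple random sample without replacement inside each stratum. The Python sketch below takes the per-stratum counts reported in the abstract as an explicit allocation; the population records, the `stratified_sample` function, and the seed are hypothetical illustrations, not the KDC procedure itself.

```python
import random


def stratified_sample(subjects, stratum_of, allocation, seed=None):
    """Draw a stratified random sample without replacement.

    subjects   -- list of subject records (any hashable objects)
    stratum_of -- function mapping a subject to its stratum label
    allocation -- dict mapping stratum label -> number of subjects to draw
    """
    rng = random.Random(seed)
    # Group the population by stratum.
    strata = {}
    for s in subjects:
        strata.setdefault(stratum_of(s), []).append(s)
    sample = []
    for label, k in allocation.items():
        pool = strata.get(label, [])
        if k > len(pool):
            raise ValueError(f"stratum {label!r} has {len(pool)} subjects, need {k}")
        # Simple random sampling within the stratum.
        sample.extend(rng.sample(pool, k))
    return sample


# Allocation taken from the abstract: 200 + 125 + 175 = 500 subjects.
ALLOCATION = {"Taeeumin": 200, "Soeumin": 125, "Soyangin": 175}
```

Passing the counts explicitly keeps the sketch neutral about how the 40/25/35% split was chosen (for example, proportional to the full KDC population), which the abstract does not specify.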

Real-world multimodal lifelog dataset for human behavior study

  • Chung, Seungeun;Jeong, Chi Yoon;Lim, Jeong Mook;Lim, Jiyoun;Noh, Kyoung Ju;Kim, Gague;Jeong, Hyuntae
    • ETRI Journal / v.44 no.3 / pp.426-437 / 2022
  • To understand the multilateral characteristics of human behavior and physiological markers related to physical, emotional, and environmental states, extensive lifelog data collection in a real-world environment is essential. Here, we propose a data collection method using multimodal mobile sensing and present a long-term dataset from 22 subjects and 616 days of experimental sessions. The dataset contains over 10 000 hours of data, including physiological data such as photoplethysmography, electrodermal activity, and skin temperature, in addition to multivariate behavioral data. Furthermore, it includes 10 372 user labels of emotional states and 590 days of sleep quality data. To demonstrate feasibility, we applied human activity recognition to the sensor data using a convolutional neural network-based deep learning model, achieving 92.78% recognition accuracy. From the activity recognition results, we extracted daily behavior patterns and discovered five representative models by applying spectral clustering. This demonstrates that the dataset contributes to understanding human behavior using multimodal data accumulated throughout daily life under natural conditions.

Generation of Super-Resolution Benchmark Dataset for Compact Advanced Satellite 500 Imagery and Proof of Concept Results

  • Yonghyun Kim;Jisang Park;Daesub Yoon
    • Korean Journal of Remote Sensing / v.39 no.4 / pp.459-466 / 2023
  • Over the last decade, the dramatic advancement of artificial intelligence, driven by the development of various deep learning techniques, has contributed significantly to remote sensing and satellite image applications. Among many prominent areas, super-resolution research has grown substantially with the release of several benchmark datasets and the rise of generative adversarial network-based studies. However, most previously published remote sensing benchmark datasets have spatial resolutions of approximately 10 m, which limits their direct use for super-resolving small objects at centimeter-level spatial resolution. Furthermore, if a dataset lacks a global spatial distribution and is specialized in particular land-cover types, the resulting lack of feature diversity can directly degrade quantitative performance and prevent the formation of robust foundation models. To overcome these issues, this paper proposes a method to generate benchmark datasets by simulating the modulation transfer function (MTF) of the sensor. The approach builds on a simulation method with a solid theoretical foundation that is well recognized in image fusion. Additionally, we apply the generated benchmark dataset to state-of-the-art super-resolution base models for quantitative and visual analysis and discuss the shortcomings of existing datasets. Through these efforts, we anticipate that the proposed benchmark dataset will soon facilitate a range of super-resolution research in Korea.
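
A common way to simulate a sensor MTF in image fusion is to blur the high-resolution image with a Gaussian whose frequency response at the low-resolution Nyquist frequency equals the sensor's MTF value there, and then decimate (in the spirit of Wald's protocol). For a Gaussian with standard deviation σ (in high-resolution pixels), MTF(f) = exp(-2π²σ²f²); setting f = 1/(2r) for a resolution ratio r gives σ = (r/π)·sqrt(-2 ln MTF_N). The Python sketch below implements only that generic recipe. The Gaussian MTF shape, the MTF_N value, edge replication, and all function names are assumptions; the abstract does not give the CAS500 sensor parameters or the authors' exact pipeline.

```python
import math


def gaussian_sigma_from_mtf(mtf_at_nyquist, ratio):
    """Std. dev. (HR pixels) of a Gaussian PSF whose MTF equals mtf_at_nyquist
    at the Nyquist frequency of the low-res grid, f = 1 / (2 * ratio)."""
    if not 0.0 < mtf_at_nyquist < 1.0:
        raise ValueError("MTF at Nyquist must be in (0, 1)")
    return ratio * math.sqrt(-2.0 * math.log(mtf_at_nyquist)) / math.pi


def gaussian_kernel(sigma, radius=None):
    """Normalized 1-D Gaussian kernel; the 2-D blur is applied separably."""
    if radius is None:
        radius = max(1, int(math.ceil(3 * sigma)))
    weights = [math.exp(-(x * x) / (2 * sigma * sigma)) for x in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_1d(row, kernel):
    """Convolve one row with a symmetric kernel, replicating edge pixels."""
    r, n = len(kernel) // 2, len(row)
    return [sum(kernel[j] * row[min(max(i + j - r, 0), n - 1)] for j in range(len(kernel)))
            for i in range(n)]


def degrade(image, mtf_at_nyquist, ratio):
    """Simulated low-res image: MTF-matched Gaussian blur, then decimation by ratio."""
    k = gaussian_kernel(gaussian_sigma_from_mtf(mtf_at_nyquist, ratio))
    rows = [blur_1d(row, k) for row in image]             # blur along x
    cols = [blur_1d(list(c), k) for c in zip(*rows)]      # blur along y
    blurred = [list(r) for r in zip(*cols)]               # back to row-major
    return [row[::ratio] for row in blurred[::ratio]]
```

The high-resolution original and its `degrade` output then form one (target, input) pair of a super-resolution benchmark.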

ETLi: Efficiently annotated traffic LiDAR dataset using incremental and suggestive annotation

  • Kang, Jungyu;Han, Seung-Jun;Kim, Nahyeon;Min, Kyoung-Wook
    • ETRI Journal / v.43 no.4 / pp.630-639 / 2021
  • Autonomous driving requires computerized perception of the environment for safety and machine-learning evaluation. Recognizing semantic information is difficult because the objective is to instantly recognize and distinguish items in the environment. Training a model with real-time semantic capability and high reliability requires extensive, specialized datasets. However, such datasets are not generally available and are typically difficult to construct for specific tasks. Hence, we propose a light detection and ranging (LiDAR) semantic dataset suitable for semantic simultaneous localization and mapping (SLAM) and specialized for autonomous driving. The dataset is provided in a form that users familiar with existing two-dimensional image datasets can easily use, and it covers various weather and lighting conditions collected in complex and diverse real-world settings. An incremental and suggestive annotation routine is proposed to improve annotation efficiency: a model is trained to simultaneously predict segmentation labels and suggest class-representative frames. Experimental results demonstrate that the proposed algorithm yields a more efficient dataset than uniformly sampled datasets.

Comparative Assessment of Typical Year Dataset based on POA Irradiance (태양광 패널 일사량에 기반한 대표연도 데이터 비교 평가)

  • Changyeol Yun;Boyoung Kim;Changki Kim;Hyungoo Kim;Yongheack Kang;Yongil Kim
    • New & Renewable Energy / v.20 no.1 / pp.102-109 / 2024
  • The Typical Meteorological Year (TMY) dataset compiles the 12 months of data that best represent long-term climate patterns, focusing on global horizontal irradiance and other weather-related variables. However, irradiance measured on the plane of the array (POA) has distribution characteristics distinct from those of the irradiance in the TMY dataset, which may introduce bias. We recalculated POA irradiance using both the isotropic and DIRINT models, generating an updated dataset tailored to POA characteristics. Our analysis showed a 28% change in the selection of typical meteorological months, an 8% increase in average irradiance, and a 40% reduction in the range of irradiance values, indicating a significant shift in irradiance distribution patterns. This research aims to inform stakeholders about the accurate use of TMY datasets in decision-making. These findings underscore the need to create typical datasets from time series of POA irradiance, which reflects the orientation in which PV panels will be deployed.
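
The isotropic sky model transposes horizontal irradiance components onto a tilted plane as POA = DNI·cos(AOI) + DHI·(1 + cos β)/2 + GHI·ρ·(1 − cos β)/2, where β is the panel tilt, ρ the ground albedo, and AOI the angle between the sun and the panel normal. The Python sketch below implements that textbook form. It assumes DNI and DHI are already available; in the workflow described above they would come from a decomposition model such as DIRINT, which is not reproduced here. The albedo default of 0.2 and the function names are illustrative assumptions (pvlib's `pvlib.irradiance` module offers maintained implementations of both models).

```python
import math


def angle_of_incidence(zenith_deg, sun_azimuth_deg, tilt_deg, surface_azimuth_deg):
    """Angle (degrees) between the sun vector and the panel normal."""
    z, b = math.radians(zenith_deg), math.radians(tilt_deg)
    daz = math.radians(sun_azimuth_deg - surface_azimuth_deg)
    cos_aoi = math.cos(z) * math.cos(b) + math.sin(z) * math.sin(b) * math.cos(daz)
    cos_aoi = max(-1.0, min(1.0, cos_aoi))  # guard acos against rounding
    return math.degrees(math.acos(cos_aoi))


def poa_isotropic(dni, dhi, ghi, aoi_deg, tilt_deg, albedo=0.2):
    """Plane-of-array irradiance (W/m^2) with the isotropic sky model."""
    tilt = math.radians(tilt_deg)
    # Beam component; zero when the sun is behind the panel.
    beam = max(0.0, dni * math.cos(math.radians(aoi_deg)))
    # Sky diffuse seen by the tilted plane (isotropic assumption).
    sky = dhi * (1.0 + math.cos(tilt)) / 2.0
    # Ground-reflected component.
    ground = ghi * albedo * (1.0 - math.cos(tilt)) / 2.0
    return beam + sky + ground
```

For a horizontal panel the formula collapses to DNI·cos(zenith) + DHI, i.e. GHI, which is a quick consistency check on the inputs.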

Knowledge Model for Disaster Dataset Navigation

  • Hwang, Yun-Young;Yuk, Jin-Hee;Shin, Sumi
    • Journal of Information Science Theory and Practice / v.9 no.4 / pp.35-49 / 2021
  • When many diverse datasets exist, an efficient method is needed to provide users with the datasets they require. To meet this need, necessary datasets should be selected on the basis of the relationships between datasets. In particular, discovering the datasets needed for disaster resolution requires considering the stage of disaster resolution. In this paper, to provide the necessary datasets for each stage of disaster resolution, we constructed a disaster type and disaster management process ontology and designed a method to determine the necessary datasets for each disaster type and each step of the disaster management process. We also introduce a method to determine relationships between datasets needed for disaster response. We propose a method for discovering datasets based on minimal relationships such as "isA," "sameAs," and "subclassOf." To discover suitable datasets, we designed a knowledge exploration model and collected 651 disaster-related datasets to improve our method. These datasets were categorized by disaster type from the perspective of disaster management. When actual datasets are categorized by disaster type and disaster management type, a single dataset can be classified under multiple types in both categories. We built a knowledge exploration model based on disaster examples to verify the configuration of our model.
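
Discovery over "isA," "sameAs," and "subclassOf" relationships can be pictured as graph traversal: a dataset tagged with a narrow concept (for example, a river flood) should also be found when a user asks for the broader concept (flood), and concepts linked by sameAs should be interchangeable. The Python sketch below shows one such traversal over (subject, predicate, object) triples. The example concepts, dataset names, the use of isA to attach datasets to disaster types and management stages, and all function names are hypothetical; the paper's actual ontology and knowledge exploration model are not given in the abstract.

```python
from collections import defaultdict, deque


def build_index(triples):
    """Index (subject, predicate) -> set of objects; sameAs is stored both ways."""
    index = defaultdict(set)
    for s, p, o in triples:
        index[(s, p)].add(o)
        if p == "sameAs":  # sameAs is symmetric
            index[(o, p)].add(s)
    return index


def expand(concept, index):
    """Concepts reachable from `concept` by following subclassOf upward and sameAs."""
    seen, queue = {concept}, deque([concept])
    while queue:
        current = queue.popleft()
        for predicate in ("subclassOf", "sameAs"):
            for nxt in index[(current, predicate)]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return seen


def find_datasets(triples, *required):
    """Datasets whose isA tags, once expanded, cover every required concept."""
    index = build_index(triples)
    hits = []
    for d in sorted({s for s, p, _ in triples if p == "isA"}):
        covered = set()
        for tag in index[(d, "isA")]:
            covered |= expand(tag, index)
        if all(c in covered for c in required):
            hits.append(d)
    return hits


# Hypothetical mini-ontology: disaster types, one management stage, four datasets.
TRIPLES = [
    ("RiverFlood", "subclassOf", "Flood"),
    ("Flood", "subclassOf", "NaturalDisaster"),
    ("Earthquake", "subclassOf", "NaturalDisaster"),
    ("Inundation", "sameAs", "Flood"),
    ("Evacuation", "subclassOf", "Response"),
    ("ds_river_gauges", "isA", "RiverFlood"),
    ("ds_shelters", "isA", "Flood"),
    ("ds_shelters", "isA", "Evacuation"),
    ("ds_inundation_map", "isA", "Inundation"),
    ("ds_quake_map", "isA", "Earthquake"),
]
```

Because one dataset may carry several isA tags, it can match queries for multiple disaster types and management stages, mirroring the multi-category classification described above.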

Derivation of Typical Meteorological Year of Daejeon from Satellite-Based Solar Irradiance (위성영상 기반 일사량을 활용한 대전지역 표준기상년 데이터 생산)

  • Kim, Chang Ki;Kim, Shin-Young;Kim, Hyun-Goo;Kang, Yong-Heack;Yun, Chang-Yeol
    • Journal of the Korean Solar Energy Society / v.38 no.6 / pp.27-36 / 2018
  • A Typical Meteorological Year (TMY) dataset is necessary for renewable energy feasibility studies. Since the National Renewable Energy Laboratory built a TMY dataset in 1978, gridded datasets derived from numerical weather prediction or satellite imagery have also been employed to produce TMY datasets. In general, a TMY dataset is generated from long-term in-situ observations. However, solar insolation is not usually measured at synoptic observing stations, which limits the construction of TMY datasets from in-situ observations alone. As an alternative, this study builds a TMY dataset from satellite-derived solar insolation and evaluates it at the Daejeon ground station. Solar irradiance is underestimated when satellite imagery is employed.
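
TMY construction commonly selects, for each calendar month, the candidate year whose daily values are distributed most like the long-term record, scoring the match with the Finkelstein-Schafer (FS) statistic: the mean absolute difference between the candidate month's empirical CDF and the long-term CDF, evaluated at the candidate's daily values. The Python sketch below applies that criterion to a single variable (daily irradiation). The abstract does not describe the authors' selection procedure, and operational TMY methods weight FS over several variables and add persistence checks, so this is a simplified illustration with hypothetical function names.

```python
from bisect import bisect_right


def ecdf(sample):
    """Empirical CDF of `sample`, returned as a function."""
    xs = sorted(sample)
    n = len(xs)
    return lambda x: bisect_right(xs, x) / n


def fs_statistic(candidate, long_term):
    """Finkelstein-Schafer statistic: mean |CDF_candidate - CDF_long_term|
    evaluated at each daily value of the candidate month."""
    f_cand, f_long = ecdf(candidate), ecdf(long_term)
    return sum(abs(f_cand(x) - f_long(x)) for x in candidate) / len(candidate)


def pick_typical_month(daily_by_year):
    """daily_by_year: {year: [daily values for one calendar month]}.
    Returns the year whose month has the lowest FS against the pooled record."""
    pooled = [v for values in daily_by_year.values() for v in values]
    return min(daily_by_year, key=lambda y: fs_statistic(daily_by_year[y], pooled))
```

Repeating `pick_typical_month` for January through December and concatenating the chosen months yields the 12-month typical year; with satellite-derived inputs, any systematic underestimation carries straight into the selected months.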