• Title/Summary/Keyword: Research dataset

Effects of the External Knowledge Search and Utilization Activities of SMEs on Market Expansion (중소기업의 외부지식 탐색·활용 정도가 신규시장 확대에 미치는 영향)

  • Jung, Jee-Young;Roh, Tae-Woo;Han, Yoo-Jin
    • Knowledge Management Research / v.16 no.1 / pp.243-254 / 2015
  • To increase their market share and sustain growth, small and medium-sized enterprises (SMEs) need to expand their markets. Although various factors may influence an SME's effort to cultivate a new market, this research focuses on activities related to the search for and utilization of external knowledge. A Tobit analysis of 959 Korean SMEs included in the 2010 Korean Innovation Survey shows that external knowledge search and utilization activities positively affect the market expansion of SMEs. This result has two implications: (1) SMEs should actively search for appropriate external knowledge sources with which they can expand their markets and reduce their dependence on internal R&D activities; and (2) they should implement an efficient corporate system to absorb and utilize external knowledge inside the firm. A limitation of this research is that it relies on a cross-sectional dataset; the analysis could be extended by incorporating datasets from earlier and later survey periods.

S2-Net: Machine reading comprehension with SRU-based self-matching networks

  • Park, Cheoneum;Lee, Changki;Hong, Lynn;Hwang, Yigyu;Yoo, Taejoon;Jang, Jaeyong;Hong, Yunki;Bae, Kyung-Hoon;Kim, Hyun-Ki
    • ETRI Journal / v.41 no.3 / pp.371-382 / 2019
  • Machine reading comprehension is the task of understanding a given context and finding the correct answer within that context. A simple recurrent unit (SRU), like a gated recurrent unit (GRU) or long short-term memory (LSTM), uses neural gates to address the vanishing gradient problem in recurrent neural networks (RNNs); in addition, it removes the previous hidden state from the gate input, which makes it faster than GRU and LSTM. A self-matching network, as used in R-Net, can have an effect similar to coreference resolution because it gathers context information of similar meaning by calculating attention weights over its own RNN sequence. In this paper, we construct a dataset for Korean machine reading comprehension and propose the $S^2$-Net model, which adds a self-matching layer to an encoder RNN built from multilayer SRUs. Experimental results show that the proposed $S^2$-Net model achieves 68.82% EM and 81.25% F1 (single) and 70.81% EM and 82.48% F1 (ensemble) on the Korean machine reading comprehension test dataset, and 71.30% EM and 80.37% F1 (single) and 73.29% EM and 81.54% F1 (ensemble) on the SQuAD dev dataset.
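For readers unfamiliar with the EM and F1 scores reported in the abstract above, the following is a minimal sketch of how exact match and token-level F1 are commonly computed for SQuAD-style answers. The normalization rules (lowercasing, dropping punctuation and English articles, whitespace tokenization) are a simplifying assumption here; the abstract does not say how the authors tokenized Korean answers, and the function names are illustrative only.

```python
import re
import string
from collections import Counter


def normalize(text):
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, reference):
    """Return 1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))


def f1_score(prediction, reference):
    """Token-level F1 between a predicted answer and a reference answer."""
    pred_tokens = normalize(prediction).split()
    ref_tokens = normalize(reference).split()
    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

Dataset-level EM and F1 are then typically the averages of these per-question scores, taking the best score over the available reference answers for each question.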

KOMUChat: Korean Online Community Dialogue Dataset for AI Learning (KOMUChat : 인공지능 학습을 위한 온라인 커뮤니티 대화 데이터셋 연구)

  • YongSang Yoo;MinHwa Jung;SeungMin Lee;Min Song
    • Journal of Intelligence and Information Systems / v.29 no.2 / pp.219-240 / 2023
  • Conversational AI that users find satisfying to interact with is a long-standing research topic. Developing conversational AI requires training data that reflects real conversations between people, but existing Korean datasets either are not in question-answer format or use honorifics, which makes it difficult for users to feel a sense of closeness. In this paper, we propose a conversation dataset (KOMUChat) consisting of 30,767 question-answer sentence pairs collected from online communities. The question-answer pairs were collected from post titles and first comments on love and relationship counseling boards used by men and women. In addition, we removed abusive records through automatic and manual cleansing to build a high-quality dataset. To verify the validity of KOMUChat, we compared generative language models trained on KOMUChat and on a benchmark dataset. The results showed that our dataset outperformed the benchmark dataset in terms of answer appropriateness, user satisfaction, and fulfillment of conversational AI goals. The dataset is the largest open-source single-turn text dataset presented so far, and it is significant in that it provides a friendlier Korean dataset by reflecting the text styles of online communities.

Derivation of Driving Stability Indicators for Autonomous Vehicles Based on Analyzing Waymo Open Dataset (Waymo Open Dataset 기반 자율차의 주행행태분석을 통한 주행안정성 평가지표 도출)

  • Hoyoon Lee;Jeonghoon Jee;Cheol Oh;Hoseon Kim
    • The Journal of The Korea Institute of Intelligent Transport Systems / v.23 no.4 / pp.94-109 / 2024
  • As autonomous vehicles are allowed to drive on public roads, an increasing amount of on-road data is available for research, making it possible to analyze the impact of autonomous vehicles on traffic safety using real-world data. Understanding their implications for traffic safety requires indicators that represent the driving behavior of autonomous vehicles well. This study aims to derive indicators that effectively reflect the driving stability of autonomous vehicles by analyzing driving behavior in the Waymo Open Dataset. Principal component analysis was adopted to derive indicators with high explanatory capability for the dataset. Driving stability indicators were separated into longitudinal and lateral ones. The road segments in the dataset were divided into four types based on their characteristics: signalized intersections, unsignalized intersections, tangent road sections, and curved road sections. Longitudinal driving stability was 35.48% higher in the curved road sections than in the unsignalized intersections. Lateral driving stability was 76.08% higher in the signalized intersections than in the unsignalized intersections. In the comparison between curved and tangent road sections, lateral driving stability was 146.87% higher on tangent roads. The results of this study are valuable for further research analyzing the impact of autonomous vehicles on traffic safety using real-world data.

A comparison of deep-learning models to the forecast of the daily solar flare occurrence using various solar images

  • Shin, Seulki;Moon, Yong-Jae;Chu, Hyoungseok
    • The Bulletin of The Korean Astronomical Society / v.42 no.2 / pp.61.1-61.1 / 2017
  • As deep-learning methods have succeeded in various fields, they have high potential for application to space weather forecasting. The convolutional neural network, one such deep-learning method, is specialized for image recognition. In this study, we apply the AlexNet architecture, the winner of the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) 2012, to the forecast of daily solar flare occurrence using the MatConvNet software for MATLAB. Our input images are SOHO/MDI, EIT 195 Å, and 304 Å images from January 1996 to December 2010, and the output is a yes or no for flare occurrence. We also consider alternative inputs consisting of the last two images and their difference image. The training dataset covers Jan 1996 to Dec 2000 and Jan 2003 to Dec 2008. The testing dataset covers Jan 2001 to Dec 2002 and Jan 2009 to Dec 2010 in order to account for the solar cycle effect. From the training dataset, we randomly select one fifth of the data as a validation dataset to avoid over-fitting. Our model forecasts flare occurrence with a probability of detection (POD) of about 0.90 for common flares (C-, M-, and X-class). While the POD for major flares (M- and X-class) is 0.96, the false alarm rate (FAR) is also relatively high (0.60). We also present several statistical parameters such as the critical success index (CSI) and true skill statistic (TSS). None of the statistical parameters depends strongly on the number of input datasets. Our model can be applied immediately to an automatic forecasting service when image data are available.
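The verification scores named in the abstract above (POD, FAR, CSI, TSS) are all derived from a 2x2 contingency table of yes/no forecasts against observations. The sketch below uses the conventional forecast-verification definitions and treats FAR as the false alarm ratio, false alarms / (hits + false alarms); that choice is an assumption, since the abstract does not give its formula. The function name and the counts in the usage note are hypothetical and are not taken from the paper.

```python
def skill_scores(hits, misses, false_alarms, correct_negatives):
    """Compute common yes/no forecast verification scores.

    hits:              event forecast and observed
    misses:            event observed but not forecast
    false_alarms:      event forecast but not observed
    correct_negatives: event neither forecast nor observed
    """
    def ratio(numerator, denominator):
        # Return NaN instead of raising when a category is empty.
        return numerator / denominator if denominator else float("nan")

    pod = ratio(hits, hits + misses)                  # probability of detection
    far = ratio(false_alarms, hits + false_alarms)    # false alarm ratio (assumed)
    csi = ratio(hits, hits + misses + false_alarms)   # critical success index
    pofd = ratio(false_alarms, false_alarms + correct_negatives)
    tss = pod - pofd                                  # true skill statistic
    return {"POD": pod, "FAR": far, "CSI": csi, "TSS": tss}
```

For example, `skill_scores(90, 10, 60, 140)` describes a hypothetical forecaster that detects most events but also raises many false alarms, the same qualitative trade-off the abstract reports for major flares.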

Applicability Analysis of FAO56 Penman-Monteith Methodology for Estimating Potential Evapotranspiration in Andong Dam Watershed Using Limited Meteorological Data (제한적인 기상자료 조건에서의 잠재증발산량 추정을 위한 FAO56 Penman-Monteith 방법의 적용성 분석 - 안동댐 유역을 사례로 -)

  • Kim, Sea Jin;Kim, Moon-il;Lim, Chul-Hee;Lee, Woo-Kyun;Kim, Baek-Jo
    • Journal of Climate Change Research / v.8 no.2 / pp.125-143 / 2017
  • This study estimates the potential evapotranspiration at 10 weather observing systems in the Andong Dam watershed with the FAO56 Penman-Monteith (FAO56 PM) methodology, using meteorological data from 2013 to 2014. In addition, assuming that solar radiation, humidity, or wind speed data are unavailable, the potential evapotranspiration was estimated by FAO56 PM and the results were evaluated to discuss whether the methodology is applicable when the complete meteorological dataset is not available. The potential evapotranspiration was then estimated with the Hargreaves method and compared with that estimated by FAO56 PM using only the temperature dataset. To compare the potential evapotranspiration estimated from the complete meteorological dataset with that estimated from the limited dataset, statistical analysis was performed using the root mean square error (RMSE), the mean bias error (MBE), the mean absolute error (MAE), and the coefficient of determination ($R^2$). The inverse distance weighted (IDW) method was also used to conduct spatial analysis. The results show that, even when meteorological data are limited, FAO56 PM calculates potential evapotranspiration with relatively high accuracy by estimating the missing meteorological data.
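The four comparison statistics listed in the abstract above can be sketched as follows for two paired series, for example evapotranspiration from the complete dataset (reference) and from a limited dataset (estimate). Two conventions here are assumptions rather than details from the paper: the bias is taken as estimate minus reference, and $R^2$ is computed as the squared Pearson correlation. The function name is illustrative.

```python
import math


def error_metrics(reference, estimate):
    """Return RMSE, MBE, MAE and R^2 for two equally long, non-empty series."""
    if len(reference) != len(estimate) or not reference:
        raise ValueError("series must be non-empty and of equal length")
    n = len(reference)
    errors = [e - r for r, e in zip(reference, estimate)]  # estimate - reference

    rmse = math.sqrt(sum(d * d for d in errors) / n)
    mbe = sum(errors) / n              # positive means overestimation
    mae = sum(abs(d) for d in errors) / n

    # R^2 as the squared Pearson correlation between the two series (assumed).
    mean_r = sum(reference) / n
    mean_e = sum(estimate) / n
    cov = sum((r - mean_r) * (e - mean_e) for r, e in zip(reference, estimate))
    var_r = sum((r - mean_r) ** 2 for r in reference)
    var_e = sum((e - mean_e) ** 2 for e in estimate)
    r2 = cov * cov / (var_r * var_e) if var_r and var_e else float("nan")

    return {"RMSE": rmse, "MBE": mbe, "MAE": mae, "R2": r2}
```

Note that a constant offset between the two series leaves $R^2$ at 1 while RMSE, MBE, and MAE all report the offset, which is one reason to report the error measures together.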

Investigation of Research Topic and Trends of National ICT Research-Development Using the LDA Model (LDA 토픽모델링을 통한 ICT분야 국가연구개발사업의 주요 연구토픽 및 동향 탐색)

  • Woo, Chang Woo;Lee, Jong Yun
    • Journal of the Korea Convergence Society / v.11 no.7 / pp.9-18 / 2020
  • This research investigates the main research topics and trends in the information and communication technology (ICT) field in Korea using latent Dirichlet allocation (LDA), one of the topic modeling techniques. The experimental dataset of 5,200 ICT research and development (R&D) projects was obtained by downloading the R&D project dataset for the most recent five years from the National Science and Technology Information Service (NTIS) and matching it with the EZone system of IITP. We found that the majority of research topics concerned intelligent information technologies such as AI, big data, and IoT, and that the main research trend was hyper-realistic media. The results of topic modeling on the national R&D foundation dataset are expected to provide useful information for establishing plans and strategies for future research and development in the ICT field.

Statistical Blade Angular Velocity Information-based Wind Turbine Fault Diagnosis Monitoring System (블레이드 각속도 통계 정보 기반 풍력 발전기 고장 진단 모니터링 시스템)

  • Kim, Byoungjin;Kang, Suk-Ju;Park, Joon-Young
    • KEPCO Journal on Electric Power and Energy / v.2 no.4 / pp.619-625 / 2016
  • In this paper, we propose a new fault diagnosis monitoring system for wind turbine blades based on angular velocity calculated from a gyro sensor. First, the proposed system generates an angular velocity dataset for the rotation speed of a normal blade. Using this dataset, we estimate and evaluate the state of the wind turbine blades by comparing the current state with the pre-calculated normal state. In the experimental results, the angular velocity in the normal state was higher than $360^{\circ}/s$, while that of the damaged blades was lower than $360^{\circ}/s$ and the standard deviation of the angular velocity increased significantly.
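The comparison against a pre-calculated normal state described in the abstract above could look roughly like the sketch below. It is not the authors' system: the window-based check, the `std_ratio` factor, the rule that either condition alone raises a flag, and all sample values are assumptions made for illustration. Only the 360°/s reference speed comes from the abstract.

```python
from statistics import mean, pstdev


def normal_baseline(normal_samples):
    """Summarize angular velocities (deg/s) recorded from a healthy blade."""
    return mean(normal_samples), pstdev(normal_samples)


def diagnose(window, normal_std, speed_floor=360.0, std_ratio=2.0):
    """Compare a window of current readings with the normal baseline.

    speed_floor: mean speed below this value is treated as suspicious
                 (360 deg/s is the figure quoted in the abstract).
    std_ratio:   hypothetical factor; a spread larger than
                 std_ratio * normal_std counts as "significantly increased".
    """
    low_speed = mean(window) < speed_floor
    unstable = pstdev(window) > std_ratio * normal_std
    return {
        "low_speed": low_speed,
        "unstable": unstable,
        # Assumption: either symptom alone is enough to raise a flag.
        "fault_suspected": low_speed or unstable,
    }
```

A typical use would be to call `normal_baseline` once on the healthy-blade recordings and then pass the resulting standard deviation to `diagnose` for each new window of gyro readings.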

Construction of a Video Dataset for Face Tracking Benchmarking Using a Ground Truth Generation Tool

  • Do, Luu Ngoc;Yang, Hyung Jeong;Kim, Soo Hyung;Lee, Guee Sang;Na, In Seop;Kim, Sun Hee
    • International Journal of Contents / v.10 no.1 / pp.1-11 / 2014
  • In the current generation of smart mobile devices, object tracking is one of the most important research topics for computer vision. Because human face tracking can be widely used for many applications, collecting a dataset of face videos is necessary for evaluating the performance of a tracker and for comparing different approaches. Unfortunately, the well-known benchmark datasets of face videos are not sufficiently diverse. As a result, it is difficult to compare the accuracy of different tracking algorithms under various conditions, namely illumination, background complexity, and subject movement. In this paper, we propose a new dataset that includes 91 face video clips that were recorded in different conditions. We also provide a semi-automatic ground-truth generation tool that can easily be used to evaluate the performance of face tracking systems. This tool helps to maintain the consistency of the definitions for the ground truth in each frame. The resulting video dataset is used to evaluate well-known approaches and test their efficiency.

I-QANet: Improved Machine Reading Comprehension using Graph Convolutional Networks (I-QANet: 그래프 컨볼루션 네트워크를 활용한 향상된 기계독해)

  • Kim, Jeong-Hoon;Kim, Jun-Yeong;Park, Jun;Park, Sung-Wook;Jung, Se-Hoon;Sim, Chun-Bo
    • Journal of Korea Multimedia Society / v.25 no.11 / pp.1643-1652 / 2022
  • Most existing machine reading comprehension research has used recurrent neural network (RNN) and convolutional neural network (CNN) architectures. RNNs are slow to train, and the Question Answering Network (QANet) was introduced to improve training speed. QANet is a model composed of CNN and self-attention. CNN extracts semantic and syntactic information well from the local corpus, but is limited in extracting the corresponding information from the global corpus. Graph convolutional networks (GCNs) extract semantic and syntactic information relatively well from the global corpus. In this paper, to take advantage of this strength of GCN, we propose I-QANet, which replaces the CNN of QANet with GCN. On the Stanford Question Answering Dataset (SQuAD), the proposed model was 1.2 times faster than the baseline and showed 0.2% higher Exact Match (EM) and 0.7% higher F1. Furthermore, on the Korean Question Answering Dataset (KorQuAD), which consists only of Korean, training was 1.1 times faster than the baseline, and EM and F1 were 0.9% and 0.7% higher, respectively.