• Title/Summary/Keyword: datasets

Search Results: 1,978

An Improved Deep Learning Method for Animal Images (동물 이미지를 위한 향상된 딥러닝 학습)

  • Wang, Guangxing; Shin, Seong-Yoon; Shin, Kwang-Weong; Lee, Hyun-Chang
    • Proceedings of the Korean Society of Computer Information Conference / 2019.01a / pp.123-124 / 2019
  • This paper proposes an improved deep learning method for animal image classification based on small datasets. First, we use a CNN to build a training model for the small dataset and apply data augmentation to expand the training samples. Second, a network pre-trained on large-scale datasets, such as VGG16, is used to extract bottleneck features from the small dataset, and these features are stored in two NumPy files as new training and test datasets. Finally, a fully connected network is trained on the new datasets. We use the well-known Kaggle Dogs vs. Cats dataset, a two-category classification dataset, as the experimental dataset.
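
A minimal Keras sketch of the bottleneck-feature workflow this abstract describes, assuming a two-class directory layout with illustrative paths, image sizes, and layer widths (the paper's exact configuration is not given); augmentation beyond rescaling is omitted for brevity.

```python
# Sketch: extract VGG16 bottleneck features from a small dataset, save them to
# NumPy files, and train a small fully connected classifier on top.
# Paths, image size, and layer widths are illustrative assumptions.
import numpy as np
from tensorflow.keras.applications import VGG16
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras import layers, models

base = VGG16(weights="imagenet", include_top=False, input_shape=(150, 150, 3))
datagen = ImageDataGenerator(rescale=1.0 / 255)

def extract_bottleneck(directory, n_samples, batch_size=20):
    """Run images through the frozen VGG16 base and collect the features."""
    gen = datagen.flow_from_directory(
        directory, target_size=(150, 150), batch_size=batch_size,
        class_mode="binary", shuffle=False)
    features = base.predict(gen, steps=n_samples // batch_size)
    labels = gen.classes[: len(features)]
    return features, labels

train_x, train_y = extract_bottleneck("data/train", 2000)
test_x, test_y = extract_bottleneck("data/test", 800)
np.save("bottleneck_train.npy", train_x)   # the two NumPy files mentioned above
np.save("bottleneck_test.npy", test_x)

# Small fully connected classifier trained on the bottleneck features.
clf = models.Sequential([
    layers.Flatten(input_shape=train_x.shape[1:]),
    layers.Dense(256, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(1, activation="sigmoid"),
])
clf.compile(optimizer="rmsprop", loss="binary_crossentropy", metrics=["accuracy"])
clf.fit(train_x, train_y, epochs=30, batch_size=32,
        validation_data=(test_x, test_y))
```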

Pattern mining for large distributed dataset: A parallel approach (PMLDD)

  • Pal, Amrit; Kumar, Manish
    • KSII Transactions on Internet and Information Systems (TIIS) / v.12 no.11 / pp.5287-5303 / 2018
  • Handling the vast amounts of data found in large transactional datasets is an obvious challenge for conventional data mining algorithms. Addressing this challenge, this paper proposes a parallel approach that decomposes the mining problem into sub-problems in order to find frequent patterns in these datasets. The proposed Pattern Mining for Large Distributed Dataset (PMLDD) approach ensures minimal dependencies and minimal communication among sub-problems. It establishes a linear aggregation of the intermediate results so that it can be adapted to large-scale programming models such as MapReduce, and an algorithmic structure for the MapReduce programming model is presented in this context. PMLDD guarantees efficient load balancing among the sub-problems through a specific selection criterion, and it reduces the number of iterations over the dataset required for mining frequent patterns compared with existing approaches. Finally, the performance evaluation shows that the approach scales to larger datasets, and the result analysis supports these claims.
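
PMLDD itself is not reproduced in the abstract; the toy sketch below, with made-up transactions, only illustrates the general map/reduce counting pattern that such frequent-pattern approaches build on.

```python
# Toy illustration of MapReduce-style frequent-itemset counting; it is NOT the
# PMLDD algorithm, only the counting pattern it adapts to.
from collections import defaultdict
from itertools import combinations

transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
]
MIN_SUPPORT = 2

def map_phase(transaction):
    """Emit (itemset, 1) pairs for every 1- and 2-itemset in a transaction."""
    for k in (1, 2):
        for itemset in combinations(sorted(transaction), k):
            yield itemset, 1

def reduce_phase(pairs):
    """Aggregate the counts emitted by all mappers, keeping frequent itemsets."""
    counts = defaultdict(int)
    for itemset, count in pairs:
        counts[itemset] += count
    return {s: c for s, c in counts.items() if c >= MIN_SUPPORT}

emitted = (pair for t in transactions for pair in map_phase(t))
print(reduce_phase(emitted))
```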

Semantic Image Segmentation for Efficiently Adding Recognition Objects

  • Lu, Chengnan; Park, Jinho
    • Journal of Information Processing Systems / v.18 no.5 / pp.701-710 / 2022
  • With the development of artificial intelligence technology, various machine learning methods have been developed for recognizing objects in images. Among these, image segmentation is the most effective for recognizing objects within an image. Conventionally, image datasets of various classes are trained simultaneously, so whenever additional classes require segmentation, the entire dataset has to be retrained. Such repeated training is inefficient because most of the classes have already been learned. In addition, the class distribution of the datasets affects training: some classes appear far less frequently than others, so their training errors are not properly reflected when all classes are trained together. Therefore, a new method that separates some classes from the dataset is proposed to improve training efficiency, and the accuracies of the conventional and proposed methods are compared.
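
The abstract does not give the paper's exact separation rule; the sketch below only shows one plausible way to split a mask dataset by class so that a newly added class can be trained separately (class IDs and file layout are assumptions).

```python
# Hypothetical sketch: split a segmentation dataset so that images containing
# a newly added class are trained separately from already-learned classes.
import numpy as np
from pathlib import Path
from PIL import Image

NEW_CLASS_ID = 21          # assumed ID of the class being added
MASK_DIR = Path("dataset/masks")

new_class_samples, existing_samples = [], []
for mask_path in sorted(MASK_DIR.glob("*.png")):
    mask = np.array(Image.open(mask_path))
    if (mask == NEW_CLASS_ID).any():
        new_class_samples.append(mask_path)   # retrain only on these
    else:
        existing_samples.append(mask_path)    # classes already learned

print(f"{len(new_class_samples)} images go to the incremental training set")
```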

Semantic Segmentation of Heterogeneous Unmanned Aerial Vehicle Datasets Using Combined Segmentation Network

  • Song, Ahram
    • Korean Journal of Remote Sensing / v.39 no.1 / pp.87-97 / 2023
  • Unmanned aerial vehicles (UAVs) can capture high-resolution imagery from a variety of viewing angles and altitudes, but they are generally limited to collecting images of small scenes within larger regions. To improve the utility of UAV datasets for deep learning applications, multiple datasets created from various regions under different conditions are needed. To demonstrate a new method for integrating heterogeneous UAV datasets, this paper applies a combined segmentation network (CSN) in which the UAVid and Semantic Drone datasets share encoding blocks to learn their general features, while the decoding blocks are trained separately on each dataset. Experimental results show that the CSN improves the accuracy of specific classes (e.g., cars) that currently comprise a low ratio in both datasets. From this result, the range of UAV dataset utilization is expected to increase.
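
The CSN architecture details are not given in the abstract; the PyTorch sketch below only illustrates the shared-encoder / per-dataset-decoder idea, with illustrative layer sizes and class counts rather than the paper's configuration.

```python
# Illustrative sketch of a shared encoder with dataset-specific decoders, in
# the spirit of the combined segmentation network described above.
import torch
import torch.nn as nn

class SharedEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
        )
    def forward(self, x):
        return self.features(x)

class DatasetDecoder(nn.Module):
    def __init__(self, num_classes):
        super().__init__()
        self.head = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 2, stride=2), nn.ReLU(),
            nn.Conv2d(64, num_classes, 1),
        )
    def forward(self, feats):
        return self.head(feats)

encoder = SharedEncoder()                       # learns features common to both datasets
decoder_uavid = DatasetDecoder(num_classes=8)   # trained only on UAVid batches
decoder_drone = DatasetDecoder(num_classes=24)  # trained only on Semantic Drone batches

x = torch.randn(2, 3, 256, 256)
logits_uavid = decoder_uavid(encoder(x))        # shape: (2, 8, 256, 256)
```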

Three Reanalysis Data Comparison and Monsoon Regional Analysis of Apparent Heat Source and Moisture Sink (겉보기 열원 및 습기 흡원의 세 재분석 자료 비교와 몬순 지역별 분석)

  • Ha, Kyung-Ja; Kim, Seogyeong; Oh, Hyoeun; Moon, Suyeon
    • Atmosphere / v.28 no.4 / pp.415-425 / 2018
  • The formation and distribution of atmospheric heating play an essential role in the global circulation and are directly related to both the spatial and temporal characteristics of the monsoon system. In this study, before characterizing the apparent heat source (Q1) and apparent moisture sink (Q2), three reanalysis datasets (NCEP2, ERA-Interim, and JRA-55) are compared in their global and regional patterns to clearly evaluate the differences among them. Considering the inter-hemispheric contrast of the global monsoon regions, seasonal means of June-July-August and December-January-February, corresponding to summer (winter) and winter (summer) in the Northern (Southern) Hemisphere, are employed. We show the characteristics of eight regional monsoon regions and quantify the contribution of Q2 to Q1 for each of them. Each term of the apparent heat source and moisture sink is presented from the ERA-Interim dataset, since ERA-Interim can be regarded as representative of the three datasets. The NCEP2 data show a different Q2-to-Q1 ratio because NCEP2 overestimates Q1 compared with the other two datasets. Agreement among the datasets has improved over time for the Australian monsoon, whereas some regional monsoons (South America, North America, and North Africa) show increasing data inconsistency; in particular, the three reanalysis datasets have diverged since the early 2000s over the South America, North America, and North Africa monsoon regions. The recent inconsistency among the three datasets, which may be associated with the global warming hiatus, remains unexplored.
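
The abstract does not spell out the diagnostics; for reference, the apparent heat source and apparent moisture sink are conventionally defined through Yanai-type budget residuals, along the lines of

```latex
% Conventional Yanai-type definitions of the apparent heat source and apparent
% moisture sink; the paper's exact formulation may differ in detail.
Q_1 = c_p \left[ \frac{\partial T}{\partial t} + \mathbf{V}\cdot\nabla T
      + \left(\frac{p}{p_0}\right)^{\kappa} \omega \frac{\partial \theta}{\partial p} \right],
\qquad
Q_2 = -L \left( \frac{\partial q}{\partial t} + \mathbf{V}\cdot\nabla q
      + \omega \frac{\partial q}{\partial p} \right)
```

where T is temperature, θ is potential temperature, q is specific humidity, ω is the vertical p-velocity, V is the horizontal wind, c_p is the specific heat at constant pressure, L is the latent heat of condensation, and κ = R/c_p.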

Semi-automatic Data Fusion Method for Spatial Datasets (공간 정보를 가지는 데이터셋의 준자동 융합 기법)

  • Yoon, Jong-chan; Kim, Han-joon
    • The Journal of Society for e-Business Studies / v.26 no.4 / pp.1-13 / 2021
  • With the development of big data-related technologies, it has become possible to process vast amounts of data that could not be processed before. Accordingly, establishing an automated data selection and fusion process for realizing big data-based services has become a necessity rather than an option. In this paper, we propose a semi-automatic technique for creating meaningful new information by fusing datasets that contain spatial information. First, the given datasets are embedded using the Node2Vec model and the keywords of each dataset. Then, the semantic similarity between every pair of datasets is obtained by computing the cosine similarity of their embedding vectors. Next, a human selects candidate datasets that contain one or more spatial identifiers from among the dataset pairs with relatively high similarity, and the selected pairs are fused and visualized. Through this semi-automatic data fusion process, we show that significant fused information that cannot be obtained from a single dataset can be generated.
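
The embedding setup is only outlined in the abstract; the sketch below assumes one embedding vector per dataset (e.g., averaged Node2Vec keyword vectors, here placeholder values) and shows only the cosine-similarity ranking step that precedes the human selection.

```python
# Sketch of the similarity-ranking step: given one embedding vector per
# dataset, rank dataset pairs by cosine similarity. Vectors are placeholders.
import numpy as np
from itertools import combinations

embeddings = {
    "traffic_accidents": np.array([0.12, 0.80, 0.31]),
    "cctv_locations":    np.array([0.10, 0.75, 0.40]),
    "park_facilities":   np.array([0.90, 0.05, 0.20]),
}

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

pairs = sorted(
    ((cosine(embeddings[x], embeddings[y]), x, y)
     for x, y in combinations(embeddings, 2)),
    reverse=True)

for sim, x, y in pairs:
    print(f"{x} <-> {y}: similarity {sim:.3f}")
# A human would then pick high-similarity pairs that share a spatial identifier
# (e.g., an administrative district code) and fuse them for visualization.
```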

Deep learning based crack detection from tunnel cement concrete lining (딥러닝 기반 터널 콘크리트 라이닝 균열 탐지)

  • Bae, Soohyeon; Ham, Sangwoo; Lee, Impyeong; Lee, Gyu-Phil; Kim, Donggyou
    • Journal of Korean Tunnelling and Underground Space Association / v.24 no.6 / pp.583-598 / 2022
  • Human-based tunnel inspections are affected by the subjective judgment of the inspector, which makes continuous history management difficult. Much deep learning-based automatic crack detection research has therefore been conducted recently. However, the large public crack datasets used in most studies differ significantly from tunnel images, and additional work is required to build sophisticated crack labels under current tunnel evaluation practice. Therefore, we present a method to improve crack detection performance by feeding existing datasets into a deep learning model. We evaluate and compare the performance of deep learning models trained on combinations of existing tunnel datasets, high-quality tunnel datasets, and public crack datasets. As a result, DeepLabv3+ with a cross-entropy loss function performed best when trained on the public datasets together with the patchwise-classified and oversampled tunnel datasets. In the future, we expect this work to contribute to a plan for efficiently utilizing data from the tunnel image acquisition system for deep learning model training.
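
A minimal training-loop sketch for crack segmentation with DeepLabv3+, a cross-entropy loss, and oversampling of crack-containing samples; it uses the third-party segmentation_models_pytorch package and a tiny synthetic dataset, as an illustration in the spirit of the abstract rather than the paper's actual pipeline.

```python
# Sketch: DeepLabv3+ + cross-entropy loss + oversampling of crack patches.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler
import segmentation_models_pytorch as smp

# Synthetic stand-in data: 16 RGB patches with sparse "crack" pixels.
images = torch.randn(16, 3, 256, 256)
masks = (torch.rand(16, 256, 256) > 0.95).long()      # 0 = background, 1 = crack
train_set = TensorDataset(images, masks)

# Oversample patches that contain crack pixels.
sample_weights = [3.0 if m.any() else 1.0 for _, m in train_set]
sampler = WeightedRandomSampler(sample_weights, num_samples=len(train_set),
                                replacement=True)
loader = DataLoader(train_set, batch_size=4, sampler=sampler)

model = smp.DeepLabV3Plus(encoder_name="resnet34", encoder_weights="imagenet",
                          in_channels=3, classes=2)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

model.train()
for batch_images, batch_masks in loader:
    logits = model(batch_images)                       # (N, 2, H, W)
    loss = criterion(logits, batch_masks)              # masks: (N, H, W), long
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```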

An Auto-Labeling based Smart Image Annotation System (자동-레이블링 기반 영상 학습데이터 제작 시스템)

  • Lee, Ryong; Jang, Rae-young; Park, Min-woo; Lee, Gunwoo; Choi, Myung-Seok
    • The Journal of the Korea Contents Association / v.21 no.6 / pp.701-715 / 2021
  • The drastic advance of recent deep learning technologies depends heavily on training datasets, which are essential for models to train themselves with less human effort. Compared with the work of designing deep learning models, preparing datasets is a long haul; at the moment, in the domain of visual intelligence, datasets are still made by hand, requiring a great deal of time and effort, with workers directly labeling each image, usually with GUI-based labeling tools. In this paper, we overview the current status of vision datasets, focusing on which datasets are being shared and how they are prepared with various labeling tools. In particular, in order to relieve the repetitive and tiring labeling work, we present an interactive smart image annotation system with which the annotation work is transformed from direct, human-only manual labeling into a correct-after-check workflow supported by automatic labeling. In an experiment, we show that automatic labeling can greatly improve the productivity of dataset creation, especially by reducing the time and effort needed to specify the regions of objects found in images. Finally, we discuss critical issues we faced while experimenting with our annotation system and describe future work to raise the productivity of image dataset creation for accelerating AI technology.
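
The paper's annotation system is not described in code; the generic sketch below only illustrates the correct-after-check idea, with a pretrained detector drafting labels that a human later reviews and corrects (the model choice, file names, and JSON format are assumptions).

```python
# Illustrative pre-annotation step for a correct-after-check workflow: a
# pretrained detector proposes draft boxes that are saved for human review.
import json
import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

def draft_annotations(image_path, score_threshold=0.7):
    """Return machine-proposed boxes for later human correction."""
    image = to_tensor(Image.open(image_path).convert("RGB"))
    with torch.no_grad():
        pred = model([image])[0]
    keep = pred["scores"] >= score_threshold
    return [{"box": box.tolist(), "label": int(label), "status": "needs_review"}
            for box, label in zip(pred["boxes"][keep], pred["labels"][keep])]

# Draft labels are saved and then opened in a labeling tool for correction.
annotations = draft_annotations("sample.jpg")
with open("sample_draft.json", "w") as f:
    json.dump(annotations, f, indent=2)
```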

Trusted Fog Based Mashup Service for Multimedia IoT based Smart Environmental Monitoring

  • Elmisery, Ahmed M.; Sertovic, Mirela
    • Journal of Multimedia Information System / v.4 no.4 / pp.171-178 / 2017
  • Data mashup is a web technology that combines information from multiple sources into a single web application. Mashup applications open a new horizon for new services, such as environmental monitoring. Environmental monitoring is a serious tool for state and private organizations that are located in regions with environmental hazards and seek insights to detect and locate those hazards clearly. These organizations use data mashup to merge datasets from different Internet of Multimedia Things (IoMT) context-based services in order to improve their data analytics performance and the accuracy of their predictions. However, mashing up datasets from multiple sources is a privacy hazard, as it might reveal citizens' specific behaviors in different regions. The ability to preserve privacy in mashed-up datasets while still providing accurate insights is therefore a key success factor for the spread of mashup services. In this paper, we present our efforts to build a fog-based middleware for private data mashup (FMPM) to serve a centralized environmental monitoring service. The proposed middleware is equipped with concealment mechanisms that preserve the privacy of the datasets merged from the multiple IoMT networks involved in the mashup application. These mechanisms also preserve the aggregates in the dataset in order to maximize the usability of the information and attain accurate analytical results. We also provide a scenario for an IoMT-enabled data mashup service and experimental results.
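
The FMPM concealment mechanisms are not detailed in the abstract; purely as a toy illustration of concealing individual records while preserving an aggregate, the sketch below adds zero-sum noise so that individual readings change but the column mean used for analytics does not.

```python
# Toy illustration only, NOT the FMPM mechanism: zero-sum noise perturbs each
# reading while leaving the aggregate (mean) of the column unchanged.
import numpy as np

rng = np.random.default_rng(0)
readings = np.array([41.2, 38.7, 44.1, 39.9, 42.5])   # e.g., per-site pollutant levels

noise = rng.normal(0.0, 2.0, size=readings.size)
noise -= noise.mean()                  # force the noise to sum to zero
concealed = readings + noise

assert np.isclose(concealed.mean(), readings.mean())  # aggregate preserved
print("original :", readings)
print("concealed:", concealed)
```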

Accuracy Assessment of Global Land Cover Datasets in South Korea

  • Son, Sanghun; Kim, Jinsoo
    • Korean Journal of Remote Sensing / v.34 no.4 / pp.601-610 / 2018
  • The national-level accuracy of global land cover (GLC) products is of great importance to ecological and environmental research. However, GLC products derived from different satellite sensors, with differing spatial resolutions, classification methods, and classification schemes, are certain to show some discrepancies. The goal of this study is to assess the accuracy of four commonly used GLC datasets in South Korea: GLC2000, GlobCover2009, MCD12Q1, and GlobeLand30. First, we compared the areas of seven classes between the four GLC datasets and a reference dataset. Then, we calculated the accuracy of the four GLC datasets on an aggregated classification scheme containing seven classes, using the overall, producer's, and user's accuracies and the kappa coefficient. GlobeLand30 had the highest overall accuracy (77.59%), followed by MCD12Q1, GLC2000, and GlobCover2009 at 75.51%, 68.38%, and 57.99%, respectively. These results indicate that GlobeLand30 is the most suitable dataset for supporting a variety of national scientific endeavors in South Korea.
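
The overall, producer's, and user's accuracies and the kappa coefficient are all derived from a confusion matrix; the sketch below computes them from made-up counts, assuming rows are reference classes and columns are mapped classes.

```python
# Standard accuracy metrics from a confusion matrix (rows = reference classes,
# columns = mapped classes). The counts here are made up for illustration.
import numpy as np

cm = np.array([[50,  5,  2],
               [ 4, 60,  6],
               [ 1,  3, 40]], dtype=float)

total = cm.sum()
overall_accuracy = np.trace(cm) / total
producers_accuracy = np.diag(cm) / cm.sum(axis=1)   # per reference (row) class
users_accuracy = np.diag(cm) / cm.sum(axis=0)       # per mapped (column) class

# Cohen's kappa: agreement beyond the chance level expected from the marginals.
expected = (cm.sum(axis=1) * cm.sum(axis=0)).sum() / total**2
kappa = (overall_accuracy - expected) / (1 - expected)

print(f"overall accuracy: {overall_accuracy:.3f}, kappa: {kappa:.3f}")
print("producer's accuracy per class:", np.round(producers_accuracy, 3))
print("user's accuracy per class:", np.round(users_accuracy, 3))
```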