• Title/Summary/Keyword: Large Data Set

Detecting Jaywalking Using the YOLOv5 Model

  • Kim, Hyun-Tae;Lee, Sang-Hyun
    • International Journal of Advanced Culture Technology
    • /
    • v.10 no.2
    • /
    • pp.300-306
    • /
    • 2022
  • Korea is currently building traffic infrastructure using Intelligent Transport Systems (ITS), but the pedestrian traffic accident rate remains very high. The purpose of this paper is to reduce the risk of traffic accidents caused by jaywalking pedestrians. The study aims to detect pedestrians who trespass on the road, using the public data set provided by the Artificial Intelligence Hub (AIHub). The data set comprises 673,150 training items and 131,385 validation items, covers conditions including snow, rain, and fog, and is labeled with a total of 7 classes: passenger cars, small buses, large buses, trucks, large trailers, motorcycles, and pedestrians. Training is carried out using YOLOv5 as the implementation model, and a Canny edge model is applied to the input image for edge detection so that human objects within the detected road boundary range can be classified and visualized. The system was designed and implemented to detect pedestrians using the deep learning-based YOLOv5 model. As the final result, the YOLOv5 model achieved an mAP 0.5 of 61% and a real-time detection speed of 114.9 fps at 338 epochs.
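
The mAP 0.5 figure reported above counts a predicted box as correct when its intersection over union (IoU) with a ground-truth box is at least 0.5. The following is a minimal sketch of that matching criterion only, not the paper's evaluation code; the function names and the `(x1, y1, x2, y2)` box layout are assumptions made for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Overlap is zero when the boxes do not intersect.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred_box, truth_box, threshold=0.5):
    """Hypothetical helper: a detection counts as correct at IoU >= threshold."""
    return iou(pred_box, truth_box) >= threshold
```

Averaging precision over recall levels per class, and then over the 7 classes, would turn this per-box decision into the mAP value; that step is omitted here.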

A Walsh-Based Distributed Associative Memory with Genetic Algorithm Maximization of Storage Capacity for Face Recognition

  • Kim, Kyung-A;Oh, Se-Young
    • Proceedings of the Korean Institute of Intelligent Systems Conference
    • /
    • 2003.09a
    • /
    • pp.640-643
    • /
    • 2003
  • A Walsh function-based associative memory is capable of storing m patterns in a single pattern storage space by Walsh encoding each pattern. Furthermore, each pattern can be matched against the stored patterns extremely fast using algorithmic parallel processing. As such, this special type of memory is ideal for real-time processing of large-scale information. However, this efficiency generates a large amount of crosstalk between stored patterns, which causes misrecognition. This crosstalk is a function of the set of different sequencies (numbers of zero crossings) of the Walsh functions associated with the patterns to be stored. In this paper, the sequency set is therefore optimized to minimize misrecognition as well as to maximize memory saving. The Walsh memory is applied to the problem of face recognition, where PCA is used for dimensionality reduction. The maximum Walsh spectral component and a genetic algorithm (GA) are applied to determine the optimal Walsh function set to be associated with the data to be stored. The experimental results indicate that the proposed methods provide a novel and robust technology for error-free, real-time, and memory-saving recognition of large-scale patterns.
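
To make the notion of sequency concrete, the sketch below builds discrete Walsh functions as rows of a Sylvester-type Hadamard matrix and counts the sign changes in each row. It illustrates only the definition used in the abstract; the function names are hypothetical, and the paper's encoding scheme and GA-based selection of the sequency set are not reproduced.

```python
def hadamard(order):
    """Sylvester construction; order must be a power of two."""
    if order < 1 or order & (order - 1):
        raise ValueError("order must be a power of two")
    rows = [[1]]
    while len(rows) < order:
        # H_2n = [[H, H], [H, -H]]
        rows = [r + r for r in rows] + [r + [-v for v in r] for r in rows]
    return rows


def sequency(row):
    """Number of sign changes (zero crossings) along one Walsh function."""
    return sum(1 for a, b in zip(row, row[1:]) if a != b)


def walsh_by_sequency(order):
    """Walsh functions sorted by increasing sequency."""
    return sorted(hadamard(order), key=sequency)
```

Each row of an order-N matrix has a distinct sequency between 0 and N-1, so choosing a sequency set amounts to choosing which rows are paired with the stored patterns.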

Environmental Consciousness Data Modeling by Association Rules

  • Park, Hee-Chang;Cho, Kwang-Hyun
    • Proceedings of the Korean Data and Information Science Society Conference (한국데이터정보과학회:학술대회논문집)
    • /
    • 2004.10a
    • /
    • pp.115-124
    • /
    • 2004
  • Data mining is a method for finding useful information in large amounts of data in a database. It is used to uncover hidden knowledge, unexpected patterns, relations, and new rules in massive data. Data mining methods include association rules, decision trees, clustering, neural networks, and so on. Association rule mining searches for interesting relationships among items in a given large data set. Association rules are frequently used by retail stores to assist in marketing, advertising, floor placement, and inventory control. There are three primary quality measures for an association rule: support, confidence, and lift. We analyze Gyeongnam social indicator survey data using the association rule technique for environmental information discovery. The resulting association rules can be used for environmental preservation and environmental improvement.
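
The three quality measures named above follow standard definitions: support is the fraction of records containing both sides of the rule, confidence is that count divided by the number of records containing the antecedent, and lift is confidence divided by the overall frequency of the consequent. A minimal sketch under those definitions is given below; the function name and the survey-style item labels in the usage note are invented for illustration and do not come from the Gyeongnam data.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    if not transactions:
        raise ValueError("transactions must not be empty")
    a, c = set(antecedent), set(consequent)
    n = len(transactions)
    n_a = sum(1 for t in transactions if a <= set(t))         # contains antecedent
    n_c = sum(1 for t in transactions if c <= set(t))         # contains consequent
    n_ac = sum(1 for t in transactions if (a | c) <= set(t))  # contains both
    support = n_ac / n
    confidence = n_ac / n_a if n_a else 0.0
    lift = confidence / (n_c / n) if n_c else 0.0
    return support, confidence, lift
```

For example, with the four hypothetical records `{"recycles", "saves_energy"}`, `{"recycles", "saves_energy"}`, `{"recycles"}`, and `{"saves_energy"}`, the rule `recycles -> saves_energy` has a lift below 1, meaning the two answers co-occur slightly less often than independence would predict.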

Dynamic Replication Based on Availability and Popularity in the Presence of Failures

  • Meroufel, Bakhta;Belalem, Ghalem
    • Journal of Information Processing Systems
    • /
    • v.8 no.2
    • /
    • pp.263-278
    • /
    • 2012
  • The data grid provides geographically distributed resources for large-scale applications and generates a large set of data. Replicating this data across several sites of the grid is an effective way to achieve good performance. In this paper we propose a dynamic replication approach for a hierarchical grid that takes crash failures in the system into account. The replication decision is made based on two parameters: the availability and the popularity of the data. The administrator requires a minimum rate of availability for each piece of data according to its access history in previous periods, but this availability may increase if demand for the data is high. We also propose a strategy that keeps the desired availability respected even in the case of a failure or rarity (lack of popularity) of the data. The simulation results show the effectiveness of our replication strategy in terms of response time, the unavailability of requests, and availability.
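
One way to relate a minimum availability requirement to a replica count is sketched below. It assumes, purely for illustration, that each replica sits on a node that is up independently with the same probability, so that n replicas give availability 1 - (1 - p)^n. This independence model, the function names, and the `max_replicas` cap are assumptions and are not taken from the paper, whose strategy also accounts for popularity and the grid hierarchy.

```python
def availability(replicas, node_availability):
    """Probability that at least one of `replicas` independent copies is reachable."""
    return 1.0 - (1.0 - node_availability) ** replicas


def min_replicas(required, node_availability, max_replicas=64):
    """Smallest replica count whose availability meets `required`."""
    if not 0.0 < node_availability <= 1.0:
        raise ValueError("node_availability must be in (0, 1]")
    if not 0.0 <= required < 1.0:
        raise ValueError("required must be in [0, 1)")
    n = 1
    while availability(n, node_availability) < required:
        n += 1
        if n > max_replicas:
            raise ValueError("requirement not reachable within max_replicas")
    return n
```

Raising `required` for heavily requested data would model the case where availability increases with demand, and re-running the calculation after a crash would indicate how many copies need to be recreated.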

Incremental Multi-classification by Least Squares Support Vector Machine

  • Oh, Kwang-Sik;Shim, Joo-Yong;Kim, Dae-Hak
    • Journal of the Korean Data and Information Science Society
    • /
    • v.14 no.4
    • /
    • pp.965-974
    • /
    • 2003
  • In this paper we propose an incremental classification method for multi-class data sets using the least squares support vector machine (LS-SVM). By encoding the output variable in the training data set appropriately, we obtain new output vectors for the training data set. Online LS-SVM is then applied to each newly encoded output vector. The proposed method reduces the computational cost and allows training to be performed incrementally. With the incremental formulation of an inverse matrix, the current information and new input data are used to build a new inverse matrix for estimating the optimal bias and Lagrange multipliers, so the computational difficulties of large-scale matrix inversion can be avoided. The performance of the proposed method is shown via numerical studies and compared with that of an artificial neural network.
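
The idea of building a new inverse from the current one can be illustrated with the standard bordered-matrix (Schur complement) identity: if the inverse of a symmetric matrix A is known, the inverse of A extended by one row and column follows without re-inverting. The sketch below shows that generic identity in plain Python; it is an assumption that this matches the paper's formulation, and the function name and argument layout are hypothetical.

```python
def extend_inverse(inv, b, c):
    """Given inv = A^-1 for symmetric A, return the inverse of [[A, b], [b^T, c]]."""
    n = len(inv)
    # u = A^-1 b
    u = [sum(inv[i][j] * b[j] for j in range(n)) for i in range(n)]
    # Schur complement of A in the extended matrix: s = c - b^T A^-1 b
    s = c - sum(b[i] * u[i] for i in range(n))
    if abs(s) < 1e-12:
        raise ValueError("extended matrix is singular or ill-conditioned")
    new = [
        [inv[i][j] + u[i] * u[j] / s for j in range(n)] + [-u[i] / s]
        for i in range(n)
    ]
    new.append([-u[j] / s for j in range(n)] + [1.0 / s])
    return new
```

Each update costs on the order of n^2 operations, compared with roughly n^3 for inverting the enlarged matrix from scratch, which is where the saving for large training sets comes from.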

A Study on Partial Pattern Estimation for Sequential Agglomerative Hierarchical Nested Model (SAHN 모델의 부분적 패턴 추정 방법에 대한 연구)

  • Jang, Kyung-Won;Ahn, Tae-Chon
    • Proceedings of the KIEE Conference
    • /
    • 2005.10b
    • /
    • pp.143-145
    • /
    • 2005
  • This paper presents an empirical study of a pattern estimation method that reveals underlying data patterns at a relatively reduced computational cost. The presented method performs crisp clustering of n given data samples by means of the sequential agglomerative hierarchical nested (SAHN) model. Conventional SAHN-based clustering requires a large computation time in the initial step of the algorithm. To address this, we modify the overall process with a partial approach: at the start, we divide the given data set into several subgroups by uniform sampling, and then apply the SAHN-based method to each subgroup. The advantage of this method is that it reduces the computation time of the original process while giving similar results. The proposed method is applied to several test data sets, and simulation results are presented with a conceptual analysis.
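
The partial approach can be sketched as follows: split the samples into subgroups by taking every k-th point, then run agglomerative clustering inside each subgroup. The code below is a rough illustration on one-dimensional data with single-linkage merging; the linkage choice, the 1-D setting, and the function names are assumptions, and the abstract does not say how the per-subgroup results are combined, so that step is left out.

```python
def sahn(points, n_clusters):
    """Naive single-linkage agglomerative clustering of 1-D points."""
    clusters = [[p] for p in points]
    while len(clusters) > n_clusters:
        best = None
        # Find the closest pair of clusters (this scan dominates the cost).
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merged = clusters.pop(j)  # j > i, so index i stays valid
        clusters[i].extend(merged)
    return clusters


def partial_sahn(points, n_groups, n_clusters):
    """Uniformly sample the data into n_groups subgroups and cluster each one."""
    groups = [points[g::n_groups] for g in range(n_groups)]
    return [sahn(group, n_clusters) for group in groups]
```

Because the pairwise scan grows quickly with the number of samples, running it on several small subgroups involves far fewer distance evaluations than running it once on the full set.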

An Experimental Study on the Thermal Performance Measurement of Large Diameter Borehole Heat Exchanger (LD-BHE) for Triple-U Pipes Spacer Apply (3중관용 스페이서를 적용한 대구경 지중열교환기의 성능측정에 관한 연구)

  • Lee, Sang-Hoon;Park, Jong-Woo;Lim, Kyoung-Bin
    • Proceedings of the Korean Society for New and Renewable Energy Conference (한국신재생에너지학회:학술대회논문집)
    • /
    • 2009.11a
    • /
    • pp.581-586
    • /
    • 2009
  • Knowledge of ground thermal properties is essential for the proper design of large-scale borehole heat exchanger (BHE) systems. The type, pipe size, and thermal performance of the BHE are closely tied to the efficiency and installation cost of the ground source heat pump system. Thermal response tests with mobile measurement devices were developed primarily for in situ determination of design data for a large diameter BHE fitted with triple-U pipe spacers. The main purpose has been to determine in situ values of effective ground thermal conductivity and thermal resistance, including the effect of groundwater flow and natural convection in the boreholes. The test rig is set up on a small trailer and contains a circulation pump, an inline heater, temperature sensors, a flow meter, a power analysis meter, and a data logger for recording the temperature and fluid flow data. Constant heat power is injected into the borehole through the triple-U pipe system of the test rig, and the resulting temperature change in the borehole is recorded. The recorded temperature data are analysed with a line-source model, which gives the effective in situ values of rock thermal conductivity and borehole thermal resistance of the large diameter BHE with spacers.
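
In the usual line-source approximation, the mean fluid temperature at late times grows linearly with the logarithm of time, T(t) ≈ k·ln(t) + m, with slope k = Q / (4πλH), where Q is the injected heat power, H the borehole depth, and λ the effective ground thermal conductivity. The sketch below fits that slope by least squares and solves for λ. It is a generic illustration, not the authors' analysis; the function and parameter names are hypothetical, and early-time data, for which the approximation is poor, are assumed to have been discarded already.

```python
import math


def fit_log_slope(times_s, temps_c):
    """Least-squares slope of temperature against ln(time)."""
    xs = [math.log(t) for t in times_s]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(temps_c) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, temps_c))
    return sxy / sxx


def ground_conductivity(times_s, temps_c, heat_power_w, borehole_depth_m):
    """Effective thermal conductivity in W/(m*K) from a thermal response test."""
    slope = fit_log_slope(times_s, temps_c)
    return heat_power_w / (4.0 * math.pi * borehole_depth_m * slope)
```

The intercept m of the same fit carries the borehole thermal resistance, which would be extracted in a second step once λ is known.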

Efficient Continuous Skyline Query Processing Scheme over Large Dynamic Data Sets

  • Li, He;Yoo, Jaesoo
    • ETRI Journal
    • /
    • v.38 no.6
    • /
    • pp.1197-1206
    • /
    • 2016
  • Performing continuous skyline queries over dynamic data sets is becoming more challenging as data sets grow and as they become more volatile due to the increase in dynamic updates. Although previous work proposed support for such queries, its efficiency was restricted to small or uniformly distributed data sets. In a production database with many concurrent queries, the execution of continuous skyline queries degrades query performance because updates must acquire exclusive locks, possibly blocking other query threads. Thus, the computational costs increase. In order to minimize computational requirements, we propose a method based on a multi-layer grid structure. First, the relational data objects that make up the initial data set are processed to obtain the corresponding multi-layer grid structure and the skyline influence regions over the data. Then, dynamic data are processed only when they fall within the skyline influence regions. Therefore, a large amount of computation can be pruned by adopting the proposed multi-layer grid structure. A performance evaluation using a variety of data sets confirms the efficiency of the proposed method.
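
For reference, a skyline is the set of points not dominated by any other point, where p dominates q if p is at least as good in every dimension and strictly better in at least one. The sketch below is a naive baseline that assumes smaller values are better; it does not implement the multi-layer grid, and `update_skyline` is only a simplified, insert-only stand-in for the idea that an arriving point needs work only if it can affect the current skyline. All names are hypothetical.

```python
def dominates(p, q):
    """True if p is no worse than q everywhere and strictly better somewhere."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline(points):
    """Quadratic-time skyline: keep the points that nothing else dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def update_skyline(current, new_point):
    """Insert-only update: dominated arrivals leave the skyline unchanged."""
    if any(dominates(s, new_point) for s in current):
        return list(current)
    # The new point joins the skyline and evicts anything it dominates.
    return [s for s in current if not dominates(new_point, s)] + [new_point]
```

Deletions are harder, because removing a skyline point can promote points it used to dominate; handling that efficiently is the kind of case an index over the data is meant to address.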

Customer Classification Method for Household Appliances Industries with a Large Number of Incomplete Data (다수의 결측치가 존재하는 가전업 고객 데이터 활용을 위한 고객분류기법의 개발)

  • Chang, Young-Soon;Seo, Jong-Hyen
    • IE interfaces
    • /
    • v.19 no.1
    • /
    • pp.86-96
    • /
    • 2006
  • Customer data in some manufacturing industries contain a large number of incomplete records due to customers' infrequent purchasing behavior and the limited customer profile data gathered from sales representatives. Consequently, most sophisticated data analysis methods cannot be applied directly. This paper proposes a heuristic data analysis method to classify customers in household appliances industries. The proposed PD (percent of difference) method can be used for the discriminant analysis of incomplete customer data with simple mathematical calculations. The method is composed of a variable distribution estimation step, PD measure and cluster score evaluation steps, a variable impact construction step, and a segment assignment step. A real example is also presented.

Influence of Self-driving Data Set Partition on Detection Performance Using YOLOv4 Network (YOLOv4 네트워크를 이용한 자동운전 데이터 분할이 검출성능에 미치는 영향)

  • Wang, Xufei;Chen, Le;Li, Qiutan;Son, Jinku;Ding, Xilong;Song, Jeongyoung
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.20 no.6
    • /
    • pp.157-165
    • /
    • 2020
  • With the development of neural networks and self-driving data sets, dividing the data set appropriately is one way to improve the performance of a network model in detecting moving objects. Within the Darknet framework, the YOLOv4 (You Only Look Once v4) network model was trained and tested on the Udacity data set. The Udacity data set was divided into three subsets (training set, validation set, and test set) according to 7 different proportions. The K-means++ algorithm was used to perform dimension clustering of the object boxes in the 7 groups. By adjusting the hyperparameters of the YOLOv4 network during training, optimal model parameters were obtained for each of the 7 groups. These model parameters were then used to run detection on the 7 test sets and compare the results. The experimental results showed that YOLOv4 can effectively detect the large, medium, and small moving objects represented by Truck, Car, and Pedestrian in the Udacity data set. When the ratio of training set, validation set, and test set is 7:1.5:1.5, the optimal model parameters of YOLOv4 give the highest detection performance, with mAP50 reaching 80.89%, mAP75 reaching 47.08%, and a detection speed of 10.56 FPS.
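
A split in the reported 7:1.5:1.5 proportion can be produced as in the sketch below. The function name, the shuffling step, and the fixed seed are assumptions for illustration; the abstract does not describe how the authors actually assigned images to the three subsets.

```python
import random


def split_dataset(items, ratios=(7, 1.5, 1.5), seed=0):
    """Shuffle items and split them into train / validation / test subsets."""
    items = list(items)
    random.Random(seed).shuffle(items)  # seeded so the split is reproducible
    total = sum(ratios)
    n_train = round(len(items) * ratios[0] / total)
    n_val = round(len(items) * ratios[1] / total)
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]  # remainder, so no item is lost to rounding
    return train, val, test
```

Calling the function with different `ratios` tuples would give the kind of grouped comparison the study describes, with one model trained per split.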