• Title/Summary/Keyword: Validation data set

Combining Machine Learning Techniques with Terrestrial Laser Scanning for Automatic Building Material Recognition

  • Yuan, Liang;Guo, Jingjing;Wang, Qian
    • International conference on construction engineering and project management / 2020.12a / pp.361-370 / 2020
  • Automatic building material recognition has been a popular research interest over the past decade because it is useful for construction management and facility management. Currently, the extensively used methods for automatic material recognition are mainly based on 2D images. A terrestrial laser scanner (TLS) with a built-in camera can generate a set of coloured laser scan data that contains not only the visual features of building materials but also other attributes such as material reflectance and surface roughness. With these additional characteristics, laser scan data have the potential to improve the accuracy of building material recognition. Therefore, this research aims to develop a TLS-based building material recognition method that combines laser scan data with machine learning techniques. The developed method uses material reflectance, HSV colour values, and surface roughness as the features for material recognition. A database containing the laser scan data of common building materials was created and used for model training and validation with machine learning techniques. Different machine learning algorithms were compared, and the best algorithm showed an average recognition accuracy of 96.5%, which demonstrates the feasibility of the developed method.
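
As an illustration of the workflow this abstract describes, the short Python sketch below compares a few off-the-shelf classifiers on point-wise features (reflectance, HSV colour, roughness) by cross-validation. The file name, column names, and choice of algorithms are assumptions for illustration, not the authors' implementation.

    # Minimal sketch (not the authors' code): comparing classifiers on point-wise
    # features such as reflectance, HSV colour, and surface roughness, assuming a
    # hypothetical CSV with one labelled row per scan point.
    import pandas as pd
    from sklearn.model_selection import cross_val_score
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.svm import SVC
    from sklearn.neighbors import KNeighborsClassifier

    df = pd.read_csv("material_scan_points.csv")          # hypothetical file
    X = df[["reflectance", "h", "s", "v", "roughness"]]    # assumed feature columns
    y = df["material"]                                     # e.g. concrete, brick, wood

    for name, model in [("RF", RandomForestClassifier()),
                        ("SVM", SVC()),
                        ("kNN", KNeighborsClassifier())]:
        acc = cross_val_score(model, X, y, cv=5).mean()    # 5-fold CV accuracy
        print(f"{name}: {acc:.3f}")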

DNN based Speech Detection for the Media Audio (미디어 오디오에서의 DNN 기반 음성 검출)

  • Jang, Inseon;Ahn, ChungHyun;Seo, Jeongil;Jang, Younseon
    • Journal of Broadcast Engineering / v.22 no.5 / pp.632-642 / 2017
  • In this paper, we propose a DNN-based speech detection system that uses the acoustic characteristics and context information of media audio. Speech detection, which discriminates between speech and non-speech in media audio, is a necessary preprocessing step for effective speech processing. However, because media audio contains various types of sound sources, it has been difficult to achieve high performance with conventional signal processing techniques. The proposed method improves speech detection performance by separating the harmonic and percussive components of the media audio and constructing a DNN input vector that reflects its acoustic characteristics and context information. To verify the performance of the proposed system, a speech detection data set was built from more than 20 hours of drama content, and a publicly available 8-hour Hollywood movie data set was additionally acquired and used for the experiments. Cross-validation on the two data sets shows that the proposed system outperforms the conventional method.
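
The following sketch illustrates the kind of preprocessing this abstract describes, harmonic/percussive separation followed by context stacking of per-frame features, using librosa. The audio file, mel settings, and context width are assumptions, not the authors' pipeline.

    # Illustrative sketch only: harmonic/percussive separation and context
    # stacking to build DNN input vectors. File name and parameters are assumed.
    import numpy as np
    import librosa

    y, sr = librosa.load("drama_clip.wav", sr=16000)       # hypothetical audio file
    y_harm, y_perc = librosa.effects.hpss(y)               # harmonic/percussive split

    def logmel(sig):
        S = librosa.feature.melspectrogram(y=sig, sr=sr, n_mels=40)
        return librosa.power_to_db(S)                      # (40, n_frames)

    feats = np.vstack([logmel(y_harm), logmel(y_perc)])    # per-frame acoustic features

    # add +/-2 frames of context around each frame (simple context stacking)
    ctx = 2
    padded = np.pad(feats, ((0, 0), (ctx, ctx)), mode="edge")
    X = np.stack([padded[:, i - ctx:i + ctx + 1].ravel()
                  for i in range(ctx, padded.shape[1] - ctx)])
    print(X.shape)  # (n_frames, 80 * 5) input vectors for a DNN classifier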

DETECTION OF SOY, PEA AND WHEAT PROTEINS IN MILK POWDER BY NIRS

  • Cattaneo, Tiziana M.P.;Maraboli, Adele;Barzaghi, Stefania;Giangiacomo, Roberto
    • Proceedings of the Korean Society of Near Infrared Spectroscopy Conference / 2001.06a / pp.1156-1156 / 2001
  • This work aimed to prove the feasibility of NIR spectroscopy for detecting vegetable protein isolates (soy, pea and wheat) in milk powder. Two hundred and thirty-nine samples of genuine and adulterated milk powder (NIZO, Ede, NL) were analysed by NIRS using an InfraAlyzer 500 (Bran+Luebbe). NIR spectra were collected at room temperature, and the data were processed using Sesame Software (Bran+Luebbe). Separate calibrations were calculated for each non-milk protein added, in the range of 0-5%. Prediction and validation were performed using a set of samples not included in the calibration set. The best calibrations were obtained by PLSR. The type of data pre-treatment (normalisation, 1st derivative, etc.) was chosen to optimize the calibration parameters. The NIRS technique was able to predict with good accuracy the percentage of each vegetable protein added to milk powder (soy: $R^2$ 0.994, SEE 0.193, SEcv 0.301, RMSEPall 0.148; pea: $R^2$ 0.997, SEE 0.1498, SEcv 0.207, RMSEPall 0.148; wheat: $R^2$ 0.997, SEE 0.1418, SEcv 0.335, RMSEPall 0.149). Prediction results were compared with those obtained using two other techniques: capillary electrophoresis and competitive ELISA. On the basis of the known true values of the added non-milk protein contents, NIRS determined the percentage of adulteration in the analysed samples more accurately than the other two techniques.
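
A minimal sketch of a PLS-regression calibration with an independent validation set, in the spirit of the procedure described above. The spectra file, column names, and number of latent components are assumptions, and scikit-learn is used here instead of the Sesame software named in the abstract.

    # Rough sketch of a PLSR calibration/validation split; data set and
    # parameters are hypothetical.
    import numpy as np
    import pandas as pd
    from sklearn.cross_decomposition import PLSRegression
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import mean_squared_error

    data = pd.read_csv("nir_spectra.csv")                      # hypothetical data set
    X = data.filter(like="wl_").to_numpy()                     # NIR absorbance per wavelength
    y = data["soy_pct"].to_numpy()                             # added soy protein, 0-5%

    # keep an independent validation set, as in the study design
    X_cal, X_val, y_cal, y_val = train_test_split(X, y, test_size=0.25, random_state=0)

    pls = PLSRegression(n_components=8).fit(X_cal, y_cal)
    rmsep = np.sqrt(mean_squared_error(y_val, pls.predict(X_val)))
    print(f"RMSEP on validation samples: {rmsep:.3f}")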

Influential Factors to the Oral Hygiene Behavior and Perceived Oral Health Status of the Elderly (노인의 주관적 구강건강상태 및 구강보건행태에 영향을 미치는 요인)

  • Lee, Sook-Jeong;Kim, Chang-Hwan;Choi, Gyu-Yil
    • The Korean Journal of Health Service Management / v.6 no.1 / pp.39-51 / 2012
  • The purpose of this study is to provide fundamental data for oral hygiene education for the elderly, based on a survey of the oral hygiene behavior and subjective oral health of the elderly in an aged society. For this purpose, 269 elders dwelling in the Gyeongsangbuk-do region were selected through an arbitrary sampling process and surveyed on their oral hygiene and health. The collected data were coded and processed using SPSS 15.0. In the analysis, the general characteristics and the basic items concerning oral health management were analyzed for frequencies and percentages, while the relationship between the general characteristics and oral health awareness was examined with the chi-square test, with the following results. First, among the items on oral health, satisfaction with the current condition of their oral health was below average. Second, concerning oral hygiene behaviors, the majority of respondents answered that they brush their teeth twice a day, and as for brushing method, the largest number answered that they brush in a horizontal direction. Third, they reported difficulties in obtaining dental treatment. The implication of this study is that sound oral health education is needed to correct inappropriate oral hygiene behaviors among the elderly.
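
For readers unfamiliar with the chi-square analysis mentioned above, the toy example below shows how such a test of association can be run; the contingency counts are invented and do not come from the study.

    # Small illustration (not the study's data) of a chi-square test relating a
    # general characteristic to self-rated oral health; counts are made up.
    import numpy as np
    from scipy.stats import chi2_contingency

    # rows: age groups, columns: self-rated oral health (good / fair / poor)
    table = np.array([[12, 30, 48],
                      [15, 41, 52],
                      [ 9, 25, 37]])

    chi2, p, dof, expected = chi2_contingency(table)
    print(f"chi2={chi2:.2f}, dof={dof}, p={p:.4f}")   # p < 0.05 suggests an association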

A Fundamental Study on Detection of Weeds in Paddy Field using Spectrophotometric Analysis (분광특성 분석에 의한 논 잡초 검출의 기초연구)

  • 서규현;서상룡;성제훈
    • Journal of Biosystems Engineering / v.27 no.2 / pp.133-142 / 2002
  • This is a fundamental study toward developing a machine-vision sensor based on a spectrophotometric technique to detect weeds in paddy fields, so that the sensor can be used for selective herbicide application. A set of spectral reflectance data was collected from dry and wet soil and from leaves of rice and six kinds of weed in order to select suitable wavelengths for classifying soil, rice and weeds. The stepwise variable selection method of discriminant analysis was applied to the data set, and wavelengths of 680 and 802 nm were selected to distinguish plants (including rice and weeds) from dry and wet soil, respectively. Wavelengths of 580 and 680 nm were selected to classify rice and weeds by the same method. The validity of the wavelengths for distinguishing plants from soil was examined by a cross-validation test with the built discriminant function, which showed that all soil and plant samples were classified correctly without any failure. The validity of the wavelengths for classifying rice and weeds was tested by the same method, and 98% of rice and 83% of weeds were classified correctly. The feasibility of a CCD color camera for detecting weeds in paddy fields was tested with the spectral reflectance data by the same statistical method. The central wavelengths of the camera's RGB frames were tried as the effective wavelengths to distinguish plants from soil and weeds from plants. As a result, 100% and 94% of plants in dry soil and wet soil, respectively, were classified correctly by the central wavelength of the R frame alone, and 95% of rice and 85% of weeds were classified correctly by the central wavelengths of the RGB frames. It was therefore concluded that a CCD color camera has good potential for detecting weeds in paddy fields.
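
A rough sketch of the discriminant analysis with cross-validation described in this abstract, using reflectance at the selected wavelengths as predictors. The data file, column names, and use of scikit-learn's LDA are assumptions for illustration.

    # Sketch under assumptions: discriminant analysis on reflectance at the
    # selected wavelengths (580 and 680 nm) to separate rice from weeds.
    import pandas as pd
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
    from sklearn.model_selection import cross_val_score

    df = pd.read_csv("leaf_reflectance.csv")           # hypothetical reflectance data
    X = df[["r580", "r680"]]                           # reflectance at 580 and 680 nm
    y = df["label"]                                    # "rice" or "weed"

    lda = LinearDiscriminantAnalysis()
    scores = cross_val_score(lda, X, y, cv=10)         # cross-validation test
    print(f"mean classification accuracy: {scores.mean():.2%}")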

Study on Automation of Comprehensive IT Asset Management (포괄적 IT 자산관리의 자동화에 관한 연구)

  • Wonseop Hwang;Daihwan Min;Junghwan Kim;Hanjin Lee
    • Journal of Information Technology Services / v.23 no.1 / pp.1-10 / 2024
  • The IT environment is changing as digital transformation accelerates in enterprises and organizations. This expansion of the digital space makes centralized cybersecurity controls more difficult. As a result, cyberattacks are increasing in frequency and severity and are becoming more sophisticated, as in ransomware and digital supply chain attacks. Even in large organizations with numerous security personnel and systems, security incidents continue to occur due to unmanaged and unknown threats and vulnerabilities in IT assets. It is time to move beyond the current focus on detecting and responding to security threats toward managing the full range of cyber risks. This requires an asset inventory for comprehensive management, built by collecting and integrating all IT assets of the enterprise or organization. IT Asset Management (ITAM) systems exist to identify and manage assets from a financial and administrative perspective, but the asset information they hold is incomplete and suffers from duplicated data, and updates of data sets covering network infrastructure, Active Directory, virtualization management, and cloud platforms are insufficient. In this study, we propose a new framework for automated Comprehensive IT Asset Management (CITAM), required for security operations, by designing a process that automatically collects asset data such as hostname, IP, MAC address, serial number, OS, installed software, and last-seen time, which are already distributed and stored across operating IT security systems. The CITAM framework can classify these records into unique device units through analysis processes of aggregation, normalization, deduplication, validation, and integration.
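
As a conceptual illustration of the aggregation, normalization, and deduplication steps named in the abstract, the sketch below merges asset records from multiple sources into unique device entries keyed by a normalized MAC address. The record fields and merge policy are assumptions, not the CITAM implementation.

    # Conceptual sketch only: merging asset records from several security
    # systems into unique device entries, keyed by a normalized MAC address.
    def normalize_mac(mac: str) -> str:
        return mac.replace("-", ":").lower().strip()

    def merge_records(records):
        devices = {}
        for rec in records:                              # rec: dict from one source system
            key = normalize_mac(rec["mac"])
            dev = devices.setdefault(key, {})
            newer = rec.get("last_seen", "") >= dev.get("last_seen", "")
            for field, value in rec.items():
                if field not in dev or newer:            # prefer the most recent source
                    dev[field] = value
            dev["mac"] = key
        return list(devices.values())

    records = [
        {"hostname": "HQ-PC-01", "mac": "AA-BB-CC-00-11-22", "os": "Windows 11",
         "last_seen": "2024-01-10", "source": "EDR"},
        {"hostname": "hq-pc-01", "mac": "aa:bb:cc:00:11:22", "ip": "10.0.0.15",
         "last_seen": "2024-01-12", "source": "NAC"},
    ]
    print(merge_records(records))   # one deduplicated device entry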

Transaction Pattern Discrimination of Malicious Supply Chain using Tariff-Structured Big Data (관세 정형 빅데이터를 활용한 우범공급망 거래패턴 선별)

  • Kim, Seongchan;Song, Sa-Kwang;Cho, Minhee;Shin, Su-Hyun
    • The Journal of the Korea Contents Association / v.21 no.2 / pp.121-129 / 2021
  • In this study, we try to minimize tariff risk by constructing a hazardous cargo screening model based on association rule mining, one of the data mining techniques. The risk level between supply chains is calculated with the Apriori algorithm, an association analysis algorithm, using the big data of import declaration forms of the Korea Customs Service (KCS). We perform data preprocessing and association rule mining to generate a model to be used in screening the supply chain. In the preprocessing step, we extract the attributes required for rule generation from the import declaration data after removing errors; the rules are then generated by feeding the extracted attributes into the Apriori algorithm. The generated association rule model is loaded into the KCS screening system. When an import declaration that should be checked is received, the screening system refers to the model and returns a confidence value based on the supply chain information in the declaration, and this result is used to decide whether to inspect the import case. Import declaration data covering two years and six months were divided into training and test data, and 5-fold cross-validation yielded 16.6% precision and 33.8% recall, about 3.4 times higher in precision and 1.5 times higher in recall than frequency-based methods. This confirms that the proposed method is an effective way to reduce tariff risks.
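
The sketch below mirrors the preprocessing-then-rule-generation flow described above, using the mlxtend implementation of Apriori. The transaction attributes and thresholds are invented for illustration and are not the KCS data or the authors' parameters.

    # Hedged sketch: Apriori rule mining on made-up import declaration attributes.
    import pandas as pd
    from mlxtend.preprocessing import TransactionEncoder
    from mlxtend.frequent_patterns import apriori, association_rules

    # each transaction: attributes extracted from one import declaration
    transactions = [
        ["exporter=E1", "origin=CN", "hs=8517", "risk=high"],
        ["exporter=E1", "origin=CN", "hs=8517", "risk=high"],
        ["exporter=E2", "origin=VN", "hs=6109", "risk=low"],
        ["exporter=E1", "origin=CN", "hs=8471", "risk=high"],
    ]

    te = TransactionEncoder()
    df = pd.DataFrame(te.fit(transactions).transform(transactions), columns=te.columns_)

    itemsets = apriori(df, min_support=0.5, use_colnames=True)
    rules = association_rules(itemsets, metric="confidence", min_threshold=0.7)
    print(rules[["antecedents", "consequents", "support", "confidence"]])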

Precision Validation of Electromagnetic Physics in Geant4 Simulation for Proton Therapy (양성자 치료 전산모사를 위한 Geant4 전자기 물리 모델 정확성 검증)

  • Park, So-Hyun;Rah, Jeong-Eun;Shin, Jung-Wook;Park, Sung-Yong;Yoon, Sei-Chul;Jung, Won-Gyun;Suh, Tae-Suk
    • Progress in Medical Physics / v.20 no.4 / pp.225-234 / 2009
  • Geant4 (GEometry ANd Tracking) provides various packages specialized in modeling electromagnetic interactions. The validation of Geant4 physics models is a significant issue for applications of Geant4-based simulation in medical physics. The purpose of this study is to evaluate the accuracy of Geant4 electromagnetic physics for proton therapy. The validation covered both the continuous slowing down approximation (CSDA) range and the stopping power. In each test, the reliability of the electromagnetic models was evaluated for a selected group of materials such as water, bone, adipose tissue and various atomic elements, and the results of the Geant4 simulation were compared with National Institute of Standards and Technology (NIST) reference data. For water, bone and adipose tissue, the average percent differences in CSDA range were 1.0%, 1.4% and 1.4%, respectively, and the average percent differences in stopping power were 0.7%, 1.0% and 1.3%, respectively. The data were analyzed with the Kolmogorov-Smirnov goodness-of-fit test. All results from the electromagnetic models showed good agreement with the reference data, with all corresponding p-values higher than the significance level $\alpha=0.05$.
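
The comparison described above can be reproduced in outline as follows: average percent difference plus a two-sample Kolmogorov-Smirnov test between simulated and reference values. The arrays below are placeholders, not actual Geant4 or NIST numbers.

    # Simple sketch of the comparison: percent difference and a KS test between
    # simulated and reference stopping powers (placeholder values only).
    import numpy as np
    from scipy.stats import ks_2samp

    nist = np.array([45.7, 26.1, 12.5, 8.6, 4.6])       # placeholder reference values
    geant4 = np.array([45.9, 26.4, 12.6, 8.7, 4.7])     # placeholder simulation values

    pct_diff = np.abs(geant4 - nist) / nist * 100
    print(f"average percent difference: {pct_diff.mean():.2f}%")

    stat, p = ks_2samp(geant4, nist)                    # goodness-of-fit comparison
    print(f"KS statistic={stat:.3f}, p-value={p:.3f}")  # p > 0.05: no significant difference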

A Comparative Study On Accident Prediction Model Using Nonlinear Regression And Artificial Neural Network, Structural Equation for Rural 4-Legged Intersection (비선형 회귀분석, 인공신경망, 구조방정식을 이용한 지방부 4지 신호교차로 교통사고 예측모형 성능 비교 연구)

  • Oh, Ju Taek;Yun, Ilsoo;Hwang, Jeong Won;Han, Eum
    • Journal of Korean Society of Transportation / v.32 no.3 / pp.266-279 / 2014
  • For the evaluation of roadway safety, diverse methods have been applied, including before-after studies, simple comparisons using historical traffic accident data, and methods based on experts' opinions or the literature. In particular, many research efforts have developed traffic accident prediction models in order to identify the critical elements causing accidents and to evaluate the level of safety. A traffic accident prediction model must secure both predictability and transferability: with predictability, the model can accurately predict the frequency of accidents both qualitatively and quantitatively, and with transferability, the model can be used for other locations with acceptable accuracy. To this end, traffic accident prediction models using non-linear regression, an artificial neural network, and a structural equation were developed in this study. The predictability and transferability of the three models were compared, based on mean absolute deviation and mean squared prediction error, using a model development data set collected from 90 signalized intersections and a model validation data set from 33 other signalized intersections. In the comparison using the model development data set, the artificial neural network showed the highest predictability; however, the non-linear regression model was found to be most appropriate in the comparison using the model validation data set. In conclusion, the artificial neural network has a strong ability to represent the relationship between the frequency of traffic accidents and traffic and road design elements, but its predictability decreased significantly when it was applied to new data that were not used in model development.
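
The two comparison criteria named in this abstract, mean absolute deviation (MAD) and mean squared prediction error (MSPE), can be computed as in the sketch below; the observed and predicted values are placeholders, not the study's data.

    # Sketch of the comparison criteria with placeholder predictions from two
    # hypothetical models.
    import numpy as np

    def mad(observed, predicted):
        return np.mean(np.abs(observed - predicted))

    def mspe(observed, predicted):
        return np.mean((observed - predicted) ** 2)

    obs = np.array([3, 5, 2, 7, 4], dtype=float)          # observed accident frequencies
    pred_ann = np.array([3.2, 4.6, 2.5, 6.4, 4.3])        # hypothetical ANN predictions
    pred_reg = np.array([2.7, 5.5, 2.2, 6.1, 4.8])        # hypothetical regression predictions

    for name, pred in [("ANN", pred_ann), ("regression", pred_reg)]:
        print(f"{name}: MAD={mad(obs, pred):.3f}, MSPE={mspe(obs, pred):.3f}")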

Land Cover Classification over East Asian Region Using Recent MODIS NDVI Data (2006-2008) (최근 MODIS 식생지수 자료(2006-2008)를 이용한 동아시아 지역 지면피복 분류)

  • Kang, Jeon-Ho;Suh, Myoung-Seok;Kwak, Chong-Heum
    • Atmosphere / v.20 no.4 / pp.415-426 / 2010
  • A land cover map over the East Asian region (Kongju National University Land Cover map: KLC) was produced using a support vector machine (SVM) and evaluated with ground truth data. The basic input data are the recent three years (2006-2008) of MODIS (MODerate resolution Imaging Spectroradiometer) NDVI (normalized difference vegetation index) data, whose spatial resolution and temporal frequency are 1 km and 16 days, respectively. To minimize the number of cloud-contaminated pixels in the MODIS NDVI data, a maximum value composite was applied to the 16-day data, and a correction of cloud-contaminated pixels based on the assumption of spatiotemporal continuity was applied to the monthly NDVI data. To reduce the data set and improve the classification quality, nine phenological variables, such as NDVI maximum, amplitude, and average, were derived from the corrected monthly NDVI data. Three existing land cover maps (International Geosphere-Biosphere Programme: IGBP, University of Maryland: UMd, and MODIS) were used to build a "quasi" ground truth data set composed of pixels to which all three maps assigned the same land cover type. The classification results show that the fractions of broadleaf trees and grasslands are greater, while those of croplands and needleleaf trees are smaller, compared with the IGBP or UMd maps. Validation against an in-situ observation database shows that the percentages of pixels in agreement with the observations are 80%, 77%, 63% and 57% for the MODIS, KLC, IGBP and UMd land cover data, respectively. The significant differences in land cover types among MODIS, IGBP, UMd and KLC occur mainly over southern China and Manchuria, where most pixels are contaminated by cloud in summer and snow in winter, respectively. This shows that the quality of the raw data is one of the most important factors in land cover classification.
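
As a rough illustration of the classification step described above, the sketch below trains an SVM on phenological NDVI variables using labels from pixels on which the existing maps agree. The file, column names, and SVM parameters are assumptions, not the authors' configuration.

    # Illustrative sketch: SVM classification of pixels from phenological NDVI
    # variables, with "quasi" ground-truth labels where existing maps agree.
    import pandas as pd
    from sklearn.svm import SVC
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import accuracy_score

    px = pd.read_csv("ndvi_phenology_pixels.csv")               # hypothetical pixel table
    features = ["ndvi_max", "ndvi_min", "ndvi_amplitude", "ndvi_mean"]
    X, y = px[features], px["agreed_class"]                     # label where IGBP/UMd/MODIS agree

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=1)
    clf = SVC(kernel="rbf", C=10, gamma="scale").fit(X_tr, y_tr)
    print(f"validation accuracy: {accuracy_score(y_te, clf.predict(X_te)):.2%}")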