• Title/Summary/Keyword: Data classification

Segmentation and Classification of Lidar data

  • Tseng, Yi-Hsing;Wang, Miao
    • Proceedings of the KSRS Conference / 2003.11a / pp.153-155 / 2003
  • Laser scanning has become a viable technique for collecting large amounts of accurate 3D point data densely distributed on the scanned object surface. The inherent 3D nature of the sub-randomly distributed point cloud provides abundant spatial information. Exploring valuable spatial information from laser-scanned data has become an active research topic, for instance extracting digital elevation models, building models, and vegetation volumes. The sub-randomly distributed point cloud should be segmented and classified before spatial information is extracted. This paper investigates some existing segmentation methods and then proposes an octree-based split-and-merge segmentation method that divides lidar data into clusters belonging to 3D planes. The classification of lidar data can then be performed based on the derived attributes of the extracted 3D planes. Test results on both ground and airborne lidar data show the potential of applying this method to extract spatial features from lidar data.

Semi-Supervised Learning for Fault Detection and Classification of Plasma Etch Equipment (준지도학습 기반 반도체 공정 이상 상태 감지 및 분류)

  • Lee, Yong Ho;Choi, Jeong Eun;Hong, Sang Jeen
    • Journal of the Semiconductor & Display Technology / v.19 no.4 / pp.121-125 / 2020
  • With the miniaturization of semiconductors, the manufacturing process has become more complex, and small undetected changes in the state of the equipment can unexpectedly change process results. A fault detection and classification (FDC) system that conducts more active data analysis with advanced machine learning methods can achieve more precise manufacturing process control. However, applying machine learning, especially supervised learning, requires an arduous data labeling process to construct the training data. In this paper, we propose a semi-supervised learning approach to minimize the data labeling work in data preprocessing. We employed equipment status variable identification (SVID) data and optical emission spectroscopy (OES) data from silicon etching with an SF6/O2/Ar gas mixture, and the results show labeling accuracy as high as 95.2% with the suggested semi-supervised learning algorithm.
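
The abstract does not say which semi-supervised algorithm was used. As a rough illustration of how a few labeled samples can be used to label the rest, the sketch below uses self-training with a nearest-centroid rule, which is a swapped-in technique and not the authors' method. The one-dimensional sensor values, the `margin` threshold, and all names are hypothetical.

```python
def self_train(labeled, unlabeled, max_rounds=10, margin=1.0):
    """Self-training sketch (assumed technique, not the paper's algorithm).

    labeled:   list of (value, label) pairs with at least two distinct labels
    unlabeled: list of values without labels
    A value is pseudo-labeled only when it is at least `margin` closer to
    its nearest class centroid than to the second-nearest one.
    """
    labeled = list(labeled)
    pending = list(unlabeled)
    for _ in range(max_rounds):
        # Recompute one centroid per class from everything labeled so far.
        groups = {}
        for value, label in labeled:
            groups.setdefault(label, []).append(value)
        centroids = {lab: sum(vals) / len(vals) for lab, vals in groups.items()}

        still_pending = []
        added = False
        for value in pending:
            ranked = sorted(centroids, key=lambda lab: abs(value - centroids[lab]))
            best, second = ranked[0], ranked[1]
            gap = abs(value - centroids[second]) - abs(value - centroids[best])
            if gap >= margin:
                labeled.append((value, best))  # confident pseudo-label
                added = True
            else:
                still_pending.append(value)    # leave for manual labeling
        pending = still_pending
        if not added or not pending:
            break
    return labeled, pending


# Hypothetical sensor readings, invented for illustration only.
seed_labels = [(1.0, "normal"), (9.0, "fault")]
readings = [1.5, 2.0, 8.0, 8.5, 5.0]
all_labeled, needs_review = self_train(seed_labels, readings)
```

Here only the ambiguous reading stays in `needs_review`, which is the sense in which such a scheme reduces manual labeling work.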

A Study on Deep Learning Model-based Object Classification for Big Data Environment

  • Kim, Jeong-Sig;Kim, Jinhong
    • Journal of Software Assessment and Valuation / v.17 no.1 / pp.59-66 / 2021
  • Recently, conceptual information models have been changing fast, as a result of individual tendencies, social and cultural factors, new circumstances, and societal shifts within the big data environment. Although data keeps growing, now is the time to commit ourselves to developing renewable, invaluable information for social/live commerce. Because various kinds of data remain intractable, we propose deep learning prediction model-based object classification for social commerce in a big data environment. Accordingly, there is an increased need for a social commerce platform capable of handling high volumes of multiple items from users. Consequently, responding to rapid changes in users through deep learning is very significant. This paper therefore aims to meet the needs of the times promptly amid the widespread growth of the big data environment.

Application of data mining and statistical measurement of agricultural high-quality development

  • Yan Zhou
    • Advances in Nano Research / v.14 no.3 / pp.225-234 / 2023
  • In this study, we aim to use big data resources and statistical analysis to obtain reliable guidance for achieving high-quality, high-yield agricultural production. To this end, soil type data, rainfall and temperature data, and annual wheat production are collected for a specific region. Using statistical methodology, the acquired data was cleaned to remove incomplete and defective records. Afterwards, using several machine learning classification methods, we tried to distinguish between different factors and their influence on the final crop yields. The efficacy of the machine learning methods is discussed by comparing the models' predictions using two statistical quantities, the correlation factor and the mean squared error between predicted and actual crop yields. The results of the analysis show high accuracy of machine learning methods in predicting crop yields. Moreover, the random forest (RF) classification approach provides the best results among the classification methods used in this study.
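
The two comparison quantities named in this abstract can be sketched as follows. This is a minimal illustration that assumes the "correlation factor" means the Pearson correlation coefficient; the yield figures are invented and do not come from the paper.

```python
import math


def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def pearson_correlation(actual, predicted):
    """Pearson correlation coefficient (assumed meaning of 'correlation factor')."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)


# Hypothetical crop yields (tonnes per hectare), invented for illustration.
actual_yield = [3.0, 3.5, 4.0, 4.5]
predicted_yield = [3.5, 3.0, 4.5, 4.0]

mse = mean_squared_error(actual_yield, predicted_yield)  # 0.25
r = pearson_correlation(actual_yield, predicted_yield)   # 0.6
```

A lower mean squared error and a correlation closer to 1 would both indicate predictions that track the actual yields more closely.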

Selection Method of Fuzzy Partitions in Fuzzy Rule-Based Classification Systems (퍼지 규칙기반 분류시스템에서 퍼지 분할의 선택방법)

  • Son, Chang-S.;Chung, Hwan-M.;Kwon, Soon-H.
    • Journal of the Korean Institute of Intelligent Systems / v.18 no.3 / pp.360-366 / 2008
  • The initial fuzzy partitions in fuzzy rule-based classification systems are determined by considering the domain region of each attribute in the given data, and the optimal classification boundaries within the fuzzy partitions can be discovered by tuning their parameters using learning processes such as neural networks and genetic algorithms. In this paper, we propose a method for selecting fuzzy partitions based on statistical information, which maximizes pattern classification performance without a learning process. The statistical information is used to extract, from the numerical data, the uncertainty regions in each input attribute (i.e., the regions in which the classification boundaries of a pattern classification problem are determined). Moreover, methods for extracting the candidate rules associated with the partition intervals generated from the statistical information, and for minimizing the coupling problem between the candidate rules, are also discussed. To show the effectiveness of the proposed method, we compared its classification accuracy with that of conventional methods on the IRIS and New Thyroid Cancer data. The experimental results confirm that the proposed method, which considers only the statistical information of the numerical patterns, provides classification accuracy equal to or better than that of the conventional methods.

Construction of vehicle classification estimation model from the TCS data by using bootstrap Algorithm (붓스트랩 기법을 이용한 TCS 데이터로부터 차종별 교통량 추정모형 구축)

  • 노정현;김태균;차경준;박영선;남궁성;황부연
    • Journal of Korean Society of Transportation / v.20 no.1 / pp.39-52 / 2002
  • Traffic data broken down by vehicle class is difficult to exchange because each data source uses a different vehicle classification; as a result, application of the data is very limited. In particular, in the TCS vehicle classification for national highways, passenger cars, vans, and trucks are mixed in one category, so its practical usability is very low. This research standardizes the vehicle classification so that other data can be converted, and develops a model that estimates national highway traffic data by the standardized vehicle classification from the raw traffic data obtained at highway tollgates. The tollgates are categorized into several groups by their features, and the model estimates traffic data by the standardized vehicle classification using point estimation and the bootstrap algorithm. The results indicate that both methods are statistically significant. When the bias of extreme values due to sample size is considered, the bootstrap algorithm is the more sophisticated of the two. Using the results of this study, we expect improved use of TCS data and more specific comparison between the freeway traffic survey and freeway link volumes based on TCS data.
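
The contrast between a point estimate and a bootstrap estimate can be sketched as follows. This is a generic percentile bootstrap under assumed inputs, not the estimation model built in the paper; the tollgate sample, the class names, and the function name are hypothetical.

```python
import random


def bootstrap_share(sample, target, n_resamples=2000, seed=0):
    """Point estimate and 95% percentile bootstrap interval for the share
    of `target` in `sample` (generic sketch, not the paper's model)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(sample)
    point = sample.count(target) / n

    estimates = []
    for _ in range(n_resamples):
        # Resample n observations with replacement and recompute the share.
        resample = [sample[rng.randrange(n)] for _ in range(n)]
        estimates.append(resample.count(target) / n)

    estimates.sort()
    lower = estimates[int(0.025 * n_resamples)]
    upper = estimates[int(0.975 * n_resamples) - 1]
    return point, lower, upper


# Hypothetical vehicles observed at one tollgate, invented for illustration.
observed = ["car"] * 60 + ["van"] * 25 + ["truck"] * 15
point, lower, upper = bootstrap_share(observed, "truck")
```

The point estimate is a single share, while the resampled interval shows how much that share could vary with a sample of this size.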

A Dynamic Variable Window-based Topographical Classification Method Using Aerial LiDAR Data (항공 라이다 데이터를 이용한 동적 가변 윈도우 기반 지형 분류 기법)

  • Sung, Chul-Woong;Lee, Sung-Gyu;Park, Chang-Hoo;Lee, Ho-Jun;Kim, Yoo-Sung
    • Spatial Information Research / v.18 no.5 / pp.13-26 / 2010
  • In this paper, a dynamic variable window-based topographical classification method is proposed in which the classification unit changes depending on topographical properties. In the proposed scheme, to improve classification efficiency, the unit of topographical classification changes dynamically according to the topographical properties and repeated patterns. The classification efficiency and accuracy of the proposed method are also analyzed experimentally in order to find an optimal maximum decision window size. According to the experimental results, the proposed dynamic variable window-based method maintains similar accuracy while remarkably reducing computing time compared with a fixed window-size method.

Comparison of Performance Factors for Automatic Classification of Records Utilizing Metadata (메타데이터를 활용한 기록물 자동분류 성능 요소 비교)

  • Young Bum Gim;Woo Kwon Chang
    • Journal of the Korean Society for Information Management / v.40 no.3 / pp.99-118 / 2023
  • The objective of this study is to identify performance factors in the automatic classification of records by utilizing metadata that contains the contextual information of records. For this study, we collected 97,064 records of original textual information from Korean central administrative agencies in 2022. Various classification algorithms, data selection methods, and feature extraction techniques are applied and compared with the intent to discern the optimal performance-inducing technique. The study results demonstrated that among classification algorithms, Random Forest displayed higher performance, and among feature extraction techniques, the TF method proved to be the most effective. The minimum data quantity of unit tasks had a minimal influence on performance, and the addition of features positively affected performance, while their removal had a discernible negative impact.
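
As a small illustration of term-frequency (TF) feature extraction, the sketch below turns a metadata string into normalized term counts. It assumes whitespace tokenization and length-normalized TF; the abstract does not specify either, and the sample title is invented.

```python
from collections import Counter


def term_frequency(text):
    """Map each term to its share of all tokens in `text`.

    Assumes lowercase whitespace tokenization and length normalization;
    the study's exact TF definition is not given in the abstract.
    """
    tokens = text.lower().split()
    if not tokens:
        return {}
    counts = Counter(tokens)
    total = len(tokens)
    return {term: count / total for term, count in counts.items()}


# Hypothetical record title, invented for illustration.
features = term_frequency("Budget report budget plan")
# features == {"budget": 0.5, "report": 0.25, "plan": 0.25}
```

A vector of such values per record is the kind of input that a classifier such as Random Forest could then be trained on.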

Development of a Compound Classification Process for Improving the Correctness of Land Information Analysis in Satellite Imagery - Using Principal Component Analysis, Canonical Correlation Classification Algorithm and Multitemporal Imagery - (위성영상의 토지정보 분석정확도 향상을 위한 응용체계의 개발 - 다중시기 영상과 주성분분석 및 정준상관분류 알고리즘을 이용하여 -)

  • Park, Min-Ho
    • KSCE Journal of Civil and Environmental Engineering Research / v.28 no.4D / pp.569-577 / 2008
  • This study focuses on developing a compound classification process that mixes multitemporal data and combines a specific image enhancement technique with a specific image classification algorithm, in order to gain more accurate land information from satellite imagery. That is, the study proposes a classification process that applies the canonical correlation classification technique after principal component analysis of the mixed multitemporal data. The result of the proposed process is compared with the canonical correlation classification results for single-date images, for multitemporal imagery, and for a mixed image after principal component analysis of single-date images. The satellite images used are Landsat 5 TM images acquired on July 26, 1994 and September 1, 1996. Ground truth data for accuracy assessment is obtained from a topographic map and aerial photographs, and the whole study area is used for accuracy assessment. The proposed compound classification process outperformed applying the canonical correlation classification technique to a single-date image by 8.2% in classification accuracy. It was especially effective in correctly classifying mixed urban areas. In conclusion, applying the canonical correlation classification technique after principal component analysis of multitemporal imagery is very useful for improving classification accuracy when extracting land cover information from Landsat TM images.

A Comparative Study of Carbon Absorption Measurement Using Hyperspectral Image and High Density LiDAR Data in Geojedo

  • Choi, Byoung Gil;Na, Young Woo;Shin, Young Seob
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography / v.35 no.4 / pp.231-240 / 2017
  • This paper studies a method for precisely estimating carbon absorption by quantifying forest information using accurate LiDAR data and hyperspectral imagery. To estimate carbon absorption precisely from spatial data, the study identified a problem in the existing statistical estimation method and then proposed an optimized estimation method based on spatial information, derived by comparing digital aerial photogrammetry and LiDAR data. Precise classification and quantification proved possible when LiDAR and hyperspectral imagery were used. Classification of various tree species was possible with LiDAR and hyperspectral imagery. The hyperspectral image classification obtained with the Mahalanobis distance classification method generally matched the field survey. Precise forest resources could be extracted using high-density LiDAR data. Compared with the existing method, differences of 19.7% in forest area, 19.2% in total carbon absorption, and 0.9% in absorption per unit area were found, and the improved method was found to give precise estimates under the international code.
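
The Mahalanobis distance classification named in this abstract assigns a pixel to the class whose mean is nearest once each class's covariance is taken into account. The sketch below is a minimal two-band version under assumed inputs; the class names and reflectance values are invented and are not the paper's data.

```python
def class_statistics(samples):
    """Mean and sample covariance (sxx, sxy, syy) of 2-band training pixels."""
    n = len(samples)
    mx = sum(x for x, _ in samples) / n
    my = sum(y for _, y in samples) / n
    sxx = sum((x - mx) ** 2 for x, _ in samples) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in samples) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in samples) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def mahalanobis_squared(point, mean, cov):
    """Squared Mahalanobis distance using the closed-form 2x2 inverse."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy  # must be non-zero (non-degenerate class)
    dx = point[0] - mean[0]
    dy = point[1] - mean[1]
    return (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det


def classify(point, training):
    """Return the label of the class with the smallest Mahalanobis distance."""
    stats = {label: class_statistics(s) for label, s in training.items()}
    return min(stats, key=lambda label: mahalanobis_squared(point, *stats[label]))


# Hypothetical two-band reflectances per tree species, invented for illustration.
training_pixels = {
    "pine": [(0.20, 0.40), (0.22, 0.45), (0.18, 0.41), (0.21, 0.38)],
    "oak": [(0.35, 0.60), (0.38, 0.63), (0.33, 0.58), (0.36, 0.66)],
}
label_a = classify((0.20, 0.41), training_pixels)  # "pine"
label_b = classify((0.36, 0.62), training_pixels)  # "oak"
```

Real hyperspectral data has many more bands, so a general matrix inverse would replace the 2x2 closed form used here.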