• Title/Summary/Keyword: Data classification

Image Classification Model using web crawling and transfer learning (웹 크롤링과 전이학습을 활용한 이미지 분류 모델)

  • Lee, JuHyeok;Kim, Mi Hui
    • Journal of IKEEE / v.26 no.4 / pp.639-646 / 2022
  • In this paper, to address the difficulty of obtaining large datasets, we collect images by web crawling and build datasets for image classification models through data preprocessing. We also propose a lightweight model that incorporates transfer learning into the image classification model, automatically classifies images by adding category values, reduces training time, and achieves high accuracy.

Text Classification on Social Network Platforms Based on Deep Learning Models

  • YA, Chen;Tan, Juan;Hoekyung, Jung
    • Journal of information and communication convergence engineering / v.21 no.1 / pp.9-16 / 2023
  • The natural language on social network platforms has a certain front-to-back dependency in structure, and the direct conversion of Chinese text into a vector makes the dimensionality very high, thereby resulting in the low accuracy of existing text classification methods. To this end, this study establishes a deep learning model that combines a big data ultra-deep convolutional neural network (UDCNN) and long short-term memory network (LSTM). The deep structure of UDCNN is used to extract the features of text vector classification. The LSTM stores historical information to extract the context dependency of long texts, and word embedding is introduced to convert the text into low-dimensional vectors. Experiments are conducted on the social network platforms Sogou corpus and the University HowNet Chinese corpus. The research results show that compared with CNN + rand, LSTM, and other models, the neural network deep learning hybrid model can effectively improve the accuracy of text classification.

Breast Cancer Classification in Ultrasound Images using Semi-supervised method based on Pseudo-labeling

  • Seokmin Han
    • International Journal of Internet, Broadcasting and Communication / v.16 no.1 / pp.124-131 / 2024
  • Breast cancer classification using ultrasound, while widely employed, faces challenges due to its relatively low predictive value, which arises from significant overlap in characteristics between benign and malignant lesions, and due to operator dependency. To alleviate these challenges and reduce dependency on radiologist interpretation, automatic breast cancer classification in ultrasound images can be helpful. To deal with this problem, we propose a semi-supervised deep learning framework for breast cancer classification. With the proposed method, we achieved reasonable performance using less than 50% of the training data for supervised learning, compared with training on a 100% labeled dataset. Though it requires further modification, this methodology may be able to alleviate the time-consuming annotation burden on radiologists by reducing the number of annotations, contributing to a more efficient and effective breast cancer detection process in ultrasound images.
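
The pseudo-labeling idea named in the title above can be illustrated with a minimal sketch. It is not the paper's deep learning framework: a nearest-centroid classifier on 1-D features stands in for the network, and the function names, the margin-based confidence rule, and the `min_margin` and `rounds` parameters are all assumptions made for illustration.

```python
# Pseudo-labeling sketch: train on labeled data, label confident unlabeled
# samples, retrain. A nearest-centroid classifier stands in for a deep model.

def fit_centroids(xs, ys):
    """Return {label: mean feature value} for the labeled samples."""
    sums, counts = {}, {}
    for x, y in zip(xs, ys):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}

def predict_with_margin(centroids, x):
    """Return (nearest label, gap between the two smallest centroid distances)."""
    dists = sorted((abs(x - c), label) for label, c in centroids.items())
    margin = dists[1][0] - dists[0][0] if len(dists) > 1 else float("inf")
    return dists[0][1], margin

def pseudo_label_train(labeled_x, labeled_y, unlabeled_x, min_margin=1.0, rounds=3):
    """Return (final centroids, unlabeled samples that never became confident)."""
    xs, ys = list(labeled_x), list(labeled_y)
    pool = list(unlabeled_x)
    for _ in range(rounds):
        centroids = fit_centroids(xs, ys)
        remaining = []
        for x in pool:
            label, margin = predict_with_margin(centroids, x)
            if margin >= min_margin:      # confident: adopt the pseudo-label
                xs.append(x)
                ys.append(label)
            else:                         # ambiguous: keep for a later round
                remaining.append(x)
        if len(remaining) == len(pool):   # nothing new was labeled; stop early
            break
        pool = remaining
    return fit_centroids(xs, ys), pool
```

Samples that stay ambiguous are returned rather than forced into a class, which is one simple way to limit error propagation from wrong pseudo-labels.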

Hierarchical Clustering Approach of Multisensor Data Fusion: Application of SAR and SPOT-7 Data on Korean Peninsula

  • Lee, Sang-Hoon;Hong, Hyun-Gi
    • Proceedings of the KSRS Conference / 2002.10a / pp.65-65 / 2002
  • In remote sensing, images are acquired over the same area by sensors of different spectral ranges (from the visible to the microwave) and/or with different numbers, positions, and widths of spectral bands. These images are generally partially redundant, as they represent the same scene, and partially complementary. For many applications of image classification, the information provided by a single sensor is often incomplete or imprecise, resulting in misclassification. Fusion with redundant data can draw more consistent inferences for the interpretation of the scene and can thus improve classification accuracy. The common approach to the classification of multisensor data as a data fusion scheme at the pixel level is to concatenate the data into one vector as if they were measurements from a single sensor. The multiband data acquired by a single multispectral sensor or by two or more different sensors are not completely independent, and a certain degree of informative overlap may exist between the observation spaces of the different bands. This dependence may make the data less informative and should be properly modeled in the analysis so that its effect can be eliminated. To model and eliminate the effect of such dependence, this study employs a strategy using self and conditional information variation measures. The self information variation reflects the self certainty of the individual bands, while the conditional information variation reflects the degree of dependence between the different bands. One data set might be much less reliable than the others in the analysis and may even degrade the classification results; such an unreliable data set should be excluded from the analysis. To account for this, the self information variation is used to measure the degree of reliability. A team of positively dependent bands can jointly gather more information than a team of independent ones, but when bands are negatively dependent, their combined analysis may yield worse information. Using the conditional information variation measure, the multiband data are split into two or more subsets according to the dependence between the bands. Each subset is classified separately, and a data fusion scheme at the decision level is applied to integrate the individual classification results. In this study, a two-level algorithm using a hierarchical clustering procedure is used for unsupervised image classification. The hierarchical clustering algorithm is based on similarity measures between all pairs of candidates being considered for merging. In the first level, the image is partitioned into any number of regions, which are sets of spatially contiguous pixels, so that no union of adjacent regions is statistically uniform. The regions resulting from the lower level are then clustered into a parsimonious number of groups according to their statistical characteristics. The algorithm has been applied to satellite multispectral data and airborne SAR data.
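
The second-level grouping described in the abstract above, where regions are merged pairwise by similarity until a small number of groups remains, can be sketched as a generic agglomerative procedure. This is an illustration only: the abstract does not give the similarity measure, so the absolute difference between cluster means of 1-D region values is an assumption, as are the function name and the sample data.

```python
# Agglomerative (bottom-up) hierarchical clustering sketch on 1-D region values.
# Similarity here is the absolute difference of cluster means (an assumption).

def agglomerate(values, n_clusters):
    """Merge the two clusters with the closest means until n_clusters remain."""
    clusters = [[v] for v in values]           # start with one cluster per region
    while len(clusters) > n_clusters:
        best = None                            # (distance, i, j) of the closest pair
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                mean_i = sum(clusters[i]) / len(clusters[i])
                mean_j = sum(clusters[j]) / len(clusters[j])
                dist = abs(mean_i - mean_j)
                if best is None or dist < best[0]:
                    best = (dist, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]    # merge j into i
        del clusters[j]
    return clusters

# Hypothetical region means, grouped into three clusters.
groups = agglomerate([1.0, 1.2, 5.0, 5.1, 9.0], n_clusters=3)
```

Every pair is compared at each step, which matches the "all pairs of candidates" wording but costs cubic time overall; it is meant for small inputs such as region summaries rather than individual pixels.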

Land cover classification based on the phenology of Korea using NOAA-AVHRR

  • Kim, Won-Joo;Nam, Ki-Deock;Park, Chong-Hwa
    • Proceedings of the KSRS Conference / 1999.11a / pp.439-442 / 1999
  • Analyzing the seasonal change profiles of land cover types at a large scale is important for establishing preservation strategies and for environmental monitoring. Because the NOAA-AVHRR data sets provide global data with high temporal resolution, they are suitable for land cover classification over large areas. The objectives of this study were to classify the land cover of Korea and to investigate the phenological profiles of the land cover classes. NOAA-AVHRR data from Jan. 1998 to Dec. 1998, received by the Korea Ocean Research & Development Institute (KORDI), were used for this study. NDVI data were produced from these data, and monthly maximum value composites were made to reduce cloud effects and to support temporal classification. The data were then classified using supervised classification. To label the land cover classes, they were classified again using a generalized vegetation map and a Landsat-TM classified image, and the profile of each class was analyzed month by month. The results of this study can be summarized as follows. First, it was verified that the vegetation map and the TM classified map can be used to obtain temporal class labels with NOAA-AVHRR. Second, the phenological characteristics of the plant communities of Korea were identified using NOAA-AVHRR. Third, the NDVI of North Korea is lower in summer than that of South Korea. Finally, forest cover is higher than the other cover types, and broadleaf forest is highest in May. The outlines of the cover type profiles were also investigated.
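
The NDVI and maximum value composite steps mentioned in the abstract above can be sketched as follows. NDVI is the standard ratio (NIR - RED) / (NIR + RED). The raster layout (flat lists of equal length, with `None` marking missing pixels) and the function names are assumptions for illustration, not details from the paper.

```python
# NDVI and maximum value composite (MVC) sketch on flat per-pixel lists.

def ndvi(nir, red):
    """NDVI = (NIR - RED) / (NIR + RED); None when the denominator is zero."""
    total = nir + red
    return None if total == 0 else (nir - red) / total

def max_value_composite(ndvi_rasters):
    """Per-pixel maximum over several NDVI rasters of equal length.

    Clouds tend to lower NDVI, so keeping the maximum per pixel over a
    period suppresses cloud-contaminated observations.
    """
    composite = []
    for pixel_series in zip(*ndvi_rasters):
        valid = [v for v in pixel_series if v is not None]
        composite.append(max(valid) if valid else None)
    return composite

# Two hypothetical acquisitions of a three-pixel scene.
monthly = max_value_composite([[0.1, None, 0.3], [0.4, None, 0.2]])
```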

Classification of Land Cover over the Korean Peninsula Using Polar Orbiting Meteorological Satellite Data (극궤도 기상위성 자료를 이용한 한반도의 지면피복 분류)

  • Suh, Myoung-Seok;Kwak, Chong-Heum;Kim, Hee-Soo;Kim, Maeng-Ki
    • Journal of the Korean earth science society / v.22 no.2 / pp.138-146 / 2001
  • The land cover over the Korean peninsula was classified using multi-temporal NOAA/AVHRR (Advanced Very High Resolution Radiometer) data. Four types of phenological data derived from the 10-day composited NDVI (Normalized Difference Vegetation Index), maximum and annual mean land surface temperature, and topographical data were used not only to reduce the data volume but also to increase the classification accuracy. A self-organizing feature map (SOFM), a kind of neural network technique, was used to cluster the satellite data, and a decision tree was used to classify the clusters. When the classification results were compared with the NDVI time series and other available ground truth data, the urban, agricultural, deciduous tree, and evergreen tree areas were clearly classified.

An Analysis of the Landuse Classification Accuracy Using IHS Merged Images from IRS-1C PAN Data and Landsat TM Data (IRS-1C PAN 데이터와 Landsat TM 데이터의 IHS중합화상을 이용한 토지이용분류 정확도 분석)

  • 안기원;이효성;서두천;신석효
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography / v.16 no.2 / pp.187-194 / 1998
  • This study discusses effective multispectral Landsat TM band combinations for merging with the high-resolution IRS-1C PAN data using the IHS method to improve land-use classification accuracy. A sample data set with ten classes was generated from an image pre-classified using the merged images of all six TM band images (excluding band 6) and the PAN image. The overall classification accuracy was evaluated over the sample area for seven representative merged images, each produced by merging a TM three-band image with the IRS-1C PAN image using the IHS method. The increase in classification accuracy is most significant when two of the TM4, TM5, and TM7 infrared band images are included. In particular, the largest increase (11.8 percent) in land-use classification accuracy was observed when the Landsat TM247 bands were merged with the IRS-1C PAN data. The classification accuracy obtained when a TM three-band image and the PAN image were used without merging was higher than that obtained with the merged images.

Enhancing Classification Performance by Separating Spectral Signature of Training Data Set (교사 자료의 분광 특징 분리에 의한 감독 분류 성능 향상)

  • 김광은
    • Korean Journal of Remote Sensing / v.18 no.6 / pp.369-376 / 2002
  • This paper presents a method to enhance the performance of supervised classification by separating the spectral signatures of the training data set for each class. Using a clustering technique, a training data set is divided into several subsets, each of which follows a normal distribution with small spectral variance. A supervised classification is then applied, with the divided training data serving as training data for temporary subclasses of the original class. The proposed method is applied to a Landsat TM image of the Busan area to test its applicability. The results show that the proposed method produces better classification results than the conventional statistical classification methods. The proposed method is expected to reduce the effort and expense of selecting a training data set for each class in an area with spectrally homogeneous signatures.
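
The subclass idea in the abstract above (cluster each class's training data into tighter subsets, classify against the subsets, then map each result back to its parent class) can be sketched under simplifying assumptions. The paper's actual clustering technique and statistical classifier are not stated in the abstract; here a tiny 1-D two-means split and a nearest-mean rule are swapped in, and all names and data are hypothetical.

```python
# Sketch: split each class's training data into subclasses, then classify by
# the nearest subclass mean and report the parent class.

def two_means_1d(values, iterations=10):
    """Split 1-D values into up to two groups (k-means with k=2, seeded at min/max)."""
    c0, c1 = min(values), max(values)
    groups = ([], [])
    for _ in range(iterations):
        groups = ([], [])
        for v in values:
            groups[0 if abs(v - c0) <= abs(v - c1) else 1].append(v)
        if not groups[0] or not groups[1]:    # data not separable into two groups
            break
        c0 = sum(groups[0]) / len(groups[0])
        c1 = sum(groups[1]) / len(groups[1])
    return [g for g in groups if g]

def train_subclasses(training):
    """training: {class_name: [values]} -> list of (subclass_mean, parent_class)."""
    model = []
    for name, values in training.items():
        for group in two_means_1d(values):
            model.append((sum(group) / len(group), name))
    return model

def classify(model, value):
    """Return the parent class of the nearest subclass mean."""
    return min(model, key=lambda entry: abs(value - entry[0]))[1]

# Class "A" is bimodal; its overall mean (30.5) coincides with class "B",
# so a single mean per class could not separate the two.
model = train_subclasses({"A": [10.0, 11.0, 50.0, 51.0], "B": [30.0, 31.0]})
```

In this toy data, splitting "A" into two subclasses with means 10.5 and 50.5 is what makes the classes separable, which is the motivation the abstract gives for dividing spectrally mixed training sets.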

Classification of Microarray Gene Expression Data by MultiBlock Dimension Reduction

  • Oh, Mi-Ra;Kim, Seo-Young;Kim, Kyung-Sook;Baek, Jang-Sun;Son, Young-Sook
    • Communications for Statistical Applications and Methods / v.13 no.3 / pp.567-576 / 2006
  • In this paper, we apply multiblock dimension reduction methods to tumor classification based on microarray gene expression data. The procedure involves clustering selected genes, multiblock dimension reduction, and classification using linear discriminant analysis and quadratic discriminant analysis.

SVC with Modified Hinge Loss Function

  • Lee, Sang-Bock
    • Journal of the Korean Data and Information Science Society / v.17 no.3 / pp.905-912 / 2006
  • Support vector classification (SVC) provides a more complete description of the linear and nonlinear relationships between input vectors and classifiers. In this paper, we propose to solve the optimization problem of SVC with a modified hinge loss function, which enables the use of an iteratively reweighted least squares (IRWLS) procedure. We also introduce an approximate cross-validation function to select the hyperparameters that affect the performance of SVC. Experimental results are then presented to illustrate the performance of the proposed procedure for classification.
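
For reference, the standard hinge loss that the abstract above takes as its starting point can be written in a few lines. The paper's modification is not specified in the abstract, so only the standard form max(0, 1 - y * f(x)) is shown; the linear scoring function and the function names are assumptions for illustration.

```python
# Standard hinge loss for labels y in {-1, +1}; the paper's modified version
# is not given in the abstract and is not reproduced here.

def hinge_loss(y, score):
    """max(0, 1 - y * score): zero once the sample is beyond the margin."""
    return max(0.0, 1.0 - y * score)

def mean_hinge_loss(w, b, xs, ys):
    """Average hinge loss of a 1-D linear scorer f(x) = w * x + b."""
    losses = [hinge_loss(y, w * x + b) for x, y in zip(xs, ys)]
    return sum(losses) / len(losses)
```

The standard hinge loss has a kink at y * f(x) = 1 where it is not differentiable.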
