• Title/Summary/Keyword: Principal Component Analysis (PCA)

Enhancing Recommender Systems by Fusing Diverse Information Sources through Data Transformation and Feature Selection

  • Thi-Linh Ho;Anh-Cuong Le;Dinh-Hong Vu
    • KSII Transactions on Internet and Information Systems (TIIS) / v.17 no.5 / pp.1413-1432 / 2023
  • Recommender systems aim to recommend items to users by taking into account their probable interests. This study focuses on creating a model that utilizes multiple sources of information about users and items by employing a multimodality approach. The study addresses how to gather information from different sources (modalities) and transform it into a uniform format, resulting in a multi-modal feature description for users and items. This work also aims to transform and represent the features extracted from different modalities so that the information is in a compatible format for integration and contains important, useful information for the prediction model. To achieve this goal, we propose a novel multi-modal recommendation model, which extracts latent features of users and items from a utility matrix using matrix factorization techniques. Various transformation techniques are used to extract features from other sources of information such as user reviews, item descriptions, and item categories. We also propose the use of Principal Component Analysis (PCA) and feature selection techniques to reduce the data dimension, extract important features, and remove noisy features in order to increase the accuracy of the model. We evaluated several model variants based on different subsets of modalities on the MovieLens and Amazon sub-category datasets. According to the experimental results, the proposed model significantly enhances the accuracy of recommendations compared to SVD, which is acknowledged as one of the most effective models for recommender systems. Specifically, the proposed model reduces the RMSE by 4.8% to 21.43% and increases the Precision by 2.07% to 26.49% on the Amazon datasets. Similarly, on the MovieLens dataset, the proposed model reduces the RMSE by 45.61% and increases the Precision by 14.06%. Additionally, the experimental results on both datasets demonstrate that combining information from multiple modalities in the proposed model leads to superior outcomes compared to relying on a single type of information.
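The fusion step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the utility matrix, the review features, the helper `pca_reduce`, and all sizes are hypothetical stand-ins, and a plain truncated SVD stands in for whichever matrix factorization the paper uses.

```python
import numpy as np

def pca_reduce(features, n_components):
    """Project the rows of `features` onto their top principal components."""
    centered = features - features.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

rng = np.random.default_rng(0)
n_users, n_items, n_latent = 20, 15, 4

# Hypothetical utility matrix of ratings 1..5 (dense here for simplicity).
utility = rng.integers(1, 6, size=(n_users, n_items)).astype(float)

# Latent user features from a rank-limited SVD of the utility matrix.
u, s, _ = np.linalg.svd(utility, full_matrices=False)
user_latent = u[:, :n_latent] * s[:n_latent]

# Hypothetical features from another modality (e.g. review text vectors).
review_features = rng.normal(size=(n_users, 30))

# Fuse the modalities, then reduce the fused description with PCA.
fused = np.hstack([user_latent, review_features])   # shape (20, 34)
reduced = pca_reduce(fused, n_components=8)         # shape (20, 8)
print(fused.shape, reduced.shape)
```

In practice the modalities would usually be scaled to comparable ranges before concatenation, since PCA is sensitive to the variance of each input column.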

TRAO KSP TIMES: Homogeneous, High-sensitivity, Multi-transition Spectral Maps toward the Orion A and Ophiuchus Cloud with a High-velocity Resolution.

  • Yun, Hyeong-Sik;Lee, Jeong-Eun;Choi, Yunhee;Evans, Neal J. II;Offner, Stella S.R.;Heyer, Mark H.;Lee, Yong-Hee;Baek, Giseon;Choi, Minho;Kang, Hyunwoo;Cho, Jungyeon;Lee, Seokho;Tatematsu, Ken'ichi;Gaches, Brandt A.L.;Yang, Yao-Lun;Chen, How-Huan;Lee, Youngung;Jung, Jae Hoon;Lee, Changhoon
    • The Bulletin of The Korean Astronomical Society / v.44 no.2 / pp.68.1-68.1 / 2019
  • Turbulence plays a crucial role in controlling star formation because it produces density fluctuations as well as non-thermal pressure against gravity. Therefore, turbulence controls the mode and tempo of star formation. However, despite many previous studies, the properties of turbulence remain poorly understood. As part of the Taeduk Radio Astronomy Observatory (TRAO) Key Science Program (KSP), "mapping Turbulent properties In star-forming MolEcular clouds down to the Sonic scale (TIMES; PI: Jeong-Eun Lee)", we mapped the Orion A and Ophiuchus clouds in three sets of lines (13CO 1-0/C18O 1-0, HCN 1-0/HCO+ 1-0, and CS 2-1/N2H+ 1-0) with a high velocity resolution (~0.1 km/s) using the TRAO 14-m telescope. The mean Trms of the observed maps is less than 0.25 K, and all of these maps show uniform Trms values throughout the observed area. These homogeneous, high signal-to-noise ratio data provide the best chance to probe the nature of turbulence in two different star-forming clouds, Orion A and Ophiuchus. We present comparisons between the line intensities of different molecular tracers as well as the results of a Principal Component Analysis (PCA).

Effect of Land Use on the Water Quality of Watersheds in Nam Han river. (토지이용이 남한강 유역 수질에 미치는 영향)

  • Byeon, Sangdon;Yang, Dongseok;Lim, Kyeongjae;Kim, Jonggun;Hong, Eunmi
    • Proceedings of the Korea Water Resources Association Conference / 2021.06a / pp.164-164 / 2021
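  • In Korea, watershed development such as urbanization and industrialization has recently accelerated, bringing rapid changes to watershed environments. Urbanization increases the impervious surface area, while the expansion of agricultural land increases the use of fertilizers and pesticides; during rainfall, turbid water from soil erosion and runoff from non-point pollution sources into water systems cause problems such as water quality deterioration. Because such changes in the watershed environment directly affect water quality, efficient river basin management requires identifying the runoff characteristics and influencing factors of river basins under future land use change. However, because of Korea's climate, water quality and climate variables vary widely between seasons, which makes river basin management difficult. The Namhan River basin in particular has a high proportion of forest and highland fields, and in summer severe rainfall-driven soil erosion degrades water quality and aquatic ecosystem health. The upper Namhan River basin contains non-point pollution management areas such as Songcheon, Doam Lake, and Goljicheon, and remains a difficult area for river basin management. This study analyzed 17 water quality monitoring stations in the Namhan River basin, dividing the basin into 17 sub-watersheds using GIS. Land use data from the late 2010s were taken from the Environmental Geographic Information Service, and water quality variables judged likely to influence changes in the watershed environment were selected and analyzed using 10 years of long-term water quality data. After the 16 water quality variables were tested for normality, differences in water quality between periods were compared using pairwise t-tests, and correlations between the water quality variables and land use parameters were checked for significance to identify relationships among the variables. Principal component analysis (PCA) was performed to evaluate and interpret the correlations by watershed characteristic. By clarifying the period-dependent relationship between water quality and land use through statistical methods, the results are expected to serve as baseline data for future river basin management.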

Growth Response, Ecological Niche and Overlap between Quercus variabilis and Quercus dentata under Soil Moisture Gradient (토양수분구배에서 굴참나무와 떡갈나무의 생육반응, 생태 지위 및 중복역)

  • Park, Yeo-Bin;Kim, Eui-Joo
    • Journal of the Korean Society of Environmental Restoration Technology / v.26 no.5 / pp.47-56 / 2023
  • Quercus variabilis and Quercus dentata are considered relatively drought tolerant within Quercus, an important genus that represents deciduous broad-leaved forests in Korea. The two species are widely distributed in Korea, Japan and China (northern, central, western and eastern subtropical regions). This study compared the ecological niche breadth and overlap of the two species, based on their growth response across 4 soil moisture gradients, to reveal their degree of competition and ecological niche characteristics. The ecological niche breadth was 0.977±0.020 for Q. variabilis and 0.979±0.014 for Q. dentata, the latter being slightly wider. The two species were similar in 5 traits (stem length, leaf lamina length, leaf width length, stem weight, leaf petiole weight), Q. variabilis was more dominant in 4 traits (leaves number, stem diameter, leaf area, leaf petiole length), and Q. dentata was more dominant in 7 traits (root length, shoot length, plant weight, root weight, shoot weight, leaf weight, leaf petiole weight). The ecological niche overlap for soil moisture between the two species was greatest in plant structure-related traits and smallest in photosynthetic organ-related traits such as petiole length. Principal component analysis indicated that competition between the two species for soil moisture was more severe under low soil moisture than under high soil moisture. Among the measured traits that affect the two-dimensional distribution, 8 traits (leaves number, shoot length, stem length, plant weight, root weight, shoot weight, stem weight, leaves weight) were correlated with factor 1, and 2 traits (leaf width length, leaf petiole weight) were correlated with factor 2 (r>0.5). These results show that the ecological response of the two species to soil moisture involves not just a few traits but several traits simultaneously.

Development of a Storage Level and Capacity Monitoring and Forecasting Techniques in Yongdam Dam Basin Using High Resolution Satellite Image (고해상도 위성자료를 이용한 용담댐 유역 저수위/저수량 모니터링 및 예측 기술 개발)

  • Yoon, Sunkwon;Lee, Seongkyu;Park, Kyungwon;Jang, Sangmin;Rhee, Jinyung
    • Korean Journal of Remote Sensing / v.34 no.6_1 / pp.1041-1053 / 2018
  • In this study, a real-time storage level and capacity monitoring and forecasting system for the Yongdam Dam watershed was developed using high-resolution satellite images. Drought indices such as the Standardized Precipitation Index (SPI) derived from satellite data were used to monitor storage level during drought. To predict storage volume, we used a statistical method based on Principal Component Analysis (PCA) of Singular Spectrum Analysis (SSA). The correlation coefficient between storage level and SPI (3) was high (CC=0.78), and the monitoring and predictability of storage level were assessed using the drought index calculated from satellite data. In the SSA-based principal component analysis, SPI (3) was highly correlated with each of the Reconstructed Components (RCs), with CC=0.87 to 0.99. The RCs were also highly correlated with the Normalized Water Surface Level (N-W.S.L.), with CC=0.83 to 0.97. For the high-resolution satellite images, we developed a water detection algorithm that applies an exponential method to monitor changes in storage level using the Multi-Spectral Instrument (MSI) sensor of the Sentinel-2 satellite. Satellite images from 2016 to 2018 were used for water surface area detection in the Yongdam Dam watershed. Based on this, we propose a possible real-time drought monitoring system using high-resolution water surface area detection from Sentinel-2 satellite images. The results of this study can be applied to estimating reservoir volume from various satellite observations, which can be used for monitoring and estimating hydrological droughts in ungauged areas.
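The Reconstructed Components mentioned in this abstract come from Singular Spectrum Analysis. The sketch below shows the basic SSA steps (embedding, SVD, diagonal averaging) under stated assumptions: the function name `ssa_reconstruct`, the window length, and the synthetic series are illustrative only and are not taken from the paper, which does not describe its settings here.

```python
import numpy as np

def ssa_reconstruct(series, window, n_components):
    """Return the first `n_components` reconstructed components of a 1-D series."""
    n = len(series)
    k = n - window + 1
    # Trajectory (Hankel) matrix: column j holds series[j : j + window].
    traj = np.column_stack([series[j:j + window] for j in range(k)])
    u, s, vt = np.linalg.svd(traj, full_matrices=False)
    rcs = []
    for i in range(n_components):
        # Rank-one piece of the trajectory matrix for component i.
        elem = s[i] * np.outer(u[:, i], vt[i])
        # Diagonal averaging: average all entries with equal row + column index.
        rc = np.array([elem[::-1].diagonal(d).mean()
                       for d in range(-window + 1, k)])
        rcs.append(rc)
    return np.array(rcs)

rng = np.random.default_rng(0)
t = np.arange(60)
# Hypothetical monthly index: seasonal cycle plus noise (not real SPI data).
series = np.sin(2 * np.pi * t / 12) + 0.3 * rng.normal(size=t.size)

rcs = ssa_reconstruct(series, window=12, n_components=12)
# Summing all `window` components recovers the original series.
print(np.allclose(rcs.sum(axis=0), series))
# Correlation of the leading component with the series, analogous in
# spirit to the CC values quoted in the abstract.
print(np.corrcoef(rcs[0], series)[0, 1])
```

Keeping only the leading components gives a smoothed signal; which components to keep, and the window length, are modelling choices the abstract does not specify.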

Transfer Learning using Multiple ConvNet Layers Activation Features with Principal Component Analysis for Image Classification (전이학습 기반 다중 컨볼류션 신경망 레이어의 활성화 특징과 주성분 분석을 이용한 이미지 분류 방법)

  • Byambajav, Batkhuu;Alikhanov, Jumabek;Fang, Yang;Ko, Seunghyun;Jo, Geun Sik
    • Journal of Intelligence and Information Systems / v.24 no.1 / pp.205-225 / 2018
  • Convolutional Neural Networks (ConvNets) are a powerful class of deep neural networks that can analyze and learn hierarchies of visual features. The Neocognitron, an early neural network of this kind, was introduced in the 1980s. At that time, neural networks were not widely used in either industry or academia because of the shortage of large-scale datasets and low computational power. A few decades later, in 2012, Krizhevsky made a breakthrough in the ILSVRC-12 visual recognition competition using a Convolutional Neural Network, which revived interest in neural networks. The success of Convolutional Neural Networks rests on two main factors: the emergence of advanced hardware (GPUs) capable of sufficient parallel computation, and the availability of large-scale datasets such as ImageNet (ILSVRC) for training. Unfortunately, many new domains are bottlenecked by these factors. For most domains, gathering a large-scale dataset to train a ConvNet is difficult and requires a lot of effort. Moreover, even with a large-scale dataset, training a ConvNet from scratch is time-consuming and requires expensive resources. These two obstacles can be addressed with transfer learning, a method for transferring knowledge from a source domain to a new domain. There are two major transfer learning approaches: using the ConvNet as a fixed feature extractor, and fine-tuning the ConvNet on a new dataset. In the first, a ConvNet pre-trained on a dataset such as ImageNet computes feed-forward activations for an image, and activation features are extracted from specific layers. In the second, the ConvNet classifier is replaced and retrained on the new dataset, and the weights of the pre-trained network are then fine-tuned with backpropagation. In this paper, we focus only on using multiple ConvNet layers as a fixed feature extractor. However, directly applying the high-dimensional features extracted from multiple ConvNet layers remains a challenging problem. We observe that features extracted from different ConvNet layers capture different characteristics of the image, which means a better representation could be obtained by finding the optimal combination of multiple ConvNet layers. Based on that observation, we propose to employ multiple ConvNet layer representations for transfer learning instead of a single ConvNet layer representation. Our pipeline has three steps. First, images from the target task are fed forward through a pre-trained AlexNet, and the activation features from its three fully connected layers are extracted. Second, the activation features of the three layers are concatenated to obtain a multi-layer representation that carries more information about the image. Concatenating the three fully connected layer features yields an image representation with 9192 (4096+4096+1000) dimensions. However, features extracted from multiple ConvNet layers are redundant and noisy because they come from the same ConvNet. Thus, in the third step, we use Principal Component Analysis (PCA) to select salient features before the training phase. With salient features, the classifier can classify images more accurately, and the performance of transfer learning improves. To evaluate the proposed method, experiments were conducted on three standard datasets (Caltech-256, VOC07, and SUN397) to compare multiple ConvNet layer representations against single ConvNet layer representations, using PCA for feature selection and dimension reduction. Our experiments demonstrated the importance of feature selection for multiple ConvNet layer representations. Moreover, our proposed approach achieved 75.6% accuracy compared to 73.9% achieved by the FC7 layer on the Caltech-256 dataset, 73.1% compared to 69.2% achieved by the FC8 layer on the VOC07 dataset, and 52.2% compared to 48.7% achieved by the FC7 layer on the SUN397 dataset. We also showed that our proposed approach achieved accuracy improvements of 2.8%, 2.1% and 3.1% on the Caltech-256, VOC07, and SUN397 datasets, respectively, compared to existing work.
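The concatenate-then-reduce pipeline in this abstract can be sketched as follows. This is an illustration only: random arrays stand in for real AlexNet activations, the layer names FC6, FC7 and FC8 are assumed from the standard AlexNet layout (the abstract names only FC7 and FC8), and the target dimension `k` is a hypothetical choice.

```python
import numpy as np

rng = np.random.default_rng(0)
n_images = 16

# Stand-ins for activations of the three fully connected layers
# (random values, not real network features).
fc6 = rng.normal(size=(n_images, 4096))
fc7 = rng.normal(size=(n_images, 4096))
fc8 = rng.normal(size=(n_images, 1000))

# Step 2 of the pipeline: concatenate into one 9192-dimensional vector per image.
combined = np.concatenate([fc6, fc7, fc8], axis=1)
print(combined.shape)  # (16, 9192)

# Step 3: PCA via SVD of the mean-centered feature matrix.
centered = combined - combined.mean(axis=0)
_, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
explained = singular_values**2 / np.sum(singular_values**2)

k = 8  # hypothetical number of principal components to keep
compact = centered @ vt[:k].T
print(compact.shape)  # (16, 8)
print(explained[:k].sum())  # share of variance kept by the first k components
```

The compact features would then be passed to a classifier. On real data the projection should be fitted on the training images only and reused unchanged for the test images.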

Chemical Characterisation of Organic Functional Group Compositions in PM2.5 Collected at Nine Administrative Provinces in Northern Thailand during the Haze Episode in 2013

  • Pongpiachan, Siwatt;Choochuay, Chomsri;Chonchalar, Jittiphan;Kanchai, Panatda;Phonpiboon, Tidarat;Wongsuesat, Sornsawan;Chomkhae, Kanokwan;Kittikoon, Itthipon;Hiranyatrakul, Phoosak;Cao, Junji;Thamrongthanyawong, Sombat
    • Asian Pacific Journal of Cancer Prevention / v.14 no.6 / pp.3653-3661 / 2013
  • Along with rapid economic growth and enhanced agricultural productivity, particulate matter emissions in the northern cities of Thailand have been increasing for the past two decades. This trend is expected to continue in the coming decade. Emissions of particulate matter have brought about a series of public health concerns, particularly chronic respiratory diseases. It is well known that lung cancer incidence among northern Thai women is one of the highest in Asia (an annual age-adjusted incidence rate of 37.4 per 100,000). This fact has aroused serious concern among the public and the government and has drawn much attention and interest from the scientific community. To investigate the potential causes of this relatively high lung cancer incidence, this study employed Fourier transform infrared (FTIR) transmission spectroscopy to identify the chemical composition of the $PM_{2.5}$ collected using Quartz Fibre Filters (QFFs) coupled with MiniVol$^{TM}$ portable air samplers (Airmetrics). $PM_{2.5}$ samples collected in nine administrative provinces in northern Thailand before and after the "Haze Episode" in 2013 were categorised based on three-dimensional plots of a principal component analysis (PCA) with Varimax rotation. In addition, the incremental lifetime exposure to $PM_{2.5}$ of both genders was calculated, and the first derivative of the FTIR spectrum of individual samples is discussed.
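PCA with Varimax rotation, as named in this abstract, can be sketched as below. The abstract does not describe the authors' procedure, so this uses the commonly cited SVD-based varimax iteration on random stand-in data; the function name `varimax`, the sample and variable counts, and the choice of three components (matching the three-dimensional plots) are assumptions.

```python
import numpy as np

def varimax(loadings, max_iter=100, tol=1e-8):
    """Rotate a (variables x components) loading matrix by the varimax criterion."""
    p, k = loadings.shape
    rotation = np.eye(k)
    objective = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # Target matrix of the standard SVD-based varimax update.
        target = rotated**3 - rotated @ np.diag((rotated**2).sum(axis=0)) / p
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt  # orthogonal by construction
        new_objective = s.sum()
        if new_objective <= objective * (1 + tol):
            break
        objective = new_objective
    return loadings @ rotation, rotation

rng = np.random.default_rng(0)
# Hypothetical data: 40 samples x 10 spectral variables (not real FTIR data).
data = rng.normal(size=(40, 10))
centered = data - data.mean(axis=0)

# PCA loadings for the first three components.
_, s, vt = np.linalg.svd(centered, full_matrices=False)
loadings = vt[:3].T * (s[:3] / np.sqrt(len(data) - 1))

rotated, rotation = varimax(loadings)
print(rotated.shape)  # (10, 3)
```

Because the rotation is orthogonal, each variable's communality (the sum of its squared loadings) is unchanged; only the way variance is shared among the three axes changes.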

Marine Environments in the Neighborhood of the Narodo as the First Outbreak Region of Cochlodinium polykrikoides Blooms (Cochlodinium polykrikoides 적조의 최초발생해역인 나로도 주변 해역의 해양환경)

  • Lee, Moon-Ock;Moon, Jin-Han
    • Journal of the Korean Society for Marine Environment & Energy / v.11 no.3 / pp.113-123 / 2008
  • We analyzed long-term data on marine environments, red tides and meteorology acquired by NFRDI and KMA in order to understand the characteristics of the marine environment in the Narodo coastal waters, which are known to be the first outbreak region of Cochlodinium polykrikoides blooms. During the period from 1992 to 2007, Cochlodinium polykrikoides blooms first occurred most often in August. However, the outbreak time of the blooms tended to become earlier each year, and the surface salinity also tended to increase. This suggests that there might be a relationship between the shift in the outbreak time of the blooms and salinity. On the other hand, insolation was relatively high and precipitation relatively low in Gohung Province, compared to Yeosu or Tongyeong, when Cochlodinium polykrikoides blooms first occurred in the Narodo coastal waters. Average water temperature and salinity in August in the Narodo coastal waters were both higher than those in Gamak and Jinhae bays, suggesting that the Narodo coastal waters are a region of relatively high water temperature and high salinity. Also, concentrations of nutrients and chlorophyll-a were significantly lower than those in Jinhae Bay, which is known to be a eutrophic region, while the overall water quality seemed to be similar to that of Gamak Bay. The results of Principal Component Analysis (PCA) showed that insolation and water temperature are the most important factors for the outbreak of Cochlodinium polykrikoides blooms in the Narodo coastal waters, while concentrations of COD and dissolved oxygen are of secondary importance. Furthermore, typhoons also appeared to be one of the most important factors for the outbreak of Cochlodinium polykrikoides blooms.

Quality Comparison of M. longissimus from Crossbred Wild Boars, Korean Native Black Pigs and Modern Genotype Pigs during Refrigerated Storage (멧돼지 교잡종육, 재래 흑돼지육, 개량종 돼지육의 냉장저장중 품질비교)

  • Kang, S.M.;Lee, S.K.
    • Journal of Animal Science and Technology / v.49 no.2 / pp.257-268 / 2007
  • This study compared the quality of M. longissimus from 4 crossbred wild boars (wild boar ♂×Duroc ♀, 113kg, 1 barrow and 3 gilts, CWB) reared outdoors, 5 Korean native black pigs (64kg, 5 barrows, KNP) and 5 modern genotype pigs (Landrace×Yorkshire×Duroc, 114kg, 5 barrows, MGP) reared indoors. The samples were stored at 2±0.2℃ for 12 days and used for the quality measurements. The moisture content was significantly higher in CWB than in KNP (p<0.05), whereas the crude fat content was significantly lower in CWB than in KNP (p<0.05). The pH value of CWB was significantly lower than that of MGP during 12 days of storage (p<0.05). Therefore the CWB showed significantly lower water-holding capacity than MGP (p<0.05). The L*, a*, b* and C* values of CWB were significantly lower than those of KNP during 12 days of storage (p<0.05), but significantly higher than those of MGP after 3 and 6 days of storage (p<0.05). In fatty acid composition, the CWB had higher unsaturated fatty acid content, including linoleic acid and arachidonic acid, and lower saturated fatty acid content. However, the lipid oxidation of CWB was delayed during storage compared with KNP and MGP. The aroma patterns obtained by principal component analysis (PCA) of electronic nose data were clearly distinguishable among the 3 types of pork at days 0 and 12 of storage.

Limnological Characteristics of the River-type Paltang Reservoir, Korea: Hydrological and Environmental Factors (하천형 저수지 팔당호의 육수학적 특성:수문과 수환경 요인)

  • Shin, Jae-Ki;Kang, Chang-Keun;Kim, Ho-Sub;Hwang, Soon-Jin
    • Korean Journal of Ecology and Environment / v.36 no.3 s.104 / pp.242-256 / 2003
  • This study aimed to determine the relationship between rainfall-discharge patterns and major aquatic environmental factors in a river-type reservoir. Specifically, daily monitoring was conducted in Paltang Reservoir from January 1999 to December 2001. Observation of the daily changes in the environmental factors showed that natural meteorological and hydrological factors causing changes in water discharge had a major effect on the aquatic environment. Rainfall was the main source of hydrological changes, with its frequency a possible direct variable governing the range of discharge changes. Rainfall was weak from November to May and heavy from June to October (heaviest in summer). The range of water discharge was greatest during summer (July to September) and lowest during winter (January to February). A principal component analysis (PCA) showed that aquatic environmental factors could be classified into three types according to their pattern of annual variation. Type I included water temperature, turbidity, water color and organic matter (COD), which increased with increasing water discharge. Type II consisted of DO and pH, which decreased with increasing water discharge. Type III included conductivity, alkalinity and chloride ion, which showed intermediate values with increasing water discharge. The monthly variation of the aquatic environment explained by the first two dimensions of the PCA suggests that Paltang Reservoir may have an annual cycle typical of river-type reservoirs, depending on hydrological factors such as water discharge.