• Title/Summary/Keyword: Matrix Factorization

Abbreviation Disambiguation using Topic Modeling (토픽모델링을 이용한 약어 중의성 해소)

  • Woon-Kyo Lee;Ja-Hee Kim;Junki Yang
    • Journal of the Korea Society for Simulation / v.32 no.1 / pp.35-44 / 2023
  • Recently, many studies have used text analysis to examine trends, including research trends. When documents for such an analysis are collected by searching for an abbreviation, the abbreviation must be disambiguated. In many studies, researchers classify the documents by hand, reading them one by one to find the data the study needs. Most existing work on abbreviation disambiguation aims to clarify word meaning and relies on supervised learning. Those methods are poorly suited to classifying documents retrieved by an abbreviation search in order to find research data, and related studies are scarce. This paper proposes a method that semi-automatically classifies documents collected by abbreviation, applying topic modeling with Non-Negative Matrix Factorization (NMF), an unsupervised learning method, in the data pre-processing step. To verify the proposed method, papers were collected from an academic database using the abbreviation 'MSA'. The method identified 316 papers related to Micro Services Architecture among 1,401 papers, with a document classification accuracy of 92.36%. The proposed method is expected to reduce the time and cost that researchers spend on manual classification.
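
The abstract above does not say how the NMF step was implemented. As a minimal sketch, assuming the standard multiplicative-update algorithm for the squared-error objective, a document-term matrix can be factored into document-topic weights `w` and topic-term weights `h`, after which each document is assigned to its highest-weighted topic. The toy counts, the four-term vocabulary, and all function names are hypothetical and are not taken from the paper.

```python
import random

def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def nmf(v, k, iters=500, seed=0, eps=1e-9):
    """Factor non-negative v (docs x terms) into w (docs x k) and h (k x terms)
    with multiplicative updates that reduce the squared reconstruction error."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    # Strictly positive random start so the updates never divide by zero.
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # h <- h * (w^T v) / (w^T w h), element-wise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)] for i in range(k)]
        # w <- w * (v h^T) / (w h h^T), element-wise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)] for i in range(n)]
    return w, h

def dominant_topic(w):
    """Assign each document to the topic with the largest weight in its row of w."""
    return [max(range(len(row)), key=row.__getitem__) for row in w]

# Toy document-term counts (hypothetical). Columns stand for the terms
# ["microservice", "container", "measurement", "gauge"].
docs = [[3, 2, 0, 0], [2, 3, 0, 0], [0, 0, 3, 1], [0, 0, 2, 2]]
w, h = nmf(docs, k=2)
labels = dominant_topic(w)
```

In this toy setting the first two documents share one label and the last two share the other. Which topic corresponds to the sense of interest still has to be decided by a person, which is presumably why the paper describes its method as semi-automatic.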

A Study on the Derivation of Port Safety Risk Factors Using by Topic Modeling (토픽모델링을 활용한 항만안전 위험요인 도출에 관한 연구)

  • Lee Jeong-Min;Kim Yul-Seong
    • Journal of Korea Port Economic Association / v.39 no.2 / pp.59-76 / 2023
  • This study examines port safety from multiple perspectives, using news data that is readily accessible to the general public and domestic academic journal data that reflects the insights of port researchers. Non-negative Matrix Factorization (NMF) based topic modeling was conducted in Python to derive the main topics of each data set, followed by a semantic analysis of each topic. Among port safety risk factors, the news data mainly yielded natural and environmental factors, while the academic journal data yielded security, mechanical, human, environmental, and natural factors. These results point to the need for strategies to strengthen the safety of domestic ports, such as strengthening the resilience of port safety, improving safety awareness to broaden the public's view of port safety, and conducting research to develop the port industry environment into a safe, specialized, and mature port. As a result, this study identified the main factors to be improved and provides basic data for developing mature ports with a port safety culture.
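
Interpreting an NMF topic usually starts from its highest-weighted terms. The helper below is a small sketch of that step, assuming a topic-term weight matrix is already available. The vocabulary and weights are made-up illustrations, not results from the study, and `top_terms` is a hypothetical name.

```python
def top_terms(topic_term, vocab, n=2):
    """Return the n highest-weighted vocabulary terms for each topic row."""
    result = []
    for weights in topic_term:
        # Pair each weight with its term and sort by weight, largest first.
        ranked = sorted(zip(weights, vocab), key=lambda pair: pair[0], reverse=True)
        result.append([term for _, term in ranked[:n]])
    return result

# Hypothetical vocabulary and topic-term weights (rows are topics).
vocab = ["typhoon", "flood", "crane", "operator", "cyber"]
topic_term = [
    [0.9, 0.7, 0.0, 0.1, 0.0],
    [0.0, 0.1, 0.8, 0.6, 0.2],
]
topics = top_terms(topic_term, vocab)
```

A reader would then label each term list by hand, for example as a natural factor or a mechanical factor.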

How to improve the accuracy of recommendation systems: Combining ratings and review texts sentiment scores (평점과 리뷰 텍스트 감성분석을 결합한 추천시스템 향상 방안 연구)

  • Hyun, Jiyeon;Ryu, Sangyi;Lee, Sang-Yong Tom
    • Journal of Intelligence and Information Systems / v.25 no.1 / pp.219-239 / 2019
  • As providing customized services to individuals becomes more important, research on personalized recommendation systems continues to be carried out. Collaborative filtering is one of the most popular approaches in academia and industry. However, its recommendations have mostly been based on quantitative information such as users' ratings, which limits accuracy. To address this problem, many studies have tried to improve recommendation performance by using information beyond the quantitative ratings. A good example is sentiment analysis of customer review text. Nevertheless, existing research has not directly combined sentiment analysis results with quantitative rating scores in the recommendation system. This study therefore aims to reflect the sentiments expressed in reviews in the rating scores. In other words, we propose a new algorithm that converts a user's own review into quantitative information and feeds it directly into the recommendation system. Doing so requires quantifying users' reviews, which are originally qualitative information. In this study, sentiment scores were calculated with a text-mining sentiment analysis technique. The data consisted of movie reviews, from which a domain-specific sentiment dictionary was constructed. Regression analysis was used to construct the dictionary: positive and negative dictionaries were each built using Lasso regression, Ridge regression, and ElasticNet. The accuracy of each dictionary was verified with a confusion matrix. The Lasso-based dictionary reached 70% accuracy, the Ridge-based dictionary 79%, and the ElasticNet ($\alpha = 0.3$) dictionary 83%.
Therefore, in this study, the sentiment score of each review is calculated from the ElasticNet-based dictionary and combined with the rating to create a new rating. We show that collaborative filtering that reflects the sentiment scores of user reviews is superior to the traditional method that considers only the existing rating. To demonstrate this, the proposed algorithm was applied to memory-based user-based collaborative filtering (UBCF), item-based collaborative filtering (IBCF), and the model-based matrix factorization methods SVD and SVD++. For each algorithm, the mean absolute error (MAE) and the root mean square error (RMSE) were calculated to compare a recommendation system that uses ratings combined with sentiment scores against one that uses ratings alone. In terms of MAE, performance improved by 0.059 for UBCF, 0.0862 for IBCF, 0.1012 for SVD, and 0.188 for SVD++. In terms of RMSE, the corresponding figures were 0.0431 for UBCF, 0.0882 for IBCF, 0.1103 for SVD, and 0.1756 for SVD++. These results indicate that predictions based on the sentiment-adjusted rating proposed in this paper are more accurate than those of conventional collaborative filtering that considers only the quantitative score. A paired t-test was then performed to check whether the proposed model is the better approach, and the results supported that conclusion. To overcome the limitation of previous research that judges a user's sentiment only by the quantitative rating score, this study converted the review into a numerical score so that the user's opinion is reflected in the recommendation system in a more refined way, improving accuracy.
The findings are expected to have managerial implications for recommendation system developers, who need to consider both quantitative and qualitative information. Developers may be able to directly adopt the way the combined system is constructed in this paper.
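
The two evaluation metrics named above have standard definitions, sketched here in plain Python. The abstract does not state how the sentiment score and the rating were combined, so the `blend` function is only an assumed weighted average for illustration, and the sample ratings are invented.

```python
import math

def blend(rating, sentiment, weight=0.5):
    """Hypothetical combination rule: a weighted average of a rating and a
    sentiment score that has already been mapped onto the rating scale."""
    return (1 - weight) * rating + weight * sentiment

def mae(actual, predicted):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

# Invented example: true ratings versus a model's predictions.
actual = [4.0, 3.0, 5.0, 2.0]
predicted = [3.5, 3.0, 4.0, 3.0]
print(mae(actual, predicted))   # 0.625
print(rmse(actual, predicted))  # 0.75
```

RMSE penalizes large individual errors more heavily than MAE, which is why the two metrics are often reported together.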

Chemical Characteristics of PM1 using Aerosol Mass Spectrometer at Baengnyeong Island and Seoul Metropolitan Area (백령도 및 서울 대기오염집중측정소 에어로졸 질량 분석기 자료를 이용한 대기 중 에어로졸 화학적 특성 연구)

  • Park, Taehyun;Ban, Jihee;Kang, Seokwon;Ghim, Young Sung;Shin, Hye-Jung;Park, Jong Sung;Park, Seung Myung;Moon, Kwang Joo;Lim, Yong-Jae;Lee, Min-Do;Lee, Sang-Bo;Kim, Jeongsoo;Kim, Soon Tae;Bae, Chang Han;Lee, Yonghwan;Lee, Taehyoung
    • Journal of Korean Society for Atmospheric Environment / v.34 no.3 / pp.430-446 / 2018
  • To improve understanding of the sources and chemical properties of particulate pollutants on the Korean Peninsula, an Aerodyne High Resolution Time of Flight Aerosol Mass Spectrometer (HR-ToF-AMS) measured non-refractory fine particles (NR-PM$_1$) from 2013 to 2015 at Baengnyeong Island and in the Seoul metropolitan area (SMA), Korea. The chemical composition of NR-PM$_1$ at Baengnyeong Island was dominated by organics and sulfate, in the range of 36 to 38% over the 3 years, while organics were the dominant species in the Seoul metropolitan area, accounting for 44 to 55% of NR-PM$_1$. More than 85% of the sulfate was found to be of anthropogenic origin at both Baengnyeong and SMA. The gas-to-particle partitioning ratios of sulfate and nitrate were above 0.6 and 0.8, respectively, in both areas, indicating potential for the formation of additional particulate sulfate and nitrate. Positive matrix factorization (PMF) analysis resolved the high-resolution spectra of organic aerosol (OA) into three factors: Primary OA (POA), Semi-Volatile Oxygenated Organic Aerosol (SV-OOA), and Low-Volatility OOA (LV-OOA). The fraction of oxygenated OA (SOA $\approx$ OOA = SV-OOA + LV-OOA) was larger than the fraction of POA in NR-PM$_1$. The POA fraction of OA is higher in Seoul than at Baengnyeong Island because Seoul has a relatively large number of primary pollution sources, such as gasoline and diesel vehicles, factories, and energy facilities. Potential source contribution function (PSCF) analysis revealed that transport from eastern China, an industrial area with high emissions, was associated with high particulate sulfate and organic concentrations at the Baengnyeong and SMA sites. PSCF also indicated that ship emissions on the Yellow Sea were associated with high particulate sulfate concentrations at the measurement sites.
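
For readers unfamiliar with PMF, its usual formulation can be written as follows. The notation is the conventional one and is an assumption here, since the abstract defines no symbols: $x_{ij}$ is the measured signal of spectral variable $j$ in sample $i$, $\sigma_{ij}$ is its measurement uncertainty, $g_{ik}$ is the contribution of factor $k$ to sample $i$, and $f_{kj}$ is the profile of factor $k$. In this study the number of factors would be $p = 3$ (POA, SV-OOA, LV-OOA).

$$
x_{ij} = \sum_{k=1}^{p} g_{ik} f_{kj} + e_{ij}, \qquad
Q = \sum_{i=1}^{n} \sum_{j=1}^{m} \left( \frac{e_{ij}}{\sigma_{ij}} \right)^{2}, \qquad
g_{ik} \ge 0, \; f_{kj} \ge 0
$$

PMF seeks the non-negative $g_{ik}$ and $f_{kj}$ that minimize $Q$. The weighting by $\sigma_{ij}$ is what distinguishes it from a plain least-squares non-negative factorization, because noisier measurements contribute less to the fit.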

Research on hybrid music recommendation system using metadata of music tracks and playlists (음악과 플레이리스트의 메타데이터를 활용한 하이브리드 음악 추천 시스템에 관한 연구)

  • Hyun Tae Lee;Gyoo Gun Lim
    • Journal of Intelligence and Information Systems / v.29 no.3 / pp.145-165 / 2023
  • Recommendation systems play a significant role in easing the difficulty of selecting information from the rapidly growing volume produced by the development of the Internet, and in efficiently displaying information that fits each person's interests. In particular, as the number of products and contents increases rapidly, e-commerce and OTT companies cannot overcome the long-tail phenomenon, in which only popular products are consumed, without the help of recommendation systems. Research on recommendation systems is therefore being actively conducted to overcome this phenomenon and to provide information or contents aligned with users' individual interests, so that customers consume a wider variety of products or contents. Collaborative filtering, which uses users' historical behavioral data, usually performs better than content-based filtering, which uses users' preferred contents. However, collaborative filtering can suffer from the cold-start problem, which occurs when users' historical behavioral data is lacking. This paper proposes a hybrid music recommendation system that can address the cold-start problem, based on the playlist data of the Melon music streaming service provided by Kakao Arena for a music playlist continuation competition. The goal of this research is to use the music tracks included in the playlists, together with the metadata of tracks and playlists, to predict the remaining tracks when half or all of the tracks are masked. Two different recommendation procedures were therefore used, one for each situation. When music tracks are included in the playlist, LightFM is used to exploit the track list of the playlist and the metadata of each track.
Then, the result of the Item2Vec model, which uses vector embeddings of music tracks, tags, and titles for recommendation, is combined with the result of the LightFM model to create the final recommendation list. When no music tracks are available in a playlist and only its tags and title are available, recommendations are made by finding similar playlists based on playlist vectors, which are built by aggregating the FastText pre-trained embedding vectors of each playlist's tags and title. As a result, the proposed system not only resolves the cold-start problem but also achieves better performance than ALS, BPR, and Item2Vec by using the metadata of both music tracks and playlists. In addition, the LightFM model that uses only artist information as an item feature showed the best performance among the LightFM models using other item features of music tracks.
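
The cold-start branch described above can be sketched as follows, assuming the aggregation is a simple mean of token embeddings and that similarity is measured by cosine. The abstract specifies neither choice. The two-dimensional vectors below are toy stand-ins rather than real FastText embeddings, and every name in the block is hypothetical.

```python
import math

def playlist_vector(tokens, embeddings):
    """Average the embeddings of the known tokens (tags and title words)."""
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        return None  # nothing to aggregate for this playlist
    return [sum(col) / len(known) for col in zip(*known)]

def cosine(a, b):
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def most_similar(query_tokens, playlists, embeddings):
    """Return the name of the stored playlist closest to the query tokens."""
    query = playlist_vector(query_tokens, embeddings)
    best_name, best_score = None, -1.0
    for name, tokens in playlists.items():
        vec = playlist_vector(tokens, embeddings)
        if query is None or vec is None:
            continue
        score = cosine(query, vec)
        if score > best_score:
            best_name, best_score = name, score
    return best_name

# Toy embeddings and playlists (hypothetical values).
embeddings = {"rock": [1.0, 0.0], "metal": [0.9, 0.1],
              "jazz": [0.0, 1.0], "piano": [0.1, 0.9]}
playlists = {"headbang": ["rock", "metal"], "late night": ["jazz", "piano"]}
match = most_similar(["metal"], playlists, embeddings)
```

Tracks from the matched playlists could then be proposed for the new playlist, even though it has no listening history of its own.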