• Title/Summary/Keyword: model ensemble (모델 앙상블)

Research on hybrid music recommendation system using metadata of music tracks and playlists (음악과 플레이리스트의 메타데이터를 활용한 하이브리드 음악 추천 시스템에 관한 연구)

  • Hyun Tae Lee;Gyoo Gun Lim
    • Journal of Intelligence and Information Systems / v.29 no.3 / pp.145-165 / 2023
  • Recommendation systems play a significant role in easing the difficulty of selecting information from the rapidly growing volume produced by the development of the Internet, and in efficiently presenting information that fits each individual's interests. In particular, as the number of products and contents grows rapidly, e-commerce and OTT companies cannot overcome the long-tail phenomenon, in which only popular products are consumed, without the help of a recommendation system. Research on recommendation systems is therefore being actively conducted to overcome this phenomenon and to provide information or content aligned with users' individual interests, so that customers are led to consume a wider variety of products or content. Collaborative filtering, which uses users' historical behavioral data, usually performs better than content-based filtering, which uses the content users prefer. However, collaborative filtering can suffer from the cold-start problem, which occurs when users' historical behavioral data is lacking. This paper proposes a hybrid music recommendation system that can solve the cold-start problem, based on the playlist data of the Melon music streaming service provided by Kakao Arena for its music playlist continuation competition. The goal is to use the music tracks included in the playlists, together with the metadata of tracks and playlists, to predict the other tracks when half or all of the tracks are masked. Two different recommendation procedures were therefore used, one for each situation. When music tracks are included in the playlist, LightFM is used to exploit the playlist's track list and the metadata of each track. The result of an Item2Vec model, which uses vector embeddings of music tracks, tags, and titles for recommendation, is then combined with the result of the LightFM model to create the final recommendation list. When no music tracks are available in a playlist and only its tags and title are available, recommendations are made by finding similar playlists based on playlist vectors built by aggregating the FastText pre-trained embedding vectors of each playlist's tags and title. As a result, the proposed system not only resolves the cold-start problem but also outperforms ALS, BPR, and Item2Vec by using the metadata of both music tracks and playlists. In addition, the LightFM model that uses only artist information as an item feature was found to perform best among the LightFM models using different item features of music tracks.
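
The cold-start branch described in this abstract (aggregate tag and title embeddings into a playlist vector, then look up similar playlists) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the three-dimensional `TOY_EMBEDDINGS` table stands in for pre-trained FastText vectors, averaging is assumed as the aggregation step, cosine similarity is assumed as the similarity measure, and all names are hypothetical.

```python
import math

# Hypothetical 3-dimensional vectors standing in for pre-trained FastText embeddings.
TOY_EMBEDDINGS = {
    "rock": [1.0, 0.0, 0.0],
    "guitar": [0.8, 0.2, 0.0],
    "jazz": [0.0, 1.0, 0.0],
    "piano": [0.0, 0.8, 0.2],
    "sleep": [0.0, 0.0, 1.0],
}

# Hypothetical playlists described only by tag/title tokens (no tracks).
PLAYLISTS = {
    "rock_hits": ["rock", "guitar"],
    "late_jazz": ["jazz", "piano"],
    "bedtime": ["sleep", "piano"],
}


def playlist_vector(tokens, embeddings):
    """Average the vectors of all known tokens; return None if none are known."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_similar_playlists(query_tokens, playlists, embeddings, k=1):
    """Return the names of the k playlists whose vectors are closest to the query."""
    query = playlist_vector(query_tokens, embeddings)
    if query is None:
        return []
    scored = []
    for name, tokens in playlists.items():
        vector = playlist_vector(tokens, embeddings)
        if vector is not None:
            scored.append((cosine(query, vector), name))
    scored.sort(reverse=True)
    return [name for _, name in scored[:k]]
```

In a full system the tracks of the retrieved playlists would then be ranked as recommendations; how the paper does that ranking is not stated in the abstract.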

A Korean Community-based Question Answering System Using Multiple Machine Learning Methods (다중 기계학습 방법을 이용한 한국어 커뮤니티 기반 질의-응답 시스템)

  • Kwon, Sunjae;Kim, Juae;Kang, Sangwoo;Seo, Jungyun
    • Journal of KIISE / v.43 no.10 / pp.1085-1093 / 2016
  • A community-based question answering system provides answers to questions from documents uploaded to web communities. To improve question analysis, earlier methods developed specific rules suited to a target domain or applied machine learning to parts of the process. However, these methods incur excessive cost when expanding to new fields, or lead to systems overfitted to a specific field. This paper proposes a multiple machine learning method that automates the overall process by applying an appropriate machine learning technique at each step, for efficient processing in a community-based question answering system. The system is divided into a question analysis part and an answer selection part. The question analysis part consists of a question focus extractor, which uses conditional random fields to identify the focused phrases in questions, and a question type classifier, which uses a support vector machine to classify the topics of questions. In the answer selection part, we train the weights used by the similarity estimation models with an artificial neural network. In addition, there are many cases in which the results of morphological analysis are unreliable for data uploaded to web communities. We therefore propose a method that minimizes the impact of morphological analysis by using character features in the question analysis stage. The proposed system outperforms the previous system, achieving a Mean Average Precision of 0.765 and an R-Precision of 0.872.
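
For readers unfamiliar with the two reported metrics, the sketch below shows the standard textbook definitions of Mean Average Precision and R-Precision over ranked answer lists. It assumes binary relevance judgments; the paper's exact evaluation protocol is not given in the abstract, and the function names are illustrative.

```python
def average_precision(ranked, relevant):
    """Mean of precision@rank taken at each rank where a relevant item appears."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def r_precision(ranked, relevant):
    """Precision within the top R results, where R is the number of relevant items."""
    relevant = set(relevant)
    r = len(relevant)
    if r == 0:
        return 0.0
    return sum(1 for item in ranked[:r] if item in relevant) / r


def mean_average_precision(runs):
    """Average the per-question average precision over (ranked, relevant) pairs."""
    return sum(average_precision(ranked, rel) for ranked, rel in runs) / len(runs)
```

For a ranking `["a", "b", "c", "d"]` with relevant answers `{"a", "c"}`, average precision is (1/1 + 2/3) / 2 and R-Precision is 1/2, since only one of the top two results is relevant.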

Face Recognition based on Hybrid Classifiers with Virtual Samples (가상 데이터와 융합 분류기에 기반한 얼굴인식)

  • 류연식;오세영
    • Journal of the Institute of Electronics Engineers of Korea CI / v.40 no.1 / pp.19-29 / 2003
  • This paper presents a novel hybrid classifier for face recognition with artificially generated virtual training samples. We use both a nearest neighbor approach in feature angle space and a connectionist model, obtaining a synergy effect by combining the results of two heterogeneous classifiers. First, a classifier called the nearest feature angle (NFA), based on angular information, finds the feature most similar to the query in a given training set. Second, a classifier was developed based on recall of the stored frontal projection of the query feature. It uses a frontal recall network (FRN) that finds the most similar frontal feature in the stored frontal feature set. For the FRN, we used an ensemble neural network consisting of multiple multilayer perceptrons (MLPs), each trained independently to enhance generalization capability. Further, both classifiers used a virtual training set generated adaptively according to the spatial distribution of each person's training samples. Finally, the results of the two classifiers are combined to determine the best matching class, and a corresponding similarity measure is used to make the final decision. The proposed classifier achieved an average classification rate of 96.33% across a large group of different test sets of images; its average error rate is 61.5% of that of the nearest feature line (NFL) method, indicating more robust classification performance.
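
The relative error figure can be unpacked with a short calculation. It assumes that the error rate is simply the complement of the classification rate and that both averages cover the same test sets; the NFL error rate below is implied by the two reported numbers and is not itself stated in the abstract.

```latex
\begin{align*}
e_{\text{proposed}} &= 100\% - 96.33\% = 3.67\% \\
e_{\text{NFL}} &\approx \frac{3.67\%}{0.615} \approx 5.97\%
\end{align*}
```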

Assessment of ECMWF's seasonal weather forecasting skill and its applicability across South Korean catchments (ECMWF 계절 기상 전망 기술의 정확성 및 국내 유역단위 적용성 평가)

  • Lee, Yong Shin;Kang, Shin Uk
    • Journal of Korea Water Resources Association / v.56 no.9 / pp.529-541 / 2023
  • Due to growing concern over forecasting extreme weather events such as droughts caused by climate change, there has been rising interest in seasonal meteorological forecasts that offer ensemble predictions for the upcoming seven months. Nonetheless, limited research has been conducted in South Korea, particularly on their effectiveness at the catchment scale. In this study, we assessed the accuracy of ECMWF's seasonal forecasts (precipitation, temperature, and evapotranspiration) for the period 2011 to 2020. We focused on 12 multi-purpose reservoir catchments and compared the forecasts with climatology data. The Continuous Ranked Probability Skill Score was adopted to assess forecast skill, and the linear scaling method was applied to evaluate its impact. The results showed that the seasonal meteorological forecasts have skill similar to climatology one month ahead, but the skill decreased significantly as the forecast lead time increased. Relative to climatology, better results were obtained in the wet season than in the dry season. In particular, during the wet seasons of the dry years (2015 and 2017), the seasonal meteorological forecasts showed the highest skill at all lead times.
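
The skill measure named in this abstract can be illustrated with a small sketch. It uses the common empirical-ensemble form of the CRPS (mean absolute error of the members minus half the mean absolute difference between member pairs) and defines the skill score against a reference such as climatology. The multiplicative form of linear scaling shown here is one common variant, typically used for precipitation; which variant the study applied is not stated, and all function names are illustrative.

```python
def crps_ensemble(members, observation):
    """Empirical CRPS of an ensemble forecast for one observed value (lower is better)."""
    m = len(members)
    # Average distance between each member and the observation.
    accuracy = sum(abs(x - observation) for x in members) / m
    # Half the average distance between all ordered pairs of members.
    spread = sum(abs(xi - xj) for xi in members for xj in members) / (2 * m * m)
    return accuracy - spread


def crpss(crps_forecast, crps_reference):
    """Skill score: 1 is perfect, 0 matches the reference, negative is worse than it."""
    return 1.0 - crps_forecast / crps_reference


def linear_scaling(members, observed_mean, forecast_mean):
    """Multiplicative bias correction: rescale members by the ratio of long-term means."""
    factor = observed_mean / forecast_mean
    return [x * factor for x in members]
```

A CRPSS near zero corresponds to the finding that the forecasts have skill similar to climatology, while positive values indicate that the ensemble forecast beats the climatological reference.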

Assessing habitat suitability for timber species in South Korea under SSP scenarios (SSP 시나리오에 따른 국내 용재수종의 서식지 적합도 평가)

  • Hyeon-Gwan Ahn;Chul-Hee Lim
    • Korean Journal of Environmental Biology / v.40 no.4 / pp.567-578 / 2022
  • Various social and environmental problems have recently emerged due to global climate change. In South Korea, coniferous forests in the highlands are decreasing due to climate change, whereas the distribution of subtropical species is gradually increasing. This study aims to respond to changes in the distribution of forest species in South Korea caused by climate change. It predicts changes in future suitable areas for Pinus koraiensis, Cryptomeria japonica, and Chamaecyparis obtusa, which are cultivated as timber species, based on climate, topography, and environment. Occurrence coordinates were collected only for natural forests in the National Forest Inventory, in consideration of climate suitability. Future climate data were taken from the SSP scenarios provided by KMA. Species distribution models were ensembled to predict suitable habitat areas for the base period (2000-2019), the near future (2041-2060), and the distant future (2081-2100). In the base period, highly suitable habitat for Pinus koraiensis accounted for approximately 13.87% of the country; in the distant future (2081-2100), it decreased to approximately 0.11% under SSP5-8.5. For Cryptomeria japonica, the habitat in the base period was approximately 7.08%, increasing to approximately 18.21% under SSP5-8.5 in the distant future. For Chamaecyparis obtusa, the habitat in the base period was approximately 19.32%, increasing to approximately 90.93% under SSP5-8.5 in the distant future. Pinus koraiensis, which has been planted nationwide, gradually moved north due to climate change, and its suitable habitat in South Korea decreased significantly. From the near future onward, Pinus koraiensis was no longer suitable for afforestation as a timber species in South Korea. Chamaecyparis obtusa could replace it in most areas, and Cryptomeria japonica was assessed as able to replace it in part of the southern and central regions.

Data-Driven Technology Portfolio Analysis for Commercialization of Public R&D Outcomes: Case Study of Big Data and Artificial Intelligence Fields (공공연구성과 실용화를 위한 데이터 기반의 기술 포트폴리오 분석: 빅데이터 및 인공지능 분야를 중심으로)

  • Eunji Jeon;Chae Won Lee;Jea-Tek Ryu
    • The Journal of Bigdata / v.6 no.2 / pp.71-84 / 2021
  • Because small and medium-sized enterprises have fallen short of securing technological competitiveness in big data and artificial intelligence (AI), the core technologies of the Fourth Industrial Revolution, it is important to strengthen the competitiveness of the overall industry through technology commercialization. In this study, we propose priorities for technology transfer and commercialization to put public research results to practical use. We used public research performance information, filling in missing values of the 6T classification with a deep learning model using an ensemble method. We then conducted topic modeling to derive the converging fields of big data and AI. We classified the technology fields into four segments of a technology portfolio based on technology activity and technology efficiency, and estimated the commercialization potential of those fields. We proposed commercialization priorities for 10 detailed technology fields that require long-term investment. Such systematic analysis can promote active utilization of technology and efficient technology transfer and commercialization.

Ensemble Projection of Climate Suitability for Alfalfa (Medicago sativa L.) in Hamkyongbukdo (함경북도 내 미래 알팔파 재배의 기후적합도 앙상블 전망)

  • Hyun Seung Min;Hyun Shinwoo;Kim Kwang Soo
    • Journal of The Korean Society of Grassland and Forage Science / v.44 no.2 / pp.71-82 / 2024
  • Growing legume forage crops would be advantageous for increasing the productivity and sustainability of sloped croplands in Hamkyongbukdo. In particular, identifying potential cultivation areas for alfalfa in the region could aid future decision-making on policies and management related to forage crop production. This study aimed to analyze the climate suitability of alfalfa in Hamkyongbukdo under current and future climate conditions using the Fuzzy Union model. The climate suitability predicted by the Fuzzy Union model was compared with the actual alfalfa cultivation area in the northern United States. Climate data from 11 global climate models were used as inputs for calculating climate suitability in the study region, in order to examine the uncertainty of projections under future climate conditions. The area where the climate suitability index exceeded a threshold value (22.6) explained about 44% of the variation in actual alfalfa cultivation area by state in the northern United States. The climate suitability of alfalfa was projected to decrease in most areas of Hamkyongbukdo under future climate scenarios. The climate suitability in Onseong and Gyeongwon Counties was over 88 under current climate conditions, but it was projected to decrease by about 66% in those areas by the 2090s. Our study illustrated that the impact of climate change on suitable cultivation areas was highly variable when different climate data were used as inputs to the Fuzzy Union model. Still, the ensemble of climate suitability projections for alfalfa decreased considerably due to summer depression in Hamkyongbukdo. Predicting suitable cultivation areas with soil conditions added, or predicting the climate suitability of other leguminous crops such as hairy vetch, merits further study.
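
One simple way to summarize projections driven by several climate models is to report the ensemble mean together with the spread across models. The sketch below is an illustration only: it does not implement the Fuzzy Union model, the per-model index values would come from that model, the model names are hypothetical, and reusing the 22.6 threshold from the abstract as a default is an assumption.

```python
from statistics import mean, pstdev


def ensemble_summary(index_by_model, threshold=22.6):
    """Summarize one location's suitability index across climate models.

    index_by_model maps a (hypothetical) model name to the suitability index
    computed from that model's climate data.
    """
    values = list(index_by_model.values())
    return {
        "mean": mean(values),      # ensemble projection
        "spread": pstdev(values),  # disagreement between models
        "min": min(values),
        "max": max(values),
        # Share of models in which the location still exceeds the threshold.
        "fraction_above": sum(v > threshold for v in values) / len(values),
    }
```

A large spread relative to the mean reflects the high variability across input climate data that the abstract reports.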

A Real-Time Stock Market Prediction Using Knowledge Accumulation (지식 누적을 이용한 실시간 주식시장 예측)

  • Kim, Jin-Hwa;Hong, Kwang-Hun;Min, Jin-Young
    • Journal of Intelligence and Information Systems / v.17 no.4 / pp.109-130 / 2011
  • One of the major problems in data mining is the size of the data, as most data sets are now very large. Streams of data are normally accumulated in data storage or databases. Transactions on the Internet, on mobile devices, and in ubiquitous environments produce streams of data continuously. Some data sets lie unused in storage because of their size, and others are lost as soon as they are created because they are never saved. How to use such large data, and how to use streaming data efficiently, are challenging questions in data mining. Stream data is a data set that is accumulated into storage continuously from a data source, and in many cases its size grows steadily over time. Mining information from such massive data takes too many resources, such as storage, money, and time. These characteristics make it difficult and expensive to store all the stream data accumulated over time. On the other hand, if only recent or partial data is used to mine information or patterns, valuable information can be lost. To avoid these problems, this study suggests a method that efficiently accumulates information or patterns over time in the form of rule sets. A rule set is mined from a data set in the stream and accumulated into a master rule set storage, which also serves as a model for real-time decision making. One main advantage of this method is that it takes much less storage space than the traditional method, which saves the whole data set. Another advantage is that the accumulated rule set is used as a prediction model. Because the rule set is always ready to be used for decisions, user requests can be answered promptly at any time. This makes real-time decision making possible, which is the greatest advantage of the method. According to theories of ensemble approaches, combining many different models can produce a prediction model with better performance. The consolidated rule set effectively covers the whole data set, while the traditional sampling approach covers only part of it. This study uses stock market data, which is heterogeneous because the characteristics of the data vary over time. Stock market indexes can fluctuate whenever an event influences the market, so the variance of the values of each variable is large compared with that of a homogeneous data set. Prediction with a heterogeneous data set is naturally much more difficult than with a homogeneous one. This study tests two general mining approaches and compares their prediction performance with that of the suggested method. The first approach induces a rule set from the recent data set to predict a new data set. The second induces a rule set from all the data accumulated since the beginning every time a new data set has to be predicted. Neither of these performed as well as the accumulated rule set method. Furthermore, the study reports experiments with different prediction models. The first builds a prediction model only from the more important rule sets, and the second uses all the rule sets, assigning weights to the rules based on their performance. The second shows better performance than the first. The experiments also show that the suggested method can be an efficient approach for mining information and patterns from stream data. A limitation of this study is that its application is confined to stock market data; a more dynamic real-time stream data set would be desirable for applying this method. Another open problem is that, as the number of rules increases over time, special rules such as redundant or conflicting rules have to be managed efficiently.
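
The idea of accumulating mined rules into a master rule set and predicting with performance-weighted rules might look roughly like the sketch below. It is an illustration under stated assumptions, not the paper's algorithm: the threshold-style `Rule` format, the use of each batch's accuracy as the rule weight, the merging of redundant rules by summing their weights, and the `"hold"` default label are all hypothetical choices.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule: predict `label` when record[feature] > threshold."""
    feature: str
    threshold: float
    label: str


class MasterRuleSet:
    """Keeps only mined rules and their weights, not the raw stream data."""

    def __init__(self):
        self.weights = {}  # Rule -> accumulated weight

    def accumulate(self, rules_with_accuracy):
        """Merge rules mined from one batch of the stream into the master set."""
        for rule, accuracy in rules_with_accuracy:
            # A rule seen again in a later batch is reinforced instead of stored twice.
            self.weights[rule] = self.weights.get(rule, 0.0) + accuracy

    def predict(self, record, default="hold"):
        """Weighted vote over every rule whose condition the record satisfies."""
        votes = defaultdict(float)
        for rule, weight in self.weights.items():
            if record.get(rule.feature, float("-inf")) > rule.threshold:
                votes[rule.label] += weight
        if not votes:
            return default
        return max(votes, key=votes.get)
```

Because prediction only scans the stored rules, a decision can be made at any time without revisiting the raw stream, which is the real-time property the abstract emphasizes. Conflicting rules are resolved here by the weighted vote, which is only one possible policy.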