• Title/Summary/Keyword: vector data

Research on hybrid music recommendation system using metadata of music tracks and playlists (음악과 플레이리스트의 메타데이터를 활용한 하이브리드 음악 추천 시스템에 관한 연구)

  • Hyun Tae Lee;Gyoo Gun Lim
    • Journal of Intelligence and Information Systems / v.29 no.3 / pp.145-165 / 2023
  • Recommendation systems play a significant role in relieving the difficulty of selecting information from the rapidly increasing amount of information brought about by the development of the Internet, and in efficiently displaying information that fits individual interests. In particular, without the help of recommendation systems, e-commerce and OTT companies cannot overcome the long-tail phenomenon, in which only popular products are consumed, as the number of products and contents rapidly increases. Therefore, research on recommendation systems is being actively conducted to overcome this phenomenon and to provide information or contents aligned with users' individual interests, in order to induce customers to consume various products or contents. Usually, collaborative filtering, which utilizes users' historical behavioral data, shows better performance than content-based filtering, which utilizes users' preferred contents. However, collaborative filtering can suffer from the cold-start problem, which occurs when users' historical behavioral data are lacking. In this paper, a hybrid music recommendation system that can solve the cold-start problem is proposed, based on the playlist data of the Melon music streaming service provided by Kakao Arena for a music playlist continuation competition. The goal of this research is to use the music tracks included in the playlists, together with the metadata of music tracks and playlists, to predict other music tracks when half or all of the tracks are masked. Therefore, two different recommendation procedures were conducted depending on the two situations. When music tracks are included in the playlist, LightFM is used in order to utilize the track list of the playlist and the metadata of each music track. 
Then, the result of the Item2Vec model, which uses vector embeddings of music tracks, tags, and titles for recommendation, is combined with the result of the LightFM model to create the final recommendation list. When no music tracks are available in a playlist and only its tags and title are available, recommendations are made by finding similar playlists based on playlist vectors, which are built by aggregating the FastText pre-trained embedding vectors of each playlist's tags and title. As a result, the proposed system not only resolves the cold-start problem but also achieves better performance than ALS, BPR, and Item2Vec by using the metadata of both music tracks and playlists. In addition, the LightFM model that uses only artist information as an item feature was found to perform best among the LightFM models that use other item features of music tracks.
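The tag-and-title fallback described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract only says the embedding vectors are "aggregated", so the element-wise mean and the cosine ranking used here are assumptions, and the two-dimensional `EMB` table and `PLAYLISTS` dictionary are made-up stand-ins for FastText vectors and Melon playlists.

```python
import math

def mean_vector(vectors):
    """Average a non-empty list of equal-length vectors element-wise."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def playlist_vector(tokens, embeddings):
    """Aggregate (here: average, an assumption) the vectors of known tokens."""
    known = [embeddings[t] for t in tokens if t in embeddings]
    return mean_vector(known) if known else None

def most_similar(query_tokens, playlists, embeddings, k=1):
    """Return the names of the k playlists closest to the query tokens."""
    query = playlist_vector(query_tokens, embeddings)
    if query is None:
        return []
    scored = []
    for name, tokens in playlists.items():
        vec = playlist_vector(tokens, embeddings)
        if vec is not None:
            scored.append((cosine(query, vec), name))
    scored.sort(reverse=True)
    return [name for _, name in scored[:k]]

# Hypothetical toy data standing in for pre-trained embeddings and playlists.
EMB = {"jazz": [1.0, 0.0], "piano": [0.8, 0.2],
       "rock": [0.0, 1.0], "guitar": [0.2, 0.8]}
PLAYLISTS = {"late_night": ["jazz", "piano"], "garage": ["rock", "guitar"]}

nearest = most_similar(["piano"], PLAYLISTS, EMB)
```

Tracks from the nearest playlists could then be proposed for the track-less playlist; how the paper ranks those tracks is not stated in the abstract.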

Robo-Advisor Algorithm with Intelligent View Model (지능형 전망모형을 결합한 로보어드바이저 알고리즘)

  • Kim, Sunwoong
    • Journal of Intelligence and Information Systems / v.25 no.2 / pp.39-55 / 2019
  • Recently, banks and large financial institutions have introduced many Robo-Advisor products. A Robo-Advisor is a robot that produces the optimal asset allocation portfolio for investors by using financial engineering algorithms without any human intervention. Since its first introduction on Wall Street in 2008, the market size has grown to 60 billion dollars and is expected to expand to 2,000 billion dollars by 2020. Since Robo-Advisor algorithms suggest asset allocation output to investors, mathematical or statistical asset allocation strategies are applied. The mean-variance optimization model developed by Markowitz is the typical asset allocation model. The model is a simple but quite intuitive portfolio strategy: assets are allocated so as to minimize the risk of the portfolio while maximizing its expected return using optimization techniques. Despite its theoretical background, both academics and practitioners find that the standard mean-variance optimization portfolio is very sensitive to the expected returns calculated from past price data. Corner solutions, in which the portfolio is allocated to only a few assets, are often found. The Black-Litterman optimization model overcomes these problems by choosing a neutral Capital Asset Pricing Model equilibrium point. Implied equilibrium returns of each asset are derived from the equilibrium market portfolio through reverse optimization. The Black-Litterman model uses a Bayesian approach to combine subjective views on the price forecast of one or more assets with the implied equilibrium returns, resulting in new estimates of risk and expected returns. These new estimates can produce the optimal portfolio through the well-known Markowitz mean-variance optimization algorithm. If the investor does not have any views on the asset classes, the Black-Litterman optimization model produces the same portfolio as the market portfolio. What if the subjective views are incorrect? 
A survey of reports on the performance of stocks recommended by securities analysts shows very poor results. Therefore, incorrect views combined with implied equilibrium returns may produce very poor portfolio output for Black-Litterman model users. This paper suggests an objective investor views model based on Support Vector Machines (SVM), which have shown good performance in stock price forecasting. An SVM is a discriminative classifier defined by a separating hyperplane. Linear, radial basis, and polynomial kernel functions are used to learn the hyperplanes. Input variables for the SVM are returns, standard deviations, Stochastics %K, and price parity degree for each asset class. The SVM outputs expected stock price movements and their probabilities, which are used as input variables in the intelligent views model. The stock price movements are categorized into three phases: down, neutral, and up. The expected stock returns make up the P matrix, and their probability results are used in the Q matrix. The implied equilibrium returns vector is combined with the intelligent views matrix, resulting in the Black-Litterman optimal portfolio. For comparison, the Markowitz mean-variance optimization model and the risk parity model are used. The value-weighted market portfolio and the equal-weighted market portfolio are used as benchmark indexes. We collect the 8 KOSPI 200 sector indexes from January 2008 to December 2018, comprising 132 monthly index values. The training period is from 2008 to 2015, and the testing period is from 2016 to 2018. Our suggested intelligent view model combined with implied equilibrium returns produced the optimal Black-Litterman portfolio. The out-of-sample portfolio showed better performance than the well-known Markowitz mean-variance optimization portfolio, the risk parity portfolio, and the market portfolio. The total return of the Black-Litterman portfolio over the 3-year period is 6.4%, which is the highest value. 
The maximum drawdown is -20.8%, which is also the lowest value. The Sharpe ratio, which measures the return-to-risk ratio, shows the highest value, 0.17. Overall, our suggested view model shows the possibility of replacing subjective analysts' views with an objective view model so that practitioners can apply Robo-Advisor asset allocation algorithms in real trading.
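The reverse optimization step mentioned in this abstract is commonly written as pi = delta * Sigma * w, where Sigma is the asset covariance matrix, w the market-capitalization weights, and delta a risk-aversion coefficient. The sketch below illustrates only that step under these assumptions; the two-asset covariance matrix, the weights, and delta = 2.5 are invented for illustration and are not taken from the paper.

```python
def implied_equilibrium_returns(delta, cov, weights):
    """Reverse optimization: pi = delta * Sigma * w (one value per asset)."""
    return [delta * sum(c * w for c, w in zip(row, weights)) for row in cov]

# Hypothetical two-asset example (values are assumptions, not paper data).
cov = [[0.04, 0.01],
       [0.01, 0.09]]          # covariance matrix Sigma
market_weights = [0.6, 0.4]   # market-capitalization weights w
risk_aversion = 2.5           # delta, assumed

pi = implied_equilibrium_returns(risk_aversion, cov, market_weights)
```

With no investor views, feeding these returns back into a mean-variance optimizer with the same delta and Sigma recovers the market weights, which matches the abstract's remark about the no-view case.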

Development of New Variables Affecting Movie Success and Prediction of Weekly Box Office Using Them Based on Machine Learning (영화 흥행에 영향을 미치는 새로운 변수 개발과 이를 이용한 머신러닝 기반의 주간 박스오피스 예측)

  • Song, Junga;Choi, Keunho;Kim, Gunwoo
    • Journal of Intelligence and Information Systems / v.24 no.4 / pp.67-83 / 2018
  • The Korean film industry, which had grown significantly every year, finally exceeded 200 million cumulative admissions in 2013. However, starting from 2015 the Korean film industry entered a period of low growth, and in 2016 it experienced negative growth. To overcome this difficulty, stakeholders such as production companies, distribution companies, and multiplexes have attempted to maximize market returns by predicting changes in the market and responding to them immediately. Since a film is classified as an experiential product, it is not easy to predict its box office record and initial audience size before release. In addition, the audience size fluctuates with a variety of factors after release. So production and distribution companies try to secure a guaranteed number of screens from multiplex chains at the opening of a newly released film. However, multiplex chains tend to fix the screening schedule for only one week and then determine the number of screenings for the following week based on the box office record and audience evaluations. Many previous studies have dealt with the prediction of box office records. In the early stage, studies attempted to identify factors affecting the box office record. Nowadays, instead of identifying new factors, many studies apply various analytic techniques to the previously identified factors in order to improve prediction accuracy and to explain the effect of each factor. However, most previous studies are limited in that they used the total audience size from opening to end of run as the target variable, which makes it difficult to predict and respond to dynamically changing market demand. 
Therefore, the purpose of this study is to predict the weekly audience size of a newly released film so that stakeholders can respond flexibly to changes in the film's audience. To that end, we considered the factors used in previous studies of box office performance and developed new factors not used previously, such as the opening order of movies and the dynamics of sales. Along with these comprehensive factors, we used machine learning methods such as Random Forest, Multilayer Perceptron, Support Vector Machine, and Naive Bayes to predict the cumulative number of visitors from the first to the third week after release. At the first and second weeks, we predicted the cumulative number of visitors in the following week. At the third week, we predicted the total number of visitors to the film. In addition, we predicted the total cumulative number of visitors at both the first and second weeks using the same factors. As a result, we found that the accuracy of predicting the number of visitors in the following week was higher than that of predicting the total number in all three weeks, and that Random Forest had the highest accuracy among the machine learning methods we used. This study has two implications: 1) it comprehensively considered various factors that affect the box office record but were rarely addressed in previous research, such as the weekly audience rating, weekly rank, and weekly sales share after release; and 2) it tried to predict and respond to dynamically changing market demand by suggesting models that predict the weekly audience size of newly released films, so that stakeholders can respond flexibly to changes in the film's audience.

Development of Beauty Experience Pattern Map Based on Consumer Emotions: Focusing on Cosmetics (소비자 감성 기반 뷰티 경험 패턴 맵 개발: 화장품을 중심으로)

  • Seo, Bong-Goon;Kim, Keon-Woo;Park, Do-Hyung
    • Journal of Intelligence and Information Systems / v.25 no.1 / pp.179-196 / 2019
  • Recently, the "Smart Consumer" has been emerging. He or she is increasingly inclined to search for and purchase products by taking into account personal judgment or expert reviews rather than by relying on information delivered through manufacturers' advertising. This is especially true when purchasing cosmetics. Because cosmetics act directly on the skin, consumers respond seriously to dangerous chemical elements they contain or to skin problems they may cause. Above all, cosmetics should fit well with the purchaser's skin type. In addition, changes in global cosmetics consumer trends make it necessary to study this field. The desire to find one's own individualized cosmetics is being revealed to consumers around the world and is known as "Finding the Holy Grail." Many consumers show a deep interest in customized cosmetics with the cultural boom known as "K-Beauty" (an aspect of "Han-Ryu"), the growth of personal grooming, and the emergence of "self-culture" that includes "self-beauty" and "self-interior." These trends have led to the explosive popularity of cosmetics made in Korea in the Chinese and Southeast Asian markets. In order to meet the customized cosmetics needs of consumers, cosmetics manufacturers and related companies are responding by concentrating on delivering premium services through the convergence of ICT(Information, Communication and Technology). Despite the evolution of companies' responses regarding market trends toward customized cosmetics, there is no "Intelligent Data Platform" that deals holistically with consumers' skin condition experience and thus attaches emotions to products and services. To find the Holy Grail of customized cosmetics, it is important to acquire and analyze consumer data on what they want in order to address their experiences and emotions. 
The emotions consumers express when purchasing cosmetics vary by age, sex, skin type, and specific skin issues, and they influence what price is considered reasonable. Therefore, it is necessary to classify emotions regarding cosmetics by individual consumer. Because of its importance, consumer emotion analysis has been used for both services and products. Given the trends identified above, we judge that consumer emotion analysis can be used in our study. Therefore, we collected and indexed data on consumers' emotions regarding their cosmetics experiences, focusing on consumers' language. We crawled the cosmetics emotion data from SNS (blogs and Twitter) according to sales ranking (1st to 99th), focusing on the ampoule/serum category. A total of 357 emotional adjectives were collected, and we combined and abstracted similar or duplicate emotional adjectives. We conducted a "Consumer Sentiment Journey" workshop to build a "Consumer Sentiment Dictionary," and this resulted in a total of 76 emotional adjectives regarding cosmetics consumer experience. Using these 76 emotional adjectives, we performed clustering with the Self-Organizing Map (SOM) method. As a result of the analysis, we derived eight final clusters of cosmetics consumer sentiments. Using the vector values of each node for each cluster, the characteristics of each cluster were derived based on the top ten most frequently appearing consumer sentiments. Different characteristics were found in consumer sentiments in each cluster. We also developed a cosmetics experience pattern map. The study results confirmed that recommendation and classification systems that consider consumer emotions and sentiments are needed because each consumer differs in what he or she pursues and prefers. 
Furthermore, this study reaffirms that the application of emotion and sentiment analysis can be extended to various fields other than cosmetics, and it implies that consumer insights can be derived using these methods. They can be used not only to build a specialized sentiment dictionary using scientific processes and "Design Thinking Methodology," but we also expect that these methods can help us to understand consumers' psychological reactions and cognitive behaviors. If this study is further developed, we believe that it will be able to provide solutions based on consumer experience, and therefore that it can be developed as an aspect of marketing intelligence.

Index-based Searching on Timestamped Event Sequences (타임스탬프를 갖는 이벤트 시퀀스의 인덱스 기반 검색)

  • 박상현;원정임;윤지희;김상욱
    • Journal of KIISE:Databases / v.31 no.5 / pp.468-478 / 2004
  • In various application areas of data mining and bioinformatics, it is essential to effectively retrieve the occurrences of interesting patterns from sequence databases. For example, consider a network event management system that records the types and timestamps of events that occur in a specific network component (e.g., a router). A typical query to find temporal causal relationships among network events is as follows: 'Find all occurrences of CiscoDCDLinkUp that are followed by MLMStatusUP that are subsequently followed by TCPConnectionClose, under the constraint that the interval between the first two events is not larger than 20 seconds, and the interval between the first and third events is not larger than 40 seconds.' This paper proposes an indexing method that answers such queries efficiently. Unlike previous methods that rely on inefficient sequential scans or on data structures not easily supported by DBMSs, the proposed method uses a multi-dimensional spatial index, which is proven to be efficient in both storage and search, to find the answers quickly without false dismissals. Given a sliding window W, the input to the multi-dimensional spatial index is an n-dimensional vector whose i-th element is the interval between the first event of W and the first occurrence of the event type Ei in W. Here, n is the number of event types that can occur in the system of interest. The 'dimensionality curse' problem may arise when n is large. Therefore, we use dimension selection or event type grouping to avoid this problem. The experimental results reveal that our proposed technique can be a few orders of magnitude faster than the sequential scan and ISO-Depth index methods.
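The window-to-vector mapping described in this abstract can be illustrated with a short sketch. It assumes a window is given as a time-ordered list of `(timestamp, event_type)` pairs and uses `None` for event types that do not occur in the window; the abstract does not say how the paper encodes absent types or how windows are delimited, so both choices are assumptions, as are the timestamps in the example.

```python
def window_vector(window, event_types):
    """Map a time-ordered window of (timestamp, type) pairs to a vector whose
    i-th element is the gap between the window's first event and the first
    occurrence of event_types[i] (None if that type never occurs)."""
    start = window[0][0]
    first_gap = {}
    for timestamp, event_type in window:
        # setdefault keeps only the first occurrence of each type
        first_gap.setdefault(event_type, timestamp - start)
    return [first_gap.get(t) for t in event_types]

# Hypothetical window; the event names come from the query in the abstract.
TYPES = ["CiscoDCDLinkUp", "MLMStatusUP", "TCPConnectionClose"]
window = [(100, "CiscoDCDLinkUp"), (112, "MLMStatusUP"),
          (130, "TCPConnectionClose"), (135, "MLMStatusUP")]

vec = window_vector(window, TYPES)

# The example query becomes a range condition on the vector:
# second event within 20 s and third event within 40 s of the first.
matches = (vec[1] is not None and vec[2] is not None
           and vec[1] <= 20 and vec[1] <= vec[2] <= 40)
```

Because each query turns into a range condition over such vectors, a multi-dimensional spatial index can answer it with a region search instead of a sequential scan.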

Accuracy Analysis of ADCP Stationary Discharge Measurement for Unmeasured Regions (ADCP 정지법 측정 시 미계측 영역의 유량 산정 정확도 분석)

  • Kim, Jongmin;Kim, Seojun;Son, Geunsoo;Kim, Dongsu
    • Journal of Korea Water Resources Association / v.48 no.7 / pp.553-566 / 2015
  • Acoustic Doppler Current Profilers (ADCPs) can concurrently capture three-dimensional velocity vectors and bathymetry in a highly efficient and rapid manner, which enables them to document hydrodynamic and morphologic data at very high spatial and temporal resolution, better than other contemporary instruments. However, ADCPs are limited by the inevitable unmeasured regions near the bottom, surface, and edges of a given cross-section. The velocities in those unmeasured regions are usually extrapolated or assumed when calculating flow discharge, which affects the accuracy of the discharge assessment. This study scrutinized a conventional extrapolation method (i.e., the 1/6 power law) for estimating the unmeasured regions in order to assess the accuracy of ADCP discharge measurements. For the comparative analysis, we collected spatially dense velocity data using an ADV as well as a stationary ADCP in a real-scale straight river channel, and tested the applicability of the 1/6 power law alongside the logarithmic law, another representative velocity law. As a result, the logarithmic law fitted the actual velocity measurements better than the 1/6 power law. In particular, the 1/6 power law tended to underestimate the velocity in the near-surface region and overestimate it in the near-bottom region. This finding indicates that the 1/6 power law may not satisfactorily follow the actual flow regime, so the resulting discharge estimates in the unmeasured top and bottom regions can introduce discharge bias. Therefore, the logarithmic law should be considered as an alternative, especially for stationary ADCP discharge measurement. In addition, it was found that the ADCP should be operated in water at least 0.6 m deep at the left and right edges to better estimate edge discharges. 
In the future, similar comparative analysis might be required for the moving boat ADCP discharge measurement method, which has been more widely used in the field.
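The two extrapolation laws compared in this abstract can be fitted to a measured profile as sketched below. The forms used, u(z) = a * z^(1/6) for the power law and u(z) = m * ln(z) + c for the logarithmic law with z the height above the bed, are the usual textbook forms and are assumed here rather than taken from the paper. The sample points are synthetic and generated from a log profile purely to exercise the code, so the comparison says nothing about real rivers.

```python
import math

def fit_power_law(z, u, exponent=1 / 6):
    """Least-squares fit of u = a * z**exponent; returns the fitted model."""
    basis = [zi ** exponent for zi in z]
    a = sum(b * ui for b, ui in zip(basis, u)) / sum(b * b for b in basis)
    return lambda height: a * height ** exponent

def fit_log_law(z, u):
    """Ordinary least-squares fit of u = m * ln(z) + c; returns the model."""
    x = [math.log(zi) for zi in z]
    n = len(x)
    mean_x, mean_u = sum(x) / n, sum(u) / n
    m = (sum((xi - mean_x) * (ui - mean_u) for xi, ui in zip(x, u))
         / sum((xi - mean_x) ** 2 for xi in x))
    c = mean_u - m * mean_x
    return lambda height: m * math.log(height) + c

def rmse(model, z, u):
    """Root-mean-square error of a fitted model against the samples."""
    return math.sqrt(sum((model(zi) - ui) ** 2 for zi, ui in zip(z, u)) / len(z))

# Synthetic profile (assumption): heights in m, velocities in m/s.
heights = [0.1, 0.2, 0.4, 0.8, 1.6]
velocities = [0.25 * math.log(h / 0.01) for h in heights]

power_model = fit_power_law(heights, velocities)
log_model = fit_log_law(heights, velocities)
power_error = rmse(power_model, heights, velocities)
log_error = rmse(log_model, heights, velocities)
```

Once a law is fitted to the measured bins, evaluating the model at heights inside the unmeasured top and bottom layers gives the extrapolated velocities that enter the discharge sum.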

Rietveld Structure Refinement of Biotite Using Neutron Powder Diffraction (중성자분말회절법을 이용한 흑운모의 Rietveld Structure Refinement)

  • 전철민;김신애;문희수
    • Economic and Environmental Geology / v.34 no.1 / pp.1-12 / 2001
  • The crystal structure of biotite-1M from Bancroft, Ontario, was determined by the Rietveld refinement method using high-resolution neutron powder diffraction data at $-26.3^{\circ}C$, $20^{\circ}C$, $300^{\circ}C$, $600^{\circ}C$, and $900^{\circ}C$. The crystal structure was refined to an $R_B$ of 5.06%-11.9% and an S (goodness of fit) of 2.97-3.94. The a, b, and c unit cell dimensions expand linearly with temperature up to $600^{\circ}C$. The expansivity of the c dimension is $1.61{\times}10^{-4}\,{}^{\circ}C^{-1}$, while those of the a and b dimensions are $2.73{\times}10^{-5}\,{}^{\circ}C^{-1}$ and $5.71{\times}10^{-5}\,{}^{\circ}C^{-1}$, respectively. Thus, the volume increase of the unit cell with increasing temperature is dominated by expansion of the c axis. In contrast to this trend, the expansivity of the dimensions decreases at $900^{\circ}C$. This may be attributed to a change in cation size caused by dehydroxylation-oxidation of $Fe^{2+}$ to $Fe^{3+}$ under vacuum at such a high temperature. The position of the H-proton was determined by refinement of the diffraction pattern at low temperature ($-2.63^{\circ}C$). The position is 0.9103 ${\AA}$ from the $O_4$ location, at atomic coordinates (x/a=0.138, y/b=0.5, z/c=0.305), with the OH vector almost normal to the (001) plane. With increasing temperature, $\alpha$* (tetrahedral rotation angle), $t_{oct}$ (octahedral sheet thickness), and mean distance increase, except in the $900^{\circ}C$ data. However, the trend is less clear than that of the unit cell dimension expansion because the expansion occurs predominantly in the interlayer. Also, ${\Psi}$ (octahedral flattening angle) shows no trend with increasing temperature, which may be because the octahedral sites (M1, M2) are substituted by Mg and Fe.

On Method for LBS Multi-media Services using GML 3.0 (GML 3.0을 이용한 LBS 멀티미디어 서비스에 관한 연구)

  • Jung, Kee-Joong;Lee, Jun-Woo;Kim, Nam-Gyun;Hong, Seong-Hak;Choi, Beyung-Nam
    • 한국공간정보시스템학회:학술대회논문집 / 2004.12a / pp.169-181 / 2004
  • SK Telecom constructed the GIMS system in 2002 as the common base framework of its LBS/GIS service system, based on the international standards of the OGC (OpenGIS Consortium), for the first mobile vector map service. However, as service content has become more complex and requirements have increased, renovation has been needed to satisfy multi-purpose, multi-function, and maximum-efficiency demands. This research prepares a GML3-based platform to upgrade the service from the GML2-based GIMS system. With this, a variety of application services will be able to provide location and geographic data easily and freely. From GML 3.0, animation, event handling, resources for style mapping, and topology specification for 3D and telematics services were selected for the mobile LBS multimedia service. The schema and transfer protocol were developed and organized to optimize data transfer to the MS (Mobile Station). The upgrade to the GML 3.0-based GIMS system has provided an innovative framework in terms of both construction and service, which has been implemented and applied to previous research and systems. In addition, a GIMS channel interface has been implemented to simplify access to the GIMS system, and the internal GIMS service components, WFS and WMS, have gained enhanced and expanded functions.

Measurement of Backscattering Coefficients of Rice Canopy Using a Ground Polarimetric Scatterometer System (지상관측 레이다 산란계를 이용한 벼 군락의 후방산란계수 측정)

  • Hong, Jin-Young;Kim, Yi-Hyun;Oh, Yi-Sok;Hong, Suk-Young
    • Korean Journal of Remote Sensing / v.23 no.2 / pp.145-152 / 2007
  • The polarimetric backscattering coefficients of a wetland rice field, an experimental plot belonging to the National Institute of Agricultural Science and Technology in Suwon, are measured using ground-based polarimetric scatterometers at 1.8 and 5.3 GHz throughout a growth year, from the transplanting period to the harvest period (May to October 2006). The polarimetric scatterometers consist of a vector network analyzer with a time-gating function and a polarimetric antenna set, and are well calibrated to obtain VV-, HV-, VH-, and HH-polarized backscattering coefficients from the measurements, based on a single-target calibration technique using a trihedral corner reflector. The polarimetric backscattering coefficients are measured at $30^{\circ}$, $40^{\circ}$, $50^{\circ}$, and $60^{\circ}$ with 30 independent samples for each incidence angle at each frequency. During the measurement periods, ground truth data including fresh and dry biomass, plant height, stem density, leaf area, specific leaf area, and moisture content are also collected for each measurement. The temporal variations of the measured backscattering coefficients as well as the measured plant height, LAI (leaf area index), and biomass are analyzed. Then, the measured polarimetric backscattering coefficients are compared with the rice growth parameters. The measured plant height increases monotonically, while the measured LAI increases only until the ripening period and decreases thereafter. The measured backscattering coefficients are fitted with polynomial expressions as functions of growth age, plant LAI, and plant height for each polarization, frequency, and incidence angle. At larger incidence angles, the correlation of the L-band signatures with rice growth was higher than that of the C-band signatures. It is found that the HH-polarized backscattering coefficients are more sensitive than the VV-polarized backscattering coefficients to growth age and other input parameters. 
It is necessary to divide the data according to growth periods that show qualitative changes in growth, such as panicle initiation, flowering, or heading, in order to derive functions for estimating rice growth.

Factor Analysis Affecting on Changes in Handysize Freight Index and Spot Trip Charterage (핸디사이즈 운임지수 및 스팟용선료 변화에 영향을 미치는 요인 분석)

  • Lee, Choong-Ho;Kim, Tae-Woo;Park, Keun-Sik
    • Journal of Korea Port Economic Association / v.37 no.2 / pp.73-89 / 2021
  • Handysize bulk carriers can transport a variety of cargoes that cannot be carried by mid- to large-size ships. Their spot chartering market is active, is independent of the mid- to large-size market, and is riskier because of market conditions and charterage variability. In this study, the Granger causality test, the Impulse Response Function (IRF), and Forecast Error Variance Decomposition (FEVD) were performed using monthly time series data. The Granger causality test showed that the coal price for coke making, Japan steel plate commodity price, hot rolled steel sheet price, fleet volume, and bunker price Granger-cause the Baltic Handysize Index (BHSI) and charterage. After confirming the appropriate lag and the stability of the Vector Autoregressive (VAR) model, IRF and FEVD were analyzed. In the IRF analysis, three variables (coal price for coke making, hot rolled steel sheet price, and bunker price) were found to be significant at both the upper and lower limits of the confidence interval. Among them, the impulse of the hot rolled steel sheet price was found to have the most significant effect. In the FEVD analysis, the explanatory power affecting BHSI and charterage follows the same order: hot rolled steel sheet price, coal price for coke making, bunker price, Japan steel plate price, and fleet volume. It was found to increase gradually, affecting BHSI by 30% and charterage by 26%. To differentiate this work from previous studies and to identify the effect of short-term lags, the analysis used monthly price data for the major cargoes of Handysize bulk carriers, and it derived meaningful results that can predict monthly market conditions. This study can help shipping companies that operate Handysize bulk carriers and other parties in the Handysize chartering market predict short-term market conditions.
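To make the impulse response idea concrete, the sketch below traces a one-time shock through a two-variable VAR(1) model y_t = A * y_(t-1) + e_t. It is an illustration only: the abstract does not report the lag order, the coefficient values, or whether the shocks were orthogonalized, so the VAR(1) form, the matrix `A`, and the plain (non-orthogonalized) response are all assumptions. The two variables are loosely labeled as a freight index and a steel price.

```python
def mat_vec(matrix, vector):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a * v for a, v in zip(row, vector)) for row in matrix]

def impulse_response(coef, shock, horizons):
    """Non-orthogonalized responses of a VAR(1) to a one-time shock at t = 0.
    Returns a list of state vectors for t = 0 .. horizons."""
    path = [list(shock)]
    for _ in range(horizons):
        path.append(mat_vec(coef, path[-1]))
    return path

# Hypothetical coefficients: row 0 = freight index, row 1 = steel price.
A = [[0.5, 0.2],
     [0.0, 0.4]]

# Unit shock to the steel price only, traced over three months.
irf = impulse_response(A, [0.0, 1.0], 3)
index_response = [step[0] for step in irf]
```

In this toy system the freight index does not move at impact and then responds in the following months before decaying, which is the kind of delayed reaction an IRF plot is meant to reveal.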