• Title/Summary/Keyword: Cluster Systems

SKU recommender system for retail stores that carry identical brands using collaborative filtering and hybrid filtering (협업 필터링 및 하이브리드 필터링을 이용한 동종 브랜드 판매 매장간(間) 취급 SKU 추천 시스템)

  • Joe, Denis Yongmin;Nam, Kihwan
    • Journal of Intelligence and Information Systems / v.23 no.4 / pp.77-110 / 2017
  • Recently, the diversification and individualization of consumption patterns through Internet-based web and mobile channels have accelerated. As a result, the efficient operation of offline stores, the traditional distribution channel, has become more important. To raise both sales and profits, stores need to supply and sell the products most attractive to consumers in a timely manner. However, there is little research on which SKUs, out of many products, can increase the probability of sale and reduce inventory costs. In particular, if a company sells products through multiple stores across multiple locations, recommending SKUs that appeal to each store's customers would help increase store sales and profitability. In this study, recommender-system techniques that have been used for personalized recommendation (collaborative filtering and hybrid filtering) are proposed as a store-level SKU recommendation method for a distribution company that handles a homogeneous brand through multiple sales stores across countries and regions. We calculated the similarity between stores from the purchase data for the items each store handles, applied collaborative filtering to each store's sales history by SKU, and finally recommended individual SKUs to each store. In addition, the stores were classified into four clusters through PCA (Principal Component Analysis) and cluster analysis using store profile data. The hybrid filtering method applies collaborative filtering within each cluster, and the performance of both methods was measured on actual sales data. Most existing studies of recommender systems recommend items such as movies and music to individual users, and industrial applications have also become common. 
Meanwhile, there has been little research that applies these recommender systems, which have mainly been studied for personalization services, to recommending SKUs for each store of a distributor handling similar brands. Whereas existing recommendation methodology addressed 'the individual field', this study expanded the scope beyond the individual domain to the store, addressing the store units of a distribution company that handles the same brand SKUs through multiple sales stores across countries and regions, and suggesting a recommendation method for them. In addition, whereas existing recommender systems have been limited to online use, this study extends their range of use offline and applies data mining techniques to develop an algorithm suited to the store level rather than to individual-based analysis. The significance of this study is that a personalization recommendation algorithm is applied to multiple sales outlets handling the same brand. A meaningful result is derived, and a concrete methodology that actual companies can build and use as a system is proposed. It is also meaningful as the first attempt to extend the academic research on recommender systems, which has focused on the personalization domain, to the sales stores of a company handling the same brand. From 05 to 03 in 2014, SKUs were recommended by the collaborative filtering and hybrid filtering methods, with the top 100 SKUs by store sales volume limited to 52 SKUs. We compared the performance of the two recommendation methods by totaling the sales results. 
The two methods are compared because this study defines collaborative filtering applied offline as the reference model for demonstrating higher performance than the existing recommendation method. The results of this model are compared with those of the hybrid filtering method, a model that reflects the characteristics of offline stores. The proposed method showed higher performance than the existing recommendation method, as verified with actual sales data from large Korean apparel companies. In this study, we propose a method that extends individual-level recommender systems to the group level and approaches the problem efficiently, which is of great value in addition to the theoretical framework.
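
The store-level collaborative filtering described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the similarity measure (cosine), the scoring rule (similarity-weighted sales), and the names `cosine`, `recommend_skus`, and the toy `sales` data are all assumptions made for illustration. A hybrid variant in the spirit of the abstract would restrict the comparison to stores in the same cluster.

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two sparse SKU -> sales-quantity dicts."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    nu = sqrt(sum(x * x for x in u.values()))
    nv = sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def recommend_skus(sales, target, top_n=3):
    """Rank SKUs the target store does not carry by the
    similarity-weighted sales of the other stores (hypothetical scoring rule)."""
    scores = {}
    for store, vec in sales.items():
        if store == target:
            continue
        sim = cosine(sales[target], vec)
        for sku, qty in vec.items():
            if sku not in sales[target]:
                scores[sku] = scores.get(sku, 0.0) + sim * qty
    # Highest score first; ties broken by SKU name for determinism.
    return sorted(scores, key=lambda s: (-scores[s], s))[:top_n]

# Toy data (invented): sales quantity per SKU for three stores.
sales = {
    "store_a": {"sku1": 10, "sku2": 5},
    "store_b": {"sku1": 8, "sku2": 4, "sku3": 6},
    "store_c": {"sku4": 9},
}
recommended = recommend_skus(sales, "store_a")
print(recommended)
```

Here `store_b` sells a similar mix to `store_a`, so the SKU it carries that `store_a` lacks is ranked ahead of the SKU from the dissimilar `store_c`.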

Policy Change and Innovation of Textile Industry in Daegu·Kyungbuk Region (대구·경북지역 섬유산업의 정책변화와 혁신과제)

  • Shin, Jin-Kyo;Kim, Yo-Han
    • Management & Information Systems Review / v.31 no.3 / pp.223-248 / 2012
  • This study analyzes the support policies and structural change of the textile industry in the Daegu·Kyungbuk region and suggests major issues for the industry's innovation. In Daegu·Kyungbuk, a policy to promote the textile industry, the so-called Milano Project, was devised in 1999. In 2004, the Regional Industrial Promotion Plan was devised. The plan arose from the viewpoint of establishing a regional innovation system and promoting innovative clusters in a knowledge-based economy. Thereafter, the Regional Industry Promotion Project, or Regional Strategic Industry Promotion Project, became the core of regional textile industrial policy. The results indicate that the first-stage Milano Project (1999-2003) had both positive and negative effects. There was no long-term development plan, clear vision, or strategy. However, core industrial infrastructure for differentiated product development, such as the New Product Development Support Center and the Dyeing Design Practical Application Center, was built. The second-stage Daegu Textile Industry Promotion Plan (2004-2008) showed significant technological performance and new product sales with the assistance of Kyungbuk province. The textile industry also showed positive results in financial structure, productivity, and profitability as a result of strong restructuring. In industrial structure, there was an important shift from clothing textile materials to industrial textile materials. Most textile companies did not show high capability in the CEO's intention toward technology innovation, entrepreneurship, R&D, or human resource competency compared with other industries. We suggest that Daegu·Kyungbuk select and concentrate on high-tech textile materials and living textiles for sustainable development and competitiveness. We also propose an innovation network based on trust and cooperation and a company-oriented innovation cluster.

A Study on the Intelligent Quick Response System for Fast Fashion(IQRS-FF) (패스트 패션을 위한 지능형 신속대응시스템(IQRS-FF)에 관한 연구)

  • Park, Hyun-Sung;Park, Kwang-Ho
    • Journal of Intelligence and Information Systems / v.16 no.3 / pp.163-179 / 2010
  • Recently, the concept of fast fashion has been drawing attention as customer needs diversify and supply lead times shorten in the fashion industry. As competition has intensified, how quickly and efficiently a company satisfies customer needs is emphasized as one of the critical success factors in the fashion industry. Because fast fashion is inherently susceptible to trends, it is very important for fashion retailers to make quick decisions about which items to launch, the quantity based on demand prediction, and the time to respond. The planning decisions must also be executed in real time through the business processes of procurement, production, and logistics. To adapt to this trend, the fashion industry urgently needs support from an intelligent quick response (QR) system. However, the traditional functions of QR systems have not been able to fully satisfy these demands of the fast fashion industry. This paper proposes an intelligent quick response system for fast fashion (IQRS-FF). It presents models for the QR process, QR principles and execution, and QR quantity and timing computation. The IQRS-FF models support decision makers by providing useful information through automated, rule-based algorithms. If the predefined conditions of a rule are satisfied, the actions defined in the rule are taken automatically or reported to the decision makers. In IQRS-FF, QR decisions are made in two stages: pre-season and in-season. In the pre-season, master demand prediction is first performed based on macro-level analysis of factors such as the local and global economy, fashion trends, and competitors. The prediction feeds into master production and procurement planning. After checking the availability and delivery of materials for production, decision makers must make reservations or request procurement. For outsourced materials, they must check the availability and capacity of partners. 
With the master plans in place, QR performance during the in-season is greatly enhanced, and the decision to select QR items is made with full consideration of the availability of materials in the warehouse as well as partners' capacity. During the in-season, decision makers must find the right time for QR as actual sales occur in stores. They then decide which items to QR based not only on qualitative criteria such as opinions from salespeople but also on quantitative criteria such as sales volume, the recent sales trend, inventory level, the remaining period, the forecast for the remaining period, and competitors' performance. To calculate the QR quantity in IQRS-FF, two calculation methods are designed: QR Index based calculation and attribute similarity based calculation using demographic clusters. Early in a new season, the attribute similarity based calculation is preferable because there is not enough historical sales data; the QR quantity can be computed by analyzing the sales trends of categories or items with similar attributes. On the other hand, when there is enough information to analyze sales trends or make forecasts, the QR Index based calculation can be used. Having defined the models for QR decision making, we design KPIs (Key Performance Indicators) to test the reliability of the models in critical decisions: the difference in sales volume between QR items and non-QR items, the accuracy rate of QR, and the lead time spent on QR decision-making. To verify the effectiveness and practicality of the proposed models, a case study was performed for a representative fashion company that recently developed and launched IQRS-FF. The case study shows that the average sales rate of QR items increased by 15%, the difference in sales rate between QR items and non-QR items increased by 10%, the QR accuracy was 70%, and the lead time for QR decreased dramatically from 120 hours to 8 hours.

Multi-Vector Document Embedding Using Semantic Decomposition of Complex Documents (복합 문서의 의미적 분해를 통한 다중 벡터 문서 임베딩 방법론)

  • Park, Jongin;Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.25 no.3 / pp.19-41 / 2019
  • With the rapidly increasing demand for text data analysis, research and investment in text mining are being actively conducted not only in academia but also in various industries. Text mining is generally conducted in two steps. In the first step, the text of the collected documents is tokenized and structured to convert the original documents into a computer-readable form. In the second step, tasks such as document classification, clustering, and topic modeling are conducted according to the purpose of the analysis. Until recently, text mining studies focused on applications in the second step. However, with the discovery that the text structuring process substantially influences the quality of the analysis results, various embedding methods have been actively studied to improve that quality by preserving the meaning of words and documents when representing text data as vectors. Unlike structured data, which can be directly applied to a variety of operations and traditional analysis techniques, unstructured text must first go through a structuring task that transforms the original document into a form the computer can understand. Mapping arbitrary objects into a space of a specific dimension while maintaining algebraic properties, in order to structure text data, is called "embedding". Recently, attempts have been made to embed not only words but also sentences, paragraphs, and entire documents. In particular, as the demand for document embedding increases rapidly, many algorithms have been developed to support it. Among them, doc2Vec, which extends word2Vec and embeds each document into one vector, is the most widely used. However, the traditional document embedding method represented by doc2Vec generates a vector for each document using the whole corpus included in the document. 
This has the limitation that the document vector is affected not only by core words but also by miscellaneous words. Additionally, traditional document embedding schemes usually map each document to a single vector, so it is difficult to represent a complex document with multiple subjects accurately. In this paper, we propose a new multi-vector document embedding method to overcome these limitations. This study targets documents that explicitly separate body content and keywords. For a document without keywords, the method can be applied after extracting keywords through various analysis methods. However, since keyword extraction is not the core subject of the proposed method, we describe the process of applying it to documents with predefined keywords. The proposed method consists of (1) Parsing, (2) Word Embedding, (3) Keyword Vector Extraction, (4) Keyword Clustering, and (5) Multiple-Vector Generation. The specific process is as follows. First, all text in a document is tokenized, and each token is represented as an N-dimensional real-valued vector through word embedding. Then, to overcome the limitation that traditional document embedding is affected by miscellaneous words as well as core words, the vectors corresponding to the keywords of each document are extracted to form a set of keyword vectors for each document. Next, clustering is conducted on each document's set of keywords to identify the multiple subjects included in the document. Finally, a multi-vector is generated from the vectors of the keywords constituting each cluster. Experiments on 3,147 academic papers revealed that the traditional single-vector approach cannot properly map complex documents because of interference among subjects in each vector. 
With the proposed multi-vector based method, we ascertained that complex documents can be vectorized more accurately by eliminating the interference among subjects.
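
Steps (3) to (5) of the method above can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not name the clustering algorithm or how a cluster is turned into a vector, so the sketch substitutes a simple greedy cosine-threshold clustering and uses the cluster mean. The function names, the `threshold` parameter, and the two-dimensional toy keyword vectors are hypothetical.

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def multi_vector(keyword_vectors, threshold=0.8):
    """Group keyword vectors greedily by cosine similarity to the running
    cluster mean, then return one mean vector per cluster (one per subject)."""
    clusters = []
    for vec in keyword_vectors:
        for members in clusters:
            if cosine(vec, mean(members)) >= threshold:
                members.append(vec)
                break
        else:
            # No existing cluster is close enough: start a new subject.
            clusters.append([vec])
    return [mean(members) for members in clusters]

# Toy keyword vectors (invented): two keywords per subject.
keywords = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
doc_vectors = multi_vector(keywords)
print(doc_vectors)
```

A document whose keywords fall into two groups is thus represented by two vectors instead of one averaged vector, which is the interference the abstract aims to avoid.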

Exploring a Balanced Share of Slow Charging Options by Places Based on Heterogeneous Travel and Charging Behavior of Electric Vehicle Users (장소별 완속충전기 적정 보급 비율에 관한 연구 : 전기차 이용자의 통행 및 충전행태에 따른 이질성을 중심으로)

  • Jae Hyun Lee;Seo Youn Yoon;Hyeonmi Kim
    • The Journal of The Korea Institute of Intelligent Transport Systems / v.21 no.6 / pp.21-35 / 2022
  • With the support of local and central governments, various incentive policies for "green" cars have been established, and the number of electric vehicle users has been rapidly increasing in recent years. As a result, much attention is being given to establishing a user-centered charging infrastructure. A standard for the number of electric vehicle chargers to be supplied is being prepared based on building characteristics, but there is quite limited research on the appropriate ratio of slow and fast chargers based on the characteristics of each place. Therefore, this study derived an appropriate penetration ratio based on data about the distribution ratio of common slow chargers. These data were collected using a survey of actual electric vehicle users. Next, an analysis was done on how to categorize the needs of charging environments and to determine what criteria or characteristics to use for categorization. Based on the results of the survey analysis, three types of places were derived. Type-1 places require 10% of chargers to be slow chargers, Type-2 places require 40-60% of chargers to be slow chargers (i.e., around equal distribution of slow and fast chargers), and Type-3 places require more than 80% of chargers to be slow chargers. The required levels of slow chargers were classified by place type and by individual using latent class cluster analysis, which made it possible to categorize them into five clusters related to socioeconomic variables, vehicle characteristics, traffic, and charging behaviors. It was found that there was a high correlation between charging behavior, weekend travel behavior, gender, and income. The results and insights from this study could be used to establish charging infrastructure policies in the future and to prepare standards for supplying charging infrastructure according to changes in the electric vehicle market.

A Study on the Cultivation Processes and Settlement Developments on the Mangyoung River Valley (만경강유역의 개간과정과 취락형성발달에 관한 연구)

  • NamGoong, Bong
    • Journal of the Korean association of regional geographers / v.3 no.2 / pp.37-87 / 1997
  • The study of cultivation processes and settlement development across the whole Mangyoung river valley suggests four stages of 'Space-Time Continuity', interpreted through an [Origin-Destination] theory model. In the initial phase, cultivation began on the mountain slopes and tributary plains in the upper part of the river basin, from the Koryo Dynasty to the early Chosun Dynasty. At first, indigenous peasants burned forests on the mountain slopes to make 'dryfields' for cereal crops. As the population increased, a more stable food supply became necessary, which induced a change in production method to 'wetfields' on the tributary plains. The first sedentary agriculture was probably initiated on these mountain slopes and tributary plains in the upper part of the river basin through burning cultivation methods. The mountain slopes and tributary plains thus became the Origin area of the cultivation process. Cultivation expanded down through the valleys at a steady pace in a 'bit of land' fashion, like terraced fields extended bit by bit downward. It reached the middle part of the river basin in the mid-Chosun Dynasty with dike construction techniques on the river bank. The lower part of the river was cultivated with embankment building techniques in the 1920s, and cultivation then naturally expanded to the tidal marshes on the estuaries and river inlets of the coastal areas. 'Pioneer fringes' were consolidated there in modern times. The landscape changed with its own character in each period. The results of the study for the Mangyoung river valley as a whole are as follows. (1) The mountain slopes and tributary plains in the upper part of the river were first cultivated as 'dryfields' by indigenous peasants using burning cultivation methods, and sedentary settlements developed at the edges of the mountain slopes and on the river terraces near the fields. They formed a kind of 'periphery-located cluster type' of settlement. 
This type of settlement became the prominent type in the upper part of the river basin. 'Dryfields' were later changed into 'wetfields' on the narrow tributary plains under increasing population pressure. These wetfields were supplied with water by the Weir and Ponds Irrigation System(제언수리방법). Streams on the tributary plains attracted wetfields alongside them and formed a [water+land] complex. 'Wetfields' expanded downward in a terraced land pattern(adder like pattern, 붕전) according to the gradient of the valley. These periphery-located settlements formed an intimate ecological linkage with several sets of surroundings. Inner villages expanded into Outer villages as arable land expanded downward. (2) Cultivation expanded from the mountain slopes and tributary plains to the alluvial plains in the middle part of the river valley, driven by an urgent need for new land as the population increased. These alluvial plains were cultivated mainly in the mid-Chosun Dynasty. The irrigation method changed to the Dike Construction Irrigation method(천방수리방법) for flood control. The agents of cultivation tended to shift from community-oriented groups, who constructed Bochang along tributaries to make rice paddies, to local government authorities, who could gather the large sums of capital, techniques, and labor needed for big dike construction works. Settlements advanced into the midst of the plains to avoid the friction of distance and formed a 'Central-located cluster type' of settlement. A hierarchical structure of settlements in rank and size emerged according to the merits of water supply and transportation convenience on the broad plains, and big towns developed there. This strengthened a more prominent [water+land] complex along the canals. Ecological linkages between settlements and their surroundings faded to a minor role in this area. 
(3) Modern flood-control technology is necessary for rivers with a large volume of water and a broad width. Until techniques for building large-scale artificial levees along the riverbank became available to protect arable land from floods, most of the alluvial land in the lower part of the river remained a wilderness overgrown with reeds. Cultivation proceeded on a large scale by Japanese agricultural companies under the central government's [River Renovation Project] in the 1920s, and large-scale artificial levees were constructed along the riverbank. The agents of cultivation changed from Korean peasants to Japanese agricultural companies, and Korean peasants fell to the status of tenants under the colonial situation of the time. They had no voice in planning the spatial structure, and their role in planning decreased. Newly cultivated lands reflected the companies' intentions, objectives, and perspectives in pursuit of their goals for the sake of the colonial power. The newly cultivated lands were laid out in regular Rectangular Blocks of rice paddies, and a large-scale Bureaucratic-oriented Irrigation System was implanted on the cultivated plains. All settlements were located in the midst of rice paddies as a Central-located Cluster type of settlement. The [water+land] complex along the canal system was further strengthened. The cultivated space has the character of [I-IT] landscapes. (4) In colonial times, artificial levees were connected into coastal embankments for the reclamation of broad tidal marshes on the estuaries and inlets of rivers. The agents of reclamation grew into big agricultural companies that could act as large-scale cultivators. 
After that time, most tidal marsh reclamation projects were controlled by these agricultural companies, formed mostly by Japanese capitalists. Reclaimed lands on the estuaries and river inlets were in the hands of the agricultural companies, and all spatial structures were shaped by their intentions, objectives, and perspectives. They constructed Unit Farming Areas for the sake of the companies. Spatial structures were planned regularly, with broad arable land in rectangular blocks for rice production, regular canal systems, and tank reservoirs to supply irrigation water to the reclaimed lands. A 'Central-located linear type' of settlement developed in the midst of the reclaimed land. These settlements were established all at once on the newly reclaimed land under a detailed program and master plan, and they show planned patterns in their distribution, building materials, location, and form. The ecological linkage between the newly established settlements and their surroundings faded and became more artificial in a human-centred environment. [I-IT] landscapes became more prominent. This region is the destination area of the [Origin-Destination] theory model and formed a 'Pioneer Fringe'. It is a kind of pioneer front that can advance or retreat discontinuously depending on the physical and socio-cultural conditions of the region.

Genome Type Analysis of Adenovirus Serotypes 1, 2 and 5 Isolated from Children with Lower Respiratory Tract Infections in Korea (하기도 감염 환아에서 분리된 Adenovirus 1, 2, 5 혈청형의 유전체형 분석)

  • Park, Ki-Won;Choi, Eun-Hwa;Choun, Ji-Tae;Lee, Hoan-Jong;Park, Ki-Ho
    • Pediatric Infection and Vaccine / v.12 no.2 / pp.166-177 / 2005
  • Purpose : The purpose of this study was to examine the molecular epidemiology and genetic variability of adenovirus(Ad) serotypes Ad1, Ad2, and Ad5 over 14 years in Korea. Methods : A total of 382 adenoviral strains isolated from the nasopharyngeal aspirates of children with lower respiratory tract infections in Seoul, Korea from November 1990 to February 2003 were serotyped by neutralization assay with type-specific antisera. Viral DNAs were extracted from infected cell lysates by the modified Hirt procedure. Genome type(GT) was determined by DNA restriction analysis with 12 restriction enzymes(BamHI, BclI, BglI, BglII, BstEII, EcoRI, HindIII, HpaI, SalI, SmaI, XbaI, and XhoI). To evaluate genetic relatedness, pairwise comigrating restriction fragments(PCRF) analysis was performed. Results : Of 382 strains, 33 strains(9%) were Ad1, 45 strains(12%) were Ad2, and 24 strains(6%) were Ad5. Eighteen GTs(Ad1p1-Ad1p7, Ad1a, Ad1b, Ad1b1-Ad1b3, Ad1c, Ad1d, Ad1e, Ad1e1, Ad1e2, Ad1f) among Ad1, 24(Ad2p1-Ad2p11, Ad2a, Ad2a1-Ad2a6, Ad2b, Ad2c, Ad2d, Ad2e, Ad2e1-Ad2e3) among Ad2, and 10(Ad5p1, Ad5p2, Ad5a, Ad5a1-Ad5a7) among Ad5 strains were identified. For the vast majority of GTs, only one or two strains were isolated during the study period, while a few GTs were identified sporadically with more than 2 strains. It is notable that some GTs, such as Ad1p5 and Ad5a1, appeared in clusters during a short period. In the analysis of genetic relatedness, the degree of PCRFs for Ad1 varied from 79 to 99%, for Ad2 from 82 to 99%, and for Ad5 from 85 to 99%. Conclusion : This study established comprehensive nomenclature systems for Ad1, Ad2, and Ad5. The diverse GTs identified in this study have crucial implications for the genomic diversity and epidemiological characteristics of Ad1, Ad2, and Ad5.

A Study on the Clustering Method of Row and Multiplex Housing in Seoul Using K-Means Clustering Algorithm and Hedonic Model (K-Means Clustering 알고리즘과 헤도닉 모형을 활용한 서울시 연립·다세대 군집분류 방법에 관한 연구)

  • Kwon, Soonjae;Kim, Seonghyeon;Tak, Onsik;Jeong, Hyeonhee
    • Journal of Intelligence and Information Systems / v.23 no.3 / pp.95-118 / 2017
  • Recently, transactions in row housing and multiplex housing have become active, centered on the downtown area, and platform services such as Zigbang and Dabang are growing. Row housing and multiplex housing remain a blind spot for real estate information, which is a social problem arising from changes in market size and information asymmetry driven by changes in demand. Also, the 5 or 25 districts used by the Seoul Metropolitan Government or the Korean Appraisal Board (hereafter, KAB) were established within administrative boundaries and have been used in existing real estate studies. Because they are zoned for urban planning, they are not a district classification suited to real estate research. Building on existing studies, this study found that Seoul's spatial structure needs to be redefined for estimating future housing prices. This study therefore attempted to classify areas without spatial heterogeneity by reflecting the property price characteristics of row housing and multiplex housing. In other words, simple division by the existing administrative districts has been inefficient, so this study aims to cluster Seoul into new areas for more efficient real estate analysis. The hedonic model was applied to real transaction price data for row housing and multiplex housing, and the K-Means Clustering algorithm was used to cluster the spatial structure of Seoul. The data were real transaction prices of row housing and multiplex housing in Seoul from January 2014 to December 2016 and the official land values of 2016, provided by the Ministry of Land, Infrastructure and Transport (hereafter, MOLIT). Data preprocessing followed these steps: removal of underground transactions, price standardization per area, and removal of real transaction cases (above 5 and below -5). 
Preprocessing reduced the data from 132,707 cases to 126,759 cases. The R program was used for data analysis. After preprocessing, the data model was constructed. First, K-means clustering was performed. Then a regression analysis using the hedonic model and a cosine similarity analysis were conducted. Based on the constructed data model, we clustered Seoul on the basis of longitude and latitude and compared the result with the existing areas. The results indicated that the goodness of fit of the model was above 75% and that the variables used in the hedonic model were significant. In other words, the existing 5 or 25 administrative districts were re-divided into 16 districts. This study thus derived a clustering method for row housing and multiplex housing in Seoul that uses the K-Means Clustering algorithm and the hedonic model and reflects property price characteristics. We also present the academic and practical implications, the limitations of this study, and directions for future research. One academic implication is that areas were clustered by reflecting property price characteristics, in order to address the problems of the areas used by the Seoul Metropolitan Government, KAB, and existing real estate research. Another is that, whereas existing real estate research mainly studied apartments, this study proposes a method of classifying areas in Seoul using public Government 3.0 information (i.e., real data from MOLIT). One practical implication is that the results can serve as basic data for real estate research on row housing and multiplex housing. Another is that research on row housing and multiplex housing is expected to become more active and the accuracy of models of actual transactions is expected to increase. 
The future research direction of this study involves conducting various analyses to overcome the limitations of the threshold and indicates the need for deeper research.
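
The clustering step on longitude and latitude can be sketched with a plain Lloyd-style k-means. The study itself used R and 16 clusters on real transaction data; the version below is only a dependency-free Python illustration with invented coordinates, two clusters, and caller-supplied initial centroids, all of which are assumptions.

```python
def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(point, centroids[i])),
    )

def kmeans(points, centroids, iterations=10):
    """Lloyd's algorithm: assign each point to its nearest centroid,
    then move each centroid to the mean of its assigned points."""
    for _ in range(iterations):
        groups = [[] for _ in centroids]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        centroids = [
            # Keep the old centroid if a cluster ends up empty.
            tuple(sum(axis) / len(g) for axis in zip(*g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
    return centroids, [nearest(p, centroids) for p in points]

# Invented (longitude, latitude) pairs for four transactions.
points = [(126.90, 37.50), (126.91, 37.51), (127.05, 37.60), (127.06, 37.61)]
centroids, labels = kmeans(points, centroids=[points[0], points[2]])
print(labels)
```

In practice the initial centroids and the number of clusters strongly affect the result, so a library implementation with multiple restarts would usually be preferred over this sketch.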

Derivation of Digital Music's Ranking Change Through Time Series Clustering (시계열 군집분석을 통한 디지털 음원의 순위 변화 패턴 분류)

  • Yoo, In-Jin;Park, Do-Hyung
    • Journal of Intelligence and Information Systems
    • /
    • v.26 no.3
    • /
    • pp.171-191
    • /
    • 2020
  • This study focuses on digital music, which is the most valuable cultural asset in modern society and occupies a particularly important position in the Korean Wave. Digital music data was collected from the "Gaon Chart," a well-established music chart in Korea, covering the ranking changes of music that entered the chart over 73 weeks. Patterns with similar characteristics were then derived through time series cluster analysis, and a descriptive analysis was performed on the notable features of each pattern. The research process is as follows. First, in the data collection stage, time series data was collected to track the ranking changes of digital music. Next, in the data processing stage, the collected data was matched with rankings over time, and music titles and artist names were processed. The analysis was then performed sequentially in two stages, exploratory analysis and explanatory analysis. The data collection period was limited to the period before 'the music bulk buying phenomenon', a reliability issue affecting music rankings in Korea. Specifically, it spans 73 weeks, from the first week (December 31, 2017 to January 6, 2018) through the week of May 19, 2019 to May 25, 2019. The analysis targets were limited to digital music released in Korea. Unlike the private music charts in service in Korea, the Gaon Chart is approved by government agencies and has a basic level of reliability. It can therefore be considered to have more public credibility than the ranking information provided by other services. The contents of the collected data are as follows. 
Data on the period and ranking, the music title, the artist name, the album name, the Gaon index, the production company, and the distribution company were collected for the music that entered the top 100 of the chart within the collection period. In total, 7,300 top-100 chart entries were identified over the 73 weeks. Because digital music frequently stays on the chart for two or more weeks, duplicate entries were removed during pre-processing: the number and location of the duplicates were checked with a duplicate check function, and the duplicates were then deleted to form the data for analysis. This yielded a list of 742 unique songs out of the 7,300 entries. A total of 16 patterns were then derived through time series cluster analysis of the ranking changes. From these patterns, two representative patterns were identified: 'Steady Seller' and 'One-Hit Wonder'. The two patterns were further subdivided into five patterns according to the survival period of the music and its ranking. The important characteristics of each pattern are as follows. First, the artist's superstar effect and the bandwagon effect were strong in the one-hit wonder pattern. Consumers are therefore strongly influenced by the superstar effect and the bandwagon effect when they choose digital music. Second, the Steady Seller pattern identified music that consumers have chosen over a very long time. In addition, we examined the patterns of the music that consumers selected most. Contrary to popular belief, the 'steady seller: mid-term' pattern, not the one-hit wonder pattern, received the most choices from consumers. 
Particularly noteworthy is that the 'Climbing the Chart' phenomenon, which runs contrary to the existing pattern, was confirmed within the steady-seller pattern. This study focuses on changes in music rankings over time, an area of digital music research that has been relatively neglected. In addition, it attempts a new approach to music research by subdividing patterns of ranking change rather than predicting the success and ranking of music.
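
The pre-processing described in this abstract, which collapses repeated weekly chart entries into one record per song, can be sketched in Python as follows. This is an illustrative sketch rather than the authors' procedure. The row layout `(week, rank, title, artist)`, the sample rows, and the helper names `build_trajectories` and `summarize` are assumptions, and the clustering method itself is not specified in the abstract, so it is not reproduced.

```python
from collections import defaultdict

def build_trajectories(chart_rows):
    """Collapse weekly chart rows into one rank trajectory per song.

    Each row is assumed to be (week, rank, title, artist). A song that
    charts in several weeks appears once, keyed by (title, artist).
    """
    trajectories = defaultdict(dict)
    for week, rank, title, artist in chart_rows:
        trajectories[(title, artist)][week] = rank
    return dict(trajectories)

def summarize(trajectory):
    """Survival period and best rank, the two attributes the abstract
    says were considered when subdividing the patterns."""
    weeks = sorted(trajectory)
    return {
        "weeks_on_chart": len(weeks),
        "peak_rank": min(trajectory.values()),  # rank 1 is the best
        "first_week": weeks[0],
        "last_week": weeks[-1],
    }

# Hypothetical chart rows, not taken from the Gaon Chart.
rows = [
    (1, 5, "Song A", "Artist X"),
    (2, 3, "Song A", "Artist X"),
    (3, 8, "Song A", "Artist X"),
    (1, 40, "Song B", "Artist Y"),
    (2, 1, "Song C", "Artist Z"),
]
trajectories = build_trajectories(rows)
summaries = {song: summarize(t) for song, t in trajectories.items()}
print(len(rows), "entries ->", len(trajectories), "unique songs")
```

The per-song trajectories produced this way are the kind of input a time series clustering step would take, and the summaries give the survival period and ranking information that the later subdivision relies on.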

A Two-Stage Learning Method of CNN and K-means RGB Cluster for Sentiment Classification of Images (이미지 감성분류를 위한 CNN과 K-means RGB Cluster 이-단계 학습 방안)

  • Kim, Jeongtae;Park, Eunbi;Han, Kiwoong;Lee, Junghyun;Lee, Hong Joo
    • Journal of Intelligence and Information Systems
    • /
    • v.27 no.3
    • /
    • pp.139-156
    • /
    • 2021
  • The biggest reason for using a deep learning model in image classification is that it can consider the relationships between regions by extracting each region's features from the overall information of the image. However, a CNN model may not be suitable for emotional image data that lacks distinct regional features. To address the difficulty of classifying emotion images, researchers propose CNN-based architectures suited to emotion images every year. Studies on the relationship between color and human emotion have also been conducted, finding that different colors induce different emotions. Among studies using deep learning, some have applied color information to image sentiment classification. Using the image's color information in addition to the image itself improves the accuracy of classifying image emotions compared with training the classification model on the image alone. This study proposes two ways to increase accuracy by adjusting the result value after the model classifies an image's emotion. Both methods improve accuracy by modifying the result value based on statistics of the picture's colors. The most common two-color combinations were found across all training data, and at test time the most common two-color combination in each test image was found. The result values were then corrected according to the color combination distribution. This method weights the result value obtained after the model classifies an image's emotion, using an expression based on the log function and the exponential function. Emotion6, which is classified into six emotions, and Artphoto, which is classified into eight categories, were used as the image data. Densenet169, Mnasnet, Resnet101, Resnet152, and Vgg19 architectures were used for the CNN model, and performance was compared before and after applying the two-stage learning to the CNN model. 
Inspired by color psychology, which deals with the relationship between colors and emotions, we studied how to improve the accuracy of an image sentiment classifier by modifying its result values based on color. Sixteen colors were used: red, orange, yellow, green, blue, indigo, purple, turquoise, pink, magenta, brown, gray, silver, gold, white, and black. Each color carries a meaning. Using scikit-learn's clustering, the seven most prevalent colors in the image are identified. The RGB coordinates of these colors are then compared with the RGB coordinates of the 16 colors listed above, and each is converted to the closest of them. If combinations of three or more colors are selected, too many combinations arise and the distribution becomes scattered, so each combination has less influence on the result value. To solve this problem, two-color combinations were found and used to weight the model's output. Before training, the most common color combinations were found for all training data images. The distribution of color combinations for each class was stored in a Python dictionary to be used during testing. During testing, the most common two-color combination in each test image is found. We then checked how that combination was distributed in the training data and corrected the result accordingly. We devised several equations to weight the result value from the model based on the extracted colors as described above. The data set was randomly divided 80:20, and the model was verified using 20% of the data as a test set. The remaining 80% was split into five folds for 5-fold cross-validation, and the model was trained five times using a different validation set each time. Finally, performance was checked on the test set that had been set aside. 
Adam was used as the optimizer, and the learning rate was set to 0.01. Training ran for up to 20 epochs and was stopped early if the validation loss did not decrease for five consecutive epochs. Early stopping was configured to load the model with the best validation loss. Classification accuracy was better when the color-based information was used together with the CNN architecture than when the CNN architecture was used alone.
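
The color-mapping step described in this abstract can be sketched in Python under several assumptions. The abstract does not give the RGB coordinates of its 16 reference colors, so the eight values in `REFERENCE_COLORS` below are assumed stand-ins. The cluster centres and sizes are assumed to come from a prior clustering step (the abstract mentions scikit-learn). The helper names `nearest_color` and `top_two_color_combo` and the class label `class_a` are hypothetical, and the weighting equations are not reproduced because the abstract does not state them.

```python
import math
from collections import Counter, defaultdict

# Assumed RGB values for a subset of the reference colours; the
# abstract lists 16 colour names but not their coordinates.
REFERENCE_COLORS = {
    "red": (255, 0, 0),
    "orange": (255, 165, 0),
    "yellow": (255, 255, 0),
    "green": (0, 128, 0),
    "blue": (0, 0, 255),
    "gray": (128, 128, 128),
    "white": (255, 255, 255),
    "black": (0, 0, 0),
}

def nearest_color(rgb):
    """Name of the reference colour closest to rgb in Euclidean RGB distance."""
    return min(
        REFERENCE_COLORS,
        key=lambda name: math.dist(rgb, REFERENCE_COLORS[name]),
    )

def top_two_color_combo(cluster_centers, cluster_sizes):
    """Map cluster centres to named colours, weight each name by the
    number of pixels in its cluster, and return the two heaviest names
    as a sorted tuple so that (a, b) and (b, a) count as one combination."""
    weight = Counter()
    for center, size in zip(cluster_centers, cluster_sizes):
        weight[nearest_color(center)] += size
    top = [name for name, _ in weight.most_common(2)]
    return tuple(sorted(top))

# Hypothetical clustering output for one training image.
centers = [(250, 10, 5), (10, 10, 240), (245, 250, 250)]
sizes = [500, 300, 200]
combo = top_two_color_combo(centers, sizes)

# Per-class distribution of combinations, kept in a dictionary for test time.
combo_counts = defaultdict(Counter)
combo_counts["class_a"][combo] += 1
print(combo)
```

Sorting the pair before counting is a design choice in this sketch, not something the abstract specifies. It keeps the dictionary from splitting one combination across two keys.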