• Title/Summary/Keyword: Pattern mining

Prediction of a hit drama with a pattern analysis on early viewing ratings (초기 시청시간 패턴 분석을 통한 대흥행 드라마 예측)

  • Nam, Kihwan;Seong, Nohyoon
    • Journal of Intelligence and Information Systems / v.24 no.4 / pp.33-49 / 2018
  • The success of a TV drama has a large effect on ratings and on the effectiveness of channel promotion, and the Korean Wave has demonstrated its cultural and business impact. Early prediction of a blockbuster drama is therefore strategically important for the media industry. Previous studies have tried to predict audience ratings and drama success with various methods, but most made simple predictions from intuitive factors such as the lead actor and the time slot, which limits their predictive power. In this study, we propose a model that predicts the popularity of a drama by analyzing viewers' viewing patterns, grounded in various theories. The contribution is both theoretical and practical, since the model can be used by actual broadcasting companies. We collected data on 280 TV mini-series dramas broadcast on terrestrial channels over the 10 years from 2003 to 2012. From these data, we selected the most highly ranked and the least highly ranked 45 TV dramas and analyzed their viewing patterns in 11 steps. The assumptions and conditions for modeling are based on existing studies, on the opinions of working broadcasters, and on data mining techniques. We then developed a prediction model by measuring the viewing-time distance (difference) with the Euclidean and correlation methods; in our study, similarity is defined as the sum of these distances. Using this similarity measure, we predicted the success of dramas from the distribution of viewers' initial viewing-time patterns over episodes 1 to 5. To check whether the model is sensitive to the choice of measure, we applied various distance measures and tested the model's robustness. Once the model was established, we improved its predictive power with a grid search.
Furthermore, we classified viewers who had watched more than 70% of a new drama's total airtime as "passionate viewers". We then compared the percentage of passionate viewers between the most highly ranked and the least highly ranked dramas, in order to determine the likelihood of a blockbuster TV mini-series. We find that the initial viewing-time pattern is the key factor in predicting blockbuster dramas. With the initial viewing-time pattern analysis, our model correctly classified blockbuster dramas with 75.47% accuracy. This paper achieves a high prediction rate while suggesting an audience rating method different from existing ones. Currently, broadcasters rely heavily on a few famous actors, the so-called star system, and face more severe competition than ever because of rising production costs, a long-term recession, and aggressive investment by comprehensive programming channels and large corporations. All of them are in a financially difficult situation. The basic revenue model of these broadcasters is advertising, and advertising is executed with audience rating as the basic index. The drama market is uncertain because the nature of the product makes demand difficult to forecast, yet dramas contribute heavily to the financial success of a broadcaster's content. Therefore, to minimize the risk of failure, analyzing the distribution of initial viewing time can provide practical help to the companies involved in establishing a response strategy (organization, marketing, story changes, etc.). We also found that audience behavior is crucial to the success of a program. In this paper, we define TV viewing as a measure of how enthusiastically a program is watched.
We can predict the success of a program by calculating the loyalty of its passionate viewers. This way of calculating loyalty can also be applied to various platforms, and it can be used for marketing programs such as highlights, script previews, making-of videos, characters, games, and other marketing projects.
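The distance-based prediction described in this abstract can be sketched roughly as follows. This is a minimal illustration, assuming each drama is summarized as a normalized viewing-time distribution and that a new drama is labeled by whichever reference group has the smaller summed distance. The abstract does not state the exact decision rule, so the rule, the four-bin sample distributions, and the names `summed_distance` and `predict_hit` are all hypothetical.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two equal-length distributions."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def correlation_distance(p, q):
    """1 - Pearson correlation; 0 means the two patterns have the same shape."""
    n = len(p)
    mean_p, mean_q = sum(p) / n, sum(q) / n
    cov = sum((a - mean_p) * (b - mean_q) for a, b in zip(p, q))
    norm_p = math.sqrt(sum((a - mean_p) ** 2 for a in p))
    norm_q = math.sqrt(sum((b - mean_q) ** 2 for b in q))
    if norm_p == 0 or norm_q == 0:
        return 1.0  # a flat pattern carries no correlation information
    return 1.0 - cov / (norm_p * norm_q)


def summed_distance(pattern, group, dist):
    """Sum of distances from one pattern to every reference pattern in a group."""
    return sum(dist(pattern, ref) for ref in group)


def predict_hit(pattern, hits, flops, dist=euclidean):
    """Assumed rule: predict a hit when the pattern is closer to the hit group.

    Comparing raw sums assumes both reference groups have the same size.
    """
    return summed_distance(pattern, hits, dist) < summed_distance(pattern, flops, dist)


# Hypothetical data: share of viewers per viewing-time bin (short to long).
hits = [[0.10, 0.10, 0.20, 0.60], [0.05, 0.15, 0.25, 0.55]]
flops = [[0.60, 0.20, 0.10, 0.10], [0.50, 0.30, 0.10, 0.10]]
new_drama = [0.10, 0.10, 0.30, 0.50]

print(predict_hit(new_drama, hits, flops))
print(predict_hit(new_drama, hits, flops, dist=correlation_distance))
```

Passing the distance function as a parameter makes it easy to repeat the prediction with several measures, which is the kind of sensitivity check the abstract mentions.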

Response Modeling with Semi-Supervised Support Vector Regression (준지도 지지 벡터 회귀 모델을 이용한 반응 모델링)

  • Kim, Dong-Il
    • Journal of the Korea Society of Computer and Information / v.19 no.9 / pp.125-139 / 2014
  • In this paper, I propose a response modeling method based on a Semi-Supervised Support Vector Regression (SS-SVR) algorithm. To increase the accuracy and profit of response modeling, unlabeled data in the customer dataset are used together with the labeled data during training. The proposed SS-SVR algorithm is designed as a batch learning algorithm to reduce training complexity. The label distributions of the unlabeled data are estimated to account for labeling uncertainty. Multiple training data are then generated from the unlabeled data and their estimated label distributions by oversampling, and combined with the labeled data to construct the training dataset. Finally, a data selection algorithm, Expected Margin based Pattern Selection (EMPS), is employed to reduce the training complexity. Experiments on a real-world marketing dataset showed that the proposed response modeling method trained efficiently and improved both accuracy and expected profit.

The extension of the largest generalized-eigenvalue based distance metric $D_{ij}(\gamma_1)$ in arbitrary feature spaces to classify composite data points

  • Daoud, Mosaab
    • Genomics & Informatics / v.17 no.4 / pp.39.1-39.20 / 2019
  • Analyzing patterns in data points embedded in linear and non-linear feature spaces is a common research problem across areas such as data mining, machine learning, pattern recognition, and multivariate analysis. In this paper, data points are heterogeneous sets of biosequences (composite data points). A composite data point is a set of ordinary data points (e.g., a set of feature vectors). We theoretically extend the derivation of the largest generalized eigenvalue-based distance metric $D_{ij}(\gamma_1)$ to any linear and non-linear feature space. We prove that $D_{ij}(\gamma_1)$ is a metric under any linear and non-linear feature transformation function. We show the sufficiency and efficiency of using the decision rule $\bar{{\delta}}_{{\Xi}i}$ (i.e., the mean of $D_{ij}(\gamma_1)$) in classifying heterogeneous sets of biosequences, compared with the decision rules $\min_{{\Xi}i}$ and $\mathrm{median}_{{\Xi}i}$. We analyze the impact of linear and non-linear transformation functions on classifying and clustering collections of heterogeneous sets of biosequences. We show empirically how the length of a sequence in a simulated heterogeneous sequence set affects classification and clustering results in linear and non-linear feature spaces. We propose a new concept: the limiting dispersion map of the existing clusters in heterogeneous sets of biosequences embedded in linear and non-linear feature spaces, which is based on the limiting distribution of nucleotide compositions estimated from real data sets. Finally, empirical conclusions and scientific evidence are drawn from the experiments to support the theoretical results stated in this paper.

A Study on Fuzzy Logic based Clustering Method for Radar Data Analysis (레이더 데이터 분석을 위한 Fuzzy Logic 기반 클러스터링 기법에 관한 연구)

  • Lee, Hansoo;Kim, Eun Kyeong;Kim, Sungshin
    • Journal of the Korean Institute of Intelligent Systems / v.25 no.3 / pp.217-222 / 2015
  • Clustering is an important data mining technique for exploratory data analysis and is applied in various engineering and scientific fields such as pattern recognition and remote sensing. The method organizes data by abstracting the underlying structure either as a grouping of individuals or as a hierarchy of groups. Weather radar observes atmospheric objects using reflected signals and stores the observed data at the corresponding coordinates. To analyze radar data, precipitation and non-precipitation echoes must be separated based on their similarities. This paper therefore studies the application of a clustering method to radar data. In addition, to handle cases where a precipitation echo lies close to a non-precipitation echo, we suggest a fuzzy logic based clustering method that considers not only distance but also other properties such as reflectivity and Doppler velocity. On actual cases, the suggested method gives better results than the previous method when precipitation and non-precipitation echoes are located near each other.
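To illustrate why adding reflectivity and Doppler velocity helps when two echoes sit close together, here is a small sketch that uses the standard fuzzy c-means membership formula with a weighted distance. The abstract does not say which fuzzy method the paper uses, so fuzzy c-means is swapped in here purely as an example; the feature layout, the weights, and the sample values are hypothetical.

```python
import math


def weighted_distance(a, b, weights):
    """Weighted Euclidean distance over (x_km, y_km, reflectivity_dBZ, velocity_mps)."""
    return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))


def fuzzy_memberships(point, centers, weights, m=2.0):
    """Fuzzy c-means membership of one point in each cluster (values sum to 1).

    m > 1 is the fuzzifier; larger values give softer memberships.
    """
    if m <= 1.0:
        raise ValueError("fuzzifier m must be greater than 1")
    dists = [weighted_distance(point, c, weights) for c in centers]
    # A point that coincides with a center belongs fully to that center.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_j) ** exponent for d_j in dists) for d_i in dists]


# Hypothetical cluster centers: a precipitation echo and nearby ground clutter.
precipitation = (10.0, 10.0, 30.0, 8.0)
clutter = (11.0, 10.0, 45.0, 0.0)
centers = [precipitation, clutter]

# An echo that is spatially closer to the clutter but moves like precipitation.
echo = (10.8, 10.0, 32.0, 7.0)

position_only = (1.0, 1.0, 0.0, 0.0)    # ignore reflectivity and velocity
all_features = (1.0, 1.0, 0.01, 0.25)   # assumed weights, for illustration only

print(fuzzy_memberships(echo, centers, position_only))
print(fuzzy_memberships(echo, centers, all_features))
```

With position alone the echo leans toward the clutter cluster; once reflectivity and velocity are weighted in, it leans toward precipitation. How the weights should be chosen is not covered by the abstract.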

Collaboration Framework based on Social Semantic Web for Cloud Systems (클라우드 시스템에서 소셜 시멘틱 웹 기반 협력 프레임 워크)

  • Mateo, Romeo Mark A.;Yang, Hyun-Ho;Lee, Jae-Wan
    • Journal of Internet Computing and Services / v.13 no.1 / pp.65-74 / 2012
  • Cloud services are used to improve business, and customer relationship management (CRM) approaches use social networking as a tool to enhance customer service. However, most cloud systems do not support semantic structures, so vital information from social network sites remains hard to process and use for business strategy. This paper proposes a collaboration framework based on the social semantic web for cloud systems. The proposed framework consists of components that support the social semantic web in order to provide an efficient collaboration system for cloud consumers and service providers. The knowledge acquisition module extracts rules from data gathered by social agents, and these rules are used for collaboration and business strategy. This paper presents an implementation that processes social network site data in the proposed semantic model and extracts patterns, which are used for the virtual grouping of cloud service providers for efficient collaboration.

A Sequential Pattern Analysis for Dynamic Discovery of Customers' Preference (고객의 동적 선호 탐색을 위한 순차패턴 분석: (주)더페이스샵 사례)

  • Song, Ki-Ryong;Noh, Soeng-Ho;Lee, Jae-Kwang;Choi, Il-Young;Kim, Jae-Kyeong
    • Information Systems Review / v.10 no.2 / pp.195-209 / 2008
  • Customers' needs change constantly, and the profitability of stores can no longer be increased with existing standardized chain store management. Accordingly, a personalized store management tool based on predicting customers' preferences is needed. In this study, we propose a recommendation procedure that uses customers' dynamic preferences obtained by analyzing the transaction database. We apply a self-organizing map algorithm and association rule mining to cluster the chain stores and to explore the purchase sequences of customers. We demonstrate that the proposed methodology is effective for recommending products in a market characterized by fast fashion and short product life cycles.

A Geochemical Study of Gold Skarn Deposits at the Sangdong Mine, Korea (상동광산 금스카른광상의 지구화학적 연구)

  • Lee, Bu Kyung;John, Yong Won
    • Economic and Environmental Geology / v.31 no.4 / pp.277-290 / 1998
  • The purpose of this research is to investigate the dispersion pattern of gold during skarnization and the genesis of gold mineralization in the Sangdong skarn deposits. The Sangdong scheelite orebodies are embedded in the Cambrian Pungchon Limestone and in limestone interbedded in the Cambrian Myobong Slate. The tungsten deposits are classified as the Hangingwall Orebody, the Main Orebody and the Footwall Orebody according to their stratigraphic positions. Recently, the Cretaceous Sangdong granite (85 Ma) was found below the orebodies by underground exploratory drilling. Geochemically, the W, Mo, Bi and F concentrations in the granite are significantly higher than those in the Cretaceous granitoids of southern Korea. The highest gold contents are associated with quartz-hornblende skarn in the Main Orebody and pyroxene-hornblende skarn in the Hangingwall Orebody. Au contents are also closely related to Bi contents. From this it can be inferred that the Au skarns formed from solutions in a reducing environment at a temperature of $270^{\circ}C$. According to multiple regression analysis, 87.5% of the variation in Au content in the Main Orebody can be explained by Ag, As, Bi, Sb, Pb and Cu. Judging from the mineralogical, chemical and isotope studies, the following genetic model of the deposits is suggested. The primitive Sangdong magma was enriched in W, Mo, Au, Bi and volatiles (metal carriers such as $H_2O$, $CO_2$ and F). As the hydrothermal ore solution moved upward, its temperature decreased, and W deposits formed in limestone (in the Myobong Slate and the Pungchon Limestone). In addition, an influx of meteoric water gave rise to retrogressive alteration and maximum gold solubility, and consequently higher-grade Au mineralization was deposited.

An Analysis of Intrusion Pattern Based on Backpropagation Algorithm (역전파 알고리즘 기반의 침입 패턴 분석)

  • Woo Chong-Woo;Kim Sang-Young
    • Journal of Internet Computing and Services / v.5 no.5 / pp.93-103 / 2004
  • The main function of the Intrusion Detection System (IDS) used to be more or less passive detection of intrusion evidence, but recently IDSs have been developed in more diverse types and with more diverse methodologies. In particular, an IDS is required to process large volumes of system audit data quickly. Data mining and neural network algorithms are therefore receiving attention, since they can meet this requirement. In this study, we first surveyed and analyzed several recent intrusion trends and types. We then designed and implemented an IDS using the back-propagation algorithm of neural networks, which could provide a more effective solution. The distinctive features of our study are as follows. First, we designed the system to allow both anomaly detection and misuse detection. Second, we carried out the intrusion analysis experiment using the reliable KDD Cup '99 data, which should give results similar to those on real data. Finally, we designed the system on object-oriented concepts, so that it can easily be adapted to other algorithms.

Fast K-Means Clustering Algorithm using Prediction Data (예측 데이터를 이용한 빠른 K-Means 알고리즘)

  • Jee, Tae-Chang;Lee, Hyun-Jin;Lee, Yill-Byung
    • The Journal of the Korea Contents Association / v.9 no.1 / pp.106-114 / 2009
  • In this paper, we propose a fast method for the K-Means clustering algorithm. Its main characteristic is that, to speed up the algorithm, it uses precalculated information to identify the data whose cluster is likely to change. When distances to the cluster centers are calculated at each stage to assign the nearest prototype, overall computation time is reduced by selecting only those data with a high possibility of changing cluster. The distance information already produced by the K-Means algorithm is used to predict the input data whose cluster may change, which reduces calculation time and makes the algorithm less sensitive to the number of dimensions. The proposed method was compared with the original K-Means method (Lloyd's) and the improved method KMHybrid. We show that the proposed method is significantly faster than Lloyd's and KMHybrid on large data sets with many data points, many dimensions and a large number of clusters.
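The general idea of recomputing distances only for points whose cluster might change can be sketched as below. The abstract does not give the paper's selection criterion, so this sketch substitutes a well-known triangle-inequality bound in the style of Hamerly's accelerated k-means: it keeps an upper bound on each point's distance to its own center and a lower bound on its distance to every other center, and skips the point while the bounds prove the assignment cannot change. The function names and the sample points are hypothetical.

```python
import math


def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_two(p, centers):
    """Return (index of nearest center, distance to it, distance to second nearest)."""
    best, d1, d2 = -1, math.inf, math.inf
    for j, c in enumerate(centers):
        d = dist(p, c)
        if d < d1:
            best, d1, d2 = j, d, d1
        elif d < d2:
            d2 = d
    return best, d1, d2


def fast_kmeans(points, init_centers, max_iter=100):
    """Lloyd's algorithm that skips points whose assignment provably cannot change.

    Returns (centers, assignments, number of points re-examined after initialization).
    """
    centers = [list(c) for c in init_centers]
    n = len(points)
    assign, upper, lower = [0] * n, [0.0] * n, [0.0] * n
    for i, p in enumerate(points):
        assign[i], upper[i], lower[i] = nearest_two(p, centers)

    reexamined = 0
    for _ in range(max_iter):
        # Update step: move each center to the mean of its members.
        new_centers = []
        for j, c in enumerate(centers):
            members = [points[i] for i in range(n) if assign[i] == j]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(c)  # keep an empty cluster's center in place
        shifts = [dist(c, nc) for c, nc in zip(centers, new_centers)]
        centers = new_centers
        max_shift = max(shifts)
        if max_shift == 0.0:
            break  # no center moved, so no assignment can change

        # Assignment step: only points with uncertain bounds are re-examined.
        for i, p in enumerate(points):
            upper[i] += shifts[assign[i]]  # own center may have moved away
            lower[i] -= max_shift          # any other center may have moved closer
            if upper[i] < lower[i]:
                continue                   # current center is still strictly nearest
            assign[i], upper[i], lower[i] = nearest_two(p, centers)
            reexamined += 1
    return centers, assign, reexamined


# Hypothetical data: two well-separated groups in 2D.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
centers, assign, reexamined = fast_kmeans(points, [(0, 0), (10, 10)])
print(centers, assign, reexamined)
```

Because the bounds only ever skip points whose nearest center is provably unchanged, the final clustering is a valid Lloyd's fixed point; the saving comes from the distance computations avoided for stable points.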

A Study on Recognition of Artificial Intelligence Utilizing Big Data Analysis (빅데이터 분석을 활용한 인공지능 인식에 관한 연구)

  • Nam, Soo-Tai;Kim, Do-Goan;Jin, Chan-Yong
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2018.05a / pp.129-130 / 2018
  • Big data analysis is a technique for effectively analyzing unstructured data, such as data from the Internet, social network services, web documents generated in the mobile environment, e-mail, and social data, as well as well-formed structured data in a database. Most big data analysis techniques, such as data mining, machine learning, natural language processing, and pattern recognition, were already used in statistics and computer science. Global research institutes have identified big data analysis as the most noteworthy new technology since 2011. Companies in most industries are therefore making efforts to create new value through the application of big data. In this study, we used Social Matrics, a big data analysis tool of Daum Communications, to analyze public perceptions of the keyword "Artificial Intelligence" for one month as of May 19, 2018. The results of the big data analysis are as follows. First, the top related search keyword for "Artificial Intelligence" was found to be technology (4,122). This study suggests theoretical implications based on the results.
