• Title/Summary/Keyword: big data mining


A study on the CRM strategy for medium and small industry of distribution (중소유통업체의 CRM 도입방안에 관한 연구)

  • Kim, Gi-Pyoung
    • Journal of Distribution Science / v.8 no.3 / pp.37-47 / 2010
  • CRM refers to the operating activities that maintain and promote good relationships with customers in order to ultimately maximize the company's profits: understanding the value of customers to meet their demands, establishing a strategy that maximizes lifetime value, and operating the business successfully by integrating the customer management processes. In Korea, many large businesses are taking the initiative in introducing CRM and using it in their marketing strategy; however, most small and medium-sized companies do not understand CRM clearly or find it difficult to introduce because of the large investment required. This study presents a CRM promotion strategy and activity plan suited to small and medium-sized companies. It analyzes the success factors of leading companies that have already implemented CRM and surveys precedents, so that distributors can build close relationships with consumers, overcome their weakness in scale, and strengthen their competitiveness in a rapidly changing and fiercely competitive market. Building CRM involves five stages: recognizing the need for CRM, establishing an integrated CRM database, establishing customer analysis and a marketing strategy through data mining, putting the customer analysis from data mining to practical use, and implementing response analysis and a closed-loop process. The case study of leading companies shows that CRM is needed in types of business where companies are in constant contact with their customers. To meet customer needs, these companies actively analyze their customer information and use it to develop their own CRM programs, personalized for their customers, that provide high-quality services and products. For profitable customers, a VIP marketing strategy is used to keep them from ending their relationship with the company. CRM should be carried out through continuous management.
In other words, customer profitability should be maximized through customer segmentation; maximizing customer profitability is the key to CRM. The success factors of CRM for distributors in Korea are as follows. First, top management's commitment to CS management is needed. Second, a company-wide culture of respect for customers should be established. Third, specialized customer management and CRM staff should be trained. Fourth, CRM behaviors should be developed for all staff members. Fifth, CRM should be carried out through systematic cooperation between related departments. To make use of the case study, a company should understand its customers and establish customer management programs in order to set the optimal CRM strategy and pursue it continuously under a long-term plan. To this end, customers should be segmented on the basis of the collected information and customer data, and the customer response system should be designed with a strategy differentiated by customer class. For future CRM, integrated CRM, in which customer information is gathered in one place, is essential. As customers' expectations rise considerably, effective ways to meet those expectations should be pursued. With the rapid improvement of IT, RFID (Radio Frequency Identification) has appeared, and large amounts of information about products and customers can be obtained in real time in a very short period. A strategy for successful CRM promotion should improve the organizations in charge of customer contact, redesign the customer management processes, and establish a system integrated with the marketing strategy, so as to maintain good relationships with customers under a long-term plan and with a method suited to market conditions, and it should be run as a company-wide program.
In addition, a CRM program should be continuously improved and supplemented to fit the company's characteristics. In particular, a strategy for successful CRM for small and medium-sized distributors should be as follows. First, they should change their existing perception of CRM and care for their customers in depth. Second, they should benchmark the CRM techniques of leading companies and identify success points to apply. Third, they should seek the methods best suited to their particular conditions by developing ideas that combine their own strengths with marketing. Fourth, they should develop a CRM model that promotes relationships with individual customers through small but noticeable events, following the precedents of small businesses in Switzerland.


Building a Korean Sentiment Lexicon Using Collective Intelligence (집단지성을 이용한 한글 감성어 사전 구축)

  • An, Jungkook;Kim, Hee-Woong
    • Journal of Intelligence and Information Systems / v.21 no.2 / pp.49-67 / 2015
  • Recently, the emergence of big data and social media has ushered in a data "big bang". Social networking services are widely used by people around the world and have become a major communication tool for all ages. Over the last decade, as online social networking sites have become increasingly popular, companies have tended to focus on advanced social media analysis for their marketing strategies. In addition to social media analysis, companies are mainly concerned about the propagation of negative opinions on social networking sites such as Facebook and Twitter, as well as on e-commerce sites. Online word of mouth (WOM), such as product ratings, product reviews, and product recommendations, is very influential, and negative opinions have a significant impact on product sales. This trend has drawn researchers' attention to natural language processing tasks such as sentiment analysis. Sentiment analysis, also referred to as opinion mining, is the process of identifying the polarity of subjective information, and it has been applied in various research and practical fields. However, the Korean language (Hangul) poses obstacles for natural language processing because it is an agglutinative language with rich morphology. As a result, Korean natural language processing resources such as a sentiment lexicon are lacking, which significantly limits researchers and practitioners who are considering sentiment analysis. Our study builds a Korean sentiment lexicon with collective intelligence and provides an API (Application Programming Interface) service to open and share the sentiment lexicon data with the public (www.openhangul.com). For pre-processing, we created a Korean lexicon database of over 517,178 words and classified them into sentiment and non-sentiment words.
To classify them, we first identified stop words, which are likely to play a negative role in sentiment analysis, and excluded them from our sentiment scoring. In general, sentiment words are nouns, adjectives, verbs, and adverbs, as they carry sentimental expressions such as positive, neutral, and negative. On the other hand, non-sentiment words are interjections, determiners, numerals, postpositions, and so on, as they generally carry no sentimental expression. To build a reliable sentiment lexicon, we adopted the concept of collective intelligence as a model for crowdsourcing. In addition, the concept of folksonomy was implemented in the taxonomy process to support collective intelligence. To make up for an inherent weakness of folksonomy, we adopted majority rule by building a voting system. Participants, as voters, were offered three voting options (positive, negative, and neutral), and the voting was conducted on one of the largest social networking sites for college students in Korea. More than 35,000 votes have been cast by college students in Korea, and we keep the voting system open by maintaining the project as a perpetual study. Moreover, any change in the sentiment score of a word can be an important observation, because it enables us to track temporal changes in Korean as a natural language. Lastly, our study offers a RESTful, JSON-based API service through a web platform to make the lexicon easier to use for researchers, companies, and developers. Our study makes important contributions to both research and practice. In terms of research, our Korean sentiment lexicon serves as a resource for Korean natural language processing. In terms of practice, practitioners such as managers and marketers can implement sentiment analysis effectively by using the Korean sentiment lexicon we built.
Moreover, our study sheds new light on the value of folksonomy by combining it with collective intelligence, and we expect it to give a new direction and a new start to the development of Korean natural language processing.
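
The voting step described in this abstract can be illustrated with a minimal sketch. The abstract does not give the scoring rule, so the version below is an assumption: each word takes the label with strictly the most votes, and words with too few votes or a tie stay unlabeled. The function name `majority_label`, the `min_votes` cutoff, and the sample words and votes are hypothetical.

```python
from collections import Counter

def majority_label(votes, min_votes=3):
    """Return the label with strictly the most votes, or None when undecided.

    min_votes is an assumed cutoff; the abstract does not state one.
    """
    if len(votes) < min_votes:
        return None
    ranked = Counter(votes).most_common()
    # A tie between the two top labels means the crowd has not decided yet.
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]

# Hypothetical votes; the three options mirror those offered to participants.
votes_by_word = {
    "좋다": ["positive", "positive", "neutral", "positive"],
    "나쁘다": ["negative", "negative", "negative"],
    "그냥": ["neutral", "positive", "negative"],
}

lexicon = {word: majority_label(votes) for word, votes in votes_by_word.items()}
```

Because the voting system stays open, rerunning the same function on an updated vote list is one simple way to observe a word's label changing over time.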

The Effect of Meta-Features of Multiclass Datasets on the Performance of Classification Algorithms (다중 클래스 데이터셋의 메타특징이 판별 알고리즘의 성능에 미치는 영향 연구)

  • Kim, Jeonghun;Kim, Min Yong;Kwon, Ohbyung
    • Journal of Intelligence and Information Systems / v.26 no.1 / pp.23-45 / 2020
  • Big data is being created in a wide variety of fields such as medical care, manufacturing, logistics, sales sites, and SNS, and dataset characteristics are correspondingly diverse. To secure their competitiveness, companies need to improve decision-making capacity using classification algorithms. However, most do not have sufficient knowledge of which classification algorithm is appropriate for a specific problem area. In other words, determining which classification algorithm is appropriate for the characteristics of a dataset has been a task that requires expertise and effort. This is because the relationship between the characteristics of datasets (called meta-features) and the performance of classification algorithms has not been fully understood. Moreover, there has been little research on meta-features that reflect the characteristics of multi-class data. Therefore, the purpose of this study is to empirically analyze whether meta-features of multi-class datasets have a significant effect on the performance of classification algorithms. In this study, meta-features of multi-class datasets were grouped into two factors (data structure and data complexity), and seven representative meta-features were selected. Among those, we included the Herfindahl-Hirschman Index (HHI), originally an index for measuring market concentration, to replace the IR (Imbalanced Ratio). We also developed a new index called the Reverse ReLU Silhouette Score and added it to the meta-feature set. From the UCI Machine Learning Repository, six representative datasets (Balance Scale, PageBlocks, Car Evaluation, User Knowledge-Modeling, Wine Quality (red), Contraceptive Method Choice) were selected. The class of each dataset was predicted using the classification algorithms selected in the study (KNN, Logistic Regression, Naïve Bayes, Random Forest, and SVM). For each dataset, we applied 10-fold cross-validation.
Oversampling of 10% to 100% was applied to each fold, and the meta-features of the dataset were measured. The meta-features selected are HHI, Number of Classes, Number of Features, Entropy, Reverse ReLU Silhouette Score, Nonlinearity of Linear Classifier, and Hub Score. F1-score was selected as the dependent variable. The results showed that six meta-features, including the Reverse ReLU Silhouette Score and HHI proposed in this study, have a significant effect on classification performance. (1) The meta-feature HHI proposed in this study was significant for classification performance. (2) The number of variables has a significant and positive effect on classification performance, unlike the number of classes. (3) The number of classes has a negative effect on classification performance. (4) Entropy has a significant effect on classification performance. (5) The Reverse ReLU Silhouette Score also significantly affects classification performance at the 0.01 significance level. (6) The nonlinearity of linear classifiers has a significant negative effect on classification performance. In addition, the results of the analysis by classification algorithm were consistent. In the regression analysis by classification algorithm, the number of variables does not have a significant effect for the Naïve Bayes algorithm, unlike for the other classification algorithms. This study makes two theoretical contributions: (1) two new meta-features (HHI, Reverse ReLU Silhouette Score) were shown to be significant, and (2) the effects of data characteristics on classification performance were investigated using meta-features. As for practical contributions, (1) the results can be used in developing a system that recommends classification algorithms according to the characteristics of datasets.
(2) Because the characteristics of data differ, many data scientists repeatedly test by adjusting algorithm parameters to find the optimal algorithm for the situation. In this process, excessive resources are wasted in terms of hardware, cost, time, and manpower. This study is expected to be useful for machine learning and data mining researchers, practitioners, and developers of machine learning-based systems. This paper consists of an introduction, related research, research model, experiment, and conclusion and discussion.
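
Two of the meta-features named above, HHI and entropy, can be computed directly from the class labels. The sketch below uses the standard definitions (HHI as the sum of squared class shares, Shannon entropy in bits); whether the paper normalizes or scales them differently is not stated in the abstract, so treat this as an assumption. The function names and the toy label lists are hypothetical.

```python
import math
from collections import Counter

def class_shares(labels):
    """Fraction of samples belonging to each class."""
    total = len(labels)
    return [count / total for count in Counter(labels).values()]

def hhi(labels):
    """Herfindahl-Hirschman Index: sum of squared class shares.

    Ranges from 1/k (k perfectly balanced classes) up to 1 (a single class).
    """
    return sum(p * p for p in class_shares(labels))

def entropy(labels):
    """Shannon entropy of the class distribution, in bits."""
    return -sum(p * math.log2(p) for p in class_shares(labels))

# Hypothetical four-class label sets: one balanced, one heavily skewed.
balanced = ["a", "b", "c", "d"] * 25
skewed = ["a"] * 85 + ["b"] * 5 + ["c"] * 5 + ["d"] * 5

balanced_hhi, skewed_hhi = hhi(balanced), hhi(skewed)
balanced_entropy, skewed_entropy = entropy(balanced), entropy(skewed)
```

Unlike a ratio of the largest to the smallest class, HHI uses every class share, which is one plausible reason to prefer it for multi-class imbalance.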

Selective Word Embedding for Sentence Classification by Considering Information Gain and Word Similarity (문장 분류를 위한 정보 이득 및 유사도에 따른 단어 제거와 선택적 단어 임베딩 방안)

  • Lee, Min Seok;Yang, Seok Woo;Lee, Hong Joo
    • Journal of Intelligence and Information Systems / v.25 no.4 / pp.105-122 / 2019
  • Dimensionality reduction is one of the methods for handling big data in text mining. For dimensionality reduction, we should consider the density of the data, which has a significant influence on the performance of sentence classification. Higher-dimensional data require more computation, which can eventually lead to high computational cost and overfitting in the model. Thus, a dimension reduction process is necessary to improve the performance of the model. Diverse methods have been proposed, ranging from merely reducing noise in the data, such as misspellings or informal text, to incorporating semantic and syntactic information. In addition, the representation and selection of text features affect the performance of the classifier in sentence classification, which is one of the fields of Natural Language Processing. The common goal of dimension reduction is to find a latent space that is representative of the raw data in the observation space. Existing methods use various algorithms for dimensionality reduction, such as feature extraction and feature selection. In addition to these algorithms, word embeddings, which learn low-dimensional vector space representations of words and can capture semantic and syntactic information from data, are also used. To improve performance, recent studies have suggested methods in which the word dictionary is modified according to the positive and negative scores of pre-defined words. The basic idea of this study is that similar words have similar vector representations. Once the feature selection algorithm identifies words that are not important, we assume that words similar to the selected words also have no impact on sentence classification. This study proposes two ways to achieve more accurate classification: conducting selective word elimination under specific rules, and constructing word embeddings based on Word2Vec.
To select words of low importance from the text, we use the information gain algorithm to measure importance and cosine similarity to search for similar words. First, we eliminate words that have comparatively low information gain values from the raw text and form the word embedding. Second, we additionally select words that are similar to the words with low information gain values and build the word embedding. Finally, the filtered text and word embeddings are applied to two deep learning models: a Convolutional Neural Network and an Attention-Based Bidirectional LSTM. This study uses customer reviews from Amazon.com (Kindle), IMDB, and Yelp as datasets and classifies each dataset using the deep learning models. Reviews that received more than five helpful votes and whose ratio of helpful votes was over 70% were classified as helpful reviews. Yelp, however, shows only the number of helpful votes. We extracted 100,000 reviews that received more than five helpful votes by random sampling from 750,000 reviews. Minimal preprocessing, such as removing numbers and special characters from the text, was applied to each dataset. To evaluate the proposed methods, we compared their performance with that of Word2Vec and GloVe word embeddings that used all the words. We showed that one of the proposed methods performs better than the embeddings with all the words. Removing unimportant words improves performance; however, removing too many words lowers it. Future research should consider diverse preprocessing methods and an in-depth analysis of word co-occurrence to measure similarity values among words. Also, we applied the proposed method only with Word2Vec. Other embedding methods such as GloVe, fastText, and ELMo can be applied with the proposed methods, making it possible to identify effective combinations of word embedding methods and elimination methods.
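
The two elimination steps described above can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' implementation: information gain is computed on word presence or absence per document, the thresholds are arbitrary, and the two-dimensional `vectors` stand in for Word2Vec embeddings. All names and the toy data are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a label list, in bits (0 for an empty list)."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(word, docs, labels):
    """Reduction in label entropy from knowing whether `word` occurs in a document."""
    with_word = [y for d, y in zip(docs, labels) if word in d]
    without_word = [y for d, y in zip(docs, labels) if word not in d]
    n = len(labels)
    conditional = (len(with_word) / n) * entropy(with_word) \
        + (len(without_word) / n) * entropy(without_word)
    return entropy(labels) - conditional

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def words_to_remove(docs, labels, vectors, ig_threshold, sim_threshold):
    """Step 1: words with low information gain. Step 2: words similar to those."""
    vocab = set().union(*docs)
    low_ig = {w for w in vocab if information_gain(w, docs, labels) < ig_threshold}
    similar = set()
    for w in vocab - low_ig:
        for s in low_ig:
            if w in vectors and s in vectors and cosine(vectors[w], vectors[s]) >= sim_threshold:
                similar.add(w)
                break
    return low_ig, similar

# Toy corpus: each document is a set of tokens; label 1 = positive, 0 = negative.
docs = [{"great", "book", "the"}, {"awful", "book", "the"},
        {"great", "read", "the"}, {"awful", "read", "this"}]
labels = [1, 0, 1, 0]
# Hypothetical 2-d embeddings standing in for Word2Vec vectors.
vectors = {"book": [1.0, 0.0], "read": [0.8, 0.6], "the": [0.99, 0.1],
           "great": [0.0, 1.0], "awful": [0.0, -1.0], "this": [-1.0, 0.0]}

low_ig, similar = words_to_remove(docs, labels, vectors, ig_threshold=0.1, sim_threshold=0.9)
```

In this toy run, the first step drops the words that occur equally in both classes, and the second step additionally drops a word whose vector lies close to one of them. Lowering `sim_threshold` removes more words, which corresponds to the abstract's observation that removing too many words hurts performance.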

Significance of Ages of Tungsten Mineralization (중석(重石) 광화작용(鑛化作用) 시기(時期)의 의의(意義))

  • Moon, Kun Joo
    • Economic and Environmental Geology / v.28 no.6 / pp.613-621 / 1995
  • It is understood that many large tungsten deposits, such as Sangdong in Korea, Fugigatami in Japan, Yukon in Canada, Pine Creek in the U.S.A., and Vostok in Russia, were formed in the late Cretaceous. However, most tungsten mineralization in China, which holds half of the world's total tungsten ore reserves, took place from the late Jurassic to the early Cretaceous. While a close association of molybdenum with tungsten mineralization is observed in deposits related to Cretaceous magma, tungsten deposits in China of late Jurassic to early Cretaceous age show a close association with tin as well as molybdenum mineralization. Characteristically, tungsten mineralization in China was followed by tin mineralization. The modes of occurrence of tungsten ore deposits in China are varied and may represent the origin of tungsten in general, since more than half of the world's total tungsten ore reserves are in China. In Korea, the Sangdong tungsten deposit, which produced molybdenite as a byproduct, accounted for more than 90% of total tungsten production. Although tin is detected at ppm levels, no cassiterite is found in the Sangdong tungsten orebody. Two tungsten deposits of a similar type are studied comparatively in order to confirm the published data: the Moping tungsten deposit in China and the Dehwa tungsten deposit in Korea. Mineral assemblages occurring in the quartz veins of both deposits are more or less the same, except that zinnwaldite and cassiterite occur only in the former deposit. Ages of zinnwaldite and muscovite closely associated with molybdenite in the former deposit are 181.1 Ma and 167.8 Ma, respectively, while muscovites associated with molybdenite in the latter deposit show ages of 80.9 Ma and 80.2 Ma.
These results may indicate a deficient supply of tin from the source granitoid from which tungsten was derived in the Korean peninsula during the Cretaceous period, whereas in China the tin supply tended to increase during tungsten mineralization and active tin mineralization followed the Jurassic tungsten mineralization.


A Topic Analysis of College Education Using Big Data of News Articles (뉴스 빅데이터를 통해 검토한 대학교육의 토픽 분석)

  • Yang, Ji-Yeon;Koo, Jeong-Ho
    • Journal of Digital Convergence / v.19 no.12 / pp.11-20 / 2021
  • This study extracts topics related to university education from newspaper articles and analyzes the characteristics of each topic and the reporting patterns of each newspaper. Nine topics were discovered using LDA. Topic 1 and Topic 3 are both related to university support projects for education, but Topic 3 focuses on local universities. Topic 2 concerns university education after COVID-19, Topic 4 teaching-learning methods, Topic 5 government policies, Topic 6 the university support project for contributing to high school education, Topic 7 the vision for university education, Topic 8 internationalization, and Topic 9 the entrance exam. The Chosun Ilbo, Kyunghyang, and Hankyoreh published many articles on lectures after COVID-19, government policies, and commentary on university education. Relevant articles since 2016 were analyzed by newspaper type and by period (before/after COVID-19), and the differences in topics were studied and discussed. These findings suggest basic policy guidelines for university education and imply that the positive and negative effects of the media need to be considered.

Bitcoin(Gold)'s Hedge·Safe-Haven·Equity·Taxation (비트코인(금)의 헷지·안전처·공평성·세제 소고)

  • Hwang, Y.
    • The Journal of Society for e-Business Studies / v.23 no.3 / pp.13-32 / 2018
  • Bitcoin has made great progress through anonymity, decentralized authority, the sharing economy, multi-ledger bookkeeping, blockchain technology, and its convenience as a financial vehicle. Like gold, Bitcoin is characterized by mining and supply by decentralized suppliers, a limited supply quantity, and a partial money-like function. The paper studies the hedge and safe-haven properties of Bitcoin and gold using daily data over the period July 20, 2010 to Dec. 27, 2017, employing an Asymmetric Vector GARCH model. It finds that gold serves as a hedge and safe haven against inflation and capital markets, while Bitcoin is only a weak hedge and a weak safe haven. It shows that inflation in the US and Korea has insignificant effects on the volatilities of Bitcoin and gold. It also suggests the necessity of clearing up the vagueness behind the anonymity, for fair and transparent trade, through the application of law where the law is absent or defective (Lucken im Recht), following the spirit of the living constitution (lebendige gutes Recht oder Vorschrift). It is hoped that the relevant institutions will be given obligations such as registration, minimum required capital, reporting, disclosure, explanation, compliance, and governance, along with corresponding autonomous rights. The study also suggests the reestablishment of the relevant financial law and taxation law. A hedge cannot be successfully accomplished without vigilant caution on the part of investors.

A Study on the Conceptual Changes of Extra-solar Planet in University Students Using Text-Mining Techniques (텍스트마이닝을 활용한 대학생들의 외계행성 개념 변화 연구)

  • Han, Shin;Kim, Yong-Ki;Kim, Hyoungbum
    • Journal of the Korean Society of Earth Science Education / v.13 no.3 / pp.305-316 / 2020
  • This study aimed to analyze university students' conceptions of extra-solar planets. To this end, we developed an extra-solar planet education program and questionnaires to identify changes before and after the program, and applied them to the targeted students. The results of the study are as follows. First, before the training, participants understood an extra-solar planet merely as a planet outside the solar system. After the training, keyword analysis showed that they had expanded this conception to a planet revolving around a star outside the solar system. Second, they gave brief responses regarding exploration strategies (e.g., observing extra-solar planets by using the Doppler effect, the transit phenomenon, and gravitational lensing) based on indirect experiences encountered in the media. The responses indicated that they lacked a concept of extra-solar planet exploration methods. However, their understanding of extra-solar planet observation became concrete as they learned about extra-solar planet exploration. Third, they expanded the importance of exoplanet observation beyond simply discovering extraterrestrial life to include the creative process and research methods, the solar system, and the development of humanity. Fourth, they recognized that exoplanet education is necessary in the curriculum, because content related to extra-solar planets in the earth science curriculum can foster students' interest and curiosity as well as scientific knowledge.

Trend Analysis of Sports for All-Related Issues in Early Stage of COVID-19 Using Topic Modeling (토픽 모델링을 활용한 코로나19 초기 생활체육 이슈 분석)

  • Chung, Yunkil;Seo, Sumin;Kang, Hyunmin
    • Journal of Intelligence and Information Systems / v.28 no.3 / pp.57-79 / 2022
  • COVID-19, which started in December 2019, has had a great impact on our lives in general, including politics, the economy, society, and culture, and activities in sports and the arts have also been significantly reduced. In sports, the sports-for-all field, in which ordinary citizens participate, was particularly affected, and cases of infection in places closely tied to people's daily lives, such as gyms and table tennis and badminton clubs, amplified the social fear of the spread of COVID-19. Therefore, in this study, we analyzed news articles related to sports for all from the period when COVID-19 first spread, and investigated which issues were emerging and being discussed in the sports-for-all field under the COVID-19 situation. Specifically, we collected news articles dealing with sports-for-all issues under the COVID-19 situation from Korea's leading portal news sites and identified key issues by performing topic modeling on these articles. Through the analysis, we found meaningful issues such as COVID-19 outbreaks in sports facilities and support for sports activities. In addition, through word cloud analysis of these major issues, we visualized the issues and identified how they changed over time.

Automatic Quality Evaluation with Completeness and Succinctness for Text Summarization (완전성과 간결성을 고려한 텍스트 요약 품질의 자동 평가 기법)

  • Ko, Eunjung;Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.24 no.2 / pp.125-148 / 2018
  • Recently, as the demand for big data analysis has increased, cases of analyzing unstructured data and using the results have also been increasing. Among the various types of unstructured data, text is used as a means of communicating information in almost all fields. In addition, many analysts are interested in text because the amount of data is very large and it is relatively easy to collect compared with other unstructured and structured data. Among the various text analysis applications, the following have been actively studied: document classification, which classifies documents into predetermined categories; topic modeling, which extracts major topics from a large number of documents; sentiment analysis or opinion mining, which identifies emotions or opinions contained in texts; and text summarization, which summarizes the main contents of one or several documents. In particular, text summarization is actively applied in business through news summary services, privacy policy summary services, etc. In addition, much research has been done in academia on the extraction approach, which selectively provides the main elements of the document, and the abstraction approach, which extracts elements of the document and composes new sentences by combining them. However, techniques for evaluating the quality of automatically summarized documents have not progressed as much as techniques for automatic text summarization. Most existing studies on the quality evaluation of summarization manually summarized documents, used the manual summaries as reference documents, and measured the similarity between the automatic summary and the reference document. Specifically, automatic summarization is performed on the full text using various techniques, and the result is compared with a reference document, which is an ideal summary, to measure the quality of the automatic summarization.
Reference documents are provided in two major ways. The most common is manual summarization, in which a person creates an ideal summary by hand. Since this method requires human intervention in preparing the summary, it takes a lot of time and cost, and it has the limitation that the evaluation result may differ depending on the subjectivity of the summarizer. To overcome these limitations, attempts have been made to measure the quality of summary documents without human intervention. As a representative attempt, a method has recently been devised that reduces the size of the full text and measures the similarity between the reduced full text and the automatic summary. In this method, the more of the frequent terms in the full text appear in the summary, the better the quality of the summary is judged to be. However, since summarization essentially means condensing a large amount of content while minimizing content omissions, it is unreasonable to say that a "good summary" based only on frequency is always a "good summary" in the essential sense. To overcome the limitations of these previous studies of summarization evaluation, this study proposes a method for automatic quality evaluation of text summarization based on the essential meaning of summarization. Specifically, succinctness is defined as an element indicating how little content is duplicated among the sentences of the summary, and completeness is defined as an element indicating how little of the content is missing from the summary. The proposed method for automatic quality evaluation of text summarization is based on these two concepts.
To evaluate the practical applicability of the proposed methodology, 29,671 sentences were extracted from TripAdvisor's hotel reviews, the reviews were summarized by hotel, and the results of experiments evaluating the quality of the summaries according to the proposed methodology are presented. The paper also provides a way to integrate completeness and succinctness, which are in a trade-off relationship, into an F-score, and proposes a method for performing optimal summarization by changing the sentence similarity threshold.
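
The completeness, succinctness, and F-score idea can be sketched as follows. The abstract does not specify the similarity measure or the exact formulas, so everything here is an assumption: sentences are compared by Jaccard similarity over whitespace tokens, completeness is the share of source sentences matched by some summary sentence, succinctness is the share of summary sentences that do not duplicate an earlier one, and the F-score is their harmonic mean. The function names and sample sentences are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity between the token sets of two sentences."""
    tokens_a, tokens_b = set(a.split()), set(b.split())
    union = tokens_a | tokens_b
    return len(tokens_a & tokens_b) / len(union) if union else 0.0

def completeness(source, summary, threshold):
    """Share of source sentences covered by at least one summary sentence."""
    covered = sum(1 for s in source if any(jaccard(s, t) >= threshold for t in summary))
    return covered / len(source)

def succinctness(summary, threshold):
    """Share of summary sentences that do not near-duplicate an earlier one."""
    unique = sum(
        1 for i, t in enumerate(summary)
        if not any(jaccard(t, u) >= threshold for u in summary[:i])
    )
    return unique / len(summary)

def f_score(c, s):
    """Harmonic mean of completeness and succinctness."""
    return 2 * c * s / (c + s) if c + s else 0.0

# Hypothetical hotel-review sentences.
source = ["room was clean", "staff was friendly",
          "breakfast was cold", "room was very clean"]
summary = ["room was clean", "room was very clean", "staff was friendly"]

threshold = 0.7  # assumed sentence-similarity threshold
c = completeness(source, summary, threshold)
s = succinctness(summary, threshold)
f = f_score(c, s)
```

Sweeping `threshold` and keeping the value that maximizes `f` is one simple reading of the trade-off tuning the abstract describes.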