• Title/Summary/Keyword: Sales System

A Study on Image Copyright Archive Model for Museums (미술관 이미지저작권 아카이브 모델 연구)

  • Nam, Hyun Woo;Jeong, Seong In
    • Korea Science and Art Forum / v.23 / pp.111-122 / 2016
  • The purpose of this multidisciplinary convergence study is to establish an Image Copyright Archive Model for Museums that protects image copyright and promotes the use of images. The study responds to the need for research and development on copyright services over the life cycle of art content created by museums, the need to revitalize the distribution market for image copyright content in the creative industries, and the need to formulate a management system for copyright services. The study makes various suggestions for enhancing the transparency and efficiency of the art content ecosystem by promoting the use and reuse of image copyright materials, proposing a standard system for the calculation, distribution, settlement and monitoring of copyright royalties for 1,000 domestic museums, galleries and exhibition halls. First, the study proposes the content and structural design of the image copyright archive model and, by proposing an art content distribution service platform for prototype simulation, execution simulation and model operation simulation, establishes a process model for art content copyright royalties. As billing systems and technology for image content are still at an incipient stage, the study uses an existing content billing framework as the basic model for developing billing technology for the distribution of museum collections and artworks, together with an engine for the automatic division and calculation of copyright royalties. Ultimately, the study suggests an image copyright archive model that can be used by artists, curators and distributors. As a business strategy, the study suggests niche market penetration for the museum image copyright archive model. As a sales expansion strategy, the study establishes a business model in which image transactions can be conducted effectively in B2B, B2G, B2C and C2B forms through flexible connection to museum archive systems, and in which image copyright materials can be managed in a controlled way. 
This study is expected to minimize disputes between the copyright holders of artwork images and the owners of the artworks, and to enhance the manageability of copyrighted artworks by preventing such disputes and providing information on the distribution and utilization of art content (collections and new creations) owned by museums. In addition, by providing a guideline for archiving museum collections and new creations, the study is expected to increase the registration of image copyrights and to enable various convergent businesses, such as billing, division and settlement of copyright royalties for image copyright distribution services.

The Determination of Trust in Franchisor-Franchisee Relationships in China (중국 프랜차이즈 시스템에서의 본부와 가맹점간 신뢰의 영향요인)

  • Shin, Geon-Cheol;Ma, Yaokun
    • Journal of Global Scholars of Marketing Science / v.18 no.2 / pp.65-88 / 2008
  • Since the implementation of economic reforms in 1978, the Chinese economy has grown rapidly, at an average annual growth rate of 9% over the past two decades. Franchising has been widely recognized as an important source of entrepreneurial activity. Trust is important in that it facilitates relational exchanges by permitting partners to transcend short-run inequities or risks and concentrate on long-term profits or gains. In the relationship between franchisors and franchisees, trust has been described as an important source of competitive advantage. However, little research has been done on the factors affecting trust in Chinese franchisor-franchisee relationships. The purpose of this study is to investigate which factors affect trust in the franchise system in China, and to provide guidelines and insights for franchisors entering the Chinese market. Following Morgan and Hunt (1994), trust is defined in this study as existing when one party has confidence in an exchange partner's reliability and integrity. We offer a conceptual model for the empirical study. The model shows that the factors affecting trust include the franchisor's support, communication, satisfaction with previous outcomes, and conflict. We also suggest that the franchisor's support and communication are likely to enhance the franchisee's satisfaction with previous outcomes, and that the franchisor's support, communication, and the franchisee's satisfaction with previous outcomes tend to decrease conflict. Before the formal study, a pretest involving exploratory interviews with the owners of three franchisees was conducted to make sure the questionnaire was relevant and clear to the respondents. The data were collected by trained interviewers, who carried out personal interviews with the aid of an unidentified, multi-page, structured questionnaire. The respondents comprised owners, managers, and owner-managers of franchisee-owned food service franchises located in Beijing, China. 
Although a total of 256 potential franchises were initially contacted, the final usable sample consisted of 125 respondents. As expected, the sampling method was successful in soliciting respondents with varied personal and firm characteristics. Self-administered questionnaires were used for all measures, and established scales were used to measure the latent constructs in this study. The measures tapped the franchisees' perceptions of the relationship with the referent franchisor. Five-point Likert-type scales ranging from "strongly disagree" (=1) to "strongly agree" (=7) were used throughout the constructs (trust, eight items; support, five items; communication, four items; satisfaction, six items; conflict, three items). Traditional reliability measures, such as Cronbach's alpha, were used. All the reliabilities were greater than .80. The proposed measurement model was estimated using the SPSS 12.0 and AMOS 5.0 analysis packages. We conducted a series of exploratory factor analyses and confirmatory factor analyses to assess convergent validity, discriminant validity, and reliability. The results indicate a reasonable overall fit between the model and the observed data. The overall fit of the measurement model was $\chi^2$ = 159.699, p = 0.004, d.f. = 116, GFI = .879, NFI = .898, CFI = .969, IFI = .970, TLI = .959, RMR = .058. The results demonstrated that the data fitted the model reasonably well. We also examined construct reliability and average variance extracted (AVE). The construct reliability of each construct was greater than .80 and the AVE of each construct was greater than .50. According to the structural equation modeling (SEM) analysis, the path model showed an adequate fit: $\chi^2$ = 142.126, p = 0.044, d.f. = 115, GFI = .892, NFI = .909, CFI = .981, IFI = .981, TLI = .974, RMR = .057. 
As hypothesized, the results showed that it is strategically important to establish trust in a franchise system, and that the franchisor's support, communication, and satisfaction with previous outcomes tend to reinforce the franchisee's trust. The results also showed that trust seems to decrease as the experience of conflict episodes increases. We also found that the franchisor's support and communication tend to enhance the franchisee's satisfaction with previous outcomes, and that communication tends to decrease conflict. If trust between the franchisor and franchisee can be established in a franchise system, franchising offers many benefits and reduces many costs. To manage a relationship of mutual trust with their franchisees, franchisors should provide effective support to them. Effective assistance services have a direct effect on franchisees' satisfaction with previous outcomes and on their trust in the franchisor. In particular, the franchise sales process, orientation, and training in the start-up period are key elements for the success of the franchise system. Franchisor support is evaluated as an accumulation of separate satisfaction judgments about the different kinds of service provided by the franchisor, and providing support can improve the franchisor's trustworthy image. In the franchise system, conflicts of interest and the exertion of different power sources are very common. The experience of conflict episodes seems to be negatively related to trust. Therefore, it is important to reduce the negative side of relationship conflicts. Communication plays a broad role in reducing conflict and establishing mutual trust in the franchisor-franchisee relationship, and effective communication between franchisors and franchisees can improve franchisees' satisfaction with the franchise system. With the diversification of Chinese markets, both franchisors and franchisees must maintain relevant, timely, and reliable communication. 
It is also very important to improve the quality of communication. Satisfaction with previous outcomes seems to be positively related to trust. Franchisors and franchisees that are highly satisfied with the previous outcomes of their relationship will perceive their partner as advancing their goal achievement. Therefore, both franchisors and their franchisees need to make an effort to promote their partner's welfare. Little literature has focused on the factors that affect trust between franchisors and their franchisees in China. This study developed hypotheses regarding the factors affecting trust in the transaction relationship, and the results of the data analysis strongly supported them. The study has certain limitations. First, some factors omitted from this study could be significantly important. Second, the context of this study, the food service industry, limits its generalizability to all franchise systems; more studies in different categories of franchise system are needed to broaden it. Third, the model was tested empirically on a sample in Beijing, so more empirical tests of the proposed model in other Chinese regions are needed. Finally, the analysis was based solely on the perceptions of franchisees, and the opinions of franchisors were not included.
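
The reliability and validity statistics mentioned in the abstract (Cronbach's alpha, construct reliability, AVE) can be illustrated with a minimal sketch. The abstract reports that SPSS 12.0 and AMOS 5.0 were used; the Python functions below are not the authors' procedure. They assume the commonly used textbook formulas, with standardized factor loadings for construct reliability and AVE, and all names and numbers are hypothetical.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for one scale.

    responses: list of respondents, each a list of item scores (same length).
    """
    k = len(responses[0])                       # number of items in the scale
    items = list(zip(*responses))               # transpose: one tuple per item
    item_var_sum = sum(variance(col) for col in items)
    total_var = variance([sum(r) for r in responses])
    return k / (k - 1) * (1 - item_var_sum / total_var)


def average_variance_extracted(loadings):
    """AVE, assuming standardized loadings (hypothetical inputs)."""
    return sum(l * l for l in loadings) / len(loadings)


def construct_reliability(loadings):
    """Composite reliability, assuming standardized loadings."""
    s = sum(loadings)
    error = sum(1 - l * l for l in loadings)    # error variance per indicator
    return s * s / (s * s + error)


# Hypothetical data: three respondents answering a three-item scale.
toy_responses = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
toy_loadings = [0.8, 0.8, 0.8]

alpha = cronbach_alpha(toy_responses)
ave = average_variance_extracted(toy_loadings)
cr = construct_reliability(toy_loadings)
```

With thresholds like those quoted in the abstract, a construct would pass when `cr > 0.80` and `ave > 0.50`.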

A Study on the Improvement of Flexible Working Hours (탄력적 근로시간제 개선에 대한 연구)

  • Kwon, Yong-man
    • Journal of Venture Innovation / v.5 no.3 / pp.57-70 / 2022
  • In modern industrial capitalism, the relationship between the provision of work and the receipt of wages has become an important principle governing society. Under the labor contract, wages are paid as direct compensation for entrusting the employer with the right to dispose of one's labor, and human life should be guaranteed and reproduced through proper rest. Establishing labor relations under free contracts poses a problem for the protection of workers; accordingly, maximum working hours are set as a minimum right for workers, and a standard for minimum rest is set and granted. The reduction of working hours is very important for workers' quality of life, but it is also an important issue for efficient corporate activity. As of 2020, Korea's annual working hours stood at 1,908 hours, and Korea ranked third lowest among the 37 OECD countries in the happiness index surveyed by the Sustainable Development Solutions Network (SDSN), an agency under the United Nations. Accordingly, the necessity of reducing working hours has been recognized, and maximum working hours have been limited to 52 hours per week since 2018. In this situation, various working hours systems are exempted by law as a way to sustain companies' value creation and meet the diverse needs of workers, and Korea's Labor Standards Act regulates flexible working hours within three months, flexible working hours exceeding three months, selective working hours, and extended working hours. However, the discussion on applying the flexible working hours system revised in 2021, and the recently discussed expansion of the settlement unit period, reveal problems with the system that need to be improved. This paper therefore examines the problems of the flexible working hours system and measures to improve it. 
The flexible working hours system is a system under which working hours rules are not violated even if the legal working hours are exceeded on a specific day or week according to a predetermined standard, and under which additional wages do not have to be paid for the excess hours. It is mainly useful as a form of shift work in manufacturing, sales services, continuous operations, and electricity, gas, water, and transportation businesses that run for long periods. It is also used as a way to shorten working hours, for example by expanding holidays through shorter working days. However, if the settlement unit period is expanded, workers are disadvantaged because they no longer receive the additional wages they would otherwise have received. Therefore, first, if the settlement unit period is expanded as currently discussed, additional wages should be paid for the period extended beyond the current standard. Second, the application of the flexible working hours system to individual workers should be improved so that sufficient consultation with individual workers is included in the written agreement with the worker representative. Third, the allowable hours of extended work during the settlement unit period should be clarified. Fourth, daily working hours should be limited or continuous rest should be applied. In addition, since the written agreement of the worker representative is an important issue in applying the flexible working hours system, it is necessary to secure the representativeness of the worker representative.
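
The abstract's point that averaging over a settlement unit period removes additional wages can be shown with a short, simplified sketch. The 40-hour statutory week and the four-week schedule below are assumptions made for illustration and are not taken from the abstract. The sketch also ignores the per-week and per-day caps that the actual system imposes.

```python
# Assumption: a 40-hour statutory working week (not stated in the abstract).
STATUTORY_WEEKLY_HOURS = 40


def overtime_per_week(weekly_hours, limit=STATUTORY_WEEKLY_HOURS):
    """Overtime hours when every week is judged on its own."""
    return sum(max(0, h - limit) for h in weekly_hours)


def overtime_averaged(weekly_hours, limit=STATUTORY_WEEKLY_HOURS):
    """Overtime hours when hours are averaged over the whole unit period."""
    return max(0, sum(weekly_hours) - limit * len(weekly_hours))


# Hypothetical four-week settlement unit period.
schedule = [52, 28, 48, 32]

without_averaging = overtime_per_week(schedule)   # long weeks count as overtime
with_averaging = overtime_averaged(schedule)      # long and short weeks offset
```

In this hypothetical schedule the long weeks are fully offset by the short ones, so the hours that would have earned a premium week by week earn none once averaged. A longer unit period gives more room for such offsetting.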

A Study on Market Size Estimation Method by Product Group Using Word2Vec Algorithm (Word2Vec을 활용한 제품군별 시장규모 추정 방법에 관한 연구)

  • Jung, Ye Lim;Kim, Ji Hui;Yoo, Hyoung Sun
    • Journal of Intelligence and Information Systems / v.26 no.1 / pp.1-21 / 2020
  • With the rapid development of artificial intelligence technology, various techniques have been developed to extract meaningful information from unstructured text data, which constitutes a large portion of big data. Over the past decades, text mining technologies have been utilized in various industries for practical applications. In the field of business intelligence, text mining has been employed to discover new market and/or technology opportunities and to support rational decision making by business participants. Market information such as market size, market growth rate, and market share is essential for setting companies' business strategies. There has been continuous demand in various fields for market information at the level of specific products. However, such information has generally been provided at the industry level or for broad categories based on classification standards, making it difficult to obtain specific and appropriate information. In this regard, we propose a new methodology that can estimate the market sizes of product groups at more detailed levels than those previously offered. We applied the Word2Vec algorithm, a neural network-based semantic word embedding model, to enable automatic market size estimation from individual companies' product information in a bottom-up manner. The overall process is as follows. First, the data related to product information is collected, refined, and restructured into a form suitable for applying the Word2Vec model. Next, the preprocessed data is embedded into a vector space by Word2Vec, and the product groups are derived by extracting similar product names based on cosine similarity. Finally, the sales data on the extracted products is summed to estimate the market size of the product groups. As experimental data, text data of product names from Statistics Korea's microdata (345,103 cases) were mapped into a multidimensional vector space by Word2Vec training. 
We performed parameter optimization for training and then applied a vector dimension of 300 and a window size of 15 as the optimized parameters for further experiments. We employed index words of the Korean Standard Industrial Classification (KSIC) as a product name dataset to cluster product groups more efficiently. The product names similar to the KSIC indexes were extracted based on cosine similarity. The market size of the extracted products, treated as one product category, was calculated from individual companies' sales data. The market sizes of 11,654 specific product lines were automatically estimated by the proposed model. For performance verification, the results were compared with the actual market sizes of some items; Pearson's correlation coefficient was 0.513. Our approach has several advantages over previous studies. First, text mining and machine learning techniques were applied for the first time to market size estimation, overcoming the limitations of traditional methods that rely on sampling or require multiple assumptions. In addition, the level of market category can be adjusted easily and efficiently according to the purpose of the information by changing the cosine similarity threshold. Furthermore, the approach has high potential for practical application since it can meet unmet needs for detailed market size information in the public and private sectors. Specifically, it can be utilized in technology evaluation and technology commercialization support programs conducted by governmental institutions, as well as in business strategy consulting and market analysis report publishing by private firms. The limitation of our study is that the presented model needs to be improved in terms of accuracy and reliability. The semantic word embedding module can be advanced by giving a proper order to the preprocessed dataset or by combining Word2Vec with another algorithm such as Jaccard similarity. 
Also, the method of product group clustering can be changed to other types of unsupervised machine learning algorithms. Our group is currently working on follow-up studies, and we expect them to further improve the performance of the basic model conceptually proposed in this study.
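
The last two steps of the process described above (extracting product names similar to an index word by cosine similarity, then summing their sales) can be sketched as follows. This is a toy illustration, not the authors' implementation. The three-dimensional vectors stand in for trained Word2Vec embeddings (the abstract reports 300 dimensions), and the product names, sales figures, and threshold are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def estimate_market_size(index_vec, product_vecs, sales, threshold):
    """Group products whose embedding is close to the index word, sum sales.

    index_vec:    embedding of the index word (e.g. a KSIC index term)
    product_vecs: {product name: embedding}
    sales:        {product name: sales amount}
    threshold:    minimum cosine similarity for group membership
    """
    members = [name for name, vec in product_vecs.items()
               if cosine(index_vec, vec) >= threshold]
    return members, sum(sales[name] for name in members)


# Hypothetical stand-ins for trained embeddings and company sales data.
product_vecs = {
    "led lamp": [0.9, 0.1, 0.0],
    "led bulb": [0.8, 0.2, 0.1],
    "steel pipe": [0.0, 0.1, 0.9],
}
sales = {"led lamp": 120.0, "led bulb": 80.0, "steel pipe": 300.0}
index_vec = [1.0, 0.0, 0.0]

members, market_size = estimate_market_size(index_vec, product_vecs, sales, 0.8)
```

Raising or lowering `threshold` narrows or widens the product group, which corresponds to the adjustable market category level that the abstract mentions.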

Analysis of Success Cases of InsurTech and Digital Insurance Platform Based on Artificial Intelligence Technologies: Focused on Ping An Insurance Group Ltd. in China (인공지능 기술 기반 인슈어테크와 디지털보험플랫폼 성공사례 분석: 중국 평안보험그룹을 중심으로)

  • Lee, JaeWon;Oh, SangJin
    • Journal of Intelligence and Information Systems / v.26 no.3 / pp.71-90 / 2020
  • Recently, the global insurance industry has been rapidly pursuing digital transformation through artificial intelligence technologies such as machine learning, natural language processing, and deep learning. As a result, more and more foreign insurers have succeeded with artificial intelligence-based InsurTech and platform businesses. Ping An Insurance Group Ltd., China's largest private company, is leading China's fourth industrial revolution with remarkable achievements in InsurTech and digital platforms as a result of its constant innovation, using 'finance and technology' and 'finance and ecosystem' as corporate keywords. In response, this study analyzed the InsurTech and platform business activities of Ping An Insurance Group Ltd. through the ser-M analysis model to provide strategic implications for revitalizing the AI-based businesses of domestic insurers. The ser-M analysis model is a framework built around the subject, environment, resource and mechanism, so that the vision and leadership of the CEO, the historical environment of the enterprise, the utilization of various resources, and the unique mechanism relationships among them can be interpreted in an integrated manner. The case analysis shows that Ping An Insurance Group Ltd. has achieved cost reduction and improved customer service by digitally innovating its entire business, including sales, underwriting, claims, and loan services, using core artificial intelligence technologies such as facial, voice, and facial expression recognition. In addition, "online data in China" and "the vast offline data and insights accumulated by the company" were combined with new technologies such as artificial intelligence and big data analysis to build a digital platform that integrates financial services and digital service businesses. 
Ping An Insurance Group Ltd. has pursued constant innovation, and as of 2019 its sales reached $155 billion, ranking seventh among all companies in the Global 2000 rankings selected by Forbes Magazine. Analyzing the background of Ping An's success from the ser-M perspective, founder Ma Mingzhe quickly grasped the development of digital technology, market competition, and changes in population structure in the era of the fourth industrial revolution, established a new vision, and displayed agile, digital technology-focused leadership. Based on the founder's strong leadership in response to environmental changes, the company has successfully led its InsurTech and platform businesses through innovation in internal resources, such as investment in artificial intelligence technology, securing excellent professionals, and strengthening big data capabilities, combined with external absorptive capabilities and strategic alliances across various industries. This analysis of Ping An's success story offers the following implications for domestic insurance companies preparing for digital transformation. First, CEOs of domestic companies also need to recognize the industry paradigm shift caused by changes in digital technology and quickly equip themselves with digital technology-oriented leadership to spearhead the digital transformation of their enterprises. Second, the Korean government should urgently overhaul related laws and systems to further promote the use of data across industries, and should provide decisive support such as deregulation, tax benefits and platform provision to help the domestic insurance industry secure global competitiveness. 
Third, Korean companies also need to make bolder investments in the development of artificial intelligence technology so that systematic securing of internal and external data, training of technical personnel, and patent applications can be expanded, and digital platforms should be quickly established so that diverse customer experiences can be integrated through learned artificial intelligence technology. Finally, since there may be limitations to generalization through a single case of an overseas insurance company, I hope that in the future, more extensive research will be conducted on various management strategies related to artificial intelligence technology by analyzing cases of multiple industries or multiple companies or conducting empirical research.

Development of New Variables Affecting Movie Success and Prediction of Weekly Box Office Using Them Based on Machine Learning (영화 흥행에 영향을 미치는 새로운 변수 개발과 이를 이용한 머신러닝 기반의 주간 박스오피스 예측)

  • Song, Junga;Choi, Keunho;Kim, Gunwoo
    • Journal of Intelligence and Information Systems / v.24 no.4 / pp.67-83 / 2018
  • The Korean film industry, which had grown significantly every year, finally exceeded 200 million cumulative admissions in 2013. However, starting in 2015 the industry entered a period of low growth and went on to record negative growth in 2016. To overcome this difficulty, stakeholders such as production companies, distribution companies, and multiplexes have attempted to maximize market returns by predicting market changes and responding to them immediately. Since a film is classified as an experiential product, it is not easy to predict its box office record and initial audience numbers before release. Moreover, audience numbers fluctuate with a variety of factors after the film is released. Production and distribution companies therefore try to secure a guaranteed number of screens from the multiplex chains at the opening of a newly released film. However, the multiplex chains tend to open the screening schedule for only one week at a time and then determine the number of screenings for the forthcoming week based on the box office record and audience evaluations. Many previous studies have dealt with the prediction of box office records. Early studies attempted to identify the factors affecting the box office record. More recent studies have applied various analytic techniques to the previously identified factors in order to improve prediction accuracy and to explain the effect of each factor, rather than identifying new factors. However, most previous studies are limited in that they used the total audience from opening to the end of the run as the target variable, which makes it difficult to predict and respond to dynamically changing market demand. 
Therefore, the purpose of this study is to predict the weekly audience of a newly released film so that stakeholders can respond flexibly to changes in its audience numbers. To that end, we considered the factors affecting box office used in previous studies and developed new factors not used before, such as the opening order of movies and the dynamics of sales. Using these comprehensive factors, we applied machine learning methods such as Random Forest, Multilayer Perceptron, Support Vector Machine, and Naive Bayes to predict the cumulative number of visitors from the first to the third week after a film's release. At the first and second weeks, we predicted the cumulative number of visitors in the forthcoming week for a released film, and at the third week we predicted the total number of visitors. In addition, we predicted the total number of cumulative visitors at both the first and second weeks using the same factors. As a result, we found that the accuracy of predicting the number of visitors in the forthcoming week was higher than that of predicting the total number in all three weeks, and that Random Forest was the most accurate of the machine learning methods we used. This study has two implications: 1) it comprehensively considered various factors that affect the box office record but were rarely addressed in previous research, such as the weekly audience rating after release, the weekly rank of the film after release, and the weekly sales share after release; and 2) it attempted to predict and respond to dynamically changing market demand by proposing models that predict the weekly audience of newly released films, so that stakeholders can respond flexibly to changes in a film's audience numbers.

Importance-Performance Analysis(IPA) of the selection attributes of functional cosmetics (기능성화장품 선택속성의 IPA(중요도-만족도) 분석)

  • Han, Do-Kyung;Lee, Hyun-Jun;Paik, Hyun-Dong;Shin, Dong-Kyoo;Park, Dae-Sub;Hwang, Hye-Sun;Hong, Wan-Soo
    • Journal of the Korea Academia-Industrial cooperation Society / v.17 no.6 / pp.527-536 / 2016
  • This study aims to generate baseline data for boosting sales of functional cosmetics through an Importance-Performance Analysis (IPA) of their selection attributes. By analyzing consumers' selection criteria, the study will help functional cosmetics companies reflect consumer demands and thereby secure competitiveness. To this end, general consumers aged over 20 were surveyed for 5 weeks from Feb 23 through Mar 30, 2015, and 447 responses (response rate 88.9%) were analyzed with the SPSS WIN 21.0 program. To analyze gender differences in the IPA of the selection attributes of functional cosmetics, 17 selection attributes were categorized into 4 factors: functionality, labeling, popularity, and product. Cronbach's alpha for all factors was 0.5, which was taken as evidence of the internal consistency and reliability of the survey. The survey results showed that the overall average was significantly higher for females (5.89 out of 7 points) than for males (5.66 out of 7 points) (p<0.001), and that the selection attributes 'anti-wrinkling', 'whitening function', 'functionality', 'expiration date', 'full ingredient labeling system' and 'various promotional events' showed significant gender differences. The IPA results by gender showed 'price', 'functionality', 'spreadability' and 'full ingredient labeling system' as 2nd quadrant attributes, whereas female consumers selected 'price', 'whitening function', 'anti-wrinkling', 'functionality' and 'full ingredient labeling system' as attributes. The results show that businesses in cosmetics and related fields need to prioritize improving the following factors, which received low satisfaction from all consumers: 'price', 'functionality', and 'total labeling'. In particular, on 'price', reasonable and affordable pricing is considered necessary.
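
The quadrant assignment behind an IPA can be sketched in a few lines. This is an illustration under stated assumptions, not the study's procedure. It assumes that the grid is split at the mean importance and mean performance, and that the 2nd quadrant holds attributes with high importance and low performance (the "improve first" cell). Quadrant numbering differs between authors, and the attribute names and scores are hypothetical.

```python
from statistics import mean


def ipa_quadrants(scores):
    """Assign each attribute to an IPA quadrant.

    scores: {attribute: (importance, performance)}
    Assumed numbering: 1 = high importance / high performance,
                       2 = high importance / low performance,
                       3 = low importance / low performance,
                       4 = low importance / high performance.
    """
    imp_mean = mean(imp for imp, _ in scores.values())
    perf_mean = mean(perf for _, perf in scores.values())
    quadrants = {}
    for name, (imp, perf) in scores.items():
        if imp >= imp_mean and perf >= perf_mean:
            quadrants[name] = 1
        elif imp >= imp_mean:
            quadrants[name] = 2
        elif perf < perf_mean:
            quadrants[name] = 3
        else:
            quadrants[name] = 4
    return quadrants


# Hypothetical 7-point scores: (importance, performance).
toy_scores = {
    "attribute A": (6.5, 4.0),
    "attribute B": (6.4, 5.9),
    "attribute C": (4.5, 4.2),
    "attribute D": (4.6, 5.9),
}

quadrants = ipa_quadrants(toy_scores)
to_improve = [name for name, q in quadrants.items() if q == 2]
```

Running the function separately on male and female respondents' mean scores would give a per-gender grid comparable to the one the abstract describes.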

A Proposal for a Global Market Entry Strategy into the Korean Apparel Industry based on the Italian Fashion Industry - Use of Foreign Exhibitions and Showrooms - (이태리 패션산업을 근거로 본 한국 의류산업 해외진출을 위한 제언 - 박람회 및 쇼룸 활용 -)

  • Kim, Yong-Ju;Lee, Jin-Hee
    • Journal of the Korean Society of Clothing and Textiles / v.32 no.12 / pp.1903-1914 / 2008
  • The purpose of this study was to propose an efficient and feasible global market entry strategy for the Korean apparel industry by analyzing the Italian fashion industry. In particular, the study investigated the role of foreign exhibitions and showrooms supported and organized by Italian fashion organizations. The methodology was to analyze industrial reports, review previous studies and conduct in-depth interviews with 23 industry experts in Italy, Korea and LA. The results indicated that the most prominent factor in the Italian fashion industry was the fashion cluster, a strong and organic network of diverse fashion-related areas. Regardless of their size, firms can get practical, prompt and efficient support from diverse associations. The network operated by the associations provides strong support to each firm by organizing collections and exhibitions and providing promotional activities. Showrooms and agents are another supportive "gate keeper", directly related to an enterprise's sales. However, Korean fashion firms did not have enough information or knowledge about foreign exhibitions, nor did they make aggressive promotional efforts in the global market. Although many fashion-related associations exist in Korea, their programs are too focused on visible accomplishments and too oriented toward the "big company" and the "big voice" rather than the many "small firms". In conclusion, the Korean fashion industry, particularly the fashion industry in Seoul, has strong potential to become the center of the global fashion market in the future. However, the fashion support system that can act as the channel for promoting firms and meeting global buyers needs to be supplemented. To create this system feasibly, government or industry associations should develop a strong and generous support system and network, and they must recognize the need for small firms to exist.

Building a Korean Sentiment Lexicon Using Collective Intelligence (집단지성을 이용한 한글 감성어 사전 구축)

  • An, Jungkook;Kim, Hee-Woong
    • Journal of Intelligence and Information Systems / v.21 no.2 / pp.49-67 / 2015
  • Recently, the emergence of big data and social media has led us into a big bang of data. Social networking services are widely used by people around the world, and they have become one of the major communication tools for all ages. Over the last decade, as online social networking sites have become increasingly popular, companies have tended to focus on advanced social media analysis for their marketing strategies. In addition to social media analysis, companies are mainly concerned about the propagation of negative opinions on social networking sites such as Facebook and Twitter, as well as on e-commerce sites. Online word of mouth (WOM), such as product ratings, product reviews, and product recommendations, is very influential, and negative opinions have a significant impact on product sales. This trend has increased researchers' attention to natural language processing tasks such as sentiment analysis. Sentiment analysis, also referred to as opinion mining, is a process of identifying the polarity of subjective information and has been applied to various research and practical fields. However, obstacles arise when natural language processing is applied to the Korean language (Hangul), because it is an agglutinative language whose rich morphology poses problems. As a result, there is a lack of Korean natural language processing resources such as a sentiment lexicon, and this has resulted in significant limitations for researchers and practitioners who are considering sentiment analysis. Our study builds a Korean sentiment lexicon with collective intelligence and provides an API (Application Programming Interface) service to open and share the sentiment lexicon data with the public (www.openhangul.com). For the pre-processing, we have created a Korean lexicon database with over 517,178 words and classified them into sentiment and non-sentiment words. 
In order to classify them, we first identified stop words, which are quite likely to play a negative role in sentiment analysis, and excluded them from our sentiment scoring. In general, sentiment words are nouns, adjectives, verbs, and adverbs, as they carry sentimental expressions such as positive, neutral, and negative. On the other hand, non-sentiment words are interjections, determiners, numerals, postpositions, etc., as they generally carry no sentimental expressions. To build a reliable sentiment lexicon, we have adopted the concept of collective intelligence as a model for crowdsourcing. In addition, the concept of folksonomy has been implemented in the process of taxonomy to help collective intelligence. In order to make up for an inherent weakness of folksonomy, we have adopted a majority rule by building a voting system. Participants, as voters, were offered three voting options to choose from (positivity, negativity, and neutrality), and the voting has been conducted on one of the largest social networking sites for college students in Korea. More than 35,000 votes have been cast by college students in Korea, and we keep this voting system open by maintaining the project as a perpetual study. Moreover, any change in the sentiment score of words can be an important observation because it enables us to keep track of temporal changes in Korean as a natural language. Lastly, our study offers a RESTful, JSON-based API service through a web platform to make the lexicon easier to use for researchers, companies, and developers. Our study makes important contributions to both research and practice. In terms of research, our Korean sentiment lexicon plays an important role as a resource for Korean natural language processing. In terms of practice, practitioners such as managers and marketers can implement sentiment analysis effectively by using the Korean sentiment lexicon we built. 
Moreover, our study sheds new light on the value of folksonomy by combining it with collective intelligence, and we also expect it to give a new direction and a new start to the development of Korean natural language processing.
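
The majority-rule voting described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `majority_label`, the label strings, and the tie-handling rule (return no label on a tie) are assumptions made for illustration only.

```python
from collections import Counter

# Hypothetical label names; the abstract only says voters choose among
# positivity, negativity, and neutrality.
LABELS = ("positive", "neutral", "negative")


def majority_label(votes, min_votes=1):
    """Return the label with the most votes, or None if undecided.

    Assumption: a word stays unlabeled when there are fewer than
    `min_votes` valid votes or when the top two labels are tied.
    """
    counts = Counter(v for v in votes if v in LABELS)
    if sum(counts.values()) < min_votes:
        return None
    ranked = counts.most_common(2)
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None  # tie between the two most frequent labels
    return ranked[0][0]


if __name__ == "__main__":
    print(majority_label(["positive", "positive", "negative"]))
    print(majority_label(["positive", "negative"]))
```

Because the voting is kept open, a function like this could be re-run as new votes arrive, which is one simple way to observe the temporal changes in sentiment that the abstract mentions.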

The Effect of Meta-Features of Multiclass Datasets on the Performance of Classification Algorithms (다중 클래스 데이터셋의 메타특징이 판별 알고리즘의 성능에 미치는 영향 연구)

  • Kim, Jeonghun;Kim, Min Yong;Kwon, Ohbyung
    • Journal of Intelligence and Information Systems
    • /
    • v.26 no.1
    • /
    • pp.23-45
    • /
    • 2020
  • Big data is being created in a wide variety of fields such as medical care, manufacturing, logistics, sales sites, and SNS, and the characteristics of the datasets are also diverse. In order to secure the competitiveness of companies, it is necessary to improve decision-making capacity using classification algorithms. However, most companies do not have sufficient knowledge of which classification algorithm is appropriate for a specific problem area. In other words, determining which classification algorithm is appropriate for the characteristics of a dataset has been a task that requires expertise and effort. This is because the relationship between the characteristics of datasets (called meta-features) and the performance of classification algorithms has not been fully understood. Moreover, there has been little research on meta-features reflecting the characteristics of multi-class datasets. Therefore, the purpose of this study is to empirically analyze whether meta-features of multi-class datasets have a significant effect on the performance of classification algorithms. In this study, meta-features of multi-class datasets were grouped into two factors (data structure and data complexity), and seven representative meta-features were selected. Among those, we included the Herfindahl-Hirschman Index (HHI), originally a market concentration measurement index, to replace the IR (Imbalanced Ratio). We also developed a new index called the Reverse ReLU Silhouette Score and added it to the meta-feature set. From the UCI Machine Learning Repository, six representative datasets (Balance Scale, PageBlocks, Car Evaluation, User Knowledge-Modeling, Wine Quality (red), Contraceptive Method Choice) were selected. The classes of each dataset were predicted using the classification algorithms selected in the study (KNN, Logistic Regression, Naïve Bayes, Random Forest, and SVM). For each dataset, we applied the 10-fold cross validation method. 
Oversampling of 10% to 100% was applied to each fold, and the meta-features of the dataset were measured. The meta-features selected are HHI, Number of Classes, Number of Features, Entropy, Reverse ReLU Silhouette Score, Nonlinearity of Linear Classifier, and Hub Score. The F1-score was selected as the dependent variable. The results showed that six meta-features, including the Reverse ReLU Silhouette Score and HHI proposed in this study, have a significant effect on classification performance. (1) The HHI meta-feature proposed in this study was significant for classification performance. (2) The number of variables has a significant effect on classification performance and, unlike the number of classes, its effect is positive. (3) The number of classes has a negative effect on classification performance. (4) Entropy has a significant effect on classification performance. (5) The Reverse ReLU Silhouette Score also significantly affects classification performance at a significance level of 0.01. (6) The nonlinearity of linear classifiers has a significant negative effect on classification performance. In addition, the results of the analysis by classification algorithm were consistent. In the regression analysis by classification algorithm, the number of variables does not have a significant effect for the Naïve Bayes algorithm, unlike for the other classification algorithms. This study makes two theoretical contributions: (1) two new meta-features (HHI and Reverse ReLU Silhouette Score) were shown to be significant, and (2) the effects of data characteristics on classification performance were investigated using meta-features. As for practical contributions, (1) the results can be utilized in the development of a classification algorithm recommendation system based on the characteristics of datasets. 
(2) Because the characteristics of data differ, many data scientists often search for the optimal algorithm for a given situation by repeatedly adjusting algorithm parameters and testing. In this process, excessive resources are wasted in terms of hardware, cost, time, and manpower. This study is expected to be useful for machine learning and data mining researchers, practitioners, and developers of machine learning-based systems. The paper consists of an introduction, related research, the research model, the experiment, and a conclusion and discussion.
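
Two of the meta-features named in this abstract, HHI and entropy, can be computed from the class distribution of a dataset. The sketch below is illustrative only: it uses the standard definitions (HHI as the sum of squared class shares, Shannon entropy in bits) and assumes these match the paper's usage; any scaling or normalization the authors may apply is not stated in the abstract. The function names are hypothetical.

```python
import math
from collections import Counter


def class_shares(labels):
    """Return the fraction of samples belonging to each class."""
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    n = len(labels)
    return [c / n for c in counts.values()]


def hhi(labels):
    """Herfindahl-Hirschman Index: sum of squared class shares.

    Ranges from 1/k (k equally sized classes) to 1.0 (a single class),
    so larger values indicate a more imbalanced class distribution.
    """
    return sum(s * s for s in class_shares(labels))


def entropy(labels):
    """Shannon entropy of the class distribution, in bits."""
    return -sum(s * math.log2(s) for s in class_shares(labels))


if __name__ == "__main__":
    balanced = ["a", "a", "b", "b"]
    skewed = ["a", "a", "a", "b"]
    print(hhi(balanced), entropy(balanced))
    print(hhi(skewed), entropy(skewed))
```

Unlike an imbalance ratio based only on the largest and smallest classes, a sum over all class shares reflects every class, which is a plausible reason for using HHI with multi-class data.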