• Title/Summary/Keyword: Similarity Decision

Search Result 225

A Multi-Perspective Benchmarking Framework for Estimating Usable-Security of Hospital Management System Software Based on Fuzzy Logic, ANP and TOPSIS Methods

  • Kumar, Rajeev;Ansari, Md Tarique Jamal;Baz, Abdullah;Alhakami, Hosam;Agrawal, Alka;Khan, Raees Ahmad
    • KSII Transactions on Internet and Information Systems (TIIS) / v.15 no.1 / pp.240-263 / 2021
  • One of the biggest challenges the software industry faces today is to create highly efficient applications without compromising the quality of healthcare system software. The demand for software with high-quality protection has risen rapidly in the software business market. Moreover, it is worthless to offer extremely user-friendly software applications with no adequate security. Therefore, finding optimal solutions that bridge the gap between accessibility and protection by offering accessible yet secure software services has become an imminent prerequisite. Several research endeavours on usable-security assessment have been performed to fill the gap between functionality and security. In this context, several Multi-Criteria Decision Making (MCDM) approaches have been applied to different usability and security attributes to assess the usable-security of software systems. However, only a few studies use the integrated approach of fuzzy Analytic Network Process (fuzzy ANP) and the Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS) to assess the usable-security of hospital management software. Therefore, in this research study, the authors employ an integrated methodology of fuzzy logic, ANP and TOPSIS to estimate the usable-security of Hospital Management System Software. For this objective, the study considers 5 usable-security factors at the first tier and 16 sub-factors at the second tier, with 6 hospital management system software packages as alternative solutions. Fuzzy ANP is applied to measure the weights of the parameters and their interrelations. Thereafter, fuzzy TOPSIS is employed and the alternatives are ranked on the basis of their proximity to the positive ideal solution.
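The closing TOPSIS step described in the abstract, ranking alternatives by proximity to the positive ideal solution, can be sketched as follows (a minimal crisp TOPSIS without the fuzzy ANP weighting layer; the decision matrix, weights, and benefit flags below are illustrative, not the paper's data):

```python
import math

def topsis_scores(matrix, weights, benefit):
    """Rank alternatives by closeness to the positive ideal solution.

    matrix:  rows are alternatives, columns are criteria
    weights: one weight per criterion
    benefit: True if the criterion is benefit-type, False if cost-type
    """
    m = len(matrix[0])
    # vector-normalize each column, then apply the criterion weights
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(m)]
    v = [[row[j] / norms[j] * weights[j] for j in range(m)] for row in matrix]
    cols = list(zip(*v))
    ideal = [max(c) if benefit[j] else min(c) for j, c in enumerate(cols)]
    anti = [min(c) if benefit[j] else max(c) for j, c in enumerate(cols)]
    scores = []
    for row in v:
        d_pos = math.sqrt(sum((x - i) ** 2 for x, i in zip(row, ideal)))
        d_neg = math.sqrt(sum((x - a) ** 2 for x, a in zip(row, anti)))
        scores.append(d_neg / (d_pos + d_neg))  # closeness coefficient in [0, 1]
    return scores

# three alternatives scored on two benefit criteria
scores = topsis_scores([[9, 9], [5, 5], [1, 1]], [0.5, 0.5], [True, True])
```

The alternative matching the ideal on every criterion gets a closeness coefficient of 1, the one matching the anti-ideal gets 0, and the rest fall in between.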

High Noise Density Median Filter Method for Denoising Cancer Images Using Image Processing Techniques

  • Priyadharsini.M, Suriya;Sathiaseelan, J.G.R
    • International Journal of Computer Science &amp; Network Security / v.22 no.11 / pp.308-318 / 2022
  • Noise is a serious issue when transmitting images over electronic communication channels. Impulse noise, which is created by unsteady voltage, is one of the most common noise types in digital communication and is often introduced during image acquisition. Accurate diagnostic images can be obtained by removing this noise without affecting edges and fine details. This paper proposes the High Noise Density Median Filter (HNDMF), which operates in two stages for each pixel: in the first stage, a detector identifies pixels corrupted by salt-and-pepper noise (SPN); in the second stage, each corrupted pixel is replaced by a noise-free processed value computed from its local window. The paper also discusses known image-denoising approaches and a decision-based weighted median filter for impulse-noise removal. Using Mean Square Error (MSE), Peak Signal to Noise Ratio (PSNR), and Structural Similarity Index Method (SSIM) metrics, the paper compares the performance of the proposed method with the Gaussian Filter (GF), the Adaptive Median Filter (AMF), and PHDNF. A detailed simulation on the Mini-MIAS dataset confirms the merit of the presented model: the obtained experimental values show that the HNDMF model reaches better performance with maximum picture quality. Images affected by various amounts of salt-and-pepper noise, as well as speckle noise, are evaluated and reported as experimental results. According to the quality metrics, the HNDMF method produces superior results compared with the existing filter methods by accurately detecting salt-and-pepper noise pixels and replacing them with mean and median values. The proposed method thus improves the median filter with a significant change.
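The two-stage idea the abstract describes, first detect salt-and-pepper-corrupted pixels, then replace them using their noise-free neighbours, can be sketched roughly as follows (a generic decision-based median filter, not the authors' exact HNDMF; the 3×3 window and the 0/255 noise values are assumptions):

```python
def denoise_sp(img, lo=0, hi=255):
    """Decision-based median filtering for salt-and-pepper noise.

    img: 2D list of grayscale values; returns a new denoised image.
    """
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for i in range(h):
        for j in range(w):
            # stage 1: only pixels at the extreme values are treated as corrupted
            if img[i][j] not in (lo, hi):
                continue
            # stage 2: median of the noise-free pixels in the 3x3 window
            neigh = [img[y][x]
                     for y in range(max(0, i - 1), min(h, i + 2))
                     for x in range(max(0, j - 1), min(w, j + 2))
                     if img[y][x] not in (lo, hi)]
            if neigh:
                neigh.sort()
                out[i][j] = neigh[len(neigh) // 2]
    return out
```

At high noise densities a real implementation would grow the window adaptively when no clean neighbour exists; this sketch simply leaves such pixels unchanged.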

An effective automated ontology construction based on the agriculture domain

  • Deepa, Rajendran;Vigneshwari, Srinivasan
    • ETRI Journal / v.44 no.4 / pp.573-587 / 2022
  • The agricultural sector is completely different from other sectors since it relies entirely on various natural and climatic factors. Climate change has many effects on land and agriculture alike, including lack of annual rainfall, pests, heat waves, changes in sea level, and global ozone/atmospheric CO2 fluctuation, and it also affects the environment. Based on these factors, farmers choose their crops to increase productivity in their fields. Many existing agricultural ontologies are either domain-specific or have been created with minimal vocabulary, and no proper evaluation framework has been implemented. A new agricultural ontology focused on subdomains is designed to assist farmers using the Jaccard relative extractor (JRE) and the Naïve Bayes algorithm. The JRE is used to find the similarity between two sentences and words in the agricultural documents, and the relationship between two terms is identified via the Naïve Bayes algorithm. In the proposed method, the preprocessing of data is carried out through natural language processing techniques, and the dimension-reduced tags are subjected to rule-based formal concept analysis and mapping. The subdomain ontologies of weather, pest, and soil are built separately, and the overall agricultural ontology is built around them. The gold standard for the lexical layer is used to evaluate the proposed technique, and its performance is analyzed by comparing it with different state-of-the-art systems. Precision, recall, F-measure, Matthews correlation coefficient, receiver operating characteristic curve area, and precision-recall curve area are the performance metrics used. The proposed methodology gives a precision score of 94.40%, compared with the decision tree (83.94%) and the K-nearest neighbor algorithm (86.89%), for agricultural ontology construction.
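The Jaccard similarity that the JRE relies on can be illustrated with a short sketch (token-level Jaccard between two sentences; whitespace tokenization is a simplification of the paper's NLP preprocessing):

```python
def jaccard(a, b):
    """Jaccard index between the token sets of two sentences:
    |intersection| / |union|, in [0, 1]."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0

# two tokens shared out of four distinct tokens -> 0.5
score = jaccard("rice pest control", "pest control soil")
```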

Recommendation System of University Major Subject based on Deep Reinforcement Learning (심층 강화학습 기반의 대학 전공과목 추천 시스템)

  • Ducsun Lim;Youn-A Min;Dongkyun Lim
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.23 no.4 / pp.9-15 / 2023
  • Existing simple statistics-based recommendation systems rely solely on students' course enrollment history, making it difficult to identify classes that match students' preferences. To address this issue, this study proposes a personalized major subject recommendation system based on deep reinforcement learning (DRL). The system gauges the similarity between students based on structured data, such as the student's department, grade level, and course history. Based on this information, it recommends the most suitable major subjects by comprehensively considering information about each available major subject and the student's course evaluations. We confirmed that this DRL-based recommendation system provides useful insights to university students when selecting their major subjects, and our simulation results indicate that it outperforms conventional statistics-based recommendation systems by approximately 20%. In light of these results, we propose a new system that offers personalized subject recommendations by incorporating students' course evaluations. This system is expected to significantly assist students in finding major subjects that align with their preferences and academic goals.
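The student-similarity step described above could, for illustration, combine department, grade level, and course history like this (a hypothetical equal-weight scoring, not the paper's actual DRL state representation; the field names and the grade normalization are assumptions):

```python
def student_similarity(s1, s2):
    """Similarity in [0, 1] from structured student records.

    s1, s2: dicts with 'dept' (str), 'grade' (1-4), and 'courses' (set).
    Hypothetical weighting: equal thirds for department match,
    grade closeness, and course-history overlap (Jaccard).
    """
    dept = 1.0 if s1["dept"] == s2["dept"] else 0.0
    grade = 1.0 - abs(s1["grade"] - s2["grade"]) / 3.0  # grades span 1..4
    union = s1["courses"] | s2["courses"]
    overlap = len(s1["courses"] & s2["courses"]) / len(union) if union else 0.0
    return (dept + grade + overlap) / 3.0
```

In the paper such a similarity would feed the DRL agent's state; here it only shows how the structured fields can be folded into one score.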

Deep Learning-Based Lumen and Vessel Segmentation of Intravascular Ultrasound Images in Coronary Artery Disease

  • Gyu-Jun Jeong;Gaeun Lee;June-Goo Lee;Soo-Jin Kang
    • Korean Circulation Journal / v.54 no.1 / pp.30-39 / 2024
  • Background and Objectives: Intravascular ultrasound (IVUS) evaluation of coronary artery morphology is based on lumen and vessel segmentation. This study aimed to develop an automatic segmentation algorithm and validate its performance for measuring quantitative IVUS parameters. Methods: A total of 1,063 patients were randomly assigned, at a ratio of 4:1, to the training and test sets. An independent data set of 111 IVUS pullbacks was obtained to assess vessel-level performance. The lumen and external elastic membrane (EEM) boundaries were labeled manually in every IVUS frame at a 0.2-mm interval. Efficient-UNet was utilized for the automatic segmentation of IVUS images. Results: At the frame level, Efficient-UNet showed a high dice similarity coefficient (DSC, 0.93±0.05) and Jaccard index (JI, 0.87±0.08) for lumen segmentation, and a high DSC (0.97±0.03) and JI (0.94±0.04) for EEM segmentation. At the vessel level, there were close correlations between model-derived and expert-measured IVUS parameters: minimal lumen area (r=0.92), EEM area (r=0.88), lumen volume (r=0.99) and plaque volume (r=0.95). The agreement between model-derived and expert-measured minimal lumen area was comparable to the agreement between experts. Model-based lumen and EEM segmentation for a 20-mm lesion segment required 13.2 seconds, whereas manual segmentation at a 0.2-mm interval by an expert took 187.5 minutes on average. Conclusions: The deep learning models can accurately and quickly delineate vascular geometry. This artificial intelligence-based methodology may support clinicians' decision-making through real-time application in the catheterization laboratory.
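The frame-level metrics reported above, the Dice similarity coefficient and the Jaccard index, can be computed from binary segmentation masks as follows (a generic sketch on flattened 0/1 masks, not tied to the study's Efficient-UNet pipeline):

```python
def dice_jaccard(pred, gt):
    """Dice similarity coefficient and Jaccard index for binary masks.

    pred, gt: flat lists of 0/1 values (e.g. a flattened segmentation mask).
    DSC = 2|A∩B| / (|A| + |B|);  JI = |A∩B| / |A∪B|.
    """
    inter = sum(p & g for p, g in zip(pred, gt))
    ps, gs = sum(pred), sum(gt)
    dice = 2 * inter / (ps + gs) if (ps + gs) else 1.0
    union = ps + gs - inter
    jacc = inter / union if union else 1.0
    return dice, jacc
```

The two metrics are monotonically related (DSC = 2·JI / (1 + JI)), which is why the paper's DSC values always exceed the matching JI values.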

The Effect of AD Noises Caused by AD Model Selection on Brand Awareness and Brand Attitudes (광고 모델 관련 광고 노이즈가 브랜드 인지도와 브랜드 태도에 미치는 영향)

  • Chung, Jai-Hak;Lee, Sang-Mi
    • Journal of Global Scholars of Marketing Science / v.18 no.3 / pp.89-114 / 2008
  • Most of the extant studies on communication effects have been devoted to the typical question, "what types of communication activities are more effective for brand awareness or brand attitudes?" However, little research has addressed another question on communication decisions: "what makes communication activities less effective?" Our study focuses on factors that negatively influence the efficiency of communication activities, especially advertising. Some studies have introduced concepts closely related to our topic, such as consumer confusion, brand confusion, or belief confusion. Studies on product belief confusion have found factors that mislead consumers into misunderstanding the physical features of products. Studies on brand confusion have uncovered factors that make consumers confused about brand names. Studies on advertising confusion have tested the effects on communication efficiency of ad models employed by many other firms for different products. We address a new concept, ad noises, which are any factors that interfere with consumers' understanding of the messages in an advertisement they are exposed to. The objective of this study is to understand the effects of ad noises caused by ad models on brand awareness and brand attitudes. There are many different types of ad noises; in particular, we study those generated by the ad model selection decision. Many companies want to employ celebrities as ad models, while the number of celebrities who command a high degree of public and media attention is limited. Inevitably, several firms have been adopting the same celebrities as their ad models for different products. If the same ad model appears in TV commercials for different products, consumers exposed to those commercials are likely to fail to become aware of the target brand due to interference from the commercials for other products employing the same ad model.
This is an ad noise caused by employing ad models who have been exposed to consumers in other advertisements, the first type of ad noise studied in this research. Another type of ad noise is related to the decision to replace the ad model for advertising the same product. Firms sometimes launch another TV commercial for the same product; some employ the same ad model for the new commercial, and others employ new ad models. The typical problem with replacing ad models is the possibility of interfering with consumers' understanding of the commercial's message due to the dissimilarity of the old and new ad models. We studied the effects of these two types of ad noises, which are typical factors influencing the effectiveness of communication: (1) ad noises caused by employing ad models who have been exposed to consumers in other advertisements, and (2) ad noises caused by changing to ad models with different images for the same product. First, we measure the negative influence of ad noises on brand awareness and attitudes, in order to establish the importance of studying ad noises. Furthermore, our study unveils the mediating conditions (variables) that can increase or decrease the effects of ad noises on brand awareness and attitudes. We study the effects of three mediating variables for ad noises caused by employing ad models who have been exposed to consumers in other advertisements: (1) the fit between the product image and the ad model's image, (2) the similarity between the ad model's images in multiple TV commercials employing the same model, and (3) the similarity between the products whose TV commercials employed the same ad model.
We analyze the effects of another three mediating variables for ad noises caused by changing to ad models with different images for the same product: (1) the fit of the old and new ad models for the product, (2) the similarity between the ad models' images in the old and new TV commercials, and (3) the concept similarity between the old and new TV commercials. We summarize the empirical results from a field survey as follows. The employment of ad models who have been used in advertisements for other products has negative effects on both brand awareness and attitudes. Our empirical study shows that it is possible to reduce these negative effects by choosing ad models whose images are relevant to the images of the target products, by requiring ad models to present images different from those in their other advertisements, or by choosing ad models who have appeared in advertisements for products dissimilar to the target product. The change of ad models for the same product's advertising can influence brand awareness and brand attitudes in opposite directions. Furthermore, the effects of an ad model change can be weakened or strengthened depending on the relevancy of the new ad model, the similarity of the previous and current ad models, and the consistency of the previous and current ad messages.


A Study on the Effect of Network Centralities on Recommendation Performance (네트워크 중심성 척도가 추천 성능에 미치는 영향에 대한 연구)

  • Lee, Dongwon
    • Journal of Intelligence and Information Systems / v.27 no.1 / pp.23-46 / 2021
  • Collaborative filtering, which is often used in personalized recommendation, is recognized as a very useful technique for finding similar customers and recommending products to them based on their purchase history. However, the traditional collaborative filtering technique has difficulty calculating similarity for new customers or products, because it calculates similarities based on direct connections and common features among customers. For this reason, hybrid techniques were designed that additionally use content-based filtering. In parallel, efforts have been made to solve these problems by applying the structural characteristics of social networks. This approach calculates similarities indirectly, through similar customers placed between two customers: a customer network is created from purchasing data, and the similarity between two customers is calculated from the features of the network that indirectly connects them. Such similarity can be used as a measure to predict whether the target customer will accept recommendations. The centrality metrics of networks can be utilized for the calculation of these similarities. Different centrality metrics are important in that they may have different effects on recommendation performance; in this study, furthermore, the effect of these centrality metrics on recommendation performance may vary depending on the recommender algorithm. In addition, recommendation techniques using network analysis can be expected to increase recommendation performance not only for new customers or products but also across all customers or products. By treating a customer's purchase of an item as a link generated between the customer and the item on the network, predicting user acceptance of a recommendation reduces to predicting whether a new link will be created between them.
As classification models fit the purpose of solving the binary problem of whether a link is formed or not, decision tree, k-nearest neighbors (KNN), logistic regression, artificial neural network, and support vector machine (SVM) models were selected for the research. The data for performance evaluation were order records collected from an online shopping mall over four years and two months. The first three years and eight months of records were organized into the social network used in the experiment, and the following four months of records were used to train and evaluate the recommender models. Experiments applying the centrality metrics to each model show that the recommendation acceptance rates of the centrality metrics differ for each algorithm at a meaningful level. In this work, we analyzed four commonly used centrality metrics: degree centrality, betweenness centrality, closeness centrality, and eigenvector centrality. Eigenvector centrality records the lowest performance in all models except the support vector machine. Closeness centrality and betweenness centrality show similar performance across all models. Degree centrality ranks in the middle across the models, while betweenness centrality always ranks higher than degree centrality. Finally, closeness centrality is characterized by distinct differences in performance according to the model: it ranks first in logistic regression, artificial neural network, and decision tree with numerically high performance, but records very low rankings, with low performance levels, in the support vector machine and KNN models. As the experimental results reveal, in a classification model, network centrality metrics over a subnetwork connecting two nodes can effectively predict the connectivity between the two nodes in a social network. Furthermore, each metric performs differently depending on the classification model type.
This result implies that choosing appropriate metrics for each algorithm can lead to higher recommendation performance. In general, betweenness centrality can guarantee a high level of performance in any model, and introducing closeness centrality could be considered to obtain higher performance for certain models.
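Two of the centrality metrics discussed, degree and closeness centrality, can be computed on a small customer network like this (a pure-Python sketch for an undirected, connected graph given as an adjacency dict; real work would typically use a graph library):

```python
from collections import deque

def closeness_and_degree(adj):
    """Normalized degree and closeness centrality for every node.

    adj: dict mapping node -> set of neighbors (undirected, connected).
    Degree centrality: degree / (n - 1).
    Closeness centrality: (n - 1) / sum of shortest-path distances.
    """
    n = len(adj)
    degree = {v: len(adj[v]) / (n - 1) for v in adj}
    closeness = {}
    for s in adj:
        # BFS shortest-path distances from s (unweighted edges)
        dist = {s: 0}
        q = deque([s])
        while q:
            u = q.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    q.append(w)
        closeness[s] = (n - 1) / sum(dist.values())
    return degree, closeness
```

On a path graph a-b-c, the middle node b maximizes both metrics, matching the intuition that central customers bridge the most indirect similarity paths.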

Personalized Recommendation System for IPTV using Ontology and K-medoids (IPTV환경에서 온톨로지와 k-medoids기법을 이용한 개인화 시스템)

  • Yun, Byeong-Dae;Kim, Jong-Woo;Cho, Yong-Seok;Kang, Sang-Gil
    • Journal of Intelligence and Information Systems / v.16 no.3 / pp.147-161 / 2010
  • As broadcasting and communication have recently converged, communication services have been joined to TV, and TV viewing has changed in many ways. IPTV (Internet Protocol Television) provides information services, movie content, and broadcasts over the internet, combining live programs with VOD (Video on Demand), and has become a new business opportunity over communication networks. In addition, new technical issues have arisen: imaging technology for the service, networking technology that avoids video cuts, security technologies to protect copyright, and so on. Through the IPTV network, users can watch their desired programs whenever they want. However, IPTV makes it difficult to find programs via search or menu navigation. The menu approach takes a long time to reach the desired program, and the search approach fails when the title, genre, or actors' names are not known; entering letters with a remote control is also cumbersome. The bigger problem, however, is that users are often not even aware of the services available to them. Thus, to resolve the difficulty of selecting VOD services in IPTV, a personalized recommendation service is proposed, which enhances users' satisfaction and uses their time efficiently. This paper provides programs fit to individual users, saving their time, to address IPTV's shortcomings through a filtering and recommendation system. The proposed recommendation system collects TV program information, the user's preferred program genres and detailed genres, channels, watched programs, and viewing-time information based on the individual's IPTV viewing records. To find similarities, programs are compared using an ontology for TV programs, because the distance between programs can be measured by similarity comparison. The TV program ontology we use is extracted from TV-Anytime metadata, which represents semantic information.
The ontology also expresses contents and features numerically. Vocabulary similarity is determined through WordNet: all the words describing the programs are expanded into upper and lower classes for word-similarity decisions, and the average over the described keywords is measured. Using this distance criterion, similar programs are grouped through the k-medoids partitioning method. K-medoids is a partitioning method that divides objects into groups with similar characteristics: it sets k representative objects (medoids), assigns each object to the cluster of its nearest medoid, and, when dividing the initial n objects into k clusters, repeatedly searches for the optimal representative objects after selecting temporary ones. Through this process, similar programs are clustered. When selecting programs through group analysis, weights are given to the recommendation as follows. When each group recommends programs, similar programs near the representative objects are recommended to users; the distance is calculated with the same similarity measure and serves as the basic figure that determines the ranking of recommended programs. A further weight is computed from the number of watching lists: the more programs there are, the higher the weight, which we define as the cluster weight. Through this, the sub-TV programs representative of the groups are selected and the final TV program ranks are determined. However, since the group-representative TV programs include errors, weights based on TV program viewing preference are added to determine the final ranks, and the contents the customer prefers are recommended. Based on the proposed method, an experiment was carried out in a controlled environment, and the experiment shows the superiority of the proposed method compared with existing approaches.
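The k-medoids step described above, assign each object to its nearest representative and then re-pick each cluster's representative as the member minimizing total in-cluster distance, can be sketched as follows (a simplified alternating k-medoids with explicitly supplied initial medoids, not the paper's exact procedure):

```python
def k_medoids(points, medoids, dist, iters=20):
    """Alternating k-medoids clustering.

    points:  hashable items (e.g. tuples) to cluster
    medoids: initial representative objects (a subset of points)
    dist:    distance function between two items
    Returns the converged medoids.
    """
    medoids = list(medoids)
    for _ in range(iters):
        # assignment step: each point joins its nearest medoid's cluster
        clusters = {m: [] for m in medoids}
        for p in points:
            nearest = min(medoids, key=lambda c: dist(p, c))
            clusters[nearest].append(p)
        # update step: new medoid minimizes total in-cluster distance
        new = []
        for m, members in clusters.items():
            if not members:          # keep an empty cluster's medoid as-is
                new.append(m)
                continue
            new.append(min(members, key=lambda c: sum(dist(c, x) for x in members)))
        if set(new) == set(medoids): # converged: representatives are stable
            return new
        medoids = new
    return medoids
```

With two well-separated point clouds and one initial medoid in each, the algorithm settles on the most central member of each cloud.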

Research Framework for International Franchising (국제프랜차이징 연구요소 및 연구방향)

  • Kim, Ju-Young;Lim, Young-Kyun;Shim, Jae-Duck
    • Journal of Global Scholars of Marketing Science / v.18 no.4 / pp.61-118 / 2008
  • The purpose of this research is to construct a research framework for international franchising based on the existing literature and to identify the research components in the framework. A franchise can be defined as a management style that allows the franchisee to use various management assets of the franchisor in order to make or sell a product or service. It can be divided into product distribution franchising, designed to sell products, and business format franchising, designed for running a business in whatever form it takes. International franchising can be defined as a way for a franchisor to internationalize into a foreign country by providing its business format or package to a franchisee in the host country. International franchising has been growing fast for the last four decades, but academic research on it is quite limited. Especially in Korea, research about international franchising has been carried out either as single-case studies or as empirical survey studies based on domestic franchise theory. Therefore, this paper reviews the existing literature on international franchising research, provides a research framework, and aims to stimulate new research in this field. The research components of international franchising include the motives and environmental factors behind the decision to expand into international franchising, entry modes and development plans, contracts and management strategy, and various performance measures from different perspectives. First, one motive for international franchising is fee collection from franchisees; it also provides an easier way of expanding into a foreign country. Other motives include increasing total sales volume, occupying a better strategic position, acquiring quality resources, and improving efficiency. The environmental factors facilitating international franchising encompass economic conditions, trends, and legal or political factors in the host and/or home countries.
In addition, the control power and risk management capability of the franchisor play a critical role in a successful franchising contract. The final decision to enter a foreign country via franchising is determined by numerous factors such as the history, size, growth, competitiveness, management system, bonding capability, and industry characteristics of the franchisor. After deciding to enter a foreign country, the franchisor needs to set the entry mode of international franchising. Within the contractual mode, there are master franchising, area development franchising, licensing, direct franchising, and joint ventures. Theories about entry mode selection involve concepts of efficiency, the knowledge-based approach, the competence-based approach, agency theory, and governance cost. The next step after the entry decision is operating strategy, which starts with selecting a target country and a target city for franchising. In order to find and screen targets, the franchisor needs to collect information about candidates; critical information includes brand patents, commercial laws, regulations, market conditions, country risk, and industry analysis. After selecting a target city in the target country, the franchisor needs to select a franchisee, in other words, a partner. The first important criteria for selecting partners are financial credibility and capability and possession of real estate; cultural similarity and knowledge about the franchisor and/or home country are also recognized as critical criteria. The most important element in operating strategy is the legal document between the franchisor and franchisee across the home and host countries. The terms and conditions in legal documents give objective information about the characteristics of the franchising agreement for academic research. Legal documents contain definitions of terminology, territory and exclusivity, term of agreement, initial fee, continuing fees, clearing currency, and rights regarding sub-franchising.
Legal documents may also include terms about softer elements, such as training programs and operation manuals, and harder elements, such as the competent court of law and terms of expiration. The next element in operating strategy concerns the product and service. Especially for business format franchising, the product/service deliverable, benefit communicators, system identifiers (architectural features), and format facilitators are listed as product/service strategic elements. Another important decision on product/service is standardization vs. customization. The rationale behind standardization is cost reduction, efficiency, consistency, image congruence, brand awareness, and price competitiveness; standardization also enables large-scale R&D and innovative change in management style. Another element in operating strategy is control management. The simplest way to control a franchise contract is to rely on legal terms, the contractual control system; other control systems are the administrative control system and the ethical control system. The contractual control system is a coercive source of power, but the franchisor usually does not want to use legal power, since it does not help build a positive relationship; instead, self-regulation is widely used. The administrative control system uses control mechanisms from the ordinary work relationship. Its main components are supporting activities for the franchisee and communication methods: for example, the franchisor provides advertising, training, manuals, and delivery, and the franchisee follows the franchisor's direction. Another component is building the franchisor's brand power. The last research element is the performance of international franchising. Performance elements can be divided into the franchisor's performance and the franchisee's performance. The conceptual performance measures of the franchisor are simple but not easy to obtain objectively: profit, sales, cost, experience, and brand power. The performance measures of the franchisee are mostly about benefits to the host country.
These include small business development, promotion of employment, introduction of new business models, and upgrading of technology status. There are also indirect benefits, such as increased tax revenue, refinement of corporate citizenship, regional economic clustering, and improvement of the international balance. In addition, the host country undergoes socio-cultural change beyond the economic effects, including demographic change, social trends, changing customer values, social communication, and social globalization; this is sometimes called westernization or the McDonaldization of society. The paper also reviews theories that have been frequently applied to international franchising research, such as agency theory, the resource-based view, transaction cost theory, organizational learning theory, and international expansion theories. Resource-based theory is used in strategic decisions based on resources, such as decisions about entry and cooperation depending on the resources of the franchisee and franchisor. Transaction cost theory can be applied to the determination of mutual trust or satisfaction among franchising players. Agency theory tries to explain strategic decisions for reducing problems caused by utilizing agents, for example in research on control systems in franchising agreements. Organizational learning theory is relatively new in franchising research; it assumes an organization tries to maximize its performance and learning. Internalization theory advocates the strategic decision of direct investment to remove the inefficiency of market transactions and is applied in research on contract terms, oligopolistic competition theory is used to explain various entry modes for international expansion, and competency theory supports strategic decisions that utilize key competitive advantages. Furthermore, both qualitative and quantitative research methodologies are suggested for more rigorous international franchising research.
Quantitative research needs more real data, rather than survey data that typically reflect respondents' judgments; to verify theory more rigorously, research based on real data is essential, although such data are quite hard to obtain. Qualitative research beyond the single case study is also highly recommended. Since international franchising has a limited number of cases, scientific research based on grounded theory and ethnography can be used. A scientific case study is differentiated from a single case study by its data collection and analysis methods; the key concepts are triangulation in measurement, logical coding, and comparison. Finally, the paper provides an overall research direction for international franchising after summarizing research trends in Korea. International franchising research in Korea is of two types: studies of Korean franchisors going overseas and studies of Korean franchisees of foreign franchisors. Among research on Korean franchisors, two common patterns are observed: such studies usually deal with the success story of a single franchisor, and they tend to focus on the same industry and country. International franchising research therefore needs to extend its focus to broader subjects, with scientific research methodology as well as the development of new theory.


A Study on Market Size Estimation Method by Product Group Using Word2Vec Algorithm (Word2Vec을 활용한 제품군별 시장규모 추정 방법에 관한 연구)

  • Jung, Ye Lim; Kim, Ji Hui; Yoo, Hyoung Sun
    • Journal of Intelligence and Information Systems, v.26 no.1, pp.1-21, 2020
  • With the rapid development of artificial intelligence technology, various techniques have been developed to extract meaningful information from unstructured text data, which constitutes a large portion of big data. Over the past decades, text mining technologies have been utilized in various industries for practical applications. In the field of business intelligence, text mining has been employed to discover new market and/or technology opportunities and to support rational decision making by business participants. Market information such as market size, market growth rate, and market share is essential for setting companies' business strategies, and there has been continuous demand in various fields for product-level market information. However, such information has generally been provided at the industry level or in broad categories based on classification standards, making it difficult to obtain specific and appropriate figures. In this regard, we propose a new methodology that can estimate the market sizes of product groups at more detailed levels than previously offered. We applied the Word2Vec algorithm, a neural-network-based semantic word embedding model, to enable automatic market size estimation from individual companies' product information in a bottom-up manner. The overall process is as follows: First, the data related to product information is collected, refined, and restructured into a form suitable for the Word2Vec model. Next, the preprocessed data is embedded into a vector space by Word2Vec, and product groups are derived by extracting similar product names based on cosine similarity. Finally, the sales data of the extracted products are summed to estimate the market size of each product group. As experimental data, product-name text from Statistics Korea's microdata (345,103 cases) was mapped into a multidimensional vector space by Word2Vec training.
We performed parameter optimization for training and applied a vector dimension of 300 and a window size of 15 as the optimized parameters for further experiments. We employed the index words of the Korean Standard Industry Classification (KSIC) as a product-name dataset to cluster product groups more efficiently. Product names similar to KSIC index words were extracted based on cosine similarity, and the market size of the extracted products, treated as one product category, was calculated from individual companies' sales data. The market sizes of 11,654 specific product lines were automatically estimated by the proposed model. For performance verification, the results were compared with the actual market sizes of several items; the Pearson correlation coefficient was 0.513. Our approach has several advantages over previous studies. First, text mining and machine learning techniques were applied to market size estimation for the first time, overcoming the limitations of traditional methods that rely on sampling or require multiple assumptions. In addition, the level of market category can be easily and efficiently adjusted to the purpose of the information by changing the cosine similarity threshold. Furthermore, the approach has high potential for practical application, since it can resolve unmet needs for detailed market size information in the public and private sectors. Specifically, it can be utilized in technology evaluation and technology commercialization support programs conducted by governmental institutions, as well as in business strategy consulting and market analysis reports published by private firms. A limitation of our study is that the presented model needs to be improved in terms of accuracy and reliability. The semantic word embedding module could be advanced by imposing a proper order on the preprocessed dataset or by combining another measure, such as Jaccard similarity, with Word2Vec.
The product group clustering step could also be replaced with other unsupervised machine learning algorithms. Our group is currently working on subsequent studies, and we expect that they can further improve the performance of the basic model conceptually proposed in this study.
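The grouping-and-summation step the abstract describes can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `EMBEDDINGS` vectors, product names, sales figures, and the 0.95 threshold are all hypothetical, and in the paper the vectors come from a Word2Vec model trained on 345,103 product names (vector dimension 300, window size 15) rather than the toy 3-dimensional vectors used here.

```python
import numpy as np

# Hypothetical product-name embeddings. In the paper these would come from a
# trained Word2Vec model (e.g., gensim); toy 3-d vectors are used here so the
# sketch is self-contained.
EMBEDDINGS = {
    "instant noodle": np.array([0.90, 0.10, 0.00]),
    "cup noodle":     np.array([0.80, 0.20, 0.10]),
    "ramen":          np.array([0.85, 0.15, 0.05]),
    "car tire":       np.array([0.00, 0.90, 0.40]),
}

# Hypothetical per-product sales figures (in the paper, from companies'
# microdata reported to Statistics Korea).
SALES = {"instant noodle": 120.0, "cup noodle": 45.0,
         "ramen": 80.0, "car tire": 300.0}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def estimate_market_size(index_word, threshold=0.95):
    """Group products whose names are similar to a classification index word
    (cosine similarity >= threshold) and sum their sales as the group's
    estimated market size."""
    anchor = EMBEDDINGS[index_word]
    group = [name for name, vec in EMBEDDINGS.items()
             if cosine(anchor, vec) >= threshold]
    return group, sum(SALES[name] for name in group)

group, size = estimate_market_size("instant noodle", threshold=0.95)
# The three noodle products cluster together; "car tire" is excluded.
```

Lowering `threshold` broadens the category (and raising it narrows it), which is the mechanism the abstract refers to for adjusting the level of market category.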