• Title/Summary/Keyword: 발굴 (excavation; discovery)

A Study on the Digital Drawing of Archaeological Relics Using Open-Source Software (오픈소스 소프트웨어를 활용한 고고 유물의 디지털 실측 연구)

  • LEE Hosun;AHN Hyoungki
    • Korean Journal of Heritage: History & Science / v.57 no.1 / pp.82-108 / 2024
  • As archaeological recording methods shift from analog to digital, 3D scanning technology has been actively adopted in the field, and research on digital archaeological data gathered through 3D scanning and photogrammetry continues. However, because of cost and manpower constraints, most buried cultural heritage organizations hesitate to adopt such digital technology. This paper presents a method for digitally recording relics using open-source software and photogrammetry, which is believed to be the most efficient of the 3D scanning methods. The digital recording process consists of three stages: acquiring a 3D model, creating a joining map from the edited 3D model, and creating a digital drawing. To enhance accessibility, the method uses only open-source software throughout the entire process. In the quantitative evaluation, the deviation in measurements between the actual artifact and the 3D model was minimal, and the quantitative quality analyses of the open-source and commercial software showed high similarity. However, data processing was far faster with the commercial software, which is believed to result from the higher computational speed of its improved algorithms. In the qualitative evaluation, some differences in mesh and texture quality appeared. The 3D models generated by open-source software showed the following problems: noise on the mesh surface, a rough mesh surface, and difficulty in identifying the production marks and pattern details of the relics. Nevertheless, some of the open-source software produced quality comparable to that of commercial software in both the quantitative and qualitative evaluations.
Open-source software for editing 3D models could not only post-process, match, and merge the 3D models but also adjust scale, produce join surfaces, and render the images needed for the actual measurement of relics. The final drawing was traced in a CAD program, which is also open-source software. In archaeological research, photogrammetry is highly applicable to a range of processes, including excavation, report writing, and research on numerical data from 3D models. With the breakthrough development of computer vision, open-source software has diversified and its performance has improved significantly. Given the high accessibility of such digital technology, 3D model data acquired in archaeology will serve as basic data for the preservation of and active research on cultural heritage.

Reevaluating the National Museum of Korea's Evacuation and Exhibition Projects in the 1950s (6.25 전쟁기 국립박물관 소장품의 국외반출 과정에 대한 신고찰)

  • KIM Hyunjung
    • Korean Journal of Heritage: History & Science / v.57 no.1 / pp.198-216 / 2024
  • This article reevaluates the National Museum of Korea's pivotal actions during the Korean War in the 1950s and its aftermath. It argues that the evacuation of the museum's collection to Busan and the subsequent exhibition "Masterpieces of Korean Art" in the United States in 1957 were not isolated events but interconnected facets of a larger narrative shaping the museum's trajectory. Drawing on newly discovered archival evidence, this study traces the relationship between these episodes, showing how the initial Busan evacuation evolved into a strategic U.S.-led touring exhibition. Traditionally, the Busan evacuation has been understood solely as a four-stage relocation of the museum's collections between December 1950 and May 1951. However, this view overlooks the broader context, particularly the subsequent U.S. journey. Prompted by the initial retreat during the war, the Busan evacuation served as a stepping stone for evacuation to the Honolulu Museum of Art. The path of evacuation took an unexpected turn when the government redirected the collections to the Honolulu Museum of Art. The move was initially conceived as a storage solution, but public opposition led to a remarkable transformation: the U.S. exhibition. To address public concerns, the evacuation plan was canceled. This shift transformed the planned introduction into a full-fledged traveling exhibition. After the National Assembly approved the plan, the U.S. Department of State spearheaded development of the exhibition, marking a distinct strategic cultural policy shift for Korea. Therefore, the Busan evacuation, initially envisioned as a temporary introduction to the U.S., ultimately became a multi-stage U.S. touring exhibition orchestrated by the U.S. Department of State. This reframed narrative sheds new light on the museum's crucial role in navigating a complex postwar landscape, revealing the interplay between cultural preservation, public diplomacy, and strategic national interests.

Effects of oxypeucedanin hydrate isolated from Angelica dahurica on myoblast differentiation in association with mitochondrial function (백지에서 추출한 oxypeucedanin hydrate의 미토콘드리아 기능 관련 근생성 효과)

  • Eun-Ju Song;Ji-Won Heo;Jee Hee Jang;Yoon-Ju Kwon;Yun Hee Jeong;Min Jung Kim;Sung-Eun Kim
    • Journal of Nutrition and Health / v.57 no.1 / pp.53-64 / 2024
  • Purpose: Mitochondria play a crucial role in preserving skeletal muscle mass, and damage to mitochondria leads to muscle mass loss. This study investigated the effects of oxypeucedanin hydrate, a furanocoumarin isolated from Angelica dahurica radix, on myogenesis and mitochondrial function in vitro and in zebrafish models. Methods: C2C12 myotubes cultured in media containing 0.1, 1, 10, or 100 ng/mL oxypeucedanin hydrate were immunostained with myosin heavy chain (MHC), and then multinucleated MHC-positive cells were counted. The expressions of markers related to muscle differentiation, muscle protein degradation, and mitochondrial function were determined by quantitative reverse transcription polymerase chain reaction. To investigate the effects of oxypeucedanin hydrate on mitochondrial dysfunction, Tg(Xla.Eef1a1:mito-EGFP) zebrafish embryos were treated with 5-fluorouracil, leucovorin, and irinotecan (FOLFIRI) with or without oxypeucedanin hydrate and analyzed for mito-EGFP intensity and mitochondrial length. Results: Oxypeucedanin hydrate significantly increased MHC-positive multinucleated myotubes (≥ 3 nuclei) and increased the expression of the myogenic marker myosin heavy chain 4. However, it decreased the expressions of muscle-specific RING finger protein 1 and muscle atrophy f-box (markers of muscle protein degradation). Furthermore, oxypeucedanin hydrate enhanced the expressions of markers of mitochondrial biogenesis (peroxisome proliferator-activated receptor-gamma coactivator 1 alpha, transcription factor a mitochondrial, succinate dehydrogenase complex flavoprotein subunit A, and cytochrome c oxidase subunit 1) and mitochondrial fusion (optic atrophy 1). However, it reduced the expression of dynamin-related protein 1 (a mitochondrial fission regulator). Consistently, oxypeucedanin hydrate reduced FOLFIRI-induced mitochondrial dysfunction in the skeletal muscles of zebrafish embryos. 
Conclusion: The study indicates that oxypeucedanin hydrate promotes myogenesis by improving mitochondrial function, and thus, suggests oxypeucedanin hydrate has potential use as a nutritional supplement that improves muscle mass and function.

Reexamination of Ancient Ironmaking Technology Restoration Experiment Operating Methods (고대 제철기술 복원실험 조업방식에 대한 재검토 - 국립중원문화유산연구소 1~8차 복원실험을 중심으로 -)

  • CHOI Yeongmin;JEONG Gyeonghwa
    • Korean Journal of Heritage: History & Science / v.57 no.2 / pp.6-25 / 2024
  • This study reviewed the reports on the eight smelting experiments conducted by the Jungwon National Research Institute of Cultural Heritage, compiled the goals and results of each operation, and examined how the content and results of the experiments changed. First, operational changes, such as the ratio of raw materials to fuel and the presence or absence of additives, were reviewed against the goal of each operation. In addition, the results of metallurgical analysis of raw materials, products, and byproducts were summarized and compared with materials excavated from the ruins. Up to the eighth smelting experiment, the operation method varied in terms of iron ore roasting, additives, and the raw material/fuel ratio. On re-examination of the results, pure iron with a low carbon content began to be confirmed through metallurgical analysis. This confirmed that the charging ratio of raw materials to fuel plays an important role depending on the purpose of production. In addition, most of the products are gray cast iron, which was attributed to changes in the internal structure of the pig iron while it was left in the furnace for a long time. The iron was an ingot that had been in a molten state even though its carbon content did not reach 4.3%, the point at which the eutectic reaction takes place, and this was attributed to an excessive operating temperature. Based on the reviewed results and on the structure and shape of the experimental furnaces used in other ironmaking technology restoration experiments, this study finally attempted to restore the structure of an ancient iron smelting furnace, including its upper structure.
By comprehensively referring to the remaining conditions of the excavated iron smelting furnace and the characteristics of the blow pipe, the form of the ancient iron smelting furnace was subdivided into six categories: furnace wall thickness, furnace height, blower height, blow pipe size, furnace inner wall shape, and top shape, and a restoration plan was proposed. To improve the problems of the restoration plan and the Jungwon National Research Institute of Cultural Heritage's experiments that have been conducted through continuous trial and error, an experiment that reflects changes in operating methods by lowering the furnace height and controlling the blowing volume is necessary.

A Study on Ontology and Topic Modeling-based Multi-dimensional Knowledge Map Services (온톨로지와 토픽모델링 기반 다차원 연계 지식맵 서비스 연구)

  • Jeong, Hanjo
    • Journal of Intelligence and Information Systems / v.21 no.4 / pp.79-92 / 2015
  • Knowledge maps are widely used to represent knowledge in many domains. This paper presents a method for integrating national R&D data and helping users navigate the integrated data through a knowledge map service. The knowledge map service is built using a lightweight ontology and a topic modeling method. The national R&D data is integrated with the research project at its center; that is, other R&D data such as research papers, patents, and reports are connected to the research project as its outputs. The lightweight ontology represents the simple relationships between the integrated data, such as project-output, document-author, and document-topic relationships. The knowledge map enables us to infer further relationships, such as co-author and co-topic relationships. To extract the relationships between the integrated data, a Relational Data-to-Triples transformer is implemented. A topic modeling approach is also introduced to extract the document-topic relationships. A triple store is used to manage and process the ontology data while preserving the network characteristics of the knowledge map service. Knowledge maps can be divided into two types: one is used in knowledge management to store, manage, and process an organization's data as knowledge; the other is used to analyze and represent knowledge extracted from science and technology documents. This research focuses on the latter. It introduces a knowledge map service for integrating the national R&D data obtained from the National Digital Science Library (NDSL) and the National Science & Technology Information Service (NTIS), the two major repositories and services for national R&D data in Korea. A lightweight ontology is used to design and build the knowledge map.
Using the lightweight ontology enables us to represent and process knowledge as a simple network, which fits the navigation and visualization characteristics of the knowledge map. The lightweight ontology represents the entities and their relationships in the knowledge maps, and an ontology repository is created to store and process the ontology. In the ontologies, researchers are implicitly connected through the national R&D data by author relationships and performer relationships. A knowledge map displaying the researchers' network is created from the co-authoring relationships of the national R&D documents and the co-participation relationships of the national R&D projects. To sum up, a knowledge map service system based on topic modeling and ontology is introduced for processing knowledge about national R&D data such as research projects, papers, patents, project reports, and Global Trends Briefing (GTB) data. The system has three goals: 1) to integrate the national R&D data obtained from NDSL and NTIS, 2) to provide semantic and topic-based information search on the integrated data, and 3) to provide knowledge map services based on semantic analysis and knowledge processing. S&T information such as research papers, research reports, patents, and GTB is updated daily from NDSL, and R&D project information, including participants and outputs, is updated from NTIS. The S&T information and the national R&D information are obtained and merged into the integrated database. The knowledge base is constructed by transforming the relational data into triples that reference the R&D ontology. In addition, a topic modeling method is employed to extract the relationships between the S&T documents and the topic keyword/s representing them.
The topic modeling approach enables us to extract the relationships and topic keyword/s based on the semantics, not based on the simple keyword/s. Lastly, we show an experiment on the construction of the integrated knowledge base using the lightweight ontology and topic modeling, and the knowledge map services created based on the knowledge base are also introduced.
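
The relational-data-to-triples step and the co-author inference described in the abstract above can be illustrated with a minimal Python sketch. The rows, the predicate names `hasAuthor` and `hasOutput`, and both helper functions are hypothetical and are not taken from the paper; the sketch only shows the general idea of turning relational rows into subject-predicate-object triples and deriving co-author pairs from them.

```python
from itertools import combinations

# Hypothetical relational rows: (document id, author, project id).
rows = [
    ("paper1", "Kim", "projA"),
    ("paper1", "Lee", "projA"),
    ("patent1", "Lee", "projA"),
    ("patent1", "Park", "projA"),
]

def rows_to_triples(rows):
    """Turn each relational row into subject-predicate-object triples."""
    triples = set()
    for doc, author, project in rows:
        triples.add((doc, "hasAuthor", author))    # document-author relationship
        triples.add((project, "hasOutput", doc))   # project-output relationship
    return triples

def infer_coauthors(triples):
    """Infer co-author pairs: two authors who share at least one document."""
    authors_by_doc = {}
    for subject, predicate, obj in triples:
        if predicate == "hasAuthor":
            authors_by_doc.setdefault(subject, set()).add(obj)
    pairs = set()
    for authors in authors_by_doc.values():
        for a, b in combinations(sorted(authors), 2):
            pairs.add((a, b))
    return pairs

triples = rows_to_triples(rows)
coauthors = infer_coauthors(triples)
```

A real triple store would hold these triples and answer the same question with a query language; the in-memory set here is only a stand-in.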

The Health Behavior Patterns of Some Rural Residents in Korea and Their Association with Health Status and Health Management Practice (일부 농촌주민의 건강행위유형과 건강상태 및 건강관련실태와의 관련성)

  • Kim, Young-Gab;Kang, Myung-Guen;Ryu, So-Yeon;Kim, Ki-Soon;Kang, Sung-Deuk
    • Journal of agricultural medicine and community health / v.29 no.1 / pp.43-63 / 2004
  • Objectives: The purpose of this study was to classify the health behavior patterns of some rural residents in Korea by grouping them into populations with similar patterns of diet quality, physical activity, alcohol consumption, and cigarette smoking, and then to investigate the relationship between these health behavior patterns and the residents' health status and health management. Methods: The study subjects were 722 rural residents over 20 years old in a typical rural district in Korea, and the data came from a survey conducted for the health planning of a health center. The questionnaire for this survey was developed by modifying the questionnaire of the 'National Nutrition and Health Study' conducted in 1998. Cluster analysis was conducted to classify health behavior patterns, and multiple logistic regression analyses were conducted to test the association of health behavior patterns with health status and health management. Results: The results and their implications were as follows: 1. We identified six health behavior typologies: 67.8% of the sample had good diet quality but a sedentary activity level (good diet lifestyle), and 10.9% showed heavy smoking behavior (smoking lifestyle). Individuals in the fitness lifestyle cluster (6.2%) had a high physical activity level, and those in the drinking lifestyle cluster (2.6%) mainly consumed large amounts of alcohol. The hedonic lifestyle cluster, whose members showed poor health behaviors overall, accounted for 0.6% of the sample. Those in the passive lifestyle cluster (11.9%) engaged in no active health-promoting activities but tended to avoid risky health behaviors such as cigarette smoking and alcohol drinking.
2. In the logistic regression analysis, compared with individuals in the good diet lifestyle, the prevalence of chronic diseases was higher among those in the fitness lifestyle and lower among those in the smoking, drinking, hedonic, and passive lifestyles. 3. After adjusting for general characteristics and health status, compared with individuals in the good diet lifestyle, the proportion of those with good health management practices was higher in the fitness lifestyle, and the proportion of those who had had a health check in the past 2 years was lower. Conclusions: There were some differences in health behavior patterns between the rural population and the national population, and these patterns significantly influenced health status and health management practice. We suggest that health promotion programs for rural residents be developed with these points in mind.

Characteristics and Management Plans of Myeongwoldae and Myeongwol Village Groves Located in Jeju (제주 팽림월대(彭林月臺)의 경관특성 및 관리방안)

  • Rho, Jae-Hyun;Oh, Hyun-Kyung;Chol, Yung-Hyun;Kahng, Byung-Seon;Kim, Young-Suk
    • Journal of the Korean Institute of Traditional Landscape Architecture / v.32 no.2 / pp.68-81 / 2014
  • This study was conducted to identify the spatial character of Myeongwoldae(明月臺) and Forest Myeongwol, to bring out their existence and value, and to suggest plans for their sustainable use, preservation, and management, with the aim of identifying their ecological and cultural landscape characteristics and value. The results of the study are as follows. Castle Myeongwol and Port Myeongwol show the status of Hallim-eup Myeongwol District as both the administrative center of western Jeju and a fortress. Building Wolgyejeongsa and School Woohakdang, the leading institutions of education and culture located in Myeongwol District, represent the spatial character of Myeongwol-ri as a center of education. Stand Myeongwol is one of the most representative Confucian cultural landscapes on Jeju Island and was a place of communion with nature where scholars of the Joseon Dynasty period enjoyed poetry, nature, changgi (Korean chess), and go. It was found that the current remains of Myeongwoldae were restored through a maintenance project conducted in 1931, during the Japanese colonial era, by Youth Group Myeongwol, which was organized around Hongjong-si(洪鍾時). The stonework of Myeongwoldae appears to be composed of three levels, in the order of square, octagon, and circle, based on the heaven-man unity theory of Confucianism, with the octagon in the middle acting as the mediator of Cheonwonjibang(天圓地方), in other words, between the square-shaped earth and the circle-shaped sky. It is assumed that both Grand Bridge Myeongwol and Bridge Myeongwol were originally constructed as arched bridges. Bridge Myeongwol is the only arched bridge now remaining on Jeju Island and has value as modern cultural heritage.
In Forest Myeongwol, 97 taxa of plants were confirmed, and in accordance with the 'Taxonomic Group and Class Criteria of Floristic Specific Plants', eight taxa were found: Arachniodes aristata of FD IV and Ilex cornuta, Piper kadsura, Litsea japonica, Melia azedarach, Xylosma congestum, Trichosanthes kirilowii var. japonica, Dichondra repens, Viburnum odoratissimum var. awabuki of FD III. In addition, 14 taxa of naturalized plants were confirmed, including Apium leptophyllum, which has been introduced only to Jeju Island. In Forest Myeongwol, 77 trees form a colony, including 41 Celtis sinensis, 30 Aphananthe aspera, two Xylosma congestum, a Pinus densiflora, a Camellia japonica, a Melia azedarach, and an Ilex cornuta. Based on the researched data, the following preservation and management plans for Myeongwoldae and Forest Myeongwol are suggested. Myeongwoldae, Bridge Myeongwol, and Forest Myeongwol should be managed as one integrated division. Bridge Myeongwol, an arched bridge of a kind rarely found on Jeju Island, is high-standard stonework requiring long-term preservation plans. Grand Bridge Myeongwol, which is exposed to accident risks because of deterioration and needs a safety diagnosis, requires measures based on the result of a precise safety diagnosis. It is desirable to restore it to its initial shape as a two-sluice arched bridge and to preserve and use it as a representative local landmark together with Stand Myeongwol. In addition, considering the topophsis based on the analysis result, the current name of Jeju Special Self-Governing Province Monument No. 19, 'Myeongwol Hackberry Colony', should be changed to 'Myeongwol Hackberry-Muku Tree Colony'. The serial number system, which does not distinguish between hackberry and muku trees, should also be improved, and regular monitoring of large and old trees, specific plants, and naturalized species is required.

Discovering Promising Convergence Technologies Using Network Analysis of Maturity and Dependency of Technology (기술 성숙도 및 의존도의 네트워크 분석을 통한 유망 융합 기술 발굴 방법론)

  • Choi, Hochang;Kwahk, Kee-Young;Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.24 no.1 / pp.101-124 / 2018
  • Recently, most technologies have developed in various forms, through the advancement of a single technology or through interaction with other technologies. In particular, these technologies show convergence arising from the interaction between two or more techniques. In addition, efforts to respond to technological change in advance, by forecasting promising convergence technologies that will emerge in the near future, are continuously increasing. Accordingly, many researchers are attempting various analyses to forecast promising convergence technologies. A convergence technology carries the characteristics of several technologies because of the way it is generated. Therefore, forecasting promising convergence technologies is much more difficult than forecasting general technologies with high growth potential. Nevertheless, some achievements have been confirmed in attempts to forecast promising technologies using big data analysis and social network analysis. Data-driven studies of convergence technology are actively conducted on themes such as discovering new convergence technologies and analyzing their trends. As a result, information about new convergence technologies is more abundant than in the past. However, existing methods of analyzing convergence technology have some limitations. First, most studies of convergence technology analyze data using predefined technology classifications. Recently emerging technologies tend to have convergence characteristics and thus consist of technologies from various fields. In other words, new convergence technologies may not belong to any defined classification. Therefore, the existing methods do not properly reflect the dynamic change of the convergence phenomenon.
Second, to forecast promising convergence technologies, most existing analysis methods use general-purpose indicators. Such methods do not fully exploit the specificity of the convergence phenomenon. A new convergence technology is highly dependent on the existing technology from which it originates. Depending on changes in the technology it depends on, it can grow into an independent field or disappear rapidly. In existing analyses, the growth potential of a convergence technology is judged with traditional, general-purpose indicators. However, these indicators do not reflect the principle of convergence: that new technologies emerge from two or more mature technologies, and that grown technologies in turn affect the creation of other technologies. Third, previous studies do not provide objective methods for evaluating the accuracy of models that forecast promising convergence technologies. Within studies of convergence technology, work on forecasting promising technologies has been relatively scarce because of the complexity of the field. It is therefore difficult to find a method for evaluating the accuracy of a model that forecasts promising convergence technologies. To invigorate research on forecasting promising convergence technologies, it is important to establish a method for objectively verifying and evaluating the accuracy of the model proposed in each study. To overcome these limitations, we propose a new method for analyzing convergence technologies. First, through topic modeling, we derive a new technology classification based on text content. It reflects the dynamic change of the actual technology market rather than an existing fixed classification standard.
In addition, we identify the influence relationships between technologies through the topic correspondence weights of each document and structure them into a network. We then devise a centrality indicator (PGC, potential growth centrality) that forecasts the future growth of a technology using its centrality information. It reflects the convergence characteristics of each technology according to technology maturity and the interdependence between technologies. Along with this, we propose a method for evaluating the accuracy of a forecasting model by measuring the growth rate of promising technologies, based on the variation of potential growth centrality by period. In this paper, we conduct experiments with 13,477 patent documents dealing with technical content to evaluate the performance and practical applicability of the proposed method. The results confirm that the forecast model based on the proposed centrality indicator achieves a forecast accuracy up to about 2.88 times higher than that of forecast models based on currently used network indicators.

Product Evaluation Criteria Extraction through Online Review Analysis: Using LDA and k-Nearest Neighbor Approach (온라인 리뷰 분석을 통한 상품 평가 기준 추출: LDA 및 k-최근접 이웃 접근법을 활용하여)

  • Lee, Ji Hyeon;Jung, Sang Hyung;Kim, Jun Ho;Min, Eun Joo;Yeo, Un Yeong;Kim, Jong Woo
    • Journal of Intelligence and Information Systems / v.26 no.1 / pp.97-117 / 2020
  • Product evaluation criteria are indicators describing the attributes or values of products, which enable users and manufacturers to measure and understand the products. When companies analyze their products or compare them with those of competitors, appropriate criteria must be selected for objective evaluation. The criteria should show the features of products that consumers considered when they purchased, used, and evaluated them. However, current evaluation criteria do not reflect how consumers' opinions differ from product to product. Previous studies tried to use online reviews from e-commerce sites, which reflect consumer opinions, to extract the features and topics of products and use them as evaluation criteria. However, these approaches are still limited in that they produce criteria irrelevant to the products, because extracted or improper words are not refined. To overcome this limitation, this research proposes an LDA-k-NN model, which extracts candidate criteria words from online reviews using LDA and refines them with k-nearest neighbor. The proposed approach starts with a preparation phase consisting of six steps. First, it collects review data from e-commerce websites. Most e-commerce websites classify their items into high-level, middle-level, and low-level categories. Review data for the preparation phase are gathered from each middle-level category and later merged to represent a single high-level category. Next, nouns, adjectives, adverbs, and verbs are extracted from the reviews using part-of-speech information from a morpheme analysis module. After preprocessing, the words for each topic in the reviews are obtained with LDA, and only the nouns among the topic words are chosen as potential criteria words. Then, the words are tagged according to whether they can serve as criteria for each middle-level category. Next, every tagged word is vectorized by a pre-trained word embedding model. Finally, a k-nearest neighbor case-based approach is used to classify each word with the tags.
After the preparation phase is set up, the criteria extraction phase is conducted on low-level categories. This phase starts with crawling reviews in the corresponding low-level category. The same preprocessing as in the preparation phase is conducted using the morpheme analysis module and LDA. Candidate criteria words are extracted by taking the nouns from the data and are vectorized by the pre-trained word embedding model. Finally, evaluation criteria are extracted by refining the candidate criteria words using the k-nearest neighbor approach and the reference proportion of each word in the word set. To evaluate the performance of the proposed model, an experiment was conducted with reviews from '11st', one of the biggest e-commerce companies in Korea. The review data came from the 'Electronics/Digital' section, one of the high-level categories in 11st. For performance evaluation, the suggested model was compared with three others: the actual criteria of 11st, a model that extracts nouns with the morpheme analysis module and refines them according to word frequency, and a model that extracts nouns from LDA topics and refines them by word frequency. The evaluation task was to predict the evaluation criteria of 10 low-level categories with the suggested model and the three models above. The criteria words extracted by each model were combined into a single word set, which was used for survey questionnaires. In the survey, respondents chose every item they considered an appropriate criterion for each category. Each model scored when a chosen word had been extracted by that model. The suggested model had higher scores than the other models in 8 out of 10 low-level categories. Paired t-tests on the scores of each model confirmed that the suggested model performed better in 26 tests out of 30. In addition, the suggested model was the best model in terms of accuracy.
This research proposes an evaluation criteria extraction method that combines topic extraction using LDA with refinement by the k-nearest neighbor approach. This method overcomes the limits of previous dictionary-based models and frequency-based refinement models. This study can contribute to improving review analysis for deriving business insights in the e-commerce market.
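
The k-nearest neighbor refinement step in the abstract above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the 2-D vectors are hypothetical stand-ins for pre-trained word embeddings, the tags `criterion` and `other` and the value `k=3` are invented for the example, cosine similarity is an assumed choice of measure, and the reference-proportion filter mentioned in the abstract is omitted.

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def knn_label(vec, tagged, k=3):
    """Majority vote over the k tagged words most similar to vec."""
    ranked = sorted(tagged, key=lambda item: cosine(vec, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical 2-D embeddings of words tagged in the preparation phase.
tagged = [
    ((0.9, 0.1), "criterion"),   # e.g. "battery"
    ((0.8, 0.2), "criterion"),   # e.g. "screen"
    ((0.85, 0.3), "criterion"),  # e.g. "price"
    ((0.1, 0.9), "other"),       # e.g. "yesterday"
    ((0.2, 0.8), "other"),       # e.g. "friend"
]

# Hypothetical candidate nouns taken from LDA topics of a low-level category.
candidates = {"weight": (0.88, 0.15), "morning": (0.15, 0.95)}

# Keep only the candidates whose nearest tagged neighbors are mostly criteria.
criteria = [w for w, v in candidates.items() if knn_label(v, tagged) == "criterion"]
```

With these toy vectors, the candidate close to the tagged criteria words is kept and the other is dropped, which is the refinement effect the abstract describes.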

A Study on Market Size Estimation Method by Product Group Using Word2Vec Algorithm (Word2Vec을 활용한 제품군별 시장규모 추정 방법에 관한 연구)

  • Jung, Ye Lim;Kim, Ji Hui;Yoo, Hyoung Sun
    • Journal of Intelligence and Information Systems
    • /
    • v.26 no.1
    • /
    • pp.1-21
    • /
    • 2020
  • With the rapid development of artificial intelligence technology, various techniques have been developed to extract meaningful information from unstructured text data, which constitutes a large portion of big data. Over the past decades, text mining technologies have been utilized in various industries for practical applications. In the field of business intelligence, they have been employed to discover new market and/or technology opportunities and to support rational decision making by business participants. Market information such as market size, market growth rate, and market share is essential for setting companies' business strategies. There has been continuous demand in various fields for market information at the level of specific products. However, such information has generally been provided at the industry level or for broad categories based on classification standards, making it difficult to obtain specific and appropriate information. In this regard, we propose a new methodology that can estimate the market sizes of product groups at more detailed levels than those previously offered. We applied the Word2Vec algorithm, a neural network-based semantic word embedding model, to enable automatic market size estimation from individual companies' product information in a bottom-up manner. The overall process is as follows: First, the data related to product information is collected, refined, and restructured into a form suitable for applying the Word2Vec model. Next, the preprocessed data is embedded into vector space by Word2Vec, and the product groups are derived by extracting similar product names based on cosine similarity. Finally, the sales data on the extracted products is summed to estimate the market size of the product groups. As experimental data, text data of product names from Statistics Korea's microdata (345,103 cases) were mapped into a multidimensional vector space by Word2Vec training.
We performed parameter optimization for training and then applied a vector dimension of 300 and a window size of 15 as the optimized parameters for further experiments. We employed the index words of the Korean Standard Industry Classification (KSIC) as a product name dataset to cluster product groups more efficiently. The product names similar to KSIC indexes were extracted based on cosine similarity. The market size of the extracted products, treated as one product category, was calculated from individual companies' sales data. The market sizes of 11,654 specific product lines were automatically estimated by the proposed model. For performance verification, the results were compared with the actual market sizes of some items. The Pearson correlation coefficient was 0.513. Our approach has several advantages over previous studies. First, text mining and machine learning techniques were applied for the first time to market size estimation, overcoming the limitations of traditional methods that rely on sampling or require multiple assumptions. In addition, the level of market category can be easily and efficiently adjusted according to the purpose of information use by changing the cosine similarity threshold. Furthermore, it has high potential for practical application since it can resolve unmet needs for detailed market size information in the public and private sectors. Specifically, it can be utilized in technology evaluation and technology commercialization support programs conducted by governmental institutions, as well as in business strategy consulting and market analysis report publishing by private firms. A limitation of our study is that the presented model needs to be improved in terms of accuracy and reliability. The semantic-based word embedding module can be advanced by giving a proper order to the preprocessed dataset or by combining another measure, such as Jaccard similarity, with Word2Vec.
Also, the method of product group clustering can be changed to other types of unsupervised machine learning algorithms. Our group is currently working on subsequent studies, and we expect them to further improve the performance of the basic model conceptually proposed in this study.
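
The grouping and summation steps of this abstract can be sketched roughly as follows: product names whose vectors are close enough to a KSIC index word (by cosine similarity) form one product group, and their sales are summed. This is only an illustration under stated assumptions, not the authors' implementation. The function name `estimate_market_size`, the `threshold` values, and the two-dimensional toy vectors and sales figures are hypothetical; real vectors would come from a trained Word2Vec model (the abstract reports a vector dimension of 300 and a window size of 15).

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def estimate_market_size(index_vec, products, threshold=0.8):
    """Group products similar to an index word and sum their sales.

    index_vec: embedding of the index word (e.g. a KSIC index term).
    products: list of (name, embedding, sales) tuples.
    threshold: hypothetical cosine similarity cutoff; raising it narrows
    the product group, lowering it widens the group.
    """
    members = [
        (name, sales)
        for name, vec, sales in products
        if cosine(index_vec, vec) >= threshold
    ]
    names = [name for name, _ in members]
    total = sum(sales for _, sales in members)
    return names, total


# Hand-made toy data, for illustration only.
index_vec = [1.0, 0.0]
products = [
    ("lithium battery", [0.9, 0.1], 120.0),
    ("battery pack", [0.8, 0.3], 80.0),
    ("office chair", [0.1, 0.9], 50.0),
]
print(estimate_market_size(index_vec, products, threshold=0.8))
print(estimate_market_size(index_vec, products, threshold=0.95))
```

Comparing the two calls shows how the threshold adjusts the level of the market category: the stricter cutoff keeps only the closest product name, so the estimated market size shrinks accordingly.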