Korean Practice Guidelines for Gastric Cancer 2022: An Evidence-based, Multidisciplinary Approach

  • Tae-Han Kim;In-Ho Kim;Seung Joo Kang;Miyoung Choi;Baek-Hui Kim;Bang Wool Eom;Bum Jun Kim;Byung-Hoon Min;Chang In Choi;Cheol Min Shin;Chung Hyun Tae;Chung sik Gong;Dong Jin Kim;Arthur Eung-Hyuck Cho;Eun Jeong Gong;Geum Jong Song;Hyeon-Su Im;Hye Seong Ahn;Hyun Lim;Hyung-Don Kim;Jae-Joon Kim;Jeong Il Yu;Jeong Won Lee;Ji Yeon Park;Jwa Hoon Kim;Kyoung Doo Song;Minkyu Jung;Mi Ran Jung;Sang-Yong Son;Shin-Hoo Park;Soo Jin Kim;Sung Hak Lee;Tae-Yong Kim;Woo Kyun Bae;Woong Sub Koom;Yeseob Jee;Yoo Min Kim;Yoonjin Kwak;Young Suk Park;Hye Sook Han;Su Youn Nam;Seong-Ho Kong;The Development Working Group for the Korean Practice Guidelines for Gastric Cancer 2022 Task Force Team
    • Journal of Gastric Cancer / v.23 no.1 / pp.3-106 / 2023
  • Gastric cancer is one of the most common cancers in Korea and worldwide. This is the fourth gastric cancer guideline published in Korea since 2004 and a revision of the evidence-based guideline issued in 2018. The current guideline is a collaborative work of an interdisciplinary working group that includes experts in gastric surgery, gastroenterology, endoscopy, medical oncology, abdominal radiology, pathology, nuclear medicine, radiation oncology, and guideline development methodology. A total of 33 key questions were updated or proposed after a collaborative review by the working group, and 40 statements were developed through systematic review of the MEDLINE, Embase, Cochrane Library, and KoreaMed databases. The level of evidence and the grading of recommendations were categorized according to the Grading of Recommendations, Assessment, Development and Evaluation (GRADE) approach. Evidence level, benefit, harm, and clinical applicability were considered the significant factors for each recommendation. The working group reviewed the recommendations and discussed them to reach consensus. The first part, on general considerations, covers screening, diagnosis, and staging by endoscopy, pathology, radiology, and nuclear medicine. Flowcharts are presented with statements supported by meta-analyses and references. Because clinical trials and systematic reviews were not suitable for postoperative oncologic and nutritional follow-up, the working group agreed to conduct a nationwide survey of clinical practice at all tertiary and general hospitals in Korea. The purpose of this survey was to provide baseline information on follow-up. Herein we present a multidisciplinary, evidence-based gastric cancer guideline.

Comprehensive and synthetic inventory of Dokdo Island, Republic of Korea

  • Ui Wook Hwang;Hyun Soo Rho;Bia Park;Eun Hwa Choi;Cho Rong Shin;Sa Heung Kim;Jongrak Lee;Hack Cheul Kim;Mann Kyoon Shin;Taeseo Park;Jumin Jun;Heegab Lee;Jong Eun Lee;Yoon Sik Oh;Jung-Goo Myoung;Chang Geun Choi;Jin Hee Park;Seon-joo Park;Jimin Lee;Jaeho Lee;Hyeok Yeong Kwon;Kyu Tae Park;Chun Woo Lim;Seung Wook Jung;Mi Jin Lee;Yucheol Lee;Yeongheon Shin;Hee-Jung Choi;Young Wook Lee;Hyun Jong Kil;Jin-Han Kim;Myung-Suk Kang;Eun-Young Lee;Sang-Hwa Lee;Young Hyo Kim;Jongwoo Jung;Kuem Hee Jang;Young Jin Lim;Shi Hyun Ryu;Won-Gi Min;Joo Myun Park;Hyojin Lee;Minsu Woo;Yun-Bae Kim;Sehun Myoung
    • Journal of Species Research / v.12 no.spc / pp.1-69 / 2023
  • This study aims to establish a comprehensive, synthetic inventory of the fauna and flora of Dokdo Island, Republic of Korea. The work was carried out by a specialized research group of more than 50 experts. The research was conducted over five years (2015-2019) and supported by the National Institute of Biological Resources, Ministry of Environment, Republic of Korea. All available publications on the fauna and flora of Dokdo Island over the 68 years from 1952 to 2020 were reviewed. As a result, 1,302 species were found on Dokdo Island during the study period, and an updated list of 1,963 species was created. This is expected to be of great help for the conservation and national publicity of the important indigenous biological resources of Dokdo Island.

A study on improving the accuracy of machine learning models through the use of non-financial information in predicting the Closure of operator using electronic payment service (전자결제서비스 이용 사업자 폐업 예측에서 비재무정보 활용을 통한 머신러닝 모델의 정확도 향상에 관한 연구)

  • Hyunjeong Gong;Eugene Hwang;Sunghyuk Park
    • Journal of Intelligence and Information Systems / v.29 no.3 / pp.361-381 / 2023
  • Research on corporate bankruptcy prediction has focused on financial information. Because a company's financial information is updated only quarterly, it lacks the timeliness needed to predict the possibility of business closure in real time. Evaluating companies that want to address this need a way to judge the soundness of a target company using information other than financial information. As information technology has made it easier to collect non-financial information about companies, research has applied additional variables and various methodologies beyond financial information to corporate bankruptcy prediction, and determining whether they have an effect has become an important research task. In this study, we examined the impact of electronic payment-related information, a form of non-financial information, on predicting the closure of business operators that use electronic payment services, and we examined how closure prediction accuracy differs according to the combination of financial and non-financial information. Specifically, three research models were designed (a financial information model, a non-financial information model, and a combined model), and closure prediction accuracy was measured with six algorithms, including the Multi Layer Perceptron (MLP). The model combining financial and non-financial information showed the highest prediction accuracy, followed by the non-financial information model and then the financial information model. Among the six algorithms, XGBoost showed the highest prediction accuracy.
An examination of the relative importance of the 87 variables used to predict business closure confirmed that more than 70% of the top 20 variables with a significant impact on the prediction were non-financial. This confirms that electronic payment-related information is an important variable in predicting business closure, and the possibility of using non-financial information as an alternative to financial information was also examined. Based on this study, we recognize the importance of collecting and using non-financial information to predict business closure and propose a plan to use it in corporate decision-making.

The Infrared Medium-deep Survey. VIII. Quasar Luminosity Function at z ~ 5

  • Kim, Yongjung;Im, Myungshin;Jeon, Yiseul;Kim, Minjin;Pak, Soojong;Hyun, Minhee;Taak, Yoon Chan;Shin, Suhyun;Lim, Gu;Paek, Gregory S.H.;Paek, Insu;Jiang, Linhua;Choi, Changsu;Hong, Jueun;Ji, Tae-Geun;Jun, Hyunsung D.;Karouzos, Marios;Kim, Dohyeong;Kim, Duho;Kim, Jae-Woo;Kim, Ji Hoon;Lee, Hye-In;Lee, Seong-Kook;Park, Won-Kee;Yoon, Yongmin;Byeon, Seoyeon;Hwang, Sungyong;Kim, Joonho;Kim, Sophia;Park, Woojin
    • The Bulletin of The Korean Astronomical Society / v.45 no.1 / pp.34.3-34.3 / 2020
  • Faint z ~ 5 quasars with M1450 ~ -23 mag are known to be potentially important contributors to the ultraviolet ionizing background in the post-reionization era. However, their number density has not been well determined, making it difficult to assess their role in the early ionization of the intergalactic medium (IGM). In this work, we present the updated results of our z ~ 5 quasar survey using the Infrared Medium-deep Survey (IMS), a near-infrared imaging survey covering an area of 85 square degrees. From our spectroscopic observations with the Gemini Multi-Object Spectrograph (GMOS) on the Gemini-South 8 m Telescope, we discovered eight new quasars at z ~ 5 with -26.1 ≤ M1450 ≤ -23.3. Combining our IMS faint quasars with the brighter Sloan Digital Sky Survey (SDSS) quasars, we derive, for the first time, the z ~ 5 quasar luminosity function (QLF) without any fixed parameters down to the magnitude limit of M1450 = -23 mag. We find that the faint-end slope of the QLF is very flat (-1.2), with a characteristic luminosity of -25.7 mag. The number density of z ~ 5 quasars from the QLF gives lower ionizing emissivity and ionizing photon density than those in previous works. These results imply that quasars are responsible for only 10-20% of the photons required to completely ionize the IGM at z ~ 5, disfavoring the idea that quasars alone could have ionized the IGM at z ~ 5.
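
To make the quoted QLF parameters concrete: a quasar luminosity function is commonly written as a double power law. The abstract does not state which functional form the authors fitted, so the expression below is an assumption for illustration only. The symbols $\Phi^{*}$ (normalization) and $\beta$ (bright-end slope) are not given in the abstract; $\alpha$ is the faint-end slope and $M^{*}$ the characteristic magnitude.

```latex
\Phi(M_{1450}) =
  \frac{\Phi^{*}}
       {10^{\,0.4(\alpha+1)(M_{1450}-M^{*})} + 10^{\,0.4(\beta+1)(M_{1450}-M^{*})}},
\qquad \alpha \approx -1.2, \quad M^{*} \approx -25.7\ \text{mag}
```

Under this form, a faint-end slope near $-1$ means the number density rises only slowly toward fainter magnitudes, which is consistent with the abstract's statement that the flat slope yields a lower ionizing emissivity.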

Convergence of Remote Sensing and Digital Geospatial Information for Monitoring Unmeasured Reservoirs (미계측 저수지 수체 모니터링을 위한 원격탐사 및 디지털 공간정보 융합)

  • Hee-Jin Lee;Chanyang Sur;Jeongho Cho;Won-Ho Nam
    • Korean Journal of Remote Sensing / v.39 no.5_4 / pp.1135-1144 / 2023
  • Many agricultural reservoirs in South Korea, constructed before 1970, have become aging facilities. The majority of small-scale reservoirs lack measurement systems to ascertain basic specifications and water levels, classifying them as unmeasured reservoirs. Furthermore, continuous sedimentation within the reservoirs and industrial development-induced water quality deterioration lead to reduced water supply capacity and changes in reservoir morphology. This study utilized Light Detection And Ranging (LiDAR) sensors, which provide elevation information and allow for the characterization of surface features, to construct high-resolution Digital Surface Model (DSM) and Digital Elevation Model (DEM) data of reservoir facilities. Additionally, bathymetric measurements based on multibeam echosounders were conducted to propose an updated approach for determining reservoir capacity. Drone-based LiDAR was employed to generate DSM and DEM data with a spatial resolution of 50 cm, enabling the display of elevations of hydraulic structures, such as embankments, spillways, and intake channels. Furthermore, using drone-based hyperspectral imagery, Normalized Difference Vegetation Index (NDVI) and Normalized Difference Water Index (NDWI) were calculated to detect water bodies and verify differences from existing reservoir boundaries. The constructed high-resolution DEM data were integrated with bathymetric measurements to create underwater contour maps, which were used to generate a Triangulated Irregular Network (TIN). The TIN was utilized to calculate the inundation area and volume of the reservoir, yielding results highly consistent with basic specifications. Considering areas that were not surveyed due to underwater vegetation, it is anticipated that this data will be valuable for future updates of reservoir capacity information.
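
The two spectral indices named in this abstract can be sketched as follows. This is a minimal illustration, not the authors' processing chain: the abstract does not say which NDWI formulation was used, so the green/near-infrared form is assumed here, and the function names, the zero threshold, and the reflectance values are all hypothetical.

```python
def normalized_difference(a, b):
    """Return (a - b) / (a + b), or None when the denominator is zero."""
    total = a + b
    if total == 0:
        return None
    return (a - b) / total


def ndvi(nir, red):
    """NDVI = (NIR - Red) / (NIR + Red); high values indicate vegetation."""
    return normalized_difference(nir, red)


def ndwi(green, nir):
    """NDWI = (Green - NIR) / (Green + NIR); assumed formulation, high over water."""
    return normalized_difference(green, nir)


def water_mask(green_band, nir_band, threshold=0.0):
    """Flag pixels whose NDWI exceeds the (hypothetical) threshold as water."""
    mask = []
    for green_row, nir_row in zip(green_band, nir_band):
        row = []
        for green, nir in zip(green_row, nir_row):
            value = ndwi(green, nir)
            row.append(value is not None and value > threshold)
        mask.append(row)
    return mask


# Made-up 2x2 reflectance tiles: top row behaves like water, bottom row like vegetation.
green_band = [[0.10, 0.08], [0.04, 0.05]]
nir_band = [[0.02, 0.03], [0.30, 0.40]]
mask = water_mask(green_band, nir_band)
```

Comparing such a mask with the registered reservoir boundary is one way to carry out the boundary check the abstract mentions.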

Venture Capital Investment and the Performance of Newly Listed Firms on KOSDAQ (벤처캐피탈 투자에 따른 코스닥 상장기업의 상장실적 및 경영성과 분석)

  • Shin, Hyeran;Han, Ingoo;Joo, Jihwan
    • Asia-Pacific Journal of Business Venturing and Entrepreneurship / v.17 no.2 / pp.33-51 / 2022
  • This study analyzes companies newly listed on KOSDAQ from 2011 to 2020, comparing firms that attracted venture investment before listing (VI) with those that did not (NVI) in terms of both listing performance and firm performance (growth) after listing. The paper uses descriptive statistics, mean-difference tests, and multiple regression analysis. Independent variables in the regression models include VC investment, firm age at the time of listing, firm type, firm location, firm size, the age of the VC, the level of expertise of the VC, and the level of fit between the VC and the investee company. The results suggest that listing performance and post-listing growth are better for VI than for NVI firms. VC investment has a negative effect on the listing period and a positive effect on the sales growth rate. The amount of VC investment likewise has negative effects on the listing period and positive effects on market capitalization at the time of IPO and on sales growth among the growth indicators. Our evidence also implies a significantly positive effect on post-listing growth for firms in R&D-specialized industries. In addition, firm age has a statistically significant positive effect on the market capitalization growth rate over several years, which suggests that the market places great importance on the long-term stability of management capability. Finally, among VC characteristics (the age of the VC, its level of expertise, and its fit with the investee company), we find that market capitalization at the time of IPO tends to be higher when the anchor VC's level of expertise is high.
Our paper differs from prior research in that we reexamine the venture ecosystem after the outbreak of coronavirus disease 2019, which has degraded the business environment. In addition, we introduce more informative variables, such as the VC investment amount, when examining the effect of firm type, which enables us to indirectly evaluate the validity of the technology exception policy. Although our findings suggest that related policies such as the technology special listing system or the injection of funds into the venture ecosystem are still helpful, these systems should be updated in a more timely fashion to support firms' growth in the face of rapid technological development. Furthermore, industry specialization is essential to achieve regional development, and the growth of the recovery market is also urgent.

The Mediating Effect of Experiential Value on Customers' Perceived Value of Digital Content: China's Anti-virus Program Market (经验价值对消费者对电子内容的认知价值的中介影响: 中国杀毒软件市场)

  • Jia, Weiwei;Kim, Sae-Bum
    • Journal of Global Scholars of Marketing Science / v.20 no.2 / pp.219-230 / 2010
  • Digital content is making big changes to our daily lives while bringing opportunities and challenges for companies. Creative firms integrate pictures, texts, videos, audios, and data through digitalization to develop new products or services and create digital experiences that promote their brands. Most articles on digital content in the literature address its basic concept or the development of its marketing. Compared with traditional value chains for common products or services, the digital content industry seems to have more potential value. Because quite a bit of digital content is free to the consumer, price is not necessarily perceived as an indicator of the quality or value of information (Rowley 2008). It becomes evident that a current theme in digital content is the issue of "value," and research on customers' perceived value of digital content is a necessity. This article argues that experiential value has an advantage in customers' evaluations of digital content. Two different but related contributions to the understanding of the "value" of digital content are made here. First, based on a comparison of digital content with products and services, the article proposes two key characteristics that make experiential strategy applicable to digital content: intangibility and near-zero reproduction cost. Second, based on a discussion of the gap between a company's idealized value and the customer's perceived value, the article emphasizes that the pricing of digital content differs from that of products and services. As a result of intangibility, prices may not reflect customer value. Moreover, the cost of digital content in the development stage may be very high, while reproduction costs shrink dramatically. Because of the value gap mentioned above, pricing policies also vary across different kinds of digital content.
For example, a flat price policy is generally used for movies and music (Magiera 2001; Netherby 2002), while digital content in continuous demand, such as online games and anti-virus programs, involves a more complicated matter of utility and competitive price levels. Digital content companies have to explore various kinds of strategies to overcome this gap. Rethinking marketing solutions such as advertisements, images, and word-of-mouth and their effect on customers' perceived value becomes essential. China's digital content industry is becoming more and more globalized and is drawing special attention from different countries and regions that have their own competitive advantages. The 2008-2009 Annual Report on the Development of China's Digital Content Industry (CCIDConsulting 2009) indicates that, driven by domestic demand and governmental policy support, the country's digital content industry maintained fast growth of some 30 percent in 2008, indicating the initial stage of industry expansion. In China, anti-virus programs and other software that needs to be updated use a quarter-based pricing policy. Customers can download a trial version for free and use it for six months or a year. If they want to use it longer, continued payment is needed. They assess the quality of the digital content during this trial period and decide whether to pay for continued usage. In China's music and movie industries, which are at an early stage of development, experiential strategy has not been widely applied, even though firms in other countries have adopted the trial experience as an important strategy (such as letting customers listen to music for several seconds for free before downloading it). For these reasons, anti-virus programs may be representative of the digital content industry in China, and an exploratory study of the advantage of experiential value in customers' perceived value of digital content is conducted in China's anti-virus market.
In order to enhance the reliability of the survey data, this study focused on people who were experienced users of anti-virus programs. The empirical results revealed that experiential value has a positive effect on customers' perceived value of digital content. In other words, because digital content is intangible and the reproduction costs are nearly zero, customers' evaluations are based heavily on their experience. Moreover, image and word-of-mouth do not have a positive effect on perceived value, only on experiential value. That is to say, a digital content value chain is different from that of a general product or service. Experiential value has a notable advantage and mediates the effect of image and word-of-mouth on perceived value. The results of this study help provide an understanding of why free digital content downloads exist in developing countries. Customers can perceive the value of digital content only by using and experiencing it. This is also why such governments support the development of digital content. Other developing countries whose digital content business is also in the beginning stage can make use of the suggestions here. Moreover, based on the advantage of experiential strategy, companies should make more of an effort to invest in customers' experience. As a result of the characteristics and value gap of digital content, customers perceive more value in the intangible digital content only by experiencing what they really want. Moreover, because of the near-zero reproduction costs, companies can perhaps use experiential strategy to enhance customer understanding of digital content.

Evaluation of the Positional Uncertainty of a Liver Tumor using 4-Dimensional Computed Tomography and Gated Orthogonal Kilovolt Setup Images (사차원전산화단층촬영과 호흡연동 직각 Kilovolt 준비 영상을 이용한 간 종양의 움직임 분석)

  • Ju, Sang-Gyu;Hong, Chae-Seon;Park, Hee-Chul;Ahn, Jong-Ho;Shin, Eun-Hyuk;Shin, Jung-Suk;Kim, Jin-Sung;Han, Young-Yih;Lim, Do-Hoon;Choi, Doo-Ho
    • Radiation Oncology Journal / v.28 no.3 / pp.155-165 / 2010
  • Purpose: To evaluate the positional uncertainty of internal organs during radiation therapy for liver cancer, we measured inter- and intra-fractional variation of the tumor position and tidal amplitude using 4-dimensional computed tomography (4DCT) images and gated orthogonal kilovolt (KV) setup images taken at every treatment with the on-board imaging (OBI) and real-time position management (RPM) systems. Materials and Methods: Twenty consecutive patients who underwent 3-dimensional (3D) conformal radiation therapy for liver cancer participated in this study. All patients received a 4DCT simulation with an RT16 scanner and an RPM system. Either Lipiodol, which was taken up near the target volume after transarterial chemoembolization, or the diaphragm was chosen as a surrogate for evaluating the positional difference of internal organs. Two reference orthogonal (anterior and lateral) digitally reconstructed radiograph (DRR) images were generated from the CT image sets at 0% and 50% of the respiratory phase. The maximum tidal amplitude of the surrogate was measured from 3D conformal treatment planning. After the patient was set up with laser markings on the skin, orthogonal gated setup images at 50% of the respiratory phase were acquired at each treatment session with OBI and registered to the reference DRR images by setting each beam center. Online inter-fractional variation was determined with the surrogate. After the patient setup error was corrected, orthogonal setup images at 0% and 50% of the respiratory phase were obtained and the tidal amplitude of the surrogate was measured. The measured tidal amplitude was compared with data from 4DCT. To evaluate intra-fractional variation, an orthogonal gated setup image at 50% of the respiratory phase was acquired promptly after treatment and compared with the same image taken just before treatment. In addition, a statistical analysis was performed for quantitative evaluation.
Results: Medians of inter-fractional variation for the twenty patients were 0.00 cm (range, -0.50 to 0.90 cm), 0.00 cm (range, -2.40 to 1.60 cm), and 0.00 cm (range, -1.10 to 0.50 cm) in the X (transaxial), Y (superior-inferior), and Z (anterior-posterior) directions, respectively. Significant inter-fractional variations over 0.5 cm were observed in four patients. In addition, the median tidal amplitude differences between 4DCT and the gated orthogonal setup images were -0.05 cm (range, -0.83 to 0.60 cm), -0.15 cm (range, -2.58 to 1.18 cm), and -0.02 cm (range, -1.37 to 0.59 cm) in the X, Y, and Z directions, respectively. Large differences of over 1 cm were detected in 3 patients in the Y direction, while differences of more than 0.5 cm but less than 1 cm were observed in 5 patients in the Y and Z directions. Median intra-fractional variation was 0.00 cm (range, -0.30 to 0.40 cm), -0.03 cm (range, -1.14 to 0.50 cm), and 0.05 cm (range, -0.30 to 0.50 cm) in the X, Y, and Z directions, respectively. Significant intra-fractional variation of over 1 cm was observed in 2 patients in the Y direction. Conclusion: Gated setup images provided clear image quality for the detection of organ motion without motion artifacts. Significant intra- and inter-fractional variation and tidal amplitude differences between 4DCT and gated setup images were detected in some patients during the radiation treatment period and should therefore be considered when setting the target margin. Monitoring of positional uncertainty, with an adaptive feedback system, can enhance the accuracy of treatment.
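
The summary statistics reported in the Results (median, range, and cases above a tolerance) can be computed along the following lines. This is only a sketch of the arithmetic: the function name, the 0.5 cm tolerance argument, and the shift values are hypothetical and are not the study's data, and the function counts individual measurements rather than patients.

```python
from statistics import median


def summarize_shifts(shifts_cm, tolerance_cm=0.5):
    """Summarize signed positional shifts (cm) along one axis."""
    return {
        "median": median(shifts_cm),
        "range": (min(shifts_cm), max(shifts_cm)),
        # Count measurements whose magnitude exceeds the tolerance.
        "n_exceeding": sum(1 for s in shifts_cm if abs(s) > tolerance_cm),
    }


# Hypothetical superior-inferior shifts for six treatment sessions.
example_shifts = [-0.3, 0.0, 0.2, -0.6, 0.0, 1.1]
summary = summarize_shifts(example_shifts)
```

A median near zero together with a wide range, as in the abstract, indicates that shifts are not systematic but can be large in individual sessions, which is why the authors recommend accounting for them in the target margin.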

A Study on Knowledge Entity Extraction Method for Individual Stocks Based on Neural Tensor Network (뉴럴 텐서 네트워크 기반 주식 개별종목 지식개체명 추출 방법에 관한 연구)

  • Yang, Yunseok;Lee, Hyun Jun;Oh, Kyong Joo
    • Journal of Intelligence and Information Systems / v.25 no.2 / pp.25-38 / 2019
  • Selecting high-quality information that meets users' interests and needs from an overflow of content is becoming more important over time. Amid this flood of information, efforts are being made to better reflect the user's intention in search results rather than treating an information request as a simple string. Large IT companies such as Google and Microsoft also focus on developing knowledge-based technologies, including search engines, that provide users with satisfaction and convenience. Finance in particular is a field where text data analysis is expected to be useful, because new information is constantly generated and the earlier information is obtained, the more valuable it is. Automatic knowledge extraction can be effective in areas such as the financial sector, where the flow of information is vast and new information continues to emerge. However, automatic knowledge extraction faces several practical difficulties. First, it is difficult to build corpora from different fields with the same algorithm and to extract good-quality triples. Second, producing labeled text data manually becomes more difficult as the extent and scope of knowledge increase and patterns are constantly updated. Third, performance evaluation is difficult because of the characteristics of unsupervised learning. Finally, defining the problem for automatic knowledge extraction is not easy because the concept of knowledge is itself ambiguous. To overcome these limits and improve the semantic performance of stock-related information search, this study extracts knowledge entities using a neural tensor network and evaluates the resulting performance. Unlike previous work, this study aims to extract knowledge entities related to individual stock items.
Various but relatively simple data processing methods are applied in the presented model to solve the problems of previous research and to enhance the effectiveness of the model. This study therefore makes three contributions. First, it presents a practical and simple automatic knowledge extraction method that can be readily applied. Second, it shows that performance evaluation is possible through a simple problem definition. Finally, the expressiveness of the knowledge is increased by generating input data on a sentence basis without complex morphological analysis. The results of the empirical analysis and an objective performance evaluation method are also presented. For the empirical study to confirm the usefulness of the presented model, we used experts' reports on 30 individual stocks, the top 30 items by publication frequency from May 30, 2017 to May 21, 2018. The total number of reports is 5,600; 3,074 reports, about 55% of the total, are designated as the training set, and the remaining 45% are designated as the testing set. Before constructing the model, all reports in the training set are classified by stock, and their entities are extracted using the KKMA named entity recognition tool. For each stock, the top 100 entities by appearance frequency are selected and vectorized using one-hot encoding. Then, using the neural tensor network, one score function per stock is trained. Thus, when a new entity from the testing set appears, its score can be calculated with every score function, and the stock whose function gives the highest score is predicted as the item related to the entity. To evaluate the presented model, we confirm its predictive power and whether the score functions are well constructed by calculating the hit ratio over all reports in the testing set.
In the empirical study, the presented model shows 69.3% hit accuracy on the testing set, which consists of 2,526 reports. This hit ratio is meaningfully high despite some constraints on the research. Looking at the prediction performance for each stock, only three stocks (LG ELECTRONICS, KiaMtr, and Mando) show performance far below average. This result may be due to interference from other similar items and the generation of new knowledge. In this paper, we propose a methodology for finding the key entities, or combinations of them, needed to search for related information in accordance with the user's investment intention. Graph data is generated using only the named entity recognition tool and applied to the neural tensor network without a learning corpus or word vectors for the field. The empirical test confirms the effectiveness of the presented model as described above. However, some limitations remain. In particular, the fact that model performance is especially poor for a few stocks shows the need for further research. Finally, the empirical study confirms that the learning method presented here can be used to semantically match new text information with related stocks.
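
The prediction and evaluation steps described in this abstract (score an entity with every per-stock function, pick the highest, then compute a hit ratio) can be sketched as below. The trained neural tensor network is not reproduced: `make_keyword_scorer` is a hypothetical stand-in for a trained score function, and the stock names, entities, and labels are invented for illustration.

```python
def predict_stock(entity, score_functions):
    """Return the stock whose score function rates the entity highest."""
    return max(score_functions, key=lambda stock: score_functions[stock](entity))


def hit_ratio(labeled_entities, score_functions):
    """Fraction of (entity, true_stock) pairs whose prediction matches the label."""
    hits = sum(
        1
        for entity, true_stock in labeled_entities
        if predict_stock(entity, score_functions) == true_stock
    )
    return hits / len(labeled_entities)


def make_keyword_scorer(keywords):
    """Hypothetical stand-in for a trained NTN score function."""
    return lambda entity: 1.0 if entity in keywords else 0.0


# One score function per stock, as in the abstract (toy values only).
score_functions = {
    "StockA": make_keyword_scorer({"battery", "EV"}),
    "StockB": make_keyword_scorer({"display", "OLED"}),
}
test_set = [("battery", "StockA"), ("OLED", "StockB"), ("EV", "StockB")]
ratio = hit_ratio(test_set, score_functions)
```

In this toy setup two of the three labeled entities are matched to the right stock. When several stocks tie, `max` returns the first one in dictionary order, a detail a real evaluation would need to handle explicitly.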

A study on the classification of research topics based on COVID-19 academic research using Topic modeling (토픽모델링을 활용한 COVID-19 학술 연구 기반 연구 주제 분류에 관한 연구)

  • Yoo, So-yeon;Lim, Gyoo-gun
    • Journal of Intelligence and Information Systems / v.28 no.1 / pp.155-174 / 2022
  • From January 2020 to October 2021, more than 500,000 academic studies related to COVID-19 (Coronavirus-2, a fatal respiratory syndrome) were published. The rapid increase in COVID-19 papers makes it difficult, in terms of both time and technique, for healthcare professionals and policy makers to quickly find important research. In this study, we therefore propose a method of extracting useful information from the text of an extensive body of literature using the LDA and Word2vec algorithms. Papers matching the search keywords were extracted from the COVID-19 literature, and their detailed topics were identified. The data come from the CORD-19 data set on Kaggle, a free academic resource prepared by major research groups and the White House in response to the COVID-19 pandemic and updated weekly. The research method has two main parts. First, 41,062 articles were obtained through data filtering and pre-processing of the abstracts of 47,110 academic papers with full text. For this purpose, the number of COVID-19 publications by year was analyzed through exploratory data analysis in Python, and the top 10 journals with the most active research were identified. LDA and Word2vec were used to derive research topics related to COVID-19, and after related words were analyzed, similarity was measured. Second, papers containing 'vaccine' and 'treatment' were extracted from among the topics derived from all papers, yielding 4,555 papers related to 'vaccine' and 5,971 papers related to 'treatment'. For each set of collected papers, detailed topics were analyzed using LDA and Word2vec, and clustering after PCA dimension reduction was applied, with the t-SNE algorithm used to visualize groups of papers with similar themes. A noteworthy result is that topics that did not emerge from topic modeling over all COVID-19 papers did emerge in the topic modeling results for each research topic. For example, topic modeling of the papers related to 'vaccine' extracted a new topic, Topic 05 'neutralizing antibodies'. A neutralizing antibody is an antibody that protects cells from infection when a virus enters the body, and it is said to play an important role in the production of therapeutic agents and in vaccine development. Likewise, extracting topics from the papers related to 'treatment' revealed a new topic, Topic 05 'cytokine'. A cytokine storm occurs when the body's immune cells attack normal cells instead of defending against infection. Papers were classified by keyword, and topic modeling was performed on each group to find hidden topics that could not be found in the corpus as a whole. In this study, we proposed a method of extracting topics from a large amount of literature using the LDA algorithm and extracting similar words using Skip-gram, the Word2vec model that predicts surrounding words from a central word. Combining the LDA and Word2vec models was intended to improve performance by capturing both the relationship between documents and LDA topics and the relationships learned by Word2vec. In addition, as a clustering method after PCA dimension reduction, we presented a way to classify documents intuitively by using the t-SNE technique to group documents with similar themes into a structured organization. In a situation where researchers' efforts to overcome COVID-19 cannot keep pace with the rapid publication of related academic papers, we hope this method will save healthcare professionals and policy makers precious time and effort and help them gain new insights quickly. It is also expected to serve as basic data for researchers exploring new research directions.
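
The Skip-gram idea mentioned in this abstract, predicting surrounding words from a central word, starts from (center, context) training pairs. The sketch below shows only that pair-generation step, not the authors' pipeline or a full Word2vec implementation; the function name, window size, and token list are assumptions for illustration.

```python
def skipgram_pairs(tokens, window=2):
    """Build (center, context) pairs from words within `window` positions of each center word."""
    pairs = []
    for i, center in enumerate(tokens):
        start = max(0, i - window)
        stop = min(len(tokens), i + window + 1)
        for j in range(start, stop):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs


# Toy sentence; a real run would use the pre-processed abstracts.
tokens = ["neutralizing", "antibodies", "protect", "cells"]
pairs = skipgram_pairs(tokens, window=1)
```

A Skip-gram model is then trained so that each center word assigns high probability to its paired context words. Words that share many contexts end up with similar vectors, which is what makes the similar-word lookup in the abstract possible.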

