• Title/Summary/Keyword: response methods (대응방법)


A Study on the Operation Plan of the Gangwon-do Disaster Management Resources Integrated Management Center (강원도 재난관리자원 통합관리센터 운영방안에 관한 연구)

  • Hang-Il Jo;Sang-Beom Park;Kye-Won Jun
    • Journal of Korean Society of Disaster and Security
    • /
    • v.17 no.1
    • /
    • pp.9-16
    • /
    • 2024
  • In Korea, as disasters grow larger and more complex, the emphasis is shifting from response and recovery toward prevention and preparedness. To prevent and prepare for disasters, each local government stockpiles and manages disaster management resources. Although these resources are stored in individual warehouses, they are managed by department rather than by warehouse, and the heavy workload of the staff in charge leaves them insufficiently managed. To manage these resources intensively, integrated disaster management resource centers are established and operated at the metropolitan/provincial level. In Gangwon-do, the subject of this study, a rented warehouse serves as the integrated management center. Leasing forces the center to relocate every one to two years, so building a dedicated facility on an available site is considered necessary. To select candidate locations, network analysis was used to measure access to and use of facilities along interconnected routes such as roads and railways. Within the network analysis, the Location-Allocation method, widely used to determine the locations of multiple facilities, was applied. As a result, Hoengseong-gun in Gangwon-do was identified as a suitable candidate site. In addition, if the integrated management center stockpiles resources using Korea's logistics system, local governments can mobilize disaster management resources within 3 days, and it reportedly takes 3 days to return to normal life after a disaster occurs. Each city maintains roughly a 3-day stockpile of disaster management resources, and the integrated management center stores up to three times a city's maximum 4-day stockpile.
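The Location-Allocation idea described above can be sketched as a minimal p-median-style computation: score each candidate site by the demand-weighted distance to the points it would serve and keep the cheapest. The coordinates, weights, and site names below are hypothetical, and straight-line distance stands in for true road/rail network distance.

```python
# Minimal 1-median location-allocation sketch (hypothetical data).
# Each candidate warehouse site is scored by the total demand-weighted
# distance to the cities it would serve; the lowest score wins.
from math import dist

# Hypothetical (x, y) coordinates and stockpile-demand weights.
cities = {"A": ((0, 0), 3), "B": ((10, 0), 1), "C": ((5, 8), 2)}
candidates = {"Site1": (5, 2), "Site2": (9, 7)}

def weighted_cost(site_xy):
    return sum(w * dist(site_xy, xy) for xy, w in cities.values())

best = min(candidates, key=lambda s: weighted_cost(candidates[s]))
print(best, round(weighted_cost(candidates[best]), 2))  # Site1 minimizes total weighted distance
```

Real studies of this kind solve the same objective over a road/rail network graph rather than Euclidean space, but the scoring logic is identical.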

A Study on the Characteristics of Enterprise R&D Capabilities Using Data Mining (데이터마이닝을 활용한 기업 R&D역량 특성에 관한 탐색 연구)

  • Kim, Sang-Gook;Lim, Jung-Sun;Park, Wan
    • Journal of Intelligence and Information Systems
    • /
    • v.27 no.1
    • /
    • pp.1-21
    • /
    • 2021
  • As the global business environment changes, uncertainty in technology development and market needs grows, and competition among companies intensifies, interest in and demand for the R&D activities of individual companies are increasing. To cope with these changes, R&D companies are strengthening R&D investment as a means of enhancing the qualitative competitiveness of their R&D while also paying more attention to facility investment. As a result, facility and R&D investments inevitably become a burden that R&D companies must bear against future uncertainty, and a management strategy of simply increasing R&D investment to enhance R&D capability is highly uncertain in terms of corporate performance. In this study, the structural factors influencing companies' R&D capabilities are explored from the perspectives of technology management capability, R&D capability, and corporate classification attributes using data mining techniques, and the characteristics these factors present at different levels of R&D capability are analyzed. The study also presents cluster analysis and experimental results based on evidence data for all domestic R&D companies, and is expected to provide important implications for corporate management strategies to enhance the R&D capabilities of individual companies. For the three perspectives, 7, 2, and 4 detailed evaluation indexes, respectively, were composed to quantitatively measure levels in each area. For technology management capability and R&D capability, the sub-item evaluation indexes used by current domestic technology evaluation agencies were referenced, and the final detailed indexes were constructed with consideration of whether data could be obtained quantitatively. For corporate classification attributes, the most basic corporate profile information was considered.
In particular, to assess the homogeneity of R&D competency levels, each company was given a comprehensive score using the detailed evaluation indicators of technology management capability and R&D capability; competency levels were classified into five grades and compared with the cluster analysis results. To interpret the comparison between clusters and competency grades, clusters trending high or low in R&D competency level were identified, and the characteristics of their detailed evaluation indicators were then analyzed. Through this process, two clusters with high R&D competency and one with a low level were identified, while the remaining two clusters showed similar, mixed distributions. Accordingly, individual characteristics according to the detailed evaluation indexes were analyzed for the two high-competency clusters and the one low-competency cluster. The results imply that a faster replacement cycle for professional managers who can respond effectively to changes in technology and market demand is more likely to contribute to enhancing R&D capability. In the case of a privately held company, converting to a corporation can strengthen R&D personnel's sense of belonging and thereby increase the intensity of R&D input, and organizing work in team units can clarify responsibility and authority. Because technology commercialization achievements and technology certifications occurred both in cases that contributed to capability improvement and in cases that did not, they were confirmed to be of limited use as key factors for enhancing R&D capability from a management perspective.
Lastly, experience with utility model filings was identified as a factor with an important influence on R&D capability, confirming the need to provide incentives that encourage utility model filings. As such, the results of this study are expected to provide important implications for corporate management strategies to enhance individual companies' R&D capabilities.
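The competency-grading step described above (comprehensive scores binned into five grades) can be sketched as follows; the scores and grade cut-offs here are hypothetical illustrations, not the paper's actual evaluation indexes.

```python
# Sketch of the grading step: each firm gets a composite score from its
# evaluation-index values, then scores are binned into five grades A-E.
# Firm names, scores, and cut-offs are hypothetical.

scores = {"firm1": 82, "firm2": 45, "firm3": 67, "firm4": 91, "firm5": 30}

def grade(score, cuts=(80, 65, 50, 35)):  # hypothetical grade boundaries
    for g, c in zip("ABCD", cuts):
        if score >= c:
            return g
    return "E"

grades = {firm: grade(s) for firm, s in scores.items()}
print(grades)  # e.g. firm4 -> 'A', firm5 -> 'E'
```

In the study itself the grades are then cross-tabulated against the clusters to see which clusters skew high or low in competency.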

Information types and characteristics within the Wireless Emergency Alert in COVID-19: Focusing on Wireless Emergency Alerts in Seoul (코로나 19 하에서 재난문자 내의 정보유형 및 특성: 서울특별시 재난문자를 중심으로)

  • Yoon, Sungwook;Nam, Kihwan
    • Journal of Intelligence and Information Systems
    • /
    • v.28 no.1
    • /
    • pp.45-68
    • /
    • 2022
  • The central and local governments of the Republic of Korea provided information necessary for disaster response through wireless emergency alerts (WEAs) in order to overcome the rapid spread of COVID-19. Among all channels for delivering disaster information, the WEA is the most efficient: because it adopts the CBS (Cell Broadcast Service) method, which broadcasts directly to mobile phones, disaster information can be accessed easily without any search effort. In this study, the characteristics of WEAs sent to Seoul over thirteen months (January 2020 to January 2021) were derived through various text mining methodologies, and the types of information contained in the alerts were analyzed. The alerts' influence on movement behavior was also examined using age-specific population mobility in Seoul's districts. After classifying the keywords and information contained in each message, a document cluster analysis based on the included words was applied so that each sent message could serve as a unit of analysis. The number of WEAs sent to Seoul has grown dramatically since the spread of COVID-19: only 10 WEAs were sent in January 2020, but the number increased fivefold in March and up to 7.7-fold in later months. Since basic-level local governments were authorized to send WEAs independently, sending behavior differs across local governments.
Although most basic local governments sent more WEAs as the number of confirmed COVID-19 cases increased, the rate of this increase differed by region. Using a structured econometric model, the effect of the disaster information contained in WEAs on population mobility was measured, divided into a baseline effect and an accumulating effect. Six types of disaster information (date, order, online URL, symptom, location, and normative guidance) were identified in the WEAs and analyzed through econometric modeling. The types of information that significantly change population mobility were found to differ by age. The mobility of people in their 60s and 70s decreased when WEAs included date and order information. Since date and order information appears in WEAs that report confirmed COVID-19 cases, this shows that the mobility of older people decreased in reaction to messages reporting confirmed cases. Online information (URLs) decreased the mobility of people in their 20s, and symptom-related information reduced the mobility of people in their 30s. On the other hand, normative wording encouraging compliance with quarantine policies did not cause significant changes in mobility for any age group, which suggests that only information genuinely useful for disaster response should be included in WEAs. Repeated sending of WEAs reduced the magnitude of the impact of disaster information on population mobility, indirectly indicating that under the prolonged pandemic people grew tired of repetitive WEAs with similar content and reacted less.
To use WEAs effectively for quarantine and for overcoming disaster situations, it is necessary to reduce recipients' fatigue by sending alerts only when needed, and to raise awareness of WEAs.
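The baseline-versus-accumulating-effect decomposition described above can be sketched as a simple regression: mobility is modeled on an information-type dummy (baseline effect) and on that dummy interacted with the cumulative number of alerts received (accumulating effect). The data here are synthetic and the variable names illustrative, not the paper's actual specification.

```python
# Sketch of the baseline vs. accumulating effect: regress mobility on an
# information-type dummy and on its interaction with cumulative sends.
# All data are synthetic; a negative baseline with a positive interaction
# means the effect fades as repeated alerts accumulate.
import numpy as np

rng = np.random.default_rng(0)
n = 200
has_date_info = rng.integers(0, 2, n)      # dummy: alert mentions a date
cum_sends = rng.integers(0, 30, n)         # alerts received so far
mobility = (100 - 5 * has_date_info                    # baseline drop
            + 0.1 * has_date_info * cum_sends          # fading effect
            + rng.normal(0, 1, n))                     # noise

X = np.column_stack([np.ones(n), has_date_info, has_date_info * cum_sends])
beta, *_ = np.linalg.lstsq(X, mobility, rcond=None)
print(beta)  # estimates near [100, -5, 0.1]
```

The published study uses a structured econometric model with six information types and age-specific mobility; this sketch only isolates the sign logic of the two effects.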

Discovering Promising Convergence Technologies Using Network Analysis of Maturity and Dependency of Technology (기술 성숙도 및 의존도의 네트워크 분석을 통한 유망 융합 기술 발굴 방법론)

  • Choi, Hochang;Kwahk, Kee-Young;Kim, Namgyu
    • Journal of Intelligence and Information Systems
    • /
    • v.24 no.1
    • /
    • pp.101-124
    • /
    • 2018
  • Recently, most technologies have developed in various forms, either through the advancement of a single technology or through interaction with other technologies. In particular, such technologies are characterized by convergence arising from the interaction of two or more techniques. Efforts to respond to technological change in advance, by forecasting the promising convergence technologies that will emerge in the near future, are also continuously increasing. Accordingly, many researchers are attempting various analyses aimed at forecasting promising convergence technologies. A convergence technology combines the characteristics of multiple technologies by the very principle of its creation, so forecasting promising convergence technologies is much more difficult than forecasting general technologies with high growth potential. Nevertheless, some achievements have been made in forecasting promising technologies using big data analysis and social network analysis. Data-driven studies of convergence technology are actively conducted on discovering new convergence technologies and analyzing their trends, so information about new convergence technologies is more abundant than in the past. However, existing methods for analyzing convergence technology have several limitations. First, most studies analyze data through predefined technology classifications. Recent technologies tend to be convergent and thus consist of technologies from various fields, so a new convergence technology may not belong to any predefined class. The existing approach therefore fails to reflect the dynamic change of the convergence phenomenon.
Second, to forecast promising convergence technologies, most existing methods use general-purpose indicators, which do not fully exploit the specificity of the convergence phenomenon. A new convergence technology is highly dependent on the existing technologies from which it originates; depending on changes in those technologies, it can grow into an independent field or disappear rapidly. In existing analyses, the growth potential of a convergence technology is judged through traditional, general-purpose indicators. These indicators do not reflect the principle of convergence, namely that new technologies emerge from two or more mature technologies and that grown technologies in turn affect the creation of other technologies. Third, previous studies do not provide objective methods for evaluating the accuracy of models that forecast promising convergence technologies. Because of the complexity of the field, research on forecasting promising convergence technologies has been relatively scarce, making it difficult to find a method for evaluating model accuracy. To activate this field, it is important to establish a method for objectively verifying and evaluating the accuracy of each proposed model. To overcome these limitations, we propose a new method for analyzing convergence technologies. First, through topic modeling, we derive a new technology classification based on text content, reflecting the dynamic change of the actual technology market rather than a fixed classification standard.
In addition, we identify influence relationships between technologies through the topic correspondence weights of each document and structure them into a network. We then devise a centrality indicator, potential growth centrality (PGC), which forecasts the future growth of a technology from its centrality information, reflecting each technology's convergence characteristics in terms of technology maturity and interdependence. Along with this, we propose a method to evaluate the accuracy of the forecasting model by measuring the growth rate of promising technologies, based on the variation of potential growth centrality by period. We conduct experiments with 13,477 patent documents to evaluate the performance and practical applicability of the proposed method. The results confirm that the forecasting model based on the proposed centrality indicator achieves up to about 2.88 times the accuracy of models based on currently used network indicators.
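The idea of a maturity- and dependency-aware centrality can be sketched as below. The actual PGC formula is defined in the paper; as a stand-in, this toy score sums a topic's incoming influence weights, each weighted by the maturity of the source topic. All topic names and numbers are hypothetical.

```python
# Toy centrality on a topic-dependency network: a topic's growth score is
# the maturity-weighted sum of its incoming influence edges, so topics fed
# by several mature topics score highest. All values are hypothetical.

maturity = {"AI": 0.9, "IoT": 0.7, "EdgeAI": 0.2}
# edges: (source topic, target topic, influence weight)
edges = [("AI", "EdgeAI", 0.8), ("IoT", "EdgeAI", 0.6), ("AI", "IoT", 0.3)]

score = {topic: 0.0 for topic in maturity}
for src, dst, w in edges:
    score[dst] += maturity[src] * w   # mature sources feed growth potential

best = max(score, key=score.get)
print(best, round(score[best], 2))  # the young topic fed by mature ones wins
```

This captures only the intuition that "new technologies emerge from two or more mature technologies"; the paper's PGC additionally tracks how the score varies by period to measure growth rates.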

A Hybrid Recommender System based on Collaborative Filtering with Selective Use of Overall and Multicriteria Ratings (종합 평점과 다기준 평점을 선택적으로 활용하는 협업필터링 기반 하이브리드 추천 시스템)

  • Ku, Min Jung;Ahn, Hyunchul
    • Journal of Intelligence and Information Systems
    • /
    • v.24 no.2
    • /
    • pp.85-109
    • /
    • 2018
  • A recommender system recommends the items a customer is expected to purchase in the future based on his or her previous purchase behavior, and has served as a tool for realizing one-to-one personalization in e-commerce. Traditional recommender systems, especially those based on collaborative filtering (CF), the most popular recommendation algorithm in both academia and industry, generate recommendation lists using a single criterion: the 'overall rating'. This has critical limitations for understanding customers' preferences in detail. Recently, to mitigate these limitations, some leading e-commerce companies have begun collecting customer feedback in the form of 'multicriteria ratings', which enable companies to understand their customers' preferences from multidimensional viewpoints and are easy to handle and analyze because they are quantitative. However, recommendation using multicriteria ratings also has a limitation: it may omit detailed information on a user's preferences because it typically considers only three to five predetermined criteria. Against this background, this study proposes a novel hybrid recommender system that selectively uses the results of 'traditional CF' and 'CF using multicriteria ratings'. The proposed system is based on the premise that some people have a holistic preference scheme, whereas others have a composite one: it applies traditional CF using overall ratings for users with holistic preferences and CF using multicriteria ratings for users with composite preferences. To validate its usefulness, we applied the system to a real-world dataset on POI (point-of-interest) recommendation.
Personalized POI recommendation is drawing more attention as location-based services such as Yelp and Foursquare grow in popularity. The dataset was collected from university students via a Web-based online survey system, which gathered overall ratings as well as per-criterion ratings for 48 POIs located near K University in Seoul, South Korea. The criteria were 'food or taste', 'price', and 'service or mood'. In total, 2,878 valid ratings were obtained from 112 users. Of the 48 items, 38 (80%) were used as the training dataset and the remaining 10 (20%) as the validation dataset. To examine the effectiveness of the proposed system (the hybrid selective model), we compared its performance with two comparison models: traditional CF and CF with multicriteria ratings. Performance was evaluated using two metrics: average MAE (mean absolute error) and precision-in-top-N, where precision-in-top-N is the percentage of truly high overall ratings among the N items the model predicted to be most relevant for each user. The experimental system was developed using Microsoft Visual Basic for Applications (VBA). The results showed that the proposed system (avg. MAE = 0.584) outperformed traditional CF (avg. MAE = 0.591) as well as multicriteria CF (avg. MAE = 0.608). We also found that multicriteria CF performed worse than traditional CF on our dataset, contradicting most previous studies; this result supports our premise that people have two different preference schemes, holistic and composite. Besides MAE, the proposed system outperformed all comparison models in precision-in-top-3, precision-in-top-5, and precision-in-top-7.
Paired-samples t-tests showed that, in terms of average MAE, the proposed system outperformed traditional CF at the 10% significance level and multicriteria CF at the 1% significance level. The proposed system sheds light on how to understand and utilize users' preference schemes in the recommender systems domain.
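The selective-hybrid routing described above can be sketched as follows: for each user, compare the held-out MAE of the two CF variants and route the user to whichever fits better. Ratings and predictions are synthetic; the paper's actual CF computations are not reproduced here.

```python
# Sketch of selective hybrid routing: per user, compute MAE of each CF
# variant on held-out ratings and assign the user to the better model.
# A low overall-CF error suggests a holistic preference scheme; a low
# multicriteria-CF error suggests a composite one. Data are synthetic.

def mae(pred, actual):
    return sum(abs(p - a) for p, a in zip(pred, actual)) / len(actual)

actual = {"u1": [4, 5, 3], "u2": [2, 4, 4]}
pred_overall = {"u1": [4, 4, 3], "u2": [3, 2, 4]}   # traditional CF
pred_multi   = {"u1": [5, 3, 2], "u2": [2, 4, 3]}   # multicriteria CF

routing = {u: ("overall" if mae(pred_overall[u], actual[u])
               <= mae(pred_multi[u], actual[u]) else "multicriteria")
           for u in actual}
print(routing)  # u1 routed to overall CF, u2 to multicriteria CF
```

The hybrid's overall MAE is then just the MAE of each user's routed predictions, which is why it can beat both base models at once.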

The Predictable Factors for the Mortality of Fatal Asthma with Acute Respiratory Failure (호흡부전을 동반한 중증천식환자의 사망 예측 인자)

  • Park, Joo-Hun;Moon, Hee-Bom;Na, Joo-Ock;Song, Hun-Ho;Lim, Chae-Man;Lee, Moo-Song;Shim, Tae-Sun;Lee, Sang-Do;Kim, Woo-Sung;Kim, Dong-Soon;Kim, Won-Dong;Koh, Youn-Suck
    • Tuberculosis and Respiratory Diseases
    • /
    • v.47 no.3
    • /
    • pp.356-364
    • /
    • 1999
  • Background: Previous reports have revealed high morbidity and mortality in fatal asthma, especially in patients treated in the medical intensive care unit (MICU), but the factors predicting mortality in fatal asthma (FA) with acute respiratory failure are not well known. To define the predictive factors for FA mortality at MICU admission, we analyzed the relationship between clinical parameters and the prognosis of FA patients. Methods: A retrospective analysis was performed of the medical records of 59 patients admitted for FA to a tertiary-care MICU from January 1992 to March 1997. Results: The overall mortality rate was 32.2%, and 43 patients were mechanically ventilated. In univariate analysis, compared with the survival group, the death group had significantly older age (66.2±10.5 vs. 51.0±18.8 years), lower FVC (59.2±21.1 vs. 77.6±23.3%), lower FEV1 (41.4±18.8 vs. 61.1±23.3%), and longer total ventilation time (255.0±236.3 vs. 98.1±120.4 hours) (p<0.05) (PFT: best value within the most recent year). At MICU admission, there were no significant differences in vital signs, PaCO2, PaO2/FiO2, or AaDO2 between the two groups. However, on the second MICU day, the death group had a significantly faster pulse rate (121.6±22.3 vs. 105.2±19.4 beats/min), higher PaCO2 (50.1±16.5 vs. 41.8±12.2 mmHg), lower PaO2/FiO2 (160.8±59.8 vs. 256.6±78.3 mmHg), higher AaDO2 (181.5±79.7 vs. 98.6±47.9 mmHg), and a higher APACHE III score (57.6±21.1 vs. 20.3±13.2) than the survival group (p<0.05). The death group was more frequently associated with pneumonia and anoxic brain damage at admission, and more frequently developed sepsis during disease progression, than the survival group (p<0.05).
Multivariate analysis using the APACHE III score and PaO2/FiO2 ratio on the first and second days, age, sex, and pneumonia at admission revealed that an APACHE III score (≥40) and a PaO2/FiO2 ratio (<200) on the second day were predictive factors for the mortality of fatal asthma (p<0.05). Conclusion: The APACHE III score (≥40) and PaO2/FiO2 ratio (<200) on the second MICU day, which may reflect the response to treatment, would be more important predictors of mortality in patients with FA than the clinical parameters present at admission.
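The two cut-offs identified above can be expressed as a simple screening flag; this is only a sketch of the reported thresholds, not a validated clinical rule.

```python
# Sketch of the two day-2 thresholds reported above as a screening flag:
# APACHE III score >= 40, or PaO2/FiO2 ratio < 200 mmHg.
# Illustration only; not a validated clinical decision rule.

def high_mortality_risk(apache3_day2, pao2_mmhg, fio2_fraction):
    pf_ratio = pao2_mmhg / fio2_fraction   # PaO2/FiO2 in mmHg
    return apache3_day2 >= 40 or pf_ratio < 200

print(high_mortality_risk(57, 80, 0.5))  # P/F = 160 -> True
```

For instance, the death-group means above (APACHE III 57.6, P/F 160.8) would flag as high risk, while the survival-group means (20.3, 256.6) would not.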


Evaluation of Artifacts by Dental Metal Prostheses and Implants on PET/CT Images: Phantom and Clinical Studies (PET/CT 영상에서의 치과재료에 의한 인공물에 관한 연구)

  • Bahn, Young-Kag;Park, Hoon-Hee;NamKoong, Hyuk;Cho, Suk-Won;Lim, Han-Sang;Lee, Chang-Ho
    • The Korean Journal of Nuclear Medicine Technology
    • /
    • v.14 no.2
    • /
    • pp.110-116
    • /
    • 2010
  • Purpose: The X-ray attenuation coefficient derived from CT images is used for attenuation correction in PET/CT, but the polychromatic X-ray beam can introduce beam-hardening artifacts on CT images. The aim of this study was to evaluate, in phantoms and patients, the effect of dental metal prostheses on apparent tracer activity measured with PET/CT when CT attenuation correction is used. Materials and Methods: Forty normal patients (mean age 54±12) were scanned between January and February 2010. The NEMA (National Electrical Manufacturers Association) PET Phantom™ (NU2-1994) was filled with water containing 18F-FDG, into which dental casts with an implant and metal prostheses were inserted. Regions of interest were drawn in the non-artifact region, the bright streak artifact region, and the dark streak artifact region on the same transaxial CT and PET slices. Patients and phantoms with dental metal prostheses and dental implants were evaluated for the rate of change in CT number and SUVmean in PET/CT. A paired t-test was performed to compare the ratios and differences of the calculated values. Results: In patients with dental metal prostheses, SUVmean in the non-streak artifact region was 19.64% lower (p<0.05) than in the bright streak artifact region, whereas it was 90.1% higher (p>0.05) than in the dark streak artifact region. In the phantom with dental metal prostheses, SUVmean in the non-streak artifact region was 18.1% lower (p<0.05) than in the bright streak artifact region, whereas it was 18.0% higher (p>0.05) than in the dark streak artifact region. In patients with dental implants, SUVmean in the non-streak artifact region was 19.1% higher (p<0.05) than in the bright streak artifact region, and 96.62% higher (p>0.05) than in the dark streak artifact region.
In the phantom with dental implants, SUVmean in the non-streak artifact region was 14.4% higher (p<0.05) than in the bright streak artifact region, and 7.0% higher (p>0.05) than in the dark streak artifact region. Conclusion: When CT is used for attenuation correction in patients with dental metal prostheses, a reduced SUVmean (on the order of 19.1%) is anticipated in the dark streak artifact region on CT images. Dark streak artifacts on CT caused by dental metal prostheses may produce false-negative findings in PET/CT. We recommend that non-attenuation-corrected PET images also be evaluated for clinical use.


Detoxification of PSP and relationship between PSP toxicity and Protogonyaulax sp. (마비성패류독의 제독방법 및 패류독성과 원인플랑크톤과의 관계에 관한 연구)

  • CHANG Dong-Suck;SHIN Il-Shik;KIM Ji-Hoe;PYUN Jae-hueung;CHOE Wi-Kung
    • Korean Journal of Fisheries and Aquatic Sciences
    • /
    • v.22 no.4
    • /
    • pp.177-188
    • /
    • 1989
  • The purpose of this study was to investigate the detoxifying effect of heat treatment on PSP-infested sea mussels, Mytilus edulis, and the correlation between PSP toxicity and the environmental conditions of shellfish culture areas, such as temperature, pH, salinity, the density of Protogonyaulax sp., and the concentration of inorganic nutrients (NH4-N, NO3-N, NO2-N, and PO4-P). The experiment was carried out at Sujŏng in Masan, Yangdo in Jindong, Hachŏng in Kŏjedo, and Gamchŏn Bay in Pusan from February to June in 1987-1989. The detection ratio and toxicity of PSP in sea mussels differed from year to year, even in the same collection area. PSP was often detected when the seawater temperature was about 8.0-14.0°C. The PSP toxicity of sea mussels was closely related to the density of Protogonyaulax sp. at Gamchŏn Bay in Pusan from March to April 1989, but no such relationship was observed outside that period during the study. The concentration of inorganic nutrients affected the growth of Protogonyaulax sp., with NO3-N having the strongest effect. When PSP-infested sea mussel homogenate was heated at various temperatures, PSP toxicity did not change significantly below 70°C for 60 min, but it decreased proportionally as the heating temperature increased. For example, when the homogenate was heated at 100°C or 121°C for 10 min, toxicity decreased by about 67% and 90%, respectively. On the other hand, when shellstock sea mussels containing 150 μg/100 g of PSP were boiled at 100°C for 30 min in tap water, no toxicity was detected by mouse assay, but in mussels containing 5,400 μg/100 g, toxicity was reduced only to 57 μg/100 g even after boiling for 120 min.


Subject-Balanced Intelligent Text Summarization Scheme (주제 균형 지능형 텍스트 요약 기법)

  • Yun, Yeoil;Ko, Eunjung;Kim, Namgyu
    • Journal of Intelligence and Information Systems
    • /
    • v.25 no.2
    • /
    • pp.141-166
    • /
    • 2019
  • Recently, channels like social media and SNS create enormous amount of data. In all kinds of data, portions of unstructured data which represented as text data has increased geometrically. But there are some difficulties to check all text data, so it is important to access those data rapidly and grasp key points of text. Due to needs of efficient understanding, many studies about text summarization for handling and using tremendous amounts of text data have been proposed. Especially, a lot of summarization methods using machine learning and artificial intelligence algorithms have been proposed lately to generate summary objectively and effectively which called "automatic summarization". However almost text summarization methods proposed up to date construct summary focused on frequency of contents in original documents. Those summaries have a limitation for contain small-weight subjects that mentioned less in original text. If summaries include contents with only major subject, bias occurs and it causes loss of information so that it is hard to ascertain every subject documents have. To avoid those bias, it is possible to summarize in point of balance between topics document have so all subject in document can be ascertained, but still unbalance of distribution between those subjects remains. To retain balance of subjects in summary, it is necessary to consider proportion of every subject documents originally have and also allocate the portion of subjects equally so that even sentences of minor subjects can be included in summary sufficiently. In this study, we propose "subject-balanced" text summarization method that procure balance between all subjects and minimize omission of low-frequency subjects. For subject-balanced summary, we use two concept of summary evaluation metrics "completeness" and "succinctness". 
Completeness is the feature that summary should include contents of original documents fully and succinctness means summary has minimum duplication with contents in itself. Proposed method has 3-phases for summarization. First phase is constructing subject term dictionaries. Topic modeling is used for calculating topic-term weight which indicates degrees that each terms are related to each topic. From derived weight, it is possible to figure out highly related terms for every topic and subjects of documents can be found from various topic composed similar meaning terms. And then, few terms are selected which represent subject well. In this method, it is called "seed terms". However, those terms are too small to explain each subject enough, so sufficient similar terms with seed terms are needed for well-constructed subject dictionary. Word2Vec is used for word expansion, finds similar terms with seed terms. Word vectors are created after Word2Vec modeling, and from those vectors, similarity between all terms can be derived by using cosine-similarity. Higher cosine similarity between two terms calculated, higher relationship between two terms defined. So terms that have high similarity values with seed terms for each subjects are selected and filtering those expanded terms subject dictionary is finally constructed. Next phase is allocating subjects to every sentences which original documents have. To grasp contents of all sentences first, frequency analysis is conducted with specific terms that subject dictionaries compose. TF-IDF weight of each subjects are calculated after frequency analysis, and it is possible to figure out how much sentences are explaining about each subjects. However, TF-IDF weight has limitation that the weight can be increased infinitely, so by normalizing TF-IDF weights for every subject sentences have, all values are changed to 0 to 1 values. 
Each sentence is then assigned the subject with its maximum normalized TF-IDF weight, yielding a sentence group per subject. The last phase is summary generation. Sen2Vec is used to measure similarity between subject sentences, forming a similarity matrix, and sentences are selected iteratively to generate a summary that covers the original documents fully while minimizing internal duplication. For evaluation of the proposed method, 50,000 TripAdvisor reviews were used to construct the subject dictionaries and 23,087 reviews were used to generate summaries. A comparison between summaries from the proposed method and frequency-based summaries verified that the proposed method better retains the balance of subjects originally present in the documents.
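The second-phase allocation step, assigning each sentence the subject with the highest normalized TF-IDF weight over its subject dictionary, can be sketched as below. The dictionaries, sentences, and the `allocate` helper are illustrative, not taken from the study's corpus.

```python
import math

# Hypothetical subject dictionaries; in the study these come from
# topic modeling plus Word2Vec expansion.
dictionaries = {
    "room": {"room", "bed", "clean"},
    "food": {"breakfast", "meal", "dinner"},
}

sentences = [
    "the room was clean and the bed was comfortable",
    "breakfast was great and the dinner meal was tasty",
    "room and bed were spotless and breakfast was fine",
]

def subject_weight(tokens, subject, docs):
    """Sum of TF-IDF weights of the subject's dictionary terms in one sentence."""
    score = 0.0
    for term in dictionaries[subject]:
        tf = tokens.count(term)
        df = sum(1 for d in docs if term in d)
        if tf and df:
            score += tf * math.log(len(docs) / df)
    return score

def allocate():
    docs = [s.split() for s in sentences]
    # Raw per-subject TF-IDF weights for every sentence.
    raw = {subj: [subject_weight(d, subj, docs) for d in docs]
           for subj in dictionaries}
    # Normalize each subject's weights to [0, 1], since unbounded
    # TF-IDF values are not directly comparable.
    norm = {subj: [w / max(ws) if max(ws) > 0 else 0.0 for w in ws]
            for subj, ws in raw.items()}
    # Assign each sentence the subject with the maximum normalized weight.
    return [max(dictionaries, key=lambda s: norm[s][i])
            for i in range(len(sentences))]

print(allocate())  # one subject label per sentence
```

The resulting per-subject sentence groups feed the final Sen2Vec-based selection phase.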

A study on the classification of research topics based on COVID-19 academic research using Topic modeling (토픽모델링을 활용한 COVID-19 학술 연구 기반 연구 주제 분류에 관한 연구)

  • Yoo, So-yeon;Lim, Gyoo-gun
    • Journal of Intelligence and Information Systems
    • /
    • v.28 no.1
    • /
    • pp.155-174
    • /
    • 2022
  • From January 2020 to October 2021, more than 500,000 academic studies related to COVID-19 (a fatal respiratory syndrome caused by the SARS-CoV-2 coronavirus) were published. The rapid increase in the number of COVID-19 papers places time and technical constraints on healthcare professionals and policy makers trying to find important research quickly. Therefore, in this study, we propose a method of extracting useful information from the text of this extensive literature using the LDA and Word2vec algorithms. Papers related to searched keywords were extracted from the COVID-19 corpus, and their detailed topics were identified. The data come from the CORD-19 dataset on Kaggle, a free academic resource prepared by major research groups and the White House in response to the COVID-19 pandemic and updated weekly. The research method has two main parts. First, 41,062 articles were collected through data filtering and pre-processing of the abstracts of 47,110 academic papers with full text. Exploratory data analysis in Python was used to track the number of COVID-19 publications by year and to identify the top 10 most actively publishing journals. The LDA and Word2vec algorithms were then used to derive COVID-19 research topics, and after analyzing related words, their similarity was measured. Second, papers containing 'vaccine' and 'treatment' were extracted from the derived topics: a total of 4,555 papers related to 'vaccine' and 5,971 papers related to 'treatment'. For each subset, detailed topics were analyzed with the LDA and Word2vec algorithms, clustering via PCA dimension reduction was applied, and groups of papers with similar themes were visualized with the t-SNE algorithm. A noteworthy result is that topics which did not emerge from topic modeling over all COVID-19 papers were derived from the topic modeling results for each keyword subset. For example, topic modeling of the 'vaccine' papers extracted a new topic, Topic 05 'neutralizing antibodies'. A neutralizing antibody is an antibody that protects cells from infection when a virus enters the body and is said to play an important role in producing therapeutic agents and developing vaccines. Likewise, extracting topics from the 'treatment' papers uncovered a new topic, Topic 05 'cytokine'. A cytokine storm occurs when the body's immune cells attack normal cells instead of defending against the pathogen. Hidden topics that could not be found over the whole corpus were thus classified by keyword, and topic modeling was performed to find these detailed topics. In this study, we proposed a method of extracting topics from a large body of literature with the LDA algorithm and extracting similar words with the skip-gram variant of Word2vec, which predicts surrounding words from a center word. Combining the LDA and Word2vec models aimed at better performance by linking the document-topic relationships from LDA with the word-similarity relationships from Word2vec. In addition, clustering via PCA dimension reduction with t-SNE visualization was presented as an intuitive way to group documents with similar themes into a structured organization. At a time when the efforts of many researchers to overcome COVID-19 cannot keep up with the rapid publication of related papers, this approach can save healthcare professionals and policy makers precious time and effort and help them gain new insights quickly. It is also expected to serve as basic data for researchers exploring new research directions.
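The keyword-subset step, filtering papers that mention 'vaccine' or 'treatment' and then surfacing each subset's salient terms, can be illustrated with a crude frequency proxy for the per-subset LDA stage. The mini-corpus and the `subset_topics` helper are hypothetical; the study uses 41,062 CORD-19 abstracts and full LDA models.

```python
from collections import Counter

STOP = {"the", "of", "to", "is", "and", "by", "a", "in", "for", "on"}

# Hypothetical mini-corpus standing in for the filtered CORD-19 abstracts.
abstracts = [
    "neutralizing antibodies elicited by the vaccine block viral entry",
    "mrna vaccine trials report strong neutralizing antibody responses",
    "cytokine storm management is central to severe covid treatment",
    "antiviral treatment reduced cytokine levels in severe cases",
    "epidemiological modeling of transmission dynamics in cities",
]

def subset_topics(keyword, top_n=3):
    """Filter papers mentioning the keyword, then surface their most
    frequent content terms -- a frequency proxy for per-subset LDA."""
    subset = [d for d in abstracts if keyword in d.split()]
    counts = Counter(w for d in subset for w in d.split()
                     if w not in STOP and w != keyword)
    return [w for w, _ in counts.most_common(top_n)]

print(subset_topics("vaccine"))    # 'neutralizing' surfaces first
print(subset_topics("treatment"))  # 'cytokine' surfaces first
```

Even in this toy version, terms like 'neutralizing' and 'cytokine' dominate only within their keyword subsets, mirroring the paper's observation that subset-level topic modeling reveals topics hidden at the whole-corpus level.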


  • (34141) Korea Institute of Science and Technology Information, 245, Daehak-ro, Yuseong-gu, Daejeon
    Copyright (C) KISTI. All Rights Reserved.