• Title/Summary/Keyword: virus

A study on the classification of research topics based on COVID-19 academic research using Topic modeling (토픽모델링을 활용한 COVID-19 학술 연구 기반 연구 주제 분류에 관한 연구)

  • Yoo, So-yeon;Lim, Gyoo-gun
    • Journal of Intelligence and Information Systems / v.28 no.1 / pp.155-174 / 2022
  • From January 2020 to October 2021, more than 500,000 academic studies related to COVID-19 (the disease caused by severe acute respiratory syndrome coronavirus 2) were published. This rapid growth in the literature places time and technical constraints on healthcare professionals and policy makers who need to find important research quickly. In this study, we therefore propose a method of extracting useful information from the text of an extensive body of literature using the LDA and Word2vec algorithms. Papers matching the search keywords were extracted from the COVID-19 papers, and their detailed topics were identified. The data came from the CORD-19 data set on Kaggle, a free academic resource prepared by major research groups and the White House in response to the COVID-19 pandemic and updated weekly. The research method has two main parts. First, 41,062 articles were obtained by filtering and pre-processing the abstracts of 47,110 academic papers with full text. Exploratory data analysis in Python was used to analyze the number of COVID-19 publications by year and to identify the top 10 journals with active research. LDA and Word2vec were then used to derive research topics related to COVID-19, and similarity was measured after analyzing related words. Second, papers containing 'vaccine' and 'treatment' were extracted from the topics derived from all papers, yielding 4,555 papers related to 'vaccine' and 5,971 papers related to 'treatment'. For each set of papers, detailed topics were analyzed using LDA and Word2vec, and clustering after PCA dimension reduction was applied so that groups of papers with similar themes could be visualized with the t-SNE algorithm. A noteworthy result is that topics that did not appear among the topics derived from all COVID-19 papers did appear in the topic modeling results for the individual research topics. For example, topic modeling of the papers related to 'vaccine' extracted a new topic, Topic 05 'neutralizing antibodies'. A neutralizing antibody protects cells from infection when a virus enters the body and is said to play an important role in the production of therapeutic agents and in vaccine development. Likewise, topic modeling of the papers related to 'treatment' revealed a new topic, Topic 05 'cytokine'. A cytokine storm occurs when the body's immune cells, rather than defending against attack, attack normal cells. Hidden topics that could not be found across the full set of papers were thus uncovered by classifying papers by keyword and performing topic modeling on each group. In this study, we proposed a method of extracting topics from a large body of literature using the LDA algorithm and extracting similar words using Skip-gram, the Word2vec model that predicts surrounding words from a central word. Combining the LDA and Word2vec models was intended to improve performance by capturing both the relationships between documents and LDA topics and the relationships learned by Word2vec. In addition, we presented a clustering method based on PCA dimension reduction that uses the t-SNE technique to group documents with similar themes, giving an intuitive, structured organization of the documents. At a time when researchers' efforts to overcome COVID-19 cannot keep pace with the rapid publication of related papers, we hope this approach will save healthcare professionals and policy makers precious time and effort and help them gain new insights quickly. It is also expected to serve as basic data for researchers exploring new research directions.
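The Skip-gram idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: it only shows how (center word, context word) training pairs are formed, and the function name `skipgram_pairs`, the `window` parameter, and the sample tokens are assumptions made for illustration.

```python
def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs for every token within `window` positions.

    Skip-gram trains a model to predict each context word from its center
    word; this helper only builds the pairs and does no training.
    """
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs


# Hypothetical pre-processed abstract fragment (illustrative only).
tokens = ["neutralizing", "antibodies", "protect", "cells"]
pairs = skipgram_pairs(tokens, window=1)
for center, context in pairs:
    print(center, "->", context)
```

A full Word2vec model would feed such pairs into a shallow neural network; words that share many contexts end up with similar vectors, which is what makes the similarity measurement described above possible.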

  • Bacterial Blight Resistance Genes Pyramided in Mid-Late Maturing Rice Cultivar 'Sinjinbaek' with High Grain Quality (벼흰잎마름병 저항성 유전자 집적 고품질 중만생 벼 '신진백')

    • Park, Hyun-Su;Kim, Ki-Young;Baek, Man-Kee;Cho, Young-Chan;Kim, Bo-Kyeong;Nam, Jeong-Kwon;Shin, Woon-Chul;Kim, Woo-Jae;Ko, Jong-Cheol;Kim, Jeong-Ju;Jeong, Jong-Min;Jeung, Ji-Ung;Lee, Keon-Mi;Park, Seul-Gi;Lee, Chang-Min;Kim, Choon-Song;Suh, Jung-Pil;Lee, Jeom-Ho
      • Korean Journal of Breeding Science / v.51 no.3 / pp.263-276 / 2019
    • 'Sinjinbaek' is a bacterial blight (BB)-resistant, mid-late maturing rice cultivar with high grain quality. To diversify the resistance genes and enhance the resistance of Korean rice cultivars against BB, 'Sinjinbaek' was developed from a cross between 'Iksan493' (cultivar name 'Jinbaek') and the F1 cross between 'Hopum' and 'HR24670-9-2-1' ('HR24670'). 'Jinbaek' is a BB-resistant cultivar with two BB resistance genes, Xa3 and xa5. 'Hopum' is a high grain quality cultivar with the Xa3 resistance gene. 'HR24670' is a near-isogenic line that carries the Xa21 gene, a resistance gene inherited from a wild rice species O. longistaminata, in the genetic background of japonica elite rice line 'Suweon345'. 'Sinjinbaek' was selected through the pedigree method, yield trials, and local adaptability tests. Using bioassay for BB races and DNA markers for resistance genes, three resistance genes, Xa3, xa5, and Xa21, were pyramided in the 'Sinjinbaek' cultivar. 'Sinjinbaek' exhibited high-level and broad-spectrum resistance against BB, including the K3a race, the most virulent race in Korea. 'Sinjinbaek' is a mid-late maturing rice cultivar tolerant to lodging. It has multiple disease resistance against BB, rice blast, and stripe virus. The yield of 'Sinjinbaek' was similar to that of 'Nampyeong'. 'Sinjinbaek' showed excellent grain appearance, good taste of cooked rice, and enhanced milling performance, and we concluded that it could contribute to improving the quality of BB-resistant cultivars. 'Sinjinbaek' was successfully introgressed with the Xa21 gene without the linkage drag negatively affecting its agronomic characteristics. 'Sinjinbaek' improved the resistance of Korean rice cultivars against BB by introgression of a new resistance gene, Xa21, as well as by pyramiding three resistance genes, Xa3, xa5, and Xa21. 
'Sinjinbaek' is suitable for cultivation in BB-prone areas and has been used in breeding programs to enhance resistance to BB (Registration No. 7273).

    Chronic HBV Infection in Children: The histopathologic classification and its correlation with clinical findings (소아의 만성 B형 간염: 새로운 병리조직학적 분류와 임상 소견의 상관 분석)

    • Lee, Seon-Young;Ko, Jae-Sung;Kim, Chong-Jai;Jang, Ja-June;Seo, Jeong-Kee
      • Pediatric Gastroenterology, Hepatology & Nutrition / v.1 no.1 / pp.56-78 / 1998
    • Objective: Chronic hepatitis B infection (CHB) occurs in 6% to 10% of the population in Korea. In ethnic communities where the prevalence of chronic infection is high, such as Korea, hepatitis B is transmitted either vertically (i.e., by perinatal infection) or by close family contact (usually from mothers or siblings) during the first 5 years of life. Chronic hepatitis B infection becomes more common the earlier a person is exposed to the virus, particularly in fetal and neonatal life. It can progress to cirrhosis and hepatocellular carcinoma, especially with severe liver damage and perinatal infection. The histopathology of CHB is important in evaluating final outcomes. A numerical scoring system, which provides a semiquantitative, objective, and reproducible classification of chronic viral hepatitis, is a valuable tool for statistical analysis when predicting outcomes and evaluating antiviral and other therapies. In this study, a numerical scoring system (the Ludwig system) was applied and compared with the conventional histological classification of De Groote. Clinical findings, family history, serology, and liver function tests were also compared across histopathological findings in children with chronic hepatitis B. Methods: Ninety-nine patients (mean age 9 years; range 17 months to 16 years) with clinical, biochemical, serological, and histological patterns of chronic HBV infection were included in this study. Five of these children had hepatocellular carcinoma. There were 83 male and 16 female children. All underwent liver biopsy, and histologic evaluation was performed by one pathologist. The biopsy specimens were classified according to the standard criteria of De Groote as follows: normal, chronic lobular hepatitis (CLH), chronic persistent hepatitis (CPH), mild to severe chronic active hepatitis (CAH), active cirrhosis, inactive cirrhosis, or hepatocellular carcinoma (HCC). The biopsy specimens were also assessed and scored semiquantitatively using the Ludwig numerical scoring system. Serum HBsAg, anti-HBs, HBeAg, anti-HBe, anti-HBc (IgG, IgM), and HDV were measured by radioimmunoassay. Results: Males predominated at a ratio of 5.2:1 among all patients. Of the 99 patients, 2 had normal histology, 2 had CLH, 22 had CPH, 40 had mild CAH, 19 had moderate CAH, 1 had severe CAH, 7 had active cirrhosis, 1 had inactive cirrhosis, and 5 had HCC. Mean age, sex distribution, symptoms, signs, and family history did not differ statistically among the histologic groups. The numerical scoring system correlated well with the conventional histological classification. Histological activity, evaluated by both the conventional classification and the scoring system, was more severe when serum aminotransferase levels were higher. However, serum aminotransferase levels were not useful for predicting the degree of histologic activity because their ranges overlapped widely. As histological activity became more severe, and especially as cirrhosis progressed, the prothrombin time became more prolonged. Histological severity was inversely related to the duration of HBeAg seroconversion. Conclusions: Histological activity could not be accurately predicted from clinical and biochemical findings, but could be assessed by proper histological classification of the biopsy specimen with the numerical scoring system. The numerical scoring system correlated well with the conventional histological classification and appears to be a valuable tool for statistical analysis when predicting outcomes and evaluating the effects of antiviral and other therapies in children with chronic hepatitis B.

    Implementation of integrated monitoring system for trace and path prediction of infectious disease (전염병의 경로 추적 및 예측을 위한 통합 정보 시스템 구현)

    • Kim, Eungyeong;Lee, Seok;Byun, Young Tae;Lee, Hyuk-Jae;Lee, Taikjin
      • Journal of Internet Computing and Services / v.14 no.5 / pp.69-76 / 2013
    • The incidence of globally infectious and pathogenic diseases such as H1N1 (swine flu) and avian influenza (AI) has recently increased. An infectious disease is a disease caused by a pathogen that can be passed from an infected person to a susceptible host. The pathogens of infectious diseases, which include bacilli, spirochetes, rickettsiae, viruses, fungi, and parasites, cause various symptoms such as respiratory disease, gastrointestinal disease, liver disease, and acute febrile illness. They can spread through various routes such as food, water, insects, breathing, and contact with other people. Most countries now use mathematical models to predict and prepare for the spread of infectious diseases. In modern society, however, infectious diseases spread quickly and in complicated ways because of the rapid development of transportation (both ground and underground), leaving too little time to predict their spread. A new system that can help prevent the spread of infectious diseases by predicting their pathways is therefore needed. In this study, we develop an integrated monitoring system that tracks and predicts the pathway of an infectious disease for real-time monitoring and control. The system is based on the conventional mathematical model known as the Susceptible-Infectious-Recovered (SIR) model. The proposed model accounts for both inter- and intra-city modes of transportation, including bus, train, car, and airplane, to represent interpersonal contact (i.e., migration flow). Real data, modified according to the geographical characteristics of Korea, are used to reflect realistic conditions for disease spread in Korea. By controlling the model's parameters, we can predict where and when vaccination needs to be performed. The simulation includes several assumptions and scenarios. Using data from Statistics Korea, five major cities assumed to have the most population migration were chosen: Seoul, Incheon (Incheon International Airport), Gangneung, Pyeongchang, and Wonju. The cities were assumed to be connected in one network, with the disease spreading only through the specified modes of transportation. Daily traffic volume was obtained from the Korean Statistical Information Service (KOSIS), and the population of each city from Statistics Korea. Data on H1N1 (swine flu) were provided by the Korea Centers for Disease Control and Prevention, and air transport statistics were obtained from the Aeronautical Information Portal System. These data were adjusted to reflect current conditions in Korea and several realistic assumptions and scenarios. Three scenarios were simulated, each with H1N1 first occurring at Incheon International Airport: no vaccination in any city, vaccination in Seoul, and vaccination in Pyeongchang. The number of days for the number of infected to reach its peak and the proportion of Infectious (I) were compared. Without vaccination, the peak was reached earliest in Seoul (37 days) and latest in Pyeongchang (43 days), and the proportion of I was highest in Seoul and lowest in Pyeongchang. With vaccination in Seoul, the peak was again reached earliest in Seoul (37 days) and latest in Pyeongchang (43 days), while the proportion of I was highest in Gangneung and lowest in Pyeongchang. With vaccination in Pyeongchang, the peak was likewise reached earliest in Seoul (37 days) and latest in Pyeongchang (43 days), and the proportion of I was highest in Gangneung and lowest in Pyeongchang. These results confirm that H1N1, after its first occurrence, spreads in proportion to the traffic volume in each city. Because the infection pathway differs with the traffic volume in each city, preventive measures against infectious disease can be devised by tracking and predicting its pathway through the analysis of traffic volume.
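The kind of model described in this abstract, an SIR model coupled across cities by traffic flow, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' system: it uses a daily Euler step, treats travellers as moving permanently between cities, models vaccination as moving a fraction of residents from S to R before day 0, and uses made-up city labels, populations, flows, and rates (`beta`, `gamma`) rather than the KOSIS or KCDC data.

```python
def simulate_sir(populations, flows, beta, gamma, seed_city, days, vaccinated=None):
    """Multi-city SIR with a daily Euler step and travel coupling.

    populations: {city: residents}
    flows:       {(src, dst): travellers per day}
    vaccinated:  {city: fraction moved from S to R before day 0}
    Returns {city: (peak_day, peak_infectious_fraction)}.
    """
    vaccinated = vaccinated or {}
    S, I, R = {}, {}, {}
    for city, n in populations.items():
        immune = vaccinated.get(city, 0.0) * n
        S[city], I[city], R[city] = n - immune, 0.0, immune
    # One initial case in the seed city.
    S[seed_city] -= 1.0
    I[seed_city] += 1.0

    peaks = {city: (0, 0.0) for city in populations}
    for day in range(1, days + 1):
        # Local infection and recovery in each city.
        for c in populations:
            n = S[c] + I[c] + R[c]
            new_inf = beta * S[c] * I[c] / n
            new_rec = gamma * I[c]
            S[c] -= new_inf
            I[c] += new_inf - new_rec
            R[c] += new_rec
        # Travel: each flow moves people in proportion to compartment sizes.
        delta = {c: [0.0, 0.0, 0.0] for c in populations}
        for (src, dst), travellers in flows.items():
            rate = travellers / (S[src] + I[src] + R[src])
            for k, comp in enumerate((S, I, R)):
                moved = rate * comp[src]
                delta[src][k] -= moved
                delta[dst][k] += moved
        for c in populations:
            S[c] += delta[c][0]
            I[c] += delta[c][1]
            R[c] += delta[c][2]
            frac = I[c] / (S[c] + I[c] + R[c])
            if frac > peaks[c][1]:
                peaks[c] = (day, frac)
    return peaks


# Hypothetical inputs (illustrative only, not data from the study).
populations = {"A": 1_000_000, "B": 200_000, "C": 50_000}
flows = {("A", "B"): 2000, ("B", "A"): 2000, ("A", "C"): 200, ("C", "A"): 200}

baseline = simulate_sir(populations, flows, beta=0.5, gamma=0.2, seed_city="A", days=200)
vaccinated_c = simulate_sir(populations, flows, beta=0.5, gamma=0.2, seed_city="A",
                            days=200, vaccinated={"C": 0.5})
for city in populations:
    print(city, baseline[city], vaccinated_c[city])
```

Comparing the peak day and peak infectious fraction per city across the two runs mirrors the comparison made in the abstract between the no-vaccination and vaccination scenarios.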