• Title/Summary/Keyword: TextMining


Comparative co-expression analysis of RNA-Seq transcriptome revealing key genes, miRNA and transcription factor in distinct metabolic pathways in diabetic nerve, eye, and kidney disease

  • Asmy, Veerankutty Subaida Shafna;Natarajan, Jeyakumar
    • Genomics & Informatics / v.20 no.3 / pp.26.1-26.19 / 2022
  • Diabetes and its related complications are associated with long-term damage and failure of various organ systems. The microvascular complications of diabetes considered in this study are diabetic retinopathy, diabetic neuropathy, and diabetic nephropathy. The aim is to identify the weighted co-expressed and differentially expressed genes (DEGs), the major pathways, and the miRNAs, transcription factors (TFs), and drugs that interact with them in all three conditions. The primary goal is to identify vital DEGs common to all three conditions. Five genes (AKT1, NFKB1, MAPK3, PDPK1, and TNF) that overlapped between the DEGs and the co-expressed genes were defined as key genes, which were differentially expressed in all three conditions. Protein-protein interaction network analysis and gene set linkage analysis (GSLA) of the key genes were then performed. GSLA, gene ontology, and pathway enrichment analysis of the key genes identified nine major pathways in diabetes. Subsequently, we constructed and studied the miRNA-gene and transcription factor-gene regulatory networks of the five genes of interest in these nine pathways. hsa-mir-34a-5p was a major miRNA that interacted with all five genes. RELA, FOXO3, PDX1, and SREBF1 were the TFs interacting with the five major genes of interest. Finally, drug-gene interaction network analysis identified five potential drugs targeting the genes of interest. This research reveals biomarker genes, miRNAs, TFs, and therapeutic drugs in the key signaling pathways, which may help us understand the processes underlying all three secondary microvascular complications and aid in disease detection and management.
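
The key-gene definition above is a set intersection: genes that are differentially expressed in all three conditions and also appear among the co-expressed genes. A second set test finds regulators, such as a miRNA, linked to every key gene. The Python sketch below illustrates only these two set operations under stated assumptions; the helper names `key_genes` and `regulators_of_all` and all gene and miRNA labels are hypothetical placeholders, not data or code from the study.

```python
# Hypothetical sketch of the overlap logic; names and inputs are placeholders.

def key_genes(deg_sets, coexpressed):
    """Genes differentially expressed in every condition AND co-expressed."""
    common = set.intersection(*map(set, deg_sets))
    return common & set(coexpressed)

def regulators_of_all(edges, genes):
    """Regulators (e.g. miRNAs or TFs) with an interaction edge to every gene."""
    targets = {}
    for regulator, gene in edges:
        targets.setdefault(regulator, set()).add(gene)
    return sorted(r for r, t in targets.items() if set(genes) <= t)

# Hypothetical DEG sets for three conditions (retinopathy, neuropathy, nephropathy).
degs = [
    {"GENE1", "GENE2", "GENE3", "GENE4"},
    {"GENE1", "GENE2", "GENE3", "GENE5"},
    {"GENE1", "GENE2", "GENE3", "GENE6"},
]
coexpr = {"GENE1", "GENE2", "GENE7"}
keys = key_genes(degs, coexpr)
print(sorted(keys))  # ['GENE1', 'GENE2']

# Hypothetical (miRNA, target gene) interaction edges.
mirna_edges = [("MIR_A", "GENE1"), ("MIR_A", "GENE2"), ("MIR_B", "GENE1")]
print(regulators_of_all(mirna_edges, keys))  # ['MIR_A']
```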

A Study on Educational Data Mining for Public Data Portal through Topic Modeling Method with Latent Dirichlet Allocation (LDA기반 토픽모델링을 활용한 공공데이터 기반의 교육용 데이터마이닝 연구)

  • Seungki Shin
    • Journal of The Korean Association of Information Education / v.26 no.5 / pp.439-448 / 2022
  • This study aims to search for education-related datasets provided by public data portals and to examine, through classification with topic modeling, what types of data they contain. From the public data portal, 3,072 file datasets in the education field were collected based on the portal's classification system. Text mining was performed using LDA-based topic modeling after stopword removal and pre-processing of each dataset. The datasets pre-classified under education by the portal mainly provided program information and student-support notices. In contrast, the datasets collected by searching for education generally described the characteristics of educational programs and support information for people with disabilities, parents, the elderly, and children from a lifelong-education perspective. The results suggest that providing sufficient educational information through the public data portal would better support students' data-science-based decision-making and problem-solving skills.
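
LDA-based topic modeling, as used in this study, can be fitted in several ways; the abstract does not say which tool was used. The following is a minimal, from-scratch Python sketch of LDA fitted by collapsed Gibbs sampling, one standard inference method, with a stopword filter in front. The stopword list, the hyperparameters `alpha` and `beta`, the number of topics, and the English example titles are hypothetical (the study processed Korean dataset descriptions).

```python
import random
from collections import Counter

# Hypothetical English stopword list; the study worked with Korean text.
STOPWORDS = {"the", "for", "of", "and", "a", "in", "to"}

def preprocess(text):
    """Lowercase, split on whitespace, and drop stopwords."""
    return [w for w in text.lower().split() if w not in STOPWORDS]

def lda_gibbs(docs, k, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling.

    Returns (theta, topic_word): per-document topic proportions and
    per-topic word counts.
    """
    rng = random.Random(seed)
    v = len({w for d in docs for w in d})     # vocabulary size
    ndk = [[0] * k for _ in docs]             # document-topic counts
    nkw = [Counter() for _ in range(k)]       # topic-word counts
    nk = [0] * k                              # total tokens per topic
    z = []                                    # topic assigned to each token
    for d, doc in enumerate(docs):            # random initial assignment
        z.append([])
        for w in doc:
            t = rng.randrange(k)
            z[d].append(t)
            ndk[d][t] += 1
            nkw[t][w] += 1
            nk[t] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                t = z[d][i]                   # remove the token's current topic
                ndk[d][t] -= 1
                nkw[t][w] -= 1
                nk[t] -= 1
                # p(topic j | rest) proportional to
                # (n_dj + alpha) * (n_jw + beta) / (n_j + V * beta)
                weights = [(ndk[d][j] + alpha) * (nkw[j][w] + beta) / (nk[j] + v * beta)
                           for j in range(k)]
                t = rng.choices(range(k), weights=weights)[0]
                z[d][i] = t                   # add the token back under its new topic
                ndk[d][t] += 1
                nkw[t][w] += 1
                nk[t] += 1
    theta = [[(ndk[d][j] + alpha) / (len(doc) + k * alpha) for j in range(k)]
             for d, doc in enumerate(docs)]
    return theta, nkw

def top_words(topic_word, n=3):
    """Most frequent words in each topic (zero counts skipped)."""
    return [[w for w, c in counts.most_common(n) if c > 0] for counts in topic_word]

titles = [  # hypothetical dataset titles
    "after school program for students",
    "scholarship notice for students",
    "lifelong education program for the elderly",
    "education support for parents and children",
]
docs = [preprocess(t) for t in titles]
theta, topic_word = lda_gibbs(docs, k=2, iters=100)
print(top_words(topic_word))
```

With a real corpus, the number of topics would be chosen by inspecting topic coherence or interpretability rather than fixed at 2.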

Analysis of the ESG Research Trend : Focusing on SCOPUS DB (ESG 주요 연구 동향 분석: SCOPUS DB를 중심으로)

  • Kyoo-Sung Noh
    • Journal of Digital Convergence / v.21 no.2 / pp.9-16 / 2023
  • The purpose of this study is to analyze research trends on ESG (Environmental, Social, and Governance) and to suggest how companies and investors can use ESG information. To this end, text mining, one of the unstructured data mining techniques, was used for analysis. Paper abstracts published from January 2014 to February 2023 were collected from the SCOPUS database; the most common subject area was Economics, Econometrics and Finance. The United States and China published the most ESG papers, and Korea ranked sixth in the world. This study is meaningful in that it analyzed the main research trends of ESG using text mining techniques such as LDA topic modeling. The analysis confirmed that ESG research is being conducted across various fields rather than in a single field, and the study differs from previous work in that it analyzed various influencing factors and ripple effects of ESG.

A Study on Plagiarism Detection and Document Classification Using Association Analysis (연관분석을 이용한 효과적인 표절검사 및 문서분류에 관한 연구)

  • Hwang, Insoo
    • The Journal of Information Systems / v.23 no.3 / pp.127-142 / 2014
  • Plagiarism occurs when content is copied without permission or citation, and the problem has grown rapidly with the abundance of digital resources available on the World Wide Web. An important task in plagiarism detection is measuring and identifying similar text portions between a given pair of documents. One of the main difficulties of this task is that not all similar text fragments are plagiarism, since thematic coincidences also tend to produce similar text. To handle this problem, this paper proposes using association analysis from data mining to detect plagiarism. The method can detect common actions performed by plagiarists, such as word deletion, insertion, and transposition, allowing it to identify plausible portions of plagiarized text. Experimental results employing an unsupervised document classification strategy showed that the proposed method outperformed traditionally used approaches.
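
The abstract does not give the details of its association analysis. As a simplified illustration in the same spirit, the Python sketch below treats each sentence as a transaction and its unordered word pairs as 2-itemsets, then flags sentence pairs whose itemset overlap (Jaccard similarity) reaches a threshold. Because itemsets ignore order, transposition leaves the score unchanged, and deletion or insertion lowers it only partially. The function names, the threshold of 0.5, and the example sentences are assumptions, not the paper's method.

```python
from itertools import combinations

def pair_itemsets(sentence):
    """Unordered word pairs (2-itemsets) of a sentence, so word order is ignored."""
    words = sorted(set(sentence.lower().split()))
    return set(combinations(words, 2))

def jaccard(a, b):
    """Share of itemsets common to both sentences (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def suspicious_pairs(suspect_doc, source_doc, threshold=0.5):
    """Return (suspect_index, source_index, score) for sentence pairs at or above threshold."""
    source_items = [pair_itemsets(s) for s in source_doc]
    hits = []
    for i, sentence in enumerate(suspect_doc):
        items = pair_itemsets(sentence)
        for j, src in enumerate(source_items):
            score = jaccard(items, src)
            if score >= threshold:
                hits.append((i, j, round(score, 2)))
    return hits

source = ["plagiarism detection measures similar text portions"]
suspect = [
    "similar text portions plagiarism detection measures",  # transposition
    "plagiarism detection measures text portions",          # deletion
    "the weather is nice today",                             # unrelated
]
print(suspicious_pairs(suspect, source))  # [(0, 0, 1.0), (1, 0, 0.67)]
```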

The Informative Support and Emotional Support Classification Model for Medical Web Forums using Text Analysis (의료 웹포럼에서의 텍스트 분석을 통한 정보적 지지 및 감성적 지지 유형의 글 분류 모델)

  • Woo, Jiyoung;Lee, Min-Jung;Ku, Yungchang
    • Journal of Information Technology Services / v.11 no.sup / pp.139-152 / 2012
  • In medical web forums, patients and patients' families share medical experiences and information. Some people search for medical information written in non-expert language, and others offer words of comfort to those suffering from diseases. Medical web forums thus provide both informative support and emotional support. We propose a model that automatically classifies articles in medical web forums as informative support or emotional support. We extract text features of forum articles from a linguistic perspective using text mining techniques and then perform supervised learning to classify the texts into informative support and emotional support types. We adopt Support Vector Machine (SVM), Naive Bayes, and decision tree classifiers for automatic classification. We apply the proposed model to the HealthBoards forum, one of the largest and most dynamic medical web forums.
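
Of the three classifiers named above, Naive Bayes is the simplest to sketch. The Python example below is a minimal multinomial Naive Bayes with add-one (Laplace) smoothing over bag-of-words features; it is not the paper's model, which used linguistically motivated features, and the class labels and training posts are hypothetical.

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Multinomial Naive Bayes over whitespace tokens with add-one smoothing."""

    def fit(self, texts, labels):
        self.class_docs = Counter(labels)           # documents per class (priors)
        self.word_counts = defaultdict(Counter)     # word counts per class
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        total_docs = sum(self.class_docs.values())
        best, best_score = None, -math.inf
        for label, n_docs in self.class_docs.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)   # log prior
            for w in text.lower().split():          # add log likelihood per token
                score += math.log((counts[w] + 1) / denom)
            if score > best_score:
                best, best_score = label, score
        return best

texts = [  # hypothetical forum posts
    "what dosage of metformin is recommended",
    "which test confirms the diagnosis",
    "stay strong we are praying for you",
    "sending hugs you are not alone",
]
labels = ["informational", "informational", "emotional", "emotional"]
model = NaiveBayes().fit(texts, labels)
print(model.predict("what test is recommended for diagnosis"))  # informational
print(model.predict("you are not alone"))                       # emotional
```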

Using Text Network Analysis for Analyzing Academic Papers in Nursing (간호학 학술논문의 주제 분석을 위한 텍스트네크워크분석방법 활용)

  • Park, Chan Sook
    • Perspectives in Nursing Science / v.16 no.1 / pp.12-24 / 2019
  • Purpose: This study examined the suitability of text network analysis (TNA) methodology for topic analysis of academic papers related to nursing. Methods: TNA background theories, software programs, and research processes are described. Additionally, the research methodology of studies that applied TNA to topic analysis of academic nursing papers was analyzed. Results: As background theories, we explained information theory, word co-occurrence analysis, graph theory, network theory, and social network analysis. The TNA procedure was described as follows: 1) collection of academic articles, 2) text extraction, 3) preprocessing, 4) generation of word co-occurrence matrices, 5) social network analysis, and 6) interpretation and discussion. Conclusion: TNA using author keywords has several advantages: it can use recognized terms, such as MeSH headings or terms chosen by professionals, and it saves time and effort. Additionally, the study emphasizes the need for a sophisticated research design that explores nursing research trends multidimensionally by applying TNA methodology.
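
Steps 4 and 5 of the TNA procedure can be sketched in a few lines: count how often author keywords co-occur in the same paper, then compute a network measure on the resulting graph. The Python sketch below uses normalized degree centrality as one example measure; the keyword lists and function names are hypothetical, and a real study would typically use a dedicated network-analysis package and several centrality measures.

```python
from collections import Counter
from itertools import combinations

def cooccurrence(keyword_lists):
    """Count unordered keyword pairs that appear together in the same paper."""
    counts = Counter()
    for keywords in keyword_lists:
        unique = sorted({k.lower() for k in keywords})
        for pair in combinations(unique, 2):
            counts[pair] += 1
    return counts

def degree_centrality(edges):
    """Normalized degree centrality: number of neighbours / (n - 1) per node."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    n = len(neighbours)
    if n < 2:
        return {}
    return {node: len(nb) / (n - 1) for node, nb in neighbours.items()}

papers = [  # hypothetical author-keyword lists
    ["Nursing", "Burnout", "Stress"],
    ["Nursing", "Stress"],
    ["Nursing", "Education"],
]
matrix = cooccurrence(papers)
print(matrix[("nursing", "stress")])  # 2
print(degree_centrality(matrix.keys()))
```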

Research on Construction Quality Problem Prevention

  • Shaohua Jiang;Jingqi Zhang
    • International conference on construction engineering and project management / 2024.07a / pp.846-854 / 2024
  • Preventing construction quality problems contributes directly to a project's success. However, quality-problem prevention frequently overlooks how problems are coupled with one another, which can result in a domino effect of quality issues. To address this, this work first preprocesses unstructured text data describing coupled quality problems. The preprocessed data are then used to build a knowledge base for preventing construction quality problems. Next, a text similarity algorithm is used to mine the coupling relationships between quality problems and enrich the information in the database. Finally, sample texts are used as test objects to verify the validity of the method. This study enriches research on the prevention of building quality problems.
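
The abstract does not name its text similarity algorithm. Assuming a simple bag-of-words cosine similarity, the Python sketch below links pairs of quality-problem descriptions whose similarity reaches a threshold, treating them as candidate coupling relationships. The threshold of 0.3, the function names, and the example records are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

def cosine(a, b):
    """Cosine similarity of two term-frequency Counters."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def coupled_problems(records, threshold=0.3):
    """Return (i, j, score) for record pairs whose descriptions are similar."""
    vecs = [Counter(r.lower().split()) for r in records]
    links = []
    for i, j in combinations(range(len(records)), 2):
        s = cosine(vecs[i], vecs[j])
        if s >= threshold:
            links.append((i, j, round(s, 3)))
    return links

records = [  # hypothetical quality-problem descriptions
    "concrete cracking caused by insufficient curing",
    "insufficient curing led to concrete surface cracking",
    "waterproof membrane leakage at roof joints",
]
print(coupled_problems(records))  # [(0, 1, 0.617)]
```

Cosine similarity on raw term counts is the simplest choice; TF-IDF weighting or embedding-based similarity would reduce the influence of common construction vocabulary.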

Opinion-Mining Methodology for Social Media Analytics

  • Kim, Yoosin;Jeong, Seung Ryul
    • KSII Transactions on Internet and Information Systems (TIIS) / v.9 no.1 / pp.391-406 / 2015
  • Social media have emerged as new communication channels between consumers and companies that generate a large volume of unstructured text data. This social media content, which contains consumers' opinions and interests, is recognized as valuable material from which businesses can mine useful information; consequently, many researchers have reported on opinion-mining frameworks, methods, techniques, and tools for business intelligence across various industries. These studies sometimes focused on how to use opinion mining in business fields or emphasized content-analysis methods that achieve more accurate results. They also considered how to visualize the results so that they are easier to understand. However, we found that such approaches are often technically complex and not user-friendly enough to support business decisions and planning. Therefore, in this study we attempt to formulate a more comprehensive and practical methodology for social media opinion mining and apply it to a case study of the oldest instant noodle product in Korea. We also present graphical tools and visualized outputs, including volume and sentiment graphs, time-series graphs, a topic word cloud, a heat map, and a valence tree map with a classification. Our data come from public-domain social media content, such as blogs, forum messages, and news articles, which we analyze with natural language processing, statistics, and graphics packages in the free R environment. We believe our methodology and visualization outputs can provide a practical and reliable guide for immediate use, not only in the food industry but in other industries as well.
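
Volume and sentiment time-series graphs like those listed above rest on a per-period aggregation of post counts and sentiment labels. The study worked in R; the Python sketch below shows the same aggregation idea with a tiny lexicon-based scorer. The lexicon, the date format, the function names, and the posts are hypothetical, and the study's actual sentiment method is not specified in the abstract.

```python
from collections import defaultdict

# Hypothetical polarity lexicon (+1 positive, -1 negative).
LEXICON = {"delicious": 1, "love": 1, "classic": 1, "salty": -1, "expensive": -1}

def score(text):
    """Sum of word polarities: >0 positive, <0 negative, 0 neutral."""
    return sum(LEXICON.get(w, 0) for w in text.lower().split())

def daily_summary(posts):
    """Aggregate (date, text) posts into per-date volume and sentiment counts."""
    summary = defaultdict(lambda: {"volume": 0, "positive": 0, "negative": 0})
    for date, text in posts:
        day = summary[date]
        day["volume"] += 1
        s = score(text)
        if s > 0:
            day["positive"] += 1
        elif s < 0:
            day["negative"] += 1
    return dict(summary)

posts = [  # hypothetical (date, text) posts
    ("2014-03-01", "I love this classic noodle"),
    ("2014-03-01", "too salty and expensive"),
    ("2014-03-02", "delicious as always"),
]
print(daily_summary(posts))
# {'2014-03-01': {'volume': 2, 'positive': 1, 'negative': 1},
#  '2014-03-02': {'volume': 1, 'positive': 1, 'negative': 0}}
```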

Sentiment Analysis and Network Analysis based on Review Text (리뷰 텍스트 기반 감성 분석과 네트워크 분석에 관한 연구)

  • Kim, Yumi;Heo, Go Eun
    • Journal of the Korean Society for Library and Information Science / v.55 no.3 / pp.397-417 / 2021
  • Because review text contains customers' experiences and opinions, analyzing it helps in understanding the subject of the reviews. Existing studies used either sentiment analysis alone, to identify customers' assessments of different features of restaurants from online reviews, or network analysis alone, to identify customers' preferences. In this study, we conducted both sentiment analysis and network analysis on the review text of restaurants with high star ratings and those with low star ratings. We compared the review text of the two groups to identify the differences between them and determine what makes great restaurants great.

An Efficient Machine Learning-based Text Summarization in the Malayalam Language

  • P Haroon, Rosna;Gafur M, Abdul;Nisha U, Barakkath
    • KSII Transactions on Internet and Information Systems (TIIS) / v.16 no.6 / pp.1778-1799 / 2022
  • Automatic text summarization is a procedure that condenses a large amount of content into a shorter text that retains the important information. Malayalam is a language used in certain areas of India, most commonly in Kerala and Lakshadweep, and is considered one of the most difficult languages to process. Natural language processing for Malayalam remains relatively underdeveloped because of the complexity of the language and the scarcity of available resources. In this paper, an approach is proposed for summarizing Malayalam documents by training a model based on the Support Vector Machine (SVM) classification algorithm. Different features of the text are used to train the model so that the system can output the most important information from the input text. The classifier assigns sentences to four classes (most important, important, average, and least significant), and based on these classes the system creates a summary of the input document. The user can select a compression ratio that determines what fraction of the input document the summary retains. Model performance is measured using Malayalam documents of different genres as well as documents from the same domain. The model is evaluated using the content evaluation measures precision, recall, F-score, and relative utility. The obtained precision and recall values indicate that the model is reliable and more relevant than other summarizers.
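
Given a per-sentence importance class, building the summary and scoring it reduces to selection and set arithmetic. In the Python sketch below, the classes are supplied directly where the paper would use its SVM classifier, the compression ratio is assumed to be the fraction of sentences kept, and precision, recall, and F-score are computed at sentence level against a reference summary. Relative utility is not shown. All names and example sentences are hypothetical.

```python
import math

# Importance classes from the abstract, best first.
RANK = {"most important": 0, "important": 1, "average": 2, "least significant": 3}

def summarize(sentences, classes, ratio):
    """Keep ceil(ratio * n) sentences with the best classes, in original order.

    `classes` stands in for the output of a trained classifier (an SVM in the paper).
    Ties within a class are broken by position in the document.
    """
    n_keep = max(1, math.ceil(ratio * len(sentences)))
    ranked = sorted(range(len(sentences)), key=lambda i: (RANK[classes[i]], i))
    return [sentences[i] for i in sorted(ranked[:n_keep])]

def precision_recall_f(selected, reference):
    """Sentence-level precision, recall, and F-score against a reference summary."""
    sel, ref = set(selected), set(reference)
    tp = len(sel & ref)
    p = tp / len(sel) if sel else 0.0
    r = tp / len(ref) if ref else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

sentences = ["S1", "S2", "S3", "S4", "S5"]
classes = ["average", "most important", "least significant", "important", "average"]
summary = summarize(sentences, classes, ratio=0.5)
print(summary)                                    # ['S1', 'S2', 'S4']
print(precision_recall_f(summary, ["S2", "S4"]))  # precision 2/3, recall 1.0, F-score about 0.8
```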