• Title/Summary/Keyword: Python package

Search results: 29

Integrative Comparison of Burrows-Wheeler Transform-Based Mapping Algorithm with de Bruijn Graph for Identification of Lung/Liver Cancer-Specific Gene

  • Ajaykumar, Atul; Yang, Jung Jin
    • Journal of Microbiology and Biotechnology / v.32 no.2 / pp.149-159 / 2022
  • Cancers of the lung and liver are among the ten leading causes of cancer death worldwide. Thus, it is essential to identify the genes specifically expressed in these two cancer types to develop new therapeutics. Although abundant messenger RNA (mRNA) sequencing data related to these cancers are available thanks to the advancement of next-generation sequencing (NGS) technologies, optimized data processing methods are needed to identify novel cancer-specific genes. Here, we conducted an analytical comparison between Bowtie2, a Burrows-Wheeler transform-based alignment tool, and Kallisto, which adopts pseudo-alignment based on a transcriptome de Bruijn graph, using mRNA sequencing data on normal cells and lung/liver cancer tissues. Before the cancer data were used, simulated mRNA sequencing reads were generated and the high Transcripts Per Million (TPM) values were compared. mRNA sequencing reads from lung/liver cancer cells were also extracted and quantified. While Kallisto outputs TPM values directly, Bowtie2 provides read counts; TPM values were therefore calculated by processing the Sequence Alignment Map (SAM) file in R with the Rsubread package and subsequently in Python. The analysis of the simulated sequencing data revealed that Kallisto detected more transcripts and showed higher overlap than Bowtie2. Evaluating the two data processing methods with known lung cancer biomarkers showed that, in standard settings without any dedicated quality control, Kallisto produces faster and more accurate results than Bowtie2. The same conclusion was confirmed with known biomarkers specific to liver cancer.
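
The count-to-TPM conversion described above, applied to the Bowtie2 output, reduces to a short normalization. A minimal sketch with synthetic counts and transcript lengths (not the paper's actual Rsubread/Python pipeline):

```python
# Convert per-transcript read counts to TPM values (synthetic example data).
import numpy as np

counts = np.array([500.0, 1200.0, 300.0])   # reads assigned to each transcript
lengths_kb = np.array([1.5, 2.0, 0.8])      # transcript lengths in kilobases

rpk = counts / lengths_kb                   # reads per kilobase
tpm = rpk / rpk.sum() * 1e6                 # scale so all TPMs sum to one million
print(tpm.round(1))
```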

Using Roots and Patterns to Detect Arabic Verbs without Affixes Removal

  • Abdulmonem Ahmed; Aybaba Hancrliogullari; Ali Riza Tosun
    • International Journal of Computer Science & Network Security / v.23 no.4 / pp.1-6 / 2023
  • Morphological analysis, a branch of natural language processing, is now a rapidly growing field. Its fundamental tenet is that it can establish the roots or stems of words and enable comparison with the original term. Arabic is a highly inflected and derivational language with a strong structure. Each root or stem can take a large number of affixes due to the non-concatenative nature of Arabic morphology, increasing the number of possible inflected words. Accurate verb recognition and extraction are necessary for nearly all problems in well-known research topics, including Web Search, Information Retrieval, Machine Translation, and Question Answering. In this work we designed and implemented an algorithm to detect and recognize Arabic verbs in Arabic text. The suggested technique was created with Python and the "pyqt5" visual package, allowing quick modification and easy addition of new patterns. We employed 17 alternative patterns to represent all verbs in terms of singular, plural, masculine, and feminine pronouns as well as past, present, and imperative verb tenses. All verbs matching these patterns were detected when the verb had a root, and the outcomes were reliable. The approach recognizes all verbs with the same structure without requiring any alterations to the code or design. The verbs not recognized by our method have no antecedents among the Arabic roots. According to our work, the strategy can rapidly and precisely identify verbs with roots, but it cannot identify verbs that are not in the Arabic language. We therefore advise employing a hybrid approach that combines several principles.
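
The 17 patterns themselves are not listed in the abstract, so the sketch below only illustrates the general root-and-pattern idea: each pattern is a hypothetical regex template whose captured consonants are checked against a toy root inventory.

```python
import re

# Two illustrative pattern templates (assumptions, not the paper's 17 patterns):
# each root consonant is captured as a named group.
PATTERNS = [
    re.compile(r"^ي(?P<r1>.)(?P<r2>.)(?P<r3>.)$"),  # present-tense template
    re.compile(r"^(?P<r1>.)(?P<r2>.)(?P<r3>.)$"),   # bare past-tense stem
]

ROOTS = {("ك", "ت", "ب"), ("د", "ر", "س")}  # toy root inventory

def detect_verb(token: str) -> bool:
    """Return True if the token matches a pattern and its extracted root is known."""
    for pattern in PATTERNS:
        m = pattern.match(token)
        if m and (m.group("r1"), m.group("r2"), m.group("r3")) in ROOTS:
            return True
    return False

print(detect_verb("يكتب"))  # True: present-tense form of the root k-t-b
print(detect_verb("كتب"))   # True: past-tense form
print(detect_verb("ذهب"))   # False: root not in the toy inventory
```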

Development of the Command and Data Handling System and Flight Software of BITSE

  • Park, Jongyeob; Baek, Ji-Hye; Jang, Bi-ho; Choi, Seonghwan; Kim, Jihun; Yang, Heesu; Kim, Jinhyun; Kim, Yeon-Han; Cho, Kyung-Suk; Swinski, Joseph-Paul A.; Nguyen, Hanson; Newmark, Jeffrey S.; Gopalswamy, Natchumuthuk
    • The Bulletin of The Korean Astronomical Society / v.44 no.2 / pp.57.4-57.4 / 2019
  • BITSE is a balloon-borne experiment for a next-generation solar coronagraph, developed in collaboration between KASI and NASA. The coronagraph is built to observe the linearly polarized brightness of the solar corona with a polarization camera, a filter wheel, and an aperture door. For the observation, the coronagraph is supported by the power distribution unit (PDU), the pointing system WASP (Wallops Arc-Second Pointer), and the telemetry & telecommand system SIP (Support Instrument Package), which were developed at NASA's Goddard Space Flight Center, Wallops Flight Facility, and Columbia Scientific Balloon Facility. The BITSE Command and Data Handling (C&DH) system uses commercial off-the-shelf electronics to process all data sent and received by the coronagraph, including support system operation over RS232/422, USB3, Ethernet, and digital and analog signals. The flight software is developed using the core Flight System (cFS), a reusable software framework and set of reusable applications that draw on a rich heritage of successful NASA space missions. The flight software handles encoding and decoding of data, controls the subsystems, and provides observation autonomy. We developed a Python-based testing framework to improve software reliability. The flight software development is one of the crucial contributions of KASI and an important milestone for the next project: a solar coronagraph to be installed on the International Space Station.
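
The abstract does not detail the Python-based testing framework, but such harnesses typically exercise flight software by sending telecommands and checking the telemetry that comes back. A minimal sketch under that assumption; the ports, application ID, and simplified packet layout are all hypothetical, not BITSE's actual interface:

```python
# Hypothetical command/telemetry round-trip check over UDP (simplified
# CCSDS-style packet; not the actual BITSE interface).
import socket
import struct

CMD_ADDR = ("127.0.0.1", 1234)  # assumed command uplink port
TLM_ADDR = ("127.0.0.1", 1235)  # assumed telemetry downlink port

def send_command(app_id: int, cmd_code: int, payload: bytes = b"") -> None:
    """Pack a simplified command packet and send it to the flight software."""
    length = len(payload) + 1  # CCSDS length convention: bytes after header minus 1
    packet = struct.pack(">HHHB", 0x1800 | app_id, 0xC000, length, cmd_code) + payload
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(packet, CMD_ADDR)

def receive_telemetry(timeout: float = 2.0) -> bytes:
    """Wait for one telemetry packet; return b'' if none arrives in time."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind(TLM_ADDR)
        sock.settimeout(timeout)
        try:
            data, _ = sock.recvfrom(4096)
            return data
        except socket.timeout:
            return b""

# Example check: command a hypothetical subsystem app and require a telemetry reply.
send_command(app_id=0x42, cmd_code=1)
assert receive_telemetry(), "no telemetry received after command"
```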


Development of Capacity Design Aid for Rainwater Harvesting (CARAH) with Graphical User Interface (사용자 편의 환경을 갖춘 빗물이용시설의 저류 용량 결정 프로그램(CARAH) 개발)

  • Seo, Hyowon; Jin, Youngkyu; Kang, Taeuk; Lee, Sangho
    • Proceedings of the Korea Water Resources Association Conference / 2021.06a / pp.478-478 / 2021
  • Many countries around the world are preparing water resources management strategies to adapt to climate change, and rainwater harvesting facilities are increasingly being introduced in Korea for the efficient use of rainwater, the foundation of our water resources. In this study, we developed a capacity design aid for rainwater harvesting (CARAH) with a graphical user interface (GUI) to increase its usefulness in related research and practice. CARAH couples the reservoir mass-conservation equation with particle swarm optimization (PSO), a metaheuristic method provided by Python's pyswarm package, so that the optimal capacity of a rainwater harvesting facility can be determined in a short time. The user interface was implemented as a C# Windows Forms application. The inputs of CARAH are the simulation period, inflow, target supply, and supply reliability; the outputs are supply reliability versus storage capacity, target supply versus actual supply versus shortfall, and storage capacity versus inflow versus actual supply. The program makes it easy to enter the various inputs required for rainwater harvesting planning, and the computed results are displayed on screen as graphs and tables so that users can check them intuitively. Analysis results, including input and output data, can be managed as files, allowing repeated use for revision and refinement. To examine the applicability of the developed program, it was applied to Zone 1 of the Cheongna district in Incheon, where an actual storage facility had been designed, and the appropriateness of the analysis results was confirmed. CARAH improves the efficiency of capacity design for rainwater harvesting facilities, and as a program that anyone can use easily and conveniently, it is expected to see wide use in the future.
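
The coupling of the reservoir mass-balance simulation with PSO from the pyswarm package can be sketched as follows; the synthetic inflow series and the penalty-based objective are illustrative assumptions, not CARAH's actual code:

```python
# Find the smallest storage capacity that meets a supply-reliability target,
# using a daily mass-balance simulation inside a PSO objective (pyswarm).
import numpy as np
from pyswarm import pso

inflow = np.random.default_rng(0).uniform(0.0, 120.0, 365)  # daily inflow (m^3), synthetic
target_supply = 50.0        # target daily supply (m^3)
required_reliability = 0.9  # required fraction of fully supplied days

def reliability(capacity: float) -> float:
    """Simulate the storage mass balance and return the supply reliability."""
    storage, supplied_days = 0.0, 0
    for q in inflow:
        storage = min(storage + q, capacity)   # fill the tank, spill the excess
        release = min(storage, target_supply)  # release up to the target
        storage -= release
        supplied_days += release >= target_supply
    return supplied_days / len(inflow)

def objective(x):
    capacity = x[0]
    shortfall = max(0.0, required_reliability - reliability(capacity))
    return capacity + 1e6 * shortfall  # heavily penalize unmet reliability

xopt, fopt = pso(objective, lb=[0.0], ub=[10000.0], swarmsize=30, maxiter=50)
print(f"optimal capacity ~ {xopt[0]:.1f} m^3")
```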


Analysis of national R&D projects related to herbal medicine (2002-2022) (한약 관련 국가연구개발사업 분석 및 고찰 (2002-2022))

  • Anna Kim; Seungho Lee; Young-Sik Kim
    • Herbal Formula Science / v.31 no.2 / pp.81-98 / 2023
  • Objectives : This study aimed to analyze the trends in research and development projects related to herbal medicine and natural products in the field of traditional Korean medicine (TKM) over the past 20 years. Methods : Research projects were identified using "Korean medicine" as the subject heading in the National Science and Technology Information Service. The included projects investigated Korean medicine or natural products, or were related to the TKM industry. Data preprocessing and network analysis were performed using Python and the NetworkX package, and the network was visualized using the ForceAtlas2 algorithm. Results : 1. Over the study period, 4,020 projects were conducted with a total research budget of KRW 835.2 billion. Seven institutions performed over 100 projects each, accounting for 2.4% of all participating institutions, and the top 10 institutions accounted for 58.9% of all projects. 2. Obesity was the most frequently mentioned disease-related keyword. Chronic or age-related diseases such as diabetes, osteoporosis, dementia, Parkinson's disease, cancer, inflammation, and asthma were also frequent research topics, as were clinical research, safety, and standardization. 3. Centrality analysis found that obesity was the only disease-related keyword among the central keywords, alongside TKM-related keywords; standardization, safety, and clinical trials were also identified as central. Conclusions : The study found that research projects in TKM have focused on standardizing and ensuring the safety of herbal medicine, as well as on chronic and age-related diseases. Clinical studies aimed at verifying the effectiveness of herbal medicine were also frequent. These findings can guide future research and development in herbal medicine.
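
The keyword network and centrality analysis can be sketched as follows; the project keyword lists are synthetic stand-ins for the NTIS records, and degree centrality stands in for the centrality measures used in the study:

```python
# Build a keyword co-occurrence network and rank keywords by degree centrality.
from itertools import combinations
import networkx as nx

# Each inner list stands for the keywords of one (hypothetical) project record.
projects = [
    ["obesity", "herbal medicine", "clinical trial"],
    ["obesity", "standardization", "safety"],
    ["dementia", "herbal medicine", "safety"],
]

G = nx.Graph()
for keywords in projects:
    for a, b in combinations(sorted(set(keywords)), 2):
        # Accumulate an edge weight for every keyword pair co-occurring in a project.
        weight = G.get_edge_data(a, b, {"weight": 0})["weight"]
        G.add_edge(a, b, weight=weight + 1)

# Degree centrality highlights the keywords most connected across projects.
for node, c in sorted(nx.degree_centrality(G).items(), key=lambda kv: -kv[1]):
    print(f"{node}: {c:.2f}")
```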

Performance Study on Odor Reduction of Indole/Skatole by Composite

  • Young-Do Kim
    • Journal of Wellbeing Management and Applied Psychology / v.7 no.3 / pp.67-72 / 2024
  • This study developed a dry composite module-type deodorization facility with twisting airflow changes and two treatment forms (catalyst and adsorbent) within one module. Experiments were conducted to evaluate the reduction efficiency for the odor substances indole (C8H7N) and skatole (C9H9N). The device combines UV oxidation using TiO2, catalytic oxidation using MnO2, and adsorption using activated carbon (A/C) in five different methods. The experimental results were analyzed statistically with Python 3.12, applying frequency analysis of odor removal efficiency, one-way ANOVA, and post-hoc tests, with statistical significance determined by p-values to ensure the reliability and validity of the measurements. Results indicated that the highest removal efficiency for C8H7N and C9H9N was achieved by the UV+A/C method, suggesting the superior effectiveness and efficiency of the developed device. Combining multiple processes and technologies within one module enhanced odor treatment efficiency compared to using a single method. The device's modularity allows flexibility in adapting to various sewage treatment scenarios, offering easy maintenance and cost-effective deodorization. The composite reaction module can in principle apply multiple technologies, such as biofilters, plasma, activated carbon filters, UV photocatalysis, and electromagnetic-chemical systems; this study, however, focused on UV photocatalysis, catalysts, and activated carbon filters. Ultimately, the research demonstrates the practical applicability of this innovative device in real sewage treatment operations, showing excellent reduction efficiency and effectiveness by integrating UV oxidation, TiO2 photocatalysis, MnO2 catalytic oxidation, and A/C adsorption within a modular system.
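
The abstract names the statistical procedures but not the libraries used; a minimal sketch of the one-way ANOVA with a Tukey post-hoc test in SciPy, on synthetic removal-efficiency data:

```python
# One-way ANOVA across treatment methods, with Tukey's HSD as the post-hoc test.
from scipy import stats

# Hypothetical indole removal efficiencies (%) for three of the five methods.
uv_ac   = [96.1, 95.4, 97.0, 96.5]
uv_tio2 = [91.2, 90.8, 92.1, 91.5]
mno2    = [88.0, 87.5, 89.1, 88.4]

f_stat, p_value = stats.f_oneway(uv_ac, uv_tio2, mno2)
print(f"one-way ANOVA: F = {f_stat:.2f}, p = {p_value:.4g}")

if p_value < 0.05:
    # Tukey's HSD identifies which pairs of methods differ significantly.
    print(stats.tukey_hsd(uv_ac, uv_tio2, mno2))
```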

The Exploratory Analysis on the Registry Data of Patients with Low Back Pain Applying Correlation Analysis Method (Correlation 분석 기법을 적용한 요통 환자에 관한 레지스트리 데이터의 탐색적 분석)

  • Park, Chang-Hyun; Park, Mu-Sun; Kim, Hyung-Suk; Cha, Yun-Yeop; Kim, Soon-Joong; Ko, Youn-Suk; Oh, Min-Seok; Hwang, Eui-Hyoung; Shin, Byung-Cheul; Kim, Chang-Eop; Song, Yun-Kyung
    • Journal of Korean Medicine Rehabilitation / v.27 no.4 / pp.97-109 / 2017
  • Objectives The aim of this study is to analyze patients with low back pain through a registry. Methods We enrolled patients with low back pain who visited the departments of Korean rehabilitation medicine at university hospitals. We collected data from 116 subjects, consisting of 51 inpatients and 65 outpatients, and excluded 8 who did not have pattern identification data at the time of their inpatient or outpatient visit, leaving 108 subjects for analysis. We used Pearson's product-moment correlation to examine the relationships among variables, and analyzed the data using the stats package of the Python SciPy library. Results We set general features, pain region, physical examination, ROM, questionnaire results, and pattern identification as variables and drew conclusions by analyzing them. Conclusions A registry of low back pain patients was established in the departments of Korean rehabilitation medicine of university hospitals, and an exploratory analysis based on the collected data was performed. Through the registry, we expect more advanced studies to be performed, for example, research verifying the effectiveness and safety of Korean medical treatment, or the development of tools to bridge the gap between pattern identification and disease identification.
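
A minimal sketch of the correlation step with the SciPy stats package, on synthetic paired values (the registry's actual variables are listed in the Results above):

```python
# Pearson's product-moment correlation between two registry variables.
from scipy import stats

# Hypothetical paired measurements: pain score and lumbar flexion ROM (degrees).
pain_score  = [7, 5, 8, 4, 6, 9, 3, 5]
flexion_rom = [35, 52, 30, 60, 45, 25, 70, 55]

r, p = stats.pearsonr(pain_score, flexion_rom)
print(f"Pearson r = {r:.2f}, p = {p:.4f}")
```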

Application of text-mining technique and machine-learning model with clinical text data obtained from case reports for Sasang constitution diagnosis: a feasibility study (자연어 처리에 기반한 사상체질 치험례의 텍스트 마이닝 분석과 체질 진단을 위한 머신러닝 모델 선정)

  • Jinseok Kim; So-hyun Park; Roa Jeong; Eunsu Lee; Yunseo Kim; Hyundong Sung; Jun-sang Yu
    • The Journal of Korean Medicine / v.45 no.3 / pp.193-210 / 2024
  • Objectives: We analyzed Sasang constitution case reports using text mining to derive network analysis results and designed a classification algorithm using machine learning to select a model suitable for classifying Sasang constitution from text data. Methods: Case reports on Sasang constitution published from January 1, 2000, to December 31, 2022, were searched. As a result, 343 papers were selected, yielding 454 cases. The extracted texts were preprocessed and tokenized with the Python-based KoNLPy package, and each morpheme was vectorized using TF-IDF values. Word cloud visualization and centrality analysis identified the keywords mainly used to classify Sasang constitution in clinical practice. To select the most suitable classification model for diagnosing Sasang constitution, the performance of five models (XGBoost, LightGBM, SVC, Logistic Regression, and Random Forest) was evaluated using accuracy and F1-score. Results: Word cloud visualization and centrality analysis identified specific keywords for each constitution. Logistic Regression showed the highest accuracy (0.839416), while Random Forest showed the lowest (0.773723). By F1-score, XGBoost scored highest (0.739811) and Random Forest lowest (0.643421). Conclusions: This is the first study to analyze constitution classification by applying text mining and machine learning to case reports, providing a concrete research model for follow-up studies. The keywords selected through text mining were confirmed to effectively reflect the characteristics of each Sasang constitution type. Based on text data from case reports, the most suitable machine learning models for diagnosing Sasang constitution are Logistic Regression and XGBoost.
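
The tokenize-vectorize-classify pipeline can be sketched as follows; the toy corpus, the labels, and the choice of the Okt tokenizer from KoNLPy are illustrative assumptions:

```python
# KoNLPy tokenization -> TF-IDF vectors -> logistic-regression classifier.
from konlpy.tag import Okt
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

okt = Okt()

# Hypothetical case-report snippets labeled with a Sasang constitution type.
texts = ["소화가 잘 되지 않고 몸이 차다", "열이 많고 소화가 잘 된다",
         "땀이 많고 체격이 크다", "몸이 차고 식욕이 없다"]
labels = ["소음인", "소양인", "태음인", "소음인"]

# Tokenize each text into morphemes and weight them by TF-IDF.
vectorizer = TfidfVectorizer(tokenizer=okt.morphs, token_pattern=None)
X = vectorizer.fit_transform(texts)

clf = LogisticRegression(max_iter=1000).fit(X, labels)
print(clf.predict(vectorizer.transform(["소화가 안 되고 손발이 차다"])))
```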

Development of Information Extraction System from Multi Source Unstructured Documents for Knowledge Base Expansion (지식베이스 확장을 위한 멀티소스 비정형 문서에서의 정보 추출 시스템의 개발)

  • Choi, Hyunseung; Kim, Mintae; Kim, Wooju; Shin, Dongwook; Lee, Yong Hun
    • Journal of Intelligence and Information Systems / v.24 no.4 / pp.111-136 / 2018
  • In this paper, we propose a methodology for extracting answer information for queries from the various types of unstructured documents collected from multiple sources on the web, in order to expand a knowledge base. The proposed methodology consists of the following steps. 1) Collect relevant documents from Wikipedia, Naver encyclopedia, and Naver news for "subject-predicate" separated queries and classify the suitable documents. 2) Determine whether each sentence is suitable for information extraction and derive a confidence score. 3) Based on the predicate feature, extract the information from the suitable sentences and derive the overall confidence of the extraction result. To evaluate the performance of the information extraction system, we selected 400 queries from the SK Telecom artificial intelligence speaker; compared with the baseline model, the proposed system shows a higher performance index. The contribution of this study is a sequence tagging model based on a bi-directional LSTM-CRF that uses the predicate feature of the query, with which we developed a robust model that maintains high recall across the various types of unstructured documents collected from multiple sources.

Information extraction for knowledge base expansion must take into account the heterogeneous characteristics of source-specific document types, and the proposed methodology proved to extract information effectively from various document types compared to the baseline model. Previous research has the limitation that performance degrades when extracting information from document types that differ from the training data. In addition, by predicting the suitability of documents and sentences before the extraction step, this study prevents unnecessary extraction attempts on documents that do not contain the answer. Because the system targets unstructured documents on the real web, there is no guarantee that a given document contains the correct answer; when question answering is performed on the real web, previous machine reading comprehension studies show low precision because they frequently attempt to extract an answer even from documents with no correct answer. The policy of predicting document and sentence suitability is therefore meaningful in that it helps maintain precision even in a real web environment.

The limitations of this study and future research directions are as follows. First, data preprocessing: the unit of knowledge extraction is determined through morphological analysis based on the open-source KoNLPy Python package, so extraction can go wrong when the morphological analysis itself is inaccurate; a more advanced morphological analyzer is needed to improve extraction results. Second, entity ambiguity: the system cannot distinguish different entities that share the same name. If several people with the same name appear in the news, the system may not extract information about the intended query; future research needs measures to disambiguate such entities. Third, evaluation query data: we selected 400 user queries collected from the SK Telecom interactive artificial intelligence speaker and built an evaluation data set of 2,800 documents (400 queries × 7 documents per query: 1 Wikipedia, 3 Naver encyclopedia, 3 Naver news), judging whether each document contains a correct answer. To ensure the external validity of the study, it is desirable to evaluate the system on more queries, but this is a costly manual activity; future research should evaluate the system on a larger query set. It is also necessary to develop a Korean benchmark data set for information extraction from multi-source web documents so that results can be evaluated more objectively.
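
The core extraction model above is a bi-directional LSTM-CRF sequence tagger. A minimal sketch of that architecture in PyTorch, using the pytorch-crf package for the CRF layer; the dimensions, BIO-style tag set, and random toy inputs are illustrative assumptions, not the paper's configuration:

```python
# Bi-directional LSTM with a CRF output layer for sequence tagging.
import torch
import torch.nn as nn
from torchcrf import CRF  # pip install pytorch-crf

class BiLstmCrfTagger(nn.Module):
    def __init__(self, vocab_size: int, num_tags: int, emb_dim: int = 64, hidden: int = 64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.proj = nn.Linear(2 * hidden, num_tags)  # per-token emission scores
        self.crf = CRF(num_tags, batch_first=True)

    def loss(self, tokens, tags):
        emissions = self.proj(self.lstm(self.embed(tokens))[0])
        return -self.crf(emissions, tags)  # negative log-likelihood

    def decode(self, tokens):
        emissions = self.proj(self.lstm(self.embed(tokens))[0])
        return self.crf.decode(emissions)  # best tag sequence per sentence

# Toy usage: tags follow a BIO scheme, e.g. {0: "O", 1: "B-ANS", 2: "I-ANS"}.
model = BiLstmCrfTagger(vocab_size=100, num_tags=3)
tokens = torch.randint(0, 100, (2, 7))  # batch of 2 sentences, 7 tokens each
tags = torch.randint(0, 3, (2, 7))
print(model.loss(tokens, tags).item())  # training objective on the toy batch
print(model.decode(tokens))             # Viterbi-decoded tag sequences
```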