• Title/Summary/Keyword: Sequence Mining

Search Result 163, Processing Time 0.03 seconds

The Philippines Coconut Genomics Initiatives: Updates and Opportunities for Capacity Building and Genomics Research Collaboration

  • Hayde Flandez-Galvez;Darlon V. Lantican;Anand Noel C. Manohar;Maria Luz J. Sison;Roanne R. Gardoce;Barbara L. Caoili;Alma O. Canama-Salinas;Melvin P. Dancel;Romnick A. Latina;Cris Q. Cortaga;Don Serville R. Reynoso;Michelle S. Guerrero;Susan M. Rivera;Ernesto E. Emmanuel;Cristeta Cueto;Consorcia E. Reano;Ramon L. Rivera;Don Emanuel M. Cardona;Edward Cedrick J. Fernandez;Robert Patrick M. Cabangbang;Maria Salve C. Vasquez;Jomari C. Domingo;Reina Esther S. Caro;Alissa Carol M. Ibarra;Frenzee Kroeizha L. Pammit;Jen Daine L. Nocum;Angelica Kate G. Gumpal;Jesmar Cagayan;Ronilo M. Bajaro;Joseph P. Lagman;Cynthia R. Gulay;Noe Fernandez-Pozo;Susan R. Strickler;Lukas A. Mueller
    • Proceedings of the Korean Society of Crop Science Conference / 2022.10a / pp.30-30 / 2022
  • The Philippines is the world's second-largest supplier of coconut by-products. As its first major genomics project, the Philippine Genome Center Program for Agriculture (PGC-Agriculture) took on the challenge of sequencing and assembling the whole coconut genome. The project aims to provide advanced genetic tools for collaborating coconut researchers while taking the opportunity to build local capacity. A combination of different NGS platforms was explored, and the Philippine 'Catigan Green Dwarf' (CATD) variety was selected together with breeders as the crop's reference genome. A high-quality genome assembly of CATD was generated and used to characterize important coconut genes toward the development of resilient and outstanding varieties, especially for added high-value traits. The talk will present the significant results of the project as published in various papers, including the first report of the whole-genome sequence of a dwarf coconut variety. Updates will include the challenges overcome and specific applications such as gene mining for host insect resistance and screening for the least damaged coconuts (thus potentially insect-resistant varieties). Published genome-wide DNA markers and genes related to qualitative and quantitative coconut oil traits will also be presented, including initial molecular and biochemical studies that support nutritional and medicinal claims. A web-based genome database is currently being built for easy access to and wider use of these genomics tools. This is a major milestone for the coconut genomics research team, made possible by all-out government support, strong collaboration among multidisciplinary experts, and partnerships with advanced research institutes.

Mining of Caspase-7 Substrates Using a Degradomic Approach

  • Jang, Mi;Park, Byoung Chul;Kang, Sunghyun;Lee, Do Hee;Cho, Sayeon;Lee, Sang-Chul;Bae, Kwang-Hee;Park, Sung-Goo
    • Molecules and Cells / v.26 no.2 / pp.152-157 / 2008
  • Caspases play critical roles in the execution of apoptosis. Caspase-3 and caspase-7 are closely related in sequence as well as in substrate specificity. The two caspases have overlapping substrate specificities, with a special preference for the DEVD motif. However, they are targeted to different subcellular locations during apoptosis, implying the existence of substrates specific for one or the other caspase. To identify new caspase-7 substrates, we digested cell lysates obtained from the caspase-3-deficient MCF-7 cell line with purified recombinant caspase-7 and used 2-DE to analyze spots that disappeared or decreased (we refer to this as the caspase-7 degradome). Several proteins with various cellular functions underwent caspase-7-dependent proteolysis. The substrates of caspase-7 identified by the degradomic approach were rather different from those of caspase-3 (Proteomics, 4, 3429-3435, 2004). Among the candidate substrates, we confirmed that Valosin-containing protein (VCP) was cleaved by both caspase-7 and caspase-3 in vitro and during apoptosis. Cleavage occurred at both $DELD^{307}$ and $DELD^{580}$. The degradomic study yielded several candidate caspase-7 substrates, and their further analysis should provide valuable clues to the functions of caspase-7 during apoptosis.
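
The abstract above reports cleavage at DEVD-like sites such as DELD. As a rough illustration only, the sketch below scans a protein sequence for the pattern D-E-x-D and reports the 1-based position of the final (P1) aspartate, in the same spirit as the $DELD^{307}$ notation. The pattern `DE[A-Z]D`, the function name `find_candidate_sites`, and the toy sequence are assumptions made for this example; the study itself identified substrates experimentally by 2-DE, not by sequence scanning, and the toy sequence is not VCP.

```python
import re
from typing import List, Tuple

# Assumed, simplified motif: Asp-Glu-any residue-Asp (covers DEVD and DELD).
# The lookahead lets overlapping occurrences be reported as well.
MOTIF = re.compile(r"(?=(DE[A-Z]D))")


def find_candidate_sites(sequence: str) -> List[Tuple[int, str]]:
    """Return (1-based position of the P1 Asp, matched motif) for each hit."""
    sites = []
    for match in MOTIF.finditer(sequence.upper()):
        # match.start() is the 0-based index of the first Asp (P4),
        # so the P1 Asp sits at 1-based position start + 4.
        p1_position = match.start() + 4
        sites.append((p1_position, match.group(1)))
    return sites


# Hypothetical toy sequence, used only to exercise the function.
toy_sequence = "MKDEVDGAADELDSTDEDD"

for position, motif in find_candidate_sites(toy_sequence):
    print(f"{motif} ending at residue {position}")
```

A sequence hit of this kind is only a candidate: whether a site is actually cleaved depends on structure and accessibility, which is why experimental confirmation such as that described in the abstract is needed.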

Development of SNP marker set for marker-assisted backcrossing (MABC) in cultivating tomato varieties

  • Park, GiRim;Jang, Hyun A;Jo, Sung-Hwan;Park, Younghoon;Oh, Sang-Keun;Nam, Moon
    • Korean Journal of Agricultural Science / v.45 no.3 / pp.385-400 / 2018
  • Marker-assisted backcrossing (MABC) is useful for selecting, at an early generation, offspring that have recovered a high proportion of the recurrent parent's genetic background. Unlike in rice and other field crops, molecular marker sets applicable to practical MABC are scarce in vegetable crops, including tomatoes. In this study, we used the National Center for Biotechnology Information Short Read Archive (NCBI-SRA) database, which provided the whole genome sequences of 234 tomato accessions, and selected 27,680 tag single nucleotide polymorphisms (tag-SNPs) that can identify haplotypes in the tomato genome. From this SNP dataset, a total of 143 tag-SNPs that have a high polymorphism information content (PIC) value (> 0.3) and are physically evenly distributed on each chromosome were selected as a MABC marker set. This marker set was tested for polymorphism in each pairwise cross combination constructed from 124 of the 234 tomato accessions, and a relatively high number of SNP markers polymorphic for each cross combination was observed. The reliability of the MABC SNP set was assessed by converting 18 SNPs into Luna probe-based high-resolution melting (HRM) markers and genotyping nine tomato accessions. The results show that the SNP information and the HRM marker genotypes matched in 98.6% of the experimental data points, indicating that our sequence analysis pipeline for SNP mining worked successfully. The tag-SNP set for MABC developed in this study can be useful not only for practical backcrossing programs but also for cultivar identification and F1 seed purity tests in tomatoes.
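
The PIC filter and the pairwise polymorphism check described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes biallelic SNPs and uses the standard PIC formula for two alleles, PIC = 1 - (p² + q²) - 2p²q², and every marker name, genotype call, and function name below is hypothetical.

```python
from typing import Dict, List


def pic_biallelic(p: float) -> float:
    """PIC of a biallelic marker with allele frequencies p and q = 1 - p."""
    q = 1.0 - p
    return 1.0 - (p * p + q * q) - 2.0 * p * p * q * q


def allele_frequency(genotypes: List[str], allele: str) -> float:
    """Frequency of `allele` among diploid calls such as 'AA' or 'AG'."""
    calls = "".join(genotypes)
    return calls.count(allele) / len(calls)


def polymorphic_between(snps: Dict[str, List[str]], a: int, b: int) -> List[str]:
    """Markers whose genotype differs between accessions at indices a and b."""
    return [name for name, calls in snps.items() if calls[a] != calls[b]]


# Hypothetical genotype calls for three SNPs across six accessions.
snps = {
    "snp_1": ["AA", "AA", "GG", "GG", "AA", "GG"],
    "snp_2": ["CC", "CC", "CC", "CC", "CC", "TT"],
    "snp_3": ["TT", "GG", "TT", "GG", "GG", "GG"],
}
# One of the two alleles per SNP, used to compute p (assumed for this example).
counted_allele = {"snp_1": "A", "snp_2": "C", "snp_3": "T"}

pic = {
    name: pic_biallelic(allele_frequency(calls, counted_allele[name]))
    for name, calls in snps.items()
}
# Keep markers above the PIC threshold of 0.3 mentioned in the abstract.
selected = [name for name, value in pic.items() if value > 0.3]

print(selected)
print(polymorphic_between(snps, 0, 2))
```

For a biallelic marker the PIC peaks at 0.375 when both alleles are equally frequent, so a 0.3 cutoff keeps only markers whose minor allele is reasonably common. The even physical spacing along each chromosome that the abstract also requires is not modeled here.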

GEDA: New Knowledge Base of Gene Expression in Drug Addiction

  • Suh, Young-Ju;Yang, Moon-Hee;Yoon, Suk-Joon;Park, Jong-Hoon
    • BMB Reports / v.39 no.4 / pp.441-447 / 2006
  • Abuse of drugs can elicit compulsive drug-seeking behaviors upon repeated administration and ultimately lead to addiction. We developed a procedure for standardizing microarray gene expression data from rat brain in drug addiction and stored the data in a single integrated database system, focusing on more effective data processing and interpretation. Another characteristic of the present database is its systematic flexibility for statistical analysis and for linking with other databases. We adopt an intelligent SQL querying system as the foundation of our DB in order to set up an interactive module that can automatically read raw gene expression data in the standardized format. We maximize the usability of this DB by helping users study significant gene expression and identify the biological functions of genes through integrated, up-to-date gene information such as GO annotation and metabolic pathways. To collect the latest information on a selected gene from the database, we also set up a local BLAST search engine and a non-redundant sequence database updated from the NCBI server on a daily basis. We find that the present database is a useful query interface and data-mining tool, specifically for finding genes related to drug addiction. We apply this system to the identification and characterization of the behavior of methamphetamine-induced genes in rat brain.

A Sequential Pattern Analysis for Dynamic Discovery of Customers' Preference (고객의 동적 선호 탐색을 위한 순차패턴 분석: (주)더페이스샵 사례)

  • Song, Ki-Ryong;Noh, Soeng-Ho;Lee, Jae-Kwang;Choi, Il-Young;Kim, Jae-Kyeong
    • Information Systems Review / v.10 no.2 / pp.195-209 / 2008
  • Customers' needs change every moment. Store profitability can no longer be increased with existing standardized chain store management. Accordingly, a personalized store management tool based on the prediction of customers' preferences is needed. In this study, we propose a recommendation procedure that uses customers' dynamic preferences by analyzing the transaction database. We use a self-organizing map algorithm to cluster the chain stores and association rule mining to explore customers' purchase sequences. We demonstrate that the proposed methodology is effective for recommending products in a market characterized by fast fashion and short product life cycles.

Fault-Causing Process and Equipment Analysis of PCB Manufacturing Lines Using Data Mining Techniques (데이터마이닝 기법을 이용한 PCB 제조라인의 불량 혐의 공정 및 설비 분석)

  • Sim, Hyun Sik;Kim, Chang Ouk
    • KIPS Transactions on Software and Data Engineering / v.4 no.2 / pp.65-70 / 2015
  • In the PCB (Printed Circuit Board) manufacturing industry, yield is an important management factor because it significantly affects product cost and quality. In practice, it is very hard to ensure a high yield in a manufacturing shop because products called chips are made through hundreds of nano-scale manufacturing processes. Therefore, in order to improve the yield, it is necessary to identify the main fault-causing processes and equipment behind low PCB yield. This paper proposes a systematic approach to discovering fault-causing processes and equipment by using logistic regression and a stepwise variable selection procedure. We tested our approach with lot trace records from a real work site. A lot trace record consists of the sequence of equipment that the lot passed through and the number of faults of each fault type in the lot. We demonstrated that the test results reflected the real situation of a PCB manufacturing line.

Classification and Characteristics of Chitin/Chitosan Hydrolases (키틴/키토산 가수분해효소의 분류 및 특성)

  • Lee, Han-Seung
    • Journal of Life Science / v.18 no.11 / pp.1617-1624 / 2008
  • Chitin and chitosan, the deacetylated form of chitin, are among the most abundant biomass on earth. They show various biological activities, including antimicrobial activity, heavy metal chelation, and immune system activation, and have very diverse applications in the food, pharmaceutical, medicinal, and environmental industries. Many chitin/chitosan-hydrolyzing enzymes, along with their structures and genes, have been reported from all three domains: archaea, bacteria, and eukarya. Carbohydrate-hydrolyzing enzymes are classified in the CAZy (Carbohydrate Active Enzymes) database according to their amino acid sequence similarity. Interestingly, chitinases and chitosanases are classified in various glycosyl hydrolase (GH) families: GH2, GH5, GH7, GH8, GH18, GH19, GH20, GH46, GH48, GH73, GH75, GH80, GH84, and GH85. Here, we review the characteristics and structures of chitin/chitosan-hydrolyzing enzymes by glycosyl hydrolase family in order to provide information for gene mining.

Load-Balancing Rendezvous Approach for Mobility-Enabled Adaptive Energy-Efficient Data Collection in WSNs

  • Zhang, Jian;Tang, Jian;Wang, Zhonghui;Wang, Feng;Yu, Gang
    • KSII Transactions on Internet and Information Systems (TIIS) / v.14 no.3 / pp.1204-1227 / 2020
  • The tradeoff between energy conservation and traffic balancing is a dilemma in Wireless Sensor Networks (WSNs). By analyzing the intrinsic relationship between cluster properties and long-distance transmission energy consumption, we characterize three node sets of the cluster as a theoretical foundation for enhancing the performance of WSNs, and we propose optimal solutions that introduce rendezvous and Mobile Elements (MEs) to optimize energy consumption and prolong the lifetime of WSNs. First, we use an approximate method based on the transmission distance from different nodes to an ME to select a suboptimal Rendezvous Point (RP) on the trajectory where the ME collects data. Then, we define the data transmission routing sequence and model rendezvous planning for the cluster. To optimize energy consumption, we apply the economic theory called the Diminishing Marginal Utility Rule (DMUR) and create a utility function with regard to energy, developing an adaptive energy consumption optimization framework that achieves energy efficiency for data collection. Finally, the Rendezvous Transmission Algorithm (RTA) is proposed to better balance the tradeoff between energy conservation and traffic balancing. Furthermore, via collaboration among multiple MEs, we design the Two-Orbit Back-Propagation Algorithm (TOBPA), which concurrently handles load imbalance to improve the efficiency of data collection. The simulation results show that our solutions can improve the energy efficiency of the whole network and reduce the energy consumption of sensor nodes, which in turn prolongs the lifetime of WSNs.

Robust Feature Selection and Shot Change Detection Method Using the Neural Networks (강인한 특징 변수 선별과 신경망을 이용한 장면 전환점 검출 기법)

  • Hong, Seung-Bum;Hong, Gyo-Young
    • Journal of Korea Multimedia Society / v.7 no.7 / pp.877-885 / 2004
  • In this paper, we propose an enhanced shot change detection method that uses a neural network and robust feature selection from multiple features. Previous shot change detection methods usually used a single feature and a fixed threshold between consecutive frames. However, content such as color, shape, background, and texture changes simultaneously at shot change points in a video sequence. Therefore, we detect shot changes effectively using robust features that complement each other, rather than a single feature. We use CART (classification and regression tree), a typical data mining method, to select the robust features, and a backpropagation neural network to determine the threshold of each selected feature. To evaluate the performance of the robust feature selection, we compare the proposed method with PCA (principal component analysis), a typical feature selection method. The experimental results show that our method performed better than the PCA method.
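
For context, the conventional baseline that the abstract contrasts itself with (a single feature compared against a fixed threshold between consecutive frames) can be sketched as below. This is not the paper's CART plus neural network method. The choice of a normalized intensity histogram as the single feature, the bin count, the threshold of 0.5, and all function names are assumptions made for illustration, and the frames are tiny made-up pixel lists.

```python
from typing import List, Sequence


def histogram(frame: Sequence[int], bins: int = 4, max_value: int = 256) -> List[float]:
    """Normalized intensity histogram of a frame given as a flat pixel list."""
    counts = [0] * bins
    for pixel in frame:
        counts[pixel * bins // max_value] += 1
    return [count / len(frame) for count in counts]


def histogram_difference(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of absolute bin differences (0.0 = identical, 2.0 = disjoint)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def detect_shot_changes(frames: Sequence[Sequence[int]], threshold: float = 0.5) -> List[int]:
    """Indices i where frame i differs from frame i - 1 by more than threshold."""
    hists = [histogram(frame) for frame in frames]
    return [
        i
        for i in range(1, len(hists))
        if histogram_difference(hists[i - 1], hists[i]) > threshold
    ]


# Hypothetical 4-pixel frames: two dark frames followed by two bright frames.
frames = [
    [10, 20, 30, 40],
    [12, 22, 28, 41],
    [200, 210, 220, 230],
    [205, 215, 225, 235],
]

print(detect_shot_changes(frames))
```

A single fixed threshold like this is sensitive to gradual transitions and lighting changes, which is the weakness the abstract cites as motivation for combining several complementary features.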

Bio Grid Computing and Biosciences Research Application (바이오그리드 컴퓨팅과 생명과학 연구에의 활용)

  • Kim, Tae-Ho;Kim, Eui-Yong;Youm, Jae-Boum;Kho, Weon-Gyu;Gwak, Heui-Chul;Joo, Hyun
    • Bioinformatics and Biosystems / v.2 no.2 / pp.37-45 / 2007
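  • Bioinformatics is the discipline of using computers to process vast amounts of biological data and analyze the results, and its range of application is steadily widening in step with the rapid growth of IT. In particular, the data used in medical and life science research are typically diverse in type and very large in size, so processing them inevitably requires grid computing technology built on high-speed networks. Advances in high-speed network technology are driving the field of grid computing, which can bind the distributed systems in a computer pool into one and thereby substitute for supercomputers. Recently, the bioinformatics field has also been using this advanced high-performance distributed computing technology to speed up data processing and improve the efficiency of data management. Grid computing technology can be broadly divided into the development of application programs for data processing and the construction of databases for data management. Bioinformatics research programs in the former category include MSA sequence alignment programs such as mpiBLAST and ClustalW-MPI, while projects such as BioSimGrid and Taverna were developed on the basis of grid database technology. This article introduces the grid computing environments developed to date for exploring and studying unknown biological phenomena, application programs for biomedical research, and grid database technology.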
