• Title/Summary/Keyword: huge sample


EST Analysis system for panning gene

  • Hur, Cheol-Goo;Lim, So-Hyung;Goh, Sung-Ho;Shin, Min-Su;Cho, Hwan-Gue
    • Proceedings of the Korean Society for Bioinformatics Conference / 2000.11a / pp.21-22 / 2000
  • Expressed sequence tags (ESTs) are partial segments of cDNA produced by single-pass sequencing from the 5' or 3' end of cDNA clones; they are error-prone and generated in highly redundant sets. The advancement and expansion of genomics has led biologists to generate huge numbers of ESTs from a variety of organisms (humans, microorganisms, and plants), and the cumulative number of ESTs now exceeds 5.3 million. As EST data accumulate more rapidly, the need grows for analysis tools that extract biological meaning from them. Among the several needs of EST analysis, extracting protein sequences or functional motifs from ESTs is important for identifying their function in vivo. For that purpose, precisely and accurately identifying the regions that contain coding sequences (CDSs) is the first problem to solve, and it helps in extracting and detecting genuine CDSs and protein motifs from EST collections. Although several public tools are available for EST analysis, none of them accomplishes this objective. Furthermore, they target human or microorganism ESTs rather than plant ESTs. Thus, to meet the urgent needs of collaborators who work with plant ESTs, and to establish an analysis system usable as general-purpose public software, we constructed a pipelined EST analysis system by integrating public software components. The software used is as follows: Phred/Cross-match for quality control and vector screening, NCBI BLAST for similarity searching, ICATools for EST clustering, Phrap for EST contig assembly, and BLOCKS/Prosite for protein motif searching. The sample data set used to construct and verify the system consisted of 1,386 ESTs from human intrathymic T-cells that were verified using the UniGene and Nr databases of NCBI. CDSs were extracted from the sample data set by comparing the sample data with protein sequence/motif databases, determining the matched protein sequences/motifs that agree with our defined parameters, and extracting the regions that show similarity. In the near future, software for peptide mass spectrometry fingerprint analysis, one of the fields of proteomics, is also expected to be integrated into the system and made available. This pipelined EST analysis system will extend our knowledge of plant ESTs and proteins through the identification of unknown genes.


A Method for Observation of Benign, Premalignant and Malignant Changes in Clinical Skin Tissue Samples via FT-IR Microspectroscopy

  • Skrebova, Natalja;Aizawa, Katsuo;Ozaki, Yukihiro;Arase, Seiji
    • Journal of Photoscience / v.9 no.2 / pp.457-459 / 2002
  • Sunlight causes various adverse changes in sun-exposed areas of the skin, the most hazardous of which is the induction of malignant skin tumours. FT-IR spectra were obtained from specimens excised from normal skin, BCCs, SCCs, MMs, nevi, and lesions of solar keratosis and Bowen's disease. Tissue samples from freshly frozen specimens were cut into 2 sections in strictly sequential order, to be stained with H & E for histopathological analysis and to be air-dried on CaF$_2$ slide glasses for further spectral data acquisition from a defined area of interest. Intra- and inter-sample variations were estimated within grouped lesion categories according to each skin component. Mean spectra for each type of tissue pathology in the 800-1800 $cm^{-1}$ region were interpreted using the classical group frequency approach, which showed that the most visible differences among spectra of benign, premalignant and malignant changes were directly related to protein conformation and nucleic acid bases. The relative intensity of the nucleic acid peak increased with progression to malignancy. In addition, PCA was able to evaluate and maximise the differences in the spectra by reducing the number of variables characterizing each patient and pathology category. This approach to non-destructively assessing the complexity of IR spectra of inhomogeneous samples such as skin demonstrates the advantage of FT-IR microspectroscopy: it can observe diseased states (benign, premalignant, malignant) and distinguish them from normal tissue against a huge background of inter- and intra-subject variability.


Assessment of Carbon Sequestration Potential in Degraded and Non-Degraded Community Forests in Terai Region of Nepal

  • Joshi, Rajeev;Singh, Hukum;Chhetri, Ramesh;Yadav, Karan
    • Journal of Forest and Environmental Science / v.36 no.2 / pp.113-121 / 2020
  • This study was carried out in degraded and non-degraded community forests (CF) in the Terai region of Kanchanpur district, Nepal. A total of 63 concentric sample plots, each of 500 ㎡, were laid out in the inventory to estimate the above- and below-ground biomass of the forests, using systematic random sampling with a sampling intensity of 0.5%. Mallotus philippinensis and Shorea robusta were the most dominant species in the degraded and non-degraded CF, with Importance Value Index (IVI) values of 97.16 and 178.49, respectively. Above-ground tree biomass carbon in the degraded and non-degraded community forests was 74.64±16.34 t ha$^{-1}$ and 163.12±20.23 t ha$^{-1}$, respectively. Soil carbon sequestration in the degraded and non-degraded community forests was 42.55±3.10 t ha$^{-1}$ and 54.21±3.59 t ha$^{-1}$, respectively. Hence, the estimated total carbon stock was 152.68±22.95 t ha$^{-1}$ and 301.08±27.07 t ha$^{-1}$ in the degraded and non-degraded community forests, respectively. Carbon sequestration in the non-degraded community forest was found to be 1.97 times higher than in the degraded community forest. The CO$_2$ equivalent in the degraded and non-degraded community forests was 553 t ha$^{-1}$ and 1105 t ha$^{-1}$, respectively. Statistical analysis showed a significant difference between the degraded and non-degraded community forests in total biomass and carbon sequestration potential (p<0.05). The study indicates that community forests have huge potential and can yield economic benefits from carbon trading under the REDD+/CDM mechanism by promoting the sustainable conservation of community forests.
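
The ratio quoted in the abstract above can be re-derived from the two reported total carbon stocks. The sketch below does that, and also converts carbon to CO2 equivalent with the molar-mass ratio 44/12. That factor is an assumption: the abstract does not state which factor or rounding the authors used, so the converted values only approximate the reported figures. The function and constant names are illustrative.

```python
# Reported total carbon stocks (t C per ha), taken from the abstract.
TOTAL_C_DEGRADED = 152.68
TOTAL_C_NON_DEGRADED = 301.08

# Assumption: CO2 equivalent = carbon mass * (44 / 12), the molar-mass
# ratio of CO2 to C. The abstract does not state the factor it used.
CO2_PER_C = 44.0 / 12.0


def stock_ratio(non_degraded: float, degraded: float) -> float:
    """Return how many times larger the non-degraded stock is."""
    return non_degraded / degraded


def co2_equivalent(carbon_t_per_ha: float, factor: float = CO2_PER_C) -> float:
    """Convert a carbon stock (t C per ha) to CO2 equivalent (t CO2 per ha)."""
    return carbon_t_per_ha * factor


if __name__ == "__main__":
    ratio = stock_ratio(TOTAL_C_NON_DEGRADED, TOTAL_C_DEGRADED)
    print(f"non-degraded / degraded: {ratio:.2f}")
    print(f"degraded CO2e: {co2_equivalent(TOTAL_C_DEGRADED):.1f} t/ha")
    print(f"non-degraded CO2e: {co2_equivalent(TOTAL_C_NON_DEGRADED):.1f} t/ha")
```

Keeping the conversion factor as a parameter makes it easy to test other conventions, such as a rounded 3.67.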

Analysis of Component for Determining Illegal Gasoline (가짜휘발유 판정을 위한 성분 분석)

  • Lim, Young-Kwan;Won, Ki-Yoe;Kang, Byung-Seok;Park, So-Hwi;Jung, Seong;Go, Young-Hoon;Kim, Seong-Soo;Jung, Gil-Hyoung
    • Tribology and Lubricants / v.36 no.3 / pp.161-167 / 2020
  • Petroleum is the most used energy source in Korea, accounting for 39.5% of primary energy sources. The price of liquid petroleum products in Korea includes substantial taxes, such as the transportation, environment, and energy tax. Illegal production and distribution of liquid petroleum is therefore widespread, because the untaxed product is far cheaper than the normal product. Generally, illegal petroleum products are made by illegally mixing liquid petroleum with similar petroleum alternatives. In such cases, it is easy to determine whether a product is illegal by analyzing its physical properties and typical components. However, if one of the components of the original petroleum product is added to the illegal petroleum, distinguishing between the two products becomes difficult. In this research, we inspect illegally produced gasoline that is mixed with methyl tertiary butyl ether (MTBE) as an octane booster. This illegal gasoline shows a high octane number and oxygen content. Further, we analyze the different types of green dyes used in illegal gasoline through high performance liquid chromatography (HPLC). We conduct component analyses on a simulated sample prepared from premium gasoline and MTBE. Finally, the illegal gasoline is identified as premium gasoline with 10% MTBE. The findings of this study suggest that illegal petroleum can be identified through component analysis and simulated samples.

Web-Based Distributed Visualization System for Large Scale Geographic Data (대용량 지형 데이터를 위한 웹 기반 분산 가시화 시스템)

  • Hwang, Gyu-Hyun;Yun, Seong-Min;Park, Sang-Hun
    • Journal of Korea Multimedia Society / v.14 no.6 / pp.835-848 / 2011
  • In this paper, we propose a client-server based distributed/parallel system to effectively visualize huge geographic data. The system consists of a web-based client GUI program and a distributed/parallel server program that runs on multiple PC clusters. To make the client program run on mobile devices as well as PCs, the graphical user interface has been designed using JOGL, the Java-based OpenGL graphics library; by sending information about the currently available memory space and the maximum display resolution, the client lets the server minimize the amount of work. The PC clusters that play the role of the server access the requested geographic data from distributed disks, re-sample them properly, and send the results back to the client. To minimize the latency incurred by repeatedly accessing the distributed geographic data, cache data structures are maintained on every node of the server and on the client.

Data Communication Prediction Model in Multiprocessors based on Robust Estimation (로버스트 추정을 이용한 다중 프로세서에서의 데이터 통신 예측 모델)

  • Jun Janghwan;Lee Kangwoo
    • The KIPS Transactions:PartA / v.12A no.3 s.93 / pp.243-252 / 2005
  • This paper introduces a novel modeling technique for building data communication prediction models in multiprocessors, using Least-Squares and Robust Estimation methods. A set of sample communication rates is collected by running workload programs on a few small input data sets. By applying estimation methods to these samples, we can build analytic models that precisely estimate communication rates for huge input data sets. The primary advantage is that, since the models depend only on data set size and not on the specifications of target systems or workloads, they can be applied to various systems and applications. In addition, because the algorithmic behavioral characteristics of the workloads are reflected in the models, they can also model various other performance metrics. In this paper, we built models for cache miss rates, which are the main cause of data communication in shared memory multiprocessor systems. The results show excellent prediction error rates: below $1\%$ for five cases out of 12, and about $3\%$ for the remaining cases.
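
The idea of fitting a model on a few small inputs and extrapolating to a huge one can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not give the model form or the robust estimator used, so the straight-line model, the Theil-Sen estimator standing in for "robust estimation", and the sample values are all hypothetical.

```python
from itertools import combinations
from statistics import mean, median


def least_squares_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def theil_sen_line(xs, ys):
    """Robust fit (assumed stand-in): median pairwise slope, median intercept."""
    points = list(zip(xs, ys))
    slopes = [(y2 - y1) / (x2 - x1)
              for (x1, y1), (x2, y2) in combinations(points, 2)
              if x2 != x1]
    slope = median(slopes)
    intercept = median(y - slope * x for x, y in points)
    return slope, intercept


def predict(model, x):
    """Evaluate a fitted (slope, intercept) model at input size x."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical measurements: data set size vs. observed rate.
# The last sample is a deliberate outlier.
sizes = [1, 2, 3, 4, 5]
rates = [3, 5, 7, 9, 30]

ols = least_squares_line(sizes, rates)
robust = theil_sen_line(sizes, rates)

if __name__ == "__main__":
    print("least squares, size 100:", predict(ols, 100))
    print("robust fit, size 100:", predict(robust, 100))
```

The single outlier pulls the least-squares slope well away from the trend of the other four samples, while the median-based fit ignores it, which is the usual motivation for pairing the two estimators.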

Improving the Map/Reduce Model through Data Distribution and Task Progress Scheduling (데이터 분배 및 태스크 진행 스케쥴링을 통한 맵/리듀스 모델의 성능 향상)

  • Hwang, In-Sung;Chung, Kyung-Yong;Rim, Kee-Wook;Lee, Jung-Hyun
    • The Journal of the Korea Contents Association / v.10 no.10 / pp.78-85 / 2010
  • Map/Reduce is a programming model for implementing cloud computing that has recently attracted attention. The model runs application programs that process large amounts of data using many computers. To use the computing nodes well, it is important to design the mechanism that splits the data into properly sized pieces and distributes them efficiently to a cluster of computing nodes. In addition, the scheduling of the Map and Reduce phases also influences Map/Reduce performance. This paper proposes an effective distribution scheme that splits huge data and runs Map tasks in consideration of computing-node performance and network status. We also make Reduce tasks complete quickly by tuning the mechanism by which Map and Reduce tasks operate. Using two Map/Reduce sample applications, we tested the proposal and evaluated how it affects Map/Reduce performance.

A Study on Application of ARIMA and Neural Networks for Time Series Forecasting of Port Traffic (항만물동량 예측력 제고를 위한 ARIMA 및 인공신경망모형들의 비교 연구)

  • Shin, Chang-Hoon;Jeong, Su-Hyun
    • Journal of Navigation and Port Research / v.35 no.1 / pp.83-91 / 2011
  • Forecasting accuracy is highly important for reducing total cost and improving customer service, so it has been studied by many researchers. In this paper, the artificial neural network (ANN), one of the most popular nonlinear forecasting methods, is compared with the autoregressive integrated moving average (ARIMA) model by predicting container traffic. The paper uses a hybrid methodology that combines the linear ARIMA model and the nonlinear ANN model to improve forecasting performance, and compares this methodology with other models in terms of prediction performance. In designing the network structure, this work applies the genetic algorithm, which is known to be an effective optimization algorithm in huge and complex sample spaces. The neural networks considered include the time delayed neural network (TDNN) as well as the multi-layer perceptron (MLP), the most popular neural network model. Experimental results indicate that both the ANN and hybrid models outperform the ARIMA model.

Predictors of Mortality in Patients with COVID-19: A Systematic Review and Meta-analysis (코로나바이러스감염증-19 (COVID-19) 환자들의 사망관련 인자에 대한 연구: 체계적 문헌고찰 및 메타분석)

  • Kim, Woorim;Han, Ji Min;Lee, Kyung Eun
    • Korean Journal of Clinical Pharmacy / v.30 no.3 / pp.169-176 / 2020
  • Background: Most meta-analyses of risk factors for severe or critical outcomes in patients with COVID-19 included only studies conducted in China, which makes generalization difficult. Therefore, this study aimed to systematically evaluate the risk factors in patients with COVID-19 from various countries. Methods: PubMed, Embase, and Web of Science were searched for studies on mortality risk in patients with COVID-19 published from January 1 to May 7, 2020. Pooled estimates were calculated as odds ratios (OR) with 95% confidence intervals (CI) using the random-effects model. Results: We analyzed data from seven studies involving 26,542 patients in total in this systematic review and meta-analysis. Among the patients, 2,337 deaths were recorded (8.8%). Elderly patients and males showed significantly higher mortality rates than young patients and females; the OR values were 3.6 (95% CI 2.5-5.1) and 1.2 (95% CI 1.0-1.3), respectively. Among comorbidities, hypertension (OR 2.3, 95% CI 1.1-4.6), diabetes (OR 2.2, 95% CI 1.2-3.9), cardiovascular disease (OR 3.1, 95% CI 1.5-6.3), chronic obstructive pulmonary disease (OR 4.4, 95% CI 1.7-11.5), and chronic kidney disease (OR 4.2, 95% CI 2.0-8.6) were significantly associated with increased mortality. Conclusion: This meta-analysis, involving a huge global sample, employed a systematic method for synthesizing quantitative results of studies on the risk factors for mortality in patients with COVID-19. It can help clinicians identify patients with poor prognosis and improve the allocation of health resources to the patients who need them most.
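
Pooling odds ratios under a random-effects model can be sketched as below. The abstract does not name the estimator, so the DerSimonian-Laird method used here is an assumption, as is recovering each study's variance from its 95% CI. The function names and the per-study inputs are hypothetical and are not taken from the paper.

```python
import math

Z95 = 1.959963984540054  # two-sided 95% quantile of the standard normal


def log_or_and_variance(or_value, ci_low, ci_high):
    """Recover log(OR) and its variance from an OR and its 95% CI."""
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * Z95)
    return math.log(or_value), se ** 2


def pool_random_effects(studies):
    """Pool (OR, ci_low, ci_high) tuples with DerSimonian-Laird (assumed).

    Returns (pooled_or, ci_low, ci_high, tau2).
    """
    effects = [log_or_and_variance(*study) for study in studies]
    ys = [y for y, _ in effects]
    vs = [v for _, v in effects]

    # Fixed-effect (inverse-variance) step, needed for Cochran's Q.
    ws = [1.0 / v for v in vs]
    sw = sum(ws)
    y_fixed = sum(w * y for w, y in zip(ws, ys)) / sw
    q = sum(w * (y - y_fixed) ** 2 for w, y in zip(ws, ys))

    # Between-study variance tau^2, truncated at zero.
    c = sw - sum(w * w for w in ws) / sw
    k = len(studies)
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0

    # Random-effects weights and pooled estimate on the log scale.
    ws_re = [1.0 / (v + tau2) for v in vs]
    sw_re = sum(ws_re)
    y_re = sum(w * y for w, y in zip(ws_re, ys)) / sw_re
    se_re = math.sqrt(1.0 / sw_re)
    return (math.exp(y_re),
            math.exp(y_re - Z95 * se_re),
            math.exp(y_re + Z95 * se_re),
            tau2)


if __name__ == "__main__":
    # Hypothetical per-study odds ratios with 95% CIs.
    example = [(1.5, 1.0, 2.25), (3.0, 2.0, 4.5), (2.0, 1.2, 3.3)]
    print(pool_random_effects(example))
```

When the studies agree closely, tau2 is truncated to zero and the result coincides with the fixed-effect inverse-variance estimate.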

Bayesian Method for Modeling Male Breast Cancer Survival Data

  • Khan, Hafiz Mohammad Rafiqullah;Saxena, Anshul;Rana, Sagar;Ahmed, Nasar Uddin
    • Asian Pacific Journal of Cancer Prevention / v.15 no.2 / pp.663-669 / 2014
  • Background: With recent progress in health science administration, a huge amount of data has been collected from thousands of subjects. Statistical and computational techniques are necessary to understand such data and to draw valid scientific conclusions. The purpose of this paper was to develop a statistical probability model and to predict future survival times for male breast cancer patients who were diagnosed in the USA during 1973-2009. Materials and Methods: A random sample of 500 male patients was selected from the Surveillance Epidemiology and End Results (SEER) database. The survival times of the male patients were used to derive the statistical probability model. To assess goodness of fit, the model-building criteria Akaike Information Criterion (AIC), Bayesian Information Criterion (BIC), and Deviance Information Criterion (DIC) were employed. A novel Bayesian method was used to derive the posterior density function for the parameters and the predictive inference for future survival times from the exponentiated Weibull model, assuming that the observed breast cancer survival data follow such a model. The Markov chain Monte Carlo method was used to determine the inference for the parameters. Results: The summary results of certain demographic and socio-economic variables are reported. It was found that the exponentiated Weibull model fits the male survival data. Statistical inferences of the posterior parameters are presented. Mean predictive survival times, 95% predictive intervals, predictive skewness and kurtosis were obtained. Conclusions: The findings will hopefully be useful in treatment planning and healthcare resource allocation, and may motivate future research on breast cancer related survival issues.
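
For readers unfamiliar with the exponentiated Weibull model named above, the sketch below evaluates its distribution and survival functions, using the common form F(t) = (1 - exp(-(t/scale)^shape))^power. The parameterization and the argument names are assumptions, since the abstract gives no formulas, and the sketch covers only the model itself, not the Bayesian or MCMC inference the paper describes.

```python
import math


def ew_cdf(t: float, shape: float, scale: float, power: float) -> float:
    """Exponentiated Weibull CDF: (1 - exp(-(t/scale)**shape)) ** power."""
    if t <= 0:
        return 0.0
    return (1.0 - math.exp(-((t / scale) ** shape))) ** power


def ew_survival(t: float, shape: float, scale: float, power: float) -> float:
    """Probability of surviving beyond time t under the model."""
    return 1.0 - ew_cdf(t, shape, scale, power)


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only for illustration.
    for months in (12, 60, 120):
        s = ew_survival(months, shape=1.2, scale=80.0, power=0.9)
        print(f"S({months}) = {s:.3f}")
```

With power set to 1 the model reduces to the ordinary Weibull distribution, which gives a quick sanity check on any implementation.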