• Title/Abstract/Keywords: huge sample

64 search results (processing time: 0.025 s)

EST Analysis system for panning gene

  • Hur, Cheol-Goo;Lim, So-Hyung;Goh, Sung-Ho;Shin, Min-Su;Cho, Hwan-Gue
    • Proceedings of the Korean Society for Bioinformatics Conference
    • /
    • Korean Society for Bioinformatics and Systems Biology, 2000 International Symposium on Bioinformatics
    • /
    • pp.21-22
    • /
    • 2000
  • Expressed sequence tags (ESTs) are partial segments of cDNA produced by single-pass sequencing from the 5' or 3' end of cDNA clones; they are error-prone and generated in highly redundant sets. Advances in genomics have led biologists to generate huge numbers of ESTs from a variety of organisms (humans, microorganisms, and plants), and the cumulative number of ESTs now exceeds 5.3 million. As EST data accumulate ever more rapidly, the need grows for EST analysis tools that extract biological meaning from the data. Among the several needs of EST analysis, extracting protein sequences or functional motifs from ESTs is important for identifying their function in vivo. For that purpose, precise and accurate identification of the regions containing coding sequences (CDSs) is the primary problem to solve, and it helps in extracting and detecting genuine CDSs and protein motifs from EST collections. Although several public tools are available for EST analysis, none accomplishes this objective. Furthermore, they target human or microbial ESTs rather than plant ESTs. Thus, to meet the urgent needs of collaborators who work with plant ESTs, and to establish an analysis system usable as general-purpose public software, we constructed a pipelined EST analysis system by integrating public software components. The software used is as follows: Phred/Cross-match for quality control and vector screening, NCBI BLAST for similarity searching, ICATools for EST clustering, Phrap for EST contig assembly, and BLOCKS/Prosite for protein motif searching. The sample data set used to construct and verify the system was 1,386 ESTs from human intrathymic T-cells that were verified using the UniGene and Nr databases of NCBI.
CDSs were extracted from the sample data set by comparing the sample data with protein sequence/motif databases, determining the matched protein sequences/motifs that satisfy our defined parameters, and extracting the regions that show similarity. In the near future, software for peptide mass spectrometry fingerprint analysis, one of the fields of proteomics, is expected to be integrated into the system alongside these components. This pipelined EST analysis system will extend our knowledge of plant ESTs and proteins through the identification of unknown genes.


A Method for Observation of Benign, Premalignant and Malignant Changes in Clinical Skin Tissue Samples via FT-IR Microspectroscopy

  • Skrebova, Natalja;Aizawa, Katsuo;Ozaki, Yukihiro;Arase, Seiji
    • Journal of Photoscience
    • /
    • Vol. 9, No. 2
    • /
    • pp.457-459
    • /
    • 2002
  • Sunlight causes various adverse changes in sun-exposed areas of the skin, the most hazardous of which is the induction of malignant skin tumours. FT-IR spectra were obtained from specimens excised from normal skin, BCCs, SCCs, MMs, nevi, and lesions of solar keratosis and Bowen's disease. Freshly frozen specimens were cut into 2 sections in strictly sequential order, one stained with H&E for histopathological analysis and the other air-dried on a CaF$_2$ slide glass for spectral data acquisition from defined areas of interest. Intra- and inter-sample variations were estimated within grouped lesion categories according to each skin component. Mean spectra for each type of tissue pathology in the 800-1800 cm$^{-1}$ region were interpreted using the classical group frequency approach, which showed that the most visible differences among spectra of benign, premalignant and malignant changes were directly related to protein conformation and nucleic acid bases. The relative intensity of the nucleic acid peak increased with progression to malignancy. In addition, PCA was able to evaluate and maximise the differences in the spectra by reducing the number of variables characterizing each patient and pathology category. This non-destructive approach to assessing the complexity of IR spectra of inhomogeneous samples such as skin demonstrates the advantage of FT-IR microspectroscopy: it can observe diseased states (benign, premalignant, malignant) and distinguish them from normal tissue against a huge background of inter- and intra-subject variability.


Assessment of Carbon Sequestration Potential in Degraded and Non-Degraded Community Forests in Terai Region of Nepal

  • Joshi, Rajeev;Singh, Hukum;Chhetri, Ramesh;Yadav, Karan
    • Journal of Forest and Environmental Science
    • /
    • Vol. 36, No. 2
    • /
    • pp.113-121
    • /
    • 2020
  • This study was carried out in degraded and non-degraded community forests (CF) in the Terai region of Kanchanpur district, Nepal. A total of 63 concentric sample plots, each of 500 m$^2$, were laid out in the inventory to estimate the above- and below-ground biomass of the forests, using systematic random sampling with a sampling intensity of 0.5%. Mallotus philippinensis and Shorea robusta were the most dominant species in the degraded and non-degraded CF, with Importance Value Index (I.V.I.) values of 97.16 and 178.49, respectively. Above-ground tree biomass carbon in the degraded and non-degraded community forests was 74.64±16.34 t ha$^{-1}$ and 163.12±20.23 t ha$^{-1}$, respectively. Soil carbon sequestration in the degraded and non-degraded community forests was 42.55±3.10 t ha$^{-1}$ and 54.21±3.59 t ha$^{-1}$, respectively. Hence, the estimated total carbon stock was 152.68±22.95 t ha$^{-1}$ and 301.08±27.07 t ha$^{-1}$ in the degraded and non-degraded community forests, respectively. Carbon sequestration in the non-degraded community forest was 1.97 times that in the degraded community forest. The CO$_2$ equivalent in the degraded and non-degraded community forests was 553 t ha$^{-1}$ and 1105 t ha$^{-1}$, respectively. Statistical analysis showed a significant difference between the degraded and non-degraded community forests in total biomass and carbon sequestration potential (p<0.05). The study indicates that community forests have huge potential and can earn economic benefits from carbon trading under the REDD+/CDM mechanism by promoting the sustainable conservation of community forests.
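The ratio and the CO$_2$ equivalent quoted in this abstract are simple arithmetic on the total carbon stocks, and can be sketched as below. The conversion factor 44/12 (the molecular-weight ratio of CO$_2$ to carbon) is an assumption; the abstract does not state the factor it used, so the output need not reproduce the reported 553 and 1105 t ha$^{-1}$ exactly.

```python
# Total carbon stock reported in the abstract (t/ha).
degraded_c = 152.68
non_degraded_c = 301.08

# Assumption: CO2 equivalent = carbon * (44/12), the molecular-weight
# ratio of CO2 to C. The abstract does not state its conversion factor.
CO2_PER_C = 44.0 / 12.0

# How many times larger the non-degraded stock is than the degraded one.
ratio = non_degraded_c / degraded_c

co2_degraded = degraded_c * CO2_PER_C
co2_non_degraded = non_degraded_c * CO2_PER_C

print(f"stock ratio (non-degraded / degraded): {ratio:.2f}")
print(f"CO2 equivalent, degraded:     {co2_degraded:.1f} t/ha")
print(f"CO2 equivalent, non-degraded: {co2_non_degraded:.1f} t/ha")
```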

Analysis of Component for Determining Illegal Gasoline

  • 임영관;원기요;강병석;박소휘;정성;고영훈;김성수;정길형
    • Tribology and Lubricants
    • /
    • Vol. 36, No. 3
    • /
    • pp.161-167
    • /
    • 2020
  • Petroleum is the most used energy source in Korea, accounting for 39.5% of primary energy use. The price of liquid petroleum products in Korea includes substantial taxes, such as the transportation, environment and energy tax. Illegal production and distribution of liquid petroleum is therefore widespread, because the untaxed product is far cheaper than the normal product. Generally, an illegal petroleum product is made by mixing liquid petroleum with other similar petroleum alternatives. In such cases, it is easy to determine whether the product is illegal by analyzing its physical properties and typical components. However, if one of the components of the original petroleum product is added to illegal petroleum, distinguishing between the two products is difficult. In this research, we inspect illegally produced gasoline that is mixed with methyl tertiary butyl ether (MTBE) as an octane booster. This illegal gasoline shows a high octane number and oxygen content. Further, we analyze the different types of green dyes used in illegal gasoline through high performance liquid chromatography (HPLC). We conduct component analyses on a simulated sample prepared from premium gasoline and MTBE. Finally, the illegal gasoline is identified as premium gasoline with 10% MTBE. The findings of this study suggest that illegal petroleum can be identified through component analysis and comparison with simulated samples.

Web-Based Distributed Visualization System for Large Scale Geographic Data

  • 황규현;윤성민;박상훈
    • Journal of Korea Multimedia Society
    • /
    • Vol. 14, No. 6
    • /
    • pp.835-848
    • /
    • 2011
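  • This paper proposes a client-server distributed/parallel system for effectively visualizing massive geographic data. The system consists of a web-based client GUI program and a distributed/parallel server program that runs on multiple PC clusters. So that the client program can run on mobile devices as well as PCs, the GUI was designed with JOGL, a Java-based OpenGL graphics library, and the client sends the device's currently available memory size and maximum screen resolution to the server to minimize the server's work. The PC cluster used as the server accesses the distributed geographic data, resamples it appropriately according to the information received from the client, and transmits the result back. Cache data structures are maintained not only on each server node but also on the client, so as to minimize the latency incurred by repeated access to the distributed massive geographic data.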

Data Communication Prediction Model in Multiprocessors based on Robust Estimation

  • 전장환;이강우
    • The KIPS Transactions: Part A
    • /
    • Vol. 12A, No. 3
    • /
    • pp.243-252
    • /
    • 2005
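  • This paper proposes a method for modeling the frequency of data communication in multiprocessor systems using least-squares estimation and robust estimation. Several small input data sets of different sizes are applied to a workload program, the communication frequency is measured for each, and the two statistical estimation techniques are applied sequentially to the measurements to build a model that accurately predicts communication frequency. Because this modeling technique depends only on the input data size, independent of the workload or the architectural specifications of the target system, it can be applied as-is to a variety of workloads and target systems. In addition, because the algorithmic dynamic behavior of the workload on the target system is captured in a mathematical formula, the technique can also be applied to modeling performance data other than data communication. In this paper, we derived a model for the frequency of cache misses, the key factor causing data communication in shared-memory systems, a representative class of multiprocessor; the prediction error was below 1% in 5 of 12 experiments and around 3% in the rest, which is highly accurate.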
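The idea of applying least-squares estimation first and robust estimation second can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say which model form or which robust estimator the authors used, so the straight-line model, the Huber weighting, and the measurement values below are all hypothetical.

```python
from statistics import median

def weighted_line_fit(xs, ys, ws):
    """Weighted least-squares fit of y = a + b*x; returns (a, b)."""
    sw = sum(ws)
    xbar = sum(w * x for w, x in zip(ws, xs)) / sw
    ybar = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - xbar) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - xbar) * (y - ybar) for w, x, y in zip(ws, xs, ys))
    b = sxy / sxx
    return ybar - b * xbar, b

def huber_refine(xs, ys, a, b, delta=1.345, iters=100):
    """Refine (a, b) by iteratively reweighted least squares with Huber weights."""
    for _ in range(iters):
        resid = [y - (a + b * x) for x, y in zip(xs, ys)]
        med = median(resid)
        # Robust scale estimate from the median absolute deviation (MAD).
        scale = median(abs(r - med) for r in resid) / 0.6745
        if scale < 1e-9:  # nearly all points already lie on the line
            break
        # Huber weights: 1 for small residuals, shrinking for large ones.
        ws = [1.0 if abs(r) <= delta * scale else delta * scale / abs(r)
              for r in resid]
        a, b = weighted_line_fit(xs, ys, ws)
    return a, b

# Hypothetical measurements: input size vs. cache-miss count.
# The true trend is 50 + 2*size; the last run is a deliberate outlier.
sizes = [100, 200, 300, 400, 500, 600, 700, 800]
misses = [50 + 2 * s for s in sizes]
misses[-1] += 2000

# Step 1: ordinary least squares (all weights equal to 1).
ols_a, ols_b = weighted_line_fit(sizes, misses, [1.0] * len(sizes))
# Step 2: robust refinement starting from the least-squares estimate.
rob_a, rob_b = huber_refine(sizes, misses, ols_a, ols_b)

print(f"least squares: misses = {ols_a:.1f} + {ols_b:.3f} * size")
print(f"robust:        misses = {rob_a:.1f} + {rob_b:.3f} * size")
```

The least-squares step gives a starting estimate that a single bad measurement can distort; the reweighting step then reduces the influence of points with large residuals.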

Improving the Map/Reduce Model through Data Distribution and Task Progress Scheduling

  • 황인성;정경용;임기욱;이정현
    • The Journal of the Korea Contents Association
    • /
    • Vol. 10, No. 10
    • /
    • pp.78-85
    • /
    • 2010
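  • Map/Reduce is a programming model for implementing cloud computing that has recently attracted much attention. The model is used in applications that process large-scale data using multiple computers. To use the participating computers efficiently, it is therefore important to decide how to divide the data into pieces of suitable size and distribute them efficiently to each computer. The plan for executing the Map and Reduce phases that make up the model can also strongly affect performance. This paper proposes a method that splits large data and distributes it efficiently to each computing node after considering the performance of the cloud computing nodes that run Map tasks and the state of the network. In addition, the way the Map and Reduce phases proceed was tuned to improve the processing speed of Reduce tasks. The proposed method was tested with two representative Map/Reduce applications, and its effect on performance under different conditions was evaluated.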
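One simple way to picture performance-aware data distribution is to split the input in proportion to a per-node capacity score. The sketch below is an assumption for illustration, not the paper's algorithm: the function name `split_by_capacity`, the node names, and the integer scores (imagined here as combining node performance and network state) are all hypothetical.

```python
def split_by_capacity(total_bytes, capacities):
    """Split total_bytes across nodes in proportion to integer capacity scores.

    capacities maps a node name to a positive integer score (hypothetical:
    higher means a faster node and/or a better network link).
    """
    total_cap = sum(capacities.values())
    # Integer share for each node, rounded down.
    shares = {node: (total_bytes * cap) // total_cap
              for node, cap in capacities.items()}
    # Rounding down can leave a few bytes unassigned; give one extra byte
    # each to the highest-capacity nodes until everything is assigned.
    leftover = total_bytes - sum(shares.values())
    ranked = sorted(capacities, key=capacities.get, reverse=True)
    for node in ranked[:leftover]:
        shares[node] += 1
    return shares

# Hypothetical cluster: node-a is three times as capable as node-c.
shares = split_by_capacity(1000, {"node-a": 3, "node-b": 2, "node-c": 1})
print(shares)
```

Proportional splitting aims to let all Map tasks finish at about the same time, so that slower nodes do not hold up the Reduce phase.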

A Study on Application of ARIMA and Neural Networks for Time Series Forecasting of Port Traffic

  • 신창훈;정수현
    • Journal of Navigation and Port Research
    • /
    • Vol. 35, No. 1
    • /
    • pp.83-91
    • /
    • 2011
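  • Forecast accuracy is a prerequisite for reducing costs and improving customer service, and it therefore remains an actively studied field. This study aims to compare the artificial neural network (ANN) model, a representative nonlinear forecasting model, with the ARIMA model in forecasting container traffic at Korean ports, and, to improve forecasting power, also compares a hybrid model combining ARIMA and ANN against the other models. In particular, for the design of the ANN network structure, a genetic algorithm, known to be effective at finding global optima even in vast and complex search spaces, was used, and forecasting performance was compared using not only the multilayer perceptron (MLP), the representative ANN model, but also the time-delay neural network (TDNN). The results showed that the ANN and hybrid models achieved better forecasting performance than the ARIMA model.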

Predictors of Mortality in Patients with COVID-19: A Systematic Review and Meta-analysis

  • 김우림;한지민;이경은
    • Korean Journal of Clinical Pharmacy
    • /
    • Vol. 30, No. 3
    • /
    • pp.169-176
    • /
    • 2020
  • Background: Most meta-analyses of risk factors for severe or critical outcomes in patients with COVID-19 included only studies conducted in China, which limits generalizability. Therefore, this study aimed to systematically evaluate the risk factors in patients with COVID-19 from various countries. Methods: PubMed, Embase, and Web of Science were searched for studies published on the mortality risk in patients with COVID-19 from January 1 to May 7, 2020. Pooled estimates were calculated as odds ratios (OR) with 95% confidence intervals (CI) using the random-effects model. Results: We analyzed data from seven studies involving 26,542 patients in total in this systematic review and meta-analysis. Among the patients, 2,337 deaths were recorded (8.8%). Elderly patients and males showed significantly higher mortality rates than young patients and females; the OR values were 3.6 (95% CI 2.5-5.1) and 1.2 (95% CI 1.0-1.3), respectively. Among comorbidities, hypertension (OR 2.3, 95% CI 1.1-4.6), diabetes (OR 2.2, 95% CI 1.2-3.9), cardiovascular disease (OR 3.1, 95% CI 1.5-6.3), chronic obstructive pulmonary disease (OR 4.4, 95% CI 1.7-11.5), and chronic kidney disease (OR 4.2, 95% CI 2.0-8.6) were significantly associated with increased mortality. Conclusion: This meta-analysis, involving a huge global sample, employed a systematic method for synthesizing quantitative results of studies on the risk factors for mortality in patients with COVID-19. It helps clinicians identify patients with poor prognosis and improve the allocation of health resources to patients who need them most.
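Pooling odds ratios under a random-effects model can be sketched as below. The abstract names the model but not the estimator, so the DerSimonian-Laird estimate of between-study variance used here is an assumption, and the three input studies are hypothetical values, not data from the review.

```python
import math

def pooled_or_random_effects(studies, z=1.96):
    """Pool odds ratios with a DerSimonian-Laird random-effects model.

    studies: list of (odds_ratio, ci_low, ci_high) with 95% intervals.
    Returns (pooled_or, ci_low, ci_high, tau2).
    """
    # Work on the log scale; recover each standard error from its CI width.
    y = [math.log(o) for o, _, _ in studies]
    se = [(math.log(hi) - math.log(lo)) / (2 * z) for _, lo, hi in studies]

    # Fixed-effect (inverse-variance) weights and mean.
    w = [1 / s ** 2 for s in se]
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)

    # Cochran's Q and the DerSimonian-Laird between-study variance tau^2.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(studies) - 1)) / c)

    # Random-effects weights add tau^2 to each study's variance.
    w_re = [1 / (s ** 2 + tau2) for s in se]
    mu = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se_mu = math.sqrt(1 / sum(w_re))
    return (math.exp(mu), math.exp(mu - z * se_mu),
            math.exp(mu + z * se_mu), tau2)

# Hypothetical studies: (OR, 95% CI low, 95% CI high).
studies = [(2.0, 1.2, 3.3), (3.5, 2.0, 6.1), (1.8, 0.9, 3.6)]
pooled, low, high, tau2 = pooled_or_random_effects(studies)
print(f"pooled OR = {pooled:.2f} (95% CI {low:.2f}-{high:.2f}), tau^2 = {tau2:.3f}")
```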

Bayesian Method for Modeling Male Breast Cancer Survival Data

  • Khan, Hafiz Mohammad Rafiqullah;Saxena, Anshul;Rana, Sagar;Ahmed, Nasar Uddin
    • Asian Pacific Journal of Cancer Prevention
    • /
    • Vol. 15, No. 2
    • /
    • pp.663-669
    • /
    • 2014
  • Background: With recent progress in health science administration, a huge amount of data has been collected from thousands of subjects. Statistical and computational techniques are necessary to understand such data and to draw valid scientific conclusions. The purpose of this paper was to develop a statistical probability model and to predict future survival times for male breast cancer patients who were diagnosed in the USA during 1973-2009. Materials and Methods: A random sample of 500 male patients was selected from the Surveillance Epidemiology and End Results (SEER) database. The survival times for the male patients were used to derive the statistical probability model. To assess goodness of fit, three model-selection criteria were employed: the Akaike Information Criterion (AIC), the Bayesian Information Criterion (BIC), and the Deviance Information Criterion (DIC). A novel Bayesian method was used to derive the posterior density function for the parameters and the predictive inference for future survival times from the exponentiated Weibull model, assuming that the observed breast cancer survival data follow this type of model. The Markov chain Monte Carlo method was used to obtain inference for the parameters. Results: The summary results of certain demographic and socio-economic variables are reported. It was found that the exponentiated Weibull model fits the male survival data. Statistical inferences for the posterior parameters are presented. Mean predictive survival times, 95% predictive intervals, predictive skewness and kurtosis were obtained. Conclusions: The findings will hopefully be useful in treatment planning and healthcare resource allocation, and may motivate future research on breast cancer related survival issues.
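For readers unfamiliar with the exponentiated Weibull model, a minimal sketch of its log-density and of AIC and BIC computed from a log-likelihood follows. It rests on several assumptions not taken from the abstract: the parameterization $F(t) = [1 - e^{-(t/\lambda)^k}]^{\alpha}$, uncensored survival times, fixed parameter values rather than MCMC posterior estimates, and hypothetical data.

```python
import math

def exp_weibull_logpdf(t, alpha, k, lam):
    """Log-density of the exponentiated Weibull distribution.

    Assumed CDF: F(t) = (1 - exp(-(t/lam)**k)) ** alpha, for t > 0.
    With alpha = 1 this reduces to the ordinary Weibull distribution.
    """
    z = (t / lam) ** k
    return (math.log(alpha) + math.log(k / lam)
            + (k - 1) * math.log(t / lam)
            - z
            + (alpha - 1) * math.log1p(-math.exp(-z)))

def information_criteria(times, alpha, k, lam):
    """Return (log_likelihood, AIC, BIC) for uncensored survival times."""
    n_params = 3  # alpha, k, lam
    ll = sum(exp_weibull_logpdf(t, alpha, k, lam) for t in times)
    aic = 2 * n_params - 2 * ll
    bic = n_params * math.log(len(times)) - 2 * ll
    return ll, aic, bic

# Hypothetical survival times. With alpha = k = lam = 1 the model is the
# unit exponential distribution, so the log-density at t is simply -t.
times = [1.0, 2.0, 3.0]
ll, aic, bic = information_criteria(times, alpha=1.0, k=1.0, lam=1.0)
print(f"log-likelihood = {ll:.3f}, AIC = {aic:.3f}, BIC = {bic:.3f}")
```

Lower AIC or BIC indicates a better trade-off between fit and number of parameters when comparing candidate models on the same data.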