• Title/Summary/Keyword: Sequence Mining

Search Result 163, Processing Time 0.029 seconds

A Study on the Fault Process and Equipment Analysis of Plastic Ball Grid Array Manufacturing Using Data-Mining Techniques

  • Sim, Hyun Sik
    • Journal of Information Processing Systems / v.16 no.6 / pp.1271-1280 / 2020
  • The yield and quality of a micromanufacturing process are important management factors. In real-world situations, it is difficult to achieve a high yield from a manufacturing process because the products are produced through multiple nanoscale manufacturing processes. Therefore, it is necessary to identify the processes and equipment that lead to low yields. This paper proposes an analytical method to identify the processes and equipment that cause a defect in the plastic ball grid array (PBGA) during the manufacturing process using logistic regression and stepwise variable selection. The proposed method was tested with the lot trace records of a real work site. The records included the sequence of equipment that the lot had passed through and the number of faults of each type in the lot. We demonstrated that the test results reflect the real situation in a PBGA manufacturing process, and the major equipment parameters were then controlled to confirm the improvement in yield; the yield improved by approximately 20%.
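A minimal sketch of the idea in the abstract above, assuming each lot trace record is reduced to binary indicators ("lot passed through this equipment") plus a defect label. The equipment names, the toy data, and the log-loss gain threshold used as the stepwise stopping rule are all illustrative assumptions; the paper's own selection criterion is not stated in the abstract.

```python
import math

def predict(w, x):
    """Logistic model probability; w[0] is the bias term."""
    z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, steps=500, lr=0.5):
    """Fit logistic regression by plain batch gradient descent."""
    n, d = len(X), len(X[0])
    w = [0.0] * (d + 1)
    for _ in range(steps):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            err = predict(w, xi) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w

def log_loss(X, y, w):
    """Mean negative log-likelihood of the fitted model."""
    total = 0.0
    for xi, yi in zip(X, y):
        p = min(max(predict(w, xi), 1e-12), 1 - 1e-12)
        total -= yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return total / len(y)

def forward_stepwise(X, y, names, min_gain=0.02):
    """Greedy forward selection: repeatedly add the variable that lowers
    log-loss the most, stopping when the gain drops below min_gain
    (an assumed stand-in for a significance- or AIC-based rule)."""
    def loss_for(cols):
        sub = [[row[c] for c in cols] for row in X]
        return log_loss(sub, y, fit_logistic(sub, y))

    selected, best = [], loss_for([])
    while len(selected) < len(names):
        trials = {c: loss_for(selected + [c])
                  for c in range(len(names)) if c not in selected}
        col = min(trials, key=trials.get)
        if best - trials[col] < min_gain:
            break
        selected.append(col)
        best = trials[col]
    return [names[c] for c in selected]

# Hypothetical lot trace records: 1 = lot passed through that equipment.
NAMES = ["etch_A", "plate_B", "mold_C"]
X = [[1, 0, 1], [1, 1, 0], [1, 0, 0], [1, 1, 1], [1, 0, 1], [0, 1, 0],
     [0, 0, 1], [0, 1, 1], [0, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 0]]
y = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0]  # 1 = lot had a defect

suspects = forward_stepwise(X, y, NAMES)
print(suspects)
```

In this toy data, four of the five lots that passed through `etch_A` are defective, so it is the first variable selected; real lot data would need many more records and a proper statistical stopping rule.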

A Study on Geology and Mineralization in San Luis Potosi, Mexico (멕시코 산 루이스 포토시주의 지질 및 광화작용에 대한 고찰)

  • Oh, Il Hwan;Heo, Chul Ho
    • Journal of the Korean earth science society / v.40 no.2 / pp.163-176 / 2019
  • The Potosinian geological basement in central Mexico is composed of Upper Paleozoic metamorphic rocks, which crop out in the Sierra de Catorce nucleus in the northeastern part of the state. The sedimentary sequence that unconformably covers the Paleozoic basement is represented by an Upper Triassic marine sedimentary sequence, correlated with the Zacatecas Formation, and by the Upper Triassic continental red beds of the Huizachal Formation, which in turn are covered either by Jurassic red beds of the La Joja Formation or by Upper Jurassic marine sediments. This sequence is conformably overlain by Cretaceous calcareous marine sedimentary rocks throughout the state of San Luis Potosi. The Cenozoic sequence unconformably covers some of the aforementioned rocks and is represented by undifferentiated volcanic rocks as well as by marine clastic rocks. The intrusive igneous rocks are felsic to intermediate in composition, and they intrude the metamorphic basement and the sedimentary rocks. Conglomerates with evaporitic sediments were deposited during the Pleistocene. The Quaternary sequence includes basalt flows, piedmont deposits, alluvium, and occasional evaporites and caliche layers. A great diversity of mineral deposit types, both metallic and nonmetallic, is known in the state of San Luis Potosi. The host rocks of these deposits vary and include formations ranging from Paleozoic to Tertiary in age. The mineralization age is predominantly Tertiary (approximately 75%), and the mineralization is mainly epigenetic. In conclusion, the data on geology and mineralization in San Luis Potosi, Mexico, are helpful for predicting hidden ore bodies and selecting promising mineralized zones when domestic companies enter the mining sector of Mexico.

Sedimentary History and Tectonics in the Southeastern Continental Shelf of Korea based on High Resolution Shallow Seismic Data (고해상탄성파탐사자료에 의한 한국남동대륙붕의 퇴적사 및 조구조운동)

  • Min Geon Hong;Park Yong Ahn
    • The Korean Journal of Petroleum Geology / v.5 no.1_2 s.6 / pp.1-8 / 1997
  • Seismic stratigraphic analysis of high-resolution profiles obtained from the southeastern shelf of Korea divided the deposits into four sequences: (1) sequence D, (2) sequence C, (3) sequence B, and (4) sequence A (Holocene sediments). West of the Yangsan Fault, sequence D was deposited in a shallow-water environment as the basin subsided, whereas its eastern part formed at the slope front. The landward part of the slope-front fill sediments was eroded and redeposited on the nearby slope because of syndepositional tilting of the basin. This tilting probably resulted from the continuous closing of the Ulleung Basin. Sequence C consists of stacked successions of lowstand fluvial sediments, transgressive sediments, and marine highstand sediments derived from the paleo-river on the western side of the Yangsan Fault. On the eastern side of the Yangsan Fault, sequence C formed at the shelf break. Progradation of the lowstand sediments broadened the shelf. Sequence C on the eastern side was also tilted, but the tilting was weaker than in sequence D. During the formation of sequence B, the tilting stopped and sediment supply shifted from a line source to a point source on both sides of the Yangsan Fault. Sequence B is composed of a highstand systems tract partially preserved around Yokji Island, a lowstand systems tract mainly preserved in the Korea Trough, and a transgressive systems tract. After the tilting stopped, the compressive force caused by the closing of the Ulleung Basin may have been released by strike-slip faulting instead of tilting.

Product Recommendation System on VLDB using k-means Clustering and Sequential Pattern Technique (k-means 클러스터링과 순차 패턴 기법을 이용한 VLDB 기반의 상품 추천시스템)

  • Shim, Jang-Sup;Woo, Seon-Mi;Lee, Dong-Ha;Kim, Yong-Sung;Chung, Soon-Key
    • The KIPS Transactions:PartD / v.13D no.7 s.110 / pp.1027-1038 / 2006
  • Recommendation systems based on a very large database (VLDB) pose many technical problems, so it is necessary to study the recommendation system's structure and the data-mining techniques suitable for large-scale Internet shopping malls. We therefore design and implement a product recommendation system, using the k-means clustering algorithm and a sequential pattern technique, that can be used in a large-scale Internet shopping mall. The system processes user information in batches, defines the categories in a hierarchical structure, and uses a sequential pattern mining technique for the search engine. For predictive modeling and experiments, we use real data (users' interests and preferences for given categories) extracted from 30 days of log files of a major Internet shopping mall in Korea. For evaluation, we define PRP (Predictive Recommend Precision), PRR (Predictive Recommend Recall), and PF1 (Predictive Factor One-measure). In the experiments, the best recommendation time and the best learning time of our system were O(N), and the evaluation measures showed very good values.
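A rough sketch of sequential-pattern recommendation and its evaluation, under stated assumptions: patterns are simplified to ordered category pairs, the k-means clustering step is omitted, and PRP, PRR, and PF1 are assumed to follow the standard precision, recall, and F1 formulas, since the abstract does not give their exact definitions. The session data and function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

def frequent_ordered_pairs(sessions, min_support):
    """Count ordered pairs (a, b) where a is viewed before b in a session.
    Each session contributes at most once per pair; pairs seen in fewer
    than min_support sessions are dropped."""
    counts = Counter()
    for session in sessions:
        seen = set()
        for i, j in combinations(range(len(session)), 2):
            if session[i] != session[j]:
                seen.add((session[i], session[j]))
        counts.update(seen)
    return {pair: c for pair, c in counts.items() if c >= min_support}

def recommend(patterns, last_category, top_n=2):
    """Recommend categories that frequently follow last_category,
    ordered by support (ties broken alphabetically)."""
    followers = [(c, b) for (a, b), c in patterns.items() if a == last_category]
    followers.sort(key=lambda t: (-t[0], t[1]))
    return [b for _, b in followers[:top_n]]

def precision_recall_f1(recommended, relevant):
    """Standard precision, recall, and F1 over sets of categories."""
    rec, rel = set(recommended), set(relevant)
    hits = len(rec & rel)
    precision = hits / len(rec) if rec else 0.0
    recall = hits / len(rel) if rel else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

# Hypothetical browsing sessions (category names are made up).
sessions = [
    ["shoes", "socks", "bag"],
    ["shoes", "bag"],
    ["shoes", "socks"],
    ["bag", "hat"],
]
patterns = frequent_ordered_pairs(sessions, min_support=2)
picks = recommend(patterns, "shoes")            # ['bag', 'socks']
scores = precision_recall_f1(picks, ["bag", "hat"])  # (0.5, 0.5, 0.5)
print(picks, scores)
```

Each session is scanned once per pair of positions, so this toy version is quadratic in session length; a production system over a VLDB would use a dedicated sequential pattern mining algorithm instead.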

The Training Data Generation and a Technique of Phylogenetic Tree Generation using Decision Tree (트레이닝 데이터 생성과 의사 결정 트리를 이용한 계통수 생성 방법)

  • Chae, Deok-Jin;Sin, Ye-Ho;Cheon, Tae-Yeong;Go, Heung-Seon;Ryu, Geun-Ho;Hwang, Bu-Hyeon
    • The KIPS Transactions:PartD / v.10D no.6 / pp.897-906 / 2003
  • The traditional animal phylogenetic tree arranges the body structures of the animal phyla from simple to complex based on early developmental characters. Molecular systematics, which is now advancing rapidly, is re-evaluating these earlier views and presenting new genealogies and renewed interest in evolution. In this paper, we generate a training set from DNA sequences and apply it to classification. We used mitochondrial DNA for the experiment and verified the accuracy using MEGA, an analysis program used in the biology field. Although the mining results must still be confirmed through biological experiments, the approach can provide a methodology for efficient classification and can reduce the time and effort required for experiments.

The Development and Selection of SSR Markers for Identification of Peanut (Arachis hypogaea L.) Varieties in Korea

  • Han, Sang-Ik;Bae, Suk-Bok;Ha, Tae Joung;Lee, Myong-Hee;Jang, Ki-Chang;Seo, Woo-Duck;Park, Geum-Yong;Kang, Hang-Won
    • Korean Journal of Breeding Science / v.43 no.2 / pp.133-138 / 2011
  • The groundnut or cultivated peanut (Arachis hypogaea L.) in Korea comprises 36 domestic varieties that have been developed and registered as public cultivars during the last 25 years. We present a simple and reliable method to screen and identify Korean peanut varieties and genetic resources. Methodologies based on simple sequence repeat (SSR) markers have been developed and are widely used for gene identification and variety discrimination. To identify the 36 Korean peanut varieties, 238 unique peanut SSR markers were selected from previously reported results, synthesized, and used for polymerase chain reaction (PCR). Data were obtained through acrylamide gel electrophoresis and converted into suitable formats for data mining analysis using Biomine (an all-in-one functional genomics data mining program). As a result, twelve SSR primer pairs were found to reveal differences among the 36 varieties. These primer pairs amplified 27 alleles, with an average of 2.3 alleles per primer pair. In addition, classification of the results showed the genetic relationships among the 36 varieties. The approach described here could be applied to monitoring Korean varieties and adapted to peanut breeding programs.

Evaluation and Genome Mining of Bacillus stercoris Isolate B.PNR1 as Potential Agent for Fusarium Wilt Control and Growth Promotion of Tomato

  • Rattana Pengproh;Thanwanit Thanyasiriwat;Kusavadee Sangdee;Juthaporn Saengprajak;Praphat Kawicha;Aphidech Sangdee
    • The Plant Pathology Journal / v.39 no.5 / pp.430-448 / 2023
  • Recently, strategies for controlling Fusarium oxysporum f. sp. lycopersici (Fol), the causal agent of Fusarium wilt of tomato, have focused on using effective biocontrol agents. In this study, the biocontrol and plant growth promoting (PGP) attributes of 11 isolates of loamy soil Bacillus spp. were analyzed. Among them, the isolates B.PNR1 and B.PNR2 inhibited the mycelial growth of Fol by inducing abnormal fungal cell wall structures and cell wall collapse. Moreover, these isolates showed broad-spectrum activity against four other plant pathogenic fungi: F. oxysporum f. sp. cubense race 1 (Foc), Sclerotium rolfsii, Colletotrichum musae, and C. gloeosporioides. These two Bacillus isolates produced indole acetic acid, phosphate solubilization enzymes, and amylolytic and cellulolytic enzymes. In the pot experiment, the culture filtrate from B.PNR1 showed greater inhibition of the fungal pathogens and promoted the growth of tomato plants significantly more than the other treatments. Isolate B.PNR1, the best biocontrol and PGP isolate, was identified as Bacillus stercoris by its 16S rRNA gene sequence and whole genome sequencing (WGS) analysis. The WGS, through genome mining, confirmed that the B.PNR1 genome contained genes/gene clusters of a nonribosomal peptide synthetase/polyketide synthase, such as fengycin, surfactin, bacillaene, subtilosin A, bacilysin, and bacillibactin, which are involved in antagonistic and PGP activities. Therefore, our findings demonstrate the effectiveness of B. stercoris strain B.PNR1 as an antagonist and a plant growth promoter, highlighting its potential use as a biocontrol agent against the Fusarium wilt pathogen and as a PGP agent in tomatoes.

An Efficient Subsequence Matching Method Based on Index Interpolation (인덱스 보간법에 기반한 효율적인 서브시퀀스 매칭 기법)

  • Loh Woong-Kee;Kim Sang-Wook
    • The KIPS Transactions:PartD / v.12D no.3 s.99 / pp.345-354 / 2005
  • Subsequence matching is one of the most important operations in the field of data mining. Existing subsequence matching algorithms use only one index, and their performance degrades as the difference increases between the length of a query sequence and the size of the windows, which are subsequences of the same length extracted from data sequences to construct the index. In this paper, we propose a new subsequence matching method based on index interpolation to overcome this problem. An index interpolation method constructs two or more indexes and performs searching by selecting the most appropriate index among them according to the given query sequence length. We first examine, through preliminary experiments, how performance varies with the difference between the query sequence length and the window size, and we formulate a search cost model that reflects the distribution of query sequence lengths from the viewpoint of physical database design. Next, we propose a new subsequence matching method based on index interpolation to improve search performance. We also present an algorithm, based on the search cost formula mentioned above, that constructs optimal indexes for better search performance. Finally, we verify the superiority of the proposed method through a series of experiments using real and synthesized data sets.
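The index-selection step described above can be sketched as follows. The selection rule (use the largest indexed window size that does not exceed the query length) is an assumption made for illustration, as the abstract only says the "most appropriate" index is chosen. The brute-force matcher is included only to show what a range subsequence query computes; it is not the paper's index-based search.

```python
import bisect
import math

def choose_window_size(window_sizes, query_len):
    """Pick the index to search: the largest window size <= query_len.
    This rule is an assumed stand-in for the paper's cost-based choice."""
    sizes = sorted(window_sizes)
    pos = bisect.bisect_right(sizes, query_len)
    if pos == 0:
        raise ValueError("query is shorter than every indexed window size")
    return sizes[pos - 1]

def range_subsequence_match(data, query, epsilon):
    """Brute-force baseline: return every offset where the Euclidean
    distance between the query and the same-length subsequence of
    data is at most epsilon."""
    m = len(query)
    matches = []
    for start in range(len(data) - m + 1):
        dist = math.sqrt(sum((data[start + k] - query[k]) ** 2
                             for k in range(m)))
        if dist <= epsilon:
            matches.append(start)
    return matches

# Suppose indexes were built with these (hypothetical) window sizes.
indexed_sizes = [64, 128, 256]
chosen = choose_window_size(indexed_sizes, query_len=200)   # 128

offsets = range_subsequence_match([1, 2, 3, 4, 3, 2, 1], [3, 4, 3], 0.5)  # [2]
print(chosen, offsets)
```

With a single index of window size 64, a query of length 200 would be far from the window size; keeping several indexes lets the search pick 128 instead, which is the gap the abstract says hurts performance.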

The extension of the largest generalized-eigenvalue based distance metric $D_{ij}({\gamma}_1)$ in arbitrary feature spaces to classify composite data points

  • Daoud, Mosaab
    • Genomics & Informatics / v.17 no.4 / pp.39.1-39.20 / 2019
  • Analyzing patterns in data points embedded in linear and non-linear feature spaces is considered one of the common research problems across different research areas, for example, data mining, machine learning, pattern recognition, and multivariate analysis. In this paper, data points are heterogeneous sets of biosequences (composite data points). A composite data point is a set of ordinary data points (e.g., a set of feature vectors). We theoretically extend the derivation of the largest generalized eigenvalue-based distance metric $D_{ij}({\gamma}_1)$ to any linear and non-linear feature spaces. We prove that $D_{ij}({\gamma}_1)$ is a metric under any linear and non-linear feature transformation function. We show the sufficiency and efficiency of using the decision rule $\bar{{\delta}}_{{\Xi}i}$ (i.e., the mean of $D_{ij}({\gamma}_1)$) in the classification of heterogeneous sets of biosequences compared with the decision rules $\min_{{\Xi}i}$ and $\mathrm{median}_{{\Xi}i}$. We analyze the impact of linear and non-linear transformation functions on classifying and clustering collections of heterogeneous sets of biosequences. We also show empirically how the length of a sequence in a heterogeneous sequence-set generated by simulation affects the classification and clustering results in linear and non-linear feature spaces. We propose a new concept: the limiting dispersion map of the existing clusters in heterogeneous sets of biosequences embedded in linear and non-linear feature spaces, which is based on the limiting distribution of nucleotide compositions estimated from real data sets. Finally, empirical conclusions and scientific evidence are drawn from the experiments to support the theoretical results stated in this paper.

Mining the Proteome of Fusobacterium nucleatum subsp. nucleatum ATCC 25586 for Potential Therapeutics Discovery: An In Silico Approach

  • Habib, Abdul Musaweer;Islam, Md. Saiful;Sohel, Md.;Mazumder, Md. Habibul Hasan;Sikder, Mohd. Omar Faruk;Shahik, Shah Md.
    • Genomics & Informatics / v.14 no.4 / pp.255-264 / 2016
  • The plethora of bacterial genome sequence information available in recent times has ushered in many novel strategies for antibacterial drug discovery and has helped medical science take up the challenge of the increasing resistance of pathogenic bacteria to current antibiotics. In this study, we adopted a subtractive genomics approach to analyze the whole genome sequence of Fusobacterium nucleatum, a human oral pathogen associated with colorectal cancer. Our study revealed 1,499 proteins of F. nucleatum that have no homologs in the human genome. These proteins were screened further using the Database of Essential Genes (DEG), which resulted in the identification of 32 vitally important proteins for the bacterium. Subsequent analysis of the identified pivotal proteins using the Kyoto Encyclopedia of Genes and Genomes (KEGG) Automated Annotation Server (KAAS) resulted in sorting 3 key enzymes of F. nucleatum that may be good candidates as potential drug targets, since they are unique to the bacterium and absent in humans. In addition, we have demonstrated the three-dimensional structures of these three proteins. Finally, determination of the ligand binding sites of the 2 key proteins, as well as screening for functional inhibitors that best fitted the ligand sites, was conducted to discover effective novel therapeutic compounds against F. nucleatum.