• Title/Summary/Keyword: Large tag data

MarSel: The LD-based Marker Selection System for Large-scale Datasets

  • 김상준;여상수;김성권
    • Proceedings of the Korean Information Science Society Conference / 2004.10b / pp.253-255 / 2004
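  • The variation observed among humans is known to arise from SNPs (Single Nucleotide Polymorphisms) in the human genome. In association studies between genomic SNPs and variation, analyzing the entire DNA sequence, estimated at about 3 billion bases, would require a great deal of cost and time. To reduce this cost and time, research is under way on finding a small number of representative SNPs (tagSNPs). We used the LD coefficient |D'| for block partitioning, which gives the blocks a biological meaning, and then took an approach that finds a computationally optimal solution. In addition, previous studies could not process large-scale data, so analysis was attempted only on data from part of a chromosome. Broader analysis requires processing at the chromosome level. We implemented MarSel, a system that can process chromosome-scale SNP data in a single run.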
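
As a rough illustration of the LD coefficient |D'| mentioned in the abstract above, the following sketch computes it for a pair of biallelic SNPs from phased haplotypes. The function name `d_prime` and the 0/1 allele encoding are assumptions made for this example; it is not MarSel's code, and the block partitioning and tagSNP optimization steps are not shown.

```python
def d_prime(haplotypes):
    """Return |D'| for two biallelic SNPs.

    haplotypes: list of (a, b) pairs, one per phased haplotype, where each
    allele is coded 0 or 1 (an assumed encoding for this sketch).
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n   # frequency of allele 1 at SNP A
    p_b = sum(b for _, b in haplotypes) / n   # frequency of allele 1 at SNP B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n

    # Raw disequilibrium: observed joint frequency minus the product expected
    # under independence.
    d = p_ab - p_a * p_b

    # Normalize by the largest |D| possible for these allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    if d_max == 0:
        return 0.0  # at least one SNP is monomorphic in this sample
    return abs(d) / d_max
```

Two SNPs whose alleles always co-occur give |D'| = 1, while SNPs that assort independently give |D'| = 0.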

Recent Advances in DNA Sequencing by End-labeled Free-Solution Electrophoresis (ELFSE)

  • Won, Jong-In
    • Biotechnology and Bioprocess Engineering:BBE / v.11 no.3 / pp.179-186 / 2006
  • End-Labeled Free-Solution Electrophoresis (ELFSE) is a promising new bioconjugate technique for DNA sequencing (or separation) and genotyping by both capillary and microfluidic device electrophoresis. Because ELFSE enables high-resolution electrophoretic separation in aqueous buffer alone (i.e., without a polymer matrix), it eliminates the need to load viscous polymer networks into electrophoresis microchannels. To achieve high-performance microchannel DNA separations, ELFSE requires monodisperse perturbing entities (i.e., drag-tags), which create a large amount of frictional drag when pulled behind DNA during free-solution electrophoresis, and which have other properties suitable for microchannel electrophoresis. This article reviews the theoretical concepts of ELFSE and the characteristics that drag-tag molecules need for the best ELFSE performance. The merits and limitations of current drag-tags are also discussed in the context of recent experimental data on ELFSE separation (or sequencing).

Similar Contents Recommendation Model Based On Contents Meta Data Using Language Model

  • Donghwan Kim
    • Journal of Intelligence and Information Systems / v.29 no.1 / pp.27-40 / 2023
  • With the spread of smart devices and the impact of COVID-19, consumption of media content through smart devices has increased significantly. Along with this trend, the amount of media content viewed through OTT platforms is growing, which makes content recommendation on these platforms more important. Previous content-based recommendation research has mostly used metadata that describes the characteristics of the content, while few studies have used the content's own descriptive metadata. In this paper, various text data that describe the content, including titles and synopses, were used to recommend similar content. KLUE-RoBERTa-large, a high-performing Korean language model, was trained on the text data. A dataset of metadata for over 20,000 content items, including titles, synopses, composite genres, directors, actors, and hash tag information, was used as training data. To feed the various text features into the language model, the features were concatenated using special tokens that mark each feature. To evaluate the model's similarity classification ability relatively and objectively, the test set was built with a three-content comparison method and labeled through multiple inspections. Genre classification and hash tag classification tasks were used to fine-tune the embeddings for the content metadata text. As a result, the hash tag classification model achieved an accuracy of over 90% on the similarity test set, more than 9% better than the baseline language model. Hash tag classification training improved the language model's ability to identify similar content, which demonstrates the value of using a language model for content-based filtering.

Study on Tag, Trust and Probability Matrix Factorization Based Social Network Recommendation

  • Liu, Zhigang;Zhong, Haidong
    • KSII Transactions on Internet and Information Systems (TIIS) / v.12 no.5 / pp.2082-2102 / 2018
  • In recent years, social network applications such as WeChat, Facebook, and Twitter have attracted hundreds of millions of people to share their experiences and to plan, organize, and attend social events with friends. These activities accumulate a large amount of valuable information, which opens up new ways to explore users' preferences and to overcome challenges in traditional recommender systems. Based on a study of existing social network recommendation methods, we find that abundant information can be incorporated into the probability matrix factorization (PMF) model to handle challenges such as data sparsity in many recommender systems. We therefore propose a unified social network recommendation framework that combines tags, trust between users, and ratings with PMF. The unified method is based on three existing recommendation models (SoRecUser, SoRecItem, and SoRec), and the complexity analysis indicates that our approach is efficient and can be applied to large-scale datasets. Furthermore, experimental results on the publicly available Last.fm dataset show that our method outperforms existing state-of-the-art social network recommendation approaches, as measured by MAE and RMSE under different data sparsity conditions.
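
For readers unfamiliar with the PMF model that this framework builds on, here is a minimal sketch of plain matrix factorization trained by stochastic gradient descent, together with the MAE and RMSE measures. It covers only the rating term: the tag and trust components of the paper are not modelled, and the function names and hyperparameters (`k`, `lr`, `reg`, `epochs`) are assumptions chosen for illustration.

```python
import math
import random

def train_pmf(ratings, n_users, n_items, k=2, lr=0.01, reg=0.05, epochs=500, seed=0):
    """Fit latent factors to (user, item, rating) triples by SGD.

    Minimizes squared error plus L2 penalties on the factors, which
    corresponds to the MAP estimate of PMF with Gaussian priors.
    """
    rng = random.Random(seed)
    U = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_users)]
    V = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - sum(U[u][f] * V[i][f] for f in range(k))
            for f in range(k):
                uf, vf = U[u][f], V[i][f]
                # Gradient step on both factor vectors for this rating.
                U[u][f] += lr * (err * vf - reg * uf)
                V[i][f] += lr * (err * uf - reg * vf)
    return U, V

def predict(U, V, u, i):
    """Predicted rating is the dot product of user and item factors."""
    return sum(a * b for a, b in zip(U[u], V[i]))

def mae_rmse(ratings, U, V):
    """Return (MAE, RMSE) of the model on the given rating triples."""
    errors = [r - predict(U, V, u, i) for u, i, r in ratings]
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return mae, rmse
```

Calling `train_pmf` with `epochs=0` returns the random initial factors, which gives a convenient baseline when checking that training reduces the error.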

Implementation & Verification of RFID Gen2 Protocol on FPGA Prototyping board

  • Je, Young-Dai;Kim, Jae-Lim;Jang, Il-Su;Yang, Hoon-Gee
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2008.05a / pp.869-872 / 2008
  • This paper presents the VHDL implementation of a passive RFID tag for an Ultra High Frequency (UHF) RFID system. The operation of the tag, which is compatible with the EPCglobal Class 1 Generation 2 (Gen2) protocol, is verified by timing simulation after synthesis and implementation on a prototyping board. Because the reading range is relatively long, a passive tag needs a digital processor that supports fast decoding, encoding, and state transitions to improve the interrogation rate. We also verify an inventory round of the Gen2 protocol over UART communication. Verification at the fastest data rate, 640 kbps, in a multi-tag scenario shows that the implemented tag spends 1.4 ms transmitting the 96-bit EPC to the reader.

Development of Long-Range RFID Reader System supporting Sensor Tag

  • Shin, Dong-Beom;Kim, Dae-Young
    • The Journal of Korean Institute of Communications and Information Sciences / v.34 no.6C / pp.626-633 / 2009
  • ISO/IEC/WD 24753 defines new modem specifications for long-range RFID communications and an application protocol for sensor tag systems. According to the standard, the frequency offset of the tag is 4%. In general wireless communication systems, a coherent receiver is known to be superior to a non-coherent receiver. However, if the frequency offset is large, it is difficult to restore the original data accurately with a coherent receiver, and its performance is easily degraded. In this paper, a non-coherent receiver structure is adopted to solve the frequency offset problem of long-range RFID communications. We designed a frequency estimation block that finds the optimal frequency from a received signal with 4% frequency offset, and proposed a start frame delimiter (SFD) detection algorithm to determine the start position of the payload. The frequency estimation block finds the optimal frequency from the received signal using nine correlators. The SFD detection block searches the received signal for the start position of the payload with a dual correlator. We implemented a long-range RFID reader with the proposed methods and evaluated its performance in a wired/wireless test network. The implemented long-range RFID reader showed better performance than a commercial RFID reader in terms of recognition range.
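
The idea of estimating frequency with a bank of correlators can be sketched as follows. This is an assumption-laden illustration and not the authors' design: it places nine candidate frequencies evenly across the ±4% offset range, correlates the received samples against a complex exponential at each candidate, and picks the candidate with the largest correlation magnitude. Taking the magnitude makes the decision independent of carrier phase, which is what makes the detector non-coherent. The function name, parameters, and even spacing of candidates are all hypothetical.

```python
import cmath
import math

def estimate_frequency(samples, sample_rate, nominal_hz,
                       max_offset=0.04, n_correlators=9):
    """Pick the candidate frequency whose correlator output is largest.

    Candidates are spread evenly over nominal_hz * (1 +/- max_offset);
    this spacing is an assumption made for the sketch.
    """
    candidates = [
        nominal_hz * (1 + max_offset * (2 * k / (n_correlators - 1) - 1))
        for k in range(n_correlators)
    ]

    def correlation_magnitude(freq):
        # Correlate against a complex exponential and discard the phase.
        acc = sum(
            s * cmath.exp(-2j * math.pi * freq * n / sample_rate)
            for n, s in enumerate(samples)
        )
        return abs(acc)

    return max(candidates, key=correlation_magnitude)
```

For example, a tone 2% above a 1 kHz nominal frequency, sampled at 16 kHz for 0.1 s, is matched by the correlator at 1020 Hz regardless of its starting phase.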

HTML Text Extraction Using Tag Path and Text Appearance Frequency

  • Kim, Jin-Hwan;Kim, Eun-Gyung
    • Journal of the Korea Institute of Information and Communication Engineering / v.25 no.12 / pp.1709-1715 / 2021
  • To accurately extract the necessary text from a web page, one approach is to tell the web crawler which tags and style attributes contain the main content. The problem with this approach is that the extraction logic must be modified whenever the page layout changes. To address this, a previous study proposed extracting text by analyzing how frequently the text appears, but its performance varied widely depending on the collection channel of the web page. In this paper, we therefore propose a method that extracts text with high accuracy from various collection channels by analyzing not only the appearance frequency of text but also the parent tag paths of text nodes extracted from the DOM tree of web pages.
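
A simplified sketch of the tag-path part of this idea, using only Python's standard `html.parser`: text nodes are grouped by the path of their parent tags, and the path carrying the most text is taken as the main body. Scoring paths by total character count is an assumption made for this example, and the cross-page text-frequency analysis described in the abstract is not reproduced. The class and function names are hypothetical.

```python
from collections import defaultdict
from html.parser import HTMLParser

VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}  # tags with no end tag
SKIP_TAGS = {"script", "style"}                           # non-content text

class TagPathCollector(HTMLParser):
    """Group text nodes by the path of their enclosing tags."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.texts_by_path = defaultdict(list)

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed children.
        if tag in self.stack:
            while self.stack and self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text and not (SKIP_TAGS & set(self.stack)):
            self.texts_by_path["/".join(self.stack)].append(text)

def extract_main_text(html):
    """Return (tag_path, text) for the path that carries the most text."""
    collector = TagPathCollector()
    collector.feed(html)
    collector.close()
    groups = collector.texts_by_path
    if not groups:
        return "", ""
    best = max(groups, key=lambda p: sum(len(t) for t in groups[p]))
    return best, "\n".join(groups[best])
```

Because navigation links and article paragraphs usually sit under different tag paths, grouping by path separates them without hard-coding tag names or style attributes for a particular site.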

Tracking of Yellowtail Seriola quinqueradiata Migration Using Pop-up Satellite Archival Tag (PSAT) and Oceanic Environments Data

  • Kim, Changsin;Yang, Jigwan;Kang, Sujin;Lee, Seung-Jong;Kang, Sukyung
    • Korean Journal of Fisheries and Aquatic Sciences / v.54 no.5 / pp.787-797 / 2021
  • A yellowtail (Seriola quinqueradiata) tagged with a Pop-up Satellite Archival Tag (PSAT) was released off the coast near Moseulpo, Jeju Island, and about 40 days of ecological data were obtained. However, it is difficult to determine the spatial location of underwater ecological data. To improve the accuracy of estimating the yellowtail migration route from temperature, a suitable background field of oceanic environment data was evaluated and used as input. After a tracking algorithm for migration route estimation was developed, three experimental cases were run using ecological data from the surface layer, the mixed layer, and the whole water column. In all tracking experiments, the estimated route moved from western to eastern Jeju Island. Additionally, a tracking experiment using a 3D ocean numerical model showed that the migration route can be estimated using fish ecological data from the entire water column. Therefore, using a large amount of ecological data and a high-accuracy ocean numerical model appears to be a way to increase the accuracy of tracking experiments. Moreover, the tracking algorithm of this study can be applied to small pelagic fisheries, using small archival electronic tags to track migration routes.

RFID Middleware System based on XML for Processing Large-Scale Data

  • Park, Byoung-Seob
    • The Journal of the Korea Contents Association / v.7 no.7 / pp.31-38 / 2007
  • We implement an XML-based RFID middleware system for large-scale data processing. The implemented middleware system consists of a reader interface for tag data collection, an event manager for data filtering, and an application interface for RFID applications. It supports both fixed and portable readers. We analyze the middleware functions with four application access protocols (HTTP, XML, JMS, and SOAP) and demonstrate filtering speed in terms of CPU utilization.

Applied Computational Tools for Crop Genome Research

  • Love Christopher G;Batley Jacqueline;Edwards David
    • Journal of Plant Biotechnology / v.5 no.4 / pp.193-195 / 2003
  • A major goal of agricultural biotechnology is the discovery of genes or genetic loci that are associated with characteristics beneficial to crop production. This knowledge of genetic loci may then be applied to improve crop breeding. Agriculturally important genes may also benefit crop production through transgenic technologies. Recent years have seen the application of high-throughput technologies to agricultural biotechnology, leading to the production of large amounts of genomic data. The challenge today is to structure this data effectively so that researchers can search, filter, and, importantly, make robust associations within a wide variety of datasets. At the Plant Biotechnology Centre, Primary Industries Research Victoria in Melbourne, Australia, we have developed a series of tools and computational pipelines to assist in the processing and structuring of genomic data to aid its application to agricultural biotechnology research. These tools include a sequence database, ASTRA, for the processing and annotation of expressed sequence tag data. Tools have also been developed for the discovery of simple sequence repeat (SSR) and single nucleotide polymorphism (SNP) molecular markers from large sequence datasets. Application of these tools to Brassica research has assisted in the production of genetic and comparative physical maps as well as candidate gene discovery for a range of agronomically important traits.
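
To make the notion of SSR discovery concrete, the sketch below scans a DNA sequence for simple sequence repeats with a regular expression. It is not one of the tools described in the abstract; the function name and the thresholds (motifs of 2 to 6 bases repeated at least 4 times) are assumptions chosen for illustration.

```python
import re

def find_ssrs(sequence, min_unit=2, max_unit=6, min_repeats=4):
    """Return (start, motif, repeat_count) for each tandem repeat found.

    The regex captures a short motif lazily and then requires it to be
    followed by at least min_repeats - 1 further copies of itself.
    """
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(sequence.upper()):
        motif = match.group(1)
        hits.append((match.start(), motif, len(match.group(0)) // len(motif)))
    return hits
```

The reported start is a 0-based offset into the sequence. A real pipeline would typically also handle mononucleotide repeats, ambiguous bases, and compound repeats, none of which are covered here.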