Intelligent Robust Base-Station Research in Harsh Outdoor Wilderness Environments for Wildsense

  • Ahn, Junho;Mysore, Akshay;Zybko, Kati;Krumm, Caroline;Lee, Dohyeon;Kim, Dahyeon;Han, Richard;Mishra, Shivakant;Hobbs, Thompson
    • KSII Transactions on Internet and Information Systems (TIIS) / v.15 no.3 / pp.814-836 / 2021
  • Wildlife ecologists and biologists recapture deer to collect tracking data from their collars, or wait for a drop-off collar to detach automatically. Research teams must either run a base camp with medical trailers, helicopters, and airplanes to capture the deer, or wait several months until the collar drops off the deer's neck. We propose an intelligent, robust base station that offers a low-cost, time-saving way to transfer recorded sensor data from the collars to a listener node; readings are obtained without opening the weatherproof collar. We designed and implemented a robust base-station system that automatically collects data from the collars and listener motes in harsh wilderness environments. We also analyzed intelligent solutions for improved data collection and pattern prediction using drone-based detection and tracking algorithms.

Bioinformatics services for analyzing massive genomic datasets

  • Ko, Gunhwan;Kim, Pan-Gyu;Cho, Youngbum;Jeong, Seongmun;Kim, Jae-Yoon;Kim, Kyoung Hyoun;Lee, Ho-Yeon;Han, Jiyeon;Yu, Namhee;Ham, Seokjin;Jang, Insoon;Kang, Byunghee;Shin, Sunguk;Kim, Lian;Lee, Seung-Won;Nam, Dougu;Kim, Jihyun F.;Kim, Namshin;Kim, Seon-Young;Lee, Sanghyuk;Roh, Tae-Young;Lee, Byungwook
    • Genomics & Informatics / v.18 no.1 / pp.8.1-8.10 / 2020
  • The explosive growth of next-generation sequencing data has resulted in ultra-large-scale datasets and ensuing computational problems. In Korea, the amount of genomic data has been increasing rapidly in recent years. Leveraging these big data requires researchers to use large-scale computational resources and analysis pipelines. A promising solution to this computational challenge is cloud computing, where CPUs, memory, storage, and programs are accessible in the form of virtual machines. Here, we present a cloud computing-based system, Bio-Express, that provides user-friendly, cost-effective analysis of massive genomic datasets. Bio-Express is loaded with predefined multi-omics data analysis pipelines, which are divided into genome, transcriptome, epigenome, and metagenome pipelines. Users can employ the predefined pipelines or create a new pipeline for analyzing their own omics data. We also developed several web-based services to facilitate downstream analysis of genome data. The Bio-Express web service is freely available at https://www.bioexpress.re.kr/.

Improving Data Accuracy Using Proactive Correlated Fuzzy System in Wireless Sensor Networks

  • Barakkath Nisha, U;Uma Maheswari, N;Venkatesh, R;Yasir Abdullah, R
    • KSII Transactions on Internet and Information Systems (TIIS) / v.9 no.9 / pp.3515-3538 / 2015
  • Data accuracy can be increased by detecting and removing the incorrect data generated in wireless sensor networks (WSNs). Increasing data accuracy also increases network lifetime. Network lifetime, or operational time, is the time during which a WSN is able to fulfill its tasks using microcontrollers with on-chip memory and radio transceivers; distributed sensor nodes send summaries of their data to their cluster heads, which gradually reduces energy consumption. This paper proposes an algorithm based on a proactive fuzzy system that combines fuzzy logic with comparative correlation techniques to ensure high data accuracy by detecting incorrect data in distributed wireless sensor networks. The proposed system is implemented in two phases: the first phase partitions the input space using robust fuzzy c-means clustering, and the second phase detects incorrect data and removes it completely. Experimental results show that the combined correlated fuzzy system (CCFS) detects faulty readings with greater accuracy (99.21%) than the existing approach (98.33%), along with a low false alarm rate.
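
The clustering step of the first phase can be illustrated with a minimal sketch. It uses the standard fuzzy c-means update rules on one-dimensional readings, not the robust variant or the correlation-based detection described in the abstract. The function names, the sample readings, the starting centers, and the fuzzifier `m = 2` are all assumptions made for illustration.

```python
def memberships_for(values, centers, m=2.0):
    """Return one membership row per reading; each row sums to 1."""
    exponent = 2.0 / (m - 1.0)
    rows = []
    for x in values:
        # Distance to every center; a tiny floor avoids division by zero.
        dists = [max(abs(x - c), 1e-12) for c in centers]
        rows.append([1.0 / sum((d_i / d_j) ** exponent for d_j in dists)
                     for d_i in dists])
    return rows


def fuzzy_c_means_1d(values, initial_centers, m=2.0, iterations=50):
    """Standard fuzzy c-means on 1-D data; returns (centers, memberships)."""
    centers = list(initial_centers)
    for _ in range(iterations):
        rows = memberships_for(values, centers, m)
        # Each center becomes the membership-weighted mean of the readings.
        for i in range(len(centers)):
            weights = [row[i] ** m for row in rows]
            centers[i] = sum(w * x for w, x in zip(weights, values)) / sum(weights)
    return centers, memberships_for(values, centers, m)


# Hypothetical temperature readings forming two groups.
readings = [20.1, 19.8, 20.3, 20.0, 35.0, 34.6]
centers, memberships = fuzzy_c_means_1d(readings, initial_centers=[15.0, 40.0])
```

Each reading receives a degree of membership in every cluster rather than a hard label, which is what a later detection phase could build on.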

Randomized Response Model with Discrete Quantitative Attribute by Three-Stage Cluster Sampling

  • Lee, Gi-Sung;Hong, Ki-Hak
    • Journal of the Korean Data and Information Science Society / v.14 no.4 / pp.1067-1082 / 2003
  • In this paper, we propose a randomized response model for a discrete quantitative attribute under three-stage cluster sampling. The model uses the Liu & Chow model (1976) to obtain discrete quantitative data when the population is made up of sensitive discrete quantitative clusters. We obtain the minimum variance by calculating the optimum numbers of first-, second-, and third-stage units (fsu, ssu, tsu) under a given constant cost. We also obtain the minimum cost under a given accuracy.

Semiparametric kernel logistic regression with longitudinal data

  • Shim, Joo-Yong;Seok, Kyung-Ha
    • Journal of the Korean Data and Information Science Society / v.23 no.2 / pp.385-392 / 2012
  • Logistic regression is a well-known binary classification method in the field of statistical learning. Mixed-effect regression models are widely used for the analysis of correlated data such as those found in longitudinal studies. We consider kernel extensions of logistic regression with semiparametric fixed effects and parametric random effects. Estimation is performed through the penalized likelihood method based on the kernel trick, and our focus is on efficient computation and effective hyperparameter selection. Cross-validation techniques are employed to select the optimal hyperparameters. Numerical results are then presented to indicate the performance of the proposed procedure.

Study on Decision-Making Factors of Big Data Application in Enterprises: Using Company S as an Example

  • Huang, Yun Kuei;Yang, Wen I.;Chan, Ching Sen
    • East Asian Journal of Business Economics (EAJBE) / v.4 no.1 / pp.5-15 / 2016
  • With the vigorous development of global online communities, smartphones, and mobile devices, enterprises can rapidly collect many kinds of data from their internal and external environments. Discovering valuable information in rapidly growing big data and transforming it into new business opportunities is an extremely important issue for enterprises today. This study takes Company S as its subject and identifies the factors in enterprise big data application using a modified Decision Making Trial and Evaluation Laboratory (DEMATEL) method and a perceived benefits-perceived barriers relation matrix. The results serve as a reference on big data application and management for managers and marketing personnel in other organizations and related industries.

Estimations of Parameters in Multi-component Series Systems Using Masked Data

  • Sarhan Ammar M.;Abouammoh A.M.;Al-Ameri Mansour
    • International Journal of Reliability and Applications / v.7 no.1 / pp.41-53 / 2006
  • In masked system lifetime data, the exact cause of the system's failure is often unknown. Such data contain two observable quantities: (i) the system's time to failure and (ii) a set of system components that contains the component which might have caused the system to fail. Our objective in this paper is to use the maximum likelihood procedure in the presence of masked data to make inferences about the reliability of the system's components. We assume a multi-component series system in which each component has a constant failure rate. Different cases that permit closed-form solutions for the point estimates are considered. The results obtained in this paper generalize other published results.
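
As an illustration of a closed-form case, the sketch below covers only the simplest setting and is not taken from the paper: a two-component series system with exponential lifetimes, a complete (uncensored) sample, and masking that does not depend on the true cause. Under those assumptions the log-likelihood is n1·log λ1 + n2·log λ2 + n12·log(λ1 + λ2) − (λ1 + λ2)·T, and setting its derivatives to zero gives the estimates computed here. The function name and the sample counts are hypothetical.

```python
def masked_exponential_mle(n1, n2, n12, total_time):
    """MLE of the constant failure rates of a two-component series system.

    n1, n2     -- failures whose cause is known to be component 1 or 2
    n12        -- failures whose cause is masked (could be either component)
    total_time -- sum of all observed system failure times
    """
    if n1 + n2 == 0:
        raise ValueError("at least one failure with a known cause is required")
    if total_time <= 0:
        raise ValueError("total_time must be positive")
    n = n1 + n2 + n12
    system_rate = n / total_time      # estimate of lambda1 + lambda2
    share1 = n1 / (n1 + n2)           # share attributed to component 1
    return share1 * system_rate, (1.0 - share1) * system_rate


# Hypothetical sample: 12 failures over 120 time units, 3 of them masked.
lambda1, lambda2 = masked_exponential_mle(n1=6, n2=3, n12=3, total_time=120.0)
```

The masked failures still contribute to the overall system rate, while the known-cause failures decide how that rate is split between the components.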

Location-Based Services for Dynamic Range Queries

  • Park Kwangjin;Song Moonbae;Hwang Chong-Sun
    • Journal of Communications and Networks / v.7 no.4 / pp.478-488 / 2005
  • Indexing techniques have been developed to conserve energy in wireless mobile environments. However, the use of interleaved index segments in a broadcast cycle increases the average access latency for clients. In this paper, we present the broadcast-based location-dependent data delivery scheme (BBS) for dynamic range queries. In the BBS, broadcast data objects are sorted sequentially by location, and the server broadcasts the location-dependent data along with an index segment. We then present a data prefetching and caching scheme designed to reduce the query response time. The performance of this scheme is investigated in relation to various environmental variables, such as the distribution of the data objects, the average speed of the clients, and the size of the service area.
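
Why sorting by location helps range queries can be shown with a deliberately simplified sketch. It assumes each object's location is a single coordinate and that the broadcast order is available as a sorted list, neither of which is stated in the abstract; the function name and sample data are hypothetical. In that setting, the objects that answer a range query occupy one contiguous run of broadcast slots.

```python
from bisect import bisect_left, bisect_right


def range_query_slots(sorted_locations, low, high):
    """Return (start, end) so that slots start..end-1 hold locations in [low, high]."""
    start = bisect_left(sorted_locations, low)
    end = bisect_right(sorted_locations, high)
    return start, end


# Hypothetical broadcast cycle: object locations in broadcast order.
cycle = [2, 5, 9, 14, 20, 27]
start, end = range_query_slots(cycle, low=6, high=20)
matches = cycle[start:end]
```

A client interested in the range only needs the slots between `start` and `end`, which suggests how a sorted broadcast can limit listening time.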

A Data Mining Procedure for Unbalanced Binary Classification

  • Jung, Han-Na;Lee, Jeong-Hwa;Jun, Chi-Hyuck
    • Journal of Korean Institute of Industrial Engineers / v.36 no.1 / pp.13-21 / 2010
  • Predicting customers' contract cancellations is essential for insurance companies, but it is a difficult problem because the customer database is large and the target (cancelled) customers make up only a small proportion of it. This paper proposes a new data mining approach to binary classification that handles large-scale unbalanced data. The approach incorporates over-sampling, clustering, regularized logistic regression, and boosting. It was applied to a real data set from the insurance domain, and the results were compared with those of other classification techniques.
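
Of the ingredients listed, over-sampling is the easiest to illustrate. The sketch below shows plain random over-sampling with replacement, which is one common variant and not necessarily the one used in the paper. The function name, the seed, and the toy data are assumptions.

```python
import random


def oversample_minority(rows, labels, minority_label, seed=0):
    """Duplicate random minority rows until both classes have equal counts."""
    rng = random.Random(seed)
    minority = [row for row, y in zip(rows, labels) if y == minority_label]
    if not minority:
        raise ValueError("no rows carry the minority label")
    majority_count = len(labels) - len(minority)
    shortfall = max(0, majority_count - len(minority))
    # Sample with replacement from the minority class to fill the shortfall.
    extra = [rng.choice(minority) for _ in range(shortfall)]
    return list(rows) + extra, list(labels) + [minority_label] * len(extra)


# Toy data: 8 retained customers (label 0) and 2 cancellations (label 1).
rows = [[i] for i in range(10)]
labels = [0] * 8 + [1] * 2
balanced_rows, balanced_labels = oversample_minority(rows, labels, minority_label=1)
```

Over-sampling should be applied to the training split only, so that duplicated rows do not leak into the evaluation data.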

Scalable Big Data Pipeline for Video Stream Analytics Over Commodity Hardware

  • Ayub, Umer;Ahsan, Syed M.;Qureshi, Shavez M.
    • KSII Transactions on Internet and Information Systems (TIIS) / v.16 no.4 / pp.1146-1165 / 2022
  • A huge amount of data in the form of videos and images is being produced owing to advancements in sensor technology. Using low-performance commodity hardware with resource-heavy image processing and analysis approaches to extract actionable insights from this data creates a bottleneck for timely decision making. Current GPU-assisted and cloud-based video analysis architectures give significant performance gains, but their use is constrained by cost and by extremely complex architecture-level details. In this paper we propose a data pipeline system that uses open-source tools such as Apache Spark, Kafka, and OpenCV running on commodity hardware for video stream processing and image processing in a distributed environment. Experimental results show that our proposed approach eliminates the need for GPU-based hardware and cloud computing infrastructure to achieve efficient video stream processing for face detection, with increased throughput, scalability, and better performance.