• Title/Summary/Keyword: Data-science


An Unified Spatial Index and Visualization Method for the Trajectory and Grid Queries in Internet of Things

  • Han, Jinju; Na, Chul-Won; Lee, Dahee; Lee, Do-Hoon; On, Byung-Won; Lee, Ryong; Park, Min-Woo; Lee, Sang-Hwan
    • Journal of the Korea Society of Computer and Information / v.24 no.9 / pp.83-95 / 2019
  • Recently, a wide variety of IoT data has been collected by attaching geosensors to vehicles on the road. IoT data carries time and location information together with measurements such as temperature, humidity, fine dust, and CO2. Although a given sensor reading can be retrieved using time, latitude, and longitude, which are the keys to the IoT data, advanced search engines that handle high-level user queries over IoT data are still limited. Searching large amounts of IoT data without an index is also a problem, because sequential scans waste a great deal of time. In this paper, we propose a unified spatial index model that handles both grid and trajectory queries using a cell-based space-filling curve method. We also present a visualization method that helps users grasp the results intuitively. The trajectory query aggregates the traffic of the trajectory cells passed by taxis on the road searched by the user. The grid query finds the cells on the road searched by the user and aggregates their fine dust values. Based on the generated spatial index, we propose a Web-based prototype system whose user interface quickly summarizes the trajectory and grid queries for a specific road or for all roads, and whose results can be analyzed intuitively through road and heat map visualization.
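A cell-based space-filling curve index can be illustrated with a minimal sketch. The abstract does not say which curve the authors use, so the Z-order (Morton) curve below is an assumption, and the names `interleave_bits`, `cell_key`, and `aggregate_by_cell` are hypothetical. The sketch maps each latitude/longitude reading to a one-dimensional cell key and averages a sensor value per cell, roughly in the spirit of the grid query described above.

```python
from collections import defaultdict

def interleave_bits(x: int, y: int, bits: int = 16) -> int:
    """Interleave the low `bits` bits of x and y into one Morton (Z-order) key."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)       # x bits go to even positions
        key |= ((y >> i) & 1) << (2 * i + 1)   # y bits go to odd positions
    return key

def cell_key(lat: float, lon: float, bits: int = 16) -> int:
    """Map a latitude/longitude pair to the Morton key of its grid cell."""
    n = 1 << bits                               # cells per axis (assumed global grid)
    col = min(int((lon + 180.0) / 360.0 * n), n - 1)
    row = min(int((lat + 90.0) / 180.0 * n), n - 1)
    return interleave_bits(col, row, bits)

def aggregate_by_cell(readings, bits: int = 16):
    """Average a sensor value per cell; `readings` holds (lat, lon, value) tuples."""
    sums = defaultdict(lambda: [0.0, 0])
    for lat, lon, value in readings:
        acc = sums[cell_key(lat, lon, bits)]
        acc[0] += value
        acc[1] += 1
    return {key: total / count for key, (total, count) in sums.items()}
```

Because nearby cells tend to receive nearby keys, a sorted index on the key lets a query read a small number of key ranges instead of scanning every record sequentially.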

Development of HDF Browser for the Utilization of EOC Imagery

  • Seo, Hee-Kyung; Ahn, Seok-Beom; Park, Eun-Chul; Hahn, Kwang-Soo; Choi, Joon-Soo; Kim, Choen
    • Korean Journal of Remote Sensing / v.18 no.1 / pp.61-69 / 2002
  • The purpose of the Electro-Optical Camera (EOC), the primary payload of KOMPSAT-1, is to collect high-resolution visible imagery of the Earth, including the Korean Peninsula. EOC images will be distributed to the public and to many user groups, including government, public corporations, and academic or research institutes. KARI will offer an online service to users through the Internet. Some applications, e.g., generation of a Digital Elevation Model (DEM), need secondary data such as satellite ephemeris and attitude data to process the EOC imagery. EOC imagery with this ancillary information will be distributed as a file in the Hierarchical Data Format (HDF). HDF is a physical file format that allows storage of many different types of scientific data, including images, multidimensional data arrays, record-oriented data, and point data. Because of the lack of public-domain software supporting the HDF file format, many public users may not be able to access EOC data without difficulty. The purpose of this research is to develop a browsing system for EOC data for general users, not only for the scientists who are the main users of HDF. The system is PC-based and has a user-friendly interface.

Data-Adaptive ECOC for Multicategory Classification

  • Seok, Kyung-Ha
    • Journal of the Korean Data and Information Science Society / v.19 no.1 / pp.25-36 / 2008
  • Error Correcting Output Codes (ECOC) can improve generalization performance when applied to multicategory classification problems. In this study, we propose a new criterion for selecting the hyperparameters of the ECOC scheme. Instead of the margins of the data, we propose to use the probability of misclassification error, since it makes the criterion simple. Using this, we obtain an upper bound on the leave-one-out error of the OVA (one vs. all) method. Our experiments on real and synthetic data indicate that the bound leads to good estimates of the parameters.

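As background for the ECOC and OVA terms in the abstract above, here is a minimal sketch of generic ECOC decoding. It is not the paper's hyperparameter-selection criterion. It assumes each class has a codeword of +1/-1 entries, one per binary classifier, and that prediction picks the class whose codeword is nearest in Hamming distance to the signs of the classifier outputs. The function names are hypothetical.

```python
def ova_code_matrix(n_classes: int):
    """One-vs-all code matrix: row k is the codeword of class k (+1 on the diagonal, -1 elsewhere)."""
    return [[1 if i == j else -1 for j in range(n_classes)] for i in range(n_classes)]

def hamming_distance(a, b) -> int:
    """Number of positions at which two equal-length codewords differ."""
    return sum(1 for x, y in zip(a, b) if x != y)

def ecoc_decode(binary_outputs, code_matrix) -> int:
    """Return the index of the class whose codeword is closest to the classifier outputs.

    `binary_outputs` holds one real-valued score per binary classifier; only the
    sign is used. Ties are broken in favor of the lowest class index.
    """
    signs = [1 if s >= 0 else -1 for s in binary_outputs]
    distances = [hamming_distance(signs, codeword) for codeword in code_matrix]
    return distances.index(min(distances))
```

In the OVA matrix any two codewords differ in exactly two positions, so a single wrong binary output can already cause a tie. Longer codewords whose rows differ in more positions let the same decoder tolerate some wrong outputs.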

Semi-supervised regression based on support vector machine

  • Seok, Kyungha
    • Journal of the Korean Data and Information Science Society / v.25 no.2 / pp.447-454 / 2014
  • In many practical machine learning and data mining applications, unlabeled training examples are readily available, but labeled ones are fairly expensive to obtain. Therefore, semi-supervised learning algorithms have attracted much attention. However, previous research mainly focuses on classification problems. In this paper, a semi-supervised regression method based on the support vector regression (SVR) formulation is proposed. The estimator is easily obtained via the dual formulation of the optimization problem. The experimental results with simulated and real data suggest superior performance of the proposed method compared with standard SVR.

Mixed-effects LS-SVR for longitudinal data

  • Cho, Dae-Hyeon
    • Journal of the Korean Data and Information Science Society / v.21 no.2 / pp.363-369 / 2010
  • In this paper, we propose a mixed-effects least squares support vector regression (LS-SVR) for longitudinal data. We add a random-effect term to the optimization function of LS-SVR so that random effects are taken into account when analyzing longitudinal data. We also present a model selection method that employs the generalized cross validation function to choose the hyper-parameters that affect the performance of the mixed-effects LS-SVR. A simulated example is provided to indicate the usefulness of the mixed-effects method for analyzing longitudinal data.

Increasing Splicing Site Prediction by Training Gene Set Based on Species

  • Ahn, Beunguk; Abbas, Elbashir; Park, Jin-Ah; Choi, Ho-Jin
    • KSII Transactions on Internet and Information Systems (TIIS) / v.6 no.11 / pp.2784-2799 / 2012
  • Biological data have increased exponentially in recent years, and analyzing these data using data mining tools has become one of the major issues in the bioinformatics research community. This paper focuses on the protein construction process in higher organisms, in which the deoxyribonucleic acid (DNA) sequence is filtered. In this process, "unmeaningful" DNA sub-sequences (called introns) are removed, and their meaningful counterparts (called exons) are retained. Accurate recognition of the boundaries between these two classes of sub-sequences, however, is known to be a difficult problem. Conventional approaches to recognizing these boundaries have sought solely to enhance machine learning techniques, while the inherent nature of the data themselves has been overlooked. In this paper, we present an approach that makes use of the data attributes inherent to species in order to increase the accuracy of boundary recognition. For experimentation, we took the data sets for four different species from the University of California Santa Cruz (UCSC) data repository, divided the data sets by species, and then trained neural network (NN)-based and support vector machine (SVM)-based classifiers on a preprocessed version of the data sets. As a result, we observed that each species has its own specific features related to the splice sites, which implies that there are related distances among species. To conclude, dividing the training data set by species would increase the accuracy of predicting splicing junctions and offer new insight to biological research.
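The idea of dividing the training data by species before fitting a classifier can be sketched as follows. The abstract does not describe the preprocessing, so the one-hot encoding of DNA windows is an assumption, and `encode_window` and `split_by_species` are hypothetical names. Each resulting per-species list could then be passed to any NN or SVM trainer.

```python
from collections import defaultdict

# Assumed encoding: one 4-dimensional indicator vector per base.
ONE_HOT = {"A": (1, 0, 0, 0), "C": (0, 1, 0, 0), "G": (0, 0, 1, 0), "T": (0, 0, 0, 1)}

def encode_window(seq: str):
    """One-hot encode a DNA window; unknown bases (e.g. N) become all zeros."""
    vector = []
    for base in seq.upper():
        vector.extend(ONE_HOT.get(base, (0, 0, 0, 0)))
    return vector

def split_by_species(examples):
    """Group (species, window, label) examples into one training set per species."""
    per_species = defaultdict(list)
    for species, window, label in examples:
        per_species[species].append((encode_window(window), label))
    return dict(per_species)
```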

A Case Study of the Curriculum of Data Science for Elementary School Teachers (초등교사 대상의 기초 데이터 과학 교육의 사례 연구)

  • Jo, Junghee
    • Journal of The Korean Association of Information Education / v.25 no.6 / pp.899-906 / 2021
  • Data science is a discipline comprised of the academic fields of statistics, computer science, information technology, and domain knowledge. It analyzes data and derives meaningful results using complex technologies. Data science, along with artificial intelligence, is a core technology of the 4th industrial revolution; consequently, universities and companies worldwide are actively creating programs to train data scientists, who require high levels of expertise. In line with this undertaking, the field of elementary education has recognized the importance of data science education, and various studies have been conducted to develop curricula designed to help students understand how to use data. This paper proposes a curriculum for teaching data science to elementary school teachers, most of whom are non-majors in the computer field. A satisfaction analysis was conducted based on questionnaires collected from students to analyze the effectiveness of the data science education proposed in this paper.

Automatic Detection of Cow's Oestrus in Audio Surveillance System

  • Chung, Y.; Lee, J.; Oh, S.; Park, D.; Chang, H.H.; Kim, S.
    • Asian-Australasian Journal of Animal Sciences / v.26 no.7 / pp.1030-1037 / 2013
  • Early detection of anomalies is an important issue in the management of group-housed livestock. In particular, failure to detect oestrus in a timely and accurate way can become a limiting factor in achieving efficient reproductive performance. Although a rich variety of methods has been introduced for the detection of oestrus, a more accurate and practical method is still required. In this paper, we propose an efficient data mining solution for the detection of oestrus, using the sound data of Korean native cows (Bos taurus coreanea). In this method, we extract the mel frequency cepstrum coefficients from the sound data with a feature dimension reduction, and use the support vector data description as an early anomaly detector. Our experimental results show that this method can be used to detect oestrus both economically (even with a cheap microphone) and accurately (over 94% accuracy), either as a standalone solution or to complement known methods.
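Mel frequency cepstrum coefficients start from a filter bank spaced evenly on the mel scale. The sketch below shows only that first step, using the commonly cited conversion mel = 2595 · log10(1 + f / 700). The abstract does not state which mel formula, filter count, or frequency range the authors used, so those choices are assumptions, as are the function names.

```python
import math

def hz_to_mel(f_hz: float) -> float:
    """Convert a frequency in Hz to mels (commonly used 2595 * log10 form)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_center_frequencies(n_filters: int, f_min: float, f_max: float):
    """Center frequencies (Hz) of n_filters filters evenly spaced on the mel scale."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_filters + 1)          # leave one step of margin at each edge
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_filters)]
```

The centers come out densely packed at low frequencies and sparse at high ones, which is why the resulting features emphasize the lower part of the spectrum.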

Fishery R&D Big Data Platform and Metadata Management Strategy (수산과학 빅데이터 플랫폼 구축과 메타 데이터 관리방안)

  • Kim, Jae-Sung; Choi, Youngjin; Han, Myeong-Soo; Hwang, Jae-Dong; Cho, Wan-Sup
    • The Journal of Bigdata / v.4 no.2 / pp.93-103 / 2019
  • In this paper, we introduce a big data platform and a metadata management technique for fishery science R&D information. The big data platform collects and integrates various types of fisheries science R&D information, and we suggest how to build it in the form of a data lake. In addition to the existing data collected and accumulated in the field of fisheries science, we also propose building a big data platform that supports diverse analysis by collecting unstructured big data such as satellite image data, research reports, and research data. Next, by collecting and managing metadata during data extraction, preprocessing, and storage, systematic management of fisheries science big data becomes possible. By establishing metadata in a standard form along with the construction of the big data platform, this work is meaningful in that it suggests a systematic and continuous big data management method throughout the data lifecycle, including data collection, storage, utilization, and distribution.


Development of Terra MODIS data pre-processing system on WWW

  • Takeuchi, W.; Nemoto, T.; Baruah, P.J.; Ochi, S.; Yasuoka, Y.
    • Proceedings of the KSRS Conference / 2002.10a / pp.569-572 / 2002
  • Terra MODIS is one of the few space-borne sensors currently capable of acquiring radiometric data over a range of view angles. The Institute of Industrial Science, University of Tokyo, has been receiving Terra MODIS data at Tokyo since May 2001, and the Asian Institute of Technology at Bangkok since May 2001. Together they can cover the whole of East Asia and are expected to regularly monitor environmental changes such as deforestation, forest fires, floods, and typhoons. Over eight hundred scenes have been archived in the storage system, and they occupy 2 TB of disk space so far. In this study, a MODIS data processing system on the WWW is developed with the following functions: spectral subset (250 m, 500 m, 1000 m channels), radiometric correction to radiance, spatial subset of geocoded data as a rectangular area with a latitude-longitude grid system in HDF format, and generation of a quick-look file in JPEG format. Users are notified via e-mail as soon as all the processes have finished. This system enables us to process MODIS data on the WWW with a few input parameters and download the processed data by FTP access. An easy-to-use interface is expected to promote the use of MODIS data. This system is available via the Internet at the following URL from September 1, 2002: "http://webmodis.iis.u-tokyo.ac.jp/".

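The spatial-subset function mentioned in the abstract above, a rectangular area on a latitude-longitude grid, can be sketched as an index computation. The grid layout here is an assumption: a regular, north-up grid with square cells of `cell_deg` degrees whose top-left corner is at (`grid_north`, `grid_west`). The function names are hypothetical, and this is not the system's actual implementation.

```python
import math

def subset_bounds(north, south, west, east, grid_north, grid_west, cell_deg, n_rows, n_cols):
    """Half-open row/column bounds of a lat-lon box on a regular, north-up grid."""
    # Rows count downward from the northern edge, columns eastward from the western edge.
    row_start = max(0, math.floor((grid_north - north) / cell_deg))
    row_stop = min(n_rows, math.ceil((grid_north - south) / cell_deg))
    col_start = max(0, math.floor((west - grid_west) / cell_deg))
    col_stop = min(n_cols, math.ceil((east - grid_west) / cell_deg))
    return row_start, row_stop, col_start, col_stop

def spatial_subset(grid, bounds):
    """Cut the rectangular window out of a 2-D grid stored as a list of rows."""
    r0, r1, c0, c1 = bounds
    return [row[c0:c1] for row in grid[r0:r1]]
```

Requests that extend past the grid are clamped to its edges, so a box larger than the scene simply returns the whole scene.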