• Title/Summary/Keyword: Data Collecting

Search Results: 2,212

Development of urban river data management platform (I) (도시하천관리 연계 플랫폼 개발(I))

  • Lee, Sunghack;Shim, Kyucheoul;Koo, Bonhyun
    • Journal of Korea Water Resources Association / v.52 no.12 / pp.1087-1098 / 2019
  • In this study, we developed an integrated urban river data platform that collects, cleans, and provides data for urban river management. The platform collects data supplied by various institutions through their Open API services. The collected data are refined through pre-processing and loaded into a database, where they can be reviewed and analyzed with a visualization system and provided onward through an Open API, so that they can be combined and used as input data for urban river models. A real-time data delivery component was also developed so that real-time data can be applied to urban river models. With the platform, users can reduce the time and effort required for data collection, pre-processing, and input-data construction, improving efficiency and scalability in the development of urban river models and systems.
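
The collect, clean, and load steps that the abstract describes can be pictured with a minimal Python sketch. The endpoint URL, query parameters, response layout, and field names below are illustrative assumptions, not the platform's actual Open API.

```python
# Minimal sketch of a collect -> clean -> load pipeline for river observations.
# The endpoint URL, parameters, and field names are hypothetical.
import json
import sqlite3
import urllib.request

API_URL = "https://example.org/openapi/river-stage?station=1018680&format=json"  # hypothetical

def collect(url: str) -> list[dict]:
    """Fetch raw observations from an Open API endpoint."""
    with urllib.request.urlopen(url) as resp:
        return json.loads(resp.read())["items"]          # assumed response layout

def clean(rows: list[dict]) -> list[tuple]:
    """Drop records with missing values and coerce types (simple pre-processing)."""
    out = []
    for r in rows:
        if r.get("obs_time") and r.get("stage") not in (None, ""):
            out.append((r["obs_time"], float(r["stage"])))
    return out

def load(rows: list[tuple], db_path: str = "river.db") -> None:
    """Load cleaned records into a local database table."""
    con = sqlite3.connect(db_path)
    con.execute("CREATE TABLE IF NOT EXISTS stage (obs_time TEXT PRIMARY KEY, stage REAL)")
    con.executemany("INSERT OR REPLACE INTO stage VALUES (?, ?)", rows)
    con.commit()
    con.close()

if __name__ == "__main__":
    load(clean(collect(API_URL)))
```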

Correlation Between the “seeing FWHM” of Satellite Optical Observations and Meteorological Data at the OWL-Net Station, Mongolia

  • Bae, Young-Ho;Jo, Jung Hyun;Yim, Hong-Suh;Park, Young-Sik;Park, Sun-Youp;Moon, Hong Kyu;Choi, Young-Jun;Jang, Hyun-Jung;Roh, Dong-Goo;Choi, Jin;Park, Maru;Cho, Sungki;Kim, Myung-Jin;Choi, Eun-Jung;Park, Jang-Hyun
    • Journal of Astronomy and Space Sciences / v.33 no.2 / pp.137-146 / 2016
  • The correlation between meteorological data collected at the optical wide-field patrol network (OWL-Net) Station No. 1 and the seeing of satellite optical observation data was analyzed. Meteorological data and satellite optical observation data from June 2014 to November 2015 were analyzed. The analyzed meteorological data were the outdoor air temperature, relative humidity, wind speed, and cloud index data, and the analyzed satellite optical observation data were the seeing full-width at half-maximum (FWHM) data. The annual meteorological pattern for Mongolia was analyzed by collecting meteorological data over four seasons, with data collection beginning after the installation and initial set-up of the OWL-Net Station No. 1 in Mongolia. A comparison of the meteorological data and the seeing of the satellite optical observation data showed that the seeing degrades as the wind strength increases and as the cloud cover decreases. This finding is explained by the bias effect, which is caused by the fact that the number of images taken on the less cloudy days was relatively small. The seeing FWHM showed no clear correlation with either temperature or relative humidity.
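
The core of the analysis is a correlation between a meteorological series and the nightly seeing FWHM. A minimal sketch of such a Pearson correlation check is shown below; the sample values are invented, not OWL-Net Station No. 1 measurements.

```python
# Pearson correlation between a meteorological series (e.g., nightly mean wind speed)
# and the seeing FWHM of the corresponding observations. Values are hypothetical.
from math import sqrt

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

wind_speed_ms = [1.2, 3.5, 5.1, 2.0, 6.3, 4.4]        # hypothetical nightly means
seeing_fwhm_arcsec = [2.1, 2.8, 3.6, 2.3, 4.0, 3.2]   # hypothetical seeing values

print(f"r(wind, FWHM) = {pearson(wind_speed_ms, seeing_fwhm_arcsec):+.2f}")
```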

A Study on the Construction of RDM in an Organization Using Big Data and Block Chain (빅데이터와 블록체인을 활용한 조직내 RDM 구축방안)

  • Lee, Kyung-Hee;Choi, Youngjin;Cho, Wan-Sup
    • The Journal of Bigdata / v.4 no.2 / pp.127-139 / 2019
  • Research Data Management (RDM) is a system encompassing the people, policies, resources, and technologies that support and direct the production, collection, use, and preservation of research data. RDM covers a wide range of activities, including support for creating data management plans (DMPs), building data collections and repositories, and digital preservation and distribution. In advanced countries, RDM systems and related organizations are well established and functioning, but in Korea the management system remains insufficient owing to a low level of data awareness. In this paper, we propose a plan for establishing a research data management system suited to this reality. In particular, the rapidly growing construction of big data platforms for collecting and managing big data in individual fields and organizations must be reflected in RDM. We also discuss how blockchain technology can support data provision and researchers' data sovereignty, and propose a P2P-based decentralized RDM scheme.
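
One way to picture the blockchain part of the proposal is a hash-chained ledger in which each entry anchors a dataset hash and links to the previous entry, so tampering becomes detectable. The toy ledger below is a sketch of that general idea, not the paper's P2P design.

```python
# Toy hash-chained ledger for research-data records: each entry stores the dataset's
# hash and links to the previous entry, making the history tamper-evident.
import hashlib
import json
import time

def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

class Ledger:
    def __init__(self):
        self.entries = []

    def add_record(self, researcher: str, dataset: bytes) -> dict:
        prev = self.entries[-1]["entry_hash"] if self.entries else "0" * 64
        body = {"researcher": researcher,
                "dataset_hash": sha256(dataset),
                "timestamp": time.time(),
                "prev_hash": prev}
        body["entry_hash"] = sha256(json.dumps(body, sort_keys=True).encode())
        self.entries.append(body)
        return body

    def verify(self) -> bool:
        """Recompute every entry hash and link; any edit breaks the chain."""
        prev = "0" * 64
        for e in self.entries:
            body = {k: v for k, v in e.items() if k != "entry_hash"}
            if e["prev_hash"] != prev or sha256(json.dumps(body, sort_keys=True).encode()) != e["entry_hash"]:
                return False
            prev = e["entry_hash"]
        return True
```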

A Big Data Preprocessing using Statistical Text Mining (통계적 텍스트 마이닝을 이용한 빅 데이터 전처리)

  • Jun, Sunghae
    • Journal of the Korean Institute of Intelligent Systems / v.25 no.5 / pp.470-476 / 2015
  • Big data has been used in diverse areas. Computer science and sociology, for example, approach big data with different issues in mind, but both analyze big data and interpret the results, so meaningful analysis and interpretation of big data are needed in most areas. Statistics and machine learning provide various methods for big data analysis. In this paper, we study a process for big data analysis and propose an efficient methodology covering the entire process, from collecting big data to interpreting the analysis results. Because patent documents have the characteristics of big data, we also propose an approach for applying big data analysis to patent data and for using the results to build R&D strategy. To illustrate how the proposed methodology can be applied to a real problem, we perform a case study using applied and registered patent documents retrieved from patent databases worldwide.
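
A typical first pre-processing step in statistical text mining is to turn raw documents into a term-document matrix. The sketch below illustrates this on a tiny invented corpus; it is a generic illustration, not the paper's specific pipeline.

```python
# Build a small term-document matrix from raw patent-style abstracts.
# The corpus and stop-word list are illustrative only.
import re
from collections import Counter

docs = ["A battery management system for electric vehicles.",
        "Method for charging a battery using solar cells.",
        "Electric vehicle charging station with load control."]
stop_words = {"a", "for", "the", "with", "using"}

def tokenize(text: str) -> list:
    """Lowercase, keep alphabetic tokens, drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in stop_words]

vocab = sorted({t for d in docs for t in tokenize(d)})
term_doc_matrix = [[Counter(tokenize(d))[term] for d in docs] for term in vocab]

for term, row in zip(vocab, term_doc_matrix):
    print(f"{term:10s} {row}")
```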

An Approach to Survey Data with Nonresponse: Evaluation of KEPEC Data with BMI (무응답이 있는 설문조사연구의 접근법 : 한국노인약물역학코호트 자료의 평가)

  • Baek, Ji-Eun;Kang, Wee-Chang;Lee, Young-Jo;Park, Byung-Joo
    • Journal of Preventive Medicine and Public Health / v.35 no.2 / pp.136-140 / 2002
  • Objectives: A common problem in analyzing survey data is incomplete data, with either nonresponse or missing values. The mail questionnaire survey conducted in 1996 to collect lifestyle variables on members of the Korean Elderly Pharmacoepidemiologic Cohort (KEPEC) contains such nonresponse and missing data. A proper statistical method was applied to evaluate the missing-data pattern of a specific KEPEC dataset that had no missing values in the independent variables but missing values in the response variable, BMI. Methods: The number of study subjects was 8,689 elderly people. The BMI and the variables that significantly influenced it were first categorized. After fitting a log-linear model, the probabilities of the subjects in each category were estimated. The EM algorithm was implemented with the log-linear model to determine the missing-data mechanism causing the nonresponse. Results: Age, smoking status, and a preference for spicy hot food were chosen as variables that influenced the BMI. Fitting the nonignorable and ignorable nonresponse log-linear models with these variables gave a deviance difference between the two models of 0.0034 (df=1). Conclusion: There is considerable risk in making inferences about the variables from a large sample without considering the pattern of missing data. On the basis of these results, the missing data in the BMI are an ignorable nonresponse. Therefore, when analyzing the BMI in the KEPEC data, inferences can be made without considering the missing data.
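
Because the ignorable and nonignorable nonresponse log-linear models are nested, the reported deviance difference can be referred to a chi-square distribution with one degree of freedom. A small sketch of that comparison, using illustrative deviance values that differ by the reported 0.0034, is shown below.

```python
# Compare two nested nonresponse models via their deviance difference (df = 1).
# The individual deviance values are hypothetical; only their difference matches
# the 0.0034 reported in the abstract.
from math import erf, sqrt

def chi2_sf_df1(x: float) -> float:
    """Survival function of chi-square with 1 df: P(X > x) = 1 - erf(sqrt(x/2))."""
    return 1.0 - erf(sqrt(x / 2.0))

deviance_ignorable = 123.4567       # hypothetical fitted deviance
deviance_nonignorable = 123.4533    # hypothetical fitted deviance
diff = deviance_ignorable - deviance_nonignorable

p_value = chi2_sf_df1(diff)
print(f"deviance difference = {diff:.4f}, p = {p_value:.3f}")
# A p-value near 1 gives no evidence against the simpler ignorable-nonresponse model.
```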

A method of event data stream processing for ALE Middleware (ALE 미들웨어를 위한 이벤트 데이터 처리 방법)

  • Noh, Young-Sik;Lee, Dong-Cheol;Byun, Yung-Cheol
    • Journal of the Korea Institute of Information and Communication Engineering / v.12 no.9 / pp.1554-1563 / 2008
  • As interest in RFID technologies increases, research on RFID middleware systems that handle the data acquired by RFID readers is actively underway. Although various RFID middleware methodologies and related techniques have been proposed, the common data type handled in those systems is mainly the EPC code. There has also been little research on implementations that continuously collect the stream data queued from RFID readers without blocking, classify the data into groups according to usage, and send the resulting data to specific applications. In this paper, we propose a data-handling method for RFID middleware that efficiently processes an EPC event stream through detailed filtering, detection of data modification, creation of data sets for transfer, data grouping, and various transformations of RFID data formats. Our method is based on the de facto international standard interface defined in the ALE middleware specification by EPCglobal, and application and service users can directly set various conditions for handling the stream data.
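
The filtering and grouping steps described above can be pictured with a small sketch that drains queued EPC reads without blocking, applies a filter pattern, and groups the survivors by company prefix. The EPC strings and grouping rule are illustrative, not the ALE specification itself.

```python
# Drain queued EPC read events without blocking, filter them, and group by company prefix.
import queue
from collections import defaultdict

event_queue = queue.Queue()
for epc in ["urn:epc:id:sgtin:0614141.107346.2017",
            "urn:epc:id:sgtin:0614141.107346.2018",
            "urn:epc:id:sgtin:9999999.000001.0001"]:
    event_queue.put(epc)

def drain(q: queue.Queue) -> list:
    """Collect queued reads without blocking on an empty queue."""
    events = []
    while True:
        try:
            events.append(q.get_nowait())
        except queue.Empty:
            return events

def group_by_company(epcs: list) -> dict:
    """Filter SGTIN EPCs and group them by company prefix (the 'grouping' step)."""
    groups = defaultdict(list)
    for epc in epcs:
        if epc.startswith("urn:epc:id:sgtin:"):          # simple filter pattern
            groups[epc.split(":")[4].split(".")[0]].append(epc)
    return dict(groups)

print(group_by_company(drain(event_queue)))
```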

Outlier Detection of Real-Time Reservoir Water Level Data Using Threshold Model and Artificial Neural Network Model (임계치 모형과 인공신경망 모형을 이용한 실시간 저수지 수위자료의 이상치 탐지)

  • Kim, Maga;Choi, Jin-Yong;Bang, Jehong;Lee, Jaeju
    • Journal of The Korean Society of Agricultural Engineers / v.61 no.1 / pp.107-120 / 2019
  • Reservoir water level data indicate the current storage of a reservoir and are used as primary data for the management and study of agricultural water. For reservoir storage management, the Korea Rural Community Corporation (KRC) installed water level stations at around 1,600 agricultural reservoirs and has been collecting water level data every 10 minutes. However, various outliers caused by noise and measurement errors frequently appear for environmental and physical reasons, so outliers must be detected and the data quality improved before the water level data can be put to use. This study detected and classified outliers and normal data using two different models, a threshold model and an artificial neural network (ANN) model, and compared the results to evaluate their performance. The threshold model identifies outliers by setting upper and lower bounds on the water level and its variation and by setting a bandwidth of water level data as a threshold for regarding a value as erroneous. The ANN model was trained on a prepared dataset labeled as normal data (T) and outliers (F) and then applied to identify outliers. The models were evaluated against reference data, daily reservoir water level records collected by KRC. The outlier detection performance of the threshold model was better than that of the ANN model, but the ANN model was better at not classifying normal data as outliers.
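
A minimal sketch of the threshold model's rule set follows: a reading is flagged when it leaves a plausible range or when the change from the previous reading exceeds a variation bound. The bounds are illustrative assumptions, not KRC's operational settings.

```python
# Threshold-style outlier flagging for a 10-minute water level series.
# Bounds and the example series are hypothetical.
def flag_outliers(levels, lower=0.0, upper=15.0, max_step=0.5):
    """Return (index, level, reason) tuples for suspect readings."""
    flags = []
    prev = None
    for i, lv in enumerate(levels):
        if lv is None or not (lower <= lv <= upper):
            flags.append((i, lv, "out of bounds"))
        elif prev is not None and abs(lv - prev) > max_step:
            flags.append((i, lv, "excessive variation"))
        if lv is not None:
            prev = lv
        # Note: a flagged value still updates `prev` here; an operational rule set
        # would decide whether to carry the last trusted value forward instead.
    return flags

water_level_m = [3.2, 3.2, 3.3, 9.8, 3.3, None, 3.4]   # hypothetical 10-minute readings
print(flag_outliers(water_level_m))
```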

ECU Data Integrity Verification System Using Blockchain (블록체인을 활용한 ECU 데이터 무결성 검증 시스템)

  • Sang-Pil, Byeon;Ho-Yoon, Kim;Seung-Soo, Shin
    • Journal of Industrial Convergence / v.20 no.11 / pp.57-63 / 2022
  • If the data of the ECU, which is responsible for collecting and processing data such as the sensor readings and signals of a vehicle, is manipulated by an attack, the driver can be harmed. In this paper, we propose a system that verifies the integrity of automotive ECU data using a blockchain. Because the vehicle and the server encrypt the data with a session key for transmission and reception, reliability is ensured in the communication process. The server verifies the integrity of the transmitted data using a hash function and, if the data are sound, stores them in the blockchain and in off-chain distributed storage: the hash value of the ECU data is stored on the blockchain, where it cannot be tampered with, while the original ECU data are kept in the distributed storage. With this verification system, users can detect attacks on and tampering with ECU data, and can perform integrity verification whenever the data have been accessed or altered by malicious users. The system can be applied according to users' needs in situations such as insurance, car repair, and vehicle trading and sales. Future research should establish an efficient system for real-time data integrity verification.
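
The verification idea reduces to comparing a hash recorded on the chain with a hash recomputed from the off-chain copy. The sketch below illustrates that check with simplified stand-ins for the chain and the distributed storage; it omits the session-key encryption.

```python
# Record an ECU payload off-chain and its SHA-256 digest "on-chain", then verify
# that the stored copy still matches the recorded digest. Structures are simplified.
import hashlib

def record(ecu_payload: bytes, chain: list, storage: dict) -> str:
    """Store the raw ECU data off-chain and its digest on the chain."""
    digest = hashlib.sha256(ecu_payload).hexdigest()
    storage[digest] = ecu_payload        # off-chain distributed storage (simplified)
    chain.append(digest)                 # immutable hash record (simplified)
    return digest

def verify(digest: str, chain: list, storage: dict) -> bool:
    """Integrity holds only if the stored payload still hashes to the on-chain value."""
    payload = storage.get(digest)
    return digest in chain and payload is not None \
        and hashlib.sha256(payload).hexdigest() == digest

chain, storage = [], {}
d = record(b"speed=72;rpm=2100;brake=0", chain, storage)
storage[d] = b"speed=120;rpm=2100;brake=0"     # simulate tampering with the stored copy
print(verify(d, chain, storage))               # -> False
```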

Web-based Data Integration System Implementation for Reliability Improvement of a Product (제품의 신뢰성 향상를 위한 웹 기반 데이터 통합 시스템 구축에 관한 연구)

  • Kyung, Tae-Won;Kim, Sang-Kuk
    • Information Systems Review / v.7 no.2 / pp.117-128 / 2005
  • This study proposes an integrated monitoring system for improving data reliability in the steel manufacturing industry. The data obtained from the existing steel manufacturing process are not micro data gathered at the point of occurrence, but average values (macro data) gathered between the points of occurrence and completion. Such macro data make detailed analysis of the factors causing an error difficult and can seriously affect product quality even when the error is within tolerance. Moreover, the steel production process produces thousands of data points per second, which requires a database design capable of managing a large volume of data. The proposed system therefore collects and analyzes all of the data generated during production, and its database design for large-capacity data raises the efficiency of the database server. By applying web-based technology, data can be queried and analyzed without limits of time or place from any PC connected to the intranet. As a result, the system supports effective quality improvement of manufactured products and raises product reliability, and the data accumulated over long periods serve as fundamental material for new control models, operation technology, and new product development.
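
The micro-versus-macro point can be illustrated in a few lines: an interval average can stay within tolerance even when individual readings briefly leave it, which is why point-of-occurrence data are collected. The values below are made up.

```python
# Show how an interval average (macro data) can hide readings that leave tolerance.
readings_c = [1498, 1501, 1499, 1523, 1500, 1497]   # per-second temperatures, hypothetical
tolerance = (1490, 1510)

average = sum(readings_c) / len(readings_c)
excursions = [r for r in readings_c if not (tolerance[0] <= r <= tolerance[1])]

print(f"interval average = {average:.1f} (within tolerance)")
print(f"readings outside tolerance hidden by averaging: {excursions}")
```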

Design of Distributed Hadoop Full Stack Platform for Big Data Collection and Processing (빅데이터 수집 처리를 위한 분산 하둡 풀스택 플랫폼의 설계)

  • Lee, Myeong-Ho
    • Journal of the Korea Convergence Society / v.12 no.7 / pp.45-51 / 2021
  • With the rapid shift to non-face-to-face environments and mobile-first strategies, the explosive yearly growth of structured and unstructured data demands new decision making and services based on big data in all fields. However, there have been few reference cases in which the Hadoop ecosystem is used to collect and load this rapidly growing big data into a standard platform applicable in practical environments and then to store and process the prepared big data in a relational database. In this study, therefore, unstructured data retrieved by keyword from social network services were collected on Hadoop 2.0 through three virtual machine servers in the Spring Framework environment and loaded into the Hadoop Distributed File System and HBase; the system was then designed and implemented to store standardized big data in a relational database by applying a morpheme analyzer to the loaded unstructured data. Future research should continue on clustering, classification, and analysis with machine learning using Hive or Mahout for deeper data analysis.
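
The tail end of the described pipeline, keyword-filtered collection, tokenization, and storage in a relational table, can be sketched as below. Whitespace splitting stands in for a Korean morpheme analyzer, and SQLite stands in for the relational database; the HDFS/HBase staging is omitted.

```python
# Keyword-filter collected posts, tokenize them, and store tokens in a relational table.
# Whitespace tokenization is a stand-in for morpheme analysis; SQLite is a stand-in
# for the relational database in the described design.
import sqlite3

posts = ["heavy rain warning issued for the river basin",
         "weekend concert tickets on sale",
         "river level rising after heavy rain"]
keyword = "river"

rows = []
for post_id, text in enumerate(posts):
    if keyword in text:                        # keyword-based collection filter
        for token in text.split():             # stand-in for morpheme analysis
            rows.append((post_id, token))

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tokens (post_id INTEGER, token TEXT)")
con.executemany("INSERT INTO tokens VALUES (?, ?)", rows)
print(con.execute(
    "SELECT token, COUNT(*) FROM tokens GROUP BY token ORDER BY COUNT(*) DESC LIMIT 3"
).fetchall())
```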