• Title/Summary/Keyword: Large Scale Data


Comparative Study of Dimension Reduction Methods for Highly Imbalanced Overlapping Churn Data

  • Lee, Sujee;Koo, Bonhyo;Jung, Kyu-Hwan
    • Industrial Engineering and Management Systems
    • /
    • v.13 no.4
    • /
    • pp.454-462
    • /
    • 2014
  • Retention of customers who are likely to churn is one of the most important issues in customer relationship management, so companies try to predict churning customers using their large-scale, high-dimensional data. This study focuses on handling such large data sets by reducing their dimensionality. Our experiments apply six different dimension reduction methods: Principal Component Analysis (PCA), factor analysis (FA), locally linear embedding (LLE), local tangent space alignment (LTSA), locality preserving projections (LPP), and a deep auto-encoder. Each method is applied to the training data, a classification model is built on the mapped data, and performance is measured by hit rate to compare the methods. In the results, PCA performs well despite its simplicity, and the deep auto-encoder gives the best overall performance. These results can be explained by the characteristics of the churn prediction data, which is highly correlated and overlapping across classes. We also propose a simple out-of-sample extension method for the nonlinear dimension reduction methods LLE and LTSA, utilizing the characteristics of the data.
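The pipeline described above (map the training data with a reduction method, then classify in the reduced space) can be sketched for the PCA step. This is a generic NumPy illustration, not the authors' code; the function name and random data are made up.

```python
import numpy as np

def pca_project(X, n_components=2):
    """Project rows of X onto the top principal components via SVD."""
    Xc = X - X.mean(axis=0)            # center each feature
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T    # mapped (reduced) data

# Stand-in for the high-dimensional churn training data.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
Z = pca_project(X, n_components=3)
print(Z.shape)  # (100, 3)
```

A classifier would then be trained on `Z` instead of `X`; the components are ordered by explained variance, which is why a few of them often suffice for highly correlated data.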

An Efficient Data Transmission to Cloud Storage using USB Hijacking (USB 하이재킹을 이용한 클라우드 스토리지로의 효율적인 데이터 전송 기법)

  • Eom, Hyun-Chul;No, Jae-Chun
    • Journal of the Institute of Electronics Engineers of Korea CI
    • /
    • v.48 no.6
    • /
    • pp.47-55
    • /
    • 2011
  • The performance of data transmission from mobile devices to cloud storage is limited by the amount of data being transferred, the communication speed, and the battery consumption of the mobile devices. In particular, when large-scale data communication takes place on mobile devices such as smartphones, performance fluctuation and power consumption become obstacles to establishing a reliable communication environment. In this paper, we present an efficient data transmission method using USB hijacking, in which the synchronization needed to transfer a large amount of data between mobile devices and the user's PC is executed over USB hijacking. With this approach, there is no need to worry about data capacity or battery consumption during the data communication. We present several experimental results to verify the effectiveness and suitability of our approach.

Data anomaly detection for structural health monitoring using a combination network of GANomaly and CNN

  • Liu, Gaoyang;Niu, Yanbo;Zhao, Weijian;Duan, Yuanfeng;Shu, Jiangpeng
    • Smart Structures and Systems
    • /
    • v.29 no.1
    • /
    • pp.53-62
    • /
    • 2022
  • The deployment of advanced structural health monitoring (SHM) systems in large-scale civil structures collects large amounts of data. These data may contain multiple types of anomalies (e.g., missing, minor, outlier, etc.) caused by harsh environments, sensor faults, transfer omission, and other factors. Such anomalies seriously affect the evaluation of structural performance, so the effective analysis and mining of SHM data is an extremely important task. Inspired by the deep learning paradigm, this study develops a novel generative adversarial network (GAN) and convolutional neural network (CNN)-based data anomaly detection approach for SHM. The framework of the proposed approach includes three modules: (a) a three-channel input is established based on the fast Fourier transform (FFT) and Gramian angular field (GAF) methods; (b) a GANomaly network is introduced and trained to extract features from normal samples alone, addressing the class-imbalance problem; (c) based on the output of GANomaly, a CNN is employed to distinguish the types of anomalies. In addition, a dataset-oriented method (i.e., multistage sampling) is adopted to obtain the optimal sampling ratios between the different sample types. The proposed approach is tested with acceleration data from the SHM system of a long-span bridge. The results show that the proposed approach achieves higher accuracy in detecting the multi-pattern anomalies of SHM data.
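The first module above builds image-like channels from raw acceleration signals using the FFT and the Gramian angular field (GAF). A minimal NumPy sketch of these two transforms follows; the signal and window size are illustrative, not from the paper's bridge dataset.

```python
import numpy as np

def gramian_angular_field(x):
    """Gramian angular (summation) field of a 1-D signal:
    rescale to [-1, 1], map samples to angles, then G[i, j] = cos(phi_i + phi_j)."""
    x = np.asarray(x, dtype=float)
    x = 2 * (x - x.min()) / (x.max() - x.min()) - 1   # rescale to [-1, 1]
    phi = np.arccos(np.clip(x, -1, 1))
    return np.cos(phi[:, None] + phi[None, :])

# Stand-in acceleration window.
sig = np.sin(np.linspace(0, 4 * np.pi, 64))
spectrum = np.abs(np.fft.rfft(sig))   # FFT-magnitude channel
gaf = gramian_angular_field(sig)      # GAF image channel
print(gaf.shape)  # (64, 64)
```

In the paper's setting, such channels are stacked into a multi-channel image per sensor window and fed to the GANomaly/CNN stages.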

Building Large-scale CityGML Feature for Digital 3D Infrastructure (디지털 3D 인프라 구축을 위한 대규모 CityGML 객체 생성 방법)

  • Jang, Hanme;Kim, HyunJun;Kang, HyeYoung
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography
    • /
    • v.39 no.3
    • /
    • pp.187-201
    • /
    • 2021
  • Recently, the demand for a 3D urban spatial information infrastructure for storing, operating, and analyzing the large volumes of digital data produced in cities is increasing. CityGML is a 3D spatial information data standard of the OGC (Open Geospatial Consortium) with strengths in the exchange and attribute expression of city data. Cases of constructing 3D urban spatial data in CityGML format have emerged in several cities such as Singapore and New York. However, the current ecosystem for creating and editing CityGML data is limited for large-scale construction because it lacks the completeness of commercial 3D modeling programs such as SketchUp or 3ds Max. Therefore, this study proposes a method of constructing CityGML data from commercial 3D mesh data and 2D polygons that are rapidly and automatically produced through aerial LiDAR (Light Detection and Ranging) or RGB (Red Green Blue) cameras. During the data construction process, the original 3D mesh data were geometrically transformed so that each object could be expressed at various CityGML LoDs (Levels of Detail), and attribute information extracted from the 2D spatial data was added to increase its utility as spatial information. The 3D city features produced in this study are the CityGML building, bridge, cityFurniture, road, and tunnel features. Data conversion and attribute construction methods are presented for each feature, and visualization and validation are conducted.
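As a rough illustration of what generating a CityGML feature programmatically involves, the following sketch emits a minimal CityGML 2.0 Building element with a measuredHeight attribute using Python's standard library. It is a simplified stand-in for the authors' conversion pipeline; the id and height values are invented, and a real file would carry geometry and full LoD structure.

```python
import xml.etree.ElementTree as ET

# CityGML 2.0 building module and GML namespaces.
BLDG = "http://www.opengis.net/citygml/building/2.0"
GML = "http://www.opengis.net/gml"

def make_building(bldg_id, height_m):
    """Create a minimal (simplified) CityGML 2.0 Building element."""
    b = ET.Element(f"{{{BLDG}}}Building", {f"{{{GML}}}id": bldg_id})
    h = ET.SubElement(b, f"{{{BLDG}}}measuredHeight", uom="m")
    h.text = str(height_m)
    return b

elem = make_building("BLDG_0001", 12.5)
print(ET.tostring(elem, encoding="unicode"))
```

A converter like the one described in the abstract would build such elements per mesh object and attach attributes extracted from the 2D polygons.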

Hazelcast Vs. Ignite: Opportunities for Java Programmers

  • Maxim, Bartkov;Tetiana, Katkova;S., Kruglyk Vladyslav;G., Murtaziev Ernest;V., Kotova Olha
    • International Journal of Computer Science & Network Security
    • /
    • v.22 no.2
    • /
    • pp.406-412
    • /
    • 2022
  • Storing large amounts of data has been a major problem since the beginning of computing history. Big Data has brought huge advances in improving business processes, for example by finding customers' needs using prediction models based on web and social media search. Conventional databases are commonly used to store such data, but with the development of today's large-scale distributed applications handling vast amounts of data, they are no longer viable; consequently, Big Data technologies were introduced to store, process, and analyze data at high speed and to cope with day-by-day growth in users and data. To process data continuously in real time, data streaming technologies have been developed. Streaming platforms are an emerging technology for dealing with continuous streams of data, and the main purpose of big data stream processing frameworks is to allow programmers to query the continuous stream directly without dealing with lower-level mechanisms: programmers write stream processing code against these runtime libraries (also called Stream Processing Engines). Several Big Data streaming platforms are freely available on the Internet, but selecting the most appropriate one is not easy for programmers. In this paper, we present a detailed description of two state-of-the-art and popular streaming frameworks, Apache Ignite and Hazelcast, and compare their performance using selected attributes.

A Study on the Korea Distribution Promotion Policy and Adjustment Policy (국내 유통진흥정책과 유통조정정책에 대한 고찰)

  • Kim, Dae-Yun;Kwon, Sung-Ku
    • Journal of Distribution Science
    • /
    • v.11 no.4
    • /
    • pp.89-97
    • /
    • 2013
  • Purpose - The purpose of this study is to systematically review the background of the Korean distribution promotion policy and distribution adjustment policies, along with related regulations and policies. Research design, data, and methodology - Domestic distribution policy and the relevant laws were examined through a review of the existing research literature, tracing the development of the domestic distribution policy, promotion policies, and adjustment policies. Results - The results are summarized as follows. First, the purpose of the domestic distribution promotion policy was to strengthen the competitiveness of small and medium businesses through structural advancement of the small and medium industry; by expanding the managerial base of this industry, a new balance could be created in the national economy. An early assistance policy was required for small and medium businesses in the distribution industry, whose base had developed from the original model of catering to a traditional market of retail shops. Since 1996, this assistance has been needed due to the expansion and rapid growth of large-scale stores, which changed consumption patterns in distribution markets, and the decline of large enterprises. Second, the government supports small and medium distribution businesses through distribution promotion policies, by supporting organizations that promote small business and by supporting innovation in the distribution system. Third, in 1961 a business mediation system was established to protect small and medium industries. The Small and Medium Business Administration advises conglomerates to postpone acquisitions, restrain business expansion, or reduce business scale if small businesses suffer adverse effects, such as decreasing demand, because large companies are expanding into their areas. Fourth, the Distribution Adjustment Policy managed large-scale store regulation as follows: ① limitation on construction by urban planning ordinance, ② limitation on location based on traffic impact assessments, ③ regulation based on business guidelines issued by the chiefs of autonomous bodies, ④ regulation of mandatory holidays and limitation of business hours. This large-scale store regulation is a policy introduced by the government to increase the competitiveness of small and medium distribution businesses. Conclusions - As discussed in this study, the distribution promotion policy and the distribution adjustment policy are government distribution policies focused on protecting small and medium distribution businesses. This study is timely, as it was planned while the strengthening of the revisions to the Distribution Industry Development Act, aimed at protecting small and medium retailers and merchants, was under discussion. Its significance is that it offers insights for the development of new policies in the future and an opportunity to consider the background of government distribution policy.


Thermal and Electrical Energy Mix Optimization(EMO) Method for Real Large-scaled Residential Town Plan

  • Kang, Cha-Nyeong;Cho, Soo-Hwan
    • Journal of Electrical Engineering and Technology
    • /
    • v.13 no.1
    • /
    • pp.513-520
    • /
    • 2018
  • Since the Paris Climate Change Conference in 2015, many policies to reduce greenhouse gas emissions have been accelerating, mainly related to renewable energy resources and micro-grids. Presently, technology development and demonstration projects mostly focus on diversifying power resources by adding wind turbines, photovoltaics, and battery storage systems to island-type small micro-grids. It is expected that large-scale micro-grid projects based on regional districts and towns or complex cities, e.g., the block-type micro-grid project in the Daegu national industrial complex, will proceed in the near future. In this case, the economic cost or the carbon emission can be optimized by the efficient operation of the energy mix and the appropriate construction of electricity- and heat-supplying facilities such as cogeneration, renewable energy resources, BESS, thermal storage, and the existing heat and electricity supply networks. However, when planning a large residential town or city, a concrete plan for the energy infrastructure is not established until the construction plan stage, and is provided separately by the individual suppliers of water, heat, electricity, and gas, so it is difficult to build an efficient energy portfolio that considers the characteristics of the town or city. This paper introduces an energy mix optimization (EMO) method to determine the optimal capacity of thermal and electric resources, which can be applied in the design stage of a real large-scale residential town or city, and examines the feasibility of the proposed method by applying real heat and electricity demand data from large-scale residential towns with thousands of households and by comparing the result with a HOMER simulation developed by the National Renewable Energy Laboratory (NREL).
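To give a concrete flavor of what an energy-mix capacity optimization does, here is a toy brute-force sketch that picks CHP and PV capacities to cover electric and heat peaks at minimum capacity cost. All numbers (costs, demands, heat-to-power ratio) are invented for illustration and do not come from the paper or from HOMER.

```python
from itertools import product

# Illustrative (made-up) peak demands and per-MW capacity costs.
elec_demand, heat_demand = 50.0, 30.0          # MW electric / MW thermal
cost = {"chp": 1.2, "pv": 0.9, "boiler": 0.5}  # cost per MW of capacity
heat_per_elec = 1.1                             # CHP thermal output per MW electric

def mix_cost(chp, pv):
    """Total capacity cost; a boiler covers any heat the CHP cannot supply."""
    boiler = max(0.0, heat_demand - chp * heat_per_elec)
    return cost["chp"] * chp + cost["pv"] * pv + cost["boiler"] * boiler

best = None
for chp, pv in product(range(0, 51), range(0, 51)):
    if chp + pv < elec_demand:      # must cover the electric peak
        continue
    c = mix_cost(chp, pv)
    if best is None or c < best[0]:
        best = (c, chp, pv)

print(best)  # (cost, chp_MW, pv_MW)
```

A real EMO formulation would optimize over time-series demand, operation schedules, and emissions rather than just peak capacities, but the structure (search over a resource mix subject to demand constraints) is the same.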

Analysis of Eco-Area Application Characteristics of Apartment Complexes : Focusing on Eco-Area Ratio, Eco-Area Diversity, and Eco-Area Connectivity (공동주택단지 생태면적 적용 특성 분석 : 생태면적률, 생태면적 다양성, 생태면적 연계성을 중심으로)

  • Seung-Bin An;Chan-Ho Kim;Chang-Soo Lee
    • Land and Housing Review
    • /
    • v.15 no.1
    • /
    • pp.77-97
    • /
    • 2024
  • This study examines the distinctions in evaluation index items between overseas and domestic ecological-area-related systems, derives analytical indicators, and assesses recently completed apartment complexes before and after the implementation of overall ecological area ratios. The objective is to analyze variances in the application of ecological area characteristics, categorizing them into ecological area analysis indicators and presenting their implications. The spatial scope covers completed apartment complexes in both metropolitan and non-metropolitan areas. Thirty-six completed apartment complexes were selected for analysis, and basic ecological area data were compiled. The data were then used to categorize three analysis indicators (ecological area ratio, ecological area diversity, and ecological area connectivity) by metropolitan and non-metropolitan areas, as well as by type of apartment complex (sale housing versus rental housing) and size (large-scale, medium-scale, and small-scale). Results indicate higher ecological area ratios and greater diversity in ecological area spatial types in metropolitan areas than in non-metropolitan areas, and in sale housing complexes compared to rental housing complexes. Medium- and large-scale apartment complexes exhibit higher ecological area ratios, with ecological area diversity being more pronounced. Ecological area connectivity reveals more numerous and varied connection points and types in metropolitan areas than in non-metropolitan areas. The implications of this study suggest that large-scale development should prioritize securing ecological area ratios and diversity in apartment complexes, and that enhancing biodiversity requires establishing connections within and beyond the ecological area network of the complex. Future research should focus on linking the ecological area network within the complex.
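The ecological area (biotope area) ratio used as the first indicator is essentially a weighted sum of ecological surface areas divided by the total site area. A minimal sketch follows; the surface-type weights are illustrative examples, not the official Korean coefficient table.

```python
# Illustrative weights per ecological surface type (NOT official values).
ECO_WEIGHTS = {
    "natural_green": 1.0,       # green space on natural ground
    "rooftop_green": 0.6,       # artificial-ground / rooftop greening
    "permeable_paving": 0.5,    # water-permeable pavement
    "water_surface": 1.0,       # ponds, streams
}

def eco_area_ratio(areas_m2, site_area_m2):
    """Weighted ecological area as a percentage of the total site area."""
    weighted = sum(ECO_WEIGHTS[k] * a for k, a in areas_m2.items())
    return 100.0 * weighted / site_area_m2

# Made-up complex: 3000 m2 natural green, 500 m2 rooftop green,
# 1200 m2 permeable paving on a 10,000 m2 site.
ratio = eco_area_ratio(
    {"natural_green": 3000, "rooftop_green": 500, "permeable_paving": 1200},
    site_area_m2=10000,
)
print(round(ratio, 1))  # 39.0
```

The diversity and connectivity indicators in the study go beyond this single number, counting the variety of spatial types and the links between green spaces.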

Assessment of whipping and springing on a large container vessel

  • Barhoumi, Mondher;Storhaug, Gaute
    • International Journal of Naval Architecture and Ocean Engineering
    • /
    • v.6 no.2
    • /
    • pp.442-458
    • /
    • 2014
  • Wave-induced vibrations increase fatigue and extreme loading, but this is normally neglected in design; the industry view on this is changing. Wave-induced vibrations are often divided into springing and whipping, and their relative contribution to fatigue and extreme loading varies with ship design. For displacement vessels, the contribution of whipping to fatigue and extreme loading is particularly high for certain container vessels. A large, modern container vessel design with a high bow flare angle and high service speed has been considered. The vessel was equipped with a hull monitoring system from a recognized supplier of HMON systems, has been operating between Asia and Europe for a few years, and valuable data have been collected. Model tests of this vessel have also been carried out to investigate fatigue and extreme loading, but model tests are often limited to head seas. For the full-scale measurements, the correlation between stress data and wind data has been investigated. The wave and vibration damage are shown versus heading and Beaufort strength to indicate general trends, and the wind data have been compared to the North Atlantic design environment. Even though the encountered wind conditions were much less severe than in the North Atlantic, the extreme loading defined by IACS URS11 is significantly exceeded when whipping is included. If whipping may contribute to collapse, proper seamanship may be useful to limit the extreme loading. The vibration damage is also observed to be high from head to beam seas, and is even present in stern seas, but fatigue damage in general is low on this East Asia to Europe trade.
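The fatigue-damage comparison with and without whipping can be illustrated with a Palmgren-Miner summation over a stress-range histogram using a one-slope S-N curve. The constants and cycle counts below are invented for illustration, not the vessel's measured data.

```python
# One-slope S-N curve N(S) = K / S**m; parameters are illustrative.
K, m = 1.0e12, 3.0

def miner_damage(cycles_per_bin):
    """Palmgren-Miner sum: total damage = sum of n_i / N(S_i) over stress bins."""
    return sum(n / (K / s**m) for s, n in cycles_per_bin)

# Made-up stress histograms (stress range in MPa, cycle count):
wave_only = [(40.0, 2.0e5), (60.0, 5.0e4)]
with_whipping = [(40.0, 2.0e5), (60.0, 5.0e4), (80.0, 1.0e4)]

d_wave = miner_damage(wave_only)
d_total = miner_damage(with_whipping)
print(d_total > d_wave)  # whipping cycles add fatigue damage
```

Because damage grows with the m-th power of stress range, even a modest number of high-amplitude whipping cycles can contribute disproportionately, which is the qualitative point the measurements above make.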

An Analysis of the Overhead of Multiple Buffer Pool Scheme on InnoDB-based Database Management Systems (InnoDB 기반 DBMS에서 다중 버퍼 풀 오버헤드 분석)

  • Song, Yongju;Lee, Minho;Eom, Young Ik
    • Journal of KIISE
    • /
    • v.43 no.11
    • /
    • pp.1216-1222
    • /
    • 2016
  • The advent of large-scale web services has resulted in a gradual increase in the amount of data used by those services. These big data are managed efficiently by DBMSs such as MySQL and MariaDB, which use InnoDB as their storage engine, since InnoDB guarantees ACID properties and is suitable for handling large-scale data. To improve I/O performance, InnoDB caches the data and indexes of its database in a buffer pool, and it also supports multiple buffer pools to mitigate lock contention. However, the multiple buffer pool scheme introduces additional data-consistency overhead. In this paper, we analyze the overhead of the multiple buffer pool scheme. In our experimental results, although the multiple buffer pool scheme mitigates lock contention by up to 46.3%, the throughput of the DBMS is significantly degraded by up to 50.6% due to increased disk I/O and fsync calls.
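The multiple buffer pool idea (hash each page to one of several pools, each guarded by its own lock, so threads touching different pages rarely contend) can be sketched as follows. This is an illustrative Python model, not InnoDB's actual implementation.

```python
import threading

class MultiBufferPool:
    """Toy model of multiple buffer pools: each page id is hashed to a pool,
    and each pool has its own lock to reduce contention between threads."""

    def __init__(self, n_pools=4):
        self.pools = [{} for _ in range(n_pools)]
        self.locks = [threading.Lock() for _ in range(n_pools)]

    def _index(self, page_id):
        return hash(page_id) % len(self.pools)

    def put(self, page_id, page):
        i = self._index(page_id)
        with self.locks[i]:          # only one pool is locked, not all of them
            self.pools[i][page_id] = page

    def get(self, page_id):
        i = self._index(page_id)
        with self.locks[i]:
            return self.pools[i].get(page_id)

bp = MultiBufferPool(n_pools=4)
bp.put(42, b"page-data")
print(bp.get(42))
```

The paper's observation is that this partitioning helps with lock contention but hurts elsewhere: flushing and consistency work must now cover every pool, increasing disk I/O and fsync calls.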