• Title/Summary/Keyword: Large Scale Data

Similarity measurement based on Min-Hash for Preserving Privacy

  • Cha, Hyun-Jong;Yang, Ho-Kyung;Song, You-Jin
    • International Journal of Advanced Culture Technology / v.10 no.2 / pp.240-245 / 2022
  • Because information is so valuable, encryption algorithms are heavily used. Encrypted raw data is secure, but problems arise when the decryption key is exposed. In particular, large-scale Internet sites such as Facebook and Amazon suffer serious damage when user data is exposed. Recently, research into a new fourth-generation encryption technology that can protect user-related data without using a key required for encryption has been attracting attention, as has data clustering technology that uses encryption. In this paper, we try to reduce key exposure by using homomorphic encryption, and we aim to maintain privacy through similarity measurement. Conventional exhaustive similarity measurements become time-consuming and expensive as the size and scope of the data increase. Min-Hash has therefore been studied as a way to efficiently estimate the similarity between two sets from their signatures. Min-Hash is widely used for plagiarism detection, graph and image analysis, and genetic analysis. This paper therefore preserves privacy using homomorphic encryption and presents a model for efficient similarity measurement using Min-Hash.
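
The Min-Hash idea mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' model and omits the homomorphic encryption step entirely. The helper names (`minhash_signature`, `estimate_jaccard`), the choice of BLAKE2b as a seeded hash family, and the two example word sets are all assumptions made for illustration.

```python
import hashlib


def _seeded_hash(seed: int, item: str) -> int:
    # A seeded 64-bit hash built from BLAKE2b; each seed stands in for
    # one member of a hash-function family (an illustrative choice).
    digest = hashlib.blake2b(
        item.encode("utf-8"),
        digest_size=8,
        salt=seed.to_bytes(8, "little"),
    ).digest()
    return int.from_bytes(digest, "little")


def minhash_signature(items, num_hashes=256):
    # For every hash function, keep only the minimum hash value over the set.
    return [min(_seeded_hash(seed, x) for x in items) for seed in range(num_hashes)]


def estimate_jaccard(sig_a, sig_b):
    # The fraction of matching signature positions estimates |A ∩ B| / |A ∪ B|.
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


def exact_jaccard(a, b):
    return len(a & b) / len(a | b)


# Made-up example sets (not data from the paper).
set_a = set("the quick brown fox jumps over the lazy dog".split())
set_b = set("the quick brown fox sleeps near the lazy cat".split())

sig_a = minhash_signature(set_a)
sig_b = minhash_signature(set_b)

print("exact    :", exact_jaccard(set_a, set_b))
print("estimated:", estimate_jaccard(sig_a, sig_b))
```

The signatures have a fixed length regardless of set size, so comparing two signatures costs the same whether the sets hold ten items or ten million. The estimate gets tighter as `num_hashes` grows.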

Access Control Mechanism for CouchDB

  • Ashwaq A., Al-otaibi;Reem M., Alotaibi;Nermin, Hamza
    • International Journal of Computer Science & Network Security / v.22 no.12 / pp.107-115 / 2022
  • Big data applications increasingly need databases other than relational databases. NoSQL databases are used to store and handle massive amounts of data. They have many advantages over traditional databases, such as flexibility, efficient data processing, scalability, and dynamic schemas. Most current applications are web-based, and the size of their data is increasing. NoSQL databases are expected to be used on an increasingly large scale in the future. However, NoSQL suffers from many security issues, one of which is access control. Many recent applications need fine-grained access control (FGAC). Integrating FGAC into NoSQL databases will increase their usability in various fields, offer customized data protection levels, and enhance their security. There are different NoSQL database models, of which the document-based database is one. In this research, we choose the CouchDB NoSQL document database and develop an access control mechanism that works at a fine-grained level. The proposed mechanism uses the role-based access control of CouchDB and restricts read access at the document level. The experiment shows that our mechanism works effectively at the document level in CouchDB with good execution time.
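
As a rough illustration of what document-level, role-based read restriction means, the following sketch filters plain Python dictionaries by role. It does not use the CouchDB API and is not the mechanism proposed in the paper. The `allowed_roles` field, the `User` class, and the sample documents are hypothetical names chosen for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    name: str
    roles: frozenset


def can_read(user: User, doc: dict) -> bool:
    # Hypothetical convention: each document lists the roles allowed to read it.
    # A document without the field is denied by default.
    allowed = set(doc.get("allowed_roles", []))
    return bool(allowed & user.roles)


def read_documents(user: User, docs: list) -> list:
    # Return only the documents this user may read (document-level filtering).
    return [doc for doc in docs if can_read(user, doc)]


# Made-up documents for illustration only.
docs = [
    {"_id": "report-1", "allowed_roles": ["analyst", "admin"]},
    {"_id": "salary-1", "allowed_roles": ["admin"]},
    {"_id": "draft-1"},  # no allowed_roles field: nobody can read it
]

analyst = User("alice", frozenset({"analyst"}))
admin = User("bob", frozenset({"admin"}))

print([d["_id"] for d in read_documents(analyst, docs)])
print([d["_id"] for d in read_documents(admin, docs)])
```

Denying access when the field is missing is a deliberate choice in this sketch: a document that nobody has labeled stays hidden rather than becoming readable by everyone.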

A Kafka-based Data Sharing Method for Educational Video Services (교육 동영상 공유 서비스의 카프카 기반 데이터 공유 방안)

  • Lee, Hyeon sup;Kim, Jin-Deog
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2021.05a / pp.574-576 / 2021
  • Microservice techniques are necessary when constructing large-scale operational systems or systems designed for scalability. Kafka is a message queue based on the pub/sub model. Its features are well suited to distributed environments, and it fits microservices because it can draw on various data sources. In this paper, we propose a data sharing method for educational video sharing services using Apache Kafka. The proposed system builds a Kafka cluster so that the educational video sharing service can share various data, and it uses a Spark cluster to link with recommendation systems based on similarities between educational videos. We also present a way to share various data sources, such as files and various DBMSs.
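
The pub/sub model referred to in this abstract can be sketched with a small in-memory broker. This is not the Kafka client API and not the system proposed in the paper. The `InMemoryBroker` class, the `video-events` topic name, and the message fields are hypothetical and only show how one published event reaches several independent consumers.

```python
from collections import defaultdict


class InMemoryBroker:
    """Toy pub/sub broker: topics hold an append-only log of messages."""

    def __init__(self):
        self._log = defaultdict(list)          # topic -> list of messages
        self._subscribers = defaultdict(list)  # topic -> list of handlers

    def subscribe(self, topic, handler):
        # Register a callback that receives every later message on the topic.
        self._subscribers[topic].append(handler)

    def publish(self, topic, message):
        # Append to the topic log, then fan the message out to all subscribers.
        self._log[topic].append(message)
        for handler in self._subscribers[topic]:
            handler(message)
        return len(self._log[topic]) - 1  # position of the message in the log


broker = InMemoryBroker()
recommender_inbox = []
archive_inbox = []

# Two independent consumers of the same topic (hypothetical services).
broker.subscribe("video-events", recommender_inbox.append)
broker.subscribe("video-events", archive_inbox.append)

offset = broker.publish("video-events", {"video_id": "v1", "action": "watched"})
print(offset, recommender_inbox, archive_inbox)
```

The publisher never refers to its consumers directly, which is the property that lets services be added or removed without changing the producer.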

The Productivity Impact of Working from Home and the Moderating Effect of Task Characteristics: An Empirical Investigation of Field Data (재택근무가 업무 생산성에 미치는 영향과 업무 특성의 조절 효과: 대규모 현장 데이터를 활용한 실증 분석)

  • Jae-Young Kim;Dong-Joo Lee
    • Asia-Pacific Journal of Business / v.15 no.1 / pp.113-129 / 2024
  • Purpose - This study aims to empirically identify the quantitative effects of work from home (WFH) on employee productivity using field data. Design/methodology/approach - Based on large-scale field data from a South Korean company that introduced a WFH arrangement in 2020, we conducted fixed-effect and moderating-effect analyses using individual-level panel data covering sixty-three weeks. Findings - Overall, WFH was found to have a positive effect on productivity. However, the productivity impact of WFH varied depending on task characteristics. Specifically, WFH led to an increase of over 20% in productivity for simple and repetitive tasks, whereas no significant productivity impact was observed for professional and knowledge-based tasks. Research implications or Originality - As the first study based on field data from South Korea, this study offers convincing causal evidence of the moderating impact of task characteristics on the relationship between WFH and productivity. Further, the findings provide managers with practical insights concerning their work arrangement decisions.

Informatics for protein identification by tandem mass spectrometry; Focused on two most-widely applied algorithms, Mascot and SEQUEST

  • Sohn, Chang-Ho;Jung, Jin-Woo;Kang, Gum-Yong;Kim, Kwang-Pyo
    • Bioinformatics and Biosystems / v.1 no.2 / pp.89-94 / 2006
  • Mass spectrometry (MS) is widely applied in high-throughput proteomics analysis. Large-scale proteome analysis experiments generate massive amounts of data. To search these proteomics data against protein databases, fully automated database search algorithms such as Mascot and SEQUEST are routinely employed. At present, it is critical to reduce false positives and false negatives during such analysis. In this review, we focus on automated protein identification using tandem mass spectrometry (MS/MS) spectra and on validation of the protein identifications made by the two most common automated protein identification algorithms, Mascot and SEQUEST.

Internal pressure in a low-rise building with existing envelope openings and sudden breaching

  • Tecle, Amanuel S.;Bitsuamlak, Girma T.;Aly, Aly Mousaad
    • Wind and Structures / v.16 no.1 / pp.25-46 / 2013
  • This paper presents a boundary-layer wind tunnel (BLWT) study on the effect of variable dominant openings on the steady and transient responses of wind-induced internal pressure in a low-rise building. The parametric study focuses on the differences and similarities between transient and steady-state responses, the effects of the size and location of dominant openings and vent openings, and the effects of wind angle of attack. In addition, the necessity of internal volume correction during sudden breaching (i.e., in a transient response experiment) was investigated. A comparison of the BLWT data with ASCE 7-2010, as well as with limited large-scale data obtained at a 'Wall of Wind' facility, is presented.

Modeling and Evaluation on the Dispersion of Air Pollutants in the Large Scale Thermal Power Plant (대단위발전소의 대기오염물질 확산에 관한 모델링 및 평가에 관한 연구)

  • Chun, Sang-Ki;Lee, Sung-Chul
    • Journal of Environmental Impact Assessment / v.6 no.2 / pp.81-92 / 1997
  • This paper compares and evaluates air pollutant dispersion modeling results against observation data for the area within a 10 km radius of the Boryong thermal power plants. The observation data used in this study were air pollutant concentrations measured continuously at 8 locations around the Boryong power plants by a TMS (tele-monitoring system) for 3 months, from September to November 1996. Short-term and long-term predictions were carried out using the ISC3 model and the LPDM (Lagrangian Particle Dispersion Model). The ISC3 model showed a correlation coefficient as high as 0.7 for short-term predictions but only 0.54 for long-term predictions. On the other hand, the LPDM showed a correlation coefficient of 0.78 for long-term predictions, but its short-term predictions were higher than the observed concentrations.
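
The correlation coefficients quoted in this abstract compare predicted and observed concentrations. A minimal sketch of that comparison follows, assuming the usual Pearson correlation, which the abstract does not state explicitly. The function names and all numeric values are made up for illustration and are not data from the study.

```python
import math


def pearson_r(xs, ys):
    # Pearson correlation coefficient between two equal-length sequences.
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def mean_bias(predicted, observed):
    # Average of (prediction - observation); positive means over-prediction.
    return sum(p - o for p, o in zip(predicted, observed)) / len(observed)


# Made-up concentrations (arbitrary units), for illustration only.
observed = [10, 20, 30, 40, 50]
predicted = [12, 25, 33, 47, 58]

print("r    =", pearson_r(observed, predicted))
print("bias =", mean_bias(predicted, observed))
```

Correlation measures how well predictions track the ups and downs of the observations, not whether they match in magnitude. That is why the sketch reports a bias alongside it: a model can correlate well with measurements and still over-predict them.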

Remote Parallel Pseudo-Dynamic Testings Using Internet on Base Isolated Bridge (인터넷을 이용한 원격병렬 유사동적실험 : 면진교량에 대하여)

  • 윤정방;김재민;김남식;심종민;구기영
    • Proceedings of the Computational Structural Engineering Institute Conference / 2000.04b / pp.304-307 / 2000
  • This paper presents a numerical simulation study of remote parallel pseudo-dynamic testing over the Internet. In this testing method, experimental facilities located at different places can be used in parallel to test a large-scale structure with many components subjected to severe nonlinear behavior. An example analysis is carried out on a base-isolated bridge under earthquake loading. The results indicate that the time required for data communication over the Internet between two facilities located 250 km apart is about 20 minutes for 1,000 time steps, which is roughly equivalent to the time required for the pseudo-dynamic testing itself. This testing method can become more powerful as Internet data transmission techniques improve.

A VR-based Tile Display System for the Distributed Visualization (분산 가시화를 위한 가상현실 타일 디스플레이 시스템의 개발)

  • Cha, Moo-Hyun;Lee, Jae-Kyung;Hwang, Jin-Sang;Han, Soon-Hung
    • Korean Journal of Computational Design and Engineering / v.15 no.3 / pp.167-177 / 2010
  • In recent years, high-resolution tiled display systems, which place no restrictions on screen size and support various tile layouts, have been increasingly used to evaluate digital mock-ups at physical scale or to explore large engineering data sets in detail. In this study, we developed a multi-channel distributed visualization system that provides virtual reality-based visual content using a 3D open-source graphics engine. Efficient data structures and exchange methods were proposed as a scene synchronization technology for PC cluster environments. A DLP-Cube-based tiled visualization system providing a 5×2 display wall layout was developed, and we validated our approach using this system. In addition, we introduced an integrated control program that remotely administers the PC cluster environment and controls the layout of display channels.

FINDING COSMIC SHOCKS: SYNTHETIC X-RAY ANALYSIS OF A COSMOLOGICAL SIMULATION

  • HALLMAN ERIC J.;RYU DONGSU;KANG HYESUNG;JONES T. W.
    • Journal of The Korean Astronomical Society / v.37 no.5 / pp.593-596 / 2004
  • We introduce a method of identifying evidence of shocks in the X-ray emitting gas in clusters of galaxies. Using information from synthetic observations of simulated clusters, we do a blind search of the synthetic image plane. The locations of likely shocks found using this method closely match those of shocks identified in the simulation hydrodynamic data. Though this method assumes nothing about the geometry of the shocks, the general distribution of shocks as a function of Mach number in the cluster hydrodynamic data can be extracted via this method. Characterization of the cluster shock distribution is critical to understanding production of cosmic rays in clusters and the use of shocks as dynamical tracers.