• Title/Summary/Keyword: data aggregation (데이터 집계)

Privacy-Preserving Collection and Analysis of Medical Microdata

  • Jong Wook Kim
    • Journal of the Korea Society of Computer and Information / v.29 no.5 / pp.93-100 / 2024
  • With the advent of the Fourth Industrial Revolution, cutting-edge technologies such as artificial intelligence, big data, the Internet of Things, and cloud computing are driving innovation across industries. These technologies are generating massive amounts of data that many companies are leveraging. However, there is a notable reluctance among users to share sensitive information due to the privacy risks associated with collecting personal data. This is particularly evident in the healthcare sector, where the collection of sensitive information such as patients' medical conditions poses significant challenges, with privacy concerns hindering data collection and analysis. This research presents a novel technique for collecting and analyzing medical data that not only preserves privacy but also effectively extracts statistical information. The method goes beyond basic data collection by incorporating a strategy to efficiently mine statistical data while maintaining privacy. Performance evaluations using real-world data show that the proposed technique outperforms existing methods in extracting meaningful statistical insights.

Evaluation of Metro Services based on Transit Smart Card Data (A Case Study of Incheon Line 1) (스마트카드 데이터를 활용한 도시철도 서비스 평가 (인천 1호선의 차내혼잡과 정시성을 중심으로))

  • Eom, Jin-Ki;Choi, Myoung-Hun;Kim, Dae-Sung;Lee, Jun;Song, Ji-Young
    • Journal of the Korean Society for Railway / v.15 no.1 / pp.80-87 / 2012
  • This study analyzed the service quality of the Incheon Line 1 commuter rail with respect to two service measures, occupancy (crowdedness) and punctuality, based on transit smart card data collected in 2009. To analyze the metro services by individual fleet, we aggregated the individual-level card data by the fleet operated in each planned schedule. The results show a low level of service for both crowdedness and punctuality during peak hours on the line segment from 'Gyeyang' to 'International Business District'. Further, a close relationship between vehicle occupancy and punctuality was found, which suggests that high passenger demand causes successive metro delays.

Construction of Incremental Federated Learning System using Flower (Flower을 사용한 점진적 연합학습시스템 구성)

  • Yun-Hee Kang;Myungju Kang
    • Journal of Platform Technology / v.11 no.4 / pp.80-88 / 2023
  • To construct a learning model in the field of artificial intelligence, a dataset must conventionally be collected and delivered to a central server where the model is built. Federated learning is a machine learning method that collaboratively builds a global learning model without transmitting the data held on the client side. It can be used to protect privacy: after a local model is trained on each client, the parameters of the local models are aggregated centrally to update the global model. In this paper, we describe incremental federated learning, which reuses existing learned parameters to improve federated learning. We conduct experiments using the federated learning framework Flower and evaluate the results with regard to elapsed time and precision when executing optimization algorithms.
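
The central aggregation step in this abstract can be illustrated with a weighted average of client parameters, commonly known as federated averaging. The sketch below is a minimal plain-Python illustration under that assumption; it is not the paper's incremental method and does not use Flower's API. The function name `aggregate_parameters` and the `(parameters, num_examples)` input shape are hypothetical.

```python
from typing import List, Sequence, Tuple


def aggregate_parameters(
    client_updates: Sequence[Tuple[List[float], int]],
) -> List[float]:
    """Weighted average of client parameter vectors (federated averaging).

    Each update is (parameters, num_examples). Clients that trained on more
    local examples contribute proportionally more to the global model.
    """
    if not client_updates:
        raise ValueError("at least one client update is required")

    total_examples = sum(num_examples for _, num_examples in client_updates)
    if total_examples <= 0:
        raise ValueError("total number of examples must be positive")

    dim = len(client_updates[0][0])
    global_params = [0.0] * dim
    for params, num_examples in client_updates:
        if len(params) != dim:
            raise ValueError("all clients must send vectors of equal length")
        weight = num_examples / total_examples
        for i, value in enumerate(params):
            global_params[i] += weight * value
    return global_params
```

Only parameter vectors and example counts reach the server in this sketch; the raw client data never leaves the client, which is the privacy property the abstract refers to.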

Robust Data, Event, and Privacy Services in Real-Time Embedded Sensor Network Systems (실시간 임베디드 센서 네트워크 시스템에서 강건한 데이터, 이벤트 및 프라이버시 서비스 기술)

  • Jung, Kang-Soo;Kapitanova, Krasimira;Son, Sang-H.;Park, Seog
    • Journal of KIISE:Databases / v.37 no.6 / pp.324-332 / 2010
  • The majority of event detection in real-time embedded sensor network systems is based on data fusion that uses noisy sensor data collected from complicated real-world environments. Current research has produced several excellent low-level mechanisms to collect sensor data and perform aggregation. However, solutions that enable these systems to process readings from heterogeneous sensors in real time and subsequently detect complex events of interest in real time need further research. We are developing real-time event detection approaches that allow lightweight data fusion and do not require significant computing resources. Underlying the event detection framework is a collection of real-time monitoring and fusion mechanisms that are invoked upon the arrival of sensor data. The combination of these mechanisms and the framework has the potential to significantly improve the timeliness and reduce the resource requirements of embedded sensor networks. In addition, we discuss privacy, a foundational technique for trusted embedded sensor network systems, and explain an anonymization technique to ensure privacy.

OLAP System and Performance Evaluation for Analyzing Web Log Data (웹 로그 분석을 위한 OLAP 시스템 및 성능 평가)

  • 김지현;용환승
    • Journal of Korea Multimedia Society / v.6 no.5 / pp.909-920 / 2003
  • Nowadays, IT for CRM is growing and developing rapidly. Typical techniques are statistical analysis tools, on-line analytical processing (OLAP) tools, and data mining algorithms (such as neural networks, decision trees, and association rules). Among customer data, web log data is very important, and OLAP technology can be applied to analyze it multidimensionally and use it efficiently. To build an OLAP cube, multidimensional summary results have to be precalculated in order to get fast responses. But as the number of dimensions and sparse cells increases, data explosion becomes serious and OLAP performance decreases. In this paper, we present why web log data sparsity occurs and what kinds of sparsity patterns arise in two and three dimensions for OLAP. Based on this research, we set up multidimensional data models and query models for a benchmark with each sparsity pattern. Finally, we evaluated the performance of three OLAP systems (MS SQL 2000 Analysis Service, Oracle Express, and C-MOLAP).

Iceberg Query Evaluation Technical Using a Cuboid Prefix Tree (큐보이드 전위트리를 이용한 빙산질의 처리)

  • Han, Sang-Gil;Yang, Woo-Sock;Lee, Won-Suk
    • Journal of KIISE:Databases / v.36 no.3 / pp.226-234 / 2009
  • A data stream is a massive unbounded sequence of data elements continuously generated at a rapid rate. Due to these characteristics, it is impossible to save all the data elements of a data stream. Therefore, it is necessary to define a new synopsis structure to store the summary information of a data stream. For this purpose, this paper proposes a cuboid prefix tree that can be effectively employed in evaluating an iceberg query over data streams. A cuboid prefix tree stores only those itemsets that consist of the grouping attributes used in a GROUP BY query. In addition, a cuboid prefix tree can compute multiple iceberg queries simultaneously by sharing their common sub-expressions. A cuboid prefix tree evaluates an iceberg query over an infinitely generated data stream while efficiently reducing memory usage and processing time, which is verified by a series of experiments.
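
For readers unfamiliar with the term, an iceberg query is a GROUP BY aggregate that returns only the groups whose aggregate value meets a threshold. The sketch below shows that semantics with exact in-memory counting; it is not the cuboid prefix tree proposed in the paper, which exists precisely to avoid keeping every group. The function name `iceberg_count` and its parameters are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, Mapping, Sequence, Tuple


def iceberg_count(
    rows: Iterable[Mapping[str, Hashable]],
    group_by: Sequence[str],
    min_count: int,
) -> Dict[Tuple[Hashable, ...], int]:
    """Return only the groups whose COUNT(*) is at least min_count.

    Roughly equivalent to:
        SELECT <group_by>, COUNT(*) FROM rows
        GROUP BY <group_by> HAVING COUNT(*) >= min_count
    """
    counts: Counter = Counter()
    for row in rows:
        # The group key is the tuple of grouping-attribute values.
        key = tuple(row[attr] for attr in group_by)
        counts[key] += 1
    return {key: n for key, n in counts.items() if n >= min_count}
```

Because this version holds a counter for every distinct group, its memory grows with the number of groups, which is the cost a synopsis structure for unbounded streams is meant to reduce.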

A Strategy for Inference Control of Official Statistics - Centering around the Patent Application Expense Support Project - (공식통계의 추론통제 전략 - 정부의 특허경비지원사업 사례를 중심으로 -)

  • Lee, Duck-Sung;Choi, In-Soo
    • Journal of the Korea Society of Computer and Information / v.14 no.11 / pp.199-211 / 2009
  • Official statistics, which are collected for governments and the community, can be used to assess the effectiveness of governments' policies and programs. Thus, official statistics should be collected and presented based on correct findings. Erroneous official statistics will lead to lower-quality results in assessing those policies and programs. Many statistical agencies today use on-line analytical processing (OLAP) data cubes, which support OLAP tasks such as aggregation and subtotals, as a key part of their dissemination strategy for official statistics. Confidentiality in data cubes should also be protected. However, sensitive parts of data cubes, including microdata, may be disclosed by malicious inferences. The authors have suggested an inference control process for OLAP data cubes that prevents the creation of erroneous cubes and secures cubes against privacy breaches. The objective of this study is to establish a strategy for inference control of official statistics using this inference control process, taking the Patent Application Expense Support Project as a case.

Research on supporting the group by clause reflecting XML data characteristics in XQuery (XQuery에서의 XML 데이터 특성을 고려한 group by 지원을 위한 질의 표현 기법에 대한 연구)

  • Lee Min-Soo;Cho Hye-Young;Oh Jung-Sun;Kim Yun-Mi;Song Soo-Kyung
    • The KIPS Transactions:PartD / v.13D no.4 s.107 / pp.501-512 / 2006
  • XML is the most popular platform-independent data representation used for communication between loosely coupled heterogeneous systems such as B2B applications or workflow systems. The powerful query language XQuery has been developed to support diverse needs for querying XML documents. XQuery is designed to combine results from diverse data sources into a uniquely structured query result, and it has therefore become the standard XML query language. Although the latest XQuery supports powerful search functions including iterations, its grouping mechanism for data is too primitive and makes query expressions difficult and complex. Therefore, this work focuses on supporting a groupby clause in the query expression to process XQuery grouping. We propose it as a more efficient way to process grouping for restructuring and aggregation functions on XML data. We propose an XQuery EBNF that includes the groupby clause and have implemented an XQuery processing system with grouping functions based on the eXist native XML database.

Development of Web-based Network Diligence and Indolence Management System (웹 기반 네트워크 근태 관리 시스템 개발)

  • Choi, Woo-Sik;Kim, Byung-Joon;An, Beong-Ku
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.11 no.1 / pp.151-158 / 2011
  • In recent diligence and indolence management systems, the server and client are not separated and data are not converted into a database. These systems therefore have several weaknesses in areas such as data modification and management. In this paper, we propose a new Web-based Network Diligence and Indolence Management system (WNDIM) to resolve these weaknesses and improve on the performance of recent systems. The main features and contributions of the proposed system are as follows. First, the server and client are separated, and all data are converted into a database. Second, with the construction of an APM server, data modification and management operate efficiently. Third, a political decision of the system is defined. Fourth, the system can efficiently support user-oriented services such as the collection of diligence and indolence data and the sum of off-duty days. The implementation and performance evaluation of the proposed WNDIM show that the system can efficiently support diligence and indolence management; we are currently using the proposed WNDIM system in the real field.

Proposal of a Monitoring System to Determine the Possibility of Contact with Confirmed Infectious Diseases Using K-means Clustering Algorithm and Deep Learning Based Crowd Counting (K-평균 군집화 알고리즘 및 딥러닝 기반 군중 집계를 이용한 전염병 확진자 접촉 가능성 여부 판단 모니터링 시스템 제안)

  • Lee, Dongsu;ASHIQUZZAMAN, AKM;Kim, Yeonggwang;Sin, Hye-Ju;Kim, Jinsul
    • Smart Media Journal / v.9 no.3 / pp.122-129 / 2020
  • The possibility that an asymptomatic COVID-19 carrier anywhere in the world, unaware of the infection, can spread it to the people around them remains a very important issue, in that the public is not free from anxiety and fear over the spread of the epidemic. In this paper, the K-means clustering algorithm and deep learning-based crowd counting are proposed to determine the possibility of contact with confirmed cases of infectious diseases. After 300 iterations over all input training images, the PSNR value was 21.51, and the final MAE value for the entire data set was 67.984. This means the average absolute error between observations and the average absolute error of fewer than 4,000 people in each CCTV scene, including the calculation of the distance and infection rate from the confirmed patient and the surrounding persons, the net group of potential patient movements, and the prediction of the infection rate.
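
The MAE figure in this abstract can be read against the conventional definition used in crowd counting, where the error is the absolute difference between the predicted and ground-truth head count of each image, averaged over all images. The sketch below assumes that definition; the abstract does not state how the authors computed their value, and the function name `mean_absolute_error` is hypothetical.

```python
from typing import Sequence


def mean_absolute_error(
    predicted_counts: Sequence[float],
    true_counts: Sequence[float],
) -> float:
    """Average of |predicted - true| over all images (assumed definition)."""
    if len(predicted_counts) != len(true_counts):
        raise ValueError("both sequences must have the same length")
    if not predicted_counts:
        raise ValueError("at least one image is required")

    total_error = sum(
        abs(pred - true) for pred, true in zip(predicted_counts, true_counts)
    )
    return total_error / len(predicted_counts)
```

Under this reading, an MAE of 67.984 would mean the predicted count is off by about 68 people per scene on average.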