• Title/Summary/Keyword: Data Deluge

Correspondence Strategy for Big Data's New Customer Value and Creation of Business (빅 데이터의 새로운 고객 가치와 비즈니스 창출을 위한 대응 전략)

  • Koh, Joon-Cheol;Lee, Hae-Uk;Jeong, Jee-Youn;Kim, Kyung-Sik
    • Journal of the Korea Safety Management & Science / v.14 no.4 / pp.229-238 / 2012
  • Over the last 10 years, internet use has become a daily activity, and humankind has had to face the Data Deluge, a dramatic increase in digital data (Economist 2012). Because the amount of digital data has grown exponentially, large-scale data has become a major issue, and hence the term 'big data' appeared. There is no official agreement on a quantitative, detailed definition of 'big data', but its meaning is expanding to include its value and efficacy. Big data includes not only standardized internal personal information, such as customer information, but also complex external, unstructured, social, and real-time data. Big data technology is a broad concept that covers a wide range of technologies, including data acquisition, storage/management, analysis, and application. Technologies associated with big data include Big Table, Cassandra, Hadoop, MapReduce, Hbase, and NoSQL, and among the sub-techniques, Text Mining, Opinion Mining, Social Network Analysis, and Cluster Analysis are gaining attention. Big data is characterized by large amounts of individual (high-resolution) elements and a variety of high-frequency data. Its three defining features are volume, variety, and velocity, which are called the '3V'. Increasing complexity is regarded as a fourth feature, and the more fully all four features are satisfied, the better the data fits the term 'big data'. In this study, we examine the reasons why companies need to adopt big data, ways of applying it, and advanced domestic and foreign application cases. To respond effectively to the big data revolution, a paradigm shift in data production, distribution, and consumption is needed, as is the insight to anticipate and prepare for future business by considering the unpredictable technology market, the industry environment, and the flow of social demand.

Development of Scoring Model on Customer Attrition Probability by Using Data Mining Techniques

  • Han, Sang-Tae;Lee, Seong-Keon;Kang, Hyun-Cheol;Ryu, Dong-Kyun
    • Communications for Statistical Applications and Methods / v.9 no.1 / pp.271-280 / 2002
  • Recently, many companies have applied data mining techniques to strengthen their competitiveness in their business markets. In this study, we describe how data mining, a technique for discovering knowledge from a deluge of data, was used in a completed project to support enterprise decision making. We also develop a scoring model of customer attrition probability for an automobile insurance company using data mining techniques. The development of the scoring model in the domestic insurance industry is presented as a concrete example.

Performance of Distributed Database System built on Multicore Systems

  • Kim, Kangseok
    • Journal of Internet Computing and Services / v.18 no.6 / pp.47-53 / 2017
  • Recently, huge datasets have been generated rapidly in a variety of fields, so there is an urgent need for technologies that allow efficient and effective processing of such datasets. Partitioning a huge dataset effectively and reducing the overhead of processing the partitioned data have therefore become critical factors for scalability and performance in distributed database systems. In our work, we use multicore servers to provide scalable service in our distributed system. Partitioning a database over multicore servers emerged from the need for a new architectural design for distributed database systems, driven by scalability and performance concerns in today's data deluge. The system allows uniform access through a web service interface to databases distributed concurrently over multicore servers, using an SQMD (Single Query Multiple Database) mechanism based on the publish/subscribe paradigm. We present performance results for the distributed database system built on multicore servers, on processing that is time-intensive with traditional architectures. We also discuss future work.

A data management system for microbial genome projects

  • Ki-Bong Kim;Hyeweon Nam;Hwajung Seo;Kiejung Park
    • Proceedings of the Korean Society for Bioinformatics Conference / 2000.11a / pp.83-85 / 2000
  • Many microbial genome sequencing projects have been under way in genome centers around the world since the first genome, that of Haemophilus influenzae, was sequenced in 1995. The deluge of microbial genome sequence data demands a new and highly automated data flow system so that genome researchers can manage and analyze their own bulky sequence data from the low level to the high level. To this end, we developed an automatic data management system for microbial genome projects, which consists mainly of a local database, analysis programs, and a user-friendly interface. We designed and implemented the local database for large-scale sequencing projects. It makes systematic and consistent data management and retrieval possible and is tightly coupled with the analysis programs and the web-based user interface. That is, the results of the analysis programs can be parsed and stored in the local database, and users can retrieve the data at any level of data processing through the web-based graphical user interface. The analysis programs in our system cover contig assembly, homology search, and ORF prediction, which are essential in genome projects. All but the contig assembly program are in the public domain. These programs are connected with each other by a number of utility programs. As a result, this system will maximize cost and time efficiency in genome research.

Classification and Resolution of Conflicts for Integration of Heterogeneous Information Based on XML Schema (XML Schema 기반 이질 정보 통합의 충돌 분류와 해결 방안)

  • 권석훈;이경하;이규철
    • Journal of Information Technology Applications and Management / v.10 no.3 / pp.55-74 / 2003
  • Due to the evolution of computer systems and the proliferation of the Internet, numerous information resources have been constructed. This deluge of information creates a need to integrate information that is distributed across the Internet and handled by heterogeneous systems. Recently, most XML-based information integration systems have used XML DTD (Document Type Definition) to describe the integrated global schema. However, DTD has limitations in modeling aspects of local information resources such as datatypes. Although W3C's XML Schema is more flexible and powerful than XML DTD for specifying an integrated global schema, resolving conflicts with it is more complex than with DTD. In this paper, we provide a taxonomy of the conflicts that arise when integrating information resources using XML Schema, and propose a conflict resolution mechanism using XQuery.

Data anomaly detection for structural health monitoring of bridges using shapelet transform

  • Arul, Monica;Kareem, Ahsan
    • Smart Structures and Systems / v.29 no.1 / pp.93-103 / 2022
  • With the wider availability of sensor technology through easily affordable sensor devices, several Structural Health Monitoring (SHM) systems have been deployed to monitor vital civil infrastructure. Continuous monitoring provides valuable information about the health of the structure that can help provide a decision support system for retrofits and other structural modifications. However, when the sensors are exposed to harsh environmental conditions, the data measured by the SHM systems tend to be affected by multiple anomalies caused by faulty or broken sensors. Given a deluge of high-dimensional data collected continuously over time, research into using machine learning methods to detect anomalies is a topic of great interest to the SHM community. This paper contributes to this effort by proposing a relatively new time series representation named "Shapelet Transform" in combination with a Random Forest classifier to autonomously identify anomalies in SHM data. The shapelet transform is a unique time series representation based solely on the shape of the time series data. Considering the individual characteristics unique to every anomaly, the application of this transform yields a new shape-based feature representation that can be combined with any standard machine learning algorithm to detect anomalous data with no manual intervention. For the present study, the anomaly detection framework consists of three steps: identifying unique shapes from anomalous data, using these shapes to transform the SHM data into a local-shape space, and training machine learning algorithms on this transformed data to identify anomalies. The efficacy of this method is demonstrated by the identification of anomalies in acceleration data from an SHM system installed on a long-span bridge in China. The results show that multiple data anomalies in SHM data can be automatically detected with high accuracy using the proposed method.

The Effect of Flood Discharge due to Dam Breach on Downstream Channel (댐붕괴시 홍수가 하천하류에 미치는 영향)

  • Ahn, Sang-Jin;Lee, Jun-Geun;Yeon, In-Sung;You, Hyung-Gyu
    • Proceedings of the Korea Water Resources Association Conference / 2006.05a / pp.1666-1670 / 2006
  • The purpose of this study is to analyze how a downstream channel is affected by a hypothetical dam failure. The study area is the Hwacheon dam basin within the North Han river basin. The study analyzes the influence on the Pyeonghwa (Peace) dam and the Hwacheon dam, supposing that the Imnam dam in North Korea, upstream on the North Han river, fails at the MFWL (maximum flood water level) because of a deluge of rain. The model applied is the NWS (National Weather Service) FLDWAV (Flood Wave Routing Model). Dam breach characteristics are analyzed through nine hypothetical scenarios based on other studies of the shape and size of dam breaches, time of failure, and so on. The expected peak discharge through the breach is verified as reasonable by comparison with an empirical function developed from dam breach cases in other countries, and the peak discharge is observed to increase as the breach time gets shorter and the breach width gets larger. The study finds that even if the Imnam dam hypothetically fails, there is no influence on the downstream Hwacheon dam, because the extended Pyeonghwa dam downstream properly controls the discharge volume.

Secure Knowledge Management for Prevent illegal data leakage by Internal users (내부 사용자에 의한 불법 데이터 유출 방지를 위한 안전한 지식관리 시스템)

  • Seo, Dae-Hee;Baek, Jang-Mi;Lee, Min-Kyung;Yoon, Mi-Yeon;Cho, Dong-Sub
    • Journal of Internet Computing and Services / v.11 no.2 / pp.73-84 / 2010
  • The rapid development of the Internet has increased users' desire for more information and, as a result, has created a 'deluge of information'. Profit-pursuing corporations in particular have done a great deal of research to secure their own technological strength. However, damage caused by illegal copying of information by outside users or insiders is coming to the fore as a social problem. This paper therefore proposes a secure knowledge management system to prevent illegal copying of data by insiders. The proposed scheme carries out explicit authentication of internal users using 2MAC and provides data based on that authentication, thereby preventing illegal copying of data by insiders.

X2RD: Storing and Querying XML Data Using XPath To Relational Database (X2RD: XPath를 이용한 XML 데이터의 관계형 데이터베이스로의 저장과 질의)

  • Oh, Sang-Yoon
    • Journal of the Korea Society of Computer and Information / v.14 no.3 / pp.57-64 / 2009
  • XML has become a de facto standard for structured documents and data on the Web. The deluge of XML data over the network will grow as XML-based standards such as Web Service and Semantic Web become popular. There have been efforts to store and query XML documents in relational database systems, and recent efforts focus on how to provide such operations using XPath and XQuery. In this paper, we survey those research efforts and propose a new scheme to store and query XML documents in a relational database using XPath queries. The scheme uses a 'shred' method to store documents and translates XPath queries to SQL. We also present empirical experiments using an RDBMS.
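
The general idea of shredding XML into a relational table and answering an XPath query with SQL can be sketched as follows. This is only an illustration under stated assumptions, not the X2RD scheme itself: the single `node` table with a stored root-to-node path, the function names `shred` and `xpath_to_sql`, and the sample document are all hypothetical, and only absolute child-axis paths such as `/a/b/c` are handled.

```python
import sqlite3
import xml.etree.ElementTree as ET


def shred(xml_text, conn):
    """Store every element as one row holding its parent id, root-to-node path, and text."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS node ("
        "id INTEGER PRIMARY KEY, parent_id INTEGER, path TEXT, text TEXT)"
    )

    def walk(elem, parent_id, parent_path):
        path = parent_path + "/" + elem.tag
        cur = conn.execute(
            "INSERT INTO node (parent_id, path, text) VALUES (?, ?, ?)",
            (parent_id, path, (elem.text or "").strip()),
        )
        node_id = cur.lastrowid
        for child in elem:  # children are visited in document order
            walk(child, node_id, path)

    walk(ET.fromstring(xml_text), None, "")
    conn.commit()


def xpath_to_sql(xpath):
    """Translate a simple absolute child-axis XPath into a parameterized SQL query."""
    unsupported = "//" in xpath or any(ch in xpath for ch in "[]@*")
    if not xpath.startswith("/") or unsupported:
        raise ValueError("only simple absolute child-axis paths are supported")
    return "SELECT text FROM node WHERE path = ? ORDER BY id", (xpath,)


# Hypothetical sample document.
doc = (
    "<library>"
    "<book><title>Dune</title></book>"
    "<book><title>Emma</title></book>"
    "</library>"
)
conn = sqlite3.connect(":memory:")
shred(doc, conn)
sql, params = xpath_to_sql("/library/book/title")
titles = [row[0] for row in conn.execute(sql, params)]
```

Storing the full path in each row turns a child-axis path lookup into a single equality predicate. Predicates, attributes, and the descendant axis would need a richer mapping, which this sketch deliberately rejects.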

Legislation Cases, Management Policies and Countermeasures on Scientific Data -Focusing Australia, the United States and China- (과학데이터에 관한 입법례와 관리정책 그리고 대응방안 -호주, 미국, 중국을 중심으로-)

  • Yoon, Chong-Min;Kim, Kyubin
    • Journal of Korea Technology Innovation Society / v.16 no.1 / pp.63-100 / 2013
  • Research data means data in the form of facts, observations, images, computer program results, recordings, measurements, or experiences on which an argument, theory, test, hypothesis, or other research output is based. Data may be numerical, descriptive, visual, or tactile. Scientific research is changing because of a paradigm shift: it is being affected by the data deluge, and a data-intensive science paradigm is emerging. This paradigm shift has increased the value and importance of scientific data. Establishing a management system for sharing and utilizing scientific data is essential so that the data can be reused efficiently in creative research and development. Such a system should be established at the national level, but compared with those of Europe, Australia, the United States, and China, Korea's management system lacks linkage, efficiency, and internal stability. Australia, the United States, and China continue to expand mid- and long-term policy making, legislation, and investment in infrastructure to promote the utilization of data, including the collection, management, and maintenance of scientific data through relevant agencies at the national level. This study considers the legislation cases and management policies of these countries in order to establish a management system and a legal system for the efficient and fair sharing and utilization of scientific data, and to inform Korea's future legislation and policies on scientific data.
