Search | Korea Science

Metadata extraction using AI and advanced metadata research for web services (AI를 활용한 메타데이터 추출 및 웹서비스용 메타데이터 고도화 연구)

Sung Hwan Park
- The Journal of the Convergence on Culture Technology
- /
- v.10 no.2
- /
- pp.499-503
- /
- 2024
Broadcasting programs are provided to various media such as Internet replay, OTT, and IPTV services as well as self-broadcasting. In this case, it is very important to provide keywords for search that represent the characteristics of the content well. Broadcasters mainly use the method of manually entering key keywords in the production process and the archive process. This method is insufficient in terms of quantity to secure core metadata, and also reveals limitations in recommending and using content in other media services. This study supports securing a large number of metadata by utilizing closed caption data pre-archived through the DTV closed captioning server developed in EBS. First, core metadata was automatically extracted by applying Google's natural language AI technology. The next step is to propose a method of finding core metadata by reflecting priorities and content characteristics as core research contents. As a technology to obtain differentiated metadata weights, the importance was classified by applying the TF-IDF calculation method. Successful weight data were obtained as a result of the experiment. The string metadata obtained by this study, when combined with future string similarity measurement studies, becomes the basis for securing sophisticated content recommendation metadata from content services provided to other media.
https://doi.org/10.17703/JCCT.2024.10.2.499 인용 PDF

A Collaborative Video Annotation and Browsing System using Linked Data (링크드 데이터를 이용한 협업적 비디오 어노테이션 및 브라우징 시스템)

Lee, Yeon-Ho;Oh, Kyeong-Jin;Sean, Vi-Sal;Jo, Geun-Sik
- Journal of Intelligence and Information Systems
- /
- v.17 no.3
- /
- pp.203-219
- /
- 2011
Previously common users just want to watch the video contents without any specific requirements or purposes. However, in today's life while watching video user attempts to know and discover more about things that appear on the video. Therefore, the requirements for finding multimedia or browsing information of objects that users want, are spreading with the increasing use of multimedia such as videos which are not only available on the internet-capable devices such as computers but also on smart TV and smart phone. In order to meet the users. requirements, labor-intensive annotation of objects in video contents is inevitable. For this reason, many researchers have actively studied about methods of annotating the object that appear on the video. In keyword-based annotation related information of the object that appeared on the video content is immediately added and annotation data including all related information about the object must be individually managed. Users will have to directly input all related information to the object. Consequently, when a user browses for information that related to the object, user can only find and get limited resources that solely exists in annotated data. Also, in order to place annotation for objects user's huge workload is required. To cope with reducing user's workload and to minimize the work involved in annotation, in existing object-based annotation automatic annotation is being attempted using computer vision techniques like object detection, recognition and tracking. By using such computer vision techniques a wide variety of objects that appears on the video content must be all detected and recognized. But until now it is still a problem facing some difficulties which have to deal with automated annotation. To overcome these difficulties, we propose a system which consists of two modules. The first module is the annotation module that enables many annotators to collaboratively annotate the objects in the video content in order to access the semantic data using Linked Data. Annotation data managed by annotation server is represented using ontology so that the information can easily be shared and extended. Since annotation data does not include all the relevant information of the object, existing objects in Linked Data and objects that appear in the video content simply connect with each other to get all the related information of the object. In other words, annotation data which contains only URI and metadata like position, time and size are stored on the annotation sever. So when user needs other related information about the object, all of that information is retrieved from Linked Data through its relevant URI. The second module enables viewers to browse interesting information about the object using annotation data which is collaboratively generated by many users while watching video. With this system, through simple user interaction the query is automatically generated and all the related information is retrieved from Linked Data and finally all the additional information of the object is offered to the user. With this study, in the future of Semantic Web environment our proposed system is expected to establish a better video content service environment by offering users relevant information about the objects that appear on the screen of any internet-capable devices such as PC, smart TV or smart phone.
https://doi.org/10.13088/jiis.2011.17.3.203 인용 PDF KSCI

Implementation of Reporting Tool Supporting OLAP and Data Mining Analysis Using XMLA (XMLA를 사용한 OLAP과 데이타 마이닝 분석이 가능한 리포팅 툴의 구현)

Choe, Jee-Woong;Kim, Myung-Ho
- Journal of KIISE:Computing Practices and Letters
- /
- v.15 no.3
- /
- pp.154-166
- /
- 2009
Database query and reporting tools, OLAP tools and data mining tools are typical front-end tools in Business Intelligence environment which is able to support gathering, consolidating and analyzing data produced from business operation activities and provide access to the result to enterprise's users. Traditional reporting tools have an advantage of creating sophisticated dynamic reports including SQL query result sets, which look like documents produced by word processors, and publishing the reports to the Web environment, but data source for the tools is limited to RDBMS. On the other hand, OLAP tools and data mining tools have an advantage of providing powerful information analysis functions on each own way, but built-in visualization components for analysis results are limited to tables or some charts. Thus, this paper presents a system that integrates three typical front-end tools to complement one another for BI environment. Traditional reporting tools only have a query editor for generating SQL statements to bring data from RDBMS. However, the reporting tool presented by this paper can extract data also from OLAP and data mining servers, because editors for OLAP and data mining query requests are added into this tool. Traditional systems produce all documents in the server side. This structure enables reporting tools to avoid repetitive process to generate documents, when many clients intend to access the same dynamic document. But, because this system targets that a few users generate documents for data analysis, this tool generates documents at the client side. Therefore, the tool has a processing mechanism to deal with a number of data despite the limited memory capacity of the report viewer in the client side. Also, this reporting tool has data structure for integrating data from three kinds of data sources into one document. Finally, most of traditional front-end tools for BI are dependent on data source architecture from specific vendor. To overcome the problem, this system uses XMLA that is a protocol based on web service to access to data sources for OLAP and data mining services from various vendors.
PDF KSCI

Automatic gasometer reading system using selective optical character recognition (관심 문자열 인식 기술을 이용한 가스계량기 자동 검침 시스템)

Lee, Kyohyuk;Kim, Taeyeon;Kim, Wooju
- Journal of Intelligence and Information Systems
- /
- v.26 no.2
- /
- pp.1-25
- /
- 2020
In this paper, we suggest an application system architecture which provides accurate, fast and efficient automatic gasometer reading function. The system captures gasometer image using mobile device camera, transmits the image to a cloud server on top of private LTE network, and analyzes the image to extract character information of device ID and gas usage amount by selective optical character recognition based on deep learning technology. In general, there are many types of character in an image and optical character recognition technology extracts all character information in an image. But some applications need to ignore non-of-interest types of character and only have to focus on some specific types of characters. For an example of the application, automatic gasometer reading system only need to extract device ID and gas usage amount character information from gasometer images to send bill to users. Non-of-interest character strings, such as device type, manufacturer, manufacturing date, specification and etc., are not valuable information to the application. Thus, the application have to analyze point of interest region and specific types of characters to extract valuable information only. We adopted CNN (Convolutional Neural Network) based object detection and CRNN (Convolutional Recurrent Neural Network) technology for selective optical character recognition which only analyze point of interest region for selective character information extraction. We build up 3 neural networks for the application system. The first is a convolutional neural network which detects point of interest region of gas usage amount and device ID information character strings, the second is another convolutional neural network which transforms spatial information of point of interest region to spatial sequential feature vectors, and the third is bi-directional long short term memory network which converts spatial sequential information to character strings using time-series analysis mapping from feature vectors to character strings. In this research, point of interest character strings are device ID and gas usage amount. Device ID consists of 12 arabic character strings and gas usage amount consists of 4 ~ 5 arabic character strings. All system components are implemented in Amazon Web Service Cloud with Intel Zeon E5-2686 v4 CPU and NVidia TESLA V100 GPU. The system architecture adopts master-lave processing structure for efficient and fast parallel processing coping with about 700,000 requests per day. Mobile device captures gasometer image and transmits to master process in AWS cloud. Master process runs on Intel Zeon CPU and pushes reading request from mobile device to an input queue with FIFO (First In First Out) structure. Slave process consists of 3 types of deep neural networks which conduct character recognition process and runs on NVidia GPU module. Slave process is always polling the input queue to get recognition request. If there are some requests from master process in the input queue, slave process converts the image in the input queue to device ID character string, gas usage amount character string and position information of the strings, returns the information to output queue, and switch to idle mode to poll the input queue. Master process gets final information form the output queue and delivers the information to the mobile device. We used total 27,120 gasometer images for training, validation and testing of 3 types of deep neural network. 22,985 images were used for training and validation, 4,135 images were used for testing. We randomly splitted 22,985 images with 8:2 ratio for training and validation respectively for each training epoch. 4,135 test image were categorized into 5 types (Normal, noise, reflex, scale and slant). Normal data is clean image data, noise means image with noise signal, relfex means image with light reflection in gasometer region, scale means images with small object size due to long-distance capturing and slant means images which is not horizontally flat. Final character string recognition accuracies for device ID and gas usage amount of normal data are 0.960 and 0.864 respectively.
https://doi.org/10.13088/jiis.2020.26.2.001 인용 PDF KSCI

Implementation Strategy of Global Framework for Climate Service through Global Initiatives in AgroMeteorology for Agriculture and Food Security Sector (선도적 농림기상 국제협력을 통한 농업과 식량안보분야 전지구기후 서비스체계 구축 전략)

Lee, Byong-Lyol;Rossi, Federica;Motha, Raymond;Stefanski, Robert
- Korean Journal of Agricultural and Forest Meteorology
- /
- v.15 no.2
- /
- pp.109-117
- /
- 2013
The Global Framework on Climate Services (GFCS) will guide the development of climate services that link science-based climate information and predictions with climate-risk management and adaptation to climate change. GFCS structure is made up of 5 pillars; Observations/Monitoring (OBS), Research/ Modeling/ Prediction (RES), Climate Services Information System (CSIS) and User Interface Platform (UIP) which are all supplemented with Capacity Development (CD). Corresponding to each GFCS pillar, the Commission for Agricultural Meteorology (CAgM) has been proposing "Global Initiatives in AgroMeteorology" (GIAM) in order to facilitate GFCS implementation scheme from the perspective of AgroMeteorology - Global AgroMeteorological Outlook System (GAMOS) for OBS, Global AgroMeteorological Pilot Projects (GAMPP) for RES, Global Federation of AgroMeteorological Society (GFAMS) for UIP/RES, WAMIS next phase for CSIS/UIP, and Global Centers of Research and Excellence in AgroMeteorology (GCREAM) for CD, through which next generation experts will be brought up as virtuous cycle for human resource procurements. The World AgroMeteorological Information Service (WAMIS) is a dedicated web server in which agrometeorological bulletins and advisories from members are placed. CAgM is about to extend its service into a Grid portal to share computer resources, information and human resources with user communities as a part of GFCS. To facilitate ICT resources sharing, a specialized or dedicated Data Center or Production Center (DCPC) of WMO Information System for WAMIS is under implementation by Korea Meteorological Administration. CAgM will provide land surface information to support LDAS (Land Data Assimilation System) of next generation Earth System as an information provider. The International Society for Agricultural Meteorology (INSAM) is an Internet market place for agrometeorologists. In an effort to strengthen INSAM as UIP for research community in AgroMeteorology, it was proposed by CAgM to establish Global Federation of AgroMeteorological Society (GFAMS). CAgM will try to encourage the next generation agrometeorological experts through Global Center of Excellence in Research and Education in AgroMeteorology (GCREAM) including graduate programmes under the framework of GENRI as a governing hub of Global Initiatives in AgroMeteorology (GIAM of CAgM). It would be coordinated under the framework of GENRI as a governing hub for all global initiatives such as GFAMS, GAMPP, GAPON including WAMIS II, primarily targeting on GFCS implementations.
https://doi.org/10.5532/KJAFM.2013.15.2.109 인용 PDF KSCI

A Study on Design of Agent based Nursing Records System in Attending System (에이전트기반 개방병원 간호기록시스템 설계에 관한 연구)

Kim, Kyoung-Hwan
- Journal of Intelligence and Information Systems
- /
- v.16 no.2
- /
- pp.73-94
- /
- 2010
The attending system is a medical system that allows doctors in clinics to use the extra equipment in hospitals-beds, laboratory, operating room, etc-for their patient's care under a contract between the doctors and hospitals. Therefore, the system is very beneficial in terms of the efficiency of the usage of medical resources. However, it is necessary to develop a strong support system to strengthen its weaknesses and supplement its merits. If doctors use hospital beds under the attending system of hospitals, they would be able to check a patient's condition often and provide them with nursing care services. However, the current attending system lacks delivery and assistance support. Thus, for the successful performance of the attending system, a networking system should be developed to facilitate communication between the doctors and nurses. In particular, the nursing records in the attending system could help doctors monitor the patient's condition and provision of nursing care services. A nursing record is the formal documentation associated with nursing care. It is merely a data repository that helps nurses to track their activities; nursing records thus represent a resource of primary information that can be reused. In order to maximize their usefulness, nursing records have been introduced as part of computerized patient records. However, nursing records are internal data that are not disclosed by hospitals. Moreover, the lack of standardization of the record list makes it difficult to share nursing records. Under the attending system, nurses would want to minimize the amount of effort they have to put in for the maintenance of additional records. Hence, they would try to maintain the current level of nursing records in the form of record lists and record attributes, while doctors would require more detailed and real-time information about their patients in order to monitor their condition. Therefore, this study developed a system for assisting in the maintenance and sharing of the nursing records under the attending system. In contrast to previous research on the functionality of computer-based nursing records, we have emphasized the practical usefulness of nursing records from the viewpoint of the actual implementation of the attending system. We suggested that nurses could design a nursing record dictionary for their convenience, and that doctors and nurses could confirm the definitions that they looked up in the dictionary through negotiations with intelligent agents. Such an agent-based system could facilitate networking among medical institutes. Multi-agent systems are a widely accepted paradigm for the distribution and sharing of computation workloads in the scientific community. Agent-based systems have been developed with differences in functional cooperation, coordination, and negotiation. To increase such communication, a framework for a multi-agent based system is proposed in this study. The agent-based approach is useful for developing a system that promotes trade-offs between transactions involving multiple attributes. A brief summary of our contributions follows. First, we propose an efficient and accurate utility representation and acquisition mechanism based on a preference scale while minimizing user interactions with the agent. Trade-offs between various transaction attributes can also be easily computed. Second, by providing a multi-attribute negotiation framework based on the attribute utility evaluation mechanism, we allow both the doctors in charge and nurses to negotiate over various transaction attributes in the nursing record lists that are defined by the latter. Third, we have designed the architecture of the nursing record management server and a system of agents that provides support to the doctors and nurses with regard to the framework and mechanisms proposed above. A formal protocol has also been developed to create and control the communication required for negotiations. We verified the realization of the system by developing a web-based prototype. The system was implemented using ASP and IIS5.1.
PDF KSCI

A Study on Usability of Open Source Software for Developing Records System : A Case of ICA AtoM (공개 소프트웨어를 이용한 기록시스템 구축가능성 연구 ICA AtoM을 중심으로)

Lee, Bo-Ram;Hwang, Jin-Hyun;Park, Min-Yung;Kim, Hyung-Hee;Choi, Dong-Woon;Choi, Yun-Jin;Yim, Jin-Hee
- The Korean Journal of Archival Studies
- /
- no.39
- /
- pp.193-228
- /
- 2014
In recent years, as well as management of public records, interest in the private archive of large and small is growing. Dedicated archive has various types. In addition, lack of personnel and budget, personnel records management professional because the absence, that help you maintain these records in a systematic manner is not easy. Request to the system have continued to rise, but the budget and professionals in order to solve this problem are missing. As breakthrough of the burden to the system with archive dedicated, it introduces the trends and meaning of public recording system, and was examined in detail AtoM function. AtoM is public land can be made by a method that requires a Web service, the database server. Without restrictions, including the advantage of being available free of charge, by the application or operating system specific, installation and operation is convenient. In addition, compatibility, and is highly scalable, AtoM use and convenient archive of private experiencing a shortage of personnel and budget. Because in terms of data management, and excellent interoperability and search share, and use, it is possible in the future, it favors also documentary use through a network of inter-agency archives and private. In addition, Enhancements exhibition services through cooperation with Omeka, long-term storage through Archivematica, many discussion is needed. Public centered around the private area of the recording management spilling expanded, open-source software allows to balance the recording system will be able to play an important role. In addition, the efforts of academia and in the field, close collaboration between the open source recording system through a user study should be continued. Furthermore, co-operation and sharing of private archives expect come true.
https://doi.org/10.20923/kjas.2014.39.193 인용 PDF

A Study on the Meaning and Strategy of Keyword Advertising Marketing

Park, Nam Goo
- Journal of Distribution Science
- /
- v.8 no.3
- /
- pp.49-56
- /
- 2010
At the initial stage of Internet advertising, banner advertising came into fashion. As the Internet developed into a central part of daily lives and the competition in the on-line advertising market was getting fierce, there was not enough space for banner advertising, which rushed to portal sites only. All these factors was responsible for an upsurge in advertising prices. Consequently, the high-cost and low-efficiency problems with banner advertising were raised, which led to an emergence of keyword advertising as a new type of Internet advertising to replace its predecessor. In the beginning of 2000s, when Internet advertising came to be activated, display advertisement including banner advertising dominated the Net. However, display advertising showed signs of gradual decline, and registered minus growth in the year 2009, whereas keyword advertising showed rapid growth and started to outdo display advertising as of the year 2005. Keyword advertising refers to the advertising technique that exposes relevant advertisements on the top of research sites when one searches for a keyword. Instead of exposing advertisements to unspecified individuals like banner advertising, keyword advertising, or targeted advertising technique, shows advertisements only when customers search for a desired keyword so that only highly prospective customers are given a chance to see them. In this context, it is also referred to as search advertising. It is regarded as more aggressive advertising with a high hit rate than previous advertising in that, instead of the seller discovering customers and running an advertisement for them like TV, radios or banner advertising, it exposes advertisements to visiting customers. Keyword advertising makes it possible for a company to seek publicity on line simply by making use of a single word and to achieve a maximum of efficiency at a minimum cost. The strong point of keyword advertising is that customers are allowed to directly contact the products in question through its more efficient advertising when compared to the advertisements of mass media such as TV and radio, etc. The weak point of keyword advertising is that a company should have its advertisement registered on each and every portal site and finds it hard to exercise substantial supervision over its advertisement, there being a possibility of its advertising expenses exceeding its profits. Keyword advertising severs as the most appropriate methods of advertising for the sales and publicity of small and medium enterprises which are in need of a maximum of advertising effect at a low advertising cost. At present, keyword advertising is divided into CPC advertising and CPM advertising. The former is known as the most efficient technique, which is also referred to as advertising based on the meter rate system; A company is supposed to pay for the number of clicks on a searched keyword which users have searched. This is representatively adopted by Overture, Google's Adwords, Naver's Clickchoice, and Daum's Clicks, etc. CPM advertising is dependent upon the flat rate payment system, making a company pay for its advertisement on the basis of the number of exposure, not on the basis of the number of clicks. This method fixes a price for advertisement on the basis of 1,000-time exposure, and is mainly adopted by Naver's Timechoice, Daum's Speciallink, and Nate's Speedup, etc, At present, the CPC method is most frequently adopted. The weak point of the CPC method is that advertising cost can rise through constant clicks from the same IP. If a company makes good use of strategies for maximizing the strong points of keyword advertising and complementing its weak points, it is highly likely to turn its visitors into prospective customers. Accordingly, an advertiser should make an analysis of customers' behavior and approach them in a variety of ways, trying hard to find out what they want. With this in mind, her or she has to put multiple keywords into use when running for ads. When he or she first runs an ad, he or she should first give priority to which keyword to select. The advertiser should consider how many individuals using a search engine will click the keyword in question and how much money he or she has to pay for the advertisement. As the popular keywords that the users of search engines are frequently using are expensive in terms of a unit cost per click, the advertisers without much money for advertising at the initial phrase should pay attention to detailed keywords suitable to their budget. Detailed keywords are also referred to as peripheral keywords or extension keywords, which can be called a combination of major keywords. Most keywords are in the form of texts. The biggest strong point of text-based advertising is that it looks like search results, causing little antipathy to it. But it fails to attract much attention because of the fact that most keyword advertising is in the form of texts. Image-embedded advertising is easy to notice due to images, but it is exposed on the lower part of a web page and regarded as an advertisement, which leads to a low click through rate. However, its strong point is that its prices are lower than those of text-based advertising. If a company owns a logo or a product that is easy enough for people to recognize, the company is well advised to make good use of image-embedded advertising so as to attract Internet users' attention. Advertisers should make an analysis of their logos and examine customers' responses based on the events of sites in question and the composition of products as a vehicle for monitoring their behavior in detail. Besides, keyword advertising allows them to analyze the advertising effects of exposed keywords through the analysis of logos. The logo analysis refers to a close analysis of the current situation of a site by making an analysis of information about visitors on the basis of the analysis of the number of visitors and page view, and that of cookie values. It is in the log files generated through each Web server that a user's IP, used pages, the time when he or she uses it, and cookie values are stored. The log files contain a huge amount of data. As it is almost impossible to make a direct analysis of these log files, one is supposed to make an analysis of them by using solutions for a log analysis. The generic information that can be extracted from tools for each logo analysis includes the number of viewing the total pages, the number of average page view per day, the number of basic page view, the number of page view per visit, the total number of hits, the number of average hits per day, the number of hits per visit, the number of visits, the number of average visits per day, the net number of visitors, average visitors per day, one-time visitors, visitors who have come more than twice, and average using hours, etc. These sites are deemed to be useful for utilizing data for the analysis of the situation and current status of rival companies as well as benchmarking. As keyword advertising exposes advertisements exclusively on search-result pages, competition among advertisers attempting to preoccupy popular keywords is very fierce. Some portal sites keep on giving priority to the existing advertisers, whereas others provide chances to purchase keywords in question to all the advertisers after the advertising contract is over. If an advertiser tries to rely on keywords sensitive to seasons and timeliness in case of sites providing priority to the established advertisers, he or she may as well make a purchase of a vacant place for advertising lest he or she should miss appropriate timing for advertising. However, Naver doesn't provide priority to the existing advertisers as far as all the keyword advertisements are concerned. In this case, one can preoccupy keywords if he or she enters into a contract after confirming the contract period for advertising. This study is designed to take a look at marketing for keyword advertising and to present effective strategies for keyword advertising marketing. At present, the Korean CPC advertising market is virtually monopolized by Overture. Its strong points are that Overture is based on the CPC charging model and that advertisements are registered on the top of the most representative portal sites in Korea. These advantages serve as the most appropriate medium for small and medium enterprises to use. However, the CPC method of Overture has its weak points, too. That is, the CPC method is not the only perfect advertising model among the search advertisements in the on-line market. So it is absolutely necessary that small and medium enterprises including independent shopping malls should complement the weaknesses of the CPC method and make good use of strategies for maximizing its strengths so as to increase their sales and to create a point of contact with customers.
PDF

Design and Implementation of MongoDB-based Unstructured Log Processing System over Cloud Computing Environment (클라우드 환경에서 MongoDB 기반의 비정형 로그 처리 시스템 설계 및 구현)

Kim, Myoungjin;Han, Seungho;Cui, Yun;Lee, Hanku
- Journal of Internet Computing and Services
- /
- v.14 no.6
- /
- pp.71-84
- /
- 2013
Log data, which record the multitude of information created when operating computer systems, are utilized in many processes, from carrying out computer system inspection and process optimization to providing customized user optimization. In this paper, we propose a MongoDB-based unstructured log processing system in a cloud environment for processing the massive amount of log data of banks. Most of the log data generated during banking operations come from handling a client's business. Therefore, in order to gather, store, categorize, and analyze the log data generated while processing the client's business, a separate log data processing system needs to be established. However, the realization of flexible storage expansion functions for processing a massive amount of unstructured log data and executing a considerable number of functions to categorize and analyze the stored unstructured log data is difficult in existing computer environments. Thus, in this study, we use cloud computing technology to realize a cloud-based log data processing system for processing unstructured log data that are difficult to process using the existing computing infrastructure's analysis tools and management system. The proposed system uses the IaaS (Infrastructure as a Service) cloud environment to provide a flexible expansion of computing resources and includes the ability to flexibly expand resources such as storage space and memory under conditions such as extended storage or rapid increase in log data. Moreover, to overcome the processing limits of the existing analysis tool when a real-time analysis of the aggregated unstructured log data is required, the proposed system includes a Hadoop-based analysis module for quick and reliable parallel-distributed processing of the massive amount of log data. Furthermore, because the HDFS (Hadoop Distributed File System) stores data by generating copies of the block units of the aggregated log data, the proposed system offers automatic restore functions for the system to continually operate after it recovers from a malfunction. Finally, by establishing a distributed database using the NoSQL-based Mongo DB, the proposed system provides methods of effectively processing unstructured log data. Relational databases such as the MySQL databases have complex schemas that are inappropriate for processing unstructured log data. Further, strict schemas like those of relational databases cannot expand nodes in the case wherein the stored data are distributed to various nodes when the amount of data rapidly increases. NoSQL does not provide the complex computations that relational databases may provide but can easily expand the database through node dispersion when the amount of data increases rapidly; it is a non-relational database with an appropriate structure for processing unstructured data. The data models of the NoSQL are usually classified as Key-Value, column-oriented, and document-oriented types. Of these, the representative document-oriented data model, MongoDB, which has a free schema structure, is used in the proposed system. MongoDB is introduced to the proposed system because it makes it easy to process unstructured log data through a flexible schema structure, facilitates flexible node expansion when the amount of data is rapidly increasing, and provides an Auto-Sharding function that automatically expands storage. The proposed system is composed of a log collector module, a log graph generator module, a MongoDB module, a Hadoop-based analysis module, and a MySQL module. When the log data generated over the entire client business process of each bank are sent to the cloud server, the log collector module collects and classifies data according to the type of log data and distributes it to the MongoDB module and the MySQL module. The log graph generator module generates the results of the log analysis of the MongoDB module, Hadoop-based analysis module, and the MySQL module per analysis time and type of the aggregated log data, and provides them to the user through a web interface. Log data that require a real-time log data analysis are stored in the MySQL module and provided real-time by the log graph generator module. The aggregated log data per unit time are stored in the MongoDB module and plotted in a graph according to the user's various analysis conditions. The aggregated log data in the MongoDB module are parallel-distributed and processed by the Hadoop-based analysis module. A comparative evaluation is carried out against a log data processing system that uses only MySQL for inserting log data and estimating query performance; this evaluation proves the proposed system's superiority. Moreover, an optimal chunk size is confirmed through the log data insert performance evaluation of MongoDB for various chunk sizes.
https://doi.org/10.7472/jksii.2013.14.6.71 인용 PDF KSCI

Search Result 1,879, Processing Time 0.033 seconds

이메일무단수집거부

이용약관

제 1 장 총칙

제 2 장 이용계약의 체결

제 3 장 계약 당사자의 의무

제 4 장 서비스의 이용

제 5 장 계약 해지 및 이용 제한

제 6 장 손해배상 및 기타사항

Detail Search

Image Search (β)