• Title/Summary/Keyword: Web based data system


Analysis of shopping website visit types and shopping pattern (쇼핑 웹사이트 탐색 유형과 방문 패턴 분석)

  • Choi, Kyungbin;Nam, Kihwan
    • Journal of Intelligence and Information Systems / v.25 no.1 / pp.85-107 / 2019
  • Online consumers browse products belonging to a particular product line or brand with purchase in mind, or simply navigate widely without making a purchase. Research on the browsing and purchasing behavior of online consumers has progressed steadily, and services and applications based on consumer behavior data have been developed in practice. In recent years, the development of big data technology has enabled personalization strategies and recommendation systems, and attempts are being made to optimize users' shopping experience. Even so, only a small fraction of website visits actually proceed to the purchase stage, because online consumers do not visit a website solely to purchase products; they use and browse websites differently according to their shopping motives and purposes. Analyzing the various types of visits, not only purchase visits, is therefore important for understanding the behavior of online consumers. In this study, we performed session-level clustering analysis on the clickstream data of an e-commerce company in order to explain the diversity and complexity of online consumers' search behavior and to derive a typology of that behavior. For the analysis, we converted more than 8 million page-level data points into visit-level sessions, yielding a total of over 500,000 website visit sessions. For each visit session, 12 characteristics such as page views, duration, search diversity, and page-type concentration were extracted for clustering. Given the size of the dataset, we used the Mini-Batch K-means algorithm, which offers faster, more efficient learning while maintaining clustering performance similar to standard K-means (see the sketch below). The optimal number of clusters was found to be four, and differences in session-level characteristics and purchase rates were identified for each cluster. An online consumer typically visits a website several times, learning about a product before deciding to purchase. To analyze this purchasing process across multiple visits, we constructed consumer visit-sequence data based on the navigation patterns derived from the clustering analysis. The visit-sequence data comprise series of visits leading up to a single purchase, where the items constituting each sequence are the cluster labels derived above. We separately constructed sequence data for consumers who made purchases and for consumers who only explored products without purchasing during the same period, and then applied sequential pattern mining to extract frequent patterns from each dataset. The minimum support was set to 10%, and each frequent pattern consists of a sequence of cluster labels. While some patterns were derived from both sequence datasets, others appeared in only one of them. Through comparative analysis of the extracted frequent patterns, we found that consumers who made purchases repeatedly searched for a specific product before deciding to purchase it.
The implication of this study is that we analyzed the search types of online consumers using large-scale clickstream data and explained the purchasing process from a data-driven point of view. Most studies that develop typologies of online consumers have focused on the characteristics of each type and on the key factors that distinguish the types. In this study, we not only typed the behavior of online consumers but also analyzed how those types are organized into ordered sequences of search patterns. Online retailers can try to improve purchase conversion through marketing strategies and recommendations tailored to the various visit types, and can evaluate the effect of such strategies through changes in consumers' visit patterns.
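
As a minimal sketch of the session clustering step described in this abstract (the data loading and the 12 engineered features are hypothetical placeholders, not the paper's actual dataset), Mini-Batch K-means from scikit-learn could be applied along these lines:

```python
# Minimal sketch of session clustering with Mini-Batch K-means.
# The feature matrix below is a random placeholder standing in for
# ~500,000 sessions x 12 engineered features (page views, duration,
# search diversity, page-type concentration, ...).
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import MiniBatchKMeans

rng = np.random.default_rng(42)
X = rng.random((500_000, 12))          # placeholder session features

X_scaled = StandardScaler().fit_transform(X)
km = MiniBatchKMeans(n_clusters=4,     # the paper's optimal cluster count
                     batch_size=10_000, random_state=42)
labels = km.fit_predict(X_scaled)      # one cluster label per visit session
```

The resulting per-session labels are exactly what the visit-sequence mining step consumes: each consumer's history becomes a sequence of these cluster labels.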

Improving the Accuracy of Document Classification by Learning Heterogeneity (이질성 학습을 통한 문서 분류의 정확성 향상 기법)

  • Wong, William Xiu Shun;Hyun, Yoonjin;Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.24 no.3 / pp.21-44 / 2018
  • In recent years, the rapid development of internet technology and the popularization of smart devices have resulted in massive amounts of text data, produced and distributed through various media platforms such as the World Wide Web, Internet news feeds, microblogs, and social media. However, this enormous amount of easily obtained information lacks organization, a problem that has drawn the interest of many researchers and created demand for professionals capable of classifying relevant information; hence text classification was introduced. Text classification is a challenging task in modern data analysis, in which a text document must be assigned to one or more predefined categories or classes. Various techniques are available in this field, such as K-Nearest Neighbor, the Naïve Bayes algorithm, Support Vector Machines, Decision Trees, and Artificial Neural Networks. However, when dealing with huge amounts of text data, model performance and accuracy become a challenge: the performance of a text classification model varies with the type of words used in the corpus and the type of features created for classification. Most previous attempts have proposed a new algorithm or modified an existing one, and such research has arguably reached its limits for further improvement. In this study, rather than proposing or modifying an algorithm, we focus on changing how the data are used. It is widely known that classifier performance is influenced by the quality of the training data upon which the classifier is built. Real-world datasets usually contain noise, which can affect the decisions made by classifiers built from those data. In this study, we consider that data from different domains, that is, heterogeneous data, may carry noise-like characteristics that can be exploited in the classification process. Machine learning algorithms build classifiers under the assumption that the characteristics of the training data and the target data are the same or very similar. However, for unstructured data such as text, the features are determined by the vocabulary of the documents; if the viewpoints of the training data and target data differ, the resulting features may differ as well. We therefore attempt to improve classification accuracy by strengthening the robustness of the document classifier through artificially injecting noise into the process of constructing it. Because data coming from various sources are likely to be formatted differently, traditional machine learning algorithms have difficulty recognizing different data representations at once and combining them into the same generalization. To utilize heterogeneous data in the learning process of the document classifier, we apply semi-supervised learning. However, unlabeled data may degrade the performance of the document classifier.
We therefore propose a method called the Rule Selection-Based Ensemble Semi-Supervised Learning Algorithm (RSESLA), which selects only the documents that contribute to improving the classifier's accuracy (see the sketch below). RSESLA creates multiple views by manipulating the features using different types of classification models and different types of heterogeneous data; the most confident classification rules are selected and applied for the final decision. In this paper, three types of real-world data sources were used: news, Twitter, and blogs.
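
The following is a generic confidence-based self-training sketch, not the paper's RSESLA itself; it only illustrates the underlying idea of admitting unlabeled documents into training when the current classifier is highly confident about them. The document texts, labels, and the 0.9 threshold are invented for illustration.

```python
# Confidence-gated self-training sketch (illustrative, not RSESLA):
# pseudo-label unlabeled documents and keep only the confident ones.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

labeled_docs = ["stock markets rallied today", "the match ended in a draw"]
labels = ["news", "sports"]
unlabeled_docs = ["tweet about the election", "blog post on last night's game"]

vec = TfidfVectorizer()
X = vec.fit_transform(labeled_docs + unlabeled_docs)
X_lab, X_unlab = X[:2], X[2:]

clf = LogisticRegression().fit(X_lab, labels)
proba = clf.predict_proba(X_unlab)
confident = proba.max(axis=1) >= 0.9          # hypothetical threshold
pseudo_labels = np.array(clf.predict(X_unlab))[confident]
# Confident (document, pseudo_label) pairs would then be appended to
# the training set and the classifier retrained.
```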

Advanced Optimization of Reliability Based on Cost Factor and Deploying On-Line Safety Instrumented System Supporting Tool (비용 요소에 근거한 신뢰도 최적화 및 On-Line SIS 지원 도구 연구)

  • Lulu, Addis;Park, Myeongnam;Kim, Hyunseung;Shin, Dongil
    • Journal of the Korean Institute of Gas / v.21 no.2 / pp.32-40 / 2017
  • Safety Instrumented Systems (SIS) have a wide application area. They are of vital importance at process plants for detecting the onset of hazardous events, for instance a release of hazardous material, and for mitigating their consequences to humans, material assets, and the environment. Integrated safety systems, in which electrical, electronic, and/or programmable electronic (E/E/PE) devices interact with mechanical, pneumatic, and hydraulic systems, are governed by international safety standards such as IEC 61508, which organises its requirements according to a Safety Life Cycle (SLC). Fulfilling these requirements by following the SLC can be complex without the aid of SIS supporting tools. This paper presents a simple SIS support tool that can greatly help the user implement the design phase of the safety lifecycle. The tool is an Android application that can be integrated with a Web-based data reading and modifying system. It can reduce the computation time spent on the design phase of the SLC and reduce the errors that can arise in the process. In addition, this paper presents a cost-based optimization approach to SIS design: a multi-objective genetic algorithm is used to search for the best combinations of solutions without enumerating the entire solution space (see the sketch below).
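
As a toy illustration of the cost-versus-reliability search the abstract mentions, the sketch below evolves SIS designs under two objectives (total cost and probability of failure on demand, PFD) and keeps the non-dominated set. The component costs, PFD values, and series-system approximation are invented for illustration and are not the paper's models.

```python
# Toy multi-objective GA for the SIS cost/PFD trade-off (illustrative).
import random

COSTS = [1.0, 2.5, 4.0]       # hypothetical cost per redundancy level
PFDS = [1e-2, 1e-3, 1e-4]     # hypothetical PFD per redundancy level
N_SUBSYSTEMS = 5

def evaluate(design):
    cost = sum(COSTS[g] for g in design)
    pfd = sum(PFDS[g] for g in design)    # crude series approximation
    return cost, pfd

def dominates(a, b):          # Pareto dominance on (cost, pfd)
    return all(x <= y for x, y in zip(a, b)) and a != b

pop = [[random.randrange(3) for _ in range(N_SUBSYSTEMS)] for _ in range(40)]
for _ in range(200):          # mutate a parent, keep non-dominated designs
    child = [g if random.random() > 0.2 else random.randrange(3)
             for g in random.choice(pop)]
    pop.append(child)
    scored = [(evaluate(d), d) for d in pop]
    pop = [d for s, d in scored
           if not any(dominates(s2, s) for s2, _ in scored)][:40]

pareto_front = sorted(evaluate(d) for d in pop)   # cost/PFD trade-offs
```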

A Study on the Collection and Application Measures for Media Platform Based Materials (매체 플랫폼 기반 자료의 수집 및 적용 방안 연구)

  • Younghee Noh;Youngmi Jung;Aekyoung Son;Inho Chang;Hyunju Cha
    • Journal of Korean Library and Information Science Society / v.55 no.1 / pp.193-214 / 2024
  • This study aimed to propose a method for collecting and applying media platform based materials at the National Library of Korea. Firstly, we analyzed the current status and limitations of data collection on domestic media platforms, including at the National Library of Korea. Secondly, a literature review was used to investigate the current status and types of digital content based on media platforms. Thirdly, through an examination of cases from major overseas libraries, we identified the types of media platform based materials not currently covered by the National Library of Korea's online material collection guidelines. Fourthly, after reviewing technical and legal elements such as the definition of collection targets and scope for each new medium, and the collection methods, we established collection criteria. Fifthly, based on these results, the policies proposed in this study are as follows: 1) a clear legal basis for the collection of media platform based materials needs to be established; 2) collection guidelines for media platform based materials need to be developed and presented; 3) collection tools and infrastructure for media platform based materials are required; 4) collecting such materials requires obtaining permission from the targeted social media organizations and cooperating with the organizations that produce and service extended reality content; 5) activating services for these materials requires improving accessibility to promote their use, enhancing the content extensibility and ease of use of the e-deposit system, including for extended reality content, and building advanced spaces for reproducing extended reality content.

Prediction Model for Gas-Energy Consumption using Ontology-based Breakdown Structure of Multi-Family Housing Complex (온톨로지 기반 공동주택 분류체계를 활용한 가스에너지 사용량 예측 모델)

  • Hong, Tae-Hoon;Park, Sung-Ki;Koo, Choong-Wan;Kim, Hyun-Joong;Kim, Chun-Hag
    • Korean Journal of Construction Engineering and Management / v.12 no.6 / pp.110-119 / 2011
  • Global warming caused by excessive greenhouse gas emissions is causing climate change all over the world. In Korea, greenhouse gas emissions from residential buildings account for about 10% of total domestic emissions, and the number of deteriorated multi-family housing complexes is increasing. The goal of this research is therefore to establish a basis for managing energy consumption continuously and methodically during the maintenance, repair, and rehabilitation (MR&R) period of multi-family housing. The research process and methodologies are as follows. First, the research team collected data on the project characteristics and energy consumption of multi-family housing complexes in Seoul. Second, an ontology-based breakdown structure was established from the primary characteristics affecting energy consumption, which were selected by statistical analysis. Finally, a predictive model of energy consumption was developed on top of the ontology-based breakdown structure, applying case-based reasoning (CBR), artificial neural networks (ANN), multiple regression analysis (MRA), and a genetic algorithm (GA) (see the sketch below). PASW (Predictive Analytics SoftWare) Statistics 18, Microsoft Excel, and Protege 4.1 were utilized for data analysis and prediction. In future research, the model will be made more continuous and methodical by developing a web-based system, enabling facility managers of central or local governments and of multi-family housing complexes to make decisions about appropriate energy consumption based on definite references.
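
As a minimal sketch of the MRA (multiple regression) component of such a prediction model, with entirely hypothetical complex characteristics and consumption figures:

```python
# Multiple regression sketch for gas-energy prediction (illustrative).
import numpy as np
from sklearn.linear_model import LinearRegression

# columns: households, total floor area (m^2), building age (years)
X = np.array([[300, 25_000, 15],
              [500, 42_000, 22],
              [150, 12_000, 8],
              [420, 36_000, 18]])
y = np.array([180_000, 310_000, 90_000, 260_000])  # annual gas use (m^3)

model = LinearRegression().fit(X, y)
pred = model.predict([[350, 30_000, 20]])  # forecast for a new complex
```

In the paper's setting, the ontology-based breakdown structure would first group complexes by their primary characteristics, with a model of this kind (or its CBR/ANN/GA counterparts) fitted within each group.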

Carbon Footprint and Mitigation of Vegetables Produced at Open Fields and Film House using Life Cycle Assessment

  • Lee, Deog Bae;Jung, Sun Chul;So, Kyu Ho;Kim, Gun Yeob;Jeong, Hyun Cheol;Sonn, Yeon Gyu
    • Korean Journal of Soil Science and Fertilizer / v.47 no.6 / pp.457-463 / 2014
  • This study was carried out to identify the major factors for mitigating carbon emissions using Life Cycle Assessment (LCA). The system boundary of the LCA was confined to sowing through packaging during vegetable production. Input amounts of agricultural materials were calculated from the 2007 income references for white radish, Chinese cabbage, and chive produced at open fields and in film houses, published by the Rural Development Administration. Domestic data and Ecoinvent data were used for the emission factors of each agricultural material, based on the 1996 IPCC guidelines. The carbon footprint of white radish was 0.19 kg $CO_2\,kg^{-1}$ at open fields and 0.133 kg $CO_2\,kg^{-1}$ in film houses; that of Chinese cabbage was 0.22 kg $CO_2\,kg^{-1}$ at open fields and 0.19 kg $CO_2\,kg^{-1}$ in film houses; and that of chive was 0.66 kg $CO_2\,kg^{-1}$ at open fields and 1.04 kg $CO_2\,kg^{-1}$ in film houses. The high carbon footprint of chive was related to its lower yield and higher fuel usage compared with white radish and Chinese cabbage. The mean share of carbon emissions from manufacturing by-product fertilizer was 35.7%: 50.6% for white radish at open fields, 13.1% for white radish in film houses, 38.4% for Chinese cabbage at open fields, 34.0% for Chinese cabbage in film houses, 50.6% for chive at open fields, and 36.0% for chive in film houses. Carbon emissions from the fuel manufacturing and combustion step accounted for 16.1% of total emissions on average: 4.3% for white radish at open fields, 15.6% for white radish in film houses, 6.9% for Chinese cabbage at open fields, 19.0% for Chinese cabbage in film houses, 12.5% for chive at open fields, and 29.1% for chive in film houses. Meanwhile, the mean share of the carbon footprint from $N_2O$ emissions was 29.2%: 39.2% for white radish at open fields, 41.9% for white radish in film houses, 34.4% for Chinese cabbage at open fields, 23.1% for Chinese cabbage in film houses, 28.8% for chive at open fields, and 17.1% for chive in film houses. Fertilizer was the primary factor and fuel the secondary factor for carbon emissions among the vegetables in this study (see the formula below). It was suggested to use the Heug-To-Ram web service system (http://soil.rda.go.kr) for scientific fertilization based on soil testing and to increase energy efficiency in order to produce low-carbon vegetables.
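
For reference, the per-kilogram figures above follow the usual LCA convention of dividing life-cycle emissions by product yield; a generic form (our notation, not quoted from the paper) is:

```latex
% Carbon footprint per kg of vegetable over the system boundary
% (sowing to packaging); notation is illustrative.
\[
  CF = \frac{\sum_{i} Q_i \, EF_i \; + \; E_{N_2O}}{Y}
  \qquad \left[\mathrm{kg\ CO_2\,kg^{-1}}\right]
\]
% Q_i: input amount of agri-material i, EF_i: its emission factor
% (CO2-eq), E_{N2O}: field N2O emissions in CO2-eq, Y: yield in kg.
```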

Investigation of Topic Trends in Computer and Information Science by Text Mining Techniques: From the Perspective of Conferences in DBLP (텍스트 마이닝 기법을 이용한 컴퓨터공학 및 정보학 분야 연구동향 조사: DBLP의 학술회의 데이터를 중심으로)

  • Kim, Su Yeon;Song, Sung Jeon;Song, Min
    • Journal of the Korean Society for Information Management / v.32 no.1 / pp.135-152 / 2015
  • The goal of this paper is to explore the field of Computer and Information Science with the aid of text mining techniques, by mining Computer and Information Science related conference data available in DBLP (Digital Bibliography & Library Project). Although studies based on bibliometric analysis are the most prevalent means of investigating the dynamics of a research field, we attempt to understand the field's dynamics using Latent Dirichlet Allocation (LDA)-based multinomial topic modeling (see the sketch below). For this study, we collected 236,170 documents from 353 conferences related to Computer and Information Science in DBLP, aiming to include conferences in the field as broadly as possible. We analyzed the topic modeling results on data collected over the period 2000 to 2011, including the top authors and top conferences per topic. We identified four patterns in topic trends in the field of computer and information science during this period: growing (network related topics), shrinking (AI and data mining related topics), continuing (web, text mining, information retrieval, and database related topics), and fluctuating (HCI, information system, and multimedia system related topics).
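
A minimal sketch of the LDA step, using a few placeholder titles rather than the paper's 236,170-document DBLP corpus:

```python
# LDA topic modeling sketch over toy conference-paper titles.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = ["wireless sensor network routing protocols",
        "support vector machines for text classification",
        "web information retrieval and ranking models",
        "database query optimization techniques"]

vec = CountVectorizer(stop_words="english")
X = vec.fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

terms = vec.get_feature_names_out()
for k, topic in enumerate(lda.components_):       # top words per topic
    top = [terms[i] for i in topic.argsort()[-3:][::-1]]
    print(f"topic {k}: {top}")
```

Tracking how each topic's yearly document share rises or falls over 2000-2011 is what yields the growing/shrinking/continuing/fluctuating patterns reported above.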

Optimal Release Problems based on a Stochastic Differential Equation Model Under the Distributed Software Development Environments (분산 소프트웨어 개발환경에 대한 확률 미분 방정식 모델을 이용한 최적 배포 문제)

  • Lee Jae-Ki;Nam Sang-Sik
    • The Journal of Korean Institute of Communications and Information Sciences / v.31 no.7A / pp.649-658 / 2006
  • Recently, software development has adopted new approaches in various forms: client-server systems, web programming, object-oriented concepts, and distributed development over network environments. Attention to distributed development technology and object-oriented methodology is increasing, as these technologies improve software quality, raise productivity, and reduce development effort. Here, we consider distributed software development across many workstations. In this paper, we discuss the optimal release problem based on a stochastic differential equation model for distributed software development environments. In the past, software reliability was roughly estimated from the software development process and from reliability estimates made as testing progressed. In this paper, we determine optimal release times using two methods: first, a software reliability growth model (SRGM) with an error-counting model in the fault detection phase based on a non-homogeneous Poisson process (NHPP); second, fault detection modeled as a continuous random variable via a stochastic differential equation (SDE). We determine the optimal release time as the one that minimizes cost, using the detected failure data and debugging fault data from the system test and operational phases (see the sketch below). In particular, we discuss reliability requirements in light of the total software cost distribution.
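
The SDE formulation itself is not reproduced here, but the cost-minimizing release time under a classic NHPP SRGM (the Goel-Okumoto mean value function, with invented cost coefficients) can be sketched as follows:

```python
# Cost-based optimal release under the Goel-Okumoto NHPP SRGM:
# m(t) = a * (1 - exp(-b t)). All parameter values are hypothetical.
import numpy as np

a, b = 100.0, 0.05           # expected total faults, detection rate
c1, c2, c3 = 1.0, 10.0, 0.5  # test-fix, operational-fix, test-time costs

def m(t):                    # expected faults detected by time t
    return a * (1.0 - np.exp(-b * t))

def cost(t):                 # expected total cost if released at t
    return c1 * m(t) + c2 * (a - m(t)) + c3 * t

ts = np.linspace(0.0, 300.0, 3001)
t_opt = ts[np.argmin(cost(ts))]   # numerically located release time
```

Releasing later reduces the expensive operational fixes (c2) at the price of more testing time (c3); the optimum balances the two.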

Real Estate Asset NFT Tokenization and FT Asset Portfolio Management (부동산 유동화 NFT와 FT 분할 거래 시스템 설계 및 구현)

  • Young-Gun Kim;Seong-Whan Kim
    • KIPS Transactions on Software and Data Engineering / v.12 no.9 / pp.419-430 / 2023
  • Currently, NFTs have no dominant application other than proving ownership of digital content, and they also suffer from low liquidity, which makes their prices difficult to predict. Real estate usually has very high barriers to investment due to its high price. Real estate can be converted into NFTs and divided into small-value fungible tokens (FTs), which can enlarge the investor community through greater liquidity and better accessibility. In this paper, we design and implement a system that allows ordinary users to invest in high-priced real estate through a Black-Litterman (BL) model-based portfolio investment interface. To this end, we target a set of real estate assets pegged as collateral and issue NFTs for the collateral using a blockchain, using an oracle to obtain current real estate information and to monitor varying real estate prices. After tokenizing real estate into NFTs, we divide the NFTs into FTs at easily accessible prices, thereby lowering the entry price and providing high liquidity with limited price volatility. In addition, we implemented a BL-based asset portfolio interface for composing effective portfolios across multiple real estate assets with small amounts of capital; using the BL model, investors can determine their asset portfolios (see the sketch below). We implemented the whole system using Solidity smart contracts on the Flask web framework, with public data portals serving as oracle interfaces.
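
A compact sketch of the Black-Litterman posterior computation such an interface would rest on, with invented covariances, market weights, and a single investor view (this is the standard BL formula, not code from the paper):

```python
# Black-Litterman posterior returns and unconstrained weights (sketch).
import numpy as np

sigma = np.array([[0.04, 0.01, 0.00],     # asset return covariance
                  [0.01, 0.05, 0.02],
                  [0.00, 0.02, 0.06]])
w_mkt = np.array([0.5, 0.3, 0.2])         # market-cap weights of 3 FTs
delta, tau = 2.5, 0.05                    # risk aversion, uncertainty scale

pi = delta * sigma @ w_mkt                # implied equilibrium returns

P = np.array([[1.0, -1.0, 0.0]])          # view: asset 1 outperforms 2
Q = np.array([0.02])                      # ... by 2 percentage points
omega = P @ (tau * sigma) @ P.T           # view uncertainty

inv = np.linalg.inv
mu_bl = inv(inv(tau * sigma) + P.T @ inv(omega) @ P) @ (
    inv(tau * sigma) @ pi + P.T @ inv(omega) @ Q)

w_bl = inv(delta * sigma) @ mu_bl         # unconstrained BL weights
```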

Automatic gasometer reading system using selective optical character recognition (관심 문자열 인식 기술을 이용한 가스계량기 자동 검침 시스템)

  • Lee, Kyohyuk;Kim, Taeyeon;Kim, Wooju
    • Journal of Intelligence and Information Systems / v.26 no.2 / pp.1-25 / 2020
  • In this paper, we suggest an application system architecture which provides an accurate, fast, and efficient automatic gasometer reading function. The system captures a gasometer image using a mobile device camera, transmits the image to a cloud server over a private LTE network, and analyzes the image to extract the device ID and gas usage amount by selective optical character recognition based on deep learning. In general, an image contains many types of characters, and optical character recognition extracts all of them; but some applications need to ignore characters that are not of interest and focus only on specific types. An automatic gasometer reading system, for example, only needs to extract the device ID and gas usage amount from gasometer images in order to bill users; character strings such as the device type, manufacturer, manufacturing date, and specification are not valuable to the application. The application therefore has to analyze only the region of interest and the specific types of characters in it to extract valuable information. We adopted CNN (Convolutional Neural Network) based object detection and CRNN (Convolutional Recurrent Neural Network) technology for selective optical character recognition, which analyzes only the region of interest for selective character extraction (see the sketch below). We built three neural networks for the application system: the first is a convolutional neural network which detects the regions of interest containing the gas usage amount and device ID character strings; the second is another convolutional neural network which transforms the spatial information of a region of interest into spatial sequential feature vectors; and the third is a bidirectional long short-term memory network which converts the spatial sequential information into character strings via time-series mapping from feature vectors to characters. In this research, the character strings of interest are the device ID, consisting of 12 Arabic numerals, and the gas usage amount, consisting of 4-5 Arabic numerals. All system components are implemented on Amazon Web Services with Intel Xeon E5-2686 v4 CPUs and an NVIDIA Tesla V100 GPU. The system architecture adopts a master-slave processing structure for efficient and fast parallel processing, coping with about 700,000 requests per day. A mobile device captures a gasometer image and transmits it to the master process in the AWS cloud. The master process runs on the Intel Xeon CPU and pushes reading requests from mobile devices into an input queue with a FIFO (First In First Out) structure. The slave process consists of the three deep neural networks, conducts the character recognition, and runs on the NVIDIA GPU module. The slave process continually polls the input queue for recognition requests; when requests from the master process are present, it converts the queued image into the device ID string, the gas usage amount string, and the position information of the strings, returns the information to an output queue, and switches back to idle mode to poll the input queue. The master process gets the final information from the output queue and delivers it to the mobile device. We used a total of 27,120 gasometer images for training, validation, and testing of the three deep neural networks:
22,985 images for training and validation, and 4,135 images for testing. We randomly split the 22,985 images with an 8:2 ratio into training and validation sets for each training epoch. The 4,135 test images were categorized into 5 types (normal, noise, reflex, scale, and slant): normal denotes clean image data; noise, images with noise signals; reflex, images with light reflection in the gasometer region; scale, images with small object size due to long-distance capture; and slant, images which are not horizontally flat. The final character string recognition accuracies for the device ID and gas usage amount on normal data are 0.960 and 0.864, respectively.
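
A skeletal CRNN in PyTorch mirroring the pipeline described above: a small CNN turns a cropped region of interest into feature maps, the width dimension is read as a time axis by a bidirectional LSTM, and a linear layer emits per-timestep character scores (e.g., for CTC decoding). Layer sizes are illustrative, not the paper's architecture.

```python
# Skeletal CRNN: CNN features -> width-as-time sequence -> BiLSTM.
import torch
import torch.nn as nn

class CRNN(nn.Module):
    def __init__(self, n_classes=11):            # 10 digits + CTC blank
        super().__init__()
        self.cnn = nn.Sequential(                 # (B,1,32,W) -> (B,64,8,W/4)
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.rnn = nn.LSTM(64 * 8, 128, bidirectional=True, batch_first=True)
        self.fc = nn.Linear(256, n_classes)

    def forward(self, x):
        f = self.cnn(x)                           # (B, 64, 8, W/4)
        b, c, h, w = f.shape
        seq = f.permute(0, 3, 1, 2).reshape(b, w, c * h)  # width as time
        out, _ = self.rnn(seq)
        return self.fc(out)                       # (B, W/4, n_classes)

scores = CRNN()(torch.randn(2, 1, 32, 128))       # two 32x128 ROI crops
```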