• Title/Summary/Keyword: Business size

Korean Sentence Generation Using Phoneme-Level LSTM Language Model (한국어 음소 단위 LSTM 언어모델을 이용한 문장 생성)

  • Ahn, SungMahn;Chung, Yeojin;Lee, Jaejoon;Yang, Jiheon
    • Journal of Intelligence and Information Systems / v.23 no.2 / pp.71-88 / 2017
  • Language models were originally developed for speech recognition and language processing. Trained on a set of example sentences, a language model predicts the next word or character from sequential input data. N-gram models have been widely used, but they cannot efficiently model the correlation between input units because they are probabilistic models based on the frequency of each unit in the training set. Recently, with the development of deep learning algorithms, recurrent neural network (RNN) and long short-term memory (LSTM) models have been widely used for neural language modeling (Ahn, 2016; Kim et al., 2016; Lee et al., 2016). These models can capture dependencies between objects that are fed into the model sequentially (Gers and Schmidhuber, 2001; Mikolov et al., 2010; Sundermeyer et al., 2012). To train a neural language model, texts must be decomposed into words or morphemes. However, because a training set of sentences generally contains a huge number of distinct words or morphemes, the dictionary becomes very large, which increases model complexity. In addition, word-level or morpheme-level models can generate only the vocabulary contained in the training set. Furthermore, for highly morphological languages such as Turkish, Hungarian, Russian, Finnish, or Korean, morpheme analyzers are more likely to make errors during decomposition (Lankinen et al., 2016). Therefore, this paper proposes a phoneme-level language model for Korean based on LSTM models. A phoneme, such as a vowel or a consonant, is the smallest unit that makes up Korean text. We construct the language model using three or four LSTM layers. Each model was trained using the stochastic gradient algorithm and more advanced optimization algorithms such as Adagrad, RMSprop, Adadelta, Adam, Adamax, and Nadam. A simulation study was conducted on Old Testament texts using the deep learning package Keras running on Theano. 
After pre-processing, the dataset contained 74 unique characters, including vowels, consonants, and punctuation marks. We then constructed each input vector from 20 consecutive characters, with the following 21st character as the output. In total, the dataset contained 1,023,411 input-output pairs, which we divided into training, validation, and test sets in a 70:15:15 ratio. All simulations were run on a system equipped with an Intel Xeon CPU (16 cores) and an NVIDIA GeForce GTX 1080 GPU. We compared the loss on the validation set, the perplexity on the test set, and the training time of each model. All optimization algorithms except the stochastic gradient algorithm showed similar validation loss and perplexity, clearly superior to those of the stochastic gradient algorithm, which also took the longest to train for both the 3- and 4-LSTM-layer models. On average, the 4-LSTM-layer model took 69% longer to train than the 3-LSTM-layer model, yet its validation loss and perplexity did not improve significantly and even became worse under some conditions. On the other hand, when the automatically generated sentences were compared, the 4-LSTM-layer model tended to generate sentences closer to natural language than the 3-LSTM-layer model. Although the completeness of the generated sentences differed slightly between models, sentence generation performance was quite satisfactory under all simulation conditions: the models generated only legitimate Korean letters, and their use of postpositions and verb conjugation was almost perfect grammatically. The results of this study are expected to be widely used for Korean language processing and speech recognition, which form the basis of artificial intelligence systems.
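
The data preparation described above (phoneme units, 20 inputs predicting the 21st, a 70:15:15 split, perplexity as the test metric) can be sketched in Python. This is a minimal illustration, not the authors' code: it assumes the phonemes are the Unicode compatibility jamo obtained by the standard arithmetic decomposition of precomposed Hangul syllables (U+AC00 to U+D7A3), whereas the paper's exact inventory of 74 units depends on preprocessing the abstract does not detail. The names `to_jamo`, `make_pairs`, `split_70_15_15`, and `perplexity` are hypothetical, the split is contiguous because the abstract does not say whether the data were shuffled, and the LSTM itself is not shown.

```python
import math

# Unicode compatibility jamo: 19 initials, 21 medials, 28 finals (index 0 = no final).
INITIALS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"
MEDIALS = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"
FINALS = ("", "ㄱ", "ㄲ", "ㄳ", "ㄴ", "ㄵ", "ㄶ", "ㄷ", "ㄹ", "ㄺ", "ㄻ", "ㄼ",
          "ㄽ", "ㄾ", "ㄿ", "ㅀ", "ㅁ", "ㅂ", "ㅄ", "ㅅ", "ㅆ", "ㅇ", "ㅈ", "ㅊ",
          "ㅋ", "ㅌ", "ㅍ", "ㅎ")
WINDOW = 20  # 20 consecutive input units predict the 21st


def to_jamo(text):
    """Split precomposed Hangul syllables (U+AC00..U+D7A3) into jamo units."""
    units = []
    for ch in text:
        offset = ord(ch) - 0xAC00
        if 0 <= offset < 19 * 21 * 28:
            initial, rest = divmod(offset, 21 * 28)
            medial, final = divmod(rest, 28)
            units += [INITIALS[initial], MEDIALS[medial]]
            if final:  # index 0 means the syllable has no final consonant
                units.append(FINALS[final])
        else:
            units.append(ch)  # spaces, punctuation, etc. pass through unchanged
    return units


def make_pairs(units, window=WINDOW):
    """Slide a fixed window over the sequence: (window units, next unit)."""
    return [(units[i:i + window], units[i + window])
            for i in range(len(units) - window)]


def split_70_15_15(pairs):
    """Contiguous 70:15:15 train/validation/test split (integer arithmetic)."""
    n = len(pairs)
    n_train = n * 70 // 100
    n_val = n * 15 // 100
    return pairs[:n_train], pairs[n_train:n_train + n_val], pairs[n_train + n_val:]


def perplexity(next_unit_probs):
    """exp(mean negative log-likelihood) of the probabilities given to the true next units."""
    nll = -sum(math.log(p) for p in next_unit_probs) / len(next_unit_probs)
    return math.exp(nll)
```

For example, `to_jamo("한국.")` yields `["ㅎ", "ㅏ", "ㄴ", "ㄱ", "ㅜ", "ㄱ", "."]`, and a text of N units gives N - 20 training pairs. A model that assigned probability 0.25 to every true next unit would have perplexity 4; lower is better.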

The Pattern Analysis of Financial Distress for Non-audited Firms using Data Mining (데이터마이닝 기법을 활용한 비외감기업의 부실화 유형 분석)

  • Lee, Su Hyun;Park, Jung Min;Lee, Hyoung Yong
    • Journal of Intelligence and Information Systems / v.21 no.4 / pp.111-131 / 2015
  • There are only a handful of studies on pattern analysis of corporate distress compared with research on bankruptcy prediction. The few that exist mainly focus on audited firms because financial data are easier to collect for these firms. In reality, however, corporate financial distress is a far more common and critical phenomenon among non-audited firms, which consist mainly of small and medium-sized firms. The purpose of this paper is to classify distressed non-audited firms according to their financial ratios using a data mining technique, the Self-Organizing Map (SOM). SOM is a type of artificial neural network trained by unsupervised learning to produce a low-dimensional, discretized representation of the input space of the training samples, called a map. SOM differs from other artificial neural networks in that it applies competitive learning rather than error-correction learning (such as backpropagation with gradient descent), and in that it uses a neighborhood function to preserve the topological properties of the input space. It is one of the most popular and successful clustering algorithms. In this study, we classify types of financially distressed firms, specifically non-audited firms. In the empirical test, we collected 10 financial ratios for the previous two years (2002 and 2003) of 100 non-audited firms that were under distress in 2004. Using these financial ratios and the SOM algorithm, five distinct patterns were identified. In pattern 1, financial distress was very serious in almost all financial ratios; 12% of the firms fall into this pattern. In pattern 2, financial distress was weak in almost all financial ratios; 14% of the firms fall into this pattern. In pattern 3, the growth ratio was the worst among all patterns. We speculate that the firms in this pattern may be under distress due to severe competition in their industries. Approximately 30% of the firms fell into this group. 
In pattern 4, the growth ratio was higher than in any other pattern, but the cash ratio and profitability ratio did not keep pace with it. We conclude that the firms in this pattern fell into distress while pursuing business expansion. About 25% of the firms were in this pattern. Last, pattern 5 encompassed very solvent firms. Firms in this pattern may have become distressed because of a bad short-term strategic decision or problems with the firms' entrepreneurs. Approximately 18% of the firms fell into this pattern. This study makes both academic and empirical contributions. Academically, it uses a data mining technique (the Self-Organizing Map) to classify non-audited companies, which tend to go bankrupt easily and whose financial data are unstructured or easily manipulated, rather than large audited firms with well-prepared and reliable financial data. Empirically, even though it analyzes only the financial data of non-audited firms, it is useful for identifying first-order symptoms of financial distress, which helps forecast bankruptcy and manage early warning signals. The limitation of this research is that it analyzes only 100 companies because financial data on non-audited firms are difficult to collect, which makes analysis by industry category or firm size difficult. In addition, non-financial qualitative data are crucial for bankruptcy analysis, so non-financial qualitative factors will be taken into account in future work. This study sheds some light on future distress prediction for non-audited small and medium-sized firms.
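
To make the competitive-learning and neighborhood-function ideas concrete, here is a small, dependency-free Self-Organizing Map in Python. It is a generic sketch, not the authors' configuration: the map size, linear learning-rate decay, Gaussian neighborhood, random initialization from samples, and the names `train_som` and `best_matching_unit` are all assumptions. In a setting like the paper's, each input vector would hold one firm's 10 financial ratios (ideally standardized first), and firms mapped to the same region of the map would form a pattern.

```python
import math
import random


def best_matching_unit(weights, x):
    """Competitive step: the node whose weight vector is closest to x wins."""
    return min(weights, key=lambda node: math.dist(weights[node], x))


def train_som(data, rows, cols, epochs=100, lr0=0.5, sigma0=None, seed=0):
    """Train a rows x cols Self-Organizing Map on equal-length numeric vectors."""
    rng = random.Random(seed)
    if sigma0 is None:
        sigma0 = max(rows, cols) / 2.0
    # Initialize every node's weight vector with a randomly chosen sample.
    weights = {(r, c): list(rng.choice(data)) for r in range(rows) for c in range(cols)}
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)              # learning rate decays linearly
        sigma = sigma0 * (1.0 - frac) + 0.5  # neighborhood radius shrinks toward 0.5
        for x in rng.sample(data, len(data)):  # visit samples in random order
            bmu = best_matching_unit(weights, x)
            for node, w in weights.items():
                grid_d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2 * sigma ** 2))  # Gaussian neighborhood on the grid
                for k, xk in enumerate(x):
                    w[k] += lr * h * (xk - w[k])  # pull winner and its neighbors toward x
    return weights
```

Unlike backpropagation, no error signal is computed: only the winning node and its grid neighbors move, and the shrinking neighborhood is what keeps similar inputs on nearby nodes.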

Open Skies Policy : A Study on the Alliance Performance and International Competition of FFP (항공자유화정책상 상용고객우대제도의 제휴성과와 국제경쟁에 관한 연구)

  • Suh, Myung-Sun;Cho, Ju-Eun
    • The Korean Journal of Air & Space Law and Policy / v.25 no.2 / pp.139-162 / 2010
  • In international air transport, the open skies policy implies freedom in the sky, or opening the sky. Normatively, the open skies policy is a kind of open-door policy that grants various forms of traffic rights to other countries; at the same time, it is a policy of free competition in international air transport. Since the Airline Deregulation Act of 1978, the United States has signed open skies agreements with many countries, starting with the Netherlands, so that its competitive large airlines can compete in the international air transport market, where many business opportunities exist. South Korea now has open skies agreements with more than 20 countries. The frequent flyer program (FFP) is part of a broad-based marketing alliance and has been used as an airfare strategy since the U.S. government deregulated airlines. This membership-based program is an incentive plan that awards mileage points to customers for using airline services and rewards customer loyalty in tangible forms based on their accumulated points. In its early stages, the frequent flyer program focused on marketing efforts to attract customers, but in today's environment of intense competition among airlines, it is used as an important strategic marketing tool for enhancing business performance. Airlines therefore agree that they need to identify customer needs in order to secure loyal customers more effectively. The outcomes of an airline's frequent flyer program can affect international competition in several ways. First, the airline can obtain a more dominant position in the flight market by expanding its route network. Second, the availability of flight products for customers can be improved through increased flight frequency. Third, the airline can expand into new markets ahead of others and thus gain advantages over its competitors. 
However, there are few empirical studies on airline frequent flyer programs. Accordingly, after reviewing the types of strategic alliances between airlines, this study explores the effects of the program on international competition. Forming strategic airline alliances is a worldwide trend resulting from the open skies policy. South Korea also needs to make its open skies agreements more practical in order to promote the growth and competitiveness of domestic airlines. The present study addresses the performance of airline frequent flyer programs and international competition under the open skies policy. Using a sample of five global alliance groups (Star, Oneworld, Wings, Qualiflyer, and Skyteam), the study empirically examined the effects that the resource structures and information technology levels of the airlines in each group have on the type of alliance; one-way analysis of variance and regression analysis were used to test the hypotheses. The findings suggest that both large and small/medium-sized airlines in an alliance group with global networks and organizations can achieve high performance and secure international competitiveness. Airline passengers earn mileage points by using non-flight services through an alliance network with hotels, car-rental services, duty-free shops, travel agents, and more, and they show strong interest in and preference for related service benefits. Therefore, Korean airlines should develop more aggressive marketing programs based on multilateral alliances not only with other airlines but also with other services, including hotels.

Automatic Quality Evaluation with Completeness and Succinctness for Text Summarization (완전성과 간결성을 고려한 텍스트 요약 품질의 자동 평가 기법)

  • Ko, Eunjung;Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.24 no.2 / pp.125-148 / 2018
  • Recently, as demand for big data analysis has grown, unstructured data are increasingly being analyzed and the results put to use. Among the various types of unstructured data, text is used as a means of communicating information in almost all fields. In addition, many analysts are interested in text because the amount of data is very large and relatively easy to collect compared with other unstructured and structured data. Among text analysis applications, document classification, which assigns documents to predetermined categories; topic modeling, which extracts major topics from a large number of documents; sentiment analysis or opinion mining, which identifies the emotions or opinions contained in texts; and text summarization, which summarizes the main contents of one or several documents, have been actively studied. In particular, text summarization is actively applied in business through news summary services, privacy policy summary services, etc. In academia, much research has also been done on both the extractive approach, which selectively presents the main elements of a document, and the abstractive approach, which extracts elements of a document and composes new sentences by combining them. However, techniques for evaluating the quality of automatically generated summaries have not progressed as much as automatic summarization techniques themselves. Most existing studies on summary quality evaluation manually summarized documents, used them as reference documents, and measured the similarity between the automatic summary and the reference document. Specifically, an automatic summary is generated from the full text using various techniques, and its quality is measured by comparing it with a reference document, which serves as the ideal summary. 
Reference documents are provided in two major ways. The most common is manual summarization, in which a person creates an ideal summary by hand. Because this method requires human intervention, writing the summary takes considerable time and cost, and the evaluation result may differ depending on who writes the summary. To overcome these limitations, attempts have been made to measure the quality of summaries without human intervention. As a representative example, a method was recently devised that reduces the size of the full text and measures the similarity between the reduced full text and the automatic summary. In this method, the more of the full text's frequent terms appear in the summary, the better the summary is judged to be. However, since summarization essentially means condensing a large amount of content while minimizing omissions, it is unreasonable to assume that a summary judged "good" on term frequency alone is always a good summary in the essential sense. To overcome the limitations of these previous evaluation studies, this study proposes an automatic quality evaluation method for text summarization based on the essential meaning of summarization. Specifically, succinctness is defined as an element indicating how little content is duplicated among the sentences of the summary, and completeness is defined as an element indicating how little of the original content is missing from the summary. In this paper, we propose a method for automatically evaluating the quality of text summaries based on these concepts of succinctness and completeness. 
To evaluate the practical applicability of the proposed methodology, we extracted 29,671 sentences from TripAdvisor hotel reviews, summarized the reviews for each hotel, and present the results of experiments evaluating the quality of these summaries with the proposed methodology. We also provide a way to integrate completeness and succinctness, which are in a trade-off relationship, into an F-score, and propose a method for producing the optimal summary by varying the sentence-similarity threshold.
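
One possible way to operationalize completeness, succinctness, and their F-score is sketched below in Python. It rests on assumptions the abstract does not confirm: sentence similarity is taken to be word-set Jaccard similarity, and a single threshold decides both whether a source sentence is covered by the summary and whether two summary sentences duplicate each other. The function names are hypothetical, and the paper's actual similarity measure and formulas may differ.

```python
from itertools import combinations


def jaccard(a, b):
    """Word-set Jaccard similarity (an assumed stand-in for the paper's sentence similarity)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def completeness(full_text, summary, threshold):
    """Share of source sentences covered by at least one sufficiently similar summary sentence."""
    covered = sum(1 for s in full_text
                  if any(jaccard(s, t) >= threshold for t in summary))
    return covered / len(full_text)


def succinctness(summary, threshold):
    """1 minus the share of summary sentence pairs that look like duplicates."""
    pairs = list(combinations(summary, 2))
    if not pairs:
        return 1.0
    duplicated = sum(1 for a, b in pairs if jaccard(a, b) >= threshold)
    return 1.0 - duplicated / len(pairs)


def f_score(full_text, summary, threshold):
    """Harmonic mean of completeness and succinctness."""
    c = completeness(full_text, summary, threshold)
    s = succinctness(summary, threshold)
    return 2 * c * s / (c + s) if c + s else 0.0
```

Under these assumptions, raising the threshold tends to lower completeness (fewer source sentences count as covered) and raise succinctness (fewer summary pairs count as duplicates), which is the trade-off the F-score balances.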

Mediating Roles of Attachment for Information Sharing in Social Media: Social Capital Theory Perspective (소셜 미디어에서 정보공유를 위한 애착의 매개역할: 사회적 자본이론 관점)

  • Chung, Namho;Han, Hee Jeong;Koo, Chulmo
    • Asia Pacific Journal of Information Systems / v.22 no.4 / pp.101-123 / 2012
  • Social media has become a widely known keyword, and related social trends and businesses have rapidly spread into various contexts. It has become an important research area for scholars interested in online technologies, cyberspace, and their social impacts. Social media includes not only web-based services but also mobile application services that allow people to share information and knowledge of various kinds through online connections. Social media users tend to form common identity- and bond-attachment through interactions such as 'thumbs up', 'reply note', and 'forwarding', which may be driven by various factors and may result in delivering information and sharing knowledge and specific experiences. Furthermore, almost all social media sites connect users with strangers based on shared interests, political views, enjoyable activities, and the content they create, which benefits users. With the fast development of digital devices and media, including smartphones, tablet PCs, internet blogging, and photo and video clips, scholars have eagerly begun to study diverse issues linking human motivations and behavioral outcomes, which may be articulated as antecedents and consequences of the content people create via social media. Users of social media such as Facebook, Twitter, and Cyworld are growing closer to each other and building relationships in new ways. In this sense, people use social media as tools for maintaining pre-existing networks and making new social connections, while also explicitly seeking business opportunities through personal and unlimited public networks. Theoretically, this phenomenon can be explained by social capital, a concept that describes the benefits one receives from one's relationships with others. 
Thus, social media use is closely related to how people form connections: it serves as a bridge to the informational benefits of a heterogeneous network of people, and to common identity- and bond-attachment, which emphasize emotional benefits from community members or friend groups. Social capital can be seen as the resources accumulated through relationships among people; it can be considered an investment in social relations with expected returns, yielding benefits from greater access to and use of the resources embedded in social networks. Social media has been widely adopted for building social capital in the cyber world; however, there has been little theoretical explanation of how people take advantage of opportunities through interaction with others, or why people are willing to help others or provide answers. Individuals consciously express themselves in online spaces through so-called common identity- or bond-attachment. Common-identity attachment centers on weak ties, which are loose connections between individuals who may provide useful information or new perspectives for one another but typically not emotional support, whereas common-bond attachment describes ties between individuals in tightly knit, emotionally close relationships, such as family and close friends. Common identity- and bond-attachment have mainly been studied in online and offline settings in which individuals convey an impression of their own interests to others. Individuals thus expect to meet other people and try to present themselves to their partners accordingly. With the development of social media, individuals are motivated to engage in open and honest self-disclosure using diverse cues, including verbal, nonverbal, pictorial, and video content, toward friends as well as passing strangers. 
In the social media context, common identity- and bond-attachment for self-presentation appear to differ from the face-to-face context. On social media, users seek to create an impression of themselves by posting text messages, pictures, and video files. In digital environments, people interact to work, shop, learn, entertain, and be entertained, and social media increasingly supports a wide range of online intentions and behaviors. Typically, identity and bond social capital built through self-presentation is the intentional and tangible component of identity. On social media, people try to engage others through a desired impression, which they maintain by coherent and complementary communication, including displaying signs, symbols, and brands made of digital material (information, interests, pictures, etc.). In marketing, consumers have traditionally shown common identity by selecting clothes, hairstyles, automobiles, logos, and so on to impress others in a given context, such as a shopping mall or an opera. To examine social capital and attachment, we combined social capital theory with attachment theory in our research model. The model focuses on how common identity- and bond-attachment are formed through social capital (cognitive, structural, and relational capital) and individual characteristics. We examined whether individual online kindness, self-rated expertise, and social relations influence the formation of common identity- and bond-attachment; both attachments affect willingness to help, but common bond does not appear to directly affect information sharing. As a result, we find that social capital and attachment theories are largely applicable to social media and its use in individual networks. 
We collected sample data from 256 users of social media such as Facebook, Twitter, and Cyworld and tested the proposed hypotheses with a structural equation model in AMOS. This study analyzes the direct and indirect relationships between social network service usage and its outcomes. The antecedents of kindness, confidence in knowledge, and social relations significantly affected the mediators, common identity- and bond-attachment; interestingly, however, network externality had no impact. We assume that network size had a negative effect because group members would not contribute significantly if they did not intend to actively interact with each other. The mediating variables had a positive effect on willingness to help. Further, common identity attachment had a stronger significant effect on information sharing.

A Study on the Clustering Method of Row and Multiplex Housing in Seoul Using K-Means Clustering Algorithm and Hedonic Model (K-Means Clustering 알고리즘과 헤도닉 모형을 활용한 서울시 연립·다세대 군집분류 방법에 관한 연구)

  • Kwon, Soonjae;Kim, Seonghyeon;Tak, Onsik;Jeong, Hyeonhee
    • Journal of Intelligence and Information Systems / v.23 no.3 / pp.95-118 / 2017
  • Recently, transactions of row housing and multiplex housing have become active, centered on the downtown area, and platform services such as Zigbang and Dabang are growing. Yet row housing and multiplex housing are a blind spot for real estate information, which is a social problem given changes in market size and the information asymmetry caused by changing demand. In addition, the 5 or 25 districts used by the Seoul Metropolitan Government or the Korean Appraisal Board (hereafter, KAB) were established along administrative boundaries and have been used in existing real estate studies. This is not a district classification suited to real estate research, because it was zoned for urban planning. Based on existing studies, this study finds that Seoul's spatial structure needs to be redefined to estimate future housing prices. Accordingly, this study attempts to classify areas without spatial heterogeneity by reflecting the price characteristics of row housing and multiplex housing. In other words, simple division by existing administrative districts has led to inefficiencies. Therefore, this study aims to cluster Seoul into new areas for more efficient real estate analysis. A hedonic model was applied to real transaction price data for row housing and multiplex housing, and the K-Means Clustering algorithm was used to cluster the spatial structure of Seoul. The study used real transaction prices of row housing and multiplex housing in Seoul from January 2014 to December 2016 and the 2016 official land values, provided by the Ministry of Land, Infrastructure and Transport (hereafter, MOLIT). Data preprocessing consisted of the following steps: removal of underground transactions, standardization of price per unit area, and removal of outlier transactions (standardized values above 5 or below -5). 
Preprocessing reduced the data from 132,707 cases to 126,759. The analysis was performed in R. After preprocessing, a data model was constructed: first, K-means clustering was performed; then a regression analysis using the hedonic model was conducted, followed by a cosine similarity analysis. Based on the constructed data model, we clustered on the basis of Seoul's longitude and latitude and compared the results with the existing areas. The results indicate that the goodness of fit of the model was above 75% and that the variables used in the hedonic model were significant. In other words, the 5 or 25 existing administrative districts were re-divided into 16 districts. Thus, this study derived a clustering method for row housing and multiplex housing in Seoul using the K-Means Clustering algorithm and a hedonic model that reflects property price characteristics. The authors also present academic and practical implications, the limitations of the study, and directions for future research. One academic implication is that the clustering reflects property price characteristics in order to remedy the problems of the areas used by the Seoul Metropolitan Government, KAB, and existing real estate research. Another is that, whereas existing real estate research has mainly studied apartments, this study proposes a method of classifying areas in Seoul using public information (i.e., MOLIT's real transaction data) released under Government 3.0. A practical implication is that the results can serve as basic data for real estate research on row housing and multiplex housing. The study is also expected to stimulate research on row housing and multiplex housing and to increase the accuracy of models of actual transactions. 
As future work, the authors suggest conducting various analyses to overcome the limitations of the threshold and note the need for deeper research.
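
Three of the steps named above (standardizing price per unit area and dropping cases beyond ±5, K-means on latitude and longitude, and cosine similarity) can be sketched in Python. This is illustrative only: the authors worked in R, the hedonic regression is omitted, the reading of "above 5 and below -5" as a z-score cutoff is an assumption, Euclidean distance on raw coordinates is a simplification, and the deterministic farthest-point initialization (a simplified relative of k-means++) and all function names are hypothetical.

```python
import math
from statistics import mean, pstdev


def drop_outliers(prices_per_area, cutoff=5.0):
    """Standardize price per unit area; keep values whose z-score lies within +/-cutoff."""
    mu, sd = mean(prices_per_area), pstdev(prices_per_area)
    if sd == 0:
        return list(prices_per_area)
    return [p for p in prices_per_area if abs((p - mu) / sd) <= cutoff]


def nearest(point, centers):
    """Index of the closest center (Euclidean distance on raw coordinates)."""
    return min(range(len(centers)), key=lambda j: math.dist(point, centers[j]))


def kmeans(points, k, max_iter=100):
    """Lloyd's K-means on (latitude, longitude) tuples with farthest-point initialization."""
    centers = [points[0]]
    while len(centers) < k:  # next center: the point farthest from all chosen centers
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centers)].append(p)
        # Move each center to the mean of its members (keep it if the cluster is empty).
        new_centers = [tuple(mean(axis) for axis in zip(*members)) if members else centers[j]
                       for j, members in enumerate(clusters)]
        if new_centers == centers:  # assignments have stabilized
            break
        centers = new_centers
    return centers, [nearest(p, centers) for p in points]


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0
```

With k = 16, `kmeans` would return 16 spatial clusters, matching the number of districts the abstract reports. Cosine similarity could then compare, for example, vectors of hedonic coefficients estimated per cluster, though the abstract does not say exactly which vectors were compared.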

Analysis of shopping website visit types and shopping pattern (쇼핑 웹사이트 탐색 유형과 방문 패턴 분석)

  • Choi, Kyungbin;Nam, Kihwan
    • Journal of Intelligence and Information Systems / v.25 no.1 / pp.85-107 / 2019
  • Online consumers either browse products from a particular product line or brand with the intent to buy, or simply navigate widely and leave without making a purchase. Research on the browsing and purchasing behavior of online consumers has progressed steadily, and related services and applications based on consumer behavior data have been developed in practice. In recent years, with the development of big data technology, customization strategies and recommendation systems have been used in attempts to optimize users' shopping experience. Even so, it is very unlikely that an online consumer who visits a website will actually move on to the purchase stage. This is because online consumers do not visit websites only to buy products; they use and browse websites differently according to their shopping motives and purposes. Therefore, to understand the behavior of online consumers, it is important to analyze various types of visits, not just visits that lead to purchases. In this study, we performed a session-level clustering analysis on clickstream data from an e-commerce company to explain the diversity and complexity of online consumers' search behavior and to typify that behavior. For the analysis, we converted more than 8 million page-level data points into visit-level sessions, yielding over 500,000 website visit sessions in total. For each session, 12 characteristics, such as page views, duration, search diversity, and page type concentration, were extracted for clustering analysis. Given the size of the dataset, we used the Mini-Batch K-means algorithm, which offers advantages in learning speed and efficiency while maintaining clustering performance similar to that of standard K-means. 
The optimal number of clusters was found to be four, and differences in session-level characteristics and purchase rates were identified for each cluster. Online consumers visit a website several times, learn about a product, and then decide to buy it. To analyze this purchasing process over multiple visits, we constructed each consumer's visit sequence data based on the in-site navigation patterns derived from the clustering analysis. The visit sequence data contain the series of visits up to one purchase, and the items in each sequence are the cluster labels derived above. We built separate sequence data for consumers who made purchases and for consumers who only explored products without purchasing during the same period. We then applied sequential pattern mining to extract frequent patterns from each sequence dataset. The minimum support was set to 10%, and frequent patterns consist of sequences of cluster labels. While some patterns were derived from both sequence datasets, other frequent patterns appeared in only one of them. A comparative analysis of the extracted frequent patterns showed that consumers who made purchases exhibited a visiting pattern of repeatedly searching for a specific product before deciding to buy it. The implication of this study is that it typifies the search behavior of online consumers using large-scale clickstream data and analyzes their patterns to explain the purchasing process from a data-driven perspective. Most studies that typify online consumers have focused on the characteristics of each type and on which factors are key to distinguishing the types. 
In this study, we typified the behavior of online consumers and further analyzed the order in which these types combine into series of search patterns. Online retailers can use these results to improve purchase conversion through marketing strategies and recommendations tailored to different visit types, and can evaluate the effect of those strategies through changes in consumers' visit patterns.
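
Two techniques from this abstract can be sketched in plain Python: mini-batch K-means, where each batch is assigned to the current centers and each center is then nudged with a per-center learning rate of 1/count (the usual mini-batch formulation), and support counting for sequential patterns over cluster-label sequences with a 10% minimum support. The brute-force pattern enumeration here is only practical for a few labels and short patterns; dedicated miners such as GSP or PrefixSpan prune the search. Function names, batch size, iteration count, and the farthest-point initialization are assumptions, and the sessionization of raw clickstream data is not shown.

```python
import math
import random
from itertools import product


def nearest_center(centers, x):
    """Index of the center closest to x."""
    return min(range(len(centers)), key=lambda j: math.dist(x, centers[j]))


def mini_batch_kmeans(X, k, batch_size=100, iters=50, seed=0):
    """Mini-batch K-means with per-center learning rate 1/count and farthest-point init."""
    rng = random.Random(seed)
    centers = [list(X[0])]
    while len(centers) < k:
        centers.append(list(max(X, key=lambda x: min(math.dist(x, c) for c in centers))))
    counts = [0] * k
    for _ in range(iters):
        batch = rng.sample(X, min(batch_size, len(X)))
        # Assign the whole batch first, using the centers as they were at batch start.
        assigned = [(x, nearest_center(centers, x)) for x in batch]
        for x, j in assigned:
            counts[j] += 1
            eta = 1.0 / counts[j]  # each center becomes a running mean of its points
            centers[j] = [(1 - eta) * c + eta * xi for c, xi in zip(centers[j], x)]
    return centers


def is_subsequence(pattern, sequence):
    """True if the items of pattern appear in sequence in the same order (gaps allowed)."""
    it = iter(sequence)
    return all(item in it for item in pattern)


def frequent_patterns(sequences, min_support=0.10, max_len=2):
    """Brute-force support counting for label patterns up to max_len (illustration only)."""
    labels = sorted({label for seq in sequences for label in seq})
    result = {}
    for length in range(1, max_len + 1):
        for pattern in product(labels, repeat=length):
            support = sum(is_subsequence(pattern, s) for s in sequences) / len(sequences)
            if support >= min_support:
                result[pattern] = support
    return result
```

In the study's setting, each session would be a 12-feature vector clustered into four groups, and each consumer's visits up to a purchase would become a sequence of those cluster labels passed to `frequent_patterns`, run separately for purchasers and non-purchasers so the resulting pattern sets can be compared.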

Design and Implementation of MongoDB-based Unstructured Log Processing System over Cloud Computing Environment (클라우드 환경에서 MongoDB 기반의 비정형 로그 처리 시스템 설계 및 구현)

  • Kim, Myoungjin;Han, Seungho;Cui, Yun;Lee, Hanku
    • Journal of Internet Computing and Services
    • /
    • v.14 no.6
    • /
    • pp.71-84
    • /
    • 2013
  • Log data, which record the multitude of information created when computer systems operate, are used in many processes, from computer system inspection and process optimization to customized user optimization. In this paper, we propose a MongoDB-based unstructured log processing system in a cloud environment for processing the massive amounts of log data generated by banks. Most of the log data generated during banking operations come from handling clients' business. Therefore, to gather, store, categorize, and analyze the log data generated while processing clients' business, a separate log data processing system needs to be established. However, in existing computing environments it is difficult to provide flexible storage expansion for massive amounts of unstructured log data and to execute the many functions needed to categorize and analyze the stored data. Thus, in this study, we use cloud computing technology to build a cloud-based log data processing system for unstructured log data that are difficult to handle with the analysis tools and management systems of the existing computing infrastructure. The proposed system uses an IaaS (Infrastructure as a Service) cloud environment to expand computing resources such as storage space and memory flexibly when storage needs grow or log data increase rapidly. Moreover, to overcome the processing limits of the existing analysis tools when real-time analysis of the aggregated unstructured log data is required, the proposed system includes a Hadoop-based analysis module for fast and reliable parallel-distributed processing of massive amounts of log data.
Furthermore, because HDFS (Hadoop Distributed File System) stores the aggregated log data as replicated blocks, the proposed system offers automatic recovery so that it can continue operating after a malfunction. Finally, by building a distributed database with the NoSQL database MongoDB, the proposed system provides an effective way of processing unstructured log data. Relational databases such as MySQL have complex schemas that are ill-suited to unstructured log data. Further, the strict schemas of relational databases make it difficult to add nodes and distribute the stored data across them when the amount of data increases rapidly. NoSQL databases do not support the complex computations that relational databases provide, but they can easily scale out by distributing data across nodes when the data volume grows rapidly; as non-relational databases, they have a structure suited to processing unstructured data. NoSQL data models are usually classified as key-value, column-oriented, and document-oriented. Of these, the proposed system uses MongoDB, a representative document-oriented database with a schema-free structure. MongoDB was chosen because its flexible schema makes unstructured log data easy to process, it allows flexible node expansion when the amount of data increases rapidly, and it provides an Auto-Sharding function that automatically expands storage. The proposed system is composed of a log collector module, a log graph generator module, a MongoDB module, a Hadoop-based analysis module, and a MySQL module.
When the log data generated throughout each bank's client business processes are sent to the cloud server, the log collector module collects them, classifies them by log type, and distributes them to the MongoDB module and the MySQL module. The log graph generator module renders the analysis results of the MongoDB module, the Hadoop-based analysis module, and the MySQL module by analysis time and log type, and presents them to the user through a web interface. Log data that require real-time analysis are stored in the MySQL module and presented in real time by the log graph generator module. Log data aggregated per unit time are stored in the MongoDB module and plotted according to the user's analysis conditions. The aggregated log data in the MongoDB module are processed in parallel by the Hadoop-based analysis module. A comparative evaluation of log insertion and query performance against a log processing system that uses only MySQL demonstrates the proposed system's superiority. Moreover, an optimal chunk size is identified by evaluating MongoDB's log insertion performance across various chunk sizes.
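The collector's routing and per-unit-time aggregation can be sketched in a few lines. This is a toy model under stated assumptions, not the authors' implementation: the two stores are in-memory stand-ins (no MySQL or MongoDB client is used), the record fields `type` and `timestamp` and the set of "real-time" log types are hypothetical, and the sketch assumes every record is also counted in its time bucket.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Hypothetical: log types that this sketch treats as needing real-time analysis.
REALTIME_TYPES = {"error", "transaction"}


@dataclass
class LogCollector:
    """Toy collector: routes records by type and aggregates counts per time unit."""
    unit_seconds: int = 60
    # In-memory stand-in for the MySQL module (real-time logs).
    realtime_store: list = field(default_factory=list)
    # In-memory stand-in for the MongoDB module: (type, bucket start) -> count.
    aggregate_store: dict = field(default_factory=lambda: defaultdict(int))

    def collect(self, record: dict) -> None:
        log_type = record["type"]
        if log_type in REALTIME_TYPES:
            self.realtime_store.append(record)
        # Assumption: every record is also counted in its per-unit-time bucket.
        bucket = record["timestamp"] - record["timestamp"] % self.unit_seconds
        self.aggregate_store[(log_type, bucket)] += 1

    def series(self, log_type: str) -> list:
        """(bucket start, count) pairs for one log type, e.g. for plotting."""
        return sorted(
            (bucket, count)
            for (kind, bucket), count in self.aggregate_store.items()
            if kind == log_type
        )


collector = LogCollector(unit_seconds=60)
for rec in [
    {"type": "error", "timestamp": 125},
    {"type": "login", "timestamp": 130},
    {"type": "login", "timestamp": 179},
    {"type": "login", "timestamp": 180},
]:
    collector.collect(rec)
print(collector.series("login"))  # [(120, 2), (180, 1)]
```

In the described system, the per-bucket documents would live in MongoDB and be processed by the Hadoop module; the dictionary here only illustrates the shape of that aggregation.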

An Exploratory Study on the Components of Visual Merchandising of Internet Shopping Mall (인터넷쇼핑몰의 VMD 구성요인에 대한 탐색적 연구)

  • Kim, Kwang-Seok;Shin, Jong-Kuk;Koo, Dong-Mo
    • Journal of Global Scholars of Marketing Science
    • /
    • v.18 no.2
    • /
    • pp.19-45
    • /
    • 2008
  • This study empirically examines the primary dimensions of visual merchandising (VMD) that make an internet shopping mall an attractive virtual store for shoppers, namely store design, merchandise, and merchandising cues. The authors reviewed the literature on the major components of VMD from the perspective of the AIDA model, which has mainly been applied to offline store settings. The major purposes of the study are as follows. First, it derives variables related to the components of visual merchandising by reviewing the existing literature, establishes hypotheses, and tests them empirically. Second, it examines the relationships between the components of VMD and the attitude toward the VMD, while putting more emphasis on identifying the component structure of VMD. VMD needs to be examined from the perspective that an online shopping mall is a virtual self-service or clerkless store, which can reduce the number of employees and help shoppers search, evaluate, and purchase for themselves; it should also be explored in terms of customers' in-store persuasion processes. This study reviewed the literature on store design, merchandise, and merchandising cues, which correspond to the store, product, and promotion, respectively. VMD is a total communication tool, and the AIDA model can explain the in-store behavior of online shoppers. Store design relates to attracting consumers' attention to the online mall, merchandise to product-related interest, and merchandising cues to promotions, such as recommendations and links, that induce the desire to purchase. These three steps can be seen as the process leading to the purchase action.
The theoretical rationale for the relationship between VMD and AIDA can be found in Tyagi (2005), who identified the three steps of consumer-oriented merchandising as the store, product assortment, and placement; in Omar (1999), who classified interior displays into architectural design displays, commodity displays, and point-of-sale (POS) displays; and in Davies and Ward (2005), who related the interior image of a retail store to atmosphere, merchandise, and in-store promotion. Lee et al. (2000) proposed merchandising cues, a shopping metaphor that assists search, store design, layout (web design), and product assortment as the components of web merchandising. First, store design, which includes differentiation, simplicity, and navigation, is assumed to relate to attention to the virtual store. Second, the merchandise dimensions, comprising product assortment, visual information, and product reputation, relate to interest in the product offerings. Finally, merchandising cues, which refer to the merchandiser's (MD's) product recommendations and the hyperlinks to relevant goods provided for the shopper, are concerned with inducing the desire to purchase. A questionnaire survey was conducted to collect data from consumers who shop at internet shopping malls frequently. To select the subject malls, ranking data published by a mall rating agency were used to identify the five most popular and the five least popular malls. Subjects were instructed to answer the questions after navigating the designated mall for five minutes. Of the 300 questionnaires distributed, 166 were used in the final analysis. The empirical testing focused on identifying and confirming the dimensionality of VMD and its subdimensions using structural equation modeling. The confirmatory factor analysis for the endogenous and exogenous variables was carried out in four parts.
Second-order factor analyses were performed for store design, merchandise, and merchandising cues, and a first-order confirmatory factor analysis was performed for the attitude toward the VMD. The model test results show that the chi-square value of the structural equation model is 144.39 (d.f. = 49), significant at the 0.01 level, which means the proposed model was rejected by the chi-square test. However, the ratio of the chi-square value to the degrees of freedom was 2.94, below the acceptable level of 3.0, while the RMR was 0.087, slightly higher than the generally acceptable level of 0.08. GFI and AGFI were 0.90 and 0.84, respectively; NFI and NNFI were both 0.94; and CFI was 0.95. The major test results are as follows. First, the second-order factor analysis and structural equation modeling confirm that differentiation, simplicity, and ease of identifying the current status of the transaction are subdimensions of store design and significant predictors of the dependent variable. This result implies that, when designing an online shopping mall, it is necessary to differentiate the mall visually from other malls to improve the communication effectiveness of the store design. That is, a differentiated store design heightens the contrast stimulus to the senses, which promotes memory of the store and a favorable attitude toward its VMD. The finding that navigation, meaning the ease of identifying one's current status while shopping, affects the attitude toward VMD can be interpreted as follows: navigating via hyperlinks, which is characteristic of internet shopping, is a complex cognitive process, and shoppers are likely to lack a sense of the store's overall structure. Consequently, shoppers are likely to get lost while shopping, not knowing where to go.
Orientation tools enhance the accessibility of information and improve shoppers' perception of the store environment (Titus & Everett, 1995). Second, the primary dimension of merchandise and each of its subdimensions were confirmed to be unidimensional and to have construct validity and nomological validity, in that the VMD dimensions were expected to correlate positively with the dependent variable. The subdimensions of product assortment, brand fame, and information provision proved to have a positive effect on the attitude toward the VMD. This can be interpreted to mean that the richer a mall's product and brand assortment, the more likely shoppers are to favor it. Brand fame and information provision also affect the VMD attitude: the more famous the brands, the more likely shoppers are to trust and feel familiar with the mall, and plentiful, visually presented information can lead shoppers to a favorable attitude toward the store's VMD. Third, the merchandising cues of product recommendations and hyperlinks were found to affect the VMD attitude. This can be interpreted to mean that recommended products reduce the uncertainty associated with the purchase decision and that hyperlinks to relevant products help shoppers save the cognitive effort spent on information search and gathering, both of which can lead to a favorable attitude toward the VMD. This study attempted to shed new light on the VMD of online stores by reviewing variables that the existing literature identifies as relevant to offline VMD, and to link the VMD components from the perspective of the AIDA model.
In terms of effect size on attitude, the VMD dimensions ranked merchandise first, store design second, and merchandising cues third. The internet is said to offer unlimited display space; however, the virtual store is not unlimited, since consumers have a limited cognitive capacity for processing external information and a limited internal memory. In particular, shoppers are likely to face difficulties in decision making because of too many alternatives and information overload. Therefore, internet shopping mall managers should take consumers' information search costs into account when establishing optimal product placements and search routes. An efficient store composition can be achieved by reducing the psychological burden and cognitive effort involved in information search and the evaluation of alternatives. Store image is largely determined by the product categories and brands a store carries. The results of this study support this proposition: since merchandise matters more to the VMD attitude than the other components, managers need to take a strategic approach to VMD. As internet users shop online for longer periods, they become more accustomed to and knowledgeable about the internet medium and more likely to accept it as a shopping channel. Web merchandisers should be aware that product introductions using moving pictures and bulletin boards are becoming more important for presenting interactive product information visually and communicating with customers more actively, thereby enriching both the quantity and the quality of product information.
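The fit assessment above combines the normed chi-square (chi-square divided by degrees of freedom) with the RMR. As a quick arithmetic check, the sketch below recomputes the ratio from the reported chi-square (144.39) and degrees of freedom (49) and compares both indices with the cutoffs quoted in the abstract (3.0 and 0.08). The function name is hypothetical, and treating "acceptable RMR" as "at or below 0.08" is an assumption. The exact ratio is about 2.947, which the abstract reports as 2.94.

```python
def fit_summary(chi_square, df, rmr, ratio_cutoff=3.0, rmr_cutoff=0.08):
    """Check two SEM fit indices against the cutoffs quoted in the abstract."""
    ratio = chi_square / df  # normed chi-square (chi-square / degrees of freedom)
    return {
        "chi2_df_ratio": ratio,
        "ratio_acceptable": ratio < ratio_cutoff,
        # Assumption: an RMR at or below the cutoff counts as acceptable.
        "rmr_acceptable": rmr <= rmr_cutoff,
    }


summary = fit_summary(chi_square=144.39, df=49, rmr=0.087)
print(round(summary["chi2_df_ratio"], 3))  # 2.947
print(summary["ratio_acceptable"], summary["rmr_acceptable"])  # True False
```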

A study on the air pollutant emission trends in Gwangju (광주시 대기오염물질 배출량 변화추이에 관한 연구)

  • Seo, Gwang-Yeob;Shin, Dae-Yewn
    • Journal of environmental and Sanitary engineering
    • /
    • v.24 no.4
    • /
    • pp.1-26
    • /
    • 2009
  • We draw the following conclusions from air pollution data measured by the urban air-monitoring network administered and managed by Gwangju over the seven years from January 2001 to December 2007. We also used major statistics compiled by Gwangju City and national official statistics for Gwangju derived from the National Institute of Environmental Research's estimates of national air pollutant emissions. The results are as follows:
    1. By managing authority, air-emission facilities are distributed as follows: Gwangju City Hall (67.8%) > Gwangsan District Office (13.6%) > Buk District Office (9.8%) > Seo District Office (5.5%) > Nam District Office (3.0%) > Dong District Office (0.3%). By district: Buk District (32.8%) > Gwangsan District (22.4%) > Seo District (21.8%) > Nam District (14.9%) > Dong District (8.1%). By type (2004~2007 average): Type 5 (45.2%) > Type 4 (40.7%) > Type 3 (8.6%) > Type 2 (3.2%) > Type 1 (2.2%); most are small facilities of Types 4 and 5.
    2. Car registrations by district are distributed as follows: Buk District (32.8%) > Gwangsan District (22.4%) > Seo District (21.8%) > Nam District (14.9%) > Dong District (8.1%). By fuel, the distribution in 2001 was Gasoline (56.3%) > Diesel (30.3%) > LPG (13.4%) > etc. (0.2%). The ranking was unchanged in 2007: Gasoline (47.8%) > Diesel (35.6%) > LPG (16.2%) > etc. (0.4%). The number of gasoline cars increased slightly, whereas the numbers of diesel and LPG cars increased remarkably.
    3. By pollutant, air pollutant emissions in Gwangju are distributed as follows: CO (36.7%) > NOx (32.7%) > VOC (26.7%) > SOx (2.3%) > PM-10 (1.5%). CO and NOx, which are mainly emitted by cars, account for a very large share.
    4. Mean emissions of air pollutants (SOx, NOx, CO, VOC, PM-10) over the five years 2001~2005 are distributed by district as follows: Buk District (31.0%) > Gwangsan District (28.2%) > Seo District (20.4%) > Nam District (12.5%) > Dong District (7.9%). Emissions were highest in Buk District, which has the largest population, the most car registrations, and the most air-emission businesses, and lowest in Dong District, which has the fewest of each.
    5. The average source shares of SOx emissions in Gwangju over 2001~2005 are: Non-industrial combustion (59.5%) > Combustion in manufacturing industry (20.4%) > Road transportation (11.4%) > Non-road transportation (3.8%) > Waste disposal (3.7%) > Production process (1.1%). By district, average SOx emissions are distributed as Gwangsan District (33.3%) > Buk District (28.0%) > Seo District (19.3%) > Nam District (10.2%) > Dong District (9.1%).
    6. NOx emissions in Gwangju by source: Road transportation (59.1%) > Non-road transportation (18.9%) > Non-industrial combustion (13.3%) > Combustion in manufacturing industry (6.9%) > Waste disposal (1.6%) > Production process (0.1%). By district: Buk District (30.7%) > Gwangsan District (28.8%) > Seo District (20.5%) > Nam District (12.2%) > Dong District (7.8%).
    7. Carbon monoxide emissions in Gwangju by source: Road transportation (82.0%) > Non-industrial combustion (10.6%) > Non-road transportation (5.4%) > Combustion in manufacturing industry (1.7%) > Waste disposal (0.3%). By district: Buk District (33.0%) > Seo District (22.3%) > Gwangsan District (21.3%) > Nam District (14.3%) > Dong District (9.1%).
    8. Volatile organic compound (VOC) emissions in Gwangju by source: Solvent utilization (69.5%) > Road transportation (19.8%) > Energy storage & transport (4.4%) > Non-road transportation (2.8%) > Waste disposal (2.4%) > Non-industrial combustion (0.5%) > Production process (0.4%) > Combustion in manufacturing industry (0.3%). By district: Gwangsan District (36.8%) > Buk District (28.7%) > Seo District (17.8%) > Nam District (10.4%) > Dong District (6.3%).
    9. Fine dust (PM-10) emissions in Gwangju by source: Road transportation (76.7%) > Non-road transportation (16.3%) > Non-industrial combustion (6.1%) > Combustion in manufacturing industry (0.7%) > Waste disposal (0.2%) > Production process (0.1%). By district: Buk District (32.8%) > Gwangsan District (26.0%) > Seo District (19.5%) > Nam District (13.2%) > Dong District (8.5%).
    10. The major emission source of sulfur oxides is non-industrial combustion, i.e., heating in residences, businesses, and agriculture and livestock farming. The major source of NOx, carbon monoxide, and fine dust is road transportation, i.e., emissions from cars and two-wheeled vehicles. The major source of VOC is solvent-utilization facilities.
    11. The concentration of sulfur dioxide ($SO_2$) has remained at 0.004 ppm since 2001, with no year-to-year change, which suggests that $SO_2$ levels have reached a stable stage. This is attributed to the steady shift in fuel use from solid or liquid fuels to low-sulfur liquid fuels containing very little sulfur, or to gas, which has kept the concentration nearly constant.
    12. Over the course of the day, the concentration of NO was relatively higher than that of $NO_2$ between 6 AM and 1 PM, while $NO_2$ was higher at other times. NOx (NO, $NO_2$) concentrations were relatively high on weekday evenings. This suggests a correlation between NOx concentration and car traffic, consistent with road transportation accounting for 59.1% of NOx emissions.
    13. Regarding the relationship between PM-10 and PM-2.5, PM-2.5 generally accounts for 49.1~61.2% of PM-10, with the lowest shares (44.5~45.4%) in March and April. This indicates that more of the yellow-sand particles transported from China during that period are larger than $2.5\;\mu m$ than smaller. The results also show that particles smaller than $2.5\;\mu m$ make up a larger share of dust during July~August and December~January, and that road transportation accounts for 76.7% of fine dust emissions in Gwangju.
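Each result above reports emission amounts as percentage shares ranked from largest to smallest. The sketch below shows that conversion under simple assumptions: the function names are hypothetical, the input amounts are made up (not the study's data), and shares are rounded to one decimal place as in the abstract, so rounded shares may not sum to exactly 100%.

```python
def share_ranking(amounts, digits=1):
    """Convert absolute emission amounts into percentage shares, largest first."""
    total = sum(amounts.values())
    ranked = sorted(amounts.items(), key=lambda kv: kv[1], reverse=True)
    return [(name, round(100 * value / total, digits)) for name, value in ranked]


def format_ranking(ranking):
    """Render shares in the abstract's 'A (x%) > B (y%)' style."""
    return " > ".join(f"{name} ({pct}%)" for name, pct in ranking)


# Made-up annual amounts (tons); NOT the study's data.
example = {"Waste disposal": 20.0, "Road transportation": 50.0, "Non-road transportation": 30.0}
print(format_ranking(share_ranking(example)))
# Road transportation (50.0%) > Non-road transportation (30.0%) > Waste disposal (20.0%)
```

Sorting on the raw amounts rather than the rounded shares keeps the ranking correct even when two categories round to the same percentage.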