• Title/Summary/Keyword: computing


Twitter Issue Tracking System by Topic Modeling Techniques (토픽 모델링을 이용한 트위터 이슈 트래킹 시스템)

  • Bae, Jung-Hwan;Han, Nam-Gi;Song, Min
    • Journal of Intelligence and Information Systems / v.20 no.2 / pp.109-122 / 2014
  • People now create a tremendous amount of data on Social Network Services (SNS). In particular, the incorporation of SNS into mobile devices has resulted in massive data generation, which greatly influences society. This phenomenon is unprecedented, and we now live in the Age of Big Data. SNS data qualifies as Big Data because it satisfies all three defining conditions: the amount of data (volume), the speed of data input and output (velocity), and the variety of data types (variety). Because SNS Big Data covers the whole of society, discovering the trend of an issue in it can provide an important new source for creating value. In this study, a Twitter Issue Tracking System (TITS) is designed and built to meet the needs of analyzing SNS Big Data. TITS extracts issues from Twitter texts and visualizes them on the web. The proposed system provides the following four functions: (1) it provides the topic keyword set corresponding to each daily ranking; (2) it visualizes the daily time-series graph of a topic over a month; (3) it shows the importance of a topic through a treemap based on a score system and frequency; and (4) it visualizes the daily time-series graph of a keyword when the user searches for that keyword. The present study analyzes the Big Data generated by SNS in real time. SNS Big Data analysis requires various natural language processing techniques, including stop-word removal and noun extraction, to process the many unrefined forms of unstructured data. In addition, such analysis requires the latest big data technology, such as the Hadoop distributed system or NoSQL (an alternative to relational databases), to process large amounts of real-time data rapidly. We built TITS on Hadoop to optimize big data processing, because Hadoop is designed to scale up from a single node to thousands of machines. 
Furthermore, we use MongoDB, which is classified as a NoSQL database. MongoDB is an open-source, document-oriented database that provides high performance, high availability, and automatic scaling. Unlike a relational database, MongoDB has no fixed schema or tables, and its most important goals are data accessibility and data processing performance. In the Age of Big Data, visualization is especially attractive to the Big Data community because it helps analysts examine such data easily and clearly. Therefore, TITS uses the d3.js library as its visualization tool. This library is designed for creating Data-Driven Documents, which bind arbitrary data to the Document Object Model (DOM); this makes interaction with data easy and is useful for managing real-time data streams with smooth animation. In addition, TITS uses Bootstrap, which consists of preconfigured style sheets and JavaScript plug-ins, to build the web system. The TITS graphical user interface (GUI) is designed with these libraries and makes it possible to detect issues on Twitter easily and intuitively. The proposed work demonstrates the superiority of our issue detection techniques by matching detected issues with corresponding online news articles. The contributions of the present study are threefold. First, we suggest an alternative approach to real-time big data analysis, which has become an extremely important issue. Second, we apply a topic modeling technique that is used in various research areas, including Library and Information Science (LIS); based on this, we confirm the utility of storytelling and time-series analysis. Third, we develop a web-based system and make it available for the real-time discovery of topics. The present study conducted experiments with nearly 150 million tweets posted in Korea during March 2013.
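The counting side of functions (1) and (4) can be sketched as a MapReduce-style job. The Python below is a minimal illustration under stated assumptions, not the TITS code: it performs no topic modeling, the stop-word list is hypothetical, whitespace splitting stands in for Korean noun extraction, and the helper names `map_tweet`, `reduce_counts`, `daily_ranking` and `keyword_series` are our own. `daily_ranking` is only a keyword-level analogue of the daily topic ranking. On Hadoop the map and reduce steps would run on separate nodes; here they are plain functions.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, List, Tuple

Key = Tuple[str, str]  # (date, keyword)

# Hypothetical stop-word list; real Korean tweets would need a morphological
# analyzer for noun extraction rather than whitespace splitting.
STOP_WORDS = {"a", "and", "is", "of", "rt", "the"}


def map_tweet(date: str, text: str) -> Iterator[Tuple[Key, int]]:
    """Map step: emit ((date, keyword), 1) for each token that is not a stop word."""
    for token in text.lower().split():
        token = token.strip("#@.,!?")
        if token and token not in STOP_WORDS:
            yield (date, token), 1


def reduce_counts(pairs: Iterable[Tuple[Key, int]]) -> Dict[Key, int]:
    """Reduce step: sum the counts emitted for each (date, keyword) key."""
    totals: Counter = Counter()
    for key, count in pairs:
        totals[key] += count
    return dict(totals)


def daily_ranking(counts: Dict[Key, int], date: str, k: int = 10) -> List[str]:
    """Top-k keywords for one day (ties keep first-seen order)."""
    day = Counter({kw: n for (d, kw), n in counts.items() if d == date})
    return [kw for kw, _ in day.most_common(k)]


def keyword_series(counts: Dict[Key, int], keyword: str) -> List[Tuple[str, int]]:
    """Daily time series for a searched keyword, sorted by date string."""
    return sorted((d, n) for (d, kw), n in counts.items() if kw == keyword)


if __name__ == "__main__":
    tweets = [
        ("2013-03-01", "Hadoop and MongoDB"),
        ("2013-03-01", "RT hadoop cluster"),
        ("2013-03-02", "hadoop again"),
    ]
    counts = reduce_counts(p for date, text in tweets for p in map_tweet(date, text))
    print(keyword_series(counts, "hadoop"))  # [('2013-03-01', 2), ('2013-03-02', 1)]
    print(daily_ranking(counts, "2013-03-01", k=1))  # ['hadoop']
```

Storing `counts` keyed by (date, keyword) keeps both the daily ranking and the per-keyword series as simple filters over one table, which is roughly the shape a document store such as MongoDB would hold.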

Effects of firm strategies on customer acquisition of Software as a Service (SaaS) providers: A mediating and moderating role of SaaS technology maturity (SaaS 기업의 차별화 및 가격전략이 고객획득성과에 미치는 영향: SaaS 기술성숙도 수준의 매개효과 및 조절효과를 중심으로)

  • Chae, SeongWook;Park, Sungbum
    • Journal of Intelligence and Information Systems / v.20 no.3 / pp.151-171 / 2014
  • Firms today seek management effectiveness and efficiency through information technology (IT). Numerous firms outsource specific information systems functions to cope with a shortage of information resources or IT experts, or to reduce capital costs. Recently, Software-as-a-Service (SaaS), a new type of information system, has become one of the most powerful outsourcing alternatives. SaaS is software that is hosted by a provider and accessed over the internet. It embodies the ideas of on-demand, pay-per-use, and utility computing, and is now being applied to support the core competencies of clients in areas ranging from individual productivity to vertical industries and e-commerce. In this study, we seek to quantify the value of SaaS for business performance by examining the relationships among firm strategies, SaaS technology maturity, and the business performance of SaaS providers. We begin by drawing on prior literature on SaaS, technology maturity, and firm strategy. SaaS technology maturity is classified into three phases: application service providing (ASP), Web-native application, and Web-service application. Firm strategies are operationalized as a low-cost strategy and a differentiation strategy. Finally, we consider customer acquisition as the measure of business performance. Accordingly, the specific objectives of this study are as follows. First, we examine the relationships between customer acquisition performance and both the low-cost strategy and the differentiation strategy of SaaS providers. Second, we investigate the mediating and moderating effects of SaaS technology maturity on those relationships. For this purpose, a professional research institution collected questionnaire data from SaaS providers and their lines of applications registered in the CNK (Commerce net Korea) database in Korea. 
The unit of analysis in this study is the strategic business unit (SBU) of the software provider. A total of 199 SBUs are used to test our hypotheses. To measure firm strategy, we use three items for the differentiation strategy, namely application uniqueness (whether an application aims to differentiate within only one or a small number of target industries), supply channel diversification (whether the SaaS vendor has a diversified supply chain), and the number of areas of specialized expertise, and two items for the low-cost strategy, namely the subscription fee and the initial set-up fee. We employ hierarchical regression analysis to test the moderating effects of SaaS technology maturity and follow Baron and Kenny's procedure to determine whether firm strategies affect customer acquisition through technology maturity. The empirical results reveal, first, that when a differentiation strategy is applied to attain business performance such as customer acquisition, the effect of the strategy is moderated by the technology maturity level of the SaaS provider. In other words, securing a higher level of SaaS technology maturity is essential for higher business performance. For instance, when firms implement application uniqueness or distribution channel diversification as a differentiation strategy, they acquire more customers when their level of SaaS technology maturity is higher rather than lower. Second, the results indicate that both the differentiation strategy and the low-cost strategy work effectively for SaaS providers in obtaining customers, which means that continuously differentiating their service from others or lowering their service fees (subscription fee or initial set-up fee) helps their business succeed in terms of customer acquisition. Lastly, the results show that the level of SaaS technology maturity mediates the relationship between the low-cost strategy and customer acquisition. 
That is, based on our research design, customers usually perceive the real value of a low subscription fee or initial set-up fee only through the SaaS service provided by the vendor, and this, in turn, affects their decision on whether to subscribe.
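The two analysis steps named above can be illustrated with a small pure-Python sketch, assuming continuous predictors and ignoring significance testing: `moderation_test` runs a two-step hierarchical regression and reports the R^2 gain from adding the interaction term, and `baron_kenny` returns the three Baron-Kenny regression paths. The function names and any data fed to them are hypothetical; a real analysis would use a statistics package and test each coefficient for significance.

```python
from typing import Dict, List, Sequence, Tuple


def ols(X: List[List[float]], y: Sequence[float]) -> Tuple[List[float], float]:
    """Ordinary least squares with an intercept; returns (coefficients, R^2).

    Solves the normal equations (X'X) b = X'y by Gauss-Jordan elimination.
    Coefficient 0 is the intercept, followed by one per column of X.
    """
    rows = [[1.0] + [float(v) for v in r] for r in X]
    p = len(rows[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(p)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(p)
    ]
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[pivot] = A[pivot], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    beta = [A[i][p] / A[i][i] for i in range(p)]
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return beta, 1.0 - ss_res / ss_tot


def moderation_test(x: Sequence[float], z: Sequence[float],
                    y: Sequence[float]) -> Tuple[float, float]:
    """Hierarchical regression: step 1 is y ~ x + z, step 2 adds x*z.

    Returns (R^2 gain from the interaction, interaction coefficient).
    """
    _, r2_main = ols([[xi, zi] for xi, zi in zip(x, z)], y)
    beta, r2_full = ols([[xi, zi, xi * zi] for xi, zi in zip(x, z)], y)
    return r2_full - r2_main, beta[3]


def baron_kenny(x: Sequence[float], m: Sequence[float],
                y: Sequence[float]) -> Dict[str, float]:
    """Baron-Kenny path coefficients (significance tests omitted)."""
    c = ols([[xi] for xi in x], y)[0][1]  # step 1: X -> Y, total effect c
    a = ols([[xi] for xi in x], m)[0][1]  # step 2: X -> M, path a
    full = ols([[xi, mi] for xi, mi in zip(x, m)], y)[0]  # step 3: X + M -> Y
    return {"c": c, "a": a, "c_prime": full[1], "b": full[2]}
```

Under Baron and Kenny's logic, mediation is suggested when paths a and b are significant and the direct effect c' is smaller in magnitude than the total effect c; moderation is suggested when the interaction term adds explanatory power in the second step.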

Export Control System based on Case Based Reasoning: Design and Evaluation (사례 기반 지능형 수출통제 시스템 : 설계와 평가)

  • Hong, Woneui;Kim, Uihyun;Cho, Sinhee;Kim, Sansung;Yi, Mun Yong;Shin, Donghoon
    • Journal of Intelligence and Information Systems / v.20 no.3 / pp.109-131 / 2014
  • As the demand for nuclear power plant equipment grows worldwide, the importance of handling nuclear strategic materials is also increasing. While the number of cases submitted for the export of nuclear-power commodities and technology is increasing dramatically, the preadjudication (or, simply, prescreening) of strategic materials has so far been performed by experts with long experience and extensive field knowledge. However, there is a severe shortage of experts in this domain, not to mention that it takes a long time to develop one. Because human experts must manually evaluate all the documents submitted for export permission, the current practice of nuclear material export is neither time-efficient nor cost-effective. To reduce reliance on costly human experts alone, our research proposes a new system designed to help field experts make their decisions more effectively and efficiently. The proposed system is built on case-based reasoning, which in essence extracts key features from existing cases, compares them with the features of a new case, and derives a solution for the new case by referencing similar cases and their solutions. Our research proposes a framework for a case-based reasoning system, designs a case-based reasoning system for controlling nuclear material exports, and evaluates the performance of alternative keyword extraction methods (fully automatic, fully manual, and semi-automatic). A keyword extraction method is an essential component of a case-based reasoning system because it is used to extract the key features of the cases. The fully automatic method was implemented with TF-IDF, a widely used de facto standard method for representative keyword extraction in text mining. 
TF (term frequency) is based on the frequency count of a term within a document and shows how important the term is within that document, while IDF (inverse document frequency) is based on the infrequency of the term across a document set and shows how uniquely the term represents a document. The results show that the semi-automatic approach, which is based on collaboration between machine and human, is the most effective solution regardless of whether the human is a field expert or a student majoring in nuclear engineering. Moreover, we propose a new approach to computing nuclear document similarity, along with a new framework for document analysis. The proposed nuclear document similarity algorithm considers both document-to-document similarity (${\alpha}$) and document-to-nuclear-system similarity (${\beta}$) in order to derive the final score (${\gamma}$) used to decide whether the presented case involves strategic material. The final score (${\gamma}$) represents the document similarity between the past cases and the new case. It is derived not only from conventional TF-IDF but also from a nuclear system similarity score, which takes the context of the nuclear system domain into account. Finally, the system retrieves the top three documents in the case base that are considered most similar to the new case and presents them with a degree of credibility. With this final score and the credibility score, a user can more easily see which documents in the case base are worth consulting and can therefore make a proper decision at relatively low cost. The system was evaluated by developing a prototype and testing it with field data. The system workflows and outcomes have been verified by field experts. 
This research is expected to contribute to the growth of the knowledge service industry by proposing a new system that can effectively reduce reliance on costly human experts for the export control of nuclear materials and that can serve as a meaningful example of a knowledge service application.
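A minimal sketch of the retrieval step, assuming whitespace-tokenized documents, raw-count TF and log(N/df) IDF: it scores each past case by TF-IDF cosine similarity (standing in for α) and returns the top three. The source does not say how α and β are combined into γ, so the optional linear mix controlled by `weight` is a hypothetical placeholder, as are all function names and the idea of passing β as precomputed per-case scores.

```python
import math
from collections import Counter
from typing import Dict, List, Optional, Sequence, Tuple

Vector = Dict[str, float]


def tfidf_vectors(docs: List[List[str]]) -> List[Vector]:
    """Weight each term by raw count (TF) times log(N / document frequency) (IDF)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [{t: tf * math.log(n / df[t]) for t, tf in Counter(doc).items()} for doc in docs]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def retrieve_top_k(case_base: List[List[str]], new_case: List[str], k: int = 3,
                   beta: Optional[Sequence[float]] = None,
                   weight: float = 0.5) -> List[Tuple[int, float]]:
    """Rank past cases by similarity to new_case; return the top k (index, score) pairs.

    alpha is TF-IDF cosine similarity. If beta (one hypothetical domain score
    per past case) is given, gamma = weight * alpha + (1 - weight) * beta;
    this linear mix is an illustrative assumption, not the paper's formula.
    """
    vectors = tfidf_vectors(case_base + [new_case])  # IDF over cases + query
    query = vectors[-1]
    scored = []
    for i, vec in enumerate(vectors[:-1]):
        alpha = cosine(query, vec)
        gamma = alpha if beta is None else weight * alpha + (1 - weight) * beta[i]
        scored.append((i, gamma))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

With `beta=None` the ranking is pure TF-IDF; supplying domain scores lets a case with no shared vocabulary still surface, which is the kind of behavior the nuclear system similarity is meant to add.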

Open Digital Textbook for Smart Education (스마트교육을 위한 오픈 디지털교과서)

  • Koo, Young-Il;Park, Choong-Shik
    • Journal of Intelligence and Information Systems / v.19 no.2 / pp.177-189 / 2013
  • In Smart Education, the digital textbook plays a very important role as the face-to-face medium for learners. Standardizing the digital textbook will promote the digital textbook industry for content providers and distributors as well as for learners and instructors. This study looks for ways to standardize digital textbooks oriented toward the following three objectives. (1) Digital textbooks should serve as the medium for blended learning that supports online and offline classes, should operate on a common EPUB viewer without a special dedicated viewer, and should utilize the existing framework of e-learning content and learning management. EPUB is considered as the standard for digital textbooks because digital textbooks then need no separate standard for the book format and can take advantage of the industrial base of EPUB standards, with its rich content and distribution structure. (2) Digital textbooks should provide a low-cost open market service using currently available standard open software. (3) To provide appropriate learning feedback to students, digital textbooks should provide a foundation that accumulates and manages all learning activity information on a standard infrastructure for educational Big Data processing. In this study, the digital textbook in a smart education environment is referred to as the open digital textbook. 
The components of the open digital textbook service framework are (1) digital textbook terminals such as smart pads, smart TVs, smartphones, and PCs; (2) a digital textbook platform that shows and runs digital content on these terminals; (3) a learning content repository in the cloud that maintains accredited learning content; (4) an App Store through which learning content developers provide and distribute secondary learning content and learning tools; and (5) an LMS, a learning support and management tool that on-site class teachers use to create classroom instruction materials. In addition, all of the hardware and software that implement a smart education service should be located in the cloud to take advantage of cloud computing for efficient management and reduced expense. The open digital textbook of smart education is considered to provide learners with an e-book-style interface to the LMS. In open digital textbooks, the representation of text, images, audio, video, equations, etc. is a basic function, but painting, writing, problem solving, etc. are beyond the capabilities of a simple e-book. Teacher-to-student, learner-to-learner, and team-to-team communication through the open digital textbook is required. To represent student demographics, portfolio information, and class information, it is desirable to use the standards already used in e-learning. To process learner tracking information about learner activities for the LMS (Learning Management System), the open digital textbook must have a recording function and a function for communicating with the LMS. DRM is a function for protecting various copyrights. Currently, the DRM of an e-book is controlled by the corresponding book viewer. If the open digital textbook accepts the various DRM standards already used by different e-book viewers, redundant implementation of these features can be avoided. 
Security and privacy functions are required to protect information about study or instruction from third parties. UDL (Universal Design for Learning) is a learning support function for people with disabilities who have difficulty taking courses. The open digital textbook, which is based on the e-book standard EPUB 3.0, must (1) record learning activity log information and (2) communicate with the server to support learning activities. Because the recording and communication functions are not defined by current standards, they are implemented in JavaScript and used in current EPUB 3.0 viewers; a strategy is therefore needed for proposing these recording and communication functions as part of the next-generation e-book standard or as a special standard (EPUB 3.0 for education). In future research, we will implement an open-source program based on the proposed open digital textbook standard and present new educational services, including Big Data analysis.
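The recording and LMS-communication functions described above would run as JavaScript inside an EPUB 3.0 viewer; the Python below only sketches their data flow under stated assumptions. The record fields (`learner_id`, `verb`, `object_id`, `timestamp`, `result`), the batching policy, and the injected `send` transport are all hypothetical and are not taken from any e-learning standard.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Callable, Dict, List


@dataclass
class ActivityRecord:
    learner_id: str
    verb: str        # hypothetical vocabulary, e.g. "answered", "highlighted"
    object_id: str   # hypothetical id of the EPUB content item acted on
    timestamp: str   # ISO 8601 string
    result: Dict[str, object] = field(default_factory=dict)


class ActivityRecorder:
    """Buffers learning-activity records and sends them to an LMS in batches.

    `send` is a hypothetical transport: it receives a JSON string and returns
    True on success. On failure the records stay buffered for a later retry.
    """

    def __init__(self, send: Callable[[str], bool], batch_size: int = 10) -> None:
        self.send = send
        self.batch_size = batch_size
        self.buffer: List[ActivityRecord] = []

    def record(self, rec: ActivityRecord) -> None:
        """Recording function: log one activity, flushing when the batch is full."""
        self.buffer.append(rec)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self) -> bool:
        """Communication function: serialize buffered records and hand them to `send`."""
        if not self.buffer:
            return True
        payload = json.dumps([asdict(r) for r in self.buffer])
        if self.send(payload):
            self.buffer.clear()
            return True
        return False  # keep records, e.g. while the reader is offline
```

Keeping failed batches in the buffer reflects one plausible design choice for textbooks that are read offline; a viewer could equally persist the buffer locally between sessions.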

Aspect-Based Sentiment Analysis Using BERT: Developing Aspect Category Sentiment Classification Models (BERT를 활용한 속성기반 감성분석: 속성카테고리 감성분류 모델 개발)

  • Park, Hyun-jung;Shin, Kyung-shik
    • Journal of Intelligence and Information Systems / v.26 no.4 / pp.1-25 / 2020
  • Sentiment Analysis (SA) is a Natural Language Processing (NLP) task that analyzes, from written text, the sentiments that consumers or the public feel about an arbitrary object. Aspect-Based Sentiment Analysis (ABSA) is a fine-grained analysis of the sentiment toward each aspect of an object. Because it has more practical value for business, ABSA is drawing attention from both academia and industry. For example, given the review "The restaurant is expensive but the food is really fantastic", general SA evaluates the overall sentiment toward the 'restaurant' as 'positive', whereas ABSA identifies the restaurant's 'price' aspect as 'negative' and its 'food' aspect as 'positive'. Thus, ABSA enables more specific and effective marketing strategies. To perform ABSA, it is necessary to identify the aspect terms or aspect categories included in the text and to judge the sentiment toward them. Accordingly, ABSA has four main areas: aspect term extraction, aspect category detection, Aspect Term Sentiment Classification (ATSC), and Aspect Category Sentiment Classification (ACSC). ABSA is usually conducted either by extracting aspect terms and then performing ATSC to analyze the sentiment toward the given aspect terms, or by extracting aspect categories and then performing ACSC to analyze the sentiment toward the given aspect categories. Here, an aspect category is expressed by one or more aspect terms or is inferred indirectly from other words. In the preceding example sentence, 'price' and 'food' are both aspect categories, and the aspect category 'food' is expressed by the aspect term 'food' in the review. If the review sentence included 'pasta', 'steak', or 'grilled chicken special', these could all be aspect terms for the aspect category 'food'. An aspect category referred to by one or more specific aspect terms in this way is called an explicit aspect. 
On the other hand, an aspect category such as 'price', which has no specific aspect term but can be inferred indirectly from the emotional word 'expensive', is called an implicit aspect. So far, the term 'aspect category' has been used to avoid confusion with 'aspect term'. From now on, we treat 'aspect category' and 'aspect' as the same concept and, for convenience, mostly use the word 'aspect'. Note that ATSC analyzes the sentiment toward given aspect terms, so it deals only with explicit aspects, whereas ACSC handles both explicit and implicit aspects. This study seeks answers to the following questions, ignored in previous studies, about applying the BERT pre-trained language model to ACSC, and derives superior ACSC models. First, is it more effective to reflect the output vectors of the aspect category tokens than to use only the final output vector of the [CLS] token as the classification vector? Second, is there any performance difference between the QA (Question Answering) and NLI (Natural Language Inference) types of sentence-pair configuration of the input data? Third, does performance differ according to the order of the sentence containing the aspect category in the QA- or NLI-type sentence-pair configuration? To achieve these research objectives, we implemented 12 ACSC models and conducted experiments on 4 English benchmark datasets. As a result, we derived ACSC models that outperform those of existing studies without expanding the training dataset. In addition, we found that reflecting the output vector of the aspect category token is more effective than using only the output vector of the [CLS] token as the classification vector. We also found that QA-type input generally performs better than NLI-type input, and that in the QA type the order of the sentence containing the aspect category is irrelevant to performance. 
There may be some differences depending on the characteristics of the dataset, but when NLI-type sentence-pair input is used, placing the sentence containing the aspect category second seems to provide better performance. The new methodology for designing ACSC models used in this study could be applied similarly to other tasks such as ATSC.
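The sentence-pair configurations compared in the study can be sketched as plain string construction. This is a minimal illustration assuming whitespace tokens: the QA question template is hypothetical (the source does not give its wording), and real BERT input is tokenized with WordPiece, so multi-word or split aspects would need an extra position mapping. `aspect_positions` marks the tokens whose output vectors could be pooled together with `[CLS]`, which is the idea behind the first research question.

```python
from typing import List, Tuple

# Hypothetical auxiliary-question template; the study's exact wording is not given.
QA_TEMPLATE = "what do you think of the {aspect} of it ?"


def build_pair(review: str, aspect: str, style: str = "QA",
               aspect_first: bool = False) -> Tuple[str, str]:
    """Return the (first, second) sentence pair for one review/aspect example."""
    if style == "QA":
        aux = QA_TEMPLATE.format(aspect=aspect)
    elif style == "NLI":
        aux = aspect  # NLI style: the bare aspect category as a pseudo-hypothesis
    else:
        raise ValueError(f"unknown style: {style}")
    return (aux, review) if aspect_first else (review, aux)


def to_bert_tokens(pair: Tuple[str, str]) -> List[str]:
    """Lay out [CLS] first [SEP] second [SEP] using whitespace tokens.

    Real BERT uses WordPiece, so a word may split into several sub-tokens.
    """
    first, second = pair
    return ["[CLS]"] + first.split() + ["[SEP]"] + second.split() + ["[SEP]"]


def aspect_positions(tokens: List[str], aspect: str) -> List[int]:
    """Indices of single-word aspect tokens whose outputs could join [CLS]."""
    return [i for i, tok in enumerate(tokens) if tok == aspect]
```

For the review used above and the aspect 'price', the default QA-style pair places 'price' at index 18 of the token sequence, while the NLI-style pair with the aspect first places it at index 1.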

Analysis of Forestry Structure and Induced Output Based on Input - output Table - Influences of Forestry Production on Korean Economy - (산업관련표(産業關聯表)에 의(依)한 임업구조분석(林業構造分析)과 유발생산액(誘發生産額) -임업(林業)이 한국경제(韓國經濟)에 미치는 영향(影響)-)

  • Lee, Sung-Yoon
    • Journal of the Korean Wood Science and Technology / v.2 no.4 / pp.4-14 / 1974
  • Forest land in Korea accounts for some 67 percent of the nation's total land area. Its productivity, however, is very low. Consequently, forest production accounts for only about 2 percent of the gross national product and no more than about 5 percent of primary industry output. These figures, however, take into account only the direct income from forestry and make no reference to the forestry output induced by other industrial sectors. The value added, or the induced forestry output, generated when primary wood products are manufactured into higher-quality products contributes more to the economy than forestry's direct contribution. This author has therefore tried to analyze the structure of forestry and to compute the repercussion effect and the induced output of primary forest products when other industries use them as raw materials, using the input-output tables and attached tables for 1963 and 1966 issued by the Bank of Korea. 1. Analysis of forestry structure. A. Changes in total output. During the nine-year period 1961-1969, the real gross national product of Korea increased 2.1 times, while that of primary industries rose about 1.4 times. Forestry, which was valued at 9,380 million won in 1961, rose about 2.1 times to 20,120 million won in 1969. Accordingly, the share of forestry income in GNP was no more than 1.5 percent in both 1961 and 1962, whereas its share in primary industries increased from 3.5 to 5.4 percent. This increase in forestry income is attributable to increased forest production and rising timber prices. Nonetheless, the share of forestry income gradually decreased. B. Changes in input coefficient. The input coefficients, which indicate the inputs of forest products into other sectors, generally rose in 1966 compared with 1963. 
It is noted that the input coefficients indicating the amount of forest products supplied to industries closely related to forestry, such as lumber and plywood, and wood products and furniture, showed a downward trend during 1963-1966. On the other hand, forest inputs into other sectors generally increased. Meanwhile, the input coefficient representing the volume of forest products supplied to the forestry sector itself showed an upward tendency, which meant a continuing decrease in inputs from other sectors. Generally speaking, the higher the input coefficient of an industrial sector, the higher its re-input coefficient, which denotes the use of its products by the same sector. C. Changes in the ratio of intermediate input. The intermediate input ratio, which shows the dependency on raw materials, rose to 15.43 percent in 1966 from 11.37 percent in 1963. The dependency of forestry on raw materials was thus no more than 15.43 percent, with value added accounting for a high 83.57 percent. If the intermediate input ratio of a given sector increases, the input coefficient representing the re-use of its products by the same sector becomes large. D. Changes in the ratio of intermediate demand. The ratio of intermediate demand represents the intermediary-production character of each industry; the intermediate demand ratio of forestry, which accounted for 69.7 percent in 1963, rose to 75.2 percent in 1966. In other words, forestry is notable for its strong intermediary-production character. E. Changes in import coefficient. The import coefficient, which denotes the relation between production activities and imports, was 4.4 percent in 1963 and decreased to 2.4 percent in 1966. The ratio of imports to total output is thus not very high. F. Changes in the market composition of imported goods. One of the major imported goods in the forestry sector is lumber. 
The import value increased by 60 percent, from 407 million won in 1963 to 667 million won in 1966. Sales of imported forest products to the two major outlets, lumber and plywood, and wood products and furniture, increased from 240 million won and 30 million won in 1963 to 343 million won and 31 million won in 1966, respectively. On the other hand, imported goods valued at 66 million won were sold to the paper products sector in 1963, but no supply to this sector was recorded in 1966. Besides these major markets, primary industries such as the fishery, coal, and agriculture sectors purchase materials from forestry. 2. Analysis of repercussion effect on production. The repercussion effect of final demand in a given sector on the expansion of production in other sectors was analyzed using the inverse matrix coefficient tables attached to the I.O. table. A. Changes in the intra-sector transaction value of the inverse matrix coefficient. The intra-sector transaction value of an inverse matrix coefficient represents the extent of the induced increase in the production of the sector's own products generated directly and indirectly by one unit of final demand in that sector. The intra-sector transaction value of the forestry sector rose from 1.04 in 1963 to 1.11 in 1966. It may therefore be said that forestry induces more of its own products per unit of final demand for forest products. B. Changes in the column total of the inverse matrix coefficient. The column total indicates the degree of effect on the output of the corresponding and related sectors generated by one unit of final demand in each sector. The column total of the forestry sector did not change between 1963 and 1966, being 1.19 in both years. C. Changes in the difference between the column total and the intra-sector transaction amount. 
The difference between the column total and the intra-sector transaction amount of a sector reveals the extent of the output of related sectors induced indirectly by one unit of final demand in that sector. For forestry, this difference dropped remarkably, from 0.15 in 1963 to 0.08 in 1966. Accordingly, the indirect output induced in other forestry-related sectors decreased; this is a natural result of the increasing input coefficient generated by the re-use of forest products within the forestry sector. 3. Induced output of forestry. A. Forest products, wood in particular, are supplied to other industries as raw materials, increasing their value added. In this connection, the primary dependency rates on forestry for 1963 and 1966 were compared by sector: from 7.71 percent in 1963 to 11.91 percent in 1966 in agriculture, 10.32 to 6.11 percent in fishery, 16.24 to 19.90 percent in mining, 0.76 to 0.70 percent in manufacturing, and 2.79 to 4.77 percent in construction. On average, the dependency on forestry increased from 5.92 percent to 8.03 percent during 1963-1966. Accordingly, the primary forestry output induced by primary and secondary industries increased from 16,109 million won in 1963 to 48,842 million won in 1966. B. The forest products supplied to other industries as raw materials are processed further into higher-quality products, thus indirectly increasing the value of the forest products. The ratio of this increased value added, that is, the secondary dependency on forestry, changed between 1963 and 1966 from 5.98 percent to 7.87 percent in agriculture, 9.06 to 5.74 percent in fishery, 13.56 to 15.81 percent in mining, 0.68 to 0.61 percent in manufacturing, and 2.71 to 4.54 percent in construction. The average ratio increased from 4.69 percent to 5.60 percent. 
Meanwhile, the secondary forestry output induced by primary and secondary industries rose from 12,779 million won in 1963 to 34,084 million won in 1966. C. The dependency of tertiary industries on forestry showed very minor ratios of 0.46 percent and 0.04 percent in 1963 and 1966, respectively. The forestry output induced by tertiary industries also decreased from 685 million won to 123 million won during the same period. D. Overall, the ratio of dependency on forestry increased from 17.68 percent in 1963 to 24.28 percent in 1966 in primary industries and from 4.69 percent to 5.70 percent in secondary industries, while, as mentioned above, the ratio for tertiary industries decreased from 0.46 to 0.04 percent during 1963-1966. The mining industry shows the heaviest dependency on forestry, with 29.80 percent in 1963 and 35.71 percent in 1966. As a result, the direct forestry income, valued at 8,172 million won in 1963, shot up to 22,724 million won in 1966, and its share of national income rose from 1.9 percent in 1963 to 2.3 percent in 1966. If the induced output is taken into account, the total forestry production, estimated at 37,744 million won in 1963, rose to 105,773 million won in 1966, about 4.5 times its direct income. Furthermore, the ratio of gross forestry product to gross national product rose significantly from 8.8 percent in 1963 to 10.7 percent in 1966. E. The above ratios do not take into account intangible, indirect effects such as drought and flood prevention, control of soil run-off, watershed and land conservation, improvement of people's recreational and emotional life, and the maintenance and improvement of national health and sanitation. F. 
In conclusion, I would like to emphasize that the forestry sector exerts an important effect on the national economy and that the effect of induced forestry output is greater than its direct income.
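The inverse-matrix quantities in part 2 can be computed from an input coefficient matrix A, assuming the standard Leontief form (I - A)^-1 (the Bank of Korea tables may use an import-adjusted variant). The intra-sector value is the diagonal entry, the column total sums the sector's column, and their difference is the indirect effect on related sectors; with the forestry figures above, 1.19 - 1.04 = 0.15 for 1963 and 1.19 - 1.11 = 0.08 for 1966, matching part C. The function names and the 2x2 matrix in the usage line are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def leontief_inverse(A: Matrix) -> Matrix:
    """Return (I - A)^-1 via Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    # Augmented matrix [I - A | I].
    M = [
        [(1.0 if i == j else 0.0) - A[i][j] for j in range(n)]
        + [1.0 if i == k else 0.0 for k in range(n)]
        for i in range(n)
    ]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[pivot] = M[pivot], M[c]
        p = M[c][c]
        M[c] = [v / p for v in M[c]]  # scale pivot row so the pivot becomes 1
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [row[n:] for row in M]


def repercussion(A: Matrix, sector: int) -> Tuple[float, float, float]:
    """(intra-sector value, column total, indirect effect on other sectors)."""
    B = leontief_inverse(A)
    intra = B[sector][sector]
    column_total = sum(B[i][sector] for i in range(len(B)))
    return intra, column_total, column_total - intra
```

For example, `repercussion([[0.1, 0.2], [0.3, 0.4]], 0)` returns approximately `(1.25, 1.875, 0.625)`: one unit of final demand for sector 0 raises its own output by 1.25 units and sector 1's by 0.625.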

Implementation of integrated monitoring system for trace and path prediction of infectious disease (전염병의 경로 추적 및 예측을 위한 통합 정보 시스템 구현)

  • Kim, Eungyeong;Lee, Seok;Byun, Young Tae;Lee, Hyuk-Jae;Lee, Taikjin
    • Journal of Internet Computing and Services
    • /
    • v.14 no.5
    • /
    • pp.69-76
    • /
    • 2013
  • The incidence of global infectious and pathogenic diseases such as H1N1 (swine flu) and Avian Influenza (AI) has recently increased. An infectious disease is a disease caused by a pathogen that can be passed from an infected person to a susceptible host. Pathogens of infectious diseases, such as bacilli, spirochetes, rickettsiae, viruses, fungi, and parasites, cause various symptoms, including respiratory disease, gastrointestinal disease, liver disease, and acute febrile illness. They can spread through various means, such as food, water, insects, breathing, and contact with other people. Most countries around the world now use mathematical models to predict and prepare for the spread of infectious diseases. In modern society, however, infectious diseases spread quickly and in complicated ways because of the rapid development of transportation (both ground and underground), which leaves little time to predict their spread. Therefore, a new system that can prevent the spread of infectious diseases by predicting their pathways needs to be developed. To address this problem, this study develops an integrated monitoring system that tracks and predicts the pathways of infectious diseases for real-time monitoring and control. The system is implemented based on the conventional Susceptible-Infectious-Recovered (SIR) model. The proposed model considers both inter- and intra-city modes of transportation, such as bus, train, car, and airplane, to express interpersonal contact (i.e., migration flow). In addition, real data modified according to the geographical characteristics of Korea are used to reflect realistic circumstances of possible disease spread in Korea. By controlling the model parameters, we can predict where and when vaccination needs to be performed.
The simulation includes several assumptions and scenarios. Using data from Statistics Korea, five major cities assumed to have the most population migration were chosen: Seoul, Incheon (Incheon International Airport), Gangneung, Pyeongchang, and Wonju. The cities were assumed to be connected in one network, and the infectious disease was assumed to spread only through the designated transportation methods. Daily traffic volume was obtained from the Korean Statistical Information Service (KOSIS), and the population of each city was obtained from Statistics Korea. Data on H1N1 (swine flu) were provided by the Korea Centers for Disease Control and Prevention, and air transport statistics were obtained from the Aeronautical Information Portal System. These daily traffic volume, population, H1N1, and air transport data were adjusted in consideration of current conditions in Korea and several realistic assumptions and scenarios. Three scenarios were simulated for an H1N1 occurrence at Incheon International Airport: no vaccination in any city, vaccination in Seoul, and vaccination in Pyeongchang. For each scenario, the number of days for the number of infected people to reach its peak and the proportion of Infectious (I) were compared. Without vaccination, the peak came fastest in Seoul (37 days) and slowest in Pyeongchang (43 days), and the proportion of I was highest in Seoul and lowest in Pyeongchang. With vaccination in Seoul, the peak again came fastest in Seoul (37 days) and slowest in Pyeongchang (43 days), while the proportion of I was highest in Gangneung and lowest in Pyeongchang.
With vaccination in Pyeongchang, the peak came fastest in Seoul (37 days) and slowest in Pyeongchang (43 days), and the proportion of I was highest in Gangneung and lowest in Pyeongchang. These results confirm that, after its first occurrence, H1N1 spreads in proportion to the traffic volume of each city. Because the infection pathway differs with each city's traffic volume, preventive measures against infectious diseases can be devised by tracking and predicting their pathways through traffic-volume analysis.
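The abstract names the SIR model extended with inter- and intra-city transport but gives no equations. The Python sketch below shows one common way to couple per-city SIR compartments. It uses a commuter-style force of infection, in which each city's residents mix with the infectious populations of other cities. It is an illustration under stated assumptions, not the authors' model.

The mixing matrix, `beta`, `gamma`, populations, and vaccination shares are all hypothetical stand-ins for the traffic, population, and H1N1 data described above. Vaccination is modeled by moving a share of a city's residents straight into R. The `peak_summary` function reports the two quantities the scenarios compare: the day infections peak and the peak infectious proportion.

```python
from typing import List, Optional, Tuple


def simulate_sir(
    population: List[float],
    initial_infected: List[float],
    mixing: List[List[float]],
    vaccinated: Optional[List[float]] = None,
    beta: float = 0.3,
    gamma: float = 0.1,
    days: int = 300,
) -> List[List[float]]:
    """Daily infectious proportion I_k / N_k for each city k (index 0 = day 0).

    mixing[k][j] is the assumed share of city k residents' contacts made with
    people from city j (each row sums to 1); it stands in for traffic data.
    """
    n = len(population)
    if vaccinated is None:
        vaccinated = [0.0] * n
    # Vaccinated residents start in R (immune) and can never be infected.
    r = [population[k] * vaccinated[k] for k in range(n)]
    i = [float(x) for x in initial_infected]
    s = [population[k] - r[k] - i[k] for k in range(n)]
    history = [[i[k] / population[k] for k in range(n)]]
    for _ in range(days):
        # Use the previous day's prevalence so all cities update together.
        prevalence = [i[j] / population[j] for j in range(n)]
        for k in range(n):
            force = beta * sum(mixing[k][j] * prevalence[j] for j in range(n))
            new_infections = force * s[k]
            recoveries = gamma * i[k]
            s[k] -= new_infections
            i[k] += new_infections - recoveries
            r[k] += recoveries
        history.append([i[k] / population[k] for k in range(n)])
    return history


def peak_summary(history: List[List[float]]) -> List[Tuple[int, float]]:
    """(day of peak, peak infectious proportion) for each city."""
    summary = []
    for k in range(len(history[0])):
        series = [day[k] for day in history]
        peak_day = max(range(len(series)), key=series.__getitem__)
        summary.append((peak_day, series[peak_day]))
    return summary


if __name__ == "__main__":
    # Hypothetical three-city network; none of these numbers come from the study.
    population = [1_000_000.0, 500_000.0, 100_000.0]
    mixing = [
        [0.90, 0.08, 0.02],
        [0.10, 0.88, 0.02],
        [0.05, 0.05, 0.90],
    ]
    seed = [100.0, 0.0, 0.0]  # outbreak starts in city 0
    scenarios = [("no vaccination", None), ("50% vaccinated in city 2", [0.0, 0.0, 0.5])]
    for label, vacc in scenarios:
        print(label, peak_summary(simulate_sir(population, seed, mixing, vacc)))
```

Every row of the mixing matrix sums to 1. In this sketch, raising the off-diagonal (travel) entries therefore pulls the cities' epidemic curves closer together in time. That is a qualitative analogue of the link between traffic volume and spread that the study reports.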

Incorporating Social Relationship discovered from User's Behavior into Collaborative Filtering (사용자 행동 기반의 사회적 관계를 결합한 사용자 협업적 여과 방법)

  • Thay, Setha;Ha, Inay;Jo, Geun-Sik
    • Journal of Intelligence and Information Systems
    • /
    • v.19 no.2
    • /
    • pp.1-20
    • /
    • 2013
  • Nowadays, social networks are huge communication platforms that allow people to connect with one another and bring users together to share common interests, experiences, and daily activities. Users spend hours per day maintaining personal information and interacting with other people via posts, comments, messages, games, social events, and applications. With the growth of users' distributed information in social networks, there is great potential to use social data to enhance the quality of recommender systems. Some studies focus on social network analysis and investigate how social networks can be used in the recommendation domain. Among them, we are interested in taking advantage of the interaction between a user and others in a social network, which can be determined and is known as the social relationship. Furthermore, users' decisions before purchasing products mostly depend on suggestions from people who have either the same preferences or a closer relationship. For this reason, we believe that users' relationships in social networks can effectively improve the quality of a recommender system's predictions of user interests. Social relationships between users found in social networks are therefore a common factor for improving the prediction of user preferences in the conventional approach. Recommender systems are rapidly growing in popularity and are currently used by many e-commerce sites, such as Amazon.com, Last.fm, and eBay.com. The collaborative filtering (CF) method is one of the essential and powerful techniques in recommender systems for suggesting appropriate items to users by learning their preferences. CF focuses on user data and automatically predicts a user's interests by gathering information from users who share a similar background and preferences.
Specifically, the intention of the CF method is to find users who have similar preferences and to suggest to the target user the items that those nearest-neighbor users preferred most. CF considers two basic units: the user and the item. Each user needs to provide rating values for items (e.g., movies, products, books) to indicate their interest in those items. CF then uses the user-rating matrix to find a group of users whose ratings are similar to the target user's, and it predicts unknown rating values for items that the target user has not rated. CF has been successfully implemented in both information filtering and e-commerce applications. However, some important challenges remain, such as cold start, data sparsity, and scalability, which affect the quality and accuracy of prediction. To overcome these challenges, many researchers have proposed various kinds of CF, such as hybrid CF, trust-based CF, and social network-based CF. To improve the recommendation performance and prediction accuracy of standard CF, this paper proposes a method that integrates the traditional CF technique with social relationships between users discovered from their behavior in a social network (i.e., Facebook). We identify users' relationships from behavior such as posts and comments exchanged with friends on Facebook. We believe that social relationships implicitly inferred from user behavior can compensate for the limitations of the conventional approach. We therefore extract each user's posts and comments using the Facebook Graph API and calculate a feature score for each term to obtain a feature vector for computing user similarity. We then combine this result with the similarity value computed using the traditional CF technique. Finally, our system provides a list of recommended items according to the neighbor users who have the highest total similarity to the target user.
To verify and evaluate the proposed method, we performed an experiment on data collected from our Movies Rating System. Prediction accuracy was evaluated in terms of MAE to show how correctly our algorithm recommends items to users. Performance was then evaluated in terms of precision, recall, and F1-measure to show the effectiveness of our method. Coverage was also evaluated to assess the ability to generate recommendations. The experimental results show that the proposed method outperforms the benchmark methods and suggests items to users more accurately. Using user behavior in social networks improves recommendation accuracy significantly, by up to 6%. Moreover, the recommendation performance experiment shows that incorporating social relationships observed from user behavior into CF is beneficial, yielding a 7% performance improvement over the benchmark methods. Finally, we confirm that interaction between users in social networks can enhance the accuracy of the conventional approach and yield better recommendations.
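The abstract describes blending a rating-based similarity with a similarity derived from Facebook post and comment term features, then predicting from the most similar neighbors. It does not give the blending rule or the feature weighting.

The Python sketch below makes the following assumptions:
- Pearson correlation on co-rated items for the CF part.
- Cosine similarity on hypothetical per-user term-score dictionaries for the social part.
- A linear blend with a hypothetical weight `alpha`.
- The usual mean-centred neighbor prediction.

MAE is included because it is the accuracy measure used in the evaluation. All data in the example are invented.

```python
from math import sqrt
from typing import Dict, List, Tuple

Ratings = Dict[str, Dict[str, float]]   # user -> {item: rating}
Features = Dict[str, Dict[str, float]]  # user -> {term: feature score}


def mean(values) -> float:
    values = list(values)
    return sum(values) / len(values)


def pearson(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Pearson correlation over the items both users rated (0 if < 2 items)."""
    common = set(a) & set(b)
    if len(common) < 2:
        return 0.0
    ma, mb = mean(a[x] for x in common), mean(b[x] for x in common)
    num = sum((a[x] - ma) * (b[x] - mb) for x in common)
    den = sqrt(sum((a[x] - ma) ** 2 for x in common)) * sqrt(sum((b[x] - mb) ** 2 for x in common))
    return num / den if den else 0.0


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity of two sparse term-feature vectors."""
    num = sum(a[t] * b[t] for t in set(a) & set(b))
    den = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0


def combined_similarity(u: str, v: str, ratings: Ratings, features: Features, alpha: float = 0.5) -> float:
    """Hypothetical blend: alpha * rating similarity + (1 - alpha) * social similarity."""
    social = cosine(features.get(u, {}), features.get(v, {}))
    return alpha * pearson(ratings[u], ratings[v]) + (1 - alpha) * social


def predict(target: str, item: str, ratings: Ratings, features: Features,
            k: int = 5, alpha: float = 0.5) -> float:
    """Mean-centred prediction from the k most similar users who rated `item`."""
    scored: List[Tuple[float, str]] = [
        (combined_similarity(target, v, ratings, features, alpha), v)
        for v in ratings
        if v != target and item in ratings[v]
    ]
    # Keep only positively similar neighbors, highest similarity first.
    neighbors = sorted((p for p in scored if p[0] > 0), reverse=True)[:k]
    base = mean(ratings[target].values())
    den = sum(sim for sim, _ in neighbors)
    if not den:
        return base
    num = sum(sim * (ratings[v][item] - mean(ratings[v].values())) for sim, v in neighbors)
    return base + num / den


def mae(pairs: List[Tuple[float, float]]) -> float:
    """Mean absolute error over (predicted, actual) pairs."""
    return sum(abs(p - a) for p, a in pairs) / len(pairs)


if __name__ == "__main__":
    ratings = {
        "alice": {"m1": 5, "m2": 3, "m3": 4},
        "bob": {"m1": 4, "m2": 2, "m3": 4, "m4": 4},
        "carol": {"m1": 1, "m2": 5, "m3": 2, "m4": 1},
    }
    features = {
        "alice": {"movie": 0.8, "jazz": 0.2},
        "bob": {"movie": 0.7, "jazz": 0.3},
        "carol": {"football": 1.0},
    }
    print(predict("alice", "m4", ratings, features))
```

Setting `alpha=1.0` falls back to plain user-based CF, so the blended and traditional similarities can be compared on the same data. Neighbors with non-positive similarity are dropped here. That is a common design choice, not one stated in the abstract.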

Development of a complex failure prediction system using Hierarchical Attention Network (Hierarchical Attention Network를 이용한 복합 장애 발생 예측 시스템 개발)

  • Park, Youngchan;An, Sangjun;Kim, Mintae;Kim, Wooju
    • Journal of Intelligence and Information Systems
    • /
    • v.26 no.4
    • /
    • pp.127-148
    • /
    • 2020
  • A data center is a physical facility that houses computer systems and related components, and it is an essential foundation for next-generation core industries such as big data, smart factories, wearables, and smart homes. In particular, with the growth of cloud computing, a proportional expansion of data center infrastructure is inevitable. Monitoring the health of data center facilities is a way to maintain and manage the system and prevent failures. If a failure occurs in some elements of the facility, it may affect not only the relevant equipment but also other connected equipment, and it may cause enormous damage. IT facilities in particular fail irregularly because of their interdependence, which makes the cause difficult to identify. Previous studies predicting failures in data centers treated each server as a single, isolated state, without assuming that devices affect one another. Therefore, this study classified data center failures into failures occurring inside the server (Outage A) and failures occurring outside the server (Outage B), and focused on analyzing complex failures occurring within servers. External failures include power, cooling, and user errors. Since such failures can be prevented in the early stages of data center construction, various solutions are being developed for them. On the other hand, the cause of failures occurring within servers is difficult to determine, and adequate prevention has not yet been achieved. This is largely because server failures do not occur in isolation: a failing server can cause failures in other servers or be affected by failures originating in other servers. In other words, whereas existing studies analyzed failures assuming that a server does not affect other servers, this study assumes that failures have effects between servers.
To define complex failure situations in the data center, the failure history data of each piece of equipment in the data center were used. Four major failure types are considered in this study: Network Node Down, Server Down, Windows Activation Services Down, and Database Management System Service Down. The failures of each device are sorted in chronological order, and when a failure occurs in one piece of equipment, any failure in another piece of equipment within 5 minutes of that time is defined as simultaneous. After sequences of devices that failed at the same time were constructed, the 5 devices that most frequently failed together within these sequences were selected, and the cases in which the selected devices failed at the same time were confirmed through visualization. Because the server resource information collected for failure analysis is time-series data, we used Long Short-Term Memory (LSTM), a deep learning algorithm that predicts the next state from previous states. In addition, because the degree to which each server contributes to a multiple failure differs, a Hierarchical Attention Network deep learning structure was used instead of a single-server model. This approach increases prediction accuracy by assigning larger weights to servers that have a greater impact on the failure. The study began by defining the failure types and selecting the analysis targets. In the first experiment, the same collected data were analyzed under a single-server assumption and under a multiple-server assumption, and the results were compared. The second experiment improved prediction accuracy for complex server failures by optimizing the threshold for each server.
In the first experiment, under the single-server assumption, three of the five servers were predicted to have no failure even though failures actually occurred. Under the multiple-server assumption, however, all five servers were correctly predicted to have failed. This result supports the hypothesis that servers affect one another. The study thus confirmed that prediction performance was better under the multiple-server assumption than under the single-server assumption. In particular, applying the Hierarchical Attention Network algorithm, on the assumption that each server has a different effect, helped improve the analysis. In addition, applying a different threshold to each server further improved prediction accuracy. This study showed that failures whose causes are difficult to determine can be predicted from historical data, and it presents a model that can predict failures occurring in data center servers. The results of this study are expected to help prevent failures in advance.
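The 5-minute co-occurrence rule and the selection of the devices that most often fail together can be sketched directly. The Python code below assumes the failure history is available as (timestamp, device id) pairs. It reads "within 5 minutes" as a window anchored at the first failure of each group, which is one possible interpretation. Device names and timestamps are invented, and the LSTM and Hierarchical Attention Network models are not reproduced here.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import List, Tuple

Event = Tuple[datetime, str]  # (failure time, device id)


def group_simultaneous(events: List[Event],
                       window: timedelta = timedelta(minutes=5)) -> List[List[str]]:
    """Group failures that fall within `window` of the first failure in a group.

    Events are processed in chronological order; a failure more than `window`
    after the current group's first failure starts a new group. Only groups in
    which at least two different devices failed are returned.
    """
    groups: List[List[str]] = []
    anchor = None
    for time, device in sorted(events):
        if anchor is not None and time - anchor <= window:
            groups[-1].append(device)
        else:
            groups.append([device])
            anchor = time
    return [g for g in groups if len(set(g)) > 1]


def most_frequent_devices(groups: List[List[str]], top_n: int = 5) -> List[str]:
    """Devices that most often appear in simultaneous-failure groups."""
    counts = Counter(device for group in groups for device in set(group))
    return [device for device, _ in counts.most_common(top_n)]


if __name__ == "__main__":
    t0 = datetime(2020, 1, 1, 0, 0)  # made-up log, not data from the study
    log = [
        (t0, "srv-01"),
        (t0 + timedelta(minutes=2), "srv-02"),
        (t0 + timedelta(minutes=4), "db-01"),
        (t0 + timedelta(hours=1), "srv-03"),
        (t0 + timedelta(hours=2), "srv-01"),
        (t0 + timedelta(hours=2, minutes=3), "srv-02"),
    ]
    groups = group_simultaneous(log)
    print(groups)
    print(most_frequent_devices(groups))
```

An alternative reading would chain failures, with each one within 5 minutes of the previous one, which can produce longer groups. The anchored window used here keeps every group at most 5 minutes wide.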

A Study on Startups' Dependence on Business Incubation Centers (창업보육서비스에 따른 입주기업의 창업보육센터 의존도에 관한 연구)

  • Park, JaeSung;Lee, Chul;Kim, JaeJon
    • Korean small business review
    • /
    • v.31 no.2
    • /
    • pp.103-120
    • /
    • 2009
  • As business incubation centers (BICs) have been operating in Korea for more than 10 years, many early-stage startups tend to use the services provided by these centers. BICs in Korea have accumulated knowledge and experience over the past ten years, and their services have improved considerably. The business incubating service has three facets: (1) business infrastructure service, (2) direct service, and (3) indirect service. The mission of BICs is to provide early-stage entrepreneurs with incubating services for a limited period to help them grow strong enough to survive fierce competition after graduating from incubation. However, incubating services sometimes fail to foster the independence of new startups and instead increase many companies' dependence on BICs. Dependence on BICs is therefore a very important factor in understanding the survival of incubated startups after they graduate from BICs. The purpose of this study is to identify the main factors that influence a firm's dependence on BICs and to characterize the relationships among these factors. The business incubating service is a core construct of this study. It includes various activities and resources, such as offering physical facilities, providing legal services, and connecting firms with outside organizations. These services are extensive, take various forms, and are provided by BICs directly or indirectly. Past studies have identified various incubating services and classified them in different ways. Based on these studies, we classify the business incubating service into the three categories mentioned above: (1) business infrastructure support, (2) direct support, and (3) networking support. Business infrastructure support provides the essential resources needed to start a business, such as physical facilities.
Direct support offers the business resources available within the BICs, such as human, technical, and administrative resources. Finally, the indirect service supports firms with resources from outside the business incubation center. Dependence is generally defined as the degree to which a client firm needs the resources provided by the service provider in order to achieve its goals. Dependence arises when a firm recognizes the benefits of interacting with its counterpart. Hence, the more positive outcomes a firm derives from its relationship with a partner, the more dependent on that partner it inevitably becomes. In business incubation, we can therefore predict that the longer a resident firm is incubated, the stronger its dependence on BICs will be. To foster the independence of incubated firms, BICs must be able to adjust the provision of their services to control the firms' dependence on BICs. Based on this discussion, a research model of the relationships between dependence and its affecting factors was developed. We surveyed companies residing in BICs to test the research model. Our instrument was adapted, in part, from previous relevant studies. To test reliability and validity, a preliminary test was conducted with firms residing in and incubated by BICs in the Gwangju and Jeonnam region. The questionnaire was modified in accordance with the pre-test feedback. With the help of the business incubating managers of each BIC, we mailed the questionnaire to all firms that had been incubated by the BICs. The survey was conducted over a three-week period. Gifts worth approximately ₩10,000 were offered to all actively participating respondents. The incubating period was reported by the business incubating managers and was transformed using the natural logarithm. A total of 180 firms participated in the survey.
However, we excluded 4 cases whose answers were inconsistent on reverse-coded items, and 176 cases were used for the analysis. We acknowledge that 176 samples may not be sufficient to conduct regression analyses with the 5 research variables in our study. Each variable was measured with multiple items. We conducted an exploratory factor analysis to assess their unidimensionality: to test the construct validity of the instruments, a principal component factor analysis was conducted with Varimax rotation. The items correspond well to their respective factors, demonstrating a high degree of convergent validity. Because the items load more highly on their own variable (or factor) than on the other variables, the instrument's discriminant validity is clear. Each factor was extracted as expected and explained 70.97, 66.321, and 52.97 percent of the total variance, respectively, each with eigenvalues greater than 1.000. The internal consistency reliability of the variables was evaluated by computing Cronbach's alphas. The Cronbach's alpha values, which ranged from 0.717 to 0.950, all exceeded 0.700, which is satisfactory. The reliability and validity of the research variables are therefore all considered acceptable. Effects on dependence were assessed using regression analysis. Pearson correlations were calculated for the variables measured on interval or ratio scales. Potential multicollinearity among the antecedents was evaluated before the multiple regression analysis, as some of the variables were significantly correlated with others (e.g., direct service and indirect service). Although several variables show evidence of significant correlations, their tolerance values range between 0.334 and 0.613, indicating that multicollinearity is not a likely threat to the parameter estimates.
After checking the basic assumptions of regression analysis, we conducted multiple regression and moderated regression analyses to test the hypotheses. The results indicate that the regression model is significant at p < 0.001 (F = 44.260) and that the predictors of the research model explain 42.6 percent of the total variance. Hypotheses 1, 2, and 3 address the relationships between the dependence of incubated firms and the business incubating services. Business infrastructure service, direct service, and indirect service are all significantly related to dependence (β = 0.300, p < 0.001; β = 0.230, p < 0.001; β = 0.226, p < 0.001), supporting Hypotheses 1, 2, and 3. With the incubating period as the moderator and dependence as the dependent variable, adding the interaction terms between the incubating period and the antecedents to the regression equation yielded a significant increase in R² (F change = 2.789, p < 0.05). In particular, direct service and indirect service exert different effects on dependence. Hence, the results support Hypotheses 5 and 6. Based on these empirical findings, this study offers several strategies and specific calls to action for BICs. Business infrastructure service has a greater effect on a firm's dependence than the other two services. Introducing an additional, higher charge rate for firms that have graduated but are allowed to stay in the BIC is a basic and legitimate way for the BIC to control firms' dependence. We also detected differential effects of direct and indirect services on firms' dependence: when their levels of dependence are assessed, firms with a long incubating period are more sensitive to indirect service positively and to direct service negatively. This implies that BICs must develop strategies based on each firm's incubating period.
Last but not least, future studies would benefit from discovering other important variables that influence firms' dependence, and studies that explain the independence of startup companies in BICs would also be valuable.
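Two of the computations named above follow standard textbook formulas. One is Cronbach's alpha for internal-consistency reliability. The other is the F test for the R² change when interaction terms are added in a moderated (hierarchical) regression. The Python sketch below implements both using only the standard library, and the function names are illustrative. The abstract does not report the reduced-model R² or the number of interaction terms, so the test values are invented rather than the study's.

```python
from statistics import variance
from typing import List, Sequence


def cronbach_alpha(items: List[Sequence[float]]) -> float:
    """Cronbach's alpha for k items; items[j][r] is respondent r's answer to item j.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of summed scores),
    using sample variances throughout.
    """
    k = len(items)
    if k < 2:
        raise ValueError("need at least two items")
    item_variance_sum = sum(variance([float(x) for x in item]) for item in items)
    totals = [float(sum(answers)) for answers in zip(*items)]
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


def f_change(r2_reduced: float, r2_full: float, added_terms: int,
             n: int, predictors_full: int) -> float:
    """F statistic for the R^2 increase when `added_terms` predictors (e.g.
    moderator interaction terms) are added to a regression with n cases.

    Degrees of freedom: (added_terms, n - predictors_full - 1).
    """
    numerator = (r2_full - r2_reduced) / added_terms
    denominator = (1 - r2_full) / (n - predictors_full - 1)
    return numerator / denominator
```

The resulting F is compared with the critical value of the F distribution at (`added_terms`, `n - predictors_full - 1`) degrees of freedom to decide whether the R² increase is significant.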