• Title/Summary/Keyword: User Information Needs


Twitter Issue Tracking System by Topic Modeling Techniques (토픽 모델링을 이용한 트위터 이슈 트래킹 시스템)

  • Bae, Jung-Hwan; Han, Nam-Gi; Song, Min
    • Journal of Intelligence and Information Systems / v.20 no.2 / pp.109-122 / 2014
  • People nowadays create a tremendous amount of data on Social Network Services (SNS). In particular, the incorporation of SNS into mobile devices has resulted in massive data generation and a correspondingly large influence on society. This is an unmatched phenomenon in history, and we now live in the Age of Big Data. SNS data satisfies the three defining conditions of Big Data: volume (the amount of data), velocity (the speed of data input and output), and variety (the range of data types). Discovering the trend of an issue in SNS Big Data therefore yields an important new source of value, because such data covers the whole of society. In this study, a Twitter Issue Tracking System (TITS) is designed and built to meet the need for analyzing SNS Big Data. TITS extracts issues from Twitter texts and visualizes them on the web. The proposed system provides four functions: (1) provide the topic keyword set corresponding to the daily ranking; (2) visualize the daily time-series graph of a topic over a month; (3) convey the importance of a topic through a treemap based on a score system and frequency; (4) visualize the daily time-series graph of keywords retrieved by keyword search. The present study analyzes the Big Data generated by SNS in real time. SNS Big Data analysis requires various natural language processing techniques, including stop-word removal and noun extraction, to handle various unrefined forms of unstructured data. It also requires the latest big data technology to process a large amount of real-time data rapidly, such as the Hadoop distributed system or NoSQL, an alternative to relational databases. We built TITS on Hadoop to optimize big data processing, because Hadoop is designed to scale from a single node to thousands of machines. Furthermore, we use MongoDB, a NoSQL database: an open-source, document-oriented database that provides high performance, high availability, and automatic scaling. Unlike existing relational databases, MongoDB has no schemas or tables; its most important goals are data accessibility and data-processing performance. In the Age of Big Data, visualization is attractive to the Big Data community because it helps analysts examine data easily and clearly. TITS therefore uses the d3.js library as a visualization tool. This library is designed for creating Data-Driven Documents that bind the document object model (DOM) to arbitrary data; interaction with the data is easy, and the library manages real-time data streams with smooth animation. In addition, TITS uses Bootstrap, a set of pre-configured style sheets and JavaScript plug-ins, to build the web system. The TITS graphical user interface (GUI) is designed with these libraries and detects issues on Twitter in an easy and intuitive manner. The proposed work demonstrates the quality of our issue detection techniques by matching detected issues with corresponding online news articles. The contributions of the present study are threefold. First, we suggest an alternative approach to real-time big data analysis, which has become an extremely important issue. Second, we apply a topic modeling technique used in various research areas, including Library and Information Science (LIS), and on this basis confirm the utility of storytelling and time-series analysis. Third, we develop a web-based system and make it available for the real-time discovery of topics. The present study conducted experiments on nearly 150 million tweets collected in Korea during March 2013.
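As a rough illustration of the kind of daily topic extraction TITS performs, the sketch below trains an LDA topic model on one day's tweets and prints the top keywords per topic. It assumes tweets have already been reduced to noun tokens with stop words removed; gensim and all parameter values here are illustrative choices, since the abstract does not specify the exact implementation.

```python
from gensim import corpora, models

# Hypothetical pre-processed input: each tweet as a list of extracted nouns.
daily_tweets = [
    ["economy", "prices", "policy"],
    ["election", "candidate", "pledge"],
    ["economy", "policy", "budget"],
]

dictionary = corpora.Dictionary(daily_tweets)              # token <-> id mapping
corpus = [dictionary.doc2bow(doc) for doc in daily_tweets]

# Train LDA; num_topics and passes are illustrative, not the paper's settings.
lda = models.LdaModel(corpus=corpus, id2word=dictionary,
                      num_topics=2, passes=10, random_state=0)

for topic_id, keywords in lda.show_topics(num_topics=2, num_words=3):
    print(topic_id, keywords)   # daily topic keyword sets, as in function (1)
```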

Semi-supervised learning for sentiment analysis in mass social media (대용량 소셜 미디어 감성분석을 위한 반감독 학습 기법)

  • Hong, Sola; Chung, Yeounoh; Lee, Jee-Hyong
    • Journal of the Korean Institute of Intelligent Systems / v.24 no.5 / pp.482-488 / 2014
  • This paper aims to analyze users' emotions automatically by analyzing Twitter, a representative social network service (SNS). Creating sentiment analysis models with machine learning techniques requires sentiment labels that represent positive or negative emotions, but obtaining sentiment labels for tweets is very expensive. In this paper, we therefore propose a sentiment analysis model that uses a self-training technique to exploit "data without sentiment labels" as well as "data with sentiment labels". In self-training, labels for the unlabeled data are predicted by a model trained on the labeled data, and the model is then updated using both the labeled data and the newly labeled data; this gradually improves sentiment analysis performance. However, self-training has a weakness: because the labels of unlabeled data never change once they are determined, misclassifications of unlabeled data at an early stage affect the model updates throughout the whole learning process. The labels of the unlabeled data therefore need to be determined carefully. To achieve high performance with self-training, we propose three policies for selecting the newly labeled data used in updates and conduct a comparative analysis. The first policy selects only newly labeled data whose confidence exceeds a given threshold. The second policy chooses the same number of positive and negative examples among the newly labeled data, to avoid the imbalanced-class learning problem. The third policy caps the amount of newly labeled data per update at a given maximum, so that the model is updated gradually rather than with a large amount of data at once. Experiments are conducted on the Stanford data set, classified into positive and negative. As a result, the learned model outperforms both models trained on "data with sentiment labels" only and self-training with a plain model-update policy.
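The three update policies compose naturally into a single selection step per iteration. The following is a minimal sketch, assuming TF-IDF features and a logistic-regression base classifier (the abstract does not name its base model); `threshold`, `max_per_iter`, and `n_iters` are hypothetical parameters implementing policies one and three.

```python
import numpy as np
from scipy.sparse import vstack
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

def self_train(labeled_texts, labels, unlabeled_texts,
               threshold=0.9, max_per_iter=200, n_iters=10):
    vec = TfidfVectorizer().fit(list(labeled_texts) + list(unlabeled_texts))
    X, y = vec.transform(labeled_texts), np.asarray(labels)  # 1=positive, 0=negative
    U = vec.transform(unlabeled_texts)
    pool = np.arange(U.shape[0])                  # indices still unlabeled
    for _ in range(n_iters):
        clf = LogisticRegression(max_iter=1000).fit(X, y)
        proba = clf.predict_proba(U[pool])
        conf = proba.max(axis=1)
        pred = clf.classes_[proba.argmax(axis=1)]
        sure = conf >= threshold                              # policy 1: confidence
        pos = np.where(sure & (pred == 1))[0]
        neg = np.where(sure & (pred == 0))[0]
        k = min(len(pos), len(neg), max_per_iter // 2)        # policies 2 and 3
        if k == 0:
            break
        take = np.concatenate([pos[np.argsort(-conf[pos])[:k]],
                               neg[np.argsort(-conf[neg])[:k]]])
        X = vstack([X, U[pool[take]]])            # add newly labeled data
        y = np.concatenate([y, pred[take]])
        pool = np.delete(pool, take)              # labels are fixed once assigned
    return clf, vec
```

Taking equal numbers of positive and negative examples (policy 2) and capping them per iteration (policy 3) keeps early mistakes from dominating later updates, which is exactly the failure mode the paper describes.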

Resolution Method of Hazard Factor for Life Safety in Rental Housing Complex (임대주택단지의 생활안전 위해요인 해소방안)

  • Sohn, Jeong-Rak; Cho, Gun-Hee; Kim, Jin-Won; Song, Sang-Hoon
    • Land and Housing Review / v.8 no.1 / pp.1-11 / 2017
  • The government has been constructing and supplying public rental housing to ordinary people since 1989 in order to stabilize housing. However, the public rental houses supplied early on now carry a high risk of safety accidents due to the deterioration of their facilities. This study therefore proposes ways to resolve the life safety hazards of old rental housing complexes, as a follow-up to "Analysis of Accident Patterns and Hazard Factors for Life Safety in Rental Housing Complexes". The types of life safety accidents that occur in public rental housing complexes are sliding, falling, crashes, falling objects, breakage, fire, traffic accidents, and criminal incidents; of these, the types analyzed in this study are sliding, crashes, falling objects, and fire accidents. Although falling, breakage, traffic accidents, and criminal incidents occur at a low rate in public rental housing complexes, they retain the potential to cause serious accidents. The study draws on interviews and seminars to suggest ways of resolving the life safety hazards in rental housing complexes. Interviews were conducted with residents and managers of rental housing complexes; seminars were conducted twice with experts in construction, maintenance, asset management, housing welfare, and safety. On this basis, the study categorizes the life safety hazards by accident type and suggests resolutions as follows: (1) sliding: flooring materials with a high friction coefficient, safety devices such as safety handles, maintenance, safety inspections, and safety education; (2) falling: supplementary safety facilities, improved design of fall-prone parts, and safety education; (3) crashes: a wider effective elevator-door width, a wider effective ramp width, and an improved ramp type (U type → I type); (4) falling objects and breakage: furniture designed around residents' usage, replacement of old facilities, greater resident safety consciousness, and safety education; (5) fire: fire safety equipment, improved emergency evacuation, safety inspections, and safety education; (6) traffic accidents: secured parking spaces, safety facilities, and safety education; (7) criminal incidents: higher-resolution CCTV, street lights, removal of blind spots in the complex, and adequate security staffing. The roles of the suppliers, administrators, and users of public rental housing proposed in this study are summarized as follows. Suppliers of rental housing should consider risk factors that may arise not only in design and construction but also in the maintenance phase, and should make old facilities easy to repair over the life cycle of the rental housing. Administrators of rental housing should look after the safety of residents, conduct safety checks regularly, and immediately remove any hazardous elements within the complex. Finally, the users of rental housing need to form a sense of ownership of all the facilities in the complex and take care not to cause safety accidents through carelessness. The results of this study can provide the information needed for residents of rental housing complexes to live safe and comfortable residential lives, and are expected to help reduce the incidence of safety accidents in such complexes.

Topographic Factors Computation in Island: A Comparison of Different Open Source GIS Programs (오픈소스 GIS 프로그램의 지형인자 계산 비교: 도서지역 경사도와 지형습윤지수 중심으로)

  • Lee, Bora; Lee, Ho-Sang; Lee, Gwang-Soo
    • Korean Journal of Remote Sensing / v.37 no.5_1 / pp.903-916 / 2021
  • An area's topography refers to the shape of the earth's surface, described by its elevation, slope, and aspect, among other features. Topographical conditions determine the flows that move water and energy from higher to lower elevations, such as how much solar energy a site receives and how strongly wind or rain affects it. Another common factor, the topographic wetness index (TWI), is calculated from a digital elevation model as the tendency to accumulate water per slope and unit area; it is one of the most widely referenced hydrologic topographic factors and helps explain the location of forest vegetation. Topographical factors can be calculated with a geographic information system (GIS) program from digital elevation model (DEM) data. Recently, a large number of free open source software (FOSS) GIS programs have become available, developed for researchers, industry, and government. FOSS GIS programs provide opportunities for flexible algorithms customized for specific user needs. The majority of biodiversity in island areas exists at elevations about 20% higher than in land ecosystems, playing an important role in ecological processes and therefore holding high ecological value. However, island areas are vulnerable to disturbance and damage from climate change, environmental pollution, development, and human intervention, and lack systematic investigation due to geographical limitations (e.g., remoteness and difficulty of access). More than 4,000 of Korea's islands lie within a few hours of its coast; 88% are uninhabited, and 52% of them are forested. The forest ecosystems of islands experience less human interaction than those on land, so most of their topographical conditions form naturally and are affected more directly by weather and the environment. The analysis of forest topography in island areas can therefore be done more precisely than for their land counterparts, and it has become a major focus of attention in Korea. This study compares the computation of topographical factors across FOSS GIS programs. The test area is an island forest in Korea's south, and the DEM of the target area was processed with GRASS GIS and SAGA GIS. Slope and TWI maps were produced to compare the differences between the topographic factor calculations of each FOSS GIS program. Finally, the merits of each FOSS GIS program used to calculate the topographic factors are discussed.
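For reference, TWI is conventionally defined as TWI = ln(a / tan β), where a is the specific catchment area and β the local slope. The paper computes these factors with GRASS GIS and SAGA GIS modules (GRASS's r.slope.aspect is the standard slope routine); the sketch below instead illustrates the underlying definitions directly with numpy, assuming a DEM array and a pre-computed flow-accumulation raster from a GIS routine.

```python
import numpy as np

def slope_radians(dem, cellsize):
    """Slope from a DEM (2-D elevation array, square cells of `cellsize` meters)."""
    dzdy, dzdx = np.gradient(dem, cellsize)
    return np.arctan(np.hypot(dzdx, dzdy))

def twi(flow_acc, slope_rad, cellsize):
    """TWI = ln(a / tan(beta)); `flow_acc` counts upslope cells per cell."""
    a = (flow_acc + 1.0) * cellsize          # specific catchment area, assuming a
                                             # contour width of one cell side
    return np.log(a / np.tan(np.maximum(slope_rad, 1e-6)))  # guard flat cells

# Hypothetical 30 m DEM tile; flow accumulation would come from a GIS program.
dem = np.random.default_rng(0).random((100, 100)) * 50
slope = slope_radians(dem, 30.0)
```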

Current Trends for National Bibliography through Analyzing the Status of Representative National Bibliographies (주요국 국가서지 현황조사를 통한 국가서지의 최신 경향 분석)

  • Lee, Mihwa; Lee, Ji-Won
    • Journal of the Korean Biblia Society for Library and Information Science / v.32 no.1 / pp.35-57 / 2021
  • This paper grasps the current trends of national bibliographies by analyzing representative national bibliographies through a literature review, an analysis of the national bibliographies' web pages, and a survey. First, to conform to the definition of a national bibliography as a record of a nation's publications, each attempts to include a variety of materials, from print to electronic resources; in reality none can contain all materials, so there are exceptions. A general selection guide for national bibliography coverage is impossible to create; what is needed is a plan that reflects national characteristics and establishes valid, comprehensive coverage based on analysis. Second, cooperation with publishers and libraries is being pursued to generate national bibliographies efficiently. For efficiency, changes should be sought such as standardization and consistency, collection-level metadata description for digital resources, and the creation of national bibliographies using linked data. Third, national bibliographies are published through national bibliographic online search systems, linked data search, MARC download, PDF, OAI-PMH, SRU, Z39.50, and mass download in RDF/XML format, and are either integrated with the online public access catalog or built separately. Above all, national bibliographies and online public access catalogs need to be built so that data is reused through an integrated library system. Fourth, as differentiated functions, national bibliographies provide services such as user tagging and national bibliographic statistics along with various browsing functions. Services such as analysis of national bibliographic big data, links to electronic publications, and mass download of linked data should also be provided; to develop differentiated services, it is necessary to identify users' needs and provide open services that reflect them. The trends and considerations identified in this study make it possible to explore changes in national bibliographies at home and abroad.
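Several of the distribution channels listed above are simple to consume programmatically, OAI-PMH in particular. Below is a minimal harvesting sketch that assumes only the protocol itself: the base URL is a placeholder, and `oai_dc` is the metadata format every OAI-PMH repository is required to support.

```python
import requests
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"

def harvest(base_url, metadata_prefix="oai_dc"):
    """Yield <record> elements from an OAI-PMH endpoint, following resumption tokens."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    while True:
        resp = requests.get(base_url, params=params, timeout=30)
        root = ET.fromstring(resp.content)
        for rec in root.iter(f"{OAI}record"):
            yield rec
        token = root.find(f".//{OAI}resumptionToken")
        if token is None or not (token.text or "").strip():
            break
        # per the spec, a resumed request carries only verb + resumptionToken
        params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}

# for record in harvest("https://example.org/oai"):   # hypothetical endpoint
#     ...
```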

Design and Implementation of MongoDB-based Unstructured Log Processing System over Cloud Computing Environment (클라우드 환경에서 MongoDB 기반의 비정형 로그 처리 시스템 설계 및 구현)

  • Kim, Myoungjin; Han, Seungho; Cui, Yun; Lee, Hanku
    • Journal of Internet Computing and Services / v.14 no.6 / pp.71-84 / 2013
  • Log data, which record the multitude of information created when operating computer systems, are utilized in many processes, from computer system inspection and process optimization to customized user services. In this paper, we propose a MongoDB-based unstructured log processing system in a cloud environment for processing the massive log data of banks. Most of the log data generated during banking operations come from handling clients' business. Therefore, a separate log data processing system is needed to gather, store, categorize, and analyze the log data generated while processing clients' business. However, existing computing environments make it difficult to realize the flexible storage expansion needed for massive amounts of unstructured log data and to execute the many functions needed to categorize and analyze them. Thus, in this study, we use cloud computing technology to realize a cloud-based log data processing system for unstructured log data that are difficult to handle with the existing infrastructure's analysis tools and management systems. The proposed system uses an IaaS (Infrastructure as a Service) cloud environment to provide flexible expansion of computing resources, such as storage space and memory, under conditions such as storage extension or a rapid increase in log data. Moreover, to overcome the processing limits of existing analysis tools when real-time analysis of the aggregated unstructured log data is required, the proposed system includes a Hadoop-based analysis module for quick and reliable parallel-distributed processing of massive log data. Furthermore, because HDFS (Hadoop Distributed File System) stores data by replicating blocks of the aggregated log data, the proposed system can automatically restore itself and continue operating after a malfunction. Finally, by establishing a distributed database using the NoSQL-based MongoDB, the proposed system processes unstructured log data effectively. Relational databases such as MySQL have fixed schemas that are inappropriate for unstructured log data; moreover, their strict schemas make it difficult to expand by distributing stored data across nodes when the amount of data increases rapidly. NoSQL does not provide the complex computations that relational databases offer, but it can easily expand through node dispersion when data grows rapidly; it is a non-relational database with a structure appropriate for unstructured data. NoSQL data models are usually classified as key-value, column-oriented, or document-oriented. Of these, the proposed system uses MongoDB, the representative document-oriented model, which has a schema-free structure. MongoDB is adopted because its flexible schema makes unstructured log data easy to process, it facilitates node expansion when data grows rapidly, and it provides an auto-sharding function that automatically expands storage. The proposed system is composed of a log collector module, a log graph generator module, a MongoDB module, a Hadoop-based analysis module, and a MySQL module. When the log data generated over each bank's entire client business process are sent to the cloud server, the log collector module collects and classifies the data by log type and distributes them to the MongoDB module and the MySQL module. The log graph generator module generates the log-analysis results of the MongoDB module, the Hadoop-based analysis module, and the MySQL module per analysis time and type of aggregated log data, and provides them to the user through a web interface. Log data that require real-time analysis are stored in the MySQL module and served in real time by the log graph generator module. The log data aggregated per unit time are stored in the MongoDB module and plotted as graphs according to the user's analysis conditions. The aggregated log data in the MongoDB module are processed in parallel-distributed fashion by the Hadoop-based analysis module. A comparative evaluation against a log data processing system that uses only MySQL, measuring log-insert and query performance, demonstrates the proposed system's superiority. Moreover, an optimal chunk size is identified through MongoDB log-insert performance evaluations across various chunk sizes.
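As a rough sketch of what the log collector module's MongoDB path might look like, the snippet below classifies raw log lines and batch-inserts them as schema-free documents via pymongo. The connection string, database and collection names, and the classify function are hypothetical illustrations, not taken from the paper.

```python
from datetime import datetime, timezone
from pymongo import MongoClient, ASCENDING

client = MongoClient("mongodb://localhost:27017")   # assumption: local MongoDB
logs = client["bank"]["logs"]
# index on the fields the graph generator would query by
logs.create_index([("timestamp", ASCENDING), ("log_type", ASCENDING)])

def collect(raw_lines, classify):
    """Classify raw log lines and batch-insert them; no schema is declared,
    so heterogeneous log formats can coexist in one collection."""
    docs = [{"timestamp": datetime.now(timezone.utc),
             "log_type": classify(line),   # e.g. "transaction", "auth", "error"
             "raw": line}
            for line in raw_lines]
    if docs:
        logs.insert_many(docs)
```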

Development Strategy for New Climate Change Scenarios based on RCP (온실가스 시나리오 RCP에 대한 새로운 기후변화 시나리오 개발 전략)

  • Baek, Hee-Jeong; Cho, ChunHo; Kwon, Won-Tae; Kim, Seong-Kyoun; Cho, Joo-Young; Kim, Yeongsin
    • Journal of Climate Change Research / v.2 no.1 / pp.55-68 / 2011
  • The Intergovernmental Panel on Climate Change (IPCC) has identified the causes of climate change and devised measures to address it at the global level. A key component of this work involves developing and assessing future climate change scenarios. The IPCC Expert Meeting in September 2007, in which 130 researchers and users took part, identified a new greenhouse gas concentration scenario, the "Representative Concentration Pathway (RCP)", and established the framework and development schedules of the Climate Modeling (CM), Integrated Assessment Modeling (IAM), and Impact, Adaptation and Vulnerability (IAV) communities for the fifth IPCC Assessment Report. At the IPCC Expert Meeting in September 2008, the CM community agreed on a new set of coordinated climate model experiments, phase five of the Coupled Model Intercomparison Project (CMIP5), which consists of more than 30 standardized experiment protocols on short-term and long-term time scales, in order to enhance understanding of climate change for the IPCC AR5, to develop climate change scenarios, and to address major issues raised in the IPCC AR4. Since early 2009, fourteen countries, including Korea, have been carrying out CMIP5-related projects. With increasing interest in climate change, the COordinated Regional Downscaling EXperiment (CORDEX) was launched in 2009 to generate regional- and local-level information on climate change. The National Institute of Meteorological Research (NIMR) under the Korea Meteorological Administration (KMA) contributed to the IPCC AR4 by developing climate change scenarios based on the IPCC SRES using ECHO-G, and has embarked on crafting national climate change scenarios as well as RCP-based global ones by engaging in international projects such as CMIP5 and CORDEX. NIMR/KMA will contribute to drawing up the IPCC AR5 and will develop national climate change scenarios reflecting geographical factors, local climate characteristics, and user needs, providing them to the national IAV and IAM communities to assess future regional climate impacts and take action.

Economic Impact of HEMOS-Cloud Services for M&S Support (M&S 지원을 위한 HEMOS-Cloud 서비스의 경제적 효과)

  • Jung, Dae Yong; Seo, Dong Woo; Hwang, Jae Soon; Park, Sung Uk; Kim, Myung Il
    • KIPS Transactions on Computer and Communication Systems / v.10 no.10 / pp.261-268 / 2021
  • Cloud computing is a computing paradigm in which users utilize computing resources in a pay-as-you-go manner. In a cloud system, resources can be dynamically scaled up and down according to the user's demand, reducing the total cost of ownership. Modeling and Simulation (M&S) is a renowned simulation-based method for obtaining engineering analyses and results through CAE software without actual experiments. In general, M&S is utilized in Finite Element Analysis (FEA), Computational Fluid Dynamics (CFD), Multibody Dynamics (MBD), and optimization. The M&S workflow is divided into pre-processing, analysis, and post-processing steps. Pre- and post-processing are GPU-intensive jobs consisting of 3D modeling via CAE software, whereas analysis is CPU- or GPU-intensive. Because a general-purpose desktop needs plenty of time to analyze complicated 3D models, CAE software requires a high-end CPU- and GPU-based workstation to run smoothly. In other words, executing M&S requires high-performance computing resources. To mitigate the cost of equipping such substantial computing resources, we propose the HEMOS-Cloud service, an integrated cloud and cluster computing environment. HEMOS-Cloud provides CAE software and computing resources to users in industry or academia who want to experience M&S. In this paper, the economic ripple effect of the HEMOS-Cloud service was analyzed using inter-industry analysis. The results estimated with expert-guided coefficients are a production inducement effect of KRW 7.4 billion, a value-added effect of KRW 4.1 billion, and an employment-inducing effect of 50 persons per KRW 1 billion.
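Inter-industry (input-output) analysis derives such inducement effects from the Leontief inverse (I - A)^-1, where A is the input coefficient matrix between sectors. A minimal sketch of the computation, with a hypothetical three-sector coefficient matrix and demand change rather than the paper's actual data:

```python
import numpy as np

# Hypothetical input coefficients between 3 sectors (column j buys from row i).
A = np.array([[0.15, 0.05, 0.10],
              [0.20, 0.10, 0.05],
              [0.05, 0.15, 0.10]])
delta_demand = np.array([1.0, 0.0, 0.0])   # new final demand in sector 0 (KRW)

leontief_inverse = np.linalg.inv(np.eye(A.shape[0]) - A)
production_inducement = leontief_inverse @ delta_demand
print(production_inducement.sum())   # total output induced per unit of demand
```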

A Study on World University Evaluation Systems: Focusing on U-Multirank of the European Union (유럽연합의 세계 대학 평가시스템 '유-멀티랭크' 연구)

  • Lee, Tae-Young
    • Korean Journal of Comparative Education / v.27 no.4 / pp.187-209 / 2017
  • The purpose of this study was to highlight the necessity of a conceptual reestablishment of world university evaluations. The hitherto best-known and most validated world university evaluation systems, such as Times Higher Education (THE), Quacquarelli Symonds (QS), and the Academic Ranking of World Universities (ARWU), primarily assess big universities with quantitative evaluation indicators and rank them by performance results. These systems have instigated a kind of elitism in higher education and neglect numerous small or local institutions of higher education, instead of providing stakeholders with comprehensive information about the real possibilities of tertiary education so that they can choose an institution tailored to their needs. Also, the management boards of universities and policymakers in higher education have partly been manipulated by, and partly taken advantage of, the elitist ranking systems' economic emphasis, as indicated by research-centered evaluations and industry-university cooperation. To remedy these educational defects and the shortcomings of world university evaluation systems, a new system called 'U-Multirank' has been implemented with the financial support of the European Commission since 2012. U-Multirank was designed, and is operated, by an international team of project experts led by CHE (Centre for Higher Education, Germany), CHEPS (Center for Higher Education Policy Studies, Netherlands), and CWTS (Centre for Science and Technology Studies at Leiden University, Netherlands). The significant features of U-Multirank, compared with, e.g., THE and ARWU, are its qualitative, multidimensional, user-oriented, and individualized assessment methods. Above all, its website and assessment results, based on a mobile operating system and designed simply for international users, present a self-organized and evolutionary model of world university evaluation systems in the digital and global era. To estimate the universal validity of this redefinition of the world university evaluation system via U-Multirank, an epistemological approach is used that relies on Edgar Morin's Complexity Theory and Karl Popper's Philosophy of Science.