• Title/Summary/Keyword: Public Big data

Development of Topic Trend Analysis Model for Industrial Intelligence using Public Data (텍스트마이닝을 활용한 공개데이터 기반 기업 및 산업 토픽추이분석 모델 제안)

  • Park, Sunyoung;Lee, Gene Moo;Kim, You-Eil;Seo, Jinny
    • Journal of Technology Innovation / v.26 no.4 / pp.199-232 / 2018
  • There is a growing need to understand the business management environment through big data analysis at the industry and firm levels. Research using company disclosure information, which comprehensively covers a company's business performance and future plans, is gaining attention. However, research on applicable analytical models that leverage such corporate disclosure data is limited because of the data's unstructured nature. This study proposes a text-mining-based analytical model for industry- and firm-level analyses using publicly available company disclosure data. Specifically, we apply the LDA topic model and the word2vec word embedding model to U.S. SEC data from publicly listed firms and analyze trends in business topics at the industry and firm levels. Using LDA topic modeling on SEC EDGAR 10-K documents, we identify management topics across all industries. To compare topic trend patterns between industries, the software and hardware industries are compared over the most recent 20 years. Changes in management subjects at the firm level are also observed by comparing two companies in the software industry. Changes in topic trends provide a lens for identifying declining and growing management subjects at the industry and firm levels. By applying the word2vec word embedding model and principal component analysis to firm-level 10-K documents in the software industry and mapping companies and products (or services) in the reduced dimensions, we identify companies and products (services) with similar management subjects and how they have changed over the decades. By suggesting a methodology for developing analysis models based on public management data at the industry and firm levels, this study lays groundwork for a practical methodology for identifying changes in management subjects.
However, further research is required to provide a microscopic analytical model of the relation between technology management strategy and management performance for various patterns of management topics, such as frequent changes of management subjects or their momentum. More studies are also needed to develop a competitive context analysis model using product (service) portfolios across firms.

Current Trends for National Bibliography through Analyzing the Status of Representative National Bibliographies (주요국 국가서지 현황조사를 통한 국가서지의 최신 경향 분석)

  • Lee, Mihwa;Lee, Ji-Won
    • Journal of the Korean BIBLIA Society for library and Information Science / v.32 no.1 / pp.35-57 / 2021
  • This paper aims to grasp the current trends of national bibliographies by analyzing representative national bibliographies through a literature review, an analysis of national bibliographies' web pages, and a survey. First, to conform to the definition of a national bibliography as a record of a nation's publications, national bibliographies attempt to include a variety of materials from print to electronic resources, but in reality they cannot contain all materials, so there are exceptions. It is impossible to create a general selection guide for national bibliography coverage, so a plan is needed that reflects national characteristics and prepares valid and comprehensive coverage based on analysis. Second, cooperation with publishers and libraries is under way to generate national bibliographies efficiently. To make generation more efficient, changes should be sought such as standardization and consistency, collection-level metadata description for digital resources, and the creation of national bibliographies using linked data. Third, national bibliographies are published through national bibliographic online search systems, linked data search, MARC download using PDF, OAI-PMH, SRU, Z39.50, and mass download in RDF/XML format, and are either integrated with the online public access catalog or built separately. Above all, national bibliographies and online public access catalogs need to be built so that data are reused through an integrated library system. Fourth, as differentiated functions for national bibliographies, various services such as user tagging and national bibliographic statistics are provided along with various browsing functions. In addition, services such as analysis of national bibliographic big data, links to electronic publications, and mass download of linked data should be provided, and it is necessary to identify users' needs and provide open services that reflect them in order to develop differentiated services.
The current trends and considerations of the national bibliographies analyzed in this study will make it possible to explore changes in national bibliographies both domestically and internationally.

Analysis of Twitter for 2012 South Korea Presidential Election by Text Mining Techniques (텍스트 마이닝을 이용한 2012년 한국대선 관련 트위터 분석)

  • Bae, Jung-Hwan;Son, Ji-Eun;Song, Min
    • Journal of Intelligence and Information Systems / v.19 no.3 / pp.141-156 / 2013
  • Social media is a representative form of Web 2.0 that shapes changes in users' information behavior by allowing users to produce their own content without any expert skills. In particular, as a new communication medium, it has a profound impact on social change by enabling users to communicate their opinions and thoughts to the masses and to acquaintances. Social media data plays a significant role in the emerging Big Data arena. A variety of research areas, such as social network analysis and opinion mining, have therefore paid attention to discovering meaningful information from the vast amounts of data buried in social media. Social media has recently become a main focus of the fields of Information Retrieval and Text Mining because it not only produces massive unstructured textual data in real time but also serves as an influential channel for opinion leading. However, most previous studies have adopted broad-brush and limited approaches, which have made it difficult to find and analyze new information. To overcome these limitations, we developed a real-time Twitter trend mining system that captures trends by processing big stream datasets from Twitter in real time. The system offers term co-occurrence retrieval, visualization of Twitter users by query, similarity calculation between two users, topic modeling to keep track of changes in topical trends, and mention-based user network analysis. In addition, we conducted a case study on the 2012 Korean presidential election. We collected 1,737,969 tweets containing candidates' names and the election on Twitter in Korea (http://www.twitter.com/) for one month in 2012 (October 1 to October 31). The case study shows that the system provides useful information and detects societal trends effectively. The system also retrieves the list of terms that co-occur with given query terms.
We compare the results of term co-occurrence retrieval using the names of the influential candidates, 'Geun Hae Park', 'Jae In Moon', and 'Chul Su Ahn', as query terms. General terms related to the presidential election, such as 'Presidential Election', 'Proclamation in Support', and 'Public opinion poll', appear frequently. The results also show specific terms that differentiate each candidate, such as 'Park Jung Hee' and 'Yuk Young Su' for the query 'Geun Hae Park'; 'a single candidacy agreement' and 'Time of voting extension' for the query 'Jae In Moon'; and 'a single candidacy agreement' and 'down contract' for the query 'Chul Su Ahn'. Our system not only extracts 10 topics along with related terms but also shows the topics' dynamic changes over time by employing the multinomial Latent Dirichlet Allocation technique. Each topic shows one of two types of patterns, a rising tendency or a falling tendency, depending on the change in its probability distribution. To determine the relationship between topic trends on Twitter and social issues in the real world, we compare topic trends with related news articles. We find that Twitter can track an issue faster than other media such as newspapers. The user network on Twitter differs from those of other social media because of the distinctive way relationships are formed on Twitter: users build relationships by exchanging mentions. We visualize and analyze mention-based networks of 136,754 users, using the three candidates' names, 'Geun Hae Park', 'Jae In Moon', and 'Chul Su Ahn', as query terms. The results show that Twitter users mention all candidates' names regardless of their political tendencies. This case study shows that Twitter could be an effective tool for detecting and predicting dynamic changes in social issues, and that mention-based user networks, a type of network unique to Twitter, could reveal different aspects of user behavior.
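The term co-occurrence retrieval described in this abstract can be illustrated with a minimal sketch. It is not the authors' system: the helper name `cooccurring_terms`, the whitespace tokenization, and the sample tweets are all assumptions made for illustration.

```python
from collections import Counter


def cooccurring_terms(tweets, query, top_n=3):
    """Count terms that appear in the same tweet as `query` (hypothetical helper)."""
    counts = Counter()
    query = query.lower()
    for tweet in tweets:
        # Naive whitespace tokenization; a real system would use a proper tokenizer.
        tokens = set(tweet.lower().split())
        if query in tokens:
            counts.update(tokens - {query})
    return counts.most_common(top_n)


# Made-up sample data, not taken from the study's tweet collection.
sample_tweets = [
    "moon election poll",
    "moon election agreement",
    "park election poll",
    "moon poll",
]
top_terms = cooccurring_terms(sample_tweets, "moon")
print(top_terms)
```

Each tweet is reduced to a set of tokens, so a term is counted at most once per tweet; counting raw frequencies instead would be an equally reasonable design choice.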

Data-Driven Approach to Identify Research Topics for Science and Technology Diplomacy (과학외교를 위한 데이터기반의 연구주제선정 방법)

  • Yeo, Woon-Dong;Kim, Seonho;Lee, BangRae;Noh, Kyung-Ran
    • The Journal of the Korea Contents Association / v.20 no.11 / pp.216-227 / 2020
  • In science and technology diplomacy, major countries actively use their science and technology capabilities for public diplomacy, especially to promote diplomatic relations with politically sensitive regions and countries. Recently, as the influence of science and technology on national development has grown, interest in science and technology diplomacy has increased. So far, science and technology diplomacy has relied on experts to find research topics of common interest to both countries. However, this method has various problems, such as bias arising from the subjective judgment of experts, the halo effect attributed to famous researchers, and the use of different criteria by different experts. This paper presents an objective data-based approach for identifying and recommending research topics to support science and technology diplomacy without relying on the expert-based approach. The proposed approach is based on big data analysis that uses deep-learning techniques and bibliometric methods. The Scopus database is used to find suitable topics for collaborative research between two countries. This approach has been used to support science and technology diplomacy between Korea and Hungary and has raised the expectations of policy makers. Finally, this paper discusses aspects to focus on to improve the system in the future.

A Methodology for Estimating Large Scale Dynamic O/D of Commuter Working Trip (대규모 동적 O/D 생성을 위한 추정 방법론 연구: 첨두 출근통행을 기준으로)

  • HAN, He;HONG, Kiman;KIM, Taegyun;WHANG, Junmun;HONG, Young Suk;CHO, Joong Rae
    • Journal of Korean Society of Transportation / v.36 no.3 / pp.203-215 / 2018
  • This study proposes a method for constructing large-scale dynamic O/D that reflects how passengers' travel patterns change with the land use patterns of the destination. Existing dynamic O/D estimation methods have limitations: data are difficult to collect, so the methods can be applied only to small areas, or they are limited to a specific transportation network such as highway networks or public transportation networks. In this paper, we propose a method for estimating dynamic O/D without restricting the analysis area, based on transportation resources that can be easily collected and used in the big data era. Clustering analysis was used to calculate the departure-time trip distribution ratio based on arrival time, and a departure-time trip distribution function was estimated for each cluster. In a comparison test against survey data, the estimated distribution function was statistically significant.

Arrival Time Estimation for Bus Information System Using Hidden Markov Model (은닉 마르코프 모델을 이용한 버스 정보 시스템의 도착 시간 예측)

  • Park, Chul Young;Kim, Hong Geun;Shin, Chang Sun;Cho, Yong Yun;Park, Jang Woo
    • KIPS Transactions on Computer and Communication Systems / v.6 no.4 / pp.189-196 / 2017
  • A BIS (Bus Information System) provides various bus-related information, including predicted arrival times at stations. BISs have been deployed in almost all cities in Korea and have played an active role in improving the convenience of public transportation systems. Moving average filters, Kalman filters, and regression models have been the representative methods for forecasting bus arrival times in current BISs. Prediction accuracy depends largely on the forecasting algorithm and on the traffic conditions considered when forecasting. Current BISs use simple prediction algorithms that consider only passage times and distances between stations. Arrival forecasts, however, are influenced by traffic conditions such as traffic signals, traffic accidents, and pedestrians, as well as by missing data. These factors make it very difficult to build models that improve the accuracy of arrival estimates. Hidden Markov Models (HMMs) are effective algorithms under the restrictions above, so we built HMM forecasting models for bus arrival times in the current BIS. The models were built with data collected in Suncheon City in 2015; Suncheon has about 2,298 stations and 217 routes. Separate models were developed for weekdays and weekends, and the models were then validated with data from different districts and times. We find that our HMM models forecast more accurately than existing methods such as moving average filters, Kalman filters, and regression models. With the HMM, two different sections were used to find the pattern, and the results were verified using a bootstrap process.
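As background on the HMM machinery this abstract relies on, the sketch below shows the forward algorithm, which computes the likelihood of an observation sequence under a discrete HMM. It is a toy under stated assumptions, not the authors' model: the two hidden traffic states, the two travel-time buckets, and every probability value are hypothetical.

```python
from itertools import product

# Hypothetical two-state model: hidden traffic condition, observed travel-time bucket.
STATES = ("free", "congested")
START = {"free": 0.7, "congested": 0.3}
TRANS = {
    "free": {"free": 0.8, "congested": 0.2},
    "congested": {"free": 0.4, "congested": 0.6},
}
EMIT = {
    "free": {"short": 0.9, "long": 0.1},
    "congested": {"short": 0.2, "long": 0.8},
}


def forward_likelihood(observations):
    """Return P(observations) under the toy HMM using the forward algorithm."""
    # alpha[s] = probability of the observations so far and being in state s now.
    alpha = {s: START[s] * EMIT[s][observations[0]] for s in STATES}
    for obs in observations[1:]:
        alpha = {
            s: EMIT[s][obs] * sum(alpha[prev] * TRANS[prev][s] for prev in STATES)
            for s in STATES
        }
    return sum(alpha.values())


likelihood = forward_likelihood(["short", "short", "long"])
print(f"P(short, short, long) = {likelihood:.4f}")

# Sanity check: likelihoods of all length-3 observation sequences sum to 1.
total = sum(forward_likelihood(seq) for seq in product(("short", "long"), repeat=3))
print(f"sum over all sequences = {total:.4f}")
```

A forecasting model would additionally need parameters learned from data and a way to map states to arrival times; the sketch covers only the likelihood computation.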

A Study on the Improvement of RIMGIS for an Efficient River Information Service (효율적인 하천정보 서비스를 위한 RIMGIS 개선방안 연구)

  • Shin, Hyung-Jin;Chae, Hyo-Sok;Hwang, Eui-Ho;Lim, Kwang-Suop
    • Journal of the Korean Association of Geographic Information Studies / v.16 no.1 / pp.15-25 / 2013
  • The RIMGIS (River Information Management GIS) has been under development since 2000 for public service and practical application to related work, following the standardization of national river data such as the river facility register report, river survey map, and attached map. The RIMGIS has been improved in order to respond proactively to changes in the information environment. Recently, Smart River-based river information services and related data have grown so large that improvements in managing big data have become necessary. This study suggests a plan both to respond to these changes in the information environment and to provide a future Smart River-based river information service by understanding the current state of RIMGIS, improving RIMGIS itself, redesigning the database, developing distribution, and integrating river information systems. To this end, the primary and foreign keys, which distinguish attribute information and entity linkages, were redefined to increase the usability of RIMGIS. The database construction of attribute information and the entity relationship diagram were also redefined to redesign linkages among tables from the perspective of a river standard database. In addition, this study sought to expand the current supplier-oriented operating system into a demand-oriented one by establishing efficient management of river-related information and a utilization system capable of adapting to changes in the river management paradigm.

An investigation on the Improvement of the Working Environment Measurement Reporting Policy (작업환경측정 보고제도 개선 방안 도출을 위한 조사 연구)

  • Lim, Dae Sung;Kim, Chi-Nyon;Lee, Seung kil;Park, Jung-Keun;Kim, Ki-Youn
    • Journal of Korean Society of Occupational and Environmental Hygiene / v.32 no.2 / pp.172-181 / 2022
  • Objectives: To reduce the burden on employers and increase the reliability of measurement results, improvements are planned to the provisions related to the work environment measurement reporting system, such as the current Occupational Safety and Health Act and its Enforcement Rules. This study aimed to suggest improvements to the work environment measurement reporting system through a survey and a Delphi investigation. Method: The survey covered workplaces (health managers), national institutions (the Ministry of Employment and Labor) that use the results of the work environment measurement reporting system for policy and supervision purposes, and work environment measurement institutions that enter the results. In addition to the survey, we sought to derive results through meetings with stakeholders and expert advisory meetings. Results: It is difficult to abolish or partially improve the reporting system under the Enforcement Regulations of the Occupational Safety and Health Act at this point because the opinions of workplaces, supervisory agencies, and measuring agencies differ on its intended purpose and use. For high-exposure harmful factors (over 50% on the basis of exposure) in the "comprehensive opinion" described in the work environment measurement results table, the unit of work with exposed harmful factors, the exposure factors, and the current conditions should be entered in checklists or tables so that they can be reflected in government policies. For workplaces feared to be highly exposed to substances subject to measurement, it seems desirable to improve the system so that industrial health instructors registered with the Korea Safety and Health Agency or local labor offices can provide technical guidance. To increase the reliability of the data and the use of big data, the input method for processes and jobs needs to be improved.
Conclusion: The laws and regulations of the work environment measurement reporting system are difficult to revise due to a lack of consensus among current stakeholders, but improvements can be achieved by improving the Ministry of Employment and Labor's notifications and other means. In addition, in order to effectively utilize the data from the K2B system, it is necessary to improve the input method for processes and jobs.

Impacts of Social Distancing for COVID-19 on Urban Space Use in Seoul (COVID-19 사회적 거리두기가 도시공간이용에 미치는 영향)

  • Park, Hong Il;Lee, Sangkyeong
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography / v.39 no.6 / pp.457-467 / 2021
  • This paper analyzes changes in urban space use due to COVID-19 social distancing measures using daytime de facto population data for Seoul, which are estimated by the Seoul Metropolitan Government and the telecommunications company KT from public big data and LTE signal data. Kernel density estimation and spatial autocorrelation analysis show that the distribution patterns of the de facto population in 2019 and 2020 were generally similar. This indicates that the government's social distancing measures allowed a certain level of normal activity while suppressing the spread of COVID-19. However, analyzing the difference in de facto population between 2020 and 2019 showed different results at the micro level: the de facto population decreased in commercial areas but increased in residential areas. This means that COVID-19 social distancing measures had spatially uneven effects. Spatial regression analysis of the effects of regional, land use, economic, educational, and accessibility characteristics on the changes in de facto population gave the following results. The de facto population decreased more where the density of commercial facilities was higher and where there were more businesses subject to regulations and more schools and universities requiring non-face-to-face classes. Conversely, the de facto population increased in areas with many houses and parks, owing to telecommuting.
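The abstract does not say which spatial autocorrelation statistic was used. As one common choice, global Moran's I can be sketched as follows; the binary neighbor weights, the four-zone strip, and the values are assumptions for illustration only, not the study's data.

```python
def morans_i(values, neighbors):
    """Global Moran's I with binary weights (w_ij = 1 when j is a neighbor of i)."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    # Sum of cross-products over all neighboring pairs, and total weight W.
    cross = sum(dev[i] * dev[j] for i, js in neighbors.items() for j in js)
    total_weight = sum(len(js) for js in neighbors.values())
    return (n / total_weight) * cross / sum(d * d for d in dev)


# Hypothetical 1-D strip of four zones; each zone neighbors the adjacent ones.
strip = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
clustered = morans_i([1, 1, 5, 5], strip)    # similar values sit together
alternating = morans_i([1, 5, 1, 5], strip)  # neighbors always differ
print(clustered, alternating)
```

Positive values indicate that similar population values cluster in space, and negative values indicate that neighboring zones tend to differ.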

Smart Store in Smart City: The Development of Smart Trade Area Analysis System Based on Consumer Sentiments (Smart Store in Smart City: 소비자 감성기반 상권분석 시스템 개발)

  • Yoo, In-Jin;Seo, Bong-Goon;Park, Do-Hyung
    • Journal of Intelligence and Information Systems / v.24 no.1 / pp.25-52 / 2018
  • This study performs social network analysis based on consumer sentiment related to locations in Seoul, using data that reflect consumers' web search activities and emotional evaluations associated with commerce. The study focuses on large commercial districts in Seoul. In addition, to consider their various aspects, social network indexes were combined with the trading areas' public data to verify factors affecting each area's sales. The model has a fairly high R-squared value even when it includes only the districts' public data, which are static data. However, the R-squared of the model improved considerably further when it was combined with the network indexes derived from the social network analysis. A regression analysis of the trading areas' public data showed that five of the twenty-two variables have a significant influence on the sales of a trading area: 'number of market district,' 'residential area per person,' 'satisfaction of residential environment,' 'rate of change of trade,' and 'survival rate over 3 years.' According to the results, 'residential area per person' has the highest standardized beta value and therefore the strongest influence on commercial sales. In addition, 'residential area per person,' 'number of market district,' and 'survival rate over 3 years' were found to have positive effects on the sales of all trading areas. Thus, sales increase as the number of market districts in the trading area, the residential area per person, and the three-year survival rate of stores in the trading area increase. On the other hand, 'satisfaction of residential environment' and 'rate of change of trade' were found to have a negative effect on sales. In the case of 'satisfaction of residential environment,' sales increase when the satisfaction level is low.
Therefore, as consumer dissatisfaction with the residential environment increases, sales increase. The 'rate of change of trade' shows that sales increase as the acceleration of transaction frequency decreases. According to the social network analysis, of the 25 regional trading areas in Seoul, Yangcheon-gu has the highest degree of connection. In other words, it shares sentiments with many other trading areas. On the other hand, Nowon-gu and Jungrang-gu have the lowest degree of connection. In other words, their sentiments are relatively distinct from those of other trading areas. The social network indexes used in the combined model are 'density of ego network,' 'degree centrality,' 'closeness centrality,' 'betweenness centrality,' and 'eigenvector centrality.' The combined model analysis confirmed that, among the social network indexes, degree centrality and eigenvector centrality have a significant influence on sales and the highest influence in the model. 'Degree centrality' has a negative effect on the sales of the districts. This implies that sales decrease when a trading area shares the various sentiments of other trading areas, which runs counter to common belief. However, this result can be interpreted to mean that a trading area with low 'degree centrality' delivers unique and special sentiments to consumers. The findings can also be interpreted to mean that sales can be increased if a trading area raises consumer recognition by forming a unique sentiment and city atmosphere that distinguish it from other trading areas. On the other hand, 'eigenvector centrality' has the greatest effect on sales in the combined model, and the results confirmed that this effect is positive. This finding shows that sales increase more when a trading area is connected to others with stronger centrality than when it merely shares sentiments with others.
This study can be used as an empirical basis for establishing and implementing city and trading area strategy plans that consider consumers' desired sentiments. In addition, we expect it to inform entrepreneurs and potential entrepreneurs entering a trading area about the sentiments associated with that area and about directions for entering it in light of the district-sentiment structure.
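Of the network indexes named in this abstract, degree centrality is the simplest to illustrate. The sketch below computes normalized degree centrality (degree divided by n - 1) for an undirected graph. The district labels and links are hypothetical, and the abstract does not state which normalization the authors used.

```python
def degree_centrality(edges):
    """Normalized degree centrality: degree / (n - 1) for an undirected graph."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    n = len(adjacency)
    return {node: len(nbrs) / (n - 1) for node, nbrs in adjacency.items()}


# Hypothetical sentiment-similarity links between four districts (not the study's data).
links = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")]
centrality = degree_centrality(links)
print(centrality)
```

In this toy graph, district A is linked to every other district and so has the maximum centrality, which in the study's interpretation would correspond to sharing sentiments with many other trading areas.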