• Title/Summary/Keyword: news topic

Twitter Issue Tracking System by Topic Modeling Techniques (토픽 모델링을 이용한 트위터 이슈 트래킹 시스템)

  • Bae, Jung-Hwan;Han, Nam-Gi;Song, Min
    • Journal of Intelligence and Information Systems
    • /
    • v.20 no.2
    • /
    • pp.109-122
    • /
    • 2014
  • People now create a tremendous amount of data on social network services (SNS). In particular, the incorporation of SNS into mobile devices has resulted in massive data generation that greatly influences society. This phenomenon is unmatched in history, and we now live in the Age of Big Data. SNS data qualifies as Big Data because it satisfies the conditions of data amount (volume), data input and output speed (velocity), and variety of data types (variety). Discovering the trend of an issue in SNS Big Data can provide an important new source of value because this information covers the whole of society. In this study, a Twitter Issue Tracking System (TITS) is designed and built to meet the needs of analyzing SNS Big Data. TITS extracts issues from Twitter texts and visualizes them on the web. The proposed system provides the following four functions: (1) provide the topic keyword set that corresponds to the daily ranking; (2) visualize the daily time-series graph of a topic over a month; (3) show the importance of a topic through a treemap based on the score system and frequency; and (4) visualize the daily time-series graph of a keyword retrieved by keyword search. The present study analyzes the Big Data generated by SNS in real time. SNS Big Data analysis requires various natural language processing techniques, including stop-word removal and noun extraction, to process unrefined, unstructured data. In addition, such analysis requires recent big data technology, such as the Hadoop distributed system or NoSQL databases (an alternative to relational databases), to process large amounts of real-time data rapidly. We built TITS on Hadoop to optimize the processing of big data because Hadoop is designed to scale from a single node to thousands of machines. 
Furthermore, we use MongoDB, which is classified as a NoSQL database. MongoDB is an open-source, document-oriented database that provides high performance, high availability, and automatic scaling. Unlike existing relational databases, MongoDB has no schemas or tables, and its most important goals are data accessibility and data processing performance. In the Age of Big Data, visualization is attractive to the Big Data community because it helps analysts examine such data easily and clearly. Therefore, TITS uses the d3.js library as a visualization tool. This library is designed for creating Data-Driven Documents that bind the document object model (DOM) to arbitrary data; it makes interaction with data easy and is useful for managing real-time data streams with smooth animation. In addition, TITS uses Bootstrap, a set of pre-configured style sheets and JavaScript plug-ins, to build the web system. The TITS graphical user interface (GUI) is designed using these libraries, and it can detect issues on Twitter in an easy and intuitive manner. The proposed work demonstrates the superiority of our issue detection techniques by matching detected issues with corresponding online news articles. The contributions of the present study are threefold. First, we suggest an alternative approach to real-time big data analysis, which has become an extremely important issue. Second, we apply a topic modeling technique that is used in various research areas, including Library and Information Science (LIS). Based on this, we can confirm the utility of storytelling and time-series analysis. Third, we develop a web-based system and make it available for the real-time discovery of topics. The present study conducted experiments with nearly 150 million tweets posted in Korea during March 2013.

A Quantitative Analysis of Classification Classes and Classified Information Resources of Directory (디렉터리 서비스 분류항목 및 정보자원의 계량적 분석)

  • Kim, Sung-Won
    • Journal of Information Management
    • /
    • v.37 no.1
    • /
    • pp.83-103
    • /
    • 2006
  • This study analyzes the classification schemes and classified information resources of the directory services provided by major web portals to complement keyword-based retrieval. Specifically, this study intends to quantitatively analyze the topic categories, the information resources by subject, and the information resources classified by the topic categories of three directories, Yahoo, Naver, and Empas. The result of this analysis reveals some differences among directory services. Overall, these directories show different ratios of referred categories to original categories depending on the subject area, and the categories regarded as format-based show the highest proportion of referred categories. In terms of the total amount of classified information resources, Yahoo has the largest number of resources. The directories compared have different amounts of resources depending on the subject area. The quantitative analysis of resources classified by the specific category is performed on the class of 'News & Media'. The result reveals that Naver and Empas contain overly specified categories compared to Yahoo, as far as the number of information resources categorized is concerned. Comparing the depth of the categories assigned by the three directories to the same information resources, it is found that, on average, Yahoo assigns one-step further segmented divisions than the other two directories to the identical resources.

Multilingual Story Link Detection based on Properties of Event Terms (사건 어휘의 특성을 반영한 다국어 사건 연결 탐색)

  • Lee Kyung-Soon
    • The KIPS Transactions:PartB
    • /
    • v.12B no.1 s.97
    • /
    • pp.81-90
    • /
    • 2005
  • In this paper, we propose a novel approach that models multilingual story link detection by adapting features such as timelines and multilingual spaces as weighting components to give distinctive weights to terms related to events. On timelines, term significance is calculated by comparing the term distribution of the documents on a given day with that of the total document collection reported, and it is used to represent the document vectors on that day. Since two languages can provide more information than one, term significance is measured in each language space and used as a bridge to refer to the other language space on multilingual spaces. Evaluated on Korean and Japanese news articles, our method achieved improvements of 14.3% and 16.7% for mono- and multilingual story pairs, and for multilingual story pairs, respectively. Measurements of space density verify the proposed weighting components, showing a high density for intra-event stories and a low density for inter-event stories. This result indicates that the proposed method is helpful for multilingual story link detection.

Trend Properties and a Ranking Method for Automatic Trend Analysis (자동 트렌드 탐지를 위한 속성의 정의 및 트렌드 순위 결정 방법)

  • Oh, Heung-Seon;Choi, Yoon-Jung;Shin, Wook-Hyun;Jeong, Yoon-Jae;Myaeng, Sung-Hyon
    • Journal of KIISE:Software and Applications
    • /
    • v.36 no.3
    • /
    • pp.236-243
    • /
    • 2009
  • With advances in topic detection and tracking (TDT), automatic trend analysis from a collection of time-stamped documents, such as patents, newspapers, and blog pages, is a challenging research problem. Past research in this area has mainly focused on showing a trend line of a given concept over time by measuring the strength of trend-associated term frequency information. For the detection of emerging trends, either a simple criterion such as frequency change was used, or an overall comparison was made against training data. We note that, in order to show the most salient trends detected among many possibilities, it is critical to devise a ranking function. To this end, we define four properties (change, persistency, stability, and volume) of trend lines drawn from frequency information to quantify various aspects of trends, and we propose a method by which trend lines can be ranked. The properties are examined individually and in combination in a series of experiments for their validity using the ranking algorithm. The results show that a judicious combination of the four properties is a better indicator of salient trends than any single criterion used in the past for ranking or detecting emerging trends.

A Study on the Analysis of Related Information through the Establishment of the National Core Technology Network: Focused on Display Technology (국가핵심기술 관계망 구축을 통한 연관정보 분석연구: 디스플레이 기술을 중심으로)

  • Pak, Se Hee;Yoon, Won Seok;Chang, Hang Bae
    • The Journal of Society for e-Business Studies
    • /
    • v.26 no.2
    • /
    • pp.123-141
    • /
    • 2021
  • As the economic structure grows more dependent on technology, the importance of National Core Technology is increasing. However, it is difficult to determine the scope of the technology to be protected, because the scope of related technologies is abstract and information disclosure is limited by the nature of National Core Technology. To solve this problem, we propose the most appropriate literature type and analysis method for distinguishing important technologies related to National Core Technology. We conducted a pilot test that applied TF-IDF and LDA topic modeling, two text mining techniques for big data analysis, to four types of literature (news, papers, reports, and patents) collected with National Core Technology keywords in the display industry. The results show that applying LDA topic modeling to patent data yields results that are highly relevant to National Core Technology. Important technologies related to the front and rear industries of displays, including OLEDs and microLEDs, were identified, and the results were visualized as networks to clarify the scope of important technologies associated with National Core Technology. Through this study, we have clarified the ambiguous scope of association among technologies and overcome the limited information disclosure that characterizes National Core Technology.
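
The TF-IDF weighting mentioned in this abstract can be illustrated with a minimal sketch. The toy corpus, the function name `tf_idf`, and the specific variant (relative term frequency multiplied by the natural log of inverse document frequency) are assumptions made for illustration; the abstract does not state which variant or tooling the authors used.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: (count / document length) * ln(N / document frequency).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy corpus standing in for collected literature.
docs = [
    "oled display panel patent".split(),
    "microled display process patent".split(),
    "display market news".split(),
]
scores = tf_idf(docs)

# Terms ranked by weight for the first document.
ranked = sorted(scores[0].items(), key=lambda kv: kv[1], reverse=True)
```

In this toy corpus, "display" occurs in every document and so receives a weight of zero, while terms confined to a single document rank highest. That is the property that makes TF-IDF useful for surfacing distinctive keywords.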

Text Mining Analysis of Media Coverage of Maritime Sports: Perceptions of Yachting, Rowing, and Canoeing (텍스트마이닝을 활용한 해양스포츠에 대한 언론 보도기사 분석: 요트, 조정, 카누를 중심으로)

  • Ji-Hyeon Kim;Bo-Kyeong Kim
    • Journal of the Korean Society of Marine Environment & Safety
    • /
    • v.29 no.6
    • /
    • pp.609-619
    • /
    • 2023
  • This study aimed to investigate the formation of the social perception of domestic maritime sports using text mining analysis of keywords and topics from domestic media coverage over the past 10 years related to representative maritime sports, including yachting, rowing, and canoeing. The results are as follows: First, term frequency (TF) and word cloud analyses identified the top keywords: "maritime," "competition," "experience," "tourism," "world," "yachting," "canoeing," "leisure," and "participation." Second, semantic network analysis revealed that yachting was correlated with terms like "maritime," "industry," "competition," "leisure," "tourism," "boat," "facilities," and "business"; rowing with terms like "competition" and "Chungju"; and canoeing with terms like "maritime," "competition," "experience," "leisure," and "tourism." Third, topic modeling analysis indicated that yachting, rowing, and canoeing are perceived as elite sports and maritime leisure sports. However, the perception of these sports has been demonstrated to have little impact on society, public opinion, and social transformation. In summary, when considering these results comprehensively, it can be concluded that yachting and canoeing have gradually shifted from being perceived as elite sports to essential elements of the maritime leisure industry. Contrariwise, rowing remains primarily associated with elite sports, and its popularization as a maritime leisure sport appears limited at this time.

A Study on Trends of Key Issues in Port Safety at Busan Port (부산항 항만안전 주요 이슈 동향에 관한 연구)

  • Jeong-Min Lee;Do-Yeon Ha;Joo-Hye Kim
    • Journal of Navigation and Port Research
    • /
    • v.48 no.1
    • /
    • pp.34-48
    • /
    • 2024
  • As global supply chain risks proliferate unpredictably, the high interdependence of the port and logistics industries intensifies the risk burden. This study conducted fundamental research to explore diverse safety issues in domestic ports. Using news article data about Busan Port, we employed LDA topic modeling and time-series linear regression to understand key safety trends. Over the past 30 years, Busan Port faced nine major safety issues; among them, maritime safety, import cargo inspection, labor strikes, and natural disasters emerged cyclically. Major port safety issues at Busan Port are primarily unpredictable in nature and fall under the socio-environmental and natural phenomena types, indicating a significant impact of global uncertainty. Therefore, systematic policies need to be formulated based on the identified issues to enhance port safety at Busan Port. Additionally, there is a need to strengthen the resilience of port safety against unpredictable risk situations. In conclusion, advanced research activities are necessary to promote port safety enhancement in response to dynamically changing social conditions.
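
One common way to pair topic modeling with time-series linear regression is to fit a line to each topic's share per period and read the sign of the slope as a rising or falling trend. The sketch below shows only that regression step. The topic names reuse issues named in the abstract, but the yearly shares are invented, and `trend_slope` is a hypothetical helper; the abstract does not describe the authors' exact procedure.

```python
def trend_slope(values):
    """Ordinary least-squares slope of values against their index 0..n-1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    var = sum((x - mean_x) ** 2 for x in range(n))
    return cov / var


# Hypothetical per-period topic shares (not data from the study).
topic_share = {
    "labor strikes": [0.10, 0.12, 0.15, 0.19],
    "natural disasters": [0.20, 0.18, 0.15, 0.11],
}
slopes = {topic: trend_slope(shares) for topic, shares in topic_share.items()}

# A positive slope suggests a rising topic, a negative slope a falling one.
rising = [topic for topic, slope in slopes.items() if slope > 0]
```

A full analysis would also test whether each slope is statistically significant; this sketch reports only the point estimate.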

A News Frame Analysis by the South Korean Press on the Livelihoods of a North Koreans (북한주민 생활 실태에 관한 국내 신문보도 프레임연구: 조선일보, 동아일보, 한겨레, 경향신문을 중심으로)

  • Ha, Seung-Hee;Lee, Min-Kyu
    • Korean journal of communication and information
    • /
    • v.58
    • /
    • pp.222-241
    • /
    • 2012
  • This study analyzed the news frames and sources used by the 'Chosun Ilbo,' 'Donga Ilbo,' 'Hankyoreh,' and 'Kyunghyang Newspaper' in reporting on the living conditions of North Koreans during the South Korean administrations of Kim Dae-jung, Roh Moo-hyun, and Lee Myung-bak. The first and second frame analyses showed that the four newspapers differed clearly in the execution and attitude of their reports, in line with the philosophies of 'progressive/left-wing' and 'conservative/right-wing' newspapers. Moreover, the analyses showed that the reports mirrored each presidential administration and its policies regardless of the newspaper's political voice. In terms of 'anonymous sources,' conservative newspapers used them more frequently than their liberal counterparts. Overall, with the continued tension between the international community and North Korea, reports on the activities of North Korean residents are inevitably portrayed in a negative tone in an effort to plant a distorted view among South Korean citizens. Furthermore, this manipulation of the press may affect the credibility of the South Korean press on the topic of North Korean culture.

A study on the method of deriving the cause of social issues based on causal sentences (인과관계문형 기반 사회이슈 발생원인 도출 방법 연구)

  • Lee, Namyeon;Lee, Jae Hyung
    • Journal of Digital Convergence
    • /
    • v.19 no.3
    • /
    • pp.167-176
    • /
    • 2021
  • With the development of big data analysis technology, many studies have used text mining techniques to find social issues. To derive social issues, previous studies collected a large amount of text data from news or SNS and then analyzed issues with text mining techniques such as topic modeling and term network analysis. Social issues are the results of various social phenomena and factors. However, since previous studies focused on deriving the social issues that result from various causes, they are limited in revealing the causes of the issues. To respond effectively to social issues, it is necessary not only to derive them but also to identify their causes. In this study, to overcome these limitations, we propose a method of deriving the factors that cause social issues from texts related to social issues, based on part of the theory of Korean linguistics. To do this, we collected news data related to social issues for three years, from 2017 to 2019, and proposed a methodology for finding causes from causal sentences using text mining techniques.

Analysis of Twitter for 2012 South Korea Presidential Election by Text Mining Techniques (텍스트 마이닝을 이용한 2012년 한국대선 관련 트위터 분석)

  • Bae, Jung-Hwan;Son, Ji-Eun;Song, Min
    • Journal of Intelligence and Information Systems
    • /
    • v.19 no.3
    • /
    • pp.141-156
    • /
    • 2013
  • Social media is a representative form of Web 2.0 that shapes changes in users' information behavior by allowing users to produce their own content without any expert skills. In particular, as a new communication medium, it has a profound impact on social change by enabling users to communicate their opinions and thoughts to the masses and to acquaintances. Social media data plays a significant role in the emerging Big Data arena. A variety of research areas, such as social network analysis and opinion mining, have therefore paid attention to discovering meaningful information from the vast amounts of data buried in social media. Social media has recently become a main focus of the fields of Information Retrieval and Text Mining because it not only produces massive unstructured textual data in real time but also serves as an influential channel for opinion leading. However, most previous studies have adopted broad-brush and limited approaches, which have made it difficult to find and analyze new information. To overcome these limitations, we developed a real-time Twitter trend mining system that captures trends by processing big stream datasets from Twitter in real time. The system offers term co-occurrence retrieval, visualization of Twitter users by query, similarity calculation between two users, topic modeling to keep track of changes in topical trends, and mention-based user network analysis. In addition, we conducted a case study on the 2012 Korean presidential election. We collected 1,737,969 tweets that mention the candidates' names and the election on Twitter in Korea (http://www.twitter.com/) for one month in 2012 (October 1 to October 31). The case study shows that the system provides useful information and detects social trends effectively. The system also retrieves the list of terms that co-occur with given query terms. 
We compare the results of term co-occurrence retrieval by giving the influential candidates' names, 'Geun Hae Park', 'Jae In Moon', and 'Chul Su Ahn', as query terms. General terms related to the presidential election, such as 'Presidential Election', 'Proclamation in Support', and 'Public opinion poll', appear frequently. The results also show specific terms that differentiate each candidate, such as 'Park Jung Hee' and 'Yuk Young Su' for the query 'Geun Hae Park'; 'a single candidacy agreement' and 'Time of voting extension' for the query 'Jae In Moon'; and 'a single candidacy agreement' and 'down contract' for the query 'Chul Su Ahn'. Our system not only extracts 10 topics along with related terms but also shows the topics' dynamic changes over time by employing the multinomial Latent Dirichlet Allocation technique. Each topic shows one of two patterns, a rising tendency or a falling tendency, depending on the change of the probability distribution. To determine the relationship between topic trends on Twitter and social issues in the real world, we compare topic trends with related news articles. We find that Twitter can track an issue faster than other media such as newspapers. The user network on Twitter differs from those of other social media because of the distinctive way relationships are formed on Twitter: users can build relationships by exchanging mentions. We visualize and analyze mention-based networks of 136,754 users, using the three candidates' names, 'Geun Hae Park', 'Jae In Moon', and 'Chul Su Ahn', as query terms. The results show that Twitter users mention all the candidates' names regardless of their political tendencies. This case study shows that Twitter could be an effective tool for detecting and predicting dynamic changes in social issues, and that mention-based user networks could reveal different aspects of user behavior as a network that is unique to Twitter.
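
The term co-occurrence retrieval described in this abstract can be sketched as counting, for a given query term, the other terms that appear in the same tweet. The function `cooccurring_terms`, the tokenized toy tweets, and the placeholder candidate labels are all assumptions for illustration; the abstract does not specify how the authors' system tokenizes tweets or scores co-occurrence.

```python
from collections import Counter


def cooccurring_terms(tweets, query, top_n=5):
    """Return the terms that most often share a tweet with `query`.

    Each tweet is a list of tokens; a term is counted once per tweet.
    """
    counts = Counter()
    for tokens in tweets:
        unique = set(tokens)
        if query in unique:
            counts.update(unique - {query})
    return counts.most_common(top_n)


# Hypothetical tokenized tweets with placeholder candidate labels.
tweets = [
    ["candidate_a", "poll", "election"],
    ["candidate_a", "election", "debate"],
    ["candidate_b", "poll", "election"],
]
result = cooccurring_terms(tweets, "candidate_a")
```

Raw counts favor generic terms such as "election", which matches the abstract's observation that general election terms appear frequently for every candidate. Surfacing candidate-specific terms would require an additional weighting step that is not shown here.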