• Title/Summary/Keyword: news text

Social Media Fake News in India

  • Al-Zaman, Md. Sayeed
    • Asian Journal for Public Opinion Research
    • /
    • v.9 no.1
    • /
    • pp.25-47
    • /
    • 2021
  • This study analyzes 419 fake news items published in India, a fake-news-prone country, to identify the major themes, content types, and sources of social media fake news. The results show that fake news shared on social media has six major themes (health, religion, politics, crime, entertainment, and miscellaneous); eight types of content (text, photo, audio, video, text & photo, text & video, photo & video, and text & photo & video); and two main sources (online sources and the mainstream media). Health-related fake news is more common only during a health crisis, whereas fake news related to religion and politics seems more prevalent and emerges mainly from online media. Text & photo and text & video together account for three-fourths of all fake news, and most of these items come from online media, which is also the main source of fake news on social media overall. Mainstream media, on the other hand, mostly produces political fake news. This study presents some novel findings that may help researchers understand, and policymakers control, fake news on social media, and it invites more academic investigation of religious and political fake news in India. Two important limitations of this study relate to the data source and the data collection period, which may have affected the results.

Urdu News Classification using Application of Machine Learning Algorithms on News Headline

  • Khan, Muhammad Badruddin
    • International Journal of Computer Science & Network Security
    • /
    • v.21 no.2
    • /
    • pp.229-237
    • /
    • 2021
  • Our modern 'information-hungry' age demands delivery of information at unprecedented speed. Timely delivery of noteworthy information about recent events can help people from all walks of life in a number of ways. As the world has become a global village, the volume and speed of the news flow demand the involvement of machines to help humans handle the enormous amount of data. News is presented to the public in the form of video, audio, images, and text. News text available on the internet is a source of knowledge for billions of internet users. The Urdu language is spoken and understood by millions of people from the Indian subcontinent. The availability of online Urdu news enables these readers to improve their understanding of the world and to make their decisions. This paper uses available online Urdu news data to train machines to categorize news automatically. Various machine learning algorithms were trained on news headlines, and the results demonstrate that the Bernoulli Naïve Bayes (Bernoulli NB) and Multinomial Naïve Bayes (Multinomial NB) algorithms outperformed the other algorithms on all performance measures. The highest accuracy achieved on the dataset was 94.278% by the Multinomial NB classifier, followed by the Bernoulli NB classifier with an accuracy of 94.274%, when Urdu stop words were removed from the dataset. The results suggest that the short text of news headlines can be used as input for text categorization.
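The headline classification described above can be sketched in a few lines. The following is a minimal, pure-Python Multinomial Naïve Bayes with Laplace smoothing and stop-word removal; it is not the paper's implementation. The function names, the toy English headlines (standing in for the Urdu data, which is not reproduced here), and the stop-word set are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

def tokenize(headline, stop_words):
    """Lowercase, split on whitespace, and drop stop words."""
    return [w for w in headline.lower().split() if w not in stop_words]

def train_multinomial_nb(headlines, labels, stop_words=frozenset()):
    """Collect per-class word counts, class document counts, and the vocabulary."""
    class_docs = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in zip(headlines, labels):
        tokens = tokenize(text, stop_words)
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return {"class_docs": class_docs, "word_counts": word_counts,
            "vocab": vocab, "stop_words": stop_words}

def predict(model, headline):
    """Return the class with the highest log posterior (Laplace smoothing)."""
    tokens = tokenize(headline, model["stop_words"])
    total_docs = sum(model["class_docs"].values())
    vocab_size = len(model["vocab"])
    best_label, best_score = None, -math.inf
    for label, n_docs in model["class_docs"].items():
        counts = model["word_counts"][label]
        total = sum(counts.values())
        score = math.log(n_docs / total_docs)  # log prior
        for tok in tokens:
            if tok in model["vocab"]:  # ignore words never seen in training
                score += math.log((counts[tok] + 1) / (total + vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Toy data (hypothetical): English headlines stand in for Urdu ones.
train_x = ["team wins the cricket final", "striker scores in the cup match",
           "market shares fall on the budget", "bank raises the interest rate"]
train_y = ["sports", "sports", "business", "business"]
model = train_multinomial_nb(train_x, train_y, stop_words={"the", "in", "on"})
print(predict(model, "cricket match in the city"))
```

With real Urdu headlines, the whitespace tokenizer and the stop-word set would need to be replaced by Urdu-appropriate ones.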

Fake News Detection for Korean News Using Text Mining and Machine Learning Techniques (텍스트 마이닝과 기계 학습을 이용한 국내 가짜뉴스 예측)

  • Yun, Tae-Uk;Ahn, Hyunchul
    • Journal of Information Technology Applications and Management
    • /
    • v.25 no.1
    • /
    • pp.19-32
    • /
    • 2018
  • Fake news is defined as news articles that are intentionally and verifiably false and could mislead readers. The spread of fake news may provoke anxiety, chaos, fear, or irrational decisions among the public. Detecting fake news and preventing its spread has therefore become a very important issue in our society. However, because of the huge amount of fake news produced every day, it is almost impossible for humans to identify it. In this context, researchers have tried over the past years to develop automated fake news detection methods using artificial intelligence techniques. Unfortunately, no prior study has proposed an automated fake news detection method for Korean news. In this study, we aim to detect Korean fake news using text mining and machine learning techniques. Our proposed method consists of two steps. In the first step, the news content to be analyzed is converted to quantified values using various text mining techniques (topic modeling, TF-IDF, and so on). In the second step, classifiers are trained on the values produced in step 1. Machine learning techniques such as multiple discriminant analysis, case-based reasoning, artificial neural networks, and support vector machines can be applied as the classifiers. To validate the effectiveness of the proposed method, we collected 200 Korean news items from Seoul National University's FactCheck (http://factcheck.snu.ac.kr), which provides detailed analysis reports from about 20 media outlets and links to source documents for each case. Using this dataset, we will identify which text features are important and which classifiers are effective in detecting Korean fake news.
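The two-step pipeline described above (quantify the text, then train a classifier on the resulting values) can be illustrated with a small sketch. Here step 1 is a plain TF-IDF weighting and step 2 is a 1-nearest-neighbor lookup by cosine similarity, a simple stand-in that is only loosely comparable to the case-based reasoning the abstract mentions. The function names and the toy English documents are hypothetical and are not taken from the FactCheck dataset.

```python
import math
from collections import Counter

def fit_idf(docs):
    """Step 1a: inverse document frequency from tokenized training docs."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    return {tok: math.log(n / d) + 1.0 for tok, d in df.items()}

def tfidf(doc, idf):
    """Step 1b: map one tokenized doc to a sparse TF-IDF vector (a dict)."""
    tf = Counter(tok for tok in doc if tok in idf)
    return {tok: count * idf[tok] for tok, count in tf.items()}

def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(tok, 0.0) for tok, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def classify(doc, train_vecs, train_labels, idf):
    """Step 2: return the label of the most similar training vector."""
    vec = tfidf(doc, idf)
    sims = [cosine(vec, tv) for tv in train_vecs]
    return train_labels[sims.index(max(sims))]

# Toy data (hypothetical), only to show the data flow between the two steps.
train_docs = [d.split() for d in [
    "miracle cure shocks doctors",
    "secret plot revealed shocking",
    "ministry publishes budget report",
    "court confirms election result",
]]
train_labels = ["fake", "fake", "real", "real"]
idf = fit_idf(train_docs)
train_vecs = [tfidf(d, idf) for d in train_docs]
print(classify("ministry budget report delayed".split(), train_vecs, train_labels, idf))
```

Any of the classifiers named in the abstract could replace `classify` as long as it consumes the same TF-IDF vectors.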

Automatic Name Line Detection for Person Indexing Based on Overlay Text

  • Lee, Sanghee;Ahn, Jungil;Jo, Kanghyun
    • Journal of Multimedia Information System
    • /
    • v.2 no.1
    • /
    • pp.163-170
    • /
    • 2015
  • Many overlay texts are artificially superimposed on broadcast videos by humans. These texts provide additional information to the audiovisual content. In news videos in particular, the overlay text contains a concise and direct description of the content. It is therefore the most reliable clue for constructing a news video indexing system. To enable automatic person indexing of interview videos in TV news programs, this paper proposes a method that detects only the name text line among all the overlay texts in a frame. Experimental results on Korean television news videos show that the proposed framework efficiently detects the overlaid name text line.

Comparison of Text Beginning Frame Detection Methods in News Video Sequences (뉴스 비디오 시퀀스에서 텍스트 시작 프레임 검출 방법의 비교)

  • Lee, Sanghee;Ahn, Jungil;Jo, Kanghyun
    • Journal of Broadcast Engineering
    • /
    • v.21 no.3
    • /
    • pp.307-318
    • /
    • 2016
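  • Overlay text within video frames provides additional information to the audio and visual content. In news videos in particular, this text gives a concise and direct description of the video content. It is therefore the most reliable clue for building a news video indexing system. To build an indexing system for television news programs, it is important to detect and recognize this text. This paper proposes identifying the overlay text beginning frame, which helps in detecting and recognizing overlay text in news videos. Because not every frame in a video sequence contains overlay text, extracting overlay text from every frame is unnecessary and wastes time. Focusing only on the frames that contain overlay text can therefore improve the accuracy of overlay text detection. Comparative experiments on text beginning frame identification methods are conducted on news videos, and an appropriate processing method is proposed.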

Building a Korean Text Summarization Dataset Using News Articles of Social Media (신문기사와 소셜 미디어를 활용한 한국어 문서요약 데이터 구축)

  • Lee, Gyoung Ho;Park, Yo-Han;Lee, Kong Joo
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.9 no.8
    • /
    • pp.251-258
    • /
    • 2020
  • A training dataset for text summarization consists of pairs of a document and its summary. Because conventional approaches to building text summarization datasets are labor intensive, it is not easy to construct large datasets for text summarization. A collection of news articles is one of the most popular resources for text summarization because it offers easily accessible, large-scale, high-quality text. From social media news services, we can collect not only the headlines and subheads of news articles but also the summary descriptions that human editors write about them. Approximately 425,000 pairs of news articles and their summaries were collected from social media. We implemented an automatic extractive summarizer and trained it on the dataset. The performance of the summarizer was compared with that of unsupervised models, and the summarizer achieved better ROUGE scores than the unsupervised models.
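For readers unfamiliar with the ROUGE metric mentioned above, the following minimal sketch computes ROUGE-N precision, recall, and F1 from clipped n-gram overlap. It is not the evaluation code used in the paper. The function name `rouge_n` and the whitespace tokenization are assumptions; Korean text would normally need a morphological tokenizer rather than a whitespace split.

```python
from collections import Counter

def rouge_n(candidate, reference, n=1):
    """ROUGE-N precision, recall, and F1 from clipped n-gram overlap."""
    def ngrams(text):
        toks = text.lower().split()  # assumption: whitespace tokenization
        return Counter(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))

    cand, ref = ngrams(candidate), ngrams(reference)
    # Counter intersection keeps the minimum count of each shared n-gram.
    overlap = sum((cand & ref).values())
    recall = overlap / sum(ref.values()) if ref else 0.0
    precision = overlap / sum(cand.values()) if cand else 0.0
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}

# Toy sentences (hypothetical): five of the six reference unigrams are matched.
scores = rouge_n("the cat lay on the mat", "the cat sat on the mat", n=1)
print(scores)
```

Recall is the share of reference n-grams that the generated summary recovers, which is the quantity usually emphasized when comparing summarizers.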

Text-Mining Analyses of News Articles on Schizophrenia (조현병 관련 주요 일간지 기사에 대한 텍스트 마이닝 분석)

  • Nam, Hee Jung;Ryu, Seunghyong
    • Korean Journal of Schizophrenia Research
    • /
    • v.23 no.2
    • /
    • pp.58-64
    • /
    • 2020
  • Objectives: In this study, we conducted an exploratory analysis of current media trends on schizophrenia using text-mining methods. Methods: First, web-crawling techniques were used to extract text data from 575 news articles published in 10 major newspapers between 2018 and 2019, which were selected by searching for "schizophrenia" in Naver News. We built a document-term matrix (DTM) and/or a term-document matrix (TDM) through pre-processing. Using the DTM and TDM, we conducted frequency analysis, co-occurrence network analysis, and topic model analysis. Results: Frequency analysis showed that keywords such as "police," "mental illness," "admission," "patient," "crime," "apartment," "lethal weapon," "treatment," "Jinju," and "residents" were frequently mentioned in news articles on schizophrenia. Within the article text, many of these keywords were highly correlated with the term "schizophrenia" and were also interconnected with each other in the co-occurrence network. The latent Dirichlet allocation model presented 10 topics comprising a combination of keywords: "police-Jinju," "hospital-admission," "research-finding," "care-center," "schizophrenia-symptom," "society-issue," "family-mind," "woman-school," and "disabled-facilities." Conclusion: The results of the present study highlight that in recent years the media has been reporting on violence by patients with schizophrenia, thereby raising the important issue of hospitalization and community management of patients with schizophrenia.
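The matrix-based steps described above (building a DTM, transposing it into a TDM, then running frequency and co-occurrence analysis) can be sketched as follows. This is an illustrative pure-Python version, not the study's code. The function names are hypothetical, and the three toy documents are invented; they merely reuse a few keywords quoted in the abstract.

```python
from collections import Counter
from itertools import combinations

def build_dtm(docs):
    """Document-term matrix: one row per document, one column per term."""
    vocab = sorted({tok for doc in docs for tok in doc})
    index = {tok: j for j, tok in enumerate(vocab)}
    dtm = [[0] * len(vocab) for _ in docs]
    for i, doc in enumerate(docs):
        for tok in doc:
            dtm[i][index[tok]] += 1
    return vocab, dtm

def term_frequencies(vocab, dtm):
    """Frequency analysis: total count of each term over all documents."""
    return {tok: sum(row[j] for row in dtm) for j, tok in enumerate(vocab)}

def cooccurrence(vocab, dtm):
    """Count, for each term pair, the documents in which both terms appear."""
    pairs = Counter()
    for row in dtm:
        present = [vocab[j] for j, count in enumerate(row) if count > 0]
        pairs.update(combinations(present, 2))  # pairs are in vocab order
    return pairs

# Toy pre-processed documents (hypothetical).
docs = [["police", "crime", "patient"],
        ["patient", "treatment", "admission"],
        ["police", "patient", "crime"]]
vocab, dtm = build_dtm(docs)
tdm = [list(col) for col in zip(*dtm)]  # the TDM is the transpose of the DTM
freq = term_frequencies(vocab, dtm)
pairs = cooccurrence(vocab, dtm)
print(freq["patient"], pairs[("crime", "police")])
```

The pair counts can serve as edge weights of a co-occurrence network, with terms as nodes.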

Statistical Analysis Between Size and Balance of Text Corpus by Evaluation of the effect of Interview Sentence in Language Modeling (언어모델 인터뷰 영향 평가를 통한 텍스트 균형 및 사이즈간의 통계 분석)

  • Jung Eui-Jung;Lee Youngjik
    • Proceedings of the Acoustical Society of Korea Conference
    • /
    • spring
    • /
    • pp.87-90
    • /
    • 2002
  • This paper statistically analyzes the relationship between the size and balance of a text corpus by evaluating the effect of interview sentences on the language model of a Korean broadcast news transcription system. The ultimate purpose of our system is to recognize the speech of anchors and reporters in broadcast news shows, not interview speech. However, interview sentences make up approximately 15% of the text corpus gathered for constructing the language model. Interview sentences differ in several respects from anchor and reporter sentences, and they therefore disturb anchor- and reporter-oriented language modeling. In this paper, we evaluate the effect of interview sentences on the language model and statistically analyze the relationship between corpus size and balance by repeating the same experimental procedure while varying the size of the corpus.

An Analysis of the Contents and Make-up of the Page in a News Story of the Internet Newspaper -focusing on Naver, Daum, Nate, Yahoo- (인터넷신문의 뉴스기사 페이지 구성과 콘텐츠에 대한 분석 -네이버, 다음, 네이트, 야후를 중심으로-)

  • Park, Kwang-Soon
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.15 no.3
    • /
    • pp.1345-1354
    • /
    • 2014
  • This paper analyzes how the format of the text page and the content of the space surrounding the text are composed in news stories on portal sites. The analysis showed that the text-page formats of Naver news stories were more intricate than those of Daum, Nate, and Yahoo. Naver also ranked higher than the other three portals in the number of advertisements, the types of advertising, entertainment content, and the variety of content types. In particular, its percentage of news stories related to entertainers was the highest. Daum was the portal that advertised news stories most on its text pages, whereas Yahoo inserted the fewest advertisements. Overall, however, the formats and contents of the news-story text pages in these three portal sites were found to be similarly composed. Consequently, in terms of usability for reading news stories, the news service of portal sites can be rated higher than that of press dot-coms.

Joint Hierarchical Semantic Clipping and Sentence Extraction for Document Summarization

  • Yan, Wanying;Guo, Junjun
    • Journal of Information Processing Systems
    • /
    • v.16 no.4
    • /
    • pp.820-831
    • /
    • 2020
  • Extractive document summarization aims to select a few sentences from a given document while preserving its main information, but current extractive methods do not address the problem of repeated sentence information, especially in news document summarization. In view of the importance and redundancy of news text information, we propose a neural extractive summarization approach with joint sentence semantic clipping and selection, which can effectively reduce sentence repetition in news text summaries. Specifically, a hierarchical selective encoding network is constructed for both sentence-level and document-level representations, and data containing important information is extracted from the news text; a sentence extractor strategy is then adopted for joint scoring and redundant information clipping. In this way, our model strikes a balance between extracting important information and filtering redundant information. Experimental results on both the CNN/Daily Mail dataset and a Court Public Opinion News dataset that we built show the effectiveness of the proposed approach in terms of ROUGE metrics, especially for redundant information filtering.
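The trade-off described above, between picking important sentences and filtering redundant ones, can be illustrated without a neural network. The sketch below is not the paper's model: it swaps in a simple greedy heuristic that takes sentences in order of a given importance score and skips any sentence whose word overlap (Jaccard similarity) with an already selected sentence exceeds a threshold. The function names, the threshold `max_overlap`, the scores, and the toy sentences are all assumptions.

```python
def jaccard(a, b):
    """Jaccard similarity between two sets of words."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def select_sentences(sentences, scores, k=2, max_overlap=0.5):
    """Greedily pick up to k high-scoring sentences, skipping near-duplicates."""
    word_sets = [set(s.lower().split()) for s in sentences]
    order = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    chosen = []
    for i in order:
        # Keep sentence i only if it is not too similar to any earlier pick.
        if all(jaccard(word_sets[i], word_sets[j]) <= max_overlap for j in chosen):
            chosen.append(i)
        if len(chosen) == k:
            break
    return [sentences[i] for i in sorted(chosen)]  # restore document order

# Toy input (hypothetical): the first two sentences are near-duplicates.
sentences = ["the court ruled on the case today",
             "the court ruled on the case on monday",
             "protesters gathered outside"]
scores = [0.9, 0.8, 0.5]
summary = select_sentences(sentences, scores, k=2)
print(summary)
```

In the paper, both the importance scores and the redundancy handling are learned jointly; here the scores are simply given as input.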