1. Introduction
Political manipulation refers to forcing or persuading people to change their behavior to win political advantage. It is often achieved by presenting false information and deceiving people. Political manipulation dates back over a hundred years, and many tactics have been developed, such as deliberate omission of details and repetition of a false statement until people eventually believe it [1]. The consequences are far from trivial, e.g. changing the outcome of nation-wide elections or significantly affecting people's livelihoods for several years [2].
Traditionally, political manipulation has leveraged television and newspapers to trick the public. However, current manipulation increasingly utilizes online communities—web portals, social networks, and discussion forums—since these communities are widely used as a main source of information and can instantly and widely disseminate information to the public [3]. One recent example is Russian interference in the US presidential election [4]. Numerous opinions were posted in online communities intended to damage the reputation of candidates that maintained a strong stance against Russia, and the posts were disguised as being written by US citizens. Previous studies have shown that online manipulation can switch up to 10% of readers' votes, which is sufficient to make a significant difference, particularly in competitive elections [5].
Several methods have been proposed to detect political manipulation in online communities, most based on supervised learning with various features to identify manipulative activity. Ratkiewicz et al. [6] exploited the fact that manipulative propaganda is generally mentioned repeatedly by relatively small groups of users. Lee [3] measured the amount of labor required to distribute propaganda (e.g. the number of successive posts within a short period), which appeared to be significant among manipulative users. Although the proposed features are effective discriminators, supervised learning requires large prelabeled training sets (e.g. > 100K instances) to achieve acceptable accuracy, which demands substantial time and effort. Furthermore, manipulative tactics continue to evolve, so new training sets are required on a regular basis.
Our study focused on building a practical system that does not require a large labeling effort, but still achieves accuracy comparable to existing approaches. The proposed system is based on unsupervised learning. In particular, we group opinions in online communities into clusters, where each cluster contains opinions with similar characteristics. After identifying cluster structures, we determine whether the clusters are manipulative by labeling a handful of opinions from each cluster. This reduces the labeling effort to only several hundred instances. We can also track changes in manipulative tactics by tracing clusters over time. For example, previously unseen clusters could indicate new manipulation behaviors.
To validate the proposed method, we collected over a million opinions from popular web portals in South Korea during two presidential campaigns and one local election, when manipulative activities were expected to peak. We analyzed the collected data to extract a set of features that distinguish manipulative and non-manipulative opinions. Using these features, the proposed method reduced labeling effort from 200K+ instances to approximately 200 instances, while correctly classifying more than 90% of instances. We also compared clustering results of different political campaigns and showed that certain tactics became more common whereas others gradually disappeared.
The remainder of this paper is organized as follows. Section 2 reviews related work and highlights differences from the current study, and Section 3 explains the procedures used to collect and label data. Section 4 proposes the opinion model, including the features that we analyze, and Section 5 describes the clustering method employed and evaluates its efficiency and accuracy. Finally, Section 6 concludes the paper and presents future research directions.
2. Related Work
Political manipulation is widespread in online communities of many countries. In particular, several government agencies and military intelligence units in South Korea led manipulative campaigns during presidential elections [7], spreading pro-government opinions to support ruling party candidates and disparaging anti-government views as attempts by pro-North Korean forces to disrupt state affairs. Various government officials involved are currently on trial. Massive numbers of manipulative posts have also been reported in Italian [8], Russian [9], and US [4] nation-wide elections. Millions of people have changed their voting behaviors due to online manipulation [5]. Compounding their direct impact, false beliefs are difficult to change once they become accepted [10], and continue to reappear as firm evidence in subsequent elections [8].
A few previous studies have explored different features to detect political manipulation, but almost all have trained a classifier based on supervised learning. Ratkiewicz et al. [6] model the way an idea propagates through multiple users in Twitter as diffusion patterns, and show that manipulative and non-manipulative ideas have different patterns, e.g. manipulative propaganda is more frequently retweeted by fewer users than non-manipulative ideas. Since retweet capability is not available for the web portals we analyzed, diffusion patterns are not directly applicable to the current study. Lee [3] models the amount of labor and collaboration among users, based on the observation that manipulators tend to work hard in teams to quickly influence the public. Some features are reused in the current study, including the number of opinions consecutively posted by a user. However, we exclude features related to the use of specific phrases, e.g. the number of words frequently used in political campaigns. This is because word usage differs from one campaign to another, hence collecting and preparing words for each campaign requires huge effort, which is inconsistent with one of our key goals to reduce the workload. Overall, previous studies utilized supervised learning, requiring labeling of a large training set. This is not practical, since manipulative tactics continue to evolve, as discussed in Section 5.4.
Although a significant portion of manipulative posts are for political purposes [11], they also have other objectives, most notably commercial reasons, such as product reviews that unfairly support particular products or negatively comment on other products. Previous studies have proposed various features to characterize manipulation in the commercial domain. Since feature importance differs significantly across domains [12], many features proposed in the commercial domain will not apply to the political domain, and vice versa. Features related to the rating system (e.g. the five-star rating system) appear most often. For example, users whose ratings deviate significantly from average ratings are commonly identified as potential manipulators [13]. TrueView [14] goes one step further and compares ratings across multiple sites. Although rating systems are common for product reviews, they are rarely used for political posts. Other studies have analyzed the timing between consecutive posts and shown that bursts of manipulative reviews tend to be posted on the same product over a short period [15], and such bursts reappear multiple times [16]. The current study also proposes features to identify concurrent posts. In contrast to other studies, ClickStream [17] uses unsupervised learning to cluster users with similar social-interaction patterns (e.g. friend requests, photo viewing, and instant messaging), and then identifies anomalous clusters. In the web portals we analyzed, users mostly read, write, and approve posts, and social interactions are not of major concern.
3. Data Collection and Annotation
We prepared a set of opinions, labeled as either manipulative or non-manipulative, to analyze manipulative opinion characteristics (Section 4), and hence devise a detection system (Section 5.1), estimate parameters (Section 5.2), and evaluate the accuracy of the proposed detection system (Sections 5.3−5.4). We first detail the process of collecting opinions (Section 3.1) and describe the method to label the opinions (Section 3.2).
3.1 Collecting Opinions
We collected over one million opinions posted on three popular web portals in South Korea, as summarized in Table 1. The opinions were collected during three political campaigns, since we expect that manipulations are most likely to occur during these periods to sway voters. Indeed, several government agencies and cyber military units were accused of organized manipulation in online social media during these campaigns [18], as ordered by the ruling party [19]. Several political parties were also charged with circulating fake news to defame opposing party candidates [20], and North Korea was also commonly reported to be involved in manipulative activities to create anti-government movements [21].
Table 1. Collected opinion summary
Table 2. Summary of web portals that opinions were collected from. Statistics from KoreanClick [22]
Table 2 summarizes the three sites where opinions were collected. These sites are among the most popular news portals in South Korea, and all sites present news articles grouped into categories; we focused on the politics category. Users can share their thoughts regarding each article by posting opinions, which we collected. We considered articles with which users were highly engaged (i.e., > 1,000 posts). Users can approve (or disapprove) of an opinion by pressing a button, similar to the ‘Like’ capability on Facebook. Some opinions were approved by more than ten thousand users. A handful of opinions with the most approvals are shown on the front page of the article, and we call such opinions top posts. Controversial opinions often received comparable numbers of approvals and disapprovals.
Each collected opinion is represented by a six-tuple, as shown in Table 3. Text values were originally written in Korean and have been translated into English for international readers. Item 1 of the tuple is the title of the news article on which the opinion is posted. This can be regarded as the subject of the opinion. Items 2−4 are the ID used to log into the portal, the time the opinion was posted, and the opinion content, respectively. Item 5 is the number of users who approve of the opinion, and item 6 the number who disapprove. Items 5 and 6 were collected regularly (at 5-minute intervals; collecting at shorter intervals did not improve detection accuracy) to track gradual changes over time and include such changes when detecting manipulation. For example, a sharp increase in user responses, such as 455→2240 (marked in bold face in Table 3), would most likely be due to an automated tool with a large pool of stolen IDs [23]. We performed regular collection for one week after posting, since the numbers rarely change beyond this period.
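For concreteness, the six-tuple can be represented as a small record type. The following Python sketch uses illustrative field names of our own; only the six items themselves come from Table 3.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List

@dataclass
class Opinion:
    """One collected opinion (illustrative field names for the paper's six-tuple)."""
    article_title: str            # item 1: title of the news article
    user_id: str                  # item 2: portal login ID of the author
    posted_at: datetime           # item 3: time the opinion was posted
    content: str                  # item 4: opinion text
    approvals: List[int] = field(default_factory=list)     # item 5: sampled every 5 minutes
    disapprovals: List[int] = field(default_factory=list)  # item 6: sampled every 5 minutes
```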
Table 3. Sample of collected opinions
3.2 Labeling Opinions
We labeled the collected opinions as either manipulative or non-manipulative in two steps. In the first step, we collected additional evidence to help label each opinion, including (i) whether the opinion was reported by users, (ii) whether the opinion was deleted, and (iii) whether the associated user ID was deleted. We assumed that reported and deleted opinions were more likely to be manipulative, for the following reasons. Users can report abusive opinions, which are then reviewed by a surveillance team and selectively removed if they are found to violate portal policies (e.g. libel, profanity, copyright infringement, unapproved advertisement, etc.) [24]. Multiple violations can lead to deletion of the associated user ID. Opinions and IDs are also deleted to remove the evidence of illegal activities. In fact, mass deletions occurred during the election campaigns in Table 1, when investigations started over claims that several government organizations were involved in manipulation [25]. We found that approximately 15−20% of opinions were deleted during the study periods, which was unusual, with only 1−3% of opinions deleted in non-election periods. Thus, an opinion or ID being reported or deleted was a significant indicator of it being manipulative.
To obtain the evidence for (i)-(iii), we re-visited the portal sites three months after collection and checked the status of each collected opinion. We uniquely identified a particular opinion by the combination of posting time and user ID, and then confirmed whether the opinion and ID were flagged as reported or deleted.
Fig. 1. Number of opinions per article in increasing order
In the second step of the labeling process, we labeled each opinion as manipulative or non-manipulative based on all available evidence and expert assessment. This process was performed by five judges, each of whom had more than two years' experience monitoring opinions in online social media, and all were familiar with various manipulation strategies [1]. The judges were provided full access to the database of collected opinions and were able to run most SQL queries (e.g. they could list all opinions written by a particular user ID and count the number of deleted opinions). Each judge labeled the opinion as manipulative or non-manipulative, and the final label was determined by the majority. Overall, 13.5% of the opinions were labeled as manipulative. Fig. 1 illustrates the distribution of manipulative opinions for 141 articles for the third collection period in Table 1. The vertical bars represent the number of opinions in an article, and green and red segments correspond to the number of non-manipulative and manipulative opinions, respectively. Up to 20% of opinions were manipulative for some articles. Distributions were similar in the other collection periods.
We calculated Fleiss’ multi-rater kappa (κ) [26] to analyze label quality. This measure shows the strength of agreement among multiple judges. We obtained κ = 0.77, which represents substantial agreement according to the Landis and Koch scale [27]. We also verified the assumption that most deleted opinions were manipulative by analyzing the composition of deleted opinions in the labeled dataset. Almost 88% of deleted opinions were judged to be manipulative, with the remainder comprising advertisements or the use of abusive language.
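As an illustration of the agreement computation, the following sketch derives Fleiss’ kappa from a judges-by-opinions label matrix using the statsmodels package; the tiny matrix shown is made-up sample data, not our dataset.

```python
import numpy as np
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

# labels[i, j] = label given by judge j to opinion i (0 = non-manipulative, 1 = manipulative)
labels = np.array([
    [1, 1, 1, 0, 1],
    [0, 0, 0, 0, 0],
    [1, 1, 0, 1, 1],
])

counts, _ = aggregate_raters(labels)   # opinions x categories count matrix
kappa = fleiss_kappa(counts)           # the paper reports kappa = 0.77 on the real data
```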
4. Opinion Modeling
We build an opinion model that characterizes abnormalities and thus can distinguish manipulative from non-manipulative opinions. This model is defined as a tuple of 78 features, as shown in Table 4. Table 5 shows the terms and symbols used throughout Sections 4 and 5.
Let Oi denote a modeled opinion posted on article Ai by user Ui. The features are divided into two categories.
1. features 1-18 characterize opinion Oi, and
2. features 19-78 characterize author Ui over all their opinions.
These categories complement each other. For example, suppose the first group raises suspicion that Oi was manipulated (e.g. it was immediately approved by a thousand user accounts); this can be verified by the second group (e.g. Ui’s opinions tended to be approved much more quickly than other users’ opinions). Sections 4.1 and 4.2 explain the two categories in detail.
Table 4. Features that characterize an opinion
Table 5. Terms and symbols used in this article
4.1 Opinion Characteristics
The first feature category (features 1−18) depicts opinion characteristics. Let Oi be posted on article Ai by user Ui. We performed a preliminary analysis of each feature's discriminative power in detecting manipulative opinions, as shown in Fig. 2, where the vertical axes show the cumulative percentage of opinions that exhibit the designated features. Large gaps between manipulative and non-manipulative opinions imply the feature is an effective discriminator.
Features 1−2 relate to the time when Oi is posted. Feature 1 is the absolute time, and feature 2 is the time relative to the publication time of Ai. For example, the sample opinion in Table 3 was written at 12:33:24, and let us suppose the article was published at 11:33:24.
Feature 1 = 12.56, represented in hours (12.56 ≈ 12 + 33/60 + 24/3600); and feature 2 = 1.00, since the opinion was written one hour after the article was published. Feature 1 captures situations where manipulators prefer to work at particular hours. Feature 2 represents how soon Oi is posted after Ai is made public; an earlier Oi is more likely to be read by many users and become a top post. Fig. 2-(a) shows that manipulative opinions tend to be posted earlier than non-manipulative ones, and more than 50% of manipulative posts are made within the first hour after article publication.
Fig. 2. Cumulative distribution function (CDF) of selective features that characterize an opinion
Features 3−6 model Oi's text, including how many letters, URLs, digits, and special characters are present. For example, the sample opinion in Table 3 contains 550 letters, 1 URL, and no digits or special characters. Therefore, features 3−6 = 550, 1, 0, and 0, respectively. Manipulative opinions tend to be lengthy, providing various supporting arguments to appear trustworthy, which often include references such as URLs to supplemental documents and videos, and specific numbers from surveys and research articles. The arguments also tend to be enumerated and highlighted using special characters. Fig. 2-(b) shows the cumulative distribution of opinions with varying numbers of references. Manipulative opinions tend to use more references than non-manipulative opinions, and nearly 20% of manipulative posts use more than 5 such references.
Features 7−9 represent approval behavior for Oi, and features 10−12 characterize disapprovals. Approvals are common manipulation targets—numerous approvals lead to a top post, which is seen by a large number of users, increasing the chance of influence. Feature 7 is the maximum growth rate of approvals. For the sample opinion in Table 3, maximum growth occurred when the number increased from 455→2240, hence feature 7 = 1785 (2240 - 455). Feature 8 is the time of maximum growth relative to Ai's publication. Hence, if the observed maximum growth occurred 1 hour 30 minutes after publication, feature 8 = 1.50 (in hours). Feature 9 is the number of approvals in the final collection interval. Hence, for the opinion in Table 3, feature 9 = 9605. Figs. 2-(c) and 2-(d) show cumulative distributions for features 7 and 8, respectively. Manipulative opinions are approved by more users and at earlier times than non-manipulative opinions. This translates to an effective manipulation strategy: create a top post as early as possible, as these tend to remain at the top.
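A minimal sketch of how features 7−9 could be derived from the periodically sampled approval counts is shown below; the function name and argument layout are ours and are not part of the paper's implementation.

```python
def approval_features(approvals, sample_times, article_time):
    """Features 7-9 from the periodically sampled approval counts (a sketch).

    approvals    : approval counts, one per 5-minute collection interval
    sample_times : timestamps of the samples, in hours
    article_time : publication time of the article, in the same units
    """
    growth = [b - a for a, b in zip(approvals, approvals[1:])]
    max_idx = max(range(len(growth)), key=growth.__getitem__)
    feature7 = growth[max_idx]                            # max growth, e.g. 2240 - 455 = 1785
    feature8 = sample_times[max_idx + 1] - article_time   # time of max growth after publication
    feature9 = approvals[-1]                              # approvals in the final interval
    return feature7, feature8, feature9
```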
Features 13−18 measure the extent to which other opinions share certain characteristics with Oi. Manipulators publish a series of opinions to effectively spread propaganda, and these opinions often have various similarities (e.g. duplicate words and similar posting times). Feature 13 is the number of opinions with text similar to Oi. We consider that two opinions Oi and Oj have similar text if their Jaccard Coefficient (JC) [28] is greater than a predefined threshold TJC. JC can be expressed as
\(JC(O_i, O_j) = \frac{|W(O_i) \cap W(O_j)|}{|W(O_i) \cup W(O_j)|}\)
where W(Oi) denotes the set of distinct terms used in Oi. Therefore, JC > TJC means a substantial fraction of Oi is reused in Oj, with a few words and phrases switched. In contrast to feature 13, which counts similar opinions written by Ui, feature 14 counts those by other users, and feature 15 is the number of distinct users who share similar text with Oi. Features 16−18 consider different similarity aspects. Feature 16 is the number of Ui's opinions on the same article Ai, and among such opinions, feature 17 counts top posts. Feature 18 is the number of Ui's opinions posted at similar times to Oi (regardless of the articles the opinions are posted to). We consider two opinions Oi and Oj to be written at similar times if their posting times differ by less than a predefined threshold TTM. Figs. 2-(e) and 2-(f) show cumulative distributions for similar text and posting times, respectively. Approximately 70% of manipulative opinions are probably reproduced from other opinions (Fig. 2-(e)), and some collections of duplicate opinions contain more than a hundred opinions. Manipulative opinions are often posted at similar times, whereas non-manipulative opinions tend to be posted individually (Fig. 2-(f)). These results indicate that manipulative opinions can be clustered according to their posting times and texts.
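The text-similarity test can be sketched as follows, assuming simple whitespace tokenization (the paper does not specify the tokenizer) and a caller-supplied threshold T_JC.

```python
def words(text):
    """Set of distinct terms in an opinion (simple whitespace tokenization)."""
    return set(text.lower().split())

def jaccard(text_i, text_j):
    """Jaccard Coefficient of two opinion texts."""
    wi, wj = words(text_i), words(text_j)
    return len(wi & wj) / len(wi | wj) if (wi | wj) else 0.0

def count_similar(opinion_text, other_texts, t_jc):
    """Feature 13: number of opinions whose text is similar to the given opinion (JC > T_JC)."""
    return sum(jaccard(opinion_text, other) > t_jc for other in other_texts)
```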
4.2 Author Characteristics
The second feature category (features 19−78) depicts the author's (Ui) general behavior over all their published opinions, in contrast to the first category, which focuses on the details of one particular opinion. We utilize four metrics to describe general behavior: maximum (max), average, median, and minimum (min). These features are marked with an asterisk (*) in Table 4. For example, features 21−24 are the max, average, median, and min posts per article, respectively. Thus, suppose Ui wrote 9, 4, 4, and 3 posts on four articles, respectively; then features 21−24 = 9, 5, 4, and 3, respectively.
Features 19−24 measure the volume of Ui's opinions to investigate how voracious a writer they are. Feature 19 counts the total number of Ui's opinions over all articles, and feature 20 counts the number of distinct articles where these opinions are written. The opinions are then grouped according to their respective articles to compute features 21−24 (as detailed above). Figs. 3-(a) and 3-(b) show the cumulative distributions for features 20 and 21. Authors of manipulative opinions tend to write on multiple articles and post several opinions on each article, sometimes more than 80. In contrast, authors of non-manipulative opinions write on one or a handful of articles, and nearly 80% post a single opinion per article. Thus, being a voracious writer increases the probability that Ui is manipulative.
Features 25−30 characterize top posts among Ui's opinions. We first count the number of top posts in each article where Ui ever posts opinions, and then compute the max, average, median, and min per article (features 25−28, respectively). Feature 29 is the number of articles where Ui's opinions become top posts, and feature 30 is the ratio of such articles to all articles where Ui posts opinions. For example, suppose Ui posts 9, 4, 4, and 3 opinions on four articles, and among them 4, 2, 2, and 0 opinions, respectively, become top posts. Then features 25−30 = 4, 2, 2, 0, 3, and 0.75, respectively. Fig. 3-(c) shows the cumulative distribution for feature 29. Authors of manipulative opinions tend to leave top posts in more articles than non-manipulative authors, sometimes in more than 10 articles. Thus, manipulative posts are more likely to become top posts than non-manipulative posts, possibly by exploiting the approval system.
The remaining features (31−78) are derived from features 1−12 in the first category and are adjusted to Ui's general behavior. Each of features 1−12 corresponds to four features in the second category, i.e., max, average, median, and min. For example, feature 1 is the posting time of one particular opinion Oi, and the corresponding features 31−34 are the max, average, median, and min posting times over all of Ui's opinions, respectively.
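A minimal sketch of this per-author aggregation, assuming the per-opinion values for one feature are already available as a list, is:

```python
import statistics

def author_aggregates(values):
    """Aggregate one per-opinion feature over all of a user's opinions.

    For example, feature 1 (posting time) of all of U_i's opinions maps
    to features 31-34: max, average, median, and min.
    """
    return (max(values),
            statistics.mean(values),
            statistics.median(values),
            min(values))
```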
Fig. 3. Cumulative distribution function (CDF) of selective features that characterize an author
5. Unsupervised Detection of Manipulative Opinions
Based on the model from Section 4, we develop a system to identify manipulative opinions. Section 5.1 describes the details of the proposed system, Section 5.2 estimates system parameters, and Section 5.3 evaluates the resulting system's accuracy. Section 5.4 applies the proposed system to opinions from three different years and tracks changes in manipulative tactics over the years. Section 5.5 compares the proposed system with conventional classifiers.
5.1 Clustering and Coloring Methods
The proposed system utilizes unsupervised learning, in contrast to previous studies using supervised learning. Supervised learning requires labeling large-scale data on a regular basis, since manipulative opinion characteristics change over time, whereas unsupervised learning can minimize this effort.
We adopt K-means clustering [29]. Each opinion comprises an instance for clustering. Opinions with similar features are grouped into the same cluster, and those with dissimilar features are placed into different clusters. Manipulative opinions have distinct behaviors from non-manipulative opinions, as discussed in Section 4, and are likely to form separate clusters. Since not all manipulative and non-manipulative opinions exhibit identical behaviors, we expect multiple clusters of manipulative and non-manipulative opinions. Note that clustering allows a straightforward interpretation of results; by inspecting the center of a cluster (its centroid), we can understand what opinions compose each cluster and how one cluster contrasts with the other clusters.
Fig. 4. Overview of clustering and classification process
Fig. 4 shows an overview of how we cluster opinions and how we classify opinions according to clustering results. The training set refers to a set of opinions used to build clusters and thus to initialize the proposed system. The test set corresponds to unclassified opinions input to the system in real time, which are then classified according to learned clusters. Initially, neither set is labeled; later, in step 2, we label a small subset of the training set. The detailed steps are as follows (a code sketch of the full pipeline is given after the list).
- Step 1. We group the opinions in the training set using the K-means clustering algorithm. The example in Fig. 4-(a) assumes that the number of clusters K = 3. Euclidean distance is used as the measure of similarity among opinions. We normalize each opinion feature to zero mean and unit standard deviation, so that a feature with a large variance does not dominate the similarity measure.
- Step 2. We randomly choose a small set of S opinions within each cluster and label them as manipulative or non-manipulative (in Fig. 4-(b), S = 2). These opinions determine the overall cluster label (color) in the next step, and we refer to them as seed instances.
- Step 3. We color each cluster depending on the seed labels as follows.
If all seeds in cluster C have homogeneous label l, then we color C as l. (3-1)
Otherwise, we perform another round of K-means clustering on C to divide the opinions into two clusters and repeat this process until all resulting clusters have homogeneous seeds. Then we color the clusters according to step 3-1. (3-2)
Fig. 4-(b) shows that clusters ① and ② contain homogeneous seeds, hence they are colored as non-manipulative and manipulative, respectively (Fig. 4-(c)). Cluster ③ contains both manipulative and non-manipulative seeds, so it is further divided into two clusters with homogeneous seeds and colored accordingly. The additional clustering reduces the chance of incorrectly coloring a cluster, in case K is not sufficiently large and manipulative and non-manipulative opinions therefore belong to the same cluster.
- Step 4. We classify the opinions in the test set according to cluster color. We compute the distance from each unclassified instance to the centroid of each identified cluster, and then assign the label of the nearest cluster. Fig. 4-(d) shows three instances on the left that are closer to non-manipulative clusters and hence classified as non-manipulative, whereas two instances on the right are closer to manipulative clusters and hence classified as manipulative.
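The sketch below outlines steps 1−4 using scikit-learn's K-means; the function names are ours, and label_fn stands in for the manual labeling of seed instances.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

def color_clusters(X, idx, label_fn, S, colored):
    """Steps 2-3: label S random seeds in the cluster; split and recurse if labels are mixed."""
    rng = np.random.default_rng(0)
    seeds = rng.choice(idx, size=min(S, len(idx)), replace=False)
    seed_labels = {label_fn(i) for i in seeds}          # manual labels for the seeds
    if len(seed_labels) == 1:                           # step 3-1: homogeneous seeds
        colored.append((X[idx].mean(axis=0), seed_labels.pop()))
    else:                                               # step 3-2: split into two and repeat
        sub = KMeans(n_clusters=2, n_init=10).fit_predict(X[idx])
        for c in (0, 1):
            color_clusters(X, idx[sub == c], label_fn, S, colored)

def fit_and_classify(train_X, test_X, label_fn, K=70, S=3):
    """Steps 1-4 of the proposed method (a sketch; K and S values from Section 5.2)."""
    scaler = StandardScaler()                           # zero mean, unit standard deviation
    train_Z, test_Z = scaler.fit_transform(train_X), scaler.transform(test_X)

    km = KMeans(n_clusters=K, n_init=10).fit(train_Z)   # step 1: cluster the training opinions
    colored = []                                        # list of (centroid, label) pairs
    for c in range(K):
        color_clusters(train_Z, np.where(km.labels_ == c)[0], label_fn, S, colored)

    centroids = np.stack([cen for cen, _ in colored])
    labels = np.array([lab for _, lab in colored])
    nearest = np.linalg.norm(test_Z[:, None] - centroids[None], axis=2).argmin(axis=1)
    return labels[nearest]                              # step 4: label of the nearest colored cluster
```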
Choice of the parameters K and S affects the accuracy and efficiency of the proposed method. In Section 5.2, we experiment with different choices and choose proper values.
5.2 Parameter Estimation
The proposed system requires two parameters: K, the number of clusters, and S, the number of seeds used to color each cluster. The choice of these parameters poses tradeoffs. On one hand, larger K allows us to fully separate opinions into their respective clusters and thus to discover all manipulative and non-manipulative clusters; and larger S ensures a sufficient number of seeds to accurately color clusters. On the other hand, K and S need to be small enough to be practical, since we will need to label K×S seeds (S seeds in each cluster), and fewer seeds mean less labeling effort is required.
To estimate the parameters, we randomly sampled 1/3 of the opinions in Section 3 (i.e., 366,988 opinions), and retained the remaining opinions for classification in Sections 5.3, 5.4, and 5.5. We then varied K and S and evaluated the resulting clustering quality. As a measure of quality, we used the percentage of homogeneous clusters—clusters that contain opinions of one homogeneous label and thus will be correctly colored. We ran 100 experiments for each K and S pair, and calculated the average, since the seeds are randomly selected and K-means clustering yields slightly different results for each run.
Fig. 5 shows the effect of different K and S choices. In general, increasing either (or both) leads to more accurate clustering and coloring. With K = 70 and S = 3, i.e., 70×3 = 210 seeds in total, 90% of clusters would be correctly colored. If S = 9, i.e., 70×9 = 630 seeds, accuracy approaches 99%. In practice, 90% homogeneous clusters, i.e., approximately 200 seeds, is sufficient to achieve acceptable classification accuracy: within the 10% of non-homogeneous clusters, the vast majority of opinions share the same label, so far fewer than 10% of opinions are misplaced. We demonstrate this is the case in the next section. We also show that supervised learning requires far more seeds to achieve the same level of accuracy.
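Under our reading of this quality measure, one way to reproduce the experiment is to count, over repeated runs, the clusters whose S random seeds all carry the cluster's dominant label; the sketch below assumes integer class labels and already-normalized features.

```python
import numpy as np
from sklearn.cluster import KMeans

def pct_correctly_colorable(Z, y, K, S, runs=100, seed=0):
    """Percentage of clusters whose S random seeds all carry the cluster's majority label.

    One plausible implementation of the Section 5.2 quality measure; averaged over
    `runs` repetitions because seed selection and K-means are randomized.
    """
    rng = np.random.default_rng(seed)
    scores = []
    for r in range(runs):
        assign = KMeans(n_clusters=K, n_init=10, random_state=r).fit_predict(Z)
        good = 0
        for c in range(K):
            idx = np.where(assign == c)[0]
            majority = np.bincount(y[idx]).argmax()      # cluster's dominant true label
            seeds = rng.choice(idx, size=min(S, len(idx)), replace=False)
            good += np.all(y[seeds] == majority)
        scores.append(100.0 * good / K)
    return float(np.mean(scores))
```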
Fig. 5. Number of clusters (K) and seeds (S) vs. percentage of clusters with homogeneous labels
5.3 Classification Accuracy and Comparison with Supervised Learning
We evaluate the proposed system considering three aspects. First, we measure the accuracy of the system at classifying unknown opinions. Second, we investigate misclassified opinions and present explanations. Finally, we compare the proposed system with supervised learning. We used the opinions not already used in Section 5.2, i.e., 2/3 of the dataset from Section 3.
We randomly sampled half of this dataset (366,988 instances) as training data and the remainder (366,988) as test data. The training set was used to build and color clusters, and we assumed that this set was not yet labeled. Using the training set, we constructed K = 70 clusters and labeled S = 3 seeds from each cluster, which overall required labeling slightly more than 200 opinions (further increasing K and S more than doubles the labeling effort but only slightly increases classification accuracy, with the F1 measure remaining below 93%). According to the clusters, we classified each opinion in the test set and confirmed whether this classification was correct. We ran the experiments 100 times and took the average outcomes.
Table 6. Classification outcomes from the proposed system
Table 6 summarizes the classification results. The horizontal axis lists the two classes in the original collection, and the vertical axis lists the two outcomes of classification. The number at each intersection represents a percentage relative to the total number of opinions in the corresponding class. For instance, out of 49,702 manipulative opinions, 96.26% were correctly classified as manipulative, and 3.74% were misclassified as non-manipulative. Similarly, out of 317,286 non-manipulative opinions, 98.01% were correctly classified, while the rest (1.99%) were not. Based on these results, the F1 measure was slightly over 92%.
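As a consistency check, the F1 value can be recovered from the Table 6 percentages, treating the manipulative class as the positive class:

```python
# Recovering the F1 measure from the Table 6 percentages (a consistency check).
tp = 0.9626 * 49_702          # manipulative opinions correctly classified
fn = 0.0374 * 49_702          # manipulative opinions missed
fp = 0.0199 * 317_286         # non-manipulative opinions flagged as manipulative

precision = tp / (tp + fp)    # ~0.883
recall    = tp / (tp + fn)    # ~0.963
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 3))           # ~0.921, i.e. slightly over 92%
```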
Among the non-manipulative opinions, 1.99% were erroneously classified as manipulative. Most of these opinions were advertisements for products and/or services, unrelated to the political articles. Such opinions were often copied and massively reproduced, which was also common in manipulative opinions. Since this type of post is generally considered undesirable, it would be beneficial to detect and remove them. Among the manipulative opinions, 3.74% were incorrectly classified as non-manipulative. These opinions did not show typical characteristics of manipulative opinions, in that the writers appeared to post only a small number of opinions. Closer inspection showed that many of the corresponding user IDs were used together at almost the same time. We believe the tactic of utilizing multiple IDs simultaneously was used to avoid being reported and deleted. We could incorporate additional features to more accurately detect these cases, such as the IP addresses the IDs used. For example, IDs that often share IP addresses can be analyzed together as belonging to the same user.
We compared the proposed system with supervised learning. In particular, we aim to answer whether the proposed system can achieve a level of accuracy comparable to supervised learning, while keeping labeling effort at a minimum. As a supervised classifier, we used Adaboost [30] with Decision Stumps as its base learner. We trained the classifier using a randomly chosen subset of the training data, gradually increasing the subset from 200 to the entire training set (366,988), and measured classification accuracy using the test data. Note that the supervised classifier requires all training data to be labeled, whereas the proposed method requires labeling a small number of selected instances (i.e., seeds).
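The paper does not name a specific implementation; a scikit-learn sketch of this baseline (AdaBoost over decision stumps, trained on growing labeled subsets) could look as follows.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import f1_score

def supervised_baseline(train_X, train_y, test_X, test_y, sizes=(200, 25_000, 200_000)):
    """AdaBoost with decision stumps, trained on growing labeled subsets (a sketch)."""
    rng = np.random.default_rng(0)
    for n in sizes:
        idx = rng.choice(len(train_X), size=n, replace=False)
        # scikit-learn >= 1.2 uses `estimator`; older versions use `base_estimator`
        clf = AdaBoostClassifier(estimator=DecisionTreeClassifier(max_depth=1),
                                 n_estimators=100)
        clf.fit(train_X[idx], train_y[idx])
        print(n, f1_score(test_y, clf.predict(test_X)))
```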
Fig. 6. Supervised classifier accuracy for different training set sizes
Fig. 6 presents the accuracy of the supervised classifier for different sizes of training data. With 200 labeled instances, the accuracy stayed below 20%. With 25K labeled instances, the accuracy went above 70%, and it reached approximately 90% with 200K labeled instances. Final accuracy was slightly above 92% when the entire training data were used. This contrasts with the proposed system, which achieved a nearly equivalent level of accuracy (92%) with far fewer labeled data (70×3 = 210 seed instances). This is because the proposed system selects seed instances with more diverse characteristics—we first cluster instances according to their characteristics and then choose an equal number of seeds from each cluster to label. In supervised learning, however, it is unlikely that the training data contain instances from each and every cluster unless the data are sufficiently large.
5.4 Manipulative Tactics over Time
The opinions collected in Section 3 consist of data from three different years (2012, 2014, and 2017), as shown in Table 1. Over these years, new manipulative tactics may have arisen, while other behaviors diminished or disappeared. We analyze such changes over time. We also demonstrate that the proposed system can effectively trace temporal changes.
Fig. 7 shows an overview of our analysis methods. For each year's data, we construct and color clusters according to steps 1−3 in Section 5.1. We constructed K = 70 clusters and labeled S = 3 seeds to color each cluster. We then compare the identified clusters of one year with those of the subsequent year. In particular, we aim to discover the following three patterns.
1. clusters that expanded or newly appeared,
2. clusters that shrunk or disappeared, and
3. clusters that maintained similar scale.
To identify these patterns, we consider two clusters from different years to be the same if their centroids are composed of nearly equivalent features (i.e., the centroids are in close proximity). We then measure the size of a cluster by the proportion of opinions in the cluster relative to all opinions. We focused on the clusters colored as manipulative, as our objective is to characterize and detect manipulation.
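A simple way to realize this matching is to pair centroids of consecutive years by Euclidean distance; the sketch below is ours, with max_dist as an assumed proximity threshold.

```python
import numpy as np

def match_clusters(centroids_prev, centroids_next, max_dist):
    """Pair each earlier-year cluster with the closest later-year cluster (a sketch).

    Clusters are treated as 'the same' when their centroids lie within max_dist of
    each other; unmatched clusters correspond to tactics that appeared or vanished.
    """
    dist = np.linalg.norm(centroids_prev[:, None] - centroids_next[None], axis=2)
    matches = {}
    for i, row in enumerate(dist):
        j = int(row.argmin())
        if row[j] <= max_dist:
            matches[i] = j                  # cluster i (year t) ~ cluster j (year t+1)
    return matches
```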
Fig. 7. Cluster analysis over time
Our main findings are as follows.
· Increasingly more manipulative opinions have been posted earlier, i.e., within several minutes after the articles were published. Many of these posts were immediately followed by massive approvals, to preemptively occupy top-post positions. Similarly, we observed sharp increases in the number of disapprovals, intended to denigrate opinions of opposing parties. One way to prevent this type of manipulation would be to rate-limit approvals and disapprovals, e.g. to less than 100 votes per minute.
· More manipulative IDs have refrained from posting numerous opinions in a short period, reducing the chance of being reported and deleted. An in-depth analysis showed that many such IDs posted opinions at almost the same time, with similar contents, and these IDs often shared a series of letters (e.g. patriot001, patriot002, and patriot003). This indicates that the IDs probably belong to the same manipulative user or group. The proposed system could be expanded to leverage these similarities (i.e., posting time, content, and ID) to identify a group of correlated IDs.
· Manipulative opinions with similar text continued to form large clusters over the years. However, more and more such opinions replaced a subset of words with synonyms rather than being exact duplicates. The manipulative opinions also became longer (e.g. over two hundred letters) over time. This suggests that automated tools are being used to compose opinions and post them on behalf of humans. The duplicate-opinion finder could be extended to identify word replacements utilizing lexical databases such as WordNet [31]. Other aspects of opinion text, including URLs, numerals, and special characters, continued to appear in manipulative opinions at broadly constant levels, providing supporting clues and increasing readability.
To summarize the results, the proposed system identified changes in manipulative tactics by tracing cluster features and sizes over time. As more and more sophisticated tactics have been developed, we recommend web administrators perform periodic audits using the proposed system to track manipulation evolution.
5.5 Comparison with Additional Classifiers
In Section 5.3, we compared the proposed system with an Adaboost classifier. In this section, we show a comparison with four additional classifiers that have been widely used in machine-learning applications. These classifiers are Support Vector Machine (SVM) [32], Random Forest (RF) [33], Multi-Layer Perceptron (MLP) [34], and Convolutional Neural Network (CNN) [35].
To implement the SVM classifier, we utilized an open-source library, LIBSVM [36], and its easy script that automatically determines the best parameters. To implement the RF classifier, we used the Weka package [37]. The MLP and CNN classifiers were built upon the TensorFlow library [38]. The MLP and CNN employed 3 and 6 layers, respectively, and their details are summarized in Table 7. Adding further layers did not improve classification accuracy but increased computational costs. All four classifiers took the 78 features as input (i.e., a 1×78 matrix) and made predictions over the two class labels, manipulative and non-manipulative. We trained the classifiers using gradually increasing subsets of the training data, and measured classification accuracy using the test data.
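For illustration, a Keras-style MLP over the 78 features might look as follows; the layer widths are placeholders of ours and are not the dimensions reported in Table 7.

```python
import tensorflow as tf

# A 3-layer MLP over the 78 features; layer widths are placeholders,
# not the exact dimensions reported in Table 7.
mlp = tf.keras.Sequential([
    tf.keras.Input(shape=(78,)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),   # manipulative vs. non-manipulative
])
mlp.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
# mlp.fit(train_X, train_y, validation_data=(test_X, test_y), epochs=10)
```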
Table 7. Dimension of MLP and CNN layers
Fig. 8. Accuracy of 4 classifiers with different training set sizes
Fig. 8 presents the accuracy of the four classifiers for different sizes of training data. For all of the classifiers, the accuracy increased as more labeled data were used, and it rose above 90% when more than 200K labeled instances were used. MLP required more training data to catch up with the other classifiers, since it had dense connections and thus needed to learn more parameters. The final accuracy was not significantly different from that of the proposed system (92%). Note that the four classifiers require all training data to be labeled, whereas the proposed system requires labeling only a small subset of the training data (i.e., 70×3 = 210 seed instances).
We summarize our major findings as follows:
· In detecting manipulative opinions, the previous classifiers are of limited use in practice, since they require labeling a large amount of training data. The proposed system significantly reduces this labeling effort—it first clusters similar instances and then, from each cluster, chooses a small number of instances to label.
· As manipulative tactics can change over time, a new set of data needs to be labeled on a regular basis. In such cases, the proposed system can be used repeatedly without incurring excessive labeling costs. In addition, tracing emerging tactics is straightforward since they appear as new clusters, as shown in Section 5.4. Such tracing is not as easy with the previous classifiers, because one needs to interpret complex models, e.g. SVM and neural networks.
· The classification accuracy remained below 93%, even though we tried different machine-learning algorithms and large training sets. We believe that the accuracy can be improved with the help of additional information. For example, Section 5.3 shows that certain groups of IDs used together did not stand out as manipulative when each ID was only lightly used, and more features (e.g. the IP addresses the IDs used) can help detect such cases.
6. Conclusion
We propose a system to inspect opinions in online communities and detect manipulative posts written for political campaigns. Compared to existing tools based on supervised learning, the proposed system accurately discovers manipulative opinions with moderate labeling effort. This is because (i) the system comprehensively identifies the innate structure of opinions (i.e., clusters of opinions with similar characteristics) and uses this structure to select a small number of samples to label, and (ii) it models opinions with features that clearly separate manipulative and non-manipulative opinions. Moreover, the proposed system can trace emerging manipulative tactics, since such changes appear as new clusters.
We evaluated the proposed system using over a million opinions collected during major political campaigns in South Korea. The system required labeling approximately 200 instances to achieve over 90% classification accuracy, whereas comparable accuracy from supervised approaches would require labeling over 200K instances. We also discovered changes in manipulative tactics, such as early posting followed by approval manipulation, distribution of workload over an increasingly large set of IDs, and widespread use of automatic writing tools.
We believe that the proposed system can provide increased classification accuracy as more opinion-related information becomes available. For example, user IP addresses could help identify a group of IDs used together by the same manipulative party. We also plan to apply the proposed system outside the political domain, e.g. to identify manipulative product reviews.
References
- K. Becker, "The handbook of political manipulation," Conservative Daily News, 2012.
- W. H. Riker, "The art of political manipulation," Yale University Press, 1986.
- S. Lee, "Detection of political manipulation in online communities through measures of effort and collaboration," ACM Transactions on the Web, vol. 9, no. 3, article 16, 2015.
- S. Shane and M. Mazzetti, "Inside a 3-year Russian campaign to influence US voters," The New York Times, 2018.
- R. Bond, et al., "A 61-million-person experiment in social influence and political mobilization," Nature, vol. 489, no. 7415, pp. 295-298, 2012. https://doi.org/10.1038/nature11421
- J. Ratkiewicz, M. D. Conover, M. Meiss, B. Goncalves, A. Flammini, and F. Menczer, "Detecting and tracking political abuse in social media," in Proc. of ICWSM, pp.294-304, 2011.
- "South Korea's spy agency admits trying to influence 2012 poll," BBC News, 2017.
- D. Mocanu, L. Rossi, Q. Zhang, M. Karsai, and W. Quattrociocchi, "Collective attention in the age of misinformation," CoRR, 2014.
- "Russian Twitter political protests swamped by spam," BBC News, 2012.
- R. K. Garrett and B. E. Weeks, "The promise and peril of real-time corrections to political misperceptions," in Proc. of ACM CSCW, pp. 1047-1058, 2013.
- "Manipulation of online public opinion and the battle of Naver's real-time search words," The Kyunghyang Shinmun, 2018.
- B. Pang and L. Lee, "Opinion mining and sentiment analysis," Now Publishers, 2008.
- H. Oh and S. Kim, "Identifying and exploiting trustable users with robust features in online rating systems," KSII Transactions on Internet and Information Systems, vol. 11, no. 4, pp. 2171-2195, 2017.
- A. J. Minnich, N. Chavoshi, A. Mueen, S. Luan, M. Faloutsos, "TrueView: harnessing the power of multiple review sites," in Proc. of WWW, pp.787-797, 2015.
- H. Li, G. Fei, S. Wang, B. Liu, W. Shao, A. Mukherjee, and J. Shao, "Bimodal distribution and co-bursting in review spam detection," in Proc. of WWW, pp.1063-1072, 2017.
- S. Kwon, M. Cha, K. Jung, W. Chen, and Y. Wang, "Prominent features of rumor propagation in online social media," in Proc. of IEEE ICDM, pp. 1103-1108, 2013.
- G. Wang, T. Konolige, C. Wilson, X. Wang, H. Zheng, and B. Y. Zhao, "You are how you click: clickstream analysis for sybil detection," in Proc. of USENIX Security Symposium, pp.241-256, 2013.
- S. Choe, "Prosecutors detail attempt to sway South Korean election," The New York Times, 2013.
- H. Fawcett, "South Korea's political cyber war," Aljazeera, 2013.
- A. Shin, "Opposition party apologizes for spreading fake news on president's son during election," Arirang, 2017.
- H. Olsen, "North Korean weighs in on South Korean presidential election," KoreaBANG, 2012.
- "KoreanClick: Nielsen KoreanClick syndicated reports," Nielsen KoreanClick, 2015.
- "Manipulation of recommendation counts by military and government agencies," Media Today, 2013.
- "Responsibility of users for their postings," Nate, 2016.
- A. Joy, "How South Korean intelligence interfered in election," KoreaBANG, 2013.
- J. Fleiss, "Measuring nominal scale agreement among many raters," Psychological Bulletin, vol. 76, no. 5, pp. 378-382, 1971. https://doi.org/10.1037/h0031619
- R. Landis and G. Koch, "The measurement of observer agreement for categorical data," Biometrics, vol. 33, no.1, pp. 159-174, 1977. https://doi.org/10.2307/2529310
- B. Liu, "Web data mining: exploring hyperlinks, contents, and usage data," Springer, 2011.
- S. P. Lloyd, "Least squares quantization in PCM," IEEE Transactions on Information Theory, vol. 28, no. 2, pp. 129-137, 1982. https://doi.org/10.1109/TIT.1982.1056489
- Y. Freund and R. E. Schapire, "A short introduction to boosting," Journal of Japanese Society for Artificial Intelligence, vol. 14, no. 5, pp. 1-14, 1999.
- C. Fellbaum and G. A. Miller, "Wordnet: an electronic lexical database (language, speech, and communication)," MIT Press, 1998.
- V. Vapnik, "The nature of statistical learning theory," Springer, 2000.
- L. Breiman, "Random forests," Springer Machine Learning, vol. 45, no. 1, pp. 5-32, 2001. https://doi.org/10.1023/A:1010933404324
- S. Haykin, "Neural networks and learning machines," Pearson, 2009.
- A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," Communications of the ACM, vol. 60, no. 6, pp. 84-90, 2017. https://doi.org/10.1145/3065386
- C. Chang and C. Lin, "LIBSVM-a library for support vector machines," Retrieved August 24, 2018.
- "Weka 3: data mining software in Java," Retrieved August 24, 2018.
- "TensorFlow: an open source machine learning library for research and production," Retrieved August 24, 2018.