WebShapes: Network Visualization with 3D Shapes

Shengmin Jin, Richard Wituszynski, Max Caiello-Gingold, Reza Zafarani
Conference Paper | In Proceedings of the ACM International Conference on Web Search and Data Mining (WSDM 2020)

Abstract

Network visualization has played a critical role in graph analysis, as it not only presents the big picture of a network but also helps reveal its structure. The most popular visual representation of networks is the node-link diagram. However, visualizing a large network with a node-link diagram can be challenging because an optimal graph layout is difficult to obtain. To address this challenge, a recent advancement in network representation, the network shape, allows one to compactly represent a network and its subgraphs by the distribution of their embeddings. Inspired by this research, we have designed WebShapes, a web platform that enables researchers and practitioners to visualize their network data as customized 3D shapes (http://b.link/webshapes). Furthermore, we provide a case study on real-world networks to explore the sensitivity of network shapes to different graph sampling, embedding, and fitting methods, and we show examples of understanding networks through their network shapes.

Sentiment Paradoxes in Social Networks: Why Your Friends are More Positive Than You?

Xinyi Zhou, Shengmin Jin, Reza Zafarani
Conference Paper | In Proceedings of the International AAAI Conference on Web and Social Media (ICWSM 2020)

Abstract

Most individuals consider their friends to be more positive than themselves, exhibiting a sentiment paradox. Psychological research attributes this paradox to human cognitive bias. To understand this phenomenon, we study sentiment paradoxes in social networks. Our work shows that the social connections (friends, followees, or followers) of users are indeed generally (not illusively) more positive than the users themselves. Five existing sentiment paradoxes are identified at different network levels, ranging from triads to large-scale communities. Empirical and theoretical evidence is provided to verify the observed and expected existence of such sentiment paradoxes. By investigating the relationships between the sentiment paradox and other well-developed network paradoxes, i.e., the friendship paradox and the activity paradox, we find that user sentiments are positively correlated with users' number of social connections but hardly with their social activity. Finally, we demonstrate how the validated sentiment paradoxes can in turn be used to predict user sentiments.
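At the user level, the paradox can be stated operationally: a user exhibits it when the average sentiment of their connections exceeds their own. The sketch below computes the fraction of users for whom this holds on an undirected friendship graph. It assumes each user has a single numeric sentiment score and compares against the mean over friends; the function name is hypothetical, and this is only one simple formulation, not necessarily how any of the five paradoxes above are measured.

```python
from typing import Dict, Hashable, List


def sentiment_paradox_fraction(
    adj: Dict[Hashable, List[Hashable]],
    sentiment: Dict[Hashable, float],
) -> float:
    """Hypothetical helper: fraction of users (with at least one friend)
    whose friends' mean sentiment is strictly higher than their own."""
    eligible = 0
    paradox = 0
    for user, friends in adj.items():
        if not friends:
            continue  # the comparison is undefined for isolated users
        eligible += 1
        friends_mean = sum(sentiment[f] for f in friends) / len(friends)
        if friends_mean > sentiment[user]:
            paradox += 1
    return paradox / eligible if eligible else 0.0
```

For a star whose center is neutral (0.0) and whose three leaves are positive (1.0), only the center sees more positive friends, so the fraction is 1/4.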

Network Identification and Authentication

Shengmin Jin, Vir V. Phoha, Reza Zafarani
Conference Paper | In Proceedings of the IEEE International Conference on Data Mining (ICDM 2019)

Abstract

Research on networks is commonly performed using anonymized network data for various reasons, such as protecting data privacy. Under such circumstances, it is difficult to verify the source of network data, which leads to questions such as: Given an anonymized graph, can we identify the network from which it was collected? Or if one claims the graph is sampled from a certain network, can we verify it? The intuitive approach is to check for subgraph isomorphism. However, subgraph isomorphism is NP-complete and hence infeasible for most large networks. Inspired by biometrics studies, we address these challenges by formulating two new problems: network identification and network authentication. To tackle these problems, similar to research on human fingerprints, we introduce two versions of a network identity: (1) embedding-based identity and (2) distribution-based identity. We demonstrate the effectiveness of these network identities on various real-world networks. Using these identities, we propose two approaches for network identification. One method uses supervised learning and can achieve an identification accuracy rate of 94.7%; the other, which is easier to implement, relies on distances between identities and achieves an accuracy rate of 85.5%. For network authentication, we propose two methods to build a network authentication system. The first is a supervised learner that provides a low false accept rate; the other allows one to control the false reject rate while keeping a reasonable false accept rate across networks. Our study can help identify or verify the source of network data, validate network-based research, and support network-based biometrics.
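Distance-based identification and authentication can be sketched as nearest-neighbor matching and thresholding, respectively. The sketch below assumes each network identity is already available as a fixed-length numeric vector (for example, a normalized histogram) and uses L1 distance; the vector form, the distance choice, and all function names are illustrative assumptions, not the paper's exact identities or metric.

```python
from typing import Dict, Sequence


def l1_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """L1 distance between two equal-length identity vectors."""
    if len(p) != len(q):
        raise ValueError("identity vectors must have the same length")
    return sum(abs(a - b) for a, b in zip(p, q))


def identify(query: Sequence[float], gallery: Dict[str, Sequence[float]]) -> str:
    """Identification: return the enrolled network whose identity is closest."""
    if not gallery:
        raise ValueError("gallery is empty")
    return min(gallery, key=lambda name: l1_distance(query, gallery[name]))


def authenticate(
    query: Sequence[float], claimed: Sequence[float], threshold: float
) -> bool:
    """Authentication: accept the claim iff the query identity lies
    within `threshold` of the claimed network's identity."""
    return l1_distance(query, claimed) <= threshold
```

Raising `threshold` accepts more genuine claims (fewer false rejects) at the cost of more false accepts, which is the trade-off a distance-based authentication system has to tune.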

Credibility-based Fake News Detection

Niraj Sitaula, Chilukuri K. Mohan, Jennifer Grygiel, Xinyi Zhou, Reza Zafarani
Preprint

Abstract

Fake news can significantly misinform people, who often rely on online sources and social media for their information. Current research on fake news detection has mostly focused on analyzing fake news content and how it propagates on a network of users. In this paper, we focus on detecting fake news by assessing its credibility. By analyzing public fake news data, we show that information on news sources (and authors) can be a strong indicator of credibility. Our findings suggest that an author’s history of association with fake news, and the number of authors of a news article, can play a significant role in detecting fake news. Our approach can help improve traditional fake news detection methods, in which content features are often used to detect fake news.
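The two source-based signals named above can be turned into features roughly as follows. This is a minimal sketch, assuming past articles are available as (authors, is_fake) pairs; averaging author fake-rates, the 0.5 prior for unseen authors, and the function names are hypothetical choices, not the paper's feature definitions.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def author_fake_history(
    past_articles: Iterable[Tuple[List[str], bool]],
) -> Dict[str, float]:
    """Map each author to the fraction of their past articles labeled fake."""
    fake: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for authors, is_fake in past_articles:
        for author in authors:
            total[author] += 1
            fake[author] += int(is_fake)
    return {author: fake[author] / total[author] for author in total}


def credibility_features(
    authors: List[str], history: Dict[str, float], prior: float = 0.5
) -> List[float]:
    """Hypothetical features for a new article:
    [mean fake-rate of its authors, number of authors].
    Authors with no history fall back to `prior`."""
    rates = [history.get(author, prior) for author in authors]
    mean_rate = sum(rates) / len(rates) if rates else prior
    return [mean_rate, float(len(authors))]
```

Such features could then be concatenated with content features and passed to any standard classifier.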

Network-based Fake News Detection: A Pattern-driven Approach

Xinyi Zhou, Reza Zafarani
Journal Paper | ACM SIGKDD Explorations, December 2019

Abstract

Fake news has gained significant momentum, strongly motivating the need for fake news research. Many fake news detection approaches have thus been proposed, most of which rely heavily on news content. However, network-based clues revealed when analyzing news propagation on social networks have hardly been comprehensively explored or used for fake news detection. We bridge this gap by proposing a network-based, pattern-driven fake news detection approach. We aim to study the patterns of fake news in social networks, which refer to the news being spread, the spreaders of the news, and the relationships among the spreaders. Empirical evidence and interpretations of the existence of such patterns are provided based on social psychological theories. These patterns are then represented at various network levels (i.e., node-level, ego-level, triad-level, community-level, and the overall network) so that they can be used to detect fake news. The proposed approach enhances the explainability of fake news feature engineering. Experiments conducted on real-world data demonstrate that the proposed approach can outperform the state of the art.

The Impact of Graph Structure on Small-World Shortest Paths

Pegah Hozhabrierdi, Reza Zafarani
Conference Paper | In Proceedings of the SBP-BRiMS 2019 Conference

Abstract

Numerous studies have been devoted to modeling and estimating shortest paths in complex networks. To maintain generality, these studies have neglected a common property of complex social networks: the small-world phenomenon (colloquially stated as six degrees of separation). Based on the intuition behind the flow of information in small worlds, we propose a small-world representation for social networks. In this new representation, we study the influence of different network measures on shortest paths. We perform a comprehensive analysis of a large set of local and global network measures and report our findings for various social networks. The results of our analyses show that: (1) shortest-path lengths in small worlds are strongly correlated with the maximum degree centrality and the diameter. In fact, using these two features, one can predict the average path length more accurately than using any other feature alone; (2) when nodes are ranked according to their average shortest-path lengths, this ranking can be approximated by a shifted standard normal distribution with minimum information loss. The shift can be estimated by the rank of the node with the maximum local clustering coefficient, which can be computed in linear or constant time.
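The quantities in finding (1) are straightforward to compute exactly on small graphs. The sketch below runs breadth-first search from every node of an unweighted, undirected, connected graph (stored as an adjacency dict, an assumed representation) to obtain the average shortest-path length and the diameter, alongside the maximum degree. Its O(n·m) cost is what makes exact computation expensive on large networks, which is why predicting the average path length from cheaper features is attractive.

```python
from collections import deque
from typing import Dict, List, Tuple

Graph = Dict[int, List[int]]


def bfs_distances(g: Graph, source: int) -> Dict[int, int]:
    """Hop distances from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in g[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def path_stats(g: Graph) -> Tuple[float, int, int]:
    """Return (average shortest-path length, diameter, maximum degree)
    for a connected undirected graph, averaging over ordered node pairs."""
    total = 0
    pairs = 0
    diameter = 0
    for s in g:
        for t, length in bfs_distances(g, s).items():
            if t != s:
                total += length
                pairs += 1
                diameter = max(diameter, length)
    avg = total / pairs if pairs else 0.0
    # Dividing by (n - 1) would give the normalized maximum degree centrality.
    max_degree = max((len(nbrs) for nbrs in g.values()), default=0)
    return avg, diameter, max_degree
```

On a four-node path the average length is 20/12 with diameter 3; on a four-node star it is 1.5 with diameter 2, illustrating how a single high-degree hub shortens paths.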

Fake News Research: Theories, Detection Strategies, and Open Problems

Reza Zafarani, Xinyi Zhou, Kai Shu, Huan Liu
Tutorial | Proceedings of the 25th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2019)

Abstract

The explosive growth of fake news and its erosion of democracy, justice, and public trust have increased the demand for fake news detection. As an interdisciplinary topic, the study of fake news calls for a concerted effort from experts in computer and information science, political science, journalism, social science, psychology, and economics. A comprehensive framework to systematically understand and detect fake news is necessary to attract and unite researchers in related areas to conduct research on fake news. This tutorial aims to clearly present (1) fake news research, its challenges, and research directions; (2) a comparison between fake news and other related concepts (e.g., rumors); (3) the fundamental theories developed across various disciplines that facilitate interdisciplinary research; (4) various detection strategies unified under a comprehensive framework for fake news detection; and (5) the state-of-the-art datasets, patterns, and models. We present fake news detection from various perspectives, which involve news content and information in social networks, and broadly adopt techniques in data mining, machine learning, natural language processing, information retrieval, and social search. With the 2020 U.S. presidential election approaching, this tutorial also clarifies the challenges for automatic, effective, and efficient fake news detection.

Fake News Early Detection: A Theory-driven Model

Xinyi Zhou, Atishay Jain, Vir V. Phoha, Reza Zafarani
Preprint

Abstract

The explosive growth of fake news and its erosion of democracy, justice, and public trust have significantly increased the demand for accurate fake news detection. Recent advancements in this area have proposed novel techniques that aim to detect fake news by exploring how it propagates on social networks. However, to achieve fake news early detection, one is provided with limited or no information on news propagation, which motivates the need for approaches that detect fake news by focusing mainly on news content. In this paper, a theory-driven model is proposed for fake news detection. The method investigates news content at various levels: lexicon-level, syntax-level, semantic-level, and discourse-level. We represent news at each level, relying on well-established theories in social and forensic psychology. Fake news detection is then conducted within a supervised machine learning framework. As interdisciplinary research, our work explores potential fake news patterns, enhances the interpretability of fake news feature engineering, and studies the relationships among fake news, deception/disinformation, and clickbait. Experiments conducted on two real-world datasets indicate that the proposed method can outperform the state of the art and enable fake news early detection, even when there is limited content information.

The Role of User Profiles for Fake News Detection

Kai Shu, Xinyi Zhou, Suhang Wang, Reza Zafarani, Huan Liu
Conference Paper | IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2019)

Abstract

Consuming news from social media is becoming increasingly popular. Social media appeals to users due to its fast dissemination of information, low cost, and easy access. However, social media also enables the wide spread of fake news. Because of the detrimental societal effects of fake news, detecting fake news has attracted increasing attention. However, detection performance using news content alone is generally not satisfactory, as fake news is written to mimic true news. Thus, there is a need for an in-depth understanding of the relationship between user profiles on social media and fake news. In this paper, we study the challenging problem of understanding and exploiting user profiles on social media for fake news detection. To understand connections between user profiles and fake news, we first measure users’ sharing behaviors on social media and group representative users who are more likely to share fake and real news; we then perform a comparative analysis of explicit and implicit profile features between these user groups, which reveals their potential to help differentiate fake news from real news. To exploit these profile features, we demonstrate their usefulness in a fake news classification task. We further validate the effectiveness of these features through feature importance analysis. The findings of this work lay the foundation for deeper exploration of the user profile features of social media and enhance the capabilities for fake news detection.

Fake News Detection: An Interdisciplinary Research

Xinyi Zhou, Reza Zafarani
Conference Paper | In Companion Proceedings of The Web Conference (WWW 2019)

Abstract

The explosive growth of fake news and its erosion of democracy, journalism, and the economy have increased the demand for fake news detection. To achieve efficient and explainable fake news detection, an interdisciplinary approach is required, relying on scientific contributions from various disciplines, e.g., the social sciences and engineering. Here, we illustrate how such multidisciplinary contributions can help detect fake news by improving feature engineering or by providing well-justified machine learning models. We demonstrate how news content, news propagation patterns, and users’ engagements with news can help detect fake news.

Fake News: Fundamental Theories, Detection Strategies and Challenges

Xinyi Zhou, Reza Zafarani, Kai Shu, Huan Liu
Tutorial | Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining (WSDM 2019)

Abstract

The explosive growth of fake news and its erosion of democracy, justice, and public trust have increased the demand for fake news detection. As an interdisciplinary topic, the study of fake news calls for a concerted effort from experts in computer and information science, political science, journalism, social science, psychology, and economics. A comprehensive framework to systematically understand and detect fake news is necessary to attract and unite researchers in related areas to conduct research on fake news. This tutorial aims to clearly present (1) fake news research, its challenges, and research directions; (2) a comparison between fake news and other related concepts (e.g., rumors); (3) the fundamental theories developed across various disciplines that facilitate interdisciplinary research; (4) various detection strategies unified under a comprehensive framework for fake news detection; and (5) the state-of-the-art datasets, patterns, and models. We present fake news detection from various perspectives, which involve news content and information in social networks, and broadly adopt techniques in data mining, machine learning, natural language processing, information retrieval, and social search. With the 2020 U.S. presidential election approaching, this tutorial also clarifies the challenges for automatic, effective, and efficient fake news detection.

Representing Networks with 3D Shapes

Shengmin Jin, Reza Zafarani
Conference Paper | IEEE International Conference on Data Mining (ICDM 2018) [Code]

Abstract

There has been a surge of interest in machine learning on graphs, as graphs and networks are ubiquitous across the globe and within science and engineering: road networks, power grids, protein-protein interaction networks, scientific collaboration networks, and social networks, to name a few. Recent machine learning research has focused on efficient and effective ways to represent graph structure. Existing graph representation methods, such as network embedding techniques, learn to map a node (or a graph) to a vector in a low-dimensional vector space. However, the mapped values are often difficult to interpret, lacking information on the structure of the network or its subgraphs. Instead of using a low-dimensional vector to represent a graph, we propose to represent a network with a 3-dimensional shape: the network shape. We introduce the first network shape, a Kronecker hull, which represents a network as a 3D convex polyhedron using stochastic Kronecker graphs. We present a linear-time algorithm to build Kronecker hulls. Network shapes provide a compact representation of networks that is easy to visualize and interpret. They capture various properties of not only the network but also its subgraphs. For instance, they can provide the distribution of subgraphs within a network, e.g., what proportion of subgraphs are structurally similar to the whole network? Using experiments on real-world networks, we show how network shapes can be used in various applications, from computing the similarity between two graphs (using the overlap between their network shapes) to graph compression, where a graph with millions of nodes can be represented by a convex hull with fewer than 40 boundary points.
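As background on the stochastic Kronecker graphs that Kronecker hulls build on, the sketch below forms the edge-probability matrix of a stochastic Kronecker graph by repeatedly taking the Kronecker product of a 2x2 initiator. Parametrizing a symmetric initiator as [[a, b], [b, c]] yields three numbers per fitted (sub)graph, which is one plausible way a graph could correspond to a point in 3D; treat that reading, and every function name here, as an assumption rather than the paper's construction. Fitting the initiator to real data and building the convex hull are not shown.

```python
from typing import List

Matrix = List[List[float]]


def kron(a: Matrix, b: Matrix) -> Matrix:
    """Kronecker product: entry (i*p + k, j*q + l) equals a[i][j] * b[k][l]."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]


def initiator(a: float, b: float, c: float) -> Matrix:
    """Symmetric 2x2 initiator [[a, b], [b, c]] with entries in [0, 1].
    Reading (a, b, c) as a 3D point is an illustrative assumption."""
    for p in (a, b, c):
        if not 0.0 <= p <= 1.0:
            raise ValueError("initiator entries must be probabilities")
    return [[a, b], [b, c]]


def kronecker_power(theta: Matrix, k: int) -> Matrix:
    """k-th Kronecker power of theta; for a 2x2 theta this is the
    2**k x 2**k edge-probability matrix of a stochastic Kronecker graph."""
    result: Matrix = [[1.0]]
    for _ in range(k):
        result = kron(result, theta)
    return result


def expected_edge_count(p: Matrix) -> float:
    """Expected number of nonzero adjacency entries (sum of probabilities)."""
    return sum(sum(row) for row in p)
```

Because the entries of a Kronecker product sum to the product of the factors' sums, an initiator whose entries sum to 2.0 yields an expected 2.0**k nonzero entries after k steps, so a few parameters control the density of an exponentially large graph.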

Fake News: A Survey of Research, Detection Methods, and Opportunities

Xinyi Zhou, Reza Zafarani
Preprint | arXiv:1812.00315

Abstract

The explosive growth in fake news and its erosion of democracy, justice, and public trust have increased the demand for fake news analysis, detection, and intervention. This survey comprehensively and systematically reviews fake news research. The survey identifies and specifies fundamental theories across various disciplines, e.g., psychology and social science, to facilitate and enhance interdisciplinary research on fake news. Current fake news research is reviewed, summarized, and evaluated. These studies focus on fake news from four perspectives: (1) the false knowledge it carries, (2) its writing style, (3) its propagation patterns, and (4) the credibility of its creators and spreaders. We characterize each perspective by the analyzable and utilizable information provided by news and its spreaders, the strategies and frameworks that are adaptable, and the techniques that are applicable. By reviewing the characteristics of fake news and open issues in fake news studies, we highlight some potential research tasks at the end of this survey.

Sentiment Prediction in Social Networks

Shengmin Jin, Reza Zafarani
Conference Paper | IEEE International Conference on Data Mining (ICDM 2018) [Dataset]

Abstract

Sentiment analysis research has focused on using text to predict sentiments without considering the unavoidable peer influence on user emotions and opinions. The lack of large-scale ground-truth data on the sentiments of users in social networks has limited research on how predictable sentiments are from social ties. In this paper, using a large-scale dataset on human sentiments, we study sentiment prediction within social networks. We demonstrate that sentiments are predictable using structural properties of social networks alone. Drawing on the social science and psychology literature, we provide evidence of sentiments being connected to social relationships at four different network levels, starting from the ego-network level and moving up to the whole-network level. We discuss emotional signals that can be captured at each level of social relationships and investigate the importance of structural features at each network level. We demonstrate that sentiment prediction that relies solely on social network structure can be as accurate as (or more accurate than) text-based techniques. For situations where complete posts and friendship information are difficult to obtain, we analyze the trade-off between sentiment prediction performance and the available information. When computational resources are limited, we show that using only four network properties, one can predict sentiments with competitive accuracy. Our findings can be used to (1) validate the peer influence on user sentiments, (2) improve classical text-based sentiment prediction methods, (3) enhance friend recommendation by utilizing sentiments, and (4) help identify personality traits.

Emotions in Social Networks: Distributions, Patterns, and Models

Shengmin Jin, Reza Zafarani
Conference Paper | Proceedings of the 2017 ACM on Conference on Information and Knowledge Management (CIKM 2017) [Dataset]

Abstract

Understanding the role emotions play in social interactions has been a central research question in the social sciences. However, the challenge of obtaining large-scale data on human emotions has left the most fundamental questions on emotions less explored: How do emotions vary across individuals, how do they evolve over time, and how are they connected to social ties? We address these questions using a large-scale dataset of users that contains both their emotions and social ties. Using this dataset, we identify patterns of human emotions on five different network levels, starting from the user level and moving up to the whole-network level. At the user level, we identify how human emotions are distributed and vary over time. At the ego-network level, we find that assortativity is only observed with respect to positive moods. This observation allows us to introduce emotional balance, the "dual" of structural balance theory. We show that emotional balance has a natural connection to structural balance theory. At the community level, we find that community members are emotionally similar and that this similarity is stronger in smaller communities. Structural properties of communities, such as their sparseness or isolatedness, are also connected to the emotions of their members. At the whole-network level, we show that there is a tight connection between the global structure of a network and the emotions of its members. As a result, we demonstrate how one can accurately predict the proportion of positive/negative users within a network by only looking at the network structure. Based on our observations, we propose the Emotional-Tie model, a network model that can simulate the formation of friendships based on emotions. This model generates graphs that exhibit both the patterns of human emotions identified in this work and those observed in real-world social networks, such as a high clustering coefficient. Our findings can help better understand the interplay between emotions and social ties.
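One standard way to quantify whether connected users share a mood is Newman's assortativity coefficient for a categorical node attribute, sketched below for an undirected edge list with one mood label per user. Whether the paper used this exact coefficient is an assumption, and the function name is hypothetical. The coefficient is 1 when every edge joins users with the same mood and negative when edges tend to join users with different moods.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, Tuple


def attribute_assortativity(
    edges: Iterable[Tuple[Hashable, Hashable]],
    label: Dict[Hashable, Hashable],
) -> float:
    """Newman's assortativity for a categorical attribute on an undirected graph:
    r = (sum_i e_ii - sum_i a_i^2) / (1 - sum_i a_i^2)."""
    e: Counter = Counter()
    for u, v in edges:
        lu, lv = label[u], label[v]
        # Count each undirected edge in both directions so e is symmetric.
        e[(lu, lv)] += 1
        e[(lv, lu)] += 1
    total = sum(e.values())
    if total == 0:
        raise ValueError("graph has no edges")
    cats = {c for pair in e for c in pair}
    trace = sum(e[(c, c)] for c in cats) / total
    # a_i: fraction of edge ends attached to category i (rows equal columns here).
    a = {c: sum(e[(c, d)] for d in cats) / total for c in cats}
    expected = sum(x * x for x in a.values())
    if expected == 1.0:
        raise ValueError("undefined: all edge ends share one category")
    return (trace - expected) / (1.0 - expected)
```

For example, with users 1 and 2 positive and users 3 and 4 negative, the edges (1, 2) and (3, 4) give r = 1.0, while (1, 3) and (2, 4) give r = -1.0.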

User Identity Linkage across Online Social Networks: A Review

Kai Shu, Suhang Wang, Jiliang Tang, Reza Zafarani, Huan Liu
Journal Paper | ACM SIGKDD Explorations, March 2017

Abstract

The increasing popularity and diversity of social media sites have encouraged more and more people to participate in multiple online social networks to enjoy their services. Each user may create a user identity, which can include profile, content, or network information, to represent his or her unique public image in every social network. Thus, a fundamental question arises: can we link user identities across online social networks? User identity linkage across online social networks is an emerging task in social media and has attracted increasing attention in recent years. Advancements in user identity linkage could potentially impact various domains such as recommendation and link prediction. Due to the unique characteristics of social network data, this problem faces tremendous challenges. To tackle these challenges, recent approaches generally consist of (1) extracting features and (2) constructing predictive models from a variety of perspectives. In this paper, we review key achievements of user identity linkage across online social networks, including state-of-the-art algorithms, evaluation metrics, and representative datasets. We also discuss related research areas, open problems, and future research directions for user identity linkage across online social networks.

The good, the bad, and the ugly: uncovering novel research opportunities in social media mining

Huan Liu, Fred Morstatter, Jiliang Tang, Reza Zafarani
Journal Paper | International Journal of Data Science and Analytics, 1, no. 3-4 (2016): 137-143

Abstract

Big data is ubiquitous and can only become bigger, which challenges traditional data mining and machine learning methods. Social media is a new source of data that is significantly different from conventional ones. Social media data are mostly user-generated, and are big, linked, and heterogeneous. We present the good, the bad and the ugly associated with the multi-faceted social media data and exemplify the importance of some original problems with real-world examples. We discuss bias in social media data, evaluation dilemma, data reduction, inferring invisible information, and big-data paradox. We illuminate new opportunities of developing novel algorithms and tools for data science. In our endeavor of employing the good to tame the bad with the help of the ugly, we deepen the understanding of ever-growing and continuously evolving data and create innovative solutions with interdisciplinary and collaborative research of data science.

Users Joining Multiple Sites: Friendship and Popularity Variations across Sites

Reza Zafarani, Huan Liu
Journal Paper | Information Fusion, March 2016

Abstract

Our social media experience is no longer limited to a single site. We use different social media sites for different purposes, and our information on each site is often partial. By collecting complementary information for the same individual across sites, one can better profile users. These profiles can help improve online services such as advertising or recommendation across sites. To combine complementary information across sites, it is critical to understand how information for the same individual varies across sites. In this study, we aim to understand how two fundamental properties of users vary across social media sites. First, we study how user friendship behavior varies across sites. Our findings show how friend distributions for individuals change as they join new sites. Next, we analyze how user popularity changes across sites as individuals join different sites. We evaluate our findings and demonstrate how they can be employed to predict how popular users are likely to be on new sites they join.

Real-Time Crisis Mapping using Language Distribution

Justin Sampson, Fred Morstatter, Reza Zafarani, Huan Liu
Conference Paper | IEEE International Conference on Data Mining (ICDM 2015)

Abstract

With the increase in GPS-enabled devices, social media sites such as Twitter are quickly becoming a prime outlet for timely geospatial data. Such data can be leveraged to aid emergency response planning and recovery operations. Unfortunately, information overload makes it difficult to quickly discover and identify emergency situation areas. Our system tackles this challenge by providing real-time mapping of influence areas based on automatic analysis of the flow of discussion using language distributions. The workflow is further enhanced by keyword surprise mapping, which projects the general divergence map onto specific task-level keywords for precise and focused response.
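The abstract does not name its divergence measure. As one plausible reading, labeled here as an assumption, an area's divergence can be the Kullback-Leibler divergence between the word distribution of posts from that area and a background distribution, and a keyword's surprise can be that keyword's term in the KL sum. The sketch below implements both with add-alpha smoothing; the function names and smoothing choice are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Set


def word_distribution(
    tokens: Iterable[str], vocab: Set[str], alpha: float = 1.0
) -> Dict[str, float]:
    """Add-alpha smoothed unigram distribution over a fixed vocabulary
    (tokens outside `vocab` are ignored; alpha > 0 keeps every entry positive)."""
    counts = Counter(tokens)
    total = sum(counts[w] for w in vocab) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def keyword_surprise(p: Dict[str, float], q: Dict[str, float], word: str) -> float:
    """One word's contribution, in bits, to KL(p || q)."""
    return p[word] * math.log2(p[word] / q[word])


def kl_divergence(p: Dict[str, float], q: Dict[str, float]) -> float:
    """KL(p || q) in bits, i.e. the sum of every word's surprise."""
    return sum(keyword_surprise(p, q, w) for w in p)
```

Mapping `kl_divergence` per area gives a general divergence map, and evaluating `keyword_surprise` for a chosen keyword such as "flood" projects that map onto a single task-level term.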

10 Bits of Surprise: Detecting Malicious Users with Minimum Information

Reza Zafarani, Huan Liu
Conference Paper | 24th ACM International Conference on Information and Knowledge Management (CIKM 2015)

Abstract

Malicious users are a threat to many sites, and defending against them demands innovative countermeasures. When malicious users join sites, they provide limited information about themselves. With this limited information, sites can find it difficult to distinguish between a malicious user and a normal user. In this study, we develop a methodology that identifies malicious users with limited information. As the information provided by malicious users can vary, the proposed methodology utilizes minimum information to identify malicious users. We show that as little as 10 bits of information can help greatly in this challenging task. Experimental results verify that this methodology is effective in identifying malicious users in the realistic scenario of limited information availability.
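To give a sense of scale for "10 bits", the textbook self-information (surprisal) of an observation x with probability p(x) is measured in bits as below. How the paper assigns probabilities to user-provided information is not specified here; the equations only show what a 10-bit quantity corresponds to.

```latex
I(x) = -\log_2 p(x)

I(x) = 10 \text{ bits} \iff p(x) = 2^{-10} = \tfrac{1}{1024}

I(x, y) = I(x) + I(y) \quad \text{if } x \text{ and } y \text{ are independent}
```

In other words, 10 bits is enough information to single out one of 1,024 equally likely possibilities, and independent pieces of evidence add up in bits.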

User Identification across Social Media

Reza Zafarani, Lei Tang, Huan Liu
Journal Paper | ACM Transactions on Knowledge Discovery from Data (TKDD), October 2015

Abstract

People use various social media sites for different purposes. The information on each site is often partial. When sources of complementary information are integrated, a better profile of a user can be built. This profile can help improve online services such as advertising across sites. To integrate these sources of information, it is necessary to identify individuals across social media sites. This paper aims to address the cross-media user identification problem. We provide evidence of the existence of a mapping among identities of individuals across social media sites, study the feasibility of finding this mapping, and illustrate and develop means for finding this mapping. Our studies show that effective approaches that exploit information redundancies due to users’ unique behavioral patterns can be utilized to find such a mapping. This study paves the way for analysis and mining across social networking sites and facilitates the creation of novel online services across sites. In particular, recommending friends and advertising across networks, analyzing information diffusion across sites, and studying specific user behavior such as user migration across sites are among the many areas that can benefit from the results of this study.
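One such redundancy, assumed here purely for illustration, is that people often choose similar usernames across sites. The sketch below captures only that single signal, not the paper's full approach: it scores a candidate pair of usernames by normalized Levenshtein similarity and applies a threshold. The function names and the 0.8 threshold are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) via a rolling DP row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]


def username_similarity(a: str, b: str) -> float:
    """1.0 for identical strings, 0.0 when every character must change."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def likely_same_user(a: str, b: str, threshold: float = 0.8) -> bool:
    """Hypothetical case-insensitive username match."""
    return username_similarity(a.lower(), b.lower()) >= threshold
```

In practice a signal like this would be one feature among many, since unrelated people can share similar usernames and the same person can choose very different ones.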

Sarcasm Detection on Twitter: A Behavioral Modeling Approach

Ashwin Rajadesingan, Reza Zafarani, Huan Liu
Conference Paper | Eighth ACM International Conference on Web Search and Data Mining (WSDM 2015) [Dataset]

Abstract

Sarcasm is a nuanced form of language in which individuals state the opposite of what is implied. Because of this intentional ambiguity, sarcasm detection has always been a challenging task, even for humans. Current approaches to automatic sarcasm detection rely primarily on lexical and linguistic cues. This paper aims to address the difficult task of sarcasm detection on Twitter by leveraging behavioral traits intrinsic to users expressing sarcasm. We identify such traits using the user's past tweets. We employ theories from behavioral and psychological studies to construct a behavioral modeling framework tuned for detecting sarcasm. We evaluate our framework and demonstrate its efficiency in identifying sarcastic tweets.

Evaluation without Ground Truth in Social Media Research

Reza Zafarani, Huan Liu
Journal Paper | Communications of the ACM, June 2015

Excerpt

With the rise of social media, user-generated content has become available at an unprecedented scale. These massive collections of user-generated content can help us better understand billions of individuals. This data also enables novel scientific research in the social sciences, anthropology, psychology, and economics at scale. Scientific research demands reproducible and independently verifiable findings. In social media research, scientific findings can take the form of behavioral patterns, such as "individuals commented on this Facebook post because of its quality." To validate such patterns, one can survey the individuals who exhibited the pattern to verify whether the patterns truly captured their intentions. This type of validation is known as evaluation with ground truth in data mining. However, social media users are scattered all across the globe. With no face-to-face access to individuals on social media, is it even possible to perform evaluation for social media research? In other words, how can we verify that the user behavioral patterns found are the "true patterns" of these individuals?

Social Media Mining: An Introduction

Book Cambridge University Press | April 30, 2014 | ISBN: 9781107018853

Abstract

The growth of social media over the last decade has revolutionized the way individuals interact and industries conduct business. Individuals produce data at an unprecedented rate by interacting, sharing, and consuming content through social media. Understanding and processing this new type of data to glean actionable patterns presents challenges and opportunities for interdisciplinary research, novel algorithms, and tool development. Social Media Mining integrates social media, social network analysis, and data mining to provide a convenient and coherent platform for students, practitioners, researchers, and project managers to understand the basics and potentials of social media mining. It introduces the unique problems arising from social media data and presents fundamental concepts, emerging issues, and effective algorithms for network analysis and data mining. Suitable for use in advanced undergraduate and beginning graduate courses as well as professional short courses, the text contains exercises of different degrees of difficulty that improve understanding and help apply concepts, principles, and methods in various scenarios of social media mining.

Behavior Analysis in Social Media

Reza Zafarani, Huan Liu
Journal Paper IEEE Intelligent Systems, Volume 29, Issue 4, 2014, Pages 9-11

Excerpt

With the rise of social media, information sharing has been democratized. As a result, users are given opportunities to exhibit different behaviors such as sharing, posting, liking, commenting, and befriending conveniently and on a daily basis. By analyzing behaviors observed on social media, we can categorize these behaviors into individual and collective behavior. Individual behavior is exhibited by a single user, whereas collective behavior is observed when a group of users behave together. For instance, users using the same hashtag on Twitter or migrating to another social media site are examples of collective behavior. User activities on social media generate behavioral data, which is massive, expansive, and indicative of user preferences, interests, opinions, and relationships. This behavioral data provides a new lens through which we can observe and analyze individual and collective behaviors of users.

Am I More Similar to My Followers or Followees? Analyzing Homophily Effect in Directed Social Networks

Mohammad Ali Abbasi, Reza Zafarani, Jiliang Tang, Huan Liu
Conference Papers The 25th ACM Conference on Hypertext and Social Media (HT 2014)

Abstract

Homophily is the theory behind the formation of social ties between individuals with similar characteristics or interests. Based on homophily, one expects to observe a higher degree of homogeneity among connected than among disconnected people in a social network. Many researchers use this simple yet effective principle to infer users' missing information and interests from the information provided by their neighbors. In a directed social network, the neighbors can be further divided into followers and followees. In this work, we investigate the homophily effect in a directed network. To do so, we study whether a user's personal preferences can be inferred from those of users connected to her (followers or followees). We also study how effective each of these two groups is in predicting a user's preferences.

Users Joining Multiple Sites: Distributions and Patterns

Reza Zafarani, Huan Liu
Conference Papers The 8th International AAAI Conference on Weblogs and Social Media (ICWSM-14)

Abstract

The rise of social media has led to an explosion in the number of possible sites users can join. However, this same profusion of social media sites has made it nearly impossible for users to actively engage in all of them simultaneously. Accordingly, users must make choices about which sites to use or to neglect. In this paper, we study users that have joined multiple sites. We study how individuals are distributed across sites, the way they select sites to join, and behavioral patterns they exhibit while selecting sites. Our study demonstrates that while users have a tendency to join the most popular or trendiest sites, this does not fully explain users' selections. We demonstrate that peer pressure also influences the decisions users make about joining emerging sites.

Finding Friends on a New Site Using Minimum Information

Reza Zafarani, Huan Liu
Conference Papers SIAM International Conference on Data Mining (SDM) 2014

Abstract

With the emergence of numerous social media sites, individuals, with their limited time, often face the dilemma of choosing a few sites over others. Users prefer more engaging sites, where they can find familiar faces such as friends, relatives, or colleagues. Link prediction methods help find friends using link or content information. Unfortunately, when users join a site, they have no friends and have generated no content. In this case, sites have no option other than recommending random influential users, hoping that by befriending them, users will create sufficient information for link prediction techniques to recommend meaningful friends. In this study, by considering the social forces that form friendships, namely influence, homophily, and confounding, and by employing the minimum information available for users, we demonstrate how one can significantly improve on random predictions without link or content information. In addition, contrary to the common belief that similarity between individuals is the essence of forming friendships, we show that the similarity one exhibits to the friends of another individual plays a more decisive role in predicting their future friendship.

Connecting Users across Social Media Sites: A Behavioral-Modeling Approach

Reza Zafarani, Huan Liu
Conference Papers 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-13) Pages 41-49

Abstract

People use various social media for different purposes. The information on an individual site is often incomplete. When sources of complementary information are integrated, a better profile of a user can be built to improve online services such as verifying online information. To integrate these sources of information, it is necessary to identify individuals across social media sites. This paper aims to address the cross-media user identification problem. We introduce a methodology (MOBIUS) for finding a mapping among identities of individuals across social media sites. It consists of three key components: the first component identifies users' unique behavioral patterns that lead to information redundancies across sites; the second component constructs features that exploit information redundancies due to these behavioral patterns; and the third component employs machine learning for effective user identification. We formally define the cross-media user identification problem and show that MOBIUS is effective in identifying users across social media sites. This study paves the way for analysis and mining across social media sites, and facilitates the creation of novel online services across sites.

Whom Should I Follow? Identifying Relevant Users in Crisis

Conference Papers 24th ACM Conference on Hypertext and Social Media (HT 2013)

Abstract

Social media is gaining popularity as a medium of communication before, during, and after crises. In several recent disasters, it has become evident that social media sites like Twitter and Facebook are an important source of information, and in some cases they have even assisted in relief efforts. We propose a novel approach to identify a subset of active users during a crisis who can be tracked for fast access to information. Using a Twitter dataset that consists of 12.9 million tweets from 5 countries that are part of the "Arab Spring" movement, we show how instant information access can be achieved by identifying users along two dimensions: the user's location and the user's affinity towards topics of discussion. Through evaluations, we demonstrate that users selected by our approach generate more information, and of better quality, than users identified using state-of-the-art techniques.

Social Media Mining: Fundamental Issues and Challenges

Mohammad Ali Abbasi, Huan Liu, Reza Zafarani
Tutorial Proceedings of the IEEE International Conference on Data Mining (ICDM 2013)

Abstract

Social media generates massive amounts of user-generated content. Such data differs from classic data and poses new challenges to data mining. This tutorial presents fundamental issues of social media mining, ranging from network representation to influence/diffusion modeling, elaborates on state-of-the-art approaches for processing and analyzing social media data, and shows how to apply the resulting patterns to real-world applications such as recommendation and behavior analytics. The tutorial is designed for researchers, students, and scholars interested in studying social media and social networks. No prerequisites are required for ICDM participants to attend this tutorial.

Maximizing Benefits from Crowdsourced Data

Geoffrey Barbier, Reza Zafarani, Huiji Gao, Gabriel Fung, Huan Liu
Journal Paper Journal of Computational and Mathematical Organization Theory, Volume 18, Issue 3, 2012, Pages 257-279

Abstract

Crowds of people can solve some problems faster than individuals or small groups. A crowd can also rapidly generate data about circumstances affecting the crowd itself. This crowdsourced data can be leveraged to benefit the crowd by providing information or solutions faster than traditional means. However, crowdsourced data can hardly be used directly to yield usable information. Intelligently analyzing and processing crowdsourced information can help prepare data to maximize the usable information, thus returning the benefit to the crowd. This article highlights challenges and investigates opportunities associated with mining crowdsourced data to yield useful information, details how crowdsourced information and technologies can be used for response coordination when needed, and finally suggests related areas for future research.

Analyzing Behavior of the Influentials across Social Media

Book Chapter Behavior Computing: Modeling, Analysis, Mining, and Decision. Editors: Longbing Cao and Philip S. Yu. Springer. | April 12, 2012 | ISBN-10: 1447129687

Excerpt

Studying these behavior patterns of users across different social media sites has many applications. If an individual exhibits the same behavior on various social media sites, then studying one site can help predict his or her behavior on other sites. Various social media sites can be clustered based on the behavior patterns of individuals, and these clusters can help discover helpful and valuable trends. The activity of individuals can also help explain which social media sites are likely to attract more activity from various groups of people. These patterns can also be used to explore marketing opportunities and to study the movement of individuals across social media sites, identifying niche sites that offer unique opportunities.

Understanding User Migration Patterns in Social Media

Shamanth Kumar, Reza Zafarani, Huan Liu
Conference Papers Twenty-Fifth AAAI Conference on Artificial Intelligence (AAAI-11)

Abstract

The incredible growth of the social web over the last decade has created a flurry of new social media sites for users to choose from. Users who lack the time and resources to engage on all of these sites must inevitably choose specific sites on which to remain active. This choice more often than not leads to user migration, a well-studied phenomenon in fields such as sociology and psychology. Users are valuable assets for any site, as they generate revenue and contribute to the site's growth. Hence, it is essential for social media sites to establish and understand the reasons for user migration in order to prevent it. In this paper, we investigate whether people migrate, and if they do, how they migrate. We formalize site migration to help identify the migration between popular social media sites and determine clear patterns of migration between sites. This work suggests that it is feasible to study migration patterns. These patterns can help understand social media sites and gauge their popularity to improve business intelligence and revenue generation through the retention of users.

Sentiment Propagation in Social Networks: A Case Study in LiveJournal

Reza Zafarani, William D. Cole, Huan Liu
Book Chapter Advances in Social Computing, Editors: Sun-Ki Chai, John J. Salerno, and Patricia L. Mabry. Springer. | March, 2010

Abstract

Social networking websites have facilitated a new style of communication through blogs, instant messaging, and various other techniques. Through collaboration, millions of users participate in millions of discussions every day. However, it is still difficult to determine the extent to which such discussions affect the emotions of the participants. We surmise that emotionally-oriented discussions may affect a given user's general emotional bent and be reflected in other discussions he or she may initiate or participate in. It is in this way that emotion (or sentiment) may propagate through a network. In this paper, we analyze sentiment propagation in social networks, review the importance and challenges of such a study, and provide methodologies for measuring this kind of propagation. A case study has been conducted on a large dataset gathered from the LiveJournal social network. Experimental results are promising in revealing some aspects of the sentiment propagation taking place in social networks.

Convergence of Influential Bloggers for Topic Discovery in the Blogosphere

Shamanth Kumar, Reza Zafarani, Mohammad Ali Abbasi, Geoffrey Barbier, Huan Liu
Book Chapter Advances in Social Computing, Editors: Sun-Ki Chai, John J. Salerno, and Patricia L. Mabry. Springer. | March, 2010

Abstract

In this paper, we propose a novel approach to automatically detect "hot" or important topics of discussion in the blogosphere. The proposed approach is based on analyzing the activity of influential bloggers to determine specific points in time when the influential bloggers converge in terms of their topic of discussion. The tool BlogTrackers is used to identify influential bloggers, and the Normalized Google Distance is used to define the similarity among the topics of discussion of influential bloggers. The key advantage of the proposed approach is its ability to automatically detect events that are important in the blogger community.
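For readers unfamiliar with it, the Normalized Google Distance has a standard closed form. For terms x and y, let f(x) and f(y) be the number of indexed pages containing each term, f(x, y) the number containing both, and N the total number of indexed pages; then NGD(x, y) = (max{log f(x), log f(y)} − log f(x, y)) / (log N − min{log f(x), log f(y)}). Smaller values mean the terms co-occur more, so they are treated as more similar. The minimal sketch below computes this from raw counts. The function name and counts are hypothetical, returning infinity when the terms never co-occur is an assumed convention, and how the chapter turns these distances into topic similarities is not described here.

```python
import math


def normalized_google_distance(fx, fy, fxy, n):
    """Standard NGD between two terms x and y (illustrative sketch).

    fx, fy: number of pages containing x and y, respectively
    fxy:    number of pages containing both x and y
    n:      total number of indexed pages (must exceed min(fx, fy))
    """
    if fx <= 0 or fy <= 0:
        raise ValueError("both terms must occur at least once")
    if fxy <= 0:
        # Assumed convention: terms that never co-occur are infinitely distant.
        return math.inf
    log_fx, log_fy, log_fxy = math.log(fx), math.log(fy), math.log(fxy)
    numerator = max(log_fx, log_fy) - log_fxy
    denominator = math.log(n) - min(log_fx, log_fy)
    return numerator / denominator


# Hypothetical counts: x appears on 10 pages, y on 100, and every page with x also has y.
print(normalized_google_distance(10, 100, 10, 1000))
```

With these made-up counts the distance is log(10) / log(100), about 0.5; identical counts for x, y, and their co-occurrence would give a distance of 0.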

Connecting Corresponding Identities across Communities

Reza Zafarani, Huan Liu
Conference Papers Third International AAAI Conference on Weblogs and Social Media (ICWSM-09)

Abstract

One of the most interesting challenges in the area of social computing and social media analysis is the so-called community analysis. A well known barrier in cross-community (multiple website) analysis is the disconnectedness of these websites. In this paper, our aim is to provide evidence on the existence of a mapping among identities across multiple communities, providing a method for connecting these websites. Our studies have shown that simple, yet effective approaches, which leverage social media's collective patterns can be utilized to find such a mapping. The employed methods successfully reveal this mapping with 66% accuracy.