2008/02/21

On the Submodularity of Influence in Social Networks (Mossel, Roch; STOC 2007)

Why does viral marketing work? How can one start a blog roll? Who is the most influential agents in a social network? How can we maximize our influence without paying too many bloggers?

Given the social graph (or any other underlying graph structure, e.g., hyperlinks or feed abonnements), applying easy heuristics (high degree nodes, short average distance to the rest of the network) can identify important nodes.

The general threshold model by Kempe, Kleinberg, and Tardos states the following: every node has a flag stating whether it is activated or not; then, a node measures/asks/checks the values of its neighbors, applies its activation function, and then, if the computed value is higher than a certain threshold, it gets activated (and stays so). Threshold values are random, activation functions (from subsets of all nodes to the rational) are monotone increasing (no node has a negative effect) and submodular (the sum of the function values of two sets U, V is at least the sum of the function values of intersection and union, i.e., f(U)+f(V)>=f(U [or] V)+f(U [and] V) drawing a Venn diagram helps to understand that, function value is not increasing too much but increase is diminishing [if we could draw the graph of f, it was not convex and the second derivative f''<=0 - is that right?]).
Of course, even the best viral marketing would need to take into account that thresholds and functions might change with time and that some people might have a negative effect.

The paper deals with the fourth question from above, how to maximize the influence. Say influence works in rounds and starts with a set S of activated nodes. At some point the number of activated nodes does not change any more (in each round at least one node gets activated or the process stops - due to this model). The number of activated nodes sigma(S) depends on the number of initially activated nodes and on their locations (i.e., S). The algorithm (of a viral marketing tool or software) tries to find this initial set of minimal size. It has been shown that even approximation is NP hard (Kempe et al. 2003). However, if sigma was submodular too (the union of sets does not give a blast to your Guerilla marketing campagin), there is hope to find such an approximate set of fixed size.
Mossel and Roch show that sigma is submodular for this process.

Long story. And the proof is still to come.
Submodular means s(A)+s(B)>=s(A [or] B)+s(A [and] B) (variable names got changed, want to prove it for sigma (s for short here)). Coupling is a technique from statistics to analyze random variables, often used to analyze independent random variables coupled to dependent random variables, which are then analyzed instead.

tbctd

@inproceedings{MosselRoch07,
author = {Elchanan Mossel and
S{\'e}bastien Roch},
title = {On the submodularity of influence in social networks},
booktitle = {Proceedings of the 39th Annual ACM Symposium on Theory of
Computing, San Diego, California, USA, June 11-13, 2007},
year = {2007},
pages = {128--134}
}

2008/02/01

Efficient algorithms for maximum lifetime data gathering and aggregation in wireless sensor networks (Kalpakis, Dasgupta, Namjoshi; 2003)

The maximum lifetime data gathering with aggregation (MLDA) problem is defined as follows:
Given a collection of sensors and a base station, together with their locations and the energy of each sensor, find a data gathering schedule with maximum lifetime, where sensors are permitted to aggregate incoming data packets.
From a schedule telling when to send which packets we get a flow network with edge capacities being the total number of packets sent through that edge in the schedule. If a schedule has lifetime T (i.e., the network survives for at least T rounds), for each sensor the maximum flow to the base station is at least T as each of the T packets has to eventually reach the base station.
The idea is to find such a flow network (a so-called admissible flow network) and then to construct the schedule (a collection of directed aggregation trees, one for each round) from it.

@article{KDN2003,
author = {Konstantinos Kalpakis and Koustuv Dasgupta and Parag Namjoshi},
title = {Efficient algorithms for maximum lifetime data gathering and aggregation in wireless sensor networks},
journal = {Comput. Networks},
volume = {42},
number = {6},
year = {2003},
pages = {697--716},
}

On the Construction of a Maximum-Lifetime Data Gathering Tree in Sensor Networks: NP-Completeness and Approximation Algorithm (Wu, Fahmy, Shroff 2008)

Data aggregation (often called data fusion) is very important in sensor networks in order to gather data from sensors as long as possible.

Wu et al. considers aggregation functions, where (several) packets of size B can be aggregated into one packet of size B, this holds for various functions such as MAX, MIN, AVG, etc. (Mitchell's book describes various models and more complicated data fusion very well)
Therefore, in a (rooted) aggregation tree, any node waits for and receives all values from its children, aggregates it with its own value, and sends the new aggregate to its parent node. Each node sends one packet to its parent and receives one packet from each of its children. If all nodes have the same life-time (a reasonable assumption), the maximum number of children should be minimized (possibly while maintaining a small height of the tree). However, the problem of computing the spanning tree with smallest maximum degree is NP hard. Wu et al. reuses ideas from Furer and Raghavachari (1994)'s algorithm for Steiner trees. (If the Steiner version is considered, only vertices from a distinguished set have to be covered.) Step by step, the algorithm tries to improve the situation for bottleneck nodes (nodes with high degree) without worsening the situation for nodes almost being bottlenecks as follows: iterate through edges and add an edge {u,v} if both u and v are neither bottleneck nor endangered and if within the cycle that appears when adding the edge there is a bottleneck node w, which will then benefit by removing one if its adjacent edges (otherwise, w is said to be blocked by u and v).

This paper is going to be presented at this year's INFOCOM in April (I found it while googling maximum lifetime data aggregation).