The notes of paper reading these 2 days.
1. definition
sybil: fake accounts to get more profit.
colluding: small set of entities reciprocally boosting.
2. natural solutions to the sybil&colluding attacks to PageRank
(a) identifying groups of colluding nodes.
construct a graph directed or non-directed, find the MINIMUM QUOTIENT CUT/ Minimum Flux Cut.
Approximable within O(log|V|), NP-Hardness.
(b) identifying individual colluders by using detailed return time statistics from the PageRank random walk.
The return time statistics for the colluding nodes are nearly indistinguishable from those for an "honest" node.
3. Hints from the Random Walk model
Given a directed graph, a random walk W starts its journey on any node with the same probability. At the current node x, with probabilitu (1-E) W jumps to one of the nodes that have links from x, and with probability E, W decides to restart (reset) its journey and again choose any node in the graph with the same probability.
The stationary probability that W is on node x is called the PageRank value of x, and all nodes are ordered based on the PageRank value.
Rank denotes the ordering. The node with the largest PR weight is ranked first.
PageRank:一个页面被访问的固定概率(不同的页面被访问的概率E是不同的)
Rank:页面的排序
->The observation: colluding nodes must suffer a significant drop in PageRnk as E increases.
->solution: We expect the stationary weight of colluding nodes to be highly correlated with 1/E and that of non-colluding nodes to be relatively insensitive to changes in E.
4. an adaptive-resetting heuristic
The heuristic identifies nodes which have a high correlation with 1/E and increases the reset probability for thost nodes- this diminishes the ability of colluding nodes to stall the random walk.
**two phases:
(1) Collusion detection
(a) Given the topology, calculate the PR weight vector under different E values.
(b) Calculate the correlation coefficient between the curve of each nodes x's PR weight and the curve of 1/E. co-co(x)=.....
(2) E Personalization
(a) node x's outlink personalized-E = F (Edefault, co-co(x))
(b) The algorithm is repeated with these personalized-E values.
5. references
[1] Improving Eigenvector-based reputation systems against collusions (NetEcon'06)
[2] Manipulability of PageRank under Sybile Strategies (Cornel Univ.)
[3] SybilGuard: Defending Against Sybil Attacks via Social Networks (SIGCOMM'06)
[4] Deeper inside PageRank
[5] Web Spam Taxonomy
[6] http://www.ensta.fr/~diam//ro/online/viggo_wwwcompendium/node97.html
-----------------------------------------------------------------------------------------
总结:道高一尺魔高一丈!上有政策,下有对策
攻击是针对某应用的某功能的,所以要解决sybil + collusion就要针对具体的攻击方案。
一种通用的方案是诱人的,但是是不实际的。 Good Luck!