Top authors in the big4

It has been one year since I last wrote a blog post; how time flies. Recently, I read Li Li’s project on who the top researchers are in the software engineering and security fields, so I want to do a similar one on security myself.

As a computer security researcher, your goal is to get your papers into the big4, i.e., CCS, Oakland, Usenix, and NDSS. Fortunately (in some sense…), the deadlines of the big4 are distributed roughly uniformly throughout the year, so you basically need to keep working every day.

Every year, by checking the authors of the big4, you can get a clear idea of who the top people in computer security research are. So let’s get to the real question: who are the top researchers in the big4?

Getting the data

To collect the data, the best source is DBLP. So I wrote a very simple Python script to naively crawl the authors’ information from the big4’s pages on DBLP from 2010 to 2016, and to (embarrassingly) parse the HTML files to find the authors’ names. I have no knowledge of HTML or XML, so the embarrassing part was reading the HTML source myself to find where the author names are… It took me a while (about half an hour), but after that everything went much more easily (of course, this was the only challenging part…).
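Python’s built-in html.parser is enough for this kind of naive scraping. A minimal sketch, assuming author names sit inside `<span itemprop="author">` elements; DBLP’s actual markup has changed over the years, so the inline snippet below is illustrative, not the real page:

```python
from html.parser import HTMLParser

class AuthorParser(HTMLParser):
    """Collect the text of every <span itemprop="author"> element."""
    def __init__(self):
        super().__init__()
        self._depth = 0      # >0 while we are inside an author span
        self.authors = []

    def handle_starttag(self, tag, attrs):
        if self._depth:      # tags nested inside an author span
            self._depth += 1
        elif tag == "span" and dict(attrs).get("itemprop") == "author":
            self._depth = 1
            self.authors.append("")

    def handle_endtag(self, tag):
        if self._depth:
            self._depth -= 1

    def handle_data(self, data):
        if self._depth:      # accumulate text belonging to the current author
            self.authors[-1] += data

# toy stand-in for one crawled conference page
sample = ('<li><span itemprop="author"><a href="#">'
          '<span itemprop="name">Christopher Kruegel</span></a></span>'
          '<span itemprop="author"><span itemprop="name">Giovanni Vigna</span>'
          '</span></li>')
p = AuthorParser()
p.feed(sample)
p.close()
print(p.authors)   # ['Christopher Kruegel', 'Giovanni Vigna']
```

The same parser can simply be fed each downloaded page in turn.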


We aim to answer three questions with the data we have collected… oh no, it is just a blog, so let’s look at the data in a spontaneous way.

Here are the top 20 researchers with the most publications in the big4.

Christopher Kruegel 37
Giovanni Vigna 33
Wenke Lee 31
XiaoFeng Wang 30
Michael Backes 25
Elaine Shi 24
Vern Paxson 24
Thorsten Holz 23
Thomas Ristenpart 23
Ahmad-Reza Sadeghi 23
David Brumley 21
Dan Boneh 21
Dawn Song 21
Prateek Saxena 20
Engin Kirda 20
Ari Juels 18
Stefan Savage 18
Angelos D. Keromytis 18
Damon McCoy 17
Jonathan Katz 17

Not surprisingly, most of these researchers are in the US, such as the top two from UCSB. Meanwhile, Germany is also a strong player; I can see at least two researchers from German institutions in the list.

Now let’s check the top 10 for each conference.


Wenke Lee 9
Engin Kirda 8
Giovanni Vigna 8
Christopher Kruegel 8
Yan Chen 6
Dawn Song 6
Xiangyu Zhang 6
Prateek Mittal 6
Nikita Borisov 6
Dongyan Xu 6


XiaoFeng Wang 14
Elaine Shi 12
Ahmad-Reza Sadeghi 12
Gilles Barthe 11
Michael Backes 10
Ari Juels 10
Wenke Lee 10
Christopher Kruegel 10
Thomas Ristenpart 9
Michael K. Reiter 9


Christopher Kruegel 14
Giovanni Vigna 13
Thomas Ristenpart 9
Wenke Lee 8
Damon McCoy 8
Stefan Savage 7
J. Alex Halderman 7
Michael Backes 7
Vern Paxson 7
David Brumley 7


XiaoFeng Wang 10
Elie Bursztein 8
Thorsten Holz 7
Adrian Perrig 7
Vern Paxson 7
Cedric Fournet 6
Engin Kirda 6
Bryan Parno 6
John C. Mitchell 6
Rui Wang 5

It seems that the rankings are quite different across conferences, which reflects the differences among the big4 with respect to research fields.

Now let’s look at some trends: we plot the number of publications per year for the top 5 authors. It seems that most of them have been publishing more big4 papers over time.


Another thing I’m interested in is how many authors a big4 paper has, so I calculated the average number of authors, and the result is 4.26. Broken down by conference, the number is 4.18 for NDSS, 4.09 for CCS, 4.49 for Usenix, and 4.54 for Oakland. Of course, we need some baseline to judge whether this number is high or low, so let’s crawl WWW and KDD, two prestigious conferences in the data mining community. It turns out that the average number of authors is 3.78 for WWW and 4.01 for KDD. What this means is that big4 papers indeed involve more authors, but the number is not significantly higher than at other top conferences.
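The average itself is straightforward once there is one author list per paper; a sketch with toy numbers (not the real crawl):

```python
# one author list per paper, grouped by conference (toy data)
papers = {
    "CCS":  [["a", "b", "c", "d"], ["a", "b", "c", "d", "e"]],
    "NDSS": [["a", "b", "c"], ["a", "b", "c", "d", "e"]],
}

def avg_authors(author_lists):
    """Mean number of authors over a collection of papers."""
    return sum(len(a) for a in author_lists) / len(author_lists)

# overall average across all conferences
overall = avg_authors([a for lists in papers.values() for a in lists])
print(overall)                        # 4.25
for conf, lists in papers.items():    # per-conference averages
    print(conf, avg_authors(lists))
```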

Now let’s look at the trend. It turns out that over the past 6 years, the average number of authors on a big4 paper has increased quite a lot, from 4 to 4.5, more than 10%. This hints at how the security community is shifting.



Yes, this is a blog; I don’t have a general conclusion.

If you are interested, the naive python code can be downloaded here.

PS: Merry Xmas and Happy New Year!

Notes on CSS WS 2015

I participated in the Computational Social Science Winter Symposium (#CSSWS15) in Cologne for the past three days.
It was organized by GESIS, which is located in Cologne.
#CSSWS15 invited a lot of superstars to give great talks; half of the participants have sociology backgrounds.
I keep some notes about #CSSWS15 here.

1) Professor Sune Lehmann’s work on mobility is impressive.
Nice visualization, clear motivation and solid methodology.
The idea of using a user’s cores to predict his location is quite similar to our work.
The difference is that the cores of a user are discovered through the mobility profile in Lehmann’s work,
while we find cores (communities) through community detection on the social graph.

2) Professor Lehmann also presented that time itself can be a good indicator for community detection.
This means that if we simply observe users’ behavior at different timestamps and
construct a sub social network out of it, then the communities are naturally discovered.
Users’ mobility behaviors are strong evidence for this kind of task,
but the location data from location-based social networks might be too sparse.
Only Lehmann’s dataset can support that kind of idea.
I mean, a project that provides 1,000 smartphones to the freshmen of DTU really astonished me.
No data can be better than this in academia.

3) The sensory map from the goodcitylife team is by far the coolest data analysis project I have ever seen.
In fact, this is the main reason I participated in #CSSWS15.

The goodcitylife team starts by discussing how we understand food with the 5 different senses
and maps that to how we think about the city we live in.
For “see”, they published a work on the happiest route to a destination.
For “hear”, they will (?) publish a work on analyzing urban sound.
For “smell”, they published a work on digitalizing the urban smell map.
This one is really cool, and it has never been touched by anyone before.
For “feel”, the psychological map of a city is brilliant.

I guess they will publish at least one more work on how we “taste” the city.
It would be very interesting to understand where restaurants are located
and how people feel about the food in each area.

The general idea of computational social science is to bring social good to people (in my opinion).
Many speakers talked about how to use social media to fight against mental illness, depression, etc.
Therefore, I believe that to be a good computational social scientist,
or to start a great computational social science project,
one has to be really motivated to help others with one’s brilliant mind and hard work.
I know this last sentence is weird; it would be easier to express in Mandarin.

Data of location-based social networks

To mine a social network, the most important thing is the data. (This is only my opinion…)

I have been mining location-based social networks for a while. I have used other people’s data, and I have also collected data myself. With some experience, I think I can summarize the area a bit.

First of all, I’m just a student from a university most people have never heard of, so there is no way for me to get high-quality data from big companies like Facebook, Twitter, and Instagram.

There are two ways to overcome this. The first is to use other researchers’ data. In the world of location-based social networks, there are mainly two famous baseline datasets. One is from SNAP at Stanford, where Prof. Leskovec and his students collected users’ check-in data from Gowalla and Brightkite. Even though these two companies are both dead, the data is quite valuable. The other is from Texas A&M, where Dr. Zhiyuan Cheng collected about 5 months of geo-tagged tweets from Twitter; the size of the dataset is about 10M. Both datasets are at a global level and have been used a lot by researchers (see their papers’ citation numbers).

The second way is to collect data ourselves. Nowadays, most social networks publish APIs that allow every user to extract data. This is where a PhD student should cut in…

Twitter. Twitter has a very strong API that allows you to extract real-time tweets within a geo bounding box. I have been using this API for about five months, and it indeed gives you a lot of data. BTW, I discovered that if you set your search area at the city level, you get much more data for that city compared to querying at the global level. We have about 3M geo-tagged tweets in New York, while the whole global Gowalla dataset is about 6M. But this tip doesn’t matter anymore. Since April 27th, 2015, Twitter modified (or improved, as they claim) its service: now when you decide to share a geo-tagged tweet, you can directly choose the exact venue (data from Foursquare), and sharing the exact lat/lng is optional. Therefore, Twitter’s API can no longer extract that many geo-tagged tweets. After that April 27, I could only get 10% of the data I used to get. So Twitter’s way is dead. Luckily and sadly, I got the last five months of data…
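For the curious: the bounding-box filtering itself is simple. Twitter’s streaming `locations` parameter took longitude/latitude pairs, south-west corner first, i.e. (west, south, east, north). A minimal offline sketch with approximate New York coordinates and made-up tweets standing in for the stream:

```python
# bounding box as (west, south, east, north), the order Twitter's
# streaming `locations` filter used; the NYC numbers are approximate
NYC = (-74.26, 40.48, -73.70, 40.92)

def in_bbox(lng, lat, bbox):
    """True if the point (lng, lat) lies inside the rectangle bbox."""
    west, south, east, north = bbox
    return west <= lng <= east and south <= lat <= north

# fake tweets: (text, (lng, lat))
tweets = [
    {"text": "hello from Manhattan", "coords": (-73.98, 40.75)},
    {"text": "hello from LA",        "coords": (-118.24, 34.05)},
]
nyc_tweets = [t for t in tweets if in_bbox(*t["coords"], NYC)]
print(len(nyc_tweets))   # 1
```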

Foursquare. I have never used Foursquare’s API to collect user check-ins, since I don’t think it is a general social network service. But I do use it a lot to query a location’s information, including name, category, rating, tips, and so on. Foursquare indeed has an excellent API with few rate limits; I barely need to wait through the whole sliding window to get the data.

Instagram. More recently, I started using Instagram to collect data, and it is awesome. I only use the static REST API to query check-ins from Instagram, since I haven’t figured out how to use its Streaming API. I discovered that Instagram gives you more recent data than old data. Therefore, to get more data, it is better to query a city multiple times a day. In New York, I can get about 20k check-ins a day. Instagram’s API policy is also quite generous; the only thing that bothers me is querying friends: the API only gives you about 50 friends per page, so you need to query many pages to extract a user’s friends. And some users indeed have a lot of friends…
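The friends query boils down to following a pagination cursor until the API stops returning one. A sketch with a fake `fetch_page` standing in for the real HTTP call; the cursor format and the 50-per-page limit here are placeholders, not Instagram’s actual protocol:

```python
ALL_FRIENDS = [f"user{i}" for i in range(120)]   # fake data: 120 friends

def fetch_page(cursor=0, page_size=50):
    """Stand-in for one API request: returns up to page_size friend ids
    and the cursor for the next page (None when there is no more data)."""
    page = ALL_FRIENDS[cursor:cursor + page_size]
    nxt = cursor + page_size if cursor + page_size < len(ALL_FRIENDS) else None
    return page, nxt

def all_friends():
    """Follow the cursor page by page and accumulate every friend id."""
    friends, cursor = [], 0
    while cursor is not None:
        page, cursor = fetch_page(cursor)
        friends.extend(page)
    return friends

print(len(all_friends()))   # 120
```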

That’s all I know about these APIs; I mainly focus on Instagram nowadays. I hope they won’t change as much in the future as Twitter did.

Urban informatics rocks!

Busy end of the year

2015 is about to finish, and I’m in an extremely busy mode. (Well, is that why I decided to write another blog post?)

Life is better than at this time last year, when I was alone in Luxembourg and didn’t know what to do in the future. Well, it still hasn’t changed that much: Marcela is still in Italy, and I still don’t know what to do in the future. The only difference is that now I have to reverse a linked list or write a stack… Life is hard, but everything will be better when the new spring comes. That’s what I told myself last winter, and I am still telling myself the same now. I guess I’m starting to dislike the winter in Europe, where there is no sun.
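Since it came up: the linked-list reversal that interviewers love fits in a few lines.

```python
class Node:
    def __init__(self, value, next=None):
        self.value, self.next = value, next

def reverse(head):
    """Iteratively re-point each node at its predecessor."""
    prev = None
    while head:
        head.next, prev, head = prev, head, head.next
    return prev

# build 1 -> 2 -> 3, reverse it, and read the values back
rev = reverse(Node(1, Node(2, Node(3))))
out = []
while rev:
    out.append(rev.value)
    rev = rev.next
print(out)   # [3, 2, 1]
```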

Anyhow, I guess I will go to the GESIS winter symposium, since I’m sort of dry and need a recharge. Anyone want to join?

University of Luxembourg is ranked 193 in THE

The Times Higher Education (THE) Rankings believe that the University of Luxembourg is the 193rd (98th) best university in the world (Europe).

This is very exciting news for people at UL (me included) and an interesting piece of news for me. UL is right above Texas A&M and 4 places below ASU. The reason UL holds this position is mainly the international factor (No. 2 in the whole world, behind Qatar University).

Luxembourg is indeed an international city, and the university is very international as well; most of my colleagues are not from Luxembourg. Personally, I don’t think international outlook should be considered a very important factor in judging a university, but to further contribute to the ranking next year, I have decided to find more international collaborators… XD