Doctoral thesis
Abstract: The popularity of location-enabled devices such as GPS-equipped smartphones has greatly increased the use of location-based services in social networks, because users can share geo-tagged content such as current locations or check-ins with their social network friends. These location-aware social networks are called Location-based Social Networks (LBSNs); examples include Foursquare and Gowalla. LBSN data is used to provide services such as the recommendation of locations, friends, activities, and media content, and the prediction of users' locations. These services rely on queries that exploit the activity and check-in data of users. Usually, LBSN data is divided into two parts: a social graph that encapsulates the friendships of users and an activity graph that maintains the visit history of users at locations. Such a data separation scales well for queries that directly use the friendship information and visit history of users. These queries are called user and activity analytic queries. The visits of users at locations also create relationships between those locations. Such relationships can be built on different features such as common visitors, geographical distance, and shared location categories. The process of analysing such relationships to optimize location-based services is termed location analytics. In location analytics, we expose the subjective nature of locations, which can further be used in applications such as visitor prediction, traffic management, route planning, and targeted marketing.

In this thesis, we provide a general LBSN data model that supports the storage and processing of the queries required by different applications, called location analytics queries. The LBSN data model we introduce segregates the LBSN data into three graphs: the social graph, the activity graph, and the location graph.
The location graph maintains the interactions among locations. We define primitive queries for each of these graphs. To process an advanced query, we express it as a combination of these primitive queries and process them on the corresponding graphs in parallel. We further provide a distributed data processing framework called GeoSocial-GraphX (GSG), which implements the aforementioned LBSN data model for efficient and scalable query processing. We further exploit the location graph to provide novel location analytics queries in the domains of influence maximization and visitor prediction. We introduce a notion of location influence, which captures the interactions of locations based on their visitors and can be used to propagate information between them. Applications of such a query lie in outdoor marketing and in the simulation of virus and news propagation. We also provide a unified system, IMaxer, that can evaluate and compare different information propagation mechanisms. We further exploit the subjective nature of locations by analysing the mobility behaviour of their visitors. We use this information to predict the future individual visitors, as well as the future groups of visitors (cohorts), of those locations. The prediction of visitors can be used for better event planning, traffic management, targeted marketing, and ride-sharing services.

To evaluate the proposed frameworks and approaches, we use data from four real-life LBSNs: Foursquare, Brightkite, Gowalla, and Wee Places. Detailed LBSN data mining and statistically significant experimental results show the effectiveness, efficiency, and scalability of our proposed methods. Our proposed approaches can be employed in real systems for providing life-care services.