6. Conclusions and future work
In this paper, we have proposed a privacy-preserving, content-driven access control mechanism for textual publications in social networks. Contrary to related works [10,11,12], the proposal is content driven in the sense that the semantics of the messages are automatically assessed in order to detect the sensitive information they contain, according to the privacy requirements of the publishers. These requirements are defined in general terms (i.e., an allowed level of disclosure is specified for each contact type defined in the social network), and publications whose contents relate to these requirements are automatically protected. To do so, the sensitive information is sanitized and different versions of the publication are generated according to the access level of the readers. Privacy enforcement is therefore transparent to both publishers and readers, requiring no administrative effort at publication time, contrary to most related works [11,12]. In addition, the proposed mechanism is flexible enough to be incorporated into any social network that publishes messages and classifies contacts into categories.

As future work, we plan to develop a functional implementation in a real OSN in order to survey the usability and utility of the proposed system among social network users. For this purpose, we will engineer the privacy requirements to be considered within the scope of the network. Furthermore, to relieve users of the burden of fully specifying their privacy requirements, we will also consider the automatic inference of access control rules from the social relationships implemented in the social network (e.g., the privacy rules defined for friends could also be applied to friends of friends). In this respect, a machine learning approach [35] could also be considered to semi-automate the configuration of privacy rules.
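To make the version-generation step concrete, the following minimal sketch illustrates the idea of producing one sanitized version of a publication per contact category. The contact categories, sensitivity levels, term list and function names are hypothetical illustrations, not the paper's actual implementation; in the proposed system the sensitive terms and their levels would come from the automatic semantic assessment rather than a fixed dictionary.

```python
# Assumed privacy requirements: allowed disclosure level per contact type
# (higher level = more information may be disclosed to that category).
ALLOWED_LEVEL = {"close_friends": 3, "friends": 2, "acquaintances": 1, "public": 0}

# Assumed output of the semantic analysis: sensitive terms detected in the
# message, each with the minimum access level required to read it.
SENSITIVE_TERMS = {"diabetes": 3, "insulin": 2}

def sanitize(message: str, category: str) -> str:
    """Return the version of `message` readable by contacts in `category`."""
    level = ALLOWED_LEVEL[category]
    words = []
    for word in message.split():
        required = SENSITIVE_TERMS.get(word.lower().strip(".,"), 0)
        # Mask terms that require a higher access level than the reader has.
        words.append("[...]" if required > level else word)
    return " ".join(words)

msg = "My insulin dose changed because of my diabetes."
print(sanitize(msg, "close_friends"))  # full text
print(sanitize(msg, "friends"))        # 'diabetes' masked
print(sanitize(msg, "public"))         # both sensitive terms masked
```

Because each category maps to a precomputed version, enforcement at read time reduces to serving the version matching the reader's access level, which keeps the mechanism transparent to both parties.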
Finally, as highlighted in the evaluation section, a user's privacy can also be compromised by the (co-)occurrence of information that is correlated with the sensitive topic to be protected. We are currently working on automatic solutions to address this issue; in a nutshell, they would assess the disclosure that potentially correlated terms may produce for a sensitive one according to their mutual information, which is computed from the distribution of data in large corpora [33,34]. We plan to incorporate these solutions into the developed system in the near future, so as to improve the assessment of privacy risks by detecting correlated terms, or term aggregations, that may disclose more information about a sensitive topic than is specified in the privacy rules.
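As a rough illustration of the correlation assessment outlined above, one common corpus-based estimate of the informativeness of a term pair is pointwise mutual information (PMI) computed from document counts. The function below is a generic sketch of that standard measure, not the paper's method; the counts are invented for illustration.

```python
import math

def pmi(n_term: int, n_sensitive: int, n_both: int, n_docs: int) -> float:
    """PMI(t, s) = log2( p(t, s) / (p(t) * p(s)) ), estimated from the
    number of corpus documents containing t, s, and both."""
    p_t = n_term / n_docs
    p_s = n_sensitive / n_docs
    p_ts = n_both / n_docs
    return math.log2(p_ts / (p_t * p_s))

# Illustrative (invented) counts for a strongly correlated pair such as
# 'insulin' and 'diabetes': a high positive PMI signals that mentioning
# the first term discloses information about the sensitive one.
score = pmi(n_term=1_000, n_sensitive=2_000, n_both=800, n_docs=1_000_000)
print(round(score, 2))  # log2(400) ~ 8.64
```

A system could flag for sanitization any term whose PMI with a protected topic exceeds a threshold, thereby catching aggregations of individually innocuous terms that jointly disclose the sensitive topic.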