SocialLink

Precomputed alignments between Knowledge Bases and Social Media

The latest version of the SocialLink RDF dataset can be programmatically queried via our publicly available SPARQL endpoint. You can use the query interface below (powered by YASQE and YASR) to query the endpoint using your browser.
This work has been carried out in the Future Media unit by:
Yaroslav Nechaev

Francesco Corcoglioniti

Claudio Giuliano

The general idea behind our resource is to create a "bridge" between social media and the Linked Open Data (LOD) cloud.

We present SocialLink, a Linked Open Data dataset that matches social media accounts on Twitter to their corresponding entities in DBpedia. This resource creates a bridge between the highly structured Linked Open Data cloud and the vibrant and up-to-date social media world. By aligning around 276K (out of 2.5M) DBpedia persons and organisations to their Twitter profiles, SocialLink serves two purposes. On the one hand, it facilitates social media processing by leveraging DBpedia data, e.g., as a source of ground truth properties for training supervised systems for user profiling, or as contextual data in natural language understanding tasks (e.g., Named Entity Linking) operating on social media content. On the other hand, SocialLink gives Semantic Web practitioners the ability to populate knowledge bases with up-to-date data from social media accounts of DBpedia entities, such as structured attributes, images, connections, user locations and descriptions.

SocialLink is updated with periodic releases. The code and relevant technical documentation can be found here.

All data on this website is licensed under a Creative Commons Attribution 4.0 International license.
The paper is to appear in the proceedings of ACM SAC 2017. You can read the preprint version here.

To cite this work, please use the following BibTeX entry:
                    
@inproceedings{alignments2017sac,
    author = {Yaroslav Nechaev and Francesco Corcoglioniti and Claudio Giuliano},
    title = {Linking Knowledge Bases to Social Media Profiles},
    booktitle = {Proc. of ACM Symposium on Applied Computing (SAC)},
    year = {2017},
    publisher = {{ACM}},
    abstract = {Social media have become an invaluable source of data for a wide variety of tasks. Unfortunately, this data is hard to gather and process due to low amount of machine readable attributes, API limitations and noisiness. In this paper we propose a system that aligns knowledge base entries of people and organisations to the corresponding social media profiles. The motivation is twofold: (i) on the one hand, we facilitate processing of social media data by allowing the import of rich entity descriptions from knowledge bases; (ii) on the other hand, we are enabling an automatic enrichment of a knowledge base with additional data from the social media. We used this system to create a resource of 893,446 alignments between DBpedia entities and Twitter profiles. This resource allows, effectively, to connect Twitter to the Linked Open Data cloud.},
    url = {http://sociallink.futuro.media}
}
                

You can also check out the slides that briefly describe the idea and the approach:

Releases
Latest release
Version: v2.0
15 May 2017
Gold Standard DBpedia -> Twitter
Version: v2.0
15 May 2017
Previous releases
Gold Standard DBpedia -> Twitter
Version: v1.0
12 December 2016
Alignments DBpedia -> Twitter
Version: v0.6-beta
8 September 2016
Alignments DBpedia -> Twitter
Version: v0.5-beta
30 August 2016
Alignments DBpedia -> Twitter
Version: v0.1-alpha
25 August 2016
Gold Standard DBpedia -> Twitter
Version: v1.0-beta
25 August 2016
Format descriptions
RDF

The RDF format is based on the SocialLink vocabulary (see its description for more info), integrated with terms from FOAF and Dublin Core Terms. Briefly, each DBpedia entity is associated with an RDF fragment similar to the one reported below:

    @prefix dct: <http://purl.org/dc/terms/> .
    @prefix foaf: <http://xmlns.com/foaf/0.1/> .
    @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
    @prefix sl: <http://sociallink.futuro.media/ontology#> .

    <http://dbpedia.org/resource/AC/DC>
            foaf:account <http://twitter.com/acdc>;
            sl:candidate [ sl:account <http://twitter.com/acdc>;
                           sl:rank "0"^^xsd:int;
                           sl:confidence "2.877776745385853"^^xsd:double ];
            sl:candidate [ sl:account <http://twitter.com/AC_DC_Twivia>;
                           sl:rank "1"^^xsd:int;
                           sl:confidence "0.6037499606664944"^^xsd:double ];
            sl:candidate [ sl:account <http://twitter.com/ACDC_BRASIL>;
                           sl:rank "2"^^xsd:int;
                           sl:confidence "0.02792038279804432"^^xsd:double ];
            sl:candidate [ sl:account <http://twitter.com/AC_DC>;
                           sl:rank "3"^^xsd:int;
                           sl:confidence "0.4462205906993879"^^xsd:double ];
            ...
    


Property foaf:account links the DBpedia entity (dbr:AC/DC) to the corresponding Twitter profile (if any) as selected by the SocialLink pipeline. The profile (@acdc) is identified through the URI of the user homepage on Twitter, consistent with FOAF.

Property sl:candidate links an entity to all the candidate profiles found by our approach, among which the aligned one was selected. For each candidate, property sl:account specifies the corresponding Twitter profile, while properties sl:rank and sl:confidence provide, respectively, the rank of that candidate profile in the candidate list identified by TwitterLink, and the alignment confidence computed by our approach, based on which the aligned profile is chosen (if certain thresholds are met).
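The relationship between the candidate list and the aligned profile can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: the `pick_aligned` helper and its `min_confidence` cutoff are hypothetical, and the actual thresholds and selection rule used by the pipeline are those described in the paper, not the ones shown here.

```python
from typing import Optional

# Candidates for dbr:AC/DC, copied from the RDF fragment above.
candidates = [
    {"account": "http://twitter.com/acdc", "rank": 0, "confidence": 2.877776745385853},
    {"account": "http://twitter.com/AC_DC_Twivia", "rank": 1, "confidence": 0.6037499606664944},
    {"account": "http://twitter.com/ACDC_BRASIL", "rank": 2, "confidence": 0.02792038279804432},
    {"account": "http://twitter.com/AC_DC", "rank": 3, "confidence": 0.4462205906993879},
]


def pick_aligned(cands: list, min_confidence: float) -> Optional[str]:
    """Return the most confident account, or None below the cutoff.

    min_confidence is a hypothetical cutoff, not a value from the paper.
    """
    if not cands:
        return None
    best = max(cands, key=lambda c: c["confidence"])
    return best["account"] if best["confidence"] >= min_confidence else None


print(pick_aligned(candidates, min_confidence=1.0))
```

Note that sl:rank and sl:confidence are independent: in the fragment above the candidate with rank 3 has a higher confidence than the one with rank 2, which is why the sketch selects by confidence rather than by rank.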

For each Twitter profile (note that profiles can appear as candidates for multiple DBpedia entities) we provide additional information as follows:

    <http://twitter.com/acdc> a foaf:OnlineAccount;
        foaf:accountName "acdc"^^xsd:string;
        dct:identifier "2836755090"^^xsd:long.
    

Here, the Twitter account is typed as a foaf:OnlineAccount, property foaf:accountName specifies the Twitter screen name for the profile (actually redundant but mandated by FOAF), while property dct:identifier provides the unique, immutable, numeric Twitter identifier associated with the profile.
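As a rough sketch of how these properties could be queried through a SPARQL endpoint, the Python snippet below builds a query that lists the candidates of one entity together with their rank and confidence. The `candidates_query` helper is hypothetical, and it is an assumption that the endpoint exposes the data in exactly the shape of the fragments above. The endpoint URL is not stated in this section, so the snippet only prepares the `query` parameter defined by the SPARQL 1.1 Protocol and does not send a request.

```python
from urllib.parse import urlencode

PREFIXES = (
    "PREFIX foaf: <http://xmlns.com/foaf/0.1/>\n"
    "PREFIX sl: <http://sociallink.futuro.media/ontology#>\n"
)


def candidates_query(entity_uri: str) -> str:
    """Build a SPARQL query listing the candidate profiles of one entity."""
    return PREFIXES + (
        "SELECT ?account ?name ?rank ?confidence WHERE {\n"
        f"  <{entity_uri}> sl:candidate ?c .\n"
        "  ?c sl:account ?account ;\n"
        "     sl:rank ?rank ;\n"
        "     sl:confidence ?confidence .\n"
        "  OPTIONAL { ?account foaf:accountName ?name }\n"
        "}\n"
        "ORDER BY ?rank"
    )


query = candidates_query("http://dbpedia.org/resource/AC/DC")
# URL-encoded form, suitable for appending to an endpoint URL after "?".
params = urlencode({"query": query})
print(query)
```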

JSON

The JSON file is a single array containing one object per DBpedia entity, with a similar structure:

{
  "entity_uri": "http://dbpedia.org/resource/Alex_Gough_(luger)",
  "candidates": [
    344383738,
    615071998,
    14265520,
    23833648,
    ...
  ],
  "scores": [
    1.5554074239158626,
    0.0,
    0.0,
    0.0,
    ...
  ],
  "twitter_id": 344383738
}


Here, the candidates property contains the list of candidate Twitter IDs for the entity, while the scores property contains the confidence score reported by our candidate selection algorithm for each candidate.

The twitter_id property may be present if a certain threshold is met (thresholds are selected according to the high-F1 setup from our paper).
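A minimal Python sketch of reading this structure is shown below. The sample string keeps only the values visible in the example above (the elided entries are left out), the `summarize` helper is hypothetical, and pairing candidates with scores by position is an assumption based on the example.

```python
import json

# Shortened sample: only the values shown in the example above.
sample = """[
  {
    "entity_uri": "http://dbpedia.org/resource/Alex_Gough_(luger)",
    "candidates": [344383738, 615071998, 14265520, 23833648],
    "scores": [1.5554074239158626, 0.0, 0.0, 0.0],
    "twitter_id": 344383738
  }
]"""


def summarize(entity: dict) -> dict:
    """Pair each candidate ID with its score and extract the aligned ID."""
    return {
        "entity_uri": entity["entity_uri"],
        # Assumes the two lists are parallel (same order, same length).
        "scored": list(zip(entity["candidates"], entity["scores"])),
        # twitter_id may be absent, so fall back to None.
        "twitter_id": entity.get("twitter_id"),
    }


entities = [summarize(e) for e in json.loads(sample)]
for e in entities:
    print(e["entity_uri"], e["twitter_id"], e["scored"][0])
```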

CSV

Finally, each row of our CSV file contains information about a single entity. Each row looks like this:

http://dbpedia.org/resource/MoShang,"[6887052,26735153,302784580,1331809652,2275404837,2597365788,1516978014,753046765809508356,1512300530,255873440]","[1.579205048600787,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0]",6887052


The columns contain the same data as in the JSON format. If the Twitter ID can't be determined, 0 is used in the last column instead.
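The row above could be parsed with Python's standard library roughly as follows. This is a sketch, not an official loader: the `parse_row` helper and the field names in its result are hypothetical, and it assumes the bracketed columns are always valid JSON arrays, as they are in the example row.

```python
import csv
import io
import json

# The example row from above, as it would appear in the CSV file.
row_text = (
    'http://dbpedia.org/resource/MoShang,'
    '"[6887052,26735153,302784580,1331809652,2275404837,2597365788,'
    '1516978014,753046765809508356,1512300530,255873440]",'
    '"[1.579205048600787,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0]",6887052\n'
)


def parse_row(row: list) -> dict:
    """Convert one CSV row into a dictionary with typed values."""
    entity_uri, candidates, scores, twitter_id = row
    return {
        "entity_uri": entity_uri,
        "candidates": json.loads(candidates),  # list of Twitter IDs
        "scores": json.loads(scores),          # list of confidence scores
        # 0 marks an undetermined Twitter ID; map it to None.
        "twitter_id": int(twitter_id) or None,
    }


# csv.reader handles the quoted columns that contain commas.
records = [parse_row(r) for r in csv.reader(io.StringIO(row_text))]
print(records[0]["entity_uri"], records[0]["twitter_id"])
```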