SocialLink

Precomputed alignments between Knowledge Bases and Social Media

SocialLink establishes a link between DBpedia and Twitter, centered on popular entities occurring in both of them, which enables transferring knowledge from one resource to another and back. On this page, we describe three example use cases where this capability can be leveraged.

DBpedia to Twitter: User Profiling. The task of inferring user attributes from their digital footprint is typically referred to as user profiling. Predicting various attributes from a person's social graph, posted content or other attributes is popular among researchers and companies. However, in most setups, namely supervised machine learning-based ones, user profiling requires significant amounts of manual labour to construct training sets. This limits both the attributes that can be inferred and the applicability of approaches that operate on large amounts of training data, such as Deep Neural Networks.

Recently, researchers have focused on automatically crawling user profiling datasets from social media. However, even the largest datasets contain only a few thousand examples per property and are limited to properties explicitly present in social media.

SocialLink helps social media researchers to tackle user profiling by providing accurate machine-readable descriptions for hundreds of thousands of social media profiles. Any attribute that is present in DBpedia can now be modeled without relying on expensive manual annotation and our resource can be used both to train and to evaluate any proposed attribute classifiers.

Another simple example is the inference of user interests based on a social graph. Imagine a user who follows a set of accounts that have alignments in SocialLink. Using this information, one can try to model the interests, location and language of this person just by looking at the DBpedia properties of the accounts they follow. For instance, following dbr:SpaceX and dbr:NASA can point to a dbr:Aerospace_engineering industry fan, while an increased number of dbr:Donald_Trump-related tweets can reveal a dbr:GOP supporter.
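
A minimal sketch of this idea in Python follows. It assumes two hypothetical inputs that are not part of SocialLink itself: `alignments`, a mapping from Twitter IDs to DBpedia URIs built from the dataset, and `entity_topics`, a mapping from DBpedia URIs to topic labels retrieved separately from DBpedia. The IDs and topic lists are made up for illustration.

```python
from collections import Counter

def infer_interests(followed_ids, alignments, entity_topics, top_n=3):
    """Count DBpedia-derived topics over the followed accounts that are aligned."""
    counts = Counter()
    for twitter_id in followed_ids:
        entity = alignments.get(twitter_id)
        if entity is None:
            continue  # this account has no alignment, so it carries no signal
        counts.update(entity_topics.get(entity, []))
    return counts.most_common(top_n)

# Illustrative placeholders only: these IDs and topics are not real data.
alignments = {
    1: "http://dbpedia.org/resource/SpaceX",
    2: "http://dbpedia.org/resource/NASA",
}
entity_topics = {
    "http://dbpedia.org/resource/SpaceX": ["Aerospace_engineering"],
    "http://dbpedia.org/resource/NASA": ["Aerospace_engineering", "Space_exploration"],
}

interests = infer_interests([1, 2, 3], alignments, entity_topics)
print(interests)
```

Plain counting is the simplest possible aggregation; a real profiler would likely weight topics or feed them to a classifier.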

DBpedia to Twitter: Entity Linking. Another use case is the Named Entity Linking (NEL) task, whose goal is to link mentions of named entities in a text to their corresponding entities in a KB such as DBpedia. Challenging on its own, the NEL task presents additional unique challenges when applied to social media posts due to their noisiness, lack of sufficient textual context and informal nature (e.g., use of slang).

It is worth noting that social media posts typically contain explicit mentions of social media accounts in the form of @username snippets. On Twitter, some of these mentions (especially the ones referring to popular accounts) may appear in SocialLink, and thus can be directly disambiguated to DBpedia with high precision using our resource. Apart from being part of the NEL result, these links provide additional contextual information (injected from DBpedia) that can be leveraged for disambiguating other named entities occurring in the post.
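
The mention lookup described above can be sketched as follows. This is an illustration under stated assumptions rather than the method used by any NEL system: `screen_name_to_entity` is a hypothetical dictionary built from the SocialLink alignments, and the regular expression assumes screen names of up to 15 word characters, matched case-insensitively.

```python
import re

# Assumption: a mention is "@" followed by 1 to 15 letters, digits or underscores.
MENTION_RE = re.compile(r"@(\w{1,15})")

def link_mentions(text, screen_name_to_entity):
    """Return (mention, DBpedia URI) pairs for mentions with a known alignment."""
    links = []
    for match in MENTION_RE.finditer(text):
        screen_name = match.group(1).lower()
        entity = screen_name_to_entity.get(screen_name)
        if entity is not None:
            links.append((match.group(0), entity))
    return links

# Hypothetical lookup table holding a single alignment.
lookup = {"acdc": "http://dbpedia.org/resource/AC/DC"}
links = link_mentions("Great show by @ACDC last night with @someone_else!", lookup)
print(links)
```

Mentions without an alignment are simply skipped, so they can still be handled by a regular NEL pipeline.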

SocialLink was used in this capacity by two teams participating in a NEL challenge on Italian tweets (NEEL-IT task) as part of the EVALITA 2016 campaign, allowing both of them to improve their results.

It is worth noting that the two-step approach of the SocialLink pipeline can be adapted to directly disambiguate named entities in regular texts against social media. Such functionality is present in the Social Media Toolkit, one of the complementary tools available on the SocialLink website.

Twitter to DBpedia: Extracting FOAF Profiles. Up-to-date information about DBpedia persons and organisations can be extracted from Twitter after an alignment is established through SocialLink. Focusing on persons, different user profile properties expressible with FOAF may be extracted from a DBpedia person's Twitter account. They include:

  • Basic properties like foaf:name, foaf:surname, foaf:gender, foaf:birthday, and foaf:depiction (linking to user images), which are rather scarce in Wikipedia/DBpedia but often available in Twitter user descriptions and profile metadata;
  • Acquaintances (foaf:knows), extracted from friends, followers and Twitter accounts a user interacted with that are aligned to DBpedia entities in SocialLink;
  • Links to homepages (foaf:homepage and similar) and other web resources from a Twitter user's description and posts, which can be matched to external links in DBpedia to mine relations with other DBpedia entities (e.g., affiliation, authorship, participation, all expressible in FOAF) and which may be disambiguated using natural language understanding techniques.

While a basic FOAF profile can be extracted from any Twitter account, the links to DBpedia provided by SocialLink allow grounding the extracted data and disambiguating the values of object properties with respect to a larger knowledge base, this way increasing the usefulness of extracted FOAF profiles.

The latest version of the SocialLink RDF dataset can be programmatically queried via our publicly available SPARQL endpoint. You can use the query interface below (powered by YASQE and YASR) to query the endpoint using your browser.
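
As a rough sketch of programmatic access, the snippet below builds a SPARQL query for the Twitter account aligned to a DBpedia entity and encodes it as a GET request following the SPARQL 1.1 Protocol. The query relies on the `foaf:account` and `foaf:accountName` properties used in the RDF release of the dataset. The endpoint address is a placeholder, and the helper names are made up for this example.

```python
import urllib.parse

PREFIXES = "PREFIX foaf: <http://xmlns.com/foaf/0.1/>\n"

def build_account_query(entity_uri):
    """Return a SPARQL query selecting the account aligned to a DBpedia entity."""
    return (
        PREFIXES
        + "SELECT ?account ?name WHERE {\n"
        + f"  <{entity_uri}> foaf:account ?account .\n"
        + "  OPTIONAL { ?account foaf:accountName ?name }\n"
        + "}"
    )

def build_request_url(endpoint_url, query):
    """Encode the query in the 'query' parameter of a GET request."""
    return endpoint_url + "?" + urllib.parse.urlencode({"query": query})

query = build_account_query("http://dbpedia.org/resource/AC/DC")
# Placeholder address: replace it with the actual SocialLink endpoint.
url = build_request_url("http://example.org/sparql", query)
print(query)
```

The resulting URL can be fetched with any HTTP client; the response format depends on what the endpoint supports.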

This work has been carried out in the Future Media unit by:

  • Yaroslav Nechaev
  • Francesco Corcoglioniti
  • Claudio Giuliano

The general idea behind our resource is to create a kind of "bridge" between social media and Linked Open Data (LOD) cloud.

We present SocialLink, a Linked Open Data dataset that matches social media accounts on Twitter to their corresponding entities in DBpedia. This resource creates a bridge between the highly structured Linked Open Data cloud and the vibrant and up-to-date social media world. By aligning around 276K (out of 2.5M) DBpedia persons and organisations to their Twitter profiles, SocialLink serves two purposes. On the one hand, it facilitates social media processing by leveraging DBpedia data, e.g., as a source of ground truth properties for training supervised systems for user profiling, or as contextual data in natural language understanding tasks (e.g., Named Entity Linking) operating on social media contents. On the other hand, SocialLink gives Semantic Web practitioners the ability to populate knowledge bases with up-to-date data from social media accounts of DBpedia entities, such as structured attributes, images, connections, user locations and descriptions.

SocialLink is updated with periodic releases. The code and relevant technical documentation can be found in our GitHub repository:

Remper/sociallink 

Instructions on how to run the SocialLink library are available here. The issue tracker is located here. Please refer to the download page to download the resource or use the DOI link below to get the latest release.
Contributions and suggestions are always welcome!

All data on this website is licensed under a Creative Commons Attribution 4.0 International license.

To read

  PRAI 2018 paper
The most recent paper, covering the latest version (v3.0) of the SocialLink approach and resource, published in the PRAI journal

  ISWC 2017 paper
The previous version (v2.0) of the SocialLink resource published at ISWC 2017

  SAC 2017 paper
The detailed description of the original version of our linking algorithm published at ACM SAC 2017

  ISWC 2017 talk
Our talk recorded at ISWC 2017

To cite

To cite the resource or the approach, please use the following BibTeX entry:
                    
@article{sociallink2018prai,
  author    = {Yaroslav Nechaev and
               Francesco Corcoglioniti and
               Claudio Giuliano},
  title     = {SocialLink: exploiting graph embeddings to link DBpedia entities to
               Twitter profiles},
  journal   = {Progress in {AI}},
  volume    = {7},
  number    = {4},
  pages     = {251--272},
  year      = {2018}
}

                

Canonical citations (DOIs) for the dataset are available via the Zenodo (all versions) and Springer Nature (version v2.0) digital repositories (see DOI links in the footer).

Releases

Latest release

  • Gold Standard DBpedia -> Twitter, version v2.0, 15 May 2017

Previous releases

  • Gold Standard DBpedia -> Twitter, version v1.0, 12 December 2016
  • Alignments DBpedia -> Twitter, version v0.6-beta, 8 September 2016
  • Alignments DBpedia -> Twitter, version v0.5-beta, 30 August 2016
  • Alignments DBpedia -> Twitter, version v0.1-alpha, 25 August 2016
  • Gold Standard DBpedia -> Twitter, version v1.0-beta, 25 August 2016

Format descriptions
RDF

The RDF format is based on the SocialLink vocabulary (see its description for more info), integrated with terms from FOAF and Dublin Core Terms. Briefly, each DBpedia entity is associated with an RDF fragment similar to the one reported below:

    @prefix dct: <http://purl.org/dc/terms/> .
    @prefix foaf: <http://xmlns.com/foaf/0.1/> .
    @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
    @prefix sl: <http://sociallink.futuro.media/ontology#> .

    <http://dbpedia.org/resource/AC/DC>
            foaf:account <http://twitter.com/acdc>;
            sl:candidate [ sl:account <http://twitter.com/acdc>;
                           sl:rank "0"^^xsd:int;
                           sl:confidence "2.877776745385853"^^xsd:double ];
            sl:candidate [ sl:account <http://twitter.com/AC_DC_Twivia>;
                           sl:rank "1"^^xsd:int;
                           sl:confidence "0.6037499606664944"^^xsd:double ];
            sl:candidate [ sl:account <http://twitter.com/ACDC_BRASIL>;
                           sl:rank "2"^^xsd:int;
                           sl:confidence "0.02792038279804432"^^xsd:double ];
            sl:candidate [ sl:account <http://twitter.com/AC_DC>;
                           sl:rank "3"^^xsd:int;
                           sl:confidence "0.4462205906993879"^^xsd:double ];
            ...
    


Property foaf:account links the DBpedia entity (dbr:AC/DC) to the corresponding Twitter profile (if any) as selected by the SocialLink pipeline. The profile (@acdc) is identified through the URI of the user homepage on Twitter, consistent with FOAF.

Property sl:candidate links an entity to all the candidate profiles found by our approach, among which the aligned one was selected. For each candidate, property sl:account specifies the corresponding Twitter profile, while properties sl:rank and sl:confidence provide, respectively, the rank of that candidate profile in the candidate list identified by TwitterLink, and the alignment confidence computed by our approach, based on which the aligned profile is chosen (if certain thresholds are met).

For each Twitter profile (note that profiles can appear as candidates for multiple DBpedia entities) we provide additional information as follows:

    <http://twitter.com/acdc> a foaf:OnlineAccount;
        foaf:accountName "acdc"^^xsd:string;
        dct:identifier "2836755090"^^xsd:long.
    

Here, the Twitter account is typed as a foaf:OnlineAccount. Property foaf:accountName specifies the Twitter screen name for the profile (actually redundant but mandated by FOAF), while property dct:identifier provides the unique, immutable, numeric Twitter identifier associated with the profile.

JSON

The JSON file is a single array containing an object for each DBpedia entity, with a similar structure:

{
  "entity_uri": "http://dbpedia.org/resource/Alex_Gough_(luger)",
  "candidates": [
    344383738,
    615071998,
    14265520,
    23833648,
    ...
  ],
  "scores": [
    1.5554074239158626,
    0.0,
    0.0,
    0.0,
    ...
  ],
  "twitter_id": 344383738
}


Here, the candidates property contains the list of candidate IDs for the entity, while the scores property contains a confidence score for each candidate, as reported by our candidate selection algorithm.

The twitter_id property is present only if a certain threshold is met (thresholds are selected according to the high F1 setup from our paper).
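
As an illustration, one such object can be read with Python's standard `json` module. The sketch assumes that `candidates` and `scores` are parallel lists in the same order, and it drops the elided entries from the example above. The helper name `ranked_candidates` is made up for this example.

```python
import json

# One object from the array, with the elided entries of the example omitted.
record_json = """
{
  "entity_uri": "http://dbpedia.org/resource/Alex_Gough_(luger)",
  "candidates": [344383738, 615071998, 14265520, 23833648],
  "scores": [1.5554074239158626, 0.0, 0.0, 0.0],
  "twitter_id": 344383738
}
"""

def ranked_candidates(record):
    """Pair each candidate ID with its score, highest score first."""
    pairs = zip(record["candidates"], record["scores"])
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)

record = json.loads(record_json)
ranking = ranked_candidates(record)
aligned = record.get("twitter_id")  # None when no alignment was selected
print(ranking[0], aligned)
```

Using `record.get` instead of indexing keeps the code working for entities where `twitter_id` is absent.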

CSV

Finally, each row of our CSV file contains information about a single entity. Each row looks like this:

http://dbpedia.org/resource/MoShang,"[6887052,26735153,302784580,1331809652,2275404837,2597365788,1516978014,753046765809508356,1512300530,255873440]","[1.579205048600787,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0]",6887052


The columns contain the same data as in the JSON format. If the Twitter ID cannot be determined, 0 is used in the last column instead.
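
A row in this layout can be parsed with the standard `csv` and `json` modules, as in the sketch below. It assumes that the file has no header row and that the two bracketed columns are valid JSON arrays, as they are in the sample row. The helper name `parse_row` is made up for this example.

```python
import csv
import io
import json

# The sample row shown above, split over several string literals for readability.
row_text = (
    'http://dbpedia.org/resource/MoShang,'
    '"[6887052,26735153,302784580,1331809652,2275404837,2597365788,'
    '1516978014,753046765809508356,1512300530,255873440]",'
    '"[1.579205048600787,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0]",'
    '6887052\n'
)

def parse_row(row):
    """Convert one CSV row into a dictionary mirroring the JSON format."""
    entity_uri, candidates, scores, twitter_id = row
    return {
        "entity_uri": entity_uri,
        "candidates": json.loads(candidates),
        "scores": json.loads(scores),
        # A 0 in the last column means that no Twitter ID was determined.
        "twitter_id": int(twitter_id) or None,
    }

rows = [parse_row(row) for row in csv.reader(io.StringIO(row_text))]
print(rows[0]["entity_uri"], rows[0]["twitter_id"])
```

To process the actual file, pass an open file object to `csv.reader` in place of the `io.StringIO` wrapper.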