Precomputed alignments between Knowledge Bases and Social Media
SocialLink establishes a link between DBpedia and Twitter, centered on popular entities occurring in both of them, which enables transferring knowledge from one resource to another and back. On this page, we describe three example use cases where this capability can be leveraged.
DBpedia to Twitter: User Profiling. The task of inferring users' attributes from their digital footprint is typically referred to as user profiling. Predicting various attributes from a person's social graph, posted content or other attributes is popular among researchers and companies. However, in most setups, namely those based on supervised machine learning, user profiling requires significant manual labour to construct training sets. This limits both the range of attributes that can be inferred and the applicability of approaches that need large amounts of training data, such as Deep Neural Networks.
Recently, researchers have focused on automatically crawling user profiling datasets from social media. However, even the largest such datasets contain only a few thousand examples per property and are limited to properties explicitly present in social media.
SocialLink helps social media researchers tackle user profiling by providing accurate machine-readable descriptions for hundreds of thousands of social media profiles. Any attribute present in DBpedia can now be modeled without relying on expensive manual annotation, and our resource can be used both to train and to evaluate attribute classifiers.
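As a minimal sketch of how the alignments could seed a user-profiling training set, the Python snippet below pairs each aligned Twitter account with the value of one DBpedia attribute of its entity. The `alignments` and `attribute` dictionaries are hypothetical in-memory inputs (in practice they would be loaded from the SocialLink dump and from a DBpedia query), and the example entities, IDs and values are made up for illustration.

```python
from typing import Dict, List, Tuple

def build_training_set(
    alignments: Dict[str, int],  # DBpedia entity URI -> aligned Twitter user ID
    attribute: Dict[str, str],   # DBpedia entity URI -> value of one DBpedia attribute (hypothetical input)
) -> List[Tuple[int, str]]:
    """Pair each aligned Twitter account with the attribute value of its DBpedia entity."""
    examples = []
    for entity, twitter_id in alignments.items():
        label = attribute.get(entity)
        if label is not None:  # skip entities for which DBpedia has no value
            examples.append((twitter_id, label))
    return examples

# Illustrative, made-up inputs
alignments = {
    "http://dbpedia.org/resource/Example_Person_A": 1001,
    "http://dbpedia.org/resource/Example_Person_B": 1002,
    "http://dbpedia.org/resource/Example_Band_C": 1003,
}
occupation = {
    "http://dbpedia.org/resource/Example_Person_A": "Journalist",
    "http://dbpedia.org/resource/Example_Person_B": "Politician",
}
print(build_training_set(alignments, occupation))
# [(1001, 'Journalist'), (1002, 'Politician')]
```

The resulting (account, label) pairs can then be split into training and evaluation sets for any attribute classifier.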
Another simple example is inferring user interests from the social graph. Imagine a user who follows a set of accounts that have alignments in SocialLink. Using this information, one can try to model the user's interests, location and language just by looking at the DBpedia properties of the accounts they follow. For instance, following dbr:SpaceX and dbr:NASA can point to a fan of the dbr:Aerospace_engineering industry, while a high number of dbr:Donald_Trump-related tweets can reveal a dbr:GOP supporter.
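A minimal sketch of this idea, assuming the SocialLink alignment has already been inverted into a Twitter-ID-to-entity map (`account_to_entity`) and that a hypothetical `entity_props` map lists, for each entity, the DBpedia property values of interest. The snippet simply counts how often each value occurs among the followed accounts; the IDs and property values in the example are illustrative, and a real system would likely weight or normalise these counts.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

def infer_interests(
    followed_ids: Iterable[int],
    account_to_entity: Dict[int, str],   # Twitter ID -> DBpedia entity (inverted SocialLink alignment)
    entity_props: Dict[str, List[str]],  # DBpedia entity -> property values (hypothetical input)
    top_k: int = 3,
) -> List[Tuple[str, int]]:
    """Count DBpedia property values over the aligned accounts a user follows."""
    counts: Counter = Counter()
    for twitter_id in followed_ids:
        entity = account_to_entity.get(twitter_id)
        if entity is None:
            continue  # this followed account has no alignment in SocialLink
        counts.update(entity_props.get(entity, []))
    return counts.most_common(top_k)

# Illustrative inputs: IDs 1 and 2 are aligned, 99 is not
account_to_entity = {1: "dbr:SpaceX", 2: "dbr:NASA"}
entity_props = {
    "dbr:SpaceX": ["dbr:Aerospace_engineering"],
    "dbr:NASA": ["dbr:Aerospace_engineering", "dbr:United_States"],
}
print(infer_interests([1, 2, 99], account_to_entity, entity_props))
# [('dbr:Aerospace_engineering', 2), ('dbr:United_States', 1)]
```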
DBpedia to Twitter: Entity Linking. Another use case is Named Entity Linking (NEL), whose goal is to link mentions of named entities in a text to their corresponding entities in a KB such as DBpedia. Already challenging on its own, NEL faces additional challenges when applied to social media posts, due to their noisiness, lack of sufficient textual context and informal nature (e.g., use of slang).
It is worth noting that social media posts typically contain explicit mentions of social media accounts in the form of @username snippets. On Twitter, some of these mentions (especially those referring to popular accounts) may appear in SocialLink, and can thus be directly disambiguated to DBpedia with high precision using our resource. Apart from being part of the NEL result, these links provide additional contextual information (injected from DBpedia) that can be leveraged to disambiguate other named entities occurring in the post.
SocialLink was used in this capacity by two teams participating in a NEL challenge on Italian tweets (the NEEL-IT task) at the EVALITA 2016 evaluation campaign, and helped both of them improve their results.
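The mention-based disambiguation described above can be sketched as a dictionary lookup: extract @username tokens with a regular expression and map each screen name to a DBpedia entity through the alignment. The `screen_name_to_entity` map is a hypothetical structure (it could, for instance, be derived from the foaf:accountName values in the RDF dump), and the pattern is a simplification of Twitter's username rules (1 to 15 letters, digits or underscores). Lookups are lowercased because Twitter screen names are case-insensitive.

```python
import re
from typing import Dict, List, Tuple

# Simplified username pattern: 1-15 letters, digits or underscores after '@',
# not preceded by such a character (so "fan@acdc.com" is not matched).
MENTION_RE = re.compile(r"(?<![A-Za-z0-9_])@([A-Za-z0-9_]{1,15})\b")

def link_mentions(text: str, screen_name_to_entity: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (screen name, DBpedia URI) pairs for the mentions covered by the alignment."""
    links = []
    for match in MENTION_RE.finditer(text):
        name = match.group(1)
        # The map is keyed on lowercase screen names
        entity = screen_name_to_entity.get(name.lower())
        if entity is not None:
            links.append((name, entity))
    return links

# Alignment taken from the AC/DC example in the RDF section
mapping = {"acdc": "http://dbpedia.org/resource/AC/DC"}
print(link_mentions("Great show by @acdc tonight, thanks @SomeFan!", mapping))
# [('acdc', 'http://dbpedia.org/resource/AC/DC')]
```

Mentions that are not in the map (such as @SomeFan here) are left to the regular NEL pipeline, which can use the DBpedia context of the resolved ones.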
Note also that the two-step approach of the SocialLink pipeline can be adapted to directly disambiguate named entities in regular texts against social media profiles. This functionality is provided by the Social Media Toolkit, one of the complementary tools available on the SocialLink website.
Twitter to DBpedia: Extracting FOAF Profiles. Once an alignment is established through SocialLink, up-to-date information about DBpedia persons and organisations can be extracted from Twitter. Focusing on persons, several user profile properties expressible with FOAF may be extracted from a DBpedia person's Twitter account. They include:
While a basic FOAF profile can be extracted from any Twitter account, the links to DBpedia provided by SocialLink allow grounding the extracted data and disambiguating the values of object properties against a larger knowledge base, thereby increasing the usefulness of the extracted FOAF profiles.
@article{sociallink2018prai,
author = {Yaroslav Nechaev and
Francesco Corcoglioniti and
Claudio Giuliano},
title = {SocialLink: exploiting graph embeddings to link DBpedia entities to
Twitter profiles},
journal = {Progress in {AI}},
volume = {7},
number = {4},
pages = {251--272},
year = {2018}
}
Canonical citations (DOIs) for the dataset are available via the Zenodo (all versions) and Springer Nature (version v2.0) digital repositories (see the DOI links in the footer).
The RDF format is based on the SocialLink vocabulary (see its description for more information), integrated with terms from FOAF and Dublin Core Terms. Briefly, each DBpedia entity is associated with an RDF fragment similar to the one reported below:
@prefix dct: <http://purl.org/dc/terms/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix sl: <http://sociallink.futuro.media/ontology#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
<http://dbpedia.org/resource/AC/DC>
foaf:account <http://twitter.com/acdc>;
sl:candidate [ sl:account <http://twitter.com/acdc>;
sl:rank "0"^^xsd:int;
sl:confidence "2.877776745385853"^^xsd:double ];
sl:candidate [ sl:account <http://twitter.com/AC_DC_Twivia>;
sl:rank "1"^^xsd:int;
sl:confidence "0.6037499606664944"^^xsd:double ];
sl:candidate [ sl:account <http://twitter.com/ACDC_BRASIL>;
sl:rank "2"^^xsd:int;
sl:confidence "0.02792038279804432"^^xsd:double ];
sl:candidate [ sl:account <http://twitter.com/AC_DC>;
sl:rank "3"^^xsd:int;
sl:confidence "0.4462205906993879"^^xsd:double ];
...
Property foaf:account links the DBpedia entity (dbr:AC/DC) to the corresponding Twitter profile (if any) as selected by the SocialLink pipeline. The profile (@acdc) is identified through the URI of the user's homepage on Twitter, consistent with FOAF.
Property sl:candidate links an entity to all the candidate profiles found by our approach, among which the aligned one was selected. For each candidate, property sl:account specifies the corresponding Twitter profile, while properties sl:rank and sl:confidence provide, respectively, the rank of that candidate profile in the candidate list identified by TwitterLink, and the alignment confidence computed by our approach, based on which the aligned profile is chosen (if certain thresholds are met).
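To make the roles of sl:rank and sl:confidence concrete, here is a minimal Python sketch of threshold-based selection over the candidates of one entity. The candidate values are copied from the AC/DC fragment above; the single `min_confidence` threshold and its default value are hypothetical placeholders, since the actual thresholds used by the SocialLink pipeline are not given on this page.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Candidate:
    account: str       # value of sl:account
    rank: int          # value of sl:rank
    confidence: float  # value of sl:confidence

def select_aligned(candidates: List[Candidate], min_confidence: float = 1.0) -> Optional[str]:
    """Return the most confident candidate's account, or None if it is below the
    (hypothetical) threshold."""
    if not candidates:
        return None
    best = max(candidates, key=lambda c: c.confidence)
    return best.account if best.confidence >= min_confidence else None

# Candidates of dbr:AC/DC, as listed in the RDF fragment above
acdc = [
    Candidate("http://twitter.com/acdc", 0, 2.877776745385853),
    Candidate("http://twitter.com/AC_DC_Twivia", 1, 0.6037499606664944),
    Candidate("http://twitter.com/ACDC_BRASIL", 2, 0.02792038279804432),
    Candidate("http://twitter.com/AC_DC", 3, 0.4462205906993879),
]
print(select_aligned(acdc))  # http://twitter.com/acdc
```

The sketch selects by sl:confidence rather than sl:rank: in the fragment, the rank-3 candidate has a higher confidence than the rank-2 one, so the two orderings need not agree.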
For each Twitter profile (note that profiles can appear as candidates for multiple DBpedia entities) we provide additional information as follows:
<http://twitter.com/acdc> a foaf:OnlineAccount;
foaf:accountName "acdc"^^xsd:string;
dct:identifier "2836755090"^^xsd:long.
Here, the Twitter account is typed as a foaf:OnlineAccount, property foaf:accountName specifies the Twitter screen name of the profile (redundant, but mandated by FOAF), while property dct:identifier provides the unique, immutable, numeric identifier that Twitter associates with the profile.
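Because foaf:accountName simply repeats the last path segment of the account URI, a consumer may want to move between the two forms. A small sketch using only the standard library; the helper names are hypothetical, and the URI pattern http://twitter.com/<screen name> is taken from the examples in this section.

```python
from urllib.parse import urlparse

# URI pattern taken from the examples in this section
TWITTER_BASE = "http://twitter.com/"

def account_uri(screen_name: str) -> str:
    """Build an account URI in the form used by the dataset."""
    return TWITTER_BASE + screen_name

def screen_name_from_uri(uri: str) -> str:
    """Recover the screen name (the foaf:accountName value) from an account URI."""
    return urlparse(uri).path.strip("/")

print(screen_name_from_uri("http://twitter.com/acdc"))  # acdc
```

Since owners can change their screen names while the numeric ID stays fixed, joins with other Twitter data or across dataset versions are safer on the dct:identifier value than on the URI or screen name.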
The JSON file is a single array containing one object per DBpedia entity, each with a structure similar to the following:
{
"entity_uri": "http://dbpedia.org/resource/Alex_Gough_(luger)",
"candidates": [
344383738,
615071998,
14265520,
23833648,
...
],
"scores": [
1.5554074239158626,
0.0,
0.0,
0.0,
...
],
"twitter_id": 344383738
}
Here, the candidates property contains the list of candidate Twitter IDs for the entity, while the scores property contains the confidence score that our candidate selection algorithm assigned to each candidate, in the same order. The twitter_id property is present only when a certain threshold is met (thresholds are selected according to the high-F1 setup from our paper).
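A minimal sketch of reading the JSON dump with Python's standard json module. It assumes the whole array fits in memory and that twitter_id is absent when no alignment was selected, as described above; the function name is hypothetical. Python integers are arbitrary-precision, so large Twitter IDs are decoded exactly.

```python
import json
from typing import IO, Dict, List, Tuple

def read_alignments(fp: IO[str]) -> Tuple[Dict[str, int], Dict[str, List[Tuple[int, float]]]]:
    """Parse the SocialLink JSON array.

    Returns (aligned, candidates): `aligned` maps entity URI -> Twitter ID for entities
    that have a twitter_id; `candidates` maps entity URI -> list of (Twitter ID, score).
    """
    aligned: Dict[str, int] = {}
    candidates: Dict[str, List[Tuple[int, float]]] = {}
    for record in json.load(fp):
        uri = record["entity_uri"]
        # candidates[i] and scores[i] refer to the same candidate profile
        candidates[uri] = list(zip(record["candidates"], record["scores"]))
        if "twitter_id" in record:
            aligned[uri] = record["twitter_id"]
    return aligned, candidates
```

Usage: open the downloaded file with `open(path, encoding="utf-8")` and pass the file object to `read_alignments`, where `path` is wherever the dump was saved.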
Finally, each row of our CSV file contains information about a single entity, for example:
http://dbpedia.org/resource/MoShang,"[6887052,26735153,302784580,1331809652,2275404837,2597365788,1516978014,753046765809508356,1512300530,255873440]","[1.579205048600787,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0]",6887052
The columns contain the same data as the JSON format: entity URI, candidate IDs, candidate scores and aligned Twitter ID. If no Twitter ID can be determined, 0 is used in the last column instead.
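The CSV layout can be read with Python's standard csv module: the quoted candidate and score lists are JSON-style arrays, so json.loads decodes them, and a Twitter ID of 0 is mapped to None. This sketch assumes the file has no header row, as in the example row above; the `Row` type and `read_rows` function are hypothetical names.

```python
import csv
import json
from typing import IO, Iterator, List, NamedTuple, Optional

class Row(NamedTuple):
    entity_uri: str
    candidates: List[int]
    scores: List[float]
    twitter_id: Optional[int]  # None when the file stores 0 (no alignment)

def read_rows(fp: IO[str]) -> Iterator[Row]:
    """Yield one Row per CSV line (assumes no header row)."""
    for entity_uri, cands, scores, twitter_id in csv.reader(fp):
        tid = int(twitter_id)
        yield Row(entity_uri, json.loads(cands), json.loads(scores), tid if tid != 0 else None)
```

When reading from disk, the csv documentation recommends opening the file with `newline=""`, e.g. `open(path, newline="", encoding="utf-8")`.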