Blocking aims to avoid unnecessary comparisons in a data matching pipeline. The heterogeneous nature of knowledge graphs (KGs) is challenging for blocking approaches, which were traditionally designed for tabular data. While there is vast research on blocking in general, blocking for KGs has not been investigated systematically, especially with respect to comparing embedding-based and symbolic approaches. In this study, we generalize relational blocking, which incorporates the neighborhood information of entities, so that it can be applied to a variety of approaches across the neuro-symbolic spectrum.
The results of our study are threefold:
(1) The relational enhancements to state-of-the-art approaches significantly improve their results.
(2) (Neuro-)Symbolic approaches can outperform sophisticated deep-learning-based methods in terms of speed and quality.
(3) Hybrid methods that combine symbolic and embedding-based techniques are a promising avenue that has not yet been explored thoroughly.
Our experiments were run on 16 real-world datasets of varying sizes in monolingual and multilingual settings.
We verify the statistical significance of our results with a Bayesian analysis. We release our framework as an open-source library.
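
To make the idea of relational blocking concrete, the following is a minimal sketch of one symbolic variant: token blocking in which each entity's tokens are extended with those of its direct neighbors. It is an illustration under stated assumptions and not the API of the released framework. All function names (`tokens`, `relational_tokens`, `candidate_pairs`) and the toy data are hypothetical.

```python
from collections import defaultdict
from itertools import product


def tokens(values):
    """Lower-cased whitespace tokens of a list of attribute values."""
    return {tok for value in values for tok in value.lower().split()}


def relational_tokens(entity, attributes, neighbors):
    """Tokens of an entity plus the tokens of its direct neighbors."""
    result = tokens(attributes.get(entity, []))
    for neighbor in neighbors.get(entity, []):
        result |= tokens(attributes.get(neighbor, []))
    return result


def candidate_pairs(left, right):
    """Cross-KG pairs that share a token, given entity -> token-set maps."""
    # Each block is keyed by a token and holds the left and right entities
    # that contain this token.
    blocks = defaultdict(lambda: (set(), set()))
    for entity, toks in left.items():
        for tok in toks:
            blocks[tok][0].add(entity)
    for entity, toks in right.items():
        for tok in toks:
            blocks[tok][1].add(entity)
    pairs = set()
    for left_ids, right_ids in blocks.values():
        pairs.update(product(left_ids, right_ids))
    return pairs


# Hypothetical toy KGs: the same city under an English and an Italian label,
# each linked to a country entity that carries the same label in both KGs.
attrs_a = {"a1": ["Paris"], "a2": ["France"]}
neigh_a = {"a1": ["a2"]}
attrs_b = {"b1": ["Parigi"], "b2": ["France"]}
neigh_b = {"b1": ["b2"]}

# Attribute-only token blocking.
plain = candidate_pairs(
    {e: tokens(v) for e, v in attrs_a.items()},
    {e: tokens(v) for e, v in attrs_b.items()},
)

# Relational token blocking with one-hop neighborhood information.
relational = candidate_pairs(
    {e: relational_tokens(e, attrs_a, neigh_a) for e in attrs_a},
    {e: relational_tokens(e, attrs_b, neigh_b) for e in attrs_b},
)
```

In this toy example, attribute-only blocking misses the pair `("a1", "b1")` because the two labels share no token, whereas the relational variant recovers it through the shared neighbor token. The relational variant also produces additional candidate pairs, which illustrates the usual trade-off between recall and the number of comparisons.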