Other publications

We present a method for automatically linking company datasets using supervised machine learning. We apply this method to link several company datasets used for research and analytical purposes at the Deutsche Bundesbank. The record linkage process comprises comprehensive data pre-processing, blocking (indexing), construction of comparison features, training and testing of a supervised match classification model, and post-processing to produce a company identifier mapping table covering all internal and public company identifiers found in the data. The evaluation of our linkage method shows that the process yields precise match predictions with sufficiently high recall (coverage) to make fully automated linkage of company data feasible for typical use cases in research and analytics.
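
The pipeline steps listed above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the record fields (`source`, `id`, `name`, `postcode`, `city`), the legal-form list, the blocking key (postcode plus the first three characters of the normalized name), the three comparison features and all helper names are hypothetical. In particular, `predict_match` is a fixed threshold rule standing in for the trained supervised classifier, and post-processing is shown as grouping predicted matches with union-find, one plausible way to derive a shared company identifier.

```python
import re
from collections import defaultdict
from itertools import combinations

# Hypothetical list of legal-form tokens removed during pre-processing.
LEGAL_FORMS = {"gmbh", "mbh", "ag", "kg", "se", "co"}


def normalize_name(name: str) -> str:
    """Pre-processing: lower-case, keep alphanumeric tokens, drop legal forms."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    return " ".join(t for t in tokens if t not in LEGAL_FORMS)


def block_key(rec: dict) -> tuple:
    """Hypothetical blocking key: postcode plus first 3 chars of normalized name."""
    return (rec["postcode"], normalize_name(rec["name"])[:3])


def candidate_pairs(records: list) -> list:
    """Blocking: only records sharing a key are compared, not all n*(n-1)/2 pairs."""
    blocks = defaultdict(list)
    for i, rec in enumerate(records):
        blocks[block_key(rec)].append(i)
    return [pair for members in blocks.values() for pair in combinations(members, 2)]


def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard similarity of two normalized names."""
    sa, sb = set(a.split()), set(b.split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def features(r1: dict, r2: dict) -> list:
    """Comparison features for one candidate pair (hypothetical choice)."""
    n1, n2 = normalize_name(r1["name"]), normalize_name(r2["name"])
    return [
        jaccard(n1, n2),                                  # name token overlap
        float(n1 == n2),                                  # exact normalized name
        float(r1["city"].lower() == r2["city"].lower()),  # identical city string
    ]


def predict_match(f: list) -> bool:
    """Stand-in for a trained classifier (hypothetical threshold rule)."""
    return f[0] >= 0.8


def cluster(n: int, matches: list) -> list:
    """Post-processing: union-find groups transitively matched records."""
    parent = list(range(n))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in matches:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra
    return [find(i) for i in range(n)]


records = [
    {"source": "A", "id": "a1", "name": "Muster GmbH", "postcode": "60311", "city": "Frankfurt"},
    {"source": "B", "id": "b7", "name": "MUSTER GMBH", "postcode": "60311", "city": "Frankfurt am Main"},
    {"source": "C", "id": "c3", "name": "Muster Holding AG", "postcode": "60311", "city": "Frankfurt"},
    {"source": "A", "id": "a2", "name": "Beispiel KG", "postcode": "10115", "city": "Berlin"},
]

pairs = candidate_pairs(records)
matches = [(i, j) for i, j in pairs if predict_match(features(records[i], records[j]))]
labels = cluster(len(records), matches)
# Mapping table rows: (cluster id, source dataset, identifier in that source)
mapping = [(labels[i], r["source"], r["id"]) for i, r in enumerate(records)]
print(mapping)
```

In this toy run, blocking restricts comparisons to the three records sharing the key `("60311", "mus")`. The rule accepts the first two records as a match and rejects "Muster Holding AG", so the mapping table assigns the same cluster id to `A/a1` and `B/b7`. In a real setting the threshold rule would be replaced by a model trained on labeled pairs and evaluated on held-out pairs.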

We analyze the overlaps between various company datasets, building on the results of the company data record linkage by Gabor-Toth, Schild, and Walter (2023) and Gabor-Toth and Schild (2023). To better understand these overlaps, we also briefly describe the input data for this linkage, in particular the data universes and time periods it covers. We report descriptive statistics that characterize the overlaps found between the company datasets, and we discuss and interpret these overlaps with reference to properties of the input data and of the record linkage process.
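
As a rough illustration of what a pairwise overlap statistic can look like, the Python sketch below counts, for each pair of datasets, the linked companies that appear in both. The mapping-table layout `(company_id, source, source_id)`, the dataset labels A, B and C, and the function name `pairwise_overlaps` are hypothetical; the statistics actually reported may be defined differently.

```python
from collections import defaultdict
from itertools import combinations


def pairwise_overlaps(mapping):
    """Count linked companies shared by each pair of source datasets.

    `mapping` rows are (company_id, source, source_id), a hypothetical layout
    for the identifier mapping table produced by the record linkage.
    """
    companies_by_source = defaultdict(set)
    for company_id, source, _ in mapping:
        companies_by_source[source].add(company_id)
    # Intersect the company-id sets of every unordered pair of sources
    return {
        (s1, s2): len(companies_by_source[s1] & companies_by_source[s2])
        for s1, s2 in combinations(sorted(companies_by_source), 2)
    }


mapping = [
    ("c1", "A", "a1"), ("c1", "B", "b7"),
    ("c2", "C", "c3"),
    ("c3", "A", "a2"), ("c3", "C", "c9"),
    ("c4", "B", "b2"),
]
overlaps = pairwise_overlaps(mapping)
print(overlaps)
```

With this toy table, A and B share one company (`c1`), A and C share one (`c3`), and B and C share none. Dividing each count by the number of companies in one of the two datasets would give the share of that dataset also found in the other.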