Methodology for matching financial and patent databases
The priority patent portfolio of a given group is defined as the collection of priority patents applied for by its “Global Ultimate Owner” (GUO) and by all its consolidated subsidiaries, i.e. those in which the GUO holds a total participation of at least 50.01%. Implementing this rule requires matching the names of the GUOs and their subsidiaries, as extracted from the Orbis database, with the names of the assignees listed in the Patstat database. This automated pairing requires a strict match between the character strings of the two databases, which raises two difficulties.
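The consolidation rule above can be illustrated with a minimal Python sketch. It is not the project's actual implementation: the `Subsidiary` record, its `guo_participation_pct` field, and the `patents_by_assignee` mapping are hypothetical names introduced for illustration, and the sketch assumes that assignee names already match Orbis names exactly.

```python
from dataclasses import dataclass

# Threshold taken from the text: total participation of at least 50.01%.
CONSOLIDATION_THRESHOLD = 50.01


@dataclass(frozen=True)
class Subsidiary:
    # Hypothetical record; the field names are assumptions, not Orbis column names.
    name: str
    guo_participation_pct: float  # total participation held by the GUO, in percent


def consolidated_names(guo_name, subsidiaries):
    """Return the GUO name followed by the names of its consolidated subsidiaries."""
    names = [guo_name]
    for sub in subsidiaries:
        if sub.guo_participation_pct >= CONSOLIDATION_THRESHOLD:
            names.append(sub.name)
    return names


def group_portfolio(guo_name, subsidiaries, patents_by_assignee):
    """Collect the priority patents of every consolidated entity.

    patents_by_assignee maps an assignee name to an iterable of patent
    identifiers; the lookup is a strict string match, as in the text.
    """
    portfolio = set()
    for name in consolidated_names(guo_name, subsidiaries):
        portfolio.update(patents_by_assignee.get(name, ()))
    return portfolio
```

Using a set for the portfolio means that a patent listed under both the GUO and a subsidiary is counted once.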
Difficulties related to the matching of databases
First, a company can appear under different names and spellings within the patent database, for example IBM and International Business Machines. It is thus difficult to group under a single label the patents applied for under different applicant names. This difficulty is overcome by using the harmonized names provided in the Patstat database (see Data sources), which gather the variants of an assignee name under a single, harmonized name.
A second difficulty arises when the assignee name that appears in the patent database does not correspond exactly to the legal name (of the GUO or subsidiary) used in the Orbis database, which can itself differ from the common designation of the entity. For example, the Dutch group known as “Philips” in the Patstat patent database is listed in Orbis as a GUO named Koninklijke Philips Electronics NV.
Preliminary standardization of character strings
The matching technique used in the Corporate Invention Board project builds on Tom Magerman’s research at the Catholic University of Leuven. His methodology, developed in collaboration with the Organisation for Economic Co-operation and Development (OECD), the Eurostat directorate of the European Commission and the European Patent Office (EPO), can be summarized in two stages.
The first stage is a spelling check and cleaning step, which removes, for example, double spaces between two words or blank spaces preceding a comma.
In the second stage, the legal designations of companies (such as Ltd, Corp, Its, Inc…), which appear systematically in the Orbis database but seldom in the patent database, are removed in order to improve the matching between the two.
This methodology allows us to identify more than 5 million priority patents registered by the 2,400 groups studied.
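The two standardization stages described in this section can be sketched in Python as follows. This is an illustrative approximation, not the published methodology: the list of legal designations is a short, non-exhaustive assumption, and converting names to upper case is an additional assumption made here so that the comparison ignores case.

```python
import re

# Illustrative, non-exhaustive set of legal designations (an assumption).
LEGAL_DESIGNATIONS = {"LTD", "CORP", "INC", "NV", "SA", "GMBH", "PLC", "LLC"}


def clean_name(name):
    """Stage 1: basic cleaning of the character string."""
    name = name.upper().strip()                # case folding is an assumption
    name = re.sub(r"\s+([,.;])", r"\1", name)  # drop blanks preceding punctuation
    name = re.sub(r"\s+", " ", name)           # collapse repeated spaces
    return name


def strip_legal_designations(name):
    """Stage 2: remove tokens that are legal designations."""
    tokens = re.split(r"[ ,.]+", name)
    kept = [t for t in tokens if t and t not in LEGAL_DESIGNATIONS]
    return " ".join(kept)


def normalize(name):
    """Apply both stages so that names from the two sources can be compared."""
    return strip_legal_designations(clean_name(name))


# Two spellings of the same hypothetical company now compare as equal.
print(normalize("Acme  Widgets , Ltd.") == normalize("ACME WIDGETS INC"))
```

Stripping legal designations alone does not reconcile a common name with a longer legal name: in this sketch, “Koninklijke Philips Electronics NV” becomes “KONINKLIJKE PHILIPS ELECTRONICS”, which is still not the string “PHILIPS”. Splitting on periods also breaks up dotted abbreviations, so a production version would need more careful tokenization.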
A reducible but inevitable margin of error
As with any processing of large databases, it would be illusory to expect to identify, without error, all the patents of the studied population of multinational corporations. The objective is to choose the most satisfactory trade-off between false positives and false negatives. A false positive means attributing a patent to a company that did not apply for it; a false negative means missing a patent applied for by a firm, so that it is not included in that company’s patent portfolio. A margin of error therefore always remains when using an automated process.
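One common way to quantify this trade-off, sketched below in Python, is to compare the automated matches with a manually verified sample and to count both kinds of error. The function name, the representation of a match as a (patent, company) pair, and the use of precision and recall are assumptions made for illustration; the source does not state how the error was measured. Here a false positive is a patent attributed to a company that did not apply for it, and a false negative is a patent that the matching failed to attribute to its applicant.

```python
def matching_errors(predicted, truth):
    """Compare automated matches with a manually verified reference sample.

    Both arguments are iterables of (patent_id, company_name) pairs.
    """
    predicted, truth = set(predicted), set(truth)
    true_pos = len(predicted & truth)    # correct attributions
    false_pos = len(predicted - truth)   # patents attributed to the wrong company
    false_neg = len(truth - predicted)   # patents missing from a portfolio
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(truth) if truth else 0.0
    return {
        "false_positives": false_pos,
        "false_negatives": false_neg,
        "precision": precision,
        "recall": recall,
    }
```

Stricter matching rules tend to reduce false positives at the cost of more false negatives, and looser rules do the opposite, which is why the two counts are reported together.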
Our goal is to limit the extent of error in this first edition and to reduce it further in future editions of the Corporate Invention Board. This will be possible through methodological improvements in name identification and standardization, and thanks to improvements in the Patstat database (see Data sources), both of which should improve the matching process.
It is our intention to engage with other research groups who would like to work within the framework of this first edition of the Corporate Invention Board.





