The European Union Agency for Railways’ (ERA) motto is “moving Europe towards a sustainable and safe railway system without frontiers,” which leads many people to still think only in terms of the traditional electro-mechanical components of the railway system, such as track, rolling stock, electrification and signalling systems. Reducing technical and operational barriers between EU member states, and defining common standards and practices for European rail safety, are and always will be at the core of ERA’s mandate, but this alone will not prepare the sector for the future.

If we want to push rail technology forward we must ensure that data within the rail system - the “language” of technology - is findable, accessible, interoperable, and re-usable (Fair). If data sets comply with these Fair principles, they can be used beyond their original scope, be repurposed, and therefore bring additional benefits to the citizens of Europe. Linked data creates the opportunities that will define the business of tomorrow, allowing new players to join the market and discover new uses for the data we already collect today.

ERA started its own data project by linking two data sets that were created for entirely different use cases: the European Register of Authorised Types of Vehicles (ERATV) and the Register of Infrastructure (RINF). Both registers were created under the ERA umbrella with reference to two different sets of European legislation, and in both cases the use of these data sets has been strictly limited to the scope of the precise piece of legislation that brought them into being. Both registers rely on data input from ERA stakeholders, the national safety authorities (NSAs) and the rail sector. Because each register was created for its own single purpose, stakeholders must often enter the same data into several registers.

To leave behind this world of single-purpose, locked data sets, a common language must define the data’s relationship with the world, so it can be used in a context different from the use case for which it was originally developed. This is when technology becomes semantic technology, a set of methods and tools that provide an advanced means of categorising and processing data, as well as for discovering relationships within varied data sets. And since a common language requires a common vocabulary, the first step of ERA’s linked data project was to create the ERA ontology. It represents the concepts and relationships contained in the legal framework for the sector and the use cases within ERA’s remit, and currently covers European railway infrastructure and the vehicle types authorised to operate on it.

The semantic vocabulary used to extract value from ERA’s registers is extended each time a new use case is defined using ERA’s data assets. Linking RINF and ERATV is merely the first step towards the larger goal of becoming a data-centric organisation. But it is a major step forward as, thanks to semantic technology, route compatibility checks can be made on demand and, soon, in several languages. The tool for linking registers that have until now been siloed is the knowledge graph. Knowledge graphs store interlinked descriptions of entities, such as objects, events, situations, or abstract concepts, while also encoding the semantics or relationships underlying these entities. It is only with a knowledge graph that we can ask questions that go beyond the originally defined use of a data set.
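To make the idea concrete, the following is a minimal sketch in Python of how a route compatibility check might work once vehicle-type and infrastructure data share one vocabulary. It is an illustration only: the identifiers (`vehicleType:VT1`, `section:A`), the `ex:` property names, the parameter values and the helper functions are all hypothetical and are not taken from the ERA ontology, RINF or ERATV. A production system would use a proper graph store and query language rather than a Python set.

```python
from __future__ import annotations

from typing import NamedTuple


class Triple(NamedTuple):
    """One statement in the graph: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str


# Hypothetical data. The first three triples stand in for a vehicle-type
# register, the rest for an infrastructure register. Because both use the
# same property names, they can be queried together as one graph.
GRAPH = {
    Triple("vehicleType:VT1", "ex:trackGauge", "1435"),
    Triple("vehicleType:VT1", "ex:energySupply", "25kV-AC"),
    Triple("vehicleType:VT1", "ex:energySupply", "15kV-AC"),
    Triple("section:A", "ex:trackGauge", "1435"),
    Triple("section:A", "ex:energySupply", "25kV-AC"),
    Triple("section:B", "ex:trackGauge", "1435"),
    Triple("section:B", "ex:energySupply", "3kV-DC"),
    Triple("section:C", "ex:trackGauge", "1668"),
    Triple("section:C", "ex:energySupply", "25kV-AC"),
}

# Properties that must match between a vehicle type and a track section
# (an assumption made for this sketch; real checks cover many more).
SHARED_PARAMETERS = ("ex:trackGauge", "ex:energySupply")


def objects(graph: set[Triple], subject: str, predicate: str) -> set[str]:
    """Return every value stated for the given subject and predicate."""
    return {t.obj for t in graph if t.subject == subject and t.predicate == predicate}


def is_compatible(graph: set[Triple], vehicle_type: str, section: str) -> bool:
    """True if vehicle and section share at least one value per parameter."""
    return all(
        objects(graph, vehicle_type, p) & objects(graph, section, p)
        for p in SHARED_PARAMETERS
    )


def incompatible_sections(
    graph: set[Triple], vehicle_type: str, route: list[str]
) -> list[str]:
    """List the sections of a route on which the vehicle type cannot run."""
    return [s for s in route if not is_compatible(graph, vehicle_type, s)]


if __name__ == "__main__":
    route = ["section:A", "section:B", "section:C"]
    print(incompatible_sections(GRAPH, "vehicleType:VT1", route))
```

In this sketch neither set of statements was written with the other in mind, yet the question "which sections of this route can this vehicle type not use?" can be answered because both describe their entities with the same terms.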

Placing ERA on the path to becoming a data-centric organisation is a gradual, step-by-step process. We define the core values, vision, and principles, and then, in close collaboration with the sector, search for use cases such as the route compatibility check. The meticulous process of creating knowledge graphs for each data set concerned then begins, expanding the ERA ontology along the way. The newly-created set of linked data is then subject to modern data asset management, the process of managing, organising, and optimising data as a valuable business asset. This involves the identification, classification, storage, safeguarding, retrieval, and destruction of data. It is not just about managing data, but also about extracting the maximum possible value from it.

Our data management principles are aligned with the European Commission’s (EC) Digital Europe Programme (DEP). ERA’s move to become a data-centric organisation represents a contribution to the common European mobility data space, which builds on and complements existing European Union and national legislation. Legislative initiatives that will contribute to the creation of this data space include the ERA ontology for the overall railway system.

In addition to being Fair, the data sets under ERA’s supervision will become self-describing and machine-readable thanks to the common vocabulary established in the ERA ontology and the ERA knowledge graph. Any external user or search engine can access the knowledge graph to combine the different data sets and build new use cases as they wish.

Another key advantage of the data-centric approach is that the different organisations feeding ERA’s databases, such as the NSAs, infrastructure managers and operators, need only provide the information once. As the silos are broken down to provide a single source of truth in ERA’s data space, making individual entries in several different databases using different interfaces has become obsolete.

ERA’s journey towards becoming a data-centric organisation is a gradual transition. The step-by-step approach will integrate an increasing number of data sets from ERA’s portfolio following carefully established use cases. For example, the ERA team is currently working on further streamlining and automating the processes of European vehicle authorisation (VA) and single safety certification (SSC) by integrating the European Railway Agency Database of Interoperability and Safety (Eradis) into the linked data space. As Eradis is the database for all certificates concerning safety and interoperability constituents in the European railway system, its integration offers considerable potential for further streamlining of the European authorisation processes. Further candidates for timely integration are the European Vehicle Register (EVR), the Safety Alerts IT tool (SAIT) and the Organisational Code Register (OCR).

ERA’s linked data space is in no way limited to internal data sets, and depending on the use case, external data may also be integrated. Current plans feature integration of data sets produced by Eulynx, a consortium of 15 infrastructure managers working to standardise interfaces and elements within signalling systems, and the Rail Facilities Portal commissioned by the EC but currently run by Rail Net Europe (RNE), a separate association of 38 European infrastructure managers.

The quality of use cases depends on the quality of the data input, and this requires authorities and organisations within the rail sector to collaborate. The level of cooperation within ERA’s linked data project has been very good because true value can be extracted from this process to make the rail sector more competitive and well-equipped for a future of multimodality and digital connectivity. While ERA is not at the forefront of developing new technology, it can be an early adopter. With our linked data project, we hope to be exactly that while breaking down digital barriers to serve our customers.