Purpose: Federated training is often challenging on heterogeneous datasets due to divergent data storage options, inconsistent naming schemes, varied annotation procedures, and disparities in label quality. This is particularly evident in emerging multi-modal learning paradigms, where dataset harmonization, including a uniform data representation and consistent filtering options, is of paramount importance.

Methods: DICOM structured reports enable the standardized linkage of arbitrary information beyond the imaging domain and can be used within Python deep learning pipelines with highdicom. Building on this, we developed an open platform for data integration with interactive filtering capabilities, thereby simplifying the creation of patient cohorts across several sites with consistent multi-modal data.

Results: In this study, we extend our prior work by showing its applicability to additional and more divergent data types, as well as by streamlining datasets for federated training within an established consortium of eight university hospitals in Germany. We demonstrate its concurrent filtering capability by creating harmonized multi-modal datasets across all locations for predicting the outcome after minimally invasive heart valve replacement. The data include imaging and waveform data (i.e., computed tomography images and electrocardiography recordings), annotations (i.e., calcification segmentations and point sets), and metadata (i.e., prostheses and pacemaker dependency).

Conclusion: Structured reports bridge the traditional gap between imaging systems and information systems. Utilizing the inherent DICOM reference system, arbitrary data types can be queried concurrently to create meaningful cohorts for multi-centric data analysis. The graphical interface as well as example structured report templates are available at https://github.com/Cardio-AI/fl-multi-modal-dataset-creation.
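
As an illustration of the Methods above, the following minimal sketch shows how a DICOM structured report might be queried inside a Python pipeline using highdicom's SR utilities. The file name is a placeholder, and selecting numeric (NUM) content items is only one example of the kind of filtering such a platform can build on; other value types (e.g., CODE, IMAGE, SCOORD3D) can be selected analogously.

    from pydicom import dcmread
    from highdicom.sr import ValueTypeValues
    from highdicom.sr.utils import find_content_items

    # Read a structured report document (file name is illustrative).
    sr_dataset = dcmread("structured_report.dcm")

    # Recursively collect all numeric (NUM) content items from the
    # SR content tree.
    numeric_items = find_content_items(
        sr_dataset,
        value_type=ValueTypeValues.NUM,
        recursive=True,
    )

    # Print each measurement's concept name, value, and unit code.
    for item in numeric_items:
        concept = item.ConceptNameCodeSequence[0]
        measured = item.MeasuredValueSequence[0]
        unit = measured.MeasurementUnitsCodeSequence[0]
        print(f"{concept.CodeMeaning}: {measured.NumericValue} {unit.CodeValue}")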