In additive manufacturing (AM), heating and cooling cycles are the main factors influencing component quality, and physics-based models are the most powerful tool for simulating such processes. However, the increasing availability of multimodal sensor data in complex industrial systems calls for hybrid digital twins that combine machine learning with physics-based process simulation to optimise complex production environments. To integrate and analyse multimodal sensor data while capturing the underlying physical behaviour, we propose a multimodal physics-constrained graph transformer learning framework (MPGT). The framework consists of three components: (1) a graph neural diffusion approach models the continuous heat transport in AM and predicts temperature features; (2) an autoencoder and principal component analysis (PCA) generate dynamic feature embeddings representing potential structural weaknesses, such as corners and edges on the top surface of the printed components; and (3) a temporal-spatial graph transformer processes the intermediate fused features, aggregating multimodal information to predict component quality. Our method enables precise defect identification and predictive insight into component integrity. Experimental evaluations on grayscale images and thermal data demonstrate the effectiveness of our framework in accurately predicting quality outcomes for 3D-printed components, underscoring its potential for complex industrial applications in engineering and manufacturing. The MPGT code will be open-sourced on GitHub upon publication.
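To make the three-stage pipeline concrete, the following is a minimal NumPy sketch, not the MPGT implementation itself: graph heat diffusion via explicit Euler steps on the combinatorial Laplacian stands in for the graph neural diffusion stage, an SVD-based PCA projection stands in for the embedding stage, and a single-head self-attention pooling stands in for the temporal-spatial graph transformer. All function names, the toy graph, and the feature choices are illustrative assumptions.

```python
import numpy as np

def graph_heat_step(x, A, alpha=0.1):
    """One explicit-Euler step of heat diffusion dx/dt = -L x on a graph.
    L = D - A is the combinatorial Laplacian; total heat is conserved."""
    L = np.diag(A.sum(axis=1)) - A
    return x - alpha * (L @ x)

def pca_embed(X, k=2):
    """Project centred node features onto the top-k principal components."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

def attention_fuse(H):
    """Single-head self-attention over node embeddings H (n, d),
    followed by mean pooling to a graph-level feature vector."""
    scores = H @ H.T / np.sqrt(H.shape[1])
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)
    return (w @ H).mean(axis=0)

# Toy 4-node path graph: a heat source at node 0 diffuses along the chain.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
temps = np.array([100.0, 0.0, 0.0, 0.0])
for _ in range(10):
    temps = graph_heat_step(temps, A)

# Per-node features: diffused temperature plus node degree (a crude
# proxy for geometric context such as corners/edges).
feats = np.stack([temps, A.sum(axis=1)], axis=1)
emb = pca_embed(feats, k=2)          # (4, 2) node embeddings
quality = attention_fuse(emb)        # (2,) pooled graph-level feature
```

In the actual framework these stages would be learned end-to-end; the sketch only illustrates the data flow from physics-driven temperature features through embeddings to an attention-based fusion output.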