FactMS: Enhancing Multimodal Factual Consistency in Multimodal Summarization


Multimodal summarization (MS) generates text summaries from multimedia articles with textual and visual content. MS can suffer from the multimodal factual inconsistency problem, where the generated summaries distort or deviate from the textual and visual content of the original multimodal input. Existing MS approaches mainly focus on improving the degree of word overlap between output summaries and reference summaries while ignoring this factual inconsistency problem. In this work, we propose a fact-aware multimodal summarization model (FactMS) to improve the multimodal factual consistency of MS. Specifically, we extract the factual information in the textual and visual modalities and remove the textual knowledge with low correlation to the visual input.
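
The correlation-based filtering step can be pictured with a small sketch: score each extracted textual fact against the article image in a joint text-image embedding space and drop the low-scoring facts. The use of CLIP, the checkpoint name, and the threshold below are illustrative assumptions, not components specified by FactMS.

```python
# Sketch: filter textual facts by their correlation with the article image,
# using a CLIP-style joint embedding (assumed here; not the paper's exact module).
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def filter_facts(facts: list[str], image: Image.Image,
                 threshold: float = 0.2) -> list[str]:
    """Keep facts whose text-image cosine similarity exceeds `threshold`."""
    inputs = processor(text=facts, images=image, return_tensors="pt",
                       padding=True, truncation=True)
    with torch.no_grad():
        out = model(**inputs)
    # L2-normalize the projected embeddings so the dot product is a cosine score.
    text_emb = out.text_embeds / out.text_embeds.norm(dim=-1, keepdim=True)
    image_emb = out.image_embeds / out.image_embeds.norm(dim=-1, keepdim=True)
    sims = (text_emb @ image_emb.T).squeeze(-1)  # one score per fact
    # The 0.2 cutoff is an illustrative default; a real system would tune it.
    return [f for f, s in zip(facts, sims.tolist()) if s > threshold]
```
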

We then propose a dual heterogeneous knowledge graph to learn the intra-modality factual information in both the textual and visual content. By fusing the multimodal factual information in the decoding stage, the model generates summaries with better factual consistency. In addition, we propose an automatic method for evaluating visual factual consistency in MS based on text-image consistency. Extensive experiments on three real-world datasets demonstrate that FactMS achieves excellent multimodal factual consistency while remaining comparable in terms of word overlap.
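
On the evaluation side, a text-image consistency score can be approximated as the cosine similarity between the generated summary and the article's images in a joint embedding space, averaged over images. Again, CLIP and the mean-over-images rule are assumptions made for illustration; the paper's metric may be defined differently.

```python
# Sketch: automatic visual factual consistency as mean text-image similarity,
# assuming a CLIP-style scorer (not necessarily the metric defined in the paper).
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def visual_consistency(summary: str, images: list[Image.Image]) -> float:
    """Mean cosine similarity between a summary and the article's images."""
    inputs = processor(text=[summary], images=images, return_tensors="pt",
                       padding=True, truncation=True)  # CLIP truncates long text
    with torch.no_grad():
        out = model(**inputs)
    text_emb = out.text_embeds / out.text_embeds.norm(dim=-1, keepdim=True)
    image_emb = out.image_embeds / out.image_embeds.norm(dim=-1, keepdim=True)
    # Average the per-image cosine scores into a single consistency value.
    return (image_emb @ text_emb.T).mean().item()
```

A higher score indicates that the summary stays closer to what the images actually depict, which is the intuition behind grounding the evaluation in text-image consistency.
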
