Literature Review

A review of relevant literature

Our paper builds on Neural Radiance Fields (NeRF), Infinigen and 3D-GPT.

[22] NeRFactor addresses the challenge of recovering the shape and reflectance of an object from multi-view images. It converts a NeRF’s volumetric geometry into a surface representation and then refines that geometry without supervision.
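To make the volume-to-surface step concrete, here is a minimal sketch of one common way to read a surface off a density field: take the expected termination depth of each camera ray under the standard volume-rendering quadrature. This is an illustration under stated assumptions, not NeRFactor's actual code. The function name, the sample values and the normalisation by total weight are all hypothetical choices made for the example.

```python
import math


def expected_surface_depth(sigmas, ts):
    """Estimate where a ray hits the surface from densities sampled along it.

    sigmas: non-negative volume densities at each sample (illustrative values).
    ts: increasing sample distances along the ray, same length as sigmas (>= 2).
    Returns the weighted mean depth, or None if the ray meets no density.
    """
    transmittance = 1.0  # fraction of light that reaches the current sample
    weights = []
    for i, sigma in enumerate(sigmas):
        # Interval length to the next sample; the last sample reuses the
        # previous spacing (an assumption made for this sketch).
        delta = ts[i + 1] - ts[i] if i + 1 < len(ts) else ts[i] - ts[i - 1]
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this interval
        weights.append(transmittance * alpha)
        transmittance *= 1.0 - alpha
    total = sum(weights)
    if total == 0.0:
        return None  # the ray only passes through empty space
    # Normalise so the result is a depth even if the ray is not fully absorbed.
    return sum(w * t for w, t in zip(weights, ts)) / total


# A ray through empty space that meets a very dense sample at distance 3.
depth = expected_surface_depth([0.0, 0.0, 1e6, 0.0], [1.0, 2.0, 3.0, 4.0])
```

The surface point for the ray is then its origin plus `depth` times its direction. Repeating this for every pixel gives a surface estimate that a later stage can refine.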

[19] The 3D-GPT framework approaches 3D modelling with Large Language Models (LLMs) such as GPT-4. It breaks a complex 3D modelling task into subtasks and assigns a specialised LLM agent to each. This guides our development of a multi-agent, modular pipeline for multimodal inputs.
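The agent-per-subtask idea can be sketched as a small dispatcher. This is a hypothetical illustration only: the `Subtask` type, the `dispatch` function and the stub agents are invented for the example and are not taken from 3D-GPT or from our implementation. A real agent would wrap an LLM call, whereas the stubs here only echo the prompt.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Subtask:
    kind: str    # which specialised agent should handle it, e.g. "terrain"
    prompt: str  # the instruction passed to that agent


# An agent is modelled as a function from a prompt to a textual result.
Agent = Callable[[str], str]


def make_stub_agent(name: str) -> Agent:
    """Build a placeholder agent; a real one would query an LLM instead."""
    def agent(prompt: str) -> str:
        return f"[{name}] plan for: {prompt}"
    return agent


def dispatch(subtasks: List[Subtask], agents: Dict[str, Agent]) -> List[str]:
    """Route each subtask to the agent registered for its kind, in order."""
    results = []
    for task in subtasks:
        agent = agents.get(task.kind)
        if agent is None:
            raise KeyError(f"no agent registered for subtask kind {task.kind!r}")
        results.append(agent(task.prompt))
    return results


agents = {
    "terrain": make_stub_agent("terrain"),
    "lighting": make_stub_agent("lighting"),
}
subtasks = [
    Subtask("terrain", "snowy mountains"),
    Subtask("lighting", "sunset"),
]
outputs = dispatch(subtasks, agents)
```

Keeping the registry as a plain dictionary means a new modality or task type can be supported by registering one more agent, without changing the dispatcher.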

[17] The rendering component draws on Infinigen’s framework, in which photorealistic 3D scenes and assets are procedurally generated using randomised mathematical rules and synthetic data. We use this framework to render intricate visual assets and to generate synthetic datasets.

Nevertheless, existing research handles only text and image inputs when generating 3D scenes, and the physics of those scenes is not yet customisable. We will therefore implement a fully multimodal framework that accepts text, images, audio and video as inputs and is pre-trained on existing datasets. The framework is two-pronged: it first uses a technique called multi-modal radiance fields (MMRF) to render 3D scenes, then applies a customised physics engine that enables prompt-based physics. We developed the SIS metric to gauge the model’s efficacy (see Appendix B).
