Methodology: Introduction to MMRF
What is MMRF?
Figure 9: MMRF Model
Initialising MMRF for Textual Description Inputs
We introduce a Multimodal Radiance Field (MMRF) model (Fig. 9), built on the aforementioned pre-trained datasets. Inspired by NeRF [22], MMRF combines 2D images with textual descriptions. By initialising NeRF’s architecture with a generated Signed Distance Function (SDF), the model produces features that instantiate geometrically specific 3D mesh models, such as “reinforced concrete beams”.
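As a minimal sketch of the SDF-based initialisation, the snippet below uses an axis-aligned box SDF as an illustrative stand-in for a beam primitive, and maps signed distance to volume density via a Laplace CDF (a VolSDF-style mapping). The function names and the specific density mapping are assumptions for illustration, not our exact formulation.

```python
import numpy as np

def beam_sdf(points, half_extents=(0.1, 0.1, 1.0)):
    """Signed distance to an axis-aligned box, a hypothetical stand-in
    for a 'reinforced concrete beam' primitive. Negative inside."""
    q = np.abs(points) - np.array(half_extents)
    outside = np.linalg.norm(np.maximum(q, 0.0), axis=-1)
    inside = np.minimum(np.max(q, axis=-1), 0.0)
    return outside + inside

def sdf_to_density(sdf, alpha=1.0, beta=0.05):
    """Map SDF values to volume density with a Laplace CDF
    (VolSDF-style): high density inside the surface, near zero outside."""
    x = -sdf
    psi = np.where(x <= 0,
                   0.5 * np.exp(x / beta),
                   1.0 - 0.5 * np.exp(-x / beta))
    return alpha * psi
```

Initialising the radiance field's density this way biases early training toward the intended geometry, so optimisation refines a beam-shaped surface rather than discovering it from scratch.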
Audio-Driven Scene Dynamics
MMRF will incorporate an advanced audio-processing module that couples acoustic scene analysis with auditory scene synthesis, combining self-supervised speech representations and neural-network-based audio processing. This module translates real-world auditory data into dynamic scene influencers.
Adaptive Resolution Scaling
MMRF further implements adaptive resolution scaling. With a fixed camera location, we apply the Transvoxel algorithm to SDF points expressed in spherical coordinates. This concentrates computational resources on key focal areas, such as beam-column intersections, while relegating peripheral zones to lower resolution for efficiency.
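A hedged sketch of the level-of-detail assignment: SDF sample points are converted to spherical coordinates about the fixed camera, and points angularly near the focal direction receive higher resolution. The crude angular-distance metric and the `lod_level` function are illustrative assumptions; the Transvoxel algorithm would then stitch transition cells between adjacent resolution levels (not shown).

```python
import numpy as np

def to_spherical(points, camera=np.zeros(3)):
    """Convert sample points to (r, theta, phi) about a fixed camera."""
    d = points - camera
    r = np.linalg.norm(d, axis=-1)
    theta = np.arccos(np.clip(d[..., 2] / np.maximum(r, 1e-9), -1.0, 1.0))
    phi = np.arctan2(d[..., 1], d[..., 0])
    return r, theta, phi

def lod_level(points, focus, camera=np.zeros(3), max_level=4):
    """Assign a per-point level of detail: finest near the focal
    direction (e.g. a beam-column intersection), coarser peripherally."""
    _, t, p = to_spherical(points, camera)
    _, ft, fp = to_spherical(focus[None], camera)
    ang = np.abs(t - ft) + np.abs(p - fp)  # crude angular distance
    return np.clip(max_level - np.floor(ang / 0.5), 0, max_level).astype(int)
```

Voxel cells would then be meshed at their assigned level, so dense geometry extraction is spent only where the viewer is looking.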
Modular Design
Architecturally, MMRF is modular, with a dedicated processor for each modality (visual, textual, auditory), mirroring 3D-GPT’s multi-agent approach [19] to support future scalability. MMRF thus manages the rendering of our 3D assets and open world. Graphically, we will convert the open world into a VR asset using game-engine plugins, while a separate physics engine handles the customised physics.
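The modular layout can be sketched as a registry of per-modality processors that are run independently and fused downstream. Class and method names here are illustrative placeholders, not MMRF's actual implementation.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict

@dataclass
class MMRFPipeline:
    """Hypothetical sketch of MMRF's modular design: one processor
    per modality, mirroring a multi-agent layout for scalability."""
    processors: Dict[str, Callable[[Any], Any]] = field(default_factory=dict)

    def register(self, modality: str, fn: Callable[[Any], Any]) -> None:
        self.processors[modality] = fn

    def process(self, inputs: Dict[str, Any]) -> Dict[str, Any]:
        # Each modality is handled independently, then fused downstream.
        return {m: self.processors[m](x)
                for m, x in inputs.items() if m in self.processors}

pipeline = MMRFPipeline()
pipeline.register("visual", lambda img: f"features({img})")
pipeline.register("textual", lambda txt: f"embedding({txt})")
pipeline.register("auditory", lambda wav: f"influencers({wav})")
```

Adding a new modality then only requires registering another processor, leaving the rendering and fusion stages untouched.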