# Methodology: Introduction to MMRF

### **Developing MMRF to Render 3D Scenes Based on Multimodal Inputs**

![Diagram of the MMRF multimodal model](https://1521529408-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FRf6rxD6VHFAbFQeTaCq0%2Fuploads%2FwcL7fWLiAjTADQ2ROFjK%2F14.png?alt=media)

*Figure 9: MMRF Model*

**Initialising MMRF for Textual Description Inputs**

We introduce a Multimodal Radiance Field (MMRF) model (Fig. 9), built on the aforementioned pre-trained datasets. MMRF combines 2D images and textual descriptions, drawing inspiration from NeRF [22]. By initialising NeRF's architecture with a generated Signed Distance Function (SDF), MMRF produces features that instantiate geometrically specific 3D mesh models, such as "reinforced concrete beams".
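As a rough illustration, the sketch below shows one way an SDF field conditioned on a text embedding could look in PyTorch. The class name, layer sizes, and the source of `text_embed` (e.g. a CLIP-style text encoder) are illustrative assumptions, not the actual MMRF implementation.

```python
# Minimal sketch, assuming PyTorch; all names here are hypothetical.
import torch
import torch.nn as nn

class SDFConditionedField(nn.Module):
    """MLP that maps a 3D point plus a text embedding to an SDF value
    and a feature vector, regressing a signed distance rather than
    NeRF's raw density so that a surface is defined geometrically."""
    def __init__(self, text_dim: int = 512, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3 + text_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1 + hidden),  # SDF value + features
        )

    def forward(self, xyz: torch.Tensor, text_embed: torch.Tensor):
        # Broadcast one text embedding across all sampled points.
        cond = text_embed.expand(xyz.shape[0], -1)
        out = self.net(torch.cat([xyz, cond], dim=-1))
        sdf, features = out[:, :1], out[:, 1:]
        return sdf, features

# Usage: query the field at sampled points; the zero level set of
# `sdf` defines the mesh surface (extractable via marching cubes).
field = SDFConditionedField()
points = torch.rand(1024, 3)        # sampled ray points
prompt_embed = torch.randn(1, 512)  # e.g. "reinforced concrete beams"
sdf, feats = field(points, prompt_embed)
```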

**Audio-Driven Scene Dynamics**

MMRF will incorporate an advanced audio-processing module for acoustic scene analysis and auditory scene synthesis, combining self-supervised speech representations with neural-network-based audio-processing algorithms. This module translates real-world auditory data into dynamic scene influencers.
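A minimal sketch of this pipeline follows, assuming torchaudio's wav2vec 2.0 bundle as the self-supervised speech representation; the linear mapping head and the scene parameters it predicts are hypothetical stand-ins, not the module's actual design.

```python
# Hedged sketch: self-supervised audio features -> scene influencers.
import torch
import torchaudio

bundle = torchaudio.pipelines.WAV2VEC2_BASE
encoder = bundle.get_model().eval()

# Hypothetical head turning pooled audio features into scene inputs,
# e.g. ambient vibration amplitude and activity-level cues.
head = torch.nn.Linear(768, 2)

waveform = torch.zeros(1, int(bundle.sample_rate))  # 1 s placeholder audio
with torch.no_grad():
    features, _ = encoder.extract_features(waveform)
    pooled = features[-1].mean(dim=1)  # time-average the last layer
    scene_params = head(pooled)        # -> dynamic scene influencers
```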

**Adaptive Resolution Scaling**

MMRF further implements adaptive resolution scaling. With a fixed camera location, we apply the Transvoxel algorithm to SDF points expressed in spherical coordinates. This concentrates computational resources on key focal areas, such as beam-column intersections, while relegating peripheral zones to lower resolution for efficiency.
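The sketch below illustrates only the level-of-detail assignment step; the Transvoxel stitching between adjacent resolution levels is omitted, and the camera pose, focal direction, and angular thresholds are assumed values chosen for demonstration.

```python
# Illustrative LOD assignment for SDF sample points around a fixed camera.
import numpy as np

def lod_levels(points, camera_pos, focal_dir, thresholds=(0.2, 0.6)):
    """Assign a resolution level per SDF sample point: 0 = finest
    (near the focal axis, e.g. a beam-column joint), 2 = coarsest."""
    rel = points - camera_pos                       # camera-centred frame
    r = np.linalg.norm(rel, axis=1, keepdims=True)  # spherical radius
    dirs = rel / np.maximum(r, 1e-8)
    # Angular distance from the focal direction, in radians.
    ang = np.arccos(np.clip(dirs @ focal_dir, -1.0, 1.0))
    return np.digitize(ang, thresholds)             # bucket by angle

points = np.random.uniform(-5, 5, size=(10000, 3))
levels = lod_levels(points, camera_pos=np.zeros(3),
                    focal_dir=np.array([0.0, 0.0, 1.0]))
# Finest voxels where levels == 0; halve resolution per level outward.
```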

**Modular Design**

Architecturally, MMRF is modular, with a dedicated processor for each modality (visual, textual, auditory), mirroring 3D-GPT's multi-agent approach [19] for future scalability. MMRF thus manages the rendering of our 3D assets and open world. Graphically, we will convert the open world into a VR asset using game-engine plugins, while a separate physics engine handles the customised physics.
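To make the modular routing concrete, the structural sketch below shows one possible shape for such a pipeline; the registry API and the processor stand-ins are hypothetical, intended only to show how per-modality processors can be added or swapped without touching the others.

```python
# Structural sketch only; real processors would wrap the SDF field,
# text encoder, and audio head sketched in the sections above.
from typing import Any, Callable, Dict

class MMRFPipeline:
    """Routes each modality to its dedicated processor, then returns
    one conditioning dict for the renderer."""
    def __init__(self):
        self.processors: Dict[str, Callable[[Any], Any]] = {}

    def register(self, modality: str, fn: Callable[[Any], Any]):
        # e.g. "visual", "textual", "auditory"
        self.processors[modality] = fn

    def render_inputs(self, inputs: Dict[str, Any]) -> Dict[str, Any]:
        # Modalities are processed independently, so new ones (or new
        # agents, as in 3D-GPT) slot in without changing existing code.
        return {m: self.processors[m](x) for m, x in inputs.items()}

pipeline = MMRFPipeline()
pipeline.register("textual", lambda prompt: f"embed({prompt})")
pipeline.register("auditory", lambda wav: "scene_params")
conditioning = pipeline.render_inputs({"textual": "concrete beam hall"})
```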
