
Methodology: Introduction to MMRF

What is MMRF?


Developing MMRF to Render 3D Scenes Based on Multimodal Inputs

Figure 9: MMRF Model

Initialising MMRF for Textual Description Inputs

We introduce a Multimodal Radiance Field (MMRF) model (Fig. 9) built on the aforementioned preprocessed datasets. MMRF combines 2D images and textual descriptions, and is inspired by NeRF [22]. By initialising NeRF’s architecture with a generated Signed Distance Function (SDF), the model learns features that instantiate geometrically specific 3D mesh models, such as “reinforced concrete beams”.
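
The proposal does not pin down the exact SDF-to-density conversion, so the sketch below assumes the Laplace-CDF mapping common to SDF-based radiance fields (as in VolSDF); box_sdf is an illustrative stand-in for a beam primitive, not MMRF’s actual geometry generator.

```python
import torch

def sdf_to_density(sdf: torch.Tensor, beta: float = 0.1) -> torch.Tensor:
    """Map signed distances to NeRF volume density via a Laplace CDF of
    the negated distance: density is high inside the surface and falls
    off smoothly outside, so rendering concentrates at the SDF's zero
    level set (the mesh surface). `beta` controls the sharpness."""
    s = -sdf
    psi = torch.where(
        s <= 0,
        0.5 * torch.exp(s / beta),
        1.0 - 0.5 * torch.exp(-s / beta),
    )
    return psi / beta

def box_sdf(p: torch.Tensor, half_extents: torch.Tensor) -> torch.Tensor:
    # Analytic SDF of an axis-aligned box, standing in for a simple
    # "reinforced concrete beam" primitive (illustrative geometry only).
    q = p.abs() - half_extents
    return q.clamp(min=0).norm(dim=-1) + q.max(dim=-1).values.clamp(max=0)

points = torch.randn(1024, 3)                      # query points in space
sigma = sdf_to_density(box_sdf(points, torch.tensor([2.0, 0.15, 0.3])))
```

Densities produced this way can seed the NeRF branch so that early training already respects the beam’s geometry.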

Audio-Driven Scene Dynamics

MMRF will incorporate an advanced audio-processing module that combines acoustic scene analysis with auditory scene synthesis, built on self-supervised speech representations and neural-network-based audio processing. This module translates real-world auditory data into signals that dynamically influence the rendered scene.
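
As one concrete reading of this module, the sketch below uses wav2vec 2.0 (via torchaudio) as the self-supervised speech representation, plus a small projection head that emits a scene-conditioning vector; the head and the 16-dimensional code are our illustrative assumptions, not a specified part of MMRF.

```python
import torch
import torchaudio

# Self-supervised speech encoder (wav2vec 2.0); its 768-dim features
# are pooled and projected to a compact "scene dynamics" code.
bundle = torchaudio.pipelines.WAV2VEC2_BASE
encoder = bundle.get_model().eval()

scene_head = torch.nn.Sequential(          # hypothetical conditioning head
    torch.nn.Linear(768, 128), torch.nn.ReLU(),
    torch.nn.Linear(128, 16),              # 16-dim code fed to the renderer
)

def audio_to_scene_code(waveform: torch.Tensor, sample_rate: int) -> torch.Tensor:
    """Translate raw audio (batch, samples) into a vector that can
    condition the radiance field's dynamic components."""
    if sample_rate != bundle.sample_rate:
        waveform = torchaudio.functional.resample(
            waveform, sample_rate, bundle.sample_rate)
    with torch.no_grad():
        layers, _ = encoder.extract_features(waveform)
    pooled = layers[-1].mean(dim=1)        # temporal average pooling
    return scene_head(pooled)
```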

Adaptive Resolution Scaling

MMRF further implements adaptive resolution scaling. With a fixed camera location, we apply a Transvoxel-style algorithm to SDF sample points expressed in spherical coordinates. This concentrates computational resources on key focal areas, such as beam-column intersections, while relegating peripheral zones to lower resolution for efficiency.
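
A minimal sketch of the focal-area prioritisation, assuming the camera is fixed at a known position: each SDF sample point receives a level of detail from its polar angle relative to the focal direction, and a Transvoxel-style mesher would then stitch the seams between adjacent resolution bands (the stitching itself is omitted here).

```python
import numpy as np

def lod_for_points(points_xyz, camera_pos, focal_dir, max_lod=4):
    """Assign a level of detail to each SDF sample point from its
    angular offset to the camera's focal direction, i.e. the polar
    angle in camera-centred spherical coordinates. LOD 0 (finest)
    covers the focal area (e.g. beam-column joints); peripheral zones
    coarsen toward `max_lod`. The angular bands are illustrative."""
    rel = points_xyz - camera_pos
    r = np.linalg.norm(rel, axis=-1, keepdims=True)
    dirs = rel / np.clip(r, 1e-9, None)
    cos_theta = dirs @ (focal_dir / np.linalg.norm(focal_dir))
    theta = np.arccos(np.clip(cos_theta, -1.0, 1.0))
    bands = np.deg2rad([10, 25, 45, 70])   # widen resolution steps outward
    return np.digitize(theta, bands).clip(0, max_lod)
```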

Modular Design

Architecturally, MMRF is modular, with a dedicated processor for each modality (visual, textual, auditory), mirroring 3D-GPT’s multi-agent approach [19] for future scalability. MMRF thus manages the rendering of our 3D assets and open world. Graphically, we will convert the open world into a VR asset using game-engine plugins, while a separate physics engine handles the customised physics.
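
The skeleton below illustrates this modular layout: one registered processor per modality feeding a shared encoding step, so new modalities (or agents, in the 3D-GPT sense) can be added without touching existing ones. The class name and placeholder encoders are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict

@dataclass
class MMRFPipeline:
    """Illustrative skeleton of the modular design described above;
    processor internals and the downstream fusion rule are assumed."""
    processors: Dict[str, Callable[[Any], Any]] = field(default_factory=dict)

    def register(self, modality: str, processor: Callable[[Any], Any]) -> None:
        # e.g. "visual", "textual", "auditory"
        self.processors[modality] = processor

    def encode(self, inputs: Dict[str, Any]) -> Dict[str, Any]:
        # Each modality is handled independently, keeping the pipeline
        # open to future processors without changing existing ones.
        return {m: self.processors[m](x)
                for m, x in inputs.items() if m in self.processors}

pipeline = MMRFPipeline()
pipeline.register("textual", lambda text: text.lower().split())      # placeholder
pipeline.register("auditory", lambda samples: sum(samples) / len(samples))
codes = pipeline.encode({"textual": "reinforced concrete beam",
                         "auditory": [0.0, 0.2, -0.1]})
```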
