Methodology: Creation and Preprocessing of Datasets

How we create and preprocess datasets

The datasets listed in Table 1 will be pre-processed as follows:

Textual datasets will undergo tokenisation, stemming, and lemmatisation, with a focus on physics-related keywords relevant to the analysis of structural integrity and material properties.
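As a rough illustration of the text steps, the sketch below tokenises a sentence, normalises each token, and keeps only physics-related terms. The keyword list, the lemma table, and the suffix-stripping stemmer are all hypothetical stand-ins chosen for this example; a real pipeline would more likely rely on an established NLP library and a curated vocabulary.

```python
import re

# Hypothetical keyword list and lemma table, for illustration only.
PHYSICS_KEYWORDS = {"stress", "strain", "load", "fracture", "beam", "tensile"}
LEMMAS = {"stresses": "stress", "fractured": "fracture", "beams": "beam"}
SUFFIXES = ("ing", "ed", "es", "s")


def tokenise(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def stem(token):
    """Naive suffix stripping; a stand-in for a real stemmer, not Porter's algorithm."""
    for suffix in SUFFIXES:
        # Do not strip the final "s" of words such as "stress".
        if suffix == "s" and token.endswith("ss"):
            continue
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def normalise(token):
    """Prefer the (toy) lemma table; otherwise fall back to the naive stemmer."""
    return LEMMAS.get(token, stem(token))


def physics_terms(text):
    """Return the normalised tokens that appear in the physics keyword list."""
    return [tok for tok in map(normalise, tokenise(text)) if tok in PHYSICS_KEYWORDS]


if __name__ == "__main__":
    print(physics_terms("The beams fractured under tensile stresses and heavy loading."))
```

Trying the lemma table before the stemmer means irregular or ambiguous forms can be handled explicitly, with suffix stripping used only as a fallback.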

Image datasets will be standardised, converted to grayscale, and normalised. Structural feature extraction will then be applied to analyse the structural integrity and failure modes of various structures, while stress-strain graphs will be converted into usable synthetic material data.
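The image steps could look roughly like the sketch below, which treats an image as nested lists of (R, G, B) pixels. It assumes that "standardised" means resizing to a common size, that grayscale conversion uses the common ITU-R BT.601 luminance weights, and that normalisation scales 8-bit intensities to the range 0 to 1. The function names are illustrative, and a real pipeline would typically use an imaging library instead.

```python
def resize_nearest(image, height, width):
    """Standardise the image size using nearest-neighbour sampling."""
    src_h, src_w = len(image), len(image[0])
    return [
        [image[y * src_h // height][x * src_w // width] for x in range(width)]
        for y in range(height)
    ]


def to_grayscale(rgb_image):
    """Convert (R, G, B) pixels to luminance with ITU-R BT.601 weights."""
    return [
        [0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
        for row in rgb_image
    ]


def normalise(gray_image, max_value=255.0):
    """Scale intensities from [0, max_value] to [0.0, 1.0]."""
    return [[pixel / max_value for pixel in row] for row in gray_image]


def preprocess(rgb_image, height, width):
    """Resize, convert to grayscale, then normalise."""
    return normalise(to_grayscale(resize_nearest(rgb_image, height, width)))


if __name__ == "__main__":
    tiny = [
        [(255, 255, 255), (0, 0, 0)],
        [(255, 0, 0), (0, 0, 255)],
    ]
    for row in preprocess(tiny, 4, 4):
        print([round(value, 3) for value in row])
```

Resizing before the grayscale conversion gives every image the same shape before any feature extraction is attempted.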

From video datasets, we will extract footage segments of structural collapses and apply sequence formation and encoding to preserve contextual relevance. Audio extracted across multiple frames will be converted into spectrograms using Fourier transforms, supporting the production of realistic impact audio effects.
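To make the audio step more concrete, here is a minimal sketch of how a spectrogram can be built from overlapping audio frames with a discrete Fourier transform. The frame size, hop length, and Hann window are assumptions made for illustration, and the direct DFT is used only to keep the example dependency-free; a practical implementation would use an FFT routine from a numerical library.

```python
import cmath
import math


def hann(n):
    """Symmetric Hann window of length n, used to reduce spectral leakage."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def dft_magnitudes(frame):
    """Magnitudes of the non-negative frequency bins of a direct DFT."""
    n = len(frame)
    return [
        abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]


def spectrogram(samples, frame_size=64, hop=32):
    """Split the signal into overlapping windowed frames and transform each one."""
    window = hann(frame_size)
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = [samples[start + i] * window[i] for i in range(frame_size)]
        frames.append(dft_magnitudes(frame))
    return frames


if __name__ == "__main__":
    # Synthetic test tone: 8 cycles per 64-sample frame, so energy peaks in bin 8.
    tone = [math.sin(2 * math.pi * 8 * t / 64) for t in range(256)]
    spec = spectrogram(tone)
    peak_bins = [max(range(len(f)), key=f.__getitem__) for f in spec]
    print(len(spec), "frames, peak bins:", peak_bins)
```

Each row of the result is one time frame and each column one frequency bin, which is the layout usually rendered as a spectrogram image.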

| Type of Dataset | Modality | Example (image source) | Figure and Reference |
| --- | --- | --- | --- |
| Material Textures & Structural Properties | Text, Image | Materials Used in Building Construction - Construction Tuts | Figure 2: Different Materials Used in Structures [7] |
| Stress & Strain Relationships of Materials | Text, Graph (Image) | What is a Stress-Strain Curve? (SimWiki, SimScale) | Figure 3: Stress-Strain Curve [21] |
| Labelled Images of Structures | Image | Traditional beam formwork | Figure 4: Labelled Diagram of a Structure [3] |
| Live Footage of Structures | Video | Minneapolis Bridge Collapse: 10 Years Later, Infrastructure Still In Decline (NPR) | Figure 5: Interstate 35W Bridge Collapse [18] |
| Game Footage of Structures | Video | COLLAPSES! (Cities Skylines Natural Disasters) - YouTube | Figure 6: Cities: Skylines Game [5] |
| Impact Audio | Audio | Bank Implosion Progression - building implosion sound effects; Bank Implosion Spectrogram | Figure 7: Spectrogram of Implosion of Pasadena State Bank [6] |
| Structural Effects of Natural Events | Text, Video | Turkey earthquake: Why did so many buildings collapse? - BBC News | Figure 8: Turkey Earthquake Building Collapse [2] |

Table 1: Potential Datasets
