# Methodology: Creation and Preprocessing of Datasets

### **Methodology**

### **Creating and Processing of Datasets**

The datasets will be pre-processed as follows (Table 1). Textual datasets will undergo tokenisation, stemming, and lemmatization, focusing on physics-related keywords for structural integrity and material properties analysis. Image datasets will be standardized, converted to grayscale, and normalized. Structural feature extraction will be applied to analyse the structural integrity and failures of various structures, while stress-strain graphs will be converted into usable synthetic material data. From video datasets, we will extract footage segments of structural collapses via sequence formation and encoding to ensure contextual relevance. Audio extracted across multiple frames will be converted to spectrograms via Fourier transforms to produce realistic impact audio effects.

| **Types of Datasets** | **Examples** |
| --- | --- |
| Material Textures & Structural Properties (Text, Image) | Materials Used in Building Construction - Construction Tuts |