The Urban Soundscapes of the World database currently contains about 130 high-quality audiovisual recordings made in 9 cities worldwide; more recordings are underway. One-minute fragments are available online. Contact us if you need longer fragments. The recordings are free to use for research and educational purposes.
Each recording consists of a 360-degree video file (4096 × 2048 pixels, 30 fps), a 4-channel first-order ambisonics (ACN/SN3D) audio file and/or a binaural audio file. All audio files have a sample rate of 48 kHz and are 24-bit PCM encoded. All audio and video files are time-synchronized. A YouTube preview is available for all recordings.
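Before processing, it can be useful to confirm that a downloaded audio file matches the stated format (48 kHz, 24-bit PCM, 4 channels for ambisonics or 2 for binaural). The sketch below is a minimal, hypothetical helper that reads the `fmt ` chunk of a RIFF/WAVE header using only the standard library; it assumes the files are plain RIFF WAV (possibly with extra chunks such as Broadcast Wave `bext`) and accepts both the PCM (1) and WAVE_FORMAT_EXTENSIBLE (0xFFFE) format tags. The names `read_wav_format` and `matches_dataset_spec` are illustrative, not part of any dataset tooling.

```python
import struct
from typing import NamedTuple


class WavFormat(NamedTuple):
    format_tag: int        # 1 = PCM, 0xFFFE = WAVE_FORMAT_EXTENSIBLE
    channels: int
    sample_rate: int
    bits_per_sample: int


def read_wav_format(data: bytes) -> WavFormat:
    """Return the fields of the first 'fmt ' chunk in a RIFF/WAVE byte string."""
    if data[0:4] != b"RIFF" or data[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    pos = 12
    while pos + 8 <= len(data):
        chunk_id = data[pos:pos + 4]
        (chunk_size,) = struct.unpack_from("<I", data, pos + 4)
        if chunk_id == b"fmt ":
            # tag, channels, sample rate, byte rate, block align, bits per sample
            tag, channels, rate, _byte_rate, _align, bits = struct.unpack_from(
                "<HHIIHH", data, pos + 8)
            return WavFormat(tag, channels, rate, bits)
        # Skip this chunk; odd-sized chunks are followed by one pad byte.
        pos += 8 + chunk_size + (chunk_size & 1)
    raise ValueError("no 'fmt ' chunk found")


def matches_dataset_spec(fmt: WavFormat, expected_channels: int) -> bool:
    """Check against 48 kHz / 24-bit PCM with the expected channel count."""
    return (fmt.format_tag in (1, 0xFFFE)
            and fmt.channels == expected_channels
            and fmt.sample_rate == 48000
            and fmt.bits_per_sample == 24)


# Usage (hypothetical file name):
#   with open("recording_ambisonics.wav", "rb") as fh:
#       fmt = read_wav_format(fh.read(65536))
#   print(fmt, matches_dataset_spec(fmt, expected_channels=4))
```

Reading only the first 64 KiB is enough when the `fmt ` chunk precedes the audio data, which is the usual layout.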
Recordings are made during the day, in favorable weather conditions with little to no wind. Note that each recording represents only a snapshot in time.
Simultaneous audio and video recordings are made using a portable, stationary recording setup, as shown in the picture. The setup consists of the following components (from top to bottom):
- First-order ambisonics: Core Sound TetraMic with windshield and Tascam DR-680 MkII 4-channel recording device;
- 360-degree video camera: GoPro Omni spherical camera system;
- Binaural audio: HEAD acoustics HSU III.2 artificial head with windshield and SQobold 2-channel recording device.
The ears of the artificial head, the video camera system and the ambisonics microphone are located at heights of about 1.5 m, 1.7 m and 1.9 m, respectively. The recording setup is highly portable and takes only about 10 minutes to assemble or disassemble.
At each location, the recording system is oriented towards the most important sound source and/or the most prominent visual scene. This orientation defines the initial frontal viewing direction of the 360-degree video and ambisonics recordings, and the fixed orientation of the binaural recordings.
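Because the frontal direction of the ambisonics recordings is only an initial one, the sound field can be turned afterwards. The sketch below illustrates a yaw (horizontal) rotation of a first-order sound field in ACN channel order (W, Y, Z, X), assuming the usual ambisonics coordinate convention (X forward, Y left, Z up). For first-order SN3D, the X, Y and Z channels share the same normalization, so a plain 2-D rotation of X and Y is sufficient. The function name `rotate_yaw_foa` and the frame representation are illustrative assumptions, not part of the dataset.

```python
import math
from typing import Iterable, List, Tuple

# One sample frame in ACN order: (W, Y, Z, X), SN3D normalization assumed.
Frame = Tuple[float, float, float, float]


def rotate_yaw_foa(frames: Iterable[Frame], alpha: float) -> List[Frame]:
    """Rotate a first-order ACN/SN3D sound field by alpha radians about Z.

    Positive alpha turns the sound field counter-clockwise seen from above,
    so a source straight ahead moves towards the left.
    """
    c, s = math.cos(alpha), math.sin(alpha)
    rotated = []
    for w, y, z, x in frames:
        # W (omnidirectional) and Z (vertical) are unchanged by a yaw rotation.
        new_x = x * c - y * s
        new_y = x * s + y * c
        rotated.append((w, new_y, z, new_x))
    return rotated


# A source straight ahead (X = 1, Y = 0) rotated by +90 degrees ends up
# on the left (X close to 0, Y = 1).
front = [(1.0, 0.0, 0.0, 1.0)]
print(rotate_yaw_foa(front, math.pi / 2))
```

In practice the channels would be read from the 4-channel file as arrays and rotated as a whole; the per-frame loop is kept here for clarity.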
All audio files are calibrated to the same reference, so once your playback setup is calibrated, it can be used to play all files. The Excel file below contains the one-minute LAeq values of the binaural recordings (for the average of the left and right channels, and for the left and right channels separately). These binaural values are the most representative of the LAeq at the recording location.
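As a reminder of what an LAeq value is, the sketch below computes the equivalent continuous level of a pressure signal: 10·log10 of the mean-square pressure over the reference pressure squared (20 µPa). It assumes the samples have already been converted to pascal and A-weighted (an A-weighting filter per IEC 61672 is not implemented here). The `calibration_factor` in `to_pascal` is a hypothetical placeholder; the actual calibration reference of the files is not given in this text.

```python
import math
from typing import List, Sequence

P_REF = 20e-6  # reference sound pressure in Pa (20 µPa)


def to_pascal(samples: Sequence[float], calibration_factor: float) -> List[float]:
    """Scale normalized sample values to Pa (calibration_factor is hypothetical)."""
    return [s * calibration_factor for s in samples]


def leq_db(pressure_pa: Sequence[float]) -> float:
    """Equivalent continuous sound pressure level in dB re 20 µPa.

    If the input has already been A-weighted, the result is the LAeq.
    """
    if not pressure_pa:
        raise ValueError("empty signal")
    mean_square = sum(p * p for p in pressure_pa) / len(pressure_pa)
    return 10.0 * math.log10(mean_square / P_REF ** 2)


# A signal with an RMS pressure of 1 Pa corresponds to about 94 dB.
print(round(leq_db([1.0, -1.0, 1.0, -1.0]), 2))  # 93.98
```

For a one-minute fragment at 48 kHz, the mean square is simply taken over all 2,880,000 samples of a channel.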
The second column presents the LAeq of the mono mix (superposition) of the left and right channels of the binaural recording. This is not necessarily the same as the (energetic) average of the LAeq values of the left and right channels separately, because the two channels are correlated to some degree, depending on the diffuseness of the sound field. Consequently, the (energetic) average of the third and fourth columns will not always correspond exactly to the value in the second column, although the difference is usually small. Roughly speaking, the larger the difference, the less correlated the sound at both ears.
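The comparison between the second column and the third and fourth columns can be reproduced with the sketch below. It assumes the mono mix is the sample-wise average (L + R) / 2, consistent with the "average of the left and right channels" wording; if it were a plain sum L + R instead, every mono value would shift by a constant 20·log10(2) ≈ 6.02 dB. The energetic average of two levels is 10·log10 of the mean of 10^(L/10). Function names are illustrative, and the inputs are assumed to be A-weighted pressure signals in Pa.

```python
import math
from typing import List, Sequence

P_REF = 20e-6  # reference sound pressure in Pa


def leq_db(pressure_pa: Sequence[float]) -> float:
    """Equivalent continuous level in dB re 20 µPa."""
    mean_square = sum(p * p for p in pressure_pa) / len(pressure_pa)
    return 10.0 * math.log10(mean_square / P_REF ** 2)


def energetic_average_db(levels: Sequence[float]) -> float:
    """Average levels on an energy basis, not arithmetically in dB."""
    return 10.0 * math.log10(sum(10 ** (l / 10) for l in levels) / len(levels))


def mono_mix(left: Sequence[float], right: Sequence[float]) -> List[float]:
    """Assumed mono mix: sample-wise average of both ear signals."""
    return [(l + r) / 2 for l, r in zip(left, right)]


left = [1.0, -1.0, 1.0, -1.0]
right = [1.0, 1.0, -1.0, -1.0]  # uncorrelated with `left`, same level
mono = leq_db(mono_mix(left, right))
avg = energetic_average_db([leq_db(left), leq_db(right)])
print(round(avg - mono, 2))  # 3.01
```

With identical left and right signals the two values coincide; with the uncorrelated pair above, the mono-mix level falls about 3 dB below the energetic average of the two channels.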
More details on the recording setup and protocol can be found in our publications. Note that some publications contain LAeq values that were calculated from the ambisonics recordings (W channel). Because there is no standard way of calculating LAeq values from ambisonics recordings, these values are probably less suitable for most purposes.