OpenBreastUS: Benchmarking Neural Operators for Wave Imaging Using Breast Ultrasound Computed Tomography

Zhijun Zeng1,2, Youjia Zheng1, Hao Hu1, Zeyuan Dong3, Yihang Zheng1, Xinliang Liu4, Jinzhuo Wang1, Zuoqiang Shi2, Linfeng Zhang5, Yubing Li3 and He Sun1

1Peking University 2Tsinghua University 3Chinese Academy of Sciences
4King Abdullah University of Science and Technology 5DP Technology
Corresponding author

Schematic diagram of a USCT system and the OpenBreastUS dataset. The imaging target is placed inside an annular transducer array, with each transducer emitting waves sequentially while the others act as receivers. The OpenBreastUS dataset includes anatomically realistic human breast phantoms and their corresponding wavefields at different frequencies.

Dataset Specifications

OpenBreastUS is a large-scale wave equation dataset designed to bridge the gap between theoretical equations and practical imaging applications, consisting of 8,000 breast phantoms and 16,384,000 wavefields. To represent the distribution of diverse human breast types, the dataset is divided into four groups, each corresponding to a specific breast density type: heterogeneous (HET), fibroglandular (FIB), all fatty (FAT), and extremely dense (EXD).

Breast Type             Frequency (MHz)   Phantoms   Storage
Heterogeneous (HET)     0.30-0.65         2000       7.2 TB
Fibroglandular (FIB)    0.30-0.65         3000       10.8 TB
Fatty (FAT)             0.30-0.65         2000       7.2 TB
Extremely Dense (EXD)   0.30-0.65         1000       3.6 TB
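As a quick consistency check, the table's totals can be recomputed in a few lines of Python. This is only a sketch: the name `GROUPS` is illustrative, and the per-phantom storage figure is derived here from the table (assuming 1 TB = 1000 GB), not stated by the authors.

```python
# Phantom counts and storage (TB) per breast type, copied from the table above.
GROUPS = {
    "HET": (2000, 7.2),
    "FIB": (3000, 10.8),
    "FAT": (2000, 7.2),
    "EXD": (1000, 3.6),
}

total_phantoms = sum(n for n, _ in GROUPS.values())
total_storage_tb = sum(tb for _, tb in GROUPS.values())

# Derived figure (an inference, assuming 1 TB = 1000 GB): storage per phantom.
gb_per_phantom = {name: tb * 1000 / n for name, (n, tb) in GROUPS.items()}

print("phantoms:", total_phantoms)
print("storage (TB):", round(total_storage_tb, 1))
print("GB per phantom:", {k: round(v, 2) for k, v in gb_per_phantom.items()})
```

The phantom counts sum to the 8,000 quoted in the text, and every group works out to the same storage per phantom, which suggests the simulation settings are uniform across breast types.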

The wavefields are simulated using parameters from a real annular USCT system. We focus on 8 frequencies between 300 kHz and 650 kHz, sampled at 50 kHz intervals, resulting in ROIs with approximately 50 to 100 wavenumbers.
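The frequency grid and the wavefield count can be cross-checked as follows. This is a minimal sketch using only the numbers quoted in this page; the per-frequency count it prints is inferred by division and is not stated in the text (it would be consistent with one wavefield per emitting transducer, but that reading is an assumption).

```python
# Frequency grid: 300 kHz to 650 kHz inclusive, in 50 kHz steps.
freqs_khz = list(range(300, 651, 50))

N_PHANTOMS = 8_000          # from the dataset description
N_WAVEFIELDS = 16_384_000   # from the dataset description

# Inferred, not stated in the text: wavefields per phantom, overall and
# at each individual frequency.
per_phantom = N_WAVEFIELDS // N_PHANTOMS
per_phantom_per_freq = per_phantom // len(freqs_khz)

print(freqs_khz)
print(len(freqs_khz), per_phantom, per_phantom_per_freq)
```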

Generation Steps

1. Download the source data from Hugging Face

OpenBreastUS Dataset

2. Prepare speed data

>> run split_data.m

output:
your_project_path/organ_speed/train/train_xx.mat
your_project_path/organ_speed/test/test_xx.mat
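Before moving on, it may help to confirm that the split produced files in both folders. The helper below is a hypothetical sketch, not part of the project: the function name `list_split_files` is invented here, and it assumes only the directory layout and the `train_`/`test_` file prefixes shown above.

```python
from pathlib import Path


def list_split_files(project_path):
    """Return sorted .mat files found under organ_speed/train and organ_speed/test."""
    root = Path(project_path) / "organ_speed"
    return {
        split: sorted((root / split).glob(f"{split}_*.mat"))
        for split in ("train", "test")
    }


if __name__ == "__main__":
    # Replace the placeholder with your actual project directory.
    for split, paths in list_split_files("your_project_path").items():
        print(split, len(paths), "files")
```

An empty list for either split suggests that split_data.m did not finish or wrote to a different project path.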

3. Launch the executable file

The executable provides a runtime interface for configuring the parameters that control data generation. Set the speed path to the output directory from step 2; the program then writes the generated data and detailed log files to the output directory you specify.

Recommended configuration

16-core CPU; MATLAB R2020b or later; 64 GB RAM
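A rough way to compare a machine against these recommendations is sketched below. It is illustrative only: the function name `check_host` and the 10% RAM tolerance are assumptions made here, `os.cpu_count()` reports logical CPUs rather than physical cores, and the RAM query relies on `os.sysconf`, which is not available on every platform. The MATLAB version is not checked.

```python
import os

RECOMMENDED_CORES = 16
RECOMMENDED_RAM_GB = 64


def check_host():
    """Compare this machine's logical CPU count and RAM with the recommendation."""
    cores = os.cpu_count() or 0
    ram_gb = None  # stays None where os.sysconf is unavailable (e.g. Windows)
    try:
        ram_gb = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 1024**3
    except (AttributeError, ValueError, OSError):
        pass
    return {
        "cores": cores,
        "cores_ok": cores >= RECOMMENDED_CORES,
        "ram_gb": ram_gb,
        # Allow 10% slack: a nominal 64 GB machine usually reports a bit less.
        "ram_ok": None if ram_gb is None else ram_gb >= 0.9 * RECOMMENDED_RAM_GB,
    }


print(check_host())
```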