FlySight Collaborates with Data Machine Intelligence to Validate the Gap Between Synthetic and Real-World Data

May 28, 2024

FlySight and DMI Benchmark Synthetic Data, Achieving Training Results on Par with the Renowned Singapore Maritime Dataset

The core objective of this collaboration is to ascertain how closely DMI’s virtual representations can mirror reality, from fusing virtual scenarios with real-world data to rigorously validating the “GAP” between the virtual simulations created by DMI and real-world scenarios. Narrowing this gap enhances the fidelity and applicability of virtual simulations.

The essence of this collaboration lies in accurately determining and minimizing the discrepancies between synthetic datasets and real-world observations. Through a meticulous comparison process, this initiative aims to refine the precision of virtual models, ensuring they closely align with real-world data. This endeavor is crucial for fields that rely on exacting standards of data validation and model reliability, providing a concrete basis for the application of virtual models in real-world scenarios.

THE CONTEXT:
Overcoming Data Challenges in AI – The Role of Synthetic Training Data

In the field of safety-critical AI development, the scarcity of suitable training data represents a significant obstacle. This is especially pertinent for AI systems that must be prepared for events that are not only infrequent but could also pose substantial risks to human safety. For example, an autonomous Search and Rescue (SAR) system needs to reliably identify sinking ships. To effectively train a computer vision system for this task, images of ships in distress, often in extreme weather conditions, are essential.

To avoid the necessity of sinking ships just for photographic purposes, generating artificial training data presents a promising alternative. This method could significantly simplify the development process and reduce costs.

Recognizing this potential, FlySight and Data Machine Intelligence have conducted a test to ascertain whether synthetic data can be produced to a standard where its training effectiveness is on par with that of actual datasets.

Targeting Precision: Bridging the Virtual-Real Divide

The metric leveraged by FlySight in this innovative venture is the Mean Average Precision (MAP). This metric serves as a critical tool in quantifying the accuracy of object detection models by gauging their precision in identifying and classifying various targets within images. The application of MAP is instrumental in achieving a standardized, objective measure of how virtual data stands up against real-world counterparts, thus facilitating a thorough validation of the “GAP” between virtual and real.
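To make the metric concrete, the following is a minimal sketch of how average precision could be computed for a single class, with MAP taken as the mean over classes. It is an illustration only: the IoU threshold of 0.5, the all-point interpolation, the box format (x1, y1, x2, y2), and all function names are assumptions, since the article does not state which MAP variant or tooling FlySight used. For brevity, all boxes are treated as belonging to one image.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed format


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def average_precision(detections: List[Tuple[float, Box]],
                      ground_truth: List[Box],
                      iou_threshold: float = 0.5) -> float:
    """AP for one class; detections are (confidence, box) pairs."""
    if not ground_truth:
        return 0.0
    matched = [False] * len(ground_truth)
    hits: List[bool] = []
    # Visit detections from most to least confident.
    for _, box in sorted(detections, key=lambda d: d[0], reverse=True):
        best_iou, best_idx = 0.0, -1
        for idx, gt in enumerate(ground_truth):
            if matched[idx]:
                continue
            overlap = iou(box, gt)
            if overlap > best_iou:
                best_iou, best_idx = overlap, idx
        if best_idx >= 0 and best_iou >= iou_threshold:
            matched[best_idx] = True  # each ground-truth box matches once
            hits.append(True)
        else:
            hits.append(False)

    # Precision and recall after each detection.
    precisions, recalls, tp = [], [], 0
    for rank, hit in enumerate(hits, start=1):
        tp += int(hit)
        precisions.append(tp / rank)
        recalls.append(tp / len(ground_truth))

    # Make precision non-increasing from right to left (interpolation).
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])

    # Area under the interpolated precision-recall curve.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(per_class_ap: Dict[str, float]) -> float:
    """MAP is the unweighted mean of the per-class AP values."""
    return sum(per_class_ap.values()) / len(per_class_ap)
```

In this sketch a detection counts as correct only if it overlaps a not-yet-matched ground-truth box by at least the threshold, so duplicate detections of the same target lower the precision.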

About the test

The initiative ventures into simulating complex scenarios where targets appear at differing distances and may be partially obscured by other targets or obstacles, like buoys. This nuanced approach aims to replicate real-world complexities within virtual simulations, offering a robust platform for validation.

The dataset pivotal to this collaboration has been meticulously captured under varied environmental conditions, predominantly during daylight hours. A commitment to realism drives the endeavor to simulate a broad spectrum of lighting conditions, alongside introducing elements of unpredictability, such as a 10% inclusion of images with light haze effects.

Scene setups are thoughtfully constructed, ranging from stationary cameras positioned on the shore to dynamic setups on board speed boats. In scenarios involving movement, particular attention is paid to adjusting camera angles to accurately capture the influence of environmental factors, such as wave movements, thus enriching the virtual data with real-world complexities.

Test results

We trained three different YOLO-based object detectors using these data variations:

Singapore real data
Synthetic data
Hybrid Dataset: 100 real data images selected randomly and all the synthetic data

The results show that the models trained on real data and on synthetic data perform comparably (MAP 31% vs. 32%), whereas the “hybrid model” performs considerably better (MAP 39%).
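The hybrid variation described above (100 randomly selected real images plus all synthetic images) could be assembled along the following lines. This is a sketch under stated assumptions: the function name, the file-name pattern, and the fixed random seed are hypothetical and not taken from the FlySight pipeline.

```python
import random
from typing import List, Sequence


def build_hybrid_dataset(real_images: Sequence[str],
                         synthetic_images: Sequence[str],
                         n_real: int = 100,
                         seed: int = 0) -> List[str]:
    """Combine a random subset of real images with all synthetic images."""
    rng = random.Random(seed)  # fixed seed so the selection is repeatable
    k = min(n_real, len(real_images))
    sampled_real = rng.sample(list(real_images), k=k)  # without replacement
    hybrid = sampled_real + list(synthetic_images)
    rng.shuffle(hybrid)  # avoid ordering the training list by data source
    return hybrid


# Hypothetical file names; the counts are the image totals quoted in the article.
real = [f"real_{i:04d}.jpg" for i in range(4062)]
synthetic = [f"synthetic_{i:04d}.jpg" for i in range(2893)]
hybrid = build_hybrid_dataset(real, synthetic)
```

Sampling without replacement keeps the 100 real images distinct, and the seed makes the random selection reproducible across training runs.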

Conclusion

The results confirm that, together, FlySight and Data Machine Intelligence were able to find a reliable and scalable way to generate synthetic data for AI model training, either in combination with a small amount of real data or on its own.

Based on these results, we will take the next steps and plan to expand our collaboration.

DMI Labs automation development platform

The synthetic dataset was created using the DMI Labs automation development platform. Data Machine Intelligence has leveraged its proprietary simulation engine, connected to Unreal Engine for rendering, to produce highly accurate datasets. By reproducing an environment similar to that captured in the Singapore Maritime Dataset, the simulation utilized a ray tracing engine adapted to emulate realistic camera characteristics. This process resulted in an annotated dataset with 20,000 instances, comprising 2,893 images of synthetic data and 4,062 images of real data.

Key Features of the Synthetic Dataset

Boat Categories and Frequencies: Four categories of boats were generated to match the frequencies observed in the real data.
Simulated Environment: A 5 km × 5 km field was created, where boats were placed randomly to reflect natural distribution.
Position Probability Enhancements: Parameters were set to increase the likelihood of finding boats in their natural harbor positions. Sailing boats, boats, and speedboats were positioned nearer to the camera, while container ships were placed further away. These thresholds were adjusted throughout the simulation.
Dynamic Boat Placement: Boats were placed and removed after each frame, creating high frame-to-frame variation.
Water and Weather Variability: Two water lines were simulated to vary water roughness, with parameters adjusted throughout. Sky conditions, sun positions, and weather elements were generated procedurally, with particular attention to fog, using four different models.
Realistic Occlusions: Static environment objects, such as buoys, were added in some frames to simulate real data conditions and introduce occlusions.
Bounding Box Adjustments: Encapsulated bounding boxes were merged when fully included in another of the same type.
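The bounding box adjustment listed above can be illustrated with a short sketch that drops any box lying entirely inside another box of the same category. This is one plausible reading of the rule, not DMI's actual implementation: the `LabeledBox` type, the category names, and the handling of identical boxes are assumptions made for the example.

```python
from typing import List, NamedTuple


class LabeledBox(NamedTuple):
    label: str  # boat category (hypothetical names below)
    x1: float
    y1: float
    x2: float
    y2: float


def contains(outer: LabeledBox, inner: LabeledBox) -> bool:
    """True if `inner` lies fully inside `outer` (borders may touch)."""
    return (outer.x1 <= inner.x1 and outer.y1 <= inner.y1
            and outer.x2 >= inner.x2 and outer.y2 >= inner.y2)


def merge_encapsulated(boxes: List[LabeledBox]) -> List[LabeledBox]:
    """Drop boxes that are fully enclosed by another box of the same label."""
    kept: List[LabeledBox] = []
    for i, box in enumerate(boxes):
        absorbed = False
        for j, other in enumerate(boxes):
            if i == j or other.label != box.label:
                continue
            if contains(other, box):
                # Identical boxes enclose each other: keep only the first one.
                if contains(box, other) and i < j:
                    continue
                absorbed = True
                break
        if not absorbed:
            kept.append(box)
    return kept


boxes = [
    LabeledBox("speedboat", 0, 0, 10, 10),
    LabeledBox("speedboat", 2, 2, 5, 5),       # inside the first speedboat box
    LabeledBox("container_ship", 3, 3, 4, 4),  # different category, so kept
]
merged = merge_encapsulated(boxes)
```

Only boxes of the same category absorb each other, so a small container ship that appears inside a speedboat's box remains a separate annotation.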

This advanced simulation process ensured that the synthetic dataset closely mirrored the real-world conditions of the reference dataset, providing a robust foundation for our benchmarking test.

“We are truly satisfied with the results we have achieved. The test shows the possibilities of synthetic data and has given us clear confirmation that the additional improvements on our roadmap are attainable and will provide real value. In the FlySight simulation, we used an optical camera, but there are technically no limits regarding the sensor types. For example, very shortly we will be able to provide simulation in thermal imaging, radar and lidar within the same scenario, to also train and assess sensor fusion.

“Our mission is to accelerate the development of safe and robust AI systems – having a fine-tuned dataset generation engine at hand is a big step forward in this direction.”

Matteo Marone

CTO Synthetic Data, Data Machine Intelligence

FlySight method of validation

Experiments were conducted using different sets of data to fine-tune the YOLO-X object detector. Specifically, fine-tuning was tested with real data only, synthetic data only, and a combination of both. All models were tested on real data. The results showed that models fine-tuned with real data alone and with synthetic data alone achieved similar performance (31% mean average precision). However, combining 100 randomly selected real images with synthetic data boosted the performance to 39%.

“The first thing that comes to mind when someone asks us to develop an AI method is: where do we get the data for training and testing? My first choice is always to look for a public dataset. However, I often encounter the same questions: Is the data sufficient in quantity? Is there enough variety in the data? Considering privacy and ownership, can I use this data for training and testing my algorithm? What if I need more data? Synthetic data generation has the potential to address all these issues and many more. Our tests with DMI have shown that this is not just a possibility for the future; it is a reality right now. We are very interested in continuing this collaboration and exploring new scenarios and contexts.”

Niccolò Camarlinghi

Head of Research, FlySight

Meet the DMI team at the ILA Berlin Air Show, June 5th-8th, Berlin ExpoCenter Airport, at the MSC booth

RELEASE:

Report Difesa

Ares Difesa
