CRBC News

From Concrete to Community: How Synthetic Data Makes Urban Digital Twins Human-Centered

Key idea: Synthetic data can close the human gap in urban digital twins by modeling residents’ movements and activities while protecting privacy.

These artificial datasets let planners simulate how diverse groups — including older adults and people with disabilities — experience public space and transit, and enable risk-free testing of policies. Trustworthy use requires validation against real-world decisions, routine bias audits, and direct community participation.

Urban leaders often talk about creating “smart” cities by building urban digital twins — high-resolution 3D models that map buildings, roads and utilities. Constructed with precise instruments such as cameras and LiDAR (light detection and ranging), these virtual replicas excel at reproducing a city’s physical form. But when digital twins focus only on infrastructure, they miss the most dynamic element of cities: people.

The missing piece: people and everyday behavior

Residents live in neighborhoods and move through streets, parks and transit systems every day. A digital twin that mirrors infrastructure but ignores how people use those spaces (how they walk, wait for buses or reach parks) offers an incomplete picture and cannot reliably guide equitable planning or complex policy decisions.

Why synthetic data helps

Directly using detailed personal data raises major privacy and legal concerns. Regulations such as the European Union’s General Data Protection Regulation (GDPR) limit broad sharing of sensitive personal information, which impedes collaboration and cumulative learning. Real-world datasets can also be biased and uneven: low-income neighborhoods often have sparse sensor coverage, causing models trained on those data to reproduce and even amplify existing inequities. Statistical weighting can partially correct for underrepresentation, but it has limits: it can only scale up the observations that exist and cannot recover behavior that was never recorded.
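To make the weighting point concrete, here is a minimal sketch of post-stratification weighting. The group names, population shares, sample counts and footfall figures are all hypothetical and chosen only for illustration; the function names are likewise invented for this example.

```python
def post_stratification_weights(population_share, sample_counts):
    """Weight each group by (population share) / (sample share)."""
    total = sum(sample_counts.values())
    weights = {}
    for group, pop_share in population_share.items():
        n = sample_counts.get(group, 0)
        if n == 0:
            # The limit of weighting: nothing to scale up if nothing was observed.
            raise ValueError(f"no observations for group {group!r}")
        weights[group] = pop_share / (n / total)
    return weights


def weighted_mean(group_means, sample_counts, weights):
    """Mean across groups after applying per-observation weights."""
    num = sum(weights[g] * sample_counts[g] * group_means[g] for g in group_means)
    den = sum(weights[g] * sample_counts[g] for g in group_means)
    return num / den


if __name__ == "__main__":
    # Hypothetical figures: low-income areas are 40% of residents
    # but supply only 10% of sensor readings.
    population_share = {"low_income": 0.4, "high_income": 0.6}
    sample_counts = {"low_income": 10, "high_income": 90}
    mean_footfall = {"low_income": 50.0, "high_income": 20.0}

    w = post_stratification_weights(population_share, sample_counts)
    print(weighted_mean(mean_footfall, sample_counts, w))
```

With these made-up numbers the unweighted mean footfall is 23, while the weighted mean is about 32. A group with no readings at all raises an error, which is the gap weighting cannot close.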

Synthetic data, meaning artificially generated records that mirror the statistical relationships and patterns of real-world observations, offers a way forward. It can preserve privacy, fill gaps in coverage and represent diverse behaviors and demographic groups without exposing individual identities.
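One very simple way to produce records that mirror observed statistical relationships is to fit a joint frequency table and sample from it. The sketch below assumes a tiny, hypothetical set of trips described by age group and travel mode; real generators are far more sophisticated, and sampling from coarse categories like this is not by itself a formal privacy guarantee.

```python
import random
from collections import Counter

# Hypothetical "real" observations: (age_group, travel_mode). Illustrative only.
REAL_TRIPS = [
    ("18-64", "bus"), ("18-64", "bus"), ("18-64", "walk"), ("18-64", "bike"),
    ("65+", "bus"), ("65+", "walk"), ("65+", "walk"), ("18-64", "walk"),
]


def fit_joint_distribution(records):
    """Estimate the share of each (age_group, travel_mode) combination."""
    counts = Counter(records)
    total = sum(counts.values())
    return {combo: n / total for combo, n in counts.items()}


def sample_synthetic(distribution, n, seed=0):
    """Draw n synthetic records from the fitted joint distribution."""
    rng = random.Random(seed)
    combos = list(distribution)
    weights = [distribution[c] for c in combos]
    return rng.choices(combos, weights=weights, k=n)


if __name__ == "__main__":
    dist = fit_joint_distribution(REAL_TRIPS)
    synthetic = sample_synthetic(dist, 10_000)
    synth_dist = fit_joint_distribution(synthetic)
    # Compare real and synthetic shares side by side.
    for combo in sorted(dist):
        print(combo, round(dist[combo], 3), round(synth_dist.get(combo, 0.0), 3))
```

Because the sampler draws from the joint table rather than from each attribute separately, the link between age group and travel mode carries over into the synthetic records.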

What synthetic human dynamics change

Adding realistic, synthetic patterns of walking, transit use and public-space occupancy transforms a digital twin from a static map into a dynamic simulation. Planners can test how elderly residents or people with disabilities would traverse redesigned streets, or how different populations respond to service changes, without exposing or endangering real people.

A practical example: Bogotá’s TransMilenio

In Bogotá, Colombia, planners used synthetic data to populate a digital twin of the TransMilenio bus rapid transit system. Rather than relying solely on limited or sensitive sensor feeds, they generated millions of simulated bus arrivals, vehicle speeds and queue lengths calibrated to the system’s real-world peak and off-peak patterns. That approach helped evaluate operations and scenarios while avoiding privacy risks and filling gaps in measured data.
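The article does not describe how the Bogotá arrivals were generated. As a rough illustration of the general idea, the sketch below draws bus arrival times from exponential headways whose mean differs between peak and off-peak hours. The headway values and peak windows are assumptions for this example, not TransMilenio figures.

```python
import random

# Hypothetical mean headways in minutes; not TransMilenio figures.
MEAN_HEADWAY_MIN = {"peak": 2.0, "off_peak": 6.0}


def period_for(minute_of_day):
    """Classify a time of day as peak or off-peak (assumed windows)."""
    hour = (minute_of_day // 60) % 24
    return "peak" if 6 <= hour < 9 or 16 <= hour < 19 else "off_peak"


def simulate_arrivals(start_min, end_min, seed=0):
    """Generate synthetic bus arrival times in minutes since midnight.

    Each headway is drawn using the rate in force at the previous arrival,
    so behavior right at a peak boundary is only approximate.
    """
    rng = random.Random(seed)
    arrivals = []
    t = float(start_min)
    while True:
        mean = MEAN_HEADWAY_MIN[period_for(int(t))]
        t += rng.expovariate(1.0 / mean)  # expovariate takes a rate, i.e. 1 / mean
        if t >= end_min:
            break
        arrivals.append(t)
    return arrivals


if __name__ == "__main__":
    arrivals = simulate_arrivals(5 * 60, 13 * 60)
    peak_hour = sum(7 * 60 <= a < 8 * 60 for a in arrivals)
    quiet_hour = sum(11 * 60 <= a < 12 * 60 for a in arrivals)
    print(len(arrivals), peak_hour, quiet_hour)
```

In a calibrated model, the mean headways would be estimated from measured peak and off-peak patterns rather than set by hand.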

Trust, validation and fairness

For synthetic data to be useful, planners must be able to trust it. One practical validation is to compare planning decisions derived from synthetic datasets with those that would result from real-world, privacy-sensitive data; if they align for the task at hand, the synthetic data is an acceptable stand-in. Beyond that, synthetic models should undergo routine audits to detect hidden biases and underrepresentation, and to ensure simulations — for example, evacuation plans — work for vulnerable groups such as older adults with limited mobility.
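The decision-comparison check can be sketched as follows. The example assumes a hypothetical planning rule (add service at any stop whose mean wait exceeds a threshold) and made-up wait times; the stop names, threshold and function names are illustrative only.

```python
def decide_extra_service(wait_times_by_stop, threshold_min=10.0):
    """Return the set of stops whose mean wait exceeds the threshold."""
    return {
        stop
        for stop, waits in wait_times_by_stop.items()
        if sum(waits) / len(waits) > threshold_min
    }


def decision_agreement(real, synthetic, threshold_min=10.0):
    """Share of common stops where both datasets lead to the same decision."""
    stops = real.keys() & synthetic.keys()
    if not stops:
        raise ValueError("the two datasets share no stops")
    real_flagged = decide_extra_service(real, threshold_min)
    synth_flagged = decide_extra_service(synthetic, threshold_min)
    same = sum((s in real_flagged) == (s in synth_flagged) for s in stops)
    return same / len(stops)


if __name__ == "__main__":
    # Hypothetical wait times in minutes at three stops.
    real = {"A": [12, 14, 11], "B": [4, 6, 5], "C": [9, 10, 12]}
    synthetic = {"A": [13, 12, 12], "B": [5, 5, 6], "C": [9, 9, 10]}
    print(decision_agreement(real, synthetic))
```

Here the two datasets agree on stops A and B but disagree on C, so agreement is two out of three. Whether that is good enough depends on the task; the same comparison can be run separately for each demographic group as a basic fairness audit.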

Involving communities and ethical practice

Ethical and effective use of synthetic data requires community involvement. Setting up citizen advisory boards and co-designing simulation scenarios with residents helps ensure models reflect lived experience and local priorities. Transparency about methods and clear governance around use are equally important to maintain trust.

Conclusion

By shifting focus from static infrastructure to dynamic models of human behavior, synthetic data can make urban digital twins more humane, equitable and useful. When combined with robust validation, fairness audits and community co-design, synthetic datasets enable safer, privacy-preserving experimentation and better-informed planning decisions.

Author: Wei Zhai, University of Texas at Arlington. Disclosure: Wei Zhai receives funding from the National Science Foundation.
