Apple Releases LiTo Large Model: Generate 3D Objects from Single Images, AI Highly Reproduces Multi-View Lighting and Shadows


Tech News 3/17: The technology media outlet 9to5Mac published a blog post yesterday (March 16) reporting that Apple’s AI research team has released a research paper demonstrating a breakthrough in the field of 3D reconstruction: the ability to reconstruct a complete 3D object from just a single flat image.

The paper describes a new model called LiTo (Surface Light Field Tokenization), which breaks the traditional requirement for multi-angle image inputs. After the 3D object is reconstructed, reflections, highlights, and other lighting effects remain physically realistic and consistent as the user switches between viewing angles.

The core of this breakthrough lies in the innovative application of “Latent Space.” In machine learning, latent space compresses complex information into multi-dimensional mathematical vectors, significantly reducing computational costs.

The LiTo model pioneers a unified 3D latent representation, encoding randomly sampled surface light field data into a compact set of vectors. This means the model doesn’t need to memorize every visual detail but instead describes objects mathematically, capturing both their physical shape and the underlying laws of how light interacts with their surfaces.
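The idea of encoding randomly sampled surface light field data into a compact set of vectors can be illustrated with a toy sketch. The sample format (surface point, view direction, RGB radiance), the token count, and the random linear projection below are all illustrative assumptions, not details from Apple's paper; a real tokenizer would use a learned encoder.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: N surface light-field samples, each pairing a
# 3D surface point, a view direction, and the observed RGB radiance.
N = 256
points = rng.uniform(-1, 1, (N, 3))          # surface positions
view_dirs = rng.normal(size=(N, 3))
view_dirs /= np.linalg.norm(view_dirs, axis=1, keepdims=True)
radiance = rng.uniform(0, 1, (N, 3))         # RGB seen from each direction

samples = np.concatenate([points, view_dirs, radiance], axis=1)  # (N, 9)

# "Tokenize" into a compact latent set: K tokens of dimension D stand in
# for the unified 3D latent representation. A random linear map replaces
# the learned encoder for illustration only.
K, D = 16, 32
W = rng.normal(size=(N * 9, K * D)) / np.sqrt(N * 9)
latent_tokens = (samples.reshape(-1) @ W).reshape(K, D)

print(latent_tokens.shape)  # (16, 32)
```

Whatever the real architecture, the key property is the one shown here: 256 raw samples with 9 values each collapse into a fixed-size set of vectors that downstream components can operate on.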

In terms of operation, the LiTo encoder is responsible for “compressing information,” transforming the geometric structures and view-dependent appearance features from input images into streamlined codes within the latent space.

Subsequently, the decoder performs the inverse operation, "decompressing" these latent codes to fully restore the 3D object. This bidirectional mechanism allows the model to accurately reproduce advanced lighting effects such as specular highlights and Fresnel reflections under complex lighting conditions.
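The compress/decompress roundtrip described above can be sketched with a minimal linear autoencoder. The feature dimensions and the SVD-based encoder below are stand-in assumptions for illustration; they show only the structural idea that a low-dimensional latent code can restore the original high-dimensional features when those features truly live on a low-dimensional subspace.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stand-in for view-dependent appearance features: 200 samples of
# 64-dim features that actually lie on an 8-dim subspace.
basis = rng.normal(size=(8, 64))
features = rng.normal(size=(200, 8)) @ basis

# Encoder ("compress"): project onto the top-8 principal directions
# found by SVD. A learned encoder would replace this.
U, S, Vt = np.linalg.svd(features, full_matrices=False)

def encode(x):
    return x @ Vt[:8].T      # 64-dim features -> 8-dim latent code

def decode(z):
    return z @ Vt[:8]        # 8-dim latent code -> 64-dim reconstruction

latent = encode(features)
recon = decode(latent)
err = np.max(np.abs(recon - features))
print(latent.shape, err < 1e-8)  # (200, 8) True
```

Because the toy features have rank 8, the 8-dim code reconstructs them exactly; the real model instead learns which latent directions preserve geometry and view-dependent lighting.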

To develop this model, Apple researchers trained it intensively using thousands of 3D objects rendered from 150 different viewpoints under three different lighting conditions. The system continuously sampled small data subsets to train the decoder to reconstruct complete objects under various lighting and viewing angles.
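The subset-sampling procedure described above can be sketched as follows. The object count and batch size are hypothetical (the article says only "thousands" of objects); the 150 viewpoints and three lighting conditions come from the source.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical dataset index: renders keyed by (object, viewpoint, lighting).
# 150 viewpoints and 3 lighting conditions per the article; 1000 objects
# is an assumed stand-in for "thousands".
n_objects, n_views, n_lights = 1000, 150, 3
dataset = [(o, v, l) for o in range(n_objects)
           for v in range(n_views) for l in range(n_lights)]

def sample_minibatch(batch_size=32):
    """Draw a small random subset of renders, as the training loop
    repeatedly does to teach the decoder to reconstruct complete objects
    across viewpoints and lighting conditions."""
    idx = rng.choice(len(dataset), size=batch_size, replace=False)
    return [dataset[i] for i in idx]

batch = sample_minibatch()
print(len(batch))  # 32
```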

Ultimately, the model can predict this 3D latent representation from just a single image. In official comparison tests released by Apple, LiTo significantly outperformed the existing TRELLIS model in multi-view lighting and shadow reconstruction.
