A new breakthrough from 2D to 3D! An in-depth analysis of AIGC technology: one article to understand the history and current state of 3D data generation

Babbitt

Author: Chengxi | Editor: Manman Zhou

Source: Silicon Rabbit Race

Over the past 18 months, AI-generated content (AIGC) has undoubtedly been the hottest topic in Silicon Valley's technology venture capital circles.

DALL-E (released January 2021)

Midjourney (released July 2022)

Stable Diffusion (released August 2022)

These 2D generative tools can convert a text prompt into an artistic image in just a few seconds. As such 2D AIGC tools evolve and advance, they are rapidly transforming the creative workflows of artists, designers, and game studios.

Where will AIGC's next breakthrough come? Many investors and industry veterans have offered the same prediction: 3D data generation.

We have noticed that 3D AIGC is retracing the development path that 2D AIGC has already traveled. In this article, we take a deeper look at AIGC's new breakthroughs in the field of 3D data, and consider how generative AI tools can improve the efficiency and creativity of 3D data generation.

01 Review of the rapid development of 2D AIGC

The development of 2D AIGC can be briefly summarized in three stages:

Phase 1: Smart Image Editing

As early as 2014, with the introduction of generative adversarial networks (GANs, with notable follow-up work such as StyleGAN) and variational autoencoders (VAEs, with follow-up work such as VQ-VAE and alignDRAW), AI models began to be widely used for the intelligent generation and editing of 2D images. Early AI models were mainly used to learn relatively simple image distributions or to perform simple edits. Common applications included face generation, image style transfer, image super-resolution, image completion, and controllable image editing.

However, these early generation/editing networks had very limited multimodal interaction with text. Moreover, GANs are notoriously difficult to train, often running into mode collapse and instability; the generated data tends to lack diversity, and model capacity caps the scale of data that can be exploited. VAEs, for their part, often produce blurry images.
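To make that training difficulty concrete, below is a minimal sketch of one GAN training step in PyTorch. Everything in it (network shapes, batch size, the random stand-in for real data) is an illustrative assumption rather than the setup of any paper mentioned above; the point is the alternating discriminator/generator updates, whose fragile balance is exactly what leads to the mode collapse and instability described.

```python
# Minimal GAN training step (illustrative sketch; all shapes/hyperparameters assumed).
import torch
import torch.nn as nn

latent_dim, data_dim, batch = 64, 784, 32  # e.g., flattened 28x28 images

G = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                  nn.Linear(256, data_dim), nn.Tanh())
D = nn.Sequential(nn.Linear(data_dim, 256), nn.LeakyReLU(0.2),
                  nn.Linear(256, 1))

opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

real = torch.rand(batch, data_dim) * 2 - 1  # stand-in for a batch of real data

# Discriminator update: push D(real) toward 1 and D(fake) toward 0.
fake = G(torch.randn(batch, latent_dim)).detach()  # detach: don't update G here
loss_d = bce(D(real), torch.ones(batch, 1)) + bce(D(fake), torch.zeros(batch, 1))
opt_d.zero_grad()
loss_d.backward()
opt_d.step()

# Generator update: push D(G(z)) toward 1, i.e., try to fool the discriminator.
loss_g = bce(D(G(torch.randn(batch, latent_dim))), torch.ones(batch, 1))
opt_g.zero_grad()
loss_g.backward()
opt_g.step()
# If either player "wins" too fast, gradients vanish or G collapses to a few modes.
```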

Phase 2: The Leap of Text-to-Image Models

With the breakthrough of diffusion-based generation techniques, and with the emergence and growth of large-scale multimodal datasets (such as the LAION dataset) and multimodal representation models (such as the CLIP model released by OpenAI), the field of 2D image generation made significant progress in 2021. Image generation models began to interact deeply with text, and large-scale text-to-image models made a stunning debut.
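As a concrete picture of what a multimodal representation model like CLIP provides, the sketch below scores one image against two candidate captions using the public OpenAI checkpoint through the Hugging Face `transformers` API. The image path and captions are hypothetical placeholders.

```python
# Scoring an image against candidate texts with CLIP (hypothetical inputs).
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("bird.png")  # hypothetical local image
texts = ["a photo of a bluebird", "a photo of a dog"]

inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)

# logits_per_image holds image-text similarity scores; softmax turns them
# into a probability distribution over the candidate captions.
probs = outputs.logits_per_image.softmax(dim=-1)
print(probs)  # high probability on the caption that matches the image
```

It is this shared text-image embedding space that lets generation models be steered by text prompts.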

When OpenAI released DALL-E in early 2021, AIGC technology truly began to show its great commercial potential. DALL-E could generate realistic and complex images from arbitrary text prompts with a greatly improved success rate. Within a year, a large number of text-to-image models quickly followed, including DALL-E 2 (upgraded in April 2022) and Imagen (released by Google in May 2022). Although these technologies were not yet efficient enough to help creators produce content that could go straight into production, they attracted public attention and stimulated the creativity and production potential of artists, designers, and game studios.

Phase 3: From Amazing to Production-Ready

With improved technical details and iterative engineering optimization, 2D AIGC developed rapidly. By the second half of 2022, models such as Midjourney and Stable Diffusion had become popular AIGC tools. Driven by large-scale training datasets, the performance of AIGC techniques in real-world applications has benefited early adopters in the media, advertising, and gaming industries. In addition, the emergence of fine-tuning techniques for large models (such as ControlNet and LoRA) lets people customize and extend large AI models with only a small amount of training data to fit their actual needs, better adapting them to specific applications (such as anime-style stylization, logo generation, and QR-code image generation).
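To illustrate the idea behind this kind of lightweight customization, here is a conceptual LoRA sketch in PyTorch: the pretrained weight stays frozen and only a low-rank update is trained. The rank, scaling, and layer sizes are illustrative assumptions, not the settings of any released model.

```python
# Conceptual LoRA: y = W x + (alpha/r) * B A x, with W frozen and A, B trainable.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # the pretrained weight is never touched
        # Low-rank factors: A projects down to `rank`, B projects back up.
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))  # zero init: no change at start
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(768, 768))
y = layer(torch.randn(2, 768))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)  # only A and B train: a tiny fraction of the base layer's weights
```

Because only the small factors are trained, a handful of examples is often enough to specialize a large model, which is what makes the customization described above practical.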

With AIGC tools, ideation and prototyping now take hours or less in many cases, rather than the days or weeks they used to take. While most professional graphic designers still modify or recreate AI-generated sketches, it is increasingly common for personal blogs and advertisements to use AI-generated images directly.

Text-to-image results from alignDRAW, DALL-E 2, and Midjourney, compared.

Beyond text-to-image conversion, 2D AIGC keeps seeing new developments. For example, Midjourney and startups such as Runway, along with research systems such as Phenaki, are developing text-to-video capabilities. In addition, Zero-1-to-3 has proposed a method for generating views of an object from other angles given a single 2D image of it.

Due to the growing demand for 3D data in the gaming and robotics industries, the current cutting-edge research on AIGC is gradually shifting to 3D data generation. We expect a similar development pattern for 3D AIGC.

02 3D AIGC’s “DALL-E” moment

The recent technological breakthroughs in the 3D field tell us that the “DALL-E” moment of 3D AIGC is coming!

From DreamFields at the end of 2021 to DreamFusion and Magic3D in the second half of 2022, and then to ProlificDreamer in May of this year, academia has made many breakthroughs in text-to-3D generation, thanks to progress in multimodal learning and text-to-image models. Several methods can now generate high-quality 3D models from input text.

However, most of these early explorations optimize a 3D representation from scratch for every 3D model they generate, so that the representation's 2D renderings match the input text and the expectations of the prior model. Since such optimization typically requires tens of thousands of iterations, it is very time-consuming: generating a single 3D mesh can take up to 40 minutes with Magic3D and hours with ProlificDreamer. Another great challenge of 3D generation is that the generated object's shape must be consistent across viewing angles; existing 3D AIGC methods often run into the Janus problem, in which the AI-generated 3D object has multiple heads or faces.
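The schematic sketch below shows why this optimization-based route is so slow: every iteration renders a random view of the 3D representation and nudges it toward what a frozen 2D diffusion prior expects (the score-distillation idea behind DreamFusion-style methods). Here `render` and `diffusion_eps` are toy stand-ins for a differentiable renderer and a pretrained diffusion model, and the noising step is deliberately simplified; this is a sketch of the loop's structure, not a real implementation.

```python
# Schematic score-distillation loop (toy stand-ins; not a real implementation).
import torch

theta = torch.randn(10_000, requires_grad=True)  # stand-in 3D representation (e.g., NeRF weights)
opt = torch.optim.Adam([theta], lr=1e-2)

def render(theta, view):
    # Stub for a differentiable renderer producing a 3x32x32 image of one view.
    return torch.tanh(theta[:32 * 32 * 3]).reshape(3, 32, 32) * view

def diffusion_eps(x_noisy, t):
    # Stub for a frozen, text-conditioned diffusion prior predicting the noise.
    return x_noisy * 0.1

for step in range(10):                      # real methods: tens of thousands of steps
    view = torch.rand(1) + 0.5              # sample a random camera (stub)
    x = render(theta, view)
    t = torch.randint(1, 1000, (1,))
    eps = torch.randn_like(x)
    x_noisy = x + eps                       # simplified noising (no schedule scaling)
    # Score distillation: follow the prior's denoising direction, treating it
    # as a fixed gradient (the diffusion model's Jacobian is skipped).
    grad = (diffusion_eps(x_noisy, t) - eps).detach()
    opt.zero_grad()
    x.backward(gradient=grad)               # accumulates grad^T * dx/dtheta into theta.grad
    opt.step()
```

Single-forward-prediction methods, discussed below, avoid this per-object loop entirely, which is where their speedup comes from.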

The Janus problem, caused by a lack of 3D shape consistency, in ProlificDreamer. On the left, a frontal view of a seemingly normal bluebird; on the right, a confusing view of a bird with two faces.

On the other hand, some teams are trying to break away from the optimization-based generation paradigm and instead generate 3D models with a single forward prediction, which greatly improves the speed and accuracy of 3D generation. These methods include Point-E and Shap-E (released by OpenAI in 2022 and 2023, respectively) and One-2-3-45 (released by UC San Diego in 2023). Of particular note, One-2-3-45, released just last month, can generate a high-quality and consistent 3D mesh from a 2D image in only 45 seconds!

A comparison of single-image-to-3D-mesh methods. From left to right, processing time drops dramatically from over an hour to under a minute; Point-E, Shap-E, and One-2-3-45 all stand out for speed and accuracy.

These latest breakthroughs in 3D AIGC not only greatly improve generation speed and quality, but also make user input more flexible: users can provide either a text prompt or a single, more informative 2D image to generate the 3D model they want. This greatly expands the possibilities for commercial applications of 3D AIGC.

03 AI revolutionizes the 3D production process

First, let us understand the workflow that traditional 3D designers need to go through to create 3D models:

  1. Concept sketching: Concept artists brainstorm and sketch the required designs based on client input and visual references.

  2. 3D prototyping: Model designers use professional software to create the basic shape of the model and iterate based on client feedback.

  3. Model refinement: Details, colors, textures, and animation-related properties (such as rigging and lighting) are added to the rough 3D model.

  4. Model finalization: Designers use image editing software to enhance the final rendering, adjust colors, add effects, or perform element synthesis.

This process usually takes a few weeks, possibly even longer if animation is involved. However, each of these steps could potentially be made faster with the help of AI.

  1. A powerful multi-view image generator (e.g., Midjourney, or Zero-1-to-3, which is built on Stable Diffusion) facilitates creative brainstorming and produces multi-view image sketches.

  2. Text-to-3D or image-to-3D technologies (for example, One-2-3-45 or Shap-E) can generate multiple 3D prototypes in minutes, providing designers with a wide range of options.

  3. Using 3D model optimization (e.g., Magic3D or ProlificDreamer), selected prototypes can be automatically refined within hours.

  4. Once the refined model is ready, the 3D designer can further design and complete the high-fidelity model.

A Comparison of Traditional and AI-Driven 3D Production Workflows

Will 3D AIGC replace humans?

Our conclusion: not yet. Humans remain an indispensable link in the 3D AIGC pipeline.

Although the 3D model generation techniques mentioned above have many potential applications in robotics, autonomous driving, and 3D games, the current generation pipeline still cannot meet the demands of wide-ranging production applications.

To this end, Silicon Rabbit interviewed Professor Hao Su of the University of California, San Diego, a leading expert in 3D deep learning and embodied AI and one of the authors of the One-2-3-45 model. Professor Su believes the main bottleneck for current 3D generative models is the lack of large, high-quality 3D datasets. Commonly used 3D datasets such as ShapeNet (about 52K 3D meshes) and Objaverse (about 800K 3D models) still need improvement in both quantity and detail quality; compared with large datasets in the 2D domain (e.g., LAION-5B), their data volume is far from enough to train large 3D models.

Professor Su studied under Professor Leonidas Guibas, a pioneer of geometric computing and a member of the US National Academy of Sciences, and was an early contributor to the ImageNet project led by Professor Fei-Fei Li. Inspired by them, Professor Su has emphasized the key role of large-scale 3D datasets in advancing the field, and his work laid the groundwork for the emergence and growth of 3D deep learning.

In addition, 3D models are far more complex than 2D images. For example:

  1. Part structure: Games and digital-twin applications require 3D objects with structured parts (e.g., PartNet-style), rather than a single fused 3D mesh;

  2. Joints and rigging: key properties for interacting with 3D objects;

  3. Textures and materials: key interaction-supporting properties such as reflectance, surface friction coefficient, density distribution, and Young's modulus;

  4. Operation and manipulation: letting designers interact with and manipulate 3D models more effectively.

These points are precisely where human expertise can continue to play an important role, as the sketch below illustrates.
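Here is a hypothetical sketch of what an interaction-ready 3D asset record might contain beyond a bare mesh, following the four points above. Every class and field name is illustrative; this is not the schema of PartNet or any existing format.

```python
# Hypothetical interaction-ready 3D asset record (all names illustrative).
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class PhysicalMaterial:
    reflectance: float         # appearance property for rendering
    friction: float            # surface friction coefficient, for simulation
    density: float             # kg/m^3, determines mass and inertia
    youngs_modulus: float      # stiffness, for deformation behavior

@dataclass
class Part:
    name: str                  # e.g., "door", "handle" (PartNet-style semantic part)
    mesh_path: str             # geometry of this part alone, not one fused mesh
    material: PhysicalMaterial
    joint_type: str = "fixed"  # "revolute", "prismatic", ... for articulation/rigging
    parent: Optional[str] = None  # kinematic parent in the rig

@dataclass
class Asset3D:
    name: str
    parts: List[Part] = field(default_factory=list)

# A cabinet whose door can actually be opened in a game or simulator:
wood = PhysicalMaterial(reflectance=0.4, friction=0.8, density=700.0, youngs_modulus=1e10)
cabinet = Asset3D(
    name="cabinet",
    parts=[
        Part("body", "body.obj", wood),
        Part("door", "door.obj", wood, joint_type="revolute", parent="body"),
    ],
)
```

Today's text-to-3D and image-to-3D methods produce only the geometry; filling in the rest of such a record still depends on human designers.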

Professor Su believes that future AI-driven 3D data generation should have the following characteristics:

  1. Support the generation of 3D models for interactive applications, where interaction includes both physical interactions between objects (such as collisions) and interactions between people and objects (both physical and non-physical), so that the generated 3D data can be widely used in games, the metaverse, physical simulation, and other scenarios;

  2. Support AI-assisted 3D content creation, making modeling more efficient;

  3. Support human-in-the-loop creation, using human artistic talent to improve the quality of generated data, which in turn improves modeling performance and forms a closed-loop data flywheel.

Just as technologies like DALL-E and ChatGPT have developed astonishingly over the past 18 months, we firmly believe that innovation and application in 3D AIGC are very likely to exceed our expectations, and Silicon Rabbit will continue to explore and report on this field in depth.
