AIGC Innovation in Content Production: How Web3 Era Productivity Tools Are Changing Industry Landscapes
Since late 2022, top Silicon Valley venture capital firms have increasingly turned their attention to AI startups, with generative AI art booming in particular. Stability AI and Jasper each closed funding rounds of more than $100 million at valuations above the billion-dollar mark, vaulting both into the unicorn club. Behind this wave of funding lies the deeper logic of AIGC (AI-Generated Content) as a new paradigm shift.
AIGC is not only a product of technological advancement but also a revolution in how content is produced. With the advent of the Web3 era, the convergence of artificial intelligence, linked data, and the semantic web has created a new kind of connection between humans and the internet, driving explosive growth in demand for content. Traditional PGC (Professionally Generated Content) and UGC (User-Generated Content) can no longer keep pace with this expanding need, positioning AIGC as the new productivity tool of the Web3 era and a solution for generating metaverse content at scale.
The Rise of AIGC: From Edge to Mainstream
From the perspective of technological progress and commercial application, there are three core reasons why AIGC has attracted so much capital in such a short period: first, breakthroughs in underlying algorithms and hardware; second, rapidly maturing applications across multiple verticals; third, the sector itself is still in its early stages, and although large tech companies capture some of the value, startups still have room to break through.
At the application level, AIGC has demonstrated potential in multiple directions. In text generation, Jasper's AI writing features help users create Instagram captions, TikTok scripts, ad copy, and email content. At the time of the underlying report's release, Jasper had over 70,000 customers, including industry giants like Airbnb and IBM, and annual revenue of roughly $40 million in 2021.
In image generation, diffusion models have driven the breakthroughs. The release of Stable Diffusion set off an explosion in AI painting. Media platforms are adopting AI-generated images at scale, cutting production costs and sidestepping copyright risks. OpenAI has also partnered deeply with Shutterstock, the world's largest stock-image library, and DALL-E-generated images are becoming a new option for commercial use.
Video, audio, and code generation also show broad application prospects. Google's Phenaki model can generate coherent videos more than two minutes long from a sequence of text prompts; virtual humans paired with AIGC-generated speech can handle automated broadcasting and role-playing; GitHub Copilot has become a coding assistant for developers. The maturing of these applications marks AIGC's evolution from a peripheral tool into mainstream productivity software.
The Technical Foundation of AIGC: Natural Language Processing and Generation Algorithms
Understanding how AIGC works requires a deep dive into its two core technological pillars: Natural Language Processing (NLP) and generative algorithms.
Evolution of NLP
NLP is what enables humans and computers to interact via natural language. The field combines linguistics, computer science, and mathematics, allowing computers to understand natural language, extract information from it, translate it, and otherwise process it. Since its inception, NLP's core tasks have divided into two directions:
Natural Language Understanding (NLU): aims to give computers human-like language comprehension. Unlike earlier systems limited to structured data, NLU enables computers to recognize and extract implicit intentions in language, achieving true understanding. However, due to the diversity, ambiguity, and context dependence of natural language, current computers still lag behind humans in understanding.
Natural Language Generation (NLG): converts non-linguistic data into human-understandable language. From simple data merging to template-driven and advanced NLG, this field now allows systems to understand intent, consider context, and produce natural, fluent language outputs.
The core breakthrough in NLP came with Google’s Transformer model in 2017. This architecture uses self-attention mechanisms, assigning different weights to parts of input data based on importance. Compared to traditional RNNs, Transformers can process all input data simultaneously, greatly improving parallel processing efficiency. This technological maturity has led to the development of large pre-trained models like BERT and GPT, providing a solid linguistic foundation for AIGC.
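To make the self-attention idea concrete, here is a minimal NumPy sketch of scaled dot-product self-attention. The function name, toy dimensions, and random inputs are illustrative assumptions rather than details from any particular Transformer implementation; the point to notice is that the relevance scores for every pair of tokens are computed in a single matrix product, which is the parallelism described above.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a whole sequence at once.

    X:          (seq_len, d_model) token embeddings
    Wq, Wk, Wv: (d_model, d_k) learned projection matrices
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv                # queries, keys, values
    scores = Q @ K.T / np.sqrt(K.shape[-1])         # all token pairs at once
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V                              # importance-weighted mix

# Toy usage: 5 tokens, 16-dim embeddings, 8-dim attention head.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 16))
Wq, Wk, Wv = (rng.normal(size=(16, 8)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)                 # shape: (5, 8)
```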
Two Main Generative Algorithm Paradigms
In generative algorithms, the two most mainstream approaches are Generative Adversarial Networks (GANs) and diffusion models.
GAN, proposed by Ian J. Goodfellow in 2014, involves a generator and discriminator competing against each other. The generator tries to produce “fake” data to fool the discriminator, which aims to distinguish real from fake. Through this adversarial training, both networks improve until the discriminator can no longer tell fake from real. GANs are widely used in advertising, gaming, and entertainment—for creating fictional characters, face morphing, and style transfer.
However, GANs suffer from unstable training and mode collapse. The generator and discriminator must improve in careful synchronization; in practice, the discriminator often converges much faster, starving the generator of useful gradient signal. And sometimes the generator gets stuck producing a narrow set of similar samples and cannot learn further, a failure mode known as mode collapse.
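The adversarial dynamic, and its fragility, is easiest to see in the training loop itself. Below is a minimal PyTorch sketch of the canonical two-step GAN update; the toy data distribution, network sizes, and learning rates are arbitrary stand-ins, not a recipe from any paper. If the discriminator races ahead, the generator loss in step 2 saturates and its gradients carry little signal, which is the instability described above.

```python
import torch
import torch.nn as nn

# Tiny 2-D GAN: both networks are small MLPs with illustrative sizes.
G = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 2))  # noise -> fake sample
D = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 1))  # sample -> realness logit
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

for step in range(1000):
    real = torch.randn(64, 2) * 0.5 + 3.0          # stand-in "real" data
    noise = torch.randn(64, 8)

    # Step 1: train the discriminator to label real as 1, fake as 0.
    fake = G(noise).detach()                       # freeze G for this update
    loss_d = bce(D(real), torch.ones(64, 1)) + bce(D(fake), torch.zeros(64, 1))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # Step 2: train the generator to make the discriminator say 1 on fakes.
    loss_g = bce(D(G(noise)), torch.ones(64, 1))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```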
Diffusion models, by contrast, follow a generative logic closer to human creation and are a key driver of AIGC's rapid progress. They work by gradually adding Gaussian noise to training data, then learning to reverse that corruption and recover the original data. Once trained, the model can generate new data by applying the learned denoising steps to pure random noise.
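As a concrete illustration of the forward (noising) half, here is a NumPy sketch using the standard DDPM closed-form step, an assumption on my part since the text does not name a specific diffusion variant; the beta schedule and "image" shape are toy values.

```python
import numpy as np

def forward_diffuse(x0, t, alpha_bar, rng):
    """Sample x_t from q(x_t | x_0) in closed form (DDPM-style).

    x0:        clean data, any shape
    t:         integer timestep in [0, T)
    alpha_bar: cumulative products of (1 - beta) over timesteps
    """
    eps = rng.normal(size=x0.shape)  # the Gaussian noise being added
    x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return x_t, eps

# Linear beta schedule over 1,000 steps, as in the original DDPM paper.
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alpha_bar = np.cumprod(1.0 - betas)

rng = np.random.default_rng(0)
x0 = rng.uniform(-1, 1, size=(32, 32, 3))          # stand-in "image"
x_noisy, eps = forward_diffuse(x0, t=500, alpha_bar=alpha_bar, rng=rng)
# Training teaches a network to predict eps from (x_noisy, t); generation
# then runs the learned denoising steps backward from pure noise.
```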
For example, with DALL-E, a user inputs a text description, which is first encoded by a text encoder (like OpenAI’s CLIP) into a semantic space; then, a “prior” model maps this text encoding to an image encoding, capturing semantic information; finally, an image decoder generates a visual representation, completing the image creation. This process resembles human imagination—starting with a basic concept and gradually adding details and semantic layers.
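That staged structure can be sketched as three composable functions. Everything below is hypothetical: the stubs merely stand in for a CLIP-style text encoder, a prior, and an image decoder so that the data flow is runnable, and none of it reflects OpenAI's actual models or API.

```python
import numpy as np

def encode_text(prompt: str) -> np.ndarray:
    # Stub text encoder: hash characters into a fixed 512-dim "semantic" vector.
    vec = np.zeros(512)
    for i, ch in enumerate(prompt.encode()):
        vec[(i * 31 + ch) % 512] += 1.0
    return vec / (np.linalg.norm(vec) + 1e-8)

def text_to_image_prior(text_emb: np.ndarray) -> np.ndarray:
    # Stub prior: a fixed random linear map from text space to image space.
    W = np.random.default_rng(0).normal(size=(512, 512)) / np.sqrt(512)
    return W @ text_emb

def decode_image(image_emb: np.ndarray) -> np.ndarray:
    # Stub decoder: expand the embedding into a 64x64 RGB array.
    rng = np.random.default_rng(abs(int(image_emb.sum() * 1e6)) % 2**32)
    return rng.normal(loc=image_emb.mean(), scale=1.0, size=(64, 64, 3))

def generate_image(prompt: str) -> np.ndarray:
    text_emb = encode_text(prompt)               # 1) prompt -> text embedding
    image_emb = text_to_image_prior(text_emb)    # 2) prior: text -> image embedding
    return decode_image(image_emb)               # 3) decoder: embedding -> pixels

img = generate_image("a ragdoll cat on a windowsill")  # shape: (64, 64, 3)
```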
Compared to GANs, diffusion models have three major advantages: higher image quality, more stable training (there is no adversarial objective to balance), and better scalability and parallelism. These properties have made diffusion models the standard for the next generation of image generation.
Commercialization Path of AIGC: From Assistant to Creator
From the perspective of application maturity, AIGC has demonstrated clear business models across text, image, audio, gaming, and code generation. Particularly in tasks with high repetition and moderate accuracy requirements, AIGC applications are relatively mature and actively exploring monetization. These service providers typically adopt SaaS subscription models.
Text Creation SaaS Model
Jasper exemplifies AI-driven text generation. Founded less than two years ago, this platform enables individuals and teams to create commercial content using AI. Users input descriptions and requirements, and the system automatically fetches data and generates content based on instructions. For example, when an author inputs “Write an article about AIGC, including definition, history, applications, current status, and future trends,” Jasper produces a coherent, well-structured article with examples within seconds. The platform offers hundreds of templates for flexible use.
In terms of business performance, Jasper has achieved impressive results. It recently raised $125 million at a $1.5 billion valuation. Its customer base exceeds 70,000, including enterprise clients like Airbnb and IBM. Revenue growth is remarkable: roughly $40 million in 2021, with full-year 2022 revenue estimated at $90 million.
Large-Scale Image Creation
Midjourney simplifies the interface to the point that users with no art background can create artwork from text prompts. The backend parses the prompt's semantics via NLP, translates it into model inputs, and combines proprietary datasets to generate new works. Such works are generally treated as carrying AI copyright rather than deriving from existing images, which has led to wide adoption by news media and social platforms as a way to cut costs and avoid copyright disputes. Some stock-image bloggers are already producing content with AIGC and monetizing it through their social media channels.
Video, Audio, and Other Vertical Fields
Google’s Phenaki demonstrates the potential for video generation, capable of producing logically coherent long videos from text in a very short time. When combined with virtual human technology, AIGC-generated speech and expressions become more realistic and natural—significantly improving efficiency and diversity compared to single virtual presenters.
In audio, AIGC is already widely used in daily applications. Navigation apps can switch between different celebrity voices; Gaode Maps users can record personalized voice packs. Deeper applications are seen in virtual humans, where AIGC can generate speech and content, enabling virtual characters to express ideas like real humans.
In game development, AIGC can be used for scene building, story creation, and NPC generation, greatly improving development efficiency. Players can also create virtual characters for in-game activities via AIGC platforms. GitHub Copilot provides code suggestions, trained on billions of lines of public code.
Core Investment Framework of AIGC: Software, Hardware, and Data Ecosystem
From an investment perspective, the success of AIGC depends on three layers: software (algorithms and models), hardware (computing power), and data ecosystem (training datasets).
Software Layer: Technological Accumulation
This layer comprises NLP technologies and AIGC generation models. Companies like Google, Microsoft, iFlytek, and TRS hold technological advantages in NLP. In models and datasets, Nvidia, Meta, Baidu, BlueFocus, Visual China Group, and Kunlun Wanwei lead. These companies have accumulated large-scale training data and optimized algorithms, forming a technological moat.
Hardware and Computing Power
Computing power is paramount in AIGC. Stable Diffusion, for example, relies on a cluster of 4,000 Nvidia A100 GPUs, with operating costs exceeding $50 million. This underscores that large-scale hardware investment is the foundation of AIGC development. Participants include companies such as Montage Technology (Lanqi), ZTE, EasyE, Tianfu Communication, Baosight Software, and Zhongji Innolight. With export restrictions on high-end Nvidia chips, domestic Chinese chipmakers stand to gain incremental opportunities.
Quality of Datasets Sets the Upper Limit
OpenAI’s CLIP model was trained on 400 million high-quality English image-text pairs, demonstrating the decisive role of massive high-quality data. However, replicating this success is extremely difficult—overseas teams used around 2 billion image-text pairs to approximate CLIP’s performance. This indicates that acquiring, cleaning, and annotating datasets is costly, and data quality, compliance, and diversity directly impact the quality of AIGC outputs.
Technical Bottlenecks and Breakthrough Directions
Although AIGC has achieved initial commercial scale, there are still significant technical shortcomings. Current generated content often lacks the detail precision required for high-end commercial applications.
Root Causes of Precision Issues
In image generation, AIGC performs well in anime or abstract images but struggles with specific details. For example, generating “a beautiful woman with a ragdoll cat” may result in inaccuracies—such as the system depicting a woman with a cat face. The fundamental reason lies in insufficient understanding and processing of natural semantics, especially spatial and quantitative relationships.
Language and Localization Challenges
The imbalance in text encoders exacerbates this problem. OpenAI's mainline CLIP model, trained on 400 million English image-text pairs, has open-source code but a closed training dataset. Non-English regions therefore find it difficult to obtain high-quality, large-scale text-image pairs and must add a translation step. Because translation involves semantic understanding, cultural nuance, and linguistic habits, precise translation is hard, and this remains a major obstacle for localized AIGC applications.
Impact of Algorithm and Dataset Variations
Different platforms use different algorithms and datasets, leading to large output quality disparities for the same input. Data quality, compliance, and stylistic tendencies directly influence generation results.
Future Pillars: Large Models, Big Data, and Massive Computing Power
Looking ahead, the core development directions of AIGC are concentrated in three areas: large-scale pre-trained models, extensive data accumulation, and massive computing investments. These are essential for AIGC to evolve from a “helper” role to an “independent creator.”
Yin Hang Huang summarized AIGC's development as a three-stage path: first, the "assistant stage," in which AIGC aids human creation; second, the "collaboration stage," in which virtual humans coexist and create alongside real humans; third, the "original creation stage," in which AIGC independently produces content. Over the next decade, AIGC is expected to generate original content at one-tenth of current costs and at 100 to 1,000 times current production speed, fundamentally transforming existing content creation models.
To realize this vision, specialized vertical applications will become a focus. Compared with general-purpose large models, vertical applications can be trained more precisely for specific functions, at lower cost and with better results. Meanwhile, until a robust intellectual property and ethics framework for AIGC is established, acquiring high-quality, compliant datasets remains a strategic priority.
Clear Investment Roadmap
From a macro perspective, concepts like blockchain, the metaverse, and Web3 sketch grand application scenarios for the digital economy; virtual humans, NFTs, and other recent capital hotspots are specific manifestations of them. As a key productivity tool driving the upgrade from Web2 to Web3, AIGC will have a disruptive impact on today's killer applications such as short video and gaming. Under the Web3 ethos of open co-creation, UGC and AIGC content will only grow more attractive, ushering in waves of derivative creation and open-ended imagination.
In terms of investment strategy, opportunities exist across software, hardware, and data ecosystems:
Software Innovation: NLP companies, vertical AIGC applications, large model training firms
Hardware Support: Domestic chipmakers and GPU cluster service providers gaining incremental opportunities amid Nvidia chip restrictions
Data Ecosystem: Providers of high-quality data collection, cleaning, and annotation services will become scarce resources
Currently, AIGC has become the hottest startup trend in Silicon Valley, with increasing attention from domestic primary markets and internet giants. This marks AIGC’s transition from technological research to large-scale application.
Risk Alerts and Key Observations
Technical Risks: AIGC’s development may fall short of expectations; innovations in underlying hardware (supercomputers, chips) could slow down.
Policy Risks: As AIGC is still in early stages, future regulations on AI-generated content, intellectual property, copyrights, or other legal frameworks may be introduced, directly impacting industry development.
Competitive Risks: The entry of large tech companies could accelerate industry consolidation, putting pressure on startups’ survival.
Overall, the value of AIGC lies in its fundamental transformation of content creation. On the demand side, the Web3 era’s need for diverse and abundant content is surging; on the supply side, AIGC offers unprecedented efficiency. This perfect match of supply and demand presents a golden window for rapid AIGC growth and industry-wide transformation.