What is GenAI?
Generative AI (GenAI) empowers end-users to generate content, such as images and text, quickly and easily. Entrepreneurs are taking advantage of this technology to create a growing number of startups that use GenAI models for various aspects of content creation. In the coming year, we can expect a proliferation of new products built on GenAI models like GPT-3 and Stable Diffusion. The GenAI renaissance is just beginning, and the recent boom in niche end-user applications for this technology is just the tip of the iceberg. These models will serve as the foundation for many future applications, ushering in a new GenAI economy replete with add-ons to existing software and entirely new offerings for end-users. With GenAI, the possibilities for content creation are endless, and entrepreneurs are poised to capitalize on this powerful technology to revolutionize the way we create and consume media.
Why Now?
As with many things, the fanfare around GenAI is due to a confluence of factors that converged in the right place at the right time. The technological infrastructure for training these models has advanced to the point where it is now possible to build robust pipelines with powerful GPUs that can serve predictions quickly enough to meet the demands of millions of end-users. The availability of skilled professionals with expertise in machine learning has also grown, allowing large teams of engineers from academia and industry to collaborate and fine-tune models like Stable Diffusion. Additionally, a vast amount of data scraped from the internet has been used to train these large GenAI models, providing a diverse set of examples that covers many edge cases. This combination of factors has made GenAI a powerful and accessible tool (across multiple modalities) for anyone to use, with the potential to infiltrate nearly every industry. The release of GPT-3 two years ago was a major milestone, but the ChatGPT model, released in late November 2022, has quickly captured public attention due to its user-friendly chat interface and free availability (at least for now). With ChatGPT’s release, we are now witnessing one of the biggest hype cycles in the history of AI.
The Future of Content Creation: How Generative AI is Changing the Game
Here are my five predictions for GenAI in 2023:
Prediction #1: The next push in GenAI will make models smaller
A powerful GenAI model, Stable Diffusion, which creates images from text, was trained on a sizable (about 100,000 GB) dataset of image-text pairs containing roughly 2 billion images. The result is a model file of roughly 2 GB that can generate an enormous range of images. This is a remarkable accomplishment, and given how quickly infrastructure efficiency and model architectures are improving, it is possible that this model file will one day become small enough to run smoothly on a smartphone. The ability to generate content by merging images with audio and text is an exciting prospect that opens the door to a wide range of possibilities. While we may not see this running on your iPhone 11 in 2023, the field of GenAI is moving at breakneck speed, and it is likely that in the near future it will no longer be necessary to train GenAI models from scratch. Instead, fine-tuning and few-shot learning will become the norm, allowing users to quickly and easily customize models to their specific needs.
Training GenAI models from scratch is prohibitively difficult because of the cost of the GPUs and the talent required to train and validate on massive datasets. Unless an open-source, decentralized compute resource becomes more freely available, this challenge will only grow: a substantial number of organizations will choose to fine-tune existing GenAI models instead, and will outsource the compute and technical support for training and re-training them. In a few years, GenAI for text, images, video, and audio will reach a steady state, and many will simply use those steady-state models instead of creating their own from scratch.
The Stable Diffusion model has about 890 million parameters, significantly fewer than previously released Large Language Models (LLMs) such as OpenAI’s GPT-3, which has 175 billion. While LLMs typically train only on text, which has a denser encoding than images, Stable Diffusion combines text with images, making it an interesting case when evaluating the relationship between data type, model size, and performance. It is possible that LLMs will continue to trend toward higher parameter counts until an optimal size is reached and/or optimally augmented, multimedia data is found and used. However, it is not clear that bigger models (with more parameters) always mean better performance: Hoffmann et al. showed that a smaller model trained on more data can outperform a much larger one, with their roughly 70-billion-parameter Chinchilla model beating the 175-billion-parameter GPT-3 and the 280-billion-parameter Gopher on many benchmarks. Additionally, research indicates that it may be possible to create more efficient LLMs by pruning unnecessary model parameters, producing smaller models that can run on low-energy devices. As GenAI models are pushed to the edge, they will need to become smaller and either more specific to a task or personalized to their owner.
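To put these sizes in perspective, here is a minimal back-of-the-envelope sketch in Python (my own illustration, not anything published by StabilityAI or OpenAI) that estimates a checkpoint’s on-disk size from its parameter count and numeric precision. It counts only the raw weights and ignores tokenizer files, optimizer state, and other overhead.
```python
def model_size_gb(num_params: float, bytes_per_param: int) -> float:
    """Rough on-disk size of a model checkpoint in GB, counting weights only.

    bytes_per_param: 4 for float32, 2 for float16/bfloat16, 1 for int8.
    """
    return num_params * bytes_per_param / 1e9


# Stable Diffusion's ~890 million parameters stored at half precision:
print(f"Stable Diffusion @ fp16: {model_size_gb(890e6, 2):.1f} GB")  # ~1.8 GB
# The same weights at full precision roughly double in size:
print(f"Stable Diffusion @ fp32: {model_size_gb(890e6, 4):.1f} GB")  # ~3.6 GB
# GPT-3's 175 billion parameters remain enormous even quantized to int8:
print(f"GPT-3 @ int8:            {model_size_gb(175e9, 1):.1f} GB")  # ~175 GB
```
Halving weight precision, or pruning weights outright, is exactly the kind of lever that could eventually shrink a model like Stable Diffusion enough to run comfortably on a phone.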
Prediction #2: Data quality will become even more important
Much of the training data for today’s GenAI models consists of low-quality scrapes from the internet; we will see higher-quality data used for model training instead. Data that is acquired specifically for a GenAI model’s purpose will create a powerful new sub-market, offering new business lines and expansion opportunities for data brokers and alternative data companies. In addition, we will see more open-source datasets being shared and used. With potential copyright concerns, some authors will be able to opt out of having their content included, while others will opt in for more exposure. Communities will support each other and actively contribute to building better datasets, which will lead to more performant and smaller GenAI models.
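As a concrete, deliberately simplified illustration of what ‘higher quality data’ could mean in practice, here is a sketch of a filter over scraped image-text pairs. The field names (caption, quality_score, opted_out) are hypothetical stand-ins for whatever metadata a real dataset actually provides.
```python
from typing import Dict, Iterable, Iterator


def filter_examples(examples: Iterable[Dict],
                    min_quality: float = 0.5,
                    min_caption_words: int = 3) -> Iterator[Dict]:
    """Yield only examples that clear basic quality checks and honor opt-outs.

    Each example is assumed (hypothetically) to carry:
      caption       - the text paired with the image
      quality_score - a 0-to-1 score from some quality/aesthetic classifier
      opted_out     - True if the author asked to be excluded
    """
    seen_captions = set()
    for ex in examples:
        if ex.get("opted_out"):                       # respect copyright opt-outs
            continue
        caption = (ex.get("caption") or "").strip()
        if len(caption.split()) < min_caption_words:  # drop near-empty captions
            continue
        if ex.get("quality_score", 0.0) < min_quality:
            continue
        if caption in seen_captions:                  # cheap exact-duplicate check
            continue
        seen_captions.add(caption)
        yield ex


raw = [
    {"caption": "A red bicycle leaning against a brick wall", "quality_score": 0.9},
    {"caption": "img_0042.jpg", "quality_score": 0.8},               # junk caption
    {"caption": "A watercolor of a harbor at dusk", "quality_score": 0.7,
     "opted_out": True},                                             # author opted out
]
print(len(list(filter_examples(raw))))  # 1 example survives the filter
```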
Prediction #3: Coding Interviews will adapt
The advent of GenAI for code creation has the potential to revolutionize the way software engineers and machine learning (ML) professionals are interviewed and evaluated. For instance, GPT-3 can write some code faster, and potentially even better, than humans, which raises questions about the current coding interview process. While testing on computer science fundamentals will likely remain an important part of the evaluation process, employers may also find it valuable to assess candidates on their ability to work effectively with AI tools and technologies. As developers gain access to tools such as GitHub’s Copilot or Replit’s Ghostwriter, which have been reported to boost productivity, the ability to collaborate with AI will become increasingly important. Employers may need to adapt their interview and evaluation processes to identify candidates who work well with these technologies and who can leverage them to drive innovation and improve efficiency.
Prediction #4: Increased Focus on Creativity
Advancements in content creation due to GenAI have led some to worry that jobs in the creative industries may be automated. While it is true that GenAI will affect many roles, I believe the ability to adapt and work creatively with these tools will become increasingly valuable. As GenAI tools make it easier to design and prototype custom content, such as cars, rooms, buildings, movies, and advertisements, the ability to infuse creativity into the output of these tools will be crucial for creating truly unique and engaging content. Companies will also likely fine-tune these models on their back catalogs to generate new concepts that are consistent with their brand. Instead of replacing creative jobs, GenAI will augment and enhance the work of creative professionals, giving them new tools to create and iterate on ideas faster and more efficiently. Eventually, long-form multimedia content tailored to your personal taste will be produced quickly. Business assets will also be transformed into interactive 3D experiences, with VR becoming more prevalent. In the future, we could have music automatically generated based on your mood to increase productivity. In this new landscape, being able to work effectively with GenAI tools while infusing creativity into their output will be a highly sought-after skill.
Prediction #5: More Software will adopt GenAI
Several popular software products already incorporate GenAI and many more will follow suit to enhance the user experience and stay competitive. Products like Jasper, Notion, and copy.ai are already using LLMs to improve communication and user engagement. Successful companies will focus on niche markets and integrate multimedia content with easy-to-use interfaces. GenAI will improve collaboration and the creation process for products like search engines, word processors, and design studios. The user interface will be key to attracting new users and keeping loyal users.
Limits & Ethics
While several measures have been taken to prevent misuse of GenAI like ChatGPT, it is still possible for GenAI models to misinterpret a prompt in a manner that results in harm to people. There have been many examples of people ‘hacking’ ChatGPT to circumvent the protective measures meant to keep its results aligned with what is ‘expected’ and ethical. For instance, nudity and vulgarity have been removed from the training data of some GenAI models. There is justified concern about how these models may be used to harm others or to otherwise misuse the generated results. ChatGPT has shared some erroneous results, which led Stack Overflow to ban its use on their platform. But what causes ChatGPT to be wrong? It is a statistical, or probabilistic, model, i.e. not deterministic: it takes the prompt and makes its best guess at the next sequence of tokens to deliver to the user. Sometimes it has seen that prompt (and its answer) before; in other cases, it matches a pattern that may or may not be accurate. GenAI models may also share false information that reflects the bias of the training data and/or the beliefs of those who trained the models. On model ethics, there are differing beliefs on how best to handle the open-sourcing of GenAI models, though openness may well be good for the general public. We should debate these AI products and, where needed, regulate them swiftly. Ideally, GenAI developers would anticipate bad actors and proactively propose regulation while striking a balance that still promotes innovation on their models. As with all technology, people have found ways to skirt the ethical constraints placed on GenAI models; however, I am hopeful that lawmakers will step in to punish the bad actors without severely limiting other users. Although there have always been bad actors (with or without the help of technology), there are more good people who stand to benefit from this incredible technology.
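To make the ‘best guess at the next token’ point concrete, here is a toy sketch of temperature-based sampling over an invented next-token distribution. No real model is involved, but it shows how a purely probabilistic generator can fluently emit a plausible yet wrong continuation.
```python
import random

# Invented probabilities a language model might assign to the token following
# the prompt "The capital of Australia is" -- purely for illustration.
next_token_probs = {
    "Canberra": 0.55,   # correct
    "Sydney": 0.35,     # plausible-looking but wrong
    "Melbourne": 0.08,
    "Auckland": 0.02,
}


def sample_next_token(probs, temperature=1.0):
    """Sample one token; higher temperature flattens the distribution."""
    weights = [p ** (1.0 / temperature) for p in probs.values()]
    return random.choices(list(probs.keys()), weights=weights, k=1)[0]


# The model never "knows" the answer; it just samples from a learned
# distribution, so a wrong but high-probability token comes out fairly often.
print([sample_next_token(next_token_probs) for _ in range(5)])
```
Greedy decoding (always taking the most likely token) removes the randomness but not the underlying problem: if the learned distribution favors a wrong answer, the model will state it just as confidently.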
We’re still pretty early in this AI hype cycle
While the maturity of AI infrastructure has culminated in this current segment of the hype cycle, there is still a long way to go before the world has broad and equal access to GPU compute that makes model training and inference easier and cheaper. AI is a billion-dollar industry and will continue to grow exponentially. It cost StabilityAI (creator of Stable Diffusion) roughly $600k USD in labor and infrastructure to build the Stable Diffusion model from scratch. This big bet seems to have paid off: StabilityAI is currently valued at over $1B USD, based on the potential spin-off models and enterprise AI services it stands to provide, such as tuning, serving, and combining models. We could see more niche subscription services offered to the public based on GenAI, as well as more support services to maintain, retrain, and monitor GenAI models. Some AI-based services will move away from API-only access toward more engaging user interfaces for the model, while others will monetize via ads. Either way, I look forward to the joy and productivity that personalized and pocket-sized GenAI will bring in creating the AI economy.
Sources
- Invest Like the Best Podcast with Amjad Masad
- Stable Diffusion & Generative AI with Emad Mostaque
- Weights & Biases Podcast with Emad Mostaque
- Hoffmann et al., Training Compute-Optimal Large Language Models (2022). arXiv:2203.15556