OpenAI opens up AI text-to-image generation to businesses with launch of DALL-E API

OpenAI is making its image generation software DALL-E much more widely available to businesses with the launch of an API in public beta. The API will make it easier for companies to add DALL-E’s text-to-image functionality to their products, giving developers simplified tools to integrate and customize the software to their liking.

An early use case for the API is Microsoft’s Designer app, which uses the software to generate imagery for Office users, from PowerPoint slides to illustrations for homework. Microsoft is one of OpenAI’s major investors and unveiled the app last month.

Luke Miller, a product manager at OpenAI working on the API, told The Verge that the company was excited to see the new applications developers would find for DALL-E.

“We already have a few customers building on this in very interesting ways,” said Miller. “Some are creative explorations, some are more business oriented.” Miller gave the example of a startup called Mixtiles that is using the API to generate posters and art for home decoration and another called CALA that is using it to help customers design their own clothing. “It’s always inspiring to see the creative ideas people come up with,” he said.

Interest in and adoption of text-to-image AI have exploded over the last year, and OpenAI, once the domain’s clear leader, has been challenged by newcomers like Midjourney and Stability AI. These organizations have put fewer restrictions on users, allowing them to build on their AI systems with little oversight. Meanwhile, other players in this space, like Google and Meta, have taken a far more cautious approach: developing systems with similar capabilities but restricting their public use to very limited scenarios.

As well as the obvious creative benefits offered by text-to-image AI, there are manifold dangers. The software can be used to generate misinformation and harmful imagery like nonconsensual nudes (though OpenAI makes such uses of its software difficult through keyword filters), and there are challenging ethical questions concerning data use.

Text-to-image AI systems like DALL-E are trained on images scraped from the web, which usually include copyrighted work by photographers, artists, and designers. Many artists are angry not only that the resulting technology can be used to imitate their individual style but also that they have not been compensated for the use of their work to generate income for multibillion-dollar companies like OpenAI.

Some firms developing text-to-image applications are beginning to offer compensation. Shutterstock, for example, which licensed its contributor data to OpenAI to create DALL-E and which is using the DALL-E API to generate custom stock imagery, announced recently that it is setting up a Contributors Fund to reimburse individuals whose work is used to train AI.

When asked if OpenAI was planning to institute any similar schemes to compensate artists, Miller said the company had nothing concrete in the works. “I don’t have anything specific to share on this right now,” said Miller. “Obviously it’s something we’re continuing to seek feedback from the community on. It’s a very complicated question to think about from a lot of different perspectives. We want to learn from the community and what they value.”

With OpenAI, the question is even more difficult to answer, as the company has never shared what training data was used to create DALL-E (beyond the licensing of imagery from Shutterstock). Legal experts suggest that training AI models by scraping public images — even copyrighted ones — will likely be covered by fair use doctrine in the US. But as many artists have noted, adequate legal cover is not the same as ethical endorsement.

OpenAI says access to the DALL-E API will be rate-limited to begin with as the company spins up its systems and that it will not be vetting how customers use the technology. (Again, DALL-E’s filters do limit the creation of certain images containing nudity, gore, and politically sensitive material.) Customers will be charged per image generated and will be able to choose between three resolution tiers: 256 x 256 images will cost $0.016 apiece, 512 x 512 images will be $0.018 apiece, and 1024 x 1024 images will be $0.02 apiece.
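
To put the per-image pricing in concrete terms, here is a minimal Python sketch that estimates what a batch of images would cost at each tier. The prices are the ones quoted above; the helper name `estimate_cost` and the lookup table are hypothetical, written for illustration only, and are not part of OpenAI’s API.

```python
from decimal import Decimal

# Per-image prices in US dollars, keyed by the side length of the square
# image, as quoted in the article. Decimal avoids floating-point rounding.
PRICE_PER_IMAGE = {
    256: Decimal("0.016"),
    512: Decimal("0.018"),
    1024: Decimal("0.020"),
}


def estimate_cost(resolution: int, image_count: int) -> Decimal:
    """Hypothetical helper: total cost in dollars for a batch of images."""
    if resolution not in PRICE_PER_IMAGE:
        raise ValueError(f"unsupported resolution: {resolution}")
    if image_count < 0:
        raise ValueError("image_count must not be negative")
    return PRICE_PER_IMAGE[resolution] * image_count


if __name__ == "__main__":
    # Compare the cost of 1,000 images at each resolution tier.
    for side in sorted(PRICE_PER_IMAGE):
        print(f"{side} x {side}: ${estimate_cost(side, 1000):.2f}")
```

At these rates, 1,000 images would cost $16 at the lowest resolution and $20 at the highest, so the largest images carry a 25 percent premium over the smallest.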