Stable Diffusion is a deep learning, text-to-image model released in 2022 and among the trending AI tools. It can be used to generate images based on text descriptions among other use cases. Unlike other models with similar features, Stability AI made the source publically available, accelerating its adoption. Beyond current copyright infringement lawsuits against Stability AI, Midjourney, and Devian Art, deep generative neural networks are becoming more and more powerful and find widespread adoption.
In December 2022, Apple released optimizations to Core ML to use Stable Diffusion with macOS 13.1 and iOS 16.2. The release contained a Python package for converting Stable Diffusion models from PyTorch to Core ML with Apple's coremltools, and a Swift package to deploy the models.
The Swift package can be added to Xcode projects as a dependency to empower apps with the powerful image generation capabilities. In this article, we will focus on image generation from text prompts using Swift and how this can be applied in a native app.
As a prerequisite, ensure Xcode 14.2. and the Command Line Tools for Xcode 14.2 or newer are installed. The mcomand Line Tools are available in the Downloads section of the Apple Developer Platform.
Using Core ML Models from Hugging Face Hub
While you can also convert any derivative Stable Diffusion model to Core ML, there are some pre-made models publicly available on Hugging Face Hub, such as Stable Diffusion v1-4, Stable Diffusion v1-5, and Stable Diffusion v2. Any of those can be downloaded and used to generate images with Python or Swift. As the models are quite large in size, you have to use the git Larg File Storage (git lfs) extension. It stores large files outside the main git repo, and downloads them from the appropriate server the repository is cloned.
brew install git-lfs
Once the installation is completed, you can set up git lfs for our user account with the following command.
git lfs install
Then you can use git clone to the Stable Diffusion repository that includes all model variants. As an example, you can clone Stable Diffusion version 1.4.
git clone https://huggingface.co/apple/coreml-stable-diffusion-v1-4
Now, you can also clone Apple's Core ML Stable Diffusion package to get started using the downloaded model. For this, use the following command.
git clone https://github.com/apple/ml-stable-diffusion.git
The repository contains two packages, one python_coreml_stable_diffusion Python package for converting PyTorch models to Core ML formats and a StableDiffusion Swift package that can be used in Xcode projects.
Image Generation with Swift and the CLI
The Swift package contains a
StableDiffusion library as well as a
StableDiffusionSample executable that allows you to generate images with Swift via the CLI (command line interface) in the Terminal. To test the capabilities with the previously downloaded pre-converted Core ML models, you can use the following command:
swift run StableDiffusionSample "prompt to use for the image generation" --resource-path <output-mlpackages-directory> --seed 93 --output-path </path/to/output>
Note that beyond the prompt to inform the image generation process, there is a parameter to the --resource-path as well as for the --output-path. The resource path should be a folder containing the Core ML models and tokenization resources with at minimum following files:
TextEncoder.mlmodelc(text embedding model)
UnetChunk2.mlmodelc(denoising autoencoder model)
VAEDecoder.mlmodelc(image decoder model)
vocab.json(tokenizer vocabulary file)
merges.text(merges for byte pair encoding file)
The are some additional optional files for image2image or safety checker models that may also be included. Check out the package description for more details.
In our case, the previously cloned Stable Diffusion version 1.4 contains all the needed files in its original subfolder. There are also models optimized for low memory devices, such as iPhones etc., in the split_einsum subfolder. On Macs with M1 or M2 processors, the model files on the original subfolder can be used. However, models in the split_einsum subfolder will result in faster and more energy-efficient performance. In any case, with the Core ML model and a specified output path, you can run the command in Terminal to try Stable Diffusion.
A common reason for this may the usage of the Xcodes app to manage different versions of Xcode on the Mac, which adds version numbers to the Xcode versions in the Applications folder. So it may have to be set like this sudo xcode-select --switch /Applications/Xcode-14.2.0.app/Contents/Developer.
The first time the command is executed, the dependencies of the StableDiffusion Swift package have to be resolved, so it may take a few minutes. Then, the process is quite straight forward and the generated image will be stored at the output path using the prompt and random seed as a file name. In this example, the prompt of "a photo of a child looking at the stars" will result in a filename like
Now, let's have a closer look at how to use the
StableDiffusion library inside an Xcode project to empower Mac or iOS apps with the capability to generate images from text prompts.
Subscribe to become a free member or log in for full access.