Skip to content

On-device Inference of Diffusion Models for Apple Silicon

License

Notifications You must be signed in to change notification settings

argmaxinc/DiffusionKit

Repository files navigation

DiffusionKit

Run Diffusion Models on Apple Silicon with Core ML and MLX

This repository comprises

  • diffusionkit, a Python package for converting PyTorch models to Core ML format and performing image generation with MLX in Python
  • DiffusionKit, a Swift package for on-device inference of diffusion models using Core ML and MLX

Installation

The following installation steps are required for:

  • MLX inference
  • PyTorch to Core ML model conversion

Python Environment Setup

conda create -n diffusionkit python=3.11 -y
conda activate diffusionkit
cd /path/to/diffusionkit/repo
pip install -e .

Hugging Face Hub Credentials

Stable Diffusion 3 requires users to accept the terms before downloading the checkpoint. Once you accept the terms, sign in with your Hugging Face hub READ token as below:

Important

If using a fine-grained token, it is also necessary to edit permissions to allow Read access to contents of all public gated repos you can access

huggingface-cli login --token YOUR_HF_HUB_TOKEN

Converting Models from PyTorch to Core ML

Click to expand

Step 1: Follow the installation steps from the previous section

Step 2: Verify you've accepted the StabilityAI license terms and have allowed gated access on your HuggingFace token

Step 3: Prepare the denoise model (MMDiT) Core ML model files (.mlpackage)

python -m tests.torch2coreml.test_mmdit --sd3-ckpt-path stabilityai/stable-diffusion-3-medium --model-version 2b -o <output-mlpackages-directory> --latent-size {64, 128}

Step 4: Prepare the VAE Decoder Core ML model files (.mlpackage)

python -m tests.torch2coreml.test_vae --sd3-ckpt-path stabilityai/stable-diffusion-3-medium -o <output-mlpackages-directory> --latent-size {64, 128}

Note:

  • --sd3-ckpt-path can be a path any HuggingFace repo (e.g. stabilityai/stable-diffusion-3-medium) OR a path to a local sd3_medium.safetensors file

Image Generation with Python MLX

Click to expand

CLI

For simple text-to-image in float16 precision:

diffusionkit-cli --prompt "a photo of a cat" --output-path </path/to/output/image.png> --seed 0 --w16 --a16

Some notable optional arguments:

  • For image-to-image, use --image-path (path to input image) and --denoise (value between 0. and 1.)
  • T5 text embeddings, use --t5
  • For different resolutions, use --height and --width
  • For using a local checkpoint, use --local-ckpt </path/to/ckpt.safetensors> (e.g. ~/models/stable-diffusion-3-medium/sd3_medium.safetensors).

Please refer to the help menu for all available arguments: diffusionkit-cli -h.

Code

After installing the package, import it using:

from diffusionkit.mlx import DiffusionPipeline

Then, initialize the pipeline object:

pipeline = DiffusionPipeline(
  model="argmaxinc/stable-diffusion",
  w16=True,
  shift=3.0,
  use_t5=False,
  model_size="2b",
  low_memory_mode=False,
  a16=True,
)

Some notable optional arguments:

  • For T5 text embeddings, set use_t5=True
  • For using a local checkpoint, set local_ckpt=</path/to/ckpt.safetensors> (e.g. ~/models/stable-diffusion-3-medium/sd3_medium.safetensors).
  • If you want to use the pipeline object more than once, set low_memory_mode=False.
  • For loading weights in FP32, set w16=False
  • For FP32 activations, set a16=False

Note: Only 2b model size is available for this pipeline.

Finally, to generate the image, use the generate_image() function:

HEIGHT = 512
WIDTH = 512

image, _ = pipeline.generate_image(
  "a photo of a cat holding a sign that says 'Hello!'",
  cfg_weight=5.0,
  num_steps=50,
  latent_size=(HEIGHT // 8, WIDTH // 8),
)

Some notable optional arguments:

  • For image-to-image, use image_path (path to input image) and denoise (value between 0. and 1.) input variables.
  • For seed, use seed input variable.
  • For negative prompt, use negative_text input variable.

The generated image can be saved with:

image.save("path/to/save.png")

Image Generation with Swift

Click to expand

Core ML Swift

Apple Core ML Stable Diffusion is the initial Core ML backend for DiffusionKit. Stable Diffusion 3 support is upstreamed to that repository while we build the holistic Swift inference package.

MLX Swift

🚧

License

DiffusionKit is released under the MIT License. See LICENSE for more details.

Citation

If you use DiffusionKit for something cool or just find it useful, please drop us a note at info@takeargmax.com!

If you use DiffusionKit for academic work, here is the BibTeX:

@misc{diffusionkit-argmax,
   title = {DiffusionKit},
   author = {Argmax, Inc.},
   year = {2024},
   URL = {https://github.com/argmaxinc/DiffusionKit}
}