ANEMLL (pronounced like “animal”) is an open-source project focused on accelerating the porting of Large Language Models (LLMs) to tensor processors, starting with the Apple Neural Engine (ANE).
The goal is to provide a fully open-source pipeline from model conversion to inference for common LLM architectures running on ANE.
This enables seamless integration and on-device inference for low-power applications on edge devices, ensuring maximum privacy and security.
This is critical for autonomous applications, where models run directly on the device without requiring an internet connection.
We aim to:
- Provide a flexible, easy-to-use library/framework for porting LLMs to ANE directly from Hugging Face models
- Provide on-device examples for iOS and macOS Swift or C/C++ applications
See the updated Roadmap.md for more details.
Main Components in 0.3.0 Alpha Release
ANEMLL provides five main components for Apple Neural Engine inference development:
- LLM Conversion Tools – Scripts and code to convert models directly from Hugging Face weights
- Swift Reference Implementation – Optimized inference code for Swift applications
  - Sample CLI application in anemll-swift-cli
  - Core inference engine implementation
- Python Sample Code – Reference implementation and testing tools
  - Basic chat interface (chat.py)
  - Advanced conversation management (chat_full.py)
- iOS/macOS Sample Applications – Ready-to-use example applications (Alpha, now on TestFlight)
  - SwiftUI Chat interface
  - Model Downloads and integration example
  - Conversation management
- ANEMLL-BENCH – Apple Neural Engine Benchmarking
  - Performance testing and comparison
  - Model optimization metrics
  - Hardware-specific benchmarks
  - GitHub Repository
We provide sample converted models ready for use:
- LLAMA 3.1 (1B and 8B variants) including iOS “friendly builds”
- DeepSeek distilled models
- DeepHermes distilled models
Note
Quantization still needs improvement. LUT4 quality is fairly low because the Apple Neural Engine lacks block quantization.
Techniques such as GPTQ and SpinQuant should greatly improve LUT4 models.
Visit our Hugging Face repository for the latest converted models.
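To illustrate why the missing block quantization matters, here is a toy sketch that compares one 16-entry lookup table for a whole tensor against one table per block. It assumes evenly spaced table entries and made-up weights; this is only an illustration, not how ANEMLL or Core ML builds LUT4 tables, and all function names are hypothetical.

```python
def uniform_lut(values, entries=16):
    """Build an evenly spaced lookup table between min and max (toy stand-in)."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / (entries - 1)
    return [lo + i * step for i in range(entries)]


def quantize(values, lut):
    """Replace each value with its nearest lookup-table entry."""
    return [min(lut, key=lambda entry: abs(entry - v)) for v in values]


def mse(a, b):
    """Mean squared error between two equally long sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def lut4_error(values, block_size=None):
    """Error of 4-bit LUT quantization, with one table per block of values."""
    block_size = block_size or len(values)  # default: one table for everything
    restored = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        restored.extend(quantize(block, uniform_lut(block)))
    return mse(values, restored)


# Two groups of made-up weights with very different scales.
large = [-1 + 2 * i / 31 for i in range(32)]
small = [w / 100 for w in large]
weights = large + small

per_tensor = lut4_error(weights)                # one shared 16-entry table
per_block = lut4_error(weights, block_size=32)  # one table per 32 weights
print(f"per-tensor MSE: {per_tensor:.3e}, per-block MSE: {per_block:.3e}")
```

With a single shared table, the small-scale weights all collapse onto the one or two entries nearest zero, while a per-block table can spend all 16 entries on their narrow range.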
Important
This is Alpha Release 0.3.0 for the library. It is designed to process Model Weights directly from Hugging Face models and convert them to the CoreML format for Apple Neural Engine (ANE for short).
- This release supports only LLaMA models, including DeepSeek and DeepHermes distilled models built on the LLaMA 3.1 architecture
- Future releases will add support for more models and architectures
- Please visit https://huggingface.co/anemll, where we upload the latest models, and follow @anemll on X for updates
- Please star this repo to support the project!
SwiftUI Sample Code
- Sample iOS/macOS inference Chat-Bot App (Alpha)
- Updates to Model conversion and upload scripts
- Updates to Swift Package and CLI App
Sample iOS/macOS Applications
- Downloads reference or custom models from Hugging Face
- Inference / chat implementation uses the Swift library
- Sample TestFlight App for a quick test
- See iOS/macOS Sample Applications Guide for details
Swift CLI Reference Implementation
The Swift CLI provides a reference implementation for running models on Apple Neural Engine. For detailed documentation, see Swift CLI Guide.
- Download a model from Hugging Face
- Convert the model using our single-shot conversion script:
./anemll/utils/convert_model.sh --model <path_to_model> --output <output_directory>
- Run the model using our sample code:
python ./tests/chat.py --meta <output_directory>/meta.yaml
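The conversion and chat steps above can also be chained from Python. The following is a minimal sketch, assuming it is run from the repository root and that the conversion script writes meta.yaml into the output directory, as the chat command above implies. The function names are hypothetical helpers, not part of ANEMLL.

```python
import subprocess
from pathlib import Path


def build_commands(model_dir: str, output_dir: str) -> list[list[str]]:
    """Return the two documented commands as argument lists."""
    meta = Path(output_dir) / "meta.yaml"
    return [
        # Step 2: single-shot conversion script.
        ["./anemll/utils/convert_model.sh", "--model", model_dir, "--output", output_dir],
        # Step 3: basic chat against the converted model.
        ["python", "./tests/chat.py", "--meta", str(meta)],
    ]


def convert_and_chat(model_dir: str, output_dir: str) -> None:
    """Run the conversion, then start the chat; stop at the first failure."""
    for command in build_commands(model_dir, output_dir):
        subprocess.run(command, check=True)  # raises CalledProcessError on failure
```

Calling convert_and_chat with a downloaded model directory and an output directory runs both steps in order. Because of check=True, the chat does not start if the conversion fails.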
For detailed conversion steps and advanced options, see:
We provide two chat interfaces:
- chat.py – Basic chat interface for quick testing
- chat_full.py – Advanced chat with conversation history management
Features of chat_full.py:
- Maintains full conversation history within context window
- Automatically truncates older messages when needed
- Shifts context window dynamically during long responses
- Shows generation speed and token statistics
- Better handles multi-turn conversations
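As a rough illustration of the truncation behavior listed above, here is a minimal sketch that drops the oldest messages until the history fits a token budget. It is not the code in chat_full.py. The whitespace token counter, the message format, and the function names are all assumptions made for this example.

```python
def count_tokens(text: str) -> int:
    """Stand-in tokenizer (assumption): one token per whitespace-separated word."""
    return len(text.split())


def fit_history(messages: list[dict], context_length: int, reserve: int = 0) -> list[dict]:
    """Keep the most recent messages whose total token count fits the budget.

    `reserve` holds back room for the model's reply.
    """
    budget = context_length - reserve
    kept = []
    used = 0
    for message in reversed(messages):  # walk from newest to oldest
        cost = count_tokens(message["content"])
        if used + cost > budget:
            break  # this message and everything older is dropped
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept
```

A real implementation would count tokens with the model's own tokenizer and would likely keep any system prompt regardless of age.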
Example chat invocations:
# Basic chat
python ./tests/chat.py --meta ./converted_models/meta.yaml
# Full conversation mode
python ./tests/chat_full.py --meta ./converted_models/meta.yaml
See chat.md for more details
Note
The first time the model loads, macOS will take some time to place it on the device. Subsequent loads will be instantaneous. Use Ctrl-D to exit, Ctrl-C to interrupt inference.
- macOS Sequoia with Apple Neural Engine
- Minimum 16GB RAM
- Python 3.9
- Install ANEMLL:
We recommend creating a new virtual environment for this project.
python -m venv anemll-env
source anemll-env/bin/activate
pip install -r requirements.txt
# Installing ANEMLL as a package is not recommended yet because this is an Alpha release:
# pip install anemll
The CoreML compiler is required to compile the model. It is part of the Xcode Command Line Tools.
- Ensure that Xcode Command Line Tools are installed, as they include coremlcompiler.
- You can install them by running xcode-select --install.
- Verify that the xcrun command is available and correctly configured in your PATH.
- Use xcrun --find coremlcompiler to verify the installation.
- If the above fails, try the following steps:
  - Download Xcode from the App Store.
  - Run sudo xcode-select --switch /Applications/Xcode.app/Contents/Developer/ to set the path.
  - Use xcrun --find coremlcompiler to verify the installation.
  - Run sudo xcodebuild -license and agree to the license.
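The xcrun check can also be scripted, for example at the start of a conversion run. This is a minimal sketch: it only wraps the xcrun --find coremlcompiler command shown above, and the function name is a hypothetical helper rather than part of ANEMLL.

```python
import shutil
import subprocess
from typing import Optional


def find_coremlcompiler() -> Optional[str]:
    """Return the path reported by `xcrun --find coremlcompiler`, or None."""
    if shutil.which("xcrun") is None:
        return None  # xcrun is not on PATH (or this is not macOS)
    result = subprocess.run(
        ["xcrun", "--find", "coremlcompiler"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return None  # xcrun ran but could not locate the tool
    return result.stdout.strip() or None


if __name__ == "__main__":
    path = find_coremlcompiler()
    print(path if path else "coremlcompiler not found; check the Xcode setup")
```

The function returns None in both failure cases, so a caller can fail early with a clear message before starting a long conversion.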
Currently optimized for:
- Meta’s LLaMA 3.2 1B and 8B (1024 context) models, including the DeepSeek R1 8B distilled model and the DeepHermes 3B and 8B models
- More models are coming soon
Inspirations, feedback and other resources
Note
We welcome contributions! Please read our contributing guidelines before submitting PRs.
Feel free to submit issues and pull requests to improve ANEMLL!
Third-Party Applications Using ANEMLL
Note
If you’re using ANEMLL in your project, please submit a PR to add it to this list.
We love to showcase how the community is using ANEMLL!
For examples of how to integrate ANEMLL into your projects, see:
For any questions or support, reach out to us at realanemll@gmail.com
ANEMLL is licensed under the MIT License.
https://opensource.org/license/mit