Anemll/Anemll: Artificial Neural Engine Machine Learning Library

ANEMLL (pronounced like “animal”) is an open-source project focused on accelerating the porting of Large Language Models (LLMs) to tensor processors, starting with the Apple Neural Engine (ANE).

The goal is to provide a fully open-source pipeline from model conversion to inference for common LLM architectures running on ANE.
This enables seamless integration and on-device inference for low-power applications on edge devices, ensuring maximum privacy and security.
This is critical for autonomous applications, where models run directly on the device without requiring an internet connection.

We aim to:

Provide a flexible, easy-to-use library/framework for porting LLMs to ANE directly from Hugging Face models

Provide on-device examples for iOS and macOS Swift or C/C++ applications

See the updated Roadmap.md for more details

Main Components in 0.3.0 Alpha Release

ANEMLL provides five main components for Apple Neural Engine inference development:

LLM Conversion Tools – Scripts and code to convert models directly from Hugging Face weights
Swift Reference Implementation – Optimized inference code for Swift applications
- Sample CLI application in anemll-swift-cli
- Core inference engine implementation
Python Sample Code – Reference implementation and testing tools
- Basic chat interface (chat.py)
- Advanced conversation management (chat_full.py)
iOS/macOS Sample Applications – Ready-to-use example applications (Alpha, now on TestFlight)
- SwiftUI Chat interface
- Model Downloads and integration example
- Conversation management
ANEMLL-BENCH – Apple Neural Engine Benchmarking
- Performance testing and comparison
- Model optimization metrics
- Hardware-specific benchmarks
- GitHub Repository

We provide sample converted models ready for use:

LLAMA 3.1 (1B and 8B variants) including iOS “friendly builds”
DeepSeek distilled models
DeepHermes distilled models

Note

Quantization still needs improvement: LUT4 quality is fairly low due to the lack of block quantization on the Apple Neural Engine.
Techniques such as GPTQ and SpinQuant should greatly improve LUT4 models.
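To make the LUT4 remark concrete, here is a minimal pure-Python sketch. It assumes that LUT4 means 4-bit palettization (each weight is stored as a 4-bit index into a 16-entry lookup table) and that "block quantization" means giving each fixed-size block of weights its own table. The helper names `kmeans_1d` and `lut_sse` are hypothetical, and this is not ANEMLL's or Core ML's actual quantizer; it only illustrates why a single per-tensor table loses accuracy when weight magnitudes vary across a tensor.

```python
# Illustrative only: 4-bit LUT (palettization) with one table per tensor
# versus one table per block. Not ANEMLL's or Core ML's implementation.

def kmeans_1d(values, k, iters=25):
    """Fit k scalar centroids (the lookup table) with Lloyd's algorithm."""
    s = sorted(values)
    n = len(s)
    # Quantile initialization: spread the k starting centroids over the data.
    lut = [s[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for v in values:
            j = min(range(k), key=lambda c: abs(v - lut[c]))
            buckets[j].append(v)
        # Move each centroid to the mean of its bucket (keep it if empty).
        lut = [sum(b) / len(b) if b else lut[j] for j, b in enumerate(buckets)]
    return lut


def lut_sse(weights, nbits=4, block_size=None):
    """Sum of squared reconstruction errors after LUT quantization."""
    k = 2 ** nbits
    if block_size is None:
        blocks = [weights]  # one table shared by the whole tensor
    else:
        blocks = [weights[i:i + block_size] for i in range(0, len(weights), block_size)]
    sse = 0.0
    for block in blocks:
        lut = kmeans_1d(block, k)
        for v in block:
            # Each weight is replaced by its nearest table entry (a 4-bit index).
            q = min(lut, key=lambda c: abs(v - c))
            sse += (v - q) ** 2
    return sse


if __name__ == "__main__":
    # Two blocks with very different magnitudes, e.g. two channels of a layer.
    small = [0.001 * (i - 31.5) for i in range(64)]
    large = [0.3 * (i - 31.5) for i in range(64)]
    weights = small + large
    print("per-tensor LUT4 SSE:", lut_sse(weights))
    print("per-block  LUT4 SSE:", lut_sse(weights, block_size=64))
```

With one shared table, several of the 16 entries are spent on the small-magnitude block, so the large block is represented with far fewer levels; per-block tables avoid that trade-off at the cost of storing extra tables.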

Visit our Hugging Face repository for the latest converted models.

Important

This is Alpha Release 0.3.0 of the library. It is designed to process model weights directly from Hugging Face models and convert them to the CoreML format for the Apple Neural Engine (ANE for short).

This release supports only LLaMA models, including the DeepSeek and DeepHermes models distilled on the LLaMA 3.1 architecture.
Future releases will add support for more models and architectures.
Please visit https://huggingface.co/anemll, where we upload the latest models, and follow @anemll on X for updates.
Please star this repo to support the project!

SwiftUI Sample Code

Sample iOS/macOS inference chatbot app (Alpha)
Updates to model conversion and upload scripts
Updates to the Swift package and CLI app

Sample iOS/macOS Applications

Downloads reference or custom models from Hugging Face
Inference/chat implementation uses the Swift library
Sample TestFlight app for a quick test
See the iOS/macOS Sample Applications Guide for details

Swift CLI Reference Implementation

The Swift CLI provides a reference implementation for running models on the Apple Neural Engine. For detailed documentation, see the Swift CLI Guide.

Download a model from Hugging Face
Convert the model using our single-shot conversion script:

./anemll/utils/convert_model.sh --model <path_to_model> --output <output_directory>

Run the model using our sample code:

python ./tests/chat.py --meta <output_directory>/meta.yaml
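The two steps above can be scripted. Below is a minimal Python sketch that assumes the script paths and the `--model`, `--output` and `--meta` flags exactly as shown above, and that it is run from the repository root. `build_commands` and `run_pipeline` are hypothetical helper names, not part of ANEMLL.

```python
import subprocess
import sys
from pathlib import Path


def build_commands(model_path, output_dir):
    """Return the argv lists for conversion and for the basic chat test."""
    output_dir = Path(output_dir)
    convert = [
        "./anemll/utils/convert_model.sh",
        "--model", str(model_path),
        "--output", str(output_dir),
    ]
    # sys.executable is the interpreter of the active virtual environment,
    # so chat.py runs with the same installed requirements.
    chat = [
        sys.executable, "./tests/chat.py",
        "--meta", str(output_dir / "meta.yaml"),
    ]
    return convert, chat


def run_pipeline(model_path, output_dir):
    convert, chat = build_commands(model_path, output_dir)
    # check=True raises CalledProcessError if conversion fails, so chat.py
    # never runs against a half-written output directory.
    subprocess.run(convert, check=True)
    subprocess.run(chat, check=True)
```

For example, `run_pipeline("<path_to_model>", "<output_directory>")` performs both steps in order; passing argv lists instead of a shell string avoids quoting problems with paths that contain spaces.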

For detailed conversion steps and advanced options, see:

We provide two chat interfaces:

chat.py – Basic chat interface for quick testing
chat_full.py – Advanced chat with conversation history management

Features of chat_full.py:

Maintains full conversation history within context window
Automatically truncates older messages when needed
Shifts context window dynamically during long responses
Shows generation speed and token statistics
Better handles multi-turn conversations
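The history truncation described above can be sketched as follows. This is a simplified illustration, not the actual chat_full.py code: it assumes the caller supplies a token-counting function (a naive whitespace count stands in for the model's tokenizer here), keeps an optional system prompt, and drops the oldest turns until the prompt plus a reserved reply budget fits the context window. The names `fit_history` and `whitespace_tokens`, and the 256-token reply reserve, are hypothetical; the 1024-token default mirrors the context length mentioned in this README.

```python
from typing import Callable, List, Optional, Tuple

Message = Tuple[str, str]  # (role, text), e.g. ("user", "Hello")


def fit_history(
    history: List[Message],
    count_tokens: Callable[[str], int],
    context_length: int = 1024,
    reserve_for_reply: int = 256,
    system: Optional[Message] = None,
) -> List[Message]:
    """Drop the oldest messages until the prompt fits the context window."""
    budget = context_length - reserve_for_reply
    if system is not None:
        budget -= count_tokens(system[1])
    kept: List[Message] = []
    used = 0
    # Walk backwards so the most recent turns are kept first.
    for role, text in reversed(history):
        cost = count_tokens(text)
        if used + cost > budget:
            break
        kept.append((role, text))
        used += cost
    kept.reverse()  # restore chronological order
    return ([system] if system is not None else []) + kept


def whitespace_tokens(text: str) -> int:
    # Stand-in for a real tokenizer; for illustration only.
    return len(text.split())
```

Stopping at the first message that does not fit (rather than skipping it and keeping older ones) keeps the retained history contiguous, so the model never sees a reply without the question that prompted it.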

Example of running the chats:

# Basic chat
python ./tests/chat.py --meta ./converted_models/meta.yaml

# Full conversation mode
python ./tests/chat_full.py --meta ./converted_models/meta.yaml

See chat.md for more details

Note
The first time a model loads, macOS takes some time to place it on the device; subsequent loads are nearly instantaneous. Use Ctrl-D to exit and Ctrl-C to interrupt inference.

Requirements

macOS Sequoia with Apple Neural Engine
Minimum 16GB RAM
Python 3.9
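These requirements can be checked from Python with a sketch like the one below. It assumes that "Apple Neural Engine" can be approximated by an Apple silicon (`arm64`) Mac, that macOS Sequoia corresponds to macOS version 15, and that Python 3.9 is a minimum. `check_requirements` and `total_ram_bytes` are hypothetical helpers; total RAM is read with `os.sysconf`, which may not be available on every platform.

```python
import os
import platform
import sys
from typing import List, Optional, Tuple


def check_requirements(
    system: str,
    machine: str,
    macos_version: str,
    ram_bytes: Optional[int],
    python_version: Tuple[int, int],
) -> List[str]:
    """Return a list of human-readable problems (empty if all checks pass)."""
    problems = []
    if system != "Darwin":
        problems.append("macOS is required")
    else:
        major = int(macos_version.split(".")[0]) if macos_version else 0
        if major < 15:  # assumption: Sequoia == macOS 15
            problems.append(f"macOS Sequoia (15) or later required, found {macos_version or 'unknown'}")
    if machine != "arm64":  # assumption: ANE is present on Apple silicon only
        problems.append(f"Apple silicon (arm64) required, found {machine}")
    if ram_bytes is not None and ram_bytes < 16 * 1024**3:
        problems.append(f"16 GB RAM minimum, found {ram_bytes / 1024**3:.1f} GB")
    if python_version < (3, 9):  # assumption: 3.9 treated as a minimum
        problems.append(f"Python 3.9+ required, found {python_version[0]}.{python_version[1]}")
    return problems


def total_ram_bytes() -> Optional[int]:
    """Physical memory in bytes, or None if the platform does not report it."""
    try:
        return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    except (ValueError, OSError, AttributeError):
        return None


if __name__ == "__main__":
    issues = check_requirements(
        platform.system(),
        platform.machine(),
        platform.mac_ver()[0],
        total_ram_bytes(),
        sys.version_info[:2],
    )
    print("\n".join(issues) if issues else "All checks passed")
```

Keeping the checks in a pure function that takes the detected values as arguments makes the logic easy to test on any machine, while the `__main__` block does the platform-specific detection.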

Install ANEMLL:
We recommend creating a new virtual environment for this project.

python -m venv anemll-env
source anemll-env/bin/activate
pip install -r requirements.txt
# pip install anemll
# Due to the Alpha status, we do not recommend installing ANEMLL as a package yet

The CoreML compiler is required to compile the model. It is part of the Xcode command line tools.

Ensure that the Xcode Command Line Tools are installed, as they include coremlcompiler.
You can install them by running xcode-select --install.
Verify that the xcrun command is available and correctly configured in your PATH.
Use xcrun --find coremlcompiler to verify the installation.
If the above fails, try the following steps:
Download Xcode from the App Store.
Run sudo xcode-select --switch /Applications/Xcode.app/Contents/Developer/ to set the path.
Use xcrun --find coremlcompiler to verify the installation.
Run sudo xcodebuild -license and agree to the license.
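The verification step can also be automated from Python, for example at the start of a conversion script. This sketch only wraps the `xcrun --find coremlcompiler` command quoted above; `find_coremlcompiler` is a hypothetical helper name, and it returns `None` instead of raising when `xcrun` is missing (e.g. on non-macOS machines), times out, or cannot find the compiler.

```python
import shutil
import subprocess
from typing import Optional


def find_coremlcompiler() -> Optional[str]:
    """Return the path reported by `xcrun --find coremlcompiler`, or None."""
    if shutil.which("xcrun") is None:
        # xcrun is not on PATH, so the Xcode tools are not installed here.
        return None
    try:
        result = subprocess.run(
            ["xcrun", "--find", "coremlcompiler"],
            capture_output=True,
            text=True,
            timeout=30,
        )
    except (subprocess.TimeoutExpired, OSError):
        return None
    if result.returncode != 0:
        return None
    path = result.stdout.strip()
    return path or None


if __name__ == "__main__":
    found = find_coremlcompiler()
    if found is None:
        print("coremlcompiler not found: run `xcode-select --install` "
              "or follow the Xcode steps above")
    else:
        print("coremlcompiler found at", found)
```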

Currently optimized for:

Meta’s LLaMA 3.2 1B and 8B (1024 context) models, including the DeepSeek R1 8B distilled model and the DeepHermes 3B and 8B models
More models are coming soon

Inspirations, feedback and other resources

Note

We welcome contributions! Please read our contributing guidelines before submitting PRs.

Feel free to submit issues and pull requests to improve ANEMLL!


Third-Party Applications Using ANEMLL

Note

If you’re using ANEMLL in your project, please submit a PR to add it to this list.
We love to showcase how the community is using ANEMLL!

For examples of how to integrate ANEMLL into your projects, see:

For any questions or support, reach out to us at realanemll@gmail.com

ANEMLL is licensed under the MIT License.
https://opensource.org/license/mit
