Aussie AI
Inference Frameworks
-
Last Updated 25 April, 2026
-
by David Spuler, Ph.D.
Inference frameworks are software platforms that load a model and execute it against requests from users. Many inference frameworks also provide training and fine-tuning capabilities, though not all do. Many frameworks have been open-sourced, but many others remain proprietary, and competition in the space is fierce.
The concept of a framework overlaps considerably with that of a "deep learning compiler." It also overlaps with "AI cloud hosting" services, offered both by new startups and by the major cloud providers (e.g., Amazon AWS, Microsoft Azure, and Google GCP), which typically include both training and inference features.
Software frameworks are only one part of the AI tech stack. Read more about inference optimization, training optimization, hardware accelerators, ML compilers, and our list of common and obscure AI optimization techniques.
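At its simplest, an inference framework accepts user requests, groups them into batches, runs the model's forward pass, and routes results back to each caller. The toy sketch below illustrates that request-batching loop in plain Python; the names and the stub "model" are purely illustrative, not any particular framework's API.

```python
from dataclasses import dataclass
from queue import Queue

def toy_model(batch):
    """Stub standing in for a real forward pass: one output per input."""
    return [sum(tokens) for tokens in batch]

@dataclass
class Request:
    request_id: int
    tokens: list

def serve(requests, max_batch_size=4):
    """Collect requests, run them through the model in batches,
    and return results keyed by request id."""
    q = Queue()
    for r in requests:
        q.put(r)
    results = {}
    while not q.empty():
        # Pull up to max_batch_size pending requests into one batch.
        batch = []
        while len(batch) < max_batch_size and not q.empty():
            batch.append(q.get())
        outputs = toy_model([r.tokens for r in batch])
        for r, out in zip(batch, outputs):
            results[r.request_id] = out
    return results
```

Real serving frameworks elaborate every step of this loop: continuous batching instead of a fixed queue, GPU kernels instead of a stub model, and streaming token-by-token responses instead of a results dictionary.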
List of Machine Learning Frameworks
Some of the many frameworks include:
- TensorFlow, open-sourced by Google.
- PyTorch
- Torch
- MXNet
- Hugging Face Transformers
- LangChain
- GGML
- Llama.cpp
- LLVM
- Caffe and Caffe2
- Theano
- RNN
- Keras
- Microsoft CNTK (Cognitive Toolkit)
- Amazon ML
- Google Cloud AutoML
- Microsoft Azure (various)
- scikit-learn
Features of ML Frameworks
Some of the desirable features include:
- GPU and hardware acceleration support
- Training optimizations
- Quantization
- Pruning
- Kernel operator fusion
- Server hosting support (i.e. deployment to run your model as a website backend service)
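Of the features above, quantization is among the most widely supported. As a rough illustration of the underlying idea, here is a minimal sketch of symmetric per-tensor int8 quantization in plain Python; real frameworks use per-channel scales, calibration, and fused low-bit kernels, so this is illustrative only.

```python
def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization: scale floats into [-127, 127]."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate float weights from int8 values and the scale."""
    return [x * scale for x in q]
```

The round trip loses a little precision (at most half a quantization step per weight), which is the accuracy/memory trade-off quantization support in a framework is managing.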
Survey Papers on ML Software Frameworks
Papers that review or survey software frameworks:
- G Menghani, 2023, Efficient deep learning: A survey on making deep learning models smaller, faster, and better, ACM Computing Surveys, https://dl.acm.org/doi/abs/10.1145/3578938, https://arxiv.org/abs/2106.08962
- Maurizio Capra, Beatrice Bussolino, Alberto Marchisio, Guido Masera, Maurizio Martina, Muhammad Shafique, 2020, Hardware and software optimizations for accelerating deep neural networks: Survey of current trends, challenges, and the road ahead, https://ieeexplore.ieee.org/iel7/6287639/6514899/09269334.pdf, https://arxiv.org/abs/2012.11233
- Daniel Nichols, Siddharth Singh, Shu-Huai Lin, Abhinav Bhatele, July 2022, A Survey and Empirical Evaluation of Parallel Deep Learning Frameworks, https://arxiv.org/abs/2111.04949, PDF: https://pssg.cs.umd.edu/assets/papers/2022-07-dl-survey-arxiv.pdf (Survey of frameworks from the theoretical perspective of parallelism.)
- Saba Amiri; Sara Salimzadeh; A.S.Z. Belloum, 2019, A Survey of Scalable Deep Learning Frameworks, 2019 15th International Conference on eScience (eScience), https://ieeexplore.ieee.org/document/9041689, PDF: https://pure.uva.nl/ws/files/58721994/09041689.pdf (Short survey paper from 2019.)
- MM YAPICI, N Topaloğlu, 2021, Computers and Informatics, Performance comparison of deep learning frameworks https://dergipark.org.tr/en/pub/ci/issue/60236/769457, PDF: https://dergipark.org.tr/en/download/article-file/1201877 (Examines Torch, Theano, Caffe, Caffe2, MXNet, Keras, TensorFlow, and CNTK frameworks in terms of training speed.)
General Research on ML Software Frameworks
Research papers about general issues or specific frameworks:
- F Mince, D Dinh, J Kgomo, N Thompson, S Hooker, 2023, The Grand Illusion: The Myth of Software Portability and Implications for ML Progress, arXiv preprint arXiv:2309.07181, https://arxiv.org/pdf/2309.07181.pdf (Examines ML software frameworks TensorFlow, Pytorch, and JAX, and their portability across hardware.)
- H Guan, Y Xiao, J Li, Y Liu, G Bai, May 2023, A comprehensive study of real-world bugs in machine learning model optimization, 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE), https://ieeexplore.ieee.org/document/10172690, PDF: https://yepangliu.github.io/files/ICSE2023-MOB.pdf, PDF: https://baigd.github.io/files/ICSE23-MOB.pdf (Frameworks can have bugs? Who knew?)
- N Mungoli, Apr 2023, Scalable, Distributed AI Frameworks: Leveraging Cloud Computing for Enhanced Deep Learning Performance and Efficiency, arXiv preprint arXiv:2304.13738, https://arxiv.org/abs/2304.13738 (Extending frameworks for distributed AI.)
- Arpan Jain, Ammar Ahmad Awan, Quentin Anthony, Hari Subramoni, Dhableswar K. DK Panda, 2019, Performance Characterization of DNN Training using TensorFlow and PyTorch on Modern Clusters, 2019 IEEE International Conference on Cluster Computing (CLUSTER), https://ieeexplore.ieee.org/abstract/document/8891042, PDF Slides: http://nbcl.cse.ohio-state.edu/static/media/talks/slide/Arpan_booth_talk_2.pdf
- Marc-André Zöller, Marco F. Huber, Jan 2021, Benchmark and Survey of Automated Machine Learning Frameworks, https://arxiv.org/abs/1904.12054
- Yushuo Chen, Tianyi Tang, Erge Xiang, Linjiang Li, Wayne Xin Zhao, Jing Wang, Yunpeng Chai, Ji-Rong Wen, 17 Apr 2024, Towards Coarse-to-Fine Evaluation of Inference Efficiency for Large Language Models, https://arxiv.org/abs/2404.11502 (Benchmarks the performance of various Transformer inference frameworks: Transformers, vLLM, DeepSpeed-MII, TGI, TensorRT-LLM, llama.cpp, LightLLM, LMDeploy, StreamingLLM.)
- Yiheng Liu, Hao He, Tianle Han, Xu Zhang, Mengyuan Liu, Jiaming Tian, Yutong Zhang, Jiaqi Wang, Xiaohui Gao, Tianyang Zhong, Yi Pan, Shaochen Xu, Zihao Wu, Zhengliang Liu, Xin Zhang, Shu Zhang, Xintao Hu, Tuo Zhang, Ning Qiang, Tianming Liu, Bao Ge, Jan 2024, Understanding LLMs: A Comprehensive Overview from Training to Inference https://arxiv.org/abs/2401.02038
- MLC team. 2023. MLC-LLM. https://github.com/mlc-ai/mlc-llm
- tinygrad. 2023. Tinygrad. https://github.com/tinygrad/tinygrad
- Woosuk Kwon, Zhuohan Li, Siyuan Zhuang, Ying Sheng, Lianmin Zheng, Cody Hao Yu, Joseph E. Gonzalez, Hao Zhang, Ion Stoica, Oct 2023, Efficient Memory Management for Large Language Model Serving with PagedAttention, SOSP ’23, October 23–26, 2023, Koblenz, Germany, https://dl.acm.org/doi/pdf/10.1145/3600006.3613165 (The original Paged Attention and vLLM paper, focusing on optimizing memory size of the KV cache using methods similar to operating-system memory paging.)
- Vince Lam, Mar 12, 2024, 50+ Open-Source Options for Running LLMs Locally, https://medium.com/thedeephub/50-open-source-options-for-running-llms-locally-db1ec6f5a54f
- Jason Perlow, Aug. 6, 2024, How to run dozens of AI models on your Mac or PC - no third-party cloud needed, https://www.zdnet.com/article/how-to-run-dozens-of-ai-models-on-your-mac-or-pc-no-third-party-cloud-needed/
- Lianmin Zheng, Liangsheng Yin, Zhiqiang Xie, Chuyue Sun, Jeff Huang, Cody Hao Yu, Shiyi Cao, Christos Kozyrakis, Ion Stoica, Joseph E. Gonzalez, Clark Barrett, Ying Sheng, 6 Jun 2024 (v2), SGLang: Efficient Execution of Structured Language Model Programs, https://arxiv.org/abs/2312.07104 https://github.com/sgl-project/sglang
- The SGLang Team, Jul 25, 2024, Achieving Faster Open-Source Llama3 Serving with SGLang Runtime (vs. TensorRT-LLM, vLLM), https://lmsys.org/blog/2024-07-25-sglang-llama3/
- Anna Popovych, Sofiya Merenych, February 16, 2024, Top AI Frameworks in 2024: Comparison of Artificial Intelligence Frameworks, https://clockwise.software/blog/artificial-intelligence-framework/
- Hugging Face, 2024, Text Generation Inference, https://huggingface.co/docs/text-generation-inference/index
- ZML, Sep 2024, ZML: High performance AI inference stack. Built for production. https://docs.zml.ai/ https://github.com/zml/zml?tab=readme-ov-file
- Xupeng Miao, Gabriele Oliaro, Zhihao Zhang, Xinhao Cheng, Hongyi Jin, Tianqi Chen, Zhihao Jia, 23 Dec 2023, Towards Efficient Generative Large Language Model Serving: A Survey from Algorithms to Systems, https://arxiv.org/abs/2312.15234
- Ruihao Gong, Yifu Ding, Zining Wang, Chengtao Lv, Xingyu Zheng, Jinyang Du, Haotong Qin, Jinyang Guo, Michele Magno, Xianglong Liu, 25 Sep 2024, A Survey of Low-bit Large Language Models: Basics, Systems, and Algorithms, https://arxiv.org/abs/2409.16694
- Sebastian Petrus, Sep 4, 2024, Top 10 RAG Frameworks Github Repos 2024, https://sebastian-petrus.medium.com/top-10-rag-frameworks-github-repos-2024-12b2a81f4a49
- Rick Zhou, Larme Zhao, Bo Jiang, and Sean Sheng, June 5, 2024, Benchmarking LLM Inference Backends: vLLM, LMDeploy, MLC-LLM, TensorRT-LLM, and TGI, https://www.bentoml.com/blog/benchmarking-llm-inference-backends
- Wenchao Xu, Jinyu Chen, Peirong Zheng, Xiaoquan Yi, Tianyi Tian, Wenhui Zhu, Quan Wan, Haozhao Wang, Yunfeng Fan, Qinliang Su, Xuemin Shen, 18 Dec 2024, Deploying Foundation Model Powered Agent Services: A Survey, https://arxiv.org/abs/2412.13437 (A survey of not just deployment, but many inference optimization techniques.)
- Meta, Jan 2025 (accessed), Llama Stack: Composable building blocks to build Llama Apps, https://github.com/meta-llama/llama-stack
- Mozhgan Navardi, Romina Aalishah, Yuzhe Fu, Yueqian Lin, Hai Li, Yiran Chen, Tinoosh Mohsenin, 19 Feb 2025, GenAI at the Edge: Comprehensive Survey on Empowering Edge Devices, https://arxiv.org/abs/2502.15816
- Amr Elmeleegy, Harry Kim, David Zier, Kyle Kranen, Neelay Shah, Ryan Olson and Omri Kahalon, Mar 18, 2025, Introducing NVIDIA Dynamo, A Low-Latency Distributed Inference Framework for Scaling Reasoning AI Models, https://developer.nvidia.com/blog/introducing-nvidia-dynamo-a-low-latency-distributed-inference-framework-for-scaling-reasoning-ai-models/
- Matthias Jobst, Tim Langer, Chen Liu, Mehmet Alici, Hector A. Gonzalez, Christian Mayr, 18 Jul 2025, An End-to-End DNN Inference Framework for the SpiNNaker2 Neuromorphic MPSoC, https://arxiv.org/abs/2507.13736
- Kaichuan Kong, Dongjie Liu, Xiaobo Jin, Guanggang Geng, Zhiying Li, Jian Weng, 6 Aug 2025, DMFI: Dual-Modality Fine-Tuning and Inference Framework for LLM-Based Insider Threat Detection, https://arxiv.org/abs/2508.05694
- Soorya Ram Shimgekar, Shayan Vassef, Abhay Goyal, Navin Kumar, Koustuv Saha, 24 Jul 2025, Agentic AI framework for End-to-End Medical Data Inference, https://arxiv.org/abs/2507.18115
- Raghav Singhal, Zachary Horvitz, Ryan Teehan, Mengye Ren, Zhou Yu, Kathleen McKeown, Rajesh Ranganath, 18 Jul 2025, A General Framework for Inference-time Scaling and Steering of Diffusion Models, https://arxiv.org/abs/2501.06848
- Jiawen Qi, Chang Gao, Zhaochun Ren, Qinyu Chen, 25 Jul 2025, DeltaLLM: A Training-Free Framework Exploiting Temporal Sparsity for Efficient Edge LLM Inference, https://arxiv.org/abs/2507.19608
- Riddhi J. Pitliya, Ozan Catal, Toon Van de Maele, Corrado Pezzato, Tim Verbelen, 1 Aug 2025, Theory of Mind Using Active Inference: A Framework for Multi-Agent Cooperation, https://arxiv.org/abs/2508.00401
- Chakattrai Sookkongwaree, Tattep Lakmuang, and Chainarong Amornbunchornvej, 1 Aug 2025, Multi-Band Variable-Lag Granger Causality: A Unified Framework for Causal Time Series Inference across Frequencies, https://arxiv.org/abs/2508.00658
- Bo Wen, 7 Aug 2025, A Framework for Inherently Safer AGI through Language-Mediated Active Inference, https://arxiv.org/abs/2508.05766
- Björn Volkmann, Jan-Hendrik Ewering, Michael Meindl, Simon F. G. Ehlers, Thomas Seel, 21 Aug 2025, Bayesian Inference and Learning in Nonlinear Dynamical Systems: A Framework for Incorporating Explicit and Implicit Prior Knowledge, https://arxiv.org/abs/2508.15345
- Zucheng Liang, Wenxin Wei, Kaijie Zhang, Hongyi Chen, 5 Sep 2025, Research on Multi-hop Inference Optimization of LLM Based on MQUAKE Framework, https://arxiv.org/abs/2509.04770
- Yongsheng Feng, Yuetonghui Xu, Jiehui Luo, Hongjia Liu, Xiaobing Li, Feng Yu, Wei Li, 19 Sep 2025, TISDiSS: A Training-Time and Inference-Time Scalable Framework for Discriminative Source Separation, https://arxiv.org/abs/2509.15666
- Enyu Zhou, Kai Sheng, Hao Chen, Xin He, 19 Sep 2025, CARD: A Cache-Assisted Parallel Speculative Decoding Framework via Query-and-Correct Paradigm for Accelerating LLM Inference, https://arxiv.org/abs/2508.04462
- Yudong Shen, Wenyu Wu, Jiali Mao, Yixiao Tong, Guoping Liu, Chaoya Wang, 15 Sep 2025, Bridging the Gap Between Sparsity and Redundancy: A Dual-Decoding Framework with Global Context for Map Inference, https://arxiv.org/abs/2509.11731
- Giorgos Armeniakos, Alexis Maras, Sotirios Xydis, Dimitrios Soudris, 18 Sep 2025, MaRVIn: A Cross-Layer Mixed-Precision RISC-V Framework for DNN Inference, from ISA Extension to Hardware Acceleration, https://arxiv.org/abs/2509.15187
- Nathanael Jo, Ashia Wilson, 23 Sep 2025, What Does Your Benchmark Really Measure? A Framework for Robust Inference of AI Capabilities, https://arxiv.org/abs/2509.19590
- Miruna Oprescu, David K. Park, Xihaier Luo, Shinjae Yoo, Nathan Kallus, 28 Oct 2025, GST-UNet: A Neural Framework for Spatiotemporal Causal Inference with Time-Varying Confounding, https://arxiv.org/abs/2502.05295
- Aditya Puttaparthi Tirumala, 23 Oct 2025, DeepCausalMMM: A Deep Learning Framework for Marketing Mix Modeling with Causal Inference, https://arxiv.org/abs/2510.13087
- Qilin Liao, Anamika Lochab, Ruqi Zhang, 20 Oct 2025, VERA-V: Variational Inference Framework for Jailbreaking Vision-Language Models, https://arxiv.org/abs/2510.17759
- Arshika Lalan, Rajat Ghosh, Aditya Kolsur, Debojyoti Dutta, 8 Oct 2025, A Multi-Agent Framework for Stateful Inference-Time Search, https://arxiv.org/abs/2510.07147
- Haojie Ouyang, Jianwei Lv, Lei Ren, Chen Wei, Xiaojie Wang, Fangxiang Feng, 28 Sep 2025, ChunkLLM: A Lightweight Pluggable Framework for Accelerating LLMs Inference, https://arxiv.org/abs/2510.02361
- Ke Wang, Felix Qu, Libin Xia, Zishuo Zhao, Chris Tong, Lynn Ai, Eric Yang, 29 Sep 2025, VeriLLM: A Lightweight Framework for Publicly Verifiable Decentralized Inference, https://arxiv.org/abs/2509.24257
- Subhodip Panda, MS Varun, Shreyans Jain, Sarthak Kumar Maharana and Prathosh A.P, 5 Oct 2025, Variational Diffusion Unlearning: A Variational Inference Framework for Unlearning in Diffusion Models under Data Constraints, https://arxiv.org/abs/2510.04058
- Christopher Klugmann and Daniel Kondermann, 5 Oct 2025, Quantifying Ambiguity in Categorical Annotations: A Measure and Statistical Inference Framework, https://arxiv.org/abs/2510.04366
- Yuxin Ma, Lun Du, Lanning Wei, Kun Chen, Qian Xu, Kangyu Wang, Guofeng Feng, Guoshan Lu, Lin Liu, Xiaojing Qi, Xinyuan Zhang, Zhen Tao, Haibo Feng, Ziyun Jiang, Ying Xu, Zenan Huang, Yihong Zhuang, Haokai Xu, Jiaqi Hu, Zhenzhong Lan, Junbo Zhao, Jianguo Li, Da Zheng, 9 Oct 2025, dInfer: An Efficient Inference Framework for Diffusion Language Models, https://arxiv.org/abs/2510.08666
- Peter Wayner, Jan 26, 2026, 16 open source projects transforming AI and machine learning, https://www.infoworld.com/article/2336757/16-open-source-projects-transforming-ai-and-machine-learning.html
- Zylos, 15 Jan 2026, LLM Inference Optimization and Quantization 2026, https://zylos.ai/research/2026-01-15-llm-inference-optimization
- Morph, March 27, 2026, LLM Inference Optimization: A Practical Guide to Cutting Cost and Latency (2026): Concrete techniques for optimizing LLM inference across model, system, and application layers. Quantization, KV cache compression, continuous batching, speculative decoding, and context compaction with real benchmarks, https://www.morphllm.com/llm-inference-optimization
- Eduardo Aguilar-Bejarano, Daniel Lea, Karthikeyan Sivakumar, Jimiama M. Mase, Reza Omidvar, Ruizhe Li, Troy Kettle, James Mitchell-White, Morgan R Alexander, David A Winkler, Grazziela Figueredo, 23 Jul 2025, Helix 1.0: An Open-Source Framework for Reproducible and Interpretable Machine Learning on Tabular Scientific Data, https://arxiv.org/abs/2507.17791
- Ankita Vaishnobi Bisoi, Shreyas V, Jose Siguenza and Bharath Ramsundar, 28 Jul 2025, A Modular Open Source Framework for Genomic Variant Calling, https://arxiv.org/abs/2411.11513
- Xiaoyu Kong, Leheng Sheng, Junfei Tan, Yuxin Chen, Jiancan Wu, An Zhang, Xiang Wang, Xiangnan He, 28 Oct 2025, MiniOneRec: An Open-Source Framework for Scaling Generative Recommendation, https://arxiv.org/abs/2510.24431
- Alejandro Escontrela, Justin Kerr, Arthur Allshire, Jonas Frey, Rocky Duan, Carmelo Sferrazza, Pieter Abbeel, 17 Oct 2025, GaussGym: An open-source real-to-sim framework for learning locomotion from pixels, https://arxiv.org/abs/2510.15352
- Jiaming Wang, Diwen Liu, Jizhuo Chen, Harold Soh, 5 Oct 2025, TOPO-Bench: An Open-Source Topological Mapping Evaluation Framework with Quantifiable Perceptual Aliasing, https://arxiv.org/abs/2510.04100
- Deven Panchal, 12 Oct 2025, Simpliflow: A Lightweight Open-Source Framework for Rapid Creation and Deployment of Generative Agentic AI Workflows, https://arxiv.org/abs/2510.10675
- Rohan Gupta, Trevor Asbery, Zain Merchant, Abrar Anwar, Jesse Thomason, 12 Oct 2025, RobotFleet: An Open-Source Framework for Centralized Multi-Robot Task Planning, https://arxiv.org/abs/2510.10379
- Tanguy Herserant and Vincent Guigue, 29 Aug 2025, AllSummedUp: un framework open-source pour comparer les metriques d'evaluation de resume (AllSummedUp: an open-source framework for comparing summarization evaluation metrics), https://arxiv.org/abs/2508.21389
- Weige Cai, Tong Zhu, Jinyi Niu, Ruiqi Hu, Lingyao Li, Tenglong Wang, Xiaowu Dai, Weining Shen, and Liwen Zhang, 11 Sep 2025, LightAgent: Production-level Open-source Agentic AI Framework, https://arxiv.org/abs/2509.09292
- Hank Gerba, 28 Jul 2025, Narrative Context Protocol: An Open-Source Storytelling Framework for Generative AI, https://arxiv.org/abs/2503.04844
- Aditya Nagori, Ricardo Accorsi Casonatto, Ayush Gautam, Abhinav Manikantha Sai Cheruvu, and Rishikesan Kamaleswaran, 30 Jul 2025, Open-Source Agentic Hybrid RAG Framework for Scientific Literature Review, https://arxiv.org/abs/2508.05660
- Ryan Albert Antonio, Joren Dumoulin, Xiaoling Yi, Josse Van Delm, Yunhao Deng, Guilherme Paim, Marian Verhelst, 20 Aug 2025, An Open-Source HW-SW Co-Development Framework Enabling Efficient Multi-Accelerator Systems, https://arxiv.org/abs/2508.14582
- Abhimanyu Bambhaniya, Ritik Raj, Geonhwa Jeong, Souvik Kundu, Sudarshan Srinivasan, Midhilesh Elavazhagan, Madhu Kumar, Tushar Krishna, 3 Jun 2024, Demystifying Platform Requirements for Diverse LLM Inference Use Cases, https://arxiv.org/abs/2406.01698 Code: https://github.com/abhibambhaniya/GenZ-LLM-Analyzer (Analysis of cost of serving LLMs, including separate profiles of prefill versus decoding phases, and the cost of extra prompt processing in RAG architectures with prepended information.)
- Jeon, Byungsoo, May 2024, Automated and Portable Machine Learning Systems, Ph.D. Thesis, Carnegie Mellon University, https://doi.org/10.1184/R1/25746708.v1 https://kilthub.cmu.edu/articles/thesis/Automated_and_Portable_Machine_Learning_Systems/25746708/1 PDF: https://kilthub.cmu.edu/ndownloader/files/46074087 Code: https://github.com/cmu-catalyst/collage (Portability layer to integrate the various kernels and low-level backends more easily. Also covers pipeline parallelism in graph models, and KV cache parallelism similar to FlashDecode.)
- Zixuan Zhou, Xuefei Ning, Ke Hong, Tianyu Fu, Jiaming Xu, Shiyao Li, Yuming Lou, Luning Wang, Zhihang Yuan, Xiuhong Li, Shengen Yan, Guohao Dai, Xiao-Ping Zhang, Yuhan Dong, Yu Wang, 22 Apr 2024, A Survey on Efficient Inference for Large Language Models, https://arxiv.org/abs/2404.14294
- Martin Thissen, April 20, 2024, Llama 3 on Your Local Computer | Free GPT-4 Alternative, https://medium.com/@martin-thissen/llama-3-on-your-local-computer-free-gpt-4-alternative-1f533e9abff7 (Llama3-70B with 4-bit quantization using vLLM for inference on NVIDIA RTX 6000 Ada GPU.)
- Pierrick Pochelu, 9 Oct 2022, Deep Learning Inference Frameworks Benchmark, https://arxiv.org/abs/2210.04323 (Benchmarking study in 2022 of various frameworks.)
- Max A. Cherney, March 26, 2024, Exclusive: Behind the plot to break Nvidia's grip on AI by targeting software, https://www.reuters.com/technology/behind-plot-break-nvidias-grip-ai-by-targeting-software-2024-03-25/
- Fucheng Jia, Shiqi Jiang, Ting Cao, Wei Cui, Tianrui Xia, Xu Cao, Yuanchun Li, Deyu Zhang, Ju Ren, Yunxin Liu, Lili Qiu, Mao Yang, Sep 2023, Accelerating In-Browser Deep Learning Inference on Diverse Edge Clients through Just-in-Time Kernel Optimizations, https://arxiv.org/pdf/2309.08978.pdf
- Daniel Crankshaw, Gur-Eyal Sela, Xiangxi Mo, Corey Zumar, Ion Stoica, Joseph Gonzalez, and Alexey Tumanov. 2020. InferLine: latency-aware provisioning and scaling for prediction serving pipelines. Proceedings of the 11th ACM Symposium on Cloud Computing. 477–491, https://arxiv.org/abs/1812.01776
- Suresh G, Sep 25, 2023, 7 Frameworks for Serving LLMs, Medium, https://medium.com/@gsuresh957/7-frameworks-for-serving-llms-5044b533ee88
- Doug Eadline, October 5, 2023, How AMD May Get Across the CUDA Moat, HPC Wire, https://www.hpcwire.com/2023/10/05/how-amd-may-get-across-the-cuda-moat/
- Hayden Wolff, Jun 02, 2024, A Simple Guide to Deploying Generative AI with NVIDIA NIM, NVIDIA Technical Blog, https://developer.nvidia.com/blog/a-simple-guide-to-deploying-generative-ai-with-nvidia-nim/
- K Dinghofer, F Hartung, 2020, Analysis of criteria for the selection of machine learning frameworks 2020 International Conference on Computing, Networking and Communications (ICNC), https://ieeexplore.ieee.org/document/9049650
- H Dai, X Peng, X Shi, L He, Q Xiong, H Jin, 2022, Reveal training performance mystery between TensorFlow and PyTorch in the single GPU environment, Science China Information Sciences volume 65, Article number: 112103 (2022), https://link.springer.com/article/10.1007/s11432-020-3182-1 http://scis.scichina.com/en/2022/112103.pdf
- C Luo, X He, J Zhan, L Wang, W Gao, J Dai, 2020, Comparison and benchmarking of AI models and frameworks on mobile devices, https://arxiv.org/abs/2005.05085
- R. Sanchez-Iborra and A. F. Skarmeta, Tinyml-enabled frugal smart objects: Challenges and opportunities, IEEE Circuits and Systems Magazine, vol. 20, no. 3, pp. 4–18, 2020. https://ieeexplore.ieee.org/document/9166461 PDF: https://sci-hub.se/10.1109/MCAS.2020.3005467
- R. Immonen, T. Hämäläinen et al., Tiny machine learning for resource-constrained microcontrollers, Journal of Sensors, vol. 2022, 2022, https://www.hindawi.com/journals/js/2022/7437023/
- M. Giordano, L. Piccinelli, and M. Magno, Survey and comparison of milliwatts micro controllers for tiny machine learning at the edge, in 2022 IEEE 4th International Conference on Artificial Intelligence Circuits and Systems (AICAS). IEEE, 2022, pp. 94–97. https://ieeexplore.ieee.org/document/9870017
- Hong Zhang, Yupeng Tang, Anurag Khandelwal, and Ion Stoica. 2023. SHEPHERD: Serving DNNs in the Wild. In 20th USENIX Symposium on Networked Systems Design and Implementation (NSDI 23). USENIX Association, Boston, MA, 787–808. https://www.usenix.org/conference/nsdi23/presentation/zhang-hong
- Arnav Chavan, Raghav Magazine, Shubham Kushwaha, Mérouane Debbah, Deepak Gupta, 24 Apr 2024 (v2), Faster and Lighter LLMs: A Survey on Current Challenges and Way Forward, https://arxiv.org/abs/2402.01799 Code: https://github.com/nyunAI/Faster-LLM-Survey
- Myeonghwa Lee, Seonho An, Min-Soo Kim, 18 Jun 2024, PlanRAG: A Plan-then-Retrieval Augmented Generation for Generative Large Language Models as Decision Makers, https://arxiv.org/abs/2406.12430 Code: https://github.com/myeon9h/PlanRAG
- Fabian Both, June 2024, why we no longer use LangChain for building our AI agents , https://www.octomind.dev/blog/why-we-no-longer-use-langchain-for-building-our-ai-agents (Replaces LangChain with their own more-focused internal tool sets.)
- Mark Zuckerberg, July 23, 2024 Open Source AI Is the Path Forward https://about.fb.com/news/2024/07/open-source-ai-is-the-path-forward/
- Shrestha, Y.R., von Krogh, G. & Feuerriegel, S., 2023, Building open-source AI. Nat Comput Sci 3, 908–911 (2023). https://doi.org/10.1038/s43588-023-00540-0 https://www.nature.com/articles/s43588-023-00540-0
- Dennis Rall, Bernhard Bauer, Thomas Fraunholz, 8 Nov 2023, Towards Democratizing AI: A Comparative Analysis of AI as a Service Platforms and the Open Space for Machine Learning Approach, https://arxiv.org/abs/2311.04518
- DeepSeek-AI, Aixin Liu, Bei Feng, Bin Wang, Bingxuan Wang, Bo Liu, Chenggang Zhao, Chengqi Dengr, Chong Ruan, Damai Dai, Daya Guo, et. al. (many additional authors), 19 Jun 2024 (v5), DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model, https://arxiv.org/abs/2405.04434
- LiLMod, Aug 27, 2024, Haystack: the new LLM framework that is shaking its competitors, https://ai.plainenglish.io/haystack-the-new-llm-framework-that-is-shaking-its-competitors-1a083a153fd9
- Mistral, Sep 2024, AI in abundance. Introducing a free API, improved pricing across the board, a new enterprise-grade Mistral Small, and free vision capabilities on le Chat. https://mistral.ai/news/september-24-release/
- Aparna Dhinakaran, Sep 2024, Choosing Between LLM Agent Frameworks. The tradeoffs between building bespoke code-based agents and the major agent frameworks. https://towardsdatascience.com/choosing-between-llm-agent-frameworks-69019493b259
- Nicola Sessions, Oct 15, 2024, DataStax Announces New AI Development Platform, Built with NVIDIA AI, https://developer.nvidia.com/blog/datastax-announces-new-ai-development-platform-built-with-nvidia-ai/
- Anurag Guda and Shruthii Sathyanarayanan, Oct 16, 2024, Simplify AI Application Development with NVIDIA Cloud Native Stack, https://developer.nvidia.com/blog/simplify-ai-application-development-with-nvidia-cloud-native-stack/
- Chengyue Wu, Xiaokang Chen, Zhiyu Wu, Yiyang Ma, Xingchao Liu, Zizheng Pan, Wen Liu, Zhenda Xie, Xingkai Yu, Chong Ruan, Ping Luo, 17 Oct 2024, Janus: Decoupling Visual Encoding for Unified Multimodal Understanding and Generation, https://arxiv.org/abs/2410.13848 https://github.com/deepseek-ai/Janus?tab=readme-ov-file
- Robert Corwin Nov 2024, Running Large Language Models Privately: A comparison of frameworks, models, and costs, https://towardsdatascience.com/running-large-language-models-privately-a-comparison-of-frameworks-models-and-costs-ac33cfe3a462
- Krishna Teja Chitty-Venkata, Siddhisanket Raskar, Bharat Kale, Farah Ferdaus, Aditya Tanikanti, Ken Raffenetti, Valerie Taylor, Murali Emani, Venkatram Vishwanath, 31 Oct 2024, LLM-Inference-Bench: Inference Benchmarking of Large Language Models on AI Accelerators, https://arxiv.org/abs/2411.00136
- Kristian McCann, November 13, 2024, Top 10 AI Frameworks, https://aimagazine.com/articles/top-10-ai-frameworks
- Sahar Mor, Nov 28, 2024, The Open-Source Toolkit for Building AI Agents. Curated frameworks, tools, and libraries every developer needs to build functional and efficient AI agents, https://www.aitidbits.ai/p/open-source-agents
- Devansh, Jun 1, 2025, The Costly Open-Source LLM Lie: Open Source LLMs are not Free, https://machine-learning-made-simple.medium.com/the-costly-open-source-llm-lie-f83fdc5d5701
- David Spuler, March 2024, Generative AI in C++: Coding Transformers and LLMs, https://www.aussieai.com/book/toc PDF: https://www.aussieai.com/pdf/BOOK-Generative-AI-CPP-Spuler-2024.pdf
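Several entries above, notably the PagedAttention/vLLM paper, center on paged KV-cache management: sequences are given fixed-size memory blocks on demand, as in operating-system paging, so cache memory grows with the tokens actually generated rather than a pre-reserved maximum length. A toy block allocator conveying the idea (all names hypothetical; this is not the vLLM implementation):

```python
class PagedKVCache:
    """Toy paged KV-cache allocator: each sequence is granted fixed-size
    blocks on demand, so memory tracks actual tokens, not max length."""

    def __init__(self, num_blocks, block_size):
        self.block_size = block_size
        self.free_blocks = list(range(num_blocks))
        self.block_tables = {}   # seq_id -> list of physical block ids
        self.seq_lens = {}       # seq_id -> tokens stored so far

    def append_token(self, seq_id):
        """Record one generated token, allocating a new block when needed."""
        n = self.seq_lens.get(seq_id, 0)
        if n % self.block_size == 0:  # current block full (or first token)
            if not self.free_blocks:
                raise MemoryError("KV cache exhausted")
            self.block_tables.setdefault(seq_id, []).append(self.free_blocks.pop())
        self.seq_lens[seq_id] = n + 1

    def free(self, seq_id):
        """Return a finished sequence's blocks to the free pool."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.seq_lens.pop(seq_id, None)
```

The per-sequence block table is the key trick: attention kernels index the cache through it, so a sequence's blocks need not be physically contiguous, which eliminates most of the fragmentation of naive contiguous allocation.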
AI Books from Aussie AI
- The Sweetest Lesson: Your Brain Versus AI: new book on AI intelligence theory. Get your copy from Amazon: The Sweetest Lesson
- RAG Optimization: Accurate and Efficient LLM Applications: new book on RAG architectures. Get your copy from Amazon: RAG Optimization
- Generative AI Applications book: Get your copy from Amazon: Generative AI Applications
- Generative AI programming book: Get your copy from Amazon: Generative AI in C++
- CUDA C++ Optimization book: Get your copy from Amazon: CUDA C++ Optimization
- CUDA C++ Debugging book: Get your copy from Amazon: CUDA C++ Debugging
More AI Research
Read more about: