Introduction

We offer comprehensive consulting to help you build and optimize your complete AI software stack. Our work covers heterogeneous computing hardware (CPU/GPU/NPU), operator software stack development, AI compiler performance tuning, and AI inference system and model optimization. Our tailored solutions improve the efficiency of clients’ model inference and hardware execution, help them build a robust, scalable foundational AI platform, and strengthen their ability to convert technical capabilities into tangible business value.

Example #1

Optimizing the Operator Development Workflow for a Major AI Platform

Project Background

In collaboration with a major Chinese AI platform provider, we tackled a core engineering challenge: the client maintained a large and diverse set of operators written in multiple languages, which drove up the cost and complexity of its technology stack. We also overhauled the client’s operator precision and performance testing frameworks and the surrounding engineering processes, streamlining regression validation and shortening test execution times to improve overall development and delivery efficiency.

Consulting Process

Based on a systematic diagnosis and the specific characteristics of the client's AI platform software, we made improvements along five dimensions:

1. Optimized the operator development technology stack, giving developers a more consistent and coherent experience while balancing rapid delivery against peak performance.
2. Optimized the C++-based operator development framework, reducing the complexity of developing sophisticated operators and increasing code reuse and overall development efficiency.
3. Unified the cross-language operator test mechanism, resolving fragmented test case creation and enabling end-to-end validation of individual operators.
4. Enabled effective operator fusion testing by designing and implementing a domain-specific language (DSL) based on graph construction and validation, which simplifies the creation of complex graph fusion test cases and makes them significantly more effective.
5. Accelerated CI and test execution for operators by improving the concurrent execution of operator tests and shortening performance test runtimes, which significantly improved the overall efficiency of the continuous integration (CI) pipeline.
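To make the idea of a graph-construction DSL for fusion tests more concrete, here is a minimal Python sketch. It is an illustration only, not the client's DSL: the `Graph` class, the `OPS` table, the `fuse_mul_add` pass, and the `check_fusion` helper are all hypothetical names chosen for this example. A test builds a small graph in a few lines, applies a toy fusion pass, and checks both the resulting structure and numerical equivalence with the unfused graph.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical operator table: op name -> reference implementation.
OPS: Dict[str, Callable[..., float]] = {
    "mul": lambda a, b: a * b,
    "add": lambda a, b: a + b,
    "fma": lambda a, b, c: a * b + c,  # fused multiply-add
}

Node = Tuple[str, str, Tuple[str, ...]]  # (output name, op name, argument names)


@dataclass
class Graph:
    """Tiny graph-construction DSL: declare inputs, then chain ops."""
    inputs: List[str] = field(default_factory=list)
    nodes: List[Node] = field(default_factory=list)

    def input(self, name: str) -> str:
        self.inputs.append(name)
        return name

    def op(self, op: str, *args: str) -> str:
        out = f"t{len(self.nodes)}"
        self.nodes.append((out, op, args))
        return out

    def run(self, **feeds: float) -> Dict[str, float]:
        # Nodes are stored in construction order, which is a valid topological order.
        env = dict(feeds)
        for out, op, args in self.nodes:
            env[out] = OPS[op](*(env[a] for a in args))
        return env


def fuse_mul_add(g: Graph) -> Graph:
    """Toy fusion pass: rewrite add(mul(a, b), c) into fma(a, b, c)."""
    producers = {out: (op, args) for out, op, args in g.nodes}
    uses: Dict[str, int] = {}
    for _, _, args in g.nodes:
        for a in args:
            uses[a] = uses.get(a, 0) + 1

    rewritten: Dict[str, Tuple[str, ...]] = {}
    dropped = set()
    for out, op, args in g.nodes:
        if op == "add" and args[0] in producers:
            p_op, p_args = producers[args[0]]
            # Only fuse when the mul result has no other consumer.
            if p_op == "mul" and uses[args[0]] == 1:
                rewritten[out] = (*p_args, args[1])
                dropped.add(args[0])

    fused = Graph(inputs=list(g.inputs))
    for out, op, args in g.nodes:
        if out in dropped:
            continue
        fused.nodes.append((out, "fma", rewritten[out]) if out in rewritten else (out, op, args))
    return fused


def check_fusion(g: Graph, out: str, expected_ops: List[str], **feeds: float) -> Graph:
    """Validate structure (which ops remain) and numerics (same result as unfused)."""
    fused = fuse_mul_add(g)
    assert [op for _, op, _ in fused.nodes] == expected_ops
    assert fused.run(**feeds)[out] == g.run(**feeds)[out]
    return fused


# A complete fusion test case in four lines.
g = Graph()
x, w, b = g.input("x"), g.input("w"), g.input("b")
y = g.op("add", g.op("mul", x, w), b)
fused = check_fusion(g, y, ["fma"], x=2.0, w=3.0, b=1.0)
```

Under these assumptions, the test author describes only the graph and the expected outcome, while the shared helper handles running, comparing, and validating. That separation is one plausible way a DSL can keep individual test cases short.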

Collaborative Achievements

The optimized operator development process has improved both the end-to-end developer experience and delivery efficiency. The refactored C++ framework allows more flexible code composition, substantially increasing code reuse. The streamlined automated testing process for operators has improved both testing efficiency and feedback speed, with the average test case shrinking from 200 lines of code to just 50.

Example #2

Optimizing the AI Compiler and Inference Stack for a Custom NPU

Project Background

We partnered with a major NPU company to optimize the architecture, performance, and engineering efficiency of their in-house AI compiler and inference engine.

Consulting Process

In partnership with the client’s experts, we performed an in-depth performance analysis of their AI compiler and inference engine, taking into account the specific characteristics of their in-house NPU. We then redesigned and modeled the core architecture so that the compiler and inference engine remain independent while still working together to improve overall performance. Pairing with the client’s development team, we refactored, developed, and tested the core architecture and code following high-performance coding practices. The engagement spanned nine months and culminated in a smooth, successful launch of the new version.

Collaborative Achievements

The new version of the AI compiler is more flexible and extensible, meeting the platform's long-term evolution needs. In addition, the refactored AI inference engine delivered an overall performance increase of more than threefold, far exceeding the project's original performance targets and achieving the shared goal of strengthening the client's product competitiveness.

Research Areas

AI Paradigm Innovation and Research