OpenJDK Babylon Series: #1 From High-Level Tensor Operations to MLIR TOSA

Wednesday, Jan 14, 2026 | 5 minute read | Updated at Wednesday, Jan 14, 2026

@

mlir

Project Babylon extends Java beyond the JVM, enabling compiler-style transformation of Java code into alternative execution models such as machine learning runtimes and hardware accelerators through its Code Reflection infrastructure.

Babylon can export Java programs to ONNX , but if the real goal is efficient code generation for CPUs, GPUs, NPUs, and custom accelerators, ONNX is not always the best target. This post argues for MLIR TOSA as a more suitable compiler IR—one designed to lower cleanly across architectures and backend targets.


Introducing LLVM MLIR

To understand why TOSA is a strong target, we first need LLVM MLIR. MLIR (Multi-Level Intermediate Representation) is a compiler framework designed for domain-specific languages and accelerator-oriented compilation. It generalizes LLVM IR by supporting custom operations, types, and abstractions.

Programs in MLIR are represented in SSA form using operations organized into blocks and regions, enabling structured and hierarchical IRs. This makes it possible to model everything from high-level tensor programs to low-level hardware constructs.

MLIR’s defining feature is its dialect system: developers can introduce new operations and types as self-contained dialects, loaded without recompiling the compiler. This makes MLIR especially well-suited for rapidly evolving domains like machine learning and hardware accelerators.


What Is TOSA?

TOSA (Tensor Operator Set Architecture) is a first-class MLIR dialect that defines a compact, stable set of tensor operations for machine learning compilers and accelerator backends.

It is built around a few core ideas: a minimal but expressive operator set, precisely defined semantics, no framework- or training-specific behavior, and mechanical lowering to lower-level IRs.

TOSA covers common tensor workloads—element-wise ops, reductions, data movement, and neural network primitives such as conv2d and matmul. Because every operation has fully specified semantics, lowering to lower-level MLIR dialects or hardware-specific code is predictable and unambiguous.


TOSA vs. ONNX: A Compiler-Centric Comparison

One practical but often overlooked difference is where these IRs live:

TOSA is a first-class dialect inside LLVM MLIR. ONNX is not.

This distinction has deep consequences for compiler architecture, maintenance, and long-term stability.

AspectTOSAONNX
Primary goalCompiler IRModel interchange
MLIR integrationFirst-class MLIR dialectExternal format, requires import/conversion
Operator set~80 focused ops180+ ops, overlapping
SemanticsFully specifiedOften framework-defined
VersioningStable specFrequent opset changes
Lowering pathDirect to LinalgOften via TOSA anyway

The key insight is this:

When you target TOSA, you are already speaking MLIR. There is no foreign IR to import, normalize, or continuously chase as specifications evolve.

Many MLIR-based compilers lower ONNX into TOSA before doing real work.

If your frontend already exposes structured tensor semantics—whether from Java, DSLs, or domain-specific IRs—targeting TOSA directly removes an entire translation layer and keeps the compiler stack inside MLIR from day one.


Generating TOSA Programmatically

TOSA is not just a textual format—it is a first-class MLIR dialect with a full C++ API. This makes it ideal as a compilation target from custom frontends.

Below is a minimal example that generates a vector addition function in TOSA using MLIR’s C++ builder API:

#include "mlir/IR/Builders.h"
#include "mlir/IR/BuiltinOps.h"
#include "mlir/IR/BuiltinTypes.h"
#include "mlir/IR/MLIRContext.h"
#include "mlir/IR/Verifier.h"
#include "mlir/Dialect/Func/IR/FuncOps.h"
#include "mlir/Dialect/Tosa/IR/TosaOps.h"

using namespace mlir;

int main() {
    // Initialize context and load dialects
    MLIRContext context;
    context.getOrLoadDialect<func::FuncDialect>();
    context.getOrLoadDialect<tosa::TosaDialect>();

    // Create module and builder
    auto loc = UnknownLoc::get(&context);
    auto module = ModuleOp::create(loc);
    OpBuilder builder(&context);
    builder.setInsertionPointToEnd(module.getBody());

    // Define tensor type: tensor<?xf32>
    auto tensorType = RankedTensorType::get({ShapedType::kDynamic}, builder.getF32Type());

    // Create function: (tensor<?xf32>, tensor<?xf32>) -> tensor<?xf32>
    auto funcType = builder.getFunctionType({tensorType, tensorType}, {tensorType});
    auto funcOp = builder.create<func::FuncOp>(loc, "vector_add", funcType);
    funcOp.setSymVisibility("public");

    // Build function body
    Block *entry = funcOp.addEntryBlock();
    builder.setInsertionPointToStart(entry);
    auto addOp = builder.create<tosa::AddOp>(loc, tensorType,
                                              entry->getArgument(0),
                                              entry->getArgument(1));
    builder.create<func::ReturnOp>(loc, addOp.getResult());

    // Verify and print
    if (failed(verify(module)))
        return 1;
    module.dump();
    return 0;
}

Which produces:

module {
  func.func @vector_add(%arg0: tensor<?xf32>, %arg1: tensor<?xf32>) -> tensor<?xf32> {
    %0 = tosa.add %arg0, %arg1 : (tensor<?xf32>, tensor<?xf32>) -> tensor<?xf32>
    return %0 : tensor<?xf32>
  }
}

The Lowering Pipeline: TOSA to Native Code

Once we have TOSA IR, we need to lower it through several abstraction levels to reach native machine code. MLIR provides a well-defined path:

TOSA (tensor operations)
Linalg (structured linear algebra)
Bufferization (tensors → memrefs)
SCF (structured control flow / loops)
LLVM Dialect
LLVM IR → Native Code

Each stage reduces abstraction while preserving semantics:

StageTransformation
TOSA → LinalgHigh-level tensor ops become explicit element-wise computations
BufferizationImmutable tensors become mutable memory buffers (memref)
Linalg → SCFStructured ops become scf.for loops
SCF → CFStructured loops become branches and jumps
→ LLVMEverything converts to LLVM dialect primitives

The entire transformation can be executed with a single mlir-opt invocation using --pass-pipeline:

mlir-opt input.mlir --pass-pipeline='builtin.module(
    func.func(tosa-make-broadcastable, tosa-infer-shapes, tosa-to-linalg, tosa-to-arith),
    one-shot-bufferize{bufferize-function-boundaries},
    buffer-deallocation-pipeline,
    convert-linalg-to-loops,
    lower-affine,
    convert-scf-to-cf,
    convert-arith-to-llvm,
    convert-math-to-llvm,
    convert-index-to-llvm,
    finalize-memref-to-llvm,
    convert-func-to-llvm,
    reconcile-unrealized-casts,
    canonicalize
)' -o output_llvm.mlir

From there, standard LLVM tools take over:

mlir-translate output_llvm.mlir --mlir-to-llvmir -o output.ll
llc output.ll -march=x86-64 -filetype=obj -relocation-model=pic -o output.o
gcc -shared output.o -o libvector_add.so

The result is a native shared library that can be called directly from C.

The Memref Calling Convention

MLIR uses a memref descriptor ABI for tensor arguments. Each 1D tensor becomes 5 parameters:

FieldPurpose
allocatedBase pointer (for deallocation)
alignedAligned data pointer (for access)
offsetOffset into the buffer
sizeNumber of elements
strideStride between elements

The raw function signature looks like:

typedef struct {
    float *allocated;
    float *aligned;
    int64_t offset;
    int64_t sizes[1];
    int64_t strides[1];
} MemRef1D;

extern MemRef1D vector_add(
    float *a_alloc, float *a_aligned, int64_t a_offset, int64_t a_size, int64_t a_stride,
    float *b_alloc, float *b_aligned, int64_t b_offset, int64_t b_size, int64_t b_stride
);

For simple contiguous arrays, a wrapper function can hide this complexity:

void vector_add_simple(float *a, float *b, float *out, int64_t size) {
    MemRef1D result = vector_add(a, a, 0, size, 1, b, b, 0, size, 1);
    for (int64_t i = 0; i < size; i++) {
        out[i] = result.aligned[i];
    }
    free(result.allocated);
}

// Clean API for users
vector_add_simple(a, b, result, size);

This calling convention may seem verbose, but it enables MLIR to support strided tensors, views, and slices without copying data—essential for efficient tensor operations.


© 2026 Bruno Ah Um

🌱 Powered by Hugo with theme Dream.

About Me

I’m Bruno Caballero, a Lead Software Engineer based in Berlin, specializing in JVM engineering and low-level systems. I am interested in quantum computing through the lens of compiler design.

Social Links