Inference System & Performance Engineer - Member of Technical Staff

£44.50 - £52.10 per hour (estimated)
Full-time

About Us

The last era of AI scaled on a single bet: bigger models, more identical chips, more data. As problems grow more complex and the requirements of intelligence more diverse, that bet is breaking down. Real-world problems are heterogeneous: no single model or chip can solve them alone. The next era of AI requires heterogeneity at the infrastructure level, with diverse models on diverse chips, each with distinct strengths, co-evolving into systems of capability that move the Pareto frontier of what is possible. That's what we are building.

Callosum is the Intelligent Systems Company. We started by questioning what actually creates intelligence. We believe there is no single answer, but rather a system-level solution. We co-evolve models, workflows, and silicon together to show that intelligence does not come from any single component; it emerges from diverse, co-optimised mechanisms that work together and are aware of each other. Heterogeneity will define the next era of compute, and it is a principle that holds in biological, neuronal, and economic systems alike.

In early 2026, we launched with results showing orders-of-magnitude improvements in performance, and this is only the beginning. Agentic AI is the future of how intelligence is deployed: multi-step, long-horizon, and operating in changing environments. These systems are inherently heterogeneous and can only be as powerful as the infrastructure that runs them.

We are engineers and scientists based in London, working together across the full depth of the stack. We are curious, intellectually honest, and building what doesn't exist yet. If you thrive on uncharted territory and are energised by the scale of the challenge, we'd love to hear from you.

 

About the Role

Standard inference architectures typically target a single chip type and model class. Callosum intentionally breaks this mould, operating heterogeneous hardware at scale across a diverse model portfolio. Success in this environment requires an inference layer built entirely from first principles.

Sitting at the heart of our technical mission, this role owns end-to-end performance for our inference platforms. Your focus will span KV cache strategies, batching internals, memory management, and multi-node scheduling. You will develop the core software that drives execution speed, silicon efficiency, and platform scalability as we expand our hardware and model footprint. This is a high-leverage role tackling complex systems challenges across the entire stack.

What You’ll Build

  • Design and optimise inference serving systems across heterogeneous multi-GPU and multi-node environments

  • Own KV cache lifecycle management, batching strategies, and memory allocation to maximise throughput and minimise latency

  • Profile and tune GPU kernels, identify bottlenecks across compute, memory, and network, and implement targeted optimisations

  • Build and improve scheduling logic for continuous batching, disaggregated prefill/decode, and speculative decoding

  • Work with communication libraries and interconnects (NCCL, NVLink, RDMA, InfiniBand, RoCE) to optimise communication across distributed inference workloads

  • Develop tooling for performance visibility, regression detection, and benchmarking across hardware configurations

What You Bring

  • Deep understanding of LLM inference internals: KV cache lifecycle, memory management, attention mechanisms, and serving architectures

  • Strong systems engineering background with proven experience optimising distributed GPU workloads

  • Proficiency in C++, CUDA, Python, Rust, or similar languages, and the instinct to go low-level when it matters

  • Hands-on debugging skills across GPU, networking, and Linux systems, with the ability to work from first principles when tooling is limited

What Sets You Apart

  • Experience building or significantly optimising production-grade, high-throughput model serving stacks

  • Multi-GPU and multi-node inference optimisation using NCCL, NVLink, RDMA, InfiniBand, or RoCE

  • GPU memory profiling and CUDA or Triton kernel optimisation

  • Linux performance analysis and optimisation

What We Offer

  • Competitive salary, determined by skills and experience

  • Equity and ownership

  • Private healthcare

  • Visa sponsorship and relocation support, so we can hire the best in the world

  • In-person work at our London office. You'll have the tools, space, and setup to do your best work, and if you have specific needs, just tell us.

We're committed to building an inclusive workplace where everyone feels welcome, and we believe in equal opportunities for all.

Vacancy posted 13 hours ago