Freelance Agent Evaluation Engineer

$50 per hour
Part-time

Please submit your CV in English and indicate your level of English proficiency.

Mindrift connects specialists with project-based AI opportunities for leading tech companies, focused on testing, evaluating, and improving AI systems. Participation is project-based, not permanent employment.

What this opportunity involves  

We're building a dataset to evaluate AI coding agents - how well a model handles real-world developer tasks.

You'll create challenging tasks and evaluation criteria within realistic simulated environments:

  • Build realistic developer environments - a virtual company with codebase, infrastructure, and context (tickets, docs, conversations) that forms a believable development history
  • Design tasks from intermediate states of these environments - craft the prompt, define what "solved" means, and ensure the task is solvable by an AI agent
  • Write tests that verify agent solutions - accept all valid approaches and reject incorrect ones, neither too strict nor too lenient
  • Iterate on tasks and tests based on QA feedback - review agent solutions, analyze failures, and refine until the evaluation is fair and robust

What this is NOT

  • Not data labeling
  • Not prompt engineering
  • Not writing code from scratch - the agent writes most of the code; you guide and evaluate

What we look for

  • 5+ years in software development
  • Core stack: Python (FastAPI), JavaScript/TypeScript (React), Docker, Postgres, Kafka, Redis
  • Experience writing tests (functional, integration)
  • English proficiency - B2+

Why this is hard 

Frontier models are already good at coding. Creating a task that genuinely challenges the best models is non-trivial. You need to deeply understand where models fail and what scenarios reveal the difference between a good and a bad solution. Tasks have many valid solutions - writing tests that accept all correct solutions and reject incorrect ones is harder than it sounds.
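
As a rough illustration of that last point, here is a minimal sketch in Python. The task (remove duplicates while keeping first-seen order), the function names, and the test cases are all hypothetical and are not taken from the project. The check looks only at observable behavior, so two differently written correct solutions pass while a plausible but incorrect one is rejected.

```python
from typing import Callable, Iterable, List


def dedupe_loop(items: Iterable) -> List:
    """Valid solution A (hypothetical): explicit loop with a 'seen' set."""
    seen = set()
    out = []
    for x in items:
        if x not in seen:
            seen.add(x)
            out.append(x)
    return out


def dedupe_dict(items: Iterable) -> List:
    """Valid solution B (hypothetical): relies on dict insertion order."""
    return list(dict.fromkeys(items))


def dedupe_wrong(items: Iterable) -> List:
    """Incorrect solution: removes duplicates but loses the original order."""
    return sorted(set(items))


def passes_behavioral_check(fn: Callable[[list], list]) -> bool:
    """Check input/output behavior only, never how the solution is written."""
    cases = [
        ([3, 1, 3, 2, 1], [3, 1, 2]),
        ([], []),
        (["b", "a", "b"], ["b", "a"]),
    ]
    return all(list(fn(list(inp))) == expected for inp, expected in cases)


if __name__ == "__main__":
    for candidate in (dedupe_loop, dedupe_dict, dedupe_wrong):
        print(candidate.__name__, passes_behavioral_check(candidate))
```

A test that instead required a specific helper name or a particular data structure would be too strict, because it would reject one of the two valid solutions. A test that only compared the set of returned values would be too lenient, because it would accept the order-losing one.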

How it works

Apply → Pass qualification(s) → Join a project → Complete tasks → Get paid

Effort estimate

Each task for this project is estimated to take about 20 hours to complete, depending on complexity. This is an estimate, not a schedule requirement; you choose when and how to work. To be accepted, tasks must be submitted by the deadline and meet the listed acceptance criteria.

Compensation

Up to $50/hr equivalent, depending on level and pace. Tasks are estimated at ~20 hours each; you set your own schedule.

Vacancy posted 9 days ago