Please submit your CV in English and indicate your level of English proficiency.
Mindrift connects specialists with project-based AI opportunities for leading tech companies, focused on testing, evaluating, and improving AI systems. Participation is project-based, not permanent employment.
What this opportunity involves
We're building a dataset to evaluate AI coding agents - how well a model handles real-world developer tasks.
You'll create challenging tasks and evaluation criteria within realistic simulated environments:
- Build realistic developer environments - a virtual company with codebase, infrastructure, and context (tickets, docs, conversations) that forms a believable development history
- Design tasks from intermediate states of these environments - craft the prompt, define what "solved" means, and ensure the task is solvable by an AI agent
- Write tests that verify agent solutions - accept all valid approaches and reject incorrect ones, neither too strict nor too lenient
- Iterate on tasks and tests based on QA feedback - review agent solutions, analyze failures, and refine until the evaluation is fair and robust
What this is NOT
- Not data labeling
- Not prompt engineering
- Not writing code from scratch - the agent writes most of the code; you guide and evaluate
What we look for
- 5+ years in software development
- Core stack: Python (FastAPI), JavaScript/TypeScript (React), Docker, Postgres, Kafka, Redis
- Experience writing tests (functional, integration)
- English proficiency - B2+
Why this is hard
Frontier models are already good at coding, so creating a task that genuinely challenges the best of them is non-trivial. You need to understand where models fail and which scenarios reveal the difference between a good and a bad solution. Tasks have many valid solutions - writing tests that accept all correct solutions and reject incorrect ones is harder than it sounds.
How it works
Apply → Pass qualification(s) → Join a project → Complete tasks → Get paid
Effort estimate
Each task for this project is estimated to take about 20 hours to complete, depending on complexity. This is an estimate, not a schedule requirement; you choose when and how to work. To be accepted, tasks must be submitted by the deadline and meet the listed acceptance criteria.
Compensation
Up to $50/hr equivalent, depending on level and pace. Tasks are estimated at ~20 hours each; you set your own schedule.

