Definity embeds agents inside Spark pipelines to catch failures before they reach AI systems

For many data engineering teams, managing pipeline reliability means waiting for an alert, manually tracing failures across distributed jobs and clusters, and fixing problems after they’ve already reached the business. Agentic AI needs data that is available, clean and timely. A pipeline that silently fails or delivers stale data doesn’t just break a dashboard – it breaks the AI system that depends on it.

That gap is what Definity, a Chicago-based data pipeline startup, is building for: agents that embed directly inside a Spark or dbt driver and run during pipeline execution, not after it. One enterprise customer identified 33% of its optimization opportunities in the first week of deployment and reduced troubleshooting and optimization effort by 70%, according to Definity. The company also claims that customers resolve Spark issues up to 10x faster.

“You need three major things for agentic data operations: real-time, full-stack context from production. Pipeline control. And the ability to validate in a feedback loop. Otherwise, you’re on the outside looking in, only reading,” Roy Daniel, CEO and founder of Definity, told VentureBeat in an exclusive interview.

The company announced on Wednesday that it has raised $12 million in Series A funding led by GreatPoint Ventures, with participation from Dynatrace and existing investors StageOne Ventures and Hyde Park Venture Partners.

Why existing pipeline monitoring falls short at scale

Existing tools approach the problem from outside the execution layer – Datadog, which acquired Metaplane for data quality monitoring last year, Databricks system tables, and platforms like Unravel Data and Acceldata all read telemetry after the job has run. Dynatrace, which has its own monitoring capabilities, also participated in Definity’s Series A.

What distinguishes Definity’s approach is where the solution sits. According to Daniel, by the time an external monitoring tool sees a problem, the pipeline has already run – and the failure, wasted compute or bad data is already downstream.

“It’s always after the fact,” Daniel said. “By the time you know something, it has already happened.”

How Definity’s agents work inside pipeline execution

The main architectural difference is where the agent lives – inside the pipeline rather than watching it from outside.

Single-line instrumentation. The Definity system embeds a JVM agent directly in the pipeline’s execution layer with a single line of code, running beneath the platform layer and pulling execution signals directly from Spark.
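Definity hasn’t published its exact mechanism, but a JVM agent is typically attached to a Spark driver through the standard `-javaagent` flag passed in spark-submit configuration. A hypothetical sketch – the helper function and jar path below are illustrative assumptions, not Definity’s actual API:

```python
# Hypothetical sketch of how a JVM agent is commonly attached to a Spark
# driver. The standard -javaagent JVM flag loads the agent before the
# application's main class runs. The helper name and jar path are made up
# for illustration; this is not Definity's actual interface.
def agent_submit_args(agent_jar):
    """Return spark-submit arguments that load a Java agent in the driver JVM."""
    return [
        "--conf",
        f"spark.driver.extraJavaOptions=-javaagent:{agent_jar}",
    ]

args = agent_submit_args("/opt/agents/pipeline-agent.jar")
print(args)
```

Because the agent rides inside the driver JVM rather than polling from outside, it sees query plans and task metrics as they are produced.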

Runtime execution context. The agent captures query execution behavior, memory pressure, data skew, shuffle patterns and infrastructure utilization as the pipeline progresses. It also derives lineage between pipelines and tables dynamically – no predefined data catalog is required.
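To illustrate the kind of signal involved: data skew can be flagged from the per-task shuffle-read sizes that Spark reports in its stage-level metrics. A minimal sketch – the 4x threshold and function name are assumptions for illustration, not Definity’s actual heuristic:

```python
from statistics import median

# Minimal sketch: flag data skew from per-task shuffle-read byte counts,
# the kind of stage-level metric Spark exposes. The 4x ratio threshold is
# an illustrative assumption, not Definity's heuristic.
def is_skewed(task_shuffle_bytes, ratio_threshold=4.0):
    """Return True if the largest task reads far more data than the median task."""
    med = median(task_shuffle_bytes)
    if med == 0:
        return max(task_shuffle_bytes) > 0  # nearly all data landed in a few tasks
    return max(task_shuffle_bytes) / med >= ratio_threshold

balanced = [100, 110, 95, 105]   # roughly even partitions
skewed = [100, 110, 95, 4_000]   # one hot partition

print(is_skewed(balanced))  # False
print(is_skewed(skewed))    # True
```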

Intervention, not just observation. The agent can change resource allocations at runtime, stopping a job before bad data spreads or reprioritizing a pipeline based on changing data conditions. Daniel described one production deployment where an agent detected that an upstream job had failed earlier and that the input table the pipeline was about to read was already stale – and stopped the pipeline before it started, before bad data reached any dependent systems.
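The staleness scenario Daniel describes can be pictured as a pre-execution gate: compare an input table’s last update time against a freshness SLA before launching the job. A minimal sketch, with the SLA window and function name assumed for illustration:

```python
from datetime import datetime, timedelta, timezone

# Minimal sketch of a pre-execution freshness gate: refuse to launch a
# pipeline whose input table has not been updated within its SLA window.
# The 6-hour SLA and function name are illustrative assumptions.
def should_run(input_last_updated, max_age=timedelta(hours=6), now=None):
    """Return True only if the input table is fresh enough to proceed."""
    now = now or datetime.now(timezone.utc)
    return now - input_last_updated <= max_age

now = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
fresh = datetime(2025, 1, 1, 9, 0, tzinfo=timezone.utc)  # 3h old -> run
stale = datetime(2025, 1, 1, 2, 0, tzinfo=timezone.utc)  # 10h old -> halt

print(should_run(fresh, now=now))  # True
print(should_run(stale, now=now))  # False
```

Running this check inside the driver, before any stage executes, is what lets a failure be prevented rather than merely reported.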

What is not real time. Detection and prevention are real time. Root-cause analysis and optimization recommendations run on demand, when a developer queries the assistant, which already has the full execution context.

Overhead and data residency. The agent adds about one second of compute to an hour-long run. Only metadata leaves the environment; a fully local deployment is available for environments where no metadata can leave the network.

What this looks like in a production environment

One of the early adopters of the Definity platform is Nexxen, an ad tech company that runs business-critical advertising workloads on on-premises Spark pipelines.

Dennis Meyer, Director of Data Engineering at Nexxen, told VentureBeat that his main problem was not pipeline failure but the cumulative cost of inefficiency in an environment without elastic cloud capacity to absorb waste.

“The biggest challenge wasn’t pipelines breaking, but managing an ever-expanding footprint,” Meyer said. “Because we operate on-prem, we don’t have the flexibility to scale out quickly, so downtime has a direct cost impact.”

Existing monitoring tools gave Nexxen some visibility, but not enough to act systematically. "We had monitoring tools in place, but we needed full visibility to fully understand workload behavior and systematically prioritize optimization," Meyer said.

Nexxen deployed Definity without any pipeline code changes. According to Meyer, the team identified 33% of its optimization opportunities within the first week, and engineering effort spent on troubleshooting and optimization dropped by 70%. The platform freed up infrastructure capacity, allowing the team to support operational growth without additional hardware investment.

"The key shift was from reactive problem solving to active, continuous improvement," Meyer said. "At scale, the biggest gap is often not the tooling — it’s visibility into what’s happening."

What this means for enterprise data teams

For data engineering teams running Spark in production, the shift from reactive monitoring to in-execution intelligence carries architectural and organizational implications worth considering.

Pipeline ops is becoming an AI infrastructure problem. Data pipelines that previously fed analytics now carry business-critical AI workloads. Failures that were once a nuisance now block the delivery of production AI.

Troubleshooting time is a recoverable cost. According to Meyer, Nexxen cut engineering effort on troubleshooting and optimization by 70% after deploying Definity. For stretched teams, that time returned to roadmap work is the clearest yardstick for evaluating this category of tooling.
