Data pipelines reference

Configuration fields, execution behavior, and limits for data pipelines. For an overview of how pipelines work, see Data pipelines overview.

Pipeline configuration fields

| Field | Type | Required | Description |
| ----- | ---- | -------- | ----------- |
| `name` | string | Yes | Pipeline name. Must be unique within the organization. |
| `organization_id` | string | Yes | Organization UUID. |
| `schedule` | string | Yes | Cron expression in UTC. Determines both when the pipeline runs and the query time window. See Cron schedule. |
| `mql_binary` | array | Yes | MQL aggregation pipeline as an array of stage objects. See Supported MQL operators. |
| `enable_backfill` | bool | Yes | Whether to process historical time windows. See Backfill behavior. |
| `data_source_type` | enum | No | Data source to query. Default: `standard`. See Data source types. |
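Putting the fields together, a configuration might look like the following sketch. All values are illustrative (the organization ID and MQL stages are placeholders, not real data):

```python
# Illustrative pipeline configuration using the fields above.
# The organization_id, stage contents, and field names are placeholders.
pipeline_config = {
    "name": "hourly-temp-summary",
    "organization_id": "00000000-0000-0000-0000-000000000000",
    "schedule": "0 * * * *",  # hourly, interpreted in UTC
    "mql_binary": [
        {"$match": {"component_name": "temp-sensor"}},
        {"$group": {"_id": "$location", "avg_temp": {"$avg": "$data.readings.temp"}}},
    ],
    "enable_backfill": True,
    "data_source_type": "standard",  # the default when omitted
}
```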

Cron schedule

The schedule field uses standard five-field cron syntax: minute hour day-of-month month day-of-week. All times are UTC.

The schedule determines both when the pipeline runs and the time range it queries. Each run processes the time window between the previous two schedule ticks.

| Schedule | Frequency | Query time range per run |
| -------- | --------- | ------------------------ |
| `0 * * * *` | Hourly | Previous hour |
| `0 0 * * *` | Daily | Previous day |
| `*/15 * * * *` | Every 15 minutes | Previous 15 minutes |
| `*/5 * * * *` | Every 5 minutes | Previous 5 minutes |

For example, a pipeline with schedule `0 * * * *` that triggers at 15:00 UTC processes data from 14:00 to 15:00 UTC. The time window is [start, end): the start is inclusive and the end is exclusive, so consecutive windows never overlap.
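The [start, end) window for a given tick can be derived from the schedule interval. A minimal sketch for an hourly schedule, using only the standard library (a real implementation would parse the cron expression rather than hardcode the interval):

```python
from datetime import datetime, timedelta, timezone

def hourly_window(trigger: datetime) -> tuple[datetime, datetime]:
    """Return the [start, end) query window for an hourly (0 * * * *) tick.

    The window spans the previous two schedule ticks: the end is the
    trigger time truncated to the hour, the start is one hour earlier.
    """
    end = trigger.replace(minute=0, second=0, microsecond=0)
    return end - timedelta(hours=1), end

start, end = hourly_window(datetime(2025, 3, 15, 15, 0, tzinfo=timezone.utc))
# A run triggered at 15:00 UTC queries [14:00, 15:00) UTC.
```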

Choose a schedule that matches how frequently you need updated summaries. Shorter intervals produce more granular summaries but create more pipeline sink documents.

Data source types

| Type | CLI flag | Python SDK | Go SDK | Description |
| ---- | -------- | ---------- | ------ | ----------- |
| Standard | `standard` | `TabularDataSourceType.TABULAR_DATA_SOURCE_TYPE_STANDARD` | `app.TabularDataSourceTypeStandard` | Queries the raw readings collection. Contains all historical tabular data. |
| Hot storage | `hotstorage` | `TabularDataSourceType.TABULAR_DATA_SOURCE_TYPE_HOT_STORAGE` | `app.TabularDataSourceTypeHotStorage` | Queries the hot data store. Rolling window of recent data. |
| Pipeline sink | (query only) | `TabularDataSourceType.TABULAR_DATA_SOURCE_TYPE_PIPELINE_SINK` | `app.TabularDataSourceTypePipelineSink` | Queries the output of another pipeline. Requires a `pipeline_id`. |

Run statuses

| Status | Value | Description |
| ------ | ----- | ----------- |
| `UNSPECIFIED` | 0 | Unknown or not set. |
| `SCHEDULED` | 1 | Run is queued. Execution begins after a 2-minute delay. |
| `STARTED` | 2 | MQL query is executing against the data source. |
| `COMPLETED` | 3 | Run finished successfully. Results are in the pipeline sink. |
| `FAILED` | 4 | Run encountered an error. Check the `error_message` field on the run. |

If a run stays in STARTED for more than 10 minutes, it is automatically marked as FAILED and a new run is created for that time window.
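The hung-run rule can be expressed as a small predicate. A sketch (the 10-minute threshold comes from the Execution limits table below; the function name is illustrative, not part of any SDK):

```python
from datetime import datetime, timedelta, timezone

HUNG_THRESHOLD = timedelta(minutes=10)  # 2x the 5-minute MQL execution timeout

def is_hung(status: str, start_time: datetime, now: datetime) -> bool:
    """Return True if a run has been in STARTED longer than the threshold.

    Such runs are automatically marked FAILED, and a new run is
    created for the same time window.
    """
    return status == "STARTED" and now - start_time > HUNG_THRESHOLD

now = datetime(2025, 3, 15, 15, 12, tzinfo=timezone.utc)
started = datetime(2025, 3, 15, 15, 0, tzinfo=timezone.utc)
# is_hung("STARTED", started, now) is True: 12 minutes in STARTED.
# is_hung("COMPLETED", started, now) is False: only STARTED runs can hang.
```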

Run fields

Each pipeline run record contains:

| Field | Type | Description |
| ----- | ---- | ----------- |
| `id` | string | Run identifier. |
| `status` | enum | Current status. See Run statuses. |
| `start_time` | timestamp | When the run started executing. |
| `end_time` | timestamp | When the run completed or failed. |
| `data_start_time` | timestamp | Start of the data time window this run processed (inclusive). |
| `data_end_time` | timestamp | End of the data time window this run processed (exclusive). |
| `error_message` | string | Error details if the run failed. Empty on success. |

Backfill behavior

When enable_backfill is true:

  • On pipeline creation, Viam processes historical time windows backward from the creation time to the earliest available data.
  • When data syncs with a delay (machine was offline), the pipeline automatically reruns affected time windows to include the late-arriving data.
  • Backfill processes in batches of up to 10 concurrent time windows with a 2-minute delay between batches.
  • For standard data source, backfill may provision an Atlas Data Federation instance for faster historical queries.
  • Backfill results replace any existing results for the same time window.

When enable_backfill is false:

  • Each time window is processed exactly once.
  • Late-arriving data is not incorporated into past summaries.

Backfill does not apply to windows missed while a pipeline was disabled. If you disable a pipeline for 3 hours and re-enable it, those 3 hours are not backfilled.
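For intuition, the backward batching described above can be sketched as follows. This enumerates [start, end) windows from the creation time back to the earliest available data, in batches of 10. It assumes an hourly schedule and omits the 2-minute throttle between batches; the function is illustrative, not the actual service logic:

```python
from datetime import datetime, timedelta, timezone

def backfill_batches(created: datetime, earliest: datetime, batch_size: int = 10):
    """Yield batches of [start, end) hourly windows, newest first.

    Walks backward from the pipeline creation time and stops once a
    window would begin before the earliest available data.
    """
    end = created.replace(minute=0, second=0, microsecond=0)
    windows = []
    while end - timedelta(hours=1) >= earliest:
        windows.append((end - timedelta(hours=1), end))
        end -= timedelta(hours=1)
    for i in range(0, len(windows), batch_size):
        yield windows[i : i + batch_size]

created = datetime(2025, 3, 15, 15, 30, tzinfo=timezone.utc)
earliest = created - timedelta(hours=25)
batches = list(backfill_batches(created, earliest))
# 24 complete hourly windows fit, split into batches of 10, 10, and 4.
```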

Pipeline sink

Each pipeline stores its output in a dedicated sink collection named sink-<pipeline-id>. Each result document includes metadata:

```json
{
  "_viam_pipeline_run": {
    "id": "run-id",
    "interval": {
      "start": "2025-03-15T14:00:00.000Z",
      "end": "2025-03-15T15:00:00.000Z"
    },
    "organization_id": "org-id"
  },
  "location": "warehouse-a",
  "avg_temp": 23.5,
  "count": 3600
}
```

The _viam_pipeline_run field is added automatically. Your pipeline’s $project output fields appear alongside it.
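When post-processing sink documents outside the platform, the run metadata can be separated from your pipeline's own output fields. A minimal sketch using the example document above (plain Python on an already-fetched document, not an SDK call):

```python
from datetime import datetime

# A sink document as shown above: run metadata plus $project output fields.
doc = {
    "_viam_pipeline_run": {
        "id": "run-id",
        "interval": {
            "start": "2025-03-15T14:00:00.000Z",
            "end": "2025-03-15T15:00:00.000Z",
        },
        "organization_id": "org-id",
    },
    "location": "warehouse-a",
    "avg_temp": 23.5,
    "count": 3600,
}

# Split the automatically added metadata from the pipeline's output fields.
run_meta = doc["_viam_pipeline_run"]
results = {k: v for k, v in doc.items() if k != "_viam_pipeline_run"}

# Parse the inclusive window start ("Z" -> "+00:00" for fromisoformat).
window_start = datetime.fromisoformat(run_meta["interval"]["start"].replace("Z", "+00:00"))
```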

To query the sink, use data source type pipeline_sink with the pipeline’s ID. See Query pipeline results.

Deleting a pipeline deletes the sink collection and all its data. Export results before deleting if you need to preserve them.

Execution limits

| Limit | Value |
| ----- | ----- |
| Maximum output documents per run | 10,000 |
| MQL execution timeout | 5 minutes |
| Execution start delay | 2 minutes after scheduled time |
| Hung run detection | 2x execution timeout (currently 10 minutes) in `STARTED` state |
| Backfill batch size | 10 concurrent time windows |
| Backfill throttle | 2-minute delay between batches |

Permissions

Only organization owners can create, modify, and delete data pipelines. Query access to pipeline results follows the same permissions as other data queries. See Permissions.