Hiop in 30 Minutes: From Install to Live Data Pipeline

Build a production data pipeline in 30 minutes, no infrastructure required:

  • Define your data logic in YAML
  • Test locally with mock data
  • Deploy with one command

Hiop handles execution, orchestration, and monitoring automatically.

Declarative Data Pipelines

Most data engineers spend more time managing infrastructure than writing data logic. You deploy Airflow, configure dbt servers, manage orchestrators, and monitor runtime, all before moving a single record.

This is backwards.

You shouldn't need to build infrastructure to move data. You should define what data moves where and how it transforms. The execution layer should adapt automatically.

That's what Hiop does.

Traditional platforms: Deploy infrastructure → Write code that runs on it → Configure orchestration → Manage runtime

Hiop: Define logic in YAML → Commit to Git → Deploy (backpack sync && backpack up) → Serverless execution handles the rest

This walkthrough demonstrates the difference. In 30 minutes, you'll build a production data pipeline without deploying any infrastructure. You'll define semantic models, transformations, and data routes in YAML. Hiop compiles these into execution plans and allocates runtime on demand.

If you've built data pipelines before, this will feel different. That's the point.

What You'll Build

A pipeline that:

  1. Defines a semantic model for customer data (YAML)
  2. Declares data route: PostgreSQL → transform with SQL → S3
  3. Tests locally with mock data
  4. Deploys serverlessly with one command
  5. Runs on schedule automatically

You write logic. Hiop handles execution.

Time: ~30 minutes. Prerequisites: Python 3.8+, database credentials.
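
Here is the full CLI loop at a glance; every command below is introduced step by step in the walkthrough:

backpack init --organization your-org --workspace analytics-dev   # link local folder to a workspace (Step 3)
backpack add resource          # define the semantic model (Step 5)
backpack add itinerary         # define the data route and transformation (Step 6)
backpack test                  # run the logic locally against mock data (Step 7)
backpack sync && backpack up   # deploy to serverless execution (Step 8)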


Step 1: Install Backpack CLI (2 minutes)

Download the wheel from Web Console > Downloads.

Execute:

pip install pybackpack-<version>.whl

Verify:

backpack --version


Step 2: Configure Credentials (1 minute)

Execute:

backpack setup --username your-email@company.com --password your-password

Your credentials are encrypted locally. For multiple environments (dev/prod), create profiles:

backpack setup --profile prod --username prod@company.com --password prodpass
backpack setup --profile dev --username dev@company.com --password devpass


Step 3: Initialize Workspace (2 minutes)

Create a new workspace (this walkthrough uses analytics-dev) from Web Console > Workspaces.

Then, back in the terminal, execute:

mkdir my-data-project
cd my-data-project
backpack init --organization your-org --workspace analytics-dev

This creates .jump/config.yml linking your local folder to the Hiop workspace.
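
The exact contents of this file aren't shown in this walkthrough; as a hypothetical sketch based on the init arguments (the actual keys may differ), it records something like:

# .jump/config.yml (illustrative only)
organization: your-org
workspace: analytics-dev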


Step 4: Add Data Locations (5 minutes)

Before creating pipelines, register where your data lives.

Via the Web Console (recommended for your first location):

  1. Go to the Web Console
  2. Navigate to Connect > Zones > Locations
  3. Click "Add Location"
  4. Select location type (PostgreSQL, S3, etc.)
  5. Enter connection details
  6. Save
  7. Assign the location to your workspace (once saved, the console automatically asks which workspace to assign the location to)

For this walkthrough, add:

  • Location 1: PostgreSQL database (source)
  • Location 2: S3 bucket (destination)

Step 5: Define Your First Resource (5 minutes)

Resources are semantic entities - your data ontology.

backpack add resource

Interactive prompts:

  • Resource name: customers
  • Description: Customer records from the production database
  • Schema: Define columns (name, type, constraints)

Generated YAML structure:

description: Customer records

# Example schema in generated YAML:
schema:
  - name: id
    type: integer
  - name: email
    type: string
  - name: created_at
    type: timestamp
  - name: lifetime_value
    type: decimal

# Variants define how to read/write this resource at specific locations:
postgres_read:
  location: prod_postgres
  sql:
    query: SELECT * FROM public.customers

s3_write:
  location: analytics_s3
  parquet:
    name: customers.parquet
    path_prefix: customers/
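
This walkthrough doesn't cover validation syntax, but validation rules live in the schema (see "Collapsed stack" below). Purely as an illustration, a column entry might hypothetically carry constraints like:

# Hypothetical constraint syntax, for illustration only; the real keys may differ
schema:
  - name: email
    type: string
    constraints:
      - not_null
      - unique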

Step 6: Create an Itinerary (7 minutes)

Itineraries define data routes with optional transformations.

backpack add itinerary

Interactive prompts:

  • Itinerary name: customers_analytics
  • Source resource: customers (postgres_read variant)
  • Destination resource: customers (s3_write variant)
  • Transformations: Yes

Generated YAML structure:

description: Daily customer data sync with enrichment

inputs:
  customers:
    postgres_read: {}

compute:
  steps:
    - enrich:
        sql:
          query: |
            SELECT *,
              CASE
                WHEN lifetime_value > 10000 THEN 'vip'
                WHEN lifetime_value > 1000 THEN 'standard'
                ELSE 'basic'
              END as customer_tier
            FROM input

outputs:
  customers:
    s3_write: {}

schedules:
  fixed:
    timezone: Europe/Rome
    cron: "0 2 * * *"  # Daily at 2 AM

Key concepts:

  • inputs: Where data comes from (resource + variant)
  • compute: Transformations (SQL, LLM prompts, etc.)
  • outputs: Where data goes (resource + variant)
  • schedules: When it runs (cron or event-driven)
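
Only the cron expression changes for a different cadence. For example, a weekly run on Mondays at 6 AM (same fixed-schedule structure as above; cron syntax is standard):

schedules:
  fixed:
    timezone: Europe/Rome
    cron: "0 6 * * 1"  # Weekly, Mondays at 6 AM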

Step 7: Test Locally (5 minutes)

Before deploying, test your logic with mock data.

Create test input data: test_data/customers.csv

id,email,created_at,lifetime_value
1,alice@example.com,2024-01-01,15000
2,bob@example.com,2024-02-01,500
3,carol@example.com,2024-03-01,5000

Create the expected result: test_data/customers-tiered.csv

id,email,created_at,lifetime_value,customer_tier
1,alice@example.com,2024-01-01,15000,vip
2,bob@example.com,2024-02-01,500,basic
3,carol@example.com,2024-03-01,5000,standard

Create the test file customers_analytics.test.yml:

customers_analytics:
  overrides:
    inputs:
      customers:
        postgres_read:
          files:
            - ./test_data/customers.csv
  tests:
    customers_labeled:
      description: |
        Verify the users are correctly labeled with a tier
      output:
        equals:
          - ./test_data/customers-tiered.csv

Run local test:

backpack test

Hiop runs your transformation against the mock data, compares the output to the expected result, and reports success or failure.

You defined logic in YAML. Hiop executed it locally. No infrastructure deployed. Same runtime that will execute in production.

This is why the declarative approach works: test the logic, not the infrastructure.


Step 8: Deploy (2 minutes)

# Sync local config to Hiop cloud
backpack sync

# Deploy and activate
backpack up

What just happened:

You wrote YAML describing WHAT you want:

  • Semantic model (customer resource)
  • Data route (PostgreSQL → S3)
  • Transformation logic (SQL enrichment)
  • Schedule (daily at 2 AM)

Hiop compiled this into HOW to execute:

  1. Generated execution plans from your YAML
  2. Allocated serverless runtime on demand
  3. Configured orchestration automatically
  4. Embedded lineage, quality checks, monitoring

You thought in logic. Hiop thought in execution.

No infrastructure deployed. No orchestrator configured. No separate catalog or quality tools.

This is the paradigm shift: declarative data logistics.


Step 9: Monitor Execution (optional, 1 minute)

Via CLI:

# Run on-demand
backpack run customers_analytics

# Check status
backpack status customers_analytics <run-id>

Via Web Console:

  1. Go to Workspaces > analytics-dev > Itineraries > customers_analytics
  2. Run the delivery by clicking on it
  3. View real-time logs
  4. Review data quality metrics

Everything's embedded - no separate monitoring tools.


What You Just Learned

The declarative paradigm:

  • Define WHAT: semantic models, transformations, routes (YAML)
  • Hiop handles HOW: compilation, runtime, orchestration, execution
  • Test logic locally before production
  • Deploy with sync && up

Collapsed stack:

  • No ETL infrastructure (serverless execution)
  • No separate orchestrator (schedule in YAML → auto-orchestrated)
  • No separate catalog (semantic model IS the catalog)
  • No separate quality tools (validation rules in schema)

Git-native workflow:

  • Logic lives in version control
  • Review changes like code
  • CI/CD for data (test → deploy → execute)
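
As one way to wire that up, a minimal CI sketch (GitHub Actions assumed here; the secret names and wheel URL are hypothetical) could run the same test-and-deploy loop on every push to main:

name: data-pipeline
on:
  push:
    branches: [main]
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      # Install the Backpack CLI wheel (downloaded from Web Console > Downloads)
      - run: pip install "${{ secrets.PYBACKPACK_WHEEL_URL }}"
      # Configure credentials non-interactively, as in Step 2
      - run: backpack setup --username "${{ secrets.HIOP_USERNAME }}" --password "${{ secrets.HIOP_PASSWORD }}"
      - run: backpack test   # test logic against mock data
      - run: backpack sync   # sync local config to Hiop cloud
      - run: backpack up     # deploy and activate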

Next Steps

Add complexity gradually:

  1. LLM transformations:
compute:
  steps:
    - extract_intent:
        ai:
          table: customers:postgres_read
          prompt: "Analyze this customer email and extract purchase intent: {{email_text}}"
          model: claude_4_sonnet
          result_column: purchase_intent

  2. Multi-step transformations:
steps:
  - clean:
      sql:
        query: "SELECT * FROM input WHERE email IS NOT NULL"
  - enrich:
      ai:
        table: customers:postgres_read
        prompt: "Classify customer segment based on: {{attributes}}"
        model: claude_4_sonnet
        result_column: customer_segment
  - aggregate:
      sql:
        query: "SELECT customer_segment, COUNT(*) FROM input GROUP BY customer_segment"

  3. Multiple destinations (the snowflake_write variant is sketched after this list):
outputs:
  - customers:
      s3_write: {}
  - customers:
      snowflake_write: {}

  4. Missions (orchestrated graphs of itineraries):
backpack add mission
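
Note that the snowflake_write variant used in the multiple-destinations example is not part of the customers resource defined in Step 5. A hypothetical sketch following the same variant pattern as postgres_read and s3_write (the real connection keys may differ):

# Hypothetical additional variant on the customers resource
snowflake_write:
  location: analytics_snowflake   # a Snowflake location registered as in Step 4
  table:
    name: customers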

The Workflow

Development cycle:

  1. Edit YAML locally
  2. Test with mock data (backpack test)
  3. Commit to Git
  4. Sync to Hiop (backpack sync)
  5. Review in Web Console (optional)
  6. Deploy (backpack up)

Daily operations:

  • Web Console: Monitor executions, trigger backfills
  • CLI: Run on-demand jobs, check status

The Bottom Line

You built a production pipeline in 30 minutes by defining logic in YAML. Hiop compiled it, allocated runtime, and handles orchestration.

Start building: https://hiop.io/request-dev-access