tiktok-pipeline-extension

TikTok Crisis Classification Pipeline

A multimodal AI pipeline that classifies TikTok videos of natural disasters into a structured crisis response taxonomy. Given a TikTok URL, it downloads the video, extracts frames, and uses a vision-capable LLM to determine what supplies, personnel, and actions are being requested or offered.

Deployed as a REST API on Google Cloud Run with CI/CD via GitHub Actions.

How It Works

1. Download: `yt-dlp` fetches the video and its metadata into a temporary directory.
2. Extract: OpenCV samples one frame per second, resizes each frame to 448×448, and base64-encodes it.
3. Classify: the frames and post text are sent to a vision LLM with a structured prompt grounded in a humanitarian crisis taxonomy.
4. Return: the LLM returns a JSON object with labels, a confidence score, and a visual-evidence summary.
5. Cleanup: the temporary video file is deleted.
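The Download, Extract, and Cleanup steps above can be sketched in Python with `yt-dlp` and OpenCV (`opencv-python`). This is a minimal illustration under stated assumptions, not the project's `download.py` or `video.py`. The function names, the JPEG encoding before base64, the 30 FPS fallback, and the way title and description are joined into the post text are all hypothetical choices.

```python
import base64
import tempfile
from pathlib import Path


def frame_step(fps: float, every_s: float = 1.0, fallback_fps: float = 30.0) -> int:
    """Number of frames between samples, i.e. one frame per `every_s` seconds."""
    if not (fps and fps > 0):
        # Assumption: treat a missing or invalid FPS (0, NaN) as a typical 30 FPS.
        fps = fallback_fps
    return max(1, round(fps * every_s))


def post_text(info: dict) -> str:
    """Hypothetical helper: join title and description from yt-dlp metadata."""
    title = (info.get("title") or "").strip()
    description = (info.get("description") or "").strip()
    if description == title:
        description = ""  # avoid repeating identical text
    return "\n".join(part for part in (title, description) if part)


def download_video(url: str, workdir: Path) -> tuple[Path, dict]:
    """Step 1: fetch the video and its metadata into `workdir` with yt-dlp."""
    import yt_dlp  # imported lazily so the pure helpers above need no extra deps

    opts = {"outtmpl": str(workdir / "%(id)s.%(ext)s"), "quiet": True}
    with yt_dlp.YoutubeDL(opts) as ydl:
        info = ydl.extract_info(url, download=True)
        return Path(ydl.prepare_filename(info)), info


def extract_frames(video_path: Path, size: tuple[int, int] = (448, 448)) -> list[str]:
    """Step 2: keep about one frame per second, resize, JPEG-encode, base64-encode."""
    import cv2  # opencv-python; imported lazily for the same reason

    cap = cv2.VideoCapture(str(video_path))
    step = frame_step(cap.get(cv2.CAP_PROP_FPS))
    frames: list[str] = []
    index = 0
    try:
        while True:
            ok, frame = cap.read()
            if not ok:
                break  # end of stream or unreadable frame
            if index % step == 0:
                resized = cv2.resize(frame, size)
                encoded, buffer = cv2.imencode(".jpg", resized)
                if encoded:
                    frames.append(base64.b64encode(buffer.tobytes()).decode("ascii"))
            index += 1
    finally:
        cap.release()
    return frames


def download_and_extract(url: str) -> tuple[str, list[str]]:
    """Steps 1, 2 and 5: the temporary directory (and video) is deleted on exit."""
    with tempfile.TemporaryDirectory() as tmp:
        video_path, info = download_video(url, Path(tmp))
        frames = extract_frames(video_path)
    return post_text(info), frames
```

Reading frames sequentially and keeping every `step`-th one avoids relying on frame-accurate seeking, which can be unreliable for some codecs.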

Output Format

```json
{
  "text": "post title and description",
  "type": ["Request"],
  "action_request": ["Search and Rescue"],
  "personnel_request": ["Search and Rescue Teams"],
  "supplies_request": [],
  "action_offer": [],
  "personnel_offer": [],
  "supplies_offer": [],
  "actionability": true,
  "explanation": "...",
  "visual_evidence": "...",
  "confidence": 0.91,
  "insufficient_visual_evidence": false
}
```
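Before a model's reply is returned to the caller it has to be parsed into the structure above. The sketch below shows one way to parse and sanity-check it using the field names from the example output. The function name, the step that strips a surrounding markdown code fence, and the specific checks are assumptions for illustration, not the project's actual validation logic.

```python
import json

# Field names taken from the output format above.
LIST_FIELDS = (
    "type",
    "action_request", "personnel_request", "supplies_request",
    "action_offer", "personnel_offer", "supplies_offer",
)
BOOL_FIELDS = ("actionability", "insufficient_visual_evidence")
STR_FIELDS = ("text", "explanation", "visual_evidence")
FENCE = "`" * 3


def parse_classification(raw: str) -> dict:
    """Hypothetical helper: parse an LLM reply; raises ValueError if malformed."""
    text = raw.strip()
    # Assumption: some models wrap JSON in a markdown code fence; strip it.
    if text.startswith(FENCE):
        text = text.split("\n", 1)[1] if "\n" in text else ""
        text = text.rsplit(FENCE, 1)[0]
    data = json.loads(text)  # json.JSONDecodeError is a ValueError subclass
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for name in LIST_FIELDS:
        value = data.get(name, [])  # missing label lists default to empty
        if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
            raise ValueError(f"{name} must be a list of strings")
        data[name] = value
    for name in BOOL_FIELDS:
        if not isinstance(data.get(name), bool):
            raise ValueError(f"{name} must be a boolean")
    for name in STR_FIELDS:
        if not isinstance(data.get(name, ""), str):
            raise ValueError(f"{name} must be a string")
    confidence = data.get("confidence")
    if (isinstance(confidence, bool)
            or not isinstance(confidence, (int, float))
            or not 0.0 <= confidence <= 1.0):
        raise ValueError("confidence must be a number in [0, 1]")
    return data
```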

The taxonomy covers three top-level categories — Supplies, Emergency Personnel, and Actions — each with detailed subcategories drawn from humanitarian response frameworks.
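The flat `*_request` / `*_offer` fields in the output appear to line up with these three top-level categories. Assuming the field prefixes map as `supplies` → Supplies, `personnel` → Emergency Personnel, and `action` → Actions (an inference from the field names, not something the project documents), a result can be regrouped by category like this. `labels_by_category` is a hypothetical helper.

```python
# Assumption: field prefixes map onto the taxonomy's top-level categories.
CATEGORY_BY_PREFIX = {
    "supplies": "Supplies",
    "personnel": "Emergency Personnel",
    "action": "Actions",
}


def labels_by_category(result: dict) -> dict[str, dict[str, list[str]]]:
    """Regroup the flat *_request / *_offer label lists by top-level category."""
    grouped: dict[str, dict[str, list[str]]] = {}
    for prefix, category in CATEGORY_BY_PREFIX.items():
        grouped[category] = {
            kind: list(result.get(f"{prefix}_{kind}", []))
            for kind in ("request", "offer")
        }
    return grouped
```

For the example output above, `labels_by_category(result)["Emergency Personnel"]["request"]` would be `["Search and Rescue Teams"]`.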

API

Live endpoint: https://tiktok-pipeline-747586044805.us-central1.run.app

`POST /classify`

```json
{
  "url": "https://www.tiktok.com/@user/video/...",
  "provider": "gemini",
  "api_key": "YOUR_API_KEY"
}
```

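| Field | Required | Values / notes |
|---|---|---|
| `url` | Yes | Any public TikTok video URL |
| `provider` | No | `gemini` (default), `openai`, `claude` |
| `api_key` | No | Provider API key; if omitted, falls back to the provider's environment variable (see Supported Providers) |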

`GET /health`

Returns `{"status": "ok"}`.

Interactive API docs are available at `/docs`.
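For scripted use, the two endpoints above can be called from Python with only the standard library. This is a hedged sketch: `build_classify_request`, `classify`, and `health` are hypothetical helper names, and the 300-second timeout is an assumption (downloading and classifying a video can take a while). `provider` and `api_key` are left out of the request body when not given, so the server-side defaults from the table apply.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "https://tiktok-pipeline-747586044805.us-central1.run.app"


def build_classify_request(video_url: str, provider: Optional[str] = None,
                           api_key: Optional[str] = None,
                           base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST /classify request; optional fields are sent only if set."""
    body = {"url": video_url}
    if provider is not None:
        body["provider"] = provider
    if api_key is not None:
        body["api_key"] = api_key
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/classify",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def classify(video_url: str, provider: Optional[str] = None,
             api_key: Optional[str] = None, base_url: str = BASE_URL,
             timeout: float = 300.0) -> dict:
    """Send the request and decode the JSON classification result."""
    req = build_classify_request(video_url, provider, api_key, base_url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


def health(base_url: str = BASE_URL, timeout: float = 10.0) -> bool:
    """Return True if GET /health reports {"status": "ok"}."""
    with urllib.request.urlopen(f"{base_url.rstrip('/')}/health", timeout=timeout) as resp:
        return json.load(resp).get("status") == "ok"
```

To target a local server instead, pass `base_url`, e.g. `classify("https://www.tiktok.com/@user/video/...", base_url="http://localhost:8000")`, assuming the server listens on port 8000 as in the Docker example.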

Local Development

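Prerequisites: Python 3.11+ and `uv`.

```bash
git clone https://github.com/JDittles/tiktok-pipeline-extension
cd tiktok-pipeline-extension
uv sync
cp .env.example .env   # fill in your API keys
```

The `python` commands below assume the virtual environment created by `uv sync` is active (`source .venv/bin/activate`); alternatively, prefix each command with `uv run`.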

Run the API server:

```bash
ENV=development python main.py --serve
```

Classify a single video via CLI:

```bash
python main.py "https://www.tiktok.com/@user/video/..."
```

Test against the local server:

Edit `TIKTOK_URL` and `PROVIDER` in `experimental/test_classify.py`, then run:

```bash
python experimental/test_classify.py
```

Supported Providers

| Provider | Model | Notes |
|---|---|---|
| `gemini` | gemini-2.0-flash | Default. Requires `GEMINI_API_KEY` |
| `openai` | gpt-4o | Requires `OPENAI_API_KEY` |
| `claude` | claude-sonnet-4-6 | Requires `ANTHROPIC_API_KEY` |
| `ollama` | gemma4:e4b | Local only. Requires Ollama |
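The table above, together with the `api_key` fallback described in the API section, suggests a simple resolution rule: use the key from the request if one is given, otherwise read the provider's environment variable. The sketch below encodes that rule. The `Provider` class, `PROVIDERS` registry, and `resolve_provider` function are hypothetical, and the precedence order is an assumption rather than a description of the project's code.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Provider:
    model: str
    env_var: Optional[str]  # None means no API key is needed (local Ollama)


# Hypothetical registry mirroring the Supported Providers table.
PROVIDERS = {
    "gemini": Provider("gemini-2.0-flash", "GEMINI_API_KEY"),
    "openai": Provider("gpt-4o", "OPENAI_API_KEY"),
    "claude": Provider("claude-sonnet-4-6", "ANTHROPIC_API_KEY"),
    "ollama": Provider("gemma4:e4b", None),
}
DEFAULT_PROVIDER = "gemini"


def resolve_provider(name: Optional[str] = None, api_key: Optional[str] = None,
                     env: Mapping[str, str] = os.environ) -> tuple[str, Optional[str]]:
    """Return (model, api_key); a request key takes precedence over the env var."""
    key = (name or DEFAULT_PROVIDER).lower()
    if key not in PROVIDERS:
        raise ValueError(f"unknown provider {name!r}; expected one of {sorted(PROVIDERS)}")
    provider = PROVIDERS[key]
    if provider.env_var is None:
        return provider.model, None  # local model, no key required
    resolved = api_key or env.get(provider.env_var)
    if not resolved:
        raise ValueError(f"no API key for {key}: pass api_key or set {provider.env_var}")
    return provider.model, resolved
```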

Deployment

The project deploys automatically to Google Cloud Run on every push to the `main` branch.

Stack: Docker → Google Artifact Registry → Cloud Run

Required GitHub secrets:

| Secret | Description |
|---|---|
| `GOOGLE_CREDENTIALS` | Service account JSON key |
| `GCP_PROJECT_ID` | GCP project ID |
| `GCP_REGION` | e.g. `us-central1` |
| `CLOUD_RUN_SERVICE` | Cloud Run service name |

To build and run the container locally:

```bash
docker build -t tiktok-pipeline .
docker run -p 8000:8000 --env-file .env tiktok-pipeline
```

Project Structure

```text
src/
├── api.py                  # FastAPI app
└── pipeline/
    ├── classify.py         # Orchestrates the full pipeline
    ├── download.py         # yt-dlp video download
    ├── video.py            # Frame extraction via OpenCV
    └── prompts.py          # Taxonomy, prompt construction
main.py                     # CLI + server entry point
Dockerfile
.github/workflows/deploy.yml
```