Documentation

OTK Prediction API

A high-performance scientific computing API for ecDNA (extrachromosomal DNA) prediction, based on the OTK and GCAP projects.

🌐 Public API Address

Production API: http://biotree.top:38123/otk/

API Base URL: http://biotree.top:38123/otk/api/v1/

✨ Features

🚀 Quick Start

Using the Public API

You can immediately start using the public API without any installation:

# Health check
curl http://biotree.top:38123/otk/api/v1/health

# Submit prediction (async)
curl -X POST "http://biotree.top:38123/otk/api/v1/predict" \
  -F "file=@your_data.csv"

# Submit prediction (sync)
curl -X POST "http://biotree.top:38123/otk/api/v1/predict-sync" \
  -F "file=@your_data.csv"

Running Locally

  1. Install dependencies:

     cd otk/otk_api
     pip install -r requirements.txt

  2. Start the API:

     cd otk/otk_api
     ./start_api.sh

  3. Access:
     - API: http://localhost:8000/api/v1/
     - Web Interface: http://localhost:8000/

📑 API Documentation

1. Health Check

Endpoint: GET /api/v1/health

Response:

{
  "status": "healthy",
  "version": "1.0.0",
  "gpu_available": false,
  "gpu_count": 0,
  "cpu_count": 192,
  "active_jobs": 0,
  "queue_size": 0
}

2. Submit Prediction (Async)

Endpoint: POST /api/v1/predict

Parameters:

  file: The input CSV file, uploaded as a multipart form field (as in the Quick Start curl example)

Response:

{
  "id": "af0e5298-b326-40ca-83b5-76f54ad212e6",
  "status": "pending",
  "created_at": "2026-02-12T09:54:25.495083",
  "validation_report": {
    "is_valid": true,
    "errors": [],
    "warnings": ["Optional column missing: intersect_ratio, using default value 1.0"]
  }
}
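For callers who prefer Python over curl, the upload can be sketched with the standard library alone. This is a minimal sketch, not the project's client: `build_multipart` and `build_predict_request` are hypothetical helper names, and the only details taken from this README are the base URL, the two endpoints, and the `file` form field.

```python
import os
import uuid
import urllib.request
from typing import Tuple

BASE_URL = "http://biotree.top:38123/otk/api/v1"  # public base URL from this README


def build_multipart(field: str, filename: str, content: bytes) -> Tuple[bytes, str]:
    """Encode one file as multipart/form-data; return (body, content_type)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: text/csv\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + content + tail, f"multipart/form-data; boundary={boundary}"


def build_predict_request(csv_path: str, sync: bool = False) -> urllib.request.Request:
    """Prepare (but do not send) a POST to /predict or /predict-sync."""
    with open(csv_path, "rb") as fh:
        body, content_type = build_multipart("file", os.path.basename(csv_path), fh.read())
    endpoint = "predict-sync" if sync else "predict"
    return urllib.request.Request(
        f"{BASE_URL}/{endpoint}",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
```

Passing the returned request to `urllib.request.urlopen` sends it; for the async endpoint, the JSON body should then carry the job `id` shown in the response above.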

3. Submit Prediction (Sync)

Endpoint: POST /api/v1/predict-sync

Parameters:

  file: The input CSV file, uploaded as a multipart form field (as in the Quick Start curl example)

Response:

4. Get Task Status

Endpoint: GET /api/v1/jobs/{job_id}

Response:

{
  "id": "af0e5298-b326-40ca-83b5-76f54ad212e6",
  "status": "completed",
  "progress": 1.0,
  "completed_at": "2026-02-12T09:54:26.292634"
}
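An async job is typically polled until it leaves the pending state. The sketch below shows one way to do that with the standard library. `fetch_job` and `wait_for_job` are hypothetical names, and only the `pending` and `completed` status values appear in this README; `running` is an assumed intermediate state, and any other status is treated as final.

```python
import json
import time
import urllib.request

BASE_URL = "http://biotree.top:38123/otk/api/v1"  # public base URL from this README


def fetch_job(job_id: str) -> dict:
    """GET /jobs/{job_id} and decode the JSON body."""
    with urllib.request.urlopen(f"{BASE_URL}/jobs/{job_id}", timeout=30) as resp:
        return json.load(resp)


def wait_for_job(job_id, fetch=fetch_job, interval=2.0, timeout=300.0, sleep=time.sleep):
    """Poll until the job is no longer pending/running, or raise TimeoutError."""
    deadline = time.monotonic() + timeout
    while True:
        job = fetch(job_id)
        # "running" is an assumed status name; the README shows only
        # "pending" and "completed".
        if job.get("status") not in ("pending", "running"):
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still {job.get('status')!r} after {timeout}s")
        sleep(interval)
```

The `fetch` and `sleep` parameters are injectable so the loop can be exercised without touching the network.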

5. Download Results

Endpoint: GET /api/v1/jobs/{job_id}/download

Response:

6. Get Statistics

Endpoint: GET /api/v1/statistics

Response:

{
  "total_jobs": 28,
  "completed_jobs": 14,
  "failed_jobs": 13,
  "avg_processing_time": 0.605,
  "cpu_jobs": 14,
  "gpu_jobs": 5
}

📊 Data Format Requirements

Minimal Required Columns

For basic prediction, your CSV file only needs these minimum columns:

| Column  | Description     |
|---------|-----------------|
| sample  | Sample ID       |
| gene_id | Gene identifier |
| segVal  | Segment value   |

However, for optimal prediction accuracy, we recommend including as many features as possible.
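A quick local check for the three minimal columns can catch a malformed file before upload. This is a client-side sketch only; `missing_required_columns` is a hypothetical helper, and the server's own validation may be stricter.

```python
import csv
import io

# The three minimal columns listed above.
REQUIRED_COLUMNS = ("sample", "gene_id", "segVal")


def missing_required_columns(csv_text: str) -> list:
    """Return the required column names absent from the CSV header row."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # empty input yields an empty header
    present = {name.strip() for name in header}
    return [col for col in REQUIRED_COLUMNS if col not in present]
```

An empty list means the header contains all three columns; anything else names what to add.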

Recommended Columns

| Column      | Description                             | Required by API | Auto-fill Default |
|-------------|-----------------------------------------|-----------------|-------------------|
| sample      | Sample ID                               | ✅ Yes          | -                 |
| gene_id     | Gene identifier (e.g., ENSG00000284662) | ✅ Yes          | -                 |
| segVal      | Gene total copy number                  | ✅ Yes          | -                 |
| minor_cn    | Minor copy number                       | ✅ Yes          | 0                 |
| purity      | Tumor purity                            | ✅ Yes          | 0.8               |
| ploidy      | Ploidy level                            | ✅ Yes          | 2.0               |
| AScore      | A-score value                           | ✅ Yes          | 10.0              |
| pLOH        | Loss of heterozygosity probability      | ✅ Yes          | 0.1               |
| cna_burden  | Copy number alteration burden           | ✅ Yes          | 0.2               |
| CN1 to CN19 | Chromosome copy number signatures       | ⚠️ Recommended  | 0.05 each         |
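To see what the auto-fill defaults in this table amount to, the following sketch applies them to a single row on the client side. It illustrates the table only and is not the server's code: `fill_defaults` is a hypothetical helper, and treating an empty string as "missing" is an assumption.

```python
# Defaults copied from the "Auto-fill Default" column above.
DEFAULTS = {
    "minor_cn": 0.0,
    "purity": 0.8,
    "ploidy": 2.0,
    "AScore": 10.0,
    "pLOH": 0.1,
    "cna_burden": 0.2,
}
# CN1 to CN19 each default to 0.05.
DEFAULTS.update({f"CN{i}": 0.05 for i in range(1, 20)})


def fill_defaults(row: dict) -> dict:
    """Return a copy of row with absent or empty feature columns set to defaults."""
    filled = dict(row)
    for column, default in DEFAULTS.items():
        # Assumption: None and "" both count as missing.
        if filled.get(column) in (None, ""):
            filled[column] = default
    return filled
```

Supplying real values is still preferable, since the defaults are generic placeholders rather than measurements for your sample.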

Optional Columns

| Column          | Description                          | Auto-fill Behavior              |
|-----------------|--------------------------------------|---------------------------------|
| type            | Cancer type (e.g., BRCA, LUAD)       | Auto-converts to type_* columns |
| age             | Sample age                           | Filled with mean value          |
| gender          | Gender (0/1 or Male/Female)          | Filled with 0                   |
| intersect_ratio | Intersection ratio                   | Filled with 1.0                 |
| y               | Ground truth label (for validation)  | Not used in prediction          |

Auto-Generated Features

The system automatically generates these features - you do NOT need to provide them:

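| Feature Type   | Columns                                      | Source                                            |
|----------------|----------------------------------------------|---------------------------------------------------|
| Cancer Type    | type_BLCA, type_BRCA, ... (24 columns)       | Converted from type column                        |
| Gene Frequency | freq_Linear, freq_BFB, freq_Circular, freq_HR | Matched from gene_id using precomputed prior data |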

Cancer Types

The following cancer types are supported (for type column):

BLCA, BRCA, CESC, COAD, DLBC, ESCA, GBM, HNSC,
KICH, KIRC, KIRP, LGG, LIHC, LUAD, LUSC, OV,
PRAD, READ, SARC, SKCM, STAD, THCA, UCEC, UVM

If an invalid cancer type is provided, all type_* columns will be set to 0.
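The conversion from `type` to the 24 `type_*` columns is a one-hot encoding, which can be sketched as follows. `encode_cancer_type` is a hypothetical name, and exact, case-sensitive matching is an assumption; the README does not say how the server normalizes the value.

```python
# The 24 supported cancer types listed above.
CANCER_TYPES = [
    "BLCA", "BRCA", "CESC", "COAD", "DLBC", "ESCA", "GBM", "HNSC",
    "KICH", "KIRC", "KIRP", "LGG", "LIHC", "LUAD", "LUSC", "OV",
    "PRAD", "READ", "SARC", "SKCM", "STAD", "THCA", "UCEC", "UVM",
]


def encode_cancer_type(cancer_type: str) -> dict:
    """One-hot encode a cancer type; an unsupported value yields all zeros."""
    return {f"type_{name}": int(name == cancer_type) for name in CANCER_TYPES}
```

For example, `encode_cancer_type("LUSC")` sets only `type_LUSC` to 1, while an unsupported value leaves every column at 0, matching the behavior described above.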

Example Data

Minimal input (3 columns):

sample,gene_id,segVal
TCGA-TEST-01,ENSG00000284662,3.2
TCGA-TEST-01,ENSG00000187634,2.5

Recommended input (with type column):

sample,gene_id,segVal,minor_cn,purity,ploidy,AScore,pLOH,cna_burden,age,gender,type,CN1,CN2,CN3,CN4,CN5,CN6,CN7,CN8,CN9,CN10,CN11,CN12,CN13,CN14,CN15,CN16,CN17,CN18,CN19
TCGA-TEST-01,ENSG00000284662,3.2,1.1,0.85,2.8,12.5,0.15,0.25,65,1,LUSC,0.1,0.2,0.3,0.1,0.05,0.05,0.05,0.05,0.02,0.02,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01

Alternative: Using pre-encoded type_* columns:

sample,gene_id,segVal,minor_cn,purity,ploidy,AScore,pLOH,cna_burden,type_BRCA,type_LUAD,...(other type_* columns),CN1,CN2,...
TCGA-TEST-01,ENSG00000284662,3.2,1.1,0.85,2.8,12.5,0.15,0.25,1,0,...,0.1,0.2,...

Data Validation

The API validates your data and returns a detailed report:

{
  "validation_report": {
    "is_valid": true,
    "errors": [],
    "warnings": [
      "Optional column missing: intersect_ratio, will use default value 1.0",
      "CN signature columns incomplete: found 15/19 columns",
      "Missing type column, cannot validate cancer type"
    ],
    "info": {
      "total_rows": 100,
      "unique_samples": 50,
      "unique_genes": 100
    }
  }
}
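A client can act on this report before waiting for results. The sketch below raises on a failed validation and otherwise returns the warnings for logging. `check_validation` is a hypothetical helper; it relies only on the `is_valid`, `errors`, and `warnings` fields shown above.

```python
def check_validation(response: dict) -> list:
    """Raise ValueError if the report marks the data invalid; else return warnings."""
    report = response.get("validation_report") or {}
    errors = report.get("errors", [])
    # Treat a missing is_valid flag as a failure rather than guessing.
    if not report.get("is_valid", False) or errors:
        raise ValueError("; ".join(errors) or "validation failed")
    return report.get("warnings", [])
```

Warnings such as a missing `intersect_ratio` column do not block the job, but they indicate which defaults were substituted.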

🎯 Prediction Output

Output Format

The prediction result CSV includes:

| Column                        | Description                      |
|-------------------------------|----------------------------------|
| sample                        | Sample ID                        |
| gene_id                       | Gene identifier                  |
| prediction_prob               | Probability of ecDNA occurrence  |
| prediction                    | Binary prediction (0=no, 1=yes)  |
| sample_level_prediction_label | Overall sample prediction label  |
| sample_level_prediction       | Overall sample prediction (0/1)  |

Example Output

sample,gene_id,prediction_prob,prediction,sample_level_prediction_label,sample_level_prediction
TCGA-TEST-01,ENSG00000284662,0.000279,0,nofocal,0
TCGA-TEST-01,ENSG00000187634,0.002650,0,nofocal,0
TCGA-TEST-01,ENSG00000243073,0.000036,0,nofocal,0
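Once downloaded, the result CSV can be read with the standard `csv` module. The sketch below groups genes per sample whose `prediction_prob` meets a caller-chosen threshold. `genes_above_threshold` is a hypothetical helper, and the default of 0.5 is an arbitrary illustration, not the cutoff the service uses for its `prediction` column.

```python
import csv
import io
from collections import defaultdict


def genes_above_threshold(result_csv: str, threshold: float = 0.5) -> dict:
    """Map each sample to the gene_ids whose prediction_prob >= threshold."""
    hits = defaultdict(list)
    for row in csv.DictReader(io.StringIO(result_csv)):
        if float(row["prediction_prob"]) >= threshold:
            hits[row["sample"]].append(row["gene_id"])
    return dict(hits)
```

With the example output above, no gene reaches 0.5, so the function returns an empty dictionary; lowering the threshold to 0.001 returns only ENSG00000187634 for TCGA-TEST-01.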

🌐 Web Interface

The API includes a user-friendly web interface:

Access

Language Support

πŸ“ Project Structure

otk_api/
β”œβ”€β”€ api/                  # API implementation
β”‚   β”œβ”€β”€ main.py           # FastAPI application
β”‚   β”œβ”€β”€ predictor_wrapper.py  # Prediction job handler
β”‚   └── routes/           # API endpoints
β”œβ”€β”€ config.yml           # Configuration file
β”œβ”€β”€ models/              # Model storage
β”‚   └── baseline/         # Example model
β”œβ”€β”€ uploads/              # Uploaded files
β”œβ”€β”€ results/              # Prediction results
β”œβ”€β”€ logs/                 # Log files
β”œβ”€β”€ start_api.sh          # Startup script
└── README.md             # This documentation

⚠️ Important Notes

  1. Job ID Security: Save your Job ID securely for async tasks. It's needed to query status and download results.

  2. Data Retention:
     - Result files: Automatically deleted after 3 days
     - Job records: Kept permanently for audit purposes

  3. File Size Limit: Maximum upload size is 100MB

  4. Processing Time: Depends on data size and server load, typically 1-5 seconds per sample

  5. Error Handling: If you receive an error, check your data format and try again
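Because uploads over the 100MB limit are rejected, checking the file size locally avoids a wasted transfer. This is a minimal sketch: `check_upload_size` is a hypothetical helper, and reading "100MB" as 100 × 1024 × 1024 bytes is an assumption, since the README does not say whether the server counts in MB or MiB.

```python
import os

# Assumption: "100MB" is interpreted as 100 MiB.
MAX_UPLOAD_BYTES = 100 * 1024 * 1024


def check_upload_size(path: str, limit: int = MAX_UPLOAD_BYTES) -> int:
    """Return the file size in bytes, or raise ValueError if it exceeds limit."""
    size = os.path.getsize(path)
    if size > limit:
        raise ValueError(f"{path} is {size} bytes; the upload limit is {limit} bytes")
    return size
```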

πŸ› οΈ Troubleshooting

Common Issues

  1. File Upload Errors
     - Ensure your file is a valid CSV
     - Check that all required columns are present
     - Verify file size is under 100MB

  2. Prediction Failed
     - Check server logs for detailed error messages
     - Verify your data format matches requirements
     - Try with a smaller dataset first

  3. API Unresponsive
     - Check if the server is running
     - Verify network connectivity
     - Try the health check endpoint

📞 Support

For questions or issues:

  1. GitHub Issues: OTK Repository
  2. Email: Contact the maintainers
  3. Documentation: This README and API endpoints

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.


Last Updated: February 12, 2026
Version: 1.1.0
Maintainers: Wang Lab @ CSU