A high-performance scientific computing API for ecDNA (extrachromosomal DNA) prediction, based on the OTK and GCAP projects.
Production API: http://biotree.top:38123/otk/
API Base URL: http://biotree.top:38123/otk/api/v1/
You can start using the public API immediately, without any installation:
# Health check
curl http://biotree.top:38123/otk/api/v1/health
# Submit prediction (async)
curl -X POST "http://biotree.top:38123/otk/api/v1/predict" \
-F "file=@your_data.csv"
# Submit prediction (sync)
curl -X POST "http://biotree.top:38123/otk/api/v1/predict-sync" \
-F "file=@your_data.csv"
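The asynchronous submission above can also be made from Python. The following is a minimal sketch using only the standard library; it assumes the endpoint accepts a standard `multipart/form-data` upload under the field name `file` (as the `curl -F` call implies) and returns JSON. The helper names `encode_multipart` and `submit_prediction` are illustrative and not part of the project.

```python
import json
import urllib.request
import uuid

BASE_URL = "http://biotree.top:38123/otk/api/v1"  # API base URL from this README


def encode_multipart(field_name, filename, content):
    """Encode one file as a multipart/form-data body.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; filename="{filename}"\r\n'
        "Content-Type: text/csv\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + content + tail, f"multipart/form-data; boundary={boundary}"


def submit_prediction(csv_path, timeout=60):
    """POST a CSV file to /predict (async) and return the parsed JSON response."""
    with open(csv_path, "rb") as fh:
        body, content_type = encode_multipart("file", "data.csv", fh.read())
    request = urllib.request.Request(
        f"{BASE_URL}/predict",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling `submit_prediction("your_data.csv")["id"]` would then give the job ID needed for the status and download endpoints.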
Install Dependencies
cd otk/otk_api
pip install -r requirements.txt
Start the API
cd otk/otk_api
./start_api.sh
Access
Endpoint: GET /api/v1/health
Response:
{
"status": "healthy",
"version": "1.0.0",
"gpu_available": false,
"gpu_count": 0,
"cpu_count": 192,
"active_jobs": 0,
"queue_size": 0
}
Endpoint: POST /api/v1/predict
Parameters:
file: CSV file with prediction data
Response:
{
"id": "af0e5298-b326-40ca-83b5-76f54ad212e6",
"status": "pending",
"created_at": "2026-02-12T09:54:25.495083",
"validation_report": {
"is_valid": true,
"errors": [],
"warnings": ["Optional column missing: intersect_ratio, using default value 1.0"]
}
}
Endpoint: POST /api/v1/predict-sync
Parameters:
file: CSV file with prediction data
Response:
Endpoint: GET /api/v1/jobs/{job_id}
Response:
{
"id": "af0e5298-b326-40ca-83b5-76f54ad212e6",
"status": "completed",
"progress": 1.0,
"completed_at": "2026-02-12T09:54:26.292634"
}
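For async jobs, a client typically polls this endpoint until the job finishes. The sketch below is one way to do that with the standard library. The `"pending"` and `"completed"` statuses come from the examples in this README; treating `"failed"` as a terminal status is an assumption, as are the helper names `fetch_job` and `wait_for_job`.

```python
import json
import time
import urllib.request

BASE_URL = "http://biotree.top:38123/otk/api/v1"  # API base URL from this README

# "completed" appears in this README; "failed" is an assumed status name.
TERMINAL_STATUSES = {"completed", "failed"}


def fetch_job(job_id, timeout=30):
    """GET /jobs/{job_id} and return the parsed JSON status object."""
    with urllib.request.urlopen(f"{BASE_URL}/jobs/{job_id}", timeout=timeout) as response:
        return json.load(response)


def wait_for_job(job_id, fetch=fetch_job, interval=1.0, max_wait=300.0, sleep=time.sleep):
    """Poll a job until it reaches a terminal status or max_wait seconds elapse.

    `fetch` and `sleep` are injectable so the loop can be exercised offline.
    """
    waited = 0.0
    while True:
        job = fetch(job_id)
        if job.get("status") in TERMINAL_STATUSES:
            return job
        if waited >= max_wait:
            raise TimeoutError(f"job {job_id} still not finished after {max_wait} s")
        sleep(interval)
        waited += interval
```

Once `wait_for_job(job_id)` returns a job with status `"completed"`, the result can be fetched from `GET /api/v1/jobs/{job_id}/download`.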
Endpoint: GET /api/v1/jobs/{job_id}/download
Response:
Endpoint: GET /api/v1/statistics
Response:
{
"total_jobs": 28,
"completed_jobs": 14,
"failed_jobs": 13,
"avg_processing_time": 0.605,
"cpu_jobs": 14,
"gpu_jobs": 5
}
For basic prediction, your CSV file only needs these minimum columns:
| Column | Description |
|---|---|
| `sample` | Sample ID |
| `gene_id` | Gene identifier |
| `segVal` | Segment value |
However, for optimal prediction accuracy, we recommend including as many features as possible.
| Column | Description | Required by API | Auto-fill Default |
|---|---|---|---|
| `sample` | Sample ID | Yes | - |
| `gene_id` | Gene identifier (e.g., ENSG00000284662) | Yes | - |
| `segVal` | Gene total copy number | Yes | - |
| `minor_cn` | Minor copy number | Yes | 0 |
| `purity` | Tumor purity | Yes | 0.8 |
| `ploidy` | Ploidy level | Yes | 2.0 |
| `AScore` | A-score value | Yes | 10.0 |
| `pLOH` | Loss of heterozygosity probability | Yes | 0.1 |
| `cna_burden` | Copy number alteration burden | Yes | 0.2 |
| `CN1` to `CN19` | Chromosome copy number signatures | Recommended | 0.05 each |
| Column | Description | Auto-fill Behavior |
|---|---|---|
| `type` | Cancer type (e.g., BRCA, LUAD) | Auto-converts to `type_*` columns |
| `age` | Sample age | Filled with mean value |
| `gender` | Gender (0/1 or Male/Female) | Filled with 0 |
| `intersect_ratio` | Intersection ratio | Filled with 1.0 |
| `y` | Ground truth label (for validation) | Not used in prediction |
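To make the auto-fill behavior concrete, the sketch below applies the fixed defaults from the tables above to a single input row on the client side. It is an illustration only: the server's actual implementation is not shown in this README, `fill_defaults` is a hypothetical helper, and `age` is omitted because its default (the mean) depends on the rest of the data.

```python
REQUIRED_COLUMNS = ("sample", "gene_id", "segVal")

# Fixed defaults as documented in the tables above.
DEFAULTS = {
    "minor_cn": 0,
    "purity": 0.8,
    "ploidy": 2.0,
    "AScore": 10.0,
    "pLOH": 0.1,
    "cna_burden": 0.2,
    "gender": 0,
    "intersect_ratio": 1.0,
}
DEFAULTS.update({f"CN{i}": 0.05 for i in range(1, 20)})  # CN1 to CN19


def fill_defaults(row):
    """Return a copy of `row` with the documented defaults for absent or empty columns.

    Raises ValueError if one of the three minimum columns is missing.
    """
    missing = [col for col in REQUIRED_COLUMNS if row.get(col) in (None, "")]
    if missing:
        raise ValueError(f"missing required columns: {', '.join(missing)}")
    filled = dict(DEFAULTS)
    # Values supplied by the user take precedence over defaults.
    filled.update({key: value for key, value in row.items() if value not in (None, "")})
    return filled
```

A row with only the three minimum columns comes back with 30 columns: the 3 supplied plus the 27 defaults.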
The system automatically generates these features - you do NOT need to provide them:
| Feature Type | Columns | Source |
|---|---|---|
| Cancer Type | `type_BLCA`, `type_BRCA`, ... (24 columns) | Converted from `type` column |
| Gene Frequency | `freq_Linear`, `freq_BFB`, `freq_Circular`, `freq_HR` | Matched from `gene_id` using precomputed prior data |
The following cancer types are supported (for type column):
BLCA, BRCA, CESC, COAD, DLBC, ESCA, GBM, HNSC,
KICH, KIRC, KIRP, LGG, LIHC, LUAD, LUSC, OV,
PRAD, READ, SARC, SKCM, STAD, THCA, UCEC, UVM
If an invalid cancer type is provided, all type_* columns will be set to 0.
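The conversion from `type` to the 24 `type_*` columns is a one-hot encoding. The sketch below shows the idea, including the all-zero fallback for an unsupported type. It assumes an exact, case-sensitive match on the type code; how the server handles case or whitespace is not stated here, and `encode_cancer_type` is a hypothetical helper name.

```python
# The 24 supported cancer types listed above.
CANCER_TYPES = [
    "BLCA", "BRCA", "CESC", "COAD", "DLBC", "ESCA", "GBM", "HNSC",
    "KICH", "KIRC", "KIRP", "LGG", "LIHC", "LUAD", "LUSC", "OV",
    "PRAD", "READ", "SARC", "SKCM", "STAD", "THCA", "UCEC", "UVM",
]


def encode_cancer_type(cancer_type):
    """One-hot encode a cancer type into type_* columns.

    An unsupported type yields all zeros, matching the documented behavior.
    """
    return {f"type_{name}": int(name == cancer_type) for name in CANCER_TYPES}
```

For example, `encode_cancer_type("LUSC")` sets `type_LUSC` to 1 and the other 23 columns to 0.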
Minimal input (3 columns):
sample,gene_id,segVal
TCGA-TEST-01,ENSG00000284662,3.2
TCGA-TEST-01,ENSG00000187634,2.5
Recommended input (with type column):
sample,gene_id,segVal,minor_cn,purity,ploidy,AScore,pLOH,cna_burden,age,gender,type,CN1,CN2,CN3,CN4,CN5,CN6,CN7,CN8,CN9,CN10,CN11,CN12,CN13,CN14,CN15,CN16,CN17,CN18,CN19
TCGA-TEST-01,ENSG00000284662,3.2,1.1,0.85,2.8,12.5,0.15,0.25,65,1,LUSC,0.1,0.2,0.3,0.1,0.05,0.05,0.05,0.05,0.02,0.02,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01
Alternative: Using pre-encoded type_* columns:
sample,gene_id,segVal,minor_cn,purity,ploidy,AScore,pLOH,cna_burden,type_BRCA,type_LUAD,...(other type_* columns),CN1,CN2,...
TCGA-TEST-01,ENSG00000284662,3.2,1.1,0.85,2.8,12.5,0.15,0.25,1,0,...,0.1,0.2,...
The API validates your data and returns a detailed report:
{
"validation_report": {
"is_valid": true,
"errors": [],
"warnings": [
"Optional column missing: intersect_ratio, will use default value 1.0",
"CN signature columns incomplete: found 15/19 columns",
"Missing type column, cannot validate cancer type"
],
"info": {
"total_rows": 100,
"unique_samples": 50,
"unique_genes": 100
}
}
}
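A client can inspect this report before waiting on a job. The sketch below flattens a response into a pass/fail flag plus a list of messages. It relies only on the `is_valid`, `errors`, and `warnings` fields shown above; `summarize_validation` is a hypothetical helper name.

```python
def summarize_validation(response):
    """Return (is_valid, messages) from a response carrying a validation_report.

    A response without a report is treated as not valid.
    """
    report = response.get("validation_report") or {}
    messages = [f"ERROR: {text}" for text in report.get("errors", [])]
    messages += [f"WARNING: {text}" for text in report.get("warnings", [])]
    return bool(report.get("is_valid", False)), messages
```

Errors are listed before warnings so that blocking problems appear first when the messages are printed.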
The prediction result CSV includes:
| Column | Description |
|---|---|
| `sample` | Sample ID |
| `gene_id` | Gene identifier |
| `prediction_prob` | Probability of ecDNA occurrence |
| `prediction` | Binary prediction (0=no, 1=yes) |
| `sample_level_prediction_label` | Overall sample prediction label |
| `sample_level_prediction` | Overall sample prediction (0/1) |
sample,gene_id,prediction_prob,prediction,sample_level_prediction_label,sample_level_prediction
TCGA-TEST-01,ENSG00000284662,0.000279,0,nofocal,0
TCGA-TEST-01,ENSG00000187634,0.002650,0,nofocal,0
TCGA-TEST-01,ENSG00000243073,0.000036,0,nofocal,0
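A downloaded result file in this layout can be summarized per sample with the standard `csv` module. The sketch below collects, for each sample, its sample-level label, the genes predicted positive, and the highest gene-level probability. It assumes the column names shown above and that `prediction` is written as `0` or `1`; `summarize_results` is a hypothetical helper name.

```python
import csv
import io


def summarize_results(csv_text):
    """Group result rows by sample.

    Returns {sample: {"label": str, "positive_genes": [gene_id, ...], "max_prob": float}}.
    """
    summary = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        entry = summary.setdefault(
            row["sample"],
            {
                "label": row["sample_level_prediction_label"],
                "positive_genes": [],
                "max_prob": 0.0,
            },
        )
        entry["max_prob"] = max(entry["max_prob"], float(row["prediction_prob"]))
        # Gene-level call: 1 means ecDNA predicted for this gene.
        if int(float(row["prediction"])) == 1:
            entry["positive_genes"].append(row["gene_id"])
    return summary
```

Applied to the three example rows above, this yields a single entry for `TCGA-TEST-01` with label `nofocal` and no positive genes.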
The API includes a user-friendly web interface:
?lang=en for English: http://biotree.top:38123/otk/?lang=en
?lang=zh for Chinese: http://biotree.top:38123/otk/?lang=zh
otk_api/
├── api/                      # API implementation
│   ├── main.py               # FastAPI application
│   ├── predictor_wrapper.py  # Prediction job handler
│   └── routes/               # API endpoints
├── config.yml                # Configuration file
├── models/                   # Model storage
│   └── baseline/             # Example model
├── uploads/                  # Uploaded files
├── results/                  # Prediction results
├── logs/                     # Log files
├── start_api.sh              # Startup script
└── README.md                 # This documentation
Job ID Security: Save your Job ID securely for async tasks. It's needed to query status and download results.
Data Retention:
Job records: Kept permanently for audit purposes
File Size Limit: Maximum upload size is 100MB
Processing Time: Depends on data size and server load, typically 1-5 seconds per sample
Error Handling: If you receive an error, check your data format and try again
File Upload Errors
Ensure your file is a valid CSV
Verify file size is under 100MB
Prediction Failed
Check server logs for detailed error messages
Try with a smaller dataset first
API Unresponsive
Check if the server is running
For questions or issues:
This project is licensed under the MIT License - see the LICENSE file for details.
Contributions are welcome! Please feel free to submit a Pull Request.
Last Updated: February 12, 2026
Version: 1.1.0
Maintainers: Wang Lab @ CSU