LSF Web Services provides a RESTful API and programmatic interfaces for integrating LSF with third-party systems and building automated workflows.

API Overview

RESTful API

Standard HTTP-based REST endpoints:

GET /lsf/v1/jobs              # List jobs
POST /lsf/v1/jobs             # Submit a job
GET /lsf/v1/jobs/{id}         # Get job details
DELETE /lsf/v1/jobs/{id}      # Terminate a job
GET /lsf/v1/clusters/hosts    # List hosts
GET /lsf/v1/clusters/queues   # List queues
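
The endpoint list above maps onto a small lookup table in client code. The sketch below is a minimal illustration, assuming the paths listed here, the example host lsf-api.company.com, and the X-API-Key header described under authentication; ENDPOINTS and build_request are hypothetical names, not part of any published SDK. It only builds urllib.request.Request objects, so nothing goes over the network until a request is passed to urllib.request.urlopen.

```python
import json
import urllib.request

# Example base URL from this article; replace with your own deployment.
BASE_URL = "https://lsf-api.company.com/lsf/v1"

# Hypothetical operation names mapped to the (method, path template) pairs listed above.
ENDPOINTS = {
    "list_jobs": ("GET", "/jobs"),
    "submit_job": ("POST", "/jobs"),
    "get_job": ("GET", "/jobs/{id}"),
    "kill_job": ("DELETE", "/jobs/{id}"),
    "list_hosts": ("GET", "/clusters/hosts"),
    "list_queues": ("GET", "/clusters/queues"),
}


def build_request(op, api_key, job_id=None, body=None):
    """Build (but do not send) a request for one of the endpoints above."""
    method, template = ENDPOINTS[op]
    if "{id}" in template:
        if job_id is None:
            raise ValueError(f"{op} requires a job_id")
        path = template.format(id=job_id)
    else:
        path = template
    # A JSON body, when given, is sent with a matching Content-Type header.
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(BASE_URL + path, data=data, method=method)
    req.add_header("X-API-Key", api_key)
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req
```

For example, build_request("kill_job", "your_api_key", job_id=12345) yields a DELETE request to https://lsf-api.company.com/lsf/v1/jobs/12345.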

Authentication

Several authentication methods are supported:

# API key authentication
curl -H "X-API-Key: your_api_key" \
  https://lsf-api.company.com/lsf/v1/jobs

# OAuth2
curl -H "Authorization: Bearer {token}" \
  https://lsf-api.company.com/lsf/v1/jobs

# Basic Auth
curl -u username:password \
  https://lsf-api.company.com/lsf/v1/jobs
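
The three curl variants differ only in the header they send. A sketch that produces each header, assuming the X-API-Key header name used above; auth_headers is a hypothetical helper. Basic credentials are merely base64-encoded, not encrypted, which is one reason to accept these schemes only over HTTPS.

```python
import base64


def auth_headers(method, *, api_key=None, token=None, username=None, password=None):
    """Return the HTTP headers for the three authentication styles shown above.

    method is "api_key", "bearer" or "basic" (names chosen for this sketch).
    """
    if method == "api_key":
        # Header name as used in the curl example above.
        return {"X-API-Key": api_key}
    if method == "bearer":
        # OAuth2 access tokens are sent with the Bearer scheme (RFC 6750).
        return {"Authorization": f"Bearer {token}"}
    if method == "basic":
        # Basic auth is base64("username:password") (RFC 7617); curl -u sends the same.
        raw = f"{username}:{password}".encode("utf-8")
        return {"Authorization": "Basic " + base64.b64encode(raw).decode("ascii")}
    raise ValueError(f"unknown auth method: {method}")
```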

Job Management API

Submitting a job

curl -X POST https://lsf-api.company.com/lsf/v1/jobs \
  -H "X-API-Key: your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "command": "./my_simulation",
    "queue": "normal",
    "numProcessors": 16,
    "resourceReq": "rusage[mem=8000]",
    "jobName": "simulation_001"
  }'
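
The request body can also be assembled programmatically. This sketch assumes the field names from the curl example and, as an assumption to verify against your cluster configuration, that the mem value in the rusage string is in MB; make_submit_payload is a hypothetical name.

```python
def make_submit_payload(command, queue="normal", num_processors=1,
                        mem_mb=None, job_name=None):
    """Build the JSON body for POST /lsf/v1/jobs using the field names shown above."""
    if num_processors < 1:
        raise ValueError("num_processors must be >= 1")
    payload = {"command": command, "queue": queue, "numProcessors": num_processors}
    if mem_mb is not None:
        # Assumption: the cluster interprets rusage mem in MB.
        payload["resourceReq"] = f"rusage[mem={mem_mb}]"
    if job_name is not None:
        payload["jobName"] = job_name
    return payload
```

Serialising make_submit_payload("./my_simulation", num_processors=16, mem_mb=8000, job_name="simulation_001") with json.dumps gives the same body as the curl example.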

Response:

{
  "jobId": 12345,
  "status": "PEND"
}

Querying job status

curl -H "X-API-Key: your_api_key" \
  https://lsf-api.company.com/lsf/v1/jobs/12345

Response:

{
  "jobId": 12345,
  "jobName": "simulation_001",
  "user": "john",
  "status": "RUN",
  "queue": "normal",
  "numProcessors": 16,
  "submitTime": "2025-12-17T23:00:00Z",
  "startTime": "2025-12-17T23:05:00Z",
  "execHost": "compute01"
}
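
The timestamps in the detail response are ISO 8601 strings in UTC, so simple arithmetic on them gives, for instance, how long the job sat in the queue. A sketch using the response above; pending_seconds is a hypothetical helper, and it assumes startTime is absent or empty while the job is still pending.

```python
import json
from datetime import datetime

# The job-detail response shown above.
response_text = """{
  "jobId": 12345, "jobName": "simulation_001", "user": "john",
  "status": "RUN", "queue": "normal", "numProcessors": 16,
  "submitTime": "2025-12-17T23:00:00Z", "startTime": "2025-12-17T23:05:00Z",
  "execHost": "compute01"
}"""


def parse_time(value):
    # datetime.fromisoformat only accepts a trailing "Z" from Python 3.11 on,
    # so normalise it to an explicit UTC offset first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def pending_seconds(job):
    """Seconds between submission and dispatch, or None if the job has not started."""
    if not job.get("startTime"):
        return None
    return (parse_time(job["startTime"]) - parse_time(job["submitTime"])).total_seconds()


job = json.loads(response_text)
print(f"Job {job['jobId']} waited {pending_seconds(job):.0f}s on host {job['execHost']}")
```

On the sample response this prints "Job 12345 waited 300s on host compute01" (submitted 23:00, started 23:05).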

Terminating a job

curl -X DELETE -H "X-API-Key: your_api_key" \
  https://lsf-api.company.com/lsf/v1/jobs/12345

Cluster Information API

Querying hosts

curl -H "X-API-Key: your_api_key" \
  https://lsf-api.company.com/lsf/v1/clusters/hosts

Querying queues

curl -H "X-API-Key: your_api_key" \
  https://lsf-api.company.com/lsf/v1/clusters/queues

Python SDK

Official Python library:

from lsf import Client

# Connect to LSF
client = Client(
    host='lsf-api.company.com',
    api_key='your_api_key'
)

# Submit a job
job = client.submit_job(
    command='./simulation',
    queue='normal',
    num_processors=16,
    mem='8GB'
)

print(f"Job ID: {job.id}")

# Block until the job finishes
job.wait()

# Get the final job status
status = job.get_status()
print(f"Status: {status}")

# Get the job output
output = job.get_output()
print(output)
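
job.wait() presumably hides a polling loop, but its internals are not documented here. The following is only a sketch of such a loop with a timeout and exponential backoff. It treats DONE (exit code 0) and EXIT (non-zero exit or killed) as the terminal LSF states; wait_for_job is a hypothetical name that accepts any zero-argument callable returning a status string (for example job.get_status from the example above, assuming it returns a string), and sleep and clock are injectable so the loop can be tested without waiting.

```python
import time

# Terminal LSF job states: DONE (exit code 0) and EXIT (non-zero exit or killed).
TERMINAL_STATES = {"DONE", "EXIT"}


def wait_for_job(get_status, timeout=3600.0, interval=5.0, max_interval=60.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until a terminal state, or raise TimeoutError.

    The delay doubles after each poll, capped at max_interval, so that
    long-running jobs do not flood the API with status requests.
    """
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"job still {status} after {timeout}s")
        sleep(interval)
        interval = min(interval * 2, max_interval)
```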

Batch job management

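# Submit a job array; LSF passes each element's index (1-100) to the
# command in the LSB_JOBINDEX environment variable (%I is only expanded in file names)
jobs = client.submit_job_array(
    command='./process $LSB_JOBINDEX',
    array_spec='1-100',
    queue='normal'
)

# Wait for all array elements to finish
for job in jobs:
    job.wait()

# Collect the results
results = [job.get_output() for job in jobs]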
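
Each element of a job array runs as a separate job, and LSF exposes its index through the LSB_JOBINDEX environment variable. The index list follows the bsub -J "name[...]" syntax: single indices, start-end ranges, and start-end:step ranges, separated by commas. A sketch that expands such a spec, for example to know which outputs to expect; expand_array_spec is a hypothetical helper, and it does not handle the optional %limit suffix that caps concurrently running elements.

```python
def expand_array_spec(spec):
    """Expand an LSF-style job-array index list such as "1-100" or "1-10:3,20".

    Each comma-separated element is a single index, start-end, or start-end:step.
    """
    indices = []
    for part in spec.split(","):
        part = part.strip()
        rng, _, step = part.partition(":")
        start, _, end = rng.partition("-")
        first = int(start)
        last = int(end) if end else first
        stride = int(step) if step else 1
        if stride < 1 or last < first:
            raise ValueError(f"bad array element: {part!r}")
        # Ranges are inclusive of both ends.
        indices.extend(range(first, last + 1, stride))
    return indices
```

For example, expand_array_spec("1-10:3,20") returns [1, 4, 7, 10, 20].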

Java SDK

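import com.ibm.lsf.client.LSFClient;
import com.ibm.lsf.client.Job;

public class SubmitJobExample {
    public static void main(String[] args) throws Exception {
        // Read the API key from the environment instead of hard-coding it
        String apiKey = System.getenv("LSF_API_KEY");
        LSFClient client = new LSFClient("lsf-api.company.com", apiKey);

        // Build and submit the job with the fluent builder
        Job job = client.submitJob()
            .command("./simulation")
            .queue("normal")
            .numProcessors(16)
            .submit();

        System.out.println("Job ID: " + job.getId());

        // Block until the job completes
        job.waitForCompletion();
    }
}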

CI/CD Integration

Jenkins integration

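// Jenkinsfile (assumes the LSF Java SDK is on the Jenkins classpath, e.g. via a shared library)
import com.ibm.lsf.client.LSFClient

pipeline {
    agent any
    stages {
        stage('LSF Regression') {
            steps {
                script {
                    def lsf = new LSFClient(env.LSF_API_URL, env.LSF_API_KEY)

                    // Submit the test job (same builder API as the Java SDK)
                    def job = lsf.submitJob()
                        .command('./run_tests.sh')
                        .queue('ci_queue')
                        .numProcessors(8)
                        .submit()

                    // Wait for completion; Jenkins' timeout step aborts after 1 hour
                    timeout(time: 1, unit: 'HOURS') {
                        job.waitForCompletion()
                    }

                    // Fail the build if the job exited with a non-zero code
                    if (job.exitCode != 0) {
                        error("Tests failed")
                    }
                }
            }
        }
    }
}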

GitLab CI integration

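# .gitlab-ci.yml
simulation:
  script:
    - |
      JOB_ID=$(curl -s -X POST "$LSF_API_URL/jobs" \
        -H "X-API-Key: $LSF_API_KEY" \
        -H "Content-Type: application/json" \
        -d '{"command":"./sim.sh","queue":"ci"}' \
        | jq -r '.jobId')

      # Poll until the job reaches a terminal state: DONE (exit code 0) or EXIT (failed)
      while true; do
        STATUS=$(curl -s -H "X-API-Key: $LSF_API_KEY" "$LSF_API_URL/jobs/$JOB_ID" | jq -r '.status')
        if [ "$STATUS" = "DONE" ]; then break; fi
        if [ "$STATUS" = "EXIT" ]; then echo "LSF job $JOB_ID failed"; exit 1; fi
        sleep 10
      done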

WebHooks

Configure event notifications:

{
  "webhookUrl": "https://your-app.com/lsf-webhook",
  "events": ["JOB_STARTED", "JOB_FINISHED", "JOB_FAILED"],
  "queue": "normal"
}

Receive notifications:

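from flask import Flask, request

app = Flask(__name__)

def process_results(job_id):
    # Placeholder: collect or post-process the finished job's results here
    print(f"Job {job_id} finished")

@app.route('/lsf-webhook', methods=['POST'])
def lsf_webhook():
    # silent=True returns None instead of raising on a missing or invalid JSON body
    event = request.get_json(silent=True) or {}
    if event.get('type') == 'JOB_FINISHED':
        job_id = event['jobId']
        # Handle the job-finished event
        process_results(job_id)
    return '', 200

if __name__ == '__main__':
    app.run(port=5000)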
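
As more event types are handled, the if chain grows; a dispatch table keeps one handler per event type. A sketch, assuming payloads carry type and jobId as in the Flask handler above; the handler functions are hypothetical placeholders that just return a description.

```python
from typing import Callable, Dict


def on_started(event):
    return f"job {event['jobId']} started"


def on_finished(event):
    return f"job {event['jobId']} finished"


def on_failed(event):
    return f"job {event['jobId']} failed"


# One handler per event type from the webhook configuration above.
HANDLERS: Dict[str, Callable[[dict], str]] = {
    "JOB_STARTED": on_started,
    "JOB_FINISHED": on_finished,
    "JOB_FAILED": on_failed,
}


def dispatch(event):
    """Route a webhook payload to its handler; unknown types are ignored."""
    handler = HANDLERS.get(event.get("type"))
    return handler(event) if handler else None
```

Inside the Flask route, dispatch(event) could replace the if block; unknown event types return None and can simply be acknowledged.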

Batch Operations

Batch submission

curl -X POST https://lsf-api.company.com/lsf/v1/jobs/batch \
  -H "X-API-Key: your_api_key" \
  -H "Content-Type: application/json" \
  -d '[
    {"command": "./task1"},
    {"command": "./task2"},
    {"command": "./task3"}
  ]'
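
When submitting thousands of tasks, a single array may exceed what the server accepts in one request. A sketch that splits job specifications into several batch bodies; the limit of 100 per request is an assumption, not a documented LSF value, and batched_payloads is a hypothetical name.

```python
import json
from itertools import islice


def batched_payloads(jobs, batch_size=100):
    """Yield JSON array bodies for POST /lsf/v1/jobs/batch, batch_size jobs each.

    batch_size is an assumed per-request limit; check what your server accepts.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    it = iter(jobs)
    while True:
        chunk = list(islice(it, batch_size))
        if not chunk:
            return
        yield json.dumps(chunk)
```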

Batch query

# Query several jobs in one request
curl -H "X-API-Key: your_api_key" \
  "https://lsf-api.company.com/lsf/v1/jobs?ids=123,124,125"
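
When this URL is built in code, the usual URL-encoding helpers escape commas as %2C. Servers normally decode that when parsing the query, but keeping the commas literal reproduces the URL exactly as shown. A sketch assuming plain numeric job IDs (array elements such as 12345[3] would need different handling); jobs_query_url is a hypothetical helper.

```python
from urllib.parse import urlencode

BASE_URL = "https://lsf-api.company.com/lsf/v1"


def jobs_query_url(job_ids):
    """Build the multi-job query URL (ids=123,124,125) from numeric job IDs."""
    # int() rejects anything that is not a plain numeric ID.
    ids = ",".join(str(int(j)) for j in job_ids)
    # safe="," keeps the commas literal instead of percent-encoding them as %2C.
    return f"{BASE_URL}/jobs?" + urlencode({"ids": ids}, safe=",")
```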

Real-time Monitoring

WebSocket stream

const ws = new WebSocket('wss://lsf-api.company.com/lsf/v1/stream');

// Subscribe to job updates once the connection is open;
// calling send() before that throws an InvalidStateError
ws.onopen = () => {
    ws.send(JSON.stringify({
        type: 'subscribe',
        jobIds: [12345, 12346]
    }));
};

ws.onmessage = (event) => {
    const update = JSON.parse(event.data);
    console.log(`Job ${update.jobId}: ${update.status}`);
};
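
A client consuming the stream typically keeps the latest status per job and reacts only to actual changes. A Python sketch of that bookkeeping, independent of any particular WebSocket library; it assumes each message is a JSON object with jobId and status, as in the JavaScript handler above, and apply_update is a hypothetical name.

```python
import json


def apply_update(state, message):
    """Fold one stream message into a {jobId: status} dict.

    Returns (jobId, old_status, new_status) when the status changed,
    otherwise None, so repeated messages can be ignored.
    """
    update = json.loads(message)
    job_id, status = update["jobId"], update["status"]
    old = state.get(job_id)
    state[job_id] = status
    return (job_id, old, status) if old != status else None
```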

Portal Development

Building a custom portal on top of Web Services:

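// React example
import { useState } from 'react';
import { LSFClient } from '@ibm/lsf-client';

// API_URL and API_KEY come from the app's configuration; Form is the app's own form component.
// In production, proxy these calls through a backend rather than shipping an API key to the browser.
const JobSubmitForm = () => {
    const [job, setJob] = useState(null);

    const submitJob = async (formData) => {
        const client = new LSFClient(API_URL, API_KEY);
        const newJob = await client.submitJob({
            command: formData.command,
            queue: formData.queue,
            numProcs: formData.cores
        });
        setJob(newJob);
    };

    return <Form onSubmit={submitJob} />;
};

export default JobSubmitForm;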

Security Best Practices

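  1. HTTPS only: enforce encrypted communication
  2. API key rotation: replace keys on a regular schedule
  3. Rate limiting: protect the API against abuse
  4. Auditing: log every API call
  5. Least privilege: scope each API key to specific operations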
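
Rate limiting is normally enforced at the API server or gateway, and the mechanism LSF Web Services uses is not described here. The token bucket is one common algorithm: it permits short bursts of up to capacity requests while capping the sustained rate at rate requests per second. A minimal sketch; TokenBucket is a hypothetical class, and clock is injectable for testing.

```python
import time


class TokenBucket:
    """Token-bucket rate limiter: bursts up to `capacity`, refilled at `rate` tokens/s."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = float(rate)
        self.capacity = float(capacity)
        self.tokens = float(capacity)
        self.clock = clock
        self.last = clock()

    def allow(self, cost=1.0):
        """Consume `cost` tokens and return True, or return False if too few remain."""
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```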

Performance Optimization

Caching

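from functools import lru_cache

# Queue configuration changes rarely, so cache lookups per queue name.
# lru_cache never expires entries; call get_queue_info.cache_clear() to refresh.
@lru_cache(maxsize=128)
def get_queue_info(queue_name):
    return client.get_queue(queue_name)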
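
lru_cache keeps entries until they are evicted by size, so a queue that is later closed or inactivated would still be reported with its old state. A time-bounded cache avoids that. A minimal sketch of a TTL decorator; ttl_cache is a hypothetical name, and it only supports positional, hashable arguments.

```python
import time
from functools import wraps


def ttl_cache(seconds, clock=time.monotonic):
    """Memoize results per argument tuple, but only for `seconds` seconds."""
    def decorator(func):
        store = {}

        @wraps(func)
        def wrapper(*args):
            now = clock()
            hit = store.get(args)
            # Serve the cached value only while it is younger than the TTL.
            if hit is not None and now - hit[0] < seconds:
                return hit[1]
            value = func(*args)
            store[args] = (now, value)
            return value

        wrapper.cache_clear = store.clear
        return wrapper
    return decorator
```

Decorating get_queue_info with @ttl_cache(30) instead of @lru_cache(maxsize=128) would refetch each queue at most every 30 seconds; unlike lru_cache, this sketch does not bound the number of entries.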

Batch requests

Avoid calling the API once per item in a loop:

# ❌ Inefficient: one HTTP request per job
for job_id in job_ids:
    status = client.get_job(job_id).status

# ✅ Efficient: a single request for all jobs
jobs = client.get_jobs(job_ids)
statuses = [job.status for job in jobs]

Summary

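With a standard REST API and SDKs for several languages, LSF Web Services lets LSF plug directly into modern DevOps toolchains, automated pipelines, and custom applications, extending its reach well beyond the command line.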


References