This tutorial focuses on the Prompt Tuning method, which optimizes your agent’s system prompt without modifying model weights. This is a lightweight alternative to model weight tuning, ideal for rapid prototyping and scenarios with limited computational resources.
Prerequisites
Before proceeding, ensure you have:
- AgentScope v1.0.14 or higher
- A DashScope API key (or another supported provider)
Install the required dependencies:
Set your API key:
export DASHSCOPE_API_KEY="your_api_key_here"
Never commit your API key to version control. Use environment variables or a .env file (with python-dotenv) for local development.
Refine Workflow Function
Continuing from Overview, assume you have a task dataset and judge function ready. Now, refine your workflow function for prompt tuning:
async def workflow(
task: Dict,
system_prompt: str,
) -> WorkflowOutput:
agent = ReActAgent(
name="react_agent",
sys_prompt=system_prompt, # use the optimizable system prompt
model=model,
formatter=OpenAIChatFormatter(),
)
response = await agent.reply(
msg=Msg("user", task["question"], role="user"),
)
return WorkflowOutput(
response=response,
)
The workflow function for prompt tuning accepts system_prompt (a str) instead of model. The system_prompt is optimized iteratively during the tuning process.
Configuration & Tuning
After refining the workflow function, set up the tuning configuration and start the tuning process.
Key configurations include:
-
DatasetConfig: Specifies the task dataset for tuning.
path: The path to the dataset, which can be a local path or a Hugging Face dataset.
split: The split to use for tuning (e.g., "train", "test").
name: (Optional) The subset name for Hugging Face datasets with multiple subsets.
-
PromptTuneConfig: Configures the optimization process.
lm_model_name: The model name for the prompt proposer (teacher model). Default is "dashscope/qwen-plus".
optimization_level: Optimization intensity — "light", "medium", or "heavy". Default is "light".
eval_display_progress: Whether to display progress during evaluation. Default is True.
eval_display_table: Number of table rows to display during evaluation. Default is 5.
eval_num_threads: Number of threads for parallel evaluation. Default is 16.
compare_performance: Whether to compare baseline vs. optimized performance. Default is True.
Here is a complete example:
import os
from typing import Dict
from agentscope.tuner import DatasetConfig, WorkflowOutput, JudgeOutput
from agentscope.tuner.prompt_tune import tune_prompt, PromptTuneConfig
from agentscope.agent import ReActAgent
from agentscope.model import DashScopeChatModel
from agentscope.formatter import OpenAIChatFormatter
from agentscope.message import Msg
model = DashScopeChatModel(
"qwen-turbo",
api_key=os.environ.get("DASHSCOPE_API_KEY", ""),
)
async def workflow(
task: Dict,
system_prompt: str,
) -> WorkflowOutput:
from agentscope.tool import (
Toolkit,
execute_python_code,
)
toolkit = Toolkit()
toolkit.register_tool_function(execute_python_code)
agent = ReActAgent(
name="react_agent",
sys_prompt=system_prompt,
model=model,
formatter=OpenAIChatFormatter(),
toolkit=toolkit,
print_hint_msg=False,
)
agent.set_console_output_enabled(False)
response = await agent.reply(
msg=Msg("user", task["question"], role="user"),
)
return WorkflowOutput(response=response)
async def judge_function(
task: Dict, response: Msg
) -> JudgeOutput:
ground_truth = task["answer"]
reward = 1.0 if ground_truth in response.get_text_content() else 0.0
return JudgeOutput(reward=reward)
if __name__ == "__main__":
init_prompt = "You are an agent. Please solve the math problem given to you."
optimized_prompt, metrics = tune_prompt(
workflow=workflow,
init_system_prompt=init_prompt,
judge_func=judge_function,
train_dataset=DatasetConfig(path="train.parquet"),
eval_dataset=DatasetConfig(path="test.parquet"),
config=PromptTuneConfig(
lm_model_name="dashscope/qwen-plus",
optimization_level="light",
),
)
print(f"Metrics: {metrics}")
print(f"Optimized prompt: {optimized_prompt}")
Save the code to a Python file (e.g., main.py) and run it:
export DASHSCOPE_API_KEY="your_api_key_here"
python main.py > optimized.txt
The lm_model_name in PromptTuneConfig specifies the teacher model used to generate candidate prompts. You need to provide an API key for the corresponding provider when running the script.
Output
The following example selects a subset from GSM8K and optimizes a ReAct agent on it.
The results include the optimized prompt and its evaluation score:
Initial prompt: You are an agent. Please solve the math problem given to you with python code. You should provide your output within \boxed{}.
Score: 92.67
Optimized prompt: You are a meticulous math tutor who solves elementary-to-middle-school-level word problems step by step. For each problem, first reason through the narrative to identify the key quantities and relationships. Then, write clear, executable Python code that computes the answer using only integer arithmetic. Finally, present your solution in the format \boxed{answer}, ensuring the answer is an integer and matches the logic of your explanation. Always double-check your reasoning and code before finalizing the boxed result.
Score: 96.88 (+ 4.21)
A well-optimized prompt improved task accuracy from 92.67 to 96.88 — a gain of +4.21 points — with no changes to model weights.