七三笔记

MyChat·func call MyChat·注册函数数据库操作 exec 参考

MyChat·func call

函数的定义，注册在 /ai/wks/aitpf/src/tpf/llm/funcall.py

 
def register_function(self, *, name, description, parameters, function, **kwargs):
    #函数信息列表，最全的函数信息列表
    self._function_infos.update({
        name: {
            "type": "function",
            "function": {
                "name": name,
                "description": description,
                "parameters": parameters,
                **kwargs
            }
        }
    })
    
    #函数列表
    self._function_mappings.update({ name: function })

register_function：在收集函数描述信息之后，将函数本身注册到函数映射表中，方便后续调用
- 如此，当LLM判断需要调用某个函数时，可以通过函数名称在映射表中找到对应的函数并执行

 
def _call(self, function_calling_result:FunctionCallingResult):
    """openai格式调用函数"""
    function = self._function_mappings[function_calling_result.name]
    arguments = json.loads(function_calling_result.arguments)
    return function(**arguments)

主要步骤

 
1. LLM根据用户问题分析以及提供的函数摘要信息，确定是否需要函数调用，以及函数所需要的参数
  1.1 要提前知道用户问题涉及的范围
  1.2 与问题所对应的函数，进行调用
2. 解析LLM返回的需要调用的函数信息：函数名，参数，然后调用对应的函数
3. 函数返回结果，LLM根据结果进行下一步处理

 
函数调用，由LLM决定使用哪些工具
- 要为LLM提供tools
- LLM本身要支持工具调用

重点

 
大模型对用户问题的 理解 以及 行动规划

结构化输出：可以精准的描述需要调用哪个函数以及函数对应的参数 

上下文学习In-Context Learning: 能够将函数调用的结果与之前的对话信息结合在一起，综合理解后输出结果

func call 让LLM可以与外部系统交互，提升/扩展了自己的能力，可以完成更复杂的任务

 
deepseek 蒸馏版不支持func call，但可以自定义prompt实现 
- 1.5B的模型，其能力在以上三个方面能力不足，不足以支撑起函数调用

应用场景

 
1.查询检索，通过补充具体的知识来扩充大模型的知识面，比如，RAG，搜索 

2.协助用户输入，将用户自然语言等非结构化信息转化为结构化信息,再使用func calling就可以写入数据库

3.直接调用外部API，完成各种功能，比如下单，执行命令...

 
问题理解，行动规划，分析执行
- 形成思维链,第1步做什么,第2步做什么...需要几步方可解决该问题
- 每一步是否需要额外的条件

 
结构化输出 
- 大模型有较强的不确定性 
- 而结构化输出可确保有精准的输出

 
上下文理解能力 
- 不同于理解一句确定的话或者确定的名词概念 
- 而是要能够理解多句话之间的联系 
- 这样才能进行多方位的整合，最终输出一个结论/答案 返回给用户

 
from tpf.llm import MyChat
chat = MyChat(env_file="env.txt")

chat.set_local_model(['DeepSeek-R1-14B-Q8:latest','DeepSeek-R1-14B-F16:latest'])

chat.get_local_model()
['DeepSeek-14B-Q8:latest',
 'DeepSeek-R1-14B-Q8:latest',
 'DeepSeek-R1-14B-F16:latest',
 'DeepSeek-R1-14B-Q8:latest',
 'DeepSeek-R1-14B-F16:latest']

deepseek_func_call

 
prompt = "当前文件'.'的绝对路径是什么？" 
res = chat.deepseek_func_call(query=prompt,
    model_list=['deepseek-chat', 'deepseek-reasoner'],
    func_index=0)
print(res)

{'query': "当前文件'.'的绝对路径是什么？", 'result': '/mnt/g/wks/aiwks/bigmodel/kejian/day03_func_call', 'thinking': []}

openai_func_call

 
prompt = "当前文件'.'的绝对路径是什么？" 
chat.openai_func_call(query=prompt, 
                          model_list=["gpt-4o-mini","gpt-4o",'DeepSeek-R1-14B-Q8:latest','DeepSeek-R1-14B-F16:latest'], 
                          func_index=0, answer_index=2)

answer_index=2
- 指定的是本地的模型作为应答整合模型，会自动判断模型是否为本地模型

 
prompt = "当前文件'.'的绝对路径是什么？" 
res = chat.openai_func_call(query=prompt, 
                          model_list=["gpt-4o-mini","gpt-4o",'DeepSeek-R1-14B-Q8:latest','DeepSeek-R1-14B-F16:latest'], 
                          func_index=0)
print(res)

answer_index=None 
- 默认值，则自动指定为func_index，即函数调用所使用的模型

注册函数

 
from tpf.llm import MyChat
chat = MyChat(env_file="env.txt")

 
def dcn_start_service():
    from tpf.dc import dcn 
    dcn.start_service()
    
func_dcn_start_service = {
    # 函数名称（帮助我们去查找本地的函数在哪里，函数映射ID）
    "name": "dcn_start_service",
    # 函数描述（帮助模型去理解函数的作用，适用场景，可以理解为Prompt的一部分）
    "description": "启动本地dcn服务",
    # 函数依赖参数的定义（帮助模型去理解如果要做参数生成，应该怎么生成）
    "parameters": {
            # 参数形式
            "type": "object", # 对应输出JSON String
            # 参数结构
            "properties": {
            }},
    "function": dcn_start_service
}

chat.register_function(**func_dcn_start_service)

 
这种结构在调用openai工具时是固定的，比如这参数，该函数并没有参数，但这个格式必须写成这样 
- 当然，如果不走线上LLM的工具调用，格式就随意了 

chat.get_tool_list()
[{'type': 'function',
  'function': {'name': 'get_abs_path',
   'description': '获取文件的绝对路径',
   'parameters': {'type': 'object',
    'properties': {'file_path': {'type': 'string'}},
    'required': ['file_path']}}},
 {'type': 'function',
  'function': {'name': 'dcn_start_service',
   'description': '启动本地dcn服务',
   'parameters': {'type': 'object', 'properties': {}}}}]

函数信息会加入prompt模板

 
ps = chat.prompt_system()
print(ps)
    
# 主要任务
        
你的任务是识别回答用户问题是否需要调用指定的函数，如果需要则返回一个函数列表，告诉用户需要调用的函数名称以及函数参数，列表中每个元素如下：
1. 函数名称name,函数列表中name的值；如果要列举的函数名称在指定的函数列表中不存在，则不要显示该函数
2. 函数参数arguments,对应函数列表中parameters
3. 函数描述description，可以通过函数列表中字典元素description的值确定函数的用途，与当前用户问题的匹配程度




# 如果需要额外的函数调用才能回答该问题，则从下面给出的"函数列表"中选择出最合适的函数反馈回来
# 函数列表如下:
[{'type': 'function', 'function': {'name': 'get_abs_path', 'description': '获取文件的绝对路径', 'parameters': {'type': 'object', 'properties': {'file_path': {'type': 'string'}}, 'required': ['file_path']}}}, {'type': 'function', 'function': {'name': 'dcn_start_service', 'description': '启动本地dcn服务', 'parameters': {'type': 'object', 'properties': {}}}}]

# 输出格式:

输出为一个函数列表，包含在```json ```标记中，列表元素为需要回答用户问题涉及的函数信息，列表中每个元素是一个JSON Object，包含的字段有：
1. name字段的取值为string类型，取值必须为函数列表中某个元素name的值或null
(1)如果不需要函数调用，或者没有函数调用时，则JSON输出为空
2. arguments字段
(1) 字典类型，但其中的key与value皆是字符串类型
(2) 如果确定使用函数，格式为函数元素parameters.properties字段的说明来初始化该字段 
3.description字段，其值为函数列表中对应函数description的值
JSON输出中包含"name","arguments","description"三个字段，
  3.1 如果没有arguments字段，则找到上下文内容确定arguments的取值 
  3.2 输出包含 description字段字段，若不存在该字段，则从函数列表找对应函数的描述description的值


# examples:
[
{
    "name": "get_abs_path",
    "arguments": {
        "file_path": "."
    },
    "description": "获取文件的绝对路径"
}
]

 
chat.deepseek_func_call(query="请启动dcn服务")

 
No openai function is called... start customizing function calls...
dcn_start_service {}
✅ 已启动服务: configService
✅ 已启动服务: dcService

服务状态验证:
configService: 运行中
dcService: 运行中
func res:
    {'函数名称name': 'dcn_start_service', '函数描述description': '启动本地dcn服务', '函数回调结果func_call_result': None}
开始总结并给出最终结果...
{'query': '请启动dcn服务', 'result': '已启动本地dcn服务', 'thinking': []}

 
首先启动ollama 或者准备一个大模型服务 
llm@ii:~$ ./bin/ollama.sh

加载环境变量，对于本地服务则不需要

 
from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv(filename="env.txt"))

 
from tpf.llm import FuncCall
fc = FuncCall()

prompt = "当前文件'.'的绝对路径是什么？" 
response = fc.chat(query=prompt, model_list=["gpt-4o-mini","gpt-4o",'DeepSeek-R1-14B-Q8:latest','DeepSeek-R1-14B-F16:latest'], func_index=3, answer_index=2 )

print(response)

 
主要分三步
- 大模型分析用户问题确定需要哪些函数调用，需要提前准备好可能需要的函数
- 根据上一步大模型返回的结果，调用相应的函数
- 将函数调用的结果与用户的提问结合起来，送给大模型整合，返回最终结果

最终返回

 
```json
{
    "query": 当前文件'.'的绝对路径是什么？
,
    "result": {
    "abs_path": "/mnt/g/wks/bigmodel/kejian/day03_func_call"
    }
}
```

 
本次返回的格式并不是所期望的{"query":...,"result":...} 
是大模型自主决定了result后面也是一个字典
但不同的问题对应不同的函数，先这样吧...

deepseek-r1:32b

 
prompt = "当前文件'.'的绝对路径是什么？" 

# deepseek-r1:32b
response = fc.chat(query=prompt, model_list=["gpt-4o-mini","gpt-4o",'DeepSeek-R1-14B-Q8:latest','DeepSeek-R1-14B-F16:latest','DeepSeek-R1-32B-Q6:latest'], func_index=4, answer_index=2 )

print(response)

 
```json
{
    "query": "当前文件'.'的绝对路径是什么？",
    "result": "/mnt/g/wks/bigmodel/kejian/day03_func_call"
}
```

 
function_calling_request = ModelRequestWithFunctionCalling()

(
    function_calling_request
        .register_function(
            name="get_location_coordinate",
            description="根据POI名称，获得POI的经纬度坐标",
            parameters={
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "POI名称，必须是中文",
                    },
                    "city": {
                        "type": "string",
                        "description": "POI所在的城市名，必须是中文",
                    }
                },
                "required": ["location", "city"],
            },
            function=get_location_coordinate,
        )
        .register_function(
            name="search_nearby_pois",
            description="搜索给定坐标附近的poi",
            parameters={
                "type": "object",
                "properties": {
                    "longitude": {
                        "type": "string",
                        "description": "中心点的经度",
                    },
                    "latitude": {
                        "type": "string",
                        "description": "中心点的纬度",
                    },
                    "keyword": {
                        "type": "string",
                        "description": "目标poi的关键字",
                    }
                },
                "required": ["longitude", "latitude", "keyword"],
            },
            function=search_nearby_pois,
        )
)
result = function_calling_request.request(content="五道口附近的咖啡馆")
print("----------------------\n\n", result)

 
"""工具列表"""        
import re,sys
import json

def get_abs_path(file_path="."):
    """获取文件的绝对路径
    """
    import os
    abs_path = os.path.abspath(file_path) #获取当前文件的绝对路径
    return abs_path


def dcn_start_service():
    from tpf.dc import dcn 
    dcn.start_service()

#f_get_abs_path = {"name":"get_abs_path","desc":"获取文件的绝对路径","arguments":"python字典对象,其key为file_path，其值value为用户问题中的路径"}

# 给出工具定义，使用json格式详细描述函数的功能以及参数的个数与类型
tools_json = [
    # 每一个列表元素项，就是一个工具定义
    {
        # 类型标注（固定格式）
        "type": "function",
        # 函数定义
        "function": {
            # 函数名称（帮助我们去查找本地的函数在哪里，函数映射ID）
            "name": "get_abs_path",
            # 函数描述（帮助模型去理解函数的作用，适用场景，可以理解为Prompt的一部分）
            "description": "获取文件的绝对路径",
            # 函数依赖参数的定义（帮助模型去理解如果要做参数生成，应该怎么生成）
            "parameters": {
                # 参数形式
                "type": "object", # 对应输出JSON String
                # 参数结构
                "properties": {
                    # 参数名，参数类型
                    "file_path": {"type": "string"}, # 用户问题中的文件路径
                },
                # 必须保证生成的参数列表（每个元素对应上面properties的参数名）
                "required": ["file_path"],
                "additionalProperties": False 
            },
            # 格式是否严格（默认为True）
            "strict": True
        }
    },
    {
        # 类型标注（固定格式）
        "type": "function",
        # 函数定义
        "function": {
            # 函数名称（帮助我们去查找本地的函数在哪里，函数映射ID）
            "name": "dcn_start_service",
            # 函数描述（帮助模型去理解函数的作用，适用场景，可以理解为Prompt的一部分）
            "description": "启动本地dcn服务",
            # 函数依赖参数的定义（帮助模型去理解如果要做参数生成，应该怎么生成）
            "parameters": {
            },
            # 格式是否严格（默认为True）
            "strict": True
        }
    },
    # 每一个列表元素项，就是一个工具定义
    {
        # 类型标注（固定格式）
        "type": "function",
        # 函数定义
        "function": {
            # 函数名称（帮助我们去查找本地的函数在哪里，函数映射ID）
            "name": "get_weather",
            # 函数描述（帮助模型去理解函数的作用，适用场景，可以理解为Prompt的一部分）
            "description": "Get current temperature for provided coordinates in celsius.",
            # 函数依赖参数的定义（帮助模型去理解如果要做参数生成，应该怎么生成）
            "parameters": {
                # 参数形式
                "type": "object", # 对应输出JSON String
                # 参数结构
                "properties": {
                    # 参数名，参数类型
                    "latitude": {"type": "number"},
                    # 参数名，参数类型
                    "longitude": {"type": "number"}
                },
                # 必须保证生成的参数列表（每个元素对应上面properties的参数名）
                "required": ["latitude", "longitude"],
                "additionalProperties": False
            },
            # 格式是否严格（默认为True）
            "strict": True
        }
    }
]


class Tools():
    def __init__(self):
        self.tool_list = []
        self.func_name_list = []
    
         
        # self.tool_list.append(f_get_abs_path) 
    def func_name_desc(self):
        self.func_name_list.append({
            "name": "get_weather",
            "description": "Get current temperature for provided coordinates in celsius."
        })
        self.func_name_list.append({
            "name": "get_abs_path",
            "description": "获取文件的绝对路径",
            "arguments":["file_path"]
        })
        return self.func_name_list
        
        
    def addTool(self,f_json):
        """添加工具
        
        examples
        ------------------------
        f_get_abs_path = {
            "name":"get_abs_path",
            "desc":"获取文件的绝对路径",
            "arguments":"名为file_path，其value为路径"}
        addTool(f_get_abs_path)
        
        
        """
        self.tool_list.append(f_json) 
        
    def tools(self):
        return self.tool_list
    
    
    def func_list(self):
        """文本格式的函数列表"""
        # func_list = f"""
        # # 可调用的函数/工具 列表
        # {self.tool_list}
        # """
        return self.tool_list
    
    
    def get_json_str(self,txt):
        """从文本中解析出json"""
        # json_pattern = r'```json\n([\s\S]*?)\n```'
        json_pattern = r'```json([\s\S]*?)```'
        match = re.search(json_pattern, txt)
        json_str = ""
        if match:
            json_str = match.group(1).strip()
        return json_str

    def parse_json(self,content):
        try:
            json_str = self.get_json_str(content)
            json_dict = json.loads(json_str)
            is_parse_ok = True 
        except Exception as e:
            print(e)
            is_parse_ok = False 
            
        if is_parse_ok:
            return json_dict
        return content

 
def register_function(self, *, name, description, parameters, function, **kwargs):
    #函数信息列表，最全的函数信息列表
    self._function_infos.update({
        name: {
            "type": "function",
            "function": {
                "name": name,
                "description": description,
                "parameters": parameters,
                **kwargs
            }
        }
    })
    
    #函数列表
    self._function_mappings.update({ name: function })

 
注册一个函数，添加了两部分内容：
- 函数定义：_function_mappings，用于调用
- 除定义外的详细信息：_function_infos，用于prompt模板

数据库操作

访问Sqlite3

 
pip install pysqlite3

 
from tpf.db import SqLite
sdb = SqLite(db='vo.db')  #会在当前目录创建一个vo.db的文件存储表及数据

 
sql = "SELECT * FROM users"
df = sdb.select(sql, columns=["id","name","age"])
df

exec

子进程执行

仅检查语法

 
from tpf.box.zx import check_grammar_noexec
code = '''
def f(a, b):
    return a+b 
'''
check_grammar_noexec(code)   #{'status': 'ok'}

执行并返回结果

 
from tpf.box.zx import exec_python_code

source = '''
import pandas as pd
df = pd.DataFrame({'a':[1,2], 'b':[3,4]})
print(df.shape)
'''

exec_python_code(source)  #{'status': 'ok', 'result': '(2, 2)'}

代码

 
import subprocess, tempfile, os

source = '''
import pandas as pd
df = pd.DataFrame({'a':[1,2], 'b':[3,4]})
print(df.shape)
'''

with tempfile.NamedTemporaryFile(mode='w', suffix='.py', delete=False) as f:
    f.write(source)
    f.flush()
    fname = f.name

try:
    proc = subprocess.run(
        ['python', fname],
        capture_output=True,
        text=True,
        timeout=10          # 防止死循环
    )
    if proc.returncode == 0:
        print("运行成功：\n", proc.stdout)
    else:
        print("运行失败：\n", proc.stderr)
finally:
    os.remove(fname)

参考

七三笔记路线：学习，记录，分享