Unit test writing using a multi-step prompt
Complex tasks, such as writing unit tests, can benefit from multi-step prompts. In contrast to a single prompt, a multi-step prompt generates text from GPT-3 and then feeds that text back into subsequent prompts. This can help when you want GPT-3 to explain its reasoning before answering, or to brainstorm a plan before executing it.
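To make the chaining idea concrete, here is a minimal sketch (not part of this notebook's code) that feeds one completion back into a follow-up prompt. It uses the same legacy `openai.Completion.create` call and `text-davinci-002` model as the rest of this notebook; the two prompts themselves are made up for illustration.

```python
import openai  # assumes OPENAI_API_KEY is set in the environment

# Step A: ask the model to reason first (illustrative prompt)
reasoning_prompt = "List the key edge cases to consider when testing a string-reversal function:\n-"
reasoning = openai.Completion.create(
    model="text-davinci-002",
    prompt=reasoning_prompt,
    max_tokens=200,
    temperature=0.4,
    stop=["\n\n"],
)["choices"][0]["text"]

# Step B: feed the generated reasoning back into a follow-up prompt
followup_prompt = (
    reasoning_prompt
    + reasoning
    + "\n\nUsing the edge cases above, describe a test plan in one paragraph:\n"
)
plan = openai.Completion.create(
    model="text-davinci-002",
    prompt=followup_prompt,
    max_tokens=200,
    temperature=0.4,
)["choices"][0]["text"]

print(plan)
```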
In this notebook, we use a 3-step prompt to write unit tests in Python, following these steps:
- Given a Python function, we first prompt GPT-3 to explain what the function is doing.
- Second, we prompt GPT-3 to plan a set of unit tests for the function.
- If the plan is too short, we ask GPT-3 to elaborate with more ideas for unit tests.
- Finally, we prompt GPT-3 to write the unit tests.
The code example illustrates a few optional embellishments on the chained, multi-step prompt:
- Conditional branching (e.g., only asking for elaboration if the first plan is too short)
- Different models for different steps (e.g., `text-davinci-002` for the text planning steps and `code-davinci-002` for the code-writing step)
- A check that re-runs the function if the output is unsatisfactory (e.g., if the output code cannot be parsed by Python's `ast` module); a standalone sketch of this check appears after this list
- Streaming output so that you can start reading the output before it is fully generated (useful for long, multi-step outputs)
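As a standalone illustration of that parse-and-retry check, the sketch below validates candidate code with `ast.parse` and retries a few times. The `generate_code` callable is a hypothetical placeholder for any function that returns generated Python source as a string; only the `ast.parse` validation and the retry loop are the point here.

```python
import ast

def first_valid_generation(generate_code, max_attempts: int = 3) -> str:
    """Call generate_code() until it returns syntactically valid Python (up to max_attempts)."""
    for attempt in range(max_attempts):
        candidate = generate_code()  # hypothetical helper that returns generated Python source
        try:
            ast.parse(candidate)  # raises SyntaxError if the candidate does not parse
            return candidate
        except SyntaxError as error:
            print(f"Attempt {attempt + 1} failed to parse: {error}")
    raise ValueError(f"No syntactically valid code after {max_attempts} attempts")
```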
The full 3-step prompt looks like this (using `pytest` as the unit test framework and `is_palindrome` as the example function):
# How to write great unit tests with pytest

In this advanced tutorial for experts, we'll use Python 3.9 and `pytest` to write a suite of unit tests to verify the behavior of the following function.
```python
def is_palindrome(s):
    return s == s[::-1]
```

Before writing any unit tests, let's review what each element of the function is doing exactly and what the author's intentions may have been.
- First,{GENERATED IN STEP 1}

A good unit test suite should aim to:
- Test the function's behavior for a wide range of possible inputs
- Test edge cases that the author may not have foreseen
- Take advantage of the features of `pytest` to make the tests easy to write and maintain
- Be easy to read and understand, with clean code and descriptive names
- Be deterministic, so that the tests always pass or fail in the same way

`pytest` has many convenient features that make it easy to write and maintain unit tests. We'll use them to write unit tests for the function above.

For this particular function, we'll want our unit tests to handle the following diverse scenarios (and under each scenario, we include a few examples as sub-bullets):
-{GENERATED IN STEP 2}

[OPTIONALLY APPENDED]In addition to the scenarios above, we'll also want to make sure we don't forget to test rare or unexpected edge cases (and under each edge case, we include a few examples as sub-bullets):
-{GENERATED IN STEP 2B}

Before going into the individual tests, let's first look at the complete suite of unit tests as a cohesive whole. We've added helpful comments to explain what each line does.
```python
import pytest  # used for our unit tests

def is_palindrome(s):
    return s == s[::-1]

# Below, each test case is represented by a tuple passed to the @pytest.mark.parametrize decorator
{GENERATED IN STEP 3}
```
# imports needed to run the code in this notebook
import ast  # used for detecting whether generated Python code is valid
import openai  # used for calling the OpenAI API


# example of a function that uses a multi-step prompt to write unit tests
def unit_test_from_function(
    function_to_test: str,  # Python function to test, as a string
    unit_test_package: str = "pytest",  # unit testing package; use the name as it appears in the import statement
    approx_min_cases_to_cover: int = 7,  # minimum number of test case categories to cover (approximate)
    print_text: bool = False,  # optionally prints text; helpful for understanding the function & debugging
    text_model: str = "text-davinci-002",  # model used to generate text plans in steps 1, 2, and 2b
    code_model: str = "code-davinci-002",  # if you don't have access to code models, you can use text models here instead
    max_tokens: int = 1000,  # can set this high, as generations should be stopped earlier by stop sequences
    temperature: float = 0.4,  # temperature = 0 can sometimes get stuck in repetitive loops, so we use 0.4
    reruns_if_fail: int = 1,  # if the output code cannot be parsed, this will re-run the function up to N times
) -> str:
    """Outputs a unit test for a given Python function, using a 3-step GPT-3 prompt."""

    # Step 1: Generate an explanation of the function

    # create a markdown-formatted prompt that asks GPT-3 to complete an explanation of the function, formatted as a bullet list
    prompt_to_explain_the_function = f"""# How to write great unit tests with {unit_test_package}

In this advanced tutorial for experts, we'll use Python 3.9 and `{unit_test_package}` to write a suite of unit tests to verify the behavior of the following function.
```python
{function_to_test}
```

Before writing any unit tests, let's review what each element of the function is doing exactly and what the author's intentions may have been.
- First,"""
    if print_text:
        text_color_prefix = "\033[30m"  # black; if you read against a dark background \033[97m is white
        print(text_color_prefix + prompt_to_explain_the_function, end="")  # end='' prevents a newline from being printed

    # send the prompt to the API, using \n\n as a stop sequence to stop at the end of the bullet list
    explanation_response = openai.Completion.create(
        model=text_model,
        prompt=prompt_to_explain_the_function,
        stop=["\n\n", "\n\t\n", "\n \n"],
        max_tokens=max_tokens,
        temperature=temperature,
        stream=True,
    )
    explanation_completion = ""
    if print_text:
        completion_color_prefix = "\033[92m"  # green
        print(completion_color_prefix, end="")
    for event in explanation_response:
        event_text = event["choices"][0]["text"]
        explanation_completion += event_text
        if print_text:
            print(event_text, end="")

    # Step 2: Generate a plan to write a unit test

    # create a markdown-formatted prompt that asks GPT-3 to complete a plan for writing unit tests, formatted as a bullet list
    prompt_to_explain_a_plan = f"""

A good unit test suite should aim to:
- Test the function's behavior for a wide range of possible inputs
- Test edge cases that the author may not have foreseen
- Take advantage of the features of `{unit_test_package}` to make the tests easy to write and maintain
- Be easy to read and understand, with clean code and descriptive names
- Be deterministic, so that the tests always pass or fail in the same way

`{unit_test_package}` has many convenient features that make it easy to write and maintain unit tests. We'll use them to write unit tests for the function above.

For this particular function, we'll want our unit tests to handle the following diverse scenarios (and under each scenario, we include a few examples as sub-bullets):
-"""
    if print_text:
        print(text_color_prefix + prompt_to_explain_a_plan, end="")

    # append this planning prompt to the results from step 1
    prior_text = prompt_to_explain_the_function + explanation_completion
    full_plan_prompt = prior_text + prompt_to_explain_a_plan

    # send the prompt to the API, using \n\n as a stop sequence to stop at the end of the bullet list
    plan_response = openai.Completion.create(
        model=text_model,
        prompt=full_plan_prompt,
        stop=["\n\n", "\n\t\n", "\n \n"],
        max_tokens=max_tokens,
        temperature=temperature,
        stream=True,
    )
    plan_completion = ""
    if print_text:
        print(completion_color_prefix, end="")
    for event in plan_response:
        event_text = event["choices"][0]["text"]
        plan_completion += event_text
        if print_text:
            print(event_text, end="")

    # Step 2b: If the plan is short, ask GPT-3 to elaborate further
    # this counts top-level bullets (e.g., categories), but not sub-bullets (e.g., test cases)
    elaboration_needed = plan_completion.count("\n-") + 1 < approx_min_cases_to_cover  # adds 1 because the first bullet is not counted
    if elaboration_needed:
        prompt_to_elaborate_on_the_plan = f"""

In addition to the scenarios above, we'll also want to make sure we don't forget to test rare or unexpected edge cases (and under each edge case, we include a few examples as sub-bullets):
-"""
        if print_text:
            print(text_color_prefix + prompt_to_elaborate_on_the_plan, end="")

        # append this elaboration prompt to the results from step 2
        prior_text = full_plan_prompt + plan_completion
        full_elaboration_prompt = prior_text + prompt_to_elaborate_on_the_plan

        # send the prompt to the API, using \n\n as a stop sequence to stop at the end of the bullet list
        elaboration_response = openai.Completion.create(
            model=text_model,
            prompt=full_elaboration_prompt,
            stop=["\n\n", "\n\t\n", "\n \n"],
            max_tokens=max_tokens,
            temperature=temperature,
            stream=True,
        )
        elaboration_completion = ""
        if print_text:
            print(completion_color_prefix, end="")
        for event in elaboration_response:
            event_text = event["choices"][0]["text"]
            elaboration_completion += event_text
            if print_text:
                print(event_text, end="")

    # Step 3: Generate the unit test

    # create a markdown-formatted prompt that asks GPT-3 to complete a unit test
    starter_comment = ""
    if unit_test_package == "pytest":
        starter_comment = "Below, each test case is represented by a tuple passed to the @pytest.mark.parametrize decorator"
    prompt_to_generate_the_unit_test = f"""

Before going into the individual tests, let's first look at the complete suite of unit tests as a cohesive whole. We've added helpful comments to explain what each line does.
```python
import {unit_test_package}  # used for our unit tests

{function_to_test}

#{starter_comment}"""
    if print_text:
        print(text_color_prefix + prompt_to_generate_the_unit_test, end="")

    # append this unit test prompt to the results from the previous steps
    if elaboration_needed:
        prior_text = full_elaboration_prompt + elaboration_completion
    else:
        prior_text = full_plan_prompt + plan_completion
    full_unit_test_prompt = prior_text + prompt_to_generate_the_unit_test

    # send the prompt to the API, using ``` as a stop sequence to stop at the end of the code block
    unit_test_response = openai.Completion.create(
        model=code_model,
        prompt=full_unit_test_prompt,
        stop="```",
        max_tokens=max_tokens,
        temperature=temperature,
        stream=True,
    )
    unit_test_completion = ""
    if print_text:
        print(completion_color_prefix, end="")
    for event in unit_test_response:
        event_text = event["choices"][0]["text"]
        unit_test_completion += event_text
        if print_text:
            print(event_text, end="")

    # check the output for errors
    code_start_index = prompt_to_generate_the_unit_test.find("```python\n") + len("```python\n")
    code_output = prompt_to_generate_the_unit_test[code_start_index:] + unit_test_completion
    try:
        ast.parse(code_output)
    except SyntaxError as e:
        print(f"Syntax error in generated code: {e}")
        if reruns_if_fail > 0:
            print("Rerunning...")
            return unit_test_from_function(
                function_to_test=function_to_test,
                unit_test_package=unit_test_package,
                approx_min_cases_to_cover=approx_min_cases_to_cover,
                print_text=print_text,
                text_model=text_model,
                code_model=code_model,
                max_tokens=max_tokens,
                temperature=temperature,
                reruns_if_fail=reruns_if_fail - 1,  # decrement rerun counter when calling again
            )

    # return the unit test as a string
    return unit_test_completion
example_function = """def is_palindrome(s):
    return s == s[::-1]"""

unit_test_from_function(example_function, print_text=True)
# How to write great unit tests with pytest

In this advanced tutorial for experts, we'll use Python 3.9 and `pytest` to write a suite of unit tests to verify the behavior of the following function.
```python
def is_palindrome(s):
    return s == s[::-1]
```

Before writing any unit tests, let's review what each element of the function is doing exactly and what the author's intentions may have been.
- First, we have a function definition. This is where we give the function a name, `is_palindrome`, and specify the arguments that the function accepts. In this case, the function accepts a single string argument, `s`.
- Next, we have a return statement. This is where we specify the value that the function returns. In this case, the function returns `s == s[::-1]`.
- Finally, we have a function call. This is where we actually call the function with a specific set of arguments. In this case, we're calling the function with the string `"racecar"`.

A good unit test suite should aim to:
- Test the function's behavior for a wide range of possible inputs
- Test edge cases that the author may not have foreseen
- Take advantage of the features of `pytest` to make the tests easy to write and maintain
- Be easy to read and understand, with clean code and descriptive names
- Be deterministic, so that the tests always pass or fail in the same way

`pytest` has many convenient features that make it easy to write and maintain unit tests. We'll use them to write unit tests for the function above.

For this particular function, we'll want our unit tests to handle the following diverse scenarios (and under each scenario, we include a few examples as sub-bullets):
- The input is a palindrome
    - `"racecar"`
    - `"madam"`
    - `"anna"`
- The input is not a palindrome
    - `"python"`
    - `"test"`
    - `"1234"`
- The input is an empty string
    - `""`
- The input is `None`
- The input is not a string
    - `1`
    - `1.0`
    - `True`
    - `False`
    - `[]`
    - `{}`

In addition to the scenarios above, we'll also want to make sure we don't forget to test rare or unexpected edge cases (and under each edge case, we include a few examples as sub-bullets):
- The input is a palindrome with spaces
    - `"race car"`
    - `" madam "`
    - `" anna "`
- The input is not a palindrome with spaces
    - `" python "`
    - `" test "`
    - `" 1234 "`
- The input is a palindrome with punctuation
    - `"racecar!"`
    - `"Madam, I'm Adam."`
    - `"Anna's"`
- The input is not a palindrome with punctuation
    - `"python!"`
    - `"test."`
    - `"1234!"`
- The input is a palindrome with mixed case
    - `"Racecar"`
    - `"Madam"`
    - `"Anna"`
- The input is not a palindrome with mixed case
    - `"Python"`
    - `"Test"`
    - `"1234"`

Before going into the individual tests, let's first look at the complete suite of unit tests as a cohesive whole. We've added helpful comments to explain what each line does.
```python
import pytest  # used for our unit tests

def is_palindrome(s):
    return s == s[::-1]

#Below, each test case is represented by a tuple passed to the @pytest.mark.parametrize decorator.
#The first element of the tuple is a name for the test case, and the second element is a list of arguments for the test case.
#The @pytest.mark.parametrize decorator will generate a separate test function for each test case.
#The generated test function will be named test_is_palindrome_<name> where <name> is the name of the test case.
#The generated test function will be given the arguments specified in the list of arguments for the test case.
#The generated test function will be given the fixture specified in the decorator, in this case the function itself.
#The generated test function will call the function with the arguments and assert that the result is equal to the expected value.
@pytest.mark.parametrize(
    "name,args,expected",
    [
        # Test the function's behavior for a wide range of possible inputs
        ("palindrome", ["racecar"], True),
        ("palindrome", ["madam"], True),
        ("palindrome", ["anna"], True),
        ("non-palindrome", ["python"], False),
        ("non-palindrome", ["test"], False),
        ("non-palindrome", ["1234"], False),
        ("empty string", [""], True),
        ("None", [None], False),
        ("non-string", [1], False),
        ("non-string", [1.0], False),
        ("non-string", [True], False),
        ("non-string", [False], False),
        ("non-string", [[]], False),
        ("non-string", [{}], False),
        # Test edge cases that the author may not have foreseen
        ("palindrome with spaces", ["race car"], True),
        ("palindrome with spaces", [" madam "], True),
        ("palindrome with spaces", [" anna "], True),
        ("non-palindrome with spaces", [" python "], False),
        ("non-palindrome with spaces", [" test "], False),
        ("non-palindrome with spaces", [" 1234 "], False),
        ("palindrome with punctuation", ["racecar!"], True),
        ("palindrome with punctuation", ["Madam, I'm Adam."], True),
        ("palindrome with punctuation", ["Anna's"], True),
        ("non-palindrome with punctuation", ["python!"], False),
        ("non-palindrome with punctuation", ["test."], False),
        ("non-palindrome with punctuation", ["1234!"], False),
        ("palindrome with mixed case", ["Racecar"], True),
        ("palindrome with mixed case", ["Madam"], True),
        ("palindrome with mixed case", ["Anna"], True),
        ("non-palindrome with mixed case", ["Python"], False),
        ("non-palindrome with mixed case", ["Test"], False),
        ("non-palindrome with mixed case", ["1234"], False),
    ],
)
def test_is_palindrome(is_palindrome, args, expected):
    assert is_palindrome(*args) == expected
```
'.\n#The first element of the tuple is a name for the test case, and the second element is a list of arguments for the test case.\n#The @pytest.mark.parametrize decorator will generate a separate test function for each test case.\n#The generated test function will be named test_is_palindrome_<name> where <name> is the name of the test case.\n#The generated test function will be given the arguments specified in the list of arguments for the test case.\n#The generated test function will be given the fixture specified in the decorator, in this case the function itself.\n#The generated test function will call the function with the arguments and assert that the result is equal to the expected value.\n@pytest.mark.parametrize(\n "name,args,expected",\n [\n # Test the function\'s behavior for a wide range of possible inputs\n ("palindrome", ["racecar"], True),\n ("palindrome", ["madam"], True),\n ("palindrome", ["anna"], True),\n ("non-palindrome", ["python"], False),\n ("non-palindrome", ["test"], False),\n ("non-palindrome", ["1234"], False),\n ("empty string", [""], True),\n ("None", [None], False),\n ("non-string", [1], False),\n ("non-string", [1.0], False),\n ("non-string", [True], False),\n ("non-string", [False], False),\n ("non-string", [[]], False),\n ("non-string", [{}], False),\n # Test edge cases that the author may not have foreseen\n ("palindrome with spaces", ["race car"], True),\n ("palindrome with spaces", [" madam "], True),\n ("palindrome with spaces", [" anna "], True),\n ("non-palindrome with spaces", [" python "], False),\n ("non-palindrome with spaces", [" test "], False),\n ("non-palindrome with spaces", [" 1234 "], False),\n ("palindrome with punctuation", ["racecar!"], True),\n ("palindrome with punctuation", ["Madam, I\'m Adam."], True),\n ("palindrome with punctuation", ["Anna\'s"], True),\n ("non-palindrome with punctuation", ["python!"], False),\n ("non-palindrome with punctuation", ["test."], False),\n ("non-palindrome with punctuation", ["1234!"], False),\n ("palindrome with mixed case", ["Racecar"], True),\n ("palindrome with mixed case", ["Madam"], True),\n ("palindrome with mixed case", ["Anna"], True),\n ("non-palindrome with mixed case", ["Python"], False),\n ("non-palindrome with mixed case", ["Test"], False),\n ("non-palindrome with mixed case", ["1234"], False),\n ],\n)\ndef test_is_palindrome(is_palindrome, args, expected):\n assert is_palindrome(*args) == expected\n'
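If you want to actually execute what the function returns, one approach (a sketch, not part of the notebook) is to write the function under test plus the generated tests to a file and invoke `pytest` on it. The helper below is illustrative: the file name and the use of `subprocess` are arbitrary choices, the returned completion continues the `#{starter_comment}` line from the prompt (so that comment line is re-created before appending, mirroring the notebook's own `ast.parse` check), and the generated tests are not guaranteed to pass as-is, since `is_palindrome` does not special-case `None` or non-string inputs.

```python
import subprocess

def run_generated_tests(function_to_test: str, generated_tests: str, path: str = "test_generated.py") -> None:
    """Write the function plus the generated tests to a file and run pytest on it (illustrative helper)."""
    with open(path, "w") as f:
        f.write("import pytest  # used for our unit tests\n\n")
        f.write(function_to_test + "\n\n")
        # the completion continues the starter comment from the prompt, so re-create that line first
        f.write("#Below, each test case is represented by a tuple passed to the @pytest.mark.parametrize decorator")
        f.write(generated_tests)
    subprocess.run(["pytest", path], check=False)  # check=False: a failing suite should not raise here

# hypothetical usage with the strings from this notebook:
# run_generated_tests(example_function, unit_test_from_function(example_function))
```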