AnimateDiff | Text-to-Video Generation

AnimateDiff is a framework for animating personalized text-to-image diffusion models without any model-specific tuning.

In short, it generates images from the text you provide and then automatically turns them into fairly smooth animations. The results are better than making video with img2img in the Stable Diffusion WebUI, with flicker close to none.

Project Repository

Official site: AnimateDiff

GitHub:guoyww/AnimateDiff: Official implementation of AnimateDiff. (github.com)

Prerequisites

Before installing the project we also need Git and Conda. If these two tools are not installed on your computer yet, please install them first by following the tutorials on this site.

To install Git on Windows, see this article:

To install Conda on Windows, see this article:

Network Issues

During installation, some dependencies may fail to download even with a proxy enabled. Proxy configuration is not covered on this site; please look up a separate guide on setting one up.

Installation

If you are a beginner and not comfortable with the command line, press Win+R, type CMD in the window that pops up and press Enter, then run each of the following commands in the CMD window in order.

First we need to choose a working directory to hold the project's files and dependencies. This site uses an openai.wiki folder in the root of the D drive, i.e. the full path D:\openai.wiki.

Run the following command in CMD. It checks whether the openai.wiki folder exists on the D drive and creates it automatically if it does not.

if not exist D:\openai.wiki mkdir D:\openai.wiki

Continue with the following command to force CMD's current working directory to the openai.wiki folder on the D drive.

cd /d D:\openai.wiki

Pull the project's GitHub repository, downloading it into the openai.wiki folder.

git clone https://github.com/guoyww/AnimateDiff.git

If this step fails for you (the command errors out or the download will not complete), you can download the archive below instead and extract it to D:\openai.wiki.

AnimateDiff-main (ZIP archive, 23.18 GB, free download from this site)

Environment Setup

Run the following command in CMD to force the working directory to the AnimateDiff project folder.

cd /d D:\openai.wiki\AnimateDiff

Next, we need to manually edit the environment.yaml file in the project root; otherwise the installation will fail with an error about xformers.

Open the file and you will see the following content:

name: animatediff
channels:
  - pytorch
  - xformers
dependencies:
  - python=3.10
  - pytorch==1.12.1
  - torchvision==0.13.1
  - torchaudio==0.12.1
  - cudatoolkit=11.3
  - xformers
  - pip
  - pip:
    - diffusers[torch]==0.11.1
    - transformers==4.25.1
    - imageio==2.27.0
    - gdown
    - einops
    - omegaconf
    - safetensors

Move the - xformers entry on line 11 of the file down into the pip: section (line 13 of the edited file), so that xformers is installed through pip instead of conda. The edited file should look like the example below.

Note: if this sounds confusing, simply copy the block below, overwrite the entire contents of your environment.yaml with it, and remember to save.

name: animatediff
channels:
  - pytorch
  - xformers
dependencies:
  - python=3.10
  - pytorch==1.12.1
  - torchvision==0.13.1
  - torchaudio==0.12.1
  - cudatoolkit=11.3
  - pip
  - pip:
    - xformers
    - diffusers[torch]==0.11.1
    - transformers==4.25.1
    - imageio==2.27.0
    - gdown
    - einops
    - omegaconf
    - safetensors

Run the following command in CMD. It will automatically create a Conda virtual environment named animatediff from environment.yaml.

conda env create -f environment.yaml
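
Optionally, you can confirm the environment was created by listing your Conda environments; an environment named animatediff should appear in the output (this check is just a suggestion, not part of the official steps):

conda env list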

Initialize Conda for cmd.exe to prevent possible errors in later steps.

conda init cmd.exe

Activate the Conda environment we just created, so that all of the remaining dependencies are installed into it.

conda activate animatediff

After running the command above, your current CMD window is working inside the Python virtual environment named animatediff.

Now run the following command to uninstall the torch, torchvision and torchaudio packages currently installed in this Python environment.

pip uninstall -y torch torchvision torchaudio

Why? If we do not reinstall them, the default installation runs PyTorch in CPU mode, and you will hit the error shown below.

(animatediff) D:\openai.wiki\AnimateDiff>python -m scripts.animate --config configs/prompts/1-ToonYou.yaml --pretrained_model_path models/StableDiffusion --L 16 --W 512 --H 512
WARNING[XFORMERS]: xFormers can't load C++/CUDA extensions. xFormers was built for:
    PyTorch 2.0.1+cu118 with CUDA 1108 (you have 2.0.1+cpu)
    Python  3.10.11 (you have 3.10.12)
  Please reinstall xformers (see https://github.com/facebookresearch/xformers#installing-xformers)
  Memory-efficient attention, SwiGLU, sparse and more won't be available.
  Set XFORMERS_MORE_DETAILS=1 for more details
C:\Users\openA\miniconda3\envs\animatediff\lib\site-packages\torchvision\io\image.py:13: UserWarning: Failed to load image Python extension: Could not find module 'C:\Users\openA\miniconda3\envs\animatediff\Lib\site-packages\torchvision\image.pyd' (or one of its dependencies). Try using the full path with constructor syntax.
  warn(f"Failed to load image Python extension: {e}")
loaded temporal unet's pretrained weights from models/StableDiffusion\unet ...
### missing keys: 560;
### unexpected keys: 0;
### Temporal Module Parameters: 417.1376 M
Traceback (most recent call last):
  File "C:\Users\openA\miniconda3\envs\animatediff\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\openA\miniconda3\envs\animatediff\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "D:\openai.wiki\AnimateDiff\scripts\animate.py", line 159, in <module>
    main(args)
  File "D:\openai.wiki\AnimateDiff\scripts\animate.py", line 55, in main
    if is_xformers_available(): unet.enable_xformers_memory_efficient_attention()
  File "C:\Users\openA\miniconda3\envs\animatediff\lib\site-packages\diffusers\modeling_utils.py", line 215, in enable_xformers_memory_efficient_attention
    self.set_use_memory_efficient_attention_xformers(True)
  File "C:\Users\openA\miniconda3\envs\animatediff\lib\site-packages\diffusers\modeling_utils.py", line 203, in set_use_memory_efficient_attention_xformers
    fn_recursive_set_mem_eff(module)
  File "C:\Users\openA\miniconda3\envs\animatediff\lib\site-packages\diffusers\modeling_utils.py", line 199, in fn_recursive_set_mem_eff
    fn_recursive_set_mem_eff(child)
  File "C:\Users\openA\miniconda3\envs\animatediff\lib\site-packages\diffusers\modeling_utils.py", line 199, in fn_recursive_set_mem_eff
    fn_recursive_set_mem_eff(child)
  File "C:\Users\openA\miniconda3\envs\animatediff\lib\site-packages\diffusers\modeling_utils.py", line 199, in fn_recursive_set_mem_eff
    fn_recursive_set_mem_eff(child)
  File "C:\Users\openA\miniconda3\envs\animatediff\lib\site-packages\diffusers\modeling_utils.py", line 196, in fn_recursive_set_mem_eff
    module.set_use_memory_efficient_attention_xformers(valid)
  File "C:\Users\openA\miniconda3\envs\animatediff\lib\site-packages\diffusers\modeling_utils.py", line 203, in set_use_memory_efficient_attention_xformers
    fn_recursive_set_mem_eff(module)
  File "C:\Users\openA\miniconda3\envs\animatediff\lib\site-packages\diffusers\modeling_utils.py", line 199, in fn_recursive_set_mem_eff
    fn_recursive_set_mem_eff(child)
  File "C:\Users\openA\miniconda3\envs\animatediff\lib\site-packages\diffusers\modeling_utils.py", line 196, in fn_recursive_set_mem_eff
    module.set_use_memory_efficient_attention_xformers(valid)
  File "D:\openai.wiki\AnimateDiff\animatediff\models\attention.py", line 237, in set_use_memory_efficient_attention_xformers
    raise ValueError(
ValueError: torch.cuda.is_available() should be True but is False. xformers' memory efficient attention is only available for GPU

As the traceback above shows, the GPU cannot be used because no available GPU is found. This problem cost me a whole day and was genuinely frustrating: it is not mentioned in the official documentation, and I did not see anyone else report it in the official issues either.

After the uninstall finishes, run the following command to reinstall torch, torchvision and torchaudio so that the GPU can actually be used.

pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

This may take a while, depending on your network and proxy. Once it finishes, the environment setup is complete.
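
As an optional sanity check (not part of the official steps), you can confirm that the CUDA build of PyTorch is now active; the command below should print the installed version followed by True:

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"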

Model Notes

Below are the models showcased in the official examples; the YAML files shipped with the project correspond to the following models.

D:\openai.wiki\AnimateDiff\configs\prompts
└─1-ToonYou.yaml
└─2-Lyriel.yaml
└─3-RcnzCartoon.yaml
└─4-MajicMix.yaml
└─5-RealisticVision.yaml
└─6-Tusun.yaml
└─7-FilmVelvia.yaml
└─8-GhibliBackground.yaml

ToonYou

Civitai:https://civitai.com/models/30240/toonyou

Counterfeit V3.0

Civitai:https://civitai.com/models/4468/counterfeit-v30

Realistic Vision V2.0

Civitai:https://civitai.com/models/4201/realistic-vision-v20

majicMIX Realistic

Civitai:https://civitai.com/models/43331/majicmix-realistic

RCNZ Cartoon

Civitai:https://civitai.com/models/66347/rcnz-cartoon-3d

FilmVelvia

Civitai:https://civitai.com/models/33208/filmgirl-film-grain-lora-and-loha

Model Downloads | Official

The downloads below require a proxy, and even with one enabled they may not work reliably, so downloading from the official sources is not recommended; use the domestic netdisk mirror provided by this site instead.

StableDiffusion

Run the following command in CMD to force the working directory to the AnimateDiff project folder.

cd /d D:\openai.wiki\AnimateDiff

Run the following commands one line at a time. They will download the model and check it out into the models/StableDiffusion directory.

git lfs install
git clone https://huggingface.co/runwayml/stable-diffusion-v1-5 models/StableDiffusion/

Note: this is extremely time-consuming. On my modest hardware it took nearly 20 hours, and the checked-out repository occupies 75.2 GB on disk in total; excluding the Git/LFS data, the actual model files are 37.6 GB.
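
If the git-lfs clone keeps failing, one possible alternative (a sketch, not the official method) is to download a snapshot of the repository with the huggingface_hub package, which avoids keeping a second copy of the weights inside the .git folder. This assumes a reasonably recent huggingface_hub is installed, e.g. via pip install huggingface_hub:

python -c "from huggingface_hub import snapshot_download; snapshot_download(repo_id='runwayml/stable-diffusion-v1-5', local_dir='models/StableDiffusion')"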

Motion_Module

Run the following command in CMD to force the working directory to the AnimateDiff project folder.

cd /d D:\openai.wiki\AnimateDiff

Run the following command; it will automatically download the motion models into the models/Motion_Module directory. Note that bash is not available in a plain CMD window, so you will most likely need to run this script (and the ones below) from Git Bash, which is installed alongside Git for Windows, or from WSL.

bash download_bashscripts/0-MotionModule.sh

Note: this downloads two model files totalling 3.11 GB, which are stored automatically in the models/Motion_Module directory. If you cannot download them this way, you can also [click here] to get them from Google Drive.

DreamBooth_LoRA

The project currently provides 8 download scripts for these models, 11 model files in total, about 27.6 GB.

Run the following commands line by line; they will automatically download the models into the models/DreamBooth_LoRA directory.

bash download_bashscripts/1-ToonYou.sh
bash download_bashscripts/2-Lyriel.sh
bash download_bashscripts/3-RcnzCartoon.sh
bash download_bashscripts/4-MajicMix.sh
bash download_bashscripts/5-RealisticVision.sh
bash download_bashscripts/6-Tusun.sh
bash download_bashscripts/7-FilmVelvia.sh
bash download_bashscripts/8-GhibliBackground.sh

Model Downloads | Netdisk

Download all of the files below without changing the directory structure (68.4 GB in total), then move the folder named models into the project root. If you are asked whether to overwrite, choose yes.

models (68.4 GB, free download from this site)

How to Run

From now on, every time you want to run the project, simply activate the Conda virtual Python environment we created and then run the launch command.

Run the following command in CMD to force the working directory to the project folder.

cd /d D:\openai.wiki\AnimateDiff

Activate the Conda environment, otherwise the system's default Python will be used and the project will not run properly.

conda activate animatediff

Different models are loaded through different yaml files. The project ships 8 preset prompt files that help you generate content quickly. Let's take one of them as an example to see how it works.

python -m scripts.animate --config configs/prompts/1-ToonYou.yaml --pretrained_model_path models/StableDiffusion --L 16 --W 512 --H 512

After the command above runs, a GIF file is automatically generated in the samples folder of the project root.

  • python -m scripts.animate
    • Runs the animate.py script inside the scripts folder of the project root with the environment's Python.
  • --config
    • Value: configs/prompts/1-ToonYou.yaml
    • A required argument pointing to the prompt file, which decides what gets generated.
  • --pretrained_model_path
    • Value: models/StableDiffusion
    • The path to the official Stable Diffusion model; no need to change it.
  • --L 16 --W 512 --H 512 (an example command using these flags follows this list)
    • --L
      • The length of the animation in frames; the default of 16 produces a 16-frame GIF, and based on testing it only seems to accept multiples of 8. There is usually no need to change it.
    • --W
      • The width of the generated animation.
    • --H
      • The height of the generated animation.
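
For example, if you are short on VRAM you could try a smaller resolution while keeping the default frame count. The values below are only an illustration using the same flags explained above; whether smaller sizes still give good results will vary by model:

python -m scripts.animate --config configs/prompts/1-ToonYou.yaml --pretrained_model_path models/StableDiffusion --L 16 --W 256 --H 256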

You may still be wondering: how do I control what actually gets generated? That is determined by the 1-ToonYou.yaml file. Open it and you will see the following content.

ToonYou:
  base: ""
  path: "models/DreamBooth_LoRA/toonyou_beta3.safetensors"
  motion_module:
    - "models/Motion_Module/mm_sd_v14.ckpt"
    - "models/Motion_Module/mm_sd_v15.ckpt"

  seed:           [10788741199826055526, 6520604954829636163, 6519455744612555650, 16372571278361863751]
  steps:          25
  guidance_scale: 7.5

  prompt:
    - "best quality, masterpiece, 1girl, looking at viewer, blurry background, upper body, contemporary, dress"
    - "masterpiece, best quality, 1girl, solo, cherry blossoms, hanami, pink flower, white flower, spring season, wisteria, petals, flower, plum blossoms, outdoors, falling petals, white hair, black eyes,"
    - "best quality, masterpiece, 1boy, formal, abstract, looking at viewer, masculine, marble pattern"
    - "best quality, masterpiece, 1girl, cloudy sky, dandelion, contrapposto, alternate hairstyle,"

  n_prompt:
    - ""
    - "badhandv4,easynegative,ng_deepnegative_v1_75t,verybadimagenegative_v1.3, bad-artist, bad_prompt_version2-neg, teeth"
    - ""
    - ""

  • base
    • Path to the base model; leave it at the default.
  • path
    • Path to the LoRA model used for this generation; you can also point it at a model of your own.
  • motion_module
    • Paths to the official motion modules; usually there is no need to change them.
  • seed
    • The seeds used for generation, exactly as in SD.
  • steps
    • Sampling steps; you can think of it as how many refinement passes go into each image.
  • guidance_scale
    • Guidance scale, no different from the equivalent setting in SD.
  • prompt
    • Positive prompts. Each line counts as one job: the example above has 4 prompt lines, so 4 GIFs are generated, one per line, in order.
  • n_prompt
    • Negative prompts, matched line by line with the positive prompts; they can also be left empty.
    • In short, anything you do not want to see goes here, e.g. NSFW. (A minimal custom config sketch follows this list.)
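
As a concrete illustration, here is a minimal sketch of a custom prompt file. The file name my_prompt.yaml is hypothetical, the model paths are simply reused from the ToonYou example above, and the prompt text is only a placeholder; one prompt line is paired with one seed.

ToonYou:
  base: ""
  path: "models/DreamBooth_LoRA/toonyou_beta3.safetensors"
  motion_module:
    - "models/Motion_Module/mm_sd_v15.ckpt"

  seed:           [10788741199826055526]
  steps:          25
  guidance_scale: 7.5

  prompt:
    - "best quality, masterpiece, 1girl, smiling, sunflower field, blue sky, summer dress"

  n_prompt:
    - "worst quality, low quality, deformed, bad anatomy, bad hands"

Saved as configs/prompts/my_prompt.yaml, it would be run the same way as the presets, assuming the structure above is kept intact:

python -m scripts.animate --config configs/prompts/my_prompt.yaml --pretrained_model_path models/StableDiffusion --L 16 --W 512 --H 512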

That covers all of the theory; now let's move on to actually generating something. Either way, run it once first and see whether the program works at all.

python -m scripts.animate --config configs/prompts/1-ToonYou.yaml --pretrained_model_path models/StableDiffusion --L 16 --W 512 --H 512

This step takes a very, very long time; my GPU is only a 2080 Ti, which is increasingly struggling to keep up.

(animatediff) D:\openai.wiki\AnimateDiff>python -m scripts.animate --config configs/prompts/1-ToonYou.yaml --pretrained_model_path models/StableDiffusion --L 16 --W 512 --H 512
A matching Triton is not available, some optimizations will not be enabled.
loaded temporal unet's pretrained weights from models/StableDiffusion\unet ...
### missing keys: 560;
### unexpected keys: 0;
### Temporal Module Parameters: 417.1376 M
Downloading pytorch_model.bin: 100%|██████████████████████████████████████████████| 1.71G/1.71G [09:11<00:00, 3.10MB/s]
C:\Users\openA\miniconda3\envs\animatediff\lib\site-packages\huggingface_hub\file_download.py:133: UserWarning: `huggingface_hub` cache-system uses symlinks by default to efficiently store duplicated files but your machine does not support them in C:\Users\openA\.cache\huggingface\hub. Caching files will still work but in a degraded version that might require more space on your disk. This warning can be disabled by setting the `HF_HUB_DISABLE_SYMLINKS_WARNING` environment variable. For more details, see https://huggingface.co/docs/huggingface_hub/how-to-cache#limitations.
To support symlinks on Windows, you either need to activate Developer Mode or to run Python as an administrator. In order to see activate developer mode, see this article: https://docs.microsoft.com/en-us/windows/apps/get-started/enable-your-device-for-development
  warnings.warn(message)
Some weights of the model checkpoint at openai/clip-vit-large-patch14 were not used when initializing CLIPTextModel: ['vision_model.encoder.layers.14.self_attn.q_proj.weight', 'vision_model.encoder.layers.9.self_attn.v_proj.weight', 'vision_model.encoder.layers.23.self_attn.v_proj.weight', ... (several hundred vision_model weight names omitted) ..., 'vision_model.encoder.layers.0.self_attn.out_proj.weight']
- This IS expected if you are initializing CLIPTextModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing CLIPTextModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
current seed: 10788741199826055526
sampling best quality, masterpiece, 1girl, looking at viewer, blurry background, upper body, contemporary, dress ...
100%|███████████████████████████████████████████████████████████████████████████████| 25/25 [1:00:05<00:00, 144.23s/it]
100%|██████████████████████████████████████████████████████████████████████████████████| 16/16 [00:07<00:00,  2.04it/s]
save to samples/1-ToonYou-2023-07-13T19-31-04/sample/best-quality,-masterpiece,-1girl,-looking-at-viewer,-blurry-background,-upper.gif
current seed: 6520604954829636163
sampling masterpiece, best quality, 1girl, solo, cherry blossoms, hanami, pink flower, white flower, spring season, wisteria, petals, flower, plum blossoms, outdoors, falling petals, white hair, black eyes, ...
 68%|███████████████████████████████████████████████████████                          | 17/25 [43:32<19:27, 145.95s/it]

The GIF generation output looks like the log above. You can see that a 16-frame GIF at 512x512 has already been generated, taking about an hour of compute on the 2080 Ti.

Usage Guide

The normal usage was already covered above, but it may still feel a bit foggy to some readers, so let me go through it again with a different example.

The project provides 8 presets in total, listed below.

python -m scripts.animate --config configs/prompts/1-ToonYou.yaml --pretrained_model_path models/StableDiffusion --L 16 --W 512 --H 512
python -m scripts.animate --config configs/prompts/2-Lyriel.yaml --pretrained_model_path models/StableDiffusion --L 16 --W 512 --H 512
python -m scripts.animate --config configs/prompts/3-RcnzCartoon.yaml --pretrained_model_path models/StableDiffusion --L 16 --W 512 --H 512
python -m scripts.animate --config configs/prompts/4-MajicMix.yaml --pretrained_model_path models/StableDiffusion --L 16 --W 512 --H 512
python -m scripts.animate --config configs/prompts/5-RealisticVision.yaml --pretrained_model_path models/StableDiffusion --L 16 --W 512 --H 512
python -m scripts.animate --config configs/prompts/6-Tusun.yaml --pretrained_model_path models/StableDiffusion --L 16 --W 512 --H 512
python -m scripts.animate --config configs/prompts/7-FilmVelvia.yaml --pretrained_model_path models/StableDiffusion --L 16 --W 512 --H 512
python -m scripts.animate --config configs/prompts/8-GhibliBackground.yaml --pretrained_model_path models/StableDiffusion --L 16 --W 512 --H 512

Looking at these, all we need to change is the configs/prompts/*.yaml path in order to switch the prompt and the related settings.

This time let's use 7-FilmVelvia as the example. Open configs/prompts/7-FilmVelvia.yaml in the project root and take a look at the file.

FilmVelvia:
  base: "models/DreamBooth_LoRA/majicmixRealistic_v4.safetensors"
  path: "models/DreamBooth_LoRA/FilmVelvia2.safetensors"
  motion_module:
    - "models/Motion_Module/mm_sd_v14.ckpt"
    - "models/Motion_Module/mm_sd_v15.ckpt"

  seed:           [358675358833372813, 3519455280971923743, 11684545350557985081, 8696855302100399877]
  steps:          25
  guidance_scale: 7.5
  lora_alpha: 0.6

  prompt:
    - "a woman standing on the side of a road at night,girl, long hair, motor vehicle, car, looking at viewer, ground vehicle, night, hands in pockets, blurry background, coat, black hair, parted lips, bokeh, jacket, brown hair, outdoors, red lips, upper body, artist name"
    - ", dark shot,0mm, portrait quality of a arab man worker,boy, wasteland that stands out vividly against the background of the desert, barren landscape, closeup, moles skin, soft light, sharp, exposure blend, medium shot, bokeh, hdr, high contrast, cinematic, teal and orange5, muted colors, dim colors, soothing tones, low saturation, hyperdetailed, noir"
    - "fashion photography portrait of 1girl, offshoulder, fluffy short hair, soft light, rim light, beautiful shadow, low key, photorealistic, raw photo, natural skin texture, realistic eye and face details, hyperrealism, ultra high res, 4K, Best quality, masterpiece, necklace, cleavage, in the dark"
    - "In this lighthearted portrait, a woman is dressed as a fierce warrior, armed with an arsenal of paintbrushes and palette knives. Her war paint is composed of thick, vibrant strokes of color, and her armor is made of paint tubes and paint-splattered canvases. She stands victoriously atop a mountain of conquered blank canvases, with a beautiful, colorful landscape behind her, symbolizing the power of art and creativity. bust Portrait, close-up, Bright and transparent scene lighting, "

  n_prompt:
    - "cartoon, anime, sketches,worst quality, low quality, deformed, distorted, disfigured, bad eyes, wrong lips, weird mouth, bad teeth, mutated hands and fingers, bad anatomy, wrong anatomy, amputation, extra limb, missing limb, floating limbs, disconnected limbs, mutation, ugly, disgusting, bad_pictures, negative_hand-neg"
    - "cartoon, anime, sketches,worst quality, low quality, deformed, distorted, disfigured, bad eyes, wrong lips, weird mouth, bad teeth, mutated hands and fingers, bad anatomy, wrong anatomy, amputation, extra limb, missing limb, floating limbs, disconnected limbs, mutation, ugly, disgusting, bad_pictures, negative_hand-neg"
    - "wrong white balance, dark, cartoon, anime, sketches,worst quality, low quality, deformed, distorted, disfigured, bad eyes, wrong lips, weird mouth, bad teeth, mutated hands and fingers, bad anatomy, wrong anatomy, amputation, extra limb, missing limb, floating limbs, disconnected limbs, mutation, ugly, disgusting, bad_pictures, negative_hand-neg"
    - "wrong white balance, dark, cartoon, anime, sketches,worst quality, low quality, deformed, distorted, disfigured, bad eyes, wrong lips, weird mouth, bad teeth, mutated hands and fingers, bad anatomy, wrong anatomy, amputation, extra limb, missing limb, floating limbs, disconnected limbs, mutation, ugly, disgusting, bad_pictures, negative_hand-neg"

Apart from the prompt and n_prompt fields, keep everything at its default; these two are the only parameters we modify.

Both prompt and n_prompt contain 4 lines, which means 4 animations would be generated. Since we are only testing, keep just the last line of each and delete the first three.

FilmVelvia:
  base: "models/DreamBooth_LoRA/majicmixRealistic_v4.safetensors"
  path: "models/DreamBooth_LoRA/FilmVelvia2.safetensors"
  motion_module:
    - "models/Motion_Module/mm_sd_v14.ckpt"
    - "models/Motion_Module/mm_sd_v15.ckpt"

  seed:           [358675358833372813, 3519455280971923743, 11684545350557985081, 8696855302100399877]
  steps:          25
  guidance_scale: 7.5
  lora_alpha: 0.6

  prompt:
    - "In this lighthearted portrait, a woman is dressed as a fierce warrior, armed with an arsenal of paintbrushes and palette knives. Her war paint is composed of thick, vibrant strokes of color, and her armor is made of paint tubes and paint-splattered canvases. She stands victoriously atop a mountain of conquered blank canvases, with a beautiful, colorful landscape behind her, symbolizing the power of art and creativity. bust Portrait, close-up, Bright and transparent scene lighting, "

  n_prompt:
    - "wrong white balance, dark, cartoon, anime, sketches,worst quality, low quality, deformed, distorted, disfigured, bad eyes, wrong lips, weird mouth, bad teeth, mutated hands and fingers, bad anatomy, wrong anatomy, amputation, extra limb, missing limb, floating limbs, disconnected limbs, mutation, ugly, disgusting, bad_pictures, negative_hand-neg"

In plain terms, the remaining prompt describes a woman dressed as a fierce warrior armed with paintbrushes and palette knives, her war paint made of thick, vibrant strokes of color and her armor of paint tubes and paint-splattered canvases, standing victoriously atop a mountain of conquered blank canvases with a colorful landscape behind her (bust portrait, close-up, bright and transparent lighting). The n_prompt lists everything to avoid: wrong white balance, dark images, cartoon/anime/sketch styles, low quality, deformed or disfigured anatomy, bad eyes, lips and teeth, mutated hands and fingers, extra, missing or floating limbs, and so on.

In short, we kept only the 4th line; the official sample generated from that 4th prompt is shown below.


On top of the existing prompt I added one phrase, "On her head was a purple cap", to form a new prompt; n_prompt is left unchanged. The modified file is shown below:

FilmVelvia:
  base: "models/DreamBooth_LoRA/majicmixRealistic_v4.safetensors"
  path: "models/DreamBooth_LoRA/FilmVelvia2.safetensors"
  motion_module:
    - "models/Motion_Module/mm_sd_v14.ckpt"
    - "models/Motion_Module/mm_sd_v15.ckpt"

  seed:           [358675358833372813, 3519455280971923743, 11684545350557985081, 8696855302100399877]
  steps:          25
  guidance_scale: 7.5
  lora_alpha: 0.6

  prompt:
    - "In this lighthearted portrait, a woman is dressed as a fierce warrior, On her head was a purple cap,armed with an arsenal of paintbrushes and palette knives. Her war paint is composed of thick, vibrant strokes of color, and her armor is made of paint tubes and paint-splattered canvases. She stands victoriously atop a mountain of conquered blank canvases, with a beautiful, colorful landscape behind her, symbolizing the power of art and creativity. bust Portrait, close-up, Bright and transparent scene lighting, "

  n_prompt:
    - "wrong white balance, dark, cartoon, anime, sketches,worst quality, low quality, deformed, distorted, disfigured, bad eyes, wrong lips, weird mouth, bad teeth, mutated hands and fingers, bad anatomy, wrong anatomy, amputation, extra limb, missing limb, floating limbs, disconnected limbs, mutation, ugly, disgusting, bad_pictures, negative_hand-neg"

That is the modified file. You are not limited to adding a phrase; you can rewrite the prompt completely, exactly as you would when using SD.

Remember to save after editing, then run the following command in CMD to force the working directory.

cd /d D:\openai.wiki\AnimateDiff

Activate the Conda environment, otherwise the system's default Python will be used and the project will not run properly.

conda activate animatediff

Run the command below, where 7-FilmVelvia.yaml is the file we just modified.

python -m scripts.animate --config configs/prompts/7-FilmVelvia.yaml --pretrained_model_path models/StableDiffusion --L 16 --W 512 --H 512

Running the command above means another long stretch of computation, so I will not wait for it here; on my aging machine it would conservatively take several hours.

Summary

This project is fairly demanding on hardware; without a decent GPU, generation will be very, very slow.

That said, of all the open-source projects covered on this site so far, it currently produces the best video results. If you have time to tinker, you could also pair it with SD: use SD to preview still images first, then copy your prompt and related settings into the yaml and adjust accordingly. That avoids the blind-box lottery and gives you a much better chance of getting the image you want.

This article was originally published by the OpenAI Wiki. If you repost it, please credit the source: https://openai.wiki/animatediff.html
