Changed the depth to 30:
MODEL_DEPTH = 30 # TODO: =====> please specify MODEL_DEPTH <=====
Manually downloaded the var_d30.pth checkpoint file.
Running demo_sample.ipynb then fails with:
RuntimeError: Error(s) in loading state_dict for VAR:
Unexpected key(s) in state_dict: "blocks.16.attn.scale_mul_1H11", "blocks.16.attn.q_bias", "blocks.16.attn.v_bias", "blocks.16.attn.zero_k_bias", "blocks.16.attn.mat_qkv.weight", "blocks.16.attn.proj.weight", "blocks.16.attn.proj.bias", "blocks.16.ffn.fc1.weight", "blocks.16.ffn.fc1.bias", "blocks.16.ffn.fc2.weight", "blocks.16.ffn.fc2.bias", "blocks.16.ada_lin.1.weight", "blocks.16.ada_lin.1.bias", "blocks.17.attn.scale_mul_1H11", "blocks.17.attn.q_bias", "blocks.17.attn.v_bias", "blocks.17.attn.zero_k_bias", "blocks.17.attn.mat_qkv.weight", "blocks.17.attn.proj.weight", "blocks.17.attn.proj.bias", "blocks.17.ffn.fc1.weight", "blocks.17.ffn.fc1.bias", "blocks.17.ffn.fc2.weight", "blocks.17.ffn.fc2.bias", "blocks.17.ada_lin.1.weight", "blocks.17.ada_lin.1.bias", "blocks.18.attn.scale_mul_1H11", "blocks.18.attn.q_bias", "blocks.18.attn.v_bias", "blocks.18.attn.zero_k_bias", "blocks.18.attn.mat_qkv.weight", "blocks.18.attn.proj.weight", "blocks.18.attn.proj.bias", "blocks.18.ffn.fc1.weight", "blocks.18.ffn.fc1.bias", "blocks.18.ffn.fc2.weight", "blocks.18.ffn.fc2.bias", "blocks.18.ada_lin.1.weight", "blocks.18.ada_lin.1.bias", "blocks.19.attn.scale_mul_1H11", "blocks.19.attn.q_bias", "blocks.19.attn.v_bias", "blocks.19.attn.zero_k_bias", "blocks.19.attn.mat_qkv.weight", "blocks.19.attn.proj.weight", "blocks.19.attn.proj.bias", "blocks.19.ffn.fc1.weight", "blocks.19.ffn.fc1.bias", "blocks.19.ffn.fc2.weight", "blocks.19.ffn.fc2.bias", "blocks.19.ada_lin.1.weight", "blocks.19.ada_lin.1.bias", "blocks.20.attn.scale_mul_1H11", "blocks.20.attn.q_bias", "blocks.20.attn.v_bias", "blocks.20.attn.zero_k_bias", "blocks.20.attn.mat_qkv.weight", "blocks.20.attn.proj.weight", "blocks.20.attn.proj.bias", "blocks.20.ffn.fc1.weight", "blocks.20.ffn.fc1.bias", "blocks.20.ffn.fc2.weight", "blocks.20.ffn.fc2.bias", "blocks.20.ada_lin.1.weight", "blocks.20.ada_lin.1.bias", "blocks.21.attn.scale_mul_1H11", "blocks.21.attn.q_bias", "blocks.21.attn.v_bias", "blocks.21.attn.zero_k_bias", "blocks.21.attn.mat_qkv.weight", "blocks.21.attn.proj.weight", "blocks.21.attn.proj.bias", "blocks.21.ffn.fc1.weight", "blocks.21.ffn.fc1.bias", "blocks.21.ffn.fc2.weight", "blocks.21.ffn.fc2.bias", "blocks.21.ada_lin.1.weight", "blocks.21.ada_lin.1.bias", "blocks.22.attn.scale_mul_1H11", "blocks.22.attn.q_bias", "blocks.22.attn.v_bias", "blocks.22.attn.zero_k_bias", "blocks.22.attn.mat_qkv.weight", "blocks.22.attn.proj.weight", "blocks.22.attn.proj.bias", "blocks.22.ffn.fc1.weight", "blocks.22.ffn.fc1.bias", "blocks.22.ffn.fc2.weight", "blocks.22.ffn.fc2.bias", "blocks.22.ada_lin.1.weight", "blocks.22.ada_lin.1.bias", "blocks.23.attn.scale_mul_1H11", "blocks.23.attn.q_bias", "blocks.23.attn.v_bias", "blocks.23.attn.zero_k_bias", "blocks.23.attn.mat_qkv.weight", "blocks.23.attn.proj.weight", "blocks.23.attn.proj.bias", "blocks.23.ffn.fc1.weight", "blocks.23.ffn.fc1.bias", "blocks.23.ffn.fc2.weight", "blocks.23.ffn.fc2.bias", "blocks.23.ada_lin.1.weight", "blocks.23.ada_lin.1.bias", "blocks.24.attn.scale_mul_1H11", "blocks.24.attn.q_bias", "blocks.24.attn.v_bias", "blocks.24.attn.zero_k_bias", "blocks.24.attn.mat_qkv.weight", "blocks.24.attn.proj.weight", "blocks.24.attn.proj.bias", "blocks.24.ffn.fc1.weight", "blocks.24.ffn.fc1.bias", "blocks.24.ffn.fc2.weight", "blocks.24.ffn.fc2.bias", "blocks.24.ada_lin.1.weight", "blocks.24.ada_lin.1.bias", "blocks.25.attn.scale_mul_1H11", "blocks.25.attn.q_bias", "blocks.25.attn.v_bias", "blocks.25.attn.zero_k_bias", "blocks.25.attn.mat_qkv.weight", "blocks.25.attn.proj.weight", 
"blocks.25.attn.proj.bias", "blocks.25.ffn.fc1.weight", "blocks.25.ffn.fc1.bias", "blocks.25.ffn.fc2.weight", "blocks.25.ffn.fc2.bias", "blocks.25.ada_lin.1.weight", "blocks.25.ada_lin.1.bias", "blocks.26.attn.scale_mul_1H11", "blocks.26.attn.q_bias", "blocks.26.attn.v_bias", "blocks.26.attn.zero_k_bias", "blocks.26.attn.mat_qkv.weight", "blocks.26.attn.proj.weight", "blocks.26.attn.proj.bias", "blocks.26.ffn.fc1.weight", "blocks.26.ffn.fc1.bias", "blocks.26.ffn.fc2.weight", "blocks.26.ffn.fc2.bias", "blocks.26.ada_lin.1.weight", "blocks.26.ada_lin.1.bias", "blocks.27.attn.scale_mul_1H11", "blocks.27.attn.q_bias", "blocks.27.attn.v_bias", "blocks.27.attn.zero_k_bias", "blocks.27.attn.mat_qkv.weight", "blocks.27.attn.proj.weight", "blocks.27.attn.proj.bias", "blocks.27.ffn.fc1.weight", "blocks.27.ffn.fc1.bias", "blocks.27.ffn.fc2.weight", "blocks.27.ffn.fc2.bias", "blocks.27.ada_lin.1.weight", "blocks.27.ada_lin.1.bias", "blocks.28.attn.scale_mul_1H11", "blocks.28.attn.q_bias", "blocks.28.attn.v_bias", "blocks.28.attn.zero_k_bias", "blocks.28.attn.mat_qkv.weight", "blocks.28.attn.proj.weight", "blocks.28.attn.proj.bias", "blocks.28.ffn.fc1.weight", "blocks.28.ffn.fc1.bias", "blocks.28.ffn.fc2.weight", "blocks.28.ffn.fc2.bias", "blocks.28.ada_lin.1.weight", "blocks.28.ada_lin.1.bias", "blocks.29.attn.scale_mul_1H11", "blocks.29.attn.q_bias", "blocks.29.attn.v_bias", "blocks.29.attn.zero_k_bias", "blocks.29.attn.mat_qkv.weight", "blocks.29.attn.proj.weight", "blocks.29.attn.proj.bias", "blocks.29.ffn.fc1.weight", "blocks.29.ffn.fc1.bias", "blocks.29.ffn.fc2.weight", "blocks.29.ffn.fc2.bias", "blocks.29.ada_lin.1.weight", "blocks.29.ada_lin.1.bias".
size mismatch for pos_start: copying a param with shape torch.Size([1, 1, 1920]) from checkpoint, the shape in current model is torch.Size([1, 1, 1024]).
size mismatch for pos_1LC: copying a param with shape torch.Size([1, 680, 1920]) from checkpoint, the shape in current model is torch.Size([1, 680, 1024]).
size mismatch for word_embed.weight: copying a param with shape torch.Size([1920, 32]) from checkpoint, the shape in current model is torch.Size([1024, 32]).
size mismatch for word_embed.bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for class_emb.weight: copying a param with shape torch.Size([1001, 1920]) from checkpoint, the shape in current model is torch.Size([1001, 1024]).
size mismatch for lvl_embed.weight: copying a param with shape torch.Size([10, 1920]) from checkpoint, the shape in current model is torch.Size([10, 1024]).
size mismatch for blocks.0.attn.scale_mul_1H11: copying a param with shape torch.Size([1, 30, 1, 1]) from checkpoint, the shape in current model is torch.Size([1, 16, 1, 1]).
size mismatch for blocks.0.attn.q_bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.0.attn.v_bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.0.attn.zero_k_bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.0.attn.mat_qkv.weight: copying a param with shape torch.Size([5760, 1920]) from checkpoint, the shape in current model is torch.Size([3072, 1024]).
size mismatch for blocks.0.attn.proj.weight: copying a param with shape torch.Size([1920, 1920]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for blocks.0.attn.proj.bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.0.ffn.fc1.weight: copying a param with shape torch.Size([7680, 1920]) from checkpoint, the shape in current model is torch.Size([4096, 1024]).
size mismatch for blocks.0.ffn.fc1.bias: copying a param with shape torch.Size([7680]) from checkpoint, the shape in current model is torch.Size([4096]).
size mismatch for blocks.0.ffn.fc2.weight: copying a param with shape torch.Size([1920, 7680]) from checkpoint, the shape in current model is torch.Size([1024, 4096]).
size mismatch for blocks.0.ffn.fc2.bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.0.ada_lin.1.weight: copying a param with shape torch.Size([11520, 1920]) from checkpoint, the shape in current model is torch.Size([6144, 1024]).
size mismatch for blocks.0.ada_lin.1.bias: copying a param with shape torch.Size([11520]) from checkpoint, the shape in current model is torch.Size([6144]).
size mismatch for blocks.1.attn.scale_mul_1H11: copying a param with shape torch.Size([1, 30, 1, 1]) from checkpoint, the shape in current model is torch.Size([1, 16, 1, 1]).
size mismatch for blocks.1.attn.q_bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.1.attn.v_bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.1.attn.zero_k_bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.1.attn.mat_qkv.weight: copying a param with shape torch.Size([5760, 1920]) from checkpoint, the shape in current model is torch.Size([3072, 1024]).
size mismatch for blocks.1.attn.proj.weight: copying a param with shape torch.Size([1920, 1920]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for blocks.1.attn.proj.bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.1.ffn.fc1.weight: copying a param with shape torch.Size([7680, 1920]) from checkpoint, the shape in current model is torch.Size([4096, 1024]).
size mismatch for blocks.1.ffn.fc1.bias: copying a param with shape torch.Size([7680]) from checkpoint, the shape in current model is torch.Size([4096]).
size mismatch for blocks.1.ffn.fc2.weight: copying a param with shape torch.Size([1920, 7680]) from checkpoint, the shape in current model is torch.Size([1024, 4096]).
size mismatch for blocks.1.ffn.fc2.bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.1.ada_lin.1.weight: copying a param with shape torch.Size([11520, 1920]) from checkpoint, the shape in current model is torch.Size([6144, 1024]).
size mismatch for blocks.1.ada_lin.1.bias: copying a param with shape torch.Size([11520]) from checkpoint, the shape in current model is torch.Size([6144]).
size mismatch for blocks.2.attn.scale_mul_1H11: copying a param with shape torch.Size([1, 30, 1, 1]) from checkpoint, the shape in current model is torch.Size([1, 16, 1, 1]).
size mismatch for blocks.2.attn.q_bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.2.attn.v_bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.2.attn.zero_k_bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.2.attn.mat_qkv.weight: copying a param with shape torch.Size([5760, 1920]) from checkpoint, the shape in current model is torch.Size([3072, 1024]).
size mismatch for blocks.2.attn.proj.weight: copying a param with shape torch.Size([1920, 1920]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for blocks.2.attn.proj.bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.2.ffn.fc1.weight: copying a param with shape torch.Size([7680, 1920]) from checkpoint, the shape in current model is torch.Size([4096, 1024]).
size mismatch for blocks.2.ffn.fc1.bias: copying a param with shape torch.Size([7680]) from checkpoint, the shape in current model is torch.Size([4096]).
size mismatch for blocks.2.ffn.fc2.weight: copying a param with shape torch.Size([1920, 7680]) from checkpoint, the shape in current model is torch.Size([1024, 4096]).
size mismatch for blocks.2.ffn.fc2.bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.2.ada_lin.1.weight: copying a param with shape torch.Size([11520, 1920]) from checkpoint, the shape in current model is torch.Size([6144, 1024]).
size mismatch for blocks.2.ada_lin.1.bias: copying a param with shape torch.Size([11520]) from checkpoint, the shape in current model is torch.Size([6144]).
size mismatch for blocks.3.attn.scale_mul_1H11: copying a param with shape torch.Size([1, 30, 1, 1]) from checkpoint, the shape in current model is torch.Size([1, 16, 1, 1]).
size mismatch for blocks.3.attn.q_bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.3.attn.v_bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.3.attn.zero_k_bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.3.attn.mat_qkv.weight: copying a param with shape torch.Size([5760, 1920]) from checkpoint, the shape in current model is torch.Size([3072, 1024]).
size mismatch for blocks.3.attn.proj.weight: copying a param with shape torch.Size([1920, 1920]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for blocks.3.attn.proj.bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.3.ffn.fc1.weight: copying a param with shape torch.Size([7680, 1920]) from checkpoint, the shape in current model is torch.Size([4096, 1024]).
size mismatch for blocks.3.ffn.fc1.bias: copying a param with shape torch.Size([7680]) from checkpoint, the shape in current model is torch.Size([4096]).
size mismatch for blocks.3.ffn.fc2.weight: copying a param with shape torch.Size([1920, 7680]) from checkpoint, the shape in current model is torch.Size([1024, 4096]).
size mismatch for blocks.3.ffn.fc2.bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.3.ada_lin.1.weight: copying a param with shape torch.Size([11520, 1920]) from checkpoint, the shape in current model is torch.Size([6144, 1024]).
size mismatch for blocks.3.ada_lin.1.bias: copying a param with shape torch.Size([11520]) from checkpoint, the shape in current model is torch.Size([6144]).
size mismatch for blocks.4.attn.scale_mul_1H11: copying a param with shape torch.Size([1, 30, 1, 1]) from checkpoint, the shape in current model is torch.Size([1, 16, 1, 1]).
size mismatch for blocks.4.attn.q_bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.4.attn.v_bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.4.attn.zero_k_bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.4.attn.mat_qkv.weight: copying a param with shape torch.Size([5760, 1920]) from checkpoint, the shape in current model is torch.Size([3072, 1024]).
size mismatch for blocks.4.attn.proj.weight: copying a param with shape torch.Size([1920, 1920]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for blocks.4.attn.proj.bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.4.ffn.fc1.weight: copying a param with shape torch.Size([7680, 1920]) from checkpoint, the shape in current model is torch.Size([4096, 1024]).
size mismatch for blocks.4.ffn.fc1.bias: copying a param with shape torch.Size([7680]) from checkpoint, the shape in current model is torch.Size([4096]).
size mismatch for blocks.4.ffn.fc2.weight: copying a param with shape torch.Size([1920, 7680]) from checkpoint, the shape in current model is torch.Size([1024, 4096]).
size mismatch for blocks.4.ffn.fc2.bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.4.ada_lin.1.weight: copying a param with shape torch.Size([11520, 1920]) from checkpoint, the shape in current model is torch.Size([6144, 1024]).
size mismatch for blocks.4.ada_lin.1.bias: copying a param with shape torch.Size([11520]) from checkpoint, the shape in current model is torch.Size([6144]).
size mismatch for blocks.5.attn.scale_mul_1H11: copying a param with shape torch.Size([1, 30, 1, 1]) from checkpoint, the shape in current model is torch.Size([1, 16, 1, 1]).
size mismatch for blocks.5.attn.q_bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.5.attn.v_bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.5.attn.zero_k_bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.5.attn.mat_qkv.weight: copying a param with shape torch.Size([5760, 1920]) from checkpoint, the shape in current model is torch.Size([3072, 1024]).
size mismatch for blocks.5.attn.proj.weight: copying a param with shape torch.Size([1920, 1920]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for blocks.5.attn.proj.bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.5.ffn.fc1.weight: copying a param with shape torch.Size([7680, 1920]) from checkpoint, the shape in current model is torch.Size([4096, 1024]).
size mismatch for blocks.5.ffn.fc1.bias: copying a param with shape torch.Size([7680]) from checkpoint, the shape in current model is torch.Size([4096]).
size mismatch for blocks.5.ffn.fc2.weight: copying a param with shape torch.Size([1920, 7680]) from checkpoint, the shape in current model is torch.Size([1024, 4096]).
size mismatch for blocks.5.ffn.fc2.bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.5.ada_lin.1.weight: copying a param with shape torch.Size([11520, 1920]) from checkpoint, the shape in current model is torch.Size([6144, 1024]).
size mismatch for blocks.5.ada_lin.1.bias: copying a param with shape torch.Size([11520]) from checkpoint, the shape in current model is torch.Size([6144]).
size mismatch for blocks.6.attn.scale_mul_1H11: copying a param with shape torch.Size([1, 30, 1, 1]) from checkpoint, the shape in current model is torch.Size([1, 16, 1, 1]).
size mismatch for blocks.6.attn.q_bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.6.attn.v_bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.6.attn.zero_k_bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.6.attn.mat_qkv.weight: copying a param with shape torch.Size([5760, 1920]) from checkpoint, the shape in current model is torch.Size([3072, 1024]).
size mismatch for blocks.6.attn.proj.weight: copying a param with shape torch.Size([1920, 1920]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for blocks.6.attn.proj.bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.6.ffn.fc1.weight: copying a param with shape torch.Size([7680, 1920]) from checkpoint, the shape in current model is torch.Size([4096, 1024]).
size mismatch for blocks.6.ffn.fc1.bias: copying a param with shape torch.Size([7680]) from checkpoint, the shape in current model is torch.Size([4096]).
size mismatch for blocks.6.ffn.fc2.weight: copying a param with shape torch.Size([1920, 7680]) from checkpoint, the shape in current model is torch.Size([1024, 4096]).
size mismatch for blocks.6.ffn.fc2.bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.6.ada_lin.1.weight: copying a param with shape torch.Size([11520, 1920]) from checkpoint, the shape in current model is torch.Size([6144, 1024]).
size mismatch for blocks.6.ada_lin.1.bias: copying a param with shape torch.Size([11520]) from checkpoint, the shape in current model is torch.Size([6144]).
size mismatch for blocks.7.attn.scale_mul_1H11: copying a param with shape torch.Size([1, 30, 1, 1]) from checkpoint, the shape in current model is torch.Size([1, 16, 1, 1]).
size mismatch for blocks.7.attn.q_bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.7.attn.v_bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.7.attn.zero_k_bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.7.attn.mat_qkv.weight: copying a param with shape torch.Size([5760, 1920]) from checkpoint, the shape in current model is torch.Size([3072, 1024]).
size mismatch for blocks.7.attn.proj.weight: copying a param with shape torch.Size([1920, 1920]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for blocks.7.attn.proj.bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.7.ffn.fc1.weight: copying a param with shape torch.Size([7680, 1920]) from checkpoint, the shape in current model is torch.Size([4096, 1024]).
size mismatch for blocks.7.ffn.fc1.bias: copying a param with shape torch.Size([7680]) from checkpoint, the shape in current model is torch.Size([4096]).
size mismatch for blocks.7.ffn.fc2.weight: copying a param with shape torch.Size([1920, 7680]) from checkpoint, the shape in current model is torch.Size([1024, 4096]).
size mismatch for blocks.7.ffn.fc2.bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.7.ada_lin.1.weight: copying a param with shape torch.Size([11520, 1920]) from checkpoint, the shape in current model is torch.Size([6144, 1024]).
size mismatch for blocks.7.ada_lin.1.bias: copying a param with shape torch.Size([11520]) from checkpoint, the shape in current model is torch.Size([6144]).
size mismatch for blocks.8.attn.scale_mul_1H11: copying a param with shape torch.Size([1, 30, 1, 1]) from checkpoint, the shape in current model is torch.Size([1, 16, 1, 1]).
size mismatch for blocks.8.attn.q_bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.8.attn.v_bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.8.attn.zero_k_bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.8.attn.mat_qkv.weight: copying a param with shape torch.Size([5760, 1920]) from checkpoint, the shape in current model is torch.Size([3072, 1024]).
size mismatch for blocks.8.attn.proj.weight: copying a param with shape torch.Size([1920, 1920]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for blocks.8.attn.proj.bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.8.ffn.fc1.weight: copying a param with shape torch.Size([7680, 1920]) from checkpoint, the shape in current model is torch.Size([4096, 1024]).
size mismatch for blocks.8.ffn.fc1.bias: copying a param with shape torch.Size([7680]) from checkpoint, the shape in current model is torch.Size([4096]).
size mismatch for blocks.8.ffn.fc2.weight: copying a param with shape torch.Size([1920, 7680]) from checkpoint, the shape in current model is torch.Size([1024, 4096]).
size mismatch for blocks.8.ffn.fc2.bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.8.ada_lin.1.weight: copying a param with shape torch.Size([11520, 1920]) from checkpoint, the shape in current model is torch.Size([6144, 1024]).
size mismatch for blocks.8.ada_lin.1.bias: copying a param with shape torch.Size([11520]) from checkpoint, the shape in current model is torch.Size([6144]).
size mismatch for blocks.9.attn.scale_mul_1H11: copying a param with shape torch.Size([1, 30, 1, 1]) from checkpoint, the shape in current model is torch.Size([1, 16, 1, 1]).
size mismatch for blocks.9.attn.q_bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.9.attn.v_bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.9.attn.zero_k_bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.9.attn.mat_qkv.weight: copying a param with shape torch.Size([5760, 1920]) from checkpoint, the shape in current model is torch.Size([3072, 1024]).
size mismatch for blocks.9.attn.proj.weight: copying a param with shape torch.Size([1920, 1920]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for blocks.9.attn.proj.bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.9.ffn.fc1.weight: copying a param with shape torch.Size([7680, 1920]) from checkpoint, the shape in current model is torch.Size([4096, 1024]).
size mismatch for blocks.9.ffn.fc1.bias: copying a param with shape torch.Size([7680]) from checkpoint, the shape in current model is torch.Size([4096]).
size mismatch for blocks.9.ffn.fc2.weight: copying a param with shape torch.Size([1920, 7680]) from checkpoint, the shape in current model is torch.Size([1024, 4096]).
size mismatch for blocks.9.ffn.fc2.bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.9.ada_lin.1.weight: copying a param with shape torch.Size([11520, 1920]) from checkpoint, the shape in current model is torch.Size([6144, 1024]).
size mismatch for blocks.9.ada_lin.1.bias: copying a param with shape torch.Size([11520]) from checkpoint, the shape in current model is torch.Size([6144]).
size mismatch for blocks.10.attn.scale_mul_1H11: copying a param with shape torch.Size([1, 30, 1, 1]) from checkpoint, the shape in current model is torch.Size([1, 16, 1, 1]).
size mismatch for blocks.10.attn.q_bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.10.attn.v_bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.10.attn.zero_k_bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.10.attn.mat_qkv.weight: copying a param with shape torch.Size([5760, 1920]) from checkpoint, the shape in current model is torch.Size([3072, 1024]).
size mismatch for blocks.10.attn.proj.weight: copying a param with shape torch.Size([1920, 1920]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for blocks.10.attn.proj.bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.10.ffn.fc1.weight: copying a param with shape torch.Size([7680, 1920]) from checkpoint, the shape in current model is torch.Size([4096, 1024]).
size mismatch for blocks.10.ffn.fc1.bias: copying a param with shape torch.Size([7680]) from checkpoint, the shape in current model is torch.Size([4096]).
size mismatch for blocks.10.ffn.fc2.weight: copying a param with shape torch.Size([1920, 7680]) from checkpoint, the shape in current model is torch.Size([1024, 4096]).
size mismatch for blocks.10.ffn.fc2.bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.10.ada_lin.1.weight: copying a param with shape torch.Size([11520, 1920]) from checkpoint, the shape in current model is torch.Size([6144, 1024]).
size mismatch for blocks.10.ada_lin.1.bias: copying a param with shape torch.Size([11520]) from checkpoint, the shape in current model is torch.Size([6144]).
size mismatch for blocks.11.attn.scale_mul_1H11: copying a param with shape torch.Size([1, 30, 1, 1]) from checkpoint, the shape in current model is torch.Size([1, 16, 1, 1]).
size mismatch for blocks.11.attn.q_bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.11.attn.v_bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.11.attn.zero_k_bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.11.attn.mat_qkv.weight: copying a param with shape torch.Size([5760, 1920]) from checkpoint, the shape in current model is torch.Size([3072, 1024]).
size mismatch for blocks.11.attn.proj.weight: copying a param with shape torch.Size([1920, 1920]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for blocks.11.attn.proj.bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.11.ffn.fc1.weight: copying a param with shape torch.Size([7680, 1920]) from checkpoint, the shape in current model is torch.Size([4096, 1024]).
size mismatch for blocks.11.ffn.fc1.bias: copying a param with shape torch.Size([7680]) from checkpoint, the shape in current model is torch.Size([4096]).
size mismatch for blocks.11.ffn.fc2.weight: copying a param with shape torch.Size([1920, 7680]) from checkpoint, the shape in current model is torch.Size([1024, 4096]).
size mismatch for blocks.11.ffn.fc2.bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.11.ada_lin.1.weight: copying a param with shape torch.Size([11520, 1920]) from checkpoint, the shape in current model is torch.Size([6144, 1024]).
size mismatch for blocks.11.ada_lin.1.bias: copying a param with shape torch.Size([11520]) from checkpoint, the shape in current model is torch.Size([6144]).
size mismatch for blocks.12.attn.scale_mul_1H11: copying a param with shape torch.Size([1, 30, 1, 1]) from checkpoint, the shape in current model is torch.Size([1, 16, 1, 1]).
size mismatch for blocks.12.attn.q_bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.12.attn.v_bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.12.attn.zero_k_bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
(remaining size-mismatch lines for blocks.12 through blocks.15 truncated)
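Judging from the shapes in the traceback, the checkpoint was saved at depth 30 (width 1920, 30 attention heads, blocks 0-29), while the model built by the notebook is still at depth 16 (width 1024, 16 heads, blocks 0-15), so the depth change apparently did not reach the model constructor. The sketch below is a minimal diagnostic, assuming the downloaded file is var_d30.pth in the working directory and the usual VAR convention of width = depth * 64; it only inspects the checkpoint to confirm which depth it actually contains, without building any model:

```python
import torch

# Minimal diagnostic sketch (assumption: the checkpoint is var_d30.pth in the current
# directory and follows the VAR convention width = depth * 64, one block per depth level).
sd = torch.load('var_d30.pth', map_location='cpu')
if isinstance(sd, dict) and 'state_dict' in sd:   # unwrap if the weights are nested
    sd = sd['state_dict']

width = sd['class_emb.weight'].shape[1]                                    # 1920 for d30, 1024 for d16
n_blocks = len({k.split('.')[1] for k in sd if k.startswith('blocks.')})   # 30 for d30, 16 for d16
print(f'checkpoint width = {width}, blocks = {n_blocks}, implied depth = {width // 64}')
```

If this prints width = 1920 and blocks = 30, the checkpoint itself is fine; one common cause of the error is that the MODEL_DEPTH = 30 cell was edited after the model had already been constructed, so re-running the notebook from the top (or restarting the kernel) before the model-building cell is worth trying.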
size mismatch for blocks.10.attn.mat_qkv.weight: copying a param with shape torch.Size([5760, 1920]) from checkpoint, the shape in current model is torch.Size([3072, 1024]).
size mismatch for blocks.10.attn.proj.weight: copying a param with shape torch.Size([1920, 1920]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for blocks.10.attn.proj.bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.10.ffn.fc1.weight: copying a param with shape torch.Size([7680, 1920]) from checkpoint, the shape in current model is torch.Size([4096, 1024]).
size mismatch for blocks.10.ffn.fc1.bias: copying a param with shape torch.Size([7680]) from checkpoint, the shape in current model is torch.Size([4096]).
size mismatch for blocks.10.ffn.fc2.weight: copying a param with shape torch.Size([1920, 7680]) from checkpoint, the shape in current model is torch.Size([1024, 4096]).
size mismatch for blocks.10.ffn.fc2.bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.10.ada_lin.1.weight: copying a param with shape torch.Size([11520, 1920]) from checkpoint, the shape in current model is torch.Size([6144, 1024]).
size mismatch for blocks.10.ada_lin.1.bias: copying a param with shape torch.Size([11520]) from checkpoint, the shape in current model is torch.Size([6144]).
size mismatch for blocks.11.attn.scale_mul_1H11: copying a param with shape torch.Size([1, 30, 1, 1]) from checkpoint, the shape in current model is torch.Size([1, 16, 1, 1]).
size mismatch for blocks.11.attn.q_bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.11.attn.v_bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.11.attn.zero_k_bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.11.attn.mat_qkv.weight: copying a param with shape torch.Size([5760, 1920]) from checkpoint, the shape in current model is torch.Size([3072, 1024]).
size mismatch for blocks.11.attn.proj.weight: copying a param with shape torch.Size([1920, 1920]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for blocks.11.attn.proj.bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.11.ffn.fc1.weight: copying a param with shape torch.Size([7680, 1920]) from checkpoint, the shape in current model is torch.Size([4096, 1024]).
size mismatch for blocks.11.ffn.fc1.bias: copying a param with shape torch.Size([7680]) from checkpoint, the shape in current model is torch.Size([4096]).
size mismatch for blocks.11.ffn.fc2.weight: copying a param with shape torch.Size([1920, 7680]) from checkpoint, the shape in current model is torch.Size([1024, 4096]).
size mismatch for blocks.11.ffn.fc2.bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.11.ada_lin.1.weight: copying a param with shape torch.Size([11520, 1920]) from checkpoint, the shape in current model is torch.Size([6144, 1024]).
size mismatch for blocks.11.ada_lin.1.bias: copying a param with shape torch.Size([11520]) from checkpoint, the shape in current model is torch.Size([6144]).
size mismatch for blocks.12.attn.scale_mul_1H11: copying a param with shape torch.Size([1, 30, 1, 1]) from checkpoint, the shape in current model is torch.Size([1, 16, 1, 1]).
size mismatch for blocks.12.attn.q_bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.12.attn.v_bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.12.attn.zero_k_bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
(the error output is truncated here; the same size mismatches are reported for the remaining parameters of blocks 12–15)
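For reference, here is a minimal diagnostic sketch (not part of the original report) that uses only plain PyTorch calls to check which depth/width the downloaded checkpoint actually expects, so it can be compared with the model that was instantiated in the notebook. It assumes the released `var_d30.pth` is a plain `state_dict`, which the key names in the error above suggest.

```python
import torch

# Assumption: the released .pth file is a plain state_dict (the keys in the
# error message, e.g. "blocks.16.attn.q_bias", look like raw parameter names).
sd = torch.load('var_d30.pth', map_location='cpu')

# Number of transformer blocks in the checkpoint = model depth.
depth = 1 + max(int(k.split('.')[1]) for k in sd if k.startswith('blocks.'))

# Embedding width = length of any per-block bias, e.g. the attention output projection bias.
width = sd['blocks.0.attn.proj.bias'].shape[0]

print(f'checkpoint expects depth={depth}, width={width}')
# var_d30.pth should report depth=30, width=1920 (width = 64 * depth in VAR).
# The error above shows the in-memory model still has width 1024 = 64 * 16,
# i.e. it was built with depth 16, so load_state_dict() cannot copy the tensors.
```

If the printed numbers disagree with the model you built, the VAR model needs to be reconstructed with depth=30 before calling `load_state_dict` (in `demo_sample.ipynb` that means making sure the cell that builds the model is re-run, or the kernel restarted, after changing `MODEL_DEPTH` to 30).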