Changed the depth to 30:
MODEL_DEPTH = 30 # TODO: =====> please specify MODEL_DEPTH <=====
Manually downloaded the var_d30.pth checkpoint file.
Running demo_sample.ipynb then fails with:
RuntimeError: Error(s) in loading state_dict for VAR:
Unexpected key(s) in state_dict: "blocks.16.attn.scale_mul_1H11", "blocks.16.attn.q_bias", "blocks.16.attn.v_bias", "blocks.16.attn.zero_k_bias", "blocks.16.attn.mat_qkv.weight", "blocks.16.attn.proj.weight", "blocks.16.attn.proj.bias", "blocks.16.ffn.fc1.weight", "blocks.16.ffn.fc1.bias", "blocks.16.ffn.fc2.weight", "blocks.16.ffn.fc2.bias", "blocks.16.ada_lin.1.weight", "blocks.16.ada_lin.1.bias", "blocks.17.attn.scale_mul_1H11", "blocks.17.attn.q_bias", "blocks.17.attn.v_bias", "blocks.17.attn.zero_k_bias", "blocks.17.attn.mat_qkv.weight", "blocks.17.attn.proj.weight", "blocks.17.attn.proj.bias", "blocks.17.ffn.fc1.weight", "blocks.17.ffn.fc1.bias", "blocks.17.ffn.fc2.weight", "blocks.17.ffn.fc2.bias", "blocks.17.ada_lin.1.weight", "blocks.17.ada_lin.1.bias", "blocks.18.attn.scale_mul_1H11", "blocks.18.attn.q_bias", "blocks.18.attn.v_bias", "blocks.18.attn.zero_k_bias", "blocks.18.attn.mat_qkv.weight", "blocks.18.attn.proj.weight", "blocks.18.attn.proj.bias", "blocks.18.ffn.fc1.weight", "blocks.18.ffn.fc1.bias", "blocks.18.ffn.fc2.weight", "blocks.18.ffn.fc2.bias", "blocks.18.ada_lin.1.weight", "blocks.18.ada_lin.1.bias", "blocks.19.attn.scale_mul_1H11", "blocks.19.attn.q_bias", "blocks.19.attn.v_bias", "blocks.19.attn.zero_k_bias", "blocks.19.attn.mat_qkv.weight", "blocks.19.attn.proj.weight", "blocks.19.attn.proj.bias", "blocks.19.ffn.fc1.weight", "blocks.19.ffn.fc1.bias", "blocks.19.ffn.fc2.weight", "blocks.19.ffn.fc2.bias", "blocks.19.ada_lin.1.weight", "blocks.19.ada_lin.1.bias", "blocks.20.attn.scale_mul_1H11", "blocks.20.attn.q_bias", "blocks.20.attn.v_bias", "blocks.20.attn.zero_k_bias", "blocks.20.attn.mat_qkv.weight", "blocks.20.attn.proj.weight", "blocks.20.attn.proj.bias", "blocks.20.ffn.fc1.weight", "blocks.20.ffn.fc1.bias", "blocks.20.ffn.fc2.weight", "blocks.20.ffn.fc2.bias", "blocks.20.ada_lin.1.weight", "blocks.20.ada_lin.1.bias", "blocks.21.attn.scale_mul_1H11", "blocks.21.attn.q_bias", "blocks.21.attn.v_bias", "blocks.21.attn.zero_k_bias", "blocks.21.attn.mat_qkv.weight", "blocks.21.attn.proj.weight", "blocks.21.attn.proj.bias", "blocks.21.ffn.fc1.weight", "blocks.21.ffn.fc1.bias", "blocks.21.ffn.fc2.weight", "blocks.21.ffn.fc2.bias", "blocks.21.ada_lin.1.weight", "blocks.21.ada_lin.1.bias", "blocks.22.attn.scale_mul_1H11", "blocks.22.attn.q_bias", "blocks.22.attn.v_bias", "blocks.22.attn.zero_k_bias", "blocks.22.attn.mat_qkv.weight", "blocks.22.attn.proj.weight", "blocks.22.attn.proj.bias", "blocks.22.ffn.fc1.weight", "blocks.22.ffn.fc1.bias", "blocks.22.ffn.fc2.weight", "blocks.22.ffn.fc2.bias", "blocks.22.ada_lin.1.weight", "blocks.22.ada_lin.1.bias", "blocks.23.attn.scale_mul_1H11", "blocks.23.attn.q_bias", "blocks.23.attn.v_bias", "blocks.23.attn.zero_k_bias", "blocks.23.attn.mat_qkv.weight", "blocks.23.attn.proj.weight", "blocks.23.attn.proj.bias", "blocks.23.ffn.fc1.weight", "blocks.23.ffn.fc1.bias", "blocks.23.ffn.fc2.weight", "blocks.23.ffn.fc2.bias", "blocks.23.ada_lin.1.weight", "blocks.23.ada_lin.1.bias", "blocks.24.attn.scale_mul_1H11", "blocks.24.attn.q_bias", "blocks.24.attn.v_bias", "blocks.24.attn.zero_k_bias", "blocks.24.attn.mat_qkv.weight", "blocks.24.attn.proj.weight", "blocks.24.attn.proj.bias", "blocks.24.ffn.fc1.weight", "blocks.24.ffn.fc1.bias", "blocks.24.ffn.fc2.weight", "blocks.24.ffn.fc2.bias", "blocks.24.ada_lin.1.weight", "blocks.24.ada_lin.1.bias", "blocks.25.attn.scale_mul_1H11", "blocks.25.attn.q_bias", "blocks.25.attn.v_bias", "blocks.25.attn.zero_k_bias", "blocks.25.attn.mat_qkv.weight", "blocks.25.attn.proj.weight", 
"blocks.25.attn.proj.bias", "blocks.25.ffn.fc1.weight", "blocks.25.ffn.fc1.bias", "blocks.25.ffn.fc2.weight", "blocks.25.ffn.fc2.bias", "blocks.25.ada_lin.1.weight", "blocks.25.ada_lin.1.bias", "blocks.26.attn.scale_mul_1H11", "blocks.26.attn.q_bias", "blocks.26.attn.v_bias", "blocks.26.attn.zero_k_bias", "blocks.26.attn.mat_qkv.weight", "blocks.26.attn.proj.weight", "blocks.26.attn.proj.bias", "blocks.26.ffn.fc1.weight", "blocks.26.ffn.fc1.bias", "blocks.26.ffn.fc2.weight", "blocks.26.ffn.fc2.bias", "blocks.26.ada_lin.1.weight", "blocks.26.ada_lin.1.bias", "blocks.27.attn.scale_mul_1H11", "blocks.27.attn.q_bias", "blocks.27.attn.v_bias", "blocks.27.attn.zero_k_bias", "blocks.27.attn.mat_qkv.weight", "blocks.27.attn.proj.weight", "blocks.27.attn.proj.bias", "blocks.27.ffn.fc1.weight", "blocks.27.ffn.fc1.bias", "blocks.27.ffn.fc2.weight", "blocks.27.ffn.fc2.bias", "blocks.27.ada_lin.1.weight", "blocks.27.ada_lin.1.bias", "blocks.28.attn.scale_mul_1H11", "blocks.28.attn.q_bias", "blocks.28.attn.v_bias", "blocks.28.attn.zero_k_bias", "blocks.28.attn.mat_qkv.weight", "blocks.28.attn.proj.weight", "blocks.28.attn.proj.bias", "blocks.28.ffn.fc1.weight", "blocks.28.ffn.fc1.bias", "blocks.28.ffn.fc2.weight", "blocks.28.ffn.fc2.bias", "blocks.28.ada_lin.1.weight", "blocks.28.ada_lin.1.bias", "blocks.29.attn.scale_mul_1H11", "blocks.29.attn.q_bias", "blocks.29.attn.v_bias", "blocks.29.attn.zero_k_bias", "blocks.29.attn.mat_qkv.weight", "blocks.29.attn.proj.weight", "blocks.29.attn.proj.bias", "blocks.29.ffn.fc1.weight", "blocks.29.ffn.fc1.bias", "blocks.29.ffn.fc2.weight", "blocks.29.ffn.fc2.bias", "blocks.29.ada_lin.1.weight", "blocks.29.ada_lin.1.bias".
size mismatch for pos_start: copying a param with shape torch.Size([1, 1, 1920]) from checkpoint, the shape in current model is torch.Size([1, 1, 1024]).
size mismatch for pos_1LC: copying a param with shape torch.Size([1, 680, 1920]) from checkpoint, the shape in current model is torch.Size([1, 680, 1024]).
size mismatch for word_embed.weight: copying a param with shape torch.Size([1920, 32]) from checkpoint, the shape in current model is torch.Size([1024, 32]).
size mismatch for word_embed.bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for class_emb.weight: copying a param with shape torch.Size([1001, 1920]) from checkpoint, the shape in current model is torch.Size([1001, 1024]).
size mismatch for lvl_embed.weight: copying a param with shape torch.Size([10, 1920]) from checkpoint, the shape in current model is torch.Size([10, 1024]).
size mismatch for blocks.0.attn.scale_mul_1H11: copying a param with shape torch.Size([1, 30, 1, 1]) from checkpoint, the shape in current model is torch.Size([1, 16, 1, 1]).
size mismatch for blocks.0.attn.q_bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.0.attn.v_bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.0.attn.zero_k_bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.0.attn.mat_qkv.weight: copying a param with shape torch.Size([5760, 1920]) from checkpoint, the shape in current model is torch.Size([3072, 1024]).
size mismatch for blocks.0.attn.proj.weight: copying a param with shape torch.Size([1920, 1920]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for blocks.0.attn.proj.bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.0.ffn.fc1.weight: copying a param with shape torch.Size([7680, 1920]) from checkpoint, the shape in current model is torch.Size([4096, 1024]).
size mismatch for blocks.0.ffn.fc1.bias: copying a param with shape torch.Size([7680]) from checkpoint, the shape in current model is torch.Size([4096]).
size mismatch for blocks.0.ffn.fc2.weight: copying a param with shape torch.Size([1920, 7680]) from checkpoint, the shape in current model is torch.Size([1024, 4096]).
size mismatch for blocks.0.ffn.fc2.bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.0.ada_lin.1.weight: copying a param with shape torch.Size([11520, 1920]) from checkpoint, the shape in current model is torch.Size([6144, 1024]).
size mismatch for blocks.0.ada_lin.1.bias: copying a param with shape torch.Size([11520]) from checkpoint, the shape in current model is torch.Size([6144]).
size mismatch for blocks.1.attn.scale_mul_1H11: copying a param with shape torch.Size([1, 30, 1, 1]) from checkpoint, the shape in current model is torch.Size([1, 16, 1, 1]).
size mismatch for blocks.1.attn.q_bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.1.attn.v_bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.1.attn.zero_k_bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.1.attn.mat_qkv.weight: copying a param with shape torch.Size([5760, 1920]) from checkpoint, the shape in current model is torch.Size([3072, 1024]).
size mismatch for blocks.1.attn.proj.weight: copying a param with shape torch.Size([1920, 1920]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for blocks.1.attn.proj.bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.1.ffn.fc1.weight: copying a param with shape torch.Size([7680, 1920]) from checkpoint, the shape in current model is torch.Size([4096, 1024]).
size mismatch for blocks.1.ffn.fc1.bias: copying a param with shape torch.Size([7680]) from checkpoint, the shape in current model is torch.Size([4096]).
size mismatch for blocks.1.ffn.fc2.weight: copying a param with shape torch.Size([1920, 7680]) from checkpoint, the shape in current model is torch.Size([1024, 4096]).
size mismatch for blocks.1.ffn.fc2.bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.1.ada_lin.1.weight: copying a param with shape torch.Size([11520, 1920]) from checkpoint, the shape in current model is torch.Size([6144, 1024]).
size mismatch for blocks.1.ada_lin.1.bias: copying a param with shape torch.Size([11520]) from checkpoint, the shape in current model is torch.Size([6144]).
size mismatch for blocks.2.attn.scale_mul_1H11: copying a param with shape torch.Size([1, 30, 1, 1]) from checkpoint, the shape in current model is torch.Size([1, 16, 1, 1]).
size mismatch for blocks.2.attn.q_bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.2.attn.v_bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.2.attn.zero_k_bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.2.attn.mat_qkv.weight: copying a param with shape torch.Size([5760, 1920]) from checkpoint, the shape in current model is torch.Size([3072, 1024]).
size mismatch for blocks.2.attn.proj.weight: copying a param with shape torch.Size([1920, 1920]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for blocks.2.attn.proj.bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.2.ffn.fc1.weight: copying a param with shape torch.Size([7680, 1920]) from checkpoint, the shape in current model is torch.Size([4096, 1024]).
size mismatch for blocks.2.ffn.fc1.bias: copying a param with shape torch.Size([7680]) from checkpoint, the shape in current model is torch.Size([4096]).
size mismatch for blocks.2.ffn.fc2.weight: copying a param with shape torch.Size([1920, 7680]) from checkpoint, the shape in current model is torch.Size([1024, 4096]).
size mismatch for blocks.2.ffn.fc2.bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.2.ada_lin.1.weight: copying a param with shape torch.Size([11520, 1920]) from checkpoint, the shape in current model is torch.Size([6144, 1024]).
size mismatch for blocks.2.ada_lin.1.bias: copying a param with shape torch.Size([11520]) from checkpoint, the shape in current model is torch.Size([6144]).
size mismatch for blocks.3.attn.scale_mul_1H11: copying a param with shape torch.Size([1, 30, 1, 1]) from checkpoint, the shape in current model is torch.Size([1, 16, 1, 1]).
size mismatch for blocks.3.attn.q_bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.3.attn.v_bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.3.attn.zero_k_bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.3.attn.mat_qkv.weight: copying a param with shape torch.Size([5760, 1920]) from checkpoint, the shape in current model is torch.Size([3072, 1024]).
size mismatch for blocks.3.attn.proj.weight: copying a param with shape torch.Size([1920, 1920]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for blocks.3.attn.proj.bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.3.ffn.fc1.weight: copying a param with shape torch.Size([7680, 1920]) from checkpoint, the shape in current model is torch.Size([4096, 1024]).
size mismatch for blocks.3.ffn.fc1.bias: copying a param with shape torch.Size([7680]) from checkpoint, the shape in current model is torch.Size([4096]).
size mismatch for blocks.3.ffn.fc2.weight: copying a param with shape torch.Size([1920, 7680]) from checkpoint, the shape in current model is torch.Size([1024, 4096]).
size mismatch for blocks.3.ffn.fc2.bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.3.ada_lin.1.weight: copying a param with shape torch.Size([11520, 1920]) from checkpoint, the shape in current model is torch.Size([6144, 1024]).
size mismatch for blocks.3.ada_lin.1.bias: copying a param with shape torch.Size([11520]) from checkpoint, the shape in current model is torch.Size([6144]).
size mismatch for blocks.4.attn.scale_mul_1H11: copying a param with shape torch.Size([1, 30, 1, 1]) from checkpoint, the shape in current model is torch.Size([1, 16, 1, 1]).
size mismatch for blocks.4.attn.q_bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.4.attn.v_bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.4.attn.zero_k_bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.4.attn.mat_qkv.weight: copying a param with shape torch.Size([5760, 1920]) from checkpoint, the shape in current model is torch.Size([3072, 1024]).
size mismatch for blocks.4.attn.proj.weight: copying a param with shape torch.Size([1920, 1920]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for blocks.4.attn.proj.bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.4.ffn.fc1.weight: copying a param with shape torch.Size([7680, 1920]) from checkpoint, the shape in current model is torch.Size([4096, 1024]).
size mismatch for blocks.4.ffn.fc1.bias: copying a param with shape torch.Size([7680]) from checkpoint, the shape in current model is torch.Size([4096]).
size mismatch for blocks.4.ffn.fc2.weight: copying a param with shape torch.Size([1920, 7680]) from checkpoint, the shape in current model is torch.Size([1024, 4096]).
size mismatch for blocks.4.ffn.fc2.bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.4.ada_lin.1.weight: copying a param with shape torch.Size([11520, 1920]) from checkpoint, the shape in current model is torch.Size([6144, 1024]).
size mismatch for blocks.4.ada_lin.1.bias: copying a param with shape torch.Size([11520]) from checkpoint, the shape in current model is torch.Size([6144]).
size mismatch for blocks.5.attn.scale_mul_1H11: copying a param with shape torch.Size([1, 30, 1, 1]) from checkpoint, the shape in current model is torch.Size([1, 16, 1, 1]).
size mismatch for blocks.5.attn.q_bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.5.attn.v_bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.5.attn.zero_k_bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.5.attn.mat_qkv.weight: copying a param with shape torch.Size([5760, 1920]) from checkpoint, the shape in current model is torch.Size([3072, 1024]).
size mismatch for blocks.5.attn.proj.weight: copying a param with shape torch.Size([1920, 1920]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for blocks.5.attn.proj.bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.5.ffn.fc1.weight: copying a param with shape torch.Size([7680, 1920]) from checkpoint, the shape in current model is torch.Size([4096, 1024]).
size mismatch for blocks.5.ffn.fc1.bias: copying a param with shape torch.Size([7680]) from checkpoint, the shape in current model is torch.Size([4096]).
size mismatch for blocks.5.ffn.fc2.weight: copying a param with shape torch.Size([1920, 7680]) from checkpoint, the shape in current model is torch.Size([1024, 4096]).
size mismatch for blocks.5.ffn.fc2.bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.5.ada_lin.1.weight: copying a param with shape torch.Size([11520, 1920]) from checkpoint, the shape in current model is torch.Size([6144, 1024]).
size mismatch for blocks.5.ada_lin.1.bias: copying a param with shape torch.Size([11520]) from checkpoint, the shape in current model is torch.Size([6144]).
size mismatch for blocks.6.attn.scale_mul_1H11: copying a param with shape torch.Size([1, 30, 1, 1]) from checkpoint, the shape in current model is torch.Size([1, 16, 1, 1]).
size mismatch for blocks.6.attn.q_bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.6.attn.v_bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.6.attn.zero_k_bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.6.attn.mat_qkv.weight: copying a param with shape torch.Size([5760, 1920]) from checkpoint, the shape in current model is torch.Size([3072, 1024]).
size mismatch for blocks.6.attn.proj.weight: copying a param with shape torch.Size([1920, 1920]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for blocks.6.attn.proj.bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.6.ffn.fc1.weight: copying a param with shape torch.Size([7680, 1920]) from checkpoint, the shape in current model is torch.Size([4096, 1024]).
size mismatch for blocks.6.ffn.fc1.bias: copying a param with shape torch.Size([7680]) from checkpoint, the shape in current model is torch.Size([4096]).
size mismatch for blocks.6.ffn.fc2.weight: copying a param with shape torch.Size([1920, 7680]) from checkpoint, the shape in current model is torch.Size([1024, 4096]).
size mismatch for blocks.6.ffn.fc2.bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.6.ada_lin.1.weight: copying a param with shape torch.Size([11520, 1920]) from checkpoint, the shape in current model is torch.Size([6144, 1024]).
size mismatch for blocks.6.ada_lin.1.bias: copying a param with shape torch.Size([11520]) from checkpoint, the shape in current model is torch.Size([6144]).
size mismatch for blocks.7.attn.scale_mul_1H11: copying a param with shape torch.Size([1, 30, 1, 1]) from checkpoint, the shape in current model is torch.Size([1, 16, 1, 1]).
size mismatch for blocks.7.attn.q_bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.7.attn.v_bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.7.attn.zero_k_bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.7.attn.mat_qkv.weight: copying a param with shape torch.Size([5760, 1920]) from checkpoint, the shape in current model is torch.Size([3072, 1024]).
size mismatch for blocks.7.attn.proj.weight: copying a param with shape torch.Size([1920, 1920]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for blocks.7.attn.proj.bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.7.ffn.fc1.weight: copying a param with shape torch.Size([7680, 1920]) from checkpoint, the shape in current model is torch.Size([4096, 1024]).
size mismatch for blocks.7.ffn.fc1.bias: copying a param with shape torch.Size([7680]) from checkpoint, the shape in current model is torch.Size([4096]).
size mismatch for blocks.7.ffn.fc2.weight: copying a param with shape torch.Size([1920, 7680]) from checkpoint, the shape in current model is torch.Size([1024, 4096]).
size mismatch for blocks.7.ffn.fc2.bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.7.ada_lin.1.weight: copying a param with shape torch.Size([11520, 1920]) from checkpoint, the shape in current model is torch.Size([6144, 1024]).
size mismatch for blocks.7.ada_lin.1.bias: copying a param with shape torch.Size([11520]) from checkpoint, the shape in current model is torch.Size([6144]).
size mismatch for blocks.8.attn.scale_mul_1H11: copying a param with shape torch.Size([1, 30, 1, 1]) from checkpoint, the shape in current model is torch.Size([1, 16, 1, 1]).
size mismatch for blocks.8.attn.q_bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.8.attn.v_bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.8.attn.zero_k_bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.8.attn.mat_qkv.weight: copying a param with shape torch.Size([5760, 1920]) from checkpoint, the shape in current model is torch.Size([3072, 1024]).
size mismatch for blocks.8.attn.proj.weight: copying a param with shape torch.Size([1920, 1920]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for blocks.8.attn.proj.bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.8.ffn.fc1.weight: copying a param with shape torch.Size([7680, 1920]) from checkpoint, the shape in current model is torch.Size([4096, 1024]).
size mismatch for blocks.8.ffn.fc1.bias: copying a param with shape torch.Size([7680]) from checkpoint, the shape in current model is torch.Size([4096]).
size mismatch for blocks.8.ffn.fc2.weight: copying a param with shape torch.Size([1920, 7680]) from checkpoint, the shape in current model is torch.Size([1024, 4096]).
size mismatch for blocks.8.ffn.fc2.bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.8.ada_lin.1.weight: copying a param with shape torch.Size([11520, 1920]) from checkpoint, the shape in current model is torch.Size([6144, 1024]).
size mismatch for blocks.8.ada_lin.1.bias: copying a param with shape torch.Size([11520]) from checkpoint, the shape in current model is torch.Size([6144]).
size mismatch for blocks.9.attn.scale_mul_1H11: copying a param with shape torch.Size([1, 30, 1, 1]) from checkpoint, the shape in current model is torch.Size([1, 16, 1, 1]).
size mismatch for blocks.9.attn.q_bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.9.attn.v_bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.9.attn.zero_k_bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.9.attn.mat_qkv.weight: copying a param with shape torch.Size([5760, 1920]) from checkpoint, the shape in current model is torch.Size([3072, 1024]).
size mismatch for blocks.9.attn.proj.weight: copying a param with shape torch.Size([1920, 1920]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for blocks.9.attn.proj.bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.9.ffn.fc1.weight: copying a param with shape torch.Size([7680, 1920]) from checkpoint, the shape in current model is torch.Size([4096, 1024]).
size mismatch for blocks.9.ffn.fc1.bias: copying a param with shape torch.Size([7680]) from checkpoint, the shape in current model is torch.Size([4096]).
size mismatch for blocks.9.ffn.fc2.weight: copying a param with shape torch.Size([1920, 7680]) from checkpoint, the shape in current model is torch.Size([1024, 4096]).
size mismatch for blocks.9.ffn.fc2.bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.9.ada_lin.1.weight: copying a param with shape torch.Size([11520, 1920]) from checkpoint, the shape in current model is torch.Size([6144, 1024]).
size mismatch for blocks.9.ada_lin.1.bias: copying a param with shape torch.Size([11520]) from checkpoint, the shape in current model is torch.Size([6144]).
size mismatch for blocks.10.attn.scale_mul_1H11: copying a param with shape torch.Size([1, 30, 1, 1]) from checkpoint, the shape in current model is torch.Size([1, 16, 1, 1]).
size mismatch for blocks.10.attn.q_bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.10.attn.v_bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.10.attn.zero_k_bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.10.attn.mat_qkv.weight: copying a param with shape torch.Size([5760, 1920]) from checkpoint, the shape in current model is torch.Size([3072, 1024]).
size mismatch for blocks.10.attn.proj.weight: copying a param with shape torch.Size([1920, 1920]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for blocks.10.attn.proj.bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.10.ffn.fc1.weight: copying a param with shape torch.Size([7680, 1920]) from checkpoint, the shape in current model is torch.Size([4096, 1024]).
size mismatch for blocks.10.ffn.fc1.bias: copying a param with shape torch.Size([7680]) from checkpoint, the shape in current model is torch.Size([4096]).
size mismatch for blocks.10.ffn.fc2.weight: copying a param with shape torch.Size([1920, 7680]) from checkpoint, the shape in current model is torch.Size([1024, 4096]).
size mismatch for blocks.10.ffn.fc2.bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.10.ada_lin.1.weight: copying a param with shape torch.Size([11520, 1920]) from checkpoint, the shape in current model is torch.Size([6144, 1024]).
size mismatch for blocks.10.ada_lin.1.bias: copying a param with shape torch.Size([11520]) from checkpoint, the shape in current model is torch.Size([6144]).
size mismatch for blocks.11.attn.scale_mul_1H11: copying a param with shape torch.Size([1, 30, 1, 1]) from checkpoint, the shape in current model is torch.Size([1, 16, 1, 1]).
size mismatch for blocks.11.attn.q_bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.11.attn.v_bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.11.attn.zero_k_bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.11.attn.mat_qkv.weight: copying a param with shape torch.Size([5760, 1920]) from checkpoint, the shape in current model is torch.Size([3072, 1024]).
size mismatch for blocks.11.attn.proj.weight: copying a param with shape torch.Size([1920, 1920]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for blocks.11.attn.proj.bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.11.ffn.fc1.weight: copying a param with shape torch.Size([7680, 1920]) from checkpoint, the shape in current model is torch.Size([4096, 1024]).
size mismatch for blocks.11.ffn.fc1.bias: copying a param with shape torch.Size([7680]) from checkpoint, the shape in current model is torch.Size([4096]).
size mismatch for blocks.11.ffn.fc2.weight: copying a param with shape torch.Size([1920, 7680]) from checkpoint, the shape in current model is torch.Size([1024, 4096]).
size mismatch for blocks.11.ffn.fc2.bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.11.ada_lin.1.weight: copying a param with shape torch.Size([11520, 1920]) from checkpoint, the shape in current model is torch.Size([6144, 1024]).
size mismatch for blocks.11.ada_lin.1.bias: copying a param with shape torch.Size([11520]) from checkpoint, the shape in current model is torch.Size([6144]).
size mismatch for blocks.12.attn.scale_mul_1H11: copying a param with shape torch.Size([1, 30, 1, 1]) from checkpoint, the shape in current model is torch.Size([1, 16, 1, 1]).
size mismatch for blocks.12.attn.q_bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.12.attn.v_bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.12.attn.zero_k_bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
(remaining size-mismatch lines for blocks.12 through blocks.15 truncated)
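Judging from the shapes in the traceback, the checkpoint was saved at depth 30 (width 1920, 30 attention heads, blocks 0-29), while the model built by the notebook is still at depth 16 (width 1024, 16 heads, blocks 0-15), so the depth change apparently did not reach the model constructor. The sketch below is a minimal diagnostic, assuming the downloaded file is var_d30.pth in the working directory and the usual VAR convention of width = depth * 64; it only inspects the checkpoint to confirm which depth it actually contains, without building any model:

```python
import torch

# Minimal diagnostic sketch (assumption: the checkpoint is var_d30.pth in the current
# directory and follows the VAR convention width = depth * 64, one block per depth level).
sd = torch.load('var_d30.pth', map_location='cpu')
if isinstance(sd, dict) and 'state_dict' in sd:   # unwrap if the weights are nested
    sd = sd['state_dict']

width = sd['class_emb.weight'].shape[1]                                    # 1920 for d30, 1024 for d16
n_blocks = len({k.split('.')[1] for k in sd if k.startswith('blocks.')})   # 30 for d30, 16 for d16
print(f'checkpoint width = {width}, blocks = {n_blocks}, implied depth = {width // 64}')
```

If this prints width = 1920 and blocks = 30, the checkpoint itself is fine; one common cause of the error is that the MODEL_DEPTH = 30 cell was edited after the model had already been constructed, so re-running the notebook from the top (or restarting the kernel) before the model-building cell is worth trying.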
size mismatch for blocks.10.attn.mat_qkv.weight: copying a param with shape torch.Size([5760, 1920]) from checkpoint, the shape in current model is torch.Size([3072, 1024]).
size mismatch for blocks.10.attn.proj.weight: copying a param with shape torch.Size([1920, 1920]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for blocks.10.attn.proj.bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.10.ffn.fc1.weight: copying a param with shape torch.Size([7680, 1920]) from checkpoint, the shape in current model is torch.Size([4096, 1024]).
size mismatch for blocks.10.ffn.fc1.bias: copying a param with shape torch.Size([7680]) from checkpoint, the shape in current model is torch.Size([4096]).
size mismatch for blocks.10.ffn.fc2.weight: copying a param with shape torch.Size([1920, 7680]) from checkpoint, the shape in current model is torch.Size([1024, 4096]).
size mismatch for blocks.10.ffn.fc2.bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.10.ada_lin.1.weight: copying a param with shape torch.Size([11520, 1920]) from checkpoint, the shape in current model is torch.Size([6144, 1024]).
size mismatch for blocks.10.ada_lin.1.bias: copying a param with shape torch.Size([11520]) from checkpoint, the shape in current model is torch.Size([6144]).
size mismatch for blocks.11.attn.scale_mul_1H11: copying a param with shape torch.Size([1, 30, 1, 1]) from checkpoint, the shape in current model is torch.Size([1, 16, 1, 1]).
size mismatch for blocks.11.attn.q_bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.11.attn.v_bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.11.attn.zero_k_bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.11.attn.mat_qkv.weight: copying a param with shape torch.Size([5760, 1920]) from checkpoint, the shape in current model is torch.Size([3072, 1024]).
size mismatch for blocks.11.attn.proj.weight: copying a param with shape torch.Size([1920, 1920]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for blocks.11.attn.proj.bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.11.ffn.fc1.weight: copying a param with shape torch.Size([7680, 1920]) from checkpoint, the shape in current model is torch.Size([4096, 1024]).
size mismatch for blocks.11.ffn.fc1.bias: copying a param with shape torch.Size([7680]) from checkpoint, the shape in current model is torch.Size([4096]).
size mismatch for blocks.11.ffn.fc2.weight: copying a param with shape torch.Size([1920, 7680]) from checkpoint, the shape in current model is torch.Size([1024, 4096]).
size mismatch for blocks.11.ffn.fc2.bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.11.ada_lin.1.weight: copying a param with shape torch.Size([11520, 1920]) from checkpoint, the shape in current model is torch.Size([6144, 1024]).
size mismatch for blocks.11.ada_lin.1.bias: copying a param with shape torch.Size([11520]) from checkpoint, the shape in current model is torch.Size([6144]).
size mismatch for blocks.12.attn.scale_mul_1H11: copying a param with shape torch.Size([1, 30, 1, 1]) from checkpoint, the shape in current model is torch.Size([1, 16, 1, 1]).
size mismatch for blocks.12.attn.q_bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.12.attn.v_bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.12.attn.zero_k_bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
(the error output is truncated here; the same size mismatches are reported for the remaining parameters of blocks 12–15)
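For reference, here is a minimal diagnostic sketch (not part of the original report) that uses only plain PyTorch calls to check which depth/width the downloaded checkpoint actually expects, so it can be compared with the model that was instantiated in the notebook. It assumes the released `var_d30.pth` is a plain `state_dict`, which the key names in the error above suggest.

```python
import torch

# Assumption: the released .pth file is a plain state_dict (the keys in the
# error message, e.g. "blocks.16.attn.q_bias", look like raw parameter names).
sd = torch.load('var_d30.pth', map_location='cpu')

# Number of transformer blocks in the checkpoint = model depth.
depth = 1 + max(int(k.split('.')[1]) for k in sd if k.startswith('blocks.'))

# Embedding width = length of any per-block bias, e.g. the attention output projection bias.
width = sd['blocks.0.attn.proj.bias'].shape[0]

print(f'checkpoint expects depth={depth}, width={width}')
# var_d30.pth should report depth=30, width=1920 (width = 64 * depth in VAR).
# The error above shows the in-memory model still has width 1024 = 64 * 16,
# i.e. it was built with depth 16, so load_state_dict() cannot copy the tensors.
```

If the printed numbers disagree with the model you built, the VAR model needs to be reconstructed with depth=30 before calling `load_state_dict` (in `demo_sample.ipynb` that means making sure the cell that builds the model is re-run, or the kernel restarted, after changing `MODEL_DEPTH` to 30).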