
Weight loading problem #136

Open
RealBar opened this issue Jan 27, 2025 · 3 comments

Comments

RealBar commented Jan 27, 2025

I changed the depth to 30:
MODEL_DEPTH = 30 # TODO: =====> please specify MODEL_DEPTH <=====
manually downloaded the var_d30.pth file, and running demo_sample.ipynb fails with:
RuntimeError: Error(s) in loading state_dict for VAR:
Unexpected key(s) in state_dict: "blocks.16.attn.scale_mul_1H11", "blocks.16.attn.q_bias", "blocks.16.attn.v_bias", "blocks.16.attn.zero_k_bias", "blocks.16.attn.mat_qkv.weight", "blocks.16.attn.proj.weight", "blocks.16.attn.proj.bias", "blocks.16.ffn.fc1.weight", "blocks.16.ffn.fc1.bias", "blocks.16.ffn.fc2.weight", "blocks.16.ffn.fc2.bias", "blocks.16.ada_lin.1.weight", "blocks.16.ada_lin.1.bias", [the same 13 keys repeated for blocks.17 through blocks.29].
size mismatch for pos_start: copying a param with shape torch.Size([1, 1, 1920]) from checkpoint, the shape in current model is torch.Size([1, 1, 1024]).
size mismatch for pos_1LC: copying a param with shape torch.Size([1, 680, 1920]) from checkpoint, the shape in current model is torch.Size([1, 680, 1024]).
size mismatch for word_embed.weight: copying a param with shape torch.Size([1920, 32]) from checkpoint, the shape in current model is torch.Size([1024, 32]).
size mismatch for word_embed.bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for class_emb.weight: copying a param with shape torch.Size([1001, 1920]) from checkpoint, the shape in current model is torch.Size([1001, 1024]).
size mismatch for lvl_embed.weight: copying a param with shape torch.Size([10, 1920]) from checkpoint, the shape in current model is torch.Size([10, 1024]).
size mismatch for blocks.0.attn.scale_mul_1H11: copying a param with shape torch.Size([1, 30, 1, 1]) from checkpoint, the shape in current model is torch.Size([1, 16, 1, 1]).
size mismatch for blocks.0.attn.q_bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.0.attn.v_bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.0.attn.zero_k_bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.0.attn.mat_qkv.weight: copying a param with shape torch.Size([5760, 1920]) from checkpoint, the shape in current model is torch.Size([3072, 1024]).
size mismatch for blocks.0.attn.proj.weight: copying a param with shape torch.Size([1920, 1920]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for blocks.0.attn.proj.bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.0.ffn.fc1.weight: copying a param with shape torch.Size([7680, 1920]) from checkpoint, the shape in current model is torch.Size([4096, 1024]).
size mismatch for blocks.0.ffn.fc1.bias: copying a param with shape torch.Size([7680]) from checkpoint, the shape in current model is torch.Size([4096]).
size mismatch for blocks.0.ffn.fc2.weight: copying a param with shape torch.Size([1920, 7680]) from checkpoint, the shape in current model is torch.Size([1024, 4096]).
size mismatch for blocks.0.ffn.fc2.bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for blocks.0.ada_lin.1.weight: copying a param with shape torch.Size([11520, 1920]) from checkpoint, the shape in current model is torch.Size([6144, 1024]).
size mismatch for blocks.0.ada_lin.1.bias: copying a param with shape torch.Size([11520]) from checkpoint, the shape in current model is torch.Size([6144]).
[The same 13 size-mismatch lines repeat for blocks.1 onward; the pasted log cuts off partway through blocks.12.]

@pyshen-watson

I ran into the same problem.

@melodyincopenhagen

Has this been resolved?

@pyshen-watson

I solved it on my end; it seems to have been a shared_aln mismatch when loading the weights.
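A quick way to confirm a flag mismatch like this before loading is to diff the checkpoint keys against the freshly built model's keys, which is essentially what `load_state_dict` reports. A self-contained sketch with toy state-dict keys (the `shared_ada_lin` name and the per-block `blocks.N.ada_lin` layout are assumptions about how the shared_aln flag changes the module tree, not verified against the repo):

```python
# Compute the missing/unexpected key sets that load_state_dict would report.
def key_diff(model_keys, ckpt_keys):
    model_keys, ckpt_keys = set(model_keys), set(ckpt_keys)
    missing = sorted(model_keys - ckpt_keys)      # in the model, not the checkpoint
    unexpected = sorted(ckpt_keys - model_keys)   # in the checkpoint, not the model
    return missing, unexpected

# HYPOTHETICAL example: a model built with shared_aln=True (one shared AdaLN)
# versus a checkpoint saved with shared_aln=False (per-block ada_lin params).
model = ["shared_ada_lin.1.weight", "blocks.0.attn.proj.weight"]
ckpt = ["blocks.0.ada_lin.1.weight", "blocks.0.attn.proj.weight"]
missing, unexpected = key_diff(model, ckpt)
print("missing:", missing)        # -> ['shared_ada_lin.1.weight']
print("unexpected:", unexpected)  # -> ['blocks.0.ada_lin.1.weight']
```

If the diff shows `ada_lin` keys on both sides of the mismatch, flipping the shared_aln flag when constructing the model (to match how the checkpoint was trained) should resolve it.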
