C++: Improve and promote cpp/overflow-buffer #18837
…so it will appear in security-extended.
PR Overview
This PR improves and promotes the "Call to memory access function may overflow buffer" query by fixing its handling of array expressions within offsetof and adjusting its severity and precision for the security-extended suite.
- Updated change note in cpp/ql/src documenting the security-extended promotion of cpp/overflow-buffer.
- Added a change note in cpp/ql/lib for fixing the getBufferSize predicate issue.
Reviewed Changes
File | Description |
---|---|
cpp/ql/src/change-notes/2025-02-20-overflow-buffer.md | Added change note to promote cpp/overflow-buffer to security-extended. |
cpp/ql/lib/change-notes/2025-02-20-getbuffersize.md | Documented the fix for getBufferSize misinterpreting array expressions. |
Copilot reviewed 8 out of 8 changed files in this pull request and generated no comments.
Did you look at the results flagged up by DCA and their quality?
I think this is ready to merge, but I'd really appreciate a second opinion on at least some of the real world results.
This is what I'm seeing:
…for arrays inside classes (though it sometimes fails, costing us TPs).
Thanks for taking a look.
I've just pushed a change that fixes this FP result in PHP (amongst others), at the cost of losing some TPs as well. I think this is a good tradeoff given the query is a bit noisy, and one we could improve in future by refining the existing
The hashing ones are also eliminated by the commit I just pushed (the clear TP result remains).
This was caused by a variable with multiple types in the database. That probably shouldn't happen, but I've hardened the query against it. I should do another DCA run now with all the changes... [started]
exists(bufferVar.getUnspecifiedType().(ArrayType).getSize()) and
result =
  unique(int size | // more generous than .getSize() itself, when the array is a class field or similar.
    size = getSize(bufferExpr)
  |
    size
  ) and
With this we seem to be losing good results for cpp/static-buffer-overflow. This change is also somewhat unexpected to me, but maybe I do not completely understand what is going on here. What I thought I was seeing in the DCA data was cases where the value of sizeof(...) could be statically determined and, hence, would be in the AST, and we should be using that value but somehow weren't. But your comment on TinyUSB, "This was caused by a variable with multiple types in the database.", suggests that I might be misunderstanding what was going on?
I've fixed two issues since the first DCA run. One was that occasionally a variable has multiple types associated with it in the database, and those types have different sizes (and presumably there are different results on the sizeof side of the equation as well, but I didn't check). This led to a multitude of false positive results on the same lines (e.g. if the sizes and sizeofs in the database for a single variable were 4, 8 and 12, you'd see it comparing every combination and getting query results where it compares 8 > 4, 12 > 4 and 12 > 8). I fixed this with the unique you see here, though in retrospect putting it on the sizeof side of the equation would probably have worked equally well. I don't think it matters and I can do both if you'd prefer.
The other issue I fixed was to do with writes that are intended to overwrite (typically zero) multiple fields of a struct. For example, the struct might be
struct foo {
    int a;
    int b;
    int c;
};
and we might write memset(&(foo.b), 0, sizeof(b) + sizeof(c)). There's some existing special logic for this in Buffer.qll for one of the cases, which looks for the outermost struct the member variable is in and calculates how much room there is in the following fields. The change I've made is to use the same logic for the size of array members, for example in:
struct foo {
    int a;
    int b[10];
    int c;
};
we would now allow memset(&(foo.b), 0, sizeof(int) * 11). This gets rid of a lot of very dubious query results, but also a few good ones, because the "special logic" I referred to above isn't particularly robust and sometimes doesn't produce an answer, in which case we don't produce a query result.
> we seem to be losing good results
That's an expected price of making a query less noisy (and thus promotable) while only investing a few days of development time - better to have the query enabled with fewer results than to have more results that nobody sees. But if you can see a quick fix that is more precise, I'll be happy to try it.
Are there any specific good results you're uncomfortable with losing?
Thanks a lot for the explanation!
> Are there any specific good results you're uncomfortable with losing?
I thought so, but looking at the 2nd DCA run again and diving into the lost SAMATE results, I don't think that anymore. It seems that all the results we lose are ones where we overwrite more than one member of a struct, which SAMATE apparently doesn't like, but for which there are valid use-cases as we observed.
So, I'm happy with this. It does seem we have some more test results that need to be updated (CPP language tests are currently failing in CI).
Do we need a change note for the lost cpp/static-buffer-overflow results, by the way?
Added [another] change note.
One of the tests is from the internal repo, so I'm moving it to here before accepting the differences. I think I'll also need to merge main (after the old copy of the test is deleted) before all tests actually pass here...
// we calculate the size based on the last field, to avoid including any padding after it
trueSize = max(Field f | f = c.getAField() | f.getOffsetInClass(c) + f.getUnspecifiedType().getSize()) and
result = trueSize - v.(Field).getOffsetInClass(c)
This seems new. Do we need another DCA experiment for this?
Might be best. Some of the results we lost were due to the formula including padding at the end of a struct / class, which it now doesn't, so we may get a few results back.
It looks like we now get some incorrect new results when unions are involved. For example, for cpp/static-buffer-overflow in systemd:
#define BTRFS_SUBVOL_NAME_MAX 4039
struct btrfs_ioctl_vol_args_v2 {
    __s64 fd;
    __u64 transid;
    __u64 flags;
    union {
        struct {
            __u64 size;
            struct btrfs_qgroup_inherit *qgroup_inherit;
        };
        __u64 unused[4];
    };
    union {
        char name[BTRFS_SUBVOL_NAME_MAX + 1];
        __u64 devid;
        __u64 subvolid;
    };
};
...
struct btrfs_ioctl_vol_args_v2 vol_args = {
    .flags = flags & BTRFS_SNAPSHOT_READ_ONLY ? BTRFS_SUBVOL_RDONLY : 0,
    .fd = old_fd,
};
...
strncpy(vol_args.name, subvolume, sizeof(vol_args.name)-1);
we get "Potential buffer-overflow: 'name' has size -32 not 4039.".
The new results on Kamailio similarly involve unions embedded in structs.
I'll take a look. If this goes any deeper I may need to cut back on the ambition of this PR.
> If this goes any deeper I may need to cut back on the ambition of this PR.
Makes sense. It's fine with me to drop this specific change if that's the fastest way to get this merged.
Fixed by replacing getAField() with getDeclaringType*().
Thanks for the fixes. I'm happy with this once DCA comes back showing that we no longer unexpectedly lose or gain results.
Hmm, I'm still seeing some cases with negative sizes in the latest DCA run. :(
Shall we just back out the changes from 1354beb (and the fix you did on top of that)?
…roduce negative numbers now.
My Friday brain appears to have come up with an "obvious" better solution, so we'll see how we do with that. The problem with backing out the changes is that, having spent time looking into the results, I'm not really sure any more that the query should be promoted without both improvements. So that leaves us with the
Improve and promote "Call to memory access function may overflow buffer" (cpp/overflow-buffer).
- … offsetof expressions were being misinterpreted as accesses.
- … security-extended (by increasing the severity to warning and setting the precision to medium).
- … high precision (code scanning suite) at some point, we will probably need to do something with these.