Changelog
version 2.3.2.031:
* Functions known to be used in Data Driven Security now emit an informational message suggesting the jayjacobs version of verisr (as the book suggests) to maintain compatibility.
version 2.3.2.030:
* Added lifecycle warnings to old verisr functions which may no longer work.
* NOTE: several functions in the Data Driven Security book are now marked deprecated.
version 2.3.2.029:
* Handle errors found by @CarrotPercepti1 (https://twitter.com/CarrotPercepti1/status/1586738912403374082)
* If the schema in use doesn't include plus.event_chain, don't include it. (was also causing DT warning)
* bypassed issue in rjson reading schema as a file by just passing it the URL if no schema is passed.
* Made the error for when something is in the incident but not the schema more clear.
* Clarified the error associated with clustering with schemas newer than 1.3.5.
* fixed an error in getenumCI2023(), related to a missing ci.method argument, that occurred on a fresh install but not on my install
* added plyr, dplyr, and skmeans to 'imports' because json2veris and getenumCI are the two main functions and the three are required for json2veris.
* removed RCurl since the rjson fix removed the need for it.
version 2.3.2.028:
* added an error to getenumCI2023 if input data has zero rows. The error that happens otherwise is obscure and hard to interpret.
version 2.3.2.027:
* fix to edge case in getenumCI2023
version 2.3.2.026:
* Updated json2veris to not be case sensitive for file extensions
version 2.3.2.025:
* started getenumCI2023
* Initial edit is to let it run on arrow datasets.
version 2.3.2.024:
* Added test_veris_ratio() fact checking function.
version 2.3.2.022:
* fixed getenumCI2022 to avoid a bug in infer
version 2.3.2.022:
* Updated json2veris to produce a consistent dataframe in the plus.event_chain column. This allows veris to be easily exported as parquet.
version 2.3.2.021:
* fixed bug in getenumCI2022
version 2.3.2.020:
* fixed bug in add_patterns()
* fixed bug in nameveris()
version 2.3.2.019:
* export pattern_1.3.4_to_1.3.5
version 2.3.2.018:
* Added models for classifying incidents into the 2021 dbir patterns.
version 2.3.2.017:
* fixed missing 'method' column in getenumCI2021() when na.rm=FALSE and method = "bootstrap"
* fixed issue with getenumCI2021() counting 'NA' in x but not in 'n'
version 2.3.2.016:
* Performance improvements in json2veris()
* NOTE: as the veris data gets longer with many columns, parquet files may become the preferred on-disk format.
version 2.3.2.015:
* performance improvements to getenumCI2021()
version 2.3.2.014:
* Removed 'patterns' column
* Added value_chain.Intermediate columns in json2veris (verisr issue #20)
* Added orgsize.Unknown (verisr issue #16)
* Improved error associated with no inputted files per verisr issue #12
version 2.3.2.013:
* Updated the MCMC model in getenumCI2021 (untested). This model removes the intercept + group effects in favor of just the groups.
This basically makes all enumerations independent. To make them dependent, each enum could be treated as an input to the model with an associated distribution outside of the plate. However, this would require feeding in all rows of data as measurements. The benefit is that it could potentially capture inter-enumeration interactions better.
Also, priors are now specified explicitly as a student_t with 3 DoF, a mean of 0, and a scale of 1.6 (based on brms defaults). Of note, this is a logit prior and so ranges from -inf to +inf, centered on 0. I could double check this against the book.
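To get a feel for what a student_t prior with 3 DoF and a 1.6 scale on the logit scale implies for proportions, the base R sketch below maps its central 95% interval back to the probability scale. It is an illustration only: it does not call brms or getenumCI2021, and the variable names are made up for this example.

```r
# Illustrative only: central 95% interval of a student_t(3, 0, 1.6) prior
# on the logit scale, mapped to probabilities with the inverse logit.
dof   <- 3
scale <- 1.6

# qt() gives quantiles of a standard t; multiply by the scale (location is 0).
logit_interval <- qt(c(0.025, 0.975), df = dof) * scale

# plogis() is the inverse logit in base R.
prob_interval <- plogis(logit_interval)
print(prob_interval)
```

Because the prior is centered on 0 in logit space, the resulting interval is symmetric around a proportion of 0.5.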
version 2.3.2.012:
* Added sanity check to ensure x <= n and fail otherwise.
version 2.3.2.011:
* Changing several lines to count master_id's instead of summing the column. This _will_ change counts.
version 2.3.2.010:
* Fixed a potential crashing bug in calculating n (would show up 1 instead of actual value) for multinomial enums
* Improved calculation of n for single-column enums
* However, noticed a bug where we count column sums rather than unique plus.master_ids. Since n is master_id based, this will cause issues. Marking areas to fix tomorrow.
version 2.3.2.009:
* moved jqr to a suggests to work around a temporary issue
version 2.3.2.008:
* fixed bug in flatten_verisr()
version 2.3.2.006:
* merged in json2veris() changes
version 2.3.2.9002:
* json2veris() now accepts joined, zipped data generated with vcdb_to_joined.py from the vcdb package. It has been tested with both non-zipped and zipped data collections and appears to work consistently.
version 2.3.2.9001:
* adding support for zipped JSON files
version 2.3.2.006:
* Fixed bug in getenumCI2021 associated with duplicate rows.
version 2.3.2.005:
* Updated getenumCI2021 to calculate 'n' based on unique plus.master_id rather than strict rows. This also replaces the assumption that each row is a unique breach with the more explicit assumption that each master_id is a unique breach.
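The difference between counting rows and counting unique plus.master_id values can be sketched in base R. This is a toy illustration, not the getenumCI2021 implementation; the data frame and the action.hacking column are hypothetical.

```r
# Hypothetical toy data: two rows share the same plus.master_id.
incidents <- data.frame(
  plus.master_id = c("a1", "a1", "b2", "c3"),
  action.hacking = c(TRUE, TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# Row-based counts treat every row as a separate breach.
n_rows <- nrow(incidents)
x_rows <- sum(incidents$action.hacking)

# master_id-based counts treat each unique plus.master_id as one breach.
n_ids <- length(unique(incidents$plus.master_id))
x_ids <- length(unique(incidents$plus.master_id[incidents$action.hacking]))
```

With duplicated master_ids, the two approaches give different values for both x and n, which is why this change can alter reported counts.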
version 2.3.2.004:
* Fixed bug in getenumCI2021 with ci.method="bootstrap"
version 2.3.2.003:
* Corrected an edge case in getenumCI2020 and getenumCI2021. (catch error when "NA" in single column enum w/ ci.method)
version 2.3.2.002:
* updated test_veris_time_stability() to accept greater/less (increasing/decreasing) directions as well
version 2.3.2.001:
* added draft getenumCI2021
* getenumCI2021 will change ci.method="bootstrap" to use the "infer" package. binom.bayes will be ci.method="bayes"
* getenumCI2021 will add ci.params as an argument to get the model or model params for the CI method (not sure exactly how this'll work)
* minor change to json2veris to populate 'attribute.confidentiality.data_disclosure.No' as 'TRUE' if no other confidentiality columns are true or filled in.
* minor fix to getenumCI2020. na.rm and unk had the opposite effect of the intended on producing the CIs if using mcmc.
* swapped out reshape2 reference for tidyr::pivot_wider reference.
version 2.3.1.0026:
* corrected bug introduced in previous change
version 2.3.1.0025:
* fixed an error handling issue in 'top' for getenumCI2020
version 2.3.1.0024:
* Added hypothesis testing functions:
  * test_veris_hypothsis()
  * test_veris_proportion()
  * test_veris_consistency()
  * test_veris_time_stability()
version 2.3.1.0023:
* Added more descriptive error for an edge case in getenumCI2020().
version 2.3.1.0022:
* Added 3 lines to catch an edge case in getenumCI2020().
version 2.3.1.0021:
* Fixed a bug in the fix.
version 2.3.1.0020:
* Fixed a bug in the fix.
version 2.3.1.0019:
* Fixed a bug in getenumCI2020() when using 'top=TRUE', 'short.names=TRUE' and a wild card enum (e.g. "action.*.variety"). This would effectively move counts from a top enum to 'Other'.
version 2.3.1.0018:
* Set getenumCI2019 up to accept ci.method = "bootstrap" to help with getenumCI2020 compatibility.
version 2.3.1.0017:
* added 'quietly' argument to getenumCI2020() for scripting.
* changed behavior when top < 1. Previously, would error. Now top < 1 displays warning message and sets top to 'NULL' effectively removing 'top'.
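The warn-and-reset behavior described for top < 1 could look roughly like the sketch below. normalize_top() is a hypothetical helper written for this illustration; it is not a function in verisr, and the warning text is made up.

```r
# Hypothetical helper illustrating the described behavior: a 'top' below 1
# produces a warning and is reset to NULL, which disables 'top' entirely.
normalize_top <- function(top) {
  if (!is.null(top) && top < 1) {
    warning("'top' must be 1 or greater; ignoring 'top'.")
    return(NULL)
  }
  top
}

# A valid value passes through unchanged; an invalid one becomes NULL.
kept    <- normalize_top(5)
dropped <- suppressWarnings(normalize_top(0))
```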
version 2.3.1.0016:
* Fixed bug related to some 'by' levels not meeting minimum sample level and some meeting it. Those that meet it will have method set to 'none' and upper/lower set to NA. (ci.method can always be set manually to ensure all rows have a confidence interval.)
version 2.3.1.0015:
* Fixed a bug in getenumCI2020() that addresses the hack to avoid the binom.bayes error from 0014. (All X being either 0 or n caused error.)
version 2.3.1.0014:
* mitigated an error in binom.bayes that manifests itself in getenumCI* when an end (0 or n) exists in x and there are more than 3 rows in the subchunk passed to binom.bayes.
* found an issue with regex control characters in the end of the enum chain. Mitigated it in getenumCI2019 and getenumCI2020, but it may exist elsewhere. It has recently become an issue due to the veris 1.3.4 enumerations "asset.cloud.External Cloud Asset(s)" and "asset.cloud.On-Premise Asset(s)".
version 2.3.1.0013:
* more explicit mitigation of enum names issue based on feedback from brms author
version 2.3.1.0012:
* Attempted to mitigate a bug where brms rewrites enum names, making it hard to rejoin the mcmc analysis to the original data.
version 2.3.1.0011:
* Fixed bug where no not_top_enums causes a failure.
* Also filtered '0's out of top enums so that if the top enums go down to zero-count enums, they are not shown (except as a '0' for 'Other')
version 2.3.1.0010:
* fixed bug where cred.mass not respected for ci.method="mcmc"
version 2.3.1.0009:
* bug fixes
version 2.3.1.0008:
* some testing of CI changes.
* added MCMC option to CI method - UNTESTED
version 2.3.1.0007:
* initial CI rules changes to enforce sane choices in low sample size situations. UNTESTED
version 2.3.1.0006:
* Abstracted getenumCI() by year.
* Started on feature updates to getenumCI2020
version 2.3.1.0005:
* Updated getenumCI()
version 2.3.1.0004:
* Updated documentation.
version 2.3.1.0003:
* Fixed historic data.table settings that caused warnings.
version 2.3.1.0002:
* Fixed how victim.industry2.XX is created to ensure only legitimate NAICS codes are created and that all of them are created.
version 2.3.1:
* added ability to specify files directly to json2veris to alleviate performance problems.
version 2.3:
* updated for veris 1.3.3
version 2.2.0.90001.1:
* added 'logical' to the list of column types not reported.
version 2.2.0.90001:
* update to geta4names() to add discovery_method convenience enumerations
version 2.2:
* updated json2veris() to create a plus.event_chain column as a list of dataframes. each row is the step, and the column is the 4A.
version 2.1:
* updated verisr to work with veris 1.3.1. Most changes are to json2veris for importing.
* added getenumCI() as a replacement for getenum
* added getenumSRC() as an experimental statistical enumeration tool
version 2.0:
* added sample data "veris.sample", this will allow quick exploration and samples to function
* foldmatrix: several arguments were changed and removed, will require rework in existing code.
* foldmatrix: enabled separate variables of "rows" and "cols" to specify the rows and columns of output matrix
* foldmatrix: rows can be a named list of logical to be used as a filter, or a list of column names of logicals
* foldmatrix: the "clean" variable is now defaulting to TRUE
* getenumlist is removed, the new data.table format allows direct query of columns
Version 2.0.1:
* a few bug fixes; problems with json loading in json2veris() are fixed
* getenum() and getenumby() are now one function and are interchangeable
* internal `getpattern` function now returns a data table to cbind with core data set, it enables querying for "pattern" in getenum in an easier way.
* getenum with more than one enum (what getenumby() did) now returns different column names. Since the columns are no longer bound by a "primary" and "secondary" limitation, the columns are named "enum", "enum1", "enum2" and so on for each enum passed in.
Version 2.0.2:
* minor bug fixes
* Added in 'plota4' function in graphics.R to return ggplot2 object with a4 grid
Version 2.0.3:
* added in 'exclusive' parameter to getenum function. This will enable a query where an identified Unknown column will return only if exclusive. As an example, if action.hacking.variety is queried, one record has "SQLi" and "Unknown", and another record has just "Unknown", it will only count the latter (since "Unknown" in the former is not the only exclusive value set in the record).
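The exclusive counting rule in that example can be sketched in base R. This is a toy illustration of the logic, not the getenum implementation; the data frame below is hypothetical and holds only the two records described above.

```r
# Hypothetical toy data: record 1 has both SQLi and Unknown set,
# record 2 has only Unknown set.
records <- data.frame(
  action.hacking.variety.SQLi    = c(TRUE, FALSE),
  action.hacking.variety.Unknown = c(TRUE, TRUE)
)

# Number of variety values set in each record.
variety_cols <- grep("^action\\.hacking\\.variety\\.", names(records), value = TRUE)
n_set <- rowSums(records[, variety_cols, drop = FALSE])

# Non-exclusive: count every record where Unknown is set.
unknown_any <- sum(records$action.hacking.variety.Unknown)

# Exclusive: count Unknown only where it is the sole value set.
unknown_exclusive <- sum(records$action.hacking.variety.Unknown & n_set == 1)
```

Here the non-exclusive count includes both records, while the exclusive count includes only the second one.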
Version 2.0.4:
* fixed bug in getenum that would error when a non-factor column was empty
* minor update to matrix.R and the pattern assignment
* added proc.time() to json2veris (if progressbar==T)
Version 2.0.5:
* Quick bug fix in getenum()
Version 2.0.6:
* json2veris would silently add columns not in the schema, it now produces a warning
* postproc now replaces NA's in all logical columns with FALSE (related to above)
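Replacing NA with FALSE in logical columns only can be sketched in base R as follows. This is an illustration of the idea, not the postproc code; the data frame and its column names are hypothetical.

```r
# Hypothetical toy data: one logical column and one character column, both with NA.
df <- data.frame(
  action.hacking = c(TRUE, NA),
  victim.country = c("US", NA),
  stringsAsFactors = FALSE
)

# Identify logical columns and replace NA with FALSE in those columns only.
logical_cols <- vapply(df, is.logical, logical(1))
df[logical_cols] <- lapply(df[logical_cols], function(col) {
  col[is.na(col)] <- FALSE
  col
})
```

Non-logical columns keep their NA values, so missing text or numeric data is not silently overwritten.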
Version 3.0.0.9003:
* Added 'Embedded' to 'asset.variety' enumeration
* updated getenumCI
Version 3.0.0.9004:
* Added getenumSRC
* updated getenumCI