[Import Document AI] Implement Import Document AI connector #3466
Open: Cnstant wants to merge 4 commits into master from issue/3457
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
config.yml |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,17 @@ | ||
FROM python:3.12-alpine | ||
|
||
ENV CONNECTOR_TYPE=INTERNAL_IMPORT_FILE | ||
|
||
# Copy the connector | ||
COPY src /opt/opencti-connector-import-document-ai | ||
|
||
RUN apk --no-cache add libmagic && \ | ||
    cd /opt/opencti-connector-import-document-ai && \ | ||
    pip3 install --no-cache-dir -r requirements.txt | ||
|
||
# Expose and entrypoint | ||
COPY entrypoint.sh / | ||
RUN chmod +x /entrypoint.sh && chmod -R 0777 /opt/opencti-connector-import-document-ai | ||
WORKDIR /opt/opencti-connector-import-document-ai | ||
ENV HOME=/opt/opencti-connector-import-document-ai | ||
ENTRYPOINT ["/entrypoint.sh"] |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,110 @@ | ||
# AI based OpenCTI Document Import Connector (Powered by Ariane) | ||
|
||
This connector allows Enterprise Edition organizations to feed information from documents into OpenCTI, with more capabilities than the regular Import Document connector. | ||
|
||
This connector adds extraction capabilities: it can also extract `Malware`, `Country` and `Intrusion-Set` entities. | ||
|
||
## General overview | ||
|
||
OpenCTI data comes from *import* connectors. | ||
|
||
## Installation | ||
|
||
### Requirements | ||
|
||
- OpenCTI Platform >= 6.5.0 | ||
|
||
### Configuration | ||
|
||
| Parameter | Docker envvar | Mandatory | Description | | ||
| ------------------------------------ | ----------------------------------- | ------------ | ---------------------------------------------------------------------------------------------------------------------------------------------------------- | | ||
| `opencti_url` | `OPENCTI_URL` | Yes | The URL of the OpenCTI platform. | | ||
| `opencti_token` | `OPENCTI_TOKEN` | Yes | The default admin token configured in the OpenCTI platform parameters file. | | ||
| `connector_id` | `CONNECTOR_ID` | Yes | A valid arbitrary `UUIDv4` that must be unique for this connector. | | ||
| `connector_name` | `CONNECTOR_NAME` | Yes | The connector name, for example `ImportDocumentAI`. | | ||
| `connector_only_contextual` | `CONNECTOR_ONLY_CONTEXTUAL` | Yes | `true`: only extract data related to an entity (a report, a threat actor, etc.). | | ||
| `connector_auto` | `CONNECTOR_AUTO` | Yes | `false`: enable/disable automatic import of report files. | | ||
| `connector_scope` | `CONNECTOR_SCOPE` | Yes | Supported file types: `'application/pdf','text/plain','text/html','text/markdown'` | | ||
| `connector_confidence_level` | `CONNECTOR_CONFIDENCE_LEVEL` | Yes | The default confidence level for created sightings (a number between 1 and 100). | | ||
| `connector_log_level` | `CONNECTOR_LOG_LEVEL` | Yes | The log level for this connector: `debug`, `info`, `warn` or `error` (least verbose). | | ||
| `connector_create_indicator` | `CONNECTOR_CREATE_INDICATOR` | Yes | Create an indicator for each extracted observable. | | ||
| `connector_web_service_url` | `CONNECTOR_WEB_SERVICE_URL` | Yes | The URL of the extraction service (provided by Filigran to Enterprise Edition users). | | ||
| `connector_licence_key_pem` | `CONNECTOR_LICENCE_KEY_PEM` | Yes | The certificate authenticating an allowed user, in PEM format (provided by Filigran to Enterprise Edition users). | | ||
After adding the connector, you should be able to extract information from a report. | ||
|
||
### Debugging ### | ||
|
||
If the connector does not behave as expected, set `CONNECTOR_LOG_LEVEL` to `debug`. | ||
The connector then writes a log entry for every parsing step, so each step can be verified. | ||
Example: | ||
|
||
``` | ||
{"timestamp": "2025-02-21T15:36:43.448532Z", "level": "INFO", "name": "api", "message": "Health check (platform version)..."} | ||
{"timestamp": "2025-02-21T15:36:43.509792Z", "level": "INFO", "name": "api", "message": "Health check (platform version)..."} | ||
{"timestamp": "2025-02-21T15:36:43.698952Z", "level": "INFO", "name": "ImportDocumentAI", "message": "Connector registered with ID", "attributes": {"id": "ChangeMe"}} | ||
{"timestamp": "2025-02-21T15:36:43.699773Z", "level": "INFO", "name": "ImportDocumentAI", "message": "Starting PingAlive thread"} | ||
{"timestamp": "2025-02-21T15:36:43.700252Z", "level": "DEBUG", "name": "ImportDocumentAI", "message": "PingAlive running."} | ||
{"timestamp": "2025-02-21T15:36:43.700442Z", "level": "DEBUG", "name": "ImportDocumentAI", "message": "PingAlive ConnectorInfo", "attributes": {"connector_info": {"run_and_terminate": false, "buffering": false, "queue_threshold": 500.0, "queue_messages_size": 0.0, "next_run_datetime": null, "last_run_datetime": null}}} | ||
{"timestamp": "2025-02-21T15:36:43.701104Z", "level": "INFO", "name": "ImportDocumentAI", "message": "Starting ListenQueue thread"} | ||
{"timestamp": "2025-02-21T15:36:43.702909Z", "level": "INFO", "name": "ImportDocumentAI", "message": "ListenQueue connecting to rabbitMq."} | ||
{"timestamp": "2025-02-21T15:37:23.808816Z", "level": "DEBUG", "name": "ImportDocumentAI", "message": "PingAlive running."} | ||
{"timestamp": "2025-02-21T15:37:23.809170Z", "level": "DEBUG", "name": "ImportDocumentAI", "message": "PingAlive ConnectorInfo", "attributes": {"connector_info": {"run_and_terminate": false, "buffering": false, "queue_threshold": 500.0, "queue_messages_size": 0.0, "next_run_datetime": null, "last_run_datetime": null}}} | ||
{"timestamp": "2025-02-21T15:37:26.935568Z", "level": "INFO", "name": "ImportDocumentAI", "message": "Message ack", "attributes": {"tag": 1}} | ||
{"timestamp": "2025-02-21T15:37:26.935903Z", "level": "INFO", "name": "api", "message": "Reporting work update_received", "attributes": {"work_id": "work_ChangeMe_2025-02-21T15:37:26.830Z"}} | ||
{"timestamp": "2025-02-21T15:37:26.999378Z", "level": "INFO", "name": "ImportDocumentAI", "message": "Processing new message"} | ||
[...] | ||
{"timestamp": "2025-02-21T15:37:32.028339Z", "level": "DEBUG", "name": "ImportDocumentAI", "message": "Results: [{'type': 'entity', 'category': 'Intrusion-Set', 'original_start': 4405, 'original_end': 4413, 'range': [4405, 4413], 'match': 'Andariel'}, {'type': 'entity', 'category': 'Malware', 'original_start': 4421, 'original_end': 4431, 'range': [4421, 4431], 'match': 'SmallTiger'}, {'type': 'entity', 'category': 'Malware', 'original_start': 1111, 'original_end': 1121, 'range': [1111, 1121], 'match': 'ModeLoader'}, {'type': 'observable', 'category': 'IPv4-Addr.value', 'original_start': 3044, 'original_end': 3056, 'range': [3044, 3056], 'match': '20.20.100.32'}, {'type': 'observable', 'category': 'IPv4-Addr.value', 'original_start': 3271, 'original_end': 3286, 'range': [3271, 3286], 'match': '45.61.148.153'}, {'type': 'observable', 'category': 'File.name', 'original_start': 3383, 'original_end': 3397, 'range': [3383, 3397], 'match': 'powershell.exe'}, {'type': 'observable', 'category': 'Url.value', 'original_start': 3446, 'original_end': 3478, 'range': [3446, 3478], 'match': 'http://45.61.148.153/pizza.jsp'}, {'type': 'observable', 'category': 'Url.value', 'original_start': 3453, 'original_end': 3478, 'range': [3453, 3478], 'match': '45.61.148.153/pizza.jsp'}, {'type': 'observable', 'category': 'File.hashes.MD5', 'original_start': 4443, 'original_end': 4475, 'range': [4443, 4475], 'match': '3525a8a16ce8988885d435133b3e85d8'}, {'type': 'observable', 'category': 'File.hashes.MD5', 'original_start': 4476, 'original_end': 4508, 'range': [4476, 4508], 'match': '45ef2e621f4c530437e186914c7a9c62'}, {'type': 'observable', 'category': 'File.hashes.MD5', 'original_start': 4509, 'original_end': 4541, 'range': [4509, 4541], 'match': '6a58b52b184715583cda792b56a0a1ed'}, {'type': 'observable', 'category': 'File.hashes.MD5', 'original_start': 4542, 'original_end': 4574, 'range': [4542, 4574], 'match': 'b500a8ffd4907a1dfda985683f1de1df'}]"} | ||
{"timestamp": "2025-02-21T15:37:32.192447Z", "level": "INFO", "name": "ImportDocumentAI", "message": "Message processed, thread terminated", "attributes": {"tag": 1}} | ||
[...] | ||
``` | ||
|
||
### Supported formats | ||
|
||
*Please open a feature request if the current implementation doesn't fit your needs.* | ||
|
||
**File input format** | ||
- PDF file | ||
- Text file | ||
- HTML file | ||
- MD file | ||
|
||
**Extractable Entities/Stix Domain Objects** | ||
|
||
| Extractable Entity | Based on | Example | Stix entity type and field | Note | | ||
|-------------|-------------------------|------------------|------|----| | ||
| Attack Pattern | MITRE ATT&CK Technique | T1234.001| AttackPattern.x_mitre_id | | | ||
| Country | Occurrence in the original text |France |Location.name, Location.aliases| | | ||
| Intrusion Set | Occurrence in the original text | APT29| IntrusionSet.name, IntrusionSet.aliases| | | ||
| Malware | Occurrence in the original text |BadPatch| Malware.name, Malware.aliases| | | ||
| Vulnerability | CVE Numbers | CVE-2020-0688 | Vulnerability.name | | | ||
|
||
**Extractable Observables/Stix Cyber Observables** | ||
|
||
| Extractable Observable/SCO | Stix Reference fields | Supported | Note | | ||
|-----------------------------|------------------|------|---| | ||
| Artifact | - | :x: | | | ||
| AutonomousSystem | AutonomousSystem.number| :heavy_check_mark: | | | ||
| Directory | - | :x: | | | ||
| Domain Name | DomainName.value| :heavy_check_mark: | | | ||
| EMail Address | EMail-Addr.value | :heavy_check_mark: || | ||
| EMail Message | - | :x: | | | ||
| File | File.name, File.hashes (MD5, SHA-1, SHA-256) | :heavy_plus_sign: | | | ||
| IPv4 Address | IPv4-Addr.value| :heavy_check_mark: || | ||
| IPv6 Address | IPv6-Addr.value| :heavy_check_mark: || | ||
| MAC Address | Mac-Addr.value| :heavy_check_mark: | | | ||
| Mutex | - |:x: | | | ||
| Network Traffic | - | :x: | | | ||
| Process | - | :x: | | | ||
| Software | - | :x: | | | ||
| URL | Url.value | :heavy_check_mark: | | | ||
| User Account | - | :x: | | | ||
| Windows Registry Key | WindowsRegistryKey.key | :heavy_plus_sign: | | | ||
| X.509 Certificate | - | :x: | | | ||
|
||
:heavy_check_mark: = Fully implemented | ||
|
||
:heavy_plus_sign: = Not entirely implemented | ||
|
||
:x: = Not implemented | ||
|
||
*Reference: https://docs.oasis-open.org/cti/stix/v2.1/cs01/stix-v2.1-cs01.html* |
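The debug log sample in the README embeds the extraction results as a Python-style list inside the `message` field of a JSON log line. The following is a minimal sketch of pulling those results back out of a single log line. It assumes the `Results: ` prefix and the single-quoted list format shown in the sample stay stable; `extract_results` is a hypothetical helper and is not part of the connector, and the sample line is abridged from the README.

```python
import ast
import json

RESULTS_PREFIX = "Results: "  # prefix seen in the DEBUG line of the README sample


def extract_results(log_line: str) -> list[dict]:
    """Return the result dicts embedded in one JSON log line, or [] if none."""
    try:
        record = json.loads(log_line)
    except json.JSONDecodeError:
        return []  # e.g. the "[...]" elision markers in the sample
    if not isinstance(record, dict):
        return []
    message = record.get("message", "")
    if not isinstance(message, str) or not message.startswith(RESULTS_PREFIX):
        return []
    # The payload is a Python repr (single quotes), not JSON, so use literal_eval.
    return ast.literal_eval(message[len(RESULTS_PREFIX):])


# One log line, abridged from the README sample (fewer keys and entries).
sample = (
    '{"timestamp": "2025-02-21T15:37:32.028339Z", "level": "DEBUG", '
    '"name": "ImportDocumentAI", "message": "Results: '
    "[{'type': 'entity', 'category': 'Malware', 'range': [4421, 4431], 'match': 'SmallTiger'}, "
    "{'type': 'observable', 'category': 'IPv4-Addr.value', 'range': [3044, 3056], 'match': '20.20.100.32'}]"
    '"}'
)

for result in extract_results(sample):
    print(result["type"], result["category"], result["match"])
```

`ast.literal_eval` only evaluates literals, so it is a safer choice than `eval` for log content, although it still raises on a malformed payload.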
25 changes: 25 additions & 0 deletions
internal-import-file/import-document-ai/docker-compose.yml.sample
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,25 @@ | ||
version: '3' | ||
services: | ||
  connector-import-document-ai: | ||
    image: opencti/connector-import-document-ai:6.5.2 | ||
    environment: | ||
      # Connector's generic execution parameters | ||
      - OPENCTI_URL=http://opencti:8080 # if platform is deployed with "docker" repo | ||
      - OPENCTI_TOKEN=CHANGEME | ||
      # Connector's definition parameters REQUIRED | ||
      - CONNECTOR_ID=CHANGEME | ||
      - CONNECTOR_NAME=ImportDocumentAI | ||
      - CONNECTOR_SCOPE=application/pdf,text/plain,text/html,text/markdown | ||
      - CONNECTOR_LOG_LEVEL=error | ||
      - CONNECTOR_WEB_SERVICE_URL=http://0.0.0.0:8000 | ||
      # Multi-line PEM value written as one double-quoted string with \n escapes | ||
      - "CONNECTOR_LICENCE_KEY_PEM=-----BEGIN CERTIFICATE-----\n...\n-----END CERTIFICATE-----" | ||
    restart: always | ||
|
||
|
||
networks: | ||
  default: | ||
    external: true | ||
    name: docker_default |
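Because the compose sample ships with `CHANGEME` placeholders, a small pre-flight check can catch an incomplete environment before the container starts. The sketch below is an assumption-laden illustration: the variable names are copied from the sample above, while `find_config_problems` and the placeholder set are hypothetical and not part of the connector.

```python
import os

# Variable names taken from the docker-compose sample above.
REQUIRED_VARS = (
    "OPENCTI_URL",
    "OPENCTI_TOKEN",
    "CONNECTOR_ID",
    "CONNECTOR_NAME",
    "CONNECTOR_SCOPE",
    "CONNECTOR_WEB_SERVICE_URL",
    "CONNECTOR_LICENCE_KEY_PEM",
)
PLACEHOLDERS = {"CHANGEME"}  # sample values that must be replaced (assumption)


def find_config_problems(env: dict[str, str]) -> list[str]:
    """Hypothetical pre-flight check: list unset or placeholder variables."""
    problems = []
    for name in REQUIRED_VARS:
        value = env.get(name, "").strip()
        if not value:
            problems.append(f"{name} is not set")
        elif value.upper() in PLACEHOLDERS:
            problems.append(f"{name} still has the placeholder value")
    return problems


if __name__ == "__main__":
    # Report every problem found in the current process environment.
    for problem in find_config_problems(dict(os.environ)):
        print(problem)
```

Passing the environment in as a plain dictionary keeps the check easy to exercise without touching the real process environment.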
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
#!/bin/sh | ||
|
||
# Go to the right directory | ||
cd /opt/opencti-connector-import-document-ai | ||
|
||
# Launch the worker | ||
python3 main.py |
16 changes: 16 additions & 0 deletions
internal-import-file/import-document-ai/src/config.yml.sample
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
opencti: | ||
  url: 'http://localhost:PORT' | ||
  token: 'ChangeMe' | ||
|
||
connector: | ||
  id: 'changeMeInUUID' | ||
  type: 'INTERNAL_IMPORT_FILE' | ||
  name: 'ImportDocumentAI' | ||
  scope: 'application/pdf,text/plain,text/html,text/markdown' | ||
  validate_before_import: true | ||
  log_level: 'info' | ||
  web_service_url: 'http://localhost:PORT' | ||
  licence_key_pem: | | ||
    -----BEGIN CERTIFICATE----- | ||
    ... | ||
    -----END CERTIFICATE----- |
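The same settings appear both as Docker environment variables and as keys in `config.yml`. A minimal sketch of one way to resolve a setting is shown below, assuming environment variables take precedence over file values, which this PR does not state. `get_setting` is a hypothetical helper, and a plain dictionary stands in for the parsed YAML file so the example needs no third-party package.

```python
import os
from collections.abc import Mapping, Sequence
from typing import Any, Optional


def get_setting(
    env_name: str,
    yaml_path: Sequence[str],
    config: Mapping[str, Any],
    default: Optional[Any] = None,
    environ: Optional[Mapping[str, str]] = None,
) -> Any:
    """Hypothetical helper: environment first, then config file, then default."""
    environ = os.environ if environ is None else environ
    if env_name in environ:
        return environ[env_name]
    # Walk the nested config along yaml_path, e.g. ["connector", "log_level"].
    node: Any = config
    for key in yaml_path:
        if not isinstance(node, Mapping) or key not in node:
            return default
        node = node[key]
    return node


# Plain dict standing in for the parsed config.yml (values from the sample).
config = {
    "connector": {
        "name": "ImportDocumentAI",
        "log_level": "info",
        "validate_before_import": True,
    }
}

# File value is used when the environment variable is absent.
print(get_setting("CONNECTOR_LOG_LEVEL", ["connector", "log_level"], config, environ={}))
# Environment variable wins when both are present.
print(
    get_setting(
        "CONNECTOR_LOG_LEVEL",
        ["connector", "log_level"],
        config,
        environ={"CONNECTOR_LOG_LEVEL": "debug"},
    )
)
```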
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,8 @@ | ||
# -*- coding: utf-8 -*- | ||
"""OpenCTI ReportImporter connector main module.""" | ||
|
||
from reportimporter import ReportImporter | ||
|
||
if __name__ == "__main__": | ||
    connector = ReportImporter() | ||
    connector.start() |
6 changes: 6 additions & 0 deletions
internal-import-file/import-document-ai/src/reportimporter/__init__.py
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
# -*- coding: utf-8 -*- | ||
"""OpenCTI ReportImporter connector module.""" | ||
|
||
from reportimporter.core import ReportImporter | ||
|
||
__all__ = ["ReportImporter"] |
3 changes: 3 additions & 0 deletions
internal-import-file/import-document-ai/src/reportimporter/constants.py
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
RESULT_FORMAT_TYPE = "type" | ||
RESULT_FORMAT_CATEGORY = "category" | ||
RESULT_FORMAT_MATCH = "match" |
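These three constants name the keys of each result dictionary, as seen in the README debug log (`type`, `category`, `match`). The sketch below shows one way such keys could be used to group unique matches by type and category. `group_matches` is a hypothetical helper, not code from this PR, and the sample entries are abridged from the README log with one duplicate added for illustration.

```python
from collections import defaultdict

# Key names copied from constants.py above.
RESULT_FORMAT_TYPE = "type"
RESULT_FORMAT_CATEGORY = "category"
RESULT_FORMAT_MATCH = "match"


def group_matches(results: list[dict]) -> dict[tuple[str, str], list[str]]:
    """Group unique match strings by (type, category), keeping first-seen order."""
    grouped: dict[tuple[str, str], list[str]] = defaultdict(list)
    for result in results:
        key = (result[RESULT_FORMAT_TYPE], result[RESULT_FORMAT_CATEGORY])
        match = result[RESULT_FORMAT_MATCH]
        if match not in grouped[key]:
            grouped[key].append(match)
    return dict(grouped)


# Entries abridged from the README debug log; the last one is a duplicate
# added here for illustration only.
results = [
    {"type": "entity", "category": "Malware", "match": "SmallTiger"},
    {"type": "entity", "category": "Malware", "match": "ModeLoader"},
    {"type": "observable", "category": "IPv4-Addr.value", "match": "20.20.100.32"},
    {"type": "entity", "category": "Malware", "match": "SmallTiger"},
]

for (kind, category), matches in group_matches(results).items():
    print(kind, category, matches)
```

Referring to the keys through the constants rather than string literals means a change to the result format only has to be made in one place.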