feat(ingest/sigma): Sigma connector integration #10037

shubhamjagtap639 · 2024-03-13T06:28:18Z

Checklist

The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
Links to related issues (if applicable)
Tests for the changes have been added/updated (if applicable)
Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub

hsheth2

Functionality:

seems like none of the workspaces have any entities nested within them - this suggests that the container paths are not set up correctly
can we generate more details about sigma datasets? e.g. schema, properties, documentation, ownership?
datasets are part of workspaces/folders right? they should have a container aspect
what happens to folders within a workspace?
can we get metadata from "badges" in sigma - those would probably be pretty valuable
for charts/dashboards, can we generate externalUrl info so we get the "View in Sigma" button in the UI

hsheth2 · 2024-03-18T21:47:22Z

metadata-ingestion/src/datahub/ingestion/source/sigma/sigma_api.py

+            for workspace_dict in response.json():
+                workspaces.append(Workspace.parse_obj(workspace_dict))
+        except Exception as e:
+            self._log_http_error(message=f"Unable to fetch workspaces. Exception: {e}")


if this happens, it should appear as a "failure" in the source report
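
A rough sketch of what that could look like, assuming the report exposes some way to record failures. `DemoReport`, `report_failure`, and `fetch_workspaces` are hypothetical stand-ins for illustration, not DataHub's actual report API, and `fetch` replaces the real HTTP call.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class DemoReport:
    """Hypothetical stand-in for a source report (not DataHub's real class)."""

    failures: List[Dict[str, str]] = field(default_factory=list)

    def report_failure(self, key: str, reason: str) -> None:
        self.failures.append({"key": key, "reason": reason})


def fetch_workspaces(
    fetch: Callable[[], List[dict]], report: DemoReport
) -> List[dict]:
    """Return workspaces, recording a failure instead of only logging it."""
    try:
        return list(fetch())
    except Exception as e:
        # Record the error in the report so the run is visibly marked as failed.
        report.report_failure("workspaces", f"Unable to fetch workspaces: {e}")
        return []


def _broken_fetch() -> List[dict]:
    # Simulates the API call blowing up.
    raise ConnectionError("HTTP 500")


report = DemoReport()
workspaces = fetch_workspaces(_broken_fetch, report)
```

With this shape, a fetch error leaves `workspaces` empty and adds one entry to `report.failures`.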

hsheth2 · 2024-03-18T21:49:04Z

metadata-ingestion/src/datahub/ingestion/source/sigma/sigma.py

+        )
+
+    def _get_sigma_dataset_identifier(self, dataset: SigmaDataset) -> str:
+        return f"{dataset.datasetId}".lower()


why are we making these lowercased?

shubhamjagtap639 · 2024-03-20T09:55:29Z

Functionality:

seems like none of the workspaces have any entities nested within them - this suggests that the container paths are not set up correctly

can we generate more details about sigma datasets? e.g. schema, properties, documentation, ownership?

datasets are part of workspaces/folders right? they should have a container aspect

what happens to folders within a workspace?

can we get metadata from "badges" in sigma - those would probably be pretty valuable

for charts/dashboards, can we generate externalUrl info so we get the "View in Sigma" button in the UI

The Get Dataset API doesn't return schema metadata.
Should I map folders to container entities, or use the browse path aspect for this?
Yes, we can get badge metadata, but where should we map it?

hsheth2 · 2024-03-20T21:50:01Z

Should I map folders to container entities, or use the browse path aspect for this?

you should emit a browsePathsV2 aspect, where the first parts are dataPlatformInstance (if set) and workspace container urns, and the remaining parts are strings with the folder names
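
A minimal sketch of the path layout described above. `PathEntry` and `build_browse_path` are hypothetical names used only for illustration (not the DataHub classes), and the urns are made up.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class PathEntry:
    """Hypothetical stand-in for one browse path entry: an id plus an optional urn."""

    id: str
    urn: Optional[str] = None


def build_browse_path(
    workspace_urn: str,
    folder_names: List[str],
    platform_instance_urn: Optional[str] = None,
) -> List[PathEntry]:
    entries: List[PathEntry] = []
    # Leading entries are urn-backed: platform instance (if set), then workspace.
    if platform_instance_urn is not None:
        entries.append(PathEntry(id=platform_instance_urn, urn=platform_instance_urn))
    entries.append(PathEntry(id=workspace_urn, urn=workspace_urn))
    # Remaining entries are plain strings carrying the folder names.
    entries.extend(PathEntry(id=name) for name in folder_names)
    return entries


path = build_browse_path(
    workspace_urn="urn:li:container:workspace-1",  # made-up urn for illustration
    folder_names=["Finance", "Quarterly"],
)
```

Keeping the folder parts as plain strings means no container entity has to exist for each folder.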

Yes, we can get badge metadata, but where should we map it?

They should become tags in datahub, prefixed with "sigma:"
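
For illustration, a small sketch of the badge-to-tag mapping. The helper names and the badge values are assumptions; only the `sigma:` prefix comes from the comment above.

```python
from typing import Iterable, List, Optional


def badge_to_tag_urn(badge: str) -> str:
    """Build a tag urn for a Sigma badge, prefixed with "sigma:"."""
    return f"urn:li:tag:sigma:{badge}"


def badges_to_tag_urns(badges: Iterable[Optional[str]]) -> List[str]:
    # Skip missing or empty badges so we never emit a bare "sigma:" tag.
    return [badge_to_tag_urn(b) for b in badges if b]


tag_urns = badges_to_tag_urns(["Endorsed", None, "Warning"])  # example badge values
```

The prefix keeps Sigma-derived tags from colliding with tags of the same name created elsewhere.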

hsheth2 · 2024-04-12T19:34:14Z

metadata-ingestion/src/datahub/ingestion/source/sigma/data_classes.py

    name: str
    description: str
    createdBy: str
    createdAt: datetime
    updatedAt: datetime
    url: str
+    path: str


would be better to keep this as a List[str] - that way if someone has a / in their folder name, we still handle it correctly

shubhamjagtap639 · 2024-04-15

The API returns the path attribute as a single string. Even if we later convert it to a list of strings, a folder name that contains a / would still get split.
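
A quick illustration of both points, using made-up folder names: a list keeps a slash inside a name intact, but once the names have been joined into one string, splitting cannot recover the original boundaries.

```python
from typing import List

folders: List[str] = ["Marketing", "Q1/Q2 Reports"]  # second name contains a slash

# Joining into one string loses the boundary between folders...
joined = "/".join(folders)

# ...so splitting it back yields three parts instead of the original two.
roundtrip = joined.split("/")
```

So a `List[str]` only helps if the names arrive separately; with a pre-joined string from the API the ambiguity is already baked in.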

hsheth2 · 2024-04-12T19:36:07Z

metadata-ingestion/src/datahub/ingestion/source/sigma/sigma.py

+            and "*" in self.config.chart_sources_platform_mapping
+        ):
+            data_source_platform_details = self.config.chart_sources_platform_mapping[
+                "*"


so the * is basically a "fallback" platform detail if nothing else matches?

shubhamjagtap639 · 2024-04-15

If a user wants to apply the same platform details to every source used by all ingested charts, they can use * as the key.
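
One possible reading of this exchange, sketched under assumptions: a more specific key wins when present, and `*` applies otherwise. The key names and the detail fields (`env`, `platform_instance`) are made up for illustration and are not taken from the connector's config.

```python
from typing import Dict, Optional

PlatformDetail = Dict[str, str]


def resolve_platform_detail(
    mapping: Dict[str, PlatformDetail], key: str
) -> Optional[PlatformDetail]:
    """Return the entry for `key`, else the "*" entry, else None."""
    if key in mapping:
        return mapping[key]
    # "*" covers every source that has no entry of its own.
    return mapping.get("*")


mapping: Dict[str, PlatformDetail] = {
    "*": {"env": "PROD"},
    "workbook-a": {"env": "DEV", "platform_instance": "cloud"},
}

specific = resolve_platform_detail(mapping, "workbook-a")
fallback = resolve_platform_detail(mapping, "workbook-b")
```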

hsheth2 · 2024-04-12T19:39:17Z

metadata-ingestion/src/datahub/ingestion/source/sigma/sigma.py

+                            inputs[
+                                self._gen_sigma_dataset_urn(source_id)
+                            ] = in_table_urn
+                            sql_parser_in_tables.remove(in_table_urn)


we'll probably need to come back to this to generate CLL

shubhamjagtap639 · 2024-04-15

We don't have the dataset schema metadata that generating CLL would need. We do have details of the columns used by elements/charts, but we can't map those columns to their tables.

hsheth2 · 2024-04-12T19:57:18Z

metadata-ingestion/src/datahub/ingestion/source/sigma/sigma_api.py

-        self.users: Dict = {}
+        self.workspaces: Dict[str, Workspace] = {}
+        self.users: Dict[str, str] = {}
+        self.datasets: Dict[str, str] = {}


why are we saving these? I would've expected these to be saved in sigma.py, not in sigma_api.py

shubhamjagtap639 · 2024-04-15

Removed self.datasets, but the others can't be moved or removed. We can't even use functools.lru_cache, as it causes problems with the functions that fetch responses from the GET API.
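
For context, a sketch of the explicit dict cache pattern being discussed. `CachingClient` and its `fetch_user_name` callable are hypothetical; the thread doesn't say what exactly went wrong with `functools.lru_cache`, so this only shows one difference: with a hand-rolled dict the caller decides which results get stored.

```python
from typing import Callable, Dict, Optional


class CachingClient:
    """Hypothetical client that memoizes user lookups in a plain dict."""

    def __init__(self, fetch_user_name: Callable[[str], Optional[str]]) -> None:
        self._fetch_user_name = fetch_user_name
        self.users: Dict[str, str] = {}

    def get_user_name(self, user_id: str) -> Optional[str]:
        if user_id in self.users:
            return self.users[user_id]
        name = self._fetch_user_name(user_id)
        if name is not None:
            # Cache only successful lookups; failed ones are retried next time.
            self.users[user_id] = name
        return name


calls = []


def fake_fetch(user_id: str) -> Optional[str]:
    # Stand-in for the GET request; records each call so caching is observable.
    calls.append(user_id)
    return {"u1": "Alice"}.get(user_id)


client = CachingClient(fake_fetch)
first = client.get_user_name("u1")
second = client.get_user_name("u1")  # served from the dict, no second fetch
missing = client.get_user_name("u2")
missing_again = client.get_user_name("u2")  # not cached, so fetched again
```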

hsheth2 · 2024-04-16T00:28:55Z

metadata-ingestion/docs/sources/sigma/sigma_pre.md

+```
+
+#### Example - For all ingested charts


follow up - the order of these examples should be reversed, and they should be rephrased as below

Suggested change

-#### Example - For all ingested charts
+#### Example - All workbooks use the same connection

hsheth2 · 2024-04-16T00:33:38Z

metadata-ingestion/src/datahub/ingestion/source/sigma/sigma_api.py

@@ -22,7 +22,6 @@ def __init__(self, config: SigmaSourceConfig) -> None:
        self.config = config
        self.workspaces: Dict[str, Workspace] = {}
        self.users: Dict[str, str] = {}


my comment was about users and workspaces too - those should probably not be getting saved here?

Code for sigma source integration

cfd23a8

github-actions bot added the ingestion, product, devops, and community-contribution labels Mar 13, 2024

hsheth2 reviewed Mar 18, 2024

shubhamjagtap639 and others added 3 commits April 1, 2024 15:51

Address review comments

652132a

Add sigma dataset and workbook badge as tag

1e34864

Merge branch 'master' into Sigma-Connector-Integration

b3cb718

Modify sigma workspace_pattern config description

42321a3

shubhamjagtap639 force-pushed the Sigma-Connector-Integration branch from a93b39d to 42321a3 April 9, 2024 08:56

shubhamjagtap639 and others added 2 commits April 9, 2024 14:31

Add sigma dataset upstream lineage code

60fb768

Merge branch 'master' into Sigma-Connector-Integration

284f33c

hsheth2 reviewed Apr 12, 2024

Address review comments

174bcc9

shubhamjagtap639 marked this pull request as ready for review April 15, 2024 18:34

hsheth2 approved these changes Apr 16, 2024

hsheth2 reviewed Apr 16, 2024

Update metadata-ingestion/docs/sources/sigma/sigma_pre.md

5c0366b

hsheth2 added 2 commits April 15, 2024 17:51

Update datahub-web-react/src/app/ingest/source/builder/sources.json

c42df20

Merge branch 'master' into Sigma-Connector-Integration

1f09f7d

hsheth2 added the merge-pending-ci label Apr 16, 2024

hsheth2 merged commit 90c1249 into datahub-project:master Apr 16, 2024
60 of 61 checks passed

sleeperdeep pushed a commit to sleeperdeep/datahub that referenced this pull request Jun 25, 2024

feat(ingest/sigma): Sigma connector integration (datahub-project#10037)

d8aaca7

Co-authored-by: Harshal Sheth
