Opinionated data processing framework.
It introduces restrictions and a code organization for data pipelines, along with common functionality.
Different iterations of this approach were used in production to process petabytes of data. The framework is designed for readability while remaining flexible and maintainable.
- Base pipe class
- Lazy evaluation
- foreach and flatMap support (covering different for-comprehension use cases)
- Base tests
- Apache Spark support for Java 13+
- DateStamp and common date-related functions
- Integration with a scheduler (Airflow?)
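
To make the first three items more concrete, here is a minimal sketch of how a lazy pipe with foreach and flatMap could work with for comprehensions. It assumes Scala, and every name in it (`Pipe`, `PipeDemo`, `evaluated`) is hypothetical and chosen only for illustration; it is not the framework's actual API.

```scala
// Hypothetical sketch: these names do not come from the framework itself.
// A Pipe wraps a function that produces an iterator, so building a pipeline
// only composes functions and reads no data.
final class Pipe[A] private (private val source: () => Iterator[A]) {
  // Transformations return a new Pipe and evaluate nothing.
  def map[B](f: A => B): Pipe[B] = new Pipe(() => source().map(f))
  def flatMap[B](f: A => Pipe[B]): Pipe[B] =
    new Pipe(() => source().flatMap(a => f(a).source()))
  def withFilter(p: A => Boolean): Pipe[A] = new Pipe(() => source().filter(p))

  // Terminal operations force evaluation.
  def foreach(f: A => Unit): Unit = source().foreach(f)
  def toList: List[A] = source().toList
}

object Pipe {
  // The by-name parameter delays reading the input until a terminal operation runs.
  def apply[A](items: => Iterable[A]): Pipe[A] = new Pipe(() => items.iterator)
}

object PipeDemo {
  // Counts how often the (pretend) input is actually read.
  var evaluated = 0

  val numbers: Pipe[Int] = Pipe { evaluated += 1; List(1, 2, 3) }

  // A for comprehension with `yield` desugars to flatMap and map: still lazy.
  val pairs: Pipe[(Int, Int)] = for {
    n <- numbers
    m <- Pipe(List(10, 20))
  } yield (n, n * m)

  def main(args: Array[String]): Unit = {
    println(evaluated) // 0: nothing has run yet
    // A for comprehension without `yield` desugars to foreach and runs the pipeline.
    for (p <- pairs) println(p)
    println(evaluated) // 1: the input was read once
  }
}
```

The two for-comprehension forms map onto different methods: the `yield` form needs `flatMap` and `map` and returns another pipe, while the form without `yield` needs only `foreach` and triggers evaluation for its side effects.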