kuzemchik/pipeline

PipeFlow

Opinionated data processing framework.

Core idea

Introduce restrictions and a consistent code organization for data pipelines, along with common functionality.

Maturity

Earlier iterations of this approach have been used in production to process petabytes of data. The framework is designed for readability while staying flexible and maintainable.

TODO:

  • Base pipe class
    • Lazy evaluation
    • Foreach and Flatmap support (to cover the different for-comprehension use cases)
    • Base tests
  • Apache Spark support for Java 13+
  • DateStamp and common date related functions
  • Integration with scheduler (Airflow?)
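
To make the first TODO item more concrete, here is a minimal sketch of what a lazily evaluated base pipe with map, flatmap and foreach could look like. It is not taken from this repository: the `Pipe` class and its methods `of`, `map`, `flat_map`, `foreach` and `collect` are hypothetical names chosen for illustration, and Python is used only for brevity.

```python
from typing import Callable, Generic, Iterable, List, TypeVar

T = TypeVar("T")
U = TypeVar("U")


class Pipe(Generic[T]):
    """Hypothetical lazy pipe (illustration only, not the PipeFlow API).

    A pipe stores a zero-argument factory that yields its elements.
    Transformations wrap the factory; nothing runs until a terminal
    operation (foreach or collect) is called.
    """

    def __init__(self, source: Callable[[], Iterable[T]]) -> None:
        self._source = source

    @staticmethod
    def of(items: Iterable[T]) -> "Pipe[T]":
        # Wrap an existing re-iterable collection such as a list.
        return Pipe(lambda: items)

    def map(self, fn: Callable[[T], U]) -> "Pipe[U]":
        # Deferred: the generator is only built when the factory is called.
        return Pipe(lambda: (fn(x) for x in self._source()))

    def flat_map(self, fn: Callable[[T], Iterable[U]]) -> "Pipe[U]":
        # Each input element expands to zero or more output elements.
        return Pipe(lambda: (y for x in self._source() for y in fn(x)))

    def foreach(self, fn: Callable[[T], None]) -> None:
        # Terminal operation: runs the pipeline for its side effects.
        for x in self._source():
            fn(x)

    def collect(self) -> List[T]:
        # Terminal operation: runs the pipeline and materializes the result.
        return list(self._source())


if __name__ == "__main__":
    words = Pipe.of(["a b", "c"]).flat_map(str.split).map(str.upper)
    # Nothing has been evaluated yet; collect() triggers the work.
    print(words.collect())
```

Keeping transformations as wrapped factories means a pipe can be defined once and run several times, which is one possible reading of "lazy evaluation" in the list above; the actual design may differ.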
