We want to spread best practices of teamwork and programming among teams developing ML/AI projects.
We believe existing products do not offer the right trade-off between speed and the reliability of the resulting ML/AI solution.
Tools for team workflow
- Team Leaderboard
- Task Loader & Configs Provider
- Data Loader & Data Store
- Feature Loader & Feature Store
- Submitter, Submits Provider & Metrics Service
- Target Loader
Check out, compare, and share your team's test scores via a GUI.
The Leaderboard is a central part of the team workflow. It unites your team around the task and gives you tools to check out and explore ML/AI task solutions and metrics, run fast text searches over submits, switch between tasks, compare results within a task, and more.
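To make the two core Leaderboard operations concrete, here is a minimal sketch, assuming each submit is a plain record with a task, an author, a description, and a single score. `Submit`, `leaderboard`, and `search` are hypothetical names for illustration, not the Leaderboard's actual API.

```python
from dataclasses import dataclass


@dataclass
class Submit:
    # Hypothetical submit record; field names are illustrative only.
    task: str
    author: str
    description: str
    score: float


def leaderboard(submits, task, higher_is_better=True):
    """Return the submits of one task, best score first."""
    rows = [s for s in submits if s.task == task]
    return sorted(rows, key=lambda s: s.score, reverse=higher_is_better)


def search(submits, query):
    """Case-insensitive substring search over author and description."""
    q = query.lower()
    return [s for s in submits if q in s.author.lower() or q in s.description.lower()]


submits = [
    Submit("churn", "alice", "gradient boosting baseline", 0.81),
    Submit("churn", "bob", "logistic regression", 0.77),
    Submit("churn", "carol", "boosting + target encoding", 0.84),
    Submit("fraud", "alice", "isolation forest", 0.65),
]

for place, s in enumerate(leaderboard(submits, "churn"), start=1):
    print(place, s.author, s.score)
print([s.author for s in search(submits, "boosting")])
```

The `higher_is_better` flag matters because some metrics (for example, error rates) rank ascending rather than descending.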
Use Task Loader as the entry point for an AI/ML task in Jupyter Notebook, JupyterLab, or plain Python.
Task Loader automatically loads the Python tools configured for an AI/ML task inside the project. It can also be used to set up any additional Python tools for the team through task configurations. Configure the tools you want used in a project by default, and let your whole team know about them.
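The idea of "tools configured by default" can be sketched as a config that maps aliases to importable modules, which a loader imports in one call. This is a minimal sketch under stated assumptions: the `TASK_CONFIG` format and the `load_task` function are hypothetical, and standard-library modules stand in for your team's tools.

```python
import importlib
from types import SimpleNamespace

# Hypothetical task configuration: the real Task Loader config format is not
# described here, so this dict only illustrates "tools loaded by default".
TASK_CONFIG = {
    "task": "churn",
    "tools": {
        "json": "json",        # alias -> importable module path
        "stats": "statistics",
    },
}


def load_task(config):
    """Import every configured tool and expose it under its alias."""
    tools = {alias: importlib.import_module(path)
             for alias, path in config["tools"].items()}
    return SimpleNamespace(name=config["task"], **tools)


task = load_task(TASK_CONFIG)
print(task.name, task.stats.mean([1, 2, 3]))
```

Because the config lives with the project, every team member who calls the loader gets the same set of tools.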
Use a client-server Data Store as the central storage for raw tabular data.
Data Store keeps tabular data from different sources in one place, available through a REST API.
Data Loader is a Python tool that works together with Data Store. It enforces configured data types, converts data into pandas DataFrames, and controls the data scope of an AI/ML task. Load any configured data into your Python session through a simple client interface.
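Two of the Data Loader duties above, enforcing configured types and limiting the data scope, can be sketched without any service at all. In this minimal sketch the `SCHEMA`, `TASK_SCOPE`, and `load` names are hypothetical, the rows imitate string values from a REST response, and plain dicts stand in for DataFrames so the example has no dependencies.

```python
from datetime import date

# Hypothetical configured schema: column -> function that casts a raw value.
SCHEMA = {"customer_id": int, "signup": date.fromisoformat, "monthly_fee": float}
# Hypothetical data scope: the only columns this task may see.
TASK_SCOPE = ["customer_id", "monthly_fee"]

# Rows as they might arrive from a REST API: every value is a string.
raw_rows = [
    {"customer_id": "1", "signup": "2021-03-01", "monthly_fee": "9.99", "email": "a@x.io"},
    {"customer_id": "2", "signup": "2021-04-15", "monthly_fee": "19.5", "email": "b@x.io"},
]


def load(rows, schema, scope):
    """Cast in-scope columns to their configured types and drop everything else.

    A column in `scope` without a schema entry raises KeyError, so untyped
    data never reaches the task.
    """
    return [{col: schema[col](row[col]) for col in scope} for row in rows]


data = load(raw_rows, SCHEMA, TASK_SCOPE)
print(data)
```

With pandas installed, `pandas.DataFrame(data)` turns this list of dicts into the DataFrame form the text describes.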
Use the client-server Feature Store to share features for an AI/ML task between team members.
Feature Store keeps the tabular features of a task, available through a REST API.
Feature Loader is a Python tool that works together with Feature Store. Its functions are similar to Data Loader's, but it uses a flat model to access any configured and available features.
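One way to read "flat model" is that every feature is addressed by its name alone, with no dataset or table hierarchy, and rows are assembled on a shared index. The sketch below assumes exactly that; `FEATURES` and `get_features` are hypothetical and the in-memory dict stands in for the Feature Store.

```python
# Hypothetical flat feature registry: each feature is addressed only by name
# and holds values keyed by the shared entity index.
FEATURES = {
    "fee_mean_3m": {1: 9.9, 2: 18.7, 3: 5.0},
    "support_calls_30d": {1: 0, 2: 4, 3: 1},
    "is_premium": {1: False, 2: True},  # not computed for id 3 yet
}


def get_features(names, index):
    """Return one row per index entry; values not computed yet come back as None."""
    unknown = [n for n in names if n not in FEATURES]
    if unknown:
        raise KeyError(f"unknown features: {unknown}")
    return {i: {n: FEATURES[n].get(i) for n in names} for i in index}


rows = get_features(["fee_mean_3m", "is_premium"], index=[1, 3])
print(rows)
```

Failing loudly on unknown names, while returning `None` for missing values, separates typos from features that simply have not been computed for some entities.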
Submitter gathers the results of AI/ML tasks in Submits Provider, and Metrics Service scores them.
Submitter is a Python tool that lets users submit notebooks, Python code, test predictions, images, and more to Submits Provider. Submitter is integrated with Jupyter Notebook and JupyterLab to automatically save all artefacts on submit. Metrics Service checks stored submits and computes metric values.
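The division of labour between Submitter and Metrics Service can be sketched as follows: the user stores predictions plus artefacts, and a separate scorer, the only code that sees the test labels, computes the metric. Everything here is an assumption for illustration: `SUBMITS`, `submit`, `score`, the content-hash id, and accuracy as the metric.

```python
import hashlib
import json

SUBMITS = []  # stands in for the Submits Provider storage


def submit(author, predictions, artefacts):
    """Store test predictions plus artefacts (code, notebook text, ...)."""
    record = {"author": author, "predictions": predictions, "artefacts": artefacts}
    # A content hash serves as a simple, reproducible submit id.
    record["id"] = hashlib.sha256(json.dumps(record, sort_keys=True).encode()).hexdigest()[:12]
    SUBMITS.append(record)
    return record["id"]


# Hidden test target: only the scoring side reads it.
_TEST_TARGET = {10: 1, 11: 0, 12: 1, 13: 1}


def score(record):
    """Accuracy of a stored submit against the hidden test target."""
    preds = record["predictions"]
    hits = sum(preds.get(i) == y for i, y in _TEST_TARGET.items())
    return hits / len(_TEST_TARGET)


sid = submit("alice", {10: 1, 11: 0, 12: 0, 13: 1}, {"solution.py": "print('hi')"})
print(sid, score(SUBMITS[0]))
```

Keeping scoring on the service side means test labels never have to leave it, which matches how Target Loader hides them from users.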
Target Loader is a Python tool for loading the target of an AI/ML task.
Target Loader does not store the target as tabular data or anywhere near the features. It loads the train target values and never shows the test target values to the user, although it does load an empty data structure to be filled in with the task solution. All indexes in the train/test target values are synced with the indexes of the features from Feature Loader.
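The behaviour described above, returning real train labels but only an empty, index-aligned container for test, can be sketched like this. The in-memory storage and the `load_target` signature are hypothetical; the real tool talks to its own storage.

```python
# Hypothetical target storage, kept apart from features.
_TRAIN_TARGET = {1: 0, 2: 1, 3: 0, 4: 1}
_TEST_TARGET = {5: 1, 6: 0}  # never returned to the user


def load_target(feature_train_index, feature_test_index):
    """Return train labels and an empty test structure, aligned to feature indexes."""
    if set(feature_train_index) != set(_TRAIN_TARGET):
        raise ValueError("train index is out of sync with Feature Loader")
    if set(feature_test_index) != set(_TEST_TARGET):
        raise ValueError("test index is out of sync with Feature Loader")
    y_train = {i: _TRAIN_TARGET[i] for i in feature_train_index}
    # Same keys as the hidden test target but no values: the solution fills them in.
    y_test = dict.fromkeys(feature_test_index)
    return y_train, y_test


y_train, y_test = load_target([1, 2, 3, 4], [5, 6])
print(y_train)  # {1: 0, 2: 1, 3: 0, 4: 1}
print(y_test)   # {5: None, 6: None}
```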
Keep your project organized
- Wizard
- Project structure
- Data & features naming conventions
- Default AI/ML solution architecture
- Domain models in AI/ML projects
- Logging and monitoring
Create new project instances and configure existing ones with a CLI.
Wizard is a CLI tool for installation, creating project instances, configuring services, starting and stopping services, managing tasks and domain models, managing client Docker containers, and more.
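A CLI of this shape maps naturally onto subcommands. The sketch below uses Python's `argparse` to show how a few of the listed operations could be laid out; the program name and every subcommand and argument are hypothetical and do not describe Wizard's real interface.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="wizard")  # hypothetical command name
    sub = parser.add_subparsers(dest="command", required=True)

    create = sub.add_parser("create", help="create a new project instance")
    create.add_argument("project")

    service = sub.add_parser("service", help="start or stop a service")
    service.add_argument("action", choices=["start", "stop"])
    service.add_argument("name")

    task = sub.add_parser("task", help="manage tasks")
    task.add_argument("action", choices=["add", "remove", "list"])
    task.add_argument("name", nargs="?")  # not needed for "list"
    return parser


# Parsing an explicit list instead of sys.argv keeps the example runnable.
args = build_parser().parse_args(["service", "start", "feature-store"])
print(args.command, args.action, args.name)
```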
All Data-sky project instances share a templated structure.
This is our answer to the challenges of AI/ML exploration. The structure contains all files related to the project and is intended to stay in sync across all clients; to fulfill this requirement, project files are stored in git.
The default project structure helps your team easily switch between projects and navigate inside each one.
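Generating a templated structure is straightforward. This minimal sketch assumes a template expressed as a list of paths; the directory names in `TEMPLATE` and the `create_project` function are hypothetical, since the actual Data-sky layout is not described here.

```python
import tempfile
from pathlib import Path

# Hypothetical template; entries ending in "/" are directories, others are files.
TEMPLATE = [
    "config/tasks/",
    "domain/",
    "features/extractors/",
    "notebooks/",
    "services/",
    "README.md",
]


def create_project(root, name):
    """Create (or complete) a project skeleton; safe to run more than once."""
    project = Path(root) / name
    for entry in TEMPLATE:
        path = project / entry
        if entry.endswith("/"):
            path.mkdir(parents=True, exist_ok=True)
        else:
            path.parent.mkdir(parents=True, exist_ok=True)
            path.touch()
    return project


with tempfile.TemporaryDirectory() as tmp:
    project = create_project(tmp, "churn")
    print(sorted(p.relative_to(project).as_posix() for p in project.rglob("*")))
```

Because every instance starts from the same template, the resulting tree can be committed to git and kept in sync across clients.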
Naming conventions for data and features solve many issues within your team and in communication with domain experts.
They make it easy to understand the origin of the data and all the transformations applied to it. You can enforce naming conventions in feature extractors and solve miscommunication at the root of the problem. Simple name resolution lets you give a suggestion and an explanation to a user or expert when you communicate.
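Name resolution can be as simple as splitting a name into parts and translating each one. This sketch assumes a hypothetical convention of `<source>__<transform>[__<window>]`, for example `monthly_fee__mean__3m`; the convention, the vocabularies, and `explain` are illustrations, not Data-sky's actual rules. Raising on non-conforming names is what lets a feature extractor enforce the convention.

```python
# Hypothetical vocabularies for the transformation and window parts of a name.
TRANSFORMS = {"mean": "mean", "sum": "sum", "cnt": "count", "max": "maximum"}
UNITS = {"d": "days", "w": "weeks", "m": "months"}


def explain(name):
    """Turn a feature name into a plain-English description, or reject it."""
    parts = name.split("__")
    if len(parts) not in (2, 3) or not parts[0]:
        raise ValueError(f"{name!r} does not follow <source>__<transform>[__<window>]")
    source, transform = parts[0], parts[1]
    if transform not in TRANSFORMS:
        raise ValueError(f"unknown transformation {transform!r} in {name!r}")
    text = f"{TRANSFORMS[transform]} of {source.replace('_', ' ')}"
    if len(parts) == 3:
        window = parts[2]
        if len(window) < 2 or not window[:-1].isdigit() or window[-1] not in UNITS:
            raise ValueError(f"bad window {window!r} in {name!r}")
        text += f" over the last {window[:-1]} {UNITS[window[-1]]}"
    return text


print(explain("monthly_fee__mean__3m"))  # mean of monthly fee over the last 3 months
print(explain("support_calls__cnt"))     # count of support calls
```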
We use the most straightforward, simplest possible solution. In many situations, simple is enough.
Our solution consists of a few steps: data gathering -> feature extraction -> model inference. All these stages run asynchronously. Don't use pipelines if you don't need them. Use the Data Store to reuse data for other ML tasks, and the Feature Store to share features between colleagues and ML models.
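The three stages can run as independent coroutines connected only through storage, with no pipeline framework. In this minimal sketch, `asyncio.Queue` objects stand in for the Data Store and Feature Store, and a threshold rule stands in for a trained model; all function names are hypothetical.

```python
import asyncio


async def gather_data(out_q, records):
    # Stage 1: data gathering (the queue stands in for the Data Store).
    for rec in records:
        await out_q.put(rec)
    await out_q.put(None)  # end-of-stream marker


async def extract_features(in_q, out_q):
    # Stage 2: feature extraction (the queue stands in for the Feature Store).
    while (rec := await in_q.get()) is not None:
        await out_q.put({"id": rec["id"], "calls_per_month": rec["calls"] / rec["months"]})
    await out_q.put(None)


async def infer(in_q, results):
    # Stage 3: model inference with a toy threshold "model".
    while (feat := await in_q.get()) is not None:
        results[feat["id"]] = feat["calls_per_month"] > 1.0


async def main(records):
    raw_q, feat_q = asyncio.Queue(), asyncio.Queue()
    results = {}
    # All three stages run concurrently; each only knows its input and output.
    await asyncio.gather(
        gather_data(raw_q, records),
        extract_features(raw_q, feat_q),
        infer(feat_q, results),
    )
    return results


records = [{"id": 1, "calls": 2, "months": 4}, {"id": 2, "calls": 9, "months": 3}]
print(asyncio.run(main(records)))  # {1: False, 2: True}
```

Since stages share only storage, any of them can be replaced, rerun, or reused by another task without touching the others.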
Divide code into explicit parts: one responsible for AI/ML and one responsible for the domain.
AI/ML projects suffer from misunderstanding, both inside the team and in the interaction between the team and domain experts.
Build the domain model code once and share it between many AI/ML tasks. Put all the domain knowledge inside it, and all the AI/ML knowledge outside it.
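Here is a small sketch of that split. The domain side owns the business rules (what "churned" and "tenure" mean), and the AI/ML side only calls that API to build features and labels. `Contract`, its rules, and `make_example` are hypothetical illustrations.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


# --- Domain side: business knowledge, shared between AI/ML tasks. ---
@dataclass(frozen=True)
class Contract:
    customer_id: int
    start: date
    end: Optional[date] = None  # None: the contract is still active

    def is_churned(self, on: date) -> bool:
        # Hypothetical business rule: churned once the contract has ended.
        return self.end is not None and self.end <= on

    def tenure_days(self, on: date) -> int:
        # Tenure stops counting when the contract ends.
        last = self.end if self.end is not None and self.end < on else on
        return (last - self.start).days


# --- AI/ML side: no business rules, only calls into the domain model. ---
def make_example(contract: Contract, on: date):
    features = {"tenure_days": contract.tenure_days(on)}
    label = int(contract.is_churned(on))
    return features, label


today = date(2024, 1, 1)
print(make_example(Contract(1, date(2023, 1, 1)), today))
print(make_example(Contract(2, date(2023, 6, 1), date(2023, 9, 1)), today))
```

If experts redefine churn, only `Contract` changes, and every task that uses it picks up the new rule.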
Data-sky has built-in support for Graylog, Netdata, and Redash.
All deployed services are connected to Graylog by default, so all logs are collected in one logging and monitoring system. Netdata is used to monitor network load. Redash is used to visualize data, features, predictions, and more.
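For a custom Python service, shipping logs to Graylog usually means sending GELF messages, for example over UDP to a GELF input. This sketch builds a zlib-compressed GELF 1.1 message with the standard library; the port 12201, the host names, and the helper names are assumptions, and how Data-sky wires its own services to Graylog is not shown here.

```python
import json
import socket
import time
import zlib


def gelf_payload(message, host, level=6, **fields):
    """Build a zlib-compressed GELF 1.1 message (level 6 = syslog 'informational')."""
    if "id" in fields:
        raise ValueError("'_id' is reserved in GELF")
    record = {
        "version": "1.1",
        "host": host,
        "short_message": message,
        "timestamp": time.time(),
        "level": level,
    }
    # GELF requires additional fields to be prefixed with an underscore.
    record.update({f"_{key}": value for key, value in fields.items()})
    return zlib.compress(json.dumps(record).encode("utf-8"))


def send_gelf(payload, graylog_host="localhost", port=12201):
    """Send one UDP datagram; large messages would need GELF chunking (not shown)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (graylog_host, port))


payload = gelf_payload("features extracted", host="feature-store", task="churn", rows=1200)
print(json.loads(zlib.decompress(payload)))
```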
Make frequent operations easier
or even skip them altogether
- Templates: Wizard for generating and configuring ML/AI projects. Project structure and architecture suited to mid- and long-term projects (1 month+).
- Architecture: service-oriented architecture based on Docker. An open and extensible set of tools. The framework is built on top of open-source products, libraries, and components.
- Installation: install locally on your own premises for team work, or in data centers or the cloud for corporate use.
- MLOps: deploy AI/ML solutions in a framework instance. Give data scientists end-to-end responsibility for the whole AI/ML solution.
Looking for beta testers
Please contact us.
FREE for non-commercial use
The version for non-commercial use is free and will always be free.
If you want to use our tool for making money in a commercial organization, you will need to purchase a license.