Dask Working Notes
Writing about Scaling Python
- Improving GroupBy.map with Dask and Xarray
- Dask DataFrame is Fast Now
- High Level Query Optimization in Dask
- Upstream testing in Dask
- Do you need consistent environments between the client, scheduler and workers?
- Deep Dive into creating a Dask DataFrame Collection with from_map
- Shuffling large data at constant memory in Dask
- Managing dask workloads with Flyte
- Easy CPU/GPU Arrays and Dataframes
- Dask Demo Day November 2022
- Reducing memory usage in Dask workloads by 80%
- Dask Kubernetes Operator
- Understanding Dask’s meta keyword argument
- Data Proximate Computation on a Dask Cluster Distributed Between Data Centres
- Documentation Framework
- How to run different worker types with the Dask Helm Chart
- Reflections on one year as the Dask life science fellow
- Mosaic Image Fusion
- Choosing good chunk sizes in Dask
- CZI EOSS Update
- 2021 Dask User Survey
- Google Summer of Code 2021 - Dask Project
- High Level Graphs update
- Ragged output, how to handle awkward shaped results
- Dask Down Under
- Dask Survey 2021, early anecdotes
- The evolution of a Dask Distributed user
- The 2021 Dask User Survey is out now
- Life sciences at the 2021 Dask Summit
- Stability of the Dask library
- Skeleton analysis
- Dask with PyTorch for large scale image analysis
- Image segmentation with Dask
- Measuring Dask memory usage with dask-memusage
- Getting to know the life science community
- Dask User Summit 2021
- Image Analysis Redux
- 2020 Dask User Survey
- Announcing the DaskHub Helm Chart
- Running tutorials
- Comparing Dask-ML and Ray Tune's Model Selection Algorithms
- Configuring a Distributed Dask Cluster
- The current state of distributed Dask clusters
- Faster Scheduling
- Last Year in Review
- Large SVDs
- Dask Summit
- Estimating Users
- Dask Deployment Updates
- DataFrame Groupby Aggregations
- Better and faster hyperparameter optimization with Dask
- Co-locating a Jupyter Server and Dask Scheduler
- Dask on HPC: a case study
- Dask and ITK for large scale image analysis
- 2019 Dask User Survey
- Dask Release 2.2.0
- Extracting fsspec from Dask
- Dask Release 2.0
- Load Large Image Data with Dask Array
- Python and GPUs: A Status Update
- Dask on HPC
- Experiments in High Performance Networking with UCX and DGX
- Composing Dask Array with Numba Stencils
- cuML and Dask hyperparameter optimization
- Dask and the __array_function__ protocol
- Building GPU Groupby-Aggregations for Dask
- Running Dask and MPI programs together
- Single-Node Multi-GPU Dataframe Joins
- Dask Release 1.1.0
- Extension Arrays in Dask DataFrame
- Dask, Pandas, and GPUs: first steps
- GPU Dask Arrays, first steps
- Dask Version 1.0
- Dask-jobqueue
- Refactor Documentation
- Dask Development Log
- Dask Release 0.19.0
- High level performance of Pandas, Dask, Spark, and Arrow
- Building SAGA optimization for Dask arrays
- Dask Development Log
- Pickle isn't slow, it's a protocol
- Dask Development Log, Scipy 2018
- Who uses Dask?
- Dask Development Log
- Dask Scaling Limits
- Dask Release 0.18.0
- Beyond Numpy Arrays in Python
- Dask Release 0.17.2
- Craft Minimal Bug Reports
- Dask Release 0.17.0
- Credit Modeling with Dask
- Pangeo: JupyterHub, Dask, and XArray on the Cloud
- Dask Development Log
- Dask Release 0.16.0
- Optimizing Data Structure Access in Python
- Streaming Dataframes
- Notes on Kafka in Python
- Dask Release 0.15.3
- Fast GeoSpatial Analysis in Python
- Dask on HPC - Initial Work
- Dask Release 0.15.2
- Scikit-Image and Dask Performance
- Dask Benchmarks
- Use Apache Parquet
- Dask Release 0.15.0
- Dask Release 0.14.3
- Dask Development Log
- Asynchronous Optimization Algorithms with Dask
- Dask and Pandas and XGBoost
- Dask Release 0.14.1
- Developing Convex Optimization Algorithms in Dask
- Dask Release 0.14.0
- Dask Development Log
- Experiment with Dask and TensorFlow
- Two Easy Ways to Use Scikit Learn and Dask
- Dask Development Log
- Custom Parallel Algorithms on a Cluster with Dask
- Dask Development Log
- Distributed NumPy on a Cluster with Dask Arrays
- Distributed Pandas on a Cluster with Dask DataFrames
- Dask Release 0.13.0
- Dask Development Log
- Dask Development Log
- Dask Development Log
- Dask Development Log
- Dask Cluster Deployments
- Dask and Celery
- Dask Distributed Release 1.13.0
- Dask for Institutions
- Dask and Scikit-Learn -- Model Parallelism
- Ad Hoc Distributed Random Forests
- Fast Message Serialization
- Distributed Dask Arrays
- Pandas on HDFS with Dask Dataframes
- Introducing Dask distributed
- Dask is one year old
- Distributed Prototype
- Caching
- Custom Parallel Workflows
- Write Complex Parallel Algorithms
- Distributed Scheduling
- State of Dask
- Towards Out-of-core DataFrames
- Towards Out-of-core ND-Arrays -- Dask + Toolz = Bag
- Towards Out-of-core ND-Arrays -- Slicing and Stacking
- Towards Out-of-core ND-Arrays -- Spilling to Disk
- Towards Out-of-core ND-Arrays -- Benchmark MatMul
- Towards Out-of-core ND-Arrays -- Multi-core Scheduling
- Towards Out-of-core ND-Arrays -- Frontend
- Towards Out-of-core ND-Arrays