dwdown

dwdown is a Python package that downloads weather forecast data from the Deutscher Wetterdienst (DWD), uploads it to a MinIO object storage server, and processes it for analysis. It also keeps you informed about the status of downloads and uploads and about any errors.

👉 Check out on GitHub

Features

  • DWDDownloader: Fetch weather forecast data from the DWD open data server.
  • MinioDownloader: Download files from an S3-compatible / MinIO object storage server.
  • MinioUploader: Upload downloaded data to an S3-compatible / MinIO object storage server with parallel uploads and data integrity checks.
  • DataProcessor: Extract, convert and filter data for further analysis.
  • DataEditor: Filter and merge CSV data.
  • Notifier: Receive status messages about downloads, uploads, and any errors via a Gotify server.
  • Logging: Automatically log download and upload activities, and handle errors gracefully.
  • Parallel Uploading: Upload large datasets in parallel for faster performance.
  • Automatic Bucket Creation: Create MinIO buckets if they do not exist.

Installation

You can install dwdown from source using git and pip:

git clone https://github.com/trholy/dwdown.git

cd dwdown

pip install .

Documentation

Read the documentation on GitLab Pages.

Usage

Example Workflow

Description

DWDDownloader: Download Data from DWD

The DWDDownloader class allows you to download weather forecast files from the DWD open data server.

from dwdown.download import DWDDownloader

# Initialize DWDDownloader
dwd_downloader = DWDDownloader(
    url="https://opendata.dwd.de/weather/nwp/icon-d2/grib/09/aswdifu_s/",
    restart_failed_downloads=False,  # Don't retry failed downloads
    log_downloads=True,  # Log download status
    delay=0.1,  # 0.1 seconds delay between downloads
    workers=4,  # Use 4 concurrent workers
    download_path="download_files",  # Path for downloaded files
    log_files_path="log_files"  # Path for log files
)

# Fetch download links
dwd_downloader.get_links(
    exclude_pattern=["icosahedral"],
    min_timestep=0,
    max_timestep=10
)

# Download files
dwd_downloader.download_files(
    check_for_existence=True,
    max_retries=3
)

# Print status after download
print("Successfully downloaded files:", dwd_downloader.downloaded_files)
print("Failed downloads:", dwd_downloader.failed_files)
print("Finally failed downloads:", dwd_downloader.finally_failed_files)
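
The get_links() call above narrows the file list by excluded substrings and by forecast step. A minimal sketch of that kind of selection follows. It is not dwdown's implementation: filter_links is a hypothetical helper, the file names use a made-up run date, and the sketch assumes the forecast step is the three-digit field between underscores in the file name.

```python
import re

# Assumption: the forecast step is the three-digit field between underscores,
# as in "..._2025010109_003_2d_aswdifu_s.grib2.bz2".
STEP_PATTERN = re.compile(r"_(\d{3})_")


def filter_links(links, exclude_pattern=(), min_timestep=0, max_timestep=48):
    """Hypothetical helper: keep links inside the step range that match no excluded substring."""
    selected = []
    for link in links:
        filename = link.rsplit("/", 1)[-1]
        if any(pattern in filename for pattern in exclude_pattern):
            continue
        match = STEP_PATTERN.search(filename)
        if match is None:
            continue  # No recognizable step field; skip the file
        if min_timestep <= int(match.group(1)) <= max_timestep:
            selected.append(link)
    return selected


base = "https://opendata.dwd.de/weather/nwp/icon-d2/grib/09/aswdifu_s/"
# Illustrative file names with a made-up run date
links = [
    base + "icon-d2_germany_regular-lat-lon_single-level_2025010109_003_2d_aswdifu_s.grib2.bz2",
    base + "icon-d2_germany_icosahedral_single-level_2025010109_003_2d_aswdifu_s.grib2.bz2",
    base + "icon-d2_germany_regular-lat-lon_single-level_2025010109_012_2d_aswdifu_s.grib2.bz2",
]
kept = filter_links(links, exclude_pattern=["icosahedral"], min_timestep=0, max_timestep=10)
print(len(kept))
```

Only the first link survives here: the second contains the excluded substring and the third has step 12, which is above max_timestep.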

MinioUploader: Upload Data to MinIO

The MinioUploader class helps upload files to a MinIO object storage server, ensuring data integrity with MD5 hash verification.

from dwdown.upload import MinioUploader

# Initialize MinioUploader
uploader = MinioUploader(
    endpoint="your-minio-server.com",
    access_key="your-access-key",
    secret_key="your-secret-key",
    files_path="download_files",  # Path for files to upload
    bucket_name="weather-forecasts",  # Name of the MinIO bucket
    secure=False,  # Use HTTPS if True, plain HTTP otherwise
    log_uploads=True,  # Log upload status
    log_files_path="log_files",  # Path for log files
    workers=4  # Use 4 concurrent workers
)

# Upload files to MinIO
uploader.upload_directory()

# Print status after upload
print("Successfully uploaded files:", uploader.uploaded_files)
print("Upload might be corrupted:", uploader.corrupted_files)
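
The MD5 verification mentioned above can be pictured as comparing a locally computed digest with the checksum the server reports. The sketch below shows one way to do that with the standard library. md5_hex and is_intact are hypothetical names, not dwdown functions, and the sketch assumes the server's ETag is the plain MD5 hex digest, which S3-compatible servers generally provide only for single-part uploads.

```python
import hashlib
import tempfile
from pathlib import Path


def md5_hex(path, chunk_size=1024 * 1024):
    """Return the MD5 hex digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_intact(path, remote_etag):
    """Compare a local file against an ETag reported by the server.

    Assumption: the ETag is the plain MD5 digest, possibly wrapped in quotes.
    """
    return md5_hex(path) == remote_etag.strip('"').lower()


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.grib2.bz2"
    sample.write_bytes(b"hello")
    ok = is_intact(sample, '"5d41402abc4b2a76b9719d911017c592"')
    bad = is_intact(sample, "00000000000000000000000000000000")
print(ok, bad)
```

A file whose digest does not match would be a candidate for a list such as corrupted_files.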

MinioDownloader: Download Data from MinIO

The MinioDownloader class helps you download files from a MinIO object storage server.

from dwdown.download import MinioDownloader

# Initialize MinioDownloader
minio_downloader = MinioDownloader(
    endpoint="your-minio-server.com",
    access_key="your-access-key",
    secret_key="your-secret-key",
    files_path="download_files",  # Path for downloaded files
    secure=False,  # Use HTTPS if True, plain HTTP otherwise
    log_downloads=True,  # Log download status
    log_files_path="log_files",  # Path for log files
    workers=4  # Use 4 concurrent workers
)

# Download files from MinIO
minio_downloader.download_bucket(
    bucket_name="weather-forecasts",  # Name of the MinIO bucket
    folder_prefix='aswdifu_s'
)

# Print status after download
print("Successfully downloaded files:", minio_downloader.downloaded_files)
print("Download might be corrupted:", minio_downloader.corrupted_files)
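
The workers setting above controls how many transfers run concurrently. A common way to get that behavior in Python is a thread pool, sketched below. This is an illustration under assumptions, not dwdown's code: transfer_all and fake_transfer are hypothetical, and the fake transfer stands in for a real download so the example runs without a server.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def transfer_all(names, transfer, workers=4):
    """Run transfer(name) for every name on a pool of worker threads.

    Returns (succeeded, failed) as sorted lists of names.
    """
    succeeded, failed = [], []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(transfer, name): name for name in names}
        for future in as_completed(futures):
            name = futures[future]
            try:
                future.result()  # Re-raises any exception from the worker
            except Exception:
                failed.append(name)
            else:
                succeeded.append(name)
    return sorted(succeeded), sorted(failed)


def fake_transfer(name):
    # Stand-in for a real download; fails for one name to show the error path
    if name.endswith("_bad"):
        raise IOError(f"could not fetch {name}")


done, errors = transfer_all(["a", "b", "c_bad", "d"], fake_transfer, workers=4)
print(done, errors)
```

Collecting results per future keeps one failed transfer from aborting the rest.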

DataProcessor: Process and Convert Data

The DataProcessor class provides tools for extracting, converting, filtering and processing data.

from dwdown.processing import DataProcessor

# Initialize the DataProcessor
editor = DataProcessor(
    search_path="download_files",  # Path for files to process
    extraction_path="extracted_files",  # Path for extracted files
    converted_files_path="csv_files",  # Path for CSV files
)

# Retrieve the filenames that have been downloaded
file_names = editor.get_filenames()

# Convert downloaded files into CSV format
editor.get_csv(
    file_names=file_names,
    apply_geo_filtering=True,
    start_lat=50.840,
    end_lat=51.000,
    start_lon=11.470,
    end_lon=11.690
)
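
The start_lat, end_lat, start_lon and end_lon arguments above describe a bounding box. The following sketch shows what such a filter does to a set of grid points. It uses plain dictionaries with made-up values, in_bounding_box is a hypothetical helper, and treating the bounds as inclusive is an assumption about the intended behavior rather than something the package documents here.

```python
def in_bounding_box(rows, start_lat, end_lat, start_lon, end_lon):
    """Keep rows whose coordinates lie inside the box (bounds inclusive)."""
    return [
        row for row in rows
        if start_lat <= row["latitude"] <= end_lat
        and start_lon <= row["longitude"] <= end_lon
    ]


# Made-up grid points for illustration
rows = [
    {"latitude": 50.90, "longitude": 11.58, "value": 1.0},  # inside the box
    {"latitude": 52.52, "longitude": 13.40, "value": 2.0},  # outside the box
    {"latitude": 50.84, "longitude": 11.47, "value": 3.0},  # on the corner
]
inside = in_bounding_box(rows, 50.840, 51.000, 11.470, 11.690)
print([row["value"] for row in inside])
```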

DataEditor: Merge and Filter CSV Data

The DataEditor class provides tools for merging and filtering CSV data.

from dwdown.processing import DataEditor

# Variables to build merged dataframe from
variables = [
    'aswdifd_s',
    'relhum',
    'smi',
]

# External mapping dictionary
mapping_dictionary = {
    'aswdifd_s': 'ASWDIFD_S',
    'relhum': 'r',
    'smi': 'SMI',
}

# Pattern selection for known variables
additional_patterns = {
    "relhum": [200, 975, 1000],
    "smi": [0, 9, 27],
}

# Initialize DataEditor
data_editor = DataEditor(
    files_path='csv_files/09/',
    required_columns={
        'latitude', 'longitude', 'valid_time'
    },
    join_method='inner',
    mapping_dictionary=mapping_dictionary,
    additional_pattern_selection=additional_patterns,
)

df = data_editor.merge_dfs(
    time_step=0,
    variables=variables
)
print("Processed DataFrame:", df)
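
With join_method='inner', only rows whose key columns appear in every input end up in the merged result. The sketch below illustrates that idea on plain dictionaries, joining on the three required columns. inner_join is a hypothetical helper and the values are made up; in pandas, DataFrame.merge with how="inner" and on=[...] expresses the same operation.

```python
def inner_join(left, right, keys):
    """Inner-join two lists of row dicts on the given key columns."""
    index = {}
    for row in right:
        index.setdefault(tuple(row[k] for k in keys), []).append(row)
    joined = []
    for row in left:
        for match in index.get(tuple(row[k] for k in keys), []):
            joined.append({**row, **match})
    return joined


keys = ("latitude", "longitude", "valid_time")
# Made-up values for illustration
radiation = [
    {"latitude": 50.9, "longitude": 11.5, "valid_time": "2025-01-01T09:00", "ASWDIFD_S": 12.5},
    {"latitude": 50.9, "longitude": 11.6, "valid_time": "2025-01-01T09:00", "ASWDIFD_S": 13.0},
]
humidity = [
    {"latitude": 50.9, "longitude": 11.5, "valid_time": "2025-01-01T09:00", "r": 81.0},
]
merged = inner_join(radiation, humidity, keys)
print(merged)
```

The second radiation row has no humidity counterpart, so an inner join drops it.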

Notifier: Send Status Updates

The Notifier class keeps you informed about the status of downloads, uploads, and any errors via a Gotify server.

from minio import Minio
from dwdown.notify import Notifier

# Initialize Notifier
notifier = Notifier(
    server_url="your-gotify-server.com",
    token="your-access-token",
    priority=5,
    secure=False  # Set to True if your Gotify server is HTTPS
)

# Initialize MinIO client
minio_client = Minio(
    endpoint="your-minio-server.com",
    access_key="your-access-key",
    secret_key="your-secret-key",
    secure=False  # Set to True if your MinIO server is HTTPS
)

# List all buckets
buckets = minio_client.list_buckets()

status_dict = {}

for bucket in buckets:
    bucket_name = bucket.name
    print(f"Processing bucket: {bucket_name}")

    # List all objects in the bucket
    objects = minio_client.list_objects(bucket_name, recursive=True)

    # Get number of objects in the bucket
    status_dict[bucket_name] = [len([obj.object_name for obj in objects])]

# Send notification
notifier.send_notification(
    message=status_dict,
    script_name="download-VM"
)
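
For orientation, a Gotify server accepts new messages as a POST to its /message endpoint with an application token and a title, message and priority. The sketch below only builds such a request and does not send it. build_gotify_request is a hypothetical helper, and the way the status dictionary is rendered into text is an assumption, not necessarily what Notifier does.

```python
import json
from urllib.parse import urlencode


def build_gotify_request(server_url, token, message, title, priority=5, secure=False):
    """Return (url, body) for a Gotify 'create message' call without sending it."""
    scheme = "https" if secure else "http"
    url = f"{scheme}://{server_url}/message?{urlencode({'token': token})}"
    if isinstance(message, dict):
        # Assumption: render one "name: value" line per dictionary entry
        message = "\n".join(f"{key}: {value}" for key, value in message.items())
    body = json.dumps({"title": title, "message": message, "priority": priority})
    return url, body


url, body = build_gotify_request(
    server_url="your-gotify-server.com",
    token="your-access-token",
    message={"weather-forecasts": [42]},
    title="download-VM",
)
print(url)
print(body)
```

Sending the request would be a POST of body to url with a Content-Type of application/json.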

Directory Structure

The package structure is as follows:

dwdown/
├── src/
│   ├── dwdown/
│   │   ├── __init__.py
│   │   ├── downloader/
│   │   │   ├── download.py
│   │   │   ├── __init__.py
│   │   ├── notify/
│   │   │   ├── notifier.py
│   │   │   ├── __init__.py
│   │   ├── uploader/
│   │   │   ├── upload.py
│   │   │   ├── __init__.py
│   │   ├── processor/
│   │   │   ├── processing.py
│   │   │   ├── __init__.py
│   │   ├── tools/
│   │   │   ├── tools.py
│   │   │   ├── __init__.py
├── example_usage/
│   ├── dwd_processing.py
│   ├── dwd_scraper.py
│   ├── minio_downloader.py
│   ├── minio_uploader.py
│   ├── notifier.py
├── pyproject.toml
├── README.md
├── LICENSE
├── .gitignore

Dependencies

  • lxml: Required for parsing XML files (for DWD data).
  • minio: For uploading files to and downloading files from a MinIO server.
  • pandas: For data processing and handling CSV files.
  • requests: For making HTTP requests to the DWD open data server.
  • xarray: For handling multi-dimensional arrays of weather data.

Optional Development Dependencies

  • pytest: For running tests.
  • ruff: For linting Python code.

License

This project is licensed under the Creative Commons Attribution 4.0 International License (CC BY 4.0). You are free to share, adapt, and build upon the material, even for commercial purposes, as long as you provide appropriate credit. For more details, visit CC BY 4.0.

Authors & Maintainers

  • Thomas R. Holy, Ernst-Abbe-Hochschule Jena

Contributing

If you’d like to contribute to the development of dwdown, feel free to fork the repository, create a branch for your feature or fix, and submit a pull request.