13. 8. 2021 Matěj Horák

How we simplified our Public Data Storage (data.cesko.digital)

I contribute to the civic-tech community Česko.Digital, where we recently built a new project. We store all our static assets in an S3 bucket served through CloudFront. This article describes how we simplified uploading these assets.

The Main Motivation

There are several reasons why Česko.Digital needs storage space for public assets:

  • We do not store our blog images in the repository, because that would inflate the repository size considerably.
  • We have some generated data, such as a book translation or a machine-readable list of all cities in the Czech Republic, that we need to store somewhere.

The first solution was quite simple. Tomáš Znamenáček (zoul) and Martin Wenisch configured a plain S3 bucket with CloudFront in front of it, served from the domain “data.cesko.digital”. All scripts for the automated data upload wrote to this bucket. Zoul handled the manual uploads, since he usually publishes the new blog posts.

It worked well, but there was room for improvement:

  • Only people with AWS IAM access could upload new files.
  • Uploads were possible only through the AWS CLI or the AWS Console.
  • The S3 bucket had no backup and no versioning.

Architecture and Implementation

First, we opened a GitHub issue to discuss the specification. The initial idea was to create a simple application. Then Martin had a great idea: use GitHub Actions instead.

For that, we would need a separate GitHub repository for the assets and a GitHub Action for content synchronization. With this solution, we would benefit from Git versioning and get a backup of the content. I looked for similar approaches to public data storage but could not find any, so I decided to implement the synchronization myself.
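To make the idea of content synchronization more concrete, here is a minimal sketch of a one-way mirror from the repository to the bucket. It is not the code we run: the names `plan_sync` and `SyncPlan` are hypothetical, the buckets are modeled as plain dictionaries, and comparing files by SHA-256 digest is an assumption of this sketch (real tools may compare size and timestamp instead).

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass, field


@dataclass
class SyncPlan:
    upload: list[str] = field(default_factory=list)      # new or changed keys
    delete: list[str] = field(default_factory=list)      # keys no longer in the repository
    invalidate: list[str] = field(default_factory=list)  # CDN paths that should be refreshed


def digest(data: bytes) -> str:
    """Content fingerprint used to detect changed files (assumption of this sketch)."""
    return hashlib.sha256(data).hexdigest()


def plan_sync(local: dict[str, bytes], remote: dict[str, str], delete: bool = True) -> SyncPlan:
    """Compare repository files with the bucket listing and plan the changes.

    local:  key -> file content as stored in the repository
    remote: key -> digest of the object currently in the bucket
    delete: when True, objects missing from the repository are scheduled for removal
    """
    plan = SyncPlan()
    for key in sorted(local):
        if remote.get(key) != digest(local[key]):
            plan.upload.append(key)
    if delete:
        plan.delete = sorted(key for key in remote if key not in local)
    # Every changed or removed object may still be cached by the CDN.
    plan.invalidate = ["/" + key for key in plan.upload + plan.delete]
    return plan


if __name__ == "__main__":
    repo = {"img/a.png": b"new bytes", "data/cities.json": b"[]"}
    bucket = {
        "img/a.png": digest(b"old bytes"),
        "data/cities.json": digest(b"[]"),
        "img/old.png": digest(b"x"),
    }
    print(plan_sync(repo, bucket))
```

Separating the planning step from the actual upload keeps the logic easy to test, and the same plan tells you which cached paths are stale.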

For the AWS setup, I wanted to use Terraform from the very beginning, because AWS is a complex service. Having the infrastructure as code is useful, especially because it works great for prototyping: Terraform helps you set up, change and remove any AWS infrastructure. Martin has a lot of experience with Terraform, and he also reviewed my code. For the S3 upload itself, I found an existing solution: jakejarvis/s3-sync-action.

All sources are available on GitHub. Now, let me highlight some design decisions we have made:

  • Currently we have two S3 buckets: one for manual uploads and one for automatically generated data. CloudFront uses an origin group with a failover bucket: if a requested file is not in the primary bucket, CloudFront forwards the request to the bucket with the generated files.
  • We have an error page for invalid URLs.
  • The Sync Action uploads all files from the content folder and invalidates the CloudFront cache. It was essential to add the “delete flag” so that files deleted from the repository are removed from the storage and the cache, too.
  • We configured a GitHub Action for infrastructure changes. Terraform state is kept in a separate S3 bucket. The repository also contains a CODEOWNERS file, so that infrastructure changes are reviewed by the designated owners. This helps prevent unwanted changes that could bring extra costs.
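The lookup order across the two buckets can be illustrated with a small model. This is only a sketch under stated assumptions: the function `resolve` and the `error_page` parameter are hypothetical, and the buckets are plain dictionaries. In CloudFront itself, failover is triggered by the HTTP status codes configured on the origin group; the sketch reduces that to "the key is missing".

```python
from __future__ import annotations


def resolve(
    path: str,
    primary: dict[str, bytes],
    generated: dict[str, bytes],
    error_page: bytes = b"Not found",  # hypothetical stand-in for the error page
) -> tuple[int, bytes]:
    """Return (status, body) for a request path.

    The primary bucket (manual uploads) is tried first; the bucket with
    generated data is the fallback. Invalid URLs get the error page.
    """
    key = path.lstrip("/")
    for bucket in (primary, generated):
        if key in bucket:
            return 200, bucket[key]
    return 404, error_page


if __name__ == "__main__":
    manual = {"img/logo.png": b"logo"}
    generated = {"cities.json": b"[]"}
    for path in ("/img/logo.png", "/cities.json", "/missing.txt"):
        print(path, resolve(path, manual, generated))
```

One consequence of this ordering is that a manually uploaded file shadows a generated file with the same path.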

The GitHub web interface can be used for file uploads right away, and it is accessible to anyone. The upload process is transparent and clear. On top of this structure, we are considering an extra layer such as NetlifyCMS, which would hide the committing process. For now, however, we treat this as a cosmetic change, and it does not have a high priority among our upcoming improvements.

I enjoyed working on this project. It was a fun journey, and not only because I learned something new, especially about Terraform and AWS. Thank you so much, Martin and zoul, for your help, your dedication and for being an integral part of this project.
