13. 8. 2021 Matěj Horák

How we simplified our Public Data Storage (data.cesko.digital)

I contribute to Česko.Digital, a civic-tech community, where we created a new project. We store all our static assets in an S3 bucket accessible via CloudFront. This article describes how we simplified uploading these assets.

The Main Motivation

There are several reasons why Česko.Digital needs storage space for public assets:

  • We do not store our blog images in the repository, because they would inflate the repository size considerably.
  • We have some generated data, such as a book translation or a machine-readable list of all cities in the Czech Republic, that we need to store somewhere.

The first solution was quite simple. Tomáš Znamenáček (zoul) and Martin Wenisch configured an S3 bucket with CloudFront serving it on the domain “data.cesko.digital”. All scripts for automated data upload wrote to this bucket. Zoul was responsible for the uploads, as he usually publishes new blog posts.

It worked well, but there was room for improvement:

  • Only people with AWS IAM access could upload new files.
  • Uploads were only possible through the AWS CLI or the AWS Console.
  • The S3 bucket had no backup or versioning.

Architecture and Implementation

First, we opened a GitHub issue to discuss the specification. The initial idea was to create a simple application, but then Martin had the great idea of using GitHub Actions.

For that, we would need a separate GitHub repository for the assets and a GitHub Action for content synchronization. With this solution we would benefit from Git versioning and get a backup of the content. I looked for similar approaches to public data storage but could not find any, so I decided to implement the synchronization myself.

For the AWS setup, I wanted to use Terraform from the very beginning. AWS is a complex service, and having the infrastructure as code is useful, especially for prototyping: Terraform helps you set up, change and remove AWS infrastructure. Martin has a lot of experience with Terraform, and he also reviewed my code. For the S3 upload itself, I found an existing solution: jakejarvis/s3-sync-action.

All sources are available on GitHub. Now, let me highlight some design decisions we have made:

  • Currently we have two S3 buckets: one for manual uploads and one for automatically generated data. CloudFront uses an origin group with a fail-over bucket. If a requested file is not in the primary bucket, CloudFront forwards the request to the bucket with the generated files.
  • We have an error page for invalid URLs.
  • The Sync Action uploads all files from the content folder and invalidates the CloudFront cache. It was essential to add the “delete flag” so that files removed from the repository also disappear from the bucket and the cache.
  • We configured a GitHub Action for infrastructure changes. Terraform state is kept in a separate S3 bucket. The repository also contains a CODEOWNERS file, which helps guard against unwanted infrastructure changes that could bring extra costs.
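
To make the synchronization and fail-over behaviour above more concrete, here is a minimal sketch in plain Python. It is not the code from our repository: the buckets are modelled as in-memory dictionaries, files are compared by a content hash (a simplification), and the names plan_sync and resolve are hypothetical helpers invented for this illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for an S3 bucket: object key -> content.
Bucket = Dict[str, bytes]


def plan_sync(
    local: Dict[str, str],
    remote: Dict[str, str],
    delete: bool = True,
) -> Tuple[List[str], List[str]]:
    """Return (keys to upload, keys to delete).

    `local` and `remote` map a key to a content hash. Comparing hashes
    is an assumption of this sketch, not necessarily what a real sync
    tool does.
    """
    # Upload everything that is new or whose content differs.
    uploads = sorted(k for k, digest in local.items() if remote.get(k) != digest)
    # With the delete option on, remote files missing locally are removed.
    deletes = sorted(k for k in remote if k not in local) if delete else []
    return uploads, deletes


def resolve(
    key: str,
    primary: Bucket,
    fallback: Bucket,
    error_page: bytes = b"Not found",
) -> Tuple[int, bytes]:
    """Mimic the origin group: try the primary bucket, then the fallback."""
    if key in primary:
        return 200, primary[key]
    if key in fallback:
        return 200, fallback[key]
    # Neither bucket has the file, so the error page is served.
    return 404, error_page


if __name__ == "__main__":
    uploads, deletes = plan_sync(
        local={"a.png": "h1", "b.png": "h2"},
        remote={"a.png": "h1", "old.png": "h3"},
    )
    # Every changed or removed path would also need a cache invalidation.
    invalidations = ["/" + key for key in uploads + deletes]
    print(uploads, deletes, invalidations)
```

Without the delete option, old.png would stay in the bucket and keep being served even after it was removed from the repository, which is why the flag mattered to us.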

The GitHub web interface can be used for file uploads right away, and it is accessible to anyone. The upload process is transparent and clear. On top of this structure, we are considering adding an extra layer such as NetlifyCMS, which would hide the committing process from users. For now, however, we treat this as a cosmetic change, and it is not a high priority among our upcoming improvements.

I enjoyed working on this project. It was a fun journey, and not only because I learned something new, especially about Terraform and AWS. Thank you so much, Martin and zoul, for your help, your dedication and for being an indispensable part of this project.
