https://ec.europa.eu/eurostat/cros/sites/default/files/WPC_Deliverable_C6_Reference_Methodological_Framework_v2.0.pdf,
checked on 8/4/2021.
Stone, M. (1974): Cross-Validatory Choice and Assessment of Statistical Predictions. In Journal of the
Royal Statistical Society: Series B (Methodological) 36 (2), pp. 111–133.
DOI: 10.1111/j.2517-6161.1974.tb00994.x.
van der Loo, Mark (2014): The stringdist package for approximate string matching. In The R Journal 6 (1).
Available online at https://CRAN.R-project.org/package=stringdist, checked on 11/19/2021.
Appendix
Appendix A URL components
Table 5 shows some common components of URLs, using
https://ec.europa.eu/eurostat/cros/WIN_en as an example. Many URLs also contain the
prefix www. Technically speaking, www is a subdomain, and it does not necessarily serve the same content as
the domain without www. Usually, website owners have one “correct” version of their URL (e.g. without
the www) and redirect to this standard version if the user types in the “wrong” version. For example,
if one navigates to https://www.ec.europa.eu/eurostat/cros/WIN_en, the browser is
redirected to the URL without www.
Table 5: URL structure
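As an illustration of the decomposition described above, the following minimal sketch splits the example URL into components using Python's standard library. The function names url_components and strip_www are hypothetical and chosen only for this example, and the component names follow urllib.parse, which may differ from the labels used in Table 5. Removing a leading www is a simplifying assumption: since www is a subdomain, the two hosts are not guaranteed to serve the same site.

```python
from urllib.parse import urlsplit


def strip_www(hostname: str) -> str:
    """Drop a leading 'www.' label (assumes both hosts serve the same site)."""
    return hostname[4:] if hostname.startswith("www.") else hostname


def url_components(url: str) -> dict:
    """Split a URL into the components exposed by urllib.parse.urlsplit."""
    parts = urlsplit(url)
    hostname = parts.hostname or ""
    return {
        "scheme": parts.scheme,            # e.g. "https"
        "hostname": hostname,              # full host, including subdomains
        "hostname_without_www": strip_www(hostname),
        "path": parts.path,                # e.g. "/eurostat/cros/WIN_en"
        "query": parts.query,              # empty string if absent
    }


if __name__ == "__main__":
    example = "https://www.ec.europa.eu/eurostat/cros/WIN_en"
    for name, value in url_components(example).items():
        print(f"{name}: {value}")
```

Comparing hostname_without_www instead of the raw host is one way to treat both URL variants as the same enterprise website; following the actual redirect is the more reliable check.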
Appendix B Example for a file blocklist
A URL with one of the following file endings points to a file rather than to content that can be rendered as HTML.
.mng, .pct, .bmp, .gif, .jpg, .jpeg, .png, .pst, .psp, .tif, .tiff, .drw, .dxf,
.eps, .svg, .mp3, .wma, .ogg, .wav, .ra, .aac, .mid, .aiff, .3gp, .asf, .asx,
.avi, .mp4, .mpg, .qt, .rm, .swf, .wmv, .m4a, .css, .pdf, .doc, .docx, .exe,
.bin, .rss, .zip, .rar, .msu, .flv, .dmg, .xls, .xlsx, .mng?download=true,
.pct?download=true, .bmp?download=true, .gif?download=true, .jpg?download=true,
.jpeg?download=true, .png?download=true, .pst?download=true,
.psp?download=true, .tif?download=true, .tiff?download=true, .ai?download=true,
.drw?download=true, .dxf?download=true, .eps?download=true, .ps?download=true,
.svg?download=true, .mp3?download=true, .wma?download=true, .ogg?download=true,
.wav?download=true, .ra?download=true, .aac?download=true, .mid?download=true,
.au?download=true, .aiff?download=true, .3gp?download=true, .asf?download=true,
.asx?download=true, .avi?download=true, .mov?download=true, .mp4?download=true,
.mpg?download=true, .qt?download=true, .rm?download=true, .swf?download=true,
.wmv?download=true, .m4a?download=true, .css?download=true, .pdf?download=true,
.doc?download=true, .exe?download=true, .bin?download=true, .rss?download=true,
.zip?download=true, .rar?download=true, .msu?download=true, .flv?download=true,
.dmg?download=true
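A blocklist of this kind could be applied as in the following minimal sketch. The names BLOCKED_ENDINGS, DOWNLOAD_SUFFIX and is_blocked are hypothetical, and the tuple holds only an abridged subset of the endings listed above. As a simplifying assumption, the sketch accepts the ?download=true variant for every ending, which is slightly broader than the explicit list.

```python
# Abridged subset of the file endings listed above; a real configuration
# would load the complete list, e.g. from a text file.
BLOCKED_ENDINGS = (".jpg", ".jpeg", ".png", ".gif", ".pdf", ".zip", ".css", ".mp4")

# Query-string variant that appears in the list above.
DOWNLOAD_SUFFIX = "?download=true"


def is_blocked(url: str) -> bool:
    """Return True if the URL ends in a blocklisted file ending."""
    candidate = url.strip().lower()
    # Assumption: treat "<ending>?download=true" like "<ending>" for all endings.
    if candidate.endswith(DOWNLOAD_SUFFIX):
        candidate = candidate[: -len(DOWNLOAD_SUFFIX)]
    # str.endswith accepts a tuple and checks every entry.
    return candidate.endswith(BLOCKED_ENDINGS)


if __name__ == "__main__":
    urls = [
        "https://example.com/report.pdf",
        "https://example.com/logo.PNG?download=true",
        "https://ec.europa.eu/eurostat/cros/WIN_en",
    ]
    for url in urls:
        print(url, "->", "skip" if is_blocked(url) else "scrape")
```

Matching on the lower-cased URL keeps the check case-insensitive, so that endings such as .PNG are caught without listing them separately.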
Appendix C Technical Requirements
The following non-exhaustive list defines some technical requirements for a common URL finder tool
within the ESS. In general, this URL finder should be easy to configure, even without knowledge of the
programming language in which it is written.