Pointcloud Processing Tooling

For some of the more recent LIDAR maps I’ve been producing over the past six months - see this post with initial experimentation, and the resulting full-sized maps I’ve rendered so far can be found here - the source data has been in Pointcloud form rather than as a ‘flat’ raster image format (GeoTIFF) of just the DSM height values, and so it requires different processing in order to clean it up, and then convert it into a ‘displacement’ image map so I can render it as a 3D geometry mesh.

The below rendering is of Sydney, and uses a conversion from Pointcloud source data (available from https://elevation.fsdf.org.au/ as .laz files):

Sydney DSM map LIDAR render

The data being in Pointcloud format has both advantages and disadvantages over more simple raster height images: one of the main advantages is that there’s often (depending on the density and distribution of the points) more data per 2D area measure of ground, and each point has separate positions and attributes, rather than just being an average height value for the pixel as it is in raster image format. This means that theoretically it’s easier to “clean up” the data and remove “bad” points without destroying surrounding data. Another advantage is that being fully-3D, it’s possible to view the points from any direction, which is very useful when visualising and when looking for outliers and other “bad” points.

The disadvantages are that the data is generally a lot heavier and takes up more memory when processing it - three 8-byte double/f64 values need to be stored for the XYZ co-ordinates - at least when read from LAS files, as well as additional metadata per point (although there are ways to losslessly compress the memory usage a bit) - in addition to needing different tools to process than with raster images. Newer QGIS versions do now support opening and viewing .LAS/.LAZ Pointclouds (both in 2D and 3D views), although on Linux I’ve found the 3D view quite unstable, and other than being able to select points to view the classification, there’s not much else you can do to process the points, other than some generic processing which uses the PDAL tooling under-the-hood. It also appears QGIS has to convert the .LAS/.LAZ formats to an intermediate format first, which slows down iteration time when using other processing tooling side-by-side.

PDAL is a library for translating and manipulating Pointcloud data (similar to GDAL for raster data / GeoTIFFs, which QGIS uses for a lot of raster and vector operations under-the-hood), and it has quite a few useful features including merging Pointclouds (a lot of the source DSM Pointcloud data is only available as tiles of areas, and so the data needs to be merged to render entire cities, either before converting to a raster displacement map or after), filtering points, rejecting outliers and converting to raster image heightfields.

I have however found its memory usage somewhat ‘excessive’ for some operations, in addition to being slow (despite the fact it’s written in C++). Because of this - and also to learn more about Pointclouds and the file formats - I’ve started to write my own basic Pointcloud processing utility application (in Rust - the las-rs crate allowed out-of-the-box reading and writing of the .LAS/.LAZ Pointcloud formats which was very useful to get started quickly), which despite not really doing anything especially ‘fancy’ for some of the more simple operations like merging .LAS/.LAZ files - it just does a naive loop over all input files, filtering the points within based on configured filters and then accumulating and saving them to the output file - uses a lot less memory than PDAL does, and is quite a bit faster, so I’ve been able to process larger chunks of data with my own tooling and with faster iteration time.

The one area I haven’t tackled yet and am still using PDAL for is conversion to output raster image (GeoTIFF) - which I then use as the displacement map in my renders - however I hope to implement this rasterisation functionality myself at some point.

I am on the lookout for better Pointcloud visualisation software (in particular on Linux - a lot of the commercial software seems to be Windows or Mac only). QGIS’ functionality is adequate but not great, and is fairly lacking in terms of selection, and other open source software I’ve found like CloudCompare seem a bit unstable (at least when compiling from source on Linux), and it’s not clear how well it’d scale to displaying hundreds of millions of points at once.

I have found displaz which is pretty good for displaying very large Pointclouds (it progressively draws them, and seems to store them efficiently in memory), however it has no support for selection or manipulation of points (by design), so I’m still looking for something which caters to that additional need: in particular the selecting of outlier points interactively and culling them.

Simple Photo Collage Generation

Last week I implemented support for generating very simple (grid only) collages from photos/images in my image processing infrastructure.

Example Photo Collage of Pedestrian crossing lights in Wellington

I had wanted to create some simple grid-based collages of some photos, and I was somewhat surprised to discover that neither Krita nor GIMP (free/open source image manipulation software) seem to provide any built-in functionality to generate this type of output, without manually setting up grids/guides and resizing and positioning each image separately - which while not difficult in theory - is somewhat onerous, especially when you want to generate multiple similar collages from different sets of input images in a procedural/repeatable way.

I did briefly look into (free) web-based solutions, however I wasn’t really happy with that avenue either, partly due to most web solutions having the same lack of procedural/recipe generation (i.e. being able to just change the input images and get the same type of result without re-doing things from scratch again), but also because many web solutions seemed more targeted at “artistic” collages with photos having arbitrary positions and rotations, rather than having grid presets, as well as the fact that many (although not all) of the web apps in question required some form of registration or sign-in.

So I ended up just quickly implementing basic support for this collage generation myself in my image processing infrastructure, which took less than two hours, and means I can now generate arbitrary grid collages from a ‘recipe’ source parameters file which configures the target output resolution, the row and column counts, input images list, inner and outer border widths and border colour, as well as the image resize sampling algorithm/filter to use (i.e. bilinear, bicubic, Lanczos, etc) for resizing the input images to fit into the collage grid.

Timelapse Blending

Over the past few months I’ve made some attempts at timelapse photography, mainly motivated by seeing this site/software on High Dynamic Time Range Images which effectively “blends” multiple images taken at different times into one final image.

Rather than use the above software (which is written in Perl), I decided to write my own implementation using my existing image processing infrastructure I have, and have so far come up with a simple implementation that supports linear “equi-width” blending, and in the future I plan to implement more varied interpolations similar to the original software, as from experimentation, Sunrises/Sunsets and the progression from day to night are not often linear in the resultant brightness of captured images.

Scenes with many lights in that progressively turn on within the timelapse duration seem to work very well generally: here are two examples I’m fairly happy with, showing both non-blended and fully-blended examples of each.

San Francisco:

Time Blend of San Francisco

Time Blend of San Francisco


Time Blend of Wellington

Time Blend of Wellington

There do though appear to be some types of scenes that don’t always seem to work that well with this technique, in particular ones where the sun is either quite prominent or the sky gradient in the horizontal direction is very noticeable: this can lead to “odd”-looking situations where the image “slices” which show the sky should in theory get darker as you progress through time, but due to the sky colour gradient in the source images, it counteracts this on one edge of each image slice, looking a bit weird (at least to my eyes).

I also tried converting a sunrise timelapse sequence I took several years ago in Australia which had clouds moving very slowly across the sky horizontally in the frame, and this produced what almost looked like an artifact-containing/repeating-pattern image (it was technically correct and valid though) in that the same bits of cloud were repeatedly in each image slice by coincidence due to their movement across the sky being in sync with the time delay between each subsequent image.

Other things to look out for are temporal position continuity when blending (see the Wellington blended version with the boat masts moving between captures above), where things like people, vehicles, and trees vary position over time, meaning the blending leads to “ghosting” due to the differing positions in the adjacent images which are being blended/merged together.

Average Global Cloud Cover Satellite Imagery

In a third instalment of attempting to copy images I’ve seen online with my own code, I recently saw some images generated by Johannes Kröger, whereby he ‘integrated’ or averaged a satellite image taken every day from the Suomi VIIRS Satellite into a final image which approximated the median average of cloud cover over the year. He had an original blog post in 2019 here, and a follow-up in 2021 with more technical details here.

I liked the look of the imagery, and was curious how easy it would be to generate myself, and on top of that, was also interested in generating per-quarter/season images rather than ones only for the entire year, in order to try to see obvious variations between seasons.

It should be noted that these will be approximations: the source imagery is taken once a day - generally around noon (although it varies per day per location due to the satellite orbits, as can be seen when comparing adjacent per-day images) - and these processed imagery will include snow/ice cover as well, as shown in this preview of the North Pole area for the approximate average pixel colour of all 366 days of 2020: Example Polar projection image of median cloud cover for 2020

Johannes Kröger’s 2021 blog post contained a bash script example which used the gdal_translate command of the Geospatial Data Abstraction Library (GDAL) suite of tools to download the source imagery from NASA’s Global Imagery Browse Services (GIBS) using a web API which provides tiled images, allowing the download of entire images from source tilesets.

I needed to modify his script to get it working (the ‘TileLevel’ needed to be changed), but I didn’t really want to use bash shell anyway, so I wrote a Python script to do the same thing, but added the functionality to also download imagery for multiple days at a time as a date range, and to also use multiple threads to download multiple images in parallel (downloading a single set of tiles for a single date is quite slow), and also added a ‘cubic’ resize filter to the gdal_translate command line args. The Python script in its final form (albeit slightly sanitised - the save path will need to be changed in order to use it) can be downloaded here.

Note that the size of the images are quite large on disk given their fairly high resolution.

Johannes Kröger did give instructions on how to use available software (GDAL in his case) to perform the ‘averaging’ operation, but this was the bit I wanted to fully implement in code myself: I already had fairly comprehensive Image reading and processing infrastructure code of my own, so modifying it to perform ‘mean’ averages was pretty trivial: to just loop through the entire planar image of each final .tif file for each day’s imagery, and add them all together, and then divide each pixel value by the total number of images. This worked, however the result of using ‘mean’ average pixel values produces an image which does not really represent (at least directly) pixel values that actually occurred in terms of cloud cover: it’s an interpolation, and doesn’t show the pixel values that were most common (i.e. the colour values which occurred the most over the duration of the year for each day).

To find the most common pixel values over the course of the year for each pixel position in the imagery, the ‘median’ average needs to be used, and to calculate this was more development work, as a ‘median’ average requires having all the values for a pixel sorted in order (of luminance/brightness generally), and doing that for 365 16k images at full float32 precision in linear space (despite the source data being in sRGB 8-bit space, it’s generally a good idea to pull pixel values into ‘linear’ colourspaces in order to do computation on them) would take around 1.61 GB of memory (16,384 x 8,192 pixels x 3 channels x 4 bytes) per image, and so it was not going to be feasible to store all 365 entire planar images in memory at once (that would take at least 587 GB of RAM). I could have quantised the pixel values a bit whilst still keeping them in linear space (say to half float16 precision) or something lower with fixed-point, but that still wouldn’t have been anywhere near enough of a reduction, so it was clearly going to require breaking the image up into chunks, so I decided to process the ‘median’ average values in tiled regions, given the source TIFF images were in tiled form anyway, and so reading the individual tile regions for each source image would be easy and pretty fast to iterate through them.

I ended up with an algorithm that would for each tile region (256 x 256 size for the images I had downloaded) of all images (they all obviously have to be the same resolution), iterate through all images for the year, but just for that single tile region at a time, and accumulate all pixel values for all images into arrays per pixel position within the tile region. This way, the total memory usage was “just” the tile size dimensions (256 x 256) x 3 x 4 bytes = 786 KB (plus a bit extra for data structure overhead). Total memory cost for 365 tiles would then be around 287 MB, which is much more reasonable. Then for each pixel position within the tile region, all the pixel values for that pixel position from all the source images needed to be sorted (by luminance), and then the middle ‘median’ value picked. This single RGB value per pixel position within the tile region could then be baked down to a single final image buffer for the tile region, and the memory allocation of all previous pixel values for all 365 tile images could be freed, and then the next tile region could be processed in the same way, for all tile regions in the source images.

Then, finally, these per-tile-region final images would be re-assembled based off their tile position into a final image of the resolution of the original full source images, and this result saved to a final full output image. Given enough memory (my main Linux desktop has 32 GB), it was also trivial enough to process the per-tile-region reading of all source tile images for that tile region and ‘median’ sorting and evaluation in multiple threads, as each tile region could be completely independent from one another, speeding up processing considerably.

Processing 365 16k images into a final output image took around 14 minutes using 12 threads on a Ryzen 5900X, which wasn’t too bad, and there’s still a bit of room for further optimisation I think.

After experimentation with the output, I also added thresholding so as to not accumulate pixel values that were black: the poles of the earth in the satellite imagery were occasionally black, depending on the orbit and that affected the output values a bit.

I had tried to produce average images for the 2022 year, but it turns out the Suomi VIIRS Satellite was missing imagery for late July 2022 and the first half of August 2022, so I used 2021 instead which seemed to have full imagery, and also did 2020 for comparison purposes.

The output result of this for all days in 2021 is this image (full 4k version link): Approximate median average cloud cover in 2021

which is using the same WGS 84 projection that the source images used. Reprojected to a more “true-size preserving” projection - Robinson Projection - provides this image (full 4k version link): Approximate median average cloud cover in 2021, Robinson projection

Below is a table containing links to 4k versions of per-quarter images for 2021:

2021 Q1 (Jan-Mar)4k Image Link
2021 Q2 (Apr-Jun)4k Image Link
2021 Q3 (Jul-Sep)4k Image Link
2021 Q4 (Oct-Dec)4k Image Link

The per-quarter versions do clearly show (as expected) obvious differences in seasons, although there may well be yearly variation as well, and snow/ice cover changes will also be included in the changes.

I’m keen to produce more of these in the future - ideally at higher resolution for more localised regions with better projections - in addition to attempting to generate (mostly) cloudless imagery similar to the famous “Blue Marble” images, to see how easy it is to detect clouds vs snow/ice on the ground: either with large vs. small changes day-to-day between images, or I wonder if it’s possible to use Infrared imagery to detect if colours are likely clouds or not, or by using some of the other output info from the VIIRS sensors.

Full Index

2024 (3)
2023 (7)
2022 (3)
2021 (5)
2020 (4)
2019 (7)
2017 (1)
2016 (2)
2015 (1)
2014 (9)
2013 (10)
2012 (7)

Tags List