DataViz for High-dimensional Environmental Data #0

1 minute read

Some preliminary thoughts

If you’re just here for the code, see my first post.

I’ve always been a bit of a dataviz nerd, but mostly in my spare time.

These days, I’m lucky to work with plenty of interesting environmental chemistry datasets in my PhD, and I thought I’d share some lessons learned in this blog.

In general, my approach to dataviz is one-size does not fit all - it really depends on what you are trying to achieve.

Typically in a scientific context, I think a data visualisation should support or demonstrate whatever claim you are making. Sometimes, dataviz can also be used to identify knowledge gaps and generate further research questions (more on that later).

But for now, before getting too philosophical, let’s consider what I think is one hallmark of many environmental chemistry datasets: they will most likely have many features, a.k.a. dimensions.

A feature or a dimension is basically a column in your table - something you can put on the axis of a plot.

In fact, it’s fairly common these days to find papers studying environmental chemical pollutants as part of regional or even national sampling campaigns, which means at minimum, you’ll have 3-4 dimensions to deal with: compound, concentration, location, and time.

(And with the growing number and quality of open source workflows, it’s only becoming ‘easier’ to process huge amounts of data - bring on more dimensions!)

Inevitably, the question arises…

What’s a good way to visualise these high-dimensional environmental data?

That’s what I hope to continue exploring in this series of blogposts.

Leave a comment!



Leave a comment

Your email address will not be published. Required fields are marked *