I’m catching up on 23 (Research Data) Things from ANDS. I’m late to the party, so I’m going at my own pace and blogging my thoughts.
I found the Boston University Libraries page very helpful, especially for identifying categories of research data:
- Observational: data captured in real time, usually irreplaceable. For example, sensor data, survey data, sample data, neurological images.
- Experimental: data from lab equipment, often reproducible, but can be expensive. For example, gene sequences, chromatograms, toroid magnetic field data.
- Simulation: data generated from test models, where the model and its metadata are more important than the output data. For example, climate models, economic models.
- Derived or compiled: data that is reproducible but expensive. For example, text and data mining, compiled databases, 3D models.
- Reference or canonical: a (static or organic) conglomeration or collection of smaller (peer-reviewed) datasets, most probably published and curated. For example, gene sequence databanks, chemical structures, or spatial data portals.
The data can come in a wide variety of formats, which was obvious from looking around the CSIRO Data Access Portal. Making data openly available is a good first step towards helping more people, including non-scientists, access and reuse research data. Still, I don’t think many people know they can freely access actual research data like this.