Open Research Data

Promoting open science is one of FMI’s strategic objectives. Once data is collected, it can be released in different ways, from closed networks within your research group or within FMI all the way to publishing under a fully open license, with a whole spectrum in between.

At FMI, publicly funded research data should be made available to the widest possible audience (under a CC BY license, at a minimum). This is the best way to maximize the impact of the data and to do justice to all the hard work put into collecting, cleaning, and analyzing it. However, even when the data cannot be published for reasons listed in the institute’s Open Research Data policy, FMI seeks to publish the metadata, acknowledging the data’s existence, topic, and contact information and, whenever possible, ways to obtain the data.

There are many benefits to opening the data. By doing so, you may increase the visibility and impact of your research, open up new collaborations and research projects, provide stronger evidence for advocacy, play a role in decision making, and encourage transparency and accountability, to name just a few.

The datasets created at FMI can be found in:

Read the FMI Research Data policy.

Open vs. restricted access to data

“As open as possible, as closed as necessary”

By definition, open data can be freely used, modified, and shared by anyone for any purpose. This means that published or online data is not necessarily open data unless it is accompanied by a proper license. Public sharing of research data must also not conflict with sensitive information content, military and security issues, contractual restrictions, patenting, or an embargo period. Research funders and publishers, which typically require research data to be made as openly available as possible, recognize that there may be legal, ethical, or commercial reasons why access to some data needs to be restricted. These restrictions typically apply at all stages of a project so that the research process is not damaged by inappropriate release of data. That is why a well-thought-out Data Management Plan (DMP) is important from the very beginning of the project and throughout it.

There are several layers of access to data that should be considered. When needed, access to data can be restricted and regulated, granting more control over who has access, when and under which conditions.

  • Publicly open data – when there are no justifiable legal, ethical, commercial, contractual, or security reasons for not opening the data.

  • Embargo – delayed access to data. During the embargo period, access to the data is completely restricted until the article based on the data is accepted for publication or even published. However, access may be granted to editors and reviewers.

  • Access upon request – for embargoed, confidential, or sensitive data. In this case the metadata (which is publicly available) includes the name of a contact person who can decide whether access is granted.

  • Non-disclosure agreements – for sharing sensitive data with specific people under specific conditions.
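The access layers above can be sketched as a small data model. This is only an illustrative sketch, not part of any FMI system: the names `AccessLevel`, `Dataset`, and `data_is_downloadable` are hypothetical, and modelling the end of an embargo as a fixed calendar date is an assumption (in practice an embargo may end when an article is accepted or published).

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Optional


class AccessLevel(Enum):
    """Hypothetical labels for the four access layers listed above."""
    OPEN = "publicly open"
    EMBARGO = "embargo"
    ON_REQUEST = "access upon request"
    NDA = "non-disclosure agreement"


@dataclass
class Dataset:
    title: str
    access: AccessLevel
    contact: Optional[str] = None         # contact person named in the public metadata
    embargo_until: Optional[date] = None  # assumption: embargo ends on a known date


def data_is_downloadable(ds: Dataset, today: date, approved: bool = False) -> bool:
    """Return True if the data itself (not only its metadata) may be released.

    `approved` stands for an explicit decision by the contact person,
    e.g. for editors and reviewers during an embargo, or under an NDA.
    """
    if ds.access is AccessLevel.OPEN:
        return True
    if ds.access is AccessLevel.EMBARGO:
        embargo_over = ds.embargo_until is not None and today >= ds.embargo_until
        return approved or embargo_over
    # ON_REQUEST and NDA: only with an explicit approval
    return approved
```

For example, a dataset embargoed until 1 January 2025 would not be downloadable on 1 June 2024 unless `approved=True` is passed, which corresponds to granting access to a reviewer.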

FAIR data principles

The FAIR Data Principles are a set of guiding principles for making data findable, accessible, interoperable, and reusable. They provide guidance for scientific data management and stewardship.

One of the challenges of data-intensive science is to facilitate knowledge sharing. It is important that people and information systems are able to find, use, reuse, analyze, and combine scientific data that is suitable for the task at hand. Research institutions, funders, and publishers have significantly intensified their demands regarding research data management and the opening of research data for reuse. They require the FAIR principles to be applied so that researchers ensure their data is soundly managed and subsequently shared.

FAIR data is findable, accessible, interoperable, and reusable.
Image: Libereurope

Links of interest

https://www.fairdata.fi/en/

https://www.force11.org/

https://www.go-fair.org/

Data Management

What is data management?

Data management comprises a series of actions related to acquiring, validating, organizing, storing, maintaining, and protecting the data created and used in a research project, in order to ensure the availability and quality of the data for external users. In other words, data management is the glue that holds together all the segments of the data lifecycle.

Data, as a scholarly product, is fragile and easily lost. That is why research data management should include both the everyday management of data during the lifetime of a research project and decisions regarding preservation and sharing after the project ends.

Why data management?

There is a multitude of reasons why data management is important. Research data management saves time and resources in the long run, prevents errors, increases the quality of data analysis, and facilitates sharing of research data. At a very pragmatic level, data management and data management plans are part of the requirements imposed by funders and publishers.

More and more funders are asking researchers to include a DMP as part of their grant applications, but such plans are also useful for:

  • Developing a strategy for data storage and long-term preservation, handling sensitive data, and sharing,

  • Forecasting possible legal, ethical, and commercial issues related to your data release,

  • Preventing or reducing the likelihood of data loss, errors, and unethical use of data,

  • Properly budgeting your project (in the unlikely case you need to buy storage space from a commercial provider).

Data Management Plan (DMP)

A DMP is used to help identify decisions that must be made regarding the key activities in a research data cycle throughout a research project, addressing the following questions:

  • What kind of data will be collected and how is it described?

  • How will the data be managed, stored, and secured during the project?

  • How will the data be made findable and accessible after the project?

  • How will data ownership be managed?

  • What kind of access is given, to whom, under what conditions and for how long?
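As a rough illustration, the questions above can be kept as a checklist and used to spot gaps in a draft plan. This is a minimal sketch only: the keys in `DMP_QUESTIONS` and the function `unanswered` are hypothetical names chosen for this example and do not correspond to any official DMP template or tool.

```python
from typing import Dict, List

# The five questions listed above, keyed by short hypothetical identifiers.
DMP_QUESTIONS: Dict[str, str] = {
    "description": "What kind of data will be collected and how is it described?",
    "management": "How will the data be managed, stored, and secured during the project?",
    "findability": "How will the data be made findable and accessible after the project?",
    "ownership": "How will data ownership be managed?",
    "access": "What kind of access is given, to whom, under what conditions and for how long?",
}


def unanswered(draft: Dict[str, str]) -> List[str]:
    """Return the questions that the draft plan leaves empty or missing."""
    return [
        question
        for key, question in DMP_QUESTIONS.items()
        if not draft.get(key, "").strip()
    ]


# Usage: a draft that so far covers only two of the five questions.
draft_plan = {
    "description": "Hourly temperature observations, described in a README file.",
    "access": "Open access under CC BY after publication.",
}
for question in unanswered(draft_plan):
    print("Still to answer:", question)
```

Keeping the questions in one place makes it easy to revisit the plan during the project, not only at the application stage.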

When writing a DMP, one can follow the diagram below, which describes the processes that research data goes through during the lifetime of a project.

There are multiple steps that research data goes through during the lifetime of a project.

METIS - FMI's Research Data Repository

The FMI research data repository METIS is provided by EUDAT and enables the institute’s data to be preserved, discovered, and accessed. FMI researchers are strongly encouraged to deposit their research data in METIS unless a subject-specific repository or data center is commonly used in their field or depositing elsewhere is a contractual obligation.

FMI suggests that staff register details of their data in METIS even if the data are stored or published elsewhere. This promotes reuse and collaboration, helps evidence research impact, and maintains a record of FMI’s research data.

Key features

  • Enables the preservation and curation of research data.

  • Enables publication of research data, with a unique Digital Object Identifier (DOI) for citation.

  • The general public can search the repository for data.

  • Access to data can be embargoed prior to publication for up to a maximum of 2 years.

  • Enables compliance with Academy of Finland and other major funding bodies’ policies.

  • Specialist technical support and expert research data management advice available.

  • Free at the point of use for FMI researchers.
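The two-year embargo limit mentioned in the list above can be checked with a few lines of date arithmetic. The sketch below is illustrative only and is not METIS functionality: the function names are hypothetical, and counting the two years from the deposit date is an assumption, since the list does not state the reference date.

```python
from datetime import date

MAX_EMBARGO_YEARS = 2  # limit stated in the key features above


def latest_embargo_end(deposit: date) -> date:
    """Latest permitted embargo end, assuming the clock starts at deposit."""
    try:
        return deposit.replace(year=deposit.year + MAX_EMBARGO_YEARS)
    except ValueError:
        # Deposit on 29 February and the target year is not a leap year.
        return deposit.replace(year=deposit.year + MAX_EMBARGO_YEARS, day=28)


def embargo_is_allowed(deposit: date, embargo_end: date) -> bool:
    """True if the requested embargo end falls within the two-year limit."""
    return deposit <= embargo_end <= latest_embargo_end(deposit)
```

For instance, a dataset deposited on 15 March 2024 could be embargoed until 15 March 2026 at the latest under this assumption.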

Alternative data repositories

In addition to FMI’s research data repository, the following services can also be used:

  • IDA Research Data Storage – A service provided by CSC (IT Center for Science), meant for storing stable research data.

  • Zenodo – A general-purpose open-access repository developed under the European OpenAIRE program and operated by CERN.

  • Figshare – Helps academic institutions to store, share, and manage all their research outputs. Supports file uploads of up to 5 GB.

  • Harvard Dataverse – The Harvard Dataverse is open to all scientific data from all disciplines worldwide. It includes the world's largest collection of social science research data.

  • Mendeley Data – Mendeley Data is an open, cloud-based platform that helps research institutions to manage the entire lifecycle of research data, and enables researchers to discover, collect, and share research data. Supported by Elsevier.

  • Open Science Framework – OSF is part network of research materials, part version control system, and part collaboration software.

  • EOSC portal (not yet in production)

A searchable directory of international research data repositories can be found at re3data. Fairdata provides national-level services, which are part of the digital preservation services of the Ministry of Education and Culture, Finland.

Sometimes the funder or publisher will require that you deposit your data into a specified repository. In this case you must follow the requirements of the funding agency/publisher.

Services for searching datasets and other useful tools

Universities, research institutes, funders, and publishers require you to make your research data as reusable as possible. This also makes it possible for you to use the data of others.

The most popular research data search engines are listed below:

General search engines

Google Dataset Search

Dataset Search is a search engine for making datasets universally accessible and useful.

OpenAIRE Explore

International search engine based in Europe specifically for academic and scientific research. Search for publications, datasets, software, and other research outputs. See how these are linked together and how they are linked to funding and organizations. View statistics on projects and institutions.

DataCite Search

International search engine for academic and scientific research. Searching the DataCite registry allows you to find datasets, software, images, and other research material.

Re3data

Re3data is a global registry of research data repositories that covers research data repositories from different academic disciplines.

Elsevier Mendeley datasearch

The Mendeley index of open research data repositories covers OpenAIRE, DataCite, and Zenodo.

Zenodo

General global repository for data and publications.

Etsin

Etsin contains information about the datasets and metadata in the national Finnish Fairdata services.

EOSC

European partnership aiming to federate services for sharing, storage, management, analysis, and re-use of research outputs. Not yet fully functional.

Field-specific search engines

Copernicus

European Union's Earth observation programme, looking at our planet and its environment. It offers information services that draw from satellite Earth Observation and in-situ (non-space) data.

GEOSS

The GEOSS Portal is an online map-based user interface which allows users to discover and access Earth observation data and resources from different providers from all over the world.

Environmental research infrastructures data portals

ACTRIS

ACTRIS is the pan-European research infrastructure producing high-quality data and information on short-lived atmospheric constituents and on the processes leading to the variability of these constituents in natural and controlled atmospheres.

EISCAT

EISCAT is an international scientific association with member institutes in several countries conducting ionospheric and atmospheric measurements with radars. EISCAT operates in three countries: Finland, Norway, and Sweden, with all the facilities located north of the Arctic Circle.

IAGOS

IAGOS is a European Research Infrastructure for global observations of atmospheric composition from commercial aircraft. IAGOS combines the expertise of scientific institutions with the infrastructure of civil aviation in order to provide essential data on climate change and air quality at a global scale. To provide optimal information, two complementary systems have been implemented: (i) IAGOS-CORE, providing global coverage on a day-to-day basis of key observables, and (ii) IAGOS-CARIBIC, providing a more in-depth and complex set of observations with lesser geographical and temporal coverage.

ICOS

ICOS produces standardized, high-precision, and long-term observations and facilitates research to understand the carbon cycle and to provide necessary information on greenhouse gases.

SIOS

SIOS is a regional observing system for long-term measurements in and around Svalbard addressing Earth System Science questions. SIOS integrates the existing distributed observational infrastructure and generates added value for all partners beyond what their individual capacities can provide.

SeaDataNet

SeaDataNet is a pan-European distributed marine data infrastructure for the management, exchange and re-use of marine and oceanographic data sets.