Collecting Data in a Useful and Responsible Way

DATA

Lorena Aguiar Franjoux

1/7/2026

Why collecting is not just a formality

This article is part of a Data & Impact series on how to use data more responsibly. In the previous article, we explained why and how to define your problem in order to get more value out of your data. If you haven't read it yet, feel free to check it out first.

Once the question has been defined, it is time to identify, select, and organize the relevant data. But be careful: collecting everything “just in case” can quickly become unproductive. It consumes resources, complicates processing, and can undermine the overall quality of the analysis.

Mapping your sources

The first step is to find out where the necessary data is located, starting by distinguishing between three types of sources:

Source type:

  1. Internal: Data produced by the organization. E.g.: Excel databases, CRM, registration forms, accounting

  2. Partners: Data shared as part of collaborations. E.g.: Data exchanged with a partner company, supplier, or customer

  3. External: Public or commercial data. E.g.: Open data, INSEE figures, foundation studies

This inventory allows us to identify gaps, duplicates, inconsistencies, or areas requiring attention.
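As a rough illustration, the inventory can be kept as a small structure that lists, for each source, its type and the data items it holds. The sketch below assumes hypothetical source names and item labels; it only shows how duplicates (items held in several places) and gaps (needed items held nowhere) could be spotted.

```python
from collections import defaultdict


def audit_inventory(sources, needed):
    """Return (duplicates, gaps) for a source inventory.

    sources: mapping of source name -> {"type": ..., "items": [...]}
    needed:  list of data items required to answer the question
    """
    holders = defaultdict(list)
    for source_name, info in sources.items():
        for item in info["items"]:
            holders[item].append(source_name)

    # An item held by more than one source is a potential duplicate.
    duplicates = {item: names for item, names in holders.items() if len(names) > 1}
    # A needed item that no source holds is a gap.
    gaps = sorted(set(needed) - set(holders))
    return duplicates, gaps


# Hypothetical inventory, for illustration only.
sources = {
    "CRM": {"type": "internal", "items": ["member email", "member birth date"]},
    "Registration forms": {"type": "internal", "items": ["member email", "workshop attendance"]},
    "Open data portal": {"type": "external", "items": ["regional population"]},
}
needed = ["member email", "workshop attendance", "member postcode"]

duplicates, gaps = audit_inventory(sources, needed)
print("Duplicates:", duplicates)
print("Gaps:", gaps)
```

A spreadsheet with the same columns would serve the same purpose; the point is simply to make overlaps and missing items visible before collecting anything new.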

Qualify the existing data

For each source, you must check:

  • The type of data (text, numeric, date, etc.)

  • The format (structured or free-form)

  • Quality: recency, reliability, consistency

  • Level of confidentiality (e.g., personal data)

Example: if an organization collects the birth dates of its members, but some are incomplete (missing month or year), this can lead to inaccurate statistics.
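The birth date case can be checked with a few lines of code. This is a minimal sketch that assumes dates are stored as text in the `YYYY-MM-DD` format, which may not match your files; the function name and the sample values are made up for illustration.

```python
from collections import Counter
from datetime import datetime


def classify_birth_date(value):
    """Classify a raw birth date string as 'complete', 'incomplete', or 'missing'.

    Assumption: a complete date is written as YYYY-MM-DD.
    Anything else that is not empty (e.g. '1985' or '1978-11') counts as incomplete.
    """
    text = (value or "").strip()
    if not text:
        return "missing"
    try:
        datetime.strptime(text, "%Y-%m-%d")
        return "complete"
    except ValueError:
        return "incomplete"


def summarize(values):
    """Count how many values fall into each category."""
    return Counter(classify_birth_date(v) for v in values)


# Made-up sample values.
sample = ["1990-04-12", "1985", "", "1978-11"]
print(summarize(sample))
```

Running such a count before computing any age statistics shows how many records the result will actually rest on.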

Use a Data Catalog to formalize your approach

A Data Catalog is a documented inventory of available data. It allows you to:

  • Identify useful datasets

  • Specify their structure (fields, units, definitions)

  • Identify the person responsible for each dataset

  • Track their status (raw, cleaned, validated)

For example, in a small organization, this catalog can be a simple spreadsheet listing all the files used with their function and the person responsible for them.
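For teams that prefer to generate that spreadsheet from code, here is one possible sketch. The column names, the example entries, and the three allowed statuses are assumptions drawn from the list above, not a standard catalog format.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields

ALLOWED_STATUSES = {"raw", "cleaned", "validated"}


@dataclass
class CatalogEntry:
    dataset: str   # file or table name
    purpose: str   # what the dataset is used for
    owner: str     # person responsible for the dataset
    columns: str   # short description of fields, units, definitions
    status: str    # one of ALLOWED_STATUSES


def catalog_to_csv(entries):
    """Return the catalog as CSV text, rejecting unknown statuses."""
    for entry in entries:
        if entry.status not in ALLOWED_STATUSES:
            raise ValueError(f"Unknown status for {entry.dataset}: {entry.status}")
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(CatalogEntry)])
    writer.writeheader()
    for entry in entries:
        writer.writerow(asdict(entry))
    return buffer.getvalue()


# Hypothetical entries, for illustration only.
catalog = [
    CatalogEntry("members.xlsx", "Membership follow-up", "Alex", "name, email, birth date", "cleaned"),
    CatalogEntry("workshops.csv", "Attendance tracking", "Sam", "date, workshop, attendee count", "raw"),
]
print(catalog_to_csv(catalog))
```

The resulting text can be saved as a `.csv` file and opened in any spreadsheet tool, so the catalog stays accessible to non-technical colleagues.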

Understanding the data lifecycle

Responsible data collection must take into account the lifecycle of the data:

  • Update frequency (weekly, monthly, yearly, etc.)

  • Retention period

  • Deletion or archiving rules

  • Change log and history

This simplifies technical control, storage optimization, and compliance with data protection regulations.
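These lifecycle rules can also be written down in a form that a script can check. The sketch below is only an illustration: the field names, the example dataset, and the day counts are hypothetical, and real retention periods depend on your legal and organizational context.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class LifecycleRule:
    dataset: str
    update_every_days: int   # expected update frequency
    retention_days: int      # how long the data may be kept
    last_updated: date
    collected_on: date


def lifecycle_actions(rule, today):
    """Return the list of actions due for a dataset on a given day."""
    actions = []
    if today - rule.last_updated > timedelta(days=rule.update_every_days):
        actions.append("update overdue")
    if today - rule.collected_on > timedelta(days=rule.retention_days):
        actions.append("archive or delete")
    return actions


# Hypothetical rule: monthly updates, one-year retention.
rule = LifecycleRule(
    dataset="workshop registrations",
    update_every_days=30,
    retention_days=365,
    last_updated=date(2025, 1, 1),
    collected_on=date(2024, 1, 1),
)
print(lifecycle_actions(rule, today=date(2025, 3, 1)))
```

Keeping the rule next to the dataset name means the same information can feed both the data catalog and any periodic review.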

Lean and ethical collection

Too much poorly targeted data can:

  • Overload teams

  • Weigh down work tools

  • Impair the comprehension and readability of analyses

  • Undermine user confidence

Example: A workshop registration form that is too long may discourage participants and make it harder to follow up on the activity.

Responsible collection means collection that is:

  • Justified: we know why we collect data

  • Limited to what is essential: only what is useful, nothing more

  • Transparent: we provide information on the intended use

  • Sustainable: easy to maintain over time
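One simple way to apply the "justified" and "limited" criteria is to write a purpose next to every field of a form and flag the fields that have none. The sketch below assumes a hypothetical workshop registration form; the field names and purposes are invented for illustration.

```python
def fields_without_purpose(form_fields):
    """Return the names of form fields that have no stated purpose.

    form_fields: mapping of field name -> purpose (empty string if none).
    """
    return [name for name, purpose in form_fields.items() if not purpose.strip()]


# Hypothetical registration form, for illustration only.
registration_form = {
    "email": "Send the confirmation and practical details",
    "workshop date": "Plan room capacity",
    "birth date": "",
    "employer": "",
}

for name in fields_without_purpose(registration_form):
    print(f"Consider removing '{name}': no purpose is documented.")
```

Fields that come out of this check are candidates for removal, or for a clearer explanation to participants if they turn out to be needed.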

In summary

  1. Map sources: Identify what you have and what is missing

  2. Qualify data: Check its usability, reliability, and consistency

  3. Create a data catalog: Document and organize data for the entire team

  4. Manage the lifecycle: Control storage time and access frequency

  5. Adopt responsible collection practices: Streamline systems, build trust, and improve efficiency

Data collection is a technical step, but also a strategic one: it reflects the organization's choices and its vision of useful data.