Collecting Data in a Useful and Responsible Way
DATA
Lorena Aguiar Franjoux
1/7/2026


Why data collection is not just a formality
This article is part of the Data & Impact series, which brings together several articles on how to use data more responsibly. In the previous article, we explained why and how to define your problem in order to get more value out of your data. If you haven't read it yet, it is a useful place to start.
Once the question has been defined, it is time to identify, select, and organize the relevant data. But be careful: collecting everything “just in case” can quickly become unproductive. It consumes resources, complicates processing, and can undermine the overall quality of the analysis.
Mapping your sources
The first step is to find out where the necessary data is located, starting by distinguishing between three types of sources:
Source type:
Internal: Data produced by the organization. E.g.: Excel databases, CRM, registration forms, accounting
Partners: Data shared as part of collaborations. E.g.: Data exchanged with a partner company, supplier, or customer
External: Public or commercial data. E.g.: Open data, INSEE figures, foundation studies
This inventory allows us to identify gaps, duplicates, inconsistencies, or areas requiring attention.
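As a rough illustration, the inventory can be sketched in a few lines of Python. The source names, field names, and the `map_sources` helper below are hypothetical and chosen only for the example. The sketch lists which needed fields no source provides (gaps) and which are held in several places (possible duplicates).

```python
from collections import defaultdict

# Hypothetical inventory: each source has a type and the fields it provides.
sources = [
    {"name": "CRM export", "type": "internal",
     "fields": {"member_id", "email", "birth_date"}},
    {"name": "Registration form", "type": "internal",
     "fields": {"email", "workshop", "birth_date"}},
    {"name": "Open data portal", "type": "external",
     "fields": {"postal_code_population"}},
]

# Fields the analysis needs (illustrative assumption).
needed_fields = {"member_id", "email", "workshop", "postal_code", "birth_date"}


def map_sources(sources, needed_fields):
    """Return (gaps, duplicates) for the given inventory."""
    providers = defaultdict(list)
    for source in sources:
        for field in sorted(source["fields"]):
            providers[field].append(source["name"])
    # Gaps: needed fields that no source provides.
    gaps = sorted(needed_fields - set(providers))
    # Duplicates: needed fields provided by more than one source.
    duplicates = {
        field: names
        for field, names in providers.items()
        if field in needed_fields and len(names) > 1
    }
    return gaps, duplicates


gaps, duplicates = map_sources(sources, needed_fields)
print("Missing fields:", gaps)
print("Fields held in several places:", duplicates)
```

A field that appears in several sources is not necessarily a problem, but it is a good candidate for a consistency check.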
Qualify the existing data
For each source, you must check:
The type of data (text, numeric, date, etc.)
The format (structured or free-form)
Quality: recency, reliability, consistency
Level of confidentiality (e.g., personal data)
Example: if an organization collects the birth dates of its members, but some are incomplete (missing month or year), this can lead to inaccurate statistics.
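The birth date example can be turned into a small check. This is a minimal sketch that assumes dates are stored as text in `YYYY-MM-DD` form; the records, the `birth_date` field name, and the `is_complete_date` helper are illustrative.

```python
from datetime import datetime

# Illustrative records; the field name and the ISO date format are assumptions.
members = [
    {"id": 1, "birth_date": "1988-04-12"},
    {"id": 2, "birth_date": "1990"},   # month and day missing
    {"id": 3, "birth_date": ""},       # empty value
    {"id": 4, "birth_date": "1975-11-03"},
]


def is_complete_date(value, fmt="%Y-%m-%d"):
    """True only when the value parses as a full year-month-day date."""
    try:
        datetime.strptime(value, fmt)
        return True
    except (ValueError, TypeError):
        return False


incomplete_ids = [m["id"] for m in members if not is_complete_date(m["birth_date"])]
completeness = 1 - len(incomplete_ids) / len(members)

print(f"Incomplete birth dates for members: {incomplete_ids}")
print(f"Completeness: {completeness:.0%}")
```

Knowing the share of complete values before computing statistics such as average age makes it easier to decide whether the field is usable as it stands.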
Use a Data Catalog to formalize your approach
A Data Catalog is a documented inventory of available data. It allows you to:
Identify useful datasets
Specify their structure (fields, units, definitions)
Identify the person responsible for each dataset
Track their status (raw, cleaned, validated)
For example, in a small organization, this catalog can be a simple spreadsheet listing all the files used with their function and the person responsible for them.
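One possible shape for such a spreadsheet is sketched below in Python. The column names, the example entries, and the `check_catalog` and `to_csv` helpers are assumptions made for illustration; the statuses reuse the three mentioned above (raw, cleaned, validated).

```python
import csv
import io

# Column names are illustrative, chosen to mirror the points listed above.
CATALOG_COLUMNS = ["dataset", "purpose", "fields", "owner", "status"]
ALLOWED_STATUSES = {"raw", "cleaned", "validated"}

catalog = [
    {"dataset": "members.xlsx", "purpose": "Membership follow-up",
     "fields": "member_id; email; birth_date",
     "owner": "Membership officer", "status": "cleaned"},
    {"dataset": "workshop_registrations.csv", "purpose": "Workshop attendance",
     "fields": "email; workshop; date",
     "owner": "", "status": "raw"},
]


def check_catalog(entries):
    """Return a list of human-readable problems found in the catalog."""
    problems = []
    for entry in entries:
        if not entry["owner"].strip():
            problems.append(f"{entry['dataset']}: no responsible person")
        if entry["status"] not in ALLOWED_STATUSES:
            problems.append(f"{entry['dataset']}: unknown status {entry['status']!r}")
    return problems


def to_csv(entries):
    """Serialize the catalog as CSV text that a spreadsheet tool can open."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=CATALOG_COLUMNS)
    writer.writeheader()
    writer.writerows(entries)
    return buffer.getvalue()


problems = check_catalog(catalog)
catalog_csv = to_csv(catalog)
print(catalog_csv)
print(problems)
```

Keeping the catalog in a plain CSV file means it can be edited by hand in a spreadsheet and still be checked automatically.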
Understanding the data lifecycle
Responsible data collection must take into account the lifecycle of the data:
Update frequency (weekly, monthly, yearly, etc.)
Retention period
Deletion or archiving rules
Change log and history
This simplifies technical control, storage optimization, and compliance with data protection regulations.
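Retention and archiving rules can be written down explicitly and applied automatically. The sketch below is one way to do it, under stated assumptions: the dataset names, the retention periods, and the `expired_actions` helper are hypothetical, and real retention periods depend on the organization's legal and operational context.

```python
from datetime import date, timedelta

# Hypothetical lifecycle rules per dataset (durations are examples only).
LIFECYCLE_RULES = {
    "workshop_registrations": {"retention_days": 365, "on_expiry": "delete"},
    "membership_records": {"retention_days": 3 * 365, "on_expiry": "archive"},
}

records = [
    {"dataset": "workshop_registrations", "id": "r-1", "collected_on": date(2024, 1, 15)},
    {"dataset": "workshop_registrations", "id": "r-2", "collected_on": date(2025, 9, 1)},
    {"dataset": "membership_records", "id": "m-1", "collected_on": date(2021, 6, 30)},
]


def expired_actions(records, rules, today):
    """List (record id, action) pairs for records past their retention period."""
    actions = []
    for record in records:
        rule = rules[record["dataset"]]
        expires_on = record["collected_on"] + timedelta(days=rule["retention_days"])
        if today >= expires_on:
            actions.append((record["id"], rule["on_expiry"]))
    return actions


# The returned list can double as a simple change log of what was done and why.
change_log = expired_actions(records, LIFECYCLE_RULES, today=date(2026, 1, 7))
print(change_log)
```

Passing `today` as a parameter rather than reading the system clock keeps the check reproducible and easy to test.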
Lean and ethical data collection
Too much poorly targeted data can:
Overload teams
Slow down work tools
Make analyses harder to read and understand
Undermine user confidence
Example: an overly long workshop registration form may discourage participants and make it harder to follow up on the activity.
Responsible collection means collection that is:
Justified: we know why we collect data
Limited to what is essential: only what is useful, nothing more
Transparent: we provide information on the intended use
Sustainable: easy to maintain over time
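The first three criteria can be applied directly to a form. In the sketch below, which uses a hypothetical workshop registration form and illustrative helper names, every field must carry a stated purpose: fields without one are dropped, and the remaining purposes are reused as the notice shown to respondents.

```python
# Hypothetical registration form: each field carries its stated purpose.
form_fields = [
    {"name": "email", "purpose": "Send the workshop confirmation"},
    {"name": "workshop", "purpose": "Know which session was chosen"},
    {"name": "birth_date", "purpose": ""},
    {"name": "profession", "purpose": ""},
]


def split_by_justification(fields):
    """Separate fields with a stated purpose from those without one."""
    kept = [f["name"] for f in fields if f["purpose"].strip()]
    dropped = [f["name"] for f in fields if not f["purpose"].strip()]
    return kept, dropped


def transparency_notice(fields):
    """Build the text shown to respondents, one line per collected field."""
    return "\n".join(
        f"{f['name']}: {f['purpose']}" for f in fields if f["purpose"].strip()
    )


kept, dropped = split_by_justification(form_fields)
notice = transparency_notice(form_fields)
print("Keep:", kept)
print("Remove from the form:", dropped)
print(notice)
```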
In summary
Map sources: Identify what you have and what is missing
Qualify data: Check its usability, reliability, and consistency
Create a data catalog: Document and organize data for the entire team
Manage the lifecycle: Control storage time and access frequency
Adopt responsible collection practices: Streamline systems, build trust, and improve efficiency
Data collection is a technical step, but also a strategic one: it reflects the organization's choices and its vision of useful data.

