From words to categories: data quality and the methodological value of open-ended questions in classifying economic activities

Novello N.
2025-01-01

Abstract

This study critically discusses the data quality and practical utility of open-ended questions for classifying the economic activities of organisations in survey research. It addresses two questions. First, it examines whether the length of responses to open-ended questions about organisations' economic activities is associated with the semantic quality and richness of the information provided, that is, whether longer responses can be treated as a meaningful indicator of the quality and informativeness of the data collected. Manual coding was used to derive a dichotomous variable representing response quality. Second, the study uses Structural Topic Modeling (STM) to analyse whether response length varies systematically across sectors of economic activity. The main findings indicate that longer open-ended responses are not always of higher semantic quality. Both manual coding and the STM analysis show that complex organisational activities tend to elicit longer answers that describe them more fully, while simpler activities can be described in few characters. The study contributes to survey methodology by showing that open-ended questions can capture nuanced organisational practices without increasing respondent burden, thereby improving the precision and flexibility of economic classification. A key practical implication is that a relatively low word limit could be imposed on open-ended questions about organisational activities without significantly affecting response quality.
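The first research question, whether response length is associated with a dichotomous quality code, can be illustrated with a point-biserial correlation. This is a minimal sketch, not the procedure used in the study: the abstract does not state which statistic was applied, and the function name `point_biserial`, the sample responses, and the 0/1 quality codes below are all hypothetical.

```python
from math import sqrt
from statistics import mean, pstdev


def point_biserial(lengths, quality):
    """Point-biserial correlation between a numeric variable and a 0/1 variable.

    Uses r = (M1 - M0) / s * sqrt(p * (1 - p)), where M1 and M0 are the mean
    lengths of the two groups, s is the population standard deviation of all
    lengths, and p is the share of cases coded 1.
    """
    if len(lengths) != len(quality):
        raise ValueError("lengths and quality must have the same number of items")
    high = [x for x, q in zip(lengths, quality) if q == 1]
    low = [x for x, q in zip(lengths, quality) if q == 0]
    if not high or not low:
        raise ValueError("both quality groups must be non-empty")
    spread = pstdev(lengths)
    if spread == 0:
        raise ValueError("all lengths are identical; correlation is undefined")
    p = len(high) / len(lengths)
    return (mean(high) - mean(low)) / spread * sqrt(p * (1 - p))


# Hypothetical open-ended answers with hypothetical manual quality codes
# (1 = classifiable description, 0 = too vague to classify).
responses = [
    ("Bakery", 1),
    ("We do various things for our clients in different areas", 0),
    ("Wholesale of fresh fruit and vegetables to restaurants", 1),
    ("Services", 0),
]
lengths = [len(text) for text, _ in responses]   # length in characters
quality = [code for _, code in responses]
r = point_biserial(lengths, quality)
```

A value of `r` near zero would be in line with the abstract's finding that length alone is not a reliable indicator of quality; with real data, a significance test and a check on the length distribution would also be needed.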
Files in this record:
s11135-025-02348-8.pdf (open access, publisher's version, Adobe PDF, 1.39 MB)

Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1296186