Analysing Wastewater Infrastructure Design Data: Arranging, Cleaning, and Presenting Information

In wastewater infrastructure design, data analysis is the critical bridge between raw field measurements and informed engineering decisions. As outlined in the competency standard “Design Wastewater Collection and Treatment Infrastructure” (Unit Code: CON/OS/CET/CR/09/6A), the analysis of wastewater infrastructure design data involves three essential stages: arranging data based on themes, cleaning data according to best practices, and presenting data effectively for decision-making.

This guide explores each stage of the data analysis process, drawing on international best practices and established methodologies so that wastewater infrastructure projects are built on reliable, well-organised, and clearly communicated information.


1. Arranging Data and Information Based on Various Themes

The first step in analysing wastewater infrastructure design data is organising raw information into coherent, meaningful categories. Thematic arrangement transforms disparate data points into structured knowledge that supports engineering analysis and design decisions.

1.1 The Thematic Analysis Framework

Thematic analysis is a widely adopted method for structuring complex datasets in wastewater and environmental studies. Research on wastewater management challenges in East Africa demonstrates how data can be effectively arranged into five major themes: current state of wastewater management, challenges associated with wastewater management, environmental and public health implications, innovative strategies and emerging opportunities, and the role of policy and governance reforms. This structured approach enables engineers to systematically address each aspect of the design problem.

Similarly, studies on wastewater surveillance dashboards have identified key thematic areas including development processes, dashboard content, and implementation challenges, with sub-themes covering aspects such as stakeholder identification, data transparency, and communication strategies.

When organising wastewater infrastructure design data, engineers should consider the following thematic categories:

Hydraulic and Hydrological Data:

  • Dry weather flows and diurnal patterns
  • Wet weather flows and storm responses
  • Peak flow events and extreme conditions
  • Infiltration and inflow contributions

Wastewater Quality Characterisation:

  • Organic strength parameters (BOD, COD)
  • Physical parameters (TSS, pH, temperature)
  • Nutrient parameters (nitrogen, phosphorus)
  • Biological parameters (pathogen indicators)

Service Area and Population Data:

  • Current population served
  • Growth projections and design period
  • Land use classification
  • Industrial and commercial contributions

Existing Infrastructure:

  • Asset inventory and condition
  • Treatment plant components and capacities
  • Collection system network
  • Pump station locations and specifications

Regulatory and Environmental Data:

  • Discharge permit requirements
  • Water quality standards
  • Environmental constraints and sensitive areas
  • Climate resilience considerations

1.2 Arranging Data from Multiple Sources

Wastewater infrastructure design data typically comes from diverse sources, including field surveys, laboratory analyses, institutional records, and stakeholder consultations. The FAO’s AQUASTAT database methodology demonstrates best practices for arranging data from multiple sources, focusing on annual volumes at national level to facilitate integration with water resources accounts.

For engineering design, data should be arranged to support specific design calculations and decisions:

Data Theme      | Key Variables                  | Design Relevance
Flow Data       | ADWF, peak flows, I&I          | Pipe sizing, pump selection, plant capacity
Quality Data    | BOD, COD, TSS, NH₃             | Treatment process selection, unit sizing
Population Data | Current, projected, serviced   | Design period, capacity planning
Site Conditions | Topography, soils, groundwater | Structural design, construction methods
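
The arrangement in the table above can be sketched as a simple lookup from variable to theme. This is a minimal illustration only: the variable names, the `THEME_OF` mapping, the `arrange_by_theme` helper, and the sample values are all hypothetical and not taken from any standard.

```python
from collections import defaultdict

# Hypothetical mapping from variable name to theme, following the table above.
THEME_OF = {
    "adwf": "Flow Data",
    "peak_flow": "Flow Data",
    "bod": "Quality Data",
    "cod": "Quality Data",
    "tss": "Quality Data",
    "population_current": "Population Data",
    "groundwater_level": "Site Conditions",
}


def arrange_by_theme(records):
    """Group (variable, value) pairs by theme; unknown variables go to 'Unclassified'."""
    themed = defaultdict(list)
    for variable, value in records:
        themed[THEME_OF.get(variable, "Unclassified")].append((variable, value))
    return dict(themed)


# Illustrative values only.
records = [("bod", 250.0), ("adwf", 12.5), ("tss", 180.0), ("ph", 7.2)]
arranged = arrange_by_theme(records)
```

Keeping an explicit "Unclassified" bucket makes it visible when incoming data contains variables that no theme has claimed yet.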

1.3 Structuring Data for Analysis

Proper arrangement facilitates subsequent analysis and validation. Data should be organised at appropriate spatial and temporal resolutions, with clear definitions and metadata attached to each variable. The Braun and Clarke thematic analysis coding process provides a structured approach to identifying, defining, and organising themes from complex datasets.
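
One possible way to attach definitions and metadata to each variable is a small record type. The sketch below assumes a hypothetical `VariableMetadata` class; the field names and the example entry are illustrative and should be adapted to the project's own data dictionary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VariableMetadata:
    """Hypothetical metadata record kept alongside each design variable."""
    name: str
    theme: str
    unit: str
    definition: str
    spatial_resolution: str
    temporal_resolution: str
    source: str


# Illustrative entry; the resolutions and source are assumptions.
adwf = VariableMetadata(
    name="adwf",
    theme="Flow Data",
    unit="L/s",
    definition="Average dry weather flow",
    spatial_resolution="sub-catchment",
    temporal_resolution="daily",
    source="field survey",
)
```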


2. Cleaning Data as per Best Practice

Data cleaning, the process of identifying and correcting errors, inconsistencies, and outliers, is essential for ensuring the reliability of wastewater infrastructure design data. The FAO emphasises that “since data come from many different sources, inconsistencies may occur over the course of time. Therefore, each time new data become available a detailed review and validation of all data, both new ones obtained and the ones already available in the database, is essential”.

2.1 Data Validation Principles

Best practices for data validation in wastewater engineering include the following principles:

Consistency Across the Wastewater Cycle:
Data should be validated against the logical progression of the wastewater cycle: volumes are expected to decrease from production (assumed to be the highest volume) through collection and treatment to direct use (the lowest volume). This check confirms that reported values are physically plausible.
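
The cycle check above can be sketched as a comparison of adjacent stages. The stage keys, the `cycle_inconsistencies` helper, and the volumes are hypothetical and used only for illustration.

```python
# Assumed stage order, from highest expected volume to lowest.
CYCLE_ORDER = ["produced", "collected", "treated", "direct_use"]


def cycle_inconsistencies(volumes):
    """Return (upstream, downstream) pairs where the downstream volume is larger."""
    problems = []
    for upstream, downstream in zip(CYCLE_ORDER, CYCLE_ORDER[1:]):
        if upstream in volumes and downstream in volumes:
            if volumes[downstream] > volumes[upstream]:
                problems.append((upstream, downstream))
    return problems


# Illustrative annual volumes in consistent units.
reported = {"produced": 100.0, "collected": 80.0, "treated": 85.0, "direct_use": 10.0}
issues = cycle_inconsistencies(reported)
```

Here the treated volume exceeds the collected volume, so that pair is returned for review rather than being silently corrected.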

Temporal Consistency:
Unless justified by factors such as population decline, annual wastewater volumes should show incremental increases over time as water use rises. Validation against historical data (1958–2012) helps identify anomalies.
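
A minimal sketch of this temporal check flags any year whose volume falls below the previous reported year. The `flag_decreases` helper, its `tolerance` parameter, and the sample series are assumptions for illustration; flagged years still need an engineering judgement on whether the decrease is justified.

```python
def flag_decreases(annual_volumes, tolerance=0.0):
    """Return years whose volume dropped by more than `tolerance` (a fraction)
    relative to the previous reported year."""
    flagged = []
    years = sorted(annual_volumes)
    for previous, current in zip(years, years[1:]):
        if annual_volumes[current] < annual_volumes[previous] * (1 - tolerance):
            flagged.append(current)
    return flagged


# Illustrative series only.
series = {2008: 40.0, 2009: 42.0, 2010: 35.0, 2011: 44.0}
suspect_years = flag_decreases(series)
```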

Cross-Verification:
Flow data should be cross-verified with rainfall data and other relevant environmental measurements. Sensor drift checks and statistical assessment of anomalies are essential components of quality assurance.
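
As a rough illustration of cross-verification, the sketch below flags high flows that are not accompanied by rainfall in the same or the preceding interval. The `unexplained_peaks` helper, the thresholds, and the sample data are hypothetical; a real check would use catchment-specific lag times and thresholds.

```python
def unexplained_peaks(flows, rainfall, flow_threshold, rain_threshold=0.0):
    """Return indices where flow exceeds `flow_threshold` although rainfall in the
    same and the previous interval is at or below `rain_threshold`."""
    flagged = []
    for i, flow in enumerate(flows):
        recent_rain = max(rainfall[max(0, i - 1): i + 1])
        if flow > flow_threshold and recent_rain <= rain_threshold:
            flagged.append(i)
    return flagged


# Illustrative daily values: flow in L/s, rainfall in mm.
flows = [10, 11, 30, 12, 28, 10]
rain = [0, 0, 15, 0, 0, 0]
suspect = unexplained_peaks(flows, rain, flow_threshold=20)
```

The peak at index 2 coincides with rainfall and is accepted, while the peak at index 4 has no rainfall to explain it and is flagged for review.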

2.2 Identifying and Removing Outliers

Outlier detection is a critical component of data cleaning. In wastewater measurement datasets, outliers can arise from sensor misalignment, debris, structural elements in the flow path, or maintenance activities.

One patented methodology for cleaning wastewater measurement data describes a systematic approach:

Dispersion Graph Analysis:
Plotting the mean values of measurement data on a histogram dispersion graph helps identify the normal distribution pattern. Data blocks falling outside the normal dispersion range are identified as outlier blocks and removed. For example, in a capacity dataset, the majority of data may centre around 6–8% capacity, with outlier blocks appearing at 20% capacity, indicating incorrect sensor readings.
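
The block-based idea can be approximated in code. The sketch below is not the patented procedure: it substitutes a median and median-absolute-deviation rule for visual inspection of the histogram, and the `outlier_blocks` helper, `block_size`, and the multiplier `k` are assumptions chosen for illustration.

```python
from statistics import mean, median


def outlier_blocks(values, block_size, k=3.0):
    """Return indices of blocks whose mean lies more than k median absolute
    deviations from the median of all block means."""
    blocks = [values[i:i + block_size] for i in range(0, len(values), block_size)]
    block_means = [mean(block) for block in blocks]
    centre = median(block_means)
    spread = median([abs(m - centre) for m in block_means])
    return [i for i, m in enumerate(block_means) if abs(m - centre) > k * spread]


# Illustrative capacity readings (%): most blocks sit near 6-8%, one near 20%.
readings = [6, 7, 8, 7, 7, 6, 7, 6, 20, 21, 19, 20, 8, 8, 7, 7, 6, 6, 7, 7]
bad_blocks = outlier_blocks(readings, block_size=4)
```

With these values the block means are 7, 6.5, 20, 7.5, and 6.5, so only the third block (index 2) falls outside the typical range.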

Statistical Filtering:
The first cleaned dataset should be further filtered to remove:

  • Periods of null data
  • Implausible values (e.g., negative measurements due to sensor miscalibration)
  • Data outside physically possible ranges
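
These three filters can be combined in a single pass. The sketch assumes readings are (timestamp, value) pairs with `None` marking null data; the `filter_plausible` helper and the 0–100 range are hypothetical and depend on the quantity being measured.

```python
def filter_plausible(readings, lower=0.0, upper=100.0):
    """Keep readings that are not null and lie within [lower, upper].
    NaN values are also dropped because they fail both comparisons."""
    return [(t, v) for t, v in readings if v is not None and lower <= v <= upper]


# Illustrative capacity readings (%).
readings = [
    ("08:00", 6.8),
    ("08:15", None),    # null data
    ("08:30", -2.0),    # negative value, e.g. from sensor miscalibration
    ("08:45", 7.3),
    ("09:00", 140.0),   # outside the physically possible range
]
cleaned = filter_plausible(readings)
```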

Exclusion Event Identification:
Exclusion events such as blockages, sensor changes, asset cleaning, pump set changes, or broken sensors can render periods of data inconsistent with the rest of the dataset. These periods should be identified and removed, with the affected data labelled for potential manual review.
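
A minimal sketch of exclusion handling separates readings that fall inside a logged event window and keeps the event label for later review. The `split_by_exclusions` helper, the event log format, and the sample data are assumptions for illustration.

```python
from datetime import datetime


def split_by_exclusions(readings, events):
    """Split (timestamp, value) readings into kept data and labelled excluded data.
    `events` is a list of (label, start, end) tuples with inclusive bounds."""
    kept, excluded = [], []
    for timestamp, value in readings:
        label = next(
            (name for name, start, end in events if start <= timestamp <= end),
            None,
        )
        if label is None:
            kept.append((timestamp, value))
        else:
            excluded.append((timestamp, value, label))
    return kept, excluded


# Illustrative readings and a hypothetical maintenance log entry.
readings = [
    (datetime(2024, 3, 1, 8), 7.1),
    (datetime(2024, 3, 1, 9), 19.8),
    (datetime(2024, 3, 1, 10), 7.4),
]
events = [("asset cleaning", datetime(2024, 3, 1, 8, 30), datetime(2024, 3, 1, 9, 30))]
kept, excluded = split_by_exclusions(readings, events)
```

Returning the excluded readings with their labels, rather than discarding them, preserves the audit trail needed for manual review.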

2.3 Addressing Data Gaps and Implausible Readings

Where an isolated implausible reading sits between plausible readings, interpolation can be used to fill the gap. However, prolonged periods of implausible data should be removed rather than imputed.
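
The distinction between isolated and prolonged gaps can be sketched as follows. The `repair_gaps` helper and its `max_gap` parameter are hypothetical; what counts as "isolated" is a project decision, and linear interpolation is only one possible choice.

```python
def repair_gaps(values, max_gap=1):
    """Linearly interpolate runs of None no longer than `max_gap` that have valid
    neighbours on both sides; longer runs stay None so they can be removed."""
    repaired = list(values)
    n = len(values)
    i = 0
    while i < n:
        if values[i] is not None:
            i += 1
            continue
        j = i
        while j < n and values[j] is None:
            j += 1
        run = j - i
        if run <= max_gap and i > 0 and j < n:
            left, right = values[i - 1], values[j]
            for k in range(run):
                repaired[i + k] = left + (right - left) * (k + 1) / (run + 1)
        i = j
    return repaired


# Illustrative series: one isolated gap and one prolonged gap.
series = [10.0, None, 12.0, None, None, None, 13.0]
repaired = repair_gaps(series)
```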

The pre-processing methodology from wastewater surveillance studies demonstrates a similar approach:

  • Flagging data points below the limit of detection
  • Creating indices for lab sites and sampling locations
  • Identifying potential outliers through automated flagging procedures
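
The first two pre-processing steps above might look like the following sketch. The `preprocess` helper, the sample record layout, and the limit of detection are hypothetical and not taken from the cited studies.

```python
def preprocess(samples, lod):
    """Add a site index and a below-limit-of-detection flag to each sample."""
    sites = sorted({s["site"] for s in samples})
    site_index = {site: i for i, site in enumerate(sites)}
    flagged = [
        {**s, "site_id": site_index[s["site"]], "below_lod": s["value"] < lod}
        for s in samples
    ]
    return flagged, site_index


# Illustrative samples; units and the limit of detection are assumptions.
samples = [
    {"site": "Plant B", "value": 0.4},
    {"site": "Plant A", "value": 3.2},
]
flagged, site_index = preprocess(samples, lod=1.0)
```

Flagging values below the limit of detection, instead of deleting them, leaves the choice of how to treat them to the later analysis.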

2.4 Regulatory and Methodological Considerations

Data cleaning must also address reporting inconsistencies. For global wastewater data, studies have found that “the proportion of safely treated wastewater remains strongly uneven between geographic regions and income groups,” and “considerable challenges in assessing the state-of-affairs remain because of its terminology, informal status, and the limited availability (and usefulness) of reported reuse volumes or areas”.

The WHO methodology for tracking domestic wastewater acknowledges that data quality has improved significantly since 2018, but country estimates occasionally show “significant variability in the estimates between reporting years – most often due to new, revised, or reinterpreted data.” Future methodologies aim to incorporate historical datasets to generate more consistent time series estimates.


3. Presenting Data Based on Various Themes

The final stage of data analysis involves presenting cleaned and organised data in formats that support effective decision-making. Well-designed data presentation enables stakeholders to understand the information, identify trends and patterns, and make informed decisions about wastewater infrastructure design.

3.1 Principles of Effective Data Presentation

Studies of wastewater surveillance dashboards have identified several core principles for effective data communication:

Clarity and Interpretability:
Dashboards should present metrics and graphs that are immediately understandable to diverse audiences, including public health officials, engineers, and community stakeholders. Data transparency is essential, with the ability for users to explore and understand the underlying information.

Timeliness and Automation:
Automated workflows using cron jobs and scheduled data updates help keep dashboards current. Data storage in fast-loading binary formats (such as Feather) can reduce page load times.

Integration of Multiple Data Streams:
Combining wastewater surveillance data with clinical case data enables direct comparison and contextualisation. For example, the SEARCH dashboard enables “direct comparison of wastewater viral loads and clinical case counts, with the raw estimated viral loads measurements shown for user-interpretability”.

3.2 Thematic Data Visualisation

Data should be presented based on the thematic categories established during the arrangement phase:

Flow and Hydraulic Data:

  • Hydrographs showing diurnal and seasonal flow patterns
  • Peak flow events overlaid with rainfall data
  • Flow duration curves for capacity analysis
  • Scatter plots comparing dry weather and wet weather flows

Water Quality Data:

  • Time series plots of key parameters (BOD, COD, TSS, nutrients)
  • Box plots showing variability and distribution
  • Trend lines with confidence intervals
  • Threshold lines showing permit limits

Population and Service Area Data:

  • Projection curves and growth scenarios
  • Choropleth maps showing population density
  • Land use classification charts
  • Service area boundary maps

Infrastructure Assessment:

  • Asset inventory tables with condition ratings
  • GIS maps showing infrastructure location and condition
  • Histograms of pipe ages and material types
  • Capacity utilisation charts
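
As one example from the flow visualisations listed above, the points of a flow duration curve can be computed before plotting. The sketch uses the Weibull plotting position, rank / (n + 1), which is a common convention but an assumption here; the `flow_duration_curve` helper and the sample flows are illustrative.

```python
def flow_duration_curve(flows):
    """Return (exceedance_percent, flow) pairs, with flows sorted from highest
    to lowest and exceedance computed as 100 * rank / (n + 1)."""
    ordered = sorted(flows, reverse=True)
    n = len(ordered)
    return [(100.0 * rank / (n + 1), q) for rank, q in enumerate(ordered, start=1)]


# Illustrative daily flows in L/s.
curve = flow_duration_curve([12, 30, 18, 9])
```

Each pair reads as "this flow is equalled or exceeded for roughly this percentage of the record", which is the form used for capacity analysis.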

3.3 Dashboards for Integrated Data Communication

Interactive dashboards have become essential tools for communicating wastewater surveillance and design data. The study of wastewater-integrated pathogen surveillance dashboards describes a centralised data aggregation approach where multiple data pipelines are established for data storage, wrangling, and standardised analysis, with custom-built web dashboards enabling public release.

Key features of effective wastewater data dashboards include:

Multi-Scale Capability:
Dashboards should work at more than one scale. The cited study reports that its approach is effective “across scales, computing architectures, and dissemination strategies, and provides an adaptable model to incorporate additional pathogens and epidemiological data”.

Open Data Access:
“Open data access via a dashboard allows for transparent public health guidance and intervention and empowers individuals to make educated decisions about their health and behaviour”.

Standardised Analyses:
Dashboards should include standardised analytical outputs such as:

  • Growth rate calculations using logistic models
  • Smoothing of viral load curves (Savitzky-Golay filter)
  • Moving average filters for trend visualisation
  • Normalised and binned data displays
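
Of the outputs listed above, the moving average filter is the simplest to sketch. The version below is a trailing average with a shorter window at the start of the series; the `moving_average` helper and the `window` default are assumptions. Savitzky-Golay smoothing is not implemented here and would normally come from a numerical library.

```python
def moving_average(values, window=3):
    """Trailing moving average; early points use however many values exist so far."""
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# Illustrative series only.
trend = moving_average([2, 4, 6, 8], window=2)
```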

Clinical Integration:
“Clinical surveillance data aggregation, and more generally the technical resources and expertise needed to establish and maintain a dashboard, were more widely applicable for wastewater-integrated public health surveillance dashboards”.

3.4 Thematic Presentation of Findings

When presenting findings from wastewater infrastructure design data analysis, engineers should structure their reports around the established themes, providing:

  1. Context and Background: The current state of wastewater infrastructure, challenges, and opportunities
  2. Data Collection and Analysis Methods: How data was arranged, cleaned, and validated
  3. Key Findings by Theme: Hydraulic data, quality data, population projections, infrastructure assessment
  4. Visual Comparisons: Side-by-side presentation of alternatives and scenarios
  5. Recommendations: Data-driven recommendations for design decisions

4. Conclusion

The analysis of wastewater infrastructure design data (arranging by theme, cleaning according to best practice, and presenting by thematic category) is essential for reliable and defensible infrastructure design. By organising raw data into meaningful categories, engineers can identify patterns and relationships that inform design decisions. Systematic data cleaning using established methodologies ensures that subsequent analyses are based on accurate, reliable information. Finally, effective presentation of cleaned data enables clear communication of findings and supports informed decision-making by stakeholders.

Key takeaways for engineering practice:

  1. Arrange data thematically to support systematic analysis and design decision-making
  2. Clean data systematically using validation against logical consistency, outlier detection, and exclusion event identification
  3. Present data clearly using interactive dashboards, visualisations that highlight key patterns, and transparent, open data access
  4. Integrate diverse data streams to provide a comprehensive view of the wastewater system
  5. Use automated workflows to ensure data remains current and accessible

By following these best practices, engineers can ensure that wastewater infrastructure design is founded on reliable, well-organised, and clearly communicated data, leading to projects that are safe, sustainable, and fit for purpose.
