Cartographic generalization produces a harmonized picture of geospatial features at different scales.
Generalization is an essential part of any cartographic production process and is generally still at least partly manually driven. The move to ENC charting has enabled some automation of chart creation at different scales through the development of mechanisms for managing “scale-dependent” features.
Database-driven production systems, which store the data for multiple charts in a single database instance, can then reuse features across different charts, reducing the need for manual intervention.
The issue remains, though, that many features require extensive manual editing to produce generalized products that are acceptable to both cartographer and end-user.
There is, therefore, great potential to increase the efficiency of any data production system by automating generalization within the chart production process as far as possible.
From a generalization point of view, bathymetric content is probably the most challenging: it is one of the most important and safety-critical elements of the navigational chart (if not the most), and also one of the most complex and subtle in the practice of marine cartography.
Bathymetric content is composed primarily of the following features and attributes:
- Individual soundings along with attribution containing various quality parameters
- Areas delimiting specified ranges of depths
- Contours denoting lines of equal depth (in practice these are the perimeters of the depth area features in the previous point)
- Value-of-sounding attributes on individual features (rocks, wrecks, etc.)
The collection, processing, and compilation of bathymetry data are the most labor-intensive and safety-critical of all phases of marine cartography, and the resultant surface of depth areas and sounding arrays forms the essential surface for navigation that is presented to the chart’s end user.
Bathymetric source data is obtained from the raw, dense survey information gathered by sensors on survey vessels and aircraft. Raw survey data is cleaned, validated, and harmonized into a large, dense set of candidate source depth soundings.
From this source data, contours and a large set of candidate spot soundings are derived. Features are selected from the available source, and sub-processes such as thinning, critical sounding designation, and deconfliction with existing sources all take place and are used to compile the resultant chart, whether by new edition (replacement) or update. From a generalization point of view, the selection of “appropriate” depth vectors from the available source and the adaptation of large-scale linework are the core tasks.
This selection must be clear, consistent, and safe for the end-user of the chart.
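Thinning can be illustrated with a minimal sketch of one common shoal-biased approach: overlay a square grid and keep only the shoalest sounding in each cell. This is an illustration under stated assumptions, not the selection method of any particular production system; the names Sounding and shoal_biased_thin, the planar coordinates, and the cell size are all hypothetical.

```python
from collections import namedtuple
from math import floor

# Depth is in metres, positive down, so a smaller value means shoaler water.
Sounding = namedtuple("Sounding", "x y depth")


def shoal_biased_thin(soundings, cell_size):
    """Keep the shoalest (smallest-depth) sounding in each square grid cell."""
    shoalest = {}
    for s in soundings:
        cell = (floor(s.x / cell_size), floor(s.y / cell_size))
        kept = shoalest.get(cell)
        if kept is None or s.depth < kept.depth:
            shoalest[cell] = s
    # Sort by position so the output order is deterministic.
    return sorted(shoalest.values())


# Hypothetical source soundings in planar units.
source = [
    Sounding(1.0, 1.0, 12.4),
    Sounding(2.0, 3.0, 9.8),
    Sounding(8.0, 1.0, 15.1),
    Sounding(12.0, 2.0, 7.2),
    Sounding(14.0, 4.0, 11.0),
]
selected = shoal_biased_thin(source, cell_size=10.0)
for s in selected:
    print(s)
```

With a cell size of 10 units the five source soundings fall into two cells, and only the 9.8 m and 7.2 m soundings survive. A real selection would also need to weigh seabed morphology and navigational context, which a fixed grid ignores.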
In the context of marine charts, generalization usually means cartographic generalization: the stage at which the underlying geospatial data is prepared for viewing. In this model, the representation of the chart features is transformed via a set of fixed generalization operators into viewable representations, the chart symbols. In 1988, McMaster and Shea defined a conceptual model of generalization grouped into:
1. Why — the basis for understanding why generalization takes place
2. When — establishing when particular features require generalization
3. How — the exact process of generalization, such as simplification, aggregation, displacement, and elimination.
The generalization of the marine geospatial data making up charts and ENC data can be viewed in this light, and the model can then be used to define where in the compilation process Artificial Intelligence and Machine Learning (AI/ML) can make a positive contribution. For example, should AI/ML define “when” a sounding, contour, or obstruction is generalized, or “how” it is generalized in terms of its representation in each chart?
The primary difficulty of generalizing depths is the subjective nature of what constitutes a “safe” and informative selection of depth information for the end-user. According to IHO S-4, a generalization of depth information should result in an informative, shoal-biased, and context-sensitive selection of depths at a smaller scale. Two critical tests referenced in IHO S-4 (B-410) are crucial to any consideration of the quality of a generalization process: the triangle test and the edge test. Together they define a shoal-biased triangulation of soundings (and, potentially, depth areas and other features with bathymetric content) which the mariner can use to interpolate depths in relation to their individual vessel draught and safety margins.
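The two tests can be sketched for a single triangle of selected soundings as follows. This is a minimal illustration that assumes planar coordinates and depths in metres: a source sounding fails the triangle test if it lies inside the triangle and is shoaler than the least of the three vertices, and fails the edge test if it lies along an edge and is shoaler than the lesser of that edge's two soundings. The names Sounding, triangle_test_failures and edge_test_failures, and the edge tolerance tol, are hypothetical and not defined by IHO S-4; a production check would first triangulate the selected soundings and work with geodesic geometry.

```python
from collections import namedtuple

# Depth is in metres, positive down, so a smaller value means shoaler water.
Sounding = namedtuple("Sounding", "x y depth")


def _cross(o, a, b):
    """Z-component of (a - o) x (b - o); its sign gives the turn direction."""
    return (a.x - o.x) * (b.y - o.y) - (a.y - o.y) * (b.x - o.x)


def in_triangle(p, a, b, c):
    """True if p lies strictly inside triangle abc (either winding order)."""
    d1, d2, d3 = _cross(a, b, p), _cross(b, c, p), _cross(c, a, p)
    return (d1 > 0 and d2 > 0 and d3 > 0) or (d1 < 0 and d2 < 0 and d3 < 0)


def near_edge(p, a, b, tol):
    """True if p lies within tol of the segment from a to b."""
    ex, ey = b.x - a.x, b.y - a.y
    length_sq = ex * ex + ey * ey
    t = 0.0 if length_sq == 0 else ((p.x - a.x) * ex + (p.y - a.y) * ey) / length_sq
    t = max(0.0, min(1.0, t))  # clamp the projection onto the segment
    dx, dy = p.x - (a.x + t * ex), p.y - (a.y + t * ey)
    return dx * dx + dy * dy <= tol * tol


def triangle_test_failures(triangle, source):
    """Source soundings inside the triangle that are shoaler than every vertex."""
    least = min(v.depth for v in triangle)
    return [s for s in source if in_triangle(s, *triangle) and s.depth < least]


def edge_test_failures(triangle, source, tol):
    """Source soundings along an edge that are shoaler than both of its ends."""
    a, b, c = triangle
    failures = []
    for p, q in ((a, b), (b, c), (c, a)):
        least = min(p.depth, q.depth)
        for s in source:
            if near_edge(s, p, q, tol) and s.depth < least and s not in failures:
                failures.append(s)
    return failures


# Hypothetical selected soundings forming one triangle, plus source soundings.
triangle = (Sounding(0, 0, 10.0), Sounding(10, 0, 12.0), Sounding(0, 10, 11.0))
source = [
    Sounding(2, 2, 9.0),     # inside and shoaler than 10.0: triangle failure
    Sounding(3, 3, 10.5),    # inside but not shoaler than the least vertex
    Sounding(5, 5.1, 10.5),  # just outside the 12.0/11.0 edge and shoaler: edge failure
]
print(triangle_test_failures(triangle, source))
print(edge_test_failures(triangle, source, tol=0.5))
```

A candidate selection passes only when both lists are empty for every triangle; any sounding that is returned would need to be added to the selection or otherwise accounted for.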
The IHO mechanism described in IHO S-4 assures the mariner safe passage between soundings by eliminating the possibility of shoaler source soundings lying between, or on the edge of, the geodesics joining adjacent soundings. (Recent work by the University of New Hampshire expanded heavily on the implementation of these tests by adding linear contours and some contextual features to the tests’ domain; this does not change the tests in principle but enhances their applicability.) The following image shows an example triangulation of multiple usage band data together with depth contours and illustrates some of the generalization techniques and the constraints placed upon them by the triangle/edge tests. In it, coastal and approach soundings are in blue and red, and the triangulation shows the consistent validation of the triangle test in the soundings selected for inclusion in the generalized coastal chart. The green depth contours are generalized from the approach chart and show how the simplified geometry harmonizes with the selected soundings and simplifies the detail of the approach ENC data.
The other points in S-4 (which may be enhanced or reflected in individual member state guidance) are the requirements to take into account the density of soundings with respect to seabed morphology and their proximity to other contextual features such as hazards and shorelines, all within the constraints on feature/vertex density that reduce the clutter of the resulting chart.
The interconnected nature of bathymetric elements can be seen in the following diagram which highlights just two of the key features making up the complex interrelationships in a navigational chart:
Generalization of depth is not dealt with exhaustively in IHO S-4, nor in other cartographic guidance within the existing standards base, which leaves member state producers to develop their own detailed guidance and styles. Indeed, many ENC datasets are digitized from historical paper charts and therefore retain generalization styles and features that have been in place for many years.
Although, as previously stated, a large body of knowledge on generalization exists in land mapping, little has been written specifically on the topic of marine cartographic generalization, or on the bathymetric element of that process. Measurements like Töpfer’s ratio, which relates feature density at different resolutions, are useful, but processes such as the Douglas-Peucker algorithm for simplifying linear features require extensive adaptation for use within the safety-critical processes of marine charting.
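As a worked illustration, and assuming that “Töpfer’s ratio” here refers to the basic form of Töpfer’s Radical Law, the number of features to retain at a smaller scale is the source count multiplied by the square root of the ratio of the scale denominators. The function name and the example figures below are hypothetical.

```python
from math import sqrt


def radical_law_count(source_count, source_scale, target_scale):
    """Basic Radical Law: n_target = n_source * sqrt(M_source / M_target).

    The scales are given as denominators, e.g. 25_000 for a 1:25,000 chart.
    """
    return source_count * sqrt(source_scale / target_scale)


# 400 soundings at 1:25,000 suggest about 200 at 1:100,000.
print(radical_law_count(400, 25_000, 100_000))  # 200.0
```

The result is only a density guide. It says nothing about which soundings to keep, so it cannot by itself satisfy shoal-biased requirements such as the triangle and edge tests.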
So, the difficulties of automating generalization (and specifically bathymetric generalization) have traditionally been:
1. Some aspects of generalization can be highly subjective and resist rigid rules-based formulation. Within marine cartography, decluttering of charts is of prime importance, and aesthetic judgments have played a strong role in the creation of high-quality products for many years.
2. Marine cartography places strict safety-related rulesets around generalization because of the extraordinary amount of legal liability inherent in the product. Examples include the generalization of obstructions and hazards relevant to IMO functions in the ECDIS, and the generalization of coastline/depth areas to ensure safety margins are maintained and reproduced. This requirement limits the ability to reuse many terrestrial mapping generalization techniques. Bathymetric data, shoals, obstructions, and contours are features on which navigational decisions are made and where mistakes and omissions can result in profound safety issues, carrying large liabilities for producing nations.
3. How new or changed information is harmonized with existing information is a characteristic specific to marine cartography because of the large amount of uncertainty involved and the cost of acquiring raw data.
4. There is an implicit spatial and semantic interaction between features in a chart. For instance, lateral buoyage close to shore should not be absorbed by the seaward generalization of coastline (via its underlying depth areas and land areas). Bathymetric generalization must take into account seabed morphology when determining the appropriate density of included soundings. It must also take into account proximity to the coastline, significant hazards, and navigational context (e.g. when determining critical soundings in confined approaches). At all times the topology and relationships between features in the datasets need to be maintained.
5. The selection of appropriate bathymetric data from the survey for use in multiple scales must be consistent with neighboring charts and meet the concrete tests defined in procedures (and IHO standards).
ENC, the primary cartographic product under SOLAS, refines the concept of a chart somewhat. An ENC is a database of geospatial features used to render a chart image on an ECDIS, dependent on a number of user-defined parameters and according to fixed international standards for content and portrayal. ENC also has a very rigid topological structure and tight validation rules which only permit certain geospatial relationships and feature/attribute combinations. Real-world features are encoded from a number of sources and expressed via the S-57 object/attribute catalogs using a style derived mainly from the IHO Use of the Object Catalogue. This language of features and attributes is symbolized by an ECDIS for display, and it also drives alarm and indication behavior, the safety-critical functions of the navigation system. Bathymetric data is the most important feature class within the chart, with many of the IMO-mandated safety-critical functions determined from features with bathymetric content.
From an ENC perspective, bathymetric data is held within:
- Sounding Arrays (SOUNDG)
- Depth Areas (DEPARE) and Dredged Areas (DRGARE), with DRVAL1/DRVAL2 attributes. Associated depth contours (DEPCNT) are linked with DEPARE features. Additionally, routing measures such as deepwater routes and fairways carry depth attribution which should be considered.
- VALSOU attributes on hazards, subsurface obstructions, and wrecks.
All these features make up the bathymetric picture of the ENC and are relevant to generalization processes. In ENC terms, bathymetric cartographic generalization therefore needs to preserve the safety-critical nature of certain features while delivering a de-cluttered and intuitive presentation of the bathymetric features at all scales. Presentation at smaller scales calls for a harmonized approach across all relevant feature types.
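A minimal sketch of how these feature types might be held together so that the depth information can be queried in a harmonized way. The S-57 acronyms (SOUNDG, DEPARE, DRGARE, WRECKS, OBSTRN, UWTROC, DRVAL1, DRVAL2, VALSOU) are real, but the class names, the field layout, and the least_depth helper are assumptions made for illustration and do not represent the schema of any particular system.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class SoundingArray:
    """SOUNDG: a multipoint of (x, y, depth) soundings."""
    points: List[Tuple[float, float, float]]


@dataclass
class DepthArea:
    """DEPARE or DRGARE with its DRVAL1/DRVAL2 depth range."""
    acronym: str
    drval1: float
    drval2: Optional[float] = None


@dataclass
class Hazard:
    """WRECKS, OBSTRN or UWTROC; VALSOU may be unknown."""
    acronym: str
    valsou: Optional[float] = None


def shoalest_value(feature):
    """Return the shoalest depth a feature carries, or None if it has none."""
    if isinstance(feature, SoundingArray):
        return min((depth for _, _, depth in feature.points), default=None)
    if isinstance(feature, DepthArea):
        return feature.drval1
    if isinstance(feature, Hazard):
        return feature.valsou
    raise TypeError(f"not a bathymetric feature: {feature!r}")


def least_depth(features):
    """Shoalest depth across all bathymetric features, ignoring unknowns."""
    values = [v for v in map(shoalest_value, features) if v is not None]
    return min(values) if values else None


# Hypothetical cell content.
cell = [
    SoundingArray([(0.0, 0.0, 12.5), (0.0, 1.0, 8.3)]),
    DepthArea("DEPARE", drval1=5.0, drval2=10.0),
    Hazard("WRECKS", valsou=4.2),
    Hazard("OBSTRN"),  # VALSOU not known
]
print(least_depth(cell))  # 4.2
```

Treating the three sources of depth through one function reflects the point above: the shoalest value in an area may sit on a wreck rather than in the sounding array, so all of them have to be considered together.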
Much work on automated cartographic generalization is already established within the terrestrial mapping domain, mainly concerned with the definition of symbology generalization operators, rule-based transformations of feature representation, and their integration. Symbology generalization for ENC, however, is restricted to the S-52 visual library (so, for instance, neither line weights nor colors can be adjusted).
This places a tight vocabulary around which generalization processes are definable and how they should be implemented, and suggests an approach based on the vector content of the features and attributes rather than on their appearance on screen.
The proposed system platform is shown in the following diagram:
In the proposed system the following steps take place:
1. The input training ENC data is split into its component features. Other input data that may be relevant, such as source bathymetric surfaces and soundings and chart metadata, will be digitized into the schema within the system. At this point, an automated process determines the extent and content of the existing generalization within the input cells. This is used to form the generalization labels according to the model configuration.
2. Features that are linked (for instance, coastline (COALNE) which is coincident with Land Areas and 0m Depth Areas) are represented as single instances with combined attribution to maintain their validity (e.g. to avoid a depth area being generalized and no longer matching the appropriate depth contour). From a generalization perspective, it is the underlying skin-of-the-earth features and the point soundings/bathymetric attribution which require generalization, not the coastline features.
3. A model (selected from a number of candidates) is trained, tuned, and used to predict generalized forms of the input features. These are formed from the values generated by the models, such as selections of soundings from source or inclusion/exclusion instructions based on chart context (e.g. controlling depths), and from parameters that can drive line smoothing algorithms.
4. A subset of the component features is used to test the model predictions.
5. The features are re-assembled into a candidate generalized ENC.
6. This ENC can then be evaluated against:
a. Validation rules, IHO S-58, and IHO S-57 UOC
b. IHO S-4 triangle/edge test, national policy tests
c. Feature density and compilation scale assessments
d. Safety criteria, safety-critical features as defined under IMO SOLAS
e. The cartographic judgment of the effectiveness of the generalization
7. Feedback from the outputs is used to tune the model parameters and modify the feature data designs and labels.
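Step 2 can be sketched as follows, under simplifying assumptions: features are represented as plain dictionaries holding a line geometry, and two features are treated as linked when their coordinate sequences are identical in either direction. The function names, the dictionary layout, and the example attribute values are hypothetical; a real ENC shares edges through its topological structure rather than by comparing coordinates.

```python
from collections import defaultdict


def geometry_key(coords):
    """Direction-independent key so that a reversed copy of a line still matches."""
    forward = tuple(coords)
    return min(forward, forward[::-1])


def link_coincident(features):
    """Merge features with coincident geometry into single linked instances."""
    groups = defaultdict(list)
    for feature in features:
        groups[geometry_key(feature["coords"])].append(feature)

    linked = []
    for key, members in groups.items():
        attributes = {}
        for member in members:
            # Prefix each attribute with its feature acronym to avoid name clashes.
            for name, value in member["attributes"].items():
                attributes[f'{member["acronym"]}.{name}'] = value
        linked.append({
            "coords": list(key),
            "acronyms": sorted(m["acronym"] for m in members),
            "attributes": attributes,
        })
    return linked


# Hypothetical shoreline shared by a coastline, a land area edge and a depth area edge.
shore = [(0.0, 0.0), (1.0, 0.0), (2.0, 1.0)]
features = [
    {"acronym": "COALNE", "coords": shore, "attributes": {}},
    {"acronym": "LNDARE", "coords": shore, "attributes": {}},
    {"acronym": "DEPARE", "coords": shore[::-1],
     "attributes": {"DRVAL1": 0.0, "DRVAL2": 2.0}},
    {"acronym": "DEPCNT", "coords": [(0.0, 1.0), (2.0, 2.0)],
     "attributes": {"VALDCO": 10.0}},
]
linked = link_coincident(features)
for instance in linked:
    print(instance["acronyms"], instance["attributes"])
```

Because the three shoreline features collapse into one instance, any generalization applied to that geometry moves all of them together, which is what keeps a generalized depth area matched to its neighbors when the ENC is re-assembled in step 5.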
The system uses a loss function to progressively improve the generalization processes that form the results. As noted in this section, this function defines a measure of generalization based on the many algorithms and factors specific to bathymetric data, attribution, and bathymetric generalization.
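One way such a loss function could be assembled is as a weighted sum of penalty terms drawn from the evaluation criteria listed above. The sketch below is purely illustrative: the Evaluation fields, the choice of terms, and the weights are assumptions, not the measure used by the proposed system.

```python
from dataclasses import dataclass


@dataclass
class Evaluation:
    """Hypothetical summary of one candidate generalized ENC."""
    safety_failures: int    # e.g. triangle/edge test failures
    validation_errors: int  # e.g. validation rule errors
    sounding_count: int     # soundings in the candidate
    target_count: int       # soundings expected at the compilation scale


def generalization_loss(ev, w_safety=10.0, w_validation=5.0, w_density=1.0):
    """Weighted sum of penalties; lower is better. Weights are illustrative."""
    density_error = abs(ev.sounding_count - ev.target_count) / ev.target_count
    return (w_safety * ev.safety_failures
            + w_validation * ev.validation_errors
            + w_density * density_error)


better = generalization_loss(Evaluation(0, 0, 220, 200))
worse = generalization_loss(Evaluation(2, 1, 200, 200))
print(better, worse)
```

Weighting safety failures far above density error expresses the priority stated throughout: a candidate that is slightly cluttered is preferable to one that hides a shoal. The cartographic judgment in step 6e is not captured by a formula like this and would need to enter through the labels or through manual feedback.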
It is crucial to ensure that as large a training dataset as possible is available to the system. In bathymetry terms, this should also contain the processed source data from which soundings/contours are derived and which forms the decision space for the majority of the soundings/contours.
The success of such a system will depend heavily on the availability of a critical mass of representative training data at all scales, and on the generalization processes that shaped the input ENC cells.
This pipeline technology has the following benefits:
1. It is platform neutral and uses only open source components (Java, Python, PostgreSQL/PostGIS), and it can be adapted to other spatial database solutions.
2. It allows any AI/ML model to be interfaced with its open schema without proprietary restrictions. This allows for maximum flexibility in the choice, tuning, and configuration of the machine learning model, which is crucial given the number and variety of models available to the project.
3. The open architecture allows for multiple algorithms to be engineered to generate line features (and associated polygons). This means that depth contour/depth area generalization can be accomplished by using machine learning to learn “parameters” such as offsets from existing contours, inclusion/exclusion of shoals, and identification of critical depths, and then the actual algorithms generating the features can be deterministic rather than defined by the machine learning model.
4. The system would not be limited to feature generalization only. Assessment of, for example, SCAMIN application (e.g. selection of SCAMIN values on safety-critical soundings) and safety classification of changes to ENCs would be alternative use cases for such an AI/ML adapted system.
5. These components also allow “hybrid” ENCs to be created where some elements are generalized whereas others are untouched. This has the advantage of allowing the project to progress iteratively with more complex generalization included when simple cases are initially proven.
6. The Nautilus process preserves the relationships between the features and their topology so that a standards-conformant ENC can be built for full validation/inspection after the prediction processes have run. The system also allows training validation to take place so that the classification process can complete.
7. It allows maximum flexibility in the input data and its labels.
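The split described in benefit 3, where a model supplies parameters and a deterministic algorithm generates the geometry, can be sketched as follows. The plain Douglas-Peucker simplification below is the standard algorithm; the predicted_tolerance function is a hypothetical stand-in for a trained model, and its tolerance values are invented for illustration. Note that unadapted Douglas-Peucker is not shoal-biased, so a contour simplified this way would still have to pass the safety checks before use.

```python
from math import hypot


def _distance_to_segment(p, a, b):
    """Shortest distance from point p to the segment from a to b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    ex, ey = bx - ax, by - ay
    length_sq = ex * ex + ey * ey
    t = 0.0 if length_sq == 0 else ((px - ax) * ex + (py - ay) * ey) / length_sq
    t = max(0.0, min(1.0, t))  # clamp the projection onto the segment
    return hypot(px - (ax + t * ex), py - (ay + t * ey))


def douglas_peucker(points, tolerance):
    """Drop vertices that lie within tolerance of the simplified line."""
    if len(points) < 3:
        return list(points)
    index, worst = 0, 0.0
    for i in range(1, len(points) - 1):
        d = _distance_to_segment(points[i], points[0], points[-1])
        if d > worst:
            index, worst = i, d
    if worst <= tolerance:
        return [points[0], points[-1]]
    left = douglas_peucker(points[:index + 1], tolerance)
    right = douglas_peucker(points[index:], tolerance)
    return left[:-1] + right  # the split vertex appears in both halves


def predicted_tolerance(usage_band):
    """Hypothetical stand-in for a model that predicts a tolerance per context."""
    return {"approach": 0.1, "coastal": 0.5}[usage_band]


contour = [(0, 0), (1, 0.2), (2, 0), (3, 2), (4, 0)]
coastal = douglas_peucker(contour, predicted_tolerance("coastal"))
approach = douglas_peucker(contour, predicted_tolerance("approach"))
print(coastal)   # the small bump at (1, 0.2) is removed
print(approach)  # all vertices are kept
```

Keeping the geometry generation deterministic means the same predicted parameter always yields the same line, so the output can be inspected, validated, and reproduced independently of the model that proposed the parameter.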
Original post: https://medium.com/@haucemsadki/cartographic-generalization-with-a-i-and-machine-learning-db65b52f45c4