Chapter 2 — Obtaining and Preparing Geodata (SHP, OSM, GeoJSON)

Obtaining and Preparing Geodata

SHP, OSM, GeoJSON and the First Critical Steps in Map Production


If Chapter 1 explained what a vector map is,
this chapter explains where the real work begins.

Professional map production does not start in Illustrator.

It starts with raw geodata.

And raw geodata is almost never clean.


Where to Get Vector Geodata

There are many sources of geographic data. The key is knowing which to trust — and how to combine them.

1. OpenStreetMap (OSM)

One of the most widely used global datasets.

Advantages:

  • Global coverage

  • Frequent updates

  • Free and open

Limitations:

  • Inconsistent quality across regions

  • Community-maintained (not always standardized)

  • Often overloaded with unnecessary attributes

You can extract data via:

  • Direct regional downloads

  • Overpass API

  • Pre-packaged extracts (Geofabrik, etc.)
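
As a rough illustration of the Overpass API route, the sketch below only builds an Overpass QL query string for road geometry inside a bounding box. The function name, the example bounding box, and the chosen highway values are assumptions made for this example. Sending the query to an Overpass endpoint and handling rate limits are left out.

```python
def build_overpass_query(south, west, north, east, highway_values, timeout_s=60):
    """Return an Overpass QL query for ways whose highway tag is in highway_values.

    Overpass expects the bounding box in (south, west, north, east) order.
    Hypothetical helper: boxes crossing the antimeridian are not handled.
    """
    if not (south < north and west < east):
        raise ValueError("bounding box must satisfy south < north and west < east")
    pattern = "|".join(highway_values)
    bbox = f"{south},{west},{north},{east}"
    return (
        f"[out:json][timeout:{timeout_s}];\n"
        f'way["highway"~"^({pattern})$"]({bbox});\n'
        "out geom;"
    )


# Example: main roads in a small, arbitrarily chosen box.
query = build_overpass_query(52.50, 13.35, 52.54, 13.45, ["primary", "secondary"])
print(query)
```

Restricting the query to the classes you actually plan to draw keeps the download small, which also helps with the attribute overload mentioned above.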


2. Government Open Data Portals

For many countries, official data is more reliable than OSM.

Typical datasets:

  • Road networks

  • Administrative boundaries

  • Postal codes

  • Land parcels

  • Building footprints

Government data is often:

  • More standardized

  • Better classified

  • Topologically cleaner

But:

  • Licensing must always be checked.


3. Commercial Data Providers

In some projects (especially infrastructure or logistics),
commercial datasets may be necessary.

They are:

  • Structured

  • Professionally maintained

  • Often expensive

Use them when the project needs a level of accuracy or maintenance that free sources cannot provide.


Never Use Only One Source

This is a critical rule.

Different sources contain different strengths:

  • OSM → good street detail

  • Government datasets → accurate boundaries

  • Satellite imagery → reality check

  • Local municipal data → high precision

Professional production often requires merging multiple sources into one coherent structure.


Common Geodata Formats

Before cleaning, you must understand the format you are importing.


Shapefile (SHP)

A classical GIS format consisting of multiple files:

  • .shp — geometry

  • .shx — geometry index

  • .dbf — attribute table

  • .prj — projection

  • sometimes .cpg (attribute encoding), .qix (spatial index), etc.

Important:
Shapefile is not a single file.

It is a structured dataset.
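
Because a shapefile is a set of files, it is worth checking that none of them got lost in transfer before importing. The following is a minimal sketch using only the standard library. The function name and the split into "required" and "recommended" parts are assumptions for this example, and the check assumes lower-case file extensions.

```python
import tempfile
from pathlib import Path

REQUIRED = (".shp", ".shx", ".dbf")   # geometry, geometry index, attribute table
RECOMMENDED = (".prj", ".cpg")        # projection, attribute encoding


def check_shapefile(shp_path):
    """Report which companion files of a shapefile are missing on disk."""
    shp = Path(shp_path)
    return {
        "missing_required": [e for e in REQUIRED if not shp.with_suffix(e).exists()],
        "missing_recommended": [e for e in RECOMMENDED if not shp.with_suffix(e).exists()],
    }


# Demo on an intentionally incomplete dataset in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    for ext in (".shp", ".dbf"):
        (Path(tmp) / f"city{ext}").touch()
    report = check_shapefile(Path(tmp) / "city.shp")

print(report)
```

A missing .prj file is the case to watch most closely, since without it the CRS check in Step 1 has nothing to read.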

Advantages:

  • Industry standard

  • Widely supported

  • Stable for production workflows

Limitations:

  • Attribute (field) names limited to 10 characters

  • No complex topology support

  • Multiple-file structure


GeoJSON

A text-based format.

Advantages:

  • Single file

  • Human-readable

  • Ideal for web usage

Limitations:

  • Large file sizes

  • Not optimized for heavy production

  • No built-in spatial index

GeoJSON is excellent for web mapping.
For large production maps, it is usually an intermediate format.
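
To make the trade-off concrete, here is a small sketch of a GeoJSON FeatureCollection built with Python's json module. The street name, the coordinates, and the helper round_coords are invented for illustration. Rounding to six decimals (roughly 0.1 m at the equator) is an assumed precision, not a rule.

```python
import json

# GeoJSON stores positions as [longitude, latitude].
feature_collection = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "properties": {"name": "Example Street", "highway": "primary"},
            "geometry": {
                "type": "LineString",
                "coordinates": [
                    [13.404953712, 52.520006599],
                    [13.405912345, 52.520498765],
                ],
            },
        }
    ],
}


def round_coords(coords, ndigits=6):
    """Recursively round nested coordinate lists to ndigits decimals."""
    if isinstance(coords, (int, float)):
        return round(coords, ndigits)
    return [round_coords(c, ndigits) for c in coords]


pretty = json.dumps(feature_collection, indent=2)                # readable, larger
compact = json.dumps(feature_collection, separators=(",", ":"))  # same data, smaller
print(len(pretty), len(compact))
```

Every coordinate is stored as text, which is why the format stays readable and also why files grow quickly on city-scale data.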


Step 1: Verify the Coordinate Reference System (CRS)

Before doing anything else:

Check the CRS.

Common coordinate systems:

  • WGS84 (EPSG:4326, degrees)

  • Web Mercator (EPSG:3857, metres)

  • Local projected systems

If you combine datasets with different CRS without reprojecting them properly:

  • Roads shift

  • Boundaries misalign

  • Buildings float

  • Printing at a reliable scale becomes impossible

Golden rule:

Never work in mixed projections.

Always align CRS at the beginning.
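
The sketch below shows why mixed projections break a map: the same point has completely different numbers in WGS84 (degrees) and Web Mercator (metres). It uses the standard spherical Web Mercator formula in plain Python. The layer names and the helper group_layers_by_crs are hypothetical. In production you would normally let a GIS or a projection library do the reprojection.

```python
import math

EARTH_RADIUS_M = 6378137.0   # sphere radius used by Web Mercator (EPSG:3857)
MAX_LATITUDE = 85.051129     # approximate latitude limit of Web Mercator


def wgs84_to_web_mercator(lon_deg, lat_deg):
    """Convert WGS84 longitude/latitude in degrees to Web Mercator metres."""
    if abs(lat_deg) > MAX_LATITUDE:
        raise ValueError("latitude outside the Web Mercator range")
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


def group_layers_by_crs(layer_crs):
    """Group layer names by CRS label; more than one group means mixed CRS."""
    groups = {}
    for name, crs in layer_crs.items():
        groups.setdefault(crs, []).append(name)
    return groups


groups = group_layers_by_crs(
    {"roads": "EPSG:4326", "buildings": "EPSG:3857", "parks": "EPSG:4326"}
)
if len(groups) > 1:
    print("Mixed CRS detected:", groups)
print(wgs84_to_web_mercator(13.405, 52.52))
```

Running a check like this on every incoming layer before any editing is one way to enforce the rule at the start of the project.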


Step 2: Inspect Raw Data Quality

Raw datasets almost always contain problems.

Typical issues:

  • Duplicate geometries

  • Broken line segments

  • Micro-gaps in polygons

  • Overlapping boundaries

  • Incorrect classifications

  • Excessive attributes

  • Redundant POI points

Before design begins, the dataset must be cleaned.
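
Duplicate geometries are among the easier problems to detect automatically. The sketch below is one possible approach, assuming each line is a plain list of (x, y) tuples. It treats two lines as duplicates when they have the same vertices, in either direction, after rounding. The function names and the rounding precision are assumptions.

```python
def line_key(coords, ndigits=7):
    """Build a direction-independent key for a line from its rounded vertices."""
    pts = tuple((round(x, ndigits), round(y, ndigits)) for x, y in coords)
    return min(pts, pts[::-1])


def find_duplicate_lines(lines):
    """Return (first_index, duplicate_index) pairs for repeated lines."""
    seen = {}
    duplicates = []
    for idx, coords in enumerate(lines):
        key = line_key(coords)
        if key in seen:
            duplicates.append((seen[key], idx))
        else:
            seen[key] = idx
    return duplicates


lines = [
    [(0.0, 0.0), (1.0, 1.0)],
    [(1.0, 1.0), (0.0, 0.0)],   # same segment, digitized in reverse
    [(0.0, 0.0), (2.0, 2.0)],
]
duplicates = find_duplicate_lines(lines)
print(duplicates)
```

This only catches exact repeats. Partially overlapping lines and polygon gaps need proper topology tools.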


Step 3: Remove What You Do Not Need

One of the biggest beginner mistakes:

Keeping everything.

For a print city map, you usually do NOT need:

  • Every POI

  • Service driveways

  • Temporary objects

  • Internal parcel divisions

  • Minor footpaths (depending on scale)

Overloaded data produces:

  • Heavy files

  • Slow processing

  • Visual chaos

Cartography is selective.

Remove noise early.
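
A filter of this kind can be sketched in a few lines. The example assumes OSM-style attributes on GeoJSON-like features (highway=service with service=driveway for driveways, highway=footway or path for footpaths). The scale threshold of 1:10,000 and the function name are illustrative assumptions, not fixed rules.

```python
FOOTPATH_SCALE_LIMIT = 10_000  # assumed: hide footpaths at scales smaller than 1:10,000


def keep_for_print(props, scale_denominator):
    """Decide whether a feature with these attributes stays on a print city map."""
    highway = props.get("highway")
    if highway == "service" and props.get("service") == "driveway":
        return False  # service driveways
    if highway in {"footway", "path"} and scale_denominator > FOOTPATH_SCALE_LIMIT:
        return False  # minor footpaths, depending on scale
    return True


features = [
    {"properties": {"highway": "primary", "name": "Main Street"}},
    {"properties": {"highway": "service", "service": "driveway"}},
    {"properties": {"highway": "footway"}},
]
kept = [f for f in features if keep_for_print(f["properties"], 15_000)]
print([f["properties"]["highway"] for f in kept])
```

Keeping the rules in one function makes the selection repeatable when the source data is updated.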


Step 4: Clean and Standardize Attributes

Attribute tables often contain:

  • Inconsistent naming

  • Mixed language labels

  • Empty fields

  • Redundant columns

For example:

One dataset may use:

  • highway = primary

Another:

  • road_type = major

Standardize classifications early.

Otherwise your layer hierarchy later becomes unmanageable.
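
One simple way to do this is a lookup table that maps each source's field and value to a single shared class. The sketch below uses the two attribute styles from the example above. The target class names ("major", "local") and the additional mappings are assumptions chosen for illustration.

```python
SOURCE_FIELDS = ("highway", "road_type")

# (source field, source value) -> shared class used in the map's layer hierarchy
CLASS_RULES = {
    ("highway", "primary"): "major",
    ("highway", "residential"): "local",
    ("road_type", "major"): "major",
    ("road_type", "minor"): "local",
}


def standardize_class(props):
    """Return the shared road class, or None if no rule matches."""
    for field in SOURCE_FIELDS:
        road_class = CLASS_RULES.get((field, props.get(field)))
        if road_class is not None:
            return road_class
    return None


records = [
    {"highway": "primary"},
    {"road_type": "major"},
    {"highway": "track"},      # no rule yet
]
classes = [standardize_class(r) for r in records]
print(classes)
```

Returning None for unmapped values, rather than guessing, makes it easy to list what still needs a decision.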


Step 5: Correct Street Network Structure

Even before formal topology cleaning, inspect:

  • Broken road connections

  • Misaligned intersections

  • Duplicated line segments

  • Incorrect direction flags

The street network is the structural backbone of most city maps.

If the graph is broken, everything built on top will be unstable.

We will cover full graph logic in Chapter 3.
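
As a first inspection before that, broken road connections often show up as two loose ends lying very close to each other. The sketch below is a simplified check, assuming lines are lists of (x, y) tuples in a projected CRS (metres), that roads are split at intersections, and that every line is longer than the tolerance. The function name and the tolerance value are assumptions.

```python
import math
from collections import Counter
from itertools import combinations


def find_near_miss_endpoints(lines, tolerance):
    """Return pairs of unconnected line ends that lie within tolerance of each other."""
    degree = Counter()
    for coords in lines:
        degree[tuple(coords[0])] += 1
        degree[tuple(coords[-1])] += 1
    # An endpoint used by only one line is either a dead end or a broken connection.
    dangles = [pt for pt, count in degree.items() if count == 1]
    return [
        (a, b)
        for a, b in combinations(dangles, 2)
        if math.dist(a, b) <= tolerance
    ]


lines = [
    [(0, 0), (10, 0)],
    [(10.3, 0), (20, 0)],   # starts 0.3 m away from the end of the first line
    [(20, 0), (20, 10)],
]
suspects = find_near_miss_endpoints(lines, tolerance=0.5)
print(suspects)
```

The pairwise comparison is fine for a quick look at a small area. A full city network would need a spatial index.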


Step 6: Save Clean Intermediate Versions

A professional workflow always keeps versioned copies of the data.

Example structure:

city_raw_v1.shp
city_clean_v2.shp
city_standardized_v3.shp
city_generalized_v4.shp

Never overwrite the original dataset.

You may need to return to it later.
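
The naming scheme above can be enforced with a small helper that copies every file of a shapefile to a new version name and refuses to overwrite anything. This is a sketch using only the standard library. The function name is hypothetical, and the demo creates empty placeholder files in a temporary folder.

```python
import shutil
import tempfile
from pathlib import Path


def copy_dataset_version(src_shp, dst_shp):
    """Copy all files sharing the source stem to the new stem; never overwrite."""
    src, dst = Path(src_shp), Path(dst_shp)
    parts = sorted(src.parent.glob(f"{src.stem}.*"))
    if not parts:
        raise FileNotFoundError(f"no files found for {src}")
    targets = [dst.parent / (dst.stem + p.name[len(src.stem):]) for p in parts]
    existing = [t for t in targets if t.exists()]
    if existing:
        raise FileExistsError(f"refusing to overwrite: {existing}")
    for part, target in zip(parts, targets):
        shutil.copy2(part, target)
    return targets


with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    for ext in (".shp", ".shx", ".dbf", ".prj"):
        (folder / f"city_raw_v1{ext}").touch()
    copied = copy_dataset_version(folder / "city_raw_v1.shp", folder / "city_clean_v2.shp")
    copied_names = [t.name for t in copied]
    try:
        copy_dataset_version(folder / "city_raw_v1.shp", folder / "city_clean_v2.shp")
        overwrite_blocked = False
    except FileExistsError:
        overwrite_blocked = True

print(copied_names, overwrite_blocked)
```

Working on the copy keeps the raw dataset intact.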


Common Beginner Mistakes in Data Preparation

  • Mixing coordinate systems

  • Editing without saving versions

  • Ignoring attribute cleanup

  • Trusting one data source

  • Importing directly into Illustrator without GIS cleaning

Illustrator is not a GIS cleaning tool.

Data must be prepared before design begins.


What Clean Geodata Should Look Like

Before moving to the next stage, your dataset should:

  • Have a unified CRS

  • Contain only relevant object types

  • Have standardized classifications

  • Be free from obvious duplicates

  • Be topologically consistent

  • Have manageable file size

If this stage is done correctly,
the rest of the workflow becomes stable.

If not — you will suffer later.


Summary

Obtaining geodata is easy.

Preparing geodata is professional work.

Professional vector map production begins with:

  • Careful source selection

  • CRS alignment

  • Data cleaning

  • Attribute standardization

  • Structural validation

Only after this foundation is solid
can real cartographic design begin.


Next Chapter

Now that the data is clean,
we move to the structural backbone of every city map:

Chapter 3 — Street Network as a Graph (Nodes and Edges Explained)


Frequently Asked Questions

Where can I download shapefiles (SHP)?
Shapefiles are available from OpenStreetMap extracts, government open data portals, and commercial geodata providers.

What is CRS in GIS?
A CRS (Coordinate Reference System) defines how coordinates relate to real locations on the Earth. A projected CRS also defines how the curved surface is mapped onto a flat plane. All layers must share the same CRS.

Can I skip data cleaning before design?
No. Raw geodata contains structural errors and unnecessary attributes that must be cleaned before production.

Is GeoJSON suitable for large print maps?
GeoJSON is ideal for web applications but usually not optimal for heavy large-format print production.

Table of contents

Chapter 1 — What Is a Vector Map?
Chapter 2 — Obtaining and Preparing Geodata (SHP, OSM, GeoJSON)
Chapter 3 — Street Network as a Graph (Nodes and Edges Explained)
Chapter 4 — Cartographic Layer Hierarchy and Visual Structure
Chapter 5 — Map Projections and Why Distortion Is Inevitable
Chapter 6 — Map Generalization and Scale Control
Chapter 7 — Vector Formats: SHP, GeoJSON, AI and PDF
Chapter 8 — Professional Map Production Workflow
Chapter 9 — Preparing a Vector Map for Print in Illustrator
Chapter 10 — Common Mistakes in Vector Map Production

Author: Kirill Shrayber, Ph.D. FRGS

I have been working with vector cartography for over 25 years, including GPS, GIS, Adobe Illustrator and other professional cartographic software.
Linkedin: https://www.linkedin.com/in/kirill-shrayber-0b839325/
Twitter: https://twitter.com/vectormapper
Wikipedia: https://meta.wikimedia.org/wiki/User:Vectormapper
