Creating Custom Line Types for State Boundaries
with the GMAP Procedure and the SAS® Annotate Facility.

Steven First
Systems Seminar Consultants, Inc.
2997 Yarmouth Greenway Drive
Madison, WI 53711

 


Abstract

SAS/GRAPH® software, through PROC GMAP and PROC GREMOVE, provides an excellent way to produce choropleth maps for county, state, regional, or other boundaries. However, because PROC GMAP draws adjacent polygons independently, and as a result draws the common segments between areas twice, it is not normally possible to choose different line types, such as a dashed line for state boundaries, a dash-dot-dash pattern for international boundaries, and so forth.

This paper explores the combination of PROC GREMOVE, PROC GMAP, and the SAS Annotate facility to produce US regional maps with state boundaries drawn only once, in a dashed line type. The technique produces maps closer to what cartographers prefer, and it can be applied to any regional map.

Introduction

PROC GMAP, along with the maps provided by SAS Institute, offers an easy way to produce US maps with the state boundaries drawn in a solid line type. A typical PROC GMAP program and its output are shown below.

A Typical PROC GMAP program

data us;
   input @15 state 2. @20 region 2.;
   datalines;
ALABAMA        1   03
ARIZONA        4   00
. . . (Not all states shown)
WISCONSIN     55   05
WYOMING       56   02
;

pattern1 c=blue v=mempty;
pattern2 c=blue v=m3x;
pattern3 c=green v=m1n150;
pattern4 c=green v=m1x;
pattern5 c=red v=m2x;
pattern6 c=black v=m1n60;

proc gmap data=us map=maps.us;
   choro state / nolegend levels=6;
   id state;
   title f=swiss c=red 'U.S. States';
run;

When I showed PROC GMAP to a cartographer (who is too familiar with FORTRAN and not familiar enough with SAS), I was asked the following question: "Do you mean that SAS can't draw the standard line types between states?" He was referring to dashed patterns for state boundaries, dash-dot-dash lines for international boundaries, and so on. I had to admit that, to my knowledge, SAS could not.

After thinking about it for a while, and looking without success for a GMAP option that could change line types, I wondered why as well. I concluded that PROC GMAP draws adjoining polygons (states) independently and thus traces common lines twice, usually in different directions. Even if a dashed line could somehow be used, it would probably end up looking closer to a solid line because of the multiple passes.

This problem becomes even more apparent when creating regional maps. PROC GREMOVE easily removes the common boundaries between states to produce a clean regional map, but no trace remains of the interior lines that were deleted.

While I was working on a large mapping project for a national retailer, one request was to produce a regional map that not only removed the interior lines within each region, but also drew the regional boundaries in a bold, wide, solid line and the state boundaries in a dashed pattern. Additionally, I needed to label both the regions and the states. The client had a hand-drawn map which they wanted to duplicate with PROC GMAP for marketing displays.

Since I had used the SAS/GRAPH Annotate facility extensively, adding the state and regional labels was no big problem. The Annotate facility can also draw different line types, so I decided to try using it to draw those interior lines.


A Pseudo-Regional Map

Pages 251-252 in the SAS/GRAPH User's Guide show a simple program to create the US regional map shown in Figure 2. Each polygon is a state. By assigning a numeric region number to each state, and also assigning a SAS format range and a pattern to each response value, the final map draws all states in a region using the same pattern and color. This is not a true regional map, but just a collection of adjacent states that happen to be drawn in the same pattern, so it only appears to be a regional map.

data us;
   input @15 state 2. @20 region 2.;
   datalines;
ALABAMA        1   03
. . . . . (Not all states are shown)
WYOMING       56   02
;

proc format;
   value regfmt
      0 = 'Southwest'
      1 = 'West'
      2 = 'Central'
      3 = 'South'
      4 = 'Northeast'
      5 = 'Midwest';
run;

pattern1 c=blue v=mempty;
pattern2 c=blue v=m3x;
pattern3 c=green v=m1n150;
pattern4 c=green v=m1x;
pattern5 c=red v=m2x;
pattern6 c=black v=m1n60;

proc gmap data=us map=maps.us;
   choro region / discrete;
   id state;
   format region regfmt.;
   title f=swiss c=red 'U.S. Regions';
run;

A True Regional Map

Pages 327-328 in the SAS/GRAPH User's Guide show a program similar to the one below which uses PROC GREMOVE to produce a true regional map like the one shown in Figure 3.

data newus;                       /* build a new dataset            */
   merge maps.us                  /* from sas provided map          */
         us;                      /* and the regional data set      */
   by state;                      /* in state order                 */
   n+1;                           /* n has orig. order for later    */
run;

proc sort data=newus;             /* sort the dataset               */
   by region n;                   /* into region, n order           */
run;

proc gremove data=newus           /* remove interior lines          */
             out=regions;         /* output dataset                 */
   id state;                      /* id variable                    */
   by region;                     /* remove all within a region     */
run;

proc gmap data=regions            /* draw a map with region data    */
          map=regions;            /* and use regions for map        */
   choro region / discrete;       /* choro of region                */
   id region;                     /* ID variable                    */
   format region regfmt.;         /* format the regions             */
   title f=swiss color=red        /* title                          */
         'U. S. Regions';
run;

What my client wanted was a combination of the two maps, plus annotated labels. A sample of the final desired output is shown in Figure 4.

 

Building a Regional Map with Interior Lines

The first step is to create the regional map shown above in Figure 3 and save it. The program above called it REGIONS, a temporary dataset, but it should probably have been stored as a permanent SAS dataset.
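
As a minimal sketch of saving the regional map permanently, the PROC GREMOVE step could write to a library other than WORK. The libref MYMAPS and the directory path are placeholders chosen for illustration, not names from the original project.

```sas
/* Hypothetical libref and path; substitute a directory that exists on your system */
libname mymaps '/path/to/permanent/maps';

proc gremove data=newus            /* NEWUS as built and sorted earlier   */
             out=mymaps.regions;   /* two-level name = permanent dataset  */
   id state;
   by region;
run;
```

Later PROC GMAP steps would then refer to MYMAPS.REGIONS instead of REGIONS.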

Next, we need to determine which line segments from the MAPS.US map are shared by adjacent states. This is essentially the same type of logic PROC GREMOVE must go through, but instead of discarding the interior lines, we will draw them. It is important to note that we cannot simply keep or discard individual common points, because a corner point may exist in up to four states, which might not all be in the same region. Instead, we have to take a point together with the next point in the map, treat the pair as a line, and see whether that line exists in two adjoining states.

To complicate things, adjoining states don't start defining their points at the same place, and they may not even be drawn in the same direction, since PROC GMAP can handle polygons drawn in either a clockwise or a counter-clockwise direction.

Another luxury that PROC GMAP provides is that it automatically joins the very last point in a segment to the first. Since we are in effect drawing the lines via Annotate instead, we are responsible for getting back to the starting point ourselves. This task is not simplified by the fact that many points will be discarded along the way (including perhaps the starting point).

To simplify the presentation, the map shown in Figure 5 includes just four states: Arizona, Colorado, New Mexico, and Utah. This represents the worst case, since a single point is included in all four states. The other points exist in one, two, or three states. These points can be contained in as many as four line segments drawn in different directions. The real trick here is to draw each line just once.

A print of the map dataset is shown below.

Arizona, Colorado, New Mexico, Utah

OBS  SEGMENT  STATE         X         Y
  1        1      4  -0.19626  -0.08981
  2        1      4  -0.22656  -0.08517
  3        1      4  -0.27674  -0.05547
  4        1      4  -0.27483  -0.05196
  5        1      4  -0.27426  -0.05186
  6        1      4  -0.27064  -0.05028
  7        1      4  -0.27293  -0.04202
  8        1      4  -0.26886  -0.03712
  9        1      4  -0.26795  -0.03205
 10        1      4  -0.26105  -0.02736
 11        1      4  -0.26557  -0.01299
 12        1      4  -0.26534  -0.01144
 13        1      4  -0.26363   0.00529
 14        1      4  -0.26062   0.00631
 15        1      4  -0.25518   0.00323
 16        1      4  -0.25069   0.01946
 17        1      4  -0.18232   0.00810
 18        1      4  -0.19364  -0.07121
 19        1      8  -0.08255   0.05101
 20        1      8  -0.08591  -0.00173
 21        1      8  -0.09918  -0.00080
 22        1      8  -0.18232   0.00810
 23        1      8  -0.18088   0.01915
 24        1      8  -0.17280   0.07755
 25        1      8  -0.10768   0.07042
 26        1      8  -0.08153   0.06835
 27        1      8  -0.08229   0.05691
 28        1     35  -0.09980  -0.00989
 29        1     35  -0.10679  -0.08815
 30        1     35  -0.15907  -0.08321
 31        1     35  -0.15828  -0.08704
 32        1     35  -0.18284  -0.08394
 33        1     35  -0.18404  -0.09147
 34        1     35  -0.19626  -0.08981
 35        1     35  -0.19364  -0.07121
 36        1     35  -0.18232   0.00810
 37        1     35  -0.09918  -0.00080
 38        1     49  -0.18232   0.00810
 39        1     49  -0.25069   0.01946
 40        1     49  -0.23405   0.10570
 41        1     49  -0.22098   0.10312
 42        1     49  -0.19596   0.09876
 43        1     49  -0.19875   0.08147
 44        1     49  -0.17280   0.07755
 45        1     49  -0.18088   0.01915
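
The paper does not show how the four-state dataset was built. One plausible way, offered only as a sketch, is to subset MAPS.US by FIPS state code. The dataset name AZCONMUT matches the name read in Step 1 below; the WHERE clause is an assumption, and the point counts may differ from the listing above depending on the version of the map dataset.

```sas
/* Assumed construction of the four-state example map */
data azconmut;
   set maps.us;
   where state in (4, 8, 35, 49);   /* FIPS codes: AZ, CO, NM, UT */
run;

proc print data=azconmut;
   title 'Arizona, Colorado, New Mexico, Utah';
run;
```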

Steps required to draw the interior lines:

Step 1. Pass through the dataset and save the first point in a SEGMENT, and add it to the end. Also add a variable called N indicating original point order so we can get back into correct order later.

DATA STATES;
   SET AZCONMUT;
   BY STATE SEGMENT;
   RETAIN SAVEX SAVEY SAVESEG SAVEST;
   IF FIRST.SEGMENT THEN
      DO;                          /* REMEMBER FIRST POINT OF POLYGON */
         SAVEX=X;
         SAVEY=Y;
         SAVESEG=SEGMENT;
         SAVEST=STATE;
      END;
   N+1;
   OUTPUT;
   IF LAST.SEGMENT THEN
      DO;                          /* REPEAT FIRST POINT AT END */
         X=SAVEX;
         Y=SAVEY;
         SEGMENT=SAVESEG;
         STATE=SAVEST;
         N+1;
         OUTPUT;
      END;
   KEEP STATE SEGMENT X Y N;       /* SAVE* WORK VARIABLES ARE NOT KEPT */
RUN;

PROC PRINT DATA=STATES;
   TITLE 'STATES BEFORE SORT';
RUN;

Step 2. Delete the points that are obviously not in two states. Sort the map dataset into X, Y, STATE, and SEGMENT order, then pass through the dataset. If an X Y pair appears only once in the map, it obviously cannot be in two states, so that point is discarded. This step could be folded into the beginning of Step 3 to save a pass through the dataset.

PROC SORT DATA=STATES;
   BY X Y STATE SEGMENT;
RUN;

DATA DUPS;
   SET STATES;
   BY X Y STATE SEGMENT;
   IF FIRST.Y AND LAST.Y THEN DELETE;   /* POINT IN ONLY 1 STATE */
RUN;

PROC PRINT DATA=DUPS;
   TITLE 'POINTS IN AT LEAST TWO STATES';
RUN;

POINTS IN AT LEAST TWO STATES

OBS  SEGMENT  STATE         X          Y   N
  1        1      4  -0.25069   0.019464  16
  2        1     49  -0.25069   0.019464  42
  3        1      4  -0.19626  -0.089814   1
  4        1      4  -0.19626  -0.089814  19
  5        1     35  -0.19626  -0.089814  36
  6        1      4  -0.19364  -0.071207  18
  7        1     35  -0.19364  -0.071207  37
  8        1      4  -0.18232   0.008103  17
  9        1      8  -0.18232   0.008103  23
 10        1     35  -0.18232   0.008103  38
 11        1     49  -0.18232   0.008103  41
 12        1     49  -0.18232   0.008103  49
 13        1      8  -0.18088   0.019154  24
 14        1     49  -0.18088   0.019154  48
 15        1      8  -0.17280   0.077550  25
 16        1     49  -0.17280   0.077550  47
 17        1     35  -0.09980  -0.009887  30
 18        1     35  -0.09980  -0.009887  40
 19        1      8  -0.09918  -0.000798  22
 20        1     35  -0.09918  -0.000798  39
 21        1      8  -0.08255   0.051014  20
 22        1      8  -0.08255   0.051014  29

Step 3. Pass through the dataset and assign each point to a line name made up of the two adjoining state numbers and the corresponding segment numbers.

For example, points on the line between state 04 (Arizona) and state 49 (Utah) would be assigned to line name 04001-49001. Digits 1-2 and 7-8 are the state numbers, and the three digits following each state number are the segment numbers. The key to eliminating duplicate lines is to always use the smaller-numbered state first and to disregard the line name that would be labeled 49001-04001. This is accomplished with the following program.

Three arrays are created to store the N, STATE, and SEGMENT values for each occurrence of a point. A point can belong to as many as four states, and the closing point repeated in Step 1 can add further occurrences, so each array has room for eight entries. A variable called COUNT counts how many times a point is used. When the last occurrence of a point is detected, two loops output the point with its N and LINENM values. Since the inner loop only runs through occurrences that sort after the one in the outer loop, each pair of states is named only once, with the lower-numbered state first. A program listing and a print of the resulting dataset appear here.

DATA DUP2;
   SET DUPS;
   BY X Y STATE SEGMENT;
   LENGTH LINENM $ 11;
   ARRAY NARRY{*}  N1-N8;
   ARRAY STARRY{*} ST1-ST8;
   ARRAY SGARRY{*} SG1-SG8;
   RETAIN N1-N8 ST1-ST8 SG1-SG8;
   IF FIRST.Y THEN
      COUNT=0;
   COUNT+1;
   STARRY{COUNT}=STATE;
   SGARRY{COUNT}=SEGMENT;
   NARRY{COUNT}=N;
   IF LAST.Y;
   IF COUNT > 1;
   DO I=1 TO COUNT - 1;
      DO J=I+1 TO COUNT;
         N=NARRY{I};
         LINENM=PUT(STARRY{I},Z2.)||PUT(SGARRY{I},Z3.) || '-' ||
                PUT(STARRY{J},Z2.)||PUT(SGARRY{J},Z3.);
         OUTPUT;
      END;
   END;
   KEEP X Y LINENM N;
RUN;

PROC PRINT DATA=DUP2;
   TITLE 'DUP2 ';
RUN;

DUP2

OBS         X          Y   N  LINENM
  1  -0.25069   0.019464  16  04001-49001
  2  -0.19626  -0.089814   1  04001-04001
  3  -0.19626  -0.089814   1  04001-35001
  4  -0.19626  -0.089814  19  04001-35001
  5  -0.19364  -0.071207  18  04001-35001
  6  -0.18232   0.008103  17  04001-08001
  7  -0.18232   0.008103  17  04001-35001
  8  -0.18232   0.008103  17  04001-49001
  9  -0.18232   0.008103  17  04001-49001
 10  -0.18232   0.008103  23  08001-35001
 11  -0.18232   0.008103  23  08001-49001
 12  -0.18232   0.008103  23  08001-49001
 13  -0.18232   0.008103  38  35001-49001
 14  -0.18232   0.008103  38  35001-49001
 15  -0.18232   0.008103  41  49001-49001
 16  -0.18088   0.019154  24  08001-49001
 17  -0.17280   0.077550  25  08001-49001
 18  -0.09980  -0.009887  30  35001-35001
 19  -0.09918  -0.000798  22  08001-35001
 20  -0.08255   0.051014  20  08001-08001
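
The line-name convention from Step 3 can be checked in isolation with a small DATA _NULL_ step. This is only an illustrative sketch: the variable names ST_A, SG_A, ST_B, SG_B, and TMP are invented here, and the explicit swap stands in for the ordering that the Step 3 program obtains from the sorted input.

```sas
data _null_;
   length linenm $ 11;
   st_a = 49; sg_a = 1;            /* Utah, segment 1    */
   st_b = 4;  sg_b = 1;            /* Arizona, segment 1 */

   /* always place the lower-numbered state first */
   if st_a > st_b then do;
      tmp = st_a; st_a = st_b; st_b = tmp;
      tmp = sg_a; sg_a = sg_b; sg_b = tmp;
   end;

   /* Z2. and Z3. pad with leading zeros so every name is 11 characters */
   linenm = put(st_a, z2.) || put(sg_a, z3.) || '-' ||
            put(st_b, z2.) || put(sg_b, z3.);
   put linenm=;                    /* writes the line name to the SAS log */
run;
```

Because both orderings of a state pair collapse to the same name, sorting by LINENM in the next step brings all points of one shared boundary together.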

Step 4. The final processing step is to sort the dataset into LINENM order and generate MOVE and DRAW annotate functions. The LINE variable in this example is set to 8, which is as close to the cartographer's state boundary as I could get. It certainly could be set to any of the 32 line types SAS/GRAPH supports. One final item to note is that we have discarded several points that are no longer needed. Where that leaves a "gap" between consecutive points, we need to move to the next point instead of drawing to it.

PROC SORT DATA=DUP2;
   BY LINENM N;
RUN;

DATA ANSTLINE;
   SET DUP2;
   BY LINENM;
   LAG1N=LAG(N);
   IF FIRST.LINENM AND LAST.LINENM     /* DELETE SINGLE POINTS   */
      THEN DELETE;
   RETAIN WHEN 'A';
   RETAIN XSYS '2';
   RETAIN YSYS '2';
   RETAIN LINE 8;
   RETAIN SAVEX;
   RETAIN SAVEY;
   RETAIN SAVEN;
   IF FIRST.LINENM THEN                /* BEGINNING OF NEW LINE? */
      DO;
         FUNCTION='MOVE ';
         OUTPUT;
         SAVEX=X;
         SAVEY=Y;
         SAVEN=N;
      END;
   ELSE
   IF LAG1N+1 = N THEN                 /* IS IT NEXT POINT FROM  */
      DO;                              /* ORIGINAL MAP?          */
         FUNCTION='DRAW ';             /* DRAW TO POINT          */
         COLOR='RED ';
         OUTPUT;
      END;
   ELSE                                /* NO, MUST BE GAP, MOVE  */
      DO;
         FUNCTION='MOVE ';
         COLOR='RED ';
         OUTPUT;
      END;
   KEEP X Y N LINENM WHEN XSYS YSYS LINE FUNCTION COLOR;
RUN;

PROC PRINT DATA=ANSTLINE;
   TITLE 'ANSTLINE';
RUN;

ANSTLINE

OBS         X          Y   N  LINENM       WHEN  XSYS  YSYS  LINE  FUNCTION  COLOR
  1  -0.19626  -0.089814   1  04001-35001  A     2     2        8  MOVE
  2  -0.18232   0.008103  17  04001-35001  A     2     2        8  MOVE      RED
  3  -0.19364  -0.071207  18  04001-35001  A     2     2        8  DRAW      RED
  4  -0.19626  -0.089814  19  04001-35001  A     2     2        8  DRAW      RED
  5  -0.25069   0.019464  16  04001-49001  A     2     2        8  MOVE
  6  -0.18232   0.008103  17  04001-49001  A     2     2        8  DRAW      RED
  7  -0.18232   0.008103  17  04001-49001  A     2     2        8  MOVE      RED
  8  -0.09918  -0.000798  22  08001-35001  A     2     2        8  MOVE
  9  -0.18232   0.008103  23  08001-35001  A     2     2        8  DRAW      RED
 10  -0.18232   0.008103  23  08001-49001  A     2     2        8  MOVE
 11  -0.18232   0.008103  23  08001-49001  A     2     2        8  MOVE      RED
 12  -0.18088   0.019154  24  08001-49001  A     2     2        8  DRAW      RED
 13  -0.17280   0.077550  25  08001-49001  A     2     2        8  DRAW      RED
 14  -0.18232   0.008103  38  35001-49001  A     2     2        8  MOVE
 15  -0.18232   0.008103  38  35001-49001  A     2     2        8  MOVE      RED
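
Before settling on a value for the LINE variable, it can help to preview how several line types look on the output device. The following is a minimal sketch, not part of the original project: it builds a small annotate dataset (the name LINETYPES and the layout percentages are arbitrary choices) and displays it with PROC GANNO. It assumes a SAS release that has the CATS function.

```sas
data linetypes;
   length function $ 8 text $ 12 position $ 1;
   retain xsys '3' ysys '3' when 'a';   /* '3' = percent of graphics output area */
   do line = 1 to 10;                   /* preview line types 1 through 10       */
      y = 95 - line * 8;                /* one row per line type                 */
      function = 'label'; x = 5;  position = '6';
      text = cats('LINE=', line);
      output;                           /* text label at the left                */
      function = 'move';  x = 25; output;
      function = 'draw';  x = 90; output;   /* sample line in this line type     */
   end;
run;

proc ganno annotate=linetypes;
run;
```

Whichever type looks closest to the desired boundary style can then be used in the RETAIN LINE statement of Step 4.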

Step 5. Using the above dataset as an annotation dataset to PROC GMAP produces the final graph for our four states as shown here.

PROC GMAP DATA=REGIONS
          MAP=REGIONS;
   CHORO REGION / DISCRETE NOLEGEND ANNOTATE=ANSTLINE;
   ID REGION;
   TITLE F=SWISS C=RED 'ARIZONA, COLORADO, NEW MEXICO, UTAH';
   PATTERN1 C=BLACK V=EMPTY R=4;
RUN;

Running the entire country through the program and using the resulting annotate dataset with the regional map shown earlier produces a regional map with state lines drawn by the Annotate facility.

The Final Product


Additional annotation, as described in the SAS/GRAPH User's Guide, can be used to produce the final map.
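
As one hedged illustration of such additional annotation, state abbreviations could be added as LABEL observations and combined with the line dataset from Step 4. This sketch assumes the SAS-supplied MAPS.USCENTER dataset (state center coordinates, with OCEAN='N' marking the in-state position) and the FIPSTATE function. The dataset names STLABEL and ANALL are invented for the example, and this is not necessarily how the labels on the original map were produced.

```sas
data stlabel;
   length function $ 8 text $ 2;
   retain xsys '2' ysys '2' when 'a'
          function 'label' position '5';   /* '5' = centered on the point */
   set maps.uscenter;
   where ocean = 'N';                      /* use the in-state label position */
   text = fipstate(state);                 /* two-letter postal abbreviation  */
   keep xsys ysys when function position x y text;
run;

data anall;                                /* lines first, then labels */
   length function $ 8;                    /* avoid truncating FUNCTION */
   set anstline stlabel;
run;

proc gmap data=regions map=regions;
   choro region / discrete nolegend annotate=anall;
   id region;
run;
quit;
```

Region names could be added the same way with a few hand-placed LABEL observations.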

The author will be glad to answer questions and accept suggestions at the following address:

Steven First
Systems Seminar Consultants
2997 Yarmouth Greenway Drive
Madison, WI 53711

Voice: (608) 278-9964
Fax: (608) 278-0065

E-mail: train@sys-seminar.com
Website: http://www.sys-seminar.com

Acknowledgments:
Special thanks to:
Steven Subichin, Miller Brewing Company
Richard Langston, SAS Institute Inc.
Tom Miron, Systems Seminar Consultants

SAS and SAS/GRAPH are registered trademarks of SAS Institute Inc.