Hidden Gems in the SAS® System

Steven First
Systems Seminar Consultants, Inc.
Madison, WI 53711 (608) 278-9964


 

Abstract

The SAS system has a wealth of features that make it unique, and sometimes those features are not obvious or easy to find in the documentation. This paper looks at several new features, as well as some that have been around for a while, that we have found to be useful.


Introduction

Our office's SAS documentation subscription brings us new manuals almost weekly, and our total library of SAS-related material covers one wall. While our library is very thorough and the SAS documentation is of very high quality, one problem is that enhancements are often documented in technical reports that many users never see, so of course they don't know about those new features. I have noticed that very often the features listed in technical reports are simple items that can save an enormous amount of work.


Raw Data Handling

Certainly one of the strongest features of the SAS system is its ability to read and write just about any file, including some with very complex layouts. Examples of difficult layouts are files with multiple record types, variable-length records, files with embedded data that needs to be scanned, data with lowercase characters, short records, data with recurring values, and comma-delimited data for import/export to other products. SAS can handle all of the above. INPUT and PUT of course offer a wealth of features, but here I am going to investigate just a few of them.


My Favorite SAS Feature

The trailing @ with INPUT allows us to read a record, examine it, and then "reread" the same record with a second INPUT if need be. Typical uses are reading multiple record types, or reading values in a record that are needed in order to read other data on the same record. The trailing @ is very efficient: because the record is already in memory, the second INPUT is very fast.


An INPUT trailing @ example:

1. The following system file has 255 kinds of records, each with a unique layout:

type 05 is a job record

type 15 is a dataset record

type 21 is a tape dismount record.


2. The records can appear in any order and are not necessarily related.


3. Build a separate SAS dataset for each of the three types listed above. Delete any other records found.


05 PRODPAYR J1981 05/19/91 10:31:02

15 PROD.MONTHLY.PAYROLL 3380

15 PROD.MONTHLY.BACKUP 3400

21 1B1 145664

05 TSTCOMPL J1982 05/19/91 10:32:02

15 SYS1.LINKLIB 3380

15 XYZ.TESTLIB 3380

 

DATA JOBS(KEEP=JNAME JNUM JDATE JTIME)
     DSNS(KEEP=DSN UNIT)
     TAPE(KEEP=ADDR VOLUME);
   INFILE RAWIN;
   INPUT @1 TYPE $2. @;                       /* INPUT AND HOLD      */
   IF TYPE='05' THEN                          /* IS IT A 05 ?        */
      DO;                                     /* YES                 */
         INPUT @4  JNAME $8.      @13 JNUM  $5.    /* READ IT AGAIN  */
               @19 JDATE MMDDYY8. @28 JTIME TIME8.;
         OUTPUT JOBS;                         /* WRITE TO JOBS       */
         RETURN;                              /* GET OUT             */
      END;
   IF TYPE='15' THEN                          /* IS IT A 15 ?        */
      DO;                                     /* YES                 */
         INPUT @4 DSN $44. @50 UNIT $4.;      /* REREAD              */
         OUTPUT DSNS;
         RETURN;
      END;
   IF TYPE='21' THEN                          /* IS IT A 21 ?        */
      DO;
         INPUT @4 ADDR $3. @10 VOLUME $6.;
         OUTPUT TAPE;
         RETURN;
      END;
   FORMAT JDATE MMDDYY8. JTIME TIME8.;
RUN;




Trailing @ with PUT

Many features of INPUT can also be applied to PUT when a program needs to write raw data. A trailing @ can be coded at the end of a PUT statement so that the next PUT statement doesn't start a new record, but instead appends to the same record.


A PUT trailing @ example:

DATASET CUSTSALE

OBS CUSTID SALESID

1 10001 900201

2 10003 900386

3 10003 900386

4 10003 900386



CUSTID and SALESID appear in a SAS dataset; print each CUSTID only once, followed by each of its SALESID values.

DATA _NULL_;
   TITLE 'PUTS With Trailing @';
   FILE PRINT;
   SET CUSTSALE;
   BY CUSTID;
   IF FIRST.CUSTID THEN
      PUT @11 CUSTID $5. @;      ** put and hold ptr;
   PUT @24 SALESID $6.;          ** put on same line;
RUN;



PUTS With Trailing @

10001 900201

10003 900386

900386

900386



Scanning With INPUT

An INPUT pointer control can be a character constant or a character variable, which causes INPUT to scan the record for that value anywhere in the record and then continue reading from that point.


Example: Read the name after a title:

INPUT @'Mr.' NAME $CHAR20.;
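For context, a minimal self-contained sketch of the same idea; the data lines and the variable name are made up for illustration:

data names;
   infile datalines;
   input @'Mr.' name $char20.;   /* scan for 'Mr.', then read the next 20 columns */
datalines;
acct 101 Mr. John Smith
acct 102 Mr. Bill Jones
;
proc print data=names;run;

Note that the pointer lands in the column immediately after the search string, so a $CHARw. informat keeps the leading blank; the plain $w. informat would trim it.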




New Informats With SAS 6.07

The $QUOTEw. informat removes beginning and ending quotes during INPUT.

$UPCASEw. translates any lower case characters to upper case.

Example: Remove the quotes around COMPANY and upcase NAME.

data compfile;
   infile datalines;
   input @1  company $quote10.
         @20 name    $upcase10.;
datalines;
"SSC"              Steve
;
proc print data=compfile;run;



OBS COMPANY NAME

1 SSC STEVE



Variable Lists and Format Lists

When a table of recurring values needs to be read, both the variables and the informats can be specified as lists rather than writing out every item individually. This makes it as easy to read 500 values as it is to read 5.


Example: You have a file with a 2-character state code followed by a 3-digit temperature. This sequence is repeated for all 50 states, and you need to read all 50 states' data into one observation.

AK065 AZ101 AL092 AR078 . . . . WY054


data temps;
   infile tempsin;
   input @1 (state1-state50) ($2. +4)   /* start in 1, read 2, skip 4 */
         @3 (temp1-temp50)   (3. +3);   /* back to 3, read 3, skip 3 */
run;


 

User Written Informats

PROC FORMAT's INVALUE statement creates informats that edit and change values as they are read. This can do such things as translate character values to numeric, and edit values for correctness. This technique can be a fast and easy way to validate data on screens for online applications.


Example: translate letter grades to numerics for a grade point average and validate for correct departments.

proc format;
   invalue gradefmt 'A'=4 'B'=3
                    'C'=2 'D'=1 'F'=0;
   invalue deptfmt  101-888=_same_
                    other  =_error_;
run;



data graddata;
   infile datalines;
   input @1  grade gradefmt1.
         @10 dept  deptfmt3.;
datalines;
A        707
B        999
;
proc print data=graddata;run;



OBS GRADE DEPT

1 4 707

2 3 .

Special Variables

The special variables _ALL_, _CHARACTER_, _NUMERIC_, and _INFILE_ can make some programming tasks easier.

PUT _ALL_ will display all DATA step variables in the log and is an easy way to debug programs. _CHARACTER_ and _NUMERIC_, together with format lists, can be used to create a comma-delimited file suitable for input to many software products.
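As a quick illustration of the PUT _ALL_ debugging use, a sketch in which the input data set and the test are hypothetical:

data check;
   set payroll;        /* PAYROLL is a made-up input data set */
   if rate > 50 then
      put _all_;       /* writes every variable, including _N_ and _ERROR_, to the log */
run;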

_INFILE_ references the input buffer and is handy whenever most or all of the raw input record needs to be copied or changed.


Example: Make a copy of a raw file putting 'SSC' in columns 10-12 of every line.

DATA _NULL_;              /* NO DATASET NEEDED   */
   INFILE IN;             /* RAW FILE IN         */
   FILE OUT;              /* RAW FILE OUT        */
   INPUT;                 /* READ A RECORD       */
   PUT _INFILE_           /* OUTPUT INPUT REC    */
       @10 'SSC';         /* OVERLAY WITH CONST. */
RUN;                      /* END OF STEP         */



Example: Create a file to import to a spreadsheet from any SAS dataset. All character variables will appear first followed by all numerics.

OBS NAME AGE DEPT RATE

1 STEVE 43 ACCT 12.22

2 DAVID . PAYR 11.21



options missing=' ';

data _null_;
   set addrdata;
   file rawfile;
   put (_character_) ('"' $ '",')
       (_numeric_)   (best. ',');
run;



"STEVE ","ACCT ", 43, 12.22

"DAVID ","PAYR ", , 11.21



Reading Comma Delimited Data

The opposite of the last example is the case where we have a comma delimited file and need to read it into a data step.

This type of file uses double quote characters around character values and commas between all fields, and of course the fields can be of varying widths. This format is very common and can be difficult to read without some special INFILE options. The DLM= option lets us name the comma as the separator for list input, but that doesn't handle at least two special cases: when a field is missing, two consecutive commas appear in the file, and an embedded comma inside a quoted character field isn't treated correctly. The INFILE DSD option has been a lifesaver that has handled every case I have encountered. Further details are found in technical report P-222.


Example: Read a comma delimited file.

"STEVE ","ACCT ", 43, 12.22

"DAVID ","PAYR ", , 11.21



data addrdata;
   infile rawfile dsd;
   input name $
         dept $
         age
         rate;
run;
proc print data=addrdata;run;



OBS NAME DEPT AGE RATE

1 STEVE ACCT 43 12.22

2 DAVID PAYR 11.21
 

Short Records

Another problem that comes up often, especially with ASCII text files, is that some records are shorter than others. If the INPUT pointer moves past the end of the record, SAS either flows to the next record or sets variables to missing. The MISSOVER and TRUNCOVER options can control that flowing behavior, and the INFILE LRECL= and PAD options can effectively make the records fixed length, padded with blanks.


Example: Some cities are longer than others. File RAWIN:

         1
1234567890123456789
Steve  32 Monona
Tom    44 Milwaukee
Kim    25 Madison



DATA ADDRESS;
   INFILE RAWIN PAD LRECL=19;
   INPUT NAME $1-5 AGE 8-9 CITY $11-19;
RUN;

PROC PRINT;
   TITLE 'ADDRESS';
RUN;



ADDRESS

OBS NAME AGE CITY

1 Steve 32 Monona

2 Tom 44 Milwaukee

3 Kim 25 Madison
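TRUNCOVER is an alternative worth noting: rather than padding the records, it tells INPUT to keep whatever characters are present when a field extends past the end of a short record. A sketch of the same step using it instead of PAD and LRECL=:

DATA ADDRESS;
   INFILE RAWIN TRUNCOVER;   /* KEEP PARTIAL VALUES ON SHORT RECORDS */
   INPUT NAME $1-5 AGE 8-9 CITY $11-19;
RUN;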



Reading or Writing Multiple Files

The INFILE FILEVAR option effectively allows a program to dynamically change the name of the file being read. FILEVAR= names a DATA step variable; when its value changes, the data step closes the current file and opens the new one.

Uses might be to read all the members in an MVS PDS, all the files in a directory, and so on. The option is also available on the FILE statement for writing multiple raw data files (see the sketch after the example below).

Example: Read every file listed after datalines.

data filesds;
   input filename $ 1-20;                     /* read the next file name from the datalines */
   infile rawfile filevar=filename end=end;   /* open the file that FILENAME names */
   do until(end);
      input value $ 1-5;
      output;
   end;
datalines;
Q:\TESTDAT\FILE1.DAT
Q:\TESTDAT\FILE2.DAT
Q:\TESTDAT\FILEA.DAT
;
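On the output side, here is a minimal sketch of FILEVAR= on the FILE statement, writing one raw file per department. It reuses the ADDRDATA data set from the earlier examples; the output path and the variable OUTNAME are illustrative only:

data _null_;
   set addrdata;
   length outname $ 40;
   outname='Q:\TESTDAT\' || trim(dept) || '.DAT';  /* build a file name per department */
   file rawout filevar=outname;                    /* a new OUTNAME value opens a new file */
   put name $10. @20 rate 8.2;
run;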



Transporting SAS Data

SAS/CONNECT certainly provides multiple ways to conveniently move SAS libraries and their entries across computer platforms. It does mean, however, that SAS/CONNECT must be licensed on all of the platforms involved, and that sign-on scripts must be written and debugged. Another way to move data without SAS/CONNECT is to create a SAS transport file with PROC CPORT.

The steps in using this special binary file are to:

1. Create the transport file on the first system using PROC CPORT.

2. Move the transport file to the second system using a binary file-transfer program.

3. Construct a new SAS library on the second system using PROC CIMPORT.


We have moved data using this technique to and from several different SAS platforms and have found it to be especially valuable.


Example: Move a SAS library from MVS to Windows via a transport file.


The MVS job:

//STEP02   EXEC SAS6
//SASDATA  DD DSN=MY.SAS.LIBRARY,DISP=SHR
//OUT      DD DSN=MY.XPORT.DAT,DISP=(,CATLG,DELETE),
//            SPACE=(TRK,(100,100),RLSE),
//            DCB=(LRECL=80,RECFM=FB,BLKSIZE=8000)
PROC CPORT LIBRARY=SASDATA FILE=OUT;
RUN;



The Windows job:

FILENAME FILEIN 'C:\TEMP\GRAPHDWN';
LIBNAME  SASDATA 'C:\SASDATA';
PROC CIMPORT LIBRARY=SASDATA
             INFILE=FILEIN;
RUN;
 

DATA Step Views

DATA step views were introduced in release 6.07. "Introduction to DATA Step Views" (Polzin, SUGI 18), along with technical report P-222, describes this feature. The intention is to make data appear logically as a SAS data set, even though physically the data may live in flat files, SAS data files, DBMS systems, or other sources. An input DATA step view can be virtually any DATA step that eventually outputs an observation. When a consumer of observations (a PROC or DATA step) requests an observation, the code in the view is executed and an observation is returned.

Two features of DATA step views are:

1. Simplification of code. Very complex programs can be stored once, and when referenced, everything appears to be just another SAS data set.

2. Multiple passes of the data may be eliminated, saving CPU time, elapsed time, and intermediate work space.


Example: Two large flat files that are already sorted by account number need to be merged.


Solution 1: The traditional method is to convert each flat file to a SAS data set and then merge them in a third step. The data from file one is passed three times, the data from file two is passed three times, and only then is the final data set created.

data file1;
   infile raw1 lrecl=60 pad;
   input @1  field1 $10.
         @10 field2 $10.
         @20 acct   $20.;
run;

data file2;
   infile raw2 lrecl=60 pad;
   input @1  field3 $10.
         @10 field4 $10.
         @20 acct   $20.;
run;

data both;
   merge file1
         file2;
   by acct;
run;



Solution 2: With DATA step views, the flat files are not read until the consumer (the merge step) asks for observations, so the data from the two raw files is passed only once. Depending on file sizes, significant time savings as well as significant intermediate space savings should be realized.

data file1/view=file1;
   infile raw1 lrecl=60 pad;
   input @1  field1 $10.
         @10 field2 $10.
         @20 acct   $20.;
run;

data file2/view=file2;
   infile raw2 lrecl=60 pad;
   input @1  field3 $10.
         @10 field4 $10.
         @20 acct   $20.;
run;

data both;
   merge file1
         file2;
   by acct;
run;



Dictionary Tables

As part of PROC SQL, dictionary tables and corresponding SAS views have been set up to provide programs with the names of the SAS libraries allocated, the members within those libraries, the variables inside data sets, the entries within SAS catalogs, the SAS options in effect, and much more.

A partial list of the views is shown below; for a complete list, see the documentation or examine the SASHELP library.


SASHELP.VMEMBER SAS files and catalogs

SASHELP.VTABLE SAS datafiles

SASHELP.VCOLUMN SAS vars in all datasets

SASHELP.VCATALG SAS catalogs and entries

SASHELP.VEXTFL external files allocated

SASHELP.VVIEW SAS views

SASHELP.VINDEX SAS indexing information

SASHELP.VOPTION SAS options

SASHELP.VSLIB Sorted libname list

SASHELP.VSTABLE Sorted table list

SASHELP.VSCATLG Sorted catalog list

SASHELP.VSACCESS Sorted access descriptors

SASHELP.VSVIEW Sorted data views

SASHELP.VSTABVW Sorted files and views


To use the views, just reference them like any other SAS dataset.


Example: List all the WORK data sets created since 1 January, along with their observation counts.


proc print data=sashelp.vtable;
   where crdate ge '01jan96:0:0'dt
     and libname='WORK';
   var memname crdate nobs;
run;

OBS MEMNAME CRDATE NOBS

1 ADDRDATA 13JAN96:10:19:58 2

2 BOTH 13JAN96:10:20:07 1730

3 COMPFILE 13JAN96:10:19:52 1

4 FILE1 13JAN96:10:19:59 1730
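Because the SASHELP views are built over PROC SQL dictionary tables, the same listing can be produced by querying DICTIONARY.TABLES directly inside PROC SQL; a small sketch of the equivalent query:

proc sql;
   select memname, crdate, nobs
      from dictionary.tables
      where libname='WORK'
        and crdate ge '01jan96:0:0'dt;
quit;

Note that DICTIONARY.* names are valid only inside PROC SQL; outside of it, use the SASHELP views shown above.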



PROC SQL Pass-Through Facility

Originally, access to a third-party DBMS was done through SAS/ACCESS and view descriptors. In release 6.07, PROC SQL allows you to write native SQL code that is passed through, relatively untouched, to the DBMS for processing. This facility can be crucial for applications, especially those using large files. The only documentation for the facility is in the technical reports, so users consulting only the SAS/ACCESS manuals might not be aware of its existence.

Example: Create an extract datafile from DB2.

PROC SQL;
   CONNECT TO DB2(SSID=DP05);
   CREATE TABLE ENROLL AS
      SELECT *
      FROM CONNECTION TO DB2
           (SELECT DOB,
                   EFFDATE,
                   NAME
              FROM DBFILE.ENR
             WHERE STATE='WI')
           AS EXTRACT (DOB,
                       EFFDATE,
                       NAME);

PROC PRINT DATA=ENROLL;RUN;



The SET KEY= Option

KEY= allows direct access to an observation through an index value, providing what other systems call random access. KEY= can be used with MODIFY or SET.

A KEY= example: Read from MASTER only the observations that are also in SELFILE.

MASTER

OBS OFFACCT MATCHED

1 12345

2 23456

3 34567

SELFILE

OBS OFFACCT

1 12345

2 67890



OPTIONS ERRORS=0;

DATA MATCHES;
   SET SELFILE;
   SET MASTER KEY=OFFACCT;    * RAND READ MASTER;
   IF _IORC_=0 THEN
      OUTPUT;                 * FOUND IT;
RUN;

PROC PRINT DATA=MATCHES;RUN;

OBS OFFACCT MATCHED

1 12345



The MODIFY Statement

MODIFY works with several other SAS statements to update a SAS data set in place, rather than creating a new file as SET, MERGE, and UPDATE do. The following statements can be used in conjunction with MODIFY:

OUTPUT   writes the current observation to the end of the data set

REPLACE  rewrites the current observation in place

REMOVE   deletes the current observation


Example: Update in place any master records that matched with SELFILE.

DATA MASTER;
   SET SELFILE;
   MATCHED=' ';
   MODIFY MASTER KEY=OFFACCT;    * RAND READ TO MASTER;
   IF _IORC_=0 THEN
      DO;                        * FOUND IT;
         MATCHED='Y';            * MARK OBS AS MATCHED;
         REPLACE;                * REWRITE OBS;
      END;
RUN;



PROC PRINT DATA=MASTER;

RUN;

OBS OFFACCT MATCHED

1 12345 Y

2 23456

3 34567



New DATA Step Functions

Each release brings more and more functions. A pair that are fairly new are the QUOTE and DEQUOTE functions.

QUOTE adds double quotes around a value and DEQUOTE removes them. In the past a lot of substringing and concatenation was required to do the same thing.


Example:

data compfile;
   infile datalines;
   input @1  company $10.
         @20 name    $10.;
   company=dequote(company);
   name=quote(trim(name));
datalines;
"SSC"              Steve
;
proc print data=compfile;run;



OBS COMPANY NAME

1 SSC "Steve"

Old DATA Step Functions

Mentioning just a few of the hundreds of SAS functions is difficult, but a few that are especially useful are MOD, RANUNI, PUT, and INPUT.

MOD returns the remainder from a division and is useful for selecting every nth record.

RANUNI returns a uniformly distributed random number between 0 and 1 and can be used to pull samples of approximately a given size.

IF MOD(_N_,100)=0;     * select every 100th rec;

IF RANUNI(0) LE .10;   * select approximately 10%;
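In context, either test is just a subsetting IF inside a DATA step; a sketch, with BIG standing in for whatever data set is being sampled:

DATA SAMPLE;
   SET BIG;                  * READ THE FULL FILE;
   IF RANUNI(0) LE .10;      * KEEP ROUGHLY A 10% SAMPLE;
RUN;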
 

The PUT and INPUT functions give very concise ways to convert between SAS character and numeric values without the conversion notes that automatic conversion writes to the log.


Examples:

DATA PUTINPUT;
   INPUT NAME  $1-10
         RATE  $12-18         /* RATE IS CHAR COMING IN */
         HOURS 20-35;         /* HOURS IS NUMERIC       */
   NRATE =INPUT(RATE,8.);     /* character RATE to numeric  (NRATE is an illustrative name)  */
   CHOURS=PUT(HOURS,8.);      /* numeric HOURS to character (CHOURS is an illustrative name) */
RUN;
