Hidden Gems in the SASŪ System
Steven First
Systems Seminar Consultants, Inc.
Madison, WI 53711 (608) 278-9964
Abstract
The SAS system has a wealth of features that makes it so unique, and sometimes those features are not obvious or easy to find in documentation. This paper will look at several new features, as well as some features that have been around for a while, that we have found to be useful.
Introduction
Our office's SAS documentation subscription brings us new manuals almost weekly, and our total library of SAS related material covers one wall. While our library is very thorough and the SAS documentation is of very high quality, one problem that sometimes occurs is that enhancements are often documented in technical reports that many users don't see, and then of course they don't know about those new features. I have noticed that very often that features listed in technical reports are simple items that can save an enormous amount of work.
Raw Data Handling
Certainly one of the strongest features of the SAS system is the ability to read and write just about any file including some with very complex layouts. Examples of some difficult layouts are files with multiple record types, variable length records, files with imbedded data that needs to be scanned, data with lower case characters, short records, data with recurring values, and comma delimited data for import/export to other products. SAS can handle all of the above. INPUT and PUT of course offer a wealth of features, but here I am going to investigate just a few of them.
My Favorite SAS Feature
The trailing @ with INPUT allows us to read a record, examine it, and then "reread" the same record with a second INPUT if need be. Uses are to read multiple record types or to read some of the values in a record which are needed to read other data on the same record. The trailing @ is a very efficient feature because since the record is already in memory, the second INPUT is very fast.
An INPUT trailing @ example:
1. The following system file has 255 kinds of records each with a unique layout:
type 05 is a job record
type 15 is an dataset record
type 21 is a tape dismount record.
2. The records can appear in any order and
are not necessarily related.
3. Build a separate SAS dataset for each of
the three types listed above. Delete any other records found.
05 PRODPAYR J1981 05/19/91 10:31:02
15 PROD.MONTHLY.PAYROLL 3380
15 PROD.MONTHLY.BACKUP 3400
21 1B1 145664
05 TSTCOMPL J1982 05/19/91 10:32:02
15 SYS1.LINKLIB 3380
15 XYZ.TESTLIB 3380
DATA JOBS(KEEP=JNAME JNUM JDATE JTIME)
DSNS(KEEP=DSN UNIT)
TAPE(KEEP=ADDR VOLUME);
INFILE RAWIN;
INPUT @1 TYPE $2. @; /* INPUT AND HOLD */
IF TYPE='05' THEN /* IS IT A 05 ? */
DO; /* YES */
INPUT @4 JNAME $8. @13 JNUM $5.
/* READ IT AGAIN */
@19 JDATE MMDDYY8. @28 JTIME TIME8.;
OUTPUT JOBS; /* WRITE TO JOBS */
RETURN; /* GET OUT */
END;
IF TYPE='15' THEN /* IS IT A 15 ? */
DO; /* YES */
INPUT @4 DSN $44. @50 UNIT $4.;
/* REREAD */
OUTPUT DSNS;
RETURN;
END;
IF TYPE='21' THEN /* IS IT A 21 ? */
DO;
INPUT @4 ADDR $3. @10 VOLUME $6. ;
OUTPUT TAPE;
RETURN;
END;
FORMAT JDATE MMDDYY8. JTIME TIME8.
RUN;
Trailing @ with PUT
Many features of INPUT can also be applied to PUT when a program needs to write raw data. The trailing @ can be coded after a PUT statement so that the next PUT statement doesn't write to a new record, but rather appends to the same record.
A PUT trailing @ example:
DATASET CUSTID
OBS CUSTID SALESID
1 10001 900201
2 10003 900386
3 10003 900386
4 10003 900386
CUSTID and SALESID appear in a SAS dataset,
print each CUSTID only once for each SALESID.
DATA _NULL_;
TITLE 'PUTS With Trailing @';
BY CUSTID;
FILE PRINT;
SET CUSTSALE
IF FIRST.CUSTID
THEN PUT @11 CUSTID $5. @; ** put and hold ptr;
PUT @24 SALESID $6.; ** put on same line;
RUN;
PUTS With Trailing @
10001 900201
10003 900386
900386
900386
Scanning With INPUT
An INPUT pointer can be a constant or a character variable which causes INPUT to scan the record looking for the value anywhere in the record, and then continue reading from that point.
Example: Read the name after a title:
INPUT @'Mr.' NAME $CHAR20.;
New Informats With SAS 6.07
The $QUOTEw. informat removes beginning and ending quotes during INPUT.
$UPCASEw. translates any lower case characters to upper case.
Example: Remove the quotes around COMPANY and upcase NAME.
data compfile;
infile datalines;
input @1 company $quote10.
@20 name $upcase10.;
datalines;
"SSC" Steve
;
proc print data=compfile;run;
OBS COMPANY NAME
1 SSC STEVE
Variable Lists and Format Lists
When a table of recurring values needs to be read, both the variables and the formats to be used can be lists rather than specifying all values. This makes it as easy to read 500 values as it is to read 5.
Example: You have a file with a 2 character
state followed by a 3 digit temperature. This sequence is repeated for all 50
states. You need to read all 50 state's data into one observation.
AK065 AZ101 AL092 AR078 . . . . WY054
data temps;
infile tempsin;
input @1(state1-state50) ($2. +4)
/* start in 1, read 2, skip 4 */
@3(temp1-temp50) (3. +3)
/* back to 3, read 3, skip 3 */
;
RUN;
User Written Informats
PROC FORMAT's INVALUE statement creates informats that edit and change values as they are read. This can do such things as translate character values to numeric, and edit values for correctness. This technique can be a fast and easy way to validate data on screens for online applications.
Example: translate letter grades to numerics
for a grade point average and validate for correct departments.
proc format;
invalue gradefmt 'A'=4 'B'=3
'C'=2 'D'=1 'F'=0;
invalue deptfmt 101-888=_same_
other =_error_;
run;
data graddata;
infile datalines;
input @1 grade gradefmt1.
@10 dept deptfmt3. ;
datalines;
A 707
B 999
;
proc print data=graddata;run;
OBS GRADE DEPT
1 4 707
2 3 .
Special Variables
The special variables _ALL_, _CHARACTER_, _NUMERIC_, and _INFILE_ can make some programming tasks easier.
PUT _ALL_ will display all data step variables and is an easy way to debug programs. _CHARACTER_ and _NUMERIC_ along with variable lists can be used to create a comma delimited file suitable for input to many software products.
_INFILE_ references the input buffer and it can be handy to use whenever most or all of the raw input record needs to be moved or changed.
Example: Make a copy of a raw file putting
'SSC' in columns 10-12 of every line.
DATA _NULL_; /* NO DATASET NEEDED */
INFILE IN; /* RAW FILE IN */
FILE OUT; /* RAW FILE OUT */
INPUT; /* READ A RECORD */
PUT _INFILE_ /* OUTPUT INPUT REC */
@10 'SSC'; /* OVERLAY WITH CONST.*/
RUN; /* END OF STEP */
Example: Create a file to import to a spreadsheet
from any SAS dataset. All character variables will appear first followed by
all numerics.
OBS NAME AGE DEPT RATE
1 STEVE 43 ACCT 12.22
2 DAVID . PAYR 11.21
options missing=' ';
data _null_;
set addrdata;
file rawfile;
put (_Character_) ('"'$'",') '",'
(_numeric_) (best.',');
run;
"STEVE ","ACCT ", 43, 12.22
"DAVID ","PAYR ", , 11.21
Reading Comma Delimited Data
The opposite of the last example is the case where we have a comma delimited file and need to read it into a data step.
This type of file uses the double quote character to surround character values and commas between all fields, and of course all fields can be variable widths. This is very common and can be very difficult to read without some special INFILE options. The DLM= options allows us to specify the comma as the separator for list input, but that won't handle at least two special cases. First when a field is missing two consecutive commas are in the input file or when an imbedded comma is found in a character field is isn't handled correctly. In any case the INFILE DSD option has been a lifesaver that has handled all cases that I have encountered. Further details are found in tech report P-222.
Example: Read a comma delimited file.
"STEVE ","ACCT ", 43, 12.22
"DAVID ","PAYR ", , 11.21
data addrdata;
infile rawfile dsd;
input name $
Dept $
age
rate ;
run;
proc print data=addrdata;run;
OBS NAME DEPT AGE RATE
1 STEVE ACCT 43 12.22
2 DAVID PAYR 11.21
Short Records
Another problem that comes up often, especially with ASCII text files is that some records are shorter than others and if the INPUT pointer goes beyond the right side of the record SAS flows into the next record or sets variables to missing. The MISSOVER and TRUNCOVER options can help with the flowing problems and the INFILE LRECL and PAD options can essentially make the records fixed length padded with spaces.
Example: Some cities are longer than others.
1
1234567890123456789
File Steve 32 Monona
RAWIN Tom 44 Milwaukee
Kim 25 Madison
DATA ADDRESS;
INFILE RAWIN PAD LRECL=19;
INPUT NAME $1-5 AGE 8-9 CITY $11-19;
RUN;
PROC PRINT;
TITLE 'ADDRESS';
RUN;
ADDRESS
OBS NAME AGE CITY
1 Steve 32 Monona
2 Tom 44 Milwaukee
3 Kim 25 Madison
Reading or Writing Multiple Files
The INFILE FILEVAR option effectively allows a program to dynamically change the name of the file being read. FILEVAR names a datastep variable whose change in value causes the data step to open a new file.
Uses might be to read all the members in an MVS PDS or all the files in a directory etc. This option is also available as a FILE option when writing multiple raw data files.
Example: Read every file listed after datalines.
data filesds;
input filename $ 1-20;
infile rawfile filevar=filename end=end;
do until(end);
input value $1-5;
output;
end;
datalines;
Q:\TESTDAT\FILE1.DAT
Q:\TESTDAT\FILE2.DAT
Q:\TESTDAT\FILEA.DAT
;
Transporting SAS Data
SAS/CONNECT certainly provides multiple ways to conveniently move SAS libraries and their entries across computer platforms. This does mean that SAS/CONNECT must be licensed on all used platforms, and that scripts are written and debugged. Another way to move data without SAS/CONNECT is to use the SAS XPORT engine to create a SAS XPORT file.
The steps in using this special binary file are to:
1. Create the XPORT file on one system using XPORT and PROC CPORT.
2. Move the XPORT file to the second system using a binary transfer program.
3. Construct a new SAS library on the second system using PROC CIMPORT.
We have moved data using this technique to
and from several different SAS platforms and have found it to be especially
valuable.
Example: Move a SAS library from MVS to WINDOWS
via a XPORT file.
The MVS job:
//STEP02 EXEC SAS6
//SASDATA DD DSN=MY.SAS.LIBRARY,DISP=SHR
//OUT DD DSN=MY.XPORT.DAT,DISP=(,CATLG,DELETE),
// SPACE=(TRK,(100,100),RLSE),
// DCB=(LRECL=80,RECFM=FB,BLKSIZE=8000)
PROC CPORT LIBRARY=SASDATA FILE=OUT;
RUN;
The Windows job:
FILENAME FILEIN 'C:\TEMP\GRAPHDWN';
LIBNAME SASDATA 'C:\SASDATA';
PROC CIMPORT LIBRARY=SASDATA
INFILE=FILEIN;
RUN;
DATA Step Views
In release 6.07 DATA step views were introduced. "Introducton to DATA Step Views", POLZIN SUGI 18 along with Tech report P-222 describe this feature. The intention here is to make data appear logically as a SAS data set, even though physically the data may be in flat files, SAS data files, DBMS systems, or other sources. An input DATA step view can be virtually any DATA step that eventually outputs an observation. When a consumer of an observation (a proc or data step) requests an observation, the code in the view is executed and an observation is returned.
Two features of DATA step views are:
1. Simplification of code. Very complex programs can be stored once and when referenced everything appears a SAS dataset.
2. Multiple passes may be eliminated saving CPU, wall time and intermediate work space.
Example: Two large flat files that are already
sorted by account number need to be merged.
Solution 1: The traditional method is to convert
each flat file to a SAS data set, then in a third step merge them. The data
from file one is passed three times, that from file two is passed three times,
and finally the final dataset is created.
data file1;
infile raw1 lrecl=60 pad;
input @1 field1 $10.
@10 field2 $10.
@20 acct $20.;
run;
data file2;
infile raw2 lrecl=60 pad;
input @1 field3 $10.
@10 field4 $10.
@20 acct $20.;
run;
data both;
merge file1
file2;
by acct;
run;
Solution 2: By using data step views, the
flat files are not passed until the consumer (the merge step), and thus the
data from the two raw files are only passed one time. Depending on file sizes,
some significant time savings as well as significant intermediate space savings
should be realized.
data file1/view=file1;
infile raw1 lrecl=60 pad;
input @1 field1 $10.
@10 field2 $10.
@20 acct $20.;
run;
data file2/view=file2;
infile raw2 lrecl=60 pad;
input @1 field3 $10.
@10 field4 $10.
@20 acct $20.;
run;
data both;
merge file1
file2;
by acct;
run;
Dictionary Tables
As part of PROC SQL dictionary tables and corresponding SAS views have been set up to provide programs with names of SAS libraries allocated, members within the libraries, variables inside data members, entries within SAS catalogs, SAS options and much more.
A partial list of the dictionary tables is shown below; for a complete listing the documentation or examination of the SASHELP library can be used.
SASHELP.VMEMBER SAS files and catalogs
SASHELP.VTABLE SAS datafiles
SASHELP.VCOLUMN SAS vars in all datasets
SASHELP.VCATALG SAS catalogs and entries
SASHELP.VEXTFL external files allocated
SASHELP.VVIEW SAS views
SASHELP.VINDEX SAS indexing information
SASHELP.VOPTION SAS options
SASHELP.VSLIB Sorted libname list
SASHELP.VSTABLE Sorted table list
SASHELP.VSCATLG Sorted catalog list
SASHELP.VSACCESS Sorted access descriptors
SASHELP.VSVIEW Sorted data views
SASHELP.VSTABVW Sorted files and views
To use the views, just reference them like
any other SAS dataset.
Example: List all the datasets and their number
of observations created since 1 January.
proc print data=sashelp.vtable;
where crdate ge '01jan96:0:0'dt
and libname='WORK' ;
var memname crdate nobs;
run;
OBS MEMNAME CRDATE NOBS
1 ADDRDATA 13JAN96:10:19:58 2
2 BOTH 13JAN96:10:20:07 1730
3 COMPFILE 13JAN96:10:19:52 1
4 FILE1 13JAN96:10:19:59 1730
PROC SQL Pass-Through Facility
Originally access to a third party DBMS was done through SAS ACCESS and VIEW descriptors. In release 6.07 PROC SQL allows you to write native SQL code that is passed through relatively untouched to the DBMS for processing. This facility can be crucial to applications especially using large files. The only documentation for the facility is in the technical reports, thus users consulting only the SAS/ACCESS manuals might not be aware of its existence.
Example: Create an extract datafile from DB2.
PROC SQL;
CONNECT TO DB2(SSID=DP05);
CREATE TABLE ENROLL AS SELECT *
FROM CONNECTION TO DB2
(SELECT DOB,
EFFDATE,
NAME
FROM DBFILE.ENR
WHERE STATE='WI')
AS EXTRACT ( DOB,
EFFDATE,
NAME);
PROC PRINT DATA=ENROLL;RUN;
The SET KEY= Option
KEY= allows direct access to an observation by an index value. This allows what other systems call random access. KEY= can be used with with MODIFY or SET.
A KEY example: Read from MASTER, only the observations that are also in SELFILE.
MASTER
OBS OFFACCT MATCHED
1 12345
2 23456
3 34567
SELFILE
OBS OFFACCT
1 12345
2 67890
OPTIONS ERRORS=0;
DATA MATCHES;
SET SELFILE;
SET MASTER KEY=OFFACCT; * RAND READ MASTER;
IF _IORC_=0 THEN
OUTPUT; * FOUND IT;
RUN;
PROC PRINT DATA=MATCHES;RUN;
RUN;
OBS OFFACCT MATCHED
1 12345
The MODIFY Statement
MODIFY works with some other SAS statements to update a SAS dataset in place rather than creating new SAS files as SET, MERGE, and UPDATE do. The following statements can be used in conjunction with MODIFY:
OUTPUT writes the current obs to the end of the dataset
REPLACE current observation is updated in place
REMOVE deletes the current observation
Example: Update in place any master records
that matched with SELFILE.
DATA MASTER;
SET SELFILE;
MATCHED=' ';
MODIFY MASTER KEY=OFFACCT; * RAND READ TO MASTER;
IF _IORC_=0 THEN
DO; * FOUND IT ;
MATCHED='Y'; * MARK OBS AS MATCHED;
REPLACE; * REWRITE OBS;
END;
RUN;
PROC PRINT DATA=MASTER;
RUN;
OBS OFFACCT MATCHED
1 12345 Y
2 23456
3 34567
New DATA Step Functions
Each release brings more and more functions. A pair that are fairly new are the QUOTE and DEQUOTE functions.
QUOTE adds double quotes around a value and DEQUOTE removes them. In the past a lot of substringing and concatenation was required to do the same thing.
Example:
data compfile;
infile datalines;
input @1 company $10.
@20 name $10.;
company=dequote(company);
name=quote(name);
datalines;
"SSC" Steve
proc print data=compfile;run;
OBS COMPANY NAME
1 SSC "Steve"
Old DATA Step Functions
Mentioning just a few of the hundreds of SAS functions is difficult, but a few that are especially useful are MOD, RANUNI, PUT, and INPUT.
MOD returns a remainder from a division and is useful to select every nth record.
RANUNI gives a uniformly distributed random number and can be used to pull approximate size samples easily.
IF MOD(_N_,100)= 0; * select every 100th rec;
IF RANUNI(0) GE .10; * select app. 10 %;
PUT and INPUT give very concise ways to convert between SAS character and numeric values without messages.
Examples:
DATA PUTINPUT;
INPUT NAME $1-10
RATE $12-18 /* RATE IS CHAR COMING IN */
HOURS 20-35; /* HOURS IS NUMERIC */
RATEN=INPUT(RATE,COMMA12.);
/* RATEN IS NUMERIC */
HOURSC=PUT(HOURS,Z4.0);
/* HOURSC WILL BE CHARACTER*/
DATALINES;
TOM $10,000 22
;
RUN;
PROC PRINT DATA=PUTINPUT;
RUN;
PUT AND INPUT FUNCTIONS
OBS NAME RATE HOURS RATEN HOURSC
1 TOM $10,000 22 10000 0022
The VNAME Function
VNAME can be used to return the variable name of data step variables. This can be useful when working with arrays where you may not know the actual underlying variable names.
Example: Modify the earlier program that exported
to a spreadsheet to include a line containing the variable names.
OBS NAME AGE DEPT RATE
1 STEVE 43 ACCT 12.22
2 DAVID PAYR 11.21
data _null_;
set addrdata end=eof;
file rawfile;
array chars{*} _character_;
array nums{*} _numeric_;
length varname $ 25;
if _n_ = 1 then
do;
do j=1 to dim(chars);
call vname(chars{j},varname);
put '"' varname'",'@;
end;
do j=1 to dim(nums);
call vname(nums{j},varname);
put '"' varname'",'@;
end;
put +(-1) ' ';
end;
do j=1 to dim(chars);
put '"' chars{j} '",' @ ;
end;
do j=1 to dim(nums);
put nums{j} best9. ',' @;
end;
put +(-1) ' ';
run;
"NAME ","DEPT ","AGE
","RATE "
"STEVE ","ACCT ", 43, 12.22
"DAVID ","PAYR ", , 11.21
Report Writing Gems
PROC FSEDIT in batch jobs can produce reports that display one observation per page much like the terminals display. Connecting it with PROC FSLETTER allows you to produce reports with constant text and one observation per page which is useful not only for form letters, but also for mailing labels or directories with one obs per page. We are all familiar with form letter programs, but here is one that can be used on a mainframe to process thousands of observations.
A form letter example:
PROC FSEDIT DATA=SASDATA.PERSON
SCREEN=SASDATA.PERSON
LETTER=SASDATA.LETCAT
SEND=RETIRE;RUN;
PROC REPORT In Batch
This proc in a batch mode has numerous features that could take an entire day to talk about. Two very simple uses though can be discussed very briefly, with very little more effort than PROC PRINT. These programs can be built upon to compute new fields, control layout and headers and much more. Documentation is in Tech Report P-258.
Example: A very simple PROC REPORT.
proc report data=sales nowindows;
title 'A Simple Proc Report';run;
A Simple Proc Report
DEPT DATE SALES
ACCTG 01/01/95 712.22
ACCTG 02/01/95 622.22
ACCTG 03/01/95 789.32
ACCTG 04/01/95 811.23
ACCTG 05/01/95 799.16
ACCTG 06/01/95 613.66
ACCTG 07/01/95 744.88
. . .
SALES 01/01/95 912.22
SALES 02/01/95 722.22
SALES 03/01/95 889.32
SALES 04/01/95 911.23
SALES 05/01/95 799.16
. . .
SALES 12/01/95 909.77
Example: A phone book style mult-column report.
proc report data=sales(keep=dept sales) nowindows panels=2;
title 'A Multi Column Report';run;
A Multi Column Report
DEPT SALES DEPT SALES
ACCTG 712.22 SALES 813.66
ACCTG 622.22 SALES 644.88
ACCTG 789.32 SALES 909.22
ACCTG 811.23 SALES 677.13
ACCTG 799.16 SALES 891.24
ACCTG 613.66 SALES 777.78
ACCTG 744.88 SALES 909.77
ACCTG 609.22
ACCTG 677.13
ACCTG 691.24
ACCTG 777.78
ACCTG 989.77
SALES 912.22
SALES 722.22
SALES 889.32
SALES 911.23
SALES 799.16
PROC PRINT
For those users who don't care for the way PROC PRINT rotates headers to save room or the way it breaks lines, two new options give you the control.
HEADING=V forces vertical headers
HEADING=H forces horizontal headers
ROWS=PAGE prints only one set of variables per page and continues on the next page with the remainder of variables.
proc print data=sales(obs=5) heading=v;
title 'Vertical Headings';run;
Vertical Headings
S
D D A
O E A L
B P T E
S T E S
1 ACCTG 01/01/95 712.22
2 ACCTG 02/01/95 622.22
3 ACCTG 03/01/95 789.32
4 ACCTG 04/01/95 811.23
5 ACCTG 05/01/95 799.16
BY Variables in Titles
One of the simplest and most useful options from release 6.07 is the ability to reference BY variable names and values in TITLE statements, where in the past a macro or complicated annotation might have been required.
Coding #BYVAR(byvariable) substitutes the BY variable's name, and #BYVAL(byvariable) subs it's value into a TITLE.
Note when using these features, OPTIONS NOBYLINE; should be specified to eliminate the default by lines.
Example:
options nobyline;
proc print data=sales;
title 'Sales for Department #BYVAL(DEPT)';
by dept;
pageby dept;
run;
Sales for Department ACCTG
OBS DATE SALES
1 01/01/95 712.22
2 02/01/95 622.22
3 03/01/95 789.32
4 04/01/95 811.23
5 05/01/95 799.16
. . .
12 12/01/95 989.77
Sales for Department SALES
OBS DATE SALES
13 01/01/95 912.22
14 02/01/95 722.22
15 03/01/95 889.32
16 04/01/95 911.23
17 05/01/95 799.16
. . .
24 12/01/95 909.77
Data Step Reporting
_BLANKPAGE_ and _PAGE_ @
Some very subtle features of PUT can cause extra lines to appear when ejecting pages.
PUT _BLANKPAGE_ ejects a blank page and moves the pointer to top of the next page.
PUT _PAGE_ leaves the pointer on the next line on the same page.
PUT _PAGE_ @ eliminates the extra blank line.
The PUT statements alignment specifications
of -L, -C, and -R can be specified after a format to left, center, or right
justify values.
Example:
PUT X DOLLAR7.2 -C;
The n*"character" format
modifier.
Example: Display 70 dashes in a report.
put @1 70*"-" ;
The SAS Macro Facility
One of the most difficult thing about debugging macros was knowing not only macro variable values, but also knowing in which macro referencing environment they were stored.
One of the handiest features in the DATA
step is the PUT _ALL_ statement, and in release 6.11 an equivalent statement
is available for the macro environment. This is a major enhancement that only
requires 2 words.
Example: Display all macro variables and their environments.
%PUT _ALL_;
SAS PROCs
The procedures listed below are ones that we seem to use extensively and in most cases invoke them very simply. Covering all of the options would be beyond the scope of this paper, but I would like to mention a few uses and new options.
PROC FREQ
Is used to display distributions of a variable. We also use FREQ extensively to verify data while testing. An examples might be to produce a table of states values in a mailing list. This table can give the programmer confidence in the values and easy display incorrectly keyed states etc.
A production system I support has dated transactions IE. For the JUNE run all transactions should have occurred in JUNE, but sometimes the wrong month's tape was input.
The simple PROC FREQ of the transaction date
as the first report gives the data control person an easy way to verify that
the data makes sense.
Example:
PROC FREQ DATA=TRANS;
TABLE TRANDATE;
TITLE 'Table of Transaction dates for this run';
RUN;
PROC SUMMARY/MEANS
Without question SUMMARY/MEANS is one of the most valuable programs to summarize data and reduce data in a production environment.
It is certainly easier to process a smaller summarized file than a large detailed one, and we use SUMMARY/MEANS extensively.
The ORDER= OPTION
A very valuable addition to PROC FREQ, PROC TABULATE, PROC SUMMARY/MEANS, and PROC REPORT has been the ORDER=option.
This option gives four choices for the order that the final tables will be classified and printed. In the past, tricky programs to make up dummy values and fool the system were needed.
The four ORDER= values are:
DATA class values are ordered in the order they were read
FORMATTED orders class values by formatted representation
FREQ values are ordered by descending frequency count (class values found in greatest numbers of values occurs first)
INTERNAL the same order PROC SORT would arrange them
PROC PRINTTO
PROC PRINTTO allows the redirection of the
SAS LOG or PRINT files to a disk file. This can be useful when a PROC produces
the desired values, but perhaps the report needs to be rearranged in some manner.
PROC SORT
The NODUPS and NODUPKEY options have been around for a while and can easily delete duplicate data.
NODUPS will not output a record if the entire observation matches another. This can easily discard data that got doubled up somehow.
NODUPKEY will only output one record per BY variable and could be useful if you need one record per account for example.
With release 6.07 SORT now records how a SAS dataset is sorted so that if a later SORT is coded, SORT first checks to see if it needs to sort, and if not the unnecessary sort is skipped. This is done automatically with no work for the programmer. This is a very nice addition that saves millions of unneeded sorts worldwide, and the developers should be commended on this feature.
Along with the above, PROC DATASETS can mark a dataset as sorted even though PROC SORT didn't do the original sorting. And example might be a SAS datafile build from a raw file that was already sorted. By specifying the MODIFY SORTEDBY clause, the dataset is marked and again later unnecessary sorted will be skipped.
PROC FORMAT CNTLIN Option
Value labeling formats are certainly an integral part of SAS in earlier releases the formats had to be hard coded, or generated with macros. Quite often the data needed already exists in other files and the CNTLIN option allows FORMAT to get it's information from a SAS dataset with some special variables. In almost all cases building this SAS dataset will be simpler than a corresponding macro. There is also a CNTLOUT option if an external file is needed from the FORMAT data.
Selected SAS Options
Among the many system options I would like to just discuss a few.
ALTLOG and ALTPRINT allow for a duplicate copy of the SAS log and print files to be directed to a file. This can be useful to track online systems or in any case where an archived copy makes sense.
The SYSPARM option allows the invoking command to pass values to SAS at startup time very much like the MVS JCL PARM option. The value passed can contain up to 200 characters and it can be retrieved by the SYSPARM function or the &SYSPARM macro variable. This can be very useful when code is to be hidden or protected, but yet still allow for changes to be specified.
MVS // EXEC SAS,OPTIONS='SYSPARM="JANUARY"'
TSO SAS OPTIONS('SYSPARM="JANUARY"')
CMS SAS (SYSPARM JANUARY
DATA X;
IF _N_ = 1 THEN
DO;
MONTH=SUBSTR(SYSPARM(),1,8);
RETAIN MONTH NEWYEAR;
IF SYSPARM()='JANUARY' THEN
NEWYEAR='YES';
ELSE
NEWYEAR='NO ';
END;
. . . rest of program
MVS Features
The PROCs are useful if you are running SAS under MVS
PROC PDSCOPY copies and manages MVS Partitioned datasets.
PROC TAPELABEL and PROC TAPECOPY for handling tapes.
PROC SOURCE for printing and extracting PDS members to a sequential dataset and for performing mass changes.
Accessing System Control Blocks
One of the most exciting and obscure features
is access to several system control blocks through the INFILE and FILE statements.
Normally only assembler programs can access these blocks, and it is normally
impossible for COBOL and other higher languages.
Using the JFCB
The Job File Control Block is 176 bytes of information stored for each DD card specified in the job step. This very useful block contains extensive information about each dataset including datasetname, device type, whether the dataset is catalogued, whether the dataset is a SYSIN or SYSOUT dataset, label processing options and much more. By accessing this block, the SAS DATA step can determine what dataset name was provided in the JCL. This is especially useful if the same program is run against several different datasets, and whatever dsname is provided through DD cards is to end up in a title or similar field.
A SMF processing program can determine whether
it is reading a live VSAM file, a sequential backup disk file, or a tape file.
Since many of the indicators in the JFCB are bit settings, the DATA step may
need to do bit testing, which is no problem for the DATA step.
A JFCB example
Determine the DSNAME and DSORG from the JFCB.
DATA _NULL_; /* DONT NEED DATASET */
INFILE IN JFCB=JFCBIN; /* ASK FOR JFCB */
LENGTH TITLDSN $ 44; /* SET LENGTHS AS */
LENGTH DSORG1 $1.; /* REQUIRED */
IF _N_ = 1 THEN /* FIRST TIME IN ? */
DO; /* YES, DO BLOCK */
TITLDSN=SUBSTR(JFCBIN,1,44);/* EXTRACT DSNAME */
DSORG1=SUBSTR(JFCBIN,99,1); /* AND DSORG BYTE 1 */
IF DSORG1='.1......'B THEN /* BIT TEST AS NEEDED*/
DSORGOUT='PS'; /* MUST BE SEQUENTIAL*/ END; /* END OF BLOCK */
INPUT etc. ; /* REST OF PROGRAM */
. . .
RETAIN TITLDSN DSORGOUT;/* RETAIN */ RUN; /* END OF STEP */
Other Available Control Blocks
Other MVS control blocks available on INFILE and FILE include DEVTYPE containing device type information, DSCB containing the data set control block information for non-vsam disk files, the UCBNAME containing the device address from the UCB, and the VOLUME parameter containing the volume serial of the disk dataset. Another useful INFILE option EOV is set to 1 each time a concatenated datasets boundary is detected. The above mentioned control blocks can then be reread with the new values if desired.
Summary
While SAS is certainly a large and comprehensive system, it is quite often the little things that can make programming easier if we only know about those features.
SAS is a registered trademark of SAS Institute
Inc.
The author will be glad to answer questions and accept suggestions at the following address:
Steven First
Systems Seminar Consultants, Inc
2997 Yarmouth Greenway Drive
Madison, WI 53711
Voice: (608) 278-9964
Fax: (608) 278-0065
E-mail: <train@sys-seminar.com>
Website: <http://www.sys-seminar.com>