Paper 166-2008

 

The SAS INFILE and FILE Statements

Steven First, Systems Seminar Consultants, Madison, WI

 

ABSTRACT

One of the most flexible features of the SAS system is its ability to read and write just about any kind of raw file.

The INFILE and FILE statements are the interfaces that connect SAS programs to those external files so that INPUT and PUT can read and write data. These statements provide many options that make it easy to read and write files ranging from simple to complex.

 

Introduction

This paper will examine the INFILE and FILE statements and their options.  To access a file from any programming language, a linkage is needed between the program and the desired file.   INFILE and FILE are the statements that SAS generally uses to link to raw files; that is, files that normally contain only data and no data dictionary. INFILE points to input files and FILE points to output files.  Other than the direction of data flow, INFILE and FILE behave in much the same way and share many options, although each also has options of its own.

 

Because there is normally no dictionary describing raw data, it is up to the program to provide enough information so that SAS can read/write the data in or out of the data step.   INFILE/FILE statements have extensive options to help provide that information and allow SAS to process a rich and varied range of files with minimal effort.  INFILE/FILE also work with other SAS statements such as FILENAME, DATALINES, PUT and INPUT to provide extensive data input and output in the DATA step.  Because of the ability to read and write just about any kind of file format with SAS, we have a wonderfully versatile utility tool via the INFILE and FILE statements.

 

 

Basic INFILE Syntax

INFILE file-specification <options > <operating-environment-options> ;

INFILE DBMS-specifications;

 

file-specification

identifies the source of the input data records, which is an external file or instream data. File-specification can have these forms:

'external-file'

specifies the physical name of an external file. The physical name is the name that the operating environment uses to access the file.

fileref

specifies the fileref of an external file.  (note:  The fileref must be previously associated with an external file in a FILENAME statement, FILENAME function, or a system command)

fileref(file)

specifies a fileref of an aggregate storage location and the name of a file or member, enclosed in parentheses, that resides in that location. 

Operating Environment Information:   Different operating environments call an aggregate grouping of files by different names, such as a directory, a MACLIB, or a partitioned data set. Details are given in SAS operating system documentation. 

CARDS | CARDS4

DATALINES | DATALINES4

 

Note:  Complete INFILE and FILE documentation is included as appendices at the end of this paper.

 

How to Use the INFILE/FILE Statement

Because the INFILE statement identifies the file to read, it must execute before the INPUT statement that reads the input data records.  The same holds true for the FILE statement which must precede any PUT statement that performs the writing to the output raw file.

 

data x;                           /* build SAS dataset */

  infile in;                      /* raw file in       */

  input @1  Name $10.             /* read a record     */

        @20 Age    2. ;           /* with two fields   */

run;                              /* end of step       */
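The FILE and PUT statements mirror this pattern for output. The following is a minimal sketch rather than an example from the original paper: the fileref OUT and the TEMP device are assumptions chosen so that the step can run without naming a real path, and SASHELP.CLASS is assumed to be installed as sample data.

```sas
filename out temp;                /* hypothetical fileref; TEMP = scratch file */

data _null_;                      /* don't need dataset                        */
  set sashelp.class;              /* assumed sample data set                   */
  file out;                       /* raw file out; must execute before PUT     */
  put @1  Name $10.               /* write a record                            */
      @20 Age    2. ;             /* with two fields                           */
run;                              /* end of step                               */
```

As with INFILE and INPUT, the FILE statement only has to execute before the PUT statement that writes to it; any PUT that executes before a FILE statement goes to the SAS log.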

 

Reading Multiple Files

You may specify more than one INFILE statement to read more than one file in a single data step.  The step will stop when any file tries to read past the end of file.  You may also use the INFILE statement in conditional processing, such as an IF-THEN statement, because it is executable. This enables you to control the source of the input data records.  It is a bit more difficult to read multiple flat files in a data step as compared to reading multiple SAS datasets.  For that reason it is a bit unusual to read multiple files in one step.  

The following DATA step reads from two input files during each iteration of the DATA step. As SAS switches from one file to the next, each file remains open. The input pointer remains in place to begin reading from that location the next time an INPUT statement reads from that file.

data qtrtot(drop=jansale febsale marsale   /* build a dataset and drop input  */

                 aprsale maysale junsale); /* variables                       */ 

   infile file-specification-1;            /* identify location of 1st file   */

   input name $ jansale febsale marsale;   /* read values from 1st file       */

   qtr1tot=sum(jansale,febsale,marsale);   /* sum them up                     */ 

   infile file-specification-2;            /* identify location of 2nd file   */

   input @7 aprsale maysale junsale;       /* read values from 2nd file       */

   qtr2tot=sum(aprsale,maysale,junsale);   /* sum them up                     */  

run;                                       /* end of step                     */ 

 

The DATA step terminates when SAS reaches an end of file on the shortest input file.

 

A Hex Listing Program

The INPUT statement does not need to specify any variables to read; it simply reads a line into a buffer from our input file.  This can be useful with the LIST statement to just display the raw input file’s buffer.  LIST will display hexadecimal notation if unprintable characters are present in the file.

data _null_;                      /* don't need dataset*/

  infile in;                      /* raw file in       */

  input;                          /* read a record     */

  list;                           /* list buffer in log*/

  if _n_ >= 50 then               /* stop after 50     */

     stop;                        /* adjust as needed  */

run;                              /* end of step       */

Accessing the INPUT Buffer

After the INFILE and INPUT statements execute, the buffer area can be accessed and even altered through a special variable called _INFILE_.  This allows a simple way to read and alter data without extensive coding in the INPUT statement to define the records.   Another way that the previous program could display the input buffer area is shown below:

 

data _null_;                      /* don't need dataset*/

  infile in;                      /* raw file in       */

  input;                          /* read a record     */

  put _infile_;                   /* put  buffer in log*/

  if _n_ >= 50 then               /* stop after 50     */

     stop;                        /* adjust as needed  */

run;                              /* end of step       */

Not only can we display the buffer, but we can actually alter the values before reading individual fields.  Suppose we have the following file that contains angle brackets that we would like to discard.

 

 

City Number Minutes Charge

Jackson 415-555-2384 <25> <2.45>

Jefferson 813-555-2356 <15> <1.62>

Joliet 913-555-3223 <65> <10.32>

 

In the following code, the first INPUT statement reads and holds the record in the input buffer. The COMPRESS function removes the angle brackets (< >) from the special variable _INFILE_.  The second INPUT statement parses the value in the buffer and then PUT displays the SAS variables. Note that the FIRSTOBS= INFILE option skips the first header record.

 

data _null_;

   length city number $16. minutes charge 8;

   infile phonbill firstobs=2;

   input @;

   _infile_ = compress(_infile_, '<>');

   input city number minutes charge;

   put city= number= minutes= charge=;

run;

 

Partial SAS log:

city=Jackson number=415-555-2384 minutes=25 charge=2.45

city=Jefferson number=813-555-2356 minutes=15 charge=1.62

city=Joliet number=913-555-3223 minutes=65 charge=10.32

 

 

ASSIGNING ANOTHER VARIABLE TO THE CURRENT BUFFER

The _INFILE_=variable option names a character variable that references the contents of the current input buffer for this INFILE statement.  This variable, like all automatic variables, is not written to the SAS dataset.  Defining a variable of this type rather than using _INFILE_ may be useful, especially when multiple files are being input.  The results from the following program are identical to those of the program above.

 

data _null_;

   length city number $16. minutes charge 8;

   infile phonbill firstobs=2 _infile_=phonebuff;

   input @;

   _infile_ = compress(phonebuff, '<>');

   input city number minutes charge;

   put city= number= minutes= charge=;

run;

 

 

Reading Instream Data Records with INFILE

You may use the INFILE statement with the DATALINES file specification to process instream data and still utilize other INFILE options.  An INPUT statement reads the data records that follow the DATALINES statement.  Again, the results from this next program match the earlier ones.  Note that if the system option CARDIMAGE is on, the record is assumed to be an 80 byte record padded with blanks.  For longer dataline input, OPTIONS NOCARDIMAGE may need to be specified.

data _null_;

   length city number $16. minutes charge 8;

   infile datalines firstobs=2 _infile_=phonebuff;

   input @;

   _infile_ = compress(phonebuff, '<>');

   input city number minutes charge;

   put city= number= minutes= charge=;

datalines;

City Number Minutes Charge

Jackson 415-555-2384 <25> <2.45>

Jefferson 813-555-2356 <15> <1.62>

Joliet 913-555-3223 <65> <10.32>

;

run;

 

Reading Past the End of a Line

By default, if the INPUT statement tries to read past the end of the current input data record, it moves the input pointer to column 1 of the next record to read the remaining values. This default behavior is governed by the FLOWOVER option, and a note is written to the SAS log when it occurs.  This is useful in reading data that flows over into several lines of input.

Example:  Read a file of pet names and 6 readings per animal.

 

data readings; 

 infile datalines;

 input Name $ R1-R6;

 datalines;

Gus  22 44 55 

     33 32 14

Gaia 24 22 23

     31 76 31

;

proc print data=readings;

 title 'Readings';

 run;

 

Partial SAS log:

 

NOTE: SAS went to a new line when INPUT statement reached past the end of a line.

 

 

SAS Output

 

 

                   Readings

 

Obs    Name    R1    R2    R3    R4    R5    R6

 

 1     Gus     22    44    55    33    32    14

 2     Gaia    24    22    23    31    76    31

 

CONTROLLING SHORT RECORDS WITH FORMATTED INPUT

With data that doesn’t flow over, the FLOWOVER behavior can cause errors. Several options are available to change the INPUT statement behavior when an end of line is reached. The STOPOVER option treats this condition as an error and stops building the data set. The MISSOVER option sets the remaining INPUT statement variables to missing values. The SCANOVER option, used with @'character-string', scans the input record until it finds the specified character-string. The FLOWOVER option restores the default behavior.

The TRUNCOVER and MISSOVER options are similar. After passing the end of a record, both options set the remaining INPUT statement variables to missing values. They differ when a record ends partway through a field: MISSOVER sets the value to missing because the record is shorter than the field length specified in the INPUT statement, whereas TRUNCOVER assigns whatever characters were read to that last variable so that you know what the input data record contained.

For example, an external file with variable-length records contains these records:

----+----1----+----2

1

22

333

4444

55555

The following DATA step reads this data to create a SAS data set. Only one of the input records is as long as the informatted length of the variable TESTNUM.

data numbers;

   infile 'external-file';

   input testnum 5.;

run;

This DATA step creates three observations from the five input records because, by default, the FLOWOVER option is used to read the input records. Each time a record is too short, INPUT moves on to the next record to find the value, so the output is not correct.

If you use the MISSOVER option in the INFILE statement, then the DATA step creates five observations. However, all the values that were read from records that were too short are set to missing.

The INFILE TRUNCOVER option tells INPUT to read as much of the field as the record contains and to accept that shortened value.

       infile 'external-file' truncover;

 


The table below shows the results of the three different INFILE options.

The Value of TESTNUM Using Different INFILE Statement Options

OBS    FLOWOVER    MISSOVER    TRUNCOVER

 1           22           .            1

 2         4444           .           22

 3        55555           .          333

 4                        .         4444

 5                    55555        55555
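The comparison in the table can be reproduced with a sketch along the following lines. The fileref SHORTREC, the TEMP device, and the data set names FLOW, MISS, and TRUNC are assumptions introduced for illustration; the first step simply writes the five variable-length records to a scratch file. On platforms that pad records or use fixed record lengths, the results may differ.

```sas
filename shortrec temp;                  /* hypothetical scratch file      */

data _null_;                             /* write the five short records   */
  file shortrec;
  put '1' / '22' / '333' / '4444' / '55555';
run;

data flow;                               /* default behavior               */
  infile shortrec flowover;
  input testnum 5.;
run;

data miss;                               /* short records become missing   */
  infile shortrec missover;
  input testnum 5.;
run;

data trunc;                              /* short records read as they are */
  infile shortrec truncover;
  input testnum 5.;
run;
```

Running PROC PRINT on each of the three data sets should show the differences column by column.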

 

 

 

Another solution

The INFILE LRECL= and PAD options can be used to specify a record width and if the actual line is shorter to fill (pad) it with blanks.  The program below reads the data correctly as all records are assumed to be 5 wide.

 

data numbers;

   infile 'external-file' lrecl=5 pad;

   input testnum 5.;

run;

proc print data=numbers;

title 'Numbers';

run;

 

 

    Numbers

 

Obs    testnum

 

 1          1

 2         22

 3        333

 4       4444

 5      55555

 

 

HANDLING SHORT RECORDS AND MISSING VALUES WITH LIST INPUT

The example below shows data records that sometimes don’t contain all six readings.  With FLOWOVER in effect, the data read would be incorrect.  The INFILE MISSOVER option instructs input to set to missing any values that are not found by the end of the record.

 

data readings; 

 infile datalines missover;

 input Name $ R1-R6;

 datalines;

Gus  22 44 55 33

Gaia 24 22 23 31 76 31

;

proc print data=readings;

 title 'Readings';

 run;


 

                  Readings

 

Obs    Name    R1    R2    R3    R4    R5    R6

 

 1     Gus     22    44    55    33     .     .

 2     Gaia    24    22    23    31    76    31

 

 

You can also use the STOPOVER option in the INFILE statement. This causes the DATA step to halt execution when an INPUT statement does not find enough values in a record.

 

infile datalines stopover;
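A complete step using STOPOVER might look like the following sketch, which reuses the data from the MISSOVER example above. Because the first record holds only four of the six readings, the step is expected to stop with an error at that record rather than build a misaligned data set.

```sas
data readings;
 infile datalines stopover;        /* short record = error, step stops */
 input Name $ R1-R6;
 datalines;
Gus  22 44 55 33
Gaia 24 22 23 31 76 31
;
run;
```

STOPOVER is a reasonable choice when every record is supposed to be complete and a short record signals a problem with the file rather than a legitimate missing value.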

 

Reading Delimited Data

By default, the delimiter to read input data records with list input is a blank space. Both the delimiter-sensitive data (DSD) option and the DELIMITER= option affect how list input handles delimiters. The DELIMITER= option specifies that the INPUT statement use a character other than a blank as a delimiter for data values that are read with list input. When the DSD option is in effect, the INPUT statement uses a comma as the default delimiter.  If the data contains two consecutive delimiters, DSD treats them as enclosing a missing value, whereas DELIMITER= alone treats them as a single delimiter.

In the example below, the data values are separated by the double quote character and commas.  DLM can specify those two delimiters and read the data correctly.

 

data address;

 infile datalines dlm='",';

 length city $10;

 input name $ age city $;

 datalines;

"Steve",32,"Monona"

"Tom",44,"Milwaukee"

"Kim",25,"Madison"

;

proc print data=address;

 title 'Address';

run;

 

 

            Address

 

Obs    city         name     age

 

 1     Monona       Steve     32

 2     Milwaukee    Tom       44

 3     Madison      Kim       25

 

 

In the next example, the two consecutive commas would not be handled correctly with DLM alone.  DSD correctly handles consecutive delimiters as well as delimiters embedded inside quotes.

data address2;

 infile datalines dsd;

 length city $10;

 input name $ age city $;

 datalines;

"Steve",32,"Monona"

"Tom",44,"Milwaukee"

"Kim",,"Madison"   

;

run;

proc print data=address2;

title 'Address2';

run;

 

 

Address2

 

Obs    city         name     age

 

 1     Monona       Steve     32

 2     Milwaukee    Tom       44

 3     Madison      Kim        .
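DSD and DELIMITER= can also be combined when the delimiter is not a comma. The sketch below is an illustration rather than part of the original example set; the pipe delimiter and the data set name ADDRESS3 are assumptions.

```sas
data address3;
 infile datalines dlm='|' dsd;     /* pipe-delimited, DSD rules apply  */
 length city $10;
 input name $ age city $;
 datalines;
Steve|32|Monona
Tom|44|Milwaukee
Kim||Madison
;
run;
```

With DSD in effect, the empty field between the two pipes in the last record should be read as a missing value for AGE rather than being skipped.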

 

 

Scanning Variable-Length Records for a Specific Character String

This example shows how to use TRUNCOVER in combination with SCANOVER to pull phone numbers from a phone book. The phone number is always preceded by the word "phone:". Because the phone numbers include international numbers, the maximum length is 32 characters.

 

filename phonebk host-specific-path;

data _null_;

  file phonebk;

  input line $80.;

  put line;

  datalines;

    Jenny's Phone Book

    Jim Johanson phone: 619-555-9340

       Jim wants a scarf for the holidays.

    Jane Jovalley phone: (213) 555-4820

       Jane started growing cabbage in her garden.

       Her dog's name is Juniper.

    J.R. Hauptman phone: (49)12 34-56 78-90

       J.R. is my brother.

   ;

run;

 

Use @'phone:' to scan the lines of the file for a phone number and position the file pointer where the phone number begins. Use TRUNCOVER in combination with SCANOVER to skip the lines that do not contain 'phone:' and write only the phone numbers to the log.

 

data _null_;

   infile phonebk truncover scanover;

   input @'phone:' phone $32.;

   put phone=;

run;

 

Partial SAS Log:

 

phone=619-555-9340

phone=(213) 555-4820

phone=(49)12 34-56 78-90

 

 


Reading Files That Contain Variable-Length Records

This example shows how to use LENGTH=, in combination with the $VARYING. informat, to read a file that contains variable-length records.   When a record is read, the variable LINELEN contains the length of the current record.  It can then be used to compute the length of the remaining variable and to read the rest of the record into secondvar.

 

data a;                                      /* SAS data step               */  

   infile file-specification                 /* input file                  */

           length=linelen;                   /* return length from input    */   

   input firstvar 1-10 @;                    /* read first 10.              */

   varlen=linelen-10;                        /* Calculate VARLEN            */

   input @11 secondvar $varying500. varlen;  /* read up to 500, using varlen*/

run;                                         /* end of step                 */
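To try the technique without an existing file, a scratch file can be written first. In this sketch the fileref VARYREC, the TEMP device, and the two sample records are assumptions made up for illustration; every record is at least 10 bytes long so that VARLEN never goes negative.

```sas
filename varyrec temp;                       /* hypothetical scratch file   */

data _null_;                                 /* write two sample records    */
   file varyrec;
   put '0000000001short';
   put '0000000002a much longer trailing value';
run;

data a;                                      /* SAS data step               */
   infile varyrec length=linelen;            /* return length from input    */
   input firstvar 1-10 @;                    /* read first 10 and hold      */
   varlen=linelen-10;                        /* calculate VARLEN            */
   input @11 secondvar $varying500. varlen;  /* read up to 500, using varlen*/
   put firstvar= varlen= secondvar=;         /* show what was read          */
run;                                         /* end of step                 */
```

The trailing @ on the first INPUT statement matters: it holds the record so that the second INPUT statement reads the remainder of the same line.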

 

Listing the Pointer Location

INPUT keeps track of the pointer to the next byte in the buffer to be read.  If multiple lines are read on each pass, N= indicates how many lines are available to the pointer.  The LINE= and COL= INFILE options pass the current values of the line and column pointers, respectively, back to the DATA step.

 

data _null_;                          /* no dataset needed              */

 infile datalines n=2                 /* infile, two lines of buffer    */    

        line=Linept col=Columnpt;     /* save line and column pointers  */

 input      name $ 1-15               /* read first line of group       */

      #2 @3 id;                       /* next line of group             */

 put linept= columnpt=;               /* display where input pointer is */

datalines;                            /* instream data                  */

J. Brooks

  40974

T. R. Ansen

  4032

;                                     /* end of data                    */

run;                                  /* end of step                    */

 

These statements produce the following log lines as the DATA step executes:

Linept=2 Columnpt=9

Linept=2 Columnpt=8

 

READING BINARY FILES

Binary files can be read one record at a time.  Since there is no record terminator, we must specify the record length and indicate that each record is fixed length.  This can be especially useful to read binary files on a Windows or UNIX system  that originated on a Z/OS mainframe.  The INFILE options along with appropriate Z/OS informats will read the record correctly.

 

For example:  read a binary file of lrecl=33 that was built on a Z/OS machine.

 

data x;                               /* build a sas data set             */

 infile testing lrecl=33 recfm=f;     /* fixed record, 33 bytes at a time */

  input @1  id    s370fpd5.           /* packed decimal field             */

        @6  st    $ebcdic2.           /* s370 ebcdic                      */

        @8  ecode $ebcdic5.           /* s370 ebcdic                      */

        @13 qty1  s370fzd5.           /* s370 numeric                     */

        @18 qty2  s370fpd4.4          /* s370 pd                          */

        @22 desc  $ebcdic10.          /* s370 ebcdic                      */

        @32 node  s370fpd1.           /* s370 pd                          */

        @33 bunc  s370fpd1. ;         /* s370 pd                          */

run;                                  /* end of step                      */

 

DYNAMICALLY SPECIFYING INPUT FILE NAMES

The INFILE FILEVAR= option specifies a variable that contains the complete name of the file to be read.  Each time the variable's value changes, INFILE points to the new file for subsequent INPUT statements.  This technique provides a way to read many different files, including all the members in a library or directory, or to choose the files to read dynamically. Note that even though a file specification is required by the INFILE statement, it is not actually used, because the file name is specified in the FILEVAR= variable.  The FILENAME= variable is set by SAS when the file changes and can be interrogated or displayed as desired.

This DATA step uses FILEVAR= to read from a different file during each iteration of the DATA step:

 

data allsales;

 length fileloc myinfile $ 300;      /* define vars for dsn       */

 input fileloc $ ;                   /* read instream data        */

 infile indummy filevar=fileloc      /* open file named in fileloc*/

                filename=myinfile    /* give filename reading     */

                end=done;            /* true when reading last rec*/ 

 do while(not done);                 /* do until no more recs     */ 

   input name $ jansale              /* read variables            */

         febsale marsale;            /* more                      */

   output;                           /* output to allsales        */

 end;                                /* next rec this file        */ 

 put 'Finished reading ' myinfile=;  /* display msg for fun       */

 datalines;                         

c:\temp\file1.dat

c:\temp\file2.dat

c:\temp\file3.dat

;                                    /* end of data               */

run;                                 /* end of data step          */ 

 

 

Partial SAS log:

 

NOTE: The infile INDUMMY is:

      File Name=c:\temp\file1.dat,

      RECFM=V,LRECL=256

 

Finished reading myinfile=c:\temp\file1.dat

 

NOTE: The infile INDUMMY is:

      File Name=c:\temp\file2.dat,

      RECFM=V,LRECL=256

 

Finished reading myinfile=c:\temp\file2.dat

 

NOTE: The infile INDUMMY is:

      File Name=c:\temp\file3.dat,

      RECFM=V,LRECL=256

 

Finished reading myinfile=c:\temp\file3.dat

 

NOTE: 1 record was read from the infile INDUMMY.

      The minimum record length was 11.

      The maximum record length was 11.

NOTE: 1 record was read from the infile INDUMMY.

      The minimum record length was 11.

      The maximum record length was 11.

NOTE: 1 record was read from the infile INDUMMY.

      The minimum record length was 11.

      The maximum record length was 11.

NOTE: The data set WORK.ALLSALES has 3 observations and 4 variables.

 

ACCESSING FILE NAMES WITH ‘WILD CARDS’

An INFILE file specification can be a “wild card” appropriate for your operating system.  INFILE will point to all matching files and read all records from each of them.  The EOV= option names a variable that SAS sets to 1 when the first record in a file in a series of concatenated files is read. The variable is set only after SAS encounters the next file, so it is never true for the first file.  The END= option sets the corresponding variable to true on the last record of the last file.  Again, the FILENAME= variable can be used to determine the file being read.  Note that if the directory being read contains subdirectories, the job will fail with a message about not having enough authority to read.  If this happens, another technique would be needed to read the files.

data allsales;

 length  myinfile $ 300;            /* define vars for dsn       */

 infile 'c:\temp\file*.dat'         /* open all matching files   */

         filename=myinfile          /* give filename reading     */

         eov=first_this_file        /* true 1st rec of each file */

                                    /* except first file         */

         end=done;                  /* true when reading last rec*/ 

 input name $ jansale               /* read variables            */

       febsale marsale;             /* more                      */

 savefile=myinfile;                 /* save in normal variable   */

 if _n_ =1 or                       /* first rec first file      */

   first_this_file then             /* or first rec all others   */       

   put 'Start reading ' myinfile=;  /* display msg for fun       */

 run;         

 

 

NOTE: The infile 'c:\temp\file*.dat' is:

      File Name=c:\temp\file1.dat,

      File List=c:\temp\file*.dat,RECFM=V,LRECL=256

 

NOTE: The infile 'c:\temp\file*.dat' is:

      File Name=c:\temp\file2.dat,

      File List=c:\temp\file*.dat,RECFM=V,LRECL=256

 

NOTE: The infile 'c:\temp\file*.dat' is:

      File Name=c:\temp\file3.dat,

      File List=c:\temp\file*.dat,RECFM=V,LRECL=256

 

Start reading myinfile=c:\temp\file1.dat

Start reading myinfile=c:\temp\file2.dat

Start reading myinfile=c:\temp\file3.dat

 

NOTE: 1 record was read from the infile 'c:\temp\file*.dat'.

      The minimum record length was 11.

      The maximum record length was 11.

NOTE: 1 record was read from the infile 'c:\temp\file*.dat'.

      The minimum record length was 11.

      The maximum record length was 11.

NOTE: 1 record was read from the infile 'c:\temp\file*.dat'.

      The minimum record length was 11.

      The maximum record length was 11.

NOTE: The data set WORK.ALLSALES has 3 observations and 5 variables.

 

 

Specifying an Encoding When Reading an External File

If files being read don’t use the normal encoding schemes of ASCII or EBCDIC, a different encoding scheme can be specified.  This example creates a SAS data set from an external file. The external file's encoding is UTF-8, and the current SAS session encoding is Wlatin1. By default, SAS assumes that the external file is in the same encoding as the session encoding, which would cause the character data to be written to the new SAS data set incorrectly. Specifying ENCODING= in the INFILE statement tells SAS the file's actual encoding so that the data is transcoded as it is read.

 

libname myfiles 'SAS-data-library';

filename extfile 'external-file';

data myfiles.unicode;

   infile extfile encoding="utf-8";

   input Make $ Model $ Year;

run;

PLATFORM CONSIDERATIONS

Each of the platforms where SAS runs provides numerous INFILE options to process special features of the operating system.  This is usually related to features of the file system but does include some other features as well.

 

One very useful application is reading, on a Windows platform, files originally built on UNIX, and vice versa.  The normal line end for Windows text files is a carriage return character followed by a line feed character.  The Unix record terminator is only the line feed character.  Because these control characters are unprintable, the difference is easy to overlook, but unless the terminator characters are known and coded for, the data may be read incorrectly.

 

In the example below, a Unix program is reading a file originally built under Windows.  The TERMSTR=CRLF INFILE option tells the program to return records when both a carriage return and a line feed character are  found.  Without this option, the carriage return character would be read as the last character in the record. 

 

data filefromwin;                            /* build a SAS ds            */

 infile '/some_unix.txt' termstr=crlf;       /* text file ended with CRLF */

 input Name $ Age Rate;                      /* input as normal           */

run;                                         /* run                       */

 

The opposite case is a Windows program reading a text file built on a Unix system.  Since Unix only uses the linefeed character as a record terminator, we need to communicate this to INFILE.

 

data filefromunix;                           /* build a SAS ds            */

 infile '\some_win.txt' termstr=lf;          /* text file ended with LF   */

 input Name $ Age Rate;                      /* input as normal           */

run;                                         /* run                       */

 

ADDITIONAL WINDOWS AND UNIX OPTIONS

Additional INFILE options are available for encoding and other file handling.  The various values for RECFM= allow for reading many types of files including Z/OS variable blocked files and much more.  Please refer to the operating system documentation for INFILE for more details.

 

Z/OS OPTIONS

There are extensive INFILE options that apply to features found in Z/OS.  Special options are also available for accessing ISAM, VSAM, IMS and other special Z/OS files.  In addition there are options to access system control blocks.

The JFCB (job file control block) is a system block of 176 bytes allocated for every DD statement in a Z/OS job.  The JFCB contains information such as dataset name, device type, catalog status, SYSIN or SYSOUT status, label processing options, and create date.  The following are possible uses for the JFCB:

·  accessing the dataset name from JCL for titles,

·  determining whether the program is reading a live VSAM file, a sequential backup disk file, or a tape file,

·  determining the file creation date.

 

 Since this is a system control block, layout documentation is needed and the program may need to do bit testing.

 


The following example uses the JFCB to determine the DSNAME and DSORG.

data _null_;                      /* don't need dataset */

 infile in jfcb=jfcbin;           /* ask for jfcb       */

 length titldsn $ 44;             /* set lengths as     */

 length dsorg1 $1.;        /* required           */

 if _n_ = 1 then                  /* first time in ?    */

  do;                             /* yes, do block      */

   titldsn=substr(jfcbin,1,44);   /* extract dsname     */

   dsorg1=substr(jfcbin,99,1);    /* and dsorg byte 1   */

    if dsorg1='.1......'b then    /* bit test as needed */

        dsorgout='PS';            /* must be sequential */

  end;                            /* end of block       */

 input etc. ;                     /* rest of program    */

  . . .

 retain titldsn dsorgout;         /* retain             */  

run;                              /* end of step        */

Another system file, the VTOC (volume table of contents), is a file on each Z/OS disk pack containing information about the files on that disk.  Data set characteristics and other system information can be gathered from the VTOC.  Special VTOC options are available on the INFILE statement to read VTOCs.

The MAPDISK program shown below from the SAS sample library reads VTOCs.

/********* mapdisk *****************************************************/  

/*  this program reads the dscbs in a vtoc and produces a listing      */

/* of all data sets with their attributes and allocation data. the     */

/* volume to be mapped must be described by a disk dd stmt.:           */

/* //disk dd disp=shr,unit=sysda,vol=ser=xxxxxx                        */

/***********************************************************************/ 

data free(keep=loc cyl track total f5dscb)

     dsn (keep=dsname created expires lastref lastmod

          count extents dsorg recfm1-recfm4 aloc blksize

          lrecl secaloc tt r tracks volume)

     fmt1(keep=dsname created expires lastref lastmod

          count extents dsorg recfm1-recfm4 aloc blksize

          lrecl secaloc tt r tracks volume cchhr)

     fmt2(keep=cchhr tocchhr)

     fmt3(keep=cchhr alloc3);  length default=4;
  retain trkcyl 0;         /* error if no format 4 encountered */

  length volume volser1 $ 6  cchhr cchhr1 $ 5 ;

  format cchhr cchhr1 $hex10. dscbtype $hex2. ;
 

********read dscb and determine which format*******;

  infile disk vtoc cvaf cchhr=cchhr1 volume=volser1

          column=col ;

  input @45 dscbtype $char1. @;  volume=volser1;

  cchhr=cchhr1;

  if dscbtype='00'x then do; null+1;

        if null>200 then stop;

        return;  end;  null=0;

      if dscbtype= '1' then goto format1;

      if dscbtype= '2' then goto format2;

      if dscbtype= '3' then goto format3;

      if dscbtype= '5' then goto format5;

      if dscbtype= '4' then goto format4;

      if dscbtype= '6' then return;

      _error_=1;return; 

format1:                   ****regular dscb type****;

  input @1 dsname $char44.
    . . .


The Power of the SAS FILENAME Statement

Even though the INFILE and FILE statements have tremendous power, even more capabilities are available through the SAS FILENAME statement.  The primary purpose of FILENAME is to assign a nickname (fileref) to a single file.  FILENAME can also assign the fileref to a directory or library, in which case INFILE can read a member by naming it in parentheses after the fileref.  The SAS documentation refers to directories and partitioned data sets as “aggregate” storage areas.  A default extension is assumed if the member name is not quoted.  The following example reads a file that is a member of a Windows directory.

 

filename mytemp 'c:\temp';           /* assign a fileref to directory*/

data allsales;                       /* build a SAS ds               */

 infile mytemp("file1.dat");         /* read member file1.dat        */

 input name $ jansale                /* read variables               */

       febsale marsale;              /* more                         */

 run;                                /* end of step                  */ 
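
The default-extension behavior mentioned above can be sketched as follows. This is a minimal illustration rather than a definitive statement of the rules: it assumes SAS under Windows, where an unquoted member name on INFILE is expected to resolve with a .dat extension, and it reuses the directory and variable names from the example above.

```sas
filename mytemp 'c:\temp';           /* same directory fileref as above   */

data allsales;                       /* build a SAS ds                    */
 infile mytemp(file1);               /* unquoted member name: assumed to  */
                                     /* resolve to c:\temp\file1.dat      */
 input name $ jansale                /* read variables                    */
       febsale marsale;              /* more                              */
run;                                 /* end of step                       */
```

If the member has a different extension, quoting the full member name as in the earlier example avoids any dependence on the default.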

 

Assigning a fileref to a directory also allows DATA step functions to open the directory and interrogate it for the number of members, the member names, and more.  The following program uses the DOPEN function to open the directory, the DNUM function to determine the number of members, DREAD to return the file name of each member, and the PATHNAME function to return the path of the directory.  Those items, along with the FILEVAR= and FILENAME= INFILE options, can be used to read all the members in a directory in a different way than we saw earlier.

 

%let mfileref=temp;

filename &mfileref 'c:\temp';

/*******************************************************************************/

/* Loop thru all entries in directory and read each file found.                */

/* *****************************************************************************/

data x;  

  length myinfile $ 260;                           /* room for full file name  */

  did=dopen("&mfileref");                          /* try to open directory    */

  if did = 0 then                                  /*  if error opening file   */

     putlog "ERROR: Directory not found for mfilenames &mfileref";

  else                                             /* if directory opened ok.  */

     do;                                           /* do block                 */

        memberCount = dnum( did );                 /* get number of members    */

          do j = 1 to memberCount;                 /* for each member in       */

             fileName = dread( did, j );           /* directory.               */

             do;                                   /* do this                  */

                pathname=pathname("&mfileref");    /* for this program.        */

                flow=filename;                     /* grab flow, path_flow     */

                path_flow=trim(pathname)!!'\'!!filename;

                infile dummyf filevar=path_flow    /* point to file  file      */

                              filename=myinfile    /* give filename reading    */

                              end=eof lrecl=80 pad;/* mark end of file, lrecl  */

                do until(eof);                     /* loop thru all records    */

                   input name $ jansale            /* read variables           */

                         febsale marsale;          /* end of input             */

                   savefile=myinfile;              /* save in normal variable  */

                   output;                         /* output to file           */

                end;                               /* end read all pgm lines   */

             end;                                  /* end of member block      */

          end;                                     /* end loop thru members    */

          did = dclose(did);                       /* close the directory      */

     end;                                          /* if directory opened      */

  stop;                                            /* stop data step.          */

run;                                               /* end of step              */

 

Partial SAS Log:

 

NOTE: The infile DUMMYF is:

      File Name=c:\temp\file1.dat,

      RECFM=V,LRECL=80

 

NOTE: The infile DUMMYF is:

      File Name=c:\temp\file2.dat,

      RECFM=V,LRECL=80

 

NOTE: The infile DUMMYF is:

      File Name=c:\temp\file3.dat,

      RECFM=V,LRECL=80

 

NOTE: 1 record was read from the infile DUMMYF.

      The minimum record length was 11.

      The maximum record length was 11.

NOTE: 1 record was read from the infile DUMMYF.

      The minimum record length was 11.

      The maximum record length was 11.

NOTE: 1 record was read from the infile DUMMYF.

      The minimum record length was 11.

      The maximum record length was 11.

NOTE: The data set WORK.X has 3 observations and 11 variables.

 

Reading from an FTP Site

Not only can FILENAME point to files on your current machine, but with the FTP options our programs can also read and write data on any FTP server we are authorized to use.  No extra products besides Base SAS and FTP are required.  This makes programs very flexible, but keep in mind that transferring the data from the other machine takes time.

This example reads a file called sales in the directory /u/kudzu/mydata from the remote UNIX host hp720:

 

filename myfile ftp 'sales' cd='/u/kudzu/mydata'

         user='guest' host='hp720.hp.sas.com'

         recfm=v prompt;

 

data mydata / view=mydata;   /* Create a view */

   infile myfile;

   input x $10. y 4.;

run;

 

proc print data=mydata;     /* Print the data */

run;

 

Reading from a URL

FILENAME can also point to a web page that contains data that we might be interested in.  This effectively opens our program to millions of online files.

This example reads the first 15 records from a URL file and writes them to the SAS log with a PUT statement:

 

 

filename mydata url

    'http://support.sas.com/techsup/service_intro.html';

      

data _null_;

   infile mydata length=len;

   input record $varying200. len;

   put record $varying200. len;

   if _n_=15 then stop;

run;

 

FILE STATEMENT OVERVIEW

The FILE statement is in most respects the opposite of the INFILE statement.  Its function is to define an output raw file.  Many of the options are the same for INFILE and FILE processing, except that the data flows out of the DATA step rather than into it.  Because the FILE statement can also define reports in SAS, however, it has some unique options for that purpose.  The SAS FILE statement can also interact with ODS to route reports to different destinations.  Since so many INFILE options have already been covered, those options that work the same way for FILE will not be covered again.

By default, PUT statement output is written to the SAS log. Use the FILE statement to route this output to either the same external file to which procedure output is written or to a different external file. You can indicate whether or not carriage control characters should be added to the file.

You can use the FILE statement in conditional (IF-THEN) processing because it is executable. You can also use multiple FILE statements to write to more than one external file in a single DATA step.
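
As a minimal sketch of conditional FILE processing, the step below routes each record to one of two output files depending on a data value.  The filerefs, paths, variable names, and the 1000 cutoff are hypothetical and chosen only for illustration.

```sas
filename highout 'c:\temp\high.dat';   /* hypothetical output file       */
filename lowout  'c:\temp\low.dat';    /* hypothetical output file       */

data _null_;                           /* no SAS data set is needed      */
  input name $ jansale;                /* read instream test data        */
  if jansale >= 1000 then              /* choose the output file         */
     file highout;                     /* larger sales go here           */
  else
     file lowout;                      /* everything else goes here      */
  put name $10. jansale 8.;            /* writes to the current file     */
datalines;
Smith 1500
Jones 250
Brown 1000
;
run;
```

Because PUT writes to whichever file was named by the most recently executed FILE statement, a single PUT statement serves both outputs.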

 

 

Basic FILE Syntax

 

 

FILE file-specification <options> <operating-environment-options>;

 

file-specification

identifies an external file that the DATA step uses to write output from a PUT statement. File-specification can have these forms:

'external-file'

specifies the physical name of an external file which is enclosed in quotation marks. The physical name is the name by which the operating environment recognizes the file.

fileref

specifies the fileref of an external file.  (Note:  The fileref must be previously associated with an external file in a FILENAME statement, FILENAME function, or a system command.)

fileref(file)

specifies a fileref of an aggregate storage location and the name of a file or member, enclosed in parentheses, that resides in that location.

LOG

is a reserved fileref that directs the output produced by any PUT statement to the SAS log. 

 

PRINT

is a reserved fileref that directs the output produced by any PUT statement to the same file as the output that is produced by SAS procedures.
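
The two reserved filerefs can be tried with a short step such as the one below.  The message text is made up for illustration; only the FILE PRINT and FILE LOG statements matter.

```sas
data _null_;
  file print;                          /* procedure output destination   */
  put 'This line appears with procedure output';
  file log;                            /* switch back to the SAS log     */
  put 'This line appears in the SAS log';
run;
```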

Report Writing with FILE and PUT

The DATA step can produce virtually any report.  Some of the DATA step programming features are:

          FILE statement points to the report (or file) being written

          PUT statement does the actual writing

          pointers, formats, and column outputs are available

          end of file indicators,  control break indicators

          has automatic header routines

          N=PS option allows access to full page at one time (default is one line or record at a time)

          All the DATA step programming power
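
Several of these features can be combined in a small report step.  The following is a sketch under stated assumptions: the data set SALES, its variables, the label HDR, and the column positions are all hypothetical.  It uses FILE PRINT with the HEADER= option for an automatic page heading, the END= option on SET as the end-of-file indicator, and pointer controls and formats on PUT.

```sas
data sales;                              /* hypothetical test data       */
  input name $ jansale;
datalines;
Smith 1500
Jones 250
;
run;

data _null_;
  set sales end=last;                    /* last=1 on final observation  */
  file print header=hdr notitles;        /* report file, header routine  */
  total + jansale;                       /* running total                */
  put @1 name $10. @15 jansale comma8.;  /* detail line                  */
  if last then                           /* at end of data               */
     put / @1 'Total' @15 total comma8.; /* write a summary line         */
  return;                                /* end of main logic            */
hdr:                                     /* runs at top of each page     */
  put @1 'Name' @15 'Jan Sale' /;        /* column headings, blank line  */
  return;                                /* back to the main logic       */
run;
```

The RETURN statement before the HDR label keeps the main logic from falling through into the header routine on every observation.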

Writing to the SAS LOG

The default output file is the SAS log.  The PUTLOG statement always writes to the log, whereas the PUT statement writes to the file named in the most recently executed FILE statement.  Because the SAS log contains SAS notes and other messages along with our output, a report written there may not look polished.  It is, however, an excellent way to debug your DATA step logic by displaying variable values.

data _null_ ;

  infile rawin;