Reference Manual
Erik H. Baalbergen
This section discusses the extensions to and deviations from the C language, as described in [1]. The issues are numbered according to the reference manual.
Upper and lower case letters are different. The number of significant characters in an identifier is 32 by default, but may be set to another value using the −M option. The identifier length should be set consistently with the other programs of the compilation process.
The keyword asm is recognized. However, the statement
asm(string);
is skipped, and a warning is given.
The enum keyword is recognized and interpreted.
The words entry and fortran are reserved under the restricted option (−R), but they are not interpreted by the compiler.
The type of an integer constant is the first of the corresponding list in which its value can be represented. Decimal: int, long, unsigned long; octal or hexadecimal: int, unsigned, long, unsigned long; suffixed by the letter L or l: long, unsigned long.
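The typing rule can be sketched as a small C function. The function name, the two enumerations and the int/long widths passed as parameters are all hypothetical: the widths stand in for a target description and are not read from the compiler.

```c
/* Types an integer constant may receive, in the order of the lists above. */
enum const_type { CT_INT, CT_UNSIGNED, CT_LONG, CT_ULONG, CT_NONE };

enum radix_kind { DECIMAL, OCTAL_OR_HEX };

/* Returns the type of an integer constant with the given value under the
 * rule above. int_bits and long_bits describe a hypothetical target and
 * must be below 64 so that the limits fit in unsigned long long. */
enum const_type constant_type(unsigned long long value, enum radix_kind kind,
                              int has_l_suffix, int int_bits, int long_bits)
{
    unsigned long long int_max   = (1ULL << (int_bits - 1)) - 1;
    unsigned long long uint_max  = (1ULL << int_bits) - 1;
    unsigned long long long_max  = (1ULL << (long_bits - 1)) - 1;
    unsigned long long ulong_max = (1ULL << long_bits) - 1;

    if (has_l_suffix) {                       /* L or l: long, unsigned long */
        if (value <= long_max)  return CT_LONG;
        if (value <= ulong_max) return CT_ULONG;
        return CT_NONE;
    }
    if (value <= int_max) return CT_INT;      /* int comes first in both lists */
    if (kind == OCTAL_OR_HEX && value <= uint_max)
        return CT_UNSIGNED;                   /* only octal/hex try unsigned */
    if (value <= long_max)  return CT_LONG;
    if (value <= ulong_max) return CT_ULONG;
    return CT_NONE;                           /* too large for every listed type */
}
```

For a target with a 16-bit int and a 32-bit long, the decimal constant 40000 comes out as CT_LONG, whereas the same value written in hexadecimal (0x9C40) comes out as CT_UNSIGNED, because only octal and hexadecimal constants try unsigned before long.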
A character constant is a sequence of 1 up to sizeof(int) characters enclosed in single quotes. The value of a character constant '$c_1 c_2 \ldots c_n$' is
$$d_n + M d_{n-1} + \cdots + M^{n-2} d_2 + M^{n-1} d_1 = \sum_{i=1}^{n} M^{n-i} d_i,$$
where M is 1 + the maximum number representable in an unsigned char, and $d_i$ is the signed (ASCII) value of character $c_i$.
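The formula is Horner's rule written out, so the value can be accumulated from left to right. This sketch assumes an 8-bit host char (so M = 256) and a host on which converting a char above 127 to signed char yields the usual negative two's-complement value; the function name is hypothetical.

```c
#include <limits.h>
#include <stddef.h>

/* Value of the character constant made of chars[0..n-1], computed with
 * Horner's rule: ((d_1 * M + d_2) * M + ...) * M + d_n, which expands to
 * the sum of M^(n-i) * d_i. M is UCHAR_MAX + 1 and each d_i is the signed
 * value of the character, as stated above. */
long long char_constant_value(const char *chars, size_t n)
{
    const long long M = (long long)UCHAR_MAX + 1;
    long long value = 0;
    for (size_t i = 0; i < n; i++)
        value = value * M + (signed char)chars[i];   /* signed d_i */
    return value;
}
```

In ASCII, 'ab' evaluates to 97 × 256 + 98 = 24930, and a constant whose first character is '\xff' gets a negative value because its $d_1$ is −1.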
The compiler does not support compile-time floating point arithmetic.
The compiler is capable of producing EM code for machines with the following properties:
• a char is 8 bits
• the size of int is equal to the word size
• the size of short may not exceed the size of int
• the size of int may not exceed the size of long
• the size of pointers is equal to the size of either short, int or long
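The list can be read as a set of checks on a target description. The sketch below takes the sizes as data, since the word size cannot be queried portably from C; struct target and em_target_supported are hypothetical names, and all sizes are in bytes.

```c
/* Hypothetical description of a target machine; sizes in bytes. */
struct target {
    int char_bits;
    int word_size, short_size, int_size, long_size, pointer_size;
};

/* Returns 1 if the target has all the properties listed above, else 0. */
int em_target_supported(const struct target *t)
{
    if (t->char_bits != 8) return 0;              /* a char is 8 bits */
    if (t->int_size != t->word_size) return 0;    /* int is one word */
    if (t->short_size > t->int_size) return 0;    /* short <= int */
    if (t->int_size > t->long_size) return 0;     /* int <= long */
    /* pointers have the size of short, int or long */
    return t->pointer_size == t->short_size ||
           t->pointer_size == t->int_size ||
           t->pointer_size == t->long_size;
}
```

A machine with 2-byte words, 2-byte short, int and pointers and a 4-byte long passes every check; a machine with 3-byte pointers fails the last one.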
Objects of type char are taken to be signed. The combination unsigned char is legal.
The type combinations unsigned char, unsigned short and unsigned long are supported.
The data type enum is implemented as described in Recent Changes to C (see appendix A). Cem treats enumeration variables as if they were int.
Type void is implemented. The type specifies an empty set of values, which takes no storage space.
The names of the fundamental types can be redefined by the user, using typedef.
The order of evaluation of expressions depends on the complexity of the subexpressions. For commutative operators, the most complex subexpression is evaluated first. Parameter lists are evaluated from right to left.
The type of a sizeof expression is unsigned int.
Both the second and the third operand of a conditional expression may contain assignment operators, and both may be structures or unions.
Structures may be assigned, passed as arguments to functions, and returned by functions. The operands involved must have the same type.
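A short illustration of structure assignment, structures passed and returned by value, and a conditional expression with structure operands. The type struct point and both functions are invented for the example; the code is also valid ANSI C.

```c
/* Illustrative structure type. */
struct point { int x, y; };

/* The argument is passed by value, so the caller's copy is not changed. */
struct point translate(struct point p, int dx, int dy)
{
    p.x += dx;
    p.y += dy;
    return p;             /* the whole structure is returned */
}

/* Both operands of ?: have type struct point; the result is assigned. */
struct point pick(int first, struct point a, struct point b)
{
    struct point r;
    r = first ? a : b;
    return r;
}
```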
The combinations unsigned char, unsigned short and unsigned long are implemented.
Fields of any integral type, either signed or unsigned, are supported, as long as the type fits in a word on the target machine.
Fields are left adjusted by default; the first field is put into the left part of a word, the next one on the right side of the first one, etc. The -Vr option in the call of the compiler causes fields to be right adjusted within a machine word.
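The difference between the default left adjustment and −Vr can be modelled by computing the bit position of each field. This is a sketch under stated assumptions, not the compiler's algorithm: a field never straddles a word boundary (a field that does not fit starts a new word), every width lies between 1 and word_bits, and bit 0 is the least significant bit. The names place_fields and struct field_pos are hypothetical.

```c
/* Position of one field: which word it is in and its shift within it. */
struct field_pos { int word; int shift; };

/* Places n fields of the given widths into words of word_bits bits,
 * left adjusted by default or right adjusted if right_adjusted != 0. */
void place_fields(const int *widths, int n, int word_bits, int right_adjusted,
                  struct field_pos *out)
{
    int word = 0, used = 0;
    for (int i = 0; i < n; i++) {
        if (used + widths[i] > word_bits) {     /* no room left: next word */
            word++;
            used = 0;
        }
        out[i].word = word;
        out[i].shift = right_adjusted
            ? used                              /* fill from the right (-Vr) */
            : word_bits - used - widths[i];     /* fill from the left (default) */
        used += widths[i];
    }
}
```

With 16-bit words and widths 3, 5 and 10, left adjustment puts the first two fields at shifts 13 and 8 and moves the third to the next word at shift 6; right adjustment gives shifts 0 and 3, and 0 in the second word.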
The tags of structs and unions occupy a different name space from that of variables and that of member names.
The type of expression in
switch (expression) statement
must be integral. A warning is given under the restricted option if the type is long.
See [4] for a discussion on this complicated issue.
Structures may be passed as arguments to functions, and returned by functions.
Typedef names may be redeclared like any other variable name; the delicate case mentioned in §11.1 is handled correctly.
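The fragment below redeclares a typedef name as a variable in an inner block, the kind of redeclaration the compiler accepts like any other; the names T and typedef_shadow_demo are illustrative.

```c
typedef int T;

int typedef_shadow_demo(void)
{
    T x = 1;                  /* T used as a type name */
    {
        int T = 2;            /* T redeclared as a variable in an inner block */
        x += T;               /* here T is the int variable, so x becomes 3 */
    }
    {
        T y = x;              /* the inner block has ended: T is a type again */
        return y;
    }
}
```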
Lines which do not occur within a comment, and which have # as their first character, are interpreted as compiler control lines. There may be an arbitrary number of spaces, tabs and comments (collectively referred to as white space) following the #. Comments may contain newline characters. Control lines with only white space between the # and the line separator are skipped.
The #include, #ifdef, #ifndef, #undef, #else and #endif control lines and line directives consist of a fixed number of arguments. The list of arguments may be followed by an arbitrary sequence of characters, in which comments are interpreted as such. (I.e., the text between /* and */ is skipped, regardless of newlines; note that commented-out lines beginning with # are not considered to be control lines.)
The replacement text of a macro is taken to be a string of characters, in which an identifier may stand for a formal parameter, and in which comments are interpreted as such. In the replacement text, comments, and newline characters preceded by a backslash, are replaced by a space character.
The actual parameters of a macro are treated as sequences of tokens that must be balanced with respect to (), {} and []. This prevents the use of macros like
CTL([)
Formal parameters of a macro must have unique names within the formal-parameter list of that macro.
A message is given at the definition of a macro if the macro has already been #defined and either the number of formal parameters differs or the replacement texts are not equal (apart from leading and trailing white space).
Recursive use of macros is detected by the compiler.
The standard predefined macros are:
__FILE__    name of the current input file, as a string constant
__DATE__    current date, as a string constant; e.g. "Tue Wed 2 14:45:23 1986"
__LINE__    current line number, as an integer
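Because __FILE__ expands to a string constant and __LINE__ to an integer, both can be passed to printf directly. The TRACE macro and the two helper functions are hypothetical. __DATE__ is left out because the format shown above differs from the "Mmm dd yyyy" form produced by ANSI C compilers.

```c
#include <stdio.h>

/* Hypothetical tracing macro: prints file, line and a message. */
#define TRACE(msg) printf("%s:%d: %s\n", __FILE__, __LINE__, (msg))

/* __LINE__ is an integer constant; __FILE__ is a string constant. */
int here_line(void) { return __LINE__; }
const char *here_file(void) { return __FILE__; }
```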
No message is given if identifier is not known in
#undef identifier
A newline character is appended to each file which is included.
The #if, #ifdef and #ifndef control lines may be followed by an arbitrary number of
#elif constant-expression
control lines, before the corresponding #else or #endif is encountered. The construct
#elif constant-expression
some text
#endif /* corresponding to #elif */
is equivalent to
#else
#if constant-expression
some text
#endif /* corresponding to #if */
#endif /* corresponding to #else */
The constant-expression in #if and #elif control lines may contain the construction
defined(identifier)
which is replaced by 1, if identifier has been #defined, and by 0, if not.
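A small example of #elif combined with defined(). WORD_SIZE and LARGE_MODEL are hypothetical macro names; since LARGE_MODEL is not defined, the first group is skipped and the #elif group is selected.

```c
#define WORD_SIZE 2              /* hypothetical target description */

#if defined(LARGE_MODEL)         /* not defined here: group skipped */
#define INT_BITS 32
#elif defined(WORD_SIZE) && WORD_SIZE == 2
#define INT_BITS 16              /* this group is selected */
#else
#define INT_BITS 0               /* unknown configuration */
#endif

int int_bits(void) { return INT_BITS; }
```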
Comments in skipped lines are interpreted as such.
Line directives may occur in the following forms:
#line constant
#line constant "filename"
#constant
#constant "filename"
Note that filename is enclosed in double quotes.
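The effect of a line directive can be observed through __LINE__ and __FILE__: the line after the directive takes the given number, and numbering continues from there. The file name "gen.c" is made up, and the example uses the #line form, which ANSI C compilers also accept.

```c
/* After the directive, the next source line is line 100 of "gen.c". */
#line 100 "gen.c"
int line_after_directive(void) { return __LINE__; }
int next_line(void) { return __LINE__; }
const char *file_after_directive(void) { return __FILE__; }
```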
If a pointer to a function is called directly, as in fp(args), the function the pointer points to is called; i.e., fp(args) is equivalent to (*fp)(args).
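In the fragment below, fp(n) and (*fp)(n) invoke the same function; twice and call_both_ways are illustrative names.

```c
static int twice(int n) { return 2 * n; }

int call_both_ways(int n)
{
    int (*fp)(int) = twice;   /* pointer to a function taking and returning int */
    return fp(n) + (*fp)(n);  /* both calls invoke twice() */
}
```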
The compiler distinguishes the following types of integral constant expressions:
• field-width specifier
• case-entry specifier
• array-size specifier
• global variable initialization value
• enum-value specifier
• truth value in #if control line
Integral constant expressions are evaluated at compile time, and an effort is made to report overflow. Floating constant expressions are not evaluated at compile time.
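The fragment below uses one integral constant expression in each of the contexts listed above, all built from a hypothetical macro N. Every value is fixed at compile time.

```c
#define N (2 * 4)                  /* hypothetical constant */

#if N > 4                          /* truth value in #if control line */
#define BIG 1
#else
#define BIG 0
#endif

enum { LAST = N - 1 };             /* enum-value specifier */
static int table[N + 1];           /* array-size specifier */
static int limit = N * N;          /* global variable initialization value */
struct flags { unsigned mode : N / 2; };   /* field-width specifier: 4 bits */

int classify(int v)
{
    switch (v) {
    case N:     return 1;          /* case-entry specifiers */
    case N + 1: return 2;
    default:    return 0;
    }
}
```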
−C
    Run the preprocessor stand-alone while maintaining the comments. Line directives are produced whenever needed.
−Dname=string-of-characters
    Define name as macro with string-of-characters as replacement text.
−Dname
    Equal to −Dname=1.
−E
    Run the preprocessor stand-alone, i.e., list the sequence of input tokens and delete any comments. Line directives are produced whenever needed.
−Ipath
    Insert path into the list of include directories, after any directories given by earlier −I options and before the standard include directory. To put the directories "include", "sys/h" and "util/h" into the include directory list in that order, the user has to specify
        -Iinclude -Isys/h -Iutil/h
    An empty path causes the standard include directory (usually /usr/include) to be forgotten.
−Mn
    Set maximum significant identifier length to n.
−n
    Suppress EM register messages; user-declared variables are then not stored in registers on the target machine.
−p
    Generate the EM fil and lin instructions in order to enable an interpreter to keep track of the current location in the source code.
−P
    Equivalent to −E, but without line directives.
−R
    Interpret the input as restricted C (according to the language as described in [1]).
−Tpath
    Create temporary files, if necessary, in directory path.
−Uname
    Get rid of the compiler-predefined macro name, i.e., consider #undef name to appear at the beginning of the file.
−Vcm.n, −Vcm.ncm.n ...
    Set the size and alignment requirements. The letter c indicates the simple type, which is one of s (short), i (int), l (long), f (float), d (double) or p (pointer). If c is S or U, then n is taken to be the initial alignment of structs or unions, respectively. The effective alignment of a struct or union is the least common multiple of the initial struct/union alignment and the alignments of its members. The m parameter specifies the length of the type (in bytes) and the n parameter its alignment. Absence of m or n causes the default value to be retained. To specify that bit-fields should be right adjusted instead of the default left adjustment, specify r as the c parameter.
−w
    Suppress warning messages.
−−character
    Set debug-flag character. This enables some special features offered by a debugging and development version of the compiler. The flags below may be recognized; others may have surprising effects.
    d
        Generate a dependency graph, reflecting the calling structure of functions. Lines of the form
            DFA: calling-function: called-function
        are generated whenever a function call is encountered.
    f
        Dump the whole identifier table, including macros and reserved words.
    h
        Supply hash-table statistics.
    i
        Print the names of included files.
    m
        Supply statistics concerning memory allocation.
    t
        Dump the table of identifiers.
    u
        Generate extra statistics concerning the predefined types and identifiers. Works in combination with f or t.
    x
        Print expression trees in human-readable format.
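The effective-alignment rule from the −V description is a running least common multiple. This sketch takes the alignments as plain integers (assumed positive); effective_alignment and its helpers are hypothetical names, not part of the compiler.

```c
/* Greatest common divisor by Euclid's algorithm. */
static int gcd(int a, int b)
{
    while (b != 0) { int t = a % b; a = b; b = t; }
    return a;
}

/* Least common multiple; dividing first keeps the product small. */
static int lcm(int a, int b) { return a / gcd(a, b) * b; }

/* lcm of the initial struct/union alignment and all member alignments. */
int effective_alignment(int initial, const int *member_align, int n)
{
    int align = initial;
    for (int i = 0; i < n; i++)
        align = lcm(align, member_align[i]);
    return align;
}
```

For an initial struct alignment of 2 and a single member with alignment 3, the effective alignment is 6, since neither 2 nor 3 alone satisfies both requirements.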
[1] Brian W. Kernighan, Dennis M. Ritchie, The C Programming Language.
[2] L. Rosler, Draft Proposed Standard - Programming Language C, ANSI X3J11 Language Subcommittee.
[3] Erik H. Baalbergen, Dick Grune, Maarten Waage, The CEM Compiler, Informatica Manual IM-4, Dept. of Mathematics and Computer Science, Vrije Universiteit, Amsterdam, The Netherlands.
[4] Erik H. Baalbergen, Modeling global declarations in C, internal paper.
The syntax is
    enum-specifier:
        enum { enum-list }
        enum identifier { enum-list }
        enum identifier
    enum-list:
        enumerator
        enum-list , enumerator
    enumerator:
        identifier
        identifier = constant-expression
The identifier has the same role as the structure tag in a struct specification. It names a particular enumeration type.
The identifiers in the enum-list are declared as constants, and may appear wherever constants are required. If no enumerators with = appear, then the values of the constants begin at 0 and increase by 1 as the declaration is read from left to right. An enumerator with = gives the associated identifier the value indicated; subsequent identifiers continue the progression from the assigned value.
Enumeration tags and constants must all be distinct, and, unlike structure tags and members, are drawn from the same set as ordinary identifiers.
Objects of a given enumeration type are regarded as having a type distinct from objects of all other types.
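An enumeration with and without explicit values; color and its constants are illustrative. The constants take the values 0, 1, 10, 11 and 20, and, as noted among the extensions, Cem treats an enumeration variable like an int, so it can take part in ordinary integer arithmetic.

```c
/* Constants count up from 0; "=" restarts the progression. */
enum color { red, green, blue = 10, indigo, violet = blue * 2 };

int after_indigo(void)
{
    enum color c = indigo;
    return c + 1;             /* integer arithmetic on an enumeration value */
}
```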
The bold-faced and italicized tokens represent terminal symbols.
external definitions
program: external-definition*
external-definition: ext-decl-specifiers [declarator [function | non-function] | ’;’] | asm-statement
ext-decl-specifiers: decl-specifiers?
non-function: initializer? [’,’ init-declarator]* ’;’
function: declaration* compound-statement

declarations
declaration: decl-specifiers init-declarator-list? ’;’
decl-specifiers: other-specifier+ [single-type-specifier other-specifier*]? | single-type-specifier other-specifier*
other-specifier: auto | static | extern | typedef | register | short | long | unsigned
type-specifier: decl-specifiers
single-type-specifier: type-identifier | struct-or-union-specifier | enum-specifier
init-declarator-list: init-declarator [’,’ init-declarator]*
init-declarator: declarator initializer?
declarator: primary-declarator [’(’ formal-list? ’)’ | arrayer]* | ’*’ declarator
primary-declarator: identifier | ’(’ declarator ’)’
arrayer: ’[’ constant-expression? ’]’
formal-list: formal [’,’ formal]*
formal: identifier
enum-specifier: enum [enumerator-pack | identifier enumerator-pack?]
enumerator-pack: ’{’ enumerator [’,’ enumerator]* ’,’? ’}’
enumerator: identifier [’=’ constant-expression]?
struct-or-union-specifier: [struct | union] [struct-declaration-pack | identifier struct-declaration-pack?]
struct-declaration-pack: ’{’ struct-declaration+ ’}’
struct-declaration: type-specifier struct-declarator-list ’;’?
struct-declarator-list: struct-declarator [’,’ struct-declarator]*
struct-declarator: declarator bit-expression? | bit-expression
bit-expression: ’:’ constant-expression
initializer: ’=’? initial-value
cast: ’(’ type-specifier abstract-declarator ’)’
abstract-declarator: primary-abstract-declarator [’(’ ’)’ | arrayer]* | ’*’ abstract-declarator
primary-abstract-declarator: [’(’ abstract-declarator ’)’]?

statements
statement:
        expression-statement
    |   label ’:’ statement
    |   compound-statement
    |   if-statement
    |   while-statement
    |   do-statement
    |   for-statement
    |   switch-statement
    |   case-statement
    |   default-statement
    |   break-statement
    |   continue-statement
    |   return-statement
    |   jump
    |   ’;’
    |   asm-statement
    ;
expression-statement: expression ’;’
label: identifier
if-statement: if ’(’ expression ’)’ statement [else statement]?
while-statement: while ’(’ expression ’)’ statement
do-statement: do statement while ’(’ expression ’)’ ’;’
for-statement: for ’(’ expression? ’;’ expression? ’;’ expression? ’)’ statement
switch-statement: switch ’(’ expression ’)’ statement
case-statement: case constant-expression ’:’ statement
default-statement: default ’:’ statement
break-statement: break ’;’
continue-statement: continue ’;’
return-statement: return expression? ’;’
jump: goto identifier ’;’
compound-statement: ’{’ declaration* statement* ’}’
asm-statement: asm ’(’ string ’)’ ’;’
expressions
initial-value: assignment-expression | initial-value-pack
initial-value-pack: ’{’ initial-value-list ’}’
initial-value-list: initial-value [’,’ initial-value]* ’,’?
primary: identifier | constant | string | ’(’ expression ’)’
secundary: primary [index-pack | parameter-pack | selection]*
index-pack: ’[’ expression ’]’
parameter-pack: ’(’ parameter-list? ’)’
selection: [’.’ | ’−>’] identifier
parameter-list: assignment-expression [’,’ assignment-expression]*
postfixed: secundary postop?
unary: cast unary | postfixed | unop unary | size-of
size-of: sizeof [cast | unary]
binary-expression: unary [binop binary-expression]*
conditional-expression: binary-expression [’?’ expression ’:’ assignment-expression]?
assignment-expression: conditional-expression [asgnop assignment-expression]?
expression: assignment-expression [’,’ assignment-expression]*
unop: ’*’ | ’&’ | ’−’ | ’!’ | ’~’ | ’++’ | ’−−’
postop: ’++’ | ’−−’
multop: ’*’ | ’/’ | ’%’
addop: ’+’ | ’−’
shiftop: ’<<’ | ’>>’
relop: ’<’ | ’>’ | ’<=’ | ’>=’
eqop: ’==’ | ’!=’
arithop: multop | addop | shiftop | ’&’ | ’^’ | ’|’
binop: arithop | relop | eqop | ’&&’ | ’||’
asgnop: ’=’ | ’+’ ’=’ | ’−’ ’=’ | ’*’ ’=’ | ’/’ ’=’ | ’%’ ’=’
      | ’<<’ ’=’ | ’>>’ ’=’ | ’&’ ’=’ | ’^’ ’=’ | ’|’ ’=’
      | ’+=’ | ’−=’ | ’*=’ | ’/=’ | ’%=’
      | ’<<=’ | ’>>=’ | ’&=’ | ’^=’ | ’|=’
constant: integer | floating
constant-expression: assignment-expression
identifier: identifier | type-identifier