SILE accepts input in several formats. One is XML. Yes, just XML, no special input language required. Use any tooling you want to create XML. You can either target SILE's commands with XML tags or provide a module that handles the tag schema in your document.
Secondarily for those that want it, a custom intput syntax can be used that is somewhat less verbose and easier to type than XML.
We call it the SIL format (Sile Input Language).
Parsers
The current official reference parser is the Lua LPEG based EPNF variant found in inputters/sil-epnf.lua. Recently we've been working to define a formal grammar spec using ABNF syntax. The current version of this is distributed as sil.abnf along with SILE sources.
sil.abnf
; Formal grammar specification for SIL (SILE Input Language) files
;
; Uses RFC 5234 (Augmented BNF for Syntax Specifications: ABNF)
; Uses RFC 7405 (Case-Sensitive String Support in ABNF)
; IMPORTANT CAVEAT:
; Backus-Naur Form grammars (like ABNF and EBNF) do not have a way to
; express matching opening and closing tags. The grammar below does
; not express SILE's ability to skip over passthrough content until
; it hits the matching closing tag for environments.
; A master document can only have one top level content item, but we allow
; loading of fragments as well which can have any number of top level content
; items, hence valid grammar can be any number of content items.
document = *content
; Top level content can be any sequence of these things
content = environment
content =/ comment
content =/ text
content =/ braced-content
content =/ command
; Environments come in two flavors, passthrough (raw) and regular. The
; difference is what is allowed to terminate them and what escapes are needed
; for the content in the middle.
environment = %s"\begin" [ options ] "{" passthrough-command-id "}"
env-passthrough-text
%s"\end{" passthrough-command-id "}"
; ^^^^^^^^^^^^^^^^^^^^^^
; End command must match id used in begin, see caveat at top
environment =/ %s"\begin" [ options ] "{" command-id "}"
content
%s"\end{" command-id "}"
; ^^^^^^^^^^
; End command must match id used in begin, see caveat at top
; Passthrough (raw) environments can have any valid UTF-8 except the closing
; delimiter matching the opening, per the environment rule.
env-passthrough-text = *utf8-char
; Nothing to see here.
; But potentially important because it eats newlines!
comment = "%" *utf8-char CRLF
; Input strings that are not special
text = *text-char
; Input content wrapped in braces can be attached to a command or used to
; manually isolate chunks of content (e.g. to hinder ligatures).
braced-content = "{" content "}"
; As with environments, the content format may be passthrough (raw) or more SIL
; content depending on the command.
command = "\" passthrough-command-id [ options ] [ braced-passthrough-text ]
command =/ "\" command-id [ options ] [ braced-content ]
; Passthrough (raw) command text can have any valid UTF-8 except an unbalanced
; closing delimiter
braced-passthrough-text = "{"
*( braced-passthrough-text / braced-passthrough-char )
"}"
braced-passthrough-char = %x00-7A ; omit {
braced-passthrough-char =/ %x7C ; omit }
braced-passthrough-char =/ %x7E-7F ; end of utf8-1
braced-passthrough-char =/ utf8-2
braced-passthrough-char =/ utf8-3
braced-passthrough-char =/ utf8-4
options = "[" parameter *( "," parameter ) "]"
parameter = *WSP identifier *WSP "=" *WSP ( quoted-value / value ) *WSP
quoted-value = DQUOTE *quoted-value-char DQUOTE
quoted-value-char = "\" %x22
quoted-value-char =/ %x00-21 ; omit "
quoted-value-char =/ %x23-7F ; end of utf8-1
quoted-value-char =/ utf8-2
quoted-value-char =/ utf8-3
quoted-value-char =/ utf8-4
value = *value-char
value-char = %x00-21 ; omit "
value-char =/ %x23-2B ; omit ,
value-char =/ %x3C-5C ; omit ]
value-char =/ %x3E-7F ; end of utf8-1
value-char =/ utf8-2
value-char =/ utf8-3
value-char =/ utf8-4
text-char = "\" ( %x5C / %x25 / %x7B / %x7D )
text-char =/ %x00-24 ; omit %
text-char =/ %x26-5B ; omit \
text-char =/ %x5D-7A ; omit {
text-char =/ %x7C ; omit }
text-char =/ %x7E-7F ; end of utf8-1
text-char =/ utf8-2
text-char =/ utf8-3
text-char =/ utf8-4
letter = ALPHA / "_" / ":"
identifier = letter *( letter / DIGIT / "-" / "." )
passthrough-command-id = %s"ftl"
/ %s"lua"
/ %s"math"
/ %s"raw"
/ %s"script"
/ %s"sil"
/ %s"use"
/ %s"xml"
command-id = identifier
; ASCII isn't good enough for us.
utf8-char = utf8-1 / utf8-2 / utf8-3 / utf8-4
utf8-1 = %x00-7F
utf8-2 = %xC2-DF utf8-tail
utf8-3 = %xE0 %xA0-BF utf8-tail
/ %xE1-EC 2utf8-tail
/ %xED %x80-9F utf8-tail
/ %xEE-EF 2utf8-tail
utf8-4 = %xF0 %x90-BF 2utf8-tail
/ %xF1-F3 3utf8-tail
/ %xF4 %x80-8F 2utf8-tail
utf8-tail = %x80-BF
This grammar can be converted to a W3C EBNF grammar:
sil.ebnf
document ::= content*
content ::= environment
| comment
| text
| braced-content
| command
environment
::= '\begin' options? '{' passthrough-command-id '}' env-passthrough-text '\end{' passthrough-command-id '}'
| '\begin' options? '{' command-id '}' content '\end{' command-id '}'
env-passthrough-text
::= utf8-char*
comment ::= '%' utf8-char* CRLF
text ::= text-char*
braced-content
::= '{' content '}'
command ::= '\' passthrough-command-id options? braced-passthrough-text?
| '\' command-id options? braced-content?
braced-passthrough-text
::= '{' ( braced-passthrough-text | braced-passthrough-char )* '}'
braced-passthrough-char
::= [#x0-#x7A]
| '|'
| [#x7E-#x7F]
| utf8-2
| utf8-3
| utf8-4
options ::= '[' parameter ( ',' parameter )* ']'
parameter
::= WSP* identifier WSP* '=' WSP* ( quoted-value | value ) WSP*
quoted-value
::= DQUOTE quoted-value-char* DQUOTE
quoted-value-char
::= '\' '"'
| [#x0-#x21]
| [#x23-#x7F]
| utf8-2
| utf8-3
| utf8-4
value ::= value-char*
value-char
::= [#x0-#x21]
| [#x23-#x2B]
| [<-\]
| [#x3E-#x7F]
| utf8-2
| utf8-3
| utf8-4
text-char
::= '\' ( '\' | '%' | '{' | '}' )
| [#x0-#x24]
| [&-[]
| [#x5D-#x7A]
| '|'
| [#x7E-#x7F]
| utf8-2
| utf8-3
| utf8-4
letter ::= ALPHA
| '_'
| ':'
identifier
::= letter ( letter | DIGIT | '-' | '.' )*
passthrough-command-id
::= 'ftl'
| 'lua'
| 'math'
| 'raw'
| 'script'
| 'sil'
| 'use'
| 'xml'
command-id
::= identifier
utf8-char
::= utf8-1
| utf8-2
| utf8-3
| utf8-4
utf8-1 ::= [#x0-#x7F]
utf8-2 ::= [#xC2-#xDF] utf8-tail
utf8-3 ::= #xE0 [#xA0-#xBF] utf8-tail
| [#xE1-#xEC] utf8-tail utf8-tail
| #xED [#x80-#x9F] utf8-tail
| [#xEE-#xEF] utf8-tail utf8-tail
utf8-4 ::= #xF0 [#x90-#xBF] utf8-tail utf8-tail
| [#xF1-#xF3] utf8-tail utf8-tail utf8-tail
| #xF4 [#x80-#x8F] utf8-tail utf8-tail
utf8-tail
::= [#x80-#xBF]
Railroad digrams and EBNF snippets
What followes is EBNF grammar snippets and railroad diagrams for the syntax.
document:
document ::= content*
content:
content ::= environment
| comment
| text
| braced-content
| command
referenced by:
- braced-content
- document
- environment
environment:
environment
::= '\begin' options? '{' passthrough-command-id '}' env-passthrough-text '\end{' passthrough-command-id '}'
| '\begin' options? '{' command-id '}' content '\end{' command-id '}'
referenced by:
- content
env-passthrough-text:
env-passthrough-text
::= utf8-char*
referenced by:
- environment
comment:
comment ::= '%' utf8-char* CRLF
referenced by:
- content
text:
text ::= text-char*
referenced by:
- content
braced-content:
braced-content
::= '{' content '}'
referenced by:
- command
- content
command:
command ::= '\' passthrough-command-id options? braced-passthrough-text?
| '\' command-id options? braced-content?
referenced by:
- content
braced-passthrough-text:
braced-passthrough-text
::= '{' ( braced-passthrough-text | braced-passthrough-char )* '}'
referenced by:
- braced-passthrough-text
- command
braced-passthrough-char:
braced-passthrough-char
::= [#x0-#x7A]
| '|'
| [#x7E-#x7F]
| utf8-2
| utf8-3
| utf8-4
referenced by:
- braced-passthrough-text
options:
options ::= '[' parameter ( ',' parameter )* ']'
referenced by:
- command
- environment
parameter:
parameter
::= WSP* identifier WSP* '=' WSP* ( quoted-value | value ) WSP*
referenced by:
- options
quoted-value:
quoted-value
::= DQUOTE quoted-value-char* DQUOTE
referenced by:
- parameter
quoted-value-char:
quoted-value-char
::= '\' '"'
| [#x0-#x21]
| [#x23-#x7F]
| utf8-2
| utf8-3
| utf8-4
referenced by:
- quoted-value
value:
value ::= value-char*
referenced by:
- parameter
value-char:
value-char
::= [#x0-#x21]
| [#x23-#x2B]
| [<-\]
| [#x3E-#x7F]
| utf8-2
| utf8-3
| utf8-4
referenced by:
- value
text-char:
text-char
::= '\' ( '\' | '%' | '{' | '}' )
| [#x0-#x24]
| [&-[]
| [#x5D-#x7A]
| '|'
| [#x7E-#x7F]
| utf8-2
| utf8-3
| utf8-4
referenced by:
- text
letter:
letter ::= ALPHA
| '_'
| ':'
referenced by:
- identifier
identifier:
identifier
::= letter ( letter | DIGIT | '-' | '.' )*
referenced by:
- command-id
- parameter
passthrough-command-id:
passthrough-command-id
::= 'ftl'
| 'lua'
| 'math'
| 'raw'
| 'script'
| 'sil'
| 'use'
| 'xml'
referenced by:
- command
- environment
command-id:
command-id
::= identifier
referenced by:
- command
- environment
utf8-char:
utf8-char
::= utf8-1
| utf8-2
| utf8-3
| utf8-4
referenced by:
- comment
- env-passthrough-text
utf8-1:
utf8-1 ::= [#x0-#x7F]
referenced by:
- utf8-char
utf8-2:
utf8-2 ::= [#xC2-#xDF] utf8-tail
referenced by:
- braced-passthrough-char
- quoted-value-char
- text-char
- utf8-char
- value-char
utf8-3:
utf8-3 ::= #xE0 [#xA0-#xBF] utf8-tail
| [#xE1-#xEC] utf8-tail utf8-tail
| #xED [#x80-#x9F] utf8-tail
| [#xEE-#xEF] utf8-tail utf8-tail
referenced by:
- braced-passthrough-char
- quoted-value-char
- text-char
- utf8-char
- value-char
utf8-4:
utf8-4 ::= #xF0 [#x90-#xBF] utf8-tail utf8-tail
| [#xF1-#xF3] utf8-tail utf8-tail utf8-tail
| #xF4 [#x80-#x8F] utf8-tail utf8-tail
referenced by:
- braced-passthrough-char
- quoted-value-char
- text-char
- utf8-char
- value-char
utf8-tail:
utf8-tail
::= [#x80-#xBF]
referenced by:
- utf8-2
- utf8-3
- utf8-4