by Rick Perry
16 September 2001
1. Introduction
2. VDL Format
3. String and Logic Examples
4. XOR and NOT
5. Concatenation and Offsets
6. White space
7. Absolute Offsets
8. Phone Numbers
9. Digits
10. Only Digits
11. | (Low-level Or)
12. Byte Expressions
13. Fuzzy Expressions
14. Repetition Expressions
15. Defining VDL Macros
16. Using VDL Macros
17. VDL Macro Examples
18. File Type Restriction Directives
19. File Type Restriction Examples
20. VDL Version Reporting
21. VFind --vdlc= Option
22. CVDL Syntax Summary
Copyright © August 2001 by CyberSoft, Incorporated.
Permission is granted to any individual or institution to use, copy, or redistribute this document so long as it is not sold for profit, and provided that it is reproduced whole and this copyright notice is retained.
CVDL is the CyberSoft Virus Description Language.
It is used to define patterns for virus scanning by the VFind® Security ToolKit from CyberSoft, Incorporated.
Documentation for CVDL is available in the original 1996 paper and an update describing new features.
The presentation here is a tutorial on using CVDL to create your own patterns. It starts with some simple examples, then presents an overview of all of the CVDL operators.
** VFind Version 11.3.0 or higher is required **
The format for specifying a VDL is:
: name , definition #
The ex1 example VDL uses some strings and logic operators:
:ex1, "pets" AND "cat" OR "dog" #
Due to the higher precedence of AND as compared to OR, VDL ex1 from above is equivalent to:
:ex1, ("pets" AND "cat") OR "dog" #
which will match data that contains either: the word pets anywhere and the word cat anywhere; or the word dog anywhere.
If the intention is to match data containing the word pets anywhere, and either the word dog or the word cat anywhere, the VDL should be written like this instead:
:ex2, "pets" AND ("cat" OR "dog") #
which is equivalent to this more verbose form:
:ex2, ("pets" AND "cat") OR ("pets" AND "dog") #
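The precedence rule can be tried outside VFind with a small Python sketch. It models each quoted string as a plain substring test on sample data. The function names are made up for illustration, and this is only an analogy, not how VFind evaluates VDLs.

```python
def ex1(data: str) -> bool:
    # "pets" AND "cat" OR "dog": AND binds tighter than OR,
    # which is also how Python's `and` / `or` behave.
    return "pets" in data and "cat" in data or "dog" in data


def ex2(data: str) -> bool:
    # "pets" AND ("cat" OR "dog"): the parentheses force the OR first.
    return "pets" in data and ("cat" in data or "dog" in data)


sample = "my dog barks"   # contains dog, but not pets
print(ex1(sample))        # True: the OR "dog" branch matches on its own
print(ex2(sample))        # False: ex2 always requires pets
```

The sample contains only dog, which is exactly the case where the two VDLs disagree.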
XOR is the exclusive-OR operator, which by definition can be expressed using AND, OR, and NOT.
For example, VDL ex3
; guns or bullets, but not both
;
:ex3, ("guns" AND NOT "bullets") OR ("bullets" AND NOT "guns") #
can be expressed more efficiently using XOR:
:ex3, "guns" XOR "bullets" # ; guns or bullets, but not both
ex3 also illustrates the use of the semicolon (;) in VDL files to create comments which extend to the end of the line.
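The equivalence of the two ex3 forms can be confirmed by checking all four combinations of "guns present" and "bullets present". The following Python sketch does that with booleans; the function names are illustrative only.

```python
from itertools import product


def long_form(guns: bool, bullets: bool) -> bool:
    # ("guns" AND NOT "bullets") OR ("bullets" AND NOT "guns")
    return (guns and not bullets) or (bullets and not guns)


def xor_form(guns: bool, bullets: bool) -> bool:
    # "guns" XOR "bullets": true when exactly one of the two is present
    return guns != bullets


# Compare the two forms over the full truth table.
agree = all(
    long_form(g, b) == xor_form(g, b)
    for g, b in product([False, True], repeat=2)
)
print(agree)
```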
NOT by itself seems to be a strange logical operator for pattern matching, since it means that we have a match if some pattern is not present.
However, consider a situation where all files or email must contain a certain notice, and we want to detect the lack of that notice. An example pattern is:
:ex4, NOT "Copyright 2001, CyberSoft, Inc." #
VDL pattern elements are concatenated into larger patterns by using a comma, for example:
"abc", "def"
is the same as:
"abcdef"
You can specify an offset range for concatenation of strings and other VDL patterns using the @ operator, for example:
"abc", @0-10, "def"
will match "abc" followed by "def" at an offset of anywhere from 0 to 10 bytes from the end of "abc". So this will match "abcdef", "abcXdef", ..., "abcXXXXXXXXXXdef", where X represents any byte.
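For readers who know regular expressions, the offset example behaves much like a bounded "any byte" gap. The Python sketch below is only an approximation of the VDL, not VFind's implementation; the `matches` helper is a made-up name.

```python
import re

# Rough regex analogue of: "abc", @0-10, "def"
# re.DOTALL lets "." stand for any byte, including newline.
pattern = re.compile(rb"abc.{0,10}def", re.DOTALL)


def matches(data: bytes) -> bool:
    """Return True if "def" starts 0 to 10 bytes after the end of "abc"."""
    return pattern.search(data) is not None


print(matches(b"abcdef"))                    # gap of 0 bytes
print(matches(b"abc" + b"X" * 10 + b"def"))  # gap of 10 bytes
print(matches(b"abc" + b"X" * 11 + b"def"))  # gap of 11 bytes, too far
```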
There are two special operators, WS0 and WS1, for offsets consisting of white space characters, where white space is defined as " " (blank), "\t" (tab), or "\\\n" (backslash newline):
Examples:
"/bin/rm", WS1, "-rf", WS1, "/"
"cat", WS0, ">>", WS0, "/etc/passwd"
~"Bulletproof", WS1, ~"Web", WS1, ~"Hosting"
The ABS operator is used to specify an absolute offset from the beginning of the scanned data to match a pattern.
Example VDLs:
:a1, ABS 0, "#!", WS0, "/bin/sh" #
:a2, "abc", @0-20, "def" AND ABS 14, "01234" #
:MS/VBA, ABS 0, "\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1" AND "\xFE\xCA" #
The a1 VDL checks for a Bourne shell script file header.
The a2 VDL checks for "abc" followed by "def" within the next 20 bytes, and "01234" at absolute position 14.
The MS/VBA VDL uses ABS 0 to check for the 8-byte Microsoft signature header which appears at the very beginning of most Microsoft application files.
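The idea of an absolute offset can be illustrated with a short Python sketch that compares a slice of the data at a fixed position. It assumes offsets count from 0, as ABS 0 denoting the very beginning suggests. The helper names `at_abs` and `ms_vba` are invented for this illustration.

```python
# The 8-byte header used by the MS/VBA VDL.
MS_HEADER = b"\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1"


def at_abs(data: bytes, offset: int, pattern: bytes) -> bool:
    """True if `pattern` occurs exactly at byte position `offset`."""
    return data[offset:offset + len(pattern)] == pattern


def ms_vba(data: bytes) -> bool:
    # ABS 0, header AND "\xFE\xCA": header at the start, marker anywhere.
    return at_abs(data, 0, MS_HEADER) and b"\xFE\xCA" in data


print(ms_vba(MS_HEADER + b"....\xFE\xCA"))         # header at position 0
print(ms_vba(b"x" + MS_HEADER + b"....\xFE\xCA"))  # header shifted by one byte
```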
Examples:
~~"123-456-7890"
matches "(123)-456-7890" and "123.456.7890" and
"1 2 3 - 4 5 6 - 7 8 9 0", etc.
~~"800 FREE CAR"
matches "(800) - F r e e C a r !!!", etc.
The \d+ operator matches one or more digits, and is named after the similar Perl operator.
This can be used, for example, to detect obfuscated URLs:
"http://", \d+, "/"
matches URLs like http://3626287830/
"http://0", \d+, ".", "0", \d+, ".", "0", \d+, ".", "0", \d+, "/"
matches URLs like http://00000325.0000030.00000341.00000116/
The same pattern can also be written using a macro:
$define zerod "0", \d+
"http://", $zerod, ".", $zerod, ".", $zerod, ".", $zerod, "/"
CVDL macros will be discussed in more detail later in the tutorial.
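Since \d+ is named after the Perl operator, the two URL patterns above have close regular-expression counterparts. The Python sketch below shows them for comparison only; the variable names are arbitrary, and the regexes are an analogy rather than a translation VFind performs.

```python
import re

# "http://", \d+, "/"
decimal_host = re.compile(rb"http://\d+/")

# "http://0", \d+, ".", "0", \d+, ".", "0", \d+, ".", "0", \d+, "/"
zero_padded = re.compile(rb"http://0\d+\.0\d+\.0\d+\.0\d+/")

print(bool(decimal_host.search(b"http://3626287830/")))
print(bool(zero_padded.search(b"http://00000325.0000030.00000341.00000116/")))
print(bool(zero_padded.search(b"http://192.168.0.1/")))  # no leading zeros
```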
The ~# operator matches only digits, skipping all other characters, over a default maximum range of 30 bytes of scanned input data.
The maximum range of scanned input data can be specified by placing a number between ~# and the digit string.
Examples:
~#"code 1234 sub-code 567"
matches the digits 1234567 in sequence, regardless of any intervening
non-digit characters, over any 30 byte range of scanned input data,
e.g. it will match "1abc2efg34---5 6 7"
~#60"code 1234 sub-code 567"
As above, but over a maximum range of 60 bytes of input data.
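One way to picture the ~# operator is as a sliding window over the input in which only the digits are kept. The Python sketch below follows that reading. It assumes that extra digits between the wanted ones break the match, which the text above does not state explicitly, and the function name is hypothetical.

```python
DIGITS = b"0123456789"


def digits_only_match(data: bytes, pattern: bytes, max_range: int = 30) -> bool:
    """Sketch of ~#: match the digits of `pattern` within `max_range` bytes."""
    # Keep only the digits of the pattern, e.g. b"1234567".
    wanted = bytes(b for b in pattern if b in DIGITS)
    if not wanted:
        return False
    for start in range(len(data)):
        window = data[start:start + max_range]
        # Drop every non-digit byte inside the window.
        found = bytes(b for b in window if b in DIGITS)
        if found.startswith(wanted):
            return True
    return False


pattern = b"code 1234 sub-code 567"
print(digits_only_match(b"1abc2efg34---5 6 7", pattern))

spread = b"1234" + b"-" * 40 + b"567"                     # digits span 47 bytes
print(digits_only_match(spread, pattern))                 # default range of 30
print(digits_only_match(spread, pattern, max_range=60))   # like ~#60
```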
The low-level or operator | specifies the occurrence of patterns at a position relative to the preceding pattern in the scanned data.
For example:
"x", ("a" | "b"), "y"
matches "x", followed by "a" or "b", followed by "y"
at some position in the scanned data.
Do not confuse the low-level | operator with the high-level OR operator. The high-level OR operator specifies the occurrence of patterns at any positions in the scanned data.
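The difference between the two kinds of "or" can be seen in a small Python sketch. The low-level form is modelled with a character class at a fixed position; the high-level form is modelled with independent substring tests for a contrasting VDL, "x" AND ("a" OR "b") AND "y", which is written here only for comparison. The function names are illustrative.

```python
import re

# "x", ("a" | "b"), "y": the alternative sits right after "x".
low_level = re.compile(rb"x[ab]y")


def matches_low_level(data: bytes) -> bool:
    return low_level.search(data) is not None


def matches_high_level(data: bytes) -> bool:
    # "x" AND ("a" OR "b") AND "y": each string may be anywhere.
    return b"x" in data and (b"a" in data or b"b" in data) and b"y" in data


data = b"y a x"                   # all three letters, but in the wrong order
print(matches_low_level(data))    # False: no "xay" or "xby" sequence
print(matches_high_level(data))   # True: positions do not matter
```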
Bytes, byte ranges, and complements of bytes and byte ranges can be specified using characters and decimal or hex integers. For example:
65
0x41
'\x41'
'A'
any of these matches a byte whose value is 65 (decimal)
'a'-'f'
matches a byte whose value is in the range 0x61-0x66
^0-10
matches a byte whose value is not in the range 0-10
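The byte expressions above amount to simple comparisons on a byte value. The following Python sketch spells that out; the helper names are invented here and are not part of CVDL.

```python
def byte_in_range(value: int, low: int, high: int) -> bool:
    """Model of low-high: the byte value lies in the inclusive range."""
    return low <= value <= high


def byte_not_in_range(value: int, low: int, high: int) -> bool:
    """Model of ^low-high: the byte value lies outside the range."""
    return not byte_in_range(value, low, high)


# 65, 0x41, '\x41' and 'A' all denote the same byte value.
same_byte = {65, 0x41, ord("\x41"), ord("A")}
print(len(same_byte))

print(byte_in_range(ord("c"), ord("a"), ord("f")))  # 'c' is within 'a'-'f'
print(byte_not_in_range(11, 0, 10))                 # 11 is outside 0-10
```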
Examples:
FUZZY 2 100
FUZZY +-2 100
FUZZY -2 +2 100
are the same as: 98-102
FUZZY -2 +3 "cow"
is the same as: 'a'-'f', 'm'-'r', 'u'-'z'
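The ranges in the FUZZY examples follow from simple arithmetic on each byte value. The Python sketch below reproduces them. Clamping the result to 0..255 is an assumption made here for safety, and both function names are hypothetical.

```python
def fuzzy(value: int, below: int, above: int) -> tuple:
    """Inclusive byte range from value - below to value + above."""
    # Clamping to the valid byte range is an assumption of this sketch.
    return (max(0, value - below), min(255, value + above))


def fuzzy_string(text: bytes, below: int, above: int) -> list:
    """Apply the same tolerance to every byte of a string."""
    return [fuzzy(b, below, above) for b in text]


print(fuzzy(100, 2, 2))   # (98, 102)

# FUZZY -2 +3 "cow", shown as character ranges
for low, high in fuzzy_string(b"cow", 2, 3):
    print(chr(low), "-", chr(high))
```

The loop prints a - f, m - r, and u - z, matching the ranges given for "cow".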
Multiple occurrences of bytes, byte ranges, strings, or case-insensitive strings can be specified by using [number] after the expression. For example:
15[20]
0xF[0x14]
either form matches 20 occurrences of the byte value 15
0-10[40]
matches a sequence of 40 bytes whose values are in the range 0..10
"X"[3]
matches "XXX"
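The repetition examples can be mirrored in Python with byte-string multiplication and a counted regular expression. This is only an analogy to show what each expression matches; the variable names are arbitrary.

```python
import re

# 15[20] and 0xF[0x14] describe the same 20-byte sequence.
twenty_fifteens = bytes([15]) * 20
print(bytes([0xF]) * 0x14 == twenty_fifteens)

# 0-10[40]: forty consecutive bytes, each with a value in 0..10.
low_run = re.compile(rb"[\x00-\x0a]{40}")
print(bool(low_run.search(bytes(range(11)) * 4)))       # 44 bytes, all in range
print(bool(low_run.search(bytes([5]) * 39 + b"\xff")))  # only 39 in a row

# "X"[3]
print(b"X" * 3 == b"XXX")
```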
A VDL macro is specified using $define as the first word on a line, and the entire macro definition must be contained all on one line.
The syntax is:
$define name value
where the line contains: optional leading white space, $define, white space, name, white space, value.
VDL macros are invoked by specifying their name after a $ character.
Macros are lexical tokens, which means that they cannot be confused with other tokens, e.g. strings. Thus:
"abc", $mac, ...
invokes the macro named mac, but:
"abc$mac", ...
does not invoke any macro, and is simply a literal string.
$define pf1 $pets AND $food
$define pets "dog" OR "cat"
$define food "fish" OR "pie"
$define pf2 ($pets) AND ($food)
:v1, $pf1 AND "ate" #
:v2, "ate" AND $pf2 #
Note that pf1 resolves to: "dog" OR "cat" AND "fish" OR "pie"
which is the same as:
"dog" OR ("cat" AND "fish") OR "pie"
but pf2 resolves to: ("dog" OR "cat") AND ("fish" OR "pie")
As with C/C++ #define macros, parentheses may be used in the VDL macro definition or invocation to ensure that the intended result is obtained.
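The way pf1 and pf2 resolve can be reproduced with a small textual expander in Python. This is a sketch under stated assumptions: macros are expanded recursively when used, so a macro may refer to one defined later in the file (as pf1 does above), and quoted strings are skipped so that "abc$mac" stays literal. Strings containing escaped quotes are not handled, and `expand` is a made-up name.

```python
import re

# Either a quoted string (left untouched) or a $name macro invocation.
TOKEN = re.compile(r'"[^"]*"|\$(\w+)')


def expand(text: str, macros: dict) -> str:
    def substitute(match):
        name = match.group(1)
        if name is None:            # matched a quoted string
            return match.group(0)
        return expand(macros[name], macros)
    return TOKEN.sub(substitute, text)


macros = {
    "pf1": "$pets AND $food",
    "pets": '"dog" OR "cat"',
    "food": '"fish" OR "pie"',
    "pf2": "($pets) AND ($food)",
}

print(expand("$pf1", macros))
print(expand("$pf2", macros))
print(expand('"abc$mac", $pets', macros))   # $mac inside the string is kept
```

Because the expansion is purely textual, pf1 loses the grouping that the author probably intended, which is why pf2 adds parentheses.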
File type restriction directives may be specified in CVDL files.
The directives and their meanings are:
<"...",...> specifies a list of file types to scan,
i.e. scan only the file types specified.
<!"...",...> specifies a list of file types to not scan,
i.e. scan everything except for the file types specified.
<> resets to scan everything.
If the SmartScan file type reported by UAD is "unknown", or if VFind is run standalone (without SmartScan input), then all VDL file type restrictions are ignored and everything is scanned.
:v1,"..."#                    ; all file types
<"text">                      ; only "text" file types for the following vdls
:v2,"..."#
:v3,"..."#
<!"HTML">                     ; no "HTML" file types for the following vdls
:v4,"..."#
:v5, <"JPEG","GIF"> "..."#    ; only "JPEG" and "GIF" file types for v5
:v6,"..."#
<>                            ; all file types for the following vdls
:v7,"..."#
Matching for file type restrictions is case-sensitive, and only requires that the VDL-restricted type be a substring of the SmartScan-reported type.
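The matching rules for file type restrictions can be summarised in a few lines of Python. The sketch below is an interpretation of the rules stated in this section, not VFind code. The function name, its parameters, and the sample type strings such as "ASCII text" are all made up for illustration, and `None` is used to stand for a standalone run without SmartScan input.

```python
def should_scan(reported_type, only=None, exclude=None) -> bool:
    """Decide whether a VDL applies to a file of the reported type."""
    # Unknown type or no SmartScan input: restrictions are ignored.
    if reported_type is None or reported_type == "unknown":
        return True
    if only is not None:        # <"...",...>
        # Case-sensitive substring test against the reported type.
        return any(t in reported_type for t in only)
    if exclude is not None:     # <!"...",...>
        return not any(t in reported_type for t in exclude)
    return True                 # <> or no directive


print(should_scan("ASCII text", only=["text"]))        # substring matches
print(should_scan("ASCII Text", only=["text"]))        # case differs
print(should_scan("HTML document", exclude=["HTML"]))  # excluded type
print(should_scan("unknown", only=["text"]))           # restriction ignored
```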
Versions for VDL files and rules can now be reported using an extension to the file type restriction syntax.
If you use a string starting with version= in a file type restriction directive, whatever follows the = character in that string will be printed as an informative message about the version of the VDL file or rule.
Here is an example which specifies a version for the VDL file and a version for VDL rule `b':
% cat v.vdl
<"text","version=1.2.3">
:a, "abc"#
:b, <"version=9.9"> "bbb"#
% vfind --vdl=v.vdl hi
...
##==>> Loading VDL code from: v.vdl
##==>> All SmartScan file types disabled.
##==>> SmartScan file type `*text*' enabled.
##==>> VDL file `v.vdl' Version: 1.2.3
##==> VDL model for `a' loaded.
##==> VDL `b' Version: 9.9
##==> VDL model for `b' loaded.
##==> Checking file: "hi"
...