This article explains what a file is in computing and computer science, and gives its correct definition. "File" is a
fundamental and simple concept in computing that is used in a myriad of ways. Perhaps for this reason the concept is widely misunderstood, judging by the fact that
virtually all online dictionaries have it WRONG. I will give the correct definition first and then look at some of the incorrect dictionary definitions.
Definition: a file is a record.
By implication, files are always finite. The interpretation and use of a file depend entirely on the system using the file. The owner, location, storage, name, addressing, and appended
properties such as attributes and timestamps are usually, but not necessarily, not part of the file. They are entirely optional and, when not part of the file, are properties of, and
depend on, the user and/or the owner of the file. This explains why the definition of a file is just "a record": everything about files is flexible and can be this way or that way, except that
the file is a record.
In computing terms a book is a file, as is a poem printed on paper, a PDF or text document on a hard disk, a segment of data in a PROM, a programmable data byte in an MCU, and so on.
Although paper-printed files are not stored in a computer, a robot with suitable software can scan the book, OCR the text, perform the required action on the data (text), e.g. count the number of words, and display
the result, which would be equivalent to doing the same on a PDF file with embedded scanned bitmap pages. So, in computing and computer science terms, any record is a file.
The definition above may seem abstract and may take some time to grasp, especially the part that files are always subject to interpretation. For example, one can rename a bitmap image,
give it the .txt extension and open it with Notepad; they will see nothing that makes sense. Similarly, if one opens a Word document with an image editor, they will see gibberish
because the data will be interpreted incorrectly.
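The renaming example above can be sketched in a few lines of Python. This is only an illustration under assumptions of my own: the four bytes are made up, and the three interpretations (ASCII text, a little-endian 32-bit number, four grey-level pixels) are arbitrary choices, not the only possible ones.

```python
import struct

# Four arbitrary bytes standing in for "a file" (hypothetical content).
data = bytes([0x48, 0x69, 0x21, 0x0A])

as_text = data.decode("ascii")             # interpreted as ASCII text
as_uint32 = struct.unpack("<I", data)[0]   # interpreted as a little-endian unsigned 32-bit number
as_pixels = list(data)                     # interpreted as four 8-bit grey levels

print(repr(as_text))
print(hex(as_uint32))
print(as_pixels)
```

The record itself never changes; only the system reading it does.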
Note that a file can contain other files, which contain other files and so on, for example a nested zip archive containing zip archives; most "dictionary" definitions do not allow for this.
I will note other, less common types of perfectly valid files that the major dictionaries do not account for as I examine their erroneous definitions. Also, for the record, I contacted
many of the dictionaries but my input was ignored - perhaps the editors felt that a non-native English speaker cannot possibly know better than them. ;)
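The nested archive mentioned above can be demonstrated with Python's standard zipfile module. The helper names make_zip and innermost, and the member names, are hypothetical and exist only for this sketch.

```python
import io
import zipfile

def make_zip(members):
    """Return the bytes of a zip archive holding the given {name: bytes} members."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, content in members.items():
            archive.writestr(name, content)
    return buffer.getvalue()

# A file inside a file inside a file.
inner = make_zip({"note.txt": b"innermost record"})
middle = make_zip({"inner.zip": inner})
outer = make_zip({"middle.zip": middle})

def innermost(data):
    """Descend through single-member archives until the payload is not a zip."""
    while zipfile.is_zipfile(io.BytesIO(data)):
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            data = archive.read(archive.namelist()[0])
    return data

print(innermost(outer))
```

Everything here lives in memory, so none of the three archives is ever stored on a disk, yet each of them is a file.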
According to http://dictionary.reference.com/browse/file
File Computers. a collection of related data or program records stored on some input/output or auxiliary storage medium: This program's main purpose is to update the customer master file.
Every part of this definition is wrong, as follows:
- a collection – "collection" is a very specific term in CS and requires a definition when used; one cannot just throw in words because they seem suitable to an editor. Random data in no specific order is not a collection! A fixed one-bit file is not a collection!
- of related data – the data inside of a file need not be related, indeed the file may contain random data, or be empty, or contain only 1 item. A file can also contain bad or meaningless data items and no information at all.
- program records – the file need not be a program record. For example, it can be a program itself. In rare cases, a program can be written directly by a human in machine code. Further, strictly speaking, a file can be created empty by one entity (program or human) and then abandoned; until some other program uses it, the "program records" description is false.
- stored on some input/output or auxiliary storage medium – the storage of a file is irrelevant to it being a file. In addition, this statement may mislead a reader about digital computer architecture. A digital computer consists of a processor and memory (ROM and RAM attached to the address and data buses of the processor, plus a chipset if the particular processor architecture requires one); everything else is periphery. A file can be found:
  - Inside the processor (especially in an MCU), e.g. specifying the processor operation modes.
  - Inside ROM memory – as an area containing settings or data used by one or more pieces of software, including any OS.
  - Re-programmable ROM.
  - Magnetic peripherals.
  - Paper – punch cards.
  - Printed paper - e.g. an automatic scanning device (with OCR software) could be attached to a computer, enabling processing of printed documents as files.
In the processor and ROM cases the file is stored inside the computer, whilst in the later cases the file is stored on a peripheral device or on a storage medium. Files can be found (stored) on read-only and read/write media of any kind, either directly accessible by the processor or accessible via suitable sensors.
According to http://www.merriam-webster.com/dictionary/file
c (1) : a collection of related data records (as for a computer) (2) : a complete collection of data (as text or a program) treated by a computer as a unit especially for purposes of input and output
Similarly wrong. In addition, the word "complete" is entirely out of place: a segment of a segmented archive is a perfect example of a file, yet it is totally useless without every other segment, each of which is usually found on a different media device.
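The point about segments can be sketched as follows. This is a minimal illustration, assuming a zlib-compressed payload and an arbitrary segment size of 8 bytes; the split helper is hypothetical, and real segmented archives use their own container formats.

```python
import zlib

original = b"a poem printed on paper " * 20
payload = zlib.compress(original)

def split(data, size):
    """Cut data into consecutive segments of at most `size` bytes."""
    return [data[i:i + size] for i in range(0, len(data), size)]

segments = split(payload, 8)

# Each segment is a finite record, hence a file, but alone it cannot be decompressed.
try:
    zlib.decompress(segments[0])
    usable_alone = True
except zlib.error:
    usable_alone = False

# Only the segments joined in order give back the original content.
restored = zlib.decompress(b"".join(segments))

print(len(segments), usable_alone, restored == original)
```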
According to http://www.yourdictionary.com/file
Comput. a collection of data (or, often, of logically related records) stored and dealt with as a single, named unit.
As we saw earlier, the term "collection of data" applies to some but not all files, and only in a logical, not a physical, sense: a file with random data,
a file with invalid data entries (i.e. no data at all), and an empty file are all counterexamples in the logical sense. "Stored and dealt with as a single, named unit" is
also wrong. A file need not be referred to as a "unit"; this raises the question of who refers to the file and how, but we will not delve into that here. The
naming stipulation is not necessarily correct either. For example, a miniature system with only 1 byte or even 1 bit of memory, which acts as a file for the processing
unit, does not need and would not have a name for it. Another example of a digital/computer system where a file has no name and no reference scheme is a melodic door bell,
the simplest of which consists of a generator, a counter, a ROM containing a file with the digitized music, a digital-to-analog converter,
an amplifier, a speaker, and a trigger controlling the generator/counter.
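The door bell can be simulated roughly as below. The ROM contents, the ring function, and the 5 V reference are all assumptions made for this sketch; the point is only that the counter addresses the file directly and no name is involved anywhere.

```python
# Hypothetical ROM image: eight 8-bit samples of the digitized melody.
ROM = bytes([128, 200, 255, 200, 128, 56, 0, 56])

def ring(rom, v_ref=5.0):
    """Simulate one press of the bell: the counter steps through every ROM address."""
    voltages = []
    for address in range(len(rom)):              # the counter
        sample = rom[address]                    # ROM read at that address
        voltages.append(v_ref * sample / 255)    # idealized 8-bit digital-to-analog converter
    return voltages

output = ring(ROM)
print(output)
```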
According to http://www.webopedia.com/TERM/F/file.html
(n.) A collection of data or information that has a name, called the filename. Almost all information stored in a computer must be in a file. There are many different types of files: data files, text files, program files, directory files, and so on. Different types of files store different types of information. For example, program files store programs, whereas text files store text.
This is complete nonsense. It only gives some examples among a myriad of errors. In addition to those discussed earlier, the Webopedia definition adds a further error:
there is no such thing as a text file. A file may, however, contain text data, which needs to be interpreted as such in order to be meaningful. Whilst people use slang, and such
language is understood, a dictionary should give formally correct definitions.
That said, notice the expression: "Almost all information stored in a computer must be in a file." This sentence comes close to being a correct statement if we
modify it as follows: "Any and all information stored in a computer is a file."
According to http://www.oxforddictionaries.com/definition/english/file
Computing A collection of data, programs, etc. stored in a computer’s memory or on a storage device under a single identifying name:
This definition is similar to the other incorrect ones discussed previously. However, it raises another point of interest: files and programs are related. Until now,
all the definitions we examined related files only to data! Note that programs are usually stored as files, but this is not necessary, e.g. a program can be implemented as microcode.
According to http://www.collinsdictionary.com/dictionary/english/file?showCookiePolicy=true
(computing) a named collection of information, in the form of text, programs, graphics, etc, held on a permanent storage device such as a magnetic disk
Practically all the errors we discussed earlier are present in this latest faulty definition. A file is not a collection, it may contain no information, and it need not have a name; its content, if any,
is irrelevant to what constitutes the file; the file storage is also irrelevant.
According to http://pcsupport.about.com/od/termsf/g/file-definition.htm
Definition: A file, in the computer world, is a self contained piece of information available to the operating system and any number of individual programs. Information inside the file could consist of essentially anything but whatever the file contains is likely related somehow.
Same errors as above, but in addition, a computer need not have an operating system, yet it may have and use many files. However, this definition touches on and affirms the
point we have underlined multiple times: the contents of a file are immaterial.
According to http://dictionary.cambridge.org/dictionary/british/file_3
information stored on a computer as one unit with one name.
Again, complete nonsense!
- information – as we pointed out, a file need not contain information. For example, a file produced with data from a broken sensor contains no information.
- stored on a computer – as we showed, a punch card is a file but is not stored on a computer, and neither is a printed book or a CD/DVD.
- as one unit – this is untrue in both the physical and logical senses:
  - physical sense – we will provide two counterexamples: (1) a file may be stored on two separate ROM chips which need not be contiguous in the address space of the processor; (2) on a hard disk, files are rarely contiguous, but are instead usually laid out across the media in a nondeterministic but not random order.
  - logical sense – this is even more meaningless: what does "one unit" mean in this sense? (1) If the authors mean "the complete information on something" then that is pure nonsense, as files can contain partial information. (2) If they mean that the file contains information about only one thing, then that is also nonsense.
- with one name – also untrue, as we showed that a file need not have a name. The uniqueness of the name is also questionable: in conventional
file systems, shortcuts, mounts and other mechanisms can be employed to refer to the same data entity from different locations and under different names (addressing schemes). In a ROM, the starting
address of a file may have multiple aliases known to the various users of the file. Further, consider a dual-access memory holding a file inside a range of its addresses.
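One such mechanism in conventional file systems is the hard link, which I use here only as an illustration. The sketch assumes a file system that supports hard links (os.link raises an error otherwise), and the two file names are made up.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    first = os.path.join(folder, "report.dat")   # hypothetical name
    second = os.path.join(folder, "alias.dat")   # hypothetical second name

    with open(first, "wb") as handle:
        handle.write(b"one record")

    os.link(first, second)   # a second name for the same stored data

    same_entity = os.path.samefile(first, second)

    # Appending through one name is visible through the other.
    with open(second, "ab") as handle:
        handle.write(b", two names")
    with open(first, "rb") as handle:
        seen_via_first = handle.read()

print(same_entity, seen_via_first)
```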
At this point, we will conclude. It is evident that the definitions offered by online dictionaries are at best incomplete, and at worst patently wrong. To summarize some of the properties of files:
- Files can be classified in many possible ways, a few of which are:
  - by type of content:
    - random data;
    - invalid data;
    - valid data;
    - program code – valid or invalid;
  - by completeness:
  - by size – files always have a finite and known size, which may vary:
    - empty – contains nothing at all;
    - one unit;
    - larger than one data unit;
  - by changeability of the size:
    - of constant size – physically or logically;
    - of variable size;
  - by changeability of the content:
    - of constant content;
    - of variable content.
Some of these properties may or may not be mutually exclusive and otherwise related depending on the particular circumstances of any particular file.
- A file can be created by a program or by a person, automatically or manually.
- Any two files are distinct, including two empty files.
- Files have an order of recording and reading.
- The content of a file is immaterial with respect to the file itself, and is always subject to interpretation. Files are meaningful only in the correct context.
For example, an empty file does not contain information (other than that the file is empty). Another example: this is a file: "5" - this file contains no information by itself, since
what is "5"? - 5 sheep? 5 goats? 5 barrels of oil? "5" paper and ink dollars, 5 actual ounces of gold, 5 poems ... by itself, out of context, the file "5" contains no information, yet
"5" is a perfectly good file.
- Files need not be stored (permanently), e.g. RAM files and the internal files of software while it is running. Such software can be of any kind: applications, the operating system, or a specific driver enveloping the device running it.
- Files do not need to be named, as long as the software/hardware knows where/how to access them. For example, a device using RAM/EEPROM/FLASH may store data at a fixed physical address or range of addresses: a small digital (and/or computer) device whose memory represents only a single file used by the device for its operation, such as a miniature computer, a door bell, a (melodic) alarm clock, and others.
- Some files may be self-modifiable – some viruses exhibit this characteristic.
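Several of the properties listed above (files need not be stored permanently, need not be named, and two empty files are still distinct) can be shown with in-memory files. This is a minimal sketch using Python's io.BytesIO as a stand-in for a RAM file; the variable names are mine.

```python
import io

# Two unnamed files that exist only in RAM.
first = io.BytesIO()
second = io.BytesIO()

same_content = first.getvalue() == second.getvalue()   # both are empty
same_file = first is second                            # yet they are two distinct files

# Writing to one leaves the other untouched.
second.write(b"5")

print(same_content, same_file, first.getvalue(), second.getvalue())
```

The file holding "5" is the same one used in the example earlier: what the "5" means is not in the file.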
In conclusion, in computing "file" simply means "record". It can be data or a program, or anything, and is always subject to interpretation by the software that uses the file and ultimately by the person using the software.