Version 1.17 – December 2021
This package provides some simple magic value features that simulate the Unix file(1) command to determine the type of a file, or of a block of bytes, from its content. It has an internal set of magic number information, or it can process the magic files from the local Unix-like system configuration.
To get started quickly using SimpleMagic, see section Start Using Quickly. There is also a PDF version of this documentation.
Gray Watson http://256stuff.com/gray/
1. Start Using Quickly | Start using SimpleMagic quickly.
2. Using SimpleMagic | How to use SimpleMagic.
3. Open Source License | Open Source license for the project.
Index of Concepts | Index of concepts in the manual.
1. Start Using Quickly
To use SimpleMagic you need to do the following steps. For more information, see section Using SimpleMagic.
1. Create a ContentInfoUtil instance. With the default constructor, it will load the internal magic entries file. See section How To Load Magic Entries.
2. Use the ContentInfoUtil class to get content-types for files or byte[]:
    ContentInfoUtil util = new ContentInfoUtil();
    ContentInfo info = util.findMatch("/tmp/upload.tmp");
    // or: ContentInfo info = util.findMatch(inputStream);
    // or: ContentInfo info = util.findMatch(contentByteArray);

    // display content type information
    if (info == null) {
        System.out.println("Unknown content-type");
    } else {
        // other information in ContentInfo type
        System.out.println("Content-type is: " + info.getName());
    }
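If the findMatch(...) method does not recognize the content then it will return null. If it does match one of the entries then it will return a ContentInfo object, which provides details such as the content name and an enumerated content type (or OTHER). See section Content Information.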
Here are some examples of ContentInfo output:
For somewhat more extensive instructions, see section Using SimpleMagic.
2. Using SimpleMagic
2.1 Downloading Jar | Downloading the SimpleMagic jar.
2.2 How To Load Magic Entries | Loading in the magic entries from files.
2.3 How To Find The Content Info | Finding the content type from data.
2.4 Content Information | Content type information returned.
2.5 Using With Maven | How to use with Maven.
2.1 Downloading Jar
To get started with SimpleMagic, you will need to download the jar file. The SimpleMagic release page is the default repository but the jars are also available from the central maven repository.
The code works with Java 6 or later.
2.2 How To Load Magic Entries
The library uses various magic byte information to find and determine details about arbitrary blocks of bytes. By default, SimpleMagic has a built-in version of a magic file that was copied from a CentOS Linux system. It contains ~2400 magic file entries describing a number of different file types. It also has an additional ~6600 lines which provide more details about the detected content types.
The magic entries are relatively complex but in general look something like the following. The configuration line says to look at the start of the file for the string "GIF8". If it is there then the file is "GIF image data".
    0 string GIF8 GIF image data
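To make the entry format concrete, here is a minimal sketch of how a line like the one above could be parsed and applied to a byte array. The class `MagicLineSketch` and its methods are hypothetical names invented for this illustration. It is not SimpleMagic's actual matcher, and it handles only the simple "string" test, ignoring the escapes, numeric types, and nested entries that real magic files use.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch of a "string" magic test; not SimpleMagic's own code. */
final class MagicLineSketch {
    final int offset;         // where in the data to look
    final byte[] expected;    // bytes that must appear at that offset
    final String description; // text reported when the test matches

    private MagicLineSketch(int offset, byte[] expected, String description) {
        this.offset = offset;
        this.expected = expected;
        this.description = description;
    }

    /** Parses a line of the form: offset, "string", test value, description. */
    static MagicLineSketch parse(String line) {
        // split into at most 4 fields so the description may contain spaces
        String[] parts = line.trim().split("\\s+", 4);
        if (parts.length < 4 || !"string".equals(parts[1])) {
            throw new IllegalArgumentException("unsupported magic line: " + line);
        }
        return new MagicLineSketch(Integer.parseInt(parts[0]),
                parts[2].getBytes(StandardCharsets.US_ASCII), parts[3]);
    }

    /** Returns the description if the data matches, otherwise null. */
    String match(byte[] data) {
        if (data.length < offset + expected.length) {
            return null; // not enough bytes to hold the expected string
        }
        for (int i = 0; i < expected.length; i++) {
            if (data[offset + i] != expected[i]) {
                return null;
            }
        }
        return description;
    }
}
```

With this sketch, `MagicLineSketch.parse("0 string GIF8 GIF image data").match(bytes)` returns "GIF image data" when `bytes` begins with the ASCII characters GIF8, and null otherwise.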
If you do not want to use the internal magic definitions, you can also construct the ContentInfoUtil class with a file or directory to have it parse and use another definition file.
    ContentInfoUtil util = new ContentInfoUtil("/etc/magic");
WARNING: although we’ve tried to support different types of magic entries, there are local per-OS variations that may not be supported.
2.3 How To Find The Content Info
Once you have loaded the magic entry information into your ContentInfoUtil, you can use the utility class to find the content info of files, byte arrays, or InputStreams. The base method gets content information from a byte[].
    byte[] uploadedBytes = ...;
    ContentInfo info = util.findMatch(uploadedBytes);
You can also get the content type of a file which is read with a FileInputStream:
    ContentInfo info = util.findMatch("/tmp/uploadedFile.tmp");
    // File uploadedFile = ...
    // ContentInfo info = util.findMatch(uploadedFile);
If you have an InputStream, you can also use it directly:
    InputStream inputStream = ...
    ContentInfo info = util.findMatch(inputStream);
If you want to process a stream of bytes as the bytes are being read, you can use the ContentInfoInputStreamWrapper utility class. This takes an InputStream which it wraps and delegates to. After you have read the bytes through the wrapper, you can call the findMatch() method to get its content information.
    HttpServletRequest request = ...
    ContentInfoInputStreamWrapper inputStream =
        new ContentInfoInputStreamWrapper(request.getInputStream());
    // read in the file from the http request, ...
    // after we have read it in, we can get its content-info
    ContentInfo info = inputStream.findMatch();
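The wrap-and-delegate idea can be illustrated with plain java.io classes. The sketch below is not the ContentInfoInputStreamWrapper implementation. `PrefixRecordingInputStream`, `recordedPrefix()`, and `drainAndRecord()` are hypothetical names, and the sketch only records the leading bytes that pass through the stream so that they could be inspected afterwards. It does not account for skip() or mark()/reset().

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

/** Hypothetical sketch: remembers the first bytes read through the stream. */
final class PrefixRecordingInputStream extends FilterInputStream {
    private final int limit; // maximum number of leading bytes to remember
    private final ByteArrayOutputStream prefix = new ByteArrayOutputStream();

    PrefixRecordingInputStream(InputStream delegate, int limit) {
        super(delegate);
        this.limit = limit;
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b >= 0 && prefix.size() < limit) {
            prefix.write(b);
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        if (n > 0) {
            // keep only as many bytes as still fit under the limit
            int keep = Math.min(n, limit - prefix.size());
            if (keep > 0) {
                prefix.write(buf, off, keep);
            }
        }
        return n;
    }

    /** Bytes recorded so far, at most limit of them. */
    byte[] recordedPrefix() {
        return prefix.toByteArray();
    }

    /** Usage example: read a stream to the end, then return what was recorded. */
    static byte[] drainAndRecord(InputStream in, int limit) {
        PrefixRecordingInputStream wrapped = new PrefixRecordingInputStream(in, limit);
        byte[] buf = new byte[4096];
        try {
            while (wrapped.read(buf) != -1) {
                // a real caller would process the bytes here
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return wrapped.recordedPrefix();
    }
}
```

The caller consumes the stream as usual, and the recorded prefix is available once reading is done, which mirrors the read-then-findMatch() order described above.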
For the file and stream versions, only the first 10 kilobytes of the data are read and processed.
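Reading a bounded prefix of a stream can be sketched as follows. `PrefixReader` and `readPrefix` are hypothetical names for this illustration, and the constant assumes that "10 kilobytes" means 10 * 1024 bytes, which the text above does not spell out.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;

/** Hypothetical sketch: read at most a fixed number of leading bytes. */
final class PrefixReader {
    // assumption: 10 kilobytes is taken to mean 10 * 1024 bytes
    static final int PREFIX_SIZE = 10 * 1024;

    static byte[] readPrefix(InputStream in, int maxBytes) {
        byte[] buf = new byte[maxBytes];
        int total = 0;
        try {
            // a single read() may return fewer bytes than requested, so loop
            while (total < maxBytes) {
                int n = in.read(buf, total, maxBytes - total);
                if (n < 0) {
                    break; // end of stream before the limit was reached
                }
                total += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return Arrays.copyOf(buf, total);
    }
}
```

Shorter inputs are returned whole, and longer inputs are cut off at the limit, so the cost of detection does not grow with the size of the file.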
There is also a long internal list of file types copied from the Apache list. Not all of the file types in this list have associated magic number information. However, with the list you can look up entries by mime-type or by file-extension and get the associated information.
You can use the internal list to look up by file-extension:
    // find details about files with .pdf extension
    ContentInfo pdfInfo = ContentInfoUtil.findExtensionMatch("file.pdf");
    // you can even just pass in the extension name
    ContentInfo docInfo = ContentInfoUtil.findExtensionMatch("DOC");
Or you can look up by mime-type:
    // find details about this mime-type
    ContentInfo info = ContentInfoUtil.findMimeTypeMatch("image/vnd.dwg");
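As a rough illustration of what an extension lookup involves, the sketch below normalizes either a file name or a bare extension and looks it up in a small table. `ExtensionLookupSketch` and `mimeTypeFor` are hypothetical names, the three-entry table stands in for the much longer internal list, and the case-insensitive matching is an assumption suggested by the "DOC" example above rather than documented behavior.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch of a lookup by file extension; not the library's list. */
final class ExtensionLookupSketch {
    // tiny illustrative table, keyed by lower-case extension
    private static final Map<String, String> MIME_BY_EXTENSION =
            new HashMap<String, String>();
    static {
        MIME_BY_EXTENSION.put("pdf", "application/pdf");
        MIME_BY_EXTENSION.put("doc", "application/msword");
        MIME_BY_EXTENSION.put("dwg", "image/vnd.dwg");
    }

    /** Accepts "file.pdf" or just "pdf"; returns null when nothing is known. */
    static String mimeTypeFor(String nameOrExtension) {
        int dot = nameOrExtension.lastIndexOf('.');
        String ext = (dot < 0) ? nameOrExtension : nameOrExtension.substring(dot + 1);
        return MIME_BY_EXTENSION.get(ext.toLowerCase(Locale.ROOT));
    }
}
```

Here `mimeTypeFor("file.pdf")` yields "application/pdf" and `mimeTypeFor("DOC")` yields "application/msword". A lookup by mime-type would use the same table keyed the other way around.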
Some internal entries provide more information than others. This list is a work in progress. Please submit improvements and edits as necessary.
2.4 Content Information
If the findMatch(...) method does not recognize the content then it will return null. If it does match one of the entries then it will return a ContentInfo object. Among the details it provides is an enumerated content type, which may be OTHER. This is determined by mapping the mime-type string to an internal enumerated type and is not determined from the magic file entries.
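The mapping from a mime-type string to an enumerated type with an OTHER fallback might look like the following sketch. `ContentTypeSketch` and `fromMimeType` are hypothetical names and the two concrete constants are chosen only for illustration. The library's own enumeration is larger and may differ in detail.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch of an enumerated content type keyed by mime-type. */
enum ContentTypeSketch {
    GIF("image/gif"),
    PDF("application/pdf"),
    OTHER(null); // fallback when the mime-type is unknown or missing

    private static final Map<String, ContentTypeSketch> BY_MIME =
            new HashMap<String, ContentTypeSketch>();
    static {
        for (ContentTypeSketch type : values()) {
            if (type.mimeType != null) {
                BY_MIME.put(type.mimeType, type);
            }
        }
    }

    private final String mimeType;

    ContentTypeSketch(String mimeType) {
        this.mimeType = mimeType;
    }

    /** Maps a mime-type string to a constant, or OTHER if there is no match. */
    static ContentTypeSketch fromMimeType(String mimeType) {
        if (mimeType == null) {
            return OTHER;
        }
        ContentTypeSketch type = BY_MIME.get(mimeType.toLowerCase(Locale.ROOT));
        return (type == null) ? OTHER : type;
    }
}
```

Because the enumerated value is derived from the mime-type alone, an entry whose magic line carries no mime-type would map to OTHER in this sketch.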
Here are some examples of ContentInfo output:
2.5 Using With Maven
To use SimpleMagic with Maven, include the following dependency in your ‘pom.xml’ file:
    <dependency>
        <groupId>com.j256.simplemagic</groupId>
        <artifactId>simplemagic</artifactId>
        <version>1.17</version>
    </dependency>
3. Open Source License
This document is part of the SimpleMagic project.
Copyright 2021, Gray Watson
Permission to use, copy, modify, and/or distribute this software for any purpose with or without fee is hereby granted, provided that this permission notice appear in all copies.
THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
The author may be contacted via the SimpleMagic home page.
This document was generated by Gray Watson on December 29, 2021 using texi2html 1.82.