| Title: | Chromatographic File Converter |
|---|---|
| Description: | Reads chromatograms from binary formats into R objects. Currently supports conversion of 'Agilent ChemStation', 'Agilent MassHunter', 'Agilent OpenLab', 'Shimadzu LabSolutions', 'ThermoRaw', 'Varian Workstation', and 'Waters Empower' files as well as various other formats. In addition to its internal parsers, chromConverter contains bindings to parsers in external libraries, such as 'Aston' <https://github.com/bovee/aston>, 'Entab' <https://github.com/bovee/entab>, 'rainbow' <https://rainbow-api.readthedocs.io/>, and 'ThermoRawFileParser' <https://github.com/compomics/ThermoRawFileParser>. |
| Authors: | Ethan Bass [aut, cre] (ORCID: <https://orcid.org/0000-0002-6175-6739>), James Dillon [ctb, cph] (Author and copyright holder of source code adapted from the 'Chromatography Toolbox' for parsing 'Agilent' FID files.), Evan Shi [ctb, cph] (Author and copyright holder of source code adapted from 'rainbow' for parsing 'Agilent' UV files.) |
| Maintainer: | Ethan Bass <[email protected]> |
| License: | GPL (>= 3) |
| Version: | 0.9.0 |
| Built: | 2026-05-31 17:12:38 UTC |
| Source: | https://github.com/ethanbass/chromConverter |
Converts chromatography data files using 'Entab' parsers.
call_entab( path, data_format = c("wide", "long"), format_out = c("matrix", "data.frame", "data.table"), format_in = "", read_metadata = TRUE, metadata_format = c("chromconverter", "raw") )
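| Argument | Description |
|---|---|
| path | Path to file. |
| data_format | Whether to return data in "wide" or "long" format. |
| format_out | Class of output. Either "matrix", "data.frame", or "data.table". |
| format_in | Format of input. |
| read_metadata | Logical. Whether to attach metadata. Defaults to TRUE. |
| metadata_format | Format to output metadata. Either "chromconverter" or "raw". |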
A chromatogram in the format specified by the format_out and
data_format arguments.
Other external parsers:
call_openchrom(),
call_rainbow(),
read_thermoraw(),
sp_converter(),
uv_converter()
Writes XML batch files and calls OpenChrom file parsers through a
system call to the command-line interface. Unfortunately, the command-line
interface is no longer supported in newer versions of OpenChrom (starting with
version 1.5.0), and older versions of OpenChrom that do support
the command-line interface are no longer available from Lablicate. Thus, this
function is deprecated, since it will only work if you happen to have access
to OpenChrom version 1.4.0, which has been scrubbed from the internet.
call_openchrom( files, path_out = NULL, format_in, format_out = c("matrix", "data.frame", "data.table"), data_format = c("wide", "long"), export_format = c("mzml", "csv", "cdf", "animl"), return_paths = FALSE, verbose = getOption("verbose") )
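| Argument | Description |
|---|---|
| files | Paths to files. |
| path_out | Directory to export converted files. |
| format_in | Either "msd", "csd", or "wsd", according to the detector type. |
| format_out | Class of output. Either "matrix", "data.frame", or "data.table". |
| data_format | Whether to return data in "wide" or "long" format. |
| export_format | Format for exported files. Either "mzml", "csv", "cdf", or "animl". |
| return_paths | Logical. If TRUE, returns a character vector of paths to the newly created files instead of the chromatograms. Defaults to FALSE. |
| verbose | Logical. Whether to print output from OpenChrom to the console. |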
The call_openchrom function works by creating an XML batch file
and feeding it to the OpenChrom command-line interface. OpenChrom batch files
consist of InputEntries (specifying the files you want to convert) and
ProcessEntries (specifying what you want to do to the files). The parsers
are organized into broad categories by detector type and output format. The
detector types are msd (mass selective detectors), csd (current selective
detectors, e.g., FID, ECD, NPD), and wsd (wavelength selective detectors,
e.g., DAD and UV/VIS). Thus, when calling the OpenChrom parsers, one of
these three options must be specified using the format_in argument.
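The three detector categories can be expressed as a small lookup table. The sketch below is a hypothetical helper written for illustration (openchrom_category() is not part of chromConverter); the FID/ECD/NPD and DAD/UV/VIS abbreviations come from the paragraph above, while the "MS"/"MSD" spellings for mass selective detectors are assumptions.

```r
# Hypothetical helper (not part of chromConverter): map a detector
# abbreviation to the OpenChrom category expected by format_in.
openchrom_category <- function(detector) {
  stopifnot(is.character(detector), length(detector) == 1)
  categories <- list(
    msd = c("MS", "MSD"),                  # mass selective detectors (assumed spellings)
    csd = c("FID", "ECD", "NPD"),          # current selective detectors
    wsd = c("DAD", "UV", "VIS", "UV/VIS")  # wavelength selective detectors
  )
  detector <- toupper(detector)
  # One TRUE/FALSE per category, named msd/csd/wsd
  hit <- vapply(categories, function(types) detector %in% types, logical(1))
  if (!any(hit)) stop("Unknown detector type: ", detector)
  names(categories)[hit]
}

openchrom_category("FID")  # "csd"
openchrom_category("dad")  # "wsd"
```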
If return_paths is FALSE, the function will return a list of
chromatograms (if an appropriate parser is available to import the files into
R). The chromatograms will be returned in matrix, data.frame, or data.table
format according to the value of format_out. If return_paths is TRUE, the
function will return a character vector of paths to the newly created files.
Chromatograms will be exported in the format specified
by export_format in the folder specified by path_out.
Activating the OpenChrom command-line interface will deactivate the graphical user interface (GUI). Thus, if you wish to continue using the OpenChrom GUI, it is recommended to create a separate command-line version of OpenChrom to call from R.
Ethan Bass
Wenig, Philip and Odermatt, Juergen. OpenChrom: A Cross-Platform Open Source Software for the Mass Spectrometric Analysis of Chromatographic Data. BMC Bioinformatics 11, no. 1 (July 30, 2010): 405. doi:10.1186/1471-2105-11-405.
Other external parsers:
call_entab(),
call_rainbow(),
read_thermoraw(),
sp_converter(),
uv_converter()
Uses rainbow parsers to read in Agilent
(.D) and Waters (.raw) files. If format_in is "agilent_d" or
"waters_raw", a directory of the appropriate format (.D or .raw) should
be provided to the path argument. If format_in is "chemstation_uv" a
.uv file should be provided. Data can be filtered by detector type using
the what argument.
call_rainbow( path, format_in = c("agilent_d", "waters_raw", "masshunter", "chemstation", "chemstation_uv", "chemstation_fid", "chemstation_ms"), format_out = c("matrix", "data.frame", "data.table"), data_format = c("wide", "long"), by = c("detector", "name"), what = NULL, read_metadata = TRUE, metadata_format = c("chromconverter", "raw"), collapse = TRUE, precision = 1, sparse = TRUE )
| Argument | Description |
|---|---|
| path | Path to file. |
| format_in | Format of the supplied files. Either "agilent_d", "waters_raw", "masshunter", "chemstation", "chemstation_uv", "chemstation_fid", or "chemstation_ms". |
| format_out | Class of output. Either "matrix", "data.frame", or "data.table". |
| data_format | Whether to return data in "wide" or "long" format. |
| by | How to order the list that is returned. Either "detector" or "name". |
| what | What types of data to return (e.g. …). |
| read_metadata | Logical. Whether to attach metadata. Defaults to TRUE. |
| metadata_format | Format to output metadata. Either "chromconverter" or "raw". |
| collapse | Logical. Whether to collapse lists that only contain a single element. Defaults to TRUE. |
| precision | Number of decimals to round m/z values. Defaults to 1. |
| sparse | Logical. Whether to return MS data in sparse format (excluding zeros). Defaults to TRUE. |
Returns a (nested) list of matrices or data.frames according to
the value of format_out. Data is ordered according to the value of by.
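The precision and sparse arguments only affect MS data. As a rough base-R sketch of their documented effect (not the internal code of rainbow or chromConverter), the hypothetical tidy_ms() below rounds m/z values to precision decimals and, when sparse is TRUE, drops zero-intensity points from a long-format table. The column names rt, mz, and intensity are assumptions made for the example.

```r
# Hypothetical sketch: mimic the documented effect of `precision`
# (round m/z values) and `sparse` (exclude zero intensities).
tidy_ms <- function(ms, precision = 1, sparse = TRUE) {
  ms$mz <- round(ms$mz, digits = precision)
  if (sparse) {
    # Keep only points with non-zero intensity
    ms <- ms[ms$intensity != 0, , drop = FALSE]
  }
  rownames(ms) <- NULL
  ms
}

# Synthetic long-format MS data: retention time, m/z, intensity
ms <- data.frame(
  rt = c(1.00, 1.00, 1.00, 1.01),
  mz = c(100.04, 100.16, 250.52, 100.04),
  intensity = c(500, 0, 120, 480)
)
tidy_ms(ms)
```

Dropping zeros can shrink MS data considerably, since most m/z bins in a scan are usually empty.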
Ethan Bass
Other external parsers:
call_entab(),
call_openchrom(),
read_thermoraw(),
sp_converter(),
uv_converter()
Configures OpenChrom to use the command-line interface. Requires an OpenChrom version prior to 1.5.0.
configure_openchrom(cli = c("null", "true", "false", "status"), path = NULL)
| Argument | Description |
|---|---|
| cli | Defaults to "null". If "true", R will rewrite the OpenChrom ini file to enable the CLI. If "false", R will disable the CLI. If "null", R will not modify the ini file. If "status", R will check whether 'OpenChrom' is configured correctly. |
| path | Path to 'OpenChrom' executable (optional). The supplied path will overwrite the current path. |
If cli is set to "status", returns a Boolean value
indicating whether 'OpenChrom' is configured correctly. Otherwise, returns
the path to OpenChrom command-line application.
Ethan Bass
Extract metadata as a data.frame, data.table or tibble from a list of
chromatograms.
extract_metadata( chrom_list, what = c("instrument", "detector", "detector_id", "software", "method", "batch", "operator", "run_datetime", "sample_name", "sample_id", "injection_volume", "time_range", "time_interval", "time_unit", "detector_range", "detector_y_unit", "detector_x_unit", "intensity_multiplier", "scaled", "source_file", "source_file_format", "source_sha1", "data_format", "parser", "format_out"), format_out = c("data.frame", "data.table", "tibble") )
| Argument | Description |
|---|---|
| chrom_list | A list of chromatograms with attached metadata (as returned by read_chroms). |
| what | A character vector specifying the metadata elements to extract. |
| format_out | Format of object. Either "data.frame", "data.table", or "tibble". |
A data.frame, tibble, or data.table (according to the value of
format_out), with samples as rows and the specified metadata elements as
columns.
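Because metadata are stored as attributes on each chromatogram, the result can be pictured as one attribute lookup per sample and metadata element. The base-R sketch below illustrates only that idea; collect_attributes() is a hypothetical helper, not the package's implementation, and it simplifies by returning NA for missing or multi-valued attributes (such as a time range).

```r
# Hypothetical sketch: read selected attributes from each chromatogram and
# bind them into a data.frame with one row per sample.
collect_attributes <- function(chrom_list, what) {
  rows <- lapply(chrom_list, function(chrom) {
    vals <- lapply(what, function(w) {
      val <- attr(chrom, w, exact = TRUE)
      # Missing or multi-valued attributes are simplified to NA here
      if (is.null(val) || length(val) != 1) NA else val
    })
    names(vals) <- what
    as.data.frame(vals, stringsAsFactors = FALSE)
  })
  out <- do.call(rbind, rows)
  rownames(out) <- names(chrom_list)
  out
}

# Two toy "chromatograms" carrying metadata as attributes
chroms <- list(
  s1 = structure(matrix(1:3), instrument = "LC-1", sample_name = "blank"),
  s2 = structure(matrix(4:6), instrument = "LC-1")
)
collect_attributes(chroms, c("instrument", "sample_name"))
```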
Prints a summary of a chrom_list without displaying the underlying
chromatographic data. Attributes that are constant across all chromatograms
are collapsed into a single header line, while varying attributes are shown
as a table truncated to the first n rows.
## S3 method for class 'chrom_list'
print( x, n = 5, cols = c("sample_name", "run_datetime", "method", "detector"), ... )
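| Argument | Description |
|---|---|
| x | A chrom_list object. |
| n | Integer. Maximum number of chromatograms to show in the table. Defaults to 5. |
| cols | Character vector of attribute names to extract and display. Defaults to c("sample_name", "run_datetime", "method", "detector"). |
| ... | Additional arguments (currently ignored). |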
Invisibly returns x.
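The split between constant and varying attributes can be sketched in a few lines of base R. summarize_attrs() is hypothetical and works on a data.frame of metadata (one row per chromatogram, for instance as produced by extract_metadata); it is not the actual print method.

```r
# Hypothetical sketch: attributes with a single unique value become a
# header; the remaining columns form a table truncated to n rows.
summarize_attrs <- function(meta, n = 5) {
  is_constant <- vapply(meta, function(col) length(unique(col)) == 1, logical(1))
  list(
    header = vapply(meta[is_constant], function(col) as.character(col[1]), character(1)),
    table = utils::head(meta[!is_constant], n)
  )
}

meta <- data.frame(
  sample_name = paste0("S", 1:7),
  method = "gradient_A",
  detector = "DAD",
  stringsAsFactors = FALSE
)
summarize_attrs(meta, n = 5)
```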
Extracts injection metadata from 'Agilent Common Analytical Markup Language' (ACAML) files into an R object.
read_acaml( path, find_files, format_out = c("data.frame", "data.table", "tibble"), progress_bar = TRUE, cl = 1 )
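| Argument | Description |
|---|---|
| path | Path(s) to ACAML files or to folders that contain the files. |
| find_files | Logical. Set to TRUE to search the supplied folders for files, or FALSE if path points directly to the files. |
| format_out | Class of output. Either "data.frame", "data.table", or "tibble". |
| progress_bar | Logical. Whether to show progress bar. Defaults to TRUE. |
| cl | Argument to pbapply specifying the number of clusters to use or a cluster object created by makeCluster. Defaults to 1. |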
ACAML is an XML-based format used by Agilent OpenLab to store sequence and
sample metadata. This function extracts information from the
InjectionMetaData nodes embedded in the InjectionMetaDataItems
custom field files, which do not seem to be readily accessible through other
means.
A data.frame, data.table or tibble (according to the value of
format_out) containing sample metadata derived from the supplied ACAML
files.
## Not run:
read_acaml(path)
## End(Not run)
Parses an Agilent .amx method archive, extracting instrument parameters
from one or more of its driver sub-files.
read_agilent_amx( path, what = c("dad", "pump", "comp", "sampler"), path_out = NULL, format_out = c("data.frame", "tibble", "data.table"), gradient_format = c("wide", "long") )
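| Argument | Description |
|---|---|
| path | Path to the .amx method archive. |
| what | One or more instrument modules to parse. Any combination of "dad", "pump", "comp", and "sampler". |
| path_out | Directory into which the archive is extracted. Defaults to NULL. |
| format_out | Class of output (for tables). Either "data.frame", "tibble", or "data.table". |
| gradient_format | Whether to return the gradient in "wide" or "long" format. |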
A named list with one element per parsed module, plus "metadata".
The elements present depend on what; see below for the structure of each.

- metadata: a list with scalar elements:
  - method_name: Original method name.
  - version: Method version string.
  - status: Approval state.
  - created: Creation timestamp (POSIXct, UTC).
  - created_by: Username of creator.
  - modified: Last-modified timestamp (POSIXct, UTC).
  - modified_by: Username of last modifier.
- pump: a list with scalar elements flow_mL_min, stop_time_min,
  post_time_min, pressure_low_bar, and pressure_high_bar, plus:
  - solvents: A data.frame of active solvent channels with columns channel,
    percentage, and solvent.
  - gradient: A data.frame of timetable entries. Wide format (default):
    time_min plus one pct_<channel> column per active channel. Long
    format: time_min, channel, and percent.
- dad: a list with scalar elements peakwidth_nm, slitwidth_nm,
  uv_lamp_required, vis_lamp_required, spectra_from_nm,
  spectra_to_nm, and spectra_step_nm, plus:
  - signals: A data.frame of active signals with columns id, wavelength_nm,
    and bandwidth_nm.
- comp: a list with scalar element post_time_min, plus:
  - temp_controls: A two-row data.frame (Left/Right) with columns side,
    temperature_C, not_ready_limit_C, and equilibration_time_min.
- sampler: a list with scalar elements thermostat_installed,
  draw_speed_uL_min, eject_speed_uL_min,
  wait_after_draw_min, injection_volume_uL, and wash_time_s.
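The wide and long gradient layouts hold the same information. Assuming a long table with the documented columns time_min, channel, and percent, the hypothetical gradient_to_wide() below rebuilds the wide layout with stats::reshape() and renames the value columns to the pct_<channel> pattern; read_agilent_amx itself may build its tables differently.

```r
# Hypothetical sketch: long gradient (time_min, channel, percent) to wide
# layout with one pct_<channel> column per channel.
gradient_to_wide <- function(gradient_long) {
  wide <- reshape(
    gradient_long,
    idvar = "time_min",
    timevar = "channel",
    v.names = "percent",
    direction = "wide",
    sep = "_"
  )
  # reshape() names the columns percent_<channel>; rename to pct_<channel>
  names(wide) <- sub("^percent_", "pct_", names(wide))
  rownames(wide) <- NULL
  wide
}

# A toy two-channel gradient from 95 % A to 95 % B over 10 minutes
gradient_long <- data.frame(
  time_min = c(0, 0, 10, 10),
  channel = c("A", "B", "A", "B"),
  percent = c(95, 5, 5, 95)
)
gradient_to_wide(gradient_long)
```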
## Not run:
read_agilent_amx(path)
## End(Not run)
Reads files from 'Agilent' .D directories.
read_agilent_d( path, what = c("dad", "chroms", "peak_table"), format_out = c("matrix", "data.frame", "data.table"), data_format = c("wide", "long"), read_metadata = TRUE, metadata_format = c("chromconverter", "raw"), collapse = TRUE )
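| Argument | Description |
|---|---|
| path | Path to 'Agilent' .D directory. |
| what | Whether to extract chromatograms ("chroms"), DAD data ("dad"), and/or peak tables ("peak_table"). |
| format_out | Class of output. Either "matrix", "data.frame", or "data.table". |
| data_format | Whether to return data in "wide" or "long" format. |
| read_metadata | Logical. Whether to attach metadata. Defaults to TRUE. |
| metadata_format | Format to output metadata. Either "chromconverter" or "raw". |
| collapse | Logical. Whether to collapse lists that only contain a single element. Defaults to TRUE. |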
Currently this function is limited to reading .uv, .ch and peak_table
elements.
A list of chromatograms in the format specified by data_format and
format_out. If data_format is wide, the chromatograms will be
returned with retention times as rows and columns containing signal intensity
for each signal. If long format is requested, retention times will be
in the first column. The format_out argument determines whether the
chromatogram is returned as a matrix, data.frame or data.table.
Metadata can be attached to the chromatogram as attributes if
read_metadata is TRUE.
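The collapse argument describes a simple rule: a list that contains only one element is replaced by that element. A minimal sketch of that rule, using a hypothetical collapse_if_single() helper rather than chromConverter's own code:

```r
# Hypothetical sketch of `collapse`: unwrap single-element lists, leave
# longer lists (and data.frames, which are also lists) unchanged.
collapse_if_single <- function(x) {
  if (is.list(x) && !is.data.frame(x) && length(x) == 1) x[[1]] else x
}

dad <- matrix(1:4, nrow = 2)
collapse_if_single(list(dad = dad))                # the matrix itself
collapse_if_single(list(dad = dad, chroms = dad))  # still a list of two
```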
Ethan Bass
Other 'Agilent' parsers:
read_agilent_dx(),
read_chemstation_ch(),
read_chemstation_csv(),
read_chemstation_ms(),
read_chemstation_reports(),
read_chemstation_uv()
read_agilent_d("tests/testthat/testdata/RUTIN2.D")
Reads 'Agilent' .dx files.
read_agilent_dx( path, what = c("chroms", "dad"), path_out = NULL, format_out = c("matrix", "data.frame", "data.table"), data_format = c("wide", "long"), read_metadata = TRUE, metadata_format = c("chromconverter", "raw"), collapse = TRUE )
| Argument | Description |
|---|---|
| path | Path to 'Agilent' .dx file. |
| what | Whether to extract chromatograms ("chroms") and/or DAD data ("dad"). |
| path_out | A directory to export unzipped files. If a path is not specified, the files will be written to a temporary directory on the disk. The function will overwrite existing folders in the specified directory that share the basename of the file specified by path. |
| format_out | Class of output. Either "matrix", "data.frame", or "data.table". |
| data_format | Whether to return data in "wide" or "long" format. |
| read_metadata | Logical. Whether to attach metadata. Defaults to TRUE. |
| metadata_format | Format to output metadata. Either "chromconverter" or "raw". |
| collapse | Logical. Whether to collapse lists that only contain a single element. Defaults to TRUE. |
This function unzips 'Agilent' .dx files (into path_out or, if it is not
specified, a temporary directory) using unzip and calls the appropriate parser
on the unzipped files.
A chromatogram in the format specified by format_out (retention
time x wavelength).
Ethan Bass
Other 'Agilent' parsers:
read_agilent_d(),
read_chemstation_ch(),
read_chemstation_csv(),
read_chemstation_ms(),
read_chemstation_reports(),
read_chemstation_uv()
## Not run:
read_agilent_dx(path)
## End(Not run)
Reads 'Allotrope Simple Model' files into R.
read_asm( path, data_format = c("wide", "long"), format_out = c("matrix", "data.frame", "data.table"), read_metadata = TRUE, metadata_format = c("chromconverter", "raw"), collapse = TRUE )
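| Argument | Description |
|---|---|
| path | Path to ASM file. |
| data_format | Whether to return data in "wide" or "long" format. |
| format_out | Class of output. Either "matrix", "data.frame", or "data.table". |
| read_metadata | Logical. Whether to attach metadata. Defaults to TRUE. |
| metadata_format | Format to output metadata. Either "chromconverter" or "raw". |
| collapse | Logical. Whether to collapse lists that only contain a single element. Defaults to TRUE. |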
A 2D chromatogram in the format specified by data_format and
format_out. If data_format is wide, the chromatogram will be returned
with retention times as rows and a single column for the intensity. If long
format is requested, two columns will be returned: one for the
retention time and one for the intensity. The format_out argument
determines whether the chromatogram is returned as a matrix, data.frame,
or data.table. Metadata can be attached to the chromatogram as attributes
if read_metadata is TRUE.
Ethan Bass
Reads 'Analytical Data Interchange' (ANDI) netCDF (.cdf) files.
read_cdf( path, format_out = c("matrix", "data.frame", "data.table"), data_format = c("wide", "long"), what = NULL, read_metadata = TRUE, metadata_format = c("chromconverter", "raw"), collapse = TRUE, ... )
| Argument | Description |
|---|---|
| path | Path to ANDI netCDF file. |
| format_out | Class of output. Either "matrix", "data.frame", or "data.table". |
| data_format | Whether to return data in "wide" or "long" format. |
| what | For … |
| read_metadata | Logical. Whether to attach metadata. Defaults to TRUE. |
| metadata_format | Format to output metadata. Either "chromconverter" or "raw". |
| collapse | Logical. Whether to collapse lists that only contain a single element. Defaults to TRUE. |
| ... | Additional arguments to parser. The … |
A chromatogram in the format specified by the format_out and
data_format arguments.
Ethan Bass
Reads 'Agilent ChemStation' .ch files.
read_chemstation_ch( path, format_out = c("matrix", "data.frame", "data.table"), data_format = c("wide", "long"), read_metadata = TRUE, metadata_format = c("chromconverter", "raw"), scale = TRUE, source_file = NULL )
| Argument | Description |
|---|---|
| path | Path to 'Agilent' .ch file. |
| format_out | Class of output. Either "matrix", "data.frame", or "data.table". |
| data_format | Whether to return data in "wide" or "long" format. |
| read_metadata | Logical. Whether to attach metadata. Defaults to TRUE. |
| metadata_format | Format to output metadata. Either "chromconverter" or "raw". |
| scale | Whether to scale the data by the scaling factor present in the file. Defaults to TRUE. |
| source_file | Source file from which chromatogram data was originally derived. |
'Agilent' .ch files come in several different formats. This parser
can automatically detect and read several versions of these files from
'Agilent ChemStation' and 'Agilent OpenLab', including versions 30 and
130, which are generally produced by ultraviolet detectors, as well as
81, 179, and 181 which are generally produced by flame ionization (FID)
detectors.
A 2D chromatogram in the format specified by data_format and
format_out. If data_format is wide, the chromatogram will be returned
with retention times as rows and a single column for the intensity. If long
format is requested, two columns will be returned: one for the
retention time and one for the intensity. The format_out argument
determines whether the chromatogram is returned as a matrix, data.frame,
or data.table. Metadata can be attached to the chromatogram as attributes
if read_metadata is TRUE.
This function was adapted from the Chromatography Toolbox (© James Dillon 2014).
Ethan Bass
Other 'Agilent' parsers:
read_agilent_d(),
read_agilent_dx(),
read_chemstation_csv(),
read_chemstation_ms(),
read_chemstation_reports(),
read_chemstation_uv()
read_chemstation_ch("tests/testthat/testdata/chemstation_130.ch")
Reads 'Agilent ChemStation' .csv files.
read_chemstation_csv( path, format_out = "matrix", data_format = "wide", read_metadata = TRUE )
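| Argument | Description |
|---|---|
| path | Path to 'Agilent' .csv file. |
| format_out | Class of output. Either "matrix", "data.frame", or "data.table". Defaults to "matrix". |
| data_format | Whether to return data in "wide" or "long" format. Defaults to "wide". |
| read_metadata | Logical. Whether to attach metadata. Defaults to TRUE. |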
'Agilent ChemStation' CSV files are encoded in UTF-16.
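Because of this encoding, reading such a file with default settings in base R may produce embedded-nul warnings or garbled values. A minimal sketch, assuming a headerless, comma-separated, little-endian UTF-16 file without a byte order mark (real ChemStation exports may differ in layout), declares the encoding through fileEncoding; write_utf16le() is a hypothetical helper that only creates the synthetic test file.

```r
# Hypothetical helper: write lines as UTF-16LE bytes to create a test file
write_utf16le <- function(lines, path) {
  text <- paste0(paste(lines, collapse = "\n"), "\n")
  bytes <- iconv(text, from = "UTF-8", to = "UTF-16LE", toRaw = TRUE)[[1]]
  writeBin(bytes, path)
}

tmp <- tempfile(fileext = ".csv")
write_utf16le(c("0.01,1.5", "0.02,2.5"), tmp)

# Declaring the encoding lets read.csv() re-encode the text before parsing
dat <- read.csv(tmp, header = FALSE, fileEncoding = "UTF-16LE")
dat
```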
A chromatogram in the format specified by format_out and
data_format.
Ethan Bass
Other 'Agilent' parsers:
read_agilent_d(),
read_agilent_dx(),
read_chemstation_ch(),
read_chemstation_ms(),
read_chemstation_reports(),
read_chemstation_uv()
read_chemstation_csv("tests/testthat/testdata/dad1.csv")
Reads 'Agilent ChemStation MSD Spectral Files' beginning with the bytes
0x01 0x32 0x00 0x00.
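The byte signature gives a cheap way to check a file before parsing it. The sketch below reads the first four bytes with readBin() and compares them with 0x01 0x32 0x00 0x00; has_ms_signature() is a hypothetical helper, and the files it is demonstrated on are synthetic.

```r
# Hypothetical helper: does the file start with the MSD signature bytes?
has_ms_signature <- function(path) {
  expected <- as.raw(c(0x01, 0x32, 0x00, 0x00))
  header <- readBin(path, what = "raw", n = length(expected))
  # Files shorter than four bytes cannot match
  length(header) == length(expected) && all(header == expected)
}

ms_file <- tempfile(fileext = ".ms")
writeBin(as.raw(c(0x01, 0x32, 0x00, 0x00, 0xff)), ms_file)
other_file <- tempfile(fileext = ".ch")
writeBin(charToRaw("not an MS file"), other_file)

has_ms_signature(ms_file)     # TRUE
has_ms_signature(other_file)  # FALSE
```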
read_chemstation_ms( path, what = c("MS1", "BPC", "TIC"), format_out = c("matrix", "data.frame", "data.table"), data_format = "long", read_metadata = TRUE, metadata_format = c("chromconverter", "raw"), collapse = TRUE )
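| Argument | Description |
|---|---|
| path | Path to 'Agilent' .ms file. |
| what | What stream to get: current options are "MS1", "BPC", and "TIC". |
| format_out | Class of output. Either "matrix", "data.frame", or "data.table". |
| data_format | Whether to return data in "wide" or "long" format. Defaults to "long". |
| read_metadata | Logical. Whether to attach metadata. Defaults to TRUE. |
| metadata_format | Format to output metadata. Either "chromconverter" or "raw". |
| collapse | Logical. Whether to collapse lists that only contain a single element. Defaults to TRUE. |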
A list of chromatograms in the format specified by data_format and
format_out. If data_format is wide, 2D chromatograms will
be returned with retention times as rows and a single column for the
intensity. Otherwise, two columns will be returned: one for the
retention time and one for the intensity. MS data will always be returned in
long format. The format_out argument determines whether the chromatogram is
returned as a matrix, data.frame, or data.table. Metadata can be
attached to the chromatogram as attributes if read_metadata is TRUE.
Many thanks to Evan Shi and Eugene Kwan for providing helpful information on the structure of these files in the rainbow documentation.
Ethan Bass
Other 'Agilent' parsers:
read_agilent_d(),
read_agilent_dx(),
read_chemstation_ch(),
read_chemstation_csv(),
read_chemstation_reports(),
read_chemstation_uv()
## Not run:
read_chemstation_ms(path)
## End(Not run)
Reads 'Agilent ChemStation' reports into R.
read_chemstation_reports( paths, data_format = c("chromatographr", "original"), metadata_format = c("chromconverter", "raw") )
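| Argument | Description |
|---|---|
| paths | Paths to 'ChemStation' report files. |
| data_format | Format to output data. Either "chromatographr" or "original". |
| metadata_format | Format to output metadata. Either "chromconverter" or "raw". |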
A data.frame containing the information from the specified
'ChemStation' report.
Ethan Bass
Other 'Agilent' parsers:
read_agilent_d(),
read_agilent_dx(),
read_chemstation_ch(),
read_chemstation_csv(),
read_chemstation_ms(),
read_chemstation_uv()
Agilent .uv files come in several different formats. This parser can
automatically detect and read several versions of these files from
'Agilent ChemStation' and 'Agilent OpenLab', including versions 31 and
131.
read_chemstation_uv( path, format_out = c("matrix", "data.frame", "data.table"), data_format = c("wide", "long"), read_metadata = TRUE, metadata_format = c("chromconverter", "raw"), scale = TRUE, source_file = NULL )
| Argument | Description |
|---|---|
| path | Path to 'Agilent' .uv file. |
| format_out | Class of output. Either "matrix", "data.frame", or "data.table". |
| data_format | Whether to return data in "wide" or "long" format. |
| read_metadata | Logical. Whether to attach metadata. Defaults to TRUE. |
| metadata_format | Format to output metadata. Either "chromconverter" or "raw". |
| scale | Whether to scale the data by the scaling factor present in the file. Defaults to TRUE. |
| source_file | Source file from which UV data was originally derived. |
A 3D chromatogram in the format specified by data_format and
format_out. If data_format is wide, the chromatogram will
be returned with retention times as rows and wavelengths as columns. If
long format is requested, three columns will be returned: one for the
retention time, one for the wavelength and one for the intensity. The
format_out argument determines whether the chromatogram is returned as
a matrix, data.frame, or data.table. Metadata will be attached to the
chromatogram as attributes if read_metadata is TRUE.
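The two layouts contain the same values. A base-R sketch of the wide-to-long conversion, assuming retention times are stored as row names and wavelengths as column names of the wide matrix (the rt, lambda, and intensity column names are illustrative choices, and wide_to_long() is hypothetical):

```r
# Hypothetical sketch: wide DAD matrix (rows = retention times,
# columns = wavelengths) to a three-column long data.frame.
wide_to_long <- function(wide) {
  rt <- as.numeric(rownames(wide))
  lambda <- as.numeric(colnames(wide))
  data.frame(
    rt = rep(rt, times = length(lambda)),        # cycles fastest, like as.vector()
    lambda = rep(lambda, each = length(rt)),
    intensity = as.vector(wide)                  # column-major order
  )
}

wide <- matrix(1:6, nrow = 2,
               dimnames = list(c("0.5", "1.0"), c("210", "254", "280")))
long <- wide_to_long(wide)
long
```

Matching rep(..., times) and rep(..., each) to R's column-major storage is what keeps each intensity paired with the right retention time and wavelength.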
This function was adapted from the parser in the rainbow project licensed under GPL 3 by Evan Shi https://rainbow-api.readthedocs.io/en/latest/agilent/uv.html.
Ethan Bass
Other 'Agilent' parsers:
read_agilent_d(),
read_agilent_dx(),
read_chemstation_ch(),
read_chemstation_csv(),
read_chemstation_ms(),
read_chemstation_reports()
read_chemstation_uv("tests/testthat/testdata/dad1.uv")
Reads 'Chromatotec' .Chrom files.
read_chromatotec( path, what = c("chrom", "peak_table"), format_out = c("matrix", "data.frame", "data.table"), data_format = c("wide", "long"), read_metadata = TRUE, metadata_format = c("chromconverter", "raw"), collapse = TRUE )
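| Argument | Description |
|---|---|
| path | The path to a 'Chromatotec' .Chrom file. |
| what | Whether to extract chromatograms ("chrom") and/or peak tables ("peak_table"). |
| format_out | Class of output. Either "matrix", "data.frame", or "data.table". |
| data_format | Whether to return data in "wide" or "long" format. |
| read_metadata | Logical. Whether to attach metadata. Defaults to TRUE. |
| metadata_format | Format to output metadata. Either "chromconverter" or "raw". |
| collapse | Logical. Whether to collapse lists that only contain a single element. Defaults to TRUE. |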
A chromatogram and/or peak table from the specified path, according
to the value of what. Chromatograms are returned in the format specified by
format_out.
Ethan Bass
## Not run:
read_chromatotec(path)
## End(Not run)
Reads 'Thermo Fisher Chromeleon™ CDS' ASCII (.txt) files.
read_chromeleon( path, format_out = c("matrix", "data.frame", "data.table"), data_format = c("wide", "long"), read_metadata = TRUE, metadata_format = c("chromconverter", "raw"), decimal_mark = NULL )
| Argument | Description |
|---|---|
| path | Path to 'Chromeleon' ASCII file. |
| format_out | Class of output. Either "matrix", "data.frame", or "data.table". |
| data_format | Whether to return data in "wide" or "long" format. |
| read_metadata | Logical. Whether to attach metadata. Defaults to TRUE. |
| metadata_format | Format to output metadata. Either "chromconverter" or "raw". |
| decimal_mark | Which character is used as the decimal separator in the file. By default, the decimal mark will be detected automatically, but it can also be set manually as "." or ",". |
A chromatogram in the format specified by format_out (retention
time x wavelength).
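Automatic detection of the decimal mark can be approached by counting numeric-looking fields. The heuristic below is a hypothetical illustration, not necessarily what read_chromeleon does; it assumes tab-separated fields and ignores thousands separators.

```r
# Hypothetical heuristic: pick "," if comma-decimal numbers outnumber
# period-decimal numbers among tab-separated fields, else ".".
guess_decimal_mark <- function(lines) {
  fields <- unlist(strsplit(lines, "\t", fixed = TRUE))
  n_comma <- sum(grepl("^-?[0-9]+,[0-9]+$", fields))
  n_period <- sum(grepl("^-?[0-9]+\\.[0-9]+$", fields))
  if (n_comma > n_period) "," else "."
}

guess_decimal_mark(c("0,000\t1,25", "0,001\t1,30"))  # ","
guess_decimal_mark("0.000\t1.25")                     # "."
```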
Ethan Bass
Reads chromatograms from specified folders or a vector of paths using either an internal parser or bindings to an external library, such as Aston, Entab, ThermoRawFileParser, OpenChrom, or rainbow.
read_chroms( paths, format_in = c("agilent_d", "agilent_dx", "asm", "chemstation", "chemstation_fid", "chemstation_ch", "chemstation_csv", "chemstation_ms", "chemstation_uv", "masshunter_dad", "chromeleon_uv", "chromatotec", "mzml", "mzxml", "mdf", "shimadzu_ascii", "shimadzu_dad", "shimadzu_fid", "shimadzu_gcd", "shimadzu_qgd", "shimadzu_lcd", "thermoraw", "varian_sms", "waters_arw", "waters_raw", "msd", "csd", "wsd", "csv", "other"), find_files, pattern = NULL, parser = c("", "chromconverter", "aston", "entab", "thermoraw", "openchrom", "rainbow"), format_out = c("matrix", "data.frame", "data.table"), data_format = c("wide", "long"), path_out = NULL, export_format = c("", "csv", "chemstation_csv", "cdf", "mzml", "animl", "arw"), force = FALSE, read_metadata = TRUE, metadata_format = c("chromconverter", "raw"), progress_bar, cl = 1, verbose = getOption("verbose"), sample_names = c("basename", "sample_name"), dat = NULL, ... )
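| Argument | Description |
|---|---|
| paths | Paths to data files or directories containing the files. |
| format_in | Format of files to be imported/converted. Current options include: "agilent_d", "agilent_dx", "asm", "chemstation", "chemstation_fid", "chemstation_ch", "chemstation_csv", "chemstation_ms", "chemstation_uv", "masshunter_dad", "chromeleon_uv", "chromatotec", "mzml", "mzxml", "mdf", "shimadzu_ascii", "shimadzu_dad", "shimadzu_fid", "shimadzu_gcd", "shimadzu_qgd", "shimadzu_lcd", "thermoraw", "varian_sms", "waters_arw", "waters_raw", "msd", "csd", "wsd", "csv", and "other". |
| find_files | Logical. Set to TRUE to search the supplied directories for files, or FALSE if paths point directly to the files. |
| pattern | Pattern (e.g. a file extension). Defaults to NULL. |
| parser | What parser to use (optional). Current options are "chromconverter", "aston", "entab", "thermoraw", "openchrom", and "rainbow". |
| format_out | Class of output. Either "matrix", "data.frame", or "data.table". |
| data_format | Whether to output data in wide or long format. Either "wide" or "long". |
| path_out | Path for exporting files. If a path is not specified, the user will be prompted to create a temporary directory. |
| export_format | Export format. Currently the options include "csv", "chemstation_csv", "cdf", "mzml", "animl", and "arw". |
| force | Logical. Whether to overwrite files when exporting. Defaults to FALSE. |
| read_metadata | Logical. Whether to attach metadata (if it's available). Defaults to TRUE. |
| metadata_format | Format to output metadata. Either "chromconverter" or "raw". |
| progress_bar | Logical. Whether to show progress bar. Defaults to … |
| cl | Argument to pbapply specifying the number of clusters to use or a cluster object created by makeCluster. Defaults to 1. |
| verbose | Logical. Whether to print output from external parsers to the R console. |
| sample_names | Which sample names to use. Options are "basename" and "sample_name". |
| dat | Existing list of chromatograms to append results. Defaults to NULL. |
| ... | Additional arguments to parser. |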
Provides a unified interface to all chromConverter parsers. Currently recognizes
'Agilent ChemStation' (.uv, .ch, .dx), 'Agilent MassHunter' (.dad),
'Thermo RAW' (.raw), 'Waters ARW' (.arw), 'Waters RAW' (.raw),
'Chromeleon ASCII' (.txt), 'Shimadzu ASCII' (.txt),
'Shimadzu GCD' (.gcd), 'Shimadzu LCD' (.lcd, DAD and chromatogram streams)
and 'Shimadzu QGD' (.qgd) files. It also wraps 'OpenChrom' parsers, which
include many additional formats. To use 'Entab', 'ThermoRawFileParser', or
'OpenChrom' parsers, they must be separately installed. Please see the
instructions in the README
for further details.
If paths to individual files are provided, read_chroms will try to
infer the file format and select an appropriate parser. However, when
providing paths to directories, the file format must be specified using the
format_in argument.
A list of chromatograms in matrix, data.frame, or data.table
format, according to the value of format_out. Chromatograms may be returned
in either wide or long format according to the value of data_format.
If export_format is provided, chromatograms will be
exported in that format to the folder specified by
path_out. Files can currently be converted to csv, mzml, cdf, or arw.
If an openchrom parser is selected, ANIML format is available as an
additional option.
Ethan Bass
path <- "tests/testthat/testdata/dad1.uv"
chr <- read_chroms(path, find_files = FALSE, format_in = "chemstation_uv")
Reads 'Lumex' .mdf files.
read_mdf( path, format_out = c("matrix", "data.frame", "data.table"), data_format = c("wide", "long"), read_metadata = TRUE )
path |
The path to a 'Lumex' |
format_out |
Class of output. Either |
data_format |
Whether to return data in |
read_metadata |
Logical. Whether to attach metadata. Defaults to |
A chromatogram in the format specified by the format_out and
data_format arguments.
Ethan Bass
Extracts data from mzML files using parsers from either RaMS or mzR.
The RaMS parser (default) will only return data in tidy (long) format. The
mzR parser will return data in wide format. Currently the mzR-based parser
is configured to return only DAD data.
read_mzml( path, format_out = c("matrix", "data.frame", "data.table"), data_format = c("wide", "long"), parser = c("RaMS", "mzR"), what = c("MS1", "MS2", "BPC", "TIC", "DAD", "chroms", "metadata", "everything"), verbose = FALSE, ... )
path |
Path to |
format_out |
Class of output. Only applies if |
data_format |
Whether to return data in |
parser |
What parser to use. Either |
what |
What types of data to return (argument to RaMS::grabMSdata).
Options include |
verbose |
Argument to |
... |
Additional arguments to |
If RaMS is selected, the function will return a list of "tidy"
data.table objects. If mzR is selected, the function will return a
chromatogram in matrix or data.frame format according to the
value of format_out.
Ethan Bass
Reads peak lists from specified folders or vector of paths.
read_peaklist( paths, find_files, format_in = c("chemstation", "shimadzu_fid", "shimadzu_dad", "shimadzu_lcd", "shimadzu_gcd", "chromatotec"), pattern = NULL, data_format = c("chromatographr", "original"), metadata_format = c("chromconverter", "raw"), read_metadata = TRUE, progress_bar, cl = 1 )
paths |
Paths to files or folders containing peak list files. |
find_files |
Logical. Set to |
format_in |
Format of files to be imported/converted. Current options
include: |
pattern |
A pattern (e.g. a file extension). Defaults to |
data_format |
Either |
metadata_format |
Format to output metadata. Either |
read_metadata |
Logical. Whether to attach metadata. Defaults to |
progress_bar |
Logical. Whether to show progress bar. Defaults to |
cl |
Argument to pbapply specifying the number
of clusters to use or a cluster object created by
makeCluster. Defaults to |
A list of data.frames containing information about peaks where
each list element represents a sample and each row represents an individual
peak in that sample.
Ethan Bass
path <- "tests/testthat/testdata/RUTIN2.D"
peak_list <- read_peaklist(path)
peak_list[["RUTIN2"]][["254"]]
Reads 'Shimadzu' ASCII (.txt) files. These files can be exported from
'Shimadzu LabSolutions' by right-clicking on samples in the sample list and
selecting File Conversion:Convert to ASCII.
read_shimadzu( path, what = "chroms", format_in = NULL, include = c("fid", "lc", "dad", "uv", "tic"), format_out = c("matrix", "data.frame", "data.table"), data_format = c("wide", "long"), peaktable_format = c("chromatographr", "original"), read_metadata = TRUE, metadata_format = c("chromconverter", "raw"), ms_format = c("data.frame", "list"), collapse = TRUE, scale = TRUE )
path |
Path to Shimadzu |
what |
Whether to extract chromatograms ( |
format_in |
This argument is deprecated and is no longer required. |
include |
Which chromatograms to include. Options are |
format_out |
Class of output. Either |
data_format |
Whether to return data in |
peaktable_format |
Whether to return peak tables in |
read_metadata |
Logical. Whether to attach metadata. Defaults to |
metadata_format |
Format to output metadata. Either |
ms_format |
Whether to return mass spectral data as a (long)
|
collapse |
Logical. Whether to collapse lists that only contain a single
element. Defaults to |
scale |
Whether to scale the data by the scaling factor present in the
file. Defaults to |
A nested list of elements from the specified file, where the top
levels are chromatograms, peak tables, and/or mass spectra according to the
value of what. Chromatograms are returned in the format specified by
format_out.
Ethan Bass
Other 'Shimadzu' parsers:
read_shimadzu_gcd(),
read_shimadzu_lcd(),
read_shimadzu_qgd(),
read_sz_lcd_2d(),
read_sz_lcd_3d()
path <- "tests/testthat/testdata/ladder.txt"
read_shimadzu(path)
Read chromatogram data streams from 'Shimadzu' .gcd files.
read_shimadzu_gcd( path, what = "chroms", format_out = c("matrix", "data.frame", "data.table"), data_format = c("wide", "long"), read_metadata = TRUE, metadata_format = c("chromconverter", "raw"), collapse = TRUE )
path |
Path to 'Shimadzu' |
what |
What stream to get: current options are chromatograms
( |
format_out |
Class of output. Either |
data_format |
Whether to return data in |
read_metadata |
Logical. Whether to attach metadata. Defaults to |
metadata_format |
Format to output metadata. Either |
collapse |
Logical. Whether to collapse lists that only contain a single
element. Defaults to |
A parser to read chromatogram data streams from 'Shimadzu' .gcd files.
GCD files are encoded as 'Microsoft' OLE documents. The parser relies on the
olefile package in Python to unpack the
files. The PDA data is encoded in a stream called PDA 3D Raw Data:3D Raw Data.
The GCD data stream contains a segment for each retention time, beginning
with a 24-byte header.
The 24-byte header consists of the following fields:
4 bytes: segment label (17234).
4 bytes: Little-endian integer specifying the sampling interval in milliseconds.
4 bytes: Little-endian integer specifying the number of values in the file.
4 bytes: Little-endian integer specifying the total number of bytes in the file (although this value appears to be off by a few bytes).
8 bytes of 00s
After the header, the data are encoded as 64-bit (little-endian) floating-point numbers. The retention times can be derived, at least approximately, from the number of values and the sampling interval encoded in the header.
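As a minimal sketch of the segment layout described above, the base R code below writes a synthetic segment (a 24-byte header followed by 64-bit doubles) and reads it back with readBin(). The helper parse_gcd_segment() is hypothetical and is not part of chromConverter; the retention-time calculation assumes the first value sits at time zero, which the description does not confirm.

```r
# Hypothetical helper (not part of chromConverter): parse one GCD segment
# laid out as a 24-byte header followed by 64-bit little-endian doubles.
parse_gcd_segment <- function(bytes) {
  con <- rawConnection(bytes)
  on.exit(close(con))
  read_int <- function() readBin(con, "integer", n = 1, size = 4, endian = "little")
  label <- read_int()        # segment label (expected: 17234)
  interval_ms <- read_int()  # sampling interval in milliseconds
  n_values <- read_int()     # number of intensity values
  n_bytes <- read_int()      # total byte count (reportedly inexact)
  readBin(con, "raw", n = 8) # skip the 8 zero bytes
  intensity <- readBin(con, "double", n = n_values, size = 8, endian = "little")
  # Assumption: the first value is at time zero; times are converted to minutes
  rt <- (seq_len(n_values) - 1) * interval_ms / 60000
  list(label = label, interval_ms = interval_ms, n_bytes = n_bytes,
       data = cbind(rt = rt, intensity = intensity))
}

# Build a synthetic segment in memory: header + three intensity values
out <- rawConnection(raw(0), "wb")
writeBin(c(17234L, 500L, 3L, 48L), out, size = 4, endian = "little")
writeBin(raw(8), out)
writeBin(c(1.5, 2.5, 4.0), out, size = 8, endian = "little")
segment <- rawConnectionValue(out)
close(out)

res <- parse_gcd_segment(segment)
res$data
```

Because the header's byte count is reportedly unreliable, the sketch relies on the value count (the third field) to decide how many doubles to read.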
A 2D chromatogram in the format specified by data_format and
format_out. If data_format is wide, the chromatogram will be returned
with retention times as rows and a single column for the intensity. If long
format is requested, two columns will be returned: one for the
retention time and one for the intensity. The format_out argument
determines whether the chromatogram is returned as a matrix, data.frame,
or data.table. Metadata can be attached to the chromatogram as attributes
if read_metadata is TRUE.
This parser is experimental and may still need some work. It is not yet able to interpret much metadata from the files.
Ethan Bass
Other 'Shimadzu' parsers:
read_shimadzu(),
read_shimadzu_lcd(),
read_shimadzu_qgd(),
read_sz_lcd_2d(),
read_sz_lcd_3d()
Read 3D PDA or 2D chromatogram streams from 'Shimadzu' .lcd files.
read_shimadzu_lcd( path, what, format_out = c("matrix", "data.frame", "data.table"), data_format = c("wide", "long"), read_metadata = TRUE, metadata_format = c("chromconverter", "raw"), scale = TRUE, collapse = TRUE )
path |
Path to 'Shimadzu' |
what |
What stream to get: current options are |
format_out |
Class of output. Either |
data_format |
Whether to return data in |
read_metadata |
Logical. Whether to attach metadata. Defaults to |
metadata_format |
Format to output metadata. Either |
scale |
Whether to scale the data by the scaling factor present in the
file. Defaults to |
collapse |
Logical. Whether to collapse lists that only contain a single
element. Defaults to |
A parser to read data from 'Shimadzu' .lcd files. LCD files are
encoded as 'Microsoft' OLE documents. The parser relies on the
olefile package in Python to unpack the
files. The PDA data is encoded in a stream called PDA 3D Raw Data:3D Raw Data.
The PDA data stream contains a segment for each retention time, beginning
with a 24-byte header.
The 24-byte header consists of the following fields:
4 bytes: segment label (17234).
4 bytes: Little-endian integer specifying the sampling rate along the time axis for 2D streams or along the spectral axis (?) for PDA streams.
4 bytes: Little-endian integer specifying the number of values in the file (for 2D data) or the number of wavelength values in each segment (for 3D data).
4 bytes: Little-endian integer specifying the total number of bytes in the segment.
8 bytes of 00.
For 3D data, each time point is divided into two sub-segments, which begin and end with an integer specifying the length of the sub-segment in bytes. 2D data are structured similarly but with more segments. All known values in the LCD data streams are little-endian and the data are delta-encoded. The first hexadecimal digit of each value is a sign digit specifying the number of bytes in the delta and whether the value is positive or negative. The sign digit represents the number of hexadecimal digits used to encode each value. Even-numbered sign digits correspond to positive deltas, whereas odd-numbered ones indicate negative deltas. Positive values are encoded as little-endian integers, while negative values are encoded as two's complements. The value at each position is derived by subtracting the delta at each position from the previous value.
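The sign-digit and delta rules above can be sketched in base R as follows. This is an illustration under stated assumptions, not the chromConverter implementation: decode_hex_delta() and reconstruct_values() are hypothetical helpers, each delta is assumed to have already been read into a hexadecimal string in most-significant-first order (so byte order is out of scope here), and when the sign digit is odd the remaining n digits are read as a two's complement value, i.e. field - 16^n.

```r
# Hypothetical helper: turn one delta, given as a hex string whose first digit
# is the sign digit, into a signed number. Odd sign digits mark negative
# deltas, read here as an n-digit two's complement value (field - 16^n).
decode_hex_delta <- function(hex) {
  sign_digit <- strtoi(substr(hex, 1, 1), base = 16L)
  payload <- substr(hex, 2, nchar(hex))
  field <- as.numeric(strtoi(payload, base = 16L))
  if (sign_digit %% 2 == 1) field - 16^nchar(payload) else field
}

# Hypothetical helper: each value is the previous value minus the current delta
reconstruct_values <- function(deltas, start = 0) {
  start - cumsum(deltas)
}

deltas <- vapply(c("3F6", "205", "3FE"), decode_hex_delta, numeric(1))
deltas                      # -10, 5, -2 (named by input string)
reconstruct_values(deltas)  # 10, 5, 7
```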
A chromatogram or list of chromatograms in the format specified by
data_format and format_out. If data_format is wide, the
chromatogram(s) will be returned with retention times as rows and a
single column for the intensity. If long format is requested, two
columns will be returned: one for the retention time and one for the intensity.
The format_out argument determines whether chromatograms are returned
in matrix, data.frame, or data.table format. Metadata will be
attached to the chromatogram as attributes when read_metadata is TRUE.
My parsing of the date-time format seems to be a little off, since the acquisition times diverge slightly from the ASCII file.
Ethan Bass
Other 'Shimadzu' parsers:
read_shimadzu(),
read_shimadzu_gcd(),
read_shimadzu_qgd(),
read_sz_lcd_2d(),
read_sz_lcd_3d()
## Not run:
read_shimadzu_lcd(path)
## End(Not run)
Reads 'Shimadzu GCMSsolution' .qgd GC-MS data files.
read_shimadzu_qgd( path, what = c("MS1", "TIC"), format_out = c("matrix", "data.frame", "data.table"), data_format = c("wide", "long"), read_metadata = TRUE, metadata_format = c("chromconverter", "raw"), collapse = TRUE )
path |
Path to 'Shimadzu' |
what |
What stream to get: current options are |
format_out |
Class of output. Either |
data_format |
Whether to return data in |
read_metadata |
Logical. Whether to attach metadata. Defaults to |
metadata_format |
Format to output metadata. Either |
collapse |
Logical. Whether to collapse lists that only contain a single
element. Defaults to |
The MS data is stored in the "GCMS Raw Data" storage, which contains a
MS Raw Data stream with MS scans, a TIC Data stream containing the total
ion chromatogram, and a Retention Time stream containing the retention
times. All known values are little-endian. The retention time stream is a
simple array of 4-byte integers. The TIC stream is a simple array of 8-byte
integers corresponding to retention times stored in the retention time stream.
The MS Raw Data stream is blocked by retention time. Each block begins with a
header consisting of the following elements:
scan number (4-byte integer)
retention time (4-byte integer)
unknown (12-bytes)
number of bytes in intensity values (2-byte integer)
unknown (8-bytes)
After the header, the rest of the block consists of an array of mz values and intensities. The mz values are encoded as 2-byte integers where each mz value is scaled by a factor of 20. Intensities are encoded as (unsigned) integers with variable byte-length defined by the value in the header.
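A minimal base R sketch of reading one MS Raw Data block follows. It assumes the 30-byte header listed above and also assumes that the m/z and intensity values are stored as interleaved pairs (the description does not say whether they are interleaved or held in two separate arrays). parse_qgd_block() is a hypothetical helper, and the retention time is returned as the raw stored integer because its units are not specified above.

```r
# Hypothetical helper: parse one MS Raw Data block from a raw vector.
# Assumes a 30-byte header and interleaved (m/z, intensity) pairs.
parse_qgd_block <- function(block) {
  con <- rawConnection(block)
  on.exit(close(con))
  scan_number <- readBin(con, "integer", n = 1, size = 4, endian = "little")
  rt <- readBin(con, "integer", n = 1, size = 4, endian = "little")
  readBin(con, "raw", n = 12)  # unknown
  int_bytes <- readBin(con, "integer", n = 1, size = 2, signed = FALSE,
                       endian = "little")
  readBin(con, "raw", n = 8)   # unknown
  payload <- readBin(con, "raw", n = length(block) - 30)

  pair_len <- 2 + int_bytes
  n_pairs <- length(payload) %/% pair_len
  mz <- numeric(n_pairs)
  intensity <- numeric(n_pairs)
  for (i in seq_len(n_pairs)) {
    off <- (i - 1) * pair_len
    b <- as.integer(payload[off + seq_len(pair_len)])
    mz[i] <- (b[1] + 256 * b[2]) / 20                  # 2-byte LE integer, scaled by 20
    ib <- b[-(1:2)]                                     # unsigned LE intensity bytes
    intensity[i] <- sum(ib * 256^(seq_along(ib) - 1))
  }
  list(scan_number = scan_number, rt = rt,
       spectrum = cbind(mz = mz, intensity = intensity))
}

# Synthetic block: scan 1, RT 1500, 3-byte intensities, two peaks
out <- rawConnection(raw(0), "wb")
writeBin(c(1L, 1500L), out, size = 4, endian = "little")
writeBin(raw(12), out)
writeBin(3L, out, size = 2, endian = "little")
writeBin(raw(8), out)
writeBin(1000L, out, size = 2, endian = "little")   # m/z 50.00
writeBin(as.raw(c(0x70, 0x11, 0x01)), out)          # intensity 70000
writeBin(1461L, out, size = 2, endian = "little")   # m/z 73.05
writeBin(as.raw(c(0xFF, 0x00, 0x00)), out)          # intensity 255
block <- rawConnectionValue(out)
close(out)

res <- parse_qgd_block(block)
res$spectrum
```

Intensities are assembled byte by byte because readBin() does not read 3-byte integers directly.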
A chromatogram or list of chromatograms in the format specified by
data_format and format_out. If data_format is wide, the
chromatogram(s) will be returned with retention times as rows and a single
column for the intensity. If long format is requested, two columns
will be returned: one for the retention time and one for the intensity.
The format_out argument determines whether chromatograms are returned
as a matrix, data.frame, or data.table. Metadata will be
attached to the chromatogram as attributes if read_metadata is TRUE.
This parser is experimental and may still need some work. It is not yet able to interpret much metadata from the files.
Ethan Bass
Other 'Shimadzu' parsers:
read_shimadzu(),
read_shimadzu_gcd(),
read_shimadzu_lcd(),
read_sz_lcd_2d(),
read_sz_lcd_3d()
Reads 2D PDA data stream from 'Shimadzu' .lcd files.
read_sz_lcd_2d( path, format_out = "data.frame", data_format = "wide", read_metadata = TRUE, metadata_format = "shimadzu_lcd", scale = TRUE )
path |
Path to 'Shimadzu' |
format_out |
Matrix or data.frame. |
data_format |
Either |
read_metadata |
Logical. Whether to attach metadata. |
metadata_format |
Format to output metadata. Either |
scale |
Whether to scale the data by the value factor. |
A parser to read chromatogram data streams from 'Shimadzu' .lcd files.
LCD files are encoded as 'Microsoft' OLE documents. The parser relies on the
olefile package in Python to unpack the
files. The chromatogram data is encoded in streams titled
LSS Raw Data:Chromatogram Ch<#>. The chromatogram data streams begin
with a 24-byte header.
The 24-byte header consists of the following fields:
4 bytes: segment label (17234).
4 bytes: Little-endian integer specifying the sampling rate (in milliseconds).
4 bytes: Little-endian integer specifying the number of values in the file.
4 bytes: Little-endian integer specifying the total number of bytes in the file.
8 bytes of 00s
Each segment is divided into multiple sub-segments, which begin and end with an integer specifying the length of the sub-segment in bytes. All known values in this data stream are little-endian and the data are delta-encoded. The first hexadecimal digit of each value is a sign digit specifying the number of bytes in the delta and whether the value is positive or negative. The sign digit represents the number of hexadecimal digits used to encode each value. Even-numbered sign digits correspond to positive deltas, whereas odd-numbered ones indicate negative deltas. Positive values are encoded as little-endian integers, while negative values are encoded as two's complements. The value at each position is derived by subtracting the delta at each position from the previous value.
One or more 2D chromatograms from the chromatogram streams in
matrix or data.frame format, according to the value of
format_out. If multiple chromatograms are found, they will be returned as a list of matrices or data.frames. The chromatograms will be returned in wide or long format according to the value of data_format.
Ethan Bass
Other 'Shimadzu' parsers:
read_shimadzu(),
read_shimadzu_gcd(),
read_shimadzu_lcd(),
read_shimadzu_qgd(),
read_sz_lcd_3d()
Reads 3D PDA data stream from 'Shimadzu' .lcd files.
read_sz_lcd_3d( path, format_out = "matrix", data_format = "wide", read_metadata = TRUE, metadata_format = "shimadzu_lcd", scale = TRUE )
path |
Path to 'Shimadzu' |
format_out |
Class of output. Either |
data_format |
Either |
read_metadata |
Logical. Whether to attach metadata. |
metadata_format |
Format to output metadata. Either |
scale |
Whether to scale the data by the value factor. |
A parser to read PDA data from 'Shimadzu' .lcd files. LCD files are
encoded as 'Microsoft' OLE documents. The parser relies on the
olefile package in Python to unpack the
files. The PDA data is encoded in a stream called PDA 3D Raw Data:3D Raw Data.
The PDA data stream contains a segment for each retention time, beginning
with a 24-byte header.
The 24-byte header consists of the following fields:
4 bytes: segment label (17234).
4 bytes: Little-endian integer specifying the wavelength bandwidth (?).
4 bytes: Little-endian integer specifying the number of wavelength values in the segment.
4 bytes: Little-endian integer specifying the total number of bytes in the segment.
8 bytes of 00s
Each segment is divided into two sub-segments, which begin and end with an integer specifying the length of the sub-segment in bytes. All known values in this data stream are little-endian and the data are delta-encoded. The first hexadecimal digit of each value is a sign digit specifying the number of bytes in the delta and whether the value is positive or negative. The sign digit represents the number of hexadecimal digits used to encode each value. Even-numbered sign digits correspond to positive deltas, whereas odd-numbered ones indicate negative deltas. Positive values are encoded as little-endian integers, while negative values are encoded as two's complements. The value at each position is derived by subtracting the delta at each position from the previous value.
A 3D chromatogram from the PDA stream in matrix, data.frame, or
data.table format, according to the value of format_out.
The chromatograms will be returned in wide or long format according to
the value of data_format.
Ethan Bass
Other 'Shimadzu' parsers:
read_shimadzu(),
read_shimadzu_gcd(),
read_shimadzu_lcd(),
read_shimadzu_qgd(),
read_sz_lcd_2d()
Converts 'Thermo' RAW files to mzML by calling ThermoRawFileParser from the command line.
read_thermoraw( path, path_out = NULL, format_out = c("matrix", "data.frame"), read_metadata = TRUE, metadata_format = c("chromconverter", "raw"), verbose = getOption("verbose") )
path |
Path to 'Thermo' |
path_out |
Path to directory to export |
format_out |
Class of output. Either |
read_metadata |
Logical. Whether to attach metadata. Defaults to |
metadata_format |
Format to output metadata. Either |
verbose |
Logical. Whether to print output from ThermoRawFileParser to the console. |
To use this function, the ThermoRawFileParser must be manually installed.
A chromatogram in the format specified by the format_out argument.
Exports chromatograms in mzML format to the folder
specified by path_out.
Ethan Bass
Hulstaert, Niels, Jim Shofstahl, Timo Sachsenberg, Mathias Walzer, Harald Barsnes, Lennart Martens, and Yasset Perez-Riverol. "ThermoRawFileParser: Modular, Scalable, and Cross-Platform RAW File Conversion." Journal of Proteome Research 19, no. 1 (January 3, 2020): 537–42. doi:10.1021/acs.jproteome.9b00328.
Other external parsers:
call_entab(),
call_openchrom(),
call_rainbow(),
sp_converter(),
uv_converter()
## Not run:
read_thermoraw(path)
## End(Not run)
Read peak list(s) from 'Varian MS Workstation'.
read_varian_peaklist(path)
path |
Path to 'Varian' peak list file. |
A data.frame containing the information from the specified report.
Ethan Bass
Other 'Varian' parsers:
read_varian_sms()
## Not run:
read_varian_peaklist(path)
## End(Not run)
Reads 'Varian Workstation' SMS files.
read_varian_sms( path, what = c("MS1", "TIC", "BPC"), format_out = c("matrix", "data.frame", "data.table"), data_format = "long", read_metadata = TRUE, collapse = TRUE )
path |
Path to 'Varian' |
what |
Whether to extract chromatograms ( |
format_out |
Class of output. Either |
data_format |
Whether to return data in |
read_metadata |
Logical. Whether to attach metadata. Defaults to |
collapse |
Logical. Whether to collapse lists that only contain a single
element. Defaults to |
Varian SMS files begin with a "DIRECTORY" with offsets for each section. The
first section (in all the files I've been able to inspect) is "MSData"
generally beginning at byte 3238. This MSData section is in turn divided into
two sections. The first section (after a short header) contains chromatogram
data. Some of the information found in this section includes scan numbers,
retention times (as 64-bit floats), the total ion chromatogram (TIC), the
base peak chromatogram (BPC), ion time (µsec), as well as some other
unidentified information. The scan numbers and intensities for the TIC and
BPC are stored as 4-byte little-endian integers. Following this section,
there is a series of null bytes, followed by a series of segments containing
the mass spectra.
The encoding scheme for the mass spectra is somewhat more complicated. Each
scan is represented by a series of values of variable length separated from
the next scan by two null bytes. Within these segments, values are paired.
The first value in each pair represents the delta-encoded mass-to-charge ratio,
while the second value represents the intensity of the signal. Values in this
section are variable-length, big-endian integers that are encoded using
selective bit masking based on the leading hexadecimal digit (d) of each value.
The length of each integer seems to be determined as 1 + (d %/% 4). Integers
beginning with digits 0-3 are simple 2-byte integers. If d >= 4, values are
determined by masking to preserve the lowest n bits according to the
following scheme:
d = 4-5 -> preserve lowest 13 bits
d = 6-7 -> preserve lowest 14 bits
d = 8-9 -> preserve lowest 21 bits
d = 10-11 (A-B) -> preserve lowest 22 bits
d = 12-13 (C-D) -> preserve lowest 27 bits
d = 14-15 (E-F) -> preserve lowest 28 bits (?)
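The masking scheme above can be sketched in base R as follows. decode_varian_int() and decode_varian_scan() are hypothetical helpers, and several points are assumptions: the leading digit d is taken as the high nibble of the first byte, digits 0-3 are read as plain 2-byte integers (as stated above, rather than using the 1 + (d %/% 4) length), the input holds the bytes of a single scan with the two-null-byte separator already removed, and m/z values are left in their stored units after the delta encoding is undone with cumsum().

```r
# Hypothetical helper: decode one variable-length big-endian integer starting
# at `pos`, using the leading hex digit d to pick the length and bit mask.
decode_varian_int <- function(bytes, pos = 1) {
  d <- as.integer(bytes[pos]) %/% 16                # leading hexadecimal digit
  len <- if (d < 4) 2 else 1 + d %/% 4              # 0-3: plain 2-byte integers
  b <- as.integer(bytes[pos:(pos + len - 1)])
  value <- sum(b * 256^((len - 1):0))               # big-endian byte order
  keep_bits <- c(13, 13, 14, 14, 21, 21, 22, 22, 27, 27, 28, 28)  # d = 4..15
  if (d >= 4) value <- value %% 2^keep_bits[d - 3]  # keep the lowest n bits
  list(value = value, next_pos = pos + len)
}

# Hypothetical helper: decode one scan into (m/z, intensity) pairs, undoing
# the delta encoding of the m/z values with a cumulative sum.
decode_varian_scan <- function(bytes) {
  values <- numeric(0)
  pos <- 1
  while (pos <= length(bytes)) {
    r <- decode_varian_int(bytes, pos)
    values <- c(values, r$value)
    pos <- r$next_pos
  }
  pairs <- matrix(values, ncol = 2, byrow = TRUE)
  cbind(mz = cumsum(pairs[, 1]), intensity = pairs[, 2])
}

scan_bytes <- as.raw(c(0x00, 0x2A,        # d = 0: m/z delta 42
                       0x40, 0x05,        # d = 4: 0x4005, low 13 bits -> 5
                       0x00, 0x03,        # d = 0: m/z delta 3
                       0x81, 0x00, 0x10)) # d = 8: 0x810010, low 21 bits -> 65552
decode_varian_scan(scan_bytes)
```

Modular arithmetic (%%) is used for the mask instead of bitwAnd() because 4-byte values can exceed R's 32-bit signed integer range.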
A chromatogram or list of chromatograms from the specified file,
according to the value of what. Chromatograms are returned in the format
specified by format_out.
There is still only limited support for the extraction of metadata from this file format. Also, the timestamp conversions aren't quite right.
Ethan Bass
Other 'Varian' parsers:
read_varian_peaklist()
## Not run:
read_varian_sms(path)
## End(Not run)
Reads 'Waters' ASCII .arw files.
read_waters_arw( path, format_out = c("matrix", "data.frame", "data.table"), data_format = c("wide", "long"), read_metadata = TRUE, metadata_format = c("chromconverter", "raw") )
path |
Path to Waters |
format_out |
Class of output. Either |
data_format |
Whether to return data in |
read_metadata |
Logical. Whether to attach metadata. Defaults to |
metadata_format |
Format to output metadata. Either |
For help exporting files from Empower, you can consult the official documentation: How_to_export_3D_raw_data_from_Empower.
A chromatogram in the format specified by the format_out and
data_format arguments.
Ethan Bass
Other 'Waters' parsers:
read_waters_raw()
Reads 'Waters MassLynx' (.raw) files into R.
read_waters_raw( path, format_out = c("matrix", "data.frame", "data.table"), data_format = c("wide", "long"), read_metadata = TRUE, metadata_format = c("chromconverter", "raw") )
path |
Path to Waters |
format_out |
Class of output. Either |
data_format |
Whether to return data in |
read_metadata |
Logical. Whether to attach metadata. Defaults to |
metadata_format |
Format to output metadata. Either |
A chromatogram in the format specified by the format_out and
data_format arguments.
For now this parser only reads 1D chromatograms (not mass spectra or DAD data) and does not support parsing of metadata from 'Waters' RAW files.
Ethan Bass
Other 'Waters' parsers:
read_waters_arw()
Converts a single chromatogram from MassHunter .sp format to an R object
using the Aston file parser.
sp_converter( path, format_out = c("matrix", "data.frame", "data.table"), data_format = c("wide", "long"), read_metadata = TRUE, metadata_format = c("chromconverter", "raw") )
path |
Path to file. |
format_out |
Class of output. Either |
data_format |
Whether to return data in |
read_metadata |
Logical. Whether to attach metadata. Defaults to |
metadata_format |
Format to output metadata. Either |
A chromatogram in the format specified by the format_out and
data_format arguments.
Other external parsers:
call_entab(),
call_openchrom(),
call_rainbow(),
read_thermoraw(),
uv_converter()
Converts a single chromatogram from ChemStation .uv format to an R
object.
uv_converter( path, format_out = c("matrix", "data.frame", "data.table"), data_format = c("wide", "long"), correction = TRUE, read_metadata = TRUE, metadata_format = c("chromconverter", "raw") )
path |
Path to file |
format_out |
Class of output. Either |
data_format |
Whether to return data in |
correction |
Logical. Whether to apply empirical correction. Defaults to TRUE. |
read_metadata |
Logical. Whether to attach metadata. Defaults to |
metadata_format |
Format to output metadata. Either |
Uses the Aston file parser.
A chromatogram in the format specified by the format_out and
data_format arguments.
Other external parsers:
call_entab(),
call_openchrom(),
call_rainbow(),
read_thermoraw(),
sp_converter()
Exports a chromatogram in ANDI (Analytical Data Interchange) chromatography
format (ASTM E1947-98). This format can only accommodate unidimensional data.
For two-dimensional chromatograms, the column to export can be specified
using the lambda argument. Otherwise, a warning will be generated and
the first column of the chromatogram will be exported.
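The column-selection rule described above can be illustrated with a short base R sketch. select_andi_column() is a hypothetical helper written for this illustration, not the code used by write_andi_chrom(); it assumes that wavelengths are stored as the column names of a wide matrix.

```r
# Hypothetical helper (illustration only): pick the single column to export.
select_andi_column <- function(x, lambda = NULL) {
  if (!is.null(lambda)) {
    lambda <- as.character(lambda)
    if (!lambda %in% colnames(x)) {
      stop("`lambda` must match one of the column names of `x`.")
    }
    return(x[, lambda, drop = FALSE])
  }
  if (ncol(x) > 1) {
    warning("No `lambda` supplied; exporting the first column only.")
  }
  x[, 1, drop = FALSE]
}

# A small wide-format chromatogram: retention times as rows, wavelengths as columns
x <- matrix(c(1, 2, 3, 10, 20, 30), ncol = 2,
            dimnames = list(c("0.1", "0.2", "0.3"), c("220", "254")))
select_andi_column(x, lambda = 254)  # the "254" column
select_andi_column(x)                # first column, with a warning
```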
write_andi_chrom(x, path_out, sample_name = NULL, lambda = NULL, force = FALSE)
x |
A chromatogram in (wide) format. |
path_out |
The path to write the file. |
sample_name |
The name of the file. If a name is not provided, the name
will be derived from the |
lambda |
The wavelength to export (for 2-dimensional chromatograms).
Must be a string matching one of the columns in |
force |
Whether to overwrite existing files at the specified path.
Defaults to |
Invisibly returns the path to the written CDF file.
Exports a chromatogram in ANDI chromatography format (netCDF) in the directory
specified by path_out. The file will be named according to the value
of sample_name. If no sample_name is provided, the sample_name
attribute will be used if it exists.
Ethan Bass
Other write functions:
write_chroms(),
write_mzml()
Writes chromatograms to disk in the format specified by export_format:
either mzml, cdf, csv, or arw.
write_chroms( chrom_list, path_out, export_format = c("mzml", "cdf", "csv", "arw"), what = "", force = FALSE, show_progress = TRUE, verbose = getOption("verbose"), ... )
chrom_list |
A list of chromatograms. |
path_out |
Path to directory for writing files. |
export_format |
Format to export files: either |
what |
What to write. Argument to |
force |
Logical. Whether to overwrite existing files. Defaults to |
show_progress |
Logical. Whether to show progress bar. Defaults to |
verbose |
Logical. Whether to print verbose output. |
... |
Additional arguments to write function. |
No return value. The function is called for its side effects.
Exports a chromatogram in the file format specified by export_format in the
directory specified by path_out.
Ethan Bass
Other write functions:
write_andi_chrom(),
write_mzml()
This function constructs mzML files by writing XML strings directly to a file connection. While this approach is fast, it may be less flexible than methods based on an explicit Document Object Model (DOM).
write_mzml( data, path_out, sample_name = NULL, what = NULL, instrument_info = NULL, compress = TRUE, indexed = TRUE, force = FALSE, show_progress = TRUE, verbose = getOption("verbose") )
data |
List of |
path_out |
The path to write the file. |
sample_name |
The name of the file. If a name is not provided, the name
will be derived from the |
what |
Which streams to write to mzML: |
instrument_info |
Instrument info to write to mzML file. |
compress |
Logical. Whether to use zlib compression. Defaults to |
indexed |
Logical. Whether to write indexed mzML. Defaults to |
force |
Logical. Whether to overwrite existing files at |
show_progress |
Logical. Whether to show progress bar. Defaults to |
verbose |
Logical. Whether or not to print status messages. |
The function supports writing various types of spectral data including MS1,
TIC (Total Ion Current), BPC (Base Peak Chromatogram), and DAD
(Diode Array Detector) data. DAD spectra are written as electromagnetic
radiation spectra (MS:1000804) using Thermo's naming convention with
controllerType=4 in the spectrum ID for compatibility with existing
tools. Support for MS2 may be added in a future release.
If indexed = TRUE, the function will generate an indexed mzML file, which
allows faster random access to spectra.
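For context, mzML stores spectra as binary data arrays of floating-point values that may be zlib-compressed and are then base64-encoded. The base R sketch below shows only the packing and optional compression steps, using memCompress(type = "gzip"), which is zlib-based (the base64 step is left out to keep the sketch within base R). encode_binary_array() and decode_binary_array() are hypothetical helpers and are not taken from write_mzml().

```r
# Hypothetical helpers: pack a numeric vector into 64-bit little-endian floats,
# optionally zlib-compressed, and unpack it again.
encode_binary_array <- function(x, compress = TRUE) {
  bytes <- writeBin(as.double(x), raw(), size = 8, endian = "little")
  if (compress) memCompress(bytes, type = "gzip") else bytes
}

decode_binary_array <- function(bytes, compressed = TRUE) {
  if (compressed) bytes <- memDecompress(bytes, type = "gzip")
  readBin(bytes, "double", n = length(bytes) %/% 8, size = 8, endian = "little")
}

intensities <- c(0, 12.5, 250, 12.5, 0)
packed <- encode_binary_array(intensities)
decode_binary_array(packed)  # recovers c(0, 12.5, 250, 12.5, 0)
```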
Invisibly returns the path to the written mzML file.
Ethan Bass
Other write functions:
write_andi_chrom(),
write_chroms()