pytximport.core
===============

.. py:module:: pytximport.core

.. autoapi-nested-parse::

   Expose the functions in the core module.


Attributes
----------

.. autoapisummary::

   pytximport.core.pytximport


Functions
---------

.. autoapisummary::

   pytximport.core.tximport


Package Contents
----------------

.. py:data:: pytximport

.. py:function:: tximport(file_paths, data_type = 'salmon', transcript_gene_map = None, counts_from_abundance = None, gene_level = False, return_transcript_data = False, inferential_replicates = False, inferential_replicate_transformer = None, inferential_replicate_variance = False, ignore_transcript_version = True, ignore_after_bar = True, id_column = None, counts_column = None, length_column = None, abundance_column = None, custom_importer = None, existence_optional = False, read_length = None, output_type = 'anndata', output_format = 'csv', output_path = None, output_path_overwrite = False, return_data = True, biotype_filter = None)

   Import transcript-level quantification files and convert them to gene-level expression estimates.

   Basic usage:

   .. code-block:: python

      from pytximport import tximport

      txi = tximport(
          ["quant_1.sf", "quant_2.sf"],
          data_type="salmon",
          transcript_gene_map="transcript_to_gene_map.tsv",
          counts_from_abundance="length_scaled_tpm",
      )

   :param file_paths: The paths to the quantification files.
   :type file_paths: List[Union[str, Path]]
   :param data_type: The type of quantification files. Defaults to "salmon".
   :type data_type: Literal["kallisto", "salmon", "sailfish", "oarfish", "piscem", "stringtie", "rsem", "tsv"]
   :param transcript_gene_map: The mapping from transcripts to genes. Has to contain two columns:
      `transcript_id` and `gene_id`. If you provide a path to a file, it has to be either a tab-separated
      (.tsv) or comma-separated (.csv) file with a header. Defaults to None.
   :type transcript_gene_map: Optional[Union[pd.DataFrame, Union[str, Path]]], optional
   :param counts_from_abundance: Which method, if any, to use to calculate count estimates from the
      abundance. When using `scaled_tpm` or `length_scaled_tpm`, the counts no longer correlate with the
      average transcript length per sample. In those cases, the length offset matrix should not be used
      for downstream analysis. Note that this does not normalize for sequencing depth, only for differences
      in transcript length. When using the gene-summarized counts rather than count estimates based on the
      abundance, the length offset matrix included in the output of this function should be used for
      downstream analysis. If your downstream analysis tool does not support the length offset matrix,
      you should probably use `length_scaled_tpm` for gene-level analysis. For transcript-level analysis,
      we recommend that you use `scaled_tpm` or `dtu_scaled_tpm`. For further guidance on
      transcript-level analysis, please refer to https://doi.org/10.12688/f1000research.15398.3.
      Defaults to None.
   :type counts_from_abundance: Optional[Literal["scaled_tpm", "length_scaled_tpm", "dtu_scaled_tpm"]], optional
   :param gene_level: Whether the input files are at the gene level. This is only the case for some RSEM
      quantification files. Defaults to False.
   :type gene_level: bool, optional
   :param return_transcript_data: Whether to return the transcript-level expression. Defaults to False.
   :type return_transcript_data: bool, optional
   :param inferential_replicates: Whether to parse and include inferential replicates in the output. If you
      want to recalculate the counts from inferential replicates, please set this option to True and provide
      an `inferential_replicate_transformer`. Defaults to False.
   :type inferential_replicates: bool, optional
   :param inferential_replicate_transformer: A custom function to transform the inferential replicates.
      Defaults to None.
   :type inferential_replicate_transformer: Optional[Callable], optional
   :param inferential_replicate_variance: Whether to return the variance of the inferential replicates.
      Defaults to False.
   :type inferential_replicate_variance: bool, optional
   :param ignore_transcript_version: Whether to ignore the transcript version. Defaults to True.
   :type ignore_transcript_version: bool, optional
   :param ignore_after_bar: Whether to split the transcript ID at the bar character (`|`) and keep only the
      part before it. Defaults to True.
   :type ignore_after_bar: bool, optional
   :param id_column: The name of the column containing the transcript ID. Defaults to None.
   :type id_column: Optional[str], optional
   :param counts_column: The name of the column containing the counts. Defaults to None.
   :type counts_column: Optional[str], optional
   :param length_column: The name of the column containing the transcript length. Defaults to None.
   :type length_column: Optional[str], optional
   :param abundance_column: The name of the column containing the abundance. Defaults to None.
   :type abundance_column: Optional[str], optional
   :param custom_importer: A custom importer function. Defaults to None.
   :type custom_importer: Optional[Callable], optional
   :param existence_optional: Whether the existence of the files is optional. Defaults to False.
   :type existence_optional: bool, optional
   :param read_length: The read length, used for StringTie quantification files. Defaults to None.
   :type read_length: Optional[int], optional
   :param output_type: The type of output. Defaults to "anndata".
   :type output_type: Literal["xarray", "anndata"], optional
   :param output_format: The format of the output file. Defaults to "csv".
   :type output_format: Literal["csv", "h5ad"], optional
   :param output_path: The path at which to save the gene-level expression. Defaults to None.
   :type output_path: Optional[Union[str, Path]], optional
   :param output_path_overwrite: Whether to overwrite the file at `output_path` if it already exists.
      Defaults to False.
   :type output_path_overwrite: bool, optional
   :param return_data: Whether to return the gene-level expression.
      Defaults to True.
   :type return_data: bool, optional
   :param biotype_filter: The transcript biotypes to keep; only transcripts of the listed biotypes are
      included. This enables post-hoc filtering of the data based on the biotype of the transcripts and
      assumes that the biotype is part of the bar-separated transcript ID. Note that the abundance is NOT
      recalculated after filtering, to avoid introducing bias. If the biotype is not part of the transcript
      ID, or if you want the abundance to be recalculated, use the `filter_by_biotype` function from the
      `pytximport.utils` module instead. Defaults to None.
   :type biotype_filter: Optional[List[str]], optional
   :returns: The estimated gene-level or transcript-level expression data if `return_data` is True, else None.
   :rtype: Union[xr.Dataset, ad.AnnData, SummarizedExperiment, None]
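
The gene-level conversion at the heart of `tximport` can be sketched with the standard library alone. This minimal sketch is not pytximport's own code: it assumes the aggregation rules of the original R tximport package, where the counts and abundances of all transcripts of a gene are summed and the gene length is the abundance-weighted mean of the transcript lengths. The helpers `load_transcript_gene_map` and `summarize_to_genes` are hypothetical, the map is read from an in-memory tab-separated file with a `transcript_id` and `gene_id` header, and both dropping unmapped transcripts and falling back to a plain mean for unexpressed genes are simplifications.

.. code-block:: python

   import csv
   import io
   from collections import defaultdict
   from typing import Dict, Iterable, List, Tuple

   # (counts, abundance, length) of one transcript or gene in one sample
   Quant = Tuple[float, float, float]


   def load_transcript_gene_map(lines: Iterable[str]) -> Dict[str, str]:
       """Hypothetical helper: read a tab-separated map with a transcript_id/gene_id header."""
       reader = csv.DictReader(lines, delimiter="\t")
       return {row["transcript_id"]: row["gene_id"] for row in reader}


   def summarize_to_genes(transcripts: Dict[str, Quant], tx2gene: Dict[str, str]) -> Dict[str, Quant]:
       """Hypothetical helper: collapse one sample's transcript-level values into genes."""
       grouped: Dict[str, List[Quant]] = defaultdict(list)
       for transcript_id, values in transcripts.items():
           gene_id = tx2gene.get(transcript_id)
           if gene_id is not None:  # simplification: unmapped transcripts are dropped
               grouped[gene_id].append(values)

       genes: Dict[str, Quant] = {}
       for gene_id, rows in grouped.items():
           counts = sum(c for c, _, _ in rows)  # counts are summed per gene
           abundance = sum(a for _, a, _ in rows)  # abundances (TPM) are summed per gene
           if abundance > 0:
               # gene length = abundance-weighted mean of the transcript lengths
               length = sum(a * tx_length for _, a, tx_length in rows) / abundance
           else:
               # simplification: plain mean if the gene is not expressed in this sample
               length = sum(tx_length for _, _, tx_length in rows) / len(rows)
           genes[gene_id] = (counts, abundance, length)
       return genes


   tx2gene = load_transcript_gene_map(io.StringIO(
       "transcript_id\tgene_id\nTX1\tGENE_A\nTX2\tGENE_A\nTX3\tGENE_B\n"
   ))
   sample = {"TX1": (10.0, 30.0, 1000.0), "TX2": (5.0, 10.0, 2000.0), "TX3": (8.0, 60.0, 500.0)}
   genes = summarize_to_genes(sample, tx2gene)
   print(genes)  # {'GENE_A': (15.0, 40.0, 1250.0), 'GENE_B': (8.0, 60.0, 500.0)}

GENE_A ends up with a length of 1250 rather than the plain mean of 1500 because TX1 (length 1000) carries three quarters of the gene's abundance.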
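
The difference between the `scaled_tpm` and `length_scaled_tpm` choices for `counts_from_abundance` can be illustrated with a small calculation. The sketch assumes the formulas described for the original R tximport package: for `scaled_tpm` the abundance is rescaled so that each sample keeps its original total count, and for `length_scaled_tpm` the abundance is first multiplied by each feature's mean length across samples and then rescaled the same way. `dtu_scaled_tpm` is not covered. The function `counts_from_abundance` below is a hypothetical helper, not part of pytximport, and the numbers are made up; in the toy data, gene G1 switches to a longer transcript in the second sample, which inflates its raw count.

.. code-block:: python

   from statistics import mean
   from typing import Dict, List

   # feature id -> one value per sample
   Matrix = Dict[str, List[float]]


   def counts_from_abundance(counts: Matrix, abundance: Matrix, length: Matrix, method: str) -> Matrix:
       """Hypothetical helper: rebuild counts from abundance, keeping each sample's total count."""
       if method == "scaled_tpm":
           # Use the abundance as-is.
           raw = {f: list(values) for f, values in abundance.items()}
       elif method == "length_scaled_tpm":
           # Multiply the abundance by the feature's mean length across all samples.
           raw = {f: [a * mean(length[f]) for a in values] for f, values in abundance.items()}
       else:
           raise ValueError(f"unsupported method in this sketch: {method}")

       n_samples = len(next(iter(counts.values())))
       scaled: Matrix = {f: [0.0] * n_samples for f in raw}
       for j in range(n_samples):
           library_size = sum(values[j] for values in counts.values())
           raw_total = sum(values[j] for values in raw.values())
           for f in raw:
               # Rescale so that sample j sums to its original library size.
               scaled[f][j] = raw[f][j] * library_size / raw_total
       return scaled


   counts = {"G1": [30.0, 240.0], "G2": [180.0, 160.0]}
   abundance = {"G1": [500000.0, 750000.0], "G2": [500000.0, 250000.0]}
   length = {"G1": [500.0, 1500.0], "G2": [3000.0, 3000.0]}

   scaled_tpm = counts_from_abundance(counts, abundance, length, "scaled_tpm")
   length_scaled_tpm = counts_from_abundance(counts, abundance, length, "length_scaled_tpm")
   print(scaled_tpm)  # {'G1': [105.0, 300.0], 'G2': [105.0, 100.0]}
   print(length_scaled_tpm)  # {'G1': [52.5, 200.0], 'G2': [157.5, 200.0]}

In the second sample, `length_scaled_tpm` assigns G1 200 counts instead of its raw 240 because its sample-specific length is replaced by the average length across samples, while both modes keep each sample's total count (210 and 400).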
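
The effect of `ignore_after_bar` and `ignore_transcript_version` on a single identifier can be sketched as follows. `normalize_transcript_id` is a hypothetical helper, not pytximport's code; it assumes that the version is a trailing `.<digits>` suffix, as in Ensembl IDs, and uses a made-up, bar-separated identifier in the style of GENCODE FASTA headers.

.. code-block:: python

   import re

   # Assumption: a version is a trailing ".<digits>" suffix, e.g. the ".3" in "ENST00000000001.3".
   VERSION_SUFFIX = re.compile(r"\.\d+$")


   def normalize_transcript_id(
       transcript_id: str,
       ignore_after_bar: bool = True,
       ignore_transcript_version: bool = True,
   ) -> str:
       """Hypothetical helper: clean a transcript ID the way the two flags describe."""
       if ignore_after_bar:
           # Keep only the part before the first bar character.
           transcript_id = transcript_id.split("|", 1)[0]
       if ignore_transcript_version:
           # Drop the trailing version number, if there is one.
           transcript_id = VERSION_SUFFIX.sub("", transcript_id)
       return transcript_id


   raw_id = "ENST00000000001.3|ENSG00000000001.2|-|-|GENE1-201|GENE1|1500|protein_coding|"
   print(normalize_transcript_id(raw_id))  # ENST00000000001
   print(normalize_transcript_id(raw_id, ignore_transcript_version=False))  # ENST00000000001.3
   print(normalize_transcript_id("ENST00000000001.3", ignore_after_bar=False))  # ENST00000000001

Applying the bar split before the version strip matters here: on the full bar-separated string, the version pattern would not match because the ID does not end in digits.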
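
Because `biotype_filter` relies on the biotype being embedded in the bar-separated transcript ID, the matching has to see the full identifier. The sketch below uses a hypothetical helper, `keep_by_biotype`, and made-up IDs; it is not pytximport's implementation, which may match biotypes differently. It checks for an exact match against any bar-separated field and, in line with the `biotype_filter` description, leaves the abundance of the remaining transcripts unchanged.

.. code-block:: python

   from typing import Dict, Iterable, List


   def keep_by_biotype(transcript_ids: Iterable[str], biotypes: List[str]) -> List[str]:
       """Hypothetical helper: keep IDs whose bar-separated fields include a wanted biotype."""
       wanted = set(biotypes)
       return [tid for tid in transcript_ids if wanted.intersection(tid.split("|"))]


   abundance: Dict[str, float] = {
       "ENST00000000001.3|ENSG00000000001.2|-|-|GENE1-201|GENE1|1500|protein_coding|": 600000.0,
       "ENST00000000002.1|ENSG00000000002.4|-|-|GENE2-201|GENE2|900|lncRNA|": 400000.0,
   }
   kept = keep_by_biotype(abundance, ["protein_coding"])
   # The abundance is not renormalized, so the kept TPM values no longer sum to one million.
   print(len(kept), sum(abundance[tid] for tid in kept))  # 1 600000.0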