File input

polars_st.read_file

read_file(
    path_or_buffer: Path | str | bytes,
    /,
    layer: int | str | None = None,
    encoding: str | None = None,
    columns: Sequence[str] | None = None,
    read_geometry: bool = True,
    force_2d: bool = False,
    skip_features: int = 0,
    max_features: int | None = None,
    where: str | None = None,
    bbox: tuple[float, float, float, float] | None = None,
    fids: Sequence[int] | None = None,
    sql: str | None = None,
    sql_dialect: str | None = None,
    return_fids: bool = False,
) -> GeoDataFrame

Read OGR data source into a GeoDataFrame.

IMPORTANT: non-linear geometry types (e.g., MultiSurface) are converted to their linear approximations.
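
A minimal sketch of a typical call, assuming a hypothetical GeoPackage named `countries.gpkg` that contains the `ISO_A3` and `POP_EST` columns used in the `where` examples under Parameters. The names `iso_filter` and `read_country` are illustrative and not part of polars_st; `polars_st` is imported inside the function so that the filter helper can be used on its own.

```python
import re


def iso_filter(code: str) -> str:
    """Build an attribute filter such as "ISO_A3 = 'CAN'" (hypothetical helper)."""
    # Accept only three uppercase letters so the value is safe to embed in the clause.
    if not re.fullmatch(r"[A-Z]{3}", code):
        raise ValueError(f"not a three-letter uppercase code: {code!r}")
    return f"ISO_A3 = '{code}'"


def read_country(path: str, code: str):
    """Read two attribute columns plus geometry for one country (illustrative only)."""
    import polars_st as st  # imported here so iso_filter works without the package

    return st.read_file(
        path,                           # dataset path, passed positionally
        columns=["ISO_A3", "POP_EST"],  # names must match the data source exactly
        where=iso_filter(code),         # restricted SQL WHERE clause
    )


# Example call; "countries.gpkg" is an assumed file name:
# gdf = read_country("countries.gpkg", "CAN")
```

Restricting the accepted codes before formatting the clause is a design choice for this sketch; it keeps arbitrary text out of the `where` string.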

Parameters:

  • path_or_buffer (Path | str | bytes) –

    A dataset path or URI, or raw buffer.

  • layer (int | str | None, default: None ) –

    If an integer is provided, it corresponds to the index of the layer within the data source. If a string is provided, it must match the name of the layer in the data source. Defaults to the first layer in the data source.

  • encoding (str | None, default: None ) –

    If present, will be used as the encoding for reading string values from the data source, unless encoding can be inferred directly from the data source.

  • columns (Sequence[str] | None, default: None ) –

    List of column names to import from the data source. Column names must exactly match the names in the data source, and will be returned in the order they occur in the data source. To avoid reading any columns, pass an empty sequence.

  • read_geometry (bool, default: True ) –

    If True, will read geometry into WKB. If False, geometry will be None. Defaults to True.

  • force_2d (bool, default: False ) –

    If the geometry has Z values, setting this to True will cause those to be ignored and 2D geometries to be returned. Defaults to False.

  • skip_features (int, default: 0 ) –

    Number of features to skip from the beginning of the file before returning features. Must be less than the total number of features in the file.

  • max_features (int | None, default: None ) –

    Number of features to read from the file. Must be less than the total number of features in the file minus skip_features (if used).

  • where (str | None, default: None ) –

    Where clause to filter features in the layer by attribute values. Uses a restricted form of the SQL WHERE clause. For example:

    • "ISO_A3 = 'CAN'"
    • "POP_EST > 10000000 AND POP_EST < 100000000"
  • bbox (tuple[float, float, float, float] | None, default: None ) –

    If present, will be used to filter records whose geometry intersects this box. This must be in the same CRS as the dataset. If GEOS is present and used by GDAL, only geometries that intersect this bbox will be returned; if GEOS is not available or not used by GDAL, all geometries with bounding boxes that intersect this bbox will be returned.

  • fids (Sequence[int] | None, default: None ) –

    Array of integer feature ID (FID) values to select. Cannot be combined with other keywords to select a subset (skip_features, max_features, where or bbox). Note that the starting index is driver and file specific (e.g. typically 0 for Shapefile and 1 for GeoPackage, but can still depend on the specific file). The performance of reading a large number of features using FIDs is also driver specific.

  • sql (str | None, default: None ) –

    The SQL statement to execute. See the sql_dialect parameter for the syntax to use in the query. When combined with other keywords such as columns, skip_features, max_features, where or bbox, those are applied after the SQL query. Be aware that this can affect performance (e.g. filtering with the bbox keyword may not use spatial indexes). Cannot be combined with the layer or fids keywords.

  • sql_dialect (str | None, default: None ) –

    The SQL dialect the SQL statement is written in. Possible values:

    • None: if the data source natively supports SQL, its specific SQL dialect will be used by default (e.g. the SQLITE dialect for SQLite and GeoPackage, or the native dialect for PostgreSQL). If the data source doesn't natively support SQL, the OGRSQL dialect is the default.
    • OGRSQL: can be used on any data source. Performance can suffer when used on data sources with native support for SQL.
    • SQLITE: can be used on any data source. All SpatiaLite functions can be used. Performance can suffer on data sources with native support for SQL, except for GeoPackage and SQLite, as this is their native SQL dialect.
  • return_fids (bool, default: False ) –

    If True, will return the FIDs of the features that were read.
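
The restrictions on combining keywords stated in the `fids` and `sql` descriptions can be checked before calling `read_file`. The following is a sketch only: `check_selection` is a hypothetical helper, not part of polars_st, and it encodes just those two documented rules.

```python
def check_selection(**kwargs):
    """Return kwargs unchanged, or raise ValueError for a combination ruled out above."""
    # A keyword counts as "used" when it is not None and differs from its default.
    defaults = {"skip_features": 0}
    used = {k for k, v in kwargs.items() if v is not None and v != defaults.get(k)}

    # fids cannot be combined with the other subset-selection keywords.
    if "fids" in used:
        clash = used & {"skip_features", "max_features", "where", "bbox"}
        if clash:
            raise ValueError(f"fids cannot be combined with {sorted(clash)}")

    # sql cannot be combined with layer or fids.
    if "sql" in used:
        clash = used & {"layer", "fids"}
        if clash:
            raise ValueError(f"sql cannot be combined with {sorted(clash)}")

    return kwargs


# Allowed: an attribute filter together with a window of features.
ok = check_selection(where="POP_EST > 10000000", skip_features=0, max_features=50)

# Rejected: fids together with where.
try:
    check_selection(fids=[1, 2, 3], where="ISO_A3 = 'CAN'")
except ValueError as exc:
    message = str(exc)
```

The validated dictionary could then be forwarded as `st.read_file(path, **ok)`, assuming `polars_st` has been imported as `st`.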