pandas.read_html(io, match='.+', flavor=None, header=None, index_col=None, skiprows=None, attrs=None, parse_dates=False, thousands=',', encoding=None, decimal='.', converters=None, na_values=None, keep_default_na=True, displayed_only=True)
Read HTML tables into a list of DataFrame objects.
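As a minimal sketch of the list-valued return, the snippet below parses a small inline document containing two tables and then uses the match parameter to keep only the table whose text matches a pattern. The HTML and the variable names are made up for illustration, and the example assumes a parser library such as lxml is installed.

```python
from io import StringIO

import pandas as pd

# Illustrative document with two tables (not taken from the pandas docs).
html = """
<table>
  <tr><th>fruit</th><th>qty</th></tr>
  <tr><td>apple</td><td>3</td></tr>
</table>
<table>
  <tr><th>tool</th><th>qty</th></tr>
  <tr><td>hammer</td><td>1</td></tr>
</table>
"""

# Every <table> found becomes one DataFrame in the returned list.
tables = pd.read_html(StringIO(html))

# match keeps only tables containing text that matches the pattern.
fruit_only = pd.read_html(StringIO(html), match="fruit")
```

Wrapping the string in StringIO passes it as a file-like object, which the io parameter accepts.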
See also
read_csv()
Notes
Before using this function you should read the gotchas about the HTML parsing libraries.
Expect to do some cleanup after you call this function. For example, you might need to manually assign column names if the column names are converted to NaN when you pass the header=0 argument. We try to assume as little as possible about the structure of the table and push the idiosyncrasies of the HTML contained in the table to the user.
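One common cleanup step is assigning column names by hand. The sketch below assumes a table with no header cells at all, so the parsed frame gets default integer column labels that are then replaced. The HTML and the names "name" and "age" are hypothetical, and a parser library such as lxml is assumed to be installed.

```python
from io import StringIO

import pandas as pd

# Illustrative table with data cells only, so no header can be inferred.
html = """
<table>
  <tr><td>alice</td><td>30</td></tr>
  <tr><td>bob</td><td>25</td></tr>
</table>
"""

df = pd.read_html(StringIO(html))[0]

# Cleanup: replace the default integer labels with meaningful names.
df.columns = ["name", "age"]
```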
This function searches for <table> elements and, within each table, only for <tr> rows and the <th> and <td> elements inside each row. <td> stands for “table data”. This function attempts to properly handle colspan and rowspan attributes. If the table has a <thead> element, it is used to construct the header; otherwise the function attempts to find the header within the body (by putting rows with only <th> elements into the header).
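The header and colspan behavior described above can be sketched as follows. The first two tables differ only in whether the header row sits in a <thead> or is a leading row of <th> cells, and the third has a cell spanning two columns. All HTML and variable names are illustrative, and the example assumes pandas 0.21.0 or later with a parser library such as lxml installed.

```python
from io import StringIO

import pandas as pd

# Header given explicitly in <thead>.
html_thead = """
<table>
  <thead><tr><th>city</th><th>population</th></tr></thead>
  <tbody><tr><td>Oslo</td><td>700000</td></tr></tbody>
</table>
"""

# No <thead>: the leading row made only of <th> cells is used as the header.
html_th_row = """
<table>
  <tr><th>city</th><th>population</th></tr>
  <tr><td>Oslo</td><td>700000</td></tr>
</table>
"""

# A cell with colspan="2" is repeated across the columns it spans.
html_span = """
<table>
  <tr><th>a</th><th>b</th></tr>
  <tr><td colspan="2">x</td></tr>
</table>
"""

df_thead = pd.read_html(StringIO(html_thead))[0]
df_th_row = pd.read_html(StringIO(html_th_row))[0]
df_span = pd.read_html(StringIO(html_span))[0]
```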
New in version 0.21.0.
Similar to read_csv(), the header argument is applied after skiprows is applied.
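To illustrate the ordering, the sketch below uses a made-up table whose first row is a title line. With skiprows=1 that row is dropped first, so header=0 refers to the first remaining row. The HTML content is hypothetical, and a parser library such as lxml is assumed to be installed.

```python
from io import StringIO

import pandas as pd

# Illustrative table: a junk title row, then the real header, then data.
html = """
<table>
  <tr><td>Report generated 2019</td><td></td></tr>
  <tr><td>name</td><td>score</td></tr>
  <tr><td>alice</td><td>90</td></tr>
  <tr><td>bob</td><td>85</td></tr>
</table>
"""

# skiprows removes the title row; header=0 then picks the first row left.
df = pd.read_html(StringIO(html), skiprows=1, header=0)[0]
```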
This function will always return a list of DataFrame objects or it will fail, i.e., it will not return an empty list.
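Callers that prefer an empty list over an exception can wrap the call. The helper below, read_tables_or_empty, is a hypothetical name and not part of pandas; it relies on read_html raising ValueError when no table matches. flavor="lxml" is passed here so that only one parser is involved, which assumes lxml is installed.

```python
from io import StringIO

import pandas as pd


def read_tables_or_empty(html, **kwargs):
    """Hypothetical helper: return [] when read_html finds no matching table."""
    try:
        return pd.read_html(StringIO(html), **kwargs)
    except ValueError:
        # Raised by read_html when no table is found or matches.
        return []


# Illustrative input with a single one-row table.
table_html = "<table><tr><td>1</td><td>2</td></tr></table>"

found = read_tables_or_empty(table_html, flavor="lxml")
missing = read_tables_or_empty(table_html, match="no such text", flavor="lxml")
```

Catching ValueError this broadly also hides other input problems, so a caller may prefer to let the exception propagate.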
See the read_html documentation in the IO section of the docs for some examples of reading in HTML tables.
© 2008–2012, AQR Capital Management, LLC, Lambda Foundry, Inc. and PyData Development Team
Licensed under the 3-clause BSD License.
https://pandas.pydata.org/pandas-docs/version/0.25.0/reference/api/pandas.read_html.html