# Generate data from Elasticsearch

## Requirements

1. Elasticsearch 6 deployed and populated with data. A docker-compose file to deploy it locally is available [here](./utils/docker-compose.yml).

## Steps

1. (Optional) Create a Python 3 virtual environment and activate it.

```
python3 -m venv /tmp/babiaxr
source /tmp/babiaxr/bin/activate
```

2. Install requirements

```
pip install -r requirements.txt
```

3. Run the script with the following arguments:

```
python3 get_list.py --es-url <elasticsearch_url> --index <elasticsearch_index> --fields <list_fields_to_fetch> --date-field <field_that_define_the_date>
```

- Optional arguments and their default values:
    - `date-field`: Field used as the date in the query. Default value: `grimoire_creation_date`
    - `fields`: List of fields retrieved for each item. Default value: `["_score", "file_path", "blanks_per_loc", "ccn", "comments", "comments_per_loc", "loc", "loc_per_function", "num_funs", "tokens"]`
    - `output-file`: Output file path. Default value: `data.json`
    - `source-code`: Retrieve only the source code files
    - `date`: Retrieve the data for a specific date (in epoch milliseconds)
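
The arguments listed above could be declared with `argparse` roughly as follows. This is only an illustrative sketch, not the actual contents of `get_list.py`: the `--output-file` and `--date` spellings, the space-separated form of `--fields`, and the `build_parser` helper name are all assumptions.

```python
import argparse

# Default field list, copied from the documentation above.
DEFAULT_FIELDS = [
    "_score", "file_path", "blanks_per_loc", "ccn", "comments",
    "comments_per_loc", "loc", "loc_per_function", "num_funs", "tokens",
]


def build_parser():
    """Hypothetical parser mirroring the documented arguments."""
    parser = argparse.ArgumentParser(description="Generate data from Elasticsearch")
    parser.add_argument("--es-url", required=True, help="Elasticsearch URL")
    parser.add_argument("--index", required=True, help="Elasticsearch index")
    # Assumption: fields are passed as space-separated names.
    parser.add_argument("--fields", nargs="+", default=DEFAULT_FIELDS,
                        help="Fields retrieved for each item")
    parser.add_argument("--date-field", default="grimoire_creation_date",
                        help="Field used as the date in the query")
    parser.add_argument("--output-file", default="data.json",
                        help="Output file path")
    parser.add_argument("--source-code", action="store_true",
                        help="Retrieve only the source code files")
    # Assumption: the date is given as an integer number of epoch milliseconds.
    parser.add_argument("--date", type=int, default=None,
                        help="Specific date in epoch milliseconds")
    return parser


if __name__ == "__main__":
    # Parse an explicit argument list so the sketch runs without a real CLI call.
    args = build_parser().parse_args(
        ["--es-url", "http://localhost:9200", "--index", "perceval_enriched"]
    )
    print(args.date_field, args.output_file)
```

With a layout like this, any argument that is left out falls back to the default value documented above.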
    
## Examples

```
python3 get_list.py --es-url http://localhost:9200 --index perceval_enriched --source-code
```
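
To give a rough idea of how the `fields`, `date-field`, and `date` options might translate into an Elasticsearch search body, here is a minimal sketch. It does not reproduce the query that `get_list.py` actually sends: the `build_query` helper is hypothetical, and matching the date with a `range` clause is an assumption. Note that `_score` is hit metadata rather than a document field, so the sketch leaves it out of `_source`.

```python
import json


def build_query(fields, date_field, date=None):
    """Hypothetical helper: build a search body for the given options."""
    # Metadata such as "_score" comes with every hit; only document
    # fields belong in the "_source" filter.
    source_fields = [f for f in fields if not f.startswith("_")]

    if date is None:
        query = {"match_all": {}}
    else:
        # Assumption: "a specific date" means an exact match on the
        # date field, expressed as a closed range in epoch milliseconds.
        query = {
            "range": {
                date_field: {
                    "gte": date,
                    "lte": date,
                    "format": "epoch_millis",
                }
            }
        }

    return {"_source": source_fields, "query": query}


if __name__ == "__main__":
    body = build_query(
        ["_score", "file_path", "loc"],
        "grimoire_creation_date",
        date=1577836800000,
    )
    print(json.dumps(body, indent=2))
```

A body like this could be sent to the index's `_search` endpoint. The real script may filter and paginate differently.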
