# NSRR resources & scripts

NSRR common and study-specific resources and scripts, including NAP. This repository has the following structure.

```
 nap/           NSRR Automated Pipeline components

 common/        Generic NSRR-wide information and resources;
   /resources/  NAP signal & annotation mapping files
   /dict/       NAP data-dictionaries

 studies/       Study-specific resources & core scripts
                - e.g. as used to reformat annotation data prior to NAP
                - also any study-specific configurations or mappings
                - each subfolder is a study
```

This document contains a detailed description of the NAP pipeline, including examples of how to run it.

For a higher-level overview of what the NSRR harmonization process is
attempting to achieve, please
see [common/harm-principles.md](common/harm-principles.md).


__Table of Contents__

- [NAP preliminaries](#nap-preliminaries)
- [Running NAP](#running-nap)
- [NAP primary steps](#nap-primary-steps)
   - [Harmonizing annotations and channels](#1-harmonizing-annotations-and-channels)	      
   - [Determining whether valid staging exists](#2-determining-whether-valid-staging-exists)	      
   - [Generating channel-level summary statistics](#3-generating-channel-level-summary-statistics)      
   - [Generating the harmonized EDFs](#4-generating-the-harmonized-edfs)		      
   - [Generating the base EDFs](#5-generating-the-base-edfs)			      
   - [Detecting/masking epochs with gross artifact](#6-detecting-epochs-with-gross-artifact)	      
   - [SOAP/SUDS](#7-soapsuds)					      
   - [Estimating a set of core derived metrics](#8-estimating-a-set-of-core-derived-metrics)	      
   - [Flagging likely issues](#9-flagging-likely-issues)			      
   - [Job completion](#10-job-completion)				      
   - [NAP signal mapping](#nap-signal-mapping)				      
   - [NAP annotation mapping](#nap-annotation-mapping)                
- [Annotation mapping formatting notes](#annotation-mapping-formatting-notes)
- [Adding cohort/individual-specific mappings](#adding-cohort-individual-specific-mappings)
- [Adding new commands/domains to NAP](#adding-new-commands-domains-to-nap)
- [NAP run status](#nap-run-status)
- [Moonlight NAP viewer](#moonlight-viewer)
- [Collating information across individuals](#collating-information-across-individuals)
- [Configuration scripts](#configuration-scripts)
- [Misc/future development](#misc-future-development)
- [Known issues/questions](#known-issuesquestions)
- [Example: running NAP on ERIS](#example-running-nap-on-eris)

- [Appendix: harmonization principles](common/harm-principles.md)
---

## NAP preliminaries

Inputs for NAP are signals (expected as an [EDF](https://www.edfplus.info)) and annotations, which can be in several formats:

 - NSRR or [Luna XML](http://zzz.bwh.harvard.edu/luna/ref/annotations/#nsrr-xml-files)
 - Luna [`.annot`](http://zzz.bwh.harvard.edu/luna/ref/annotations/#annot-files)
 - Luna [`.eannot`](http://zzz.bwh.harvard.edu/luna/ref/annotations/#eannot-files)
 - Embedded in an EDF+ as an [Annotation channel](http://zzz.bwh.harvard.edu/luna/luna/args/#annotations)

Annotation data in any other format must be converted to one of the
above, preferably [`.annot`](http://zzz.bwh.harvard.edu/luna/ref/annotations/#annot-files),
before running NAP.

A further preliminary is that _body position_ information must be stored
as an _annotation_ rather than as a _signal_ in the EDF: because there are too many
non-standardized ways to encode body position as a signal, it is
outside the scope of NAP to automatically attempt to reverse-engineer this. See
notes below on harmonized annotation remappings for the preferred way to
encode body position as an annotation for NAP.

## Running NAP 

First, ensure you have the most recent version of the NSRR/NAP repository
on your system: if not, [see below](#example-running-nap-on-eris).  Let's assume it is
in your home directory (i.e. `~/nsrr/`).

Second, it is easiest if you have a working folder that has a Luna
[sample list](http://zzz.bwh.harvard.edu/luna/luna/args/#sample-lists)
called `s.lst` that points to the data (EDFs and annotations) you
wish to process.  Let's assume this folder is located at `/data/working/project1`.
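
If you do not yet have a sample list, the following sketch writes one for a single folder of recordings, as tab-delimited rows of ID, EDF and (optionally) annotation file. The function name `make_sample_list` and the naming convention it relies on (annotations stored as `ID.xml` next to `ID.edf`, lower-case extensions) are assumptions for illustration only; Luna's own `--build` option, described in the sample-list documentation linked above, handles more general layouts.

```bash
# Hypothetical helper: print a Luna-style sample list
# (ID <tab> EDF [<tab> annotation]) for every *.edf in one folder.
# Assumes annotations, if present, are named ID.xml in the same folder.
make_sample_list() {
  local dir=$1 edf id annot
  for edf in "$dir"/*.edf; do
    [ -e "$edf" ] || continue              # glob matched nothing
    id=$(basename "$edf" .edf)
    annot="$dir/$id.xml"
    if [ -e "$annot" ]; then
      printf '%s\t%s\t%s\n' "$id" "$edf" "$annot"
    else
      printf '%s\t%s\n' "$id" "$edf"        # EDF without an annotation file
    fi
  done
}
```

For example, `make_sample_list /data/edfs > /data/working/project1/s.lst` (where `/data/edfs` is a hypothetical EDF folder); passing an absolute folder path keeps the list valid from any working directory.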

We initiate the NAP pipeline via the `nap.sh` shell script, which is
in the repository, i.e. `~/nsrr/nap/nap.sh` in this example.  The
basic form takes two or more arguments: first, a label for the run (here
_run1_); second, the path to the folder that contains the sample list (`s.lst`):

```
bash ~/nsrr/nap/nap.sh run1 /data/working/project1
```

Note that the actual EDFs/annotations do not need to be in that same folder, i.e. the sample
list can point to other locations on the system.

By default, NAP will run and generate an output folder called `nap` in
the specified directory, i.e. `/data/working/project1/nap/`.

There are numerous configuration options that NAP takes, described in
the file [default.conf](https://gitlab-scm.partners.org/zzz-public/nsrr/-/blob/master/nap/default.conf)
(which is in the `nsrr/nap/` repository folder).

The most important options specify the resources used by NAP (`NAP_RESOURCE_DIR`) and,
if automated staging is to be performed, the location of the SUDS training data (`NAP_SUDS_DIR`).
For example, you can prepend these environment variables to the `nap.sh` call as follows:

```
NAP_RESOURCE_DIR=~/nsrr/common/resources/ \
 NAP_SUDS_DIR=resources/suds/ \
 bash ~/nsrr/nap/nap.sh run1 /data/working/project1
```

The project name `run1` is arbitrary and not really used currently; still, ensure
that it does not contain special characters or whitespace.

As described below, on ERIS, to submit jobs to the cluster, add `NAP_JOBN=10`
to get, for example, 10-fold parallelization. See the other configuration
parameters in `nsrr/nap/default.conf` for further
LSF/ERIS-specific options.

On completion, check job status across runs, e.g.:
```
cat nap/*/nap.status
```


### Compiling results across individuals

For a whole cohort, to compile key results across runs into a
folder `derived1` (this can be named anything, and does not need to be
located in the working folder):

```
cd /data/working/project1/
bash ~/nsrr/nap/compile.sh derived1 < ~/nsrr/nap/tables.txt
```

That is, you need to be in the main NAP project folder before running
this.

To convert the summary tables in `derived1/` to a format Luna/Shiny
can read, see `~/nsrr/nap/compile-coda-template.R`.  Edit and run as
needed to deposit `_pheno*RData` and `_derived*RData` files in `nap/`.


## NAP primary steps

Here we list the primary logic and flow of actions for a single EDF in NAP:

 - consistent with building a _generic_ pipeline (i.e. one that is capable
   of processing most typical PSGs), we build a single set of common
   mapping files (whilst allowing for study-specific deviations)

 - we create a new set of harmonized annotations, and two new EDFs
   (_harmonized_ and _base_ EDFs)

 - rather than attempt to map everything, the logic is to have a
   list of common annotations/channels and construct as much as we can
   from the original data; that is, new files are harmonized
   _subsets_ of the originals

 - we nonetheless report unmapped/unused channels and annotations
   (nb. this is why the original EDFs/annotations will always be
   distributed via NSRR too)


#### 1) Harmonizing annotations and channels

Following the rules described in sections below, NAP will first
generate the mapping for annotations and channels for the harmonized
(and base) EDF, as well as listing unmapped channels/annotations.

If the `NAP_HARMONIZE_ONLY` flag is set, NAP will stop at this point,
thereby providing a quick way to scan a whole cohort for adherence to
the NAP mapping terms.

#### 2) Determining whether valid staging exists

If continuing to process the sample, NAP next determines whether valid sleep stage annotations exist:

 - there must be at least one annotation file

 - those annotations must include terms that map to the canonical
   sleep stage terms, `N1`, `N2` etc

 - there must be sufficient _variability_ in sleep stages (i.e. not "all wake"), which
   is defined as at least two instances of wake, NREM and REM sleep
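
The variability rule can be read in more than one way; one reading is that wake, NREM and REM must each be present in at least two epochs. The sketch below checks that reading only. It assumes one harmonized stage label per epoch on standard input and the labels `W`, `N1`-`N3` and `R` (the REM label `R` is an assumption here); the function name is hypothetical and this is an illustration, not NAP's own code.

```bash
# Hypothetical check: exit 0 only if W, NREM (N1/N2/N3) and REM (assumed
# label: R) each occur in at least MIN epochs (default 2).
# Input: one stage label per epoch (first whitespace-delimited field).
has_valid_staging() {
  awk -v min="${1:-2}" '
    $1 == "W"                              { w++ }
    $1 == "N1" || $1 == "N2" || $1 == "N3" { nrem++ }
    $1 == "R"                              { rem++ }
    END { exit !(w + 0 >= min + 0 && nrem + 0 >= min + 0 && rem + 0 >= min + 0) }
  '
}
```

An "all wake" night fails this check, whichever reading of the rule is used.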


In addition, NAP will check here whether the sleep stage annotations
_align_ with the EDF epochs and EDF records.  For example, if staging
starts 2.86 seconds after the start of the recording, this will be
flagged: default epochs are 0-30, 30-60, etc., so staging that spans
2.86-32.86, 32.86-62.86, etc. is likely to lead to epochs
that have more than one stage assigned.  This is handled downstream,
when creating the harmonized EDF, by re-aligning the staging
annotations, i.e. by:

  - shifting them to the preceding second boundary (i.e. 2.00)

  - standardizing to a 1-second EDF record size

  - trimming any signal not spanned by an epoch at the start or end of
    the recording (i.e. 0.00 to 2.00 in this case)
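
The realignment arithmetic for the 2.86-second example can be sketched as follows. The function name is hypothetical; the sketch assumes 30-second epochs and a non-negative onset, and it computes the trimmed leading segment as the shifted onset modulo the epoch length. That reproduces the 0.00-2.00 trim above, but it is our generalization rather than a documented NAP rule.

```bash
# Hypothetical sketch of the realignment arithmetic for a staging onset T (s):
#   aligned: is T already on the default epoch grid (multiples of EPOCH s)?
#   shifted: T moved back to the preceding whole-second boundary
#   trim:    leading signal before the first full epoch on the shifted grid
align_staging() {
  awk -v t="$1" -v e="${2:-30}" 'BEGIN {
    s = int(t)                                   # preceding 1-second boundary
    aligned = (t % e == 0) ? "yes" : "no"
    printf "onset=%.2f aligned=%s shifted=%.2f trim=0.00-%.2f\n", t, aligned, s, s % e
  }'
}
```

For example, `align_staging 2.86` prints `onset=2.86 aligned=no shifted=2.00 trim=0.00-2.00`; the shift itself (here 0.86 s) is always under one second, as noted below.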

#### 3) Generating channel-level summary statistics

Based on the original EDF, NAP generates various statistical summaries
(e.g. MTM spectrograms and Hjorth parameters) that can be used in the
viewer to assess signal quality (epoch-by-epoch).
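
For reference, the three Hjorth parameters have standard definitions: _activity_ is the variance of the signal, _mobility_ is `sqrt(var(x') / var(x))`, and _complexity_ is the mobility of the first derivative divided by the mobility of the signal. NAP obtains these via Luna; the sketch below (function name hypothetical) only illustrates the definitions for one epoch of samples, one value per line, using first differences in place of derivatives.

```bash
# Hypothetical sketch: Hjorth parameters for one epoch (one sample per line).
#   activity   = var(x)
#   mobility   = sqrt(var(dx) / var(x))
#   complexity = mobility(dx) / mobility(x)
hjorth() {
  awk '
    # population variance of a[1..m]
    function pvar(a, m,    i, s, ss, mu) {
      for (i = 1; i <= m; i++) s += a[i]
      mu = s / m
      for (i = 1; i <= m; i++) ss += (a[i] - mu) ^ 2
      return ss / m
    }
    { x[NR] = $1 }
    END {
      n = NR
      if (n < 3) exit 1
      for (i = 2; i <= n; i++) d[i - 1] = x[i] - x[i - 1]    # first difference
      for (i = 2; i < n; i++) dd[i - 1] = d[i] - d[i - 1]    # second difference
      vx = pvar(x, n); vd = pvar(d, n - 1); vdd = pvar(dd, n - 2)
      if (vx == 0 || vd == 0) { print "flat"; exit }
      mob = sqrt(vd / vx)
      printf "activity=%.4f mobility=%.4f complexity=%.4f\n", vx, mob, sqrt(vdd / vd) / mob
    }
  '
}
```

For a pure sinusoid, complexity is close to 1, and it typically grows as the signal becomes less sine-like.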

#### 4) Generating the harmonized EDFs

NAP next generates the _harmonized_ EDF:

 - the EDF record size is forced to be 1 second, if it is not already
 - as above, any unaligned sleep stages are shifted to align with EDF record boundaries (this will always be <1 second difference)
 - channel labels are harmonized, and unmapped channels are dropped
 - annotation labels are harmonized, and unmapped annotations are dropped
 - all annotations are output as a single `harm.annot` file

At this stage, note that the channel relabeling also enforces
__re-referencing__ of the EEG, EMG, ECG and EOG as needed; e.g. we
define the harmonized files to consistently have `C4-M1`.

Although this deviates from the original conception of the harmonized
dataset as having only _cosmetic_ changes (in contrast to the base
EDF), in practice it is inconvenient to have to work with mixtures of
differently referenced channels.  For PSG, the contralateral mastoid is the most standard EEG
reference.  (i.e. if users want something different, that is what the
original dataset is for...)

NOTE: a decision point is whether to include linked-mastoid referencing on top of contralateral mastoid
referencing in the harmonized EDFs.


#### 5) Generating the base EDFs

The base EDF represents a further subsetting _and processing_ of the
harmonized EDF; the base EDF is primarily to support subsequent
processing and extraction of _derived metrics_, rather than to be the
primary approach for distributing data.  (That is, the harmonized set should be
the natural level for data sharing.)

Specifically, to make the base EDF we:

  - reduce to a set of core channels, with specific labels (e.g. `csC3`)
    to denote that these are processed versions of the original data;
    mapping is based on `nsrr/common/resources/base.canonical.sigs`

  - bandpass filter the EEG

  - resample to a fixed sample rate (e.g. 128 Hz for EEG)

  - scale to common units (e.g. uV for EEG, based on EDF headers)


TODO/DECISION POINT: whether to additionally remove epochs with gross
artifact from the base EDF (and adjust the annotations
correspondingly)

All subsequent analyses are based on the base EDF.


#### 6) Detecting epochs with gross artifact

Based on Hjorth parameters/RMS within signals, we flag epochs
(channel/epoch pairs) that are likely to be artifacts
(i.e. statistical outliers and grossly flat/clipped signals).
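
A simple version of this kind of outlier flagging is sketched below for a single channel. The input format (`epoch value` per line, where the value might be an epoch's RMS or a Hjorth parameter), the function name and the thresholds (exactly-zero values treated as flat, plus a mean plus-or-minus 3 SD rule) are all assumptions for illustration, not NAP's actual criteria.

```bash
# Hypothetical sketch: flag outlying epochs for one channel.
# Input lines: "<epoch> <value>". An epoch is flagged if its value is exactly
# 0 (flat) or lies more than K sample SDs (default 3) from the channel mean.
flag_outliers() {
  awk -v k="${1:-3}" '
    { e[NR] = $1; v[NR] = $2; s += $2; ss += $2 * $2 }
    END {
      n = NR
      if (n < 2) exit
      m = s / n
      var = (ss - n * m * m) / (n - 1)           # sample variance
      sd = (var > 0) ? sqrt(var) : 0
      for (i = 1; i <= n; i++) {
        dev = v[i] - m
        if (dev < 0) dev = -dev
        if (v[i] == 0 || (sd > 0 && dev > k * sd)) print e[i]
      }
    }
  '
}
```

Because a single extreme epoch inflates the SD, a rule like this needs a reasonable number of epochs (a full night gives hundreds) and is best treated as a first pass.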

DECISION POINT: it may make sense to simply remove these epochs from
the base EDF, but we'd need to make sure we correctly handle the implied
changes in annotations (e.g. when lining up manual and automated staging, etc)

#### 7) SOAP/SUDS

If the study has existing and valid sleep stages, we apply the
[SOAP](http://zzz.bwh.harvard.edu/luna/ref/suds/#soap) method, for
the assessment of the consistency of staging and signals (based only
on a single EEG).

Further, whether there was existing staging or not, if an EEG channel
is present, we
apply [SUDS](http://zzz.bwh.harvard.edu/luna/ref/suds/#suds) to
automatically predict sleep stages.  If there was no prior staging,
then the predicted stages are used to structure the analyses below
(e.g. NREM spindles).

NOTE: this is not yet fully integrated in NAP; the proposal is to use CFS as the
default library.


#### 8) Estimating a set of core derived metrics

Depending on the availability of channels, we calculate (from the base EDF):

 - power spectra for the cleaned data, in NREM and REM separately
 - sleep spindles and slow oscillations during NREM
 - respiratory metrics

These outputs are compiled (by the `coda` R scripts in `nsrr/nap/`)
and can be seen in the Moonlight viewer.  The text output format is
also designed to work with the [`dmerge`](http://zzz.bwh.harvard.edu/luna/merge/merge/)
output tool, to make it easy to compile datasets across all individuals
(i.e. using `dmerge` and the data dictionaries in `nsrr/common/dict`).

#### 9) Flagging likely issues

For certain issues, we can flag rather than attempt to fix:

  - absence of staging data / poor coverage, i.e. truncated staging
  - (as above) gross artifact (EEG only, by epoch and channel)
  - flag potential EEG polarity issues (negative / ambiguous channels)
  - flag likely ECG contamination (e.g. EEG/ECG coherence)
  - flag likely line noise / EDFs with peaks in the <35 Hz range
  - flag if the NREM spectral slope is grossly deviant from ~2 (e.g. < 1 or > 5)
  - flag if there is a likely units issue in the EEG (i.e. large values or DC offset)
  - flag if SOAP consistency is low
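
The spectral-slope flag can be illustrated with an ordinary least-squares fit of log power against log frequency, flagged when the slope's magnitude falls outside 1 to 5 (the example thresholds above). The function name and input format (`frequency power` per line, e.g. an NREM-averaged spectrum) are hypothetical, and the frequency range NAP fits over is not specified here. We also assume the thresholds refer to the slope's magnitude, since sleep EEG power usually falls with frequency and gives a negative slope.

```bash
# Hypothetical sketch: OLS slope of log(power) on log(frequency), with a flag
# if |slope| < 1 or |slope| > 5. Input lines: "<frequency> <power>".
spectral_slope() {
  awk '
    $1 > 0 && $2 > 0 {                   # logs need positive values
      x = log($1); y = log($2)
      n++; sx += x; sy += y; sxx += x * x; sxy += x * y
    }
    END {
      d = n * sxx - sx * sx
      if (n < 2 || d == 0) exit 1        # need at least two distinct frequencies
      b = (n * sxy - sx * sy) / d        # least-squares slope
      a = (b < 0) ? -b : b
      flag = (a < 1 || a > 5) ? "yes" : "no"
      printf "slope=%.2f flag=%s\n", b, flag
    }
  '
}
```

The log base does not matter here, because it rescales both axes equally and so leaves the slope unchanged.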

#### 10) Job completion

On closing, NAP will have populated `nap.log` and `nap.err` files that
can be viewed in the web viewer.

It will also create a file called `nap.status` that contains either a
`0` or `1`, as a flag for failure or success of the run.
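
Since `nap.status` holds `1` for success and `0` for failure, a short loop can summarize a whole cohort rather than printing every file. In this sketch (function name hypothetical) each subfolder of `nap/` is assumed to hold one individual, and a folder without a `nap.status` file is reported as missing, for example because its job has not finished.

```bash
# Hypothetical sketch: summarize per-individual nap.status files under ROOT.
# Prints one line per failed or missing individual, then overall counts.
summarize_status() {
  local root=${1:-nap} d id ok=0 fail=0 missing=0
  for d in "$root"/*/; do
    [ -d "$d" ] || continue                  # no subfolders at all
    id=$(basename "$d")
    if [ ! -f "$d/nap.status" ]; then
      missing=$((missing + 1)); echo "MISSING $id"
    elif [ "$(tr -d '[:space:]' < "$d/nap.status")" = 1 ]; then
      ok=$((ok + 1))
    else
      fail=$((fail + 1)); echo "FAILED  $id"
    fi
  done
  echo "ok=$ok failed=$fail missing=$missing"
}
```

Run it from the project folder, e.g. `summarize_status nap`.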


### NAP signal mapping

NAP uses Luna's
[`CANONICAL`](http://zzz.bwh.harvard.edu/luna/ref/manipulations/#canonical)
command and implied rules to map channel labels (note, rather than the
[signal `alias` conventions](http://zzz.bwh.harvard.edu/luna/luna/args/#aliases)).
This allows for cohort-dependent rules to be applied, and for unmapped
channels to be automatically dropped, etc.

Notes:

 - matching on signal labels is _case-insensitive_ in NAP & Luna

 - canonical forms, where appropriate, use capitals: e.g. `SpO2`, `C3-M2`  

 - for consistency with prior EDF specs, we adopt `-` rather than
   `_` to indicate referencing


There are two levels of channel mapping:

 1. From the original EDF to a harmonized EDF, based on
 [`nsrr/common/resources/harm.canonical.sigs`](https://gitlab-scm.partners.org/zzz-public/nsrr/-/blob/master/common/resources/harm.canonical.sigs). At
 this step, NAP saves a small version (i.e. 1 epoch) of the EDF with
 the mapped labels, dropping any unmapped channels

 2. From the harmonized EDF to the base EDF, using the mappings in
 [`nsrr/common/resources/base.canonical.sigs`](https://gitlab-scm.partners.org/zzz-public/nsrr/-/blob/master/common/resources/base.canonical.sigs).
 

### NAP annotation mapping

NAP uses Luna's [`remap` syntax](http://zzz.bwh.harvard.edu/luna/luna/args/#remapping-annotations)
to map annotations from original to
harmonized datasets.  (Base datasets have identical annotations
compared to the harmonized set, so there is no second step for
annotations.)

The primary mapping is specified in the file
[`nsrr/common/resources/harm.annots`](https://gitlab-scm.partners.org/zzz-public/nsrr/-/blob/master/common/resources/harm.annots).
The flow is as follows:

 - NAP first writes all annotations (from one or more annotation files
   associated with the EDF, and including any EDF+ Annotation Channels) to
   a single `a.annot` file (in the `annots/` NAP folder for
   that individual), but without any further changes.  This file simply collates all annotations to
   facilitate the subsequent mapping step.

 - NAP then maps the annotations to a new file `annots/harm.annot`,
   applying mapping rules as specified in the mapping file
   [`nsrr/common/resources/harm.annots`](https://gitlab-scm.partners.org/zzz-public/nsrr/-/blob/master/common/resources/harm.annots)

 - NAP tracks any remapping performed; and also lists any unmapped
   annotations. The latter are excluded from the new, harmonized
   `annots/harm.annot`.

If the `NAP_HARMONIZE_ONLY` configuration parameter is set to `1`, at
this point NAP will stop processing for this individual (i.e. only
having mapped channels and annotations).  This can be useful to
quickly scan a large number of EDFs for the adequacy of the
signal/annotation mapping files, prior to the final NAP run.


## Annotation mapping formatting notes

Annotation remapping uses a special form of Luna's [class/instance
conventions](http://zzz.bwh.harvard.edu/luna/ref/annotations/#luna-annotations),
denoted by the `/` character:

```
remap   arousal|"Arousal ()"
remap   arousal|"Arousal|Arousal ()"
remap   arousal|"Arousal|Arousal (Standard)"

remap   arousal/spontaneous|"Spontaneous arousal|Arousal (apon aro)"
remap   arousal/spontaneous|"Spontaneous arousal|Arousal (ARO SPONT)"
remap   arousal/spontaneous|"Spontaneous arousal|Arousal (SPON ARO)"

remap   arousal/RERA|"Arousal resulting from respiratory effort|Arousal (ARO RES)"
remap   arousal/RERA|RERA
remap   arousal/RERA|"Arousal (ARO RES)"
remap   arousal/RERA|"Arousal resulting from respiratory effort|Arousal (RESP ARO)"
remap   arousal/RERA|"Respiratory effort related arousal|RERA"
```

That is, Luna's annotation format defines events by two IDs (as well as a corresponding time interval, and optionally channel label(s)).
The above form `arousal/RERA` is equivalent to setting `arousal` as the annotation _class ID_, and `RERA` as the annotation _instance ID_.
For example, applying this mapping gives the same result as starting from an annotation file of the following form:

```
class    instance    channel    start   stop    metadata
arousal  RERA        .          222.3   224.2   .
```
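
The class/instance split can be illustrated with plain shell parameter expansion. The sketch below (function name hypothetical) splits a remapped label at its first `/` and also prints the single-term form joined with `_`. Using `.` for a missing instance follows the missing-value convention in the example above, but is an assumption for class-only labels.

```bash
# Hypothetical sketch: split a remapped label into class and instance IDs,
# and print the concatenated single-term form (e.g. arousal_RERA).
split_label() {
  local label=$1 class inst
  class=${label%%/*}                 # text before the first "/"
  if [ "$class" = "$label" ]; then
    inst=.                           # no "/": class-only label
  else
    inst=${label#*/}                 # text after the first "/"
  fi
  if [ "$inst" = . ]; then
    printf '%s\t%s\t%s\n' "$class" "$inst" "$class"
  else
    printf '%s\t%s\t%s\n' "$class" "$inst" "${class}_${inst}"
  fi
}
```

For example, `split_label arousal/RERA` prints `arousal`, `RERA` and `arousal_RERA`, tab-separated.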

The use of class/instance labels in this way has several advantages:

 - it logically groups related sets of entities (e.g. different types of arousal)

 - it makes it easier to refer to the entire set, e.g. `MASK if=arousal`

 - it is still easy to pull out specific subsets, e.g. `MASK if=arousal[RERA]`

 - there is an option in Luna to simply concatenate labels back to a single term if needed (`arousal_RERA`)

 - any existing _instance ID_ information becomes part of the meta-data (`_inst=X`)

Annotation mapping is case-insensitive, i.e. above, `rera` would still
map to `arousal/RERA`.  (Note, however, that for downstream use Luna still
treats labels as case-sensitive.)
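
To make this matching behavior concrete, here is a sketch that applies `remap` rules to a list of labels case-insensitively and reports anything left unmapped. It is not how Luna implements remapping. Its assumptions: each rule is a line starting with `remap`; the first `|`-delimited field is the harmonized term; double quotes protect aliases that themselves contain `|` (as in `"Arousal|Arousal ()"` above); a label already equal to a harmonized term maps to itself; and all other lines in the rules file (e.g. conditional inclusions) are ignored. The function name is hypothetical.

```bash
# Hypothetical sketch: map labels (one per line on stdin) using the "remap"
# rules in RULES_FILE; prints "label<TAB>harmonized term" or "label<TAB>UNMAPPED".
remap_labels() {
  awk -v rules="$1" -v OFS='\t' '
    # Parse one remap line into map[lower-case alias] = harmonized term.
    function add_rule(line,    n, i, c, tok, inq, f) {
      sub(/^[ \t]*remap[ \t]+/, "", line)
      sub(/[ \t]+$/, "", line)
      n = 0; tok = ""; inq = 0
      for (i = 1; i <= length(line); i++) {
        c = substr(line, i, 1)
        if (c == "\"") { inq = !inq; continue }        # quotes protect "|"
        if (c == "|" && !inq) { f[++n] = tok; tok = ""; continue }
        tok = tok c
      }
      f[++n] = tok
      map[tolower(f[1])] = f[1]                        # assumed: term maps to itself
      for (i = 2; i <= n; i++) map[tolower(f[i])] = f[1]
    }
    BEGIN {
      while ((getline line < rules) > 0)
        if (line ~ /^[ \t]*remap[ \t]/) add_rule(line)
      close(rules)
    }
    {
      key = tolower($0)
      print $0, ((key in map) ? map[key] : "UNMAPPED")
    }
  '
}
```

For example, `remap_labels ~/nsrr/common/resources/harm.annots < labels.txt`, where the hypothetical `labels.txt` holds one original annotation label per line.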

Below are some misc. notes on the annotation remapping process as it stands:

 - we've added two Luna special variables: `annot-whitelist=T` and `annot-unmapped=T` to only include
   annotations that match a remapping file, or don't, respectively.

 - the `SignalLocation` tag from NSRR XML is now parsed and included as the channel column in the `.annot`

 - meta-data from an NSRR XML (e.g. `SpO2Nadir`) are automatically added as string meta-data in the `.annot` file

 - `.annot` files now allow arbitrary key/value pairs, e.g.
   ```
   desat    .    .    10    20    SpO2Nadir=92|SpO2Baseline=89
   ```
   (In Luna, undefined meta-data are treated as strings; if `SpO2Nadir[num]` were in the `.annot` header, it would be a numeric value.) In
   general, we need to compile a list of NSRR XML terms: i.e. what else other than `SpO2Nadir` and `SpO2Baseline`
   (which are meta-data associated with desat events)?  We can then have NAP specify the types of these metadata.

 - any included file (`@remap.txt`) in Luna now allows conditional
   inclusions (see below for `+group` and `-group` modifiers); this
   facilitates cohort-specific annotation remapping rules.
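
Key/value meta-data of this form are easy to pull out with standard tools. The sketch below (function name hypothetical) assumes whitespace-delimited `.annot` rows in the six-column layout shown earlier (class, instance, channel, start, stop, meta-data), with `|`-separated `key=value` pairs and no spaces inside the meta-data field, and prints the start, stop and value for one key.

```bash
# Hypothetical sketch: print "start<TAB>stop<TAB>value" for every .annot row
# whose meta-data field (column 6) contains KEY=value.
get_meta() {
  awk -v key="$1" '
    NF >= 6 {
      n = split($6, kv, "|")
      for (i = 1; i <= n; i++) {
        eq = index(kv[i], "=")
        if (eq > 0 && substr(kv[i], 1, eq - 1) == key)
          print $4 "\t" $5 "\t" substr(kv[i], eq + 1)
      }
    }
  '
}
```

For the `desat` row above, `get_meta SpO2Nadir` prints `10`, `20` and `92`, tab-separated; the header row has no `key=value` pairs and is skipped.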


### Cohort/individual-specific mappings

As stated above, the goal of NAP is to apply a single, generic set of
procedures that are designed to work with typical PSG/sleep studies (as currently
deposited in NSRR).

If it is necessary to add new rules, or amend existing ones, e.g. due
to conflicts or elements in the input data that are not sufficiently
covered by the existing NAP terms, we should then do one of the
following, as outlined
[here](https://gitlab-scm.partners.org/zzz-public/nsrr/-/blob/master/common/unmapped.md):

 - update the core mapping files to include these terms

 - add a cohort-specific exception for this study to enable these terms

 - ignore the unmapped terms


If the change is likely to be of general utility and does not
obviously conflict with an existing rule, it makes sense to add it to
the core, common resource files (i.e. in `nsrr/common/resources/`) so
that this change can also be applied to other new studies (i.e. by
modifying the core files `harm.canonical.sigs`,
`base.canonical.sigs` and/or `harm.annots`).


Alternatively, one can specify additional, bespoke mapping files for
the harmonization of either signals or annotations. (Note: it is not
possible to modify the mapping from harmonized signals to base signals;
in theory, if the harmonization step works as expected, this should
not be necessary.)

As an example, here we will add a cohort-specific exception to handle the case
where the label `EEG2` maps to different things in SHHS (where it implies `C3-M2`)
versus MESA (where it implies `Cz-Oz`).  To handle the MESA case, one might create
a MESA-specific mapping file (e.g. `specials/mesa.sigs`) with the following definition:

```
MESA  Cz-Oz  EEG2
```

As detailed [here](http://zzz.bwh.harvard.edu/luna/ref/manipulations/#canonical), the first three fields
of the canonical file are: _group_, _canonical label_ and _primary signal_.  This simply swaps the label to `Cz-Oz` from `EEG2`.
First, note that, unlike the generic signal definitions (in the `nsrr/common/resources/` folder), which have a _group_ defined
as a period (`.`), meaning that the rule applies to all studies, here we have a non-missing group name of `MESA`.  This is compared
to the NAP configuration variable `NAP_CANONICAL_GROUP`, i.e. the above rule will only be applied if this is set to `MESA`.  Second,
note that when we call NAP, we also need to tell it that we have an _additional_ signal mapping file, by way of the `NAP_SIGS` option,
i.e. if we were using NAP to process MESA data:

```
NAP_RESOURCE_DIR=~/nsrr/common/resources \
 NAP_CANONICAL_GROUP=MESA \
 NAP_SIGS=specials/mesa.sigs \
 NAP_HARMONIZE_ONLY=1 \
 bash ~/nsrr/nap/nap.sh run1 /data/working/project1
```

NAP effectively runs the Luna `CANONICAL` command in the form:
```
CANONICAL file=${NAP_SIGS},${NAP_HARM_CANONICAL_SIGS} group=${NAP_CANONICAL_GROUP}
```

That is, when the `NAP_CANONICAL_GROUP` and `NAP_SIGS` options are given, NAP will _first_
apply the rules in `NAP_SIGS` (for the group set by
`NAP_CANONICAL_GROUP`) _before_ applying the generic rules in
`NAP_HARM_CANONICAL_SIGS` (i.e. by default
`~/nsrr/common/resources/harm.canonical.sigs`).

Importantly, when running the `CANONICAL` command, if Luna encounters
a group-specific rule that matches the currently specified group, then
no further generic rules will be applied for that canonical signal
(i.e. `Cz-Oz`), _whether or not the group-specific rule itself produced
a match_.  In this way, adding a cohort-specific canonical signal
definition file effectively overrides _any_ subsequent rule (for that
canonical signal) in the generic set.

For annotations, one can similarly add further `remap` statements
by setting the `NAP_ANNOTS` configuration option. In this example, we
point to additional annotation rules in the user-defined file
`specials/wsc.annots`:

```
NAP_RESOURCE_DIR=/path/to/nsrr/common/resources \
 NAP_ANNOTS=specials/wsc.annots \
 NAP_HARMONIZE_ONLY=1 \
 bash nsrr/nap/nap.sh run1 /data/working/project1/
```

In this example, the `specials/wsc.annots` file contains the following
annotation remapping rules (e.g. mapping `1` to `N1`, etc.):
```
remap	W|0
remap	N1|1
remap	N2|2
remap	N3|3|4
remap	R|5
remap	?|7
remap	apnea/obstructive|Obs_Apnea
remap	artifact/movement|LMA
remap	artifact/SpO2|SaO2
```

Note: in this particular example, there is probably no reason not to add these rules
to the generic `nsrr/common/resources/harm.annots` file instead.

The effect of the `NAP_ANNOTS` option is that NAP calls Luna in the
following form:

```
luna s.lst id=id001 ${NAP_LUNA_ARGS} @${NAP_ANNOTS} @${NAP_HARM_CANONICAL_ANNOTS} ...
```

The order of these arguments is significant: the mappings in `NAP_ANNOTS` are attempted
before those in `NAP_HARM_CANONICAL_ANNOTS` (the defaults).

Instead of using an additional file (specified via `NAP_ANNOTS`), an
alternative is to add the annotation mappings to the common, generic
file, but to use a special syntax that denotes these are
cohort-specific rules.  For example, suppose that in one cohort
`0` means `U` (unknown stage), but in a second cohort it means `W`
(wake).  Here we can use the `+group` syntax that is available for all Luna
`@`-included files, together with
Luna's [annotation remapping syntax](http://zzz.bwh.harvard.edu/luna/luna/args/#remapping-annotations):

```
+grp1
remap   U|0
+grp1

-grp1
remap   W|0
-grp1
```

When processing this file, Luna will test whether the variable
`grp1` has been set (nb. this is an arbitrary label: any valid
variable identifier could be used instead of `grp1`).

Mappings between two `+grp1` lines are only executed if `grp1` has been
set to _true_.  In contrast, mappings between two `-grp1` lines are
only executed if `grp1` has _not_ been set to true (i.e. if it was not
set at all, or was explicitly set to _false_).

That is, `grp1` is set to true when Luna is called in the form:

```
luna s.lst grp1=T @mapping.file ...
```

and it is left unset (or false) either by omitting `grp1` altogether, or by calling Luna in the form:

```
luna s.lst grp1=F @mapping.file ...
```
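
To make the `+grp1`/`-grp1` rules above concrete, here is a toy emulation in `awk`. It only illustrates the behaviour as described in this section and is not Luna's actual parser. The function name `apply_conditional_blocks` is hypothetical, and the sketch assumes that only the value `T` counts as _true_ and that marker lines contain nothing but `+grp1` or `-grp1`.

```shell
# Toy emulation (hypothetical helper, not part of Luna or NAP) of the
# +var / -var conditional-inclusion rules described above.
#   $1 : variable name, e.g. grp1
#   $2 : value given on the command line (only "T" is treated as true here;
#        an empty value means the variable was not set at all)
#   $3 : mapping file to filter
apply_conditional_blocks() {
  local var="$1" val="$2" file="$3"
  awk -v var="$var" -v val="$val" '
    BEGIN { is_true = (val == "T") }
    # marker lines toggle a block on/off and are never printed
    $0 == ("+" var) { in_plus  = !in_plus;  next }
    $0 == ("-" var) { in_minus = !in_minus; next }
    # "+var" blocks are kept only if var is true
    in_plus  && !is_true { next }
    # "-var" blocks are kept only if var is not true
    in_minus &&  is_true { next }
    { print }
  ' "$file"
}
```

Run against the example file above, `apply_conditional_blocks grp1 T mapping.file` keeps only the `U|0` rule, while `grp1 F` (or an empty value) keeps only the `W|0` rule.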

In the context of NAP, such group specifiers can be passed
in via the `NAP_LUNA_ARGS` configuration option.  For instance, if the core annotation file
did in fact contain conditional groups defined by the variable `wsc`,
then the following would ensure those cohort-specific annotation rules are applied:

```
NAP_RESOURCE_DIR=/path/to/nsrr/common/resources \
 NAP_LUNA_ARGS="wsc=T" \
 NAP_HARMONIZE_ONLY=1 \
 bash ~/nsrr/nap/nap.sh run1 /data/working/project1/
```

That is, in this last example, rather than parsing a second definition
file via `NAP_ANNOTS`, we have selectively applied cohort-specific
rules from the generic file instead, via the presence or absence of
Luna variables.


This concludes the description of signal and annotation
cohort-specific mappings.  Note that the forms of each vary, because
of how the underlying Luna commands operate.  In summary:

 - cohort-specific channel mappings use `NAP_SIGS` and set
   `NAP_CANONICAL_GROUP`

 - for cohort-specific annotation mapping, either use `NAP_ANNOTS` to
   point to an additional set of rules, ...

 - ... or, use the conditional-inclusion feature to put
   cohort-specific rules within a 'generic' annotation mapping file,
   and use `NAP_LUNA_ARGS` to pass in a flag to activate those rules
   as needed



## Adding new commands/domains to the NAP script

To incorporate new domains/analyses in NAP, it is necessary to adhere
to the following specifications:

 - create a section in `nsrr/nap/nap1.sh` to call the new function

 - add configuration variables (in `nsrr/nap/default.config`) to point to any executables and resources; do not hard-code values within the `nap1.sh` script

 - use the `NAP_CONFIG_SOURCE` value to run any necessary set-up code (see `nsrr/nap/napn.sh`, e.g. to attach `matlab` on ERIS) 
 
 - use `nap/${id}/harm.lst` or `nap/${id}/base.lst` for harmonized and base EDF/annotation locations respectively

 - send a non-zero return code to trigger an exit of the NAP script, i.e. if a fatal problem was encountered

 - write log and verbose output information to `$LOG` and `$ERR`, as defined in `nsrr/nap/nap1.sh`

 - write all output back to `nap/${id}/` using data formats consistent with `dmerge` [see here](http://zzz.bwh.harvard.edu/luna/merge/merge/)

 - use a file naming convention consistent with `dmerge` [see here](http://zzz.bwh.harvard.edu/luna/merge/merge/#file-naming-conventions)

 - create data dictionary entries in `nsrr/common/dict/` [see here](http://zzz.bwh.harvard.edu/luna/merge/merge/#data-dictionary-format)
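
As a rough illustration of how several of these conventions fit together, here is a hypothetical skeleton for a new domain. It is not existing NAP code: the function name `nap_mydomain`, the output file name and the `N_RECORDS` column are placeholders, and it assumes that the individual's ID is passed as the first argument and that `LOG` and `ERR` (defined in `nsrr/nap/nap1.sh`) are visible to it.

```shell
# Hypothetical skeleton for a new NAP domain (not part of the NAP codebase).
#   $1 : individual ID, as used for the nap/${id}/ folder
# Uses LOG and ERR if set; otherwise falls back to /dev/null and stderr.
nap_mydomain() {
  local id="$1"
  local outdir="nap/${id}"
  local lst="${outdir}/harm.lst"            # harmonized EDF/annotation list
  local log="${LOG:-/dev/null}" err="${ERR:-/dev/stderr}"

  echo "mydomain: processing ${id}" >> "${log}"

  # fatal problem: report it and return non-zero so NAP can exit
  if [[ ! -s "${lst}" ]]; then
    echo "mydomain: cannot find ${lst}" >> "${err}"
    return 1
  fi

  # ... the real analysis would go here, with executables and resources
  # taken from configuration variables rather than hard-coded paths ...

  # tab-delimited output with ID as the first column (placeholder content)
  printf 'ID\tN_RECORDS\n%s\t%s\n' "${id}" "$(awk 'END { print NR }' "${lst}")" \
    > "${outdir}/mydomain_SUMMARY.txt"      # hypothetical file name
  return 0
}
```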


## Moonlight Viewer

After all the above steps, NAP calls the R script `coda2.R` to prepare outputs for the NAP viewer, _Moonlight_. 

_To be added: a section describing how to point Moonlight to a set of NAP results._

## NAP run status

After NAP completes, in addition to the `nap.log` and `nap.err` files, it writes a `nap.status` file to each
individual's output folder.  The `nap.status` file is a single line with two tab-delimited columns: the ID of the individual,
and `0` or `1` for failed or successful completion, respectively.

## Collating information across individuals

In general, after a NAP run, the regular file naming structure allows
sample- or cohort-level summaries to be compiled easily.  For example,
assuming the project folder is the current directory, to obtain a single
file that indicates whether each job failed or not:

```
cat nap/*/nap.status > jobs.txt
```

To get all unmapped annotations:
```
cat nap/*/luna_core_ANNOTS_ANNOT_A.txt | grep UNMAPPED
```
etc.
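
Building on the `jobs.txt` example, a small helper can list the individuals whose jobs failed. This sketch relies only on the `nap.status` format described above (ID, a tab, then `0` or `1`); the function name `nap_failed_ids` is hypothetical.

```shell
# Hypothetical helper: print the IDs whose nap.status reports failure (0),
# and write a one-line tally of successes/failures to stderr.
#   $1 : NAP output root (default: nap)
nap_failed_ids() {
  local root="${1:-nap}"
  cat "${root}"/*/nap.status \
    | awk -F'\t' '
        $2 == 0 { print $1; failed++ }
        $2 == 1 { ok++ }
        END { printf "%d succeeded, %d failed\n", ok, failed > "/dev/stderr" }'
}
```

For example, `nap_failed_ids nap > failed.txt` leaves one failed ID per line in `failed.txt`, which could be used to decide which individuals to re-run.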

The rest of this section describes how to compile multi-individual
tables across different NAP runs, for two main ends:

 - to compile _Metrics_ that will be displayed via _Moonlight_

 - to compile analysis-ready datasets via `dmerge` (and create corresponding data dictionaries)

### NAP/Moonlight Metrics

This step involves three NAP files (i.e. in this repository):

 - `nsrr/nap/compile.sh` and `nsrr/nap/tables.txt` to compile tab-delimited text files across multiple per-individual NAP outputs

 - `nsrr/nap/compile-coda-template.R` to build, from these text tables, RData files that Moonlight can read


Step one: to create a single tab-delimited table that concatenates the
individual-level NAP-derived measures, use the `nsrr/nap/compile.sh`
script.  It expects to be run in the main project folder, i.e. the
folder containing `nap/id001`, `nap/id002`, etc.  It takes a single
argument: the name of the output folder it will create.  The script
also expects a list of file names via standard input, corresponding
to files in the `nap/*/` folders, i.e. in each individual's
NAP-created output folder.  For example, if in `/data/working/project1/`:

```
bash ~/nsrr/nap/compile.sh derived1 < ~/nsrr/nap/tables.txt
```

Here, the `tables.txt` may contain the following, for example:
```
luna_spec_PSD_F_CH_SS-N2.txt
luna_spec_PSD_F_CH_SS-N3.txt
```

If `tables.txt` contained only these two entries, running `compile.sh`
(after NAP has completed for all individuals) would generate two new
files in the folder `derived1`, called `luna_spec_PSD_F_CH_SS-N2.txt`
and `luna_spec_PSD_F_CH_SS-N3.txt` respectively.

These are simply the concatenation of all individual files of the same name.  As the first
column of all NAP-derived files is always `ID`, every row can be
attributed to a specific individual.
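
The concatenation step can be sketched as follows. This is not the actual `compile.sh`: it is a hypothetical helper (`concat_nap_table`) that assumes each per-individual file begins with a single header row, which it keeps only once.

```shell
# Hypothetical sketch (not the real compile.sh): concatenate one table across
# all nap/*/ folders, keeping the header row from the first file only.
#   $1 : output folder (created if needed)
#   $2 : table file name, e.g. luna_spec_PSD_F_CH_SS-N2.txt
concat_nap_table() {
  local outdir="$1" table="$2"
  mkdir -p "${outdir}"
  # FNR==1 marks each file's header; skip it for every file except the first
  awk 'FNR == 1 && NR != 1 { next } { print }' nap/*/"${table}" \
    > "${outdir}/${table}"
}

# e.g. to process every name listed in tables.txt (one per line):
# while read -r t; do concat_nap_table derived1 "$t"; done < ~/nsrr/nap/tables.txt
```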

The current NAP list of tables is defined in
[`nsrr/nap/tables.txt`](https://gitlab-scm.partners.org/zzz-public/nsrr/-/blob/master/nap/tables.txt).

The second step is to use the template R code in
`nsrr/nap/compile-coda-template.R` to create RData files in the
format that Moonlight will automatically use to populate the
Phenotypes and/or Metrics tabs.  This template file should be copied
and manually edited to suit a particular project.  It generates
RData files in `nap` (e.g. `nap/_pheno-p1.RData`,
`nap/_derived-MICRO.RData`, etc.).  The format of these R objects is
described in the `compile-coda-template.R` file itself.


### dmerge

_< to be added >_


## Configuration scripts

It is possible to pass generic configuration code to NAP via the `NAP_CONFIG_SOURCE` configuration flag. NAP will execute the
(single) script named in `NAP_CONFIG_SOURCE` at the start of each `napn.sh` job.  This can be useful if, for instance, one needs to
specify the version of Matlab to be used in an LSF scenario.

For example, consider we have the following bash script `load_matlab.sh`:


```
#!/bin/bash

# --------------------------------------------------------------------------------
#
# ERIS-specific configuration code
#
# --------------------------------------------------------------------------------

# ERIS-required version of Matlab
ERIS_MATLAB_VERSION="matlab/2019b"

# check whether this is available via LSF 'module load'
module_exists=`module avail ${ERIS_MATLAB_VERSION} 2>&1 | wc -l`

if [[ ! ${module_exists} -eq 0 ]]; then
    module load ${ERIS_MATLAB_VERSION}
else
    echo "cannot find ${ERIS_MATLAB_VERSION}... exiting"
    exit 1
fi
```

To ensure that the appropriate set-up is performed (i.e. basically just calling `module load` at the start of each new NAP batch job), one
would then run NAP with the `NAP_CONFIG_SOURCE` option set:

```
NAP_RESOURCE_DIR=~/nsrr/common/resources \
 NAP_CONFIG_SOURCE=load_matlab.sh \
 bash ~/nsrr/nap/nap.sh run1 /data/working/project1/
```


## Misc/future devel

#### Individual-level variables

It is possible to add individual-level variables via a NAP configuration option of the form:

```
NAP_INDIV_METADATA=var.dat
```

i.e. for key analyses, these variables can then be attached when calling Luna, as in `luna s.lst id=XYZ vars=var.dat`.


## Known issues/questions

The following space is for notes and other misc. items (e.g. TODOs)

- TODO: determine which tags are used in existing NSRR XMLs (e.g. similar to `SpO2Nadir`)

- TODO: minor: allow annotations that have `...` stop values (i.e. continue up until the next annotation _in that file_) to have a
 greater level of specificity, and allow an _up until the next instance of that class_ end rule.  For now, one can achieve this simply
 by ensuring that annotations are in different files, if they would otherwise conflict with a change rule, e.g.:
 ```
 N2    .   1300   ...
 apnea .   1312.2  1323.8
 N3    .   1330   ...
 ```
  


## Example: running NAP on ERIS

The following steps give an example of running NAP on the NSRR tutorial data.

### Ensure the latest `nsrr` repo is available

To clone this in your home directory, for example: 

```
cd ~
git clone https://gitlab.partners.org/zzz-public/nsrr.git
```

If outside the BWH VPN, use `https://gitlab-scm.partners.org/zzz-public/nsrr.git` instead.

To update a previously cloned repository:
```
cd ~/nsrr
git pull
```

### Locate/obtain the data to process

In this example, we will use the NSRR tutorial data:

```
mkdir -p ~/tmp/nap/data
cd ~/tmp/nap/data
```

Obtain and unzip the `tutorial.zip` data: e.g. using `wget` (or `curl -O` if `wget` is not available, or
simply download from the browser):
```
wget http://zzz.bwh.harvard.edu/dist/luna/tutorial.zip
unzip tutorial.zip
```

### Make a working directory for all NAP output

In general, the EDF/annotation files are assumed to reside in a different
folder from the one where the NAP pipeline is deployed.  So, here we will create a
separate 'working' folder, e.g.:

```
mkdir ~/tmp/nap/working
cd ~/tmp/nap/working
```

### Build a sample list of files to be processed

Use Luna to build a sample list named `s.lst`:

```
luna --build ~/tmp/nap/data/edfs/ -ext=-profusion.xml > s.lst 
```

### Identify the main resources to be used

The primary resources that NAP will use are tabulated here:

| Resource type | NAP variable | Example | BWH/ERIS location |
| --- | --- | --- | --- |
| NAP resources (this repo) | `NAP_RESOURCE_DIR` | `~/nsrr/common/resources/` | . | 
| SUDS training dataset | `NAP_SUDS_DIR` | `/data/scratch/suds/db/` | `/data/purcell/scratch/nap/db` |
| Executable folder | `NAP_EXE_DIR` | `/usr/local/bin` | `/data/nsrr/bin/` |

We can obtain a _beta-version_ of the SUDS training data from this URL (here downloaded using `curl`):

```
mkdir ~/suds
cd ~/suds
curl -O http://zzz.bwh.harvard.edu/dist/luna/suds.tar.gz
tar -xzvf suds.tar.gz
```

This will create a folder `~/suds/db25` that contains the training data (i.e. this is what `NAP_SUDS_DIR` should be set to).
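
Before launching NAP, it can help to confirm that the pieces above are in place. The following is a hypothetical pre-flight helper (`nap_preflight` is not part of NAP); it assumes the binaries are found via your `PATH`, and it only checks that `NAP_RESOURCE_DIR` and `NAP_SUDS_DIR` point at existing folders.

```shell
# Hypothetical pre-flight check (not part of NAP): confirm that required
# executables are on the PATH and that the resource/SUDS folders exist.
#   arguments: commands to look for (default: luna destrat fixrows)
# Returns 0 if everything was found, non-zero otherwise.
nap_preflight() {
  local -a cmds=("$@")
  local problems=0 c d
  [[ ${#cmds[@]} -eq 0 ]] && cmds=(luna destrat fixrows)

  for c in "${cmds[@]}"; do
    if ! command -v "$c" > /dev/null 2>&1; then
      echo "missing executable: $c" >&2
      problems=$((problems + 1))
    fi
  done

  for d in "${NAP_RESOURCE_DIR}" "${NAP_SUDS_DIR}"; do
    if [[ -z "$d" || ! -d "$d" ]]; then
      echo "missing folder: '${d}'" >&2
      problems=$((problems + 1))
    fi
  done

  [[ ${problems} -eq 0 ]]
}
```

For example: `NAP_RESOURCE_DIR=~/nsrr/common/resources NAP_SUDS_DIR=~/suds/db25 nap_preflight && echo ready`. If you rely on `NAP_EXE_DIR` rather than your `PATH`, prepend that folder to `PATH` before running the check.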


### Run NAP on 1 test individual

Pointing to the default resource files, and assuming that all required binaries (e.g. Luna, destrat, fixrows, etc.)
are in the local path (otherwise, set `NAP_EXE_DIR`, e.g. on ERIS: `NAP_EXE_DIR=/data/nsrr/bin/`):

```
NAP_RESOURCE_DIR=~/nsrr/common/resources  \
 NAP_SUDS_DIR=~/suds/db25 \
 NAP_SUDS_ESPRIORS=~/suds/es.model.1 \
 bash ~/nsrr/nap/nap.sh run1 ~/tmp/nap/working 1,1
```

That is, the primary NAP script is `nap.sh`.  Here it is pointed at
the folder `~/tmp/nap/working`, where it will find the `s.lst` sample list.

The optional `1,1` argument means to run only for the first subject (i.e. only the
first individual; to run for person _m_ to _n_, use `m,n`).
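
To check which individuals a given `m,n` range would cover before launching a run, a small helper can print the matching IDs from the sample list. This is a hypothetical sketch (`preview_nap_range` is not part of NAP); it assumes that rows of `s.lst` are counted from 1 and that the first tab-delimited column of each row is the individual ID, as in Luna sample lists.

```shell
# Hypothetical helper: show which IDs a NAP "m,n" range would select.
#   $1 : sample list (e.g. s.lst)
#   $2 : range in the form m,n (1-based, inclusive)
preview_nap_range() {
  local lst="$1" m="${2%,*}" n="${2#*,}"
  awk -F'\t' -v m="$m" -v n="$n" 'NR >= m + 0 && NR <= n + 0 { print $1 }' "$lst"
}
```

For the tutorial data, `preview_nap_range s.lst 1,1` should print `learn-nsrr01`, the first row of the sample list.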
 
If it runs correctly, it will generate a folder `nap/` in the current
working directory, i.e. `~/tmp/nap/working/nap`.


### Viewing NAP output on the command line

The newly created `nap/` folder should contain output for one individual: `learn-nsrr01` (the first row of the sample list).

```
$ ls nap/
```
```
learn-nsrr01
```

We can look at the full NAP log:
```
cat nap/learn-nsrr01/nap.log 
```
```
--------------------------------------------------------------------------------
NAP v0.05 | learn-nsrr01 | process started: 01/11/2021 13:44:20 
--------------------------------------------------------------------------------

Processing EDF learn-nsrr01
  - input folder:        ~/tmp/nap/working
  - sample list:         ~/tmp/nap/working/s.lst.run
  - EDF:                 ~/tmp/nap/edfs/edfs//learn-nsrr01.edf
  - primary output:      ~/tmp/nap/working/nap/learn-nsrr01
  - new EDFs:            ~/tmp/nap/working/nap/learn-nsrr01/data/
  - new annotations:     ~/tmp/nap/working/nap/learn-nsrr01/annots/
  - issues:              ~/tmp/nap/working/nap/learn-nsrr01/nap.issues
  - main log:            ~/tmp/nap/working/nap/learn-nsrr01/nap.log
  - error/verbose log:   ~/tmp/nap/working/nap/learn-nsrr01/nap.err

Primary configuration values:
  - annot defs:          NAP_HARM_CANONICAL_ANNOTS = ~/nsrr/common/resources/harm.annots
  - opt. annot defs:     NAP_OPT_ANNOTS            = .
  - signal defs:         NAP_HARM_CANONICAL_SIGS   = ~/nsrr/common/resources/harm.canonical.sigs
  - opt. signal aliases: NAP_OPT_ALIASES           = .
  - opt. signal defs:    NAP_OPT_SIGS              = 
  - opt. signal group:   NAP_CANONICAL_GROUP       = .

  - base signal defs:    NAP_BASE_CANONICAL_SIGS   = ~/nsrr/common/resources/base.canonical.sigs
  - opt. base sig. defs: NAP_BASE_OPT_SIGS         = 

  - SUDS library:        NAP_SUDS_DIR              = ~/nsrr/common/resources/suds/

Mapping channels & annotations
------------------------------
Mapping harmonized channel labels, and saving a 1-epoch harmonized EDF...
Mapping base channel labels from the (single-epoch) harmonized EDF...
Collating all annotations in a single file a.annot : ~/tmp/nap/working/nap/learn-nsrr01/annots/a.annot
Creating a harmonized annotation file harm.annot : ~/tmp/nap/working/nap/learn-nsrr01/annots/harm.annot
All original annotations were mapped, good

Original EDF summary
--------------------
Detected 14 channels, 10 annotation classes & duration 11.22.00
Existing staging annotations detected
Existing stage epoch counts: N1:109,N2:523,N3:17,R:238,W:477,?:0
Original staging aligns with EDF epochs, good